Mandarin Singing Voice Synthesis Using ANN Vibrato-parameter Models
 
Hung-Yan Gu (古鴻炎) and Zheng-Fu Lin (林正甫)
e-mail: guhy@mail.ntust.edu.tw
2011



Abstract
    Vibrato strongly affects how natural a synthetic singing voice sounds, so this paper studies the analysis and modeling of vibrato parameters. The vibrato parameters of syllables segmented from recorded songs are analyzed with the short-time Fourier transform and the analytic-signal method. Once the vibrato parameter values of all training syllables have been extracted and normalized, they are used to train one artificial neural network (ANN) for each type of vibrato parameter. The trained ANN models then generate vibrato parameter values, which are combined with other musical information to control a harmonic-plus-noise model (HNM) that synthesizes Mandarin singing voice signals. Subjective perception tests on the synthetic singing voice show that voices synthesized with ANN-generated vibrato parameters sound considerably more natural. The combination of the ANN vibrato models and the HNM signal model is therefore not only feasible for singing voice synthesis but also convenient for providing multiple singing voice timbres.
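
As a rough illustration of the analytic-signal step mentioned in the abstract, the sketch below estimates vibrato extent and vibrato rate from a pitch contour. It is not the authors' implementation. The function names, the use of a constant mean pitch as the intonation baseline, and the synthetic test contour (220 Hz note, 6 Hz extent, 5.5 Hz rate, 200 frames per second) are all assumptions made for this example. In practice the pitch contour would come from a preceding short-time analysis of the recorded syllable.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real sequence, built by zeroing negative FFT bins."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep the DC bin once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin once
        weights[1:n // 2] = 2.0           # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def vibrato_parameters(pitch_hz, frame_rate):
    """Return (intonation, extent per frame, rate per frame step).

    Assumption: the intonation baseline is the mean pitch of the syllable.
    """
    pitch_hz = np.asarray(pitch_hz, dtype=float)
    intonation = np.mean(pitch_hz)
    deviation = pitch_hz - intonation     # oscillating part of the contour
    z = analytic_signal(deviation)
    extent = np.abs(z)                    # instantaneous amplitude in Hz
    phase = np.unwrap(np.angle(z))        # instantaneous phase in radians
    rate = np.diff(phase) * frame_rate / (2.0 * np.pi)  # oscillations per second
    return intonation, extent, rate

# Hypothetical test contour: 2 seconds at 200 frames per second.
frame_rate = 200.0
t = np.arange(400) / frame_rate
contour = 220.0 + 6.0 * np.sin(2.0 * np.pi * 5.5 * t)
intonation, extent, rate = vibrato_parameters(contour, frame_rate)
print(f"intonation {intonation:.2f} Hz, "
      f"extent {np.median(extent):.2f} Hz, rate {np.median(rate):.2f} Hz")
```

On this synthetic contour the estimates should come out close to the extent and rate used to build it. A real contour would need a time-varying baseline and some smoothing, which the sketch leaves out.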

 

(a)
Each Mandarin syllable has only one recorded utterance from which the HNM (harmonic-plus-noise model) parameters are analyzed, so unit selection is not possible.
 
 
 
 
Recording of program execution
(b)
The HNM parameters obtained from analyzing a source syllable are used to synthesize syllables of diverse musical characteristics (i.e., various combinations of different pitches and durations).
 
 
 
 
Download test-program
 
Pitch co-articulation
 
(c)
Reference papers:
   Singing-voice Synthesis Using ANN Vibrato-parameter Models
   Mandarin Singing-voice Synthesis Using an HNM Based Scheme
 
 
 
Conference paper
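
Point (b) above says that the parameters of a single source syllable are reused at many different pitches and durations. One common way to do this for the harmonic part of an HNM is to treat the analyzed harmonic amplitudes as samples of a spectral envelope and to resample that envelope at the harmonics of the new pitch. The sketch below shows that idea only. The function names, the log-amplitude linear interpolation, and the example values are assumptions made here, not details taken from the papers, and the noise part of the model is omitted.

```python
import numpy as np

def repitch_harmonics(src_f0, src_amps, new_f0, max_freq):
    """Sample the source spectral envelope at the harmonics of a new pitch."""
    src_amps = np.asarray(src_amps, dtype=float)
    src_freqs = src_f0 * np.arange(1, len(src_amps) + 1)
    n_new = int(max_freq // new_f0)               # harmonics below max_freq
    new_freqs = new_f0 * np.arange(1, n_new + 1)
    # Interpolate the envelope on a log-amplitude scale; outside the source
    # range np.interp holds the nearest edge value.
    new_amps = np.exp(np.interp(new_freqs, src_freqs, np.log(src_amps)))
    return new_freqs, new_amps

def synthesize_harmonics(freqs, amps, duration, sample_rate):
    """Sum of stationary cosines: the harmonic part only, with no noise part."""
    t = np.arange(int(duration * sample_rate)) / sample_rate
    return np.sum(amps[:, None] * np.cos(2.0 * np.pi * freqs[:, None] * t), axis=0)

# Hypothetical source syllable: 200 Hz pitch, 20 harmonics with a 1/k roll-off.
src_amps = 1.0 / np.arange(1, 21)
freqs, amps = repitch_harmonics(200.0, src_amps, new_f0=400.0, max_freq=4000.0)
signal = synthesize_harmonics(freqs, amps, duration=0.5, sample_rate=16000)
print(len(freqs), "harmonics,", len(signal), "samples")
```

Because the envelope rather than the individual harmonics is carried over, the timbre of the source syllable is roughly preserved while the pitch changes. Changing the duration would amount to stretching or compressing the time axis of the analyzed parameter frames before synthesis.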





1. Synthetic singing voice: "Young_Dancing" (青春舞曲)
Female-voice and male-voice samples are provided for each of the following synthesis conditions:
Using ANN models to generate vibrato parameter values.
Assigning fixed values (0.03 or 0.015) to the vibrato parameters.
No vibrato expression.
Direct concatenation of recorded syllables.

Score file for "Young_Dancing"
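
For comparison with the ANN-driven condition, a fixed-parameter vibrato can be pictured as a sinusoidal modulation of constant depth applied to every note. The page does not say what the values 0.03 and 0.015 refer to, so the sketch below simply assumes they are vibrato extents expressed as a fraction of the note pitch. The 5.5 Hz vibrato rate, the frame rate, and the function name are likewise hypothetical.

```python
import numpy as np

def add_fixed_vibrato(note_hz, duration, frame_rate, extent=0.03, rate_hz=5.5):
    """Pitch contour of one note with a constant-depth sinusoidal vibrato.

    Assumptions: `extent` is a fraction of the note pitch and `rate_hz`
    is a hypothetical vibrato rate; neither is specified by the source.
    """
    n_frames = int(round(duration * frame_rate))
    t = np.arange(n_frames) / frame_rate
    return note_hz * (1.0 + extent * np.sin(2.0 * np.pi * rate_hz * t))

# A 1-second A4 note at 200 frames per second with each fixed value.
deep = add_fixed_vibrato(440.0, 1.0, 200.0, extent=0.03)
shallow = add_fixed_vibrato(440.0, 1.0, 200.0, extent=0.015)
print(deep.max() - deep.min(), shallow.max() - shallow.min())
```

A contour like this is identical for every note of the same length, which is one plausible reason why vibrato parameters generated per syllable by the ANN models are perceived as more natural.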





2. Synthetic singing voice: "KangDing_madrigal" (康定情歌)
Female-voice and male-voice samples are provided for each of the following synthesis conditions:
Using ANN models to generate vibrato parameter values.
Assigning fixed values (0.03 or 0.015) to the vibrato parameters.
No vibrato expression.
Direct concatenation of recorded syllables.

Score file for "KangDing_madrigal"





3. Synthetic singing voice: "Ode to Joy" (快樂頌)
Female-voice and male-voice samples are provided for each of the following synthesis conditions:
Using ANN models to generate vibrato parameter values.
Assigning fixed values (0.03 or 0.015) to the vibrato parameters.
No vibrato expression.
Direct concatenation of recorded syllables.

Score file for "Ode to Joy"





4. Synthetic singing voice: "Fishing_Song" (捕魚歌)
Female-voice and male-voice samples are provided for each of the following synthesis conditions:
Using ANN models to generate vibrato parameter values.
Assigning fixed values (0.03 or 0.015) to the vibrato parameters.
No vibrato expression.
Direct concatenation of recorded syllables.

Score file for "Fishing_Song"





5. Other synthetic songs:

Each song below has samples in female voice A, female voice B, and the male voice.
姑娘十八一朵花 (An Eighteen-Year-Old Girl Is Like a Flower), with score file
星夜的別離 (Parting on a Starry Night), with score file
卡布利島 (Isle of Capri)
多娜 多娜 (Dona, Dona)
噢!蘇珊娜 (Oh! Susanna), with score file








Program interface:

[Image: program interface]







Recordings of program execution:
(a) synthetic male voice, (b) synthetic female voice, (c) synthetic voice (female 2)
 

 

Test program:
Download test program.