Mandarin Singing Voice Synthesis Using ANN Vibrato-parameter Models |
|
Hung-Yan Gu (古鴻炎) and Zheng-Fu Lin (林正甫), e-mail: guhy@mail.ntust.edu.tw |
2011 |
Vibrato is an important factor affecting the naturalness of a synthetic
singing voice. This paper therefore studies the analysis and modeling of
vibrato parameters. The vibrato parameters of syllables segmented from
recorded songs are analyzed using the short-time Fourier transform and the
analytic-signal method. After the vibrato parameter values of all training
syllables are extracted and normalized, they are used to train an artificial
neural network (ANN) for each type of vibrato parameter. These ANN models are
then used to generate vibrato parameter values, which, together with other
musical information, control a harmonic-plus-noise model (HNM) to synthesize
Mandarin singing voice signals. Subjective perception tests are conducted
with the synthetic singing voice. The results show that the singing voice
synthesized with the ANN-generated vibrato parameters is considerably more
natural. Therefore, the combination of the ANN vibrato models and the HNM
signal model is not only feasible for singing voice synthesis but also makes
it convenient to provide multiple singing voice timbres. |
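To illustrate the analytic-signal step mentioned in the abstract, the following Python sketch estimates a vibrato rate and a relative vibrato extent from a pitch contour. It is not the authors' implementation: the function names, the normalization (pitch deviation divided by the mean pitch), the frame rate, and the synthetic test contour are all assumptions made for this example, and a naive DFT stands in for a library FFT so that the code needs only the standard library.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for short contours)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = 0j
        for t in range(n):
            s += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(s / n if inverse else s)
    return out


def analytic_signal(x):
    """Build the analytic signal by zeroing the negative-frequency bins."""
    n = len(x)
    spec = dft(x)
    weights = [0.0] * n
    weights[0] = 1.0                      # keep DC unchanged
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin unchanged
        for k in range(1, n // 2):
            weights[k] = 2.0              # double the positive frequencies
    else:
        for k in range(1, (n + 1) // 2):
            weights[k] = 2.0
    return dft([s * w for s, w in zip(spec, weights)], inverse=True)


def vibrato_parameters(f0, frame_rate):
    """Return (rate in Hz, relative extent) of a pitch contour (hypothetical helper)."""
    mean_f0 = sum(f0) / len(f0)
    deviation = [f / mean_f0 - 1.0 for f in f0]   # assumed normalization
    z = analytic_signal(deviation)
    lo, hi = len(z) // 4, 3 * len(z) // 4         # use the middle half only
    # Instantaneous amplitude gives the extent.
    extent = sum(abs(z[i]) for i in range(lo, hi)) / (hi - lo)
    # Average phase increment per frame gives the rate.
    total_phase = 0.0
    for i in range(lo, hi):
        total_phase += cmath.phase(z[i + 1] * z[i].conjugate())
    rate = total_phase / (hi - lo) * frame_rate / (2.0 * math.pi)
    return rate, extent


# Synthetic contour (made-up values): 440 Hz note, 5.5 Hz vibrato, 3 % extent.
FRAME_RATE = 200
contour = [440.0 * (1.0 + 0.03 * math.sin(2.0 * math.pi * 5.5 * n / FRAME_RATE))
           for n in range(400)]
rate, extent = vibrato_parameters(contour, FRAME_RATE)
print(rate, extent)
```

On real recordings the pitch contour would first have to be measured (the abstract names the short-time Fourier transform for this), and the contour would rarely contain a whole number of vibrato cycles, so some smoothing of the instantaneous values would likely be needed.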
(a) |
Each Mandarin syllable has only one
recorded utterance for analyzing HNM (harmonic-plus-noise model)
parameters, i.e., there is no opportunity for unit selection. |
|
|
|
Recording of program execution |
(b) |
The HNM parameters obtained by analyzing a source syllable are used to
synthesize syllables with diverse musical characteristics (i.e., various
combinations of pitch and duration). |
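The idea of reusing one set of analyzed parameters at different pitches and durations can be sketched with a simplified additive synthesizer. This is only a rough illustration under stated assumptions, not the HNM scheme used by the test program: it covers the harmonic part only (no noise part), it reuses fixed harmonic amplitudes instead of resampling a spectral envelope at the new harmonic frequencies, and the function name, amplitudes, sample rate, and vibrato values are all made up for the example.

```python
import math


def synthesize_harmonics(amps, f0, duration, sample_rate=16000,
                         vib_rate=0.0, vib_extent=0.0):
    """Sum sinusoids at integer multiples of a (possibly vibrato-modulated) pitch.

    amps       -- harmonic amplitudes of the source syllable (hypothetical data)
    f0         -- target note pitch in Hz
    duration   -- target note duration in seconds
    vib_rate   -- vibrato rate in Hz (assumed parameterization)
    vib_extent -- vibrato depth as a fraction of f0 (assumed parameterization)
    """
    n_samples = int(round(duration * sample_rate))
    phase = 0.0
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        inst_f0 = f0 * (1.0 + vib_extent * math.sin(2.0 * math.pi * vib_rate * t))
        # Accumulate phase so the pitch can change without discontinuities.
        phase += 2.0 * math.pi * inst_f0 / sample_rate
        sample = 0.0
        for k, a in enumerate(amps, start=1):
            if k * inst_f0 < sample_rate / 2:   # skip harmonics above Nyquist
                sample += a * math.sin(k * phase)
        out.append(sample)
    return out


# One made-up set of harmonic amplitudes rendered as two different notes.
amps = [0.5, 0.25, 0.125]
short_low = synthesize_harmonics(amps, f0=220.0, duration=0.25)
long_high = synthesize_harmonics(amps, f0=330.0, duration=0.5,
                                 vib_rate=5.5, vib_extent=0.03)
print(len(short_low), len(long_high))
```

The same amplitude list yields notes of different pitch and length, which is the point of item (b); a fuller HNM treatment would also keep the timbre stable across pitches by sampling the spectral envelope at each new harmonic frequency.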
|
|
|
Download
test-program Pitch co-articulation |
(c) |
Reference papers: "Singing-voice Synthesis Using ANN Vibrato-parameter Models"; "Mandarin Singing-voice Synthesis Using an HNM Based Scheme" |
|
|
|
Conference paper |
Female voice | Male voice |
Using ANN models to generate vibrato parameter values. |
Assigning fixed values (0.03 or 0.015) to the vibrato parameters. |
No vibrato expression. |
Direct concatenation of recorded syllables. |
Score file | Score file for "Young_Dancing" |
Female voice | Male voice |
Using ANN models to generate vibrato parameter values. |
Assigning fixed values (0.03 or 0.015) to the vibrato parameters. |
No vibrato expression. |
Direct concatenation of recorded syllables. |
Score file | Score file for "KangDing_madrigal" |
Female voice | Male voice |
Using ANN models to generate vibrato parameter values. |
Assigning fixed values (0.03 or 0.015) to the vibrato parameters. |
No vibrato expression. |
Direct concatenation of recorded syllables. |
Score file | Score file for "ode to joy" |
Female voice | Male voice |
Using ANN models to generate vibrato parameter values. |
Assigning fixed values (0.03 or 0.015) to the vibrato parameters. |
No vibrato expression. |
Direct concatenation of recorded syllables. |
Score file | Score file for "Fishing_Song" |
Female voice A | Female voice B | Male voice |
姑娘十八一朵花 (A Girl of Eighteen Is Like a Flower) | score file |
星夜的別離 (Parting on a Starry Night) | score file |
卡布利島 (Isle of Capri) |
多娜 多娜 (Dona, Dona) |
噢!蘇珊娜 (Oh! Susanna) | score file |
Recording of program execution: (a), (b), (c) |
|
|
Test program: |
Download test program. |