Mandarin Singing Voice Synthesis Using ANN Vibrato Parameter Models

Hung-Yan Gu (古鴻炎) and Zheng-Fu Lin (林正甫) e-mail: guhy@mail.ntust.edu.tw	2011

Abstract

Vibrato strongly affects how natural a synthetic singing voice sounds, so this paper studies the analysis and modeling of vibrato parameters. Syllables are segmented from recorded songs, and their vibrato parameters are analyzed with the short-time Fourier transform and the analytic-signal method. The extracted parameter values of all training syllables are normalized and used to train one artificial neural network (ANN) for each type of vibrato parameter. These ANN models then generate vibrato parameter values, which, together with other music information, control a harmonic-plus-noise model (HNM) that synthesizes Mandarin singing voice signals. Subjective perception tests on the synthetic singing voice show that the ANN-generated vibrato parameters markedly increase its naturalness. The combination of ANN vibrato models and the HNM signal model is therefore feasible for singing voice synthesis, and it also makes it convenient to provide multiple singing voice timbres.
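As a rough illustration of the analytic-signal step mentioned in the abstract, the following Python sketch estimates a time-varying vibrato extent and vibrato rate from a pitch contour. It is only a minimal sketch under stated assumptions, not the authors' implementation: the pitch contour is assumed to be already extracted (for example, frame by frame from the short-time Fourier transform), the slowly varying pitch trend is approximated by the plain mean of the contour, and the function names `analytic_signal` and `vibrato_parameters` as well as the frame rate are hypothetical.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence via a DFT-based Hilbert transform.

    A direct O(n^2) DFT is used so the sketch needs only the standard
    library; this is acceptable for the short contour of one syllable.
    """
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist when n is even), double the positive
    # frequencies, and zero the negative frequencies.
    for k in range(n):
        if k == 0 or 2 * k == n:
            gain = 1.0
        elif 2 * k < n:
            gain = 2.0
        else:
            gain = 0.0
        spectrum[k] *= gain
    # Inverse DFT gives the complex analytic signal.
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def vibrato_parameters(f0, frame_rate):
    """Estimate (mean pitch, extent per frame, rate per frame step).

    f0         -- pitch contour in Hz, one value per frame
    frame_rate -- frames per second (an assumed analysis setting)
    """
    # Assumption: the pitch trend is approximated by the contour mean.
    mean_f0 = sum(f0) / len(f0)
    vibrato = [f - mean_f0 for f in f0]
    z = analytic_signal(vibrato)
    # Instantaneous amplitude of the analytic signal: vibrato extent in Hz.
    extent = [abs(v) for v in z]
    # Phase advance between neighbouring frames: vibrato rate in Hz.
    rate = [
        cmath.phase(z[t + 1] * z[t].conjugate()) * frame_rate / (2 * math.pi)
        for t in range(len(z) - 1)
    ]
    return mean_f0, extent, rate
```

For instance, a two-second contour sampled at 100 frames per second that oscillates by ±6 Hz around 220 Hz at 5.5 Hz should give an extent close to 6 Hz and a rate close to 5.5 Hz in every frame. Real sung contours are not stationary, so a smoother trend estimate than the plain mean would probably be needed in practice.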

(a)	Each Mandarin syllable has only one recorded utterance for analyzing HNM (harmonic-plus-noise model) parameters, i.e., there is no opportunity for unit selection.	Recording of program execution
(b)	The HNM parameters obtained from analyzing a source syllable are used to synthesize syllables of diverse musical characteristics (i.e., various combinations of different pitches and durations).	Download test-program Pitch co-articulation
(c)	Reference papers: "Singing-voice Synthesis Using ANN Vibrato-parameter Models"; "Mandarin Singing-voice Synthesis Using an HNM Based Scheme".	Conference paper

1. Synthetic singing voice: "Young_Dancing" (青春舞曲)

Female voice	Male voice
		Using ANN models to generate vibrato parameter values.
		Assigning fixed values (0.03 or 0.015) to the vibrato parameters.
		No vibrato expression.
		Direct concatenation of recorded syllables.
Score file		Score file for "Young_Dancing"

2. Synthetic singing voice: "KangDing_madrigal" (康定情歌)

Female voice	Male voice
		Using ANN models to generate vibrato parameter values.
		Assigning fixed values (0.03 or 0.015) to the vibrato parameters.
		No vibrato expression.
		Direct concatenation of recorded syllables.
Score file		Score file for "KangDing_madrigal"

3. Synthetic singing voice: "Ode to Joy" (快樂頌)

Female voice	Male voice
		Using ANN models to generate vibrato parameter values.
		Assigning fixed values (0.03 or 0.015) to the vibrato parameters.
		No vibrato expression.
		Direct concatenation of recorded syllables.
Score file		Score file for "Ode to Joy"

4. Synthetic singing voice: "Fishing_Song" (捕魚歌)

Female voice	Male voice
		Using ANN models to generate vibrato parameter values.
		Assigning fixed values (0.03 or 0.015) to the vibrato parameters.
		No vibrato expression.
		Direct concatenation of recorded syllables.
Score file		Score file for "Fishing_Song"

5. Other synthetic songs:

		Female voice A	Female voice B	Male voice
姑娘十八一朵花 A Girl of Eighteen Is Like a Flower	score file
星夜的別離 Parting on a Starry Night	score file
卡布利島 Isle of Capri
多娜多娜 Dona, Dona
噢!蘇珊娜 Oh! Susanna	score file

Program interface:

Recording of program execution:	(a), (b), (c)

Test program:	Download test program.