Mandarin Speech Synthesis Using Spectrum-Progression Model and HNM
(HNM: Harmonic-plus-Noise Model)
Hung-Yan Gu and Chang-Yi Wu
e-mail: guhy@mail.ntust.edu.tw

In this paper, an ANN-based spectrum-progression model (SPM) is proposed to
improve the fluency of synthetic Mandarin speech. First, each target syllable
(uttered within a sentence) is matched against its corresponding reference
syllable (uttered in isolation) using dynamic time warping (DTW). Then each
warping path, i.e. the spectrum-progression path, is time-normalized to a
fixed dimension and used to train the ANN-based SPM. After training, the SPM
is combined with the other modules (text analysis, prosody parameter
generation, and signal sample generation) to synthesize Mandarin speech.
The synthetic speech is then evaluated in perception tests, and the results
show that the proposed SPM noticeably improves the fluency level.
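
The matching and time-normalization steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each syllable is given as a sequence of frame-level feature vectors, uses a Euclidean local distance with the basic three-way DTW step pattern, and resamples the path to 32 points. The function names `dtw_path` and `normalize_path` and all of these settings are hypothetical choices made for illustration.

```python
import numpy as np


def dtw_path(ref, tgt):
    """Align two feature sequences of shape (n_frames, n_features) with DTW.

    Returns a list of (ref_index, tgt_index) pairs running from (0, 0)
    to the last frame of both sequences.
    """
    n, m = len(ref), len(tgt)
    # Local cost: Euclidean distance between every pair of frames (assumption).
    dist = np.linalg.norm(ref[:, None, :] - tgt[None, :, :], axis=2)
    cost = np.full((n, m), np.inf)
    cost[0, 0] = dist[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = min(
                cost[i - 1, j] if i > 0 else np.inf,
                cost[i, j - 1] if j > 0 else np.inf,
                cost[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            cost[i, j] = dist[i, j] + best
    # Backtrack from the end to recover the cheapest path.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1, j], i - 1, j))
        if j > 0:
            candidates.append((cost[i, j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    return path[::-1]


def normalize_path(path, n_points=32):
    """Resample a warping path to a fixed-length vector.

    Both time axes are scaled to [0, 1]. The result gives, for n_points
    evenly spaced positions in the target syllable, the matching
    position in the reference syllable.
    """
    arr = np.asarray(path, dtype=float)
    ref_idx, tgt_idx = arr[:, 0], arr[:, 1]
    # Collapse repeated target frames so the x-axis is strictly increasing.
    tgt_unique = np.unique(tgt_idx)
    ref_mean = np.array([ref_idx[tgt_idx == t].mean() for t in tgt_unique])
    tgt_pos = tgt_unique / max(tgt_unique[-1], 1.0)
    ref_pos = ref_mean / max(ref_idx[-1], 1.0)
    grid = np.linspace(0.0, 1.0, n_points)
    return np.interp(grid, tgt_pos, ref_pos)
```

A fixed-length vector of this kind is what would serve as the training target of a regression model in such a scheme. The dimension of 32 is arbitrary here and is not taken from the paper.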

Text file: text_s5
Avg. syllable duration: 300 ms; Pitch: 220 Hz.
Recording of program execution
Using spectrum-progression model and HNM. Timbre transformation: female => male (VTL: 100/80%, Pitch: 140 Hz); female => child (VTL: 100/115%, Pitch: 140 Hz or 280 Hz).
Using linear time mapping and HNM.
Using linear time mapping and PSOLA.
Direct concatenation of recorded syllables.

Text file: text_s3
Avg. syllable duration: 330 ms; Pitch: 220 Hz.
Recording of program execution
Using spectrum-progression model and HNM.
Using linear time mapping and HNM.
Using linear time mapping and PSOLA.
Direct concatenation of recorded syllables.

Text file: text_sn
Avg. syllable duration: 220 ms; Pitch: 220 Hz.
Recording of program execution
Using spectrum-progression model and HNM. Timbre transformation: female => male (VTL: 100/85%, Pitch: 120 Hz).
Using linear time mapping and HNM.
Using linear time mapping and PSOLA.