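This thesis studies a method based on vector quantization (VQ) and hidden Markov models (HMM) for modeling how pitch evolves over a Mandarin sentence. The model is called the sentence pitch-contour hidden Markov model (SPC-HMM). The pitch contour of each syllable is first normalized in time and pitch, and then vector-quantized. The resulting codeword sequences of sentence pitch contours, together with the pitch-height differences between adjacent syllables, are used to train the SPC-HMM. Through training, the model records the pitch contour that each tone combination should take in the different parts of a sentence. In the synthesis phase, a 3D dynamic programming algorithm uses the trained model parameters to generate the statistically optimal sequence of pitch-contour codewords. Alternatively, the state transition sequence can be set directly from information supplied by upstream processing. In the inside test, the SPC-HMM had an average per-syllable rms pitch error of 0.0426 (equivalent to 5.1 Hz at 120 Hz). In the outside test, the average was 0.0524 (equivalent to 6.2 Hz at 120 Hz).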
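The abstract does not say exactly how each syllable contour is normalized and quantized. The following is a minimal sketch under stated assumptions. Time normalization is assumed to be linear resampling to a fixed number of points. Pitch normalization is assumed to take the log of F0 and subtract its mean, so the mean log-F0 serves as the syllable's "pitch height". Vector quantization is assumed to use a plain Lloyd k-means codebook. The names `normalize_contour`, `train_codebook`, `quantize`, `encode_sentence` and the parameter `n_points` are hypothetical.

```python
import numpy as np


def normalize_contour(f0_hz, n_points=8):
    """Time- and pitch-normalize one syllable's F0 contour (assumed scheme).

    Time: linearly resample the F0 samples to n_points.
    Pitch: take the natural log and subtract its mean, keeping only the shape.
    Returns (shape_vector, pitch_height), where pitch_height is the mean log-F0.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    src = np.linspace(0.0, 1.0, len(f0))
    dst = np.linspace(0.0, 1.0, n_points)
    log_f0 = np.log(np.interp(dst, src, f0))
    height = log_f0.mean()
    return log_f0 - height, height


def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codeword (squared Euclidean)."""
    data = np.asarray(vectors, dtype=float)
    dist = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def train_codebook(vectors, k, n_iter=20, seed=0):
    """Lloyd k-means: a stand-in for whatever VQ training the thesis uses."""
    rng = np.random.default_rng(seed)
    data = np.asarray(vectors, dtype=float)
    codebook = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = quantize(data, codebook)
        for j in range(k):
            members = data[labels == j]
            if len(members):  # keep the old codeword if its cell is empty
                codebook[j] = members.mean(axis=0)
    return codebook


def encode_sentence(syllable_f0s, codebook, n_points=8):
    """Return VQ codes and adjacent pitch-height differences for one sentence."""
    shapes, heights = zip(*(normalize_contour(f, n_points) for f in syllable_f0s))
    codes = quantize(np.stack(shapes), codebook)
    return codes, np.diff(heights)
```

Separating shape (the VQ code) from height (the mean log-F0) reflects the abstract's pairing of codeword sequences with adjacent pitch-height differences. The real feature definitions may differ.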
ABSTRACT
In this thesis, a VQ/HMM-based method for generating sentence pitch contours for Mandarin text-to-speech applications is studied. It is named the sentence pitch-contour hidden Markov model (SPC-HMM). In this method, the pitch contours of the syllables in a sentence are first normalized on both the time and frequency axes. The normalized pitch contours are then vector-quantized into discrete symbols. The symbol sequences of all training sentences, together with the pitch-height differences between adjacent syllables, are then used to train the SPC-HMM. Through this training, the model records the pitch contour that corresponds to each tone combination in different parts of a sentence. In the synthesis phase, a 3D dynamic programming algorithm is designed to find the statistically best symbol sequence for a sentence pitch contour. Alternatively, the state transition sequence of the SPC-HMM can be arranged directly according to information from text-analysis processing. Pitch-contour generation experiments were conducted. The results show that the average per-syllable rms pitch error is 0.0426 (i.e., 5.1 Hz at 120 Hz) in the inside test and 0.0524 (i.e., 6.2 Hz at 120 Hz) in the outside test.
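The abstract names a 3D dynamic programming search but does not define it. The sketch below is therefore one possible reading, not the thesis's algorithm. The three dimensions are taken to be syllable index, HMM state, and VQ codeword. Each syllable gets a boolean mask of allowed codewords, for example those compatible with its lexical tone. An optional `join_cost[K, K]` penalizes poorly matching consecutive codewords. Both the mask and the junction cost are hypothetical, as are the names `dp3d`, `allowed`, `join_cost` and `weight`. The model is assumed to be a discrete HMM given as log-probabilities `log_pi[S]`, `log_A[S, S]` and `log_B[S, K]`.

```python
import numpy as np


def dp3d(log_pi, log_A, log_B, allowed, join_cost=None, weight=1.0):
    """Find the best (state, codeword) sequence by DP over (syllable, state, codeword).

    delta[s, o] at syllable t is the best log-score of any path that ends in
    state s emitting codeword o. Transitions add log_A[s', s] and subtract
    weight * join_cost[o', o]. Disallowed codewords are masked with -inf.
    """
    S, K = log_B.shape
    T = len(allowed)
    if join_cost is None:
        join_cost = np.zeros((K, K))
    mask = np.where(np.asarray(allowed, dtype=bool), 0.0, -np.inf)  # (T, K)
    delta = log_pi[:, None] + log_B + mask[0][None, :]              # (S, K)
    back = np.zeros((T, S, K), dtype=int)  # flat index of the best predecessor
    for t in range(1, T):
        # scores[s', o', s, o]: extend every previous (s', o') to every (s, o)
        scores = (delta[:, :, None, None]
                  + log_A[:, None, :, None]
                  - weight * join_cost[None, :, None, :])
        flat = scores.reshape(S * K, S, K)
        back[t] = flat.argmax(axis=0)
        delta = flat.max(axis=0) + log_B + mask[t][None, :]
    s, o = np.unravel_index(delta.argmax(), delta.shape)
    path = [(int(s), int(o))]
    for t in range(T - 1, 0, -1):  # trace predecessors back to syllable 0
        s, o = np.unravel_index(back[t, s, o], (S, K))
        path.append((int(s), int(o)))
    path.reverse()
    return path, float(delta.max())
```

Without a junction cost, the codeword choice at each state could be made independently. The codeword axis only matters for the search once consecutive codewords interact. When the state sequence is fixed from text analysis, as the abstract allows, the same recursion collapses to a 2D search over syllables and codewords.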