|
The three TTS DLL functions below must be
invoked in sequence within a thread: InitTTS()
must be invoked first, GenSpeechFile() can then
be invoked any number of times, and StopTTS() must
be invoked last. |
|
Because these functions support multi-threaded
applications, it is acceptable to invoke InitTTS()
many times in different threads before GenSpeechFile()
is invoked. |
|
|
|
long InitTTS(long *Hndl, char *InitCondition); |
|
long GenSpeechFile(long Hndl, char *InputCharString, char *OutputFilePathAndName, long *Prcnt); |
|
long StopTTS(long Hndl); |
|
|
InitTTS( ) |
is used to initialize the environment, to
load parameters and signal files, and to assign a processing handle. If
it succeeds, its return value is zero. Otherwise, an error code is returned. |
|
The first parameter, Hndl,
can carry in a desired handle number between 1 and 255. If you don't care
which number is assigned, Hndl must be set to zero before invoking InitTTS().
If the call succeeds, the assigned handle number is carried out through this same parameter. |
|
The second parameter, InitCondition,
can carry in the desired options. The details of the acceptable options
will be explained below. |
|
|
GenSpeechFile( ) |
is used to synthesize a speech signal file from
the given text. If it succeeds, its return value is zero. Otherwise, an
error code is returned. |
|
The parameter, Hndl,
is used to carry in the handle number. |
|
The parameter, InputCharString,
is used to carry in the text string to be synthesized. |
|
The parameter, OutputFilePathAndName,
is used to designate the output path and file name to store the synthesized
speech signal. |
|
The parameter, Prcnt,
is currently not used, but its value must be set to NULL before invoking
GenSpeechFile(). |
|
|
StopTTS( ) |
is used to free the memory space allocated
for a handle. The parameter, Hndl, is used
to carry in the handle number. |
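The calling sequence described above can be sketched as follows. This is only an illustration under stated assumptions: the `TtsApi` table and `tts_run_session` are hypothetical names, not part of the DLL, and how the three entry points are obtained from the DLL is left outside the sketch. It also assumes that a failed InitTTS() leaves no handle to free; the meaning of the StopTTS() return value is not documented here, so it is ignored.

```c
#include <stddef.h>

/* Function-pointer types mirroring the three prototypes listed above. */
typedef long (*InitTTS_fn)(long *Hndl, char *InitCondition);
typedef long (*GenSpeechFile_fn)(long Hndl, char *InputCharString,
                                 char *OutputFilePathAndName, long *Prcnt);
typedef long (*StopTTS_fn)(long Hndl);

/* Hypothetical holder for the DLL entry points (not part of the DLL). */
typedef struct {
    InitTTS_fn       init;
    GenSpeechFile_fn gen;
    StopTTS_fn       stop;
} TtsApi;

/* Runs InitTTS -> GenSpeechFile (once per text) -> StopTTS in one thread.
 * Returns 0 on success, -1 if the table is incomplete, or the first
 * nonzero error code returned by InitTTS or GenSpeechFile. */
long tts_run_session(const TtsApi *api, char *options,
                     char **texts, char **out_files, size_t count)
{
    long hndl = 0;  /* 0 = no preferred handle; InitTTS writes back the assigned one */
    long rc;
    size_t i;

    if (api == NULL || !api->init || !api->gen || !api->stop)
        return -1;

    rc = api->init(&hndl, options);
    if (rc != 0)
        return rc;  /* assumption: a failed InitTTS leaves nothing to free */

    for (i = 0; i < count && rc == 0; i++)
        rc = api->gen(hndl, texts[i], out_files[i], NULL);  /* Prcnt is unused: NULL */

    (void)api->stop(hndl);  /* always release the handle last */
    return rc;
}
```

Passing the functions through a table keeps the sketch independent of how the DLL is loaded, and lets each thread run its own session with its own handle.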
|
The following options can be put into the InitTTS() parameter,
InitCondition, before GenSpeechFile() is invoked: |
|
|
"pathpr=...", |
e.g. "pathpr=c:\gtswin\" |
|
Sets the working directory path. The path name must
contain no blanks, because a blank is the delimiter between options.
This option is checked before "gts.ini". |
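Because a blank inside the path would be read as an option delimiter, it can help to check the path before building the option. The helper below is a minimal sketch; `make_pathpr_option` is a hypothetical name, and it assumes that "blank" means the space character only.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "pathpr=<dir>" into out.
 * Returns 0 on success, -1 if dir contains a space (assumed meaning of
 * "blank") or if the result does not fit in out_size bytes. */
int make_pathpr_option(const char *dir, char *out, size_t out_size)
{
    int n;

    if (dir == NULL || out == NULL || strchr(dir, ' ') != NULL)
        return -1;

    n = snprintf(out, out_size, "pathpr=%s", dir);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```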
|
|
"gts.ini" |
If used, the other options (except "pathpr=...") are read
from the first line of the file gts.ini. |
|
|
"sndsrc=#", |
# can be 0, 1, 2, or 3. Selects a source signal
file pair. |
"sndsrc=0" |
selects the pair, base.dic (11,025Hz
sampled male signal waveform), basepit.dic (pitch peak position information). |
"sndsrc=1"
(the default) |
selects the pair, base_.dic (22,050Hz sampled
male voice), basepit_.dic (pitch peak position information). |
"sndsrc=2" |
selects the pair, basef.dic (22,050Hz sampled
female voice; not provided for download), basepitf.dic (pitch peak
position information). |
"sndsrc=3" |
selects the pair, basef8.dic (11,025Hz
sampled female voice), basepitf8.dic (pitch peak position information). |
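The four "sndsrc" values above can be summarized in a lookup table. This is a sketch for application-side bookkeeping only; the `SndSrcInfo` type and `sndsrc_lookup` function are hypothetical names, and the entries are transcribed from the list above.

```c
#include <stddef.h>

/* One entry per "sndsrc=#" value, transcribed from the list above. */
typedef struct {
    const char *wave_file;    /* source signal waveform file */
    const char *pitch_file;   /* pitch peak position information */
    long        sample_rate;  /* sampling rate of the source signal, in Hz */
    const char *voice;        /* "male" or "female" */
} SndSrcInfo;

static const SndSrcInfo SNDSRC_TABLE[] = {
    { "base.dic",   "basepit.dic",   11025L, "male"   },  /* sndsrc=0 */
    { "base_.dic",  "basepit_.dic",  22050L, "male"   },  /* sndsrc=1 (default) */
    { "basef.dic",  "basepitf.dic",  22050L, "female" },  /* sndsrc=2 */
    { "basef8.dic", "basepitf8.dic", 11025L, "female" }   /* sndsrc=3 */
};

#define SNDSRC_DEFAULT 1

/* Returns the table entry for n, or NULL if n is not 0, 1, 2, or 3. */
const SndSrcInfo *sndsrc_lookup(int n)
{
    if (n < 0 || n > 3)
        return NULL;
    return &SNDSRC_TABLE[n];
}
```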
|
|
"msspkr=Mary"
(the default) |
or "msspkr=Mike" |
|
Selects the English source signal speaker. |
|
|
"timeout=#", |
# is a floating point number. |
|
Sets the time-out value in seconds. A time-out
error code is returned if the elapsed time exceeds the specified time-out value.
The default is 7.8 seconds. |
|
|
"noanti" |
Disables anti-aliasing processing to obtain
higher synthesis speed, at the cost of slightly degraded signal quality. |
|
|
"interl" |
Increases synthesis speed for the female
recorded source signal by using pitch periods in an interleaved manner. |
|
|
"fast" |
Not a new option; it implies both "noanti" and
"interl". Use it together with "sndsrc=0" or "sndsrc=3" to obtain fast
synthesis speed. |
|
|
"-16" |
Outputs the synthesized signal samples in
16 bits/sample PCM format. The default is 8 bits/sample mu-law format. |
|
|
"outhead" |
Places a "wav" header record at the
start of the synthesized speech file. |
|
|
"outrt=#", |
# should be a number between 6000 and
24000. |
|
Sets the sampling rate of the synthesized
signal. The default is 8000 if not specified, but at least 11025 is suggested.
Also, if outrt=11025 and sndsrc=0 or sndsrc=3 is specified, no down-sampling
processing is required, which results in higher synthesis speed. |
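The range limit and the no-down-sampling condition above can be checked in the application before the option string is built. The following is a sketch with hypothetical helper names (`make_outrt_option`, `needs_no_downsampling`); it assumes the limits 6000 and 24000 are inclusive, which the text does not state explicitly.

```c
#include <stdio.h>

#define OUTRT_MIN 6000L    /* assumed inclusive */
#define OUTRT_MAX 24000L   /* assumed inclusive */

/* Hypothetical helper: writes "outrt=<rate>" into out.
 * Returns 0 on success, -1 if rate is out of range or out is too small. */
int make_outrt_option(long rate, char *out, size_t out_size)
{
    int n;

    if (out == NULL || rate < OUTRT_MIN || rate > OUTRT_MAX)
        return -1;

    n = snprintf(out, out_size, "outrt=%ld", rate);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Returns 1 for the combination the text says needs no down-sampling
 * (outrt=11025 with sndsrc=0 or sndsrc=3), otherwise 0. */
int needs_no_downsampling(long outrt, int sndsrc)
{
    return outrt == 11025L && (sndsrc == 0 || sndsrc == 3);
}
```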
|
|
"adpcm" |
Stores the synthesized signal samples in
ADPCM format. This implies "-16" and "outhead" because a utility program
is invoked at the end to do the format conversion. |
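Two of the keywords above imply others: "fast" implies "noanti" and "interl", and "adpcm" implies "-16" and "outhead". The sketch below tracks the keyword options as bit flags and joins the selected keywords with single blanks, the delimiter named earlier. The flag names and both functions are hypothetical, and only the keyword options (not the "name=value" ones) are covered.

```c
#include <string.h>

/* Hypothetical bit flags for the keyword options listed above. */
enum {
    OPT_NOANTI  = 1 << 0,
    OPT_INTERL  = 1 << 1,
    OPT_FAST    = 1 << 2,
    OPT_16BIT   = 1 << 3,
    OPT_OUTHEAD = 1 << 4,
    OPT_ADPCM   = 1 << 5
};

/* Adds the options that the text says are implied by "fast" and "adpcm". */
unsigned expand_implied(unsigned flags)
{
    if (flags & OPT_FAST)
        flags |= OPT_NOANTI | OPT_INTERL;
    if (flags & OPT_ADPCM)
        flags |= OPT_16BIT | OPT_OUTHEAD;
    return flags;
}

/* Writes the selected keywords into out, separated by single blanks.
 * Only the flags actually set are written; implied ones are not added.
 * Returns 0 on success, -1 if out is NULL or too small. */
int build_keyword_options(unsigned flags, char *out, size_t out_size)
{
    static const struct { unsigned bit; const char *word; } words[] = {
        { OPT_NOANTI, "noanti" }, { OPT_INTERL,  "interl"  },
        { OPT_FAST,   "fast"   }, { OPT_16BIT,   "-16"     },
        { OPT_OUTHEAD, "outhead" }, { OPT_ADPCM, "adpcm"   }
    };
    size_t i, used = 0;

    if (out == NULL || out_size == 0)
        return -1;
    out[0] = '\0';

    for (i = 0; i < sizeof words / sizeof words[0]; i++) {
        size_t len;
        if (!(flags & words[i].bit))
            continue;
        len = strlen(words[i].word);
        if (used + (used ? 1 : 0) + len + 1 > out_size)
            return -1;
        if (used)
            out[used++] = ' ';                    /* blank delimiter */
        memcpy(out + used, words[i].word, len + 1); /* copies the terminator too */
        used += len;
    }
    return 0;
}
```

The resulting string could be combined with "name=value" options such as "sndsrc=0" and passed as InitCondition.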
@>0, |
The comma is required. Resets all parameters
to their default values. |
@>dxxx |
xxx represents three decimal digits. The value
ranges from 160 to 990. This value controls the speaking rate, i.e. xxx milliseconds
in duration per syllable. |
@>txxx |
xxx represents three decimal digits. The value
ranges from 060 to 400. This value controls the tone height, i.e. xxx Hz
on average. |
@>vxxx |
xxx represents three decimal digits. The value
ranges from 050 to 200. This value controls vocal-tract shortening/lengthening
to produce distinct timbres. |
@>cxxx |
xxx represents three decimal digits. The value
ranges from 060 to 400. This is a combined control of tone height and
vocal-tract length that produces distinct timbres from a single tone-height
value, xxx. |
@>Bxxx |
xxx represents three decimal digits. The value
ranges from 050 to 250. This value controls the rate of pitch bending.
Values less than 100 mean flattening. Values larger than 100 mean sharpening. |
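The "@>" codes above share one shape: a letter followed by exactly three decimal digits within a per-letter range. A formatter that enforces those ranges might look like the sketch below. `format_control_code` is a hypothetical name, the ranges are assumed to be inclusive, and where the resulting code is placed is not covered by the sketch.

```c
#include <stdio.h>

/* Hypothetical helper: writes "@>" + letter + three zero-padded digits,
 * e.g. "@>t060". Ranges are taken from the list above (assumed inclusive).
 * Returns 0 on success, -1 for an unknown letter, an out-of-range value,
 * or a buffer that is too small. */
int format_control_code(char letter, int value, char *out, size_t out_size)
{
    int lo, hi, n;

    switch (letter) {
    case 'd': lo = 160; hi = 990; break;  /* speaking rate, ms per syllable */
    case 't': lo = 60;  hi = 400; break;  /* tone height, Hz */
    case 'v': lo = 50;  hi = 200; break;  /* vocal-tract length */
    case 'c': lo = 60;  hi = 400; break;  /* combined tone height and tract length */
    case 'B': lo = 50;  hi = 250; break;  /* pitch bending rate */
    default:  return -1;
    }

    if (out == NULL || value < lo || value > hi)
        return -1;

    n = snprintf(out, out_size, "@>%c%03d", letter, value);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```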