Building Synthetic Voices | ||
---|---|---|
<<< Previous | Chapter 6. Text analysis | Next >>> |
Almost everyone will expect a synthesizer to be able to speak numbers. As it is not feasible to list all possible digit strings in your lexicon, you will need to provide a function that returns a string of words for a given string of digits.
In its simplest form you should provide a function that decodes the string of digits. The example spanish_number (and spanish_number_from_digits) in the released Spanish voice (festvox_ellpc11k.tar.gz) is a good general example.
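As a rough illustration of what such a decoding function does, here is a minimal sketch. It is written in Python rather than Festival's Scheme, it handles English rather than Spanish, and it is not taken from the released Spanish voice. The name english_number and the digit-by-digit fallback for long strings are assumptions made for this example.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def _below_thousand(n):
    """Words for 1..999; returns an empty list for 0."""
    words = []
    if n >= 100:
        words += [ONES[n // 100], "hundred"]
        n %= 100
    if n >= 20:
        words.append(TENS[n // 10])
        n %= 10
    if n > 0:
        words.append(ONES[n])
    return words


def english_number(digits):
    """Hypothetical helper: return a list of words for a string of digits."""
    if re.fullmatch(r"[0-9]+", digits) is None:
        raise ValueError("expected a string of digits: %r" % digits)
    n = int(digits)
    if n == 0:
        return ["zero"]
    if n > 999999:
        # Assumed fallback: read very long strings one digit at a time.
        return [ONES[int(d)] for d in digits]
    thousands, rest = divmod(n, 1000)
    words = []
    if thousands:
        words += _below_thousand(thousands) + ["thousand"]
    return words + _below_thousand(rest)
```

Returning a list of words, rather than a single string, mirrors the idea that each word is then looked up separately in the lexicon.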
A number of languages use spaces within numbers where English might use commas. For example, text in German, Polish and other languages may contain
64 000
to denote sixty four thousand. As this will be multiple tokens in Festival's basic analysis, it is necessary to write multiple conditions in your token_to_words function.
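The conditions involved can be sketched as follows. This is Python, not Festival Scheme, and it only illustrates the logic: the token list, the index argument and the number_words callback are assumptions of this example, whereas a real token_to_words function would inspect the neighbouring tokens through Festival's own utterance structure. The rule used here (a group of exactly three digits following a token of one to three digits continues the number) is also an assumption and will wrongly merge some genuinely separate numbers.

```python
import re


def _matches(name, pattern):
    return re.fullmatch(pattern, name) is not None


def continues_group(tokens, i):
    """True if token i is a three-digit group directly after a number token."""
    return (i > 0
            and _matches(tokens[i], r"[0-9]{3}")
            and _matches(tokens[i - 1], r"[0-9]{1,3}"))


def token_to_words(tokens, i, number_words):
    """Return the words contributed by token i.

    number_words is a hypothetical callback that turns a digit string
    into a list of words.
    """
    name = tokens[i]
    # Condition 1: this token was already spoken as part of a number
    # started by an earlier token, so it contributes nothing.
    if continues_group(tokens, i):
        return []
    # Condition 2: a digit token, possibly the head of a spaced number.
    if _matches(name, r"[0-9]+"):
        digits = name
        j = i + 1
        while j < len(tokens) and continues_group(tokens, j):
            digits += tokens[j]
            j += 1
        return number_words(digits)
    # Condition 3: anything else is passed through unchanged.
    return [name]
```

With the tokens "64" and "000", the first token yields the words for 64000 and the second yields no words at all, so the number is spoken exactly once.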
In many languages, the pronunciation of a number depends on the thing that is being counted. For example, the digit '1' in Spanish has multiple pronunciations depending on whether it is referring to a masculine or feminine object. In some languages this becomes much more complex, as there are a number of possible declensions. In our Polish synthesizer we solved this by adding an extra argument to the number generation function, which then selected the actual number word (typically the final word in a number) based on the desired declension.
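The idea of an extra argument that selects the final word can be sketched for the simpler Spanish gender case. This is a Python illustration only, not code from the Polish or Spanish voices: the function name spanish_number_words, the gender argument and the 0 to 99 range are assumptions of this example, and it ignores the shortened form "un" used directly before a masculine noun.

```python
# Hypothetical word tables covering 0-99 only.
BELOW_30 = ["cero", "uno", "dos", "tres", "cuatro", "cinco", "seis", "siete",
            "ocho", "nueve", "diez", "once", "doce", "trece", "catorce",
            "quince", "dieciséis", "diecisiete", "dieciocho", "diecinueve",
            "veinte", "veintiuno", "veintidós", "veintitrés", "veinticuatro",
            "veinticinco", "veintiséis", "veintisiete", "veintiocho",
            "veintinueve"]
TENS = {3: "treinta", 4: "cuarenta", 5: "cincuenta", 6: "sesenta",
        7: "setenta", 8: "ochenta", 9: "noventa"}

# Final words that change when the counted object is feminine.
FEMININE_FINAL = {"uno": "una", "veintiuno": "veintiuna"}


def spanish_number_words(digits, gender="masculine"):
    """Return the words for a digit string (0-99), agreeing in gender."""
    n = int(digits)
    if not 0 <= n <= 99:
        raise ValueError("this sketch only covers 0 to 99")
    if n < 30:
        words = [BELOW_30[n]]
    else:
        tens, unit = divmod(n, 10)
        words = [TENS[tens]]
        if unit:
            words += ["y", BELOW_30[unit]]
    # Only the final word is adjusted, as described in the text above.
    if gender == "feminine":
        words[-1] = FEMININE_FINAL.get(words[-1], words[-1])
    return words
```

A language with several declensions could follow the same shape, with the lookup for the final word keyed on the declension instead of on two genders.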
%%%%%%%%%%%%%%%%%%%
Example to be added
%%%%%%%%%%%%%%%%%%%