Festival Speech Synthesis System: - Speech Resource Pages
● The system is far too slow. Although machines are getting faster, it still takes too long to start the system and
get it to speak some given text. On reasonable machines, Festival can generate speech several
times faster than it takes to say it, but even at five times faster it takes 2 seconds to generate a 10-second
utterance, and a 2-second wait is too long. Faster machines would improve this, but a change in design is a
better solution.
● The system is too big. It takes a long time to compile even on quite large machines, and its footprint is still in
the tens of megabytes, as is the run-time requirement. We have spent some time trying to fix this
(optional modules have made it possible to build a much smaller binary), but we haven't done enough yet.
● The signal quality of the voices isn't very good by the standards of today's synthesizers, even given the
improvement in quality since the last release. This is partly our fault for not spending the time (or perhaps also
not having enough expertise) on the low-level waveform synthesis parts of the system. This will improve in
the future with better signal processing (under development) and better synthesis techniques (also under
development).
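The arithmetic in the first point above can be sketched as follows. This is only an illustration: the function name `synthesis_wait_seconds` and the notion of a single speed-up factor are assumptions made for this example and are not part of Festival.

```python
def synthesis_wait_seconds(utterance_seconds, speedup):
    """Time spent generating an utterance before it can be played.

    utterance_seconds: duration of the spoken utterance.
    speedup: how many times faster than real time synthesis runs
             (hypothetical parameter, e.g. 5.0 for "five times faster").
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return utterance_seconds / speedup


# The case described in the text: a 10 second utterance, five times faster.
print(synthesis_wait_seconds(10.0, 5.0))  # 10 / 5 = 2.0 seconds
```

Under this simple model the wait grows linearly with utterance length, so faster hardware shrinks the delay but does not remove it.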
31. References<br />
allen87<br />
Allen, J., Hunnicutt, S. and Klatt, D. From Text to Speech: the MITalk system, Cambridge University Press, 1987.
abelson85<br />
Abelson, H. and Sussman, G. Structure and Interpretation of Computer Programs, MIT Press, 1985.
black94<br />
Black, A. and Taylor, P. "CHATR: a generic speech synthesis system", Proceedings of COLING-94, Kyoto,
Japan, 1994.
black96<br />
Black, A. and Hunt, A. "Generating F0 contours from ToBI labels using linear regression", ICSLP96, vol. 3,
pp 1385-1388, Philadelphia, PA, 1996.
black97b<br />
Black, A. and Taylor, P. "Assigning Phrase Breaks from Part-of-Speech Sequences", Eurospeech97, Rhodes,
Greece, 1997.
black97c<br />
Black, A. and Taylor, P. "Automatically clustering similar units for unit selection in speech synthesis",
Eurospeech97, Rhodes, Greece, 1997.
black98<br />
Black, A., Lenzo, K. and Pagel, V. "Issues in building general letter to sound rules", 3rd ESCA Workshop on
Speech Synthesis, Jenolan Caves, Australia, 1998.
black99<br />
Black, A. and Lenzo, K. "Building Voices in the Festival Speech Synthesis System", unpublished document,
Carnegie Mellon University, available at http://www.cstr.ed.ac.uk/projects/festival/docs/festvox/
breiman84<br />
Breiman, L., Friedman, J., Olshen, R. and Stone, C. Classification and regression trees, Wadsworth and
Brooks, Pacific Grove, CA, 1984.
campbell91<br />
Campbell, N. and Isard, S. "Segment durations in a syllable frame", Journal of Phonetics, 19(1):37-47, 1991.
DeRose88<br />
DeRose, S. "Grammatical category disambiguation by statistical optimization", Computational Linguistics,
14:31-39, 1988.
dusterhoff97<br />
Dusterhoff, K. and Black, A. "Generating F0 contours for speech synthesis using the Tilt intonation theory",
Proceedings of ESCA Workshop on Intonation, September, Athens, Greece, 1997.
dutoit97<br />
Dutoit, T. An introduction to Text-to-Speech Synthesis, Kluwer Academic Publishers, 1997.
hunt89<br />
Hunt, M., Zwierynski, D. and Carr, R. "Issues in high quality LPC analysis and synthesis", Eurospeech89,
vol. 2, pp 348-351, Paris, France, 1989.
jilka96<br />
Jilka M. Regelbasierte Generierung natuerlich klingender Intonation des Amerikanischen Englisch,