12.12.2012 Views

Festival Speech Synthesis System: - Speech Resource Pages


● The system is far too slow. Although machines are getting faster, it still takes too long to start the system and get it to speak some given text. Even so, on reasonable machines, Festival can generate speech several times faster than it takes to say it. But even if it is five times faster, it will take 2 seconds to generate a 10 second utterance. A 2 second wait is too long. Faster machines would improve this, but a change in design is a better solution.

● The system is too big. It takes a long time to compile even on quite large machines, and its footprint is still in the tens of megabytes, as is the run-time requirement. Although we have spent some time trying to fix this (optional modules have made it possible to build a much smaller binary), we haven't done enough yet.

● The signal quality of the voices isn't very good by the standards of today's synthesizers, even given the improvement in quality since the last release. This is partly our fault for not spending the time (or perhaps also not having enough expertise) on the low-level waveform synthesis parts of the system. This will improve in the future with better signal processing (under development) and better synthesis techniques (also under development).
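The wait described in the first point is the utterance length divided by the speed-up over real time. A minimal sketch of that arithmetic, assuming the whole utterance must be generated before any audio plays; the function name `wait_before_speech` is hypothetical and not part of Festival:

```python
def wait_before_speech(utterance_seconds: float, speedup: float) -> float:
    """Seconds spent synthesizing before playback can begin.

    Assumes the whole utterance is generated first, and that `speedup`
    is how many times faster than real time the synthesizer runs.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return utterance_seconds / speedup


# Illustrative speed-ups only; these are not measurements of any machine.
for speedup in (2, 5, 10, 20):
    wait = wait_before_speech(10.0, speedup)
    print(f"{speedup:>2}x real time: {wait:.1f} s wait for a 10 s utterance")
```

Under this model the wait grows with the length of the utterance, so a faster machine shrinks the delay but never removes it.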


