Festival Speech Synthesis System: - Speech Resource Pages


● The system is far too slow. Although machines are getting faster, it still takes too long to start the system and get it to speak some given text. Even so, on reasonable machines, Festival can generate the speech several times faster than it takes to say it. But even if it is five times faster, it will take 2 seconds to generate a 10 second utterance. A 2 second wait is too long. Faster machines would improve this but a change in design is a better solution.

● The system is too big. It takes a long time to compile even on quite large machines, and its footprint is still in the 10s of megabytes, as is the run-time requirement. Although we have spent some time trying to fix this (optional modules have made it possible to build a much smaller binary) we haven't done enough yet.

● The signal quality of the voices isn't very good by today's standard of synthesizers, even given the improvement in quality since the last release. This is partly our fault in not spending the time (or perhaps also not having enough expertise) on the low-level waveform synthesis parts of the system. This will improve in the future with better signal processing (under development) and better synthesis techniques (also under development).

31. References

allen87
    Allen, J., Hunnicutt, S. and Klatt, D. Text-to-speech: the MITalk system, Cambridge University Press, 1987.

abelson85
    Abelson, H. and Sussman, G. Structure and Interpretation of Computer Programs, MIT Press, 1985.

black94
    Black, A. and Taylor, P. "CHATR: a generic speech synthesis system", Proceedings of COLING-94, Kyoto, Japan, 1994.

black96
    Black, A. and Hunt, A. "Generating F0 contours from ToBI labels using linear regression", ICSLP96, vol. 3, pp 1385-1388, Philadelphia, PA, 1996.

black97b
    Black, A. and Taylor, P. "Assigning Phrase Breaks from Part-of-Speech Sequences", Eurospeech97, Rhodes, Greece, 1997.

black97c
    Black, A. and Taylor, P. "Automatically clustering similar units for unit selection in speech synthesis", Eurospeech97, Rhodes, Greece, 1997.

black98
    Black, A., Lenzo, K. and Pagel, V. "Issues in building general letter to sound rules", 3rd ESCA Workshop on Speech Synthesis, Jenolan Caves, Australia, 1998.

black99
    Black, A. and Lenzo, K. "Building Voices in the Festival Speech Synthesis System", unpublished document, Carnegie Mellon University, available at http://www.cstr.ed.ac.uk/projects/festival/docs/festvox/

breiman84
    Breiman, L., Friedman, J., Olshen, R. and Stone, C. Classification and regression trees, Wadsworth and Brooks, Pacific Grove, CA, 1984.

campbell91
    Campbell, N. and Isard, S. "Segment durations in a syllable frame", Journal of Phonetics, 19:1, 37-47, 1991.

DeRose88
    DeRose, S. "Grammatical category disambiguation by statistical optimization", Computational Linguistics, 14:31-39, 1988.

dusterhoff97
    Dusterhoff, K. and Black, A. "Generating F0 contours for speech synthesis using the Tilt intonation theory", Proceedings of ESCA Workshop of Intonation, September, Athens, Greece, 1997.

dutoit97
    Dutoit, T. An introduction to Text-to-Speech Synthesis, Kluwer Academic Publishers, 1997.

hunt89
    Hunt, M., Zwierynski, D. and Carr, R. "Issues in high quality LPC analysis and synthesis", Eurospeech89, vol. 2, pp 348-351, Paris, France, 1989.

jilka96
    Jilka, M. Regelbasierte Generierung natuerlich klingender Intonation des Amerikanischen Englisch, Magisterarbeit, Institute of Natural Language Processing, University of Stuttgart, 1996.

moulines90
    Moulines, E. and Charpentier, N. "Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones", Speech Communication, 9(5/6), pp 453-467, 1990.

pagel98
    Pagel, V., Lenzo, K. and Black, A. "Letter to Sound Rules for Accented Lexicon Compression", ICSLP98, Sydney, Australia, 1998.

ritchie92
    Ritchie, G., Russell, G., Black, A. and Pulman, S. Computational Morphology: practical mechanisms for the English Lexicon, MIT Press, Cambridge, Mass.

vansanten96
    van Santen, J., Sproat, R., Olive, J. and Hirschberg, J. eds, "Progress in Speech Synthesis", Springer Verlag, 1996.

silverman92
    Silverman, K., Beckman, M., Pitrelli, J., Ostendorf, M., Wightman, C., Price, P., Pierrehumbert, J. and Hirschberg, J. "ToBI: a standard for labelling English prosody", Proceedings of ICSLP92, vol. 2, pp 867-870, 1992.

sproat97
    Sproat, R., Taylor, P., Tanenblatt, M. and Isard, A. "A Markup Language for Text-to-Speech Synthesis", Eurospeech97, Rhodes, Greece, 1997.

sproat98
    Sproat, R. ed, "Multilingual Text-to-Speech Synthesis: The Bell Labs approach", Kluwer, 1998.

sable98
    Sproat, R., Hunt, A., Ostendorf, M., Taylor, P., Black, A., Lenzo, K. and Edgington, M. "SABLE: A standard for TTS markup", ICSLP98, Sydney, Australia, 1998.

taylor91
    Taylor, P., Nairn, I., Sutherland, A. and Jack, M. "A real time speech synthesis system", Eurospeech91, vol. 1, pp 341-344, Genoa, Italy, 1991.

taylor96
    Taylor, P. and Isard, A. "SSML: A speech synthesis markup language", to appear in Speech Communications.

wwwxml97
    World Wide Web Consortium Working Draft "Extensible Markup Language (XML) Version 1.0 Part 1: Syntax", http://www.w3.org/pub/WWW/TR/WD-xml-lang-970630.html

yarowsky96
    Yarowsky, D. "Homograph disambiguation in text-to-speech synthesis", in "Progress in Speech Synthesis", eds. van Santen, J., Sproat, R., Olive, J. and Hirschberg, J., pp 157-172, Springer Verlag, 1996.
32. Feature functions

This chapter contains a list of the basic feature functions available for stream items in utterances. See section 14.6 Features. These are the basic features, which can be combined with relative features (such as n. for next, and relations to follow links). Some of these features are implemented as short C++ functions (e.g. asyl_in) while others are simple features on an item (e.g. pos). Note that functional features take precedence over simple features, so accessing a feature called "X" will always use the function called "X" even if a simple feature called "X" exists on the item.

Unlike previous versions, there are no features that are built in on all items except addr (reintroduced in 1.3.1), which returns a unique string for that item (it is the hex address of the item within the machine). Features may also be defined through Scheme; these all have the prefix lisp_.

The feature functions are listed in the form Relation.name, where Relation is the name of the stream that the function is appropriate to and name is its name. Note that you will not require the Relation part of the name if the stream item you are applying the function to is of that type.
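The lookup order described above can be pictured with a small model. The following Python sketch is only an illustration and is not Festival's C++ implementation: the names Item, link, feat and FEATURE_FUNCTIONS are hypothetical, the p. (previous) prefix is included alongside n. for symmetry, and returning "0" for a missing feature or neighbour is an assumption of this model.

```python
# Illustrative model only; none of these names come from Festival itself.

class Item:
    """A stream item carrying simple features and links to its neighbours."""
    def __init__(self, **features):
        self.features = dict(features)  # simple features stored on the item
        self.next = None
        self.prev = None

def link(items):
    """Chain items together so that the n. and p. prefixes can be followed."""
    for left, right in zip(items, items[1:]):
        left.next, right.prev = right, left
    return items

# Functional features: name -> function taking the item.
FEATURE_FUNCTIONS = {
    "addr": lambda item: hex(id(item)),  # unique string per live item
    "X": lambda item: "function X",      # shadows any simple feature "X"
}

def feat(item, path):
    """Resolve a feature path such as 'pos', 'X' or 'n.name'."""
    head, _, rest = path.partition(".")
    if head in ("n", "p") and rest:
        # Relative feature: move to the neighbour, then resolve the rest.
        neighbour = item.next if head == "n" else item.prev
        return "0" if neighbour is None else feat(neighbour, rest)
    if path in FEATURE_FUNCTIONS:
        # Functional features take precedence over simple features.
        return FEATURE_FUNCTIONS[path](item)
    return item.features.get(path, "0")  # assumed default for a missing feature

the, boy = link([Item(name="the", pos="dt"), Item(name="boy", pos="nn", X="simple X")])
print(feat(the, "n.name"))  # name of the item after "the"
print(feat(boy, "X"))       # the function wins over the simple feature
```

In this model the simple feature X stored on the second item is never returned, because the function of the same name is consulted first; that is the precedence rule stated above.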