Festival Speech Synthesis System: - Speech Resource Pages

    Close the audio server down but wait until it is cleared. This is useful in scripts etc. when you wish to exit only when all audio is complete.

(audio_mode 'shutup)
    Close the audio down now, stopping the current file being played and any in the queue. Note that this may take some time to take effect depending on which audio method you use. Sometimes there can be hundreds of milliseconds of audio in the device itself which cannot be stopped.

(audio_mode 'query)
    Lists the size of each waveform currently in the queue.

24. Voices

This chapter gives some general suggestions about adding new voices to Festival. Festival attempts to offer an environment where new voices and languages can easily be slotted in to the system.

24.1 Current voices          Currently available voices
24.2 Building a new voice
24.3 Defining a new voice

24.1 Current voices

Currently there are a number of voices available in Festival and we expect that number to increase. Each is selected via a function of the name `voice_*' which sets up the waveform synthesizer, phone set, lexicon, duration and intonation models (and anything else necessary) for that speaker. These voice setup functions are defined in `lib/voices.scm'.

The current voice functions are

voice_rab_diphone
    A British English male RP speaker, Roger. This uses the UniSyn residual excited LPC diphone synthesizer. The lexicon is the computer users' version of the Oxford Advanced Learners' Dictionary, with letter to sound rules trained from that lexicon. Intonation is provided by a ToBI-like system using a decision tree to predict accent and end tone position. The F0 itself is predicted as three points on each syllable, using linear regression trained from the Boston University FM Radio corpus (f2b) and mapped to Roger's pitch range. Duration is predicted by decision tree, predicting zscore durations for segments, trained from the 460 Timit sentences spoken by another British male speaker.

voice_ked_diphone
    An American English male speaker, Kurt. Again this uses the UniSyn residual excited LPC diphone synthesizer. This uses the CMU lexicon, and letter to sound rules trained from it. Intonation, as with Roger, is trained from the Boston University FM Radio corpus. Duration for this voice also comes from that database.

voice_kal_diphone
    An American English male speaker. Again this uses the UniSyn residual excited LPC diphone synthesizer and, like ked, uses the CMU lexicon and letter to sound rules trained from it. Intonation, as with Roger, is trained from the Boston University FM Radio corpus. Duration for this voice also comes from that database. This voice was built in two days' work and is at least as good as ked, because we understood the process better by then. The diphone labels were autoaligned with hand correction.

voice_don_diphone
    Steve Isard's LPC based diphone synthesizer, Donovan diphones. The other parts of this voice (lexicon, intonation and duration) are the same as voice_rab_diphone described above. The quality of the diphones is not as good as the other voices because it uses spike excited LPC. Although the quality is not as good, it is much faster and the database is much smaller than the others.
voice_el_diphone
    A male Castilian Spanish speaker, using the Eduardo Lopez diphones. Alistair Conkie and Borja Etxebarria did much to make this. It has improved recently but is not as comprehensive as our English voices.

voice_gsw_diphone
    This offers a male RP speaker, Gordon, famed for many previous CSTR synthesizers, using the standard diphone module. Its higher levels are very similar to the Roger voice above. This voice is not in the standard distribution, and is unlikely to be added for commercial reasons, even though it sounds better than Roger.

voice_en1_mbrola
    The Roger diphone set using the same front end as voice_rab_diphone, but with the MBROLA diphone synthesizer for waveform synthesis. The MBROLA synthesizer and the Roger diphone database (called en1) are not distributed by CSTR but are available free for non-commercial use from http://tcts.fpms.ac.be/synthesis/mbrola.html. We do however provide the Festival part of the voice in `festvox_en1.tar.gz'.

voice_us1_mbrola
    A female American English voice using our standard US English front end and the us1 database for the MBROLA diphone synthesizer for waveform synthesis. The MBROLA synthesizer and the us1 diphone database are not distributed by CSTR but are available free for non-commercial use from http://tcts.fpms.ac.be/synthesis/mbrola.html. We provide the Festival part of the voice in `festvox_us1.tar.gz'.

voice_us2_mbrola
    A male American English voice using our standard US English front end and the us2 database for the MBROLA diphone synthesizer for waveform synthesis. The MBROLA synthesizer and the us2 diphone database are not distributed by CSTR but are available free for non-commercial use from http://tcts.fpms.ac.be/synthesis/mbrola.html. We provide the Festival part of the voice in `festvox_us2.tar.gz'.

voice_us3_mbrola
    Another male American English voice using our standard US English front end and the us3 database for the MBROLA diphone synthesizer for waveform synthesis. The MBROLA synthesizer and the us3 diphone database are not distributed by CSTR but are available free for non-commercial use from http://tcts.fpms.ac.be/synthesis/mbrola.html. We provide the Festival part of the voice in `festvox_us3.tar.gz'.

Other voices will become available through time. Groups other than CSTR are working on new voices. In particular, OGI's CSLU have released a number of American English voices, two Mexican Spanish voices and two German voices. All use their own residual excited LPC synthesizer, which is distributed as a plug-in for Festival (see http://www.cse.ogi.edu/CSLU/research/TTS for details).

Other languages are being worked on: voices for German, Basque, Welsh, Greek and Polish have already been developed and could be released soon. CSTR has a set of Klingon diphones, though the text analysis for Klingon still requires some work. (If anyone has access to a good Klingon continuous speech corpus please let us know.)

Pointers and examples of voices developed at CSTR and elsewhere will be posted on the Festival home page.

24.2 Building a new voice

This section runs through the definition of a new voice in Festival. Although this voice is simple (it is a simplified version of the distributed Spanish voice) it shows all the major parts that must be defined to get Festival to speak in a new voice. Thanks go to Alistair Conkie for helping me define this, but as I don't speak Spanish there are probably many mistakes. Hopefully its pedagogical use is better than its ability to be understood in Castile.

A much more detailed document on building voices in Festival has been written and is recommended reading for anyone attempting to add a new voice to Festival (black99). The information here is a little sparse, though it gives the basic requirements.

The general method for defining a new voice is to define the parameters for all the various sub-parts e.g. phoneset,