Festival Speech Synthesis System: - Speech Resource Pages

    for i in a b c d e f g h i j k l m n o p q r s t u v w x y z
    do
        # Stop value for wagon
        STOP=2
        echo letter $i STOP $STOP
        # Find training set for letter $i
        cat oald.train.feats |
            awk '{if ($6 == "'$i'") print $0}' >ltsdataTRAIN.$i.feats
        # Split training set to get held-out data for stepwise testing
        traintest ltsdataTRAIN.$i.feats
        # Extract test data for letter $i
        cat oald.test.feats |
            awk '{if ($6 == "'$i'") print $0}' >ltsdataTEST.$i.feats
        # Run wagon to build the model
        wagon -data ltsdataTRAIN.$i.feats.train -test ltsdataTRAIN.$i.feats.test \
              -stepwise -desc ltsOALD.desc -stop $STOP -output lts.$i.tree
        # Test the resulting tree against the test data for letter $i
        wagon_test -heap 2000000 -data ltsdataTEST.$i.feats -desc ltsOALD.desc \
                   -tree lts.$i.tree
    done

The script `traintest' splits the given file `X' into `X.train' and `X.test', with every tenth line in `X.test' and the rest in `X.train'.

This script can take a significant amount of time to run, about 6 hours on a Sun Ultra 140.

Once the models are created they must be collected together into a single list structure. The trees generated by `wagon' contain full probability distributions at each leaf. At this point that information can be removed, as only the most probable value will actually be predicted. This substantially reduces the size of the trees.

    (merge_models 'oald_lts_rules "oald_lts_rules.scm")

(merge_models is defined within `lts_build.scm'.) The given file will contain a set! for the given variable name to an assoc list of letter to trained tree. Note that the above function naively assumes that the letters in the alphabet are the 26 lower case letters of the English alphabet; you will need to edit it, adding accented letters, if required. Note that adding "'" (single quote) as a letter is a little tricky in Scheme but can be done: the command (intern "'") will give you the symbol for single quote.
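The split performed by `traintest' can be illustrated with a short shell function. This is only a sketch of the behaviour described above, not the script shipped with the tools: the name traintest_sketch is hypothetical, and the choice of lines 10, 20, 30, ... as the held-out lines is an assumption (the text says only that every tenth line goes to `X.test').

```shell
#!/bin/sh
# Hypothetical stand-in for `traintest': writes every tenth line of the
# given file X to X.test and all remaining lines to X.train.
traintest_sketch() {
    infile="$1"
    # Start from empty output files so both exist even for short inputs.
    : > "$infile.train"
    : > "$infile.test"
    awk -v trainf="$infile.train" -v testf="$infile.test" '
        NR % 10 == 0 { print >> testf; next }   # lines 10, 20, 30, ... are held out
                     { print >> trainf }        # everything else is training data
    ' "$infile"
}
```

Called as traintest_sketch ltsdataTRAIN.a.feats, this would leave roughly nine tenths of the feature vectors in ltsdataTRAIN.a.feats.train and the rest in ltsdataTRAIN.a.feats.test, which matches the file names that the wagon command in the loop expects.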

To test a set of lts models, load the saved model and call the following function with the test align file:

    festival oald-table.scm oald_lts_rules.scm
    festival> (lts_testset "oald.test.align" oald_lts_rules)

The result (after showing all the failed ones) will be a table showing the results for each letter, for all letters and for complete words. The failed entries may give some notion of how good or bad the result is. Sometimes the errors will be simple vowel differences (long versus short, schwa versus full vowel); other times whole consonants may be missing. Remember that the ultimate measure of the letter to sound rules is how adequate they are at providing acceptable pronunciations, rather than how good the numeric score is.

For some languages (e.g. English) it is also necessary to find a stress pattern for unknown words. Ultimately, for this to work well you need to know the morphological decomposition of the word. At present we provide a CART trained system to predict stress patterns for English. It does get 94.6% correct for an unseen test set, but that isn't really very good. Later tests suggest that predicting stressed and unstressed phones directly is actually better for getting whole words correct, even though the models do slightly worse on a per-phone basis (black98).

As the lexicon may be a large part of the system, we have also experimented with removing entries from the lexicon if the letter to sound rules system (and stress assignment system) can correctly predict them. For OALD this allows us to halve the size of the lexicon; it could possibly allow more if a certain amount of fuzzy acceptance was allowed (e.g. with schwa). For other languages the gain here can be very significant: for German and French we can reduce the lexicon by over 90%. The function reduce_lexicon in `festival/lib/lts_build.scm' was used to do this. The use of the above technique as a dictionary compression method is discussed in pagel98. A morphological decomposition algorithm, like that described in black91, may help even more.

The technique described in this section and its relative merits with respect to a number of languages/lexicons and tasks is discussed more fully in black98.

13.6 Lexicon requirements

For English there are a number of assumptions made about the lexicon which are worthy of explicit mention. If you are basically going to use the existing token rules, you should try to include at least the following in any lexicon that is to work with them.

● The letters of the alphabet. When a token is identified as an acronym it is spelled out. The tokenization assumes that the individual letters of the alphabet are in the lexicon with their pronunciations. They should be identified as nouns. (This is to distinguish `a' as a determiner, which can be schwa'd, from `a' as a letter, which cannot.) The part of speech should be nn by default, but the value of the variable token.letter_pos is used and may be changed if this is not what is required.
● One character symbols such as dollar, at-sign, percent etc. It's difficult to get a complete list and to know what the pronunciation of some of these is (e.g. hash or pound sign). But the letter to sound rules cannot deal with them, so they need to be explicitly listed. See the list in the function mrpa_addend in `festival/lib/dicts/oald/oaldlex.scm'. This list should also contain the control characters and eight bit characters.
● The possessive 's should be in your lexicon as schwa and voiced fricative (z). It should be in twice, once with part of speech type pos and once as n (used in plurals of numbers, acronyms etc., e.g. 1950's). 's is treated as a word and is separated from the tokens it appears with.
  The post-lexical rule (the function postlex_apos_s_check) will delete the schwa and devoice the z in appropriate contexts. Note that this post-lexical rule brazenly assumes that the unvoiced fricative in the phoneset is s. If it is not in your phoneset, copy the function (it is in `festival/lib/postlex.scm'), change it for your phoneset and use your version as a post-lexical rule.
● Numbers as digits (e.g. "1", "2", "34", etc.) should normally not be in the lexicon. The number conversion routines convert numbers to words (i.e. "one", "two", "thirty four", etc.).
● The word "unknown" or whatever is in the variable token.unknown_word_name. This is used in a few obscure cases when there just isn't anything that can be said (e.g. single characters which aren't in the lexicon). Some people have suggested it should be possible to make this a sound rather than a word. I agree, but Festival doesn't support that yet.

13.7 Available lexicons

Currently Festival supports a number of different lexicons. They are all defined in the file `lib/lexicons.scm', each with a number of common extra words added to their addenda. They are:

`CUVOALD'
    The Computer Users Version of Oxford Advanced Learner's Dictionary is available from the Oxford Text Archive ftp://ota.ox.ac.uk/pub/ota/public/dicts/710. It contains about 70,000 entries and is a part of the BEEP lexicon. It is more consistent in its marking of stress, though its syllable marking is not what works best for our synthesis methods. Many syllabic `l''s, `n''s, and `m''s mess up the syllabification algorithm, making results sometimes appear over-reduced. It is however our current default lexicon. It is also the only lexicon with part of speech tags that can be distributed (for non-commercial use).

`CMU'
    This is automatically constructed from `cmu_dict-0.4' available from many places on the net (see