Festival Speech Synthesis System: - Speech Resource Pages

trying to fix bugs remotely. We thank you for putting up with us and are pleased you've taken the time to help us improve our system. Many of you have come up with uses we hadn't thought of, which is always rewarding. Even if you haven't actively responded, the fact that you use the system at all makes it worthwhile.

4. What is new

Compared to the previous major release (1.3.0, released August 1998), 1.4.0 is not functionally very different. This release is primarily a consolidation release, fixing and tidying up some of the lower-level aspects of the system to allow better modularity for some of our planned future modules.

● Copyright change: The system is now free and has no commercial restriction. Note that currently only the US voices (ked and kal) are also unrestricted. The UK English voices depend on the Oxford Advanced Learners' Dictionary of Current English, which cannot be used commercially without permission from Oxford University Press.
● Architecture tidy up: The interfaces to the lower-level parts of the system have been tidied up, deleting some of the older code that was supported for compatibility reasons. There is a much higher dependence on features, and there are easier (and safer) ways to register new objects as feature values and Scheme objects. Scheme has been tidied up. It is no longer "in one defun" but "in one directory".
● New documentation system for speech tools: A new docbook-based documentation system has been added to the speech tools. Festival's documentation will move over to this sometime soon too.
● Initial JSAPI support: Both JSAPI and JSML (somewhat similar to Sable) now have initial implementations. They of course depend on Java support, which so far we have only (successfully) investigated under Solaris and Linux.
● Generalization of statistical models: CART, ngrams, and WFSTs are now fully supported from Lisp and can be used with a generalized Viterbi function. This makes adding quite complex statistical models easy without adding new C++.
● Tilt intonation modelling: Full support is now included for the Tilt intonation models, both training and use.
● Documentation on Building New Voices in Festival: Documentation, scripts etc. for building new voices and languages in the system; see http://www.cstr.ed.ac.uk/projects/festival/docs/festvox/

5. Overview

Festival is designed as a speech synthesis system for at least three levels of user. First, those who simply want high quality speech from arbitrary text with the minimum of effort. Second, those who are developing language systems and wish to include synthesis output. In this case, a certain amount of customization is desired, such as different voices, specific phrasing, dialog types etc. Third, those who are developing and testing new synthesis methods.

This manual is not designed as a tutorial on converting text to speech but for documenting the processes and use of our system. We do not discuss the detailed algorithms involved in converting text to speech or the relative merits of
multiple methods, though we will often give references to relevant papers when describing the use of each module.

For more general information about text to speech we recommend Dutoit's `An introduction to Text-to-Speech Synthesis' (dutoit97). For more detailed research issues in TTS see sproat98 or vansanten96.

5.1 Philosophy: Why we did it like it is
5.2 Future: How much better it's going to get

5.1 Philosophy

One of the biggest problems in the development of speech synthesis, and other areas of speech and language processing systems, is that there are a lot of simple well-known techniques lying around which can help you realise your goal. But in order to improve some part of the whole system it is necessary to have a whole system in which you can test and improve your part. Festival is intended as that whole system in which you may simply work on your small part to improve the whole. Without a system like Festival, before you could even start to test your new module you would need to spend significant effort building a whole system or adapting an existing one. Festival is specifically designed to allow the addition of new modules, easily and efficiently, so that development need not get bogged down in re-implementing the wheel.

But there is another aspect of Festival which makes it more useful than simply an environment for researching new synthesis techniques. It is a fully usable text-to-speech system suitable for embedding in other projects that require speech output. The provision of a fully working, easy-to-use speech synthesizer in addition to just a testing environment is good for two specific reasons. First, it offers a conduit for our research, in that our experiments can quickly and directly benefit users of our synthesis system.
And secondly, in ensuring we have a fully working usable system we can immediately see what problems exist and where our research should be directed, rather than where our whims take us.

These concepts are not unique to Festival. ATR's CHATR system (black94) follows very much the same philosophy, and Festival benefits from the experiences gained in the development of that system. Festival benefits from various pieces of previous work. As well as CHATR, CSTR's previous synthesizers, Osprey and the Polyglot projects influenced many design decisions. We are also influenced by more general programs in considering software engineering issues, especially GNU Octave and Emacs, on which the basic script model was based.

Unlike in some other speech and language systems, software engineering is considered very important to the development of Festival. Too often research systems consist of random collections of hacky little scripts and code. No one person can confidently describe the algorithms they perform, as parameters are scattered throughout the system, with tricks and hacks making it impossible to really evaluate why the system is good (or bad). Such systems do not help the advancement of speech technology, except perhaps in pointing at ideas that should be further investigated. If the algorithms and techniques cannot be described externally from the program such that they can be reimplemented by others, what is the point of doing the work?

Festival offers a common framework where multiple techniques may be implemented (by the same or different researchers) so that they may be tested more fairly in the same environment.

As a final word, we'd like to make two short statements which both achieve the same end but unfortunately perhaps not for the same reasons:

    Good software engineering makes good research easier

But the following seems to be true also:

    If you spend enough effort on something it can be shown to be better than its competitors.