12.12.2012 Views

Festival Speech Synthesis System: - Speech Resource Pages


25. Tools

A number of basic data manipulation tools are supported by Festival. These often make building new modules very easy and are already used in many of the existing modules. They typically offer a Scheme method for entering data, and Scheme and C++ functions for evaluating it.

25.1 Regular expressions
25.2 CART trees: Building and using CART
25.3 Ngrams: Building and using Ngrams
25.4 Viterbi decoder: Using the Viterbi decoder
25.5 Linear regression: Building and using linear regression models


25.1 Regular expressions

Regular expressions are a formal method for describing a certain class of mathematical languages. They may be viewed as patterns which match some set of strings. They are very common in software tools such as the UNIX shell, Perl, awk and Emacs. Unfortunately the exact form of regular expressions often differs slightly between applications, which can make them a little tricky to use.

Festival supports regular expressions based mainly on the form used in the GNU libg++ Regex class, though we have our own implementation of it. Our implementation (EST_Regex) is actually based on Henry Spencer's `regex.c' as distributed with BSD 4.4.

Regular expressions are represented as character strings which are interpreted as regular expressions by certain Scheme and C++ functions. Most characters in a regular expression are treated as literals and match only that character, but a number of others have special meaning. Some characters may be escaped with preceding backslashes to change them from operators to literals (or sometimes from literals to operators).

.
Matches any character.

$
Matches end of string.

^
Matches beginning of string.

X*
Matches zero or more occurrences of X. X may be a character, range or parenthesized expression.

X+
Matches one or more occurrences of X. X may be a character, range or parenthesized expression.

X?
Matches zero or one occurrence of X. X may be a character, range or parenthesized expression.

[...]
A range matches any of the values in the brackets. The range operator "-" allows specification of ranges, e.g. a-z for all lower case characters. If the first character of the range is ^ then it matches any character except those specified in the range. If you wish - to be in the range you must put it first.

\\(...\\)
Treats the contents of the parentheses as a single object, allowing the operators *, +, ? etc. to operate on more than single characters.

X\\|Y
Matches either X or Y. X or Y may be single characters, ranges or parenthesized expressions.
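The operators listed above can be tried out with a short sketch. Python's `re` module is used here purely as a stand-in; it is not Festival's EST_Regex, and it writes grouping and alternation as `(...)` and `|` without backslashes. The hypothetical helper `festival_to_python` therefore rewrites a pattern in the notation above into `re` syntax first, covering only the operators listed here. The names `festival_to_python` and `matches`, and the choice of whole-string matching via `re.fullmatch`, are assumptions of this sketch rather than anything Festival provides. The patterns are typed with doubled backslashes because they sit inside string literals.

```python
import re


def festival_to_python(pattern: str) -> str:
    """Rewrite a pattern in the notation listed above into Python `re` syntax.

    Hypothetical helper for illustration only: it handles just the operators
    listed above and is not part of Festival.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # \( \) \| are operators; any other escaped character is a literal.
            out.append(nxt if nxt in "()|" else re.escape(nxt))
            i += 2
        elif ch == "[":
            # Copy a bracket range through unchanged
            # (assumes no backslashes inside the brackets).
            j = i + 1
            if j < len(pattern) and pattern[j] == "^":
                j += 1
            if j < len(pattern) and pattern[j] == "]":
                j += 1  # a leading ] is a literal member of the range
            j = pattern.index("]", j)
            out.append(pattern[i:j + 1])
            i = j + 1
        elif ch in "()|":
            # Without a backslash these are plain literals in this notation.
            out.append("\\" + ch)
            i += 1
        else:
            # ., $, ^, *, +, ? and ordinary characters mean the same in `re`.
            out.append(ch)
            i += 1
    return "".join(out)


def matches(text: str, pattern: str) -> bool:
    """Return True if the whole of `text` matches `pattern` (sketch assumption)."""
    return re.fullmatch(festival_to_python(pattern), text) is not None


print(matches("abab", "\\(ab\\)+"))   # True: + applies to the whole group
print(matches("dog", "cat\\|dog"))    # True: alternation
print(matches("a-b", "[-a-z]+"))      # True: - placed first is in the range
print(matches("(x)", "(x)"))          # True: bare parentheses are literals
print(matches("abc", "ab"))           # False: the whole string must match
```

Translating into another engine's syntax keeps the sketch runnable without Festival, at the cost that corner cases not covered by the list above may behave differently in EST_Regex.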

Note that actually only one backslash is needed before a character to escape it, but because these expressions are most often contained within Scheme or C++ strings, the escape mechanism for those strings requires that the backslash itself be escaped; hence you will most often be required to type two backslashes.
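The effect of the doubling can be seen by inspecting a string literal directly. The sketch below uses Python, whose ordinary string literals treat the backslash as an escape character in the same way the text describes for Scheme and C++ strings; it is not Festival code, and the variable name `typed` is only illustrative.

```python
# What is typed in the source: two backslashes before each parenthesis.
typed = "\\(ab\\)+"

# What the string actually holds, and so what a regular expression
# function would receive: one backslash before each parenthesis.
print(typed)              # prints \(ab\)+
print(len(typed))         # prints 7
print(typed.count("\\"))  # prints 2
```

The string holds seven characters, only two of which are backslashes, even though four were typed.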
