05.12.2012 Views

NASA Scientific and Technical Aerospace Reports


Atal introduced a technique for decomposing speech into phone-length temporal events in terms of overlapping and interacting articulatory gestures. This paper reports on simplifications of this technique with applications to acoustic-phonetic synthesis. Spectral evolution is represented by time-indexed trajectories in the p-dimensional space of Log-Area Ratios $y_i = \ln\left((1 + k_i)/(1 - k_i)\right)$, where $k_i$ are the reflection coefficients obtained from short-time stationary LPC analysis. The vocal tract configuration (spectral vector) associated with each interpolation function belongs to a finite set of articulatory targets (vector quantization code book). A set of speech segments ('polysons') has been encoded using this technique. It includes diphones, demi-syllables, and other units that are difficult to segment. Temporal decomposition using target spectra can break the complex encoding of these segments. In particular, coarticulation effects are analytically explained and modeled. It is demonstrated that these new tools provide an adequate environment in our search for better rules in acoustic speech synthesis.

Author<br />

Phonetics; Reflectance; Decomposition; Interpolation; Syllables; Speech<br />
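
As an illustration of the Log-Area Ratio mapping quoted in the abstract above, the following is a minimal sketch, not code from the paper. The function names `log_area_ratios` and `reflection_from_lars` are hypothetical, and the reflection coefficients are assumed to come from some earlier LPC analysis that is not shown. The inverse uses the identity k = tanh(y/2), which follows directly from the formula in the abstract.

```python
import math


def log_area_ratios(reflection_coeffs):
    """Map reflection coefficients k_i to log-area ratios y_i = ln((1 + k_i) / (1 - k_i)).

    Each k_i must lie strictly between -1 and 1, otherwise the ratio is undefined.
    """
    lars = []
    for k in reflection_coeffs:
        if not -1.0 < k < 1.0:
            raise ValueError(f"reflection coefficient out of range: {k}")
        lars.append(math.log((1.0 + k) / (1.0 - k)))
    return lars


def reflection_from_lars(lars):
    """Invert the mapping: k_i = tanh(y_i / 2)."""
    return [math.tanh(y / 2.0) for y in lars]


# Hypothetical coefficients, chosen only to exercise the two functions.
example_k = [0.5, -0.3, 0.0]
example_y = log_area_ratios(example_k)
recovered_k = reflection_from_lars(example_y)
```

Working in the log-area domain is a common choice for interpolating between spectral targets, because any real-valued y maps back to a coefficient strictly inside (-1, 1).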

20060001663 Thomson SINTRA ASM, France<br />

A Continuous Speech Dialog System for the Oral Control of a Sonar Console<br />

Alinat, Pierre; Gallais, Evelyne; Haton, Jean-Paul; Pierrel, Jean-Marie; Richard, Pascal; IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '87); Volume 1; 1987, pp. 10.3.1-10.3.4; In English; See also 20060001583; Copyright; Avail.: Other Sources

We present in this paper an application of a continuous speech understanding system to the control of a sonar console by a human operator. The application meets a practical need of the operator, whose eyes are busy watching the sonar screen. It presents three original features: (1) the acoustic-phonetic decoding part of the system is not based on a classical phoneme labeling process; instead, it yields for each segment of speech a set of acoustic-phonetic labels that describe the segment very precisely. This phonetic labeling is associated with a special procedure for lexical access; (2) the system is a true man-machine dialog system, not merely a system capable of understanding a single sentence, as is often the case. The history of the dialog is used as a special knowledge source during the understanding process, which greatly increases the overall performance of the system; (3) the understanding process, and particularly the emission of hypotheses, is based on knowledge about the language (lexicon and syntax) but also on the context of the dialog; pragmatic knowledge is thus intimately associated with the analysis of a sentence. The paper presents the architecture of the system and its various components. It also discusses experimental results obtained in a multispeaker mode.

Author<br />

Sonar; Speech Recognition; Remote Consoles; Phonetics; Systems Engineering<br />

20060001672 American Telephone and Telegraph Co., NJ, USA

Continuous Speech Recognition by Means of Acoustic/Phonetic Classification Obtained from a Hidden Markov Model<br />

Levinson, S. E.; IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '87); Volume 1; 1987, pp. 3.8.1-3.8.4; In English; See also 20060001583; Copyright; Avail.: Other Sources

This paper describes an experimental continuous speech recognition system comprising procedures for acoustic/phonetic classification, lexical access and sentence retrieval. Speech is assumed to be composed of a small number of phonetic units which may be identified with the states of a hidden Markov model. The acoustic correlates of the phonetic units are then characterized by the observable Gaussian process associated with the corresponding state of the underlying Markov chain. Once the parameters of such a model are determined, a phonetic transcription of an utterance can be obtained by means of a Viterbi-like algorithm. Given a lexicon in which each entry is orthographically represented in terms of the chosen phonetic units, a word lattice is produced by a lexical access procedure. Lexical items whose orthography matches subsequences of the phonetic transcription are sought by means of a hash coding technique and their likelihoods are computed directly from the corresponding interval of acoustic measurements. The recognition process is completed by recovering from the word lattice, the string of words of maximum likelihood conditioned on the measurements. The desired string is derived by a best-first search algorithm. In an experimental evaluation of the system, the parameters of an acoustic/phonetic model were estimated from fluent utterances of 37 seven-digit numbers. A digit recognition rate of 96% was then observed on an independent test set of 59 utterances of the same form from the same speaker. Half of the observed errors resulted from insertions while deletions and substitutions accounted equally for the other half.

Author<br />

Acoustic Measurement; Speech Recognition; Phonetics; Maximum Likelihood Estimates; Markov Chains; Classifications<br />
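
The abstract above describes obtaining a phonetic transcription from a hidden Markov model with Gaussian state observations by a "Viterbi-like algorithm". The sketch below is the textbook Viterbi recursion in the log domain, not the paper's own variant. It assumes scalar observations with one Gaussian per state for brevity. The names `gaussian_log_pdf` and `viterbi` and the two-state toy model are all hypothetical.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log density of a scalar Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi(observations, log_init, log_trans, means, variances):
    """Return the most likely state sequence for a non-empty observation list.

    log_init[s] is the log initial probability of state s, and
    log_trans[r][s] is the log probability of moving from state r to state s.
    """
    n_states = len(log_init)
    # delta[s]: best log score of any path that ends in state s at the current frame.
    delta = [
        log_init[s] + gaussian_log_pdf(observations[0], means[s], variances[s])
        for s in range(n_states)
    ]
    backpointers = []
    for x in observations[1:]:
        new_delta, pointers = [], []
        for s in range(n_states):
            # Best predecessor of state s at this frame.
            best_prev = max(range(n_states), key=lambda r: delta[r] + log_trans[r][s])
            pointers.append(best_prev)
            new_delta.append(
                delta[best_prev]
                + log_trans[best_prev][s]
                + gaussian_log_pdf(x, means[s], variances[s])
            )
        delta = new_delta
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy model (assumed values): two "phonetic units" with well-separated means.
toy_log_init = [math.log(0.5), math.log(0.5)]
toy_log_trans = [
    [math.log(0.9), math.log(0.1)],
    [math.log(0.1), math.log(0.9)],
]
toy_path = viterbi([0.1, -0.2, 4.9, 5.2], toy_log_init, toy_log_trans,
                   means=[0.0, 5.0], variances=[1.0, 1.0])
```

In a recognizer of the kind described, collapsing runs of identical states in the decoded path would give a sequence of phonetic units, which a lexical access stage could then match against the lexicon.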
