
discrimination information (or the cross-entropy) between the source and the model is proposed. This approach does not require the commonly used assumption that the source to be modeled is a hidden Markov process. The algorithm is started from the model estimated by the traditional maximum likelihood (ML) approach and alternately decreases the discrimination information over all probability distributions of the source which agree with the given measurements and all hidden Markov models. The proposed procedure generalizes the Baum algorithm for ML hidden Markov modeling. The procedure is shown to be a descent algorithm for the discrimination information measure, and its local convergence is proved.
Author
Markov Processes; Information Theory; Information Systems; Probability Distribution Functions; Maximum Likelihood Estimates
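For orientation only (the notation below is assumed for illustration, not taken from the paper), the discrimination information between a source distribution P and a hidden Markov model Q is the Kullback-Leibler quantity

    D(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}

and the procedure summarized above can be read as an alternating descent on

    \min_{P \in \mathcal{P}} \; \min_{Q \in \mathcal{M}} D(P \,\|\, Q)

where \mathcal{P} denotes the source distributions consistent with the given measurements and \mathcal{M} the set of hidden Markov models: hold Q fixed and minimize over P, then hold P fixed and minimize over Q. This reading is consistent with the abstract's statement that the procedure generalizes the Baum algorithm for ML hidden Markov modeling.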

20060001627 International Business Machines Corp., Paris, France
Context-Dependent Phonetic Markov Models for Large Vocabulary Speech Recognition
Derouault, Anne-Marie; IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '87); Volume 1; 1987, pp. 10.1.1 - 10.1.4; In English; See also 20060001583; Copyright; Avail.: Other Sources

One approach to large vocabulary speech recognition is to build phonetic Markov models and to concatenate them to obtain word models. In previous work, we designed a recognizer based on 40 phonetic Markov machines, which accepts a 10,000-word vocabulary ([3]) and, more recently, a 200,000-word vocabulary ([5]). Since there is one machine per phoneme, these models obviously do not account for coarticulatory effects, which may lead to recognition errors. In this paper, we improve the phonetic models by using general principles about coarticulation effects on automatic phoneme recognition. We show that both the analysis of the errors made by the recognizer and linguistic facts about the influence of phonetic context suggest a method for choosing context-dependent models. This method makes it possible to limit the growth of the number of phoneme models while still accounting for the most important coarticulation effects. We present our experiments with a system applying these principles to a set of models for French. With this new system including context-dependent machines, the phoneme recognition rate goes from 82.2% to 85.3%, and the word error rate with a 10,000-word dictionary decreases from 11.2% to 9.8%.
Author
Context; Phonemes; Error Analysis; Phonetics; Words (Language); Speech Recognition; Linguistics
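A minimal sketch of the selection idea summarized in the abstract above, under stated assumptions: the dictionary keys, the notion of a "left context" class, and the fallback rule are all hypothetical illustrations, not the authors' actual procedure.

# Keep one generic machine per phoneme; add a context-dependent machine only
# where error analysis and linguistic knowledge indicate a strong
# coarticulation effect, which limits the growth of the model inventory.
def choose_model(models, phoneme, left_context):
    """Use a context-dependent machine when one exists, else the generic one."""
    return models.get((phoneme, left_context), models[(phoneme, None)])

# Tiny illustrative inventory (names are hypothetical).
models = {
    ("a", None): "HMM_a",
    ("a", "nasal"): "HMM_a_after_nasal",
    ("t", None): "HMM_t",
}

print(choose_model(models, "a", "nasal"))  # context-dependent variant
print(choose_model(models, "a", "stop"))   # falls back to the generic machine
print(choose_model(models, "t", "nasal"))  # no variant defined, generic machine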

20060001657 Mitre Corp., McLean, VA, USA
Information-Theoretic Compressibility of Speech Data
Ramsey, L. Thomas; Gribble, David; IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '87); Volume 1; 1987, pp. 1.6.1 - 1.6.4; In English; See also 20060001583; Copyright; Avail.: Other Sources

Two standard reversible coding algorithms, Ziv-Lempel and a dynamic Huffman algorithm, are applied to various types of speech data. The data tested were PCM, DPCM, and prediction residuals from LPC. Neither algorithm shows much promise on small amounts of data, but both performed well on large amounts. Typically, the Ziv-Lempel algorithm required about 12 seconds of data (at 8000 samples per second) to reach a stable compression rate. The dynamic Huffman coding took much less time to 'warm up', often needing only about 64 milliseconds. Approximately 66 seconds of PCM at 12 bits per sample was compressed 6.4% by the Ziv-Lempel coding and 20.7% by the dynamic Huffman coding. The corresponding figures for DPCM at 13 bits per sample are 17.7% and 35.6%, respectively. The prediction residuals had compression rates very close to those of DPCM, regardless of whether 1, 2, 5, or 10 prediction coefficients were used.
Author
Information Theory; Compressibility; Predictions; Speech; Coefficients; Differential Pulse Code Modulation
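As a rough, self-contained illustration of the kind of measurement reported above (not a reproduction of the paper's experiment), the sketch below applies a Lempel-Ziv-family reversible coder to a synthetic sample stream and computes the percentage space saving. zlib's DEFLATE is used as a stand-in for the Ziv-Lempel coder, and the random-walk data is a stand-in for speech; both are assumptions for illustration only.

# Measure the space saving of a reversible (lossless) coder on raw samples.
import zlib
import random

random.seed(0)
# Fake "PCM" stream: 8000 samples/s for 12 seconds, 16-bit little-endian,
# generated as a bounded random walk (a stand-in for speech, not real data).
samples = bytearray()
value = 0
for _ in range(8000 * 12):
    value = max(-2000, min(2000, value + random.randint(-50, 50)))
    samples += (value & 0xFFFF).to_bytes(2, "little")

compressed = zlib.compress(bytes(samples), level=9)
saving = 100.0 * (1 - len(compressed) / len(samples))
print(f"space saving: {saving:.1f}%")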

20060001668 American Telephone and Telegraph Co., NJ, USA
A Connected Speech Recognition System Based on Spotting Diphone-Like Segments - Preliminary Results
Rosenberg, A. E.; Colla, A. M.; IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '87); Volume 1; 1987, pp. 3.6.1 - 3.6.4; In English; See also 20060001583; Copyright; Avail.: Other Sources

A template-based connected speech recognition system, which represents words as sequences of diphone-like segments, has been implemented and evaluated. The inventory of segments is divided into two principal classes: 'steady-state' speech sounds such as vowels, fricatives, and nasals, and 'composite' speech sounds consisting of sequences of two or more speech sounds in which the transitions from one sound to another are intrinsic to the representation of the composite sound. Templates representing these segments are extracted from labelled training utterances. Words are represented by network models whose branches are diphone segments. Word juncture phenomena are accommodated by including segment branches that characterize transition pronunciations between specified classes of words. The recognition of a word in a specified utterance takes place
