
NASA Scientific and Technical Aerospace Reports


y ‘spotting’ all the segments contained in the model of the word. Putative words and word combinations are found by searching for best scoring sequences of segments specified by the models subject to segment separation constraints. A pruning procedure finds the best scoring string of words subject to constraints on word lengths, separations, and overlaps. An evaluation of the recognizer has been carried out on a database of connected digit utterances spoken by a single male talker. Templates are extracted from half the database consisting of 2100 digit utterances and system performance is tested on the remaining 2100 utterances. The performance obtained to date is approximately 2% digit error rate and 7 to 8% digit string error rate.

Author<br />

Speech Recognition; Words (Language); Vowels; Speech; Sequencing<br />
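
The two error rates quoted in the abstract above can be related with a short calculation. The sketch below assumes that digit errors are independent and that every string has the same length; neither assumption comes from the abstract, which does not state the string length. The function name `string_error_rate` is hypothetical.

```python
def string_error_rate(digit_error_rate: float, digits_per_string: int) -> float:
    """Probability that a string contains at least one digit error.

    Assumption (not from the abstract): each digit is misrecognized
    independently with the same probability.
    """
    return 1.0 - (1.0 - digit_error_rate) ** digits_per_string


if __name__ == "__main__":
    # Approximately 2% digit error rate, as reported in the abstract.
    for n in (3, 4, 5):
        rate = string_error_rate(0.02, n)
        print(f"{n} digits per string: {100 * rate:.2f}% string error")
    # 3 digits per string: 5.88% string error
    # 4 digits per string: 7.76% string error
    # 5 digits per string: 9.61% string error
```

Under these assumptions, a 2% digit error rate with strings of about four digits would give a string error rate in the reported 7 to 8% range. This is only a consistency check, not a statement about the database used.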

20060001670 BBN Systems and Technologies Corp., Cambridge, MA, USA

BYBLOS: The BBN Continuous Speech Recognition System

Chow, Y. L.; Dunham, M. O.; Kimball, O. A.; Krasner, M. A.; Kubala, G. F.; Makhoul, J.; Price, P. J.; Roucos, S.; Schwartz, R. M.; IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’87); Volume 1; 1987, pp. 3.7.1-3.7.4; In English; See also 20060001583

Contract(s)/Grant(s): N00039-85-C-0423; Copyright; Avail.: Other Sources

In this paper, we describe BYBLOS, the BBN continuous speech recognition system. The system, designed for large vocabulary applications, integrates acoustic, phonetic, lexical, and linguistic knowledge sources to achieve high recognition performance. The basic approach, as described in previous papers, makes extensive use of robust context-dependent models of phonetic coarticulation using Hidden Markov Models (HMM). We describe the components of the BYBLOS system, including: signal processing frontend, dictionary, phonetic model training system, word model generator, grammar and decoder. In recognition experiments, we demonstrate consistently high word recognition performance on continuous speech across: speakers, task domains, and grammars of varying complexity. In speaker-dependent mode, where 15 minutes of speech is required for training to a speaker, 98.5% word accuracy has been achieved in continuous speech for a 350-word task, using grammars with perplexity ranging from 30 to 60. With only 15 seconds of training speech we demonstrate performance of 97% using a grammar.

Author<br />

Signal Processing; Speech Recognition; Phonetics; Linguistics; Grammars; Decoders<br />
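
The abstract above characterizes its grammars by perplexity (30 to 60). As a reminder of what that figure measures, here is a minimal sketch using the standard definition of perplexity as 2 raised to the average negative log2 probability per word. The function name `perplexity` and the sample probabilities are illustrative assumptions and are not taken from the BYBLOS paper.

```python
import math
from typing import Sequence


def perplexity(word_probs: Sequence[float]) -> float:
    """Perplexity of a word sequence given the probability the grammar
    assigned to each word: 2 ** (average negative log2 probability)."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    avg_neg_log2 = -sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2.0 ** avg_neg_log2


if __name__ == "__main__":
    # A grammar that always allows 30 equally likely next words
    # has a perplexity of 30 (up to floating-point rounding).
    uniform_30 = [1.0 / 30.0] * 10
    print(round(perplexity(uniform_30), 6))
```

Read this way, a perplexity of 30 to 60 means the recognizer faces, on average, about as much uncertainty as a uniform choice among 30 to 60 words at each position, even though the task vocabulary has 350 words.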

20060001673 American Telephone and Telegraph Co., USA

Performance Evaluation of a Connected Digit Recognizer<br />

Rabiner, L. R.; Wilpon, J. G.; Juang, B. H.; IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’87); Volume 1; 1987, pp. 3.10.1-3.10.4; In English; See also 20060001583; Copyright; Avail.: Other Sources

In this paper we discuss a system for automatically recognizing fluently spoken digit strings based on whole word reference units. The system that we will describe can use either hidden Markov model (HMM) technology or template-based technology. The training procedure derives the digit reference patterns (either templates or statistical models) from connected digit strings. To evaluate the performance of the overall connected digit recognizer, a set of 50 people (25 men, 25 women) from the non-technical local population were each asked to record 1200 random connected digit strings over local dialed-up telephone lines. Both a speaker-trained and a multispeaker training set were created, and a full performance evaluation was made. Results show that the average string accuracy for unknown and known length strings, in the speaker-trained mode, was 98% and 99% respectively; in the multispeaker mode the average string accuracies were 94% and 96.6% respectively.

Author<br />

Performance Tests; Templates; Evaluation; Mathematical Models; Speech<br />

20060001738 International Business Machines Corp., Paris, France<br />

Speech Recognition with Very Large Size Dictionary<br />

Merialdo, Bernard; IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’87); Volume 1; 1987, pp. 10.2.1-10.2.4; In English; See also 20060001583; Copyright; Avail.: Other Sources

This paper proposes a new strategy, Multi-Level Decoding (MLD), that allows the use of a Very Large Size Dictionary (VLSD, more than 100,000 words) in speech recognition. MLD proceeds in three steps: 1) a Syllable Match procedure uses an acoustic model to build a list of the most probable syllables that match the acoustic signal from a given time frame; 2) from this list, a Word Match procedure uses the dictionary to build partial word hypotheses; 3) a Sentence Match procedure then uses a probabilistic language model to build partial sentence hypotheses until total sentences are found. An original
