MAS.632 Conversational Computer Systems - MIT OpenCourseWare


Figure 9.17 The Conversational Desktop attempted to parse all substrings of the input utterance (input words ABCD yield the candidates ABCD; ABC, ABD, ACD, BCD; AB, AC, AD, BC, BD, CD; A, B, C, D). Each substring was evaluated, and those which could be parsed were candidates for selection.

Because any word reported by a speech recognizer might be a spurious insertion error, all substrings of the input words were parsed as shown in Figure 9.17. If any words were recognized correctly, one of the substrings would contain exactly all these words without the insertions. All of the substrings were submitted to the parser, which had to reject those strings containing grammatically correct insertions, but accept the substring that represented the original sentence minus the insertion.
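
A rough sketch of this enumeration follows, assuming the recognizer output is simply an ordered list of word tokens; the function name and the order-preserving subsequence reading of "substring" are this sketch's assumptions, not the Conversational Desktop's actual code.

```python
# Enumerate every candidate substring of the recognized words, as in
# Figure 9.17, longest candidates first.
from itertools import combinations

def candidate_substrings(words):
    """Yield every non-empty, order-preserving substring of the input:
    ABCD -> ABCD, ABC, ABD, ACD, BCD, AB, ..., D."""
    n = len(words)
    for length in range(n, 0, -1):
        for idxs in combinations(range(n), length):
            yield [words[i] for i in idxs]

if __name__ == "__main__":
    for cand in candidate_substrings(["A", "B", "C", "D"]):
        print(" ".join(cand))
```

Each of these candidates would then be handed to the parser, which keeps only those it can analyze.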

Rejection errors caused incomplete sentences and resulted in fragments of syntactically correct tokens. To cope with such errors, the grammar was modified to describe sentence fragments as well as complete sentences, e.g., the word sequence "Barry Friday at two o'clock" is a well-formed fragment from a sentence about scheduling a meeting. This parsing strategy also handled those substitution errors resulting in semantically incongruous sentences; if one of the substrings contained exactly the correct words, then the resulting sentence fragment would be accepted by the parser's rules for fragments.
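
The following toy stand-in illustrates the idea of adding fragment rules alongside complete-sentence rules; the regular-expression patterns and the parse() helper are illustrative assumptions for this sketch only, not the original grammar.

```python
# One complete scheduling-sentence pattern plus a few fragment patterns,
# so that a substring such as "Barry Friday at two o'clock" still parses
# after the leading words are lost to a rejection error.
import re

SENTENCE = re.compile(r"^schedule a meeting with \w+ \w+ at \w+ o'clock$")
FRAGMENTS = [
    re.compile(r"^\w+ \w+ at \w+ o'clock$"),   # "Barry Friday at two o'clock"
    re.compile(r"^at \w+ o'clock$"),           # "at two o'clock"
    re.compile(r"^schedule a meeting$"),       # verb phrase alone
]

def parse(words):
    """Classify a candidate substring as a complete sentence, a
    well-formed fragment, or unparsable (None)."""
    text = " ".join(words).lower()
    if SENTENCE.match(text):
        return "sentence"
    if any(f.match(text) for f in FRAGMENTS):
        return "fragment"
    return None

print(parse("schedule a meeting with Barry Friday at two o'clock".split()))  # sentence
print(parse("Barry Friday at two o'clock".split()))                          # fragment
print(parse("meeting Barry at".split()))                                     # None
```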

These parsing strategies had to be combined because any type of error could occur in each sentence. Because of the fragment rules, multiple substrings from each input sequence would usually be accepted by the parser; the next stage of analysis selected the best of these. Evaluation of "best" was based on empirically derived weightings of the following (a sketch of such a scoring scheme follows the list).

* Completeness: Assuming that the user spoke well-formed sentences, a successfully parsed complete sentence was more likely spoken than was a sentence fragment.

* Number of words: A greater number of words in an input string was preferred because recognition results were more likely to be correct than erroneous. This judgment was also based on the assumption that the user spoke sentences based only on words in the recognizer's vocabulary. If both "ABC" and "AB" could be parsed, "ABC" would receive preference from this metric.

* Sequences of words: Because of the difficulty of word endpoint detection, connected recognizers tend to produce runs of correct or erroneous results. If the endpoint of one word is incorrectly identified, not only is that word likely to be misrecognized but also the following word. The scoring metric favored substrings that included
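
A hedged sketch of how these three weighted criteria might be combined appears below; the weights, names, and scoring formula are illustrative guesses, since the text says only that the weightings were derived empirically.

```python
# Score each parsed candidate on completeness, word count, and the
# longest run of consecutive recognizer positions it retains.
def longest_run(positions):
    """Length of the longest run of consecutive recognizer positions
    kept in a candidate, reflecting the preference for unbroken
    sequences of words."""
    if not positions:
        return 0
    best = run = 1
    for prev, cur in zip(positions, positions[1:]):
        run = run + 1 if cur == prev + 1 else 1
        best = max(best, run)
    return best

def score(kind, positions, w_complete=10.0, w_words=1.0, w_run=2.0):
    """kind is 'sentence' or 'fragment'; positions are the indices of
    the recognized words retained in this candidate."""
    s = w_complete if kind == "sentence" else 0.0
    s += w_words * len(positions)
    s += w_run * longest_run(positions)
    return s

# A complete sentence keeping words 0-3 outranks a fragment keeping 0, 2, 3.
print(score("sentence", [0, 1, 2, 3]))  # 22.0
print(score("fragment", [0, 2, 3]))     # 7.0
```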
