MAS.632 Conversational Computer Systems - MIT OpenCourseWare

168 VOICE COMMUNICATION WITH COMPUTERS

word will be substituted. If the recognized word is inappropriate in the current context but another word that is highly similar is appropriate, the application could use the alternate instead of the recognized word.

In the spreadsheet example, the phrase "row B" was suspect. Either second-choice or confusability information could be used to indicate whether the user actually spoke "row three" or "column B," for example.
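This selection strategy can be sketched in code. The following is an illustrative sketch only, not the book's actual system: the n-best list format, scores, and the spreadsheet grammar (rows spoken as numbers, columns as letters) are all assumptions made for the example.

```python
# Assumed spreadsheet grammar: rows are spoken as numbers, columns as letters.
ROWS = {"one", "two", "three"}
COLS = {"A", "B", "C"}

def pick_hypothesis(n_best, expected):
    """Return the best-scoring hypothesis that fits the current context.

    n_best   -- list of (word, score) pairs from the recognizer, best first
    expected -- "row" or "column", known from the word preceding this one
    """
    valid = ROWS if expected == "row" else COLS
    for word, score in n_best:
        if word in valid:
            return word
    # No hypothesis fits the context; the application must ask the user.
    return None

# "row B" is suspect: "B" is not a row label, but the recognizer's
# second choice "three" is, so the application substitutes it.
print(pick_hypothesis([("B", 0.9), ("three", 0.7)], expected="row"))  # prints "three"
```

The same predicate that rejects "B" as a row label also tells the application, when no alternate fits either, that it has detected an error and should fall back to a clarifying dialogue.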

Alternate Input Modalities

In addition to speech, other input channels may be used to point to or select the data represented by an application. Hauptmann found that users were able to make effective use of pointing in combination with limited-vocabulary speech recognizers to describe graphical manipulations [Hauptmann 1989]. Pointing devices may be two dimensional (mouse, stylus, touch screen, joystick) or three dimensional (gesture input using a variety of magnetic or ultrasonic techniques). Eye trackers report the angle of the user's gaze and can be used to manipulate an object by looking at it [Jacob 1991].

The selection gesture may be redundant with some words in a spoken command; this is especially true of deictic pronouns ("this," "that," "these," "those"). If the gesture can be correctly synchronized with the reporting of speech recognition results, the gestures can be used in place of any pronouns not recognized. This technique was first used in Put That There, a project described later in this chapter. Carnegie-Mellon University's Office Manager (OM) system used mouse-based pointing as a means of explicit error recovery [Rudnicky et al. 1991]; after recognition, a text display of the user's request was presented, and the user could click with the mouse on a word to be changed. Voice has been combined with gesture in a three-dimensional model-building task [Weimer and Ganapathy 1992, Weimer and Ganapathy 1989]. Starker and Bolt combined speech synthesis and eye tracking into a system that discussed what a user was looking at [Starker and Bolt 1990]; more recently the same research group has incorporated speech recognition as well [Koons et al. 1993].
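The idea of synchronizing gestures with recognition results can be sketched as follows. This is a minimal illustration, not a description of Put That There itself: the timestamped event format, the object names, and the nearest-in-time matching rule are all assumptions for the example.

```python
# Deictic words that a pointing gesture can stand in for; "<unk>" marks
# a word the recognizer failed to recognize in a deictic slot.
DEICTIC = {"this", "that", "these", "those", "here", "there", "<unk>"}

def resolve_deixis(words, gestures):
    """Fill deictic slots in a spoken command from pointing events.

    words    -- list of (word, time) pairs from the recognizer
    gestures -- list of (object_name, time) pointing events
    Each deictic word is replaced by the object whose pointing event
    is closest to it in time.
    """
    resolved = []
    for word, t in words:
        if word in DEICTIC and gestures:
            obj, _ = min(gestures, key=lambda g: abs(g[1] - t))
            resolved.append(obj)
        else:
            resolved.append(word)
    return " ".join(resolved)

# "Put that there" accompanied by two pointing events:
print(resolve_deixis(
    [("put", 0.0), ("that", 0.4), ("there", 1.1)],
    [("blue square", 0.5), ("map corner", 1.2)]))
# prints "put blue square map corner"
```

Matching on nearest timestamps is the simplest possible synchronization rule; a real system must also cope with gestures that precede or trail their pronouns by varying amounts.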

Dialogue with the User

The errors that an application detects result in a meaningless or incomplete request. The application must then employ techniques to elicit additional information from the user. The simplest but not very effective technique is to ask the user to repeat the request. If the user repeats the request word for word, it is quite possible that the same recognition error will occur again. If the user instead rephrases the request, different recognition errors may occur, and it is challenging to merge the two requests to decide what the user really wanted.

It is better for the application to ask the user one or more specific questions to resolve any input ambiguity. The phrasing of the questions is critical, as the form in which a question is asked strongly influences the manner in which it is answered. The goal of efficiency suggests that the questions be brief and very direct. The goal of minimizing user frustration would also suggest asking ques-
