MAS.632 Conversational Computer Systems - MIT OpenCourseWare

More documents

Recommendations

Info

T.rn Mooe Robudst (nion0tion 299 nature of questions that we ask strongly influences the speaker's response, and we should guide the conversation to a successful conclusion. By echoing portions of what has been thought to be understood, the listener also invites correction by the speaker. The principle ofimplicit confirmation suggests that an echoed statement can be assumed to be correct unless it is contested by the original speaker. We often describe an object partially, identifying only its salient features. Grice's maxims support this behavior; they indicate that we should speak only as much as is necessary, i.e., if a complete description is not necessary to identify an object, then we should not elaborate. Humans excel at understanding each other from fragmentary utterances, but these sentence fragments can wreck havoc on computational parsers and semantic analyzers. "Conceptual" parsers operate on the basis of keyword detection, e.g., by placing keywords into slots with appropriate roles in a frame. But simply detecting keywords may be insufficient for identifying their role even in a syntactically simple utterance such as "The lion roared at the elephant in the tiger's cage"; here we must identify the syntactic relationships between the animals if we are to infer their roles in the activity described. A conversational speech system should have the ability to answer questions that may fall into several categories. Hypothetical questions like "IfI were to go on Tuesday could I get a lower fare ticket?" are often a prelude to further transaction and establish the basis of the next request. Questions about ability like "Can you pass the salt?" are often indirect speech acts. But other times answering questions about ability is essential as speech systems are not omnipotent but perform in a limited domain. In a graphical user interface, the user can pull down menus and read options in the process of deciding how to make a choice. In a speech system, this is not possible: the application would need to continually recite all possible options because of the transient nature of speech. Instead, the user must be able to ask the system what its capabilities are, and it must be able to reply in a rational manner. Combining clarifying protocols with the various aspects of discourse described in Chapter 9, we begin to appreciate the complexity of conversation. There may be a round of clarifying discourse to resolve ambiguities about what one party has just said. This contributes to subgoals-mutual understanding on an utteranceby-utterance basis-in the context of the discourse. But discourse is usually not about simply understanding each other; certainly in the case of a human speaking to a computer, the human wishes the computer to perform some action or service. The computer must track the discourse focus and understand the user's goals at each step while formulating its own subgoals to understand the utterance. This is very difficult, yet it provides a powerful basis for graceful conversational interaction. SPEECH RECOGNITION AND ROBUST PARSING One disturbing trend in speech recognition research is putting more and more linguistic knowledge into the word recognition process (this is good) but then using this knowledge to excessively constrain recognition, which leaves little
300 VOICE COMMUNICATION WITH COMPUTERS room for speaker or recognizer error. Linguistic constraints, most prominently rules of syntax or word combination probabilities, limit the perplexity of the recognition task and this results in both significantly improved accuracy and the potential for much larger recognition vocabularies. Such constraints are certainly necessary to understand fluent speech and humans use them as well. But once a language is completely specified, what happens when the human errs? In the majority of current, experimental, large-vocabulary connected speech recognizers, omission of a single word is likely to result in no recognition result reported at all. If the recognizer is configured to hear seven or ten digit telephone numbers and the talker stops after six digits, the recognizer may report nothing or possibly only the occurrence of an error without any hint as to its nature. But consider the conversational techniques discussed earlier in this chapter as well as in Chapter 8. It is extremely unproductive for the application to remain mute or to utter unhelpfully "What was that?" A preferable alternative might be to echo the digits back to the user either in entirety or in part. Another might be to use back-channel style encouragement ("Uh-huh" spoken with a somewhat rising pitch) to indicate that the application is waiting for more digits and anticipates that the user has not yet completed his or her turn. Further misunderstandings might cause the application to explain what it thinks the user has requested and why the request is not yet complete, e.g., "I can dial the phone for you but you must specify at least seven digits for a valid number." In the above scenario, the human made a clear mistake by speaking only six digits. But a cooperative system will not castigate the user, rather it will try to help complete the task. Perhaps the user accidently missed a digit or cannot decipher a digit in someone else's handwriting, or paused to obtain reassurance that the system was still listening and realized that a call was being placed. Such occurrences are not even limited to situations in which the human makes such an unequivocal mistake; spoken language is ripe with false starts, ill-formed grammatical constructs, and ellipsis. When these happen, is there any hope for an application using speech recognition to recover gracefully and cooperatively? One approach, similar to that used by Conversational Desktop and described in Chapter 9, is to define the recognizer's grammar so that all possible sentence fragments appear as valid utterances. When used during the recognition phase of discourse, such an under-constrained grammar would lead to reduced recognition accuracy due to increased perplexity. But humans are effective at conversing to attain a goal and, to the extent that we can engage in question-answering, we are capable of getting acceptable performance even in the face of recognition errors [Hunnicutt et al. 1992] so letting some errors occur and then negotiating with the user may be more productive than insisting on near-perfect recognition before reporting any results. Two approaches from MIT's Spoken Language Systems Group illustrate practical hybrid solutions. One approach reported by [Seneff 1992] employs conventional parsing based on syntactic rules, but when a well-formed parse is not found, the parser switches to a more semantically oriented mode based on analysis of keywords in the context of the task domain. Additionally, conventional syntactic parsing can be improved by weighting word choices with probability
Page 1 and 2:
MIT OpenCourseWare http://ocw.mit.e
Page 3 and 4:
VNR Computer Library Advanced Topic
Page 5 and 6:
To Ava andKaya for the patienceto s
Page 7 and 8:
vi VOICE COMMUNICATION WITH COMPUTE
Page 9 and 10:
viE VOICE COMMUNICATION WITH COMPUT
Page 11 and 12:
VOICE COMMUNICATION WITH COMPUTEgS
Page 14 and 15:
Speaking of Talk As we enter the ne
Page 16 and 17:
Speaking ofTalk The short-term fix
Page 18 and 19:
Preface This book began as graduate
Page 20:
RFe[e ih This is just the beginning
Page 23 and 24:
xxii VOICE OMMUNICEAIIO WITH COMPUT
Page 25 and 26:
2 VOICE COMMUNICATION WITH COMPUTER
Page 27 and 28:
4 VOICE COMMUNICATION WITH COMPUTEI
Page 29 and 30:
6 VOICE (OMMUNICAIION WITH COMPUTER
Page 31 and 32:
S YVOICE COMMUNICATION WITH COMPUTE
Page 33 and 34:
10 VOICE COMMUHICATION WITH (OMPUIE
Page 35 and 36:
12 VOICE COMMUNICATION WITH COMPUTE
Page 37 and 38:
Page 39 and 40:
Page 41 and 42:
18 YVOII (OMMUNICATION WITH (OMPUTE
Page 43 and 44:
20 VOIC[ COMMUNICATION WITH COMPUTE
Page 45 and 46:
Page 47 and 48:
Page 49 and 50:
Page 51 and 52:
28 VOICE (OMMUNICATION WITH COMPUTE
Page 53 and 54:
30 VOICE COMMUNICATION WITH (OMPUTE
Page 55 and 56:
32 VOICE (OMMUHICATIOI WITH (OMPUTE
Page 57 and 58:
Page 59 and 60:
3 Speech Coding This chapter introd
Page 61 and 62:
38 VOICE (OMMUNICATlON WITH COMPUTE
Page 63 and 64:
40 VOICE COMMURItATION WITH (OMPUTE
Page 65 and 66:
Page 67 and 68:
Page 69 and 70:
Page 71 and 72:
48 VOICE COMMUNICATION WITi COMPUTE
Page 73 and 74:
IT -010 !Mm1-tgq Vm-0 50 VOICE COMM
Page 75 and 76:
52 VOICE (OMMUNICAlION WITH COMPUTE
Page 77 and 78:
Page 79 and 80:
56 VOICE COMMUNI(ATION WITH COMPUTE
Page 81 and 82:
Page 83 and 84:
4 Applications and Editing of Store
Page 85 and 86:
Page 87 and 88:
Page 89 and 90:
Page 91 and 92:
Page 93 and 94:
Page 95 and 96:
Page 97 and 98:
Page 99 and 100:
Page 101 and 102:
78 YOICE COMMUNICATION WITH COMPUTE
Page 103 and 104:
Page 105 and 106:
5 Speech Synthesis The previous two
Page 107 and 108:
Page 109 and 110:
6 YVOICE (OMMUNICATION WITH [OMPUTE
Page 111 and 112:
n VOICE COMMUNICATION WITH COMPUTER
Page 113 and 114:
Page 115 and 116:
Page 117 and 118:
Page 119 and 120:
Page 121 and 122:
98 VOICE COMMUlICATION WITH COMPUTE
Page 123 and 124:
Interactive Voice Response This cha
Page 125 and 126:
102 VOICE COMMUNICATION WITH COMPUT
Page 127 and 128:
104 VOICE COMMUNICATION WITH (OMPUT
Page 129 and 130:
Page 131 and 132:
Page 133 and 134:
110 VOICE (DMMUNICATION WITH COMPUT
Page 135 and 136:
112 VOICE COMMUNICATIOH WITH COMPUT
Page 137 and 138:
Page 139 and 140:
Page 141 and 142:
Page 143 and 144:
Page 145 and 146:
Page 147 and 148:
Page 149 and 150:
Page 151 and 152:
Page 153 and 154:
13 VOICE COMMUNICAIIOR WITH COMPUTE
Page 155 and 156:
7 Speech Recognition This chapter i
Page 157 and 158:
Page 159 and 160:
Page 161 and 162:
138 VOICE (OMMUNICATION WITH COMPUT
Page 163 and 164:
Page 165 and 166:
Page 167 and 168:
Page 169 and 170:
Page 171 and 172:
Page 173 and 174:
Page 175 and 176:
Page 177 and 178:
USES OF VOICE INPUT Sole IW,1 Canna
Page 179 and 180:
Page 181 and 182:
In VOICE COMMUNICATION WITH COMPUTE
Page 183 and 184:
160 VOICE [OMMUHICATION WIITH OMPUT
Page 185 and 186:
Page 187 and 188:
Page 189 and 190:
Page 191 and 192:
168 VOICE COMMUNICAlION WITH COM IU
Page 193 and 194:
Page 195 and 196:
172 VOICE (OMMURICAION WITH COMPUTE
Page 197 and 198:
Page 199 and 200:
Page 201 and 202:
Page 203 and 204:
Page 205 and 206:
182 OICE (COMMUNIICAION WITH COMPUT
Page 207 and 208:
1I4 VOICE COMMUNICATION WITH COMPUT
Page 209 and 210:
Page 211 and 212:
188 VOICE COMMUNICATION WIIH (OMPUT
Page 213 and 214:
Page 215 and 216:
192 VOIC0 COMMUNICATION WITH COMPUT
Page 217 and 218:
194 VOI(E COMMUNIATIION WITH (OMPUT
Page 219 and 220:
Page 221 and 222:
198 VOICI COMMUNICATION WITH COMPUT
Page 223 and 224:
Page 225 and 226:
202 VOICE COMMUNICATIOR WIIH COMPUT
Page 227 and 228:
204 VOICE COMMUNICAIION WIlH COMPUT
Page 229 and 230:
206 VOIC([ OMMUNICATION WITH COMPUT
Page 231 and 232:
Page 233 and 234:
10 Basics of Telephones This and th
Page 235 and 236:
Page 237 and 238:
Page 239 and 240:
216 VOICE COMMUNICATION WITH COMPUI
Page 241 and 242:
Page 243 and 244:
Page 245 and 246:
Page 247 and 248:
Page 249 and 250:
226 VOICE (COMMUNICATION WITH COMPU
Page 251 and 252:
Page 253 and 254:
11 Telephones and Computers The pre
Page 255 and 256:
Page 257 and 258:
214 VOICE COMMUHiCATION WITH COMPUI
Page 259 and 260:
236 VOICE COMMUNICATIOK WITH COMPUT
Page 261 and 262:
Page 263 and 264:
Page 265 and 266:
242 VOICE COMMUHICATIOI WITH COMPUI
Page 267 and 268:
Page 269 and 270:
246 VOICE (OMMUNICAIION WITH COMPUT
Page 271 and 272: 248 VOICE (COMMUNICATION WITH COMPU
Page 273 and 274: 250 VOICE COMMUNICATION WITH COMPUT
Page 277 and 278: 254 VOICE COMMUNICAIION WITH COMPUT
Page 279 and 280: 25 VOICE (OMMUNICATIOK WITH COMPUTE
Page 291 and 292: 12 Desktop Audio Because of limitat
Page 295 and 296: 2f2 VOICE COMMUIICAIION WITH COMPUT
Page 305 and 306: 22 VOICE COMMUNICATION WITH COMPUTE
Page 307 and 308: 284 VOIC[ COMMIUWlATION WITH COMPUT
Page 311 and 312: 211 VOICE (OMMUNICATION WITH COMPUT
Page 313 and 314: 290 VOICE COMMIUNICATIO0 WITH COMPU
Page 317 and 318: 294 VOICE (OMMUNICATION WITH COMPUT
Page 321: 298 VOICE (OMMUNICATION WITH (COMPU
Page 328 and 329: Bibliography Ades, S. and D. C. Swi
Page 330 and 331: Biliograply o30 Daly, N. A. and V W
Page 332 and 333: Bib~lgraphy 309 Klatt, D. H. "Revie
Page 334 and 335: liblgraphy 311 G.Peterson, W. Wang,
Page 336 and 337: Bbliography 313 Spiegel, M. F "Usin
Page 338 and 339: Index p-law, 45, 220, 224 acoustics
Page 340 and 341: Integrated Services Digital Network
Page 342: vocal tract, 19-24, 23 voice mail,
show all

MAS.632 Conversational Computer Systems - MIT OpenCourseWare

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?