MAS.632 Conversational Computer Systems - MIT OpenCourseWare

More documents

Recommendations

Info

Bosi of Telephors circuits with patch cords in the central office switchboards completing the circuits. Now it is most likely that once the circuit leaves the local central office, the voice is digitized. The digitized voice is then combined with several other digitized conversations, transmitted some distance by wire, fiber-optic cable, microwaves, or radio transmission to a satellite, eventually demultiplexed from the stream of many conversations, turned back into an analog signal at a distant central office, and sent down the final loop. This entire procedure is virtually transparent to the user except possibly for the delays associated with transmission time on satellite links; we may think ofthe network as switching virtual analog circuits. Recall that the loop is a two-wire system. Both parties may talk at the same time and consequently one transmit signal (what one party says) and one receive signal (what the other party says) become mixed on the line. A hybrid circuit in the telephone attempts to separate these signals; this is a difficult task in that each connection has a different phase and amplitude response. In other words, the acoustic response of a line varies as a function of frequency, which, in turn, varies from call to call. Some of the signal from the talker's microphone comes back to the talker's earpiece: this is called sidetone and provides useful feedback to the caller as a cue to the line quality and a hint as to when to speak up. But there are several drawbacks to the mix between transmit and receive signals. With telephone-based speech recognition, it is difficult to build an application in which the caller can interrupt ("barge in") while the computer is talking because the computer's speech will be feeding back into the recognizer. Speaker phones encounter feedback problems when the transmitted signal comes out through the speaker and back into the microphone.'" Consequently, speaker phones are half duplex, meaning that at any moment they are either transmitting or receiving but not both concurrently. Whichever party speaks louder will monopolize the channel ("get the floor"). This makes it much harder to interrupt or offer back channel responses (See chapter 9) and as a result conversation becomes awkward. Several companies (Shure, NEC, VideoTelecom, to name a few) sell full duplex echo-cancellation systems that can be used in a telephone environment. These more sophisticated systems determine the impulse response of the system, i.e., the correspondence between the transmitted and received signal to properly subtract the transmitted element from the received element of the conversation. These devices are still relatively expensive ($1000 and higher in 1993) pieces of equipment intended for conference rooms, but lower priced consumer models for the desktop are just around the corner. Telephone audio is band limited to lie between approximately 300 and 3100 Hz. The lower bound is established by blocking capacitors that separate the audio " This is not a problem with handsets since the signal coming out of the earpiece is not strong enough to make its way into the mouthpiece. A piece of cotton may often be found inside the handset to prevent acoustic coupling through the hollow body of the handset. 219
220 VOICE COMMUNICATION WITH COMPUTERS signal from the battery current. The higher frequency is set in part by characteristics of the carbon microphone and in part by the constraints of the digital portions of the network. It is important to limit the upper frequency. The digital segment of the network carries the signal sampled at 8 kHz with 8 bits of p-law encoded data per sample. As discussed in Chapter 3, g-law encoding is a logarithmic pulse code modulation scheme, and these 8 bits allow about 12 bits ofreal dynamic range. The 8 kHz sampling rate cannot represent a signal with frequency components exceeding 4kHz without introducing aliasing distortion; this translates to a little higher than 3kHz for real-world audio filters. Because of the limited bandwidth some acoustic information is lost during a telephone conversation. A significant amount of low frequency information is lost, generally including the fundamental frequency of voicing, although this does not prevent us from perceiving the pitch of the speech. The higher frequency limit interferes most significantly with the intelligibility of fricatives. These particular frequency limits were chosen with care to allow adequate intelligibility at minimum cost (in terms of bandwidth). Much of the trouble that consumers experience with understanding telephone conversations stems from noisy lines or inadequate amplitude rather than with the bandwidth limitations. Still it is often more difficult to identify a caller over the telephone than in person. On a positive note, users may find synthetic speech less objectionable over the telephone as they are already accustomed to its band-limited characteristics. Conference calls allow more than two parties to be connected simultaneously. Such a call may be established by a single subscriber with conference service, which allows two calls to be merged by one line; the calls are actually merged in the switch at the request of the common telephone. Local conferencing usually can merge three callers, i.e., two incoming calls; for a larger group a conference service must be involved to establish a bridge across the circuits. With a limited number of lines, conferencing can be accomplished by simply adding the signals and perhaps decreasing their amplitude first to prevent clipping speech when multiple parties speak simultaneously. For more than two or three parties, simply adding all the signals is undesirable since the background noise accumulated from each line results in a poor signal-to-noise ratio for the single person speaking. It is better to add together the signals from the one or two lines producing the loudest signals, presumably those conferees who are talking at a given time, and mute the remaining lines to keep their noise out ofthe conference. This technique interferes with interruption; if many parties speak at once, some will not be heard. Communication is also horribly confounded if inexpensive half-duplex speaker phones are used. Conference calls are often used as the audio paths for teleconferences in which multiple sites share voice as well as video. Teleconferencing usually requires specially equipped conference rooms. There is growing research interest in less formal conferences in which low cost video equipment is located in ordinary offices, connected together for "video calls," and possibly sharing windows of workstation screens among conferees. Devices such as modems and facsimile (fax) machines transmit digital data over analog voice circuits. They do so by generating different sounds for "1"bits
Page 1 and 2:
MIT OpenCourseWare http://ocw.mit.e
Page 3 and 4:
VNR Computer Library Advanced Topic
Page 5 and 6:
To Ava andKaya for the patienceto s
Page 7 and 8:
vi VOICE COMMUNICATION WITH COMPUTE
Page 9 and 10:
viE VOICE COMMUNICATION WITH COMPUT
Page 11 and 12:
VOICE COMMUNICATION WITH COMPUTEgS
Page 14 and 15:
Speaking of Talk As we enter the ne
Page 16 and 17:
Speaking ofTalk The short-term fix
Page 18 and 19:
Preface This book began as graduate
Page 20:
RFe[e ih This is just the beginning
Page 23 and 24:
xxii VOICE OMMUNICEAIIO WITH COMPUT
Page 25 and 26:
2 VOICE COMMUNICATION WITH COMPUTER
Page 27 and 28:
4 VOICE COMMUNICATION WITH COMPUTEI
Page 29 and 30:
6 VOICE (OMMUNICAIION WITH COMPUTER
Page 31 and 32:
S YVOICE COMMUNICATION WITH COMPUTE
Page 33 and 34:
10 VOICE COMMUHICATION WITH (OMPUIE
Page 35 and 36:
12 VOICE COMMUNICATION WITH COMPUTE
Page 37 and 38:
Page 39 and 40:
Page 41 and 42:
18 YVOII (OMMUNICATION WITH (OMPUTE
Page 43 and 44:
20 VOIC[ COMMUNICATION WITH COMPUTE
Page 45 and 46:
Page 47 and 48:
Page 49 and 50:
Page 51 and 52:
28 VOICE (OMMUNICATION WITH COMPUTE
Page 53 and 54:
30 VOICE COMMUNICATION WITH (OMPUTE
Page 55 and 56:
32 VOICE (OMMUHICATIOI WITH (OMPUTE
Page 57 and 58:
Page 59 and 60:
3 Speech Coding This chapter introd
Page 61 and 62:
38 VOICE (OMMUNICATlON WITH COMPUTE
Page 63 and 64:
40 VOICE COMMURItATION WITH (OMPUTE
Page 65 and 66:
Page 67 and 68:
Page 69 and 70:
Page 71 and 72:
48 VOICE COMMUNICATION WITi COMPUTE
Page 73 and 74:
IT -010 !Mm1-tgq Vm-0 50 VOICE COMM
Page 75 and 76:
52 VOICE (OMMUNICAlION WITH COMPUTE
Page 77 and 78:
Page 79 and 80:
56 VOICE COMMUNI(ATION WITH COMPUTE
Page 81 and 82:
Page 83 and 84:
4 Applications and Editing of Store
Page 85 and 86:
Page 87 and 88:
Page 89 and 90:
Page 91 and 92:
Page 93 and 94:
Page 95 and 96:
Page 97 and 98:
Page 99 and 100:
Page 101 and 102:
78 YOICE COMMUNICATION WITH COMPUTE
Page 103 and 104:
Page 105 and 106:
5 Speech Synthesis The previous two
Page 107 and 108:
Page 109 and 110:
6 YVOICE (OMMUNICATION WITH [OMPUTE
Page 111 and 112:
n VOICE COMMUNICATION WITH COMPUTER
Page 113 and 114:
Page 115 and 116:
Page 117 and 118:
Page 119 and 120:
Page 121 and 122:
98 VOICE COMMUlICATION WITH COMPUTE
Page 123 and 124:
Interactive Voice Response This cha
Page 125 and 126:
102 VOICE COMMUNICATION WITH COMPUT
Page 127 and 128:
104 VOICE COMMUNICATION WITH (OMPUT
Page 129 and 130:
Page 131 and 132:
Page 133 and 134:
110 VOICE (DMMUNICATION WITH COMPUT
Page 135 and 136:
112 VOICE COMMUNICATIOH WITH COMPUT
Page 137 and 138:
Page 139 and 140:
Page 141 and 142:
Page 143 and 144:
Page 145 and 146:
Page 147 and 148:
Page 149 and 150:
Page 151 and 152:
Page 153 and 154:
13 VOICE COMMUNICAIIOR WITH COMPUTE
Page 155 and 156:
7 Speech Recognition This chapter i
Page 157 and 158:
Page 159 and 160:
Page 161 and 162:
138 VOICE (OMMUNICATION WITH COMPUT
Page 163 and 164:
Page 165 and 166:
Page 167 and 168:
Page 169 and 170:
Page 171 and 172:
Page 173 and 174:
Page 175 and 176:
152 VOICE COMMUNICATION WITH (OMPUT
Page 177 and 178:
USES OF VOICE INPUT Sole IW,1 Canna
Page 179 and 180:
Page 181 and 182:
In VOICE COMMUNICATION WITH COMPUTE
Page 183 and 184:
160 VOICE [OMMUHICATION WIITH OMPUT
Page 185 and 186:
Page 187 and 188:
Page 189 and 190:
Page 191 and 192: 168 VOICE COMMUNICAlION WITH COM IU
Page 193 and 194: 170 VOICE COMMUNICATION WITH COMPUT
Page 195 and 196: 172 VOICE (OMMURICAION WITH COMPUTE
Page 205 and 206: 182 OICE (COMMUNIICAION WITH COMPUT
Page 207 and 208: 1I4 VOICE COMMUNICATION WITH COMPUT
Page 211 and 212: 188 VOICE COMMUNICATION WIIH (OMPUT
Page 215 and 216: 192 VOIC0 COMMUNICATION WITH COMPUT
Page 217 and 218: 194 VOI(E COMMUNIATIION WITH (OMPUT
Page 221 and 222: 198 VOICI COMMUNICATION WITH COMPUT
Page 225 and 226: 202 VOICE COMMUNICATIOR WIIH COMPUT
Page 227 and 228: 204 VOICE COMMUNICAIION WIlH COMPUT
Page 229 and 230: 206 VOIC([ OMMUNICATION WITH COMPUT
Page 231 and 232: 20 VOICE COMMUNICATION WITH COMPUTE
Page 233 and 234: 10 Basics of Telephones This and th
Page 235 and 236: 212 VOICE COMMUNICATION WITH (OMPUT
Page 239 and 240: 216 VOICE COMMUNICATION WITH COMPUI
Page 241: 218 VOICE COMMUNICATION WITH COMPUT
Page 249 and 250: 226 VOICE (COMMUNICATION WITH COMPU
Page 253 and 254: 11 Telephones and Computers The pre
Page 257 and 258: 214 VOICE COMMUHiCATION WITH COMPUI
Page 259 and 260: 236 VOICE COMMUNICATIOK WITH COMPUT
Page 265 and 266: 242 VOICE COMMUHICATIOI WITH COMPUI
Page 269 and 270: 246 VOICE (OMMUNICAIION WITH COMPUT
Page 271 and 272: 248 VOICE (COMMUNICATION WITH COMPU
Page 277 and 278: 254 VOICE COMMUNICAIION WITH COMPUT
Page 279 and 280: 25 VOICE (OMMUNICATIOK WITH COMPUTE
Page 291 and 292: 12 Desktop Audio Because of limitat
Page 293 and 294:
Page 295 and 296:
2f2 VOICE COMMUIICAIION WITH COMPUT
Page 297 and 298:
Page 299 and 300:
Page 301 and 302:
Page 303 and 304:
Page 305 and 306:
Page 307 and 308:
284 VOIC[ COMMIUWlATION WITH COMPUT
Page 309 and 310:
Page 311 and 312:
Page 313 and 314:
290 VOICE COMMIUNICATIO0 WITH COMPU
Page 315 and 316:
Page 317 and 318:
Page 319 and 320:
Page 321 and 322:
298 VOICE (OMMUNICATION WITH (COMPU
Page 323 and 324:
Page 325 and 326:
Page 328 and 329:
Bibliography Ades, S. and D. C. Swi
Page 330 and 331:
Biliograply o30 Daly, N. A. and V W
Page 332 and 333:
Bib~lgraphy 309 Klatt, D. H. "Revie
Page 334 and 335:
liblgraphy 311 G.Peterson, W. Wang,
Page 336 and 337:
Bbliography 313 Spiegel, M. F "Usin
Page 338 and 339:
Index p-law, 45, 220, 224 acoustics
Page 340 and 341:
Integrated Services Digital Network
Page 342:
vocal tract, 19-24, 23 voice mail,
show all

MAS.632 Conversational Computer Systems - MIT OpenCourseWare

Create successful ePaper yourself

Delete template?

Save as template?