MAS.632 Conversational Computer Systems - MIT OpenCourseWare

More documents

Recommendations

Info

Slue Removal Speeh codio 55 as the wrong differences will be accessed from the lookup table to produce the coder output. With coders such as ADPCM predictable results can be achieved if cutting and splicing is forced to occur during silent parts of the signal; during silent portions the predictor returns to a known state, i.e., its minimum value. As long as enough low value samples from the silent period are retained in the data stream so as to force the predictor down to its minimum, another speech segment starting from the minimum value prediction (just after silence) can be inserted. This precludes fine editing at the syllable level but may suffice for applications involving editing phrases or sentences. Adaptation is not an issue with PCM encoding as the value of each sample is independent of any prior state. However, when pieces of PCM data are spliced together, the continuity of the waveform is momentarily disrupted, producing a "click" or "pop" during playback. This is due to changing the shape of the waveform, not just its value, from sample to sample. The discontinuity is less severe when the splice takes place during low energy portions ofthe signal, since the two segments are less likely to represent drastically different waveform shapes. The noise can be further reduced by both interpolating across the splice and reducing the amplitude of the samples around the splice for a short duration. A packetization approach may make it possible to implement cut-and-paste editing using coders that have internal state. For example, LPC speech updates the entire set of decoder parameters every 20 milliseconds. LPC is really a stream of data packets, each defining an unrelated portion of sound; this allows edits at a granularity of 20 milliseconds. The decoder is designed to interpolate among the disparate sets of parameter values it receives with each packet allowing easy substitution of other packets while editing. A similar packetization scheme can be implemented with many of the coding schemes by periodically including a copy of the coder's internal state information in the data stream. With ADPCM for example, the predictor and output signal value would be included perhaps every 100 milliseconds. Each packet of data would include a header containing the decoder state, and the state would be reset from this data each time a packet was read. This would allow editing to be performed on chunks of sound with lengths equal to multiples of the packet size. In the absence of such a packetization scheme, it is necessary to first convert the entire sound file to PCM for editing. Editing is performed on the PCM data and then is converted back to the desired format by recoding the signal based on the new data stream. In spontaneous speech there are substantial periods of silence.'o The speaker often pauses briefly to gather his or her thoughts either in midsentence or "Spontaneous speech is normal unrehearsed speech. Reading aloud or reciting a passage from memory are not spontaneous.
56 VOICE COMMUNI(ATION WITH COMPUTERS between sentences. Although the term "sentence" may be used fairly loosely with spontaneous speech, there are speech units marked by pauses for breath that usually correspond roughly to clauses or sentences. Removing these silences during recording can save storage space. Such techniques are commonly applied to almost any encoding technique. For example, silence removal is used with most voice mail systems regardless of the encoding scheme. The amount of storage space saved by silence removal is directly proportional to the amount of speech that was pauses. A form of silence removal is actually used in many telephone networks. There is no need to keep a transmission channel occupied transferring the audio data associated with the frequent pauses in a conversation. By detecting these pauses, the channel may be freed to carry another conversation for the duration of the pause. When one of the parties starts speaking again, the channel must be restored to transmit the conversation. In practice, a small number ofchannels are used to carry a larger number of conversations; whenever a new spurt of speech occurs during one of the conversations, a free channel is assigned to it. This channel-sharing scheme was originally implemented as an analog system known as Time Assigned Speech Interpolation (TASI) [O'Neill 19591 to carry an increased number of calls over limited-capacity, trans-Atlantic telephone cables. Today, digital techniques are employed to implement "virtual circuits," so called because the actual channel being used to carry a specific conversation will vary over time. The channel refers to a particular slot number in a digital data stream carrying a large number of conversations simultaneously. During silence, the slot associated with the conversation is freed and made available to the next conversation that begins. When the first conversation resumes, the slot it formerly occupied may be in use, in which case it is assigned to a new slot. These adjustments are transparent to equipment outside of the network. The number of channels required to carry the larger number of conversations is a property of the statistical nature of the speaking and silence intervals in ordinary conversations and is an important aspect of network traffic management. During a telephone conversation, no time compression is achieved by silence removal; although audio data may not be transmitted across the network during silences, the listener must still wait for the talker to resume. In a recorded message with pauses removed, less time is required for playback than transpired during recording. Silence removal is generally effective, but some of the pauses in speech contain meaning; part of the richness of the message may be the indication ofthe talker's thought process and degree ofdeliberation implied by the location and duration of the pauses. Several techniques can be used to reconstruct the missing pauses. Instead of simply removing the pauses from the stored data, a special data value may be inserted into the file indicating the presence of a pause and its duration. During playback this information indicates to the decoder to insert the right amount of silence. Since the ambient (background) noise of the recording is replaced by absolute silence, an audible discontinuity is noticeable. Another technique involves recording a small amount of the background noise and then playing it
Page 1 and 2:
MIT OpenCourseWare http://ocw.mit.e
Page 3 and 4:
VNR Computer Library Advanced Topic
Page 5 and 6:
To Ava andKaya for the patienceto s
Page 7 and 8:
vi VOICE COMMUNICATION WITH COMPUTE
Page 9 and 10:
viE VOICE COMMUNICATION WITH COMPUT
Page 11 and 12:
VOICE COMMUNICATION WITH COMPUTEgS
Page 14 and 15:
Speaking of Talk As we enter the ne
Page 16 and 17:
Speaking ofTalk The short-term fix
Page 18 and 19:
Preface This book began as graduate
Page 20:
RFe[e ih This is just the beginning
Page 23 and 24:
xxii VOICE OMMUNICEAIIO WITH COMPUT
Page 25 and 26:
2 VOICE COMMUNICATION WITH COMPUTER
Page 27 and 28: 4 VOICE COMMUNICATION WITH COMPUTEI
Page 29 and 30: 6 VOICE (OMMUNICAIION WITH COMPUTER
Page 31 and 32: S YVOICE COMMUNICATION WITH COMPUTE
Page 33 and 34: 10 VOICE COMMUHICATION WITH (OMPUIE
Page 35 and 36: 12 VOICE COMMUNICATION WITH COMPUTE
Page 41 and 42: 18 YVOII (OMMUNICATION WITH (OMPUTE
Page 43 and 44: 20 VOIC[ COMMUNICATION WITH COMPUTE
Page 51 and 52: 28 VOICE (OMMUNICATION WITH COMPUTE
Page 53 and 54: 30 VOICE COMMUNICATION WITH (OMPUTE
Page 55 and 56: 32 VOICE (OMMUHICATIOI WITH (OMPUTE
Page 59 and 60: 3 Speech Coding This chapter introd
Page 61 and 62: 38 VOICE (OMMUNICATlON WITH COMPUTE
Page 63 and 64: 40 VOICE COMMURItATION WITH (OMPUTE
Page 71 and 72: 48 VOICE COMMUNICATION WITi COMPUTE
Page 73 and 74: IT -010 !Mm1-tgq Vm-0 50 VOICE COMM
Page 75 and 76: 52 VOICE (OMMUNICAlION WITH COMPUTE
Page 77: 54 VOICE COMMUNICATION WITH COMPUTE
Page 83 and 84: 4 Applications and Editing of Store
Page 101 and 102: 78 YOICE COMMUNICATION WITH COMPUTE
Page 105 and 106: 5 Speech Synthesis The previous two
Page 109 and 110: 6 YVOICE (OMMUNICATION WITH [OMPUTE
Page 111 and 112: n VOICE COMMUNICATION WITH COMPUTER
Page 121 and 122: 98 VOICE COMMUlICATION WITH COMPUTE
Page 123 and 124: Interactive Voice Response This cha
Page 125 and 126: 102 VOICE COMMUNICATION WITH COMPUT
Page 127 and 128: 104 VOICE COMMUNICATION WITH (OMPUT
Page 129 and 130:
106 VOICE COMMUNICATION WITH COMPUT
Page 131 and 132:
Page 133 and 134:
110 VOICE (DMMUNICATION WITH COMPUT
Page 135 and 136:
112 VOICE COMMUNICATIOH WITH COMPUT
Page 137 and 138:
Page 139 and 140:
Page 141 and 142:
Page 143 and 144:
Page 145 and 146:
Page 147 and 148:
Page 149 and 150:
Page 151 and 152:
Page 153 and 154:
13 VOICE COMMUNICAIIOR WITH COMPUTE
Page 155 and 156:
7 Speech Recognition This chapter i
Page 157 and 158:
Page 159 and 160:
Page 161 and 162:
138 VOICE (OMMUNICATION WITH COMPUT
Page 163 and 164:
Page 165 and 166:
Page 167 and 168:
Page 169 and 170:
Page 171 and 172:
Page 173 and 174:
Page 175 and 176:
152 VOICE COMMUNICATION WITH (OMPUT
Page 177 and 178:
USES OF VOICE INPUT Sole IW,1 Canna
Page 179 and 180:
Page 181 and 182:
In VOICE COMMUNICATION WITH COMPUTE
Page 183 and 184:
160 VOICE [OMMUHICATION WIITH OMPUT
Page 185 and 186:
Page 187 and 188:
Page 189 and 190:
Page 191 and 192:
168 VOICE COMMUNICAlION WITH COM IU
Page 193 and 194:
Page 195 and 196:
172 VOICE (OMMURICAION WITH COMPUTE
Page 197 and 198:
Page 199 and 200:
Page 201 and 202:
Page 203 and 204:
Page 205 and 206:
182 OICE (COMMUNIICAION WITH COMPUT
Page 207 and 208:
1I4 VOICE COMMUNICATION WITH COMPUT
Page 209 and 210:
Page 211 and 212:
188 VOICE COMMUNICATION WIIH (OMPUT
Page 213 and 214:
Page 215 and 216:
192 VOIC0 COMMUNICATION WITH COMPUT
Page 217 and 218:
194 VOI(E COMMUNIATIION WITH (OMPUT
Page 219 and 220:
Page 221 and 222:
198 VOICI COMMUNICATION WITH COMPUT
Page 223 and 224:
Page 225 and 226:
202 VOICE COMMUNICATIOR WIIH COMPUT
Page 227 and 228:
204 VOICE COMMUNICAIION WIlH COMPUT
Page 229 and 230:
206 VOIC([ OMMUNICATION WITH COMPUT
Page 231 and 232:
20 VOICE COMMUNICATION WITH COMPUTE
Page 233 and 234:
10 Basics of Telephones This and th
Page 235 and 236:
212 VOICE COMMUNICATION WITH (OMPUT
Page 237 and 238:
Page 239 and 240:
216 VOICE COMMUNICATION WITH COMPUI
Page 241 and 242:
Page 243 and 244:
Page 245 and 246:
Page 247 and 248:
Page 249 and 250:
226 VOICE (COMMUNICATION WITH COMPU
Page 251 and 252:
Page 253 and 254:
11 Telephones and Computers The pre
Page 255 and 256:
Page 257 and 258:
214 VOICE COMMUHiCATION WITH COMPUI
Page 259 and 260:
236 VOICE COMMUNICATIOK WITH COMPUT
Page 261 and 262:
Page 263 and 264:
Page 265 and 266:
242 VOICE COMMUHICATIOI WITH COMPUI
Page 267 and 268:
Page 269 and 270:
246 VOICE (OMMUNICAIION WITH COMPUT
Page 271 and 272:
248 VOICE (COMMUNICATION WITH COMPU
Page 273 and 274:
Page 275 and 276:
Page 277 and 278:
254 VOICE COMMUNICAIION WITH COMPUT
Page 279 and 280:
25 VOICE (OMMUNICATIOK WITH COMPUTE
Page 281 and 282:
Page 283 and 284:
Page 285 and 286:
Page 287 and 288:
Page 289 and 290:
Page 291 and 292:
12 Desktop Audio Because of limitat
Page 293 and 294:
Page 295 and 296:
2f2 VOICE COMMUIICAIION WITH COMPUT
Page 297 and 298:
Page 299 and 300:
Page 301 and 302:
Page 303 and 304:
Page 305 and 306:
22 VOICE COMMUNICATION WITH COMPUTE
Page 307 and 308:
284 VOIC[ COMMIUWlATION WITH COMPUT
Page 309 and 310:
Page 311 and 312:
Page 313 and 314:
290 VOICE COMMIUNICATIO0 WITH COMPU
Page 315 and 316:
Page 317 and 318:
Page 319 and 320:
Page 321 and 322:
298 VOICE (OMMUNICATION WITH (COMPU
Page 323 and 324:
Page 325 and 326:
Page 328 and 329:
Bibliography Ades, S. and D. C. Swi
Page 330 and 331:
Biliograply o30 Daly, N. A. and V W
Page 332 and 333:
Bib~lgraphy 309 Klatt, D. H. "Revie
Page 334 and 335:
liblgraphy 311 G.Peterson, W. Wang,
Page 336 and 337:
Bbliography 313 Spiegel, M. F "Usin
Page 338 and 339:
Index p-law, 45, 220, 224 acoustics
Page 340 and 341:
Integrated Services Digital Network
Page 342:
vocal tract, 19-24, 23 voice mail,
show all

MAS.632 Conversational Computer Systems - MIT OpenCourseWare

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?