28.02.2013 Views

Introduction to Acoustics


sentences (see Fig. 13.20). They varied the ITDs of the two sentences in the range 0 to ±181 µs. For example, one sentence might lead in the left ear by 45 µs, while the other sentence would lead in the right ear by 45 µs (as in Fig. 13.20). The sentences were based on natural speech but were processed so that each was spoken on a monotone, i.e., with constant F0. The F0 difference between the two sentences was varied from 0 to 4 semitones. Subjects were instructed to attend to one particular sentence. At a certain point, the two sentences contained two different target words aligned in starting time and duration (“dog” and “bird”). The F0s and the ITDs of the two target words were varied independently from those of the two sentences. Subjects had to indicate which of the two target words they heard in the attended sentence. They reported the target word that had the same ITD as the attended sentence much more often than the target word with the opposite ITD. In other words, the target word with the same ITD as the attended sentence was grouped with that sentence. This was true even when the target word had the same ITD as the attended sentence but a different F0. Thus, subjects grouped words across time according to their perceived location, independent of F0 differences. Darwin and Hukin [13.198] concluded that listeners who try to track a particular sound source over time direct attention to auditory objects at a particular subjective location. The auditory objects themselves may be formed using cues other than ITD, for example, onset and offset asynchrony and harmonicity.
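To give a feel for the stimulus parameters quoted above, the following minimal sketch converts an F0 difference in semitones into a frequency ratio and an ITD in microseconds into a delay in samples. The base F0 of 100 Hz and the sample rate of 44.1 kHz are assumptions chosen for illustration; the text does not state either value, and the function names are hypothetical.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio of an interval given in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def itd_to_samples(itd_us: float, sample_rate_hz: float) -> float:
    """Interaural time difference (microseconds) expressed in samples."""
    return itd_us * 1e-6 * sample_rate_hz


# Assumed values, not taken from the source.
BASE_F0_HZ = 100.0
SAMPLE_RATE_HZ = 44100.0

# F0 of the second sentence at the largest difference used (4 semitones).
f0_other = BASE_F0_HZ * semitones_to_ratio(4)

# Delays for the example ITD (45 µs) and the largest ITD (181 µs).
delay_45 = itd_to_samples(45, SAMPLE_RATE_HZ)
delay_181 = itd_to_samples(181, SAMPLE_RATE_HZ)

print(f"F0 at +4 semitones: {f0_other:.1f} Hz")
print(f"45 µs ITD:  {delay_45:.2f} samples")
print(f"181 µs ITD: {delay_181:.2f} samples")
```

Under these assumptions a 4-semitone difference is a ratio of about 1.26, and both ITDs correspond to only a few samples, so a 45 µs lead amounts to a delay of roughly two samples in one channel.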

It should be noted that, for discrete sequences of musical tones, the auditory system does not necessarily form streams according to perceived location, especially when that cue competes with other cues. This is illustrated by an effect, called the scale illusion, reported by Deutsch [13.217]. She presented two sequences of tones via headphones, one sequence to each ear. The nth tone in the left ear was synchronous with the nth tone in the right ear. The sequences were created by repetitive presentation of the C major scale in both ascending and descending form, such that when a component of the ascending scale was in one ear, a component of the descending scale was in the other, and vice versa. However, the tones from each scale alternated between ears. Within each ear there were often large jumps in frequency between successive tones. Most subjects perceived the sounds as two streams, organized by the frequency proximity of successive tones. One stream (which was often heard towards one ear) was heard as a musical scale that started high, descended and then increased again, while the other stream (which was usually heard towards the opposite ear) was heard as a scale that started low, ascended, and then decreased again. Thus, the true location of the tones had little influence on the formation of the perceptual streams.
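The construction of the scale-illusion sequences can be sketched as follows. This is an illustration only: the octave (C4 to C5, written as MIDI note numbers), the choice of which ear receives the first ascending tone, and the use of a simple higher/lower split as a stand-in for grouping by frequency proximity are all assumptions, not details given in the text.

```python
# One octave of the C major scale as MIDI note numbers (assumed register).
ASCENDING = [60, 62, 64, 65, 67, 69, 71, 72]  # C4 D4 E4 F4 G4 A4 B4 C5
DESCENDING = ASCENDING[::-1]


def scale_illusion_ears(ascending, descending):
    """Distribute two simultaneous scales so each scale alternates between ears."""
    left, right = [], []
    for n, (up, down) in enumerate(zip(ascending, descending)):
        if n % 2 == 0:
            left.append(up)
            right.append(down)
        else:
            left.append(down)
            right.append(up)
    return left, right


def group_by_pitch(left, right):
    """Crude proxy for proximity grouping: higher tone vs. lower tone at each step."""
    upper = [max(l, r) for l, r in zip(left, right)]
    lower = [min(l, r) for l, r in zip(left, right)]
    return upper, lower


def largest_jump(seq):
    """Largest interval, in semitones, between successive tones."""
    return max(abs(b - a) for a, b in zip(seq, seq[1:]))


left, right = scale_illusion_ears(ASCENDING, DESCENDING)
upper, lower = group_by_pitch(left, right)

print("left ear :", left, "largest jump", largest_jump(left))
print("right ear:", right, "largest jump", largest_jump(right))
print("upper    :", upper, "largest jump", largest_jump(upper))
print("lower    :", lower, "largest jump", largest_jump(lower))
```

In this sketch the sequence in each ear contains jumps of 10 or 11 semitones, whereas the two pitch-based groupings move by at most 2 semitones per step: one starts high, descends and rises again, and the other starts low, ascends and falls again, matching the contours described above.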

Another example comes from the opening bars of the last movement of Tchaikovsky’s sixth symphony. This contains interleaved notes played by the first and second violins, who according to 19th-century custom sat on opposite sides of the stage. These notes are perceived as a single stream, despite the difference in location, presumably because of the frequency proximity between successive notes.

A number of composers have exploited the fact that stream segregation occurs for tones that are widely separated in frequency. By playing a sequence of tones in which alternate notes are chosen from separate frequency ranges, an instrument such as the flute, which is only capable of playing one note at a time, can appear to be playing two themes at once. Many fine examples of this are available in the works of Bach, Telemann and Vivaldi.
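The way a single melodic line can split into two apparent themes can be illustrated with a toy heuristic. The sketch below is not a model of auditory streaming: it greedily attaches each note to the stream whose most recent note is closest in pitch, and starts a new stream when no stream is within a threshold. The melody, the 5-semitone threshold, and the function name are hypothetical.

```python
def split_streams(notes, max_jump=5):
    """Greedy grouping of a note sequence (MIDI numbers) by pitch proximity.

    Each note joins the stream whose latest note is nearest in pitch,
    provided the interval is at most max_jump semitones; otherwise it
    starts a new stream.
    """
    streams = []
    for note in notes:
        best = None
        for stream in streams:
            dist = abs(stream[-1] - note)
            if dist <= max_jump and (best is None or dist < abs(best[-1] - note)):
                best = stream
        if best is None:
            streams.append([note])
        else:
            best.append(note)
    return streams


# Hypothetical single-line melody alternating between two registers.
melody = [76, 60, 77, 62, 79, 64, 77, 62, 76, 60]

streams = split_streams(melody)
for i, stream in enumerate(streams):
    print(f"stream {i}: {stream}")
```

With this input the alternating notes fall into two groups, an upper line and a lower line, each moving in small steps even though the played sequence leaps by more than an octave between successive notes.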

Judgment of Temporal Order<br />

It is difficult to judge the temporal order of sounds that are perceived in different streams. An example of this comes from the work of Broadbent and Ladefoged [13.218]. They reported that extraneous sounds in sentences were grossly mislocated. For example, a click might be reported as occurring a word or two away from its actual position. Surprisingly poor performance was also reported by Warren et al. [13.219] for judgments of the temporal order of three or four unrelated items, such as a hiss, a tone, and a buzz. Most subjects could not identify the order when each successive item lasted as long as 200 ms. Naive subjects required that each item last at least 700 ms to identify the order of four sounds presented in an uninterrupted repeated sequence. These durations are well above those which are normally considered necessary for temporal resolution.

The poor order discrimination described by Warren et al. is probably a result of stream segregation. The sounds they used do not represent a coherent class. They have different temporal and spectral characteristics, and, as for tones widely differing in frequency, they do not form a single perceptual stream. Items in different streams appear to float about with respect to each other in subjective time. Thus, temporal order judgments are difficult. It should be emphasized that the relatively poor performance reported by Warren et al. [13.219] is found only in tasks requiring absolute identification of the order of sounds and not in tasks which simply require the discrimination of different sequences. Also, with
