28.02.2013 Views

Introduction to Acoustics


Part E: Music, Speech, Electroacoustics (18.2)

filter bank. The brain does not see the wide-band audio signal arriving at the ear, as it might appear on an oscilloscope, but instead processes the outputs of the basilar membrane filter bank, subject to effective half-wave rectification and loss of high-frequency synchrony by the neurons (Fig. 18.6). Output neural signals from each ear are ultimately processed by multiple areas of the brain. So, it is not surprising that issues of timing are a little ill-defined, and that there are a number of time constants associated with different aspects of hearing.
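The two effects named above, half-wave rectification and loss of high-frequency synchrony, can be illustrated with a rough sketch. It is not a physiological model: the one-pole low-pass filter, its 1 kHz cutoff, the sample rate, and all function names are assumptions chosen only to show the trend that the cycle-by-cycle ripple surviving in the output shrinks as the tone frequency rises.

```python
import math

def half_wave_rectify(samples):
    """Keep positive excursions and zero out negative ones."""
    return [max(0.0, s) for s in samples]

def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz):
    """Simple smoothing filter standing in for the loss of synchrony (assumed)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    out, state = [], 0.0
    for s in samples:
        state += alpha * (s - state)
        out.append(state)
    return out

def surviving_ripple(tone_hz, cutoff_hz=1000.0, sample_rate_hz=96000.0,
                     duration_s=0.05):
    """Peak-to-peak ripple left after rectifying and smoothing a unit sine tone.

    The cutoff, sample rate, and duration are illustrative assumptions.
    """
    n = int(duration_s * sample_rate_hz)
    tone = [math.sin(2.0 * math.pi * tone_hz * i / sample_rate_hz)
            for i in range(n)]
    smoothed = one_pole_lowpass(half_wave_rectify(tone), cutoff_hz,
                                sample_rate_hz)
    tail = smoothed[n // 2:]  # skip the start-up transient
    return max(tail) - min(tail)

for f in (200.0, 1000.0, 5000.0, 8000.0):
    print(f"{f:7.0f} Hz tone: ripple {surviving_ripple(f):.2f}")
```

With these assumed settings a 200 Hz tone keeps most of its cycle-by-cycle structure, while an 8 kHz tone is reduced to a nearly steady level, which is the qualitative behavior the paragraph describes.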

For starters, there is the basic ability of the neurons in the auditory nerve to follow the individual cycles of audio impinging on the basilar membrane, i.e., to exhibit phase locking. Current research seems to put the upper frequency of this activity at about 5 kHz, although this seems somewhat at odds with lateralization of sine waves on the basis of interaural time difference, which only extends up to about 1500 Hz.

The ability to follow individual cycles of audio may aid in pitch perception, as the effective shapes of the basilar membrane filters appear too broad to account for the observed pitch resolution. However, the physiological mechanisms for how this might be accomplished are still the subject of some debate.

Other time constants appear to apply to the aggregate audio, regardless of the presence of a filter bank at the input. Already noted are the masking time constants, which substantially extend only 1–2 ms prior to the onset of a loud sound, but continue for several tens of milliseconds after its cessation.

Each of these may be related to a more fundamental time constant. The short time constant of about 1–2 ms may represent the shortest time one can perceive without relying on spectral cues, while the post-masking interval may be related to the fusion time. The fusion time in particular seems to represent a kind of acoustic integration time, or the limit of primary acoustic memory, typically on the order of 30–50 ms, and appears to be associated with the lower frequency limit of the audio band (20–30 Hz). It also seems to explain why echoes are only heard as such in large cathedral-like rooms, as they are integrated with the direct arrivals in normal-size rooms.
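The link between the fusion time, the low-frequency limit, and room size can be checked with simple arithmetic. The sketch below assumes a speed of sound of 343 m/s (dry air near 20 °C); that value and the function name are illustrative and do not come from the text.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 degrees C

def fusion_equivalents(fusion_time_s, speed_of_sound=SPEED_OF_SOUND_M_S):
    """Return (frequency in Hz whose period equals the fusion time,
    extra path length in metres that sound travels in that time)."""
    return 1.0 / fusion_time_s, speed_of_sound * fusion_time_s

for t_ms in (30, 50):
    freq_hz, path_m = fusion_equivalents(t_ms / 1000.0)
    print(f"{t_ms} ms: period of a {freq_hz:.1f} Hz tone, "
          f"extra path of about {path_m:.0f} m")
```

A fusion time of 30–50 ms corresponds to periods of roughly 33 to 20 Hz, close to the quoted lower limit of the audio band. Under the assumed speed of sound, a reflection must travel roughly 10 to 17 m farther than the direct sound before it arrives outside the fusion window, a path difference that normal-size rooms rarely provide.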

Table 18.1 Summary of approximate perceptual time/amplitude limits

Parameter        Range    JND
Amplitude        120 dB   0.25 dB
Premask          N/A      2 ms
Postmask         N/A      30–50 ms
Binaural timing  700 µs   10 µs

On the other hand, binaural timing clearly exhibits higher resolution, as differences of interaural timing on the order of 10 µs may be audible.
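To put the amplitude and binaural-timing rows of Table 18.1 in perspective, the following sketch converts decibels to pressure ratios and divides each range by its JND. Treating the JND as constant over the whole range is a simplifying assumption made here for illustration, and the function names are hypothetical.

```python
def db_to_pressure_ratio(level_db):
    """Convert a level difference in dB to a sound-pressure ratio."""
    return 10.0 ** (level_db / 20.0)

def distinguishable_steps(total_range, jnd):
    """Number of JND-sized steps in a range, assuming a constant JND."""
    return total_range / jnd

print(f"120 dB range -> pressure ratio of {db_to_pressure_ratio(120.0):.0f}:1")
print(f"0.25 dB JND  -> pressure change of "
      f"{(db_to_pressure_ratio(0.25) - 1.0) * 100.0:.1f} %")
print(f"Amplitude steps:       {distinguishable_steps(120.0, 0.25):.0f}")
print(f"Binaural timing steps: {distinguishable_steps(700.0, 10.0):.0f}")
```

Under that assumption, the 120 dB range spans a pressure ratio of a million to one in about 480 just-noticeable steps of roughly 3 % each, while the 700 µs binaural timing range holds about 70 steps of 10 µs.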

18.2.4 Spatial Acuity<br />

Strictly speaking, the spatiality of a sound field is not perceived directly, but is instead derived from analysis of the physical attributes already described. However, given its importance to audio and electroacoustics, it is nonetheless useful to review the processes involved.

Specification of the position of a sound source relative to a listener in three-space requires three coordinates. From the mechanisms used by the human ear, the natural selection is to use a combination of azimuth, elevation, and distance. Of these, distance is generally considered to be an inferred metric, based on the amplitude, spectral balance, and reverberation content of a source relative to other sources, and is not especially accurate in any absolute sense.

This leaves directional estimation as the primary localization task. It appears that the ear uses separate processes to determine direction in a left/right sense and a front/top/back sense. Given the physical placement of the ears, it is not surprising that the left/right decision is the more direct and accurate, depending on the interaural amplitude difference (IAD) and interaural time difference (ITD) localization cues.

Although the ear is sensitive to IAD at all audible frequencies, as can be verified by headphone listening, the head provides little shadowing effect at low frequencies, since the wavelengths become much longer than the size of the head, so IAD is mostly useful at middle to high frequencies.
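The wavelength argument can be made concrete with a short calculation. The head width of 0.18 m and the speed of sound of 343 m/s are assumed round figures, not values from the text, and the function names are illustrative.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 degrees C
HEAD_WIDTH_M = 0.18         # assumed: approximate adult head width

def wavelength_m(frequency_hz, speed=SPEED_OF_SOUND_M_S):
    """Wavelength in metres of a tone at the given frequency."""
    return speed / frequency_hz

def wavelength_to_head_ratio(frequency_hz, head_width_m=HEAD_WIDTH_M):
    """How many head widths fit into one wavelength."""
    return wavelength_m(frequency_hz) / head_width_m

def crossover_frequency_hz(head_width_m=HEAD_WIDTH_M,
                           speed=SPEED_OF_SOUND_M_S):
    """Frequency at which the wavelength equals the head width."""
    return speed / head_width_m

for f in (100.0, 500.0, 1500.0, 5000.0, 10000.0):
    print(f"{f:7.0f} Hz: wavelength {wavelength_m(f):6.3f} m, "
          f"{wavelength_to_head_ratio(f):5.1f} head widths")
```

With these assumed figures the wavelength matches the head width at about 1.9 kHz. At 100 Hz the wavelength is roughly 19 head widths, so the wave diffracts around the head with little shadowing; at 10 kHz it is about a fifth of a head width, so the head casts a pronounced acoustic shadow.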

The JND for IAD is about 1 dB, corresponding to about 1 degree of arc for a front/centered high-frequency source. As a sound moves around to the side of the head, the absolute IAD will tend to increase, and the differential IAD per degree will decrease, causing the angular JND to increase accordingly.

The ITD lateralization cue has a rather astonishing JND of just 10 µs for a source on the centerline between the ears, corresponding to an angle of about 1 degree. For signals off to one side, the ITD will typically reach a maximum value of just 750 µs (3/4 of a millisecond). As with the IAD cue, the sensitivity of the ITD cue declines as a source moves around to the side of the head.
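The relation between the ITD figures and the angular resolution can be sketched numerically. The model below assumes that the ITD grows as the sine of the azimuth and peaks at 750 µs (3/4 of a millisecond) for a source directly to one side; this is a simplification that ignores diffraction around the head, and the function names are illustrative. The 10 µs JND is the value listed in Table 18.1.

```python
import math

ITD_MAX_US = 750.0  # maximum ITD, source fully to one side (3/4 of a millisecond)
ITD_JND_US = 10.0   # just-noticeable ITD on the centerline (Table 18.1)

def itd_us(azimuth_deg):
    """Assumed model: ITD proportional to the sine of the azimuth."""
    return ITD_MAX_US * math.sin(math.radians(azimuth_deg))

def angular_jnd_deg(azimuth_deg):
    """Azimuth change that shifts the ITD by one JND, from the local slope."""
    slope_us_per_deg = (ITD_MAX_US * math.cos(math.radians(azimuth_deg))
                        * math.pi / 180.0)
    return ITD_JND_US / slope_us_per_deg

for az in (0.0, 30.0, 60.0, 80.0):
    print(f"azimuth {az:4.0f} deg: ITD {itd_us(az):6.1f} us, "
          f"angular JND about {angular_jnd_deg(az):.1f} deg")
```

Under this assumed model the ITD changes by about 13 µs per degree near the centerline, so a 10 µs JND corresponds to a little under 1 degree, in line with the figure quoted above. The slope falls off as the cosine of the azimuth, which is why the angular resolution worsens as the source moves toward the side of the head.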

Also like IAD, this cue is usable at all frequencies, albeit with a key qualification. At frequencies below about 1500 Hz, the neural response from the basilar
