28.02.2013 Views

Introduction to Acoustics


Part E, Music, Speech, Electroacoustics, Section 18.5

put signals are most commonly used for both the analysis sections of the coder and the signal path. Indeed, the fundamental data conveyed by most perceptual coders is not PCM time-domain data, but quantized frequency-domain information from which the decoder eventually produces PCM output by way of a synthesis filter bank. The most commonly used digital filter bank is a variant of the FFT called the time-domain alias cancellation (TDAC) transform, developed by Princen and Bradley in 1986 [18.69,70]. This transform has the highly desirable property of being critically sampled, meaning that it produces exactly the same number of output frequency-domain samples as there are input PCM samples. There is also a companion inverse TDAC transform for producing output PCM from a set of decoded frequency-domain input samples.
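
As an illustration of critical sampling and alias cancellation, the following Python sketch implements a direct (slow) form of the transform, using the formulation commonly called the modified discrete cosine transform (MDCT) with a sine window. The function names, the block size, and the choice of window are assumptions made for this example and are not taken from the text above. Each block of 2N windowed samples yields N coefficients, successive blocks overlap by N samples, and overlap-adding the inverse-transformed blocks cancels the time-domain aliasing.

```python
import math


def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def mdct(block):
    """Forward transform: 2N (windowed) samples -> N coefficients."""
    N = len(block) // 2
    n0 = 0.5 + N / 2
    return [
        sum(block[n] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
            for n in range(2 * N))
        for k in range(N)
    ]


def imdct(coeffs):
    """Inverse transform: N coefficients -> 2N aliased samples."""
    N = len(coeffs)
    n0 = 0.5 + N / 2
    return [
        (2.0 / N) * sum(coeffs[k] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                        for k in range(N))
        for n in range(2 * N)
    ]


def analyze(signal, N):
    """Split the signal into 50%-overlapping blocks and transform each.

    Assumes len(signal) is a multiple of N. N zeros are added at each end
    so that every real sample is covered by two blocks.
    """
    w = sine_window(N)
    padded = [0.0] * N + list(signal) + [0.0] * N
    frames = []
    for start in range(0, len(padded) - N, N):
        block = [padded[start + n] * w[n] for n in range(2 * N)]
        frames.append(mdct(block))
    return frames


def synthesize(frames, N):
    """Inverse-transform each frame, window again, and overlap-add."""
    w = sine_window(N)
    out = [0.0] * (N * (len(frames) + 1))
    for i, coeffs in enumerate(frames):
        y = imdct(coeffs)
        for n in range(2 * N):
            out[i * N + n] += y[n] * w[n]
    return out[N:-N]  # drop the zero padding
```

With a signal whose length is a multiple of N, `synthesize(analyze(x, N), N)` should return `x` to within rounding error. Each hop of N new input samples produces N coefficients; the single extra frame comes only from the zero padding at the edges.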

In operation, PCM samples input to the encoder are divided into regular blocks of some predetermined length, and each block is transformed to the frequency domain, conveyed to the decoder, reconstituted as a block of PCM samples, and the successive blocks strung together, often in an overlapping manner, to recover the final outputs. This process is sometimes described as block-oriented processing using running transforms. The practice of coding an entire block of samples as a single entity facilitates data rates of less than one bit per original audio sample. One downside of block processing is that events that occupy a small fraction of a block, like a sharp transient, may become smeared across a range of frequencies, consuming a large number of bits. A number of techniques have evolved to deal with this situation, most commonly the use of block switching, wherein such short events are detected (by, for example, a transient detector routine), and the block size is temporarily shortened to more closely isolate the transient within the shortened block.

The filter-bank output is of direct use in pursuit of the first two perceptual coding techniques listed above, principally by deriving the magnitude of the frequency-domain signals as a function of frequency to obtain a discrete approximation to the power spectral density (PSD) of the signal block. This in turn is processed by a routine implementing a perceptual model of human hearing to derive a masking curve, which specifies the threshold of hearing as a function of frequency for that signal block. Any spectral component falling below the masking curve will be inaudible, so need not be coded, as per the first listed coding technique. The remaining spectral components must be preserved if audible alteration is to be avoided, but the quantization precision is only that which is required to render the level of the quantization noise below the masking curve. Instead of the 120 dB/20 bit range one might require for PCM audio, the instantaneous masking range is more often on the order of 20–30 dB, about 4–5 bits per sample, assuming 6 dB per bit SNR. Thus, between suppression of inaudible components and dynamic quantization of audible components, the data rate can be expected to be reduced to something less than 4 bits per sample. Of course, an effective bit-stream syntax protocol must be devised to signal these conditions efficiently to the decoder, and considerable ingenuity has been brought to bear on that issue to ensure the requirement is met.
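
The arithmetic behind these figures can be checked with a short sketch. It assumes roughly 6.02 dB of SNR per bit (20 log10 2) and rounds up to whole bits; the per-band signal and masking levels in the usage note are made-up illustrative values, not measurements.

```python
import math

DB_PER_BIT = 6.02  # approx. 20*log10(2): SNR gained per quantizer bit


def bits_for_range(range_db):
    """Whole bits needed to cover a given dynamic range in dB."""
    return math.ceil(range_db / DB_PER_BIT)


def allocate_bits(signal_db, mask_db):
    """Per-band bit allocation from signal and masking levels (dB).

    A band at or below the masking curve gets no bits; otherwise it
    gets just enough bits to push quantization noise under the mask.
    """
    bits = []
    for s, m in zip(signal_db, mask_db):
        if s <= m:
            bits.append(0)  # inaudible: need not be coded
        else:
            bits.append(bits_for_range(s - m))
    return bits
```

Here `bits_for_range(120)` gives 20, while `bits_for_range(20)` and `bits_for_range(30)` give 4 and 5, matching the figures quoted above. For hypothetical band levels `[60, 40, 30, 70]` dB against a mask of `[45, 42, 30, 48]` dB, `allocate_bits` returns `[3, 0, 0, 4]`.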

Notable lossy audio coders based on these principles include AC-3, MP3, DTS, WMA, Ogg Vorbis, and AAC. These have been instrumental enablers of such technologies as portable music players, DVDs, digital soundtracks on 35 mm film, and satellite radio.

It should be noted that perceptual coding is much more compatible with constant-bit-rate operation than is lossless coding, for as the complexity of a signal increases, which might otherwise increase the required data rate, the masking afforded by that signal also increases, reducing the average quantization accuracy required and thereby holding the required data rate more nearly constant. In effect, the human auditory system can only absorb so much information per unit time, so as long as the coder accurately models that behavior, the required data rate should be largely constant. Of course, very simple signals, like silence, are not likely to require the same data rate as more complex signals, in which case a constant-bit-rate coder may simply use far more than the minimum required data rate in order to maintain a constant transmission rate. Some coders can defer use of bits in the presence of simple signals, saving them in a bit bucket until needed by a more complex signal.
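
The bit-bucket idea can be sketched as a simple reservoir. The function name, the frame budget, and the reservoir limit below are hypothetical; real coders define their own reservoir rules in the bit-stream syntax.

```python
def allocate_with_reservoir(demands, per_frame, reservoir_max):
    """Grant bits to each frame at a constant channel rate.

    demands       -- bits each frame would like to use
    per_frame     -- bits delivered by the channel per frame
    reservoir_max -- most unused bits that may be carried forward
    """
    reservoir = 0
    granted = []
    for need in demands:
        available = per_frame + reservoir
        use = min(need, available)
        # Unused bits are saved, up to the limit, for later frames.
        reservoir = min(available - use, reservoir_max)
        granted.append(use)
    return granted
```

With a budget of 1000 bits per frame and a reservoir limit of 800, demands of `[200, 300, 1500, 1200, 400]` are all met in full, because the two simple frames at the start bank enough bits to cover the two complex frames that follow. Without the reservoir, those two frames would each be capped at 1000 bits.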

The third listed technique used in perceptual coding, the application of higher-level abstractions to describe the signal, is much more general and open-ended, and is still an area of active investigation. It can be something as simple as using a single PSD curve to approximate the actual PSD curves of several successive transform blocks, or using decoder-generated noise in place of actually transmitting noise-like signals. More abstract approaches may fall under the heading of parametric coding.
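
As one hedged illustration of the decoder-generated noise idea, the sketch below sends only the RMS level of a noise-like band and lets the decoder synthesize random values at that level. The function names and the use of Gaussian noise are assumptions for this example; how a real coder decides that a band is noise-like is not covered here.

```python
import math
import random


def encode_noise_band(coeffs):
    """Encoder side: keep only the band's RMS level, not its coefficients."""
    return math.sqrt(sum(c * c for c in coeffs) / len(coeffs))


def decode_noise_band(rms, count, rng):
    """Decoder side: generate `count` random values scaled to the sent RMS.

    `rng` is a random.Random instance; the decoder's noise need not match
    the encoder's original values, only their level.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in range(count)]
    noise_rms = math.sqrt(sum(v * v for v in noise) / count)
    scale = rms / noise_rms if noise_rms > 0 else 0.0
    return [v * scale for v in noise]
```

The decoded band has the same energy as the original but different sample values, so a single number replaces the whole set of coefficients for that band.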

One notable example of parametric coding is bandwidth extension, in which the entire high-frequency end of the spectrum is suppressed by the encoder, and is reconstituted by the decoder from analysis of the harmonic
