15.01.2013 Views

U. Glaeser

realized, e.g., with the following procedure:

• The bit allocation unit searches for the analysis filter subband with the lowest MNR and allocates code bits to this subband; then the SNR(m) value is updated for this subband and the actual MNR is computed with Eq. (27.60).

• The process is repeated until no more code bits can be allocated.
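
The two steps above can be sketched as a greedy loop. This is only an illustration under stated assumptions, not the procedure of any particular standard: Eq. (27.60) is not reproduced in this excerpt, so the sketch assumes the common relation MNR(m) = SNR(m) − SMR(m) in dB, a uniform quantizer gaining roughly 6.02 dB of SNR per bit, and one bit granted per iteration. The names `allocate_bits`, `smr_db`, `bit_pool`, `max_bits`, and `db_per_bit` are hypothetical.

```python
def allocate_bits(smr_db, bit_pool, max_bits=15, db_per_bit=6.02):
    """Greedy allocation: repeatedly give one bit to the subband with the lowest MNR.

    smr_db     -- signal-to-mask ratio per subband in dB (from a psychoacoustic model)
    bit_pool   -- total number of code bits available (hypothetical budget)
    max_bits   -- assumed upper limit of bits per subband
    db_per_bit -- assumed SNR gain per bit of a uniform quantizer
    """
    n = len(smr_db)
    bits = [0] * n
    snr = [0.0] * n                                # assumed SNR(m) with zero bits
    mnr = [snr[m] - smr_db[m] for m in range(n)]   # assumed MNR = SNR - SMR
    remaining = bit_pool
    while remaining > 0:
        # Only subbands that can still accept a bit are candidates.
        candidates = [m for m in range(n) if bits[m] < max_bits]
        if not candidates:
            break
        m = min(candidates, key=lambda k: mnr[k])  # subband with the lowest MNR
        bits[m] += 1
        remaining -= 1
        snr[m] = db_per_bit * bits[m]              # update SNR(m) for this subband
        mnr[m] = snr[m] - smr_db[m]                # recompute the actual MNR
    return bits, mnr


if __name__ == "__main__":
    bits, mnr = allocate_bits([20.0, 5.0, -3.0, 12.0], bit_pool=6)
    print(bits, [round(v, 2) for v in mnr])
```

Subbands whose signal lies well above the masking threshold (high SMR) collect bits first, while a subband already below the threshold (negative SMR) may receive none.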

An important problem resulting from the transformation of the audio signal (via an analysis filter bank) into the frequency domain is the appearance of pre-echoes, which occur in silent signal periods followed by sudden sound attacks (e.g., of a percussive character). This phenomenon is caused by quantization errors, which are irrelevant in loud and stationary signal parts but are immediately audible in silent signal parts. In TCs, the inverse transform in the receiver distributes the quantization errors over the whole block of samples cut with the respective time window. In SBCs, this effect occurs due to transients. A possible method for suppressing pre-echoes is adaptive window switching (cf. Fig. 27.19) [46]. Short windows should be used in nonstationary parts of the signal, while in stationary parts wide windows, which improve the overall coding efficiency, should be used. Typically, the block size varies between N = 64 and N = 1024.
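
The window-switching decision can be illustrated with a simple energy-based attack detector. This is a minimal sketch under assumptions, not the method of [46]: the function name `choose_block_sizes`, the threshold `attack_ratio`, and the rule "switch when a short segment is much louder than the one before it" are all hypothetical, and the overlapping and transition windows used by practical coders are omitted.

```python
def choose_block_sizes(samples, long_n=1024, short_n=64, attack_ratio=8.0):
    """Return a list of (start, length) blocks covering the signal.

    A frame of long_n samples is coded as one long block unless some short
    segment inside it carries more than attack_ratio times the energy of the
    preceding short segment; in that case the frame is split into short
    blocks. Trailing samples that do not fill a whole frame are ignored.
    """
    def energy(seg):
        return sum(x * x for x in seg) + 1e-12   # small floor avoids 0-vs-0 ratios

    blocks = []
    prev = None  # energy of the last short segment of the previous frame
    for start in range(0, len(samples) - long_n + 1, long_n):
        energies = [energy(samples[start + i:start + i + short_n])
                    for i in range(0, long_n, short_n)]
        history = energies if prev is None else [prev] + energies
        transient = any(b > attack_ratio * a
                        for a, b in zip(history, history[1:]))
        if transient:
            # Nonstationary frame: short windows confine the quantization noise.
            blocks.extend((start + i, short_n) for i in range(0, long_n, short_n))
        else:
            # Stationary frame: one wide window for better coding efficiency.
            blocks.append((start, long_n))
        prev = energies[-1]
    return blocks


if __name__ == "__main__":
    # One silent frame, then a frame with a sudden attack in its middle.
    signal = [0.0] * 1024 + [0.0] * 512 + [1.0] * 512
    print(choose_block_sizes(signal)[:3])
```

With short blocks, the quantization error of the attack is spread over at most 64 samples instead of 1024, so less of it falls into the silent period preceding the attack.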

The audio bit rate can be reduced further by giving up full perceptual transparency. In many cases, especially in multimedia and/or mobile-access applications, a non-annoying reduction in the fidelity of some audio components of secondary importance is acceptable. The whole audio scene can be divided into a number of individual audio objects: a conversation, background noise, background music, sounds produced by particular sources, etc. These objects can be coded and transmitted separately. Furthermore, some of them may be added synthetically at the receiver. This coding philosophy is used in the so-called structured audio format implemented in the MPEG-4 standard (cf. Section 27.10). In this way, a very flexible scalability of audio quality can be realized. This is very useful when audio has to be transmitted through channels of varying capacity and/or is to be received with decoders of various quality and complexity.
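
The idea of object-based scalability can be illustrated with a toy selection rule. This sketch does not reflect the MPEG-4 bitstream or any actual structured audio tool; the class `AudioObject`, its fields, the function `select_objects`, and all bit rates are hypothetical and serve only to show how a scene might be thinned out for a channel of limited capacity.

```python
from dataclasses import dataclass


@dataclass
class AudioObject:
    name: str
    priority: int                # 0 = most important (hypothetical ranking)
    bitrate_kbps: float          # rate needed to transmit the coded object
    synthesizable: bool = False  # True if the receiver could add it synthetically


def select_objects(objects, capacity_kbps):
    """Pick objects in priority order while they fit into the channel capacity.

    Objects that do not fit are either left to the receiver to synthesize
    (if marked synthesizable) or dropped.
    """
    sent, synthesized, dropped = [], [], []
    budget = capacity_kbps
    for obj in sorted(objects, key=lambda o: o.priority):
        if obj.bitrate_kbps <= budget:
            sent.append(obj.name)
            budget -= obj.bitrate_kbps
        elif obj.synthesizable:
            synthesized.append(obj.name)
        else:
            dropped.append(obj.name)
    return sent, synthesized, dropped


if __name__ == "__main__":
    scene = [
        AudioObject("conversation", 0, 24.0),
        AudioObject("background music", 1, 32.0),
        AudioObject("background noise", 2, 20.0, synthesizable=True),
        AudioObject("effects", 3, 16.0),
    ]
    print(select_objects(scene, capacity_kbps=40.0))
```

Running the same selection with a larger `capacity_kbps` sends more objects, which is the kind of graceful quality scaling the text describes.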

27.10 Audio Coding Standards

MUSICAM and MPEG Standards

Among the standards for digital coding of high-quality audio, the most important role is played by the Moving Picture Experts Group (MPEG) standards, designed for various communications and multimedia applications. They are the result of the efforts of working group WG 11 within the International Organization for Standardization (ISO/IEC).

The first result was the MPEG-1 standard IS 11172, designed (in its audio part) for two-channel audio of approximately CD quality [16]. This standard consists of three layers, I, II, and III, of increasing efficiency. For transparent transmission, they enable bit rates of 384, 192, and 128 kb/s, respectively. MPEG-1 supports sampling rates of 32, 44.1, and 48 ksamples/s. Layer II of MPEG-1 is based on the masking-pattern universal subband integrated coding and multiplexing (MUSICAM) standard designed for the digital audio broadcasting (DAB) system. Layer III of MPEG-1 has become very popular on the Internet thanks to *.mp3 audio files.
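
To put these bit rates into perspective, they can be compared with uncompressed CD audio. The short calculation below is a sketch that assumes the quoted rates are totals for the two-channel signal and takes the usual CD format (44.1 ksamples/s, 16 bits per sample, two channels) as the reference; the function and constant names are illustrative.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in kb/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


# Assumed reference: CD audio, 44.1 ksamples/s, 16 bits, two channels.
CD_RATE_KBPS = pcm_bitrate_kbps(44100, 16, 2)

# Bit rates for transparent transmission quoted in the text (kb/s).
LAYER_RATES_KBPS = {"I": 384, "II": 192, "III": 128}


def compression_ratios(reference_kbps=CD_RATE_KBPS):
    """Ratio of the reference PCM rate to each layer's bit rate."""
    return {layer: reference_kbps / rate
            for layer, rate in LAYER_RATES_KBPS.items()}


if __name__ == "__main__":
    print(f"CD reference: {CD_RATE_KBPS} kb/s")
    for layer, ratio in compression_ratios().items():
        print(f"Layer {layer}: about {ratio:.1f}:1")
```

Under these assumptions the three layers correspond to compression ratios of roughly 3.7:1, 7.4:1, and 11:1.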

The next step in the standardization was the MPEG-2 AAC (advanced audio coding) standard IS 13818, designed for high-definition television (HDTV) [17]. It offers multichannel (surround) sound for high spatial realism, provides low bit rate audio (below 64 kb/s), and also supports low sampling rates of 16, 22.05, and 24 ksamples/s.

The third-generation standard, MPEG-4, has been designed for a broad range of communications (especially mobile access) and multimedia applications and is characterized by high flexibility, scalability, and universality [18]. It supports bit rates between 2 and 64 kb/s and offers additional services such as text-to-speech (TTS) conversion, a structured audio format, and an interface between TTS and synthetic moving face models (talking heads), which are driven by speech.

© 2002 by CRC Press LLC
