2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, October 18-21, 2009, New Paltz, NY

ITU-T G.719: A NEW LOW-COMPLEXITY FULL-BAND (20 KHZ) AUDIO CODING STANDARD FOR HIGH-QUALITY CONVERSATIONAL APPLICATIONS

Minjie Xie∗ and Peter Chu
Polycom, Inc.
10 Mall Road, #110, Burlington, MA 01803, USA
Peter.Chu@polycom.com

Anisse Taleb and Manuel Briand
Ericsson Research – Multimedia Technologies
Färögatan 6, Kista, Sweden
{anisse.taleb, manuel.briand}@ericsson.com

ABSTRACT

This paper describes a new low-complexity full-band (20 kHz) audio coding algorithm recently standardized by ITU-T as Recommendation G.719. The algorithm is designed to provide an audio bandwidth of 20 Hz to 20 kHz at a 48 kHz sample rate, operating at 32 to 128 kbps. The codec features very high audio quality and low computational complexity and is suitable for applications such as videoconferencing, teleconferencing, and streaming audio over the Internet. Subjective test results from the Optimization/Characterization phase of G.719 are also presented.

Index Terms: Audio coding, full-band, low complexity, adaptive time-frequency transform, fast lattice vector quantization

1. INTRODUCTION

In hands-free videoconferencing and teleconferencing markets, there is strong and increasing demand for audio coding that provides the full human auditory bandwidth of 20 Hz to 20 kHz. This is because:

• Conferencing systems are increasingly used for more elaborate presentations, often including music and sound effects (e.g. animal sounds, musical instruments, vehicles, or nature sounds) which occupy a wider audio band than speech. Such presentations include remote music education, playback of audio and video from DVDs and VCRs, audio/video clips from PCs, and elaborate audio-visual presentations from, for example, PowerPoint.

• Users perceive the bandwidth of 20 Hz to 20 kHz as representing the ultimate goal for audio bandwidth. The resulting market pressures are causing a shift in this direction, now that sufficient IP bitrate and audio coding technology are available to deliver it.

As with any audio codec for hands-free videoconferencing use, the requirements include:

• Low latency (to support natural conversation)

• Low complexity (to free cycles for video processing and other audio processing, and to reduce cost)

• High quality on all signal types

To meet the market need for such a full-band audio coding standard, ITU-T Q10/SG16 launched the standardization of the Low-Complexity Full-Band Audio Coding Extension to G.722.1 (G.722.1FB) in November 2006. The ITU-T G.722.1FB Qualification phase was completed in June 2007, and two candidates, submitted by Ericsson AB and Polycom, Inc. respectively, were qualified. In September 2007, the two proponents announced their collaboration in an Optimization/Characterization phase and agreed to jointly develop a candidate codec. The joint candidate codec met all the requirements in the subjective quality tests for the G.722.1FB Optimization/Characterization phase in April 2008. In June 2008, the joint codec was adopted as ITU-T Recommendation G.719, the first full-band audio coding standard of ITU-T [1].

In this paper we first give an overview of the ITU-T G.719 codec, followed by a description of two important modules of the codec: the adaptive time-frequency transform and the quantization of transform coefficients. We also present an evaluation of the computational complexity and memory requirements of the codec. Finally, subjective test results from the Optimization/Characterization phase are summarized.

2. OVERVIEW OF THE G.719 CODEC

The G.719 codec is a low-complexity transform-based audio codec that provides an audio bandwidth of 20 Hz to 20 kHz at 32 to 128 kbps. The codec operates on 20 ms frames, corresponding to 960 samples at a sampling rate of 48 kHz, and has an algorithmic delay of 40 ms. The codec features very high audio quality and extremely low computational complexity compared to other state-of-the-art audio coding algorithms.
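The frame arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and it assumes that 1 kbps means 1000 bits per second.

```python
SAMPLE_RATE_HZ = 48_000   # sampling rate stated in the text
FRAME_MS = 20             # frame duration stated in the text


def samples_per_frame(sample_rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Number of input samples in one frame."""
    return sample_rate_hz * frame_ms // 1000


def bits_per_frame(bitrate_kbps, frame_ms=FRAME_MS):
    """Bit budget of one frame; (kbit/s) * ms = bits, assuming 1 kbps = 1000 bit/s."""
    return bitrate_kbps * frame_ms
```

Under these assumptions a 20 ms frame holds 960 samples, and the bit budget per frame ranges from 640 bits at 32 kbps to 2560 bits at 128 kbps.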

2.1 Encoder

Figure 1: Block diagram of the G.719 encoder (blocks: input signal at Fs = 48 kHz, transient detector, transform, norm estimation, norm quantization and coding, spectrum normalization, norm adjustment, bit allocation, FLVQ and coding, noise level adjustment, Huffman code, MUX; labelled signals: transient signaling, quantization indices, noise level adjustment index).

∗ Minjie Xie was with Polycom, Inc., when this work was done. He is now with Huawei Technologies (USA).

978-1-4244-3679-8/09/$25.00 ©2009 IEEE



A block diagram of the G.719 encoder is shown in Figure 1. The input signal, sampled at 48 kHz, is first processed by a transient detector. Depending on whether a transient is detected, as indicated by the flag IsTransient, either a high frequency resolution transform (stationary frames) or a low frequency resolution transform (transient frames) is applied to the input signal frame. For stationary frames, the adaptive transform is based on a modified discrete cosine transform (MDCT) [2]. For transient frames, the MDCT is modified to obtain a higher temporal resolution without additional delay and with very little overhead in complexity. Transient frames have a temporal resolution equivalent to 5 ms frames.

The resulting transform coefficients are grouped into bands of unequal length: 8, 16, 24, or 32 coefficients. Because the audio bandwidth is 20 kHz, only 800 transform coefficients are used; the 160 transform coefficients representing frequencies above 20 kHz are ignored. The norm, or power, of each band is estimated, and the resulting spectral envelope, consisting of the norms of all bands, is scalar quantized and encoded. The coefficients are then normalized by the quantized norms. The quantized norms are further adjusted based on adaptive spectral weighting and used as input for bit allocation. An adaptive bit-allocation scheme based on the quantized norms of the bands distributes the available bits in a frame among the bands. Depending on the input signal, up to 9 bits can be assigned to each transform coefficient. The normalized transform coefficients are lattice vector quantized according to the bits allocated to each band. Huffman coding is applied to the quantization indices of both the coded norms and the transform coefficients. The norm of the non-coded transform coefficients is estimated, coded, and transmitted to the decoder.
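The band grouping and normalization steps can be sketched as follows. This is a rough illustration, not the normative procedure: the band layout `BAND_LAYOUT` is an assumption chosen only so that bands of length 8, 16, 24, and 32 add up to 800 coefficients, the norm is taken to be the RMS value of a band, and the scalar quantization of the norms is skipped. All names are hypothetical.

```python
import math

# ASSUMPTION: (number of bands, band length) pairs; the text only states that
# band lengths are 8, 16, 24, or 32 and that 800 coefficients are coded.
BAND_LAYOUT = [(16, 8), (8, 16), (12, 24), (8, 32)]


def band_edges(layout=BAND_LAYOUT):
    """Return (start, end) coefficient indices of every band."""
    edges, start = [], 0
    for count, length in layout:
        for _ in range(count):
            edges.append((start, start + length))
            start += length
    return edges


def band_norms(coeffs, edges):
    """RMS value of each band (one plausible reading of 'norm or power')."""
    return [math.sqrt(sum(c * c for c in coeffs[a:b]) / (b - a))
            for a, b in edges]


def normalize(coeffs, edges, norms):
    """Divide each band by its norm; coefficients past the last band are dropped."""
    out = list(coeffs[:edges[-1][1]])
    for (a, b), g in zip(edges, norms):
        if g > 0:
            for i in range(a, b):
                out[i] = coeffs[i] / g
    return out
```

With this layout, a 960-coefficient frame yields 800 normalized coefficients, and every non-silent band has unit RMS after normalization. A real encoder would normalize by the quantized norms so that the decoder can undo the scaling exactly.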

2.2 Decoder

Figure 2: Block diagram of the G.719 decoder (blocks: DEMUX, FLVQ decoding, spectral-fill generator, envelope shaping, inverse transform; labelled signals: transient signaling, FLVQ indices, noise level adjustment index, norm indices; output: audio signal at Fs = 48 kHz).

A block diagram of the decoder is shown in Figure 2. The transient flag is decoded first; it indicates the frame configuration, i.e. stationary or transient. The spectral envelope is then decoded, and the same bit-exact norm adjustment and bit-allocation algorithms as in the encoder are used at the decoder to recompute the bit allocation, which is essential for decoding the quantization indices of the normalized transform coefficients.

After the transform coefficients are decoded, the non-coded transform coefficients (those allocated zero bits) in low frequencies are regenerated using a spectral-fill codebook built from the decoded transform coefficients. The noise level adjustment index is used to adjust the level of the regenerated coefficients. The non-coded transform coefficients in high frequencies are regenerated using bandwidth extension. The decoded and regenerated transform coefficients are then combined to form the normalized spectrum. The decoded spectral envelope is applied to this spectrum to obtain the decoded full-band spectrum. Finally, the inverse transform is applied to recover the time-domain decoded signal: either the inverse MDCT for stationary mode, or the inverse of the higher temporal resolution transform for transient mode.

3. ADAPTIVE TIME-FREQUENCY TRANSFORM

The adaptive time-frequency transform is based on the detection of a transient. For stationary signals, the transform has a high frequency resolution, which represents stationary sounds efficiently. For transient sounds, the transform increases its time resolution, which allows a better representation of rapid changes in the input signal characteristics. The two modes of operation share a common buffering and windowing module, and switching from one mode to the other is instantaneous. Thus no additional look-ahead is needed in the transient detection. This allows the codec to have selectable time resolution at low complexity and with zero additional delay.

If the input signal is detected as stationary, a type IV discrete cosine transform (DCT$_{IV}$) [2] is applied to the output $\tilde{x}(n)$ of the time domain aliasing (TDA) operation. The DCT$_{IV}$ of the time-aliased signal is defined by the following equation:

$$y(k) = \sum_{n=0}^{959} \tilde{x}(n) \cos\left[\left(n + \frac{1}{2}\right)\left(k + \frac{1}{2}\right)\frac{\pi}{960}\right], \quad k = 0, 1, \ldots, 959 \qquad (1)$$

The resulting signal $y(k)$ represents the transform coefficients of the input frame. Note that in stationary mode, the cascade of windowing, TDA, and DCT$_{IV}$ is equivalent to applying the modulated lapped transform (MLT) [2].
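As an illustration of Equation (1), the sketch below evaluates the type IV DCT directly for a frame of arbitrary length N (N = 960 in the codec). It is a plain O(N²) evaluation written for clarity, not the fast algorithm a low-complexity codec would use; the function names are hypothetical. The inverse relies on the standard property that the unnormalized type IV DCT is its own inverse up to a factor of 2/N.

```python
import math


def dct_iv(x):
    """Direct evaluation of Eq. (1): y(k) = sum_n x(n) cos((n+1/2)(k+1/2) pi/N)."""
    n_len = len(x)
    return [
        sum(x[n] * math.cos((n + 0.5) * (k + 0.5) * math.pi / n_len)
            for n in range(n_len))
        for k in range(n_len)
    ]


def inverse_dct_iv(y):
    """Apply the same transform again and scale by 2/N to recover the input."""
    n_len = len(y)
    return [2.0 / n_len * v for v in dct_iv(y)]
```

Applying `inverse_dct_iv(dct_iv(x))` returns `x` up to floating-point rounding, which is a convenient sanity check before replacing the direct sum with a fast implementation.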

If the signal is detected as a transient, a further re-ordering of the time-domain aliased signal is performed. Without this re-ordering, the basis functions of the resulting filter bank would have incoherent time and frequency responses. The re-ordering operation consists of shuffling the upper and lower halves of the TDA output signal $\tilde{x}(n)$ as follows:

$$v(n) = \tilde{x}(959 - n), \quad n = 0, 1, \ldots, 959. \qquad (2)$$

This re-ordering is only conceptual; in practice no computations are involved. Higher time resolution is obtained by zero-padding the signal $v(n)$ and dividing the resulting signal into four overlapped sub-frames of equal length. The zero-padding amounts to 120 samples on each side of the signal. The segments overlap by 50% and each has a length of 480. The two inner segments are post-windowed using a sine window of length 480. The windows for the outer segments are constructed using half a sine window.

Each resulting post-windowed segment is further processed by applying the MDCT, i.e. time aliasing followed by a DCT$_{IV}$. The output of the MDCT for each segment represents the signal spectrum at a different time instant; thus, for transients, a higher time resolution is obtained. The output of each of the four MDCTs has half the length of the input segment, i.e. 240 coefficients. The four outputs together therefore contain 960 coefficients, the same number as in stationary mode, so this operation does not introduce additional redundancy.
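The segmentation used in transient mode can be sketched as below. The sketch covers only the re-ordering of Equation (2), the zero-padding, and the split into four 50% overlapped segments; the post-windowing and the per-segment MDCT are left out because the text does not fully specify the outer windows. The constant and function names are hypothetical.

```python
FRAME_LEN = 960        # length of the TDA output
PAD = 120              # zeros added on each side
SEG_LEN = 480          # length of each sub-frame
HOP = SEG_LEN // 2     # 50% overlap between consecutive sub-frames


def transient_segments(x_tda):
    """Re-order per Eq. (2), zero-pad, and cut into four overlapped segments."""
    if len(x_tda) != FRAME_LEN:
        raise ValueError("expected a 960-sample TDA output")
    v = list(x_tda)[::-1]                       # v(n) = x(959 - n)
    padded = [0.0] * PAD + v + [0.0] * PAD      # 1200 samples in total
    starts = range(0, len(padded) - SEG_LEN + 1, HOP)
    return [padded[s:s + SEG_LEN] for s in starts]
```

The padded signal has 1200 samples, so segments starting at 0, 240, 480, and 720 cover it exactly. Each 480-sample segment would then produce 240 MDCT coefficients, giving 960 coefficients per frame.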



4. TRANSFORM COEFFICIENT QUANTIZATION

Each band consists of one or more 8-dimensional vectors of transform coefficients, and the coefficients are normalized by the quantized norm. All 8-dimensional vectors belonging to one band are assigned the same number of bits for quantization. A fast lattice vector quantization (FLVQ) scheme [3] is used to quantize the normalized coefficients in 8 dimensions. In FLVQ the quantizer comprises two sub-quantizers: a $D_8$-based higher-rate lattice vector quantizer (HRQ) and an $RE_8$-based lower-rate lattice vector quantizer (LRQ).

HRQ is a multi-rate quantizer designed to quantize the transform coefficients at rates from 2 to 9 bits/coefficient. Its codebook is based on the so-called Voronoi code for the $D_8$ lattice [4]. $D_8$ is a well-known lattice defined as:

$$D_8 = \left\{ (y_1, y_2, y_3, y_4, y_5, y_6, y_7, y_8) \in Z^8 \;\middle|\; \sum_{i=1}^{8} y_i \text{ is even} \right\} \qquad (3)$$

where $Z^8$ is the lattice that consists of all points with integer coordinates. In other words, $D_8$ consists of the points having integer coordinates with an even sum. The codebook of HRQ is constructed from a finite region of the $D_8$ lattice and is not stored in memory. The codewords are generated by a simple algebraic method, and a fast quantization algorithm is used.
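To make the definition in Equation (3) concrete, the sketch below finds the nearest $D_8$ point to an arbitrary 8-dimensional vector using the well-known rounding rule for D-type lattices: round every coordinate, and if the sum is odd, re-round the coordinate with the largest rounding error the other way. This illustrates the idea behind a fast search on the infinite lattice; it is not claimed to be the exact routine used in the codec, and the function name is hypothetical.

```python
import math


def nearest_d8(x):
    """Nearest point of D8 (integer coordinates with an even sum) to x."""
    f = [math.floor(v + 0.5) for v in x]      # round each coordinate
    if sum(f) % 2 == 0:
        return f
    # Odd sum: move the worst-rounded coordinate to its second-nearest
    # integer, which changes the sum by one and makes it even.
    errs = [v - r for v, r in zip(x, f)]
    i = max(range(len(x)), key=lambda j: abs(errs[j]))
    f[i] += 1 if errs[i] > 0 else -1
    return f
```

For example, `nearest_d8([0.6, 0, 0, 0, 0, 0, 0, 0])` returns the origin: plain rounding gives a point with an odd sum, so the first coordinate is rounded down instead of up.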

To minimize the distortion for a given rate, the $D_8$ lattice should be truncated and scaled. In practice, the input vectors rather than the lattice codebook are scaled, so that the fast searching and indexing algorithms proposed by Conway and Sloane [4] can be used. However, the fast searching algorithm introduced in [4] assumes an infinite lattice, which cannot be used as the codebook in real-time audio coding systems. In other words, for a given rate the algorithm cannot be used to quantize input vectors lying outside the truncated lattice region. To resolve this problem, a fast method for quantizing such “outliers” $x_o$ was developed, as follows:

1) Scale down the vector $x_o$ by 2: $x_o = x_o / 2$.

2) Find the nearest $D_8$ point $u$ to $x_o$ and then compute the index vector $j$ of $u$ using the indexing algorithm in [4].

3) Find the codeword $y$ from the index vector $j$ and then compare $y$ with $u$. If $y$ is different from $u$, repeat Steps 1) to 3). Otherwise, compute $w = x_o / 16$. Note that because the transform coefficients are normalized, only a few iterations are required to find the best codeword.

4) Compute $x_o = x_o + w$.

5) Find the nearest $D_8$ point $u$ to $x_o$ and then compute the index vector $j$ of $u$ using the indexing algorithm in [4].

6) Find the codeword $y$ from the index vector $j$ and then compare $y$ with $u$. If $y$ and $u$ are exactly the same, set $k = j$ and repeat Steps 4) to 6). Otherwise, stop: $k$ is the index of the best codeword for $x_o$.
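The steps above can be sketched in Python as follows. Several details are assumptions made only to obtain a runnable example: the $D_8$ generator matrix (rows (2,0,...,0) and (1,0,...,1,...,0)), the scale factor `r` (taken as 2 to the power of the rate in bits/coefficient), and the omission of the tie-breaking offset used in Voronoi codes. The helper names `index_of`, `codeword_of`, and `quantize_hrq` are hypothetical, and the bit-exact procedure of the Recommendation may differ.

```python
import math


def nearest_d8(x):
    """Nearest D8 point: round, then fix an odd sum at the worst coordinate."""
    f = [math.floor(v + 0.5) for v in x]
    if sum(f) % 2:
        errs = [v - r for v, r in zip(x, f)]
        i = max(range(8), key=lambda j: abs(errs[j]))
        f[i] += 1 if errs[i] > 0 else -1
    return f


def index_of(u, r):
    """Index vector of lattice point u: coordinates in the assumed basis, mod r."""
    k = list(u)
    k[0] = (u[0] - sum(u[1:])) // 2   # exact, because the sum of u is even
    return [c % r for c in k]


def codeword_of(k, r):
    """Codeword addressed by k: lattice point k*G reduced modulo r*D8."""
    x = [2 * k[0] + sum(k[1:])] + list(k[1:])
    y = nearest_d8([c / r for c in x])
    return [a - r * b for a, b in zip(x, y)]


def quantize_hrq(x, r, max_iter=64):
    """Return (index vector, codeword) for x, following Steps 1) to 6)."""
    x = list(x)
    u = nearest_d8(x)
    k = index_of(u, r)
    if codeword_of(k, r) == u:            # inside the codebook: not an outlier
        return k, u
    for _ in range(max_iter):             # Steps 1) to 3): halve until inside
        x = [c / 2 for c in x]
        u = nearest_d8(x)
        k = index_of(u, r)
        if codeword_of(k, r) == u:
            break
    best = (k, u)
    w = [c / 16 for c in x]
    for _ in range(max_iter):             # Steps 4) to 6): grow until outside
        x = [a + b for a, b in zip(x, w)]
        u = nearest_d8(x)
        j = index_of(u, r)
        if codeword_of(j, r) != u:
            break
        best = (j, u)
    return best
```

The test for "inside the codebook" is the comparison in Steps 3) and 6): a lattice point belongs to the truncated region exactly when decoding its own index reproduces it. The `max_iter` guard is an addition for safety and is not part of the described method.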

It has been observed that the 8-dimensional coefficient vectors have a high concentration of probability around the origin. Therefore, Huffman coding of the HRQ quantization indices is optionally applied to increase the efficiency of quantization when the rate is lower than 6 bits/coefficient.

LRQ is a 1-bit quantizer and uses spherical codes based on the rotated Gosset lattice $RE_8$ [5] as the codebook. The $RE_8$ lattice is defined as

$$RE_8 = 2D_8 \cup \{2D_8 + (1, 1, 1, 1, 1, 1, 1, 1)\} \qquad (4)$$

and consists of the points falling on concentric spheres of radius $2\sqrt{2r}$ centered at the origin, where $r = 0, 1, 2, \ldots$. The set of points on a sphere constitutes a spherical code and can be used as a quantization codebook. The LRQ codebook of 256 codewords is stored in a structured look-up table, for which a fast searching method and a fast indexing method were developed to reduce the computational complexity [3].
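Equation (4) and the shell structure can be explored with a brute-force enumeration. The sketch below tests membership in $RE_8$ and lists the points on the sphere of squared radius $8r$; the function names are hypothetical, and the enumeration is meant for illustration only, not as the structured table look-up mentioned above.

```python
from itertools import product


def in_re8(p):
    """Membership in RE8 = 2*D8 union (2*D8 + (1,...,1)), as in Eq. (4)."""
    parities = {c % 2 for c in p}
    if len(parities) != 1:                 # coordinates must be all even or all odd
        return False
    shift = parities.pop()                 # 0 for 2*D8, 1 for the shifted coset
    half = [(c - shift) // 2 for c in p]   # candidate D8 point
    return sum(half) % 2 == 0


def shell(r):
    """All RE8 points with squared norm 8*r, i.e. radius 2*sqrt(2*r)."""
    target = 8 * r
    bound = int(target ** 0.5)
    return [p for p in product(range(-bound, bound + 1), repeat=8)
            if sum(c * c for c in p) == target and in_re8(p)]
```

Running `len(shell(1))` gives 240, the number of points on the first non-trivial sphere. Since an 8-dimensional vector coded at 1 bit/coefficient needs $2^8 = 256$ codewords, the LRQ codebook must contain points beyond this first shell; how they are chosen is not stated in this paper.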

5. COMPLEXITY OF THE G.719 CODEC

The computational complexity of the G.719 codec in 16/32-bit fixed-point arithmetic was estimated by encoding and decoding the source material used for the subjective tests of the G.719 Optimization/Characterization phase. The observed average and worst-case complexity in units of WMOPS [6] is shown in Table 1 for different bitrates. Storage requirements are reported in Table 2.

Table 1: Computational complexity of the G.719 codec (WMOPS).

Bitrate    Encoder            Decoder            Encoder + Decoder
(kbps)     Avg.     Max       Avg.     Max       Avg.      Max
32         6.663    7.996     6.876    7.413     13.539    15.397
48         7.427    9.073     7.230    7.806     14.657    16.861
64         7.912    9.899     7.554    8.161     15.466    18.060
80         8.303    10.564    7.865    8.496     16.169    19.026
96         8.555    10.796    8.122    8.757     16.677    19.536
112        8.787    10.881    8.368    9.009     17.155    19.890
128        8.980    11.775    8.610    9.225     17.590    21.000

Table 2: Memory usage of the G.719 codec.

Memory type        Encoder    Decoder    Encoder + Decoder
Static RAM (1)     1.0        3.9        4.9
Scratch RAM (1)    12.2       12.2       24.4
Data ROM (1)       8.3        8.9        10.7
Program ROM (2)    1.2        1.2        1.8

(1) In units of 16-bit kWords. (2) In units of 1000s of basic operators.

6. SUBJECTIVE PERFORMANCE OF G.719

Subjective tests for the ITU-T G.719 (ex. G.722.1FB) Optimization/Characterization phase were performed from mid-February through early April 2008 by independent listening laboratories in American English, French, and Spanish. According to a test plan designed by ITU-T Q7/SG12 experts, the joint candidate codec was evaluated in two experiments:

• Experiment 1: Speech (clean, reverberant, and noisy)

• Experiment 2: Mixed content and music

Mixed content items are representative of advertisements, film trailers, news with jingles, music with announcements, etc., and contain speech, music, and noise. Each experiment used the “triple stimulus/hidden reference/double blind test method” described in ITU-R Recommendation BS.1116-1. A standard MPEG audio codec, LAME MP3 version 3.97 as found on the LAME website as of September 2006 [7], was used as the reference codec in the subjective tests. The ITU-T requirement was that the G.719 candidate codec at 32, 48, and 64 kbps be proven “Not Worse Than” the reference codec at 40, 56, and 64 kbps, respectively, with a 95% statistical confidence level. In addition, the G.719 candidate codec at 64 kbps was also tested against the G.722.1C codec at 48 kbps for Experiment 2.

The subjective test results for the G.719 codec are shown in Figures 3-6. Statistical analysis of the results showed that the G.719 codec met all performance requirements specified for the subjective Optimization/Characterization test. For Experiment 1, the G.719 codec was better than the reference codec at all bit rates in both languages. For Experiment 2, the G.719 codec was better than the reference codec at the lowest bit rate for all items and at the two other bit rates for most of the items.

An additional subjective listening test for the G.719 codec was conducted later to evaluate the quality of the codec at rates higher than those covered by the ITU-T test plan. Because the quality expectation for the codec at these rates is high, critical items, for which the quality at the lower bitrates was most degraded, were pre-selected prior to testing. The test results are shown in Figure 7. They indicate that transparency was reached for critical material at 128 kbps.

Figure 3: Subjective test results – Experiment 1 (English). (Panel title: Speech (North American English); vertical axis: DMOS, 1.0 to 5.0; conditions: Ref 40 kbps, G.719 32 kbps, Ref 56 kbps, G.719 48 kbps, Ref 64 kbps, G.719 64 kbps; groups: Clean, Reverberant, Reverberant and Noisy.)

Figure 4: Subjective test results – Experiment 1 (French). (Panel title: Speech (French); vertical axis: DMOS, 1.0 to 5.0; conditions: Ref 40 kbps, G.719 32 kbps, Ref 56 kbps, G.719 48 kbps, Ref 64 kbps, G.719 64 kbps; groups: Clean, Reverberant, Reverberant and Noisy.)

Figure 5: Subjective test results – Experiment 2 (English). (Panel title: Mixed-content and Music (North American English); vertical axis: DMOS, 1.0 to 5.0; conditions: Ref 40 kbps, G.719 32 kbps, Ref 56 kbps, G.719 48 kbps, Ref 64 kbps, G.719 64 kbps, G.722.1C 48 kbps.)

Figure 6: Subjective test results – Experiment 2 (Spanish). (Panel title: Mixed-content and Music (Spanish); vertical axis: DMOS, 1.0 to 5.0; conditions: Ref 40 kbps, G.719 32 kbps, Ref 56 kbps, G.719 48 kbps, Ref 64 kbps, G.719 64 kbps, G.722.1C 48 kbps.)

Figure 7: Additional subjective test results. (Panel title: Additional Subjective Test Results (DMOS); vertical axis: 1.0 to 5.5; conditions: Source paired with G.719 at 32, 48, 64, 80, 96, 112, and 128 kbps.)

7. CONCLUSION

This paper has presented the 20 kHz full-band audio coding algorithm of ITU-T Recommendation G.719 and its subjective Optimization/Characterization test results. The G.719 codec features very high audio quality, extremely low computational complexity, and low algorithmic delay compared to other state-of-the-art audio codecs. Subjective test results show that the G.719 codec achieves transparent audio quality at 128 kbps. The main intended applications for this codec are videoconferencing and teleconferencing, as well as Internet streaming.

8. REFERENCES

[1] ITU-T Recommendation G.719, “Low-complexity full-band audio coding for high-quality conversational applications,” June 2008.

[2] H. S. Malvar, Signal Processing with Lapped Transforms. Norwood, MA: Artech House, 1992.

[3] M. Xie, “Lattice Vector Quantization and its Applications in Speech and Audio Coding,” in Speech Processing in Modern Communication: Challenges and Perspectives. Springer, to appear.

[4] J. H. Conway and N. J. A. Sloane, Sphere Packings, Lattices and Groups. New York: Springer-Verlag, 1988.

[5] C. Lamblin and J.-P. Adoul, “Algorithme de quantification vectorielle sphérique à partir du réseau de Gosset d’ordre 8,” Annales des Télécommunications, pp. 172-186, 1988.

[6] ITU-T Recommendation G.191, “Software tools for speech and audio coding standardization,” Sept. 2005.

[7] http://sourceforge.net/project/showfiles.php?group_id=290&package_id=309.

