15.01.2013 Views

U. Glaeser


The methods in this subsection are inspired by specific models of visual perception. The first is an approach based on a very comprehensive vision model, performing spatial and temporal frequency decomposition via filters designed to reflect properties of the HVS. The second and third are techniques using visually relevant transforms (the Gabor and derivative of Gaussian transforms, respectively) in an otherwise conventional hybrid (predictive/transform) framework. Finally, a method based on spatiotemporal segmentation (following the contour/texture model of vision) is discussed.

The Perceptual Components Architecture

The perceptual components architecture [46] is a framework for the compression of color image sequences based on the processing thought to take place in the early HVS. It consists of the following steps. The input RGB image sequence is converted into an opponent color space (white/black (WB), red/green (RG), and blue/yellow (BY)). The sequence is filtered spatially with a set of frequency and orientation selective filters, inspired by the frequency and orientation selectivity of the HVS. Filters based on the temporal frequency response of the visual system are applied along the temporal dimension. The filtered sequences are then subsampled spatially on a hexagonal grid, and subsampled by a factor of two in the temporal dimension. Uniform quantization is applied within each subband, with higher frequency subbands quantized more coarsely. The WB (luminance) component is quantized less coarsely overall than the RG and BY (chrominance) components. The first-order entropy of the result provides an estimate of the compression ratio.
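The last two steps (uniform quantization per subband, then a first-order entropy estimate) can be illustrated with a minimal sketch. The coefficient values and step sizes below are hypothetical, chosen only to show that a coarser step in a higher-frequency subband lowers the entropy; they are not taken from [46], and the function names are illustrative.

```python
import math
from collections import Counter

def quantize(coeffs, step):
    """Uniform quantization: map each coefficient to the index of its bin."""
    return [round(c / step) for c in coeffs]

def first_order_entropy(symbols):
    """Empirical first-order entropy of a symbol list, in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

# Hypothetical subband coefficients (illustrative values only).
low_band = [12.1, -7.4, 3.3, 0.2, -15.8, 9.9, 4.1, -2.6]
high_band = [1.2, -0.7, 0.3, 0.1, -1.8, 0.9, 0.4, -0.6]

# Assumed step sizes: the higher-frequency band gets the coarser step.
low_idx = quantize(low_band, step=2.0)
high_idx = quantize(high_band, step=4.0)

low_bits = first_order_entropy(low_idx)
high_bits = first_order_entropy(high_idx)
print(low_bits, high_bits)
```

In this toy case the coarsely quantized high-frequency band collapses to a single symbol, so its estimated cost is zero bits per coefficient, while the low-frequency band retains most of its information.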

Note that there is no prediction or motion compensation. This is a 3-D subband coder, where temporal redundancy is exploited via the temporal filters. For a 256 × 256, 8 frame segment of the “football” sequence (a widely used test sequence depicting a play from an American football game), acceptable image quality was achieved at about 1 bit/pixel (down from 24 bits/pixel). Although this is not very high compression, the sequence used is more challenging than most. Another contributing factor is that the subsampled representation is 8/3 the size (in terms of bits) of the original, an expansion that must be overcome before any compression is realized.
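The figures quoted above imply two different ratios, which a short calculation makes explicit. This is only arithmetic on the numbers in the text; the variable names are illustrative, and "about 1 bit/pixel" is treated as exactly 1 for simplicity.

```python
from fractions import Fraction

original_bpp = 24            # bits/pixel of the RGB input, as stated in the text
final_bpp = 1                # approximate coded rate, as stated in the text
expansion = Fraction(8, 3)   # size of the subsampled representation relative to the original

# Overall compression ratio seen by the user.
overall_ratio = Fraction(original_bpp, final_bpp)

# Bits per original pixel held by the subsampled representation before quantization.
subsampled_bpp = expansion * original_bpp

# Reduction that quantization and entropy coding must achieve on that representation.
coding_ratio = subsampled_bpp / final_bpp

print(overall_ratio, subsampled_bpp, coding_ratio)
```

Under these assumptions the overall ratio is 24:1, but quantization and entropy coding must reduce the expanded representation by 64:1 to get there.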

Very-Low-Bit-Rate Coding Using the Gabor Transform

In discussing the Gabor transform previously, it was stated that the basis functions of this transform are optimally (jointly) local in space and spatial frequency. In the context of coding, there are three mechanisms that can be exploited to achieve compression, all of which depend on locality: the local correlation between pixels in the sequence; the bounded frequency response of the human visual system (as characterized by the CSF); and visual masking (the decrease in visual sensitivity near spatial and temporal discontinuities). To take advantage of local spatial correlation, the image representation upon which a compression method is based must be spatially local (which is why images are partitioned into blocks in JPEG, MPEG-1, MPEG-2, H.261, etc.). If the CSF is to be exploited (e.g., by quantizing high frequency coefficients coarsely), localization in the spatial-frequency domain is required. To exploit visual masking, spatial locality (of a fairly high degree) is required.

The Gabor transform is inherently local in space, so the partitioning of the image into blocks is not required (hence no blocking artifacts are observed at high compression ratios). Its spatial locality also provides a mechanism for exploiting visual masking, while its spatial-frequency locality allows the bandlimited nature of the HVS to be utilized.
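To make the joint locality concrete, the following sketch samples a real (cosine-phase) 2-D Gabor function: a sinusoidal carrier, which fixes the spatial frequency and orientation, multiplied by a Gaussian envelope, which confines it in space. The parameterization (`size`, `sigma`, `freq`, `theta`) and the function name are assumptions made for illustration; the transform used in [47] may define its basis functions differently.

```python
import math

def gabor_kernel(size, sigma, freq, theta):
    """Sample a real 2-D Gabor function on a size x size grid (size must be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Gaussian envelope: gives the function its spatial locality.
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
            # Coordinate along the direction of the sinusoidal carrier.
            along = x * math.cos(theta) + y * math.sin(theta)
            # Carrier: gives the function its spatial-frequency locality.
            row.append(envelope * math.cos(2.0 * math.pi * freq * along))
        kernel.append(row)
    return kernel

# Hypothetical parameters: 9x9 support, 0.25 cycles/pixel, horizontal carrier.
k = gabor_kernel(size=9, sigma=1.5, freq=0.25, theta=0.0)
center = k[4][4]
corner = k[0][0]
print(center, corner)
```

The value at the center is 1 and the value at the corner of the grid is close to zero, so the function overlaps smoothly with its neighbors rather than ending at a hard block boundary.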

An encoder and decoder based on this transform are shown in Figs. 28.26 and 28.27 [47]. Note that they are in the classic hybrid (predictive/transform) form. This codec does not include motion compensation, and is for monochrome image sequences.
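The general shape of a hybrid coder without motion compensation can be sketched as a closed prediction loop: each frame is predicted from the previously reconstructed frame, and only the quantized residual is transmitted. This is a generic illustration, not a reproduction of Figs. 28.26 and 28.27. The transform stage is replaced by an identity placeholder, the first frame is assumed to be predicted from zeros, and all names are hypothetical.

```python
def encode_sequence(frames, step):
    """Closed-loop frame-difference coding; returns quantized residual indices."""
    prediction = [0.0] * len(frames[0])  # assumed: first frame predicted from zeros
    coded = []
    for frame in frames:
        residual = [p - q for p, q in zip(frame, prediction)]
        # A real codec would transform the residual here (identity in this sketch).
        indices = [round(r / step) for r in residual]
        coded.append(indices)
        # Reconstruct exactly as the decoder will, so both stay in sync.
        prediction = [q + i * step for q, i in zip(prediction, indices)]
    return coded

def decode_sequence(coded, step):
    """Mirror of the encoder's reconstruction loop."""
    prediction = [0.0] * len(coded[0])
    frames = []
    for indices in coded:
        prediction = [q + i * step for q, i in zip(prediction, indices)]
        frames.append(prediction)
    return frames

# Three tiny hypothetical "frames" of four pixels each.
frames = [[10, 20, 30, 40], [11, 20, 29, 42], [12, 21, 29, 41]]
step = 4.0
coded = encode_sequence(frames, step)
decoded = decode_sequence(coded, step)
print(coded)
print(decoded)
```

Because the encoder predicts from its own reconstruction rather than from the original frame, quantization error does not accumulate: every decoded pixel stays within half a quantizer step of the original.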

Applying this method to a 128-by-128, 8 bit/pixel version of the Miss America sequence resulted in reasonable image quality at a compression ratio of approximately 335:1.² At 24 frames per second, the associated bit rate is 9.4 kbits/s (a rate consistent, e.g., with wireless videotelephony).

² Not including the initial frame, which is intracoded to 9.1 kbits (a compression ratio of 14).
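The quoted bit rate follows directly from the frame size, bit depth, compression ratio, and frame rate given above, as this short check shows. The variable names are illustrative, and 1 kbit is taken as 1000 bits.

```python
width, height, bits_per_pixel = 128, 128, 8
frame_rate = 24            # frames per second
compression_ratio = 335    # approximate ratio quoted in the text

raw_bits_per_frame = width * height * bits_per_pixel
coded_bits_per_frame = raw_bits_per_frame / compression_ratio
bit_rate_kbits = coded_bits_per_frame * frame_rate / 1000

print(raw_bits_per_frame, round(bit_rate_kbits, 1))
```

Each uncompressed frame holds 131,072 bits, so a 335:1 ratio leaves roughly 391 bits per frame, which at 24 frames per second gives the 9.4 kbits/s stated in the text.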

© 2002 by CRC Press LLC
