15.01.2013 Views

U. Glaeser


FIGURE 28.26 A Gabor transform-based video encoder.

FIGURE 28.27 The associated Gabor transform-based decoder.

Video Coding Using the Derivative of Gaussian Transform

As mentioned previously, the derivative of Gaussian transform (DGT) has properties similar to those of the Gabor transform, but with the practical advantage that it is real-valued. This makes it particularly well suited to video compression. In [48], the hybrid codec structure shown in Figs. 28.24 and 28.25 is adapted to the DGT: the DGT and its inverse replace the DCT and IDCT, and the quantization scheme is adapted to the visibility of the DGT basis functions via a simple quantization mask.
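The closed-loop structure of such a hybrid codec can be sketched in a few lines. The following is a minimal illustration only, not the codec of [48]: it assumes 1-D "frames", omits motion compensation, and uses a one-level orthonormal Haar transform as a stand-in for the DGT, whose basis is not given in this section. All function names and the mask values are hypothetical. The point it shows is that the encoder reconstructs each frame exactly as the decoder will, so quantization noise does not accumulate from frame to frame.

```python
import math

def forward(block):
    """Stand-in orthonormal transform (one Haar level); NOT the DGT."""
    s = 1.0 / math.sqrt(2.0)
    half = len(block) // 2
    low = [s * (block[2 * i] + block[2 * i + 1]) for i in range(half)]
    high = [s * (block[2 * i] - block[2 * i + 1]) for i in range(half)]
    return low + high

def inverse(coeffs):
    """Inverse of forward()."""
    s = 1.0 / math.sqrt(2.0)
    half = len(coeffs) // 2
    out = []
    for i in range(half):
        out.append(s * (coeffs[i] + coeffs[half + i]))
        out.append(s * (coeffs[i] - coeffs[half + i]))
    return out

def quantize(coeffs, mask):
    """Divide each coefficient by its mask step and round to an integer level."""
    return [round(c / q) for c, q in zip(coeffs, mask)]

def dequantize(levels, mask):
    return [lv * q for lv, q in zip(levels, mask)]

def encode(frames, mask):
    """Code each frame as the quantized transform of (frame - prediction)."""
    pred = [0.0] * len(frames[0])          # prediction = previous reconstruction
    stream = []
    for frame in frames:
        diff = [f - p for f, p in zip(frame, pred)]
        levels = quantize(forward(diff), mask)
        stream.append(levels)
        # Local decoder: rebuild what the receiver will see (frame delay loop).
        rec_diff = inverse(dequantize(levels, mask))
        pred = [p + d for p, d in zip(pred, rec_diff)]
    return stream

def decode(stream, mask):
    pred = [0.0] * len(stream[0])
    out = []
    for levels in stream:
        rec_diff = inverse(dequantize(levels, mask))
        pred = [p + d for p, d in zip(pred, rec_diff)]
        out.append(pred)
    return out
```

A mask such as `[1.0, 1.0, 4.0, 4.0]` quantizes the second half of the coefficients more coarsely than the first; in a perceptually tuned codec the step sizes would instead follow the visibility of each basis function.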

Results comparable to those of the standard H.261 (DCT-based) codec are obtained at bit rates around 320 kbits/s (5 channels in the p × 64 kbits/s model, i.e., 5 × 64 = 320).

Object-Based Coding by Split and Merge Segmentation

Object-based coding reflects the fact that scenes are largely composed of distinct objects, and that these objects are perceived as boundaries surrounding fields of shading or texture (the contour/texture theory of vision). Encoding an image or sequence in this way requires segmentation to identify the constituent objects. This view of compression, which also facilitates interaction and editing, underlies the MPEG-4 video compression standard [49]. Although the method described here differs in detail from MPEG-4, it is one of the earliest documented object-based systems and illustrates many important aspects of such systems.

In this approach [50], 3-D (spatiotemporal) segmentation is used to reduce the redundant information in a sequence (essentially identifying objects within the sequence), while retaining information critical to the human observer. The sequence is treated as a single 3-D data volume, the voxels of which are grouped into regions via split and merge. The uniformity criterion used for the segmentation is the goodness-of-fit to a 3-D polynomial. The sequence is then encoded in terms of region boundaries (a binary tree structure) and region interior intensities (the coefficients of the 3-D polynomial).
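The uniformity criterion can be illustrated with a small least-squares fit. The sketch below assumes a first-order polynomial, v ≈ a + b·x + c·y + d·t, since the order used in [50] is not stated here; the function names and the `max_mse` parameter are hypothetical. On a full rectangular block with coordinates measured from the block center, the basis functions 1, x, y, and t are mutually orthogonal, so each coefficient is a simple projection and no matrix solve is needed.

```python
def fit_first_order(volume):
    """Least-squares fit of v ~ a + b*x + c*y + d*t over volume[t][y][x].

    Coordinates are centered on the block, which makes the basis
    {1, x, y, t} orthogonal on a full rectangular grid.
    Returns ((a, b, c, d), mean_squared_error).
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    ct, cy, cx = (T - 1) / 2.0, (H - 1) / 2.0, (W - 1) / 2.0
    n = T * H * W
    s = sx = sy = st = 0.0        # projections of the data onto 1, x, y, t
    sxx = syy = stt = 0.0         # squared norms of x, y, t
    for t in range(T):
        for y in range(H):
            for x in range(W):
                v = volume[t][y][x]
                dx, dy, dt = x - cx, y - cy, t - ct
                s += v
                sx += dx * v
                sy += dy * v
                st += dt * v
                sxx += dx * dx
                syy += dy * dy
                stt += dt * dt
    a = s / n
    b = sx / sxx if sxx else 0.0  # a block one voxel wide has no slope
    c = sy / syy if syy else 0.0
    d = st / stt if stt else 0.0
    err = 0.0
    for t in range(T):
        for y in range(H):
            for x in range(W):
                model = a + b * (x - cx) + c * (y - cy) + d * (t - ct)
                err += (volume[t][y][x] - model) ** 2
    return (a, b, c, d), err / n

def is_uniform(volume, max_mse):
    """Uniformity test: the block is acceptable if the fit error is small."""
    _, mse = fit_first_order(volume)
    return mse <= max_mse
```

The four coefficients returned by `fit_first_order` play the role of the region interior description: for a region that passes the test, they are what would be transmitted in place of the individual voxel values.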

The data volume is first split such that each region is a parallelepiped over which the gray level variation can be approximated within a specified mean squared error (Fig. 28.28). Regions are split into octants, following the octree strategy. A region adjacency graph is then constructed, with a node for each region and links between the nodes of adjacent regions; each link is assigned a cost indicating the similarity of the two regions, where a high cost indicates low similarity. Regions are merged, starting with the pair of lowest cost, and the region adjacency graph is updated after each merge. The resulting regions are represented using a pyramidal (binary tree) structure, with the regions labeled so that adjacent regions have different labels.
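The split and merge steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of [50]: the uniformity test is the mean squared deviation from the region mean (a zeroth-order stand-in for the polynomial fit), the merge cost is the absolute difference of region means, and adjacency is recomputed by brute force rather than kept in an explicit graph. The names `split`, `merge`, `max_mse`, and `max_cost` are hypothetical.

```python
def box_stats(vol, box):
    """Mean, mean squared deviation, and voxel count of a half-open box."""
    t0, t1, y0, y1, x0, x1 = box
    vals = [vol[t][y][x]
            for t in range(t0, t1) for y in range(y0, y1) for x in range(x0, x1)]
    mean = sum(vals) / len(vals)
    mse = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, mse, len(vals)

def halves(lo, hi):
    """Split an interval at its midpoint; an interval of length 1 is kept."""
    mid = (lo + hi) // 2
    return [(lo, mid), (mid, hi)] if hi - lo > 1 else [(lo, hi)]

def split(vol, box, max_mse):
    """Octree-style split: subdivide until every box passes the uniformity test."""
    _, mse, n = box_stats(vol, box)
    if mse <= max_mse or n == 1:
        return [box]
    t0, t1, y0, y1, x0, x1 = box
    leaves = []
    for ta, tb in halves(t0, t1):
        for ya, yb in halves(y0, y1):
            for xa, xb in halves(x0, x1):
                leaves += split(vol, (ta, tb, ya, yb, xa, xb), max_mse)
    return leaves

def touching(a, b):
    """True if two boxes share a face: they abut on one axis, overlap on the others."""
    for axis in range(3):
        if a[2 * axis + 1] == b[2 * axis] or b[2 * axis + 1] == a[2 * axis]:
            others = [k for k in range(3) if k != axis]
            if all(min(a[2 * k + 1], b[2 * k + 1]) > max(a[2 * k], b[2 * k])
                   for k in others):
                return True
    return False

def merge(vol, leaves, max_cost):
    """Greedy merge: repeatedly join the adjacent pair with the lowest cost."""
    regions = []
    for box in leaves:
        mean, _, n = box_stats(vol, box)
        regions.append({"boxes": [box], "sum": mean * n, "n": n})
    while True:
        best = None
        for i in range(len(regions)):
            for j in range(i + 1, len(regions)):
                ri, rj = regions[i], regions[j]
                if any(touching(a, b) for a in ri["boxes"] for b in rj["boxes"]):
                    # Stand-in cost: difference of region means (high = dissimilar).
                    cost = abs(ri["sum"] / ri["n"] - rj["sum"] / rj["n"])
                    if best is None or cost < best[0]:
                        best = (cost, i, j)
        if best is None or best[0] > max_cost:
            return regions
        _, i, j = best
        rj = regions.pop(j)               # j > i, so index i stays valid
        regions[i]["boxes"] += rj["boxes"]
        regions[i]["sum"] += rj["sum"]
        regions[i]["n"] += rj["n"]
```

A typical call would be `merge(vol, split(vol, (0, T, 0, H, 0, W), max_mse), max_cost)` for a volume of `T` frames of `H` by `W` pixels. The brute-force pair search keeps the sketch short; a practical implementation would maintain the region adjacency graph and a priority queue of link costs instead.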

© 2002 by CRC Press LLC

[Block diagrams for Figs. 28.26 and 28.27 not reproduced. Labels recovered from the figures: Gabor Transform, Quantization, Inverse Quantizer, Inverse Gabor Transform, Frame Delay, and summing nodes (Σ); signals f_N, f_DN, f_PN, f_RN, Gabor(f_DN) + n_N, and f_DN + n′_N.]
