
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 15, NO. 3, MARCH 2006 713

A Modified SPIHT Algorithm for Image Coding With a Joint MSE and Classification Distortion Measure

Shaorong Chang and Lawrence Carin, Fellow, IEEE

Abstract—The set partitioning in hierarchical trees (SPIHT) algorithm is an efficient wavelet-based progressive image-compression technique, designed to minimize the mean-squared error (MSE) between the original and decoded imagery. However, the MSE-based distortion measure is not in general well correlated with image-recognition quality, especially at low bit rates. Specifically, low-amplitude wavelet coefficients that may be important for classification are given low priority by conventional SPIHT. In this paper, we use the kernel matching pursuits (KMP) method to autonomously estimate the importance of each wavelet subband for distinguishing between different textures, with textural segmentation first performed via a hidden Markov tree. Based on subband importance determined via KMP, we scale the wavelet coefficients prior to SPIHT coding, with the goal of minimizing a Lagrangian distortion based jointly on the MSE and classification error. For comparison we consider Bayes tree-structured vector quantization (B-TSVQ), also designed to obtain a tradeoff between MSE and classification error.
The performances of the original SPIHT, the modified SPIHT, and B-TSVQ are compared.

Index Terms—Classification, hidden Markov tree (HMT), image segmentation, set partitioning in hierarchical trees (SPIHT), vector quantization (VQ).

I. INTRODUCTION

WHEN performing compression at relatively low bit rates, there is in general information lost between the original image and that recovered after decoding. Most compression schemes are based on minimizing the mean-square error (MSE) between the original and compressed imagery. While this is a natural direction in many applications, there are problems for which one will ultimately make a classification decision based on the decoded imagery. For example, in medical-image compression, for transmission or storage, an expert will often make a diagnosis based on the decoded imagery [3]. In remote sensing, one often collects very large quantities of data (e.g., infrared or synthetic-aperture-radar imagery), necessitating low-bit-rate compression. In the remote-sensing problem, humans will also often make decisions based on the decoded imagery. It is therefore desirable to encode the original imagery in a manner that accounts for the ultimate classification task, this motivating consideration of non-MSE distortion measures and, hence, modification of the associated encoders/decoders.

Manuscript received June 4, 2004; revised February 15, 2005. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Fernando M. B.
Pereira. The authors are with the Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708-0291 USA (e-mail: chshrong@ee.duke.edu; lcarin@ee.duke.edu). Digital Object Identifier 10.1109/TIP.2005.860595

It should be noted that, in a related application, coding algorithms developed for acoustic data have been modified to accentuate those frequencies that play an important role in human hearing [11]. Related work in perceptually weighted quantization has demonstrated significant improvement in visual quality [25]. In the work presented here, we extend these ideas to emphasize wavelet coefficients of importance to image-based classification.

The idea of developing encoders/decoders that account for both MSE and classification performance has been considered previously in the context of vector quantization (VQ) [15], [17]. Specifically, rather than simply basing VQ on a squared-distance distortion measure, a Lagrangian distortion measure was developed in which one can adjust the relative importance placed on MSE and Bayes risk. This results in a Bayes-VQ algorithm, implemented efficiently (and approximately) via a Bayes tree-structured vector quantization (B-TSVQ) formalism [15].
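The Lagrangian idea can be illustrated with a toy encoder: each codeword is scored by its squared error plus a multiplier times an estimated misclassification risk. This is only a sketch of the general principle; the names (`codebook`, `risk`) and the numbers are illustrative, not taken from the B-TSVQ of [15].

```python
import numpy as np

# Hypothetical sketch of a Lagrangian (Bayes-VQ-style) distortion measure:
# each codeword is scored by squared error plus lam times an estimated
# misclassification risk, and the cheapest codeword wins.

def encode_block(x, codebook, risk, lam):
    """Pick the codeword index minimizing MSE + lam * estimated Bayes risk."""
    mse = np.sum((codebook - x) ** 2, axis=1)   # squared distance to each codeword
    cost = mse + lam * risk                     # joint Lagrangian distortion
    return int(np.argmin(cost))

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
risk = np.array([0.9, 0.0])   # codeword 0 often causes misclassification
x = np.array([0.4, 0.4])      # nearer codeword 0 in MSE alone
print(encode_block(x, codebook, risk, 0.0))  # pure MSE choice
print(encode_block(x, codebook, risk, 2.0))  # risk-weighted choice
```

Raising `lam` shifts the encoder from pure MSE toward classification-aware choices, which is the tradeoff the Lagrangian multiplier controls throughout this paper.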
The B-TSVQ is particularly well suited for medical imaging applications, for which one may have a large amount of training data (to learn the codebook) and the statistics of the imagery to be compressed are relatively stationary (e.g., for a given part of the anatomy).

By contrast, in remote-sensing applications it is often difficult to predict a priori what imagery may be encountered, and therefore design of a robust codebook will often be challenging. Moreover, it is computationally expensive and requires a large bit budget to adaptively augment the codebook as new data are acquired. This limitation of VQ-based compression schemes has motivated the recent focus on wavelet-based compression, such as via SPIHT [19], these algorithms typically not requiring an a priori codebook. The SPIHT algorithm prioritizes all of the wavelet coefficients in a given image simultaneously, and in this sense it has some connection to VQ (block-like coding). However, zero-tree-type encoding is employed for the large number of small-amplitude wavelet coefficients, and an embedded scalar quantizer is employed for the "important" coefficients.

For the applications of interest here, the principal limitation of SPIHT is that the associated importance map, which delineates the significant wavelet coefficients, is based on MSE, and therefore priority is given to the large-amplitude wavelet coefficients. While the small-amplitude coefficients are of relatively diminished importance for MSE distortion, they may be of significant importance in a classification task.
Moreover, large-amplitude coefficients which may be unimportant for classification are given priority by SPIHT. In the work presented here we extend SPIHT such that it places importance on those wavelet coefficients that are of importance for classification, with the relative importance placed on MSE and classification accommodated using a Lagrangian distortion measure, analogous to that employed in B-TSVQ.

1057-7149/$20.00 © 2006 IEEE

Fig. 1. Overall modified SPIHT (MSPIHT) coding scheme. A scaling step is added before SPIHT coding to balance the classification error and the reconstruction error.

The rescaling of coefficients is a well-known technique for adjusting their relative importance prior to encoding. Therefore, the principal challenge considered here involves developing a technique to autonomously and adaptively learn the appropriate wavelet-coefficient scalings, accounting for a Lagrangian distortion measure of the type discussed above. The difficulty is that we do not want to make any assumptions about what may be important for a classification task, since in many applications (e.g., remote sensing) the imagery is often too variable. We, therefore, proceed under the assumption that segmenting the image into distinct textural classes will play an important role in a subsequent classification task. For example, in remote-sensing classification, an anomaly or target (to be distinguished within a classification stage) is typically defined by its textural contrast relative to the texture of the background. Therefore, the first stage of our algorithm is to segment the image into textures, with the number of textures determined adaptively via a minimum-description-length (MDL) framework [26]. Once the segmentation is performed, a second algorithm is employed to scale the importance of the wavelet coefficients in the context of realizing this segmentation.
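The scaling idea itself is simple and invertible, as the following sketch shows; `weights` stands in for the per-subband importance factors that are later derived via KMP, and all names and values here are illustrative.

```python
import numpy as np

# Minimal sketch of the scaling idea: wavelet coefficients deemed important
# for classification are amplified before an MSE-driven encoder sees them,
# and the known scaling is divided back out after decoding.

def scale_coeffs(coeffs, weights):
    return coeffs * weights          # boost classification-relevant subbands

def unscale_coeffs(scaled, weights):
    return scaled / weights          # decoder inverts the known scaling

coeffs = np.array([10.0, 0.5, 3.0])   # one coefficient per subband (toy)
weights = np.array([1.0, 8.0, 1.0])   # subband 1 matters for classification
scaled = scale_coeffs(coeffs, weights)
print(scaled)                          # the 0.5 coefficient now ranks higher
```

Because the scale factors are known at the decoder, the round trip is lossless apart from whatever quantization the encoder applies in between.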
The MSE is also accounted for, in a Lagrangian setting, with the relative importance placed on classification and MSE dictated by the chosen Lagrange multiplier. Note that accounting for MSE in addition to importance for classification is necessary because the final decoded image will likely be viewed by a human, who may not wish to rely overly on the results of the (likely imperfect) classifier at the encoder.

After performing segmentation and determining the relative importance of the wavelet coefficients, the coefficients are scaled appropriately, with the coefficient-dependent scale factor sent to the decoder. The encoding and decoding procedure then employs a slightly modified form of SPIHT, as detailed below. The overall coding scheme is shown in Fig. 1.

Since wavelets are to be employed in the subsequent SPIHT-based encoder/decoder, we employ a wavelet-based segmentation algorithm. Hidden Markov trees (HMTs) are well suited to capturing the multiscale statistical dependence of wavelet coefficients [4]. We propose an unsupervised image segmentation method using an HMT mixture model, the parameter estimation problem for which is solved by a generalized expectation-maximization (EM) algorithm [1]. The posterior probability distribution across the mixture components yields the image segmentation. The segmentation is performed autonomously at the encoder.

After performing segmentation into textures, the final step prior to encoding is to scale the wavelet coefficients based on their importance for classification, while also accounting in a Lagrangian sense for MSE. This is effected by implementing kernel matching pursuits (KMP) [24].
The wavelet coefficients are scaled based on the KMP results, and then encoded using a modified version of SPIHT, as detailed below. We compare the performance of B-TSVQ and modified SPIHT.

The remainder of the paper is organized as follows. In Section II, we present the definition of the HMT mixture model along with the expectation-maximization (EM) training algorithm. We consider in Section III an additive regression model to estimate the importance of the wavelet coefficients for texture recognition, employing a KMP solution. In Section IV, the modified SPIHT coding scheme is discussed. Typical results of the algorithms are presented in Section V, with conclusions presented in Section VI.

II. IMAGE SEGMENTATION

Image segmentation is a fundamental low-level operation in image analysis for object identification. The encoding strategy adopted here is based on a wavelet decomposition of the image, and, therefore, we utilize a wavelet-based segmentation procedure. To account for the variability of anticipated imagery, the segmentation algorithm autonomously determines the number of textures as well as their statistical characteristics.

A. HMT-Based Block Classification

The image is analyzed with a two-dimensional wavelet transform, employing a one-dimensional transform in each of the two principal directions. For a given number of wavelet levels we yield a contiguous set of wavelet quadtrees, each corresponding to a block in the original image. Based on the persistence statistical property of wavelet coefficients, which observes that large (or small) wavelet coefficient magnitudes tend to propagate through the scales corresponding to the same spatial location, Crouse et al.
[4] have introduced the wavelet-domain HMT model to capture the joint wavelet statistics.

The HMT models the marginal probability density function (pdf) of each wavelet coefficient as a Gaussian mixture with a hidden state variable. It assumes that the key dependency between the hidden state variables of the wavelet coefficients is


tree-structured and Markovian, tied to the wavelet quadtree, employing a state transition matrix to quantify the degree of persistence between scales. The iterative EM algorithm for HMTs [4] was proposed to train the parameters (the mixture density parameters and the probabilistic graph transition probabilities) to match the data in the maximum likelihood (ML) sense. The trained HMT provides a good approximation to the joint probability of the wavelet coefficients, yielding good classification performance.

The conventional HMT training process requires the availability of labeled imagery. Specifically, data must be provided for each texture class, followed by HMT training. The trained HMTs can then be applied to segment new imagery that might be observed, assuming that this new imagery is characterized by similar textures. In our problem we assume little or no knowledge of the anticipated image textural properties, and, therefore, determination of the textural classes is performed jointly with HMT training.

B. HMT Mixture Model

We assume that the statistics of the wavelet coefficients from the overall image may be represented as a mixture of HMTs, analogous to the well-known Gaussian mixture model (GMM) [1]. The probabilistic model for K mixture components is given by

f(w | Θ) = Σ_{k=1}^{K} α_k f(w | θ_k)

where w are the wavelet coefficients of a wavelet tree, α_k, k = 1, ..., K, are the mixing coefficients of the K textures, which may also be interpreted as the prior probabilities, and θ_k represents the parameters of the kth HMT component. The vector Θ represents the cumulative set of model parameters, specifically α_k and θ_k, k = 1, ..., K. We employ an iterative training procedure, analogous to that found in GMM design [1].
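The responsibility and mixing-weight updates of such a mixture EM loop can be sketched with a minimal Gaussian-mixture analogue: the update formulas have the same shape whether each component is a Gaussian or an HMT. All names below are illustrative; a real HMT component would replace `component_pdf` with the upward-downward likelihood evaluation of [4].

```python
import numpy as np

# Minimal Gaussian-mixture analogue of one EM iteration for a mixture model:
# E-step computes per-sample component responsibilities, M-step re-estimates
# the mixing coefficients and (responsibility-weighted) component parameters.

def component_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def em_step(x, alpha, mu, var):
    # E-step: responsibility of component k for each sample
    lik = np.stack([a * component_pdf(x, m, v) for a, m, v in zip(alpha, mu, var)])
    resp = lik / lik.sum(axis=0)
    # M-step: mixing coefficients = average responsibility
    new_alpha = resp.mean(axis=1)
    # M-step: responsibility-weighted mean update (variance kept fixed here)
    new_mu = (resp * x).sum(axis=1) / resp.sum(axis=1)
    return new_alpha, new_mu, resp

x = np.array([-2.0, -1.8, 2.1, 1.9, 2.0])        # two obvious clusters
alpha = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])
alpha, mu, resp = em_step(x, alpha, mu, var)
print(alpha, mu)   # weights and means move toward the two clusters
```

A hard segmentation, as in the MAP rule used by the paper, is then simply `resp.argmax(axis=0)` per sample.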
Let α_k^t and θ_k^t represent the model parameters for mixture component k after iteration t. We estimate the probability that the ith wavelet tree w_i is generated by texture k (corresponding to the kth HMT) as

p(k | w_i, Θ^t) = α_k^t f(w_i | θ_k^t) / Σ_{l=1}^{K} α_l^t f(w_i | θ_l^t)    (1)

The parameters of each HMT are updated by an augmented form of the EM algorithm in [4]. In [4], the wavelet trees are each used separately within an "upward-downward" algorithm to update the parameters of each individual HMT. On each iteration, HMT model parameters are initiated using the parameters from the previous step. Let λ_{k,i}^{t+1} represent an arbitrary parameter from θ_k, updated using wavelet tree w_i alone. Then, the associated cumulative parameter, based on all N wavelet trees, is expressed as

λ_k^{t+1} = [ Σ_{i=1}^{N} p(k | w_i, Θ^t) λ_{k,i}^{t+1} ] / [ Σ_{i=1}^{N} p(k | w_i, Θ^t) ]    (3)

Equation (3) represents an approximation to the conditional expectation of the parameter update, computed using the probabilities in (1). The samples that are associated with texture k with higher likelihood make a greater contribution to the parameters of that texture component. The cumulative set of HMT parameters θ_k^{t+1}, e.g., state-transition probabilities, state-dependent parameters, etc., define the overall set of parameters for the kth HMT.

The mixing coefficients are updated as

α_k^{t+1} = (1/N) Σ_{i=1}^{N} p(k | w_i, Θ^t)    (4)

where we have again assumed N wavelet trees.

For initialization, we use all the data with equal probability to train an initial (single) HMT with parameters θ_0. We then cluster the data into K (scalar) Gaussian mixtures (denoting textures) based on f(w_i | θ_0), assuming that the data from the same texture have similar probability values. In this manner, we assign the initial probabilities p(k | w_i, Θ^0). The same idea has been applied to effectively use unlabeled sequential data in learning hidden Markov models [10], with this termed the extended Baum-Welch (EBW) algorithm.

We segment the image via a maximum a posteriori (MAP) estimator, that is

k̂_i = argmax_k p(k | w_i, Θ)    (5)

where Θ represents the HMT model parameters after convergence is achieved for the aforementioned training algorithm. The parameter K, representing the number of textures in the image, is selected autonomously via an information-theoretic model-selection method called the MDL principle, derived by Rissanen [18]. The MDL principle states that the best model is the one that minimizes the summed description length of the model and the likelihood of the data with respect to the model, making a trade-off between model accuracy and model succinctness [26]. In our case, we calculate the MDL value as

MDL(K) = -Σ_{i=1}^{N} log f(w_i | Θ_K) + (p_K / 2) log N    (6)

where the first term denotes the accuracy of the mixture model, the second term reflects the model complexity, and p_K is the number of free parameters estimated. We choose K to minimize (6).

III. QUANTIZATION BINS

The purpose of rescaling the wavelet coefficients is to help the encoder order the output bit stream with consideration of the ultimate recognition task, with the balance between MSE and segmentation performance driven by a Lagrangian metric. Wavelet coefficients that play an important role in defining the segmented texture class labels (determined automatically, as discussed in Section II) should be represented by a relatively


large number of bits, even if the associated wavelet coefficient is small. In the context of an algorithm such as SPIHT, bits are sent sequentially for each wavelet coefficient, with the number of bits associated with a given wavelet coefficient dictated by the coefficient's "importance," defined in conventional SPIHT by its amplitude. SPIHT exploits the fact that many of the wavelet coefficients are of small amplitude, yielding efficient coding via importance maps and zero trees. Since SPIHT is a state-of-the-art algorithm, rather than developing an entirely new technique for encoding wavelet coefficients while accounting for MSE and classification, we modify the wavelet-coefficient amplitudes such that they may then be applied directly in a slightly modified form of SPIHT. Specifically, we develop an algorithm that scales the wavelet coefficients based on their importance to the aforementioned Lagrangian cost function. Wavelet coefficients that are important, as defined by this metric, are scaled to have larger relative amplitudes, such that they are given priority by SPIHT. Similarly, wavelet coefficients that are deemed to be less important, as defined by the Lagrangian metric, are scaled to have lower relative amplitudes. This rescaling modifies the importance placed on the wavelet coefficients (beyond simply basing importance on the original amplitude of the coefficient), after which SPIHT proceeds with only minor changes.

Before proceeding we note an issue that must be examined when presenting results. The efficiency of the original SPIHT algorithm is dictated by the persistence property of the wavelet coefficients with scale.
Specifically, small-amplitude coefficients tend to be clustered as a function of scale, and these can be efficiently coded with zero trees or importance maps. The aforementioned rescaling has the potential of disturbing the zero trees, which are exploited so effectively via SPIHT. Using actual measured remote-sensing imagery, we demonstrate below that the rescaling still yields a highly efficient modified-SPIHT algorithm.

A. Objective Function

The KMP [24] algorithm is employed to prioritize the importance of the wavelet coefficients. The KMP algorithm is an extension of matching pursuits [14] to kernels. Here, we employ a kernel that selects wavelet coefficients, and, therefore, this is a special case of KMP, essentially reducing to matching pursuits.

Assume we have a vector w of wavelet coefficients from a wavelet tree, and we wish to develop a classifier that, based on w, predicts the texture label y (e.g., y ∈ {-1, +1} for two textures). Note that we performed a similar task with HMT-based segmentation. The distinction is that here we are not simply interested in segmenting the data; we are interested in quantifying which coefficients in the wavelet tree are most important for this task. The KMP classifier [24] is formulated as

f(w) = Σ_{i=1}^{d} β_i w_i    (7)

where d is the dimension of the wavelet vector w and β_i is the corresponding kernel weight, reflecting the importance of wavelet coefficient w_i to the cost function. Because of the sparsity of the KMP classifier, most of the β_i are equal to zero.

Our objective is to minimize the cost function

J = Σ_{n=1}^{N} [ ||w_n - ŵ_n||² + λ (y_n - f(ŵ_n))² ]    (8)

where ŵ_n are the quantized wavelet coefficients associated with the nth wavelet tree, and the scalar λ is a Lagrange multiplier that constitutes a compromise between the quantization error and the classification error.
This optimization is performed under the constraint that the entropy of the quantized coefficients is constant, specifically

Σ_{i=1}^{d} R_i = R_c    (9)

where R_i is the entropy of the ith wavelet coefficient.

For the examples reported below, we have found that the linear algorithm in (7) yields good results. However, in general, it may be desirable to design an algorithm that is nonlinear in feature space, based for example on a kernel algorithm. For such purposes one may employ a kernel-based technique that allows general nonlinear decision surfaces in feature space, while simultaneously allowing one to determine the importance of the respective features. The interested reader is referred to [13] for details on such an approach.

B. Uniform Scalar Dead-Zone Quantization

The uniform dead-zone quantizer [21] quantizes coefficients with magnitude less than the threshold T to zero. The coefficients outside of [-T, T] are uniformly quantized with step size Δ. That is

Q(x) = 0 for |x| < T;  Q(x) = sign(x) [ T + (⌊(|x| - T)/Δ⌋ + 1/2) Δ ] otherwise    (10)

Since small variations in input signals around zero are usually caused by noise, quantization with a dead zone around zero eliminates the noise around zero and improves the signal quality. For this reason, this quantizer is widely used in image compression. Part I of the JPEG2000 standard includes uniform scalar dead-zone quantization, where the dead zone is equal to two times the other quantization bin sizes [that is, T = Δ, as shown in (11)]. The SPIHT algorithm progressively quantizes the wavelet coefficients using a double dead-zone uniform scalar quantizer, with the quantization bin size reducing to one half of that of the previous iteration

Q(x) = 0 for |x| < Δ;  Q(x) = sign(x) (⌊|x|/Δ⌋ + 1/2) Δ otherwise    (11)

In this study, we use the double dead-zone uniform quantizer to quantize the wavelet coefficients. We model the quantization step as adding noise to the original signal

ŵ_i = w_i + q_i    (12)

where q_i = Q(w_i) - w_i is the quantization noise.
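The double dead-zone quantizer of (11) can be sketched in a few lines; the function name and the uniform test signal are ours, and midpoint reconstruction is assumed. In the high-resolution regime the empirical error variance comes out close to the classical Δ²/12 rule of thumb.

```python
import numpy as np

# Sketch of the double dead-zone uniform quantizer of (11): inputs with
# |x| < delta map to zero; larger inputs fall into uniform bins of width
# delta and are reconstructed at the bin midpoint.

def dead_zone_quantize(x, delta):
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    big = np.abs(x) >= delta
    out[big] = np.sign(x[big]) * (np.floor(np.abs(x[big]) / delta) + 0.5) * delta
    return out

rng = np.random.default_rng(1)
x = rng.uniform(-100.0, 100.0, size=200_000)
delta = 0.5                                   # small bin: high-resolution regime
q = dead_zone_quantize(x, delta) - x          # quantization noise
print(np.mean(q ** 2), delta ** 2 / 12)       # empirical vs. nominal error variance
```

Halving `delta` on each pass, as SPIHT effectively does, quarters this error variance while costing roughly one extra bit per surviving coefficient.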
For a high-resolution uniform quantizer with a quantization bin size of Δ, the mean-square quantization error of the quantized variable is

E{q²} = Δ²/12    (13)
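The greedy fit used to obtain the sparse weights β_i of (7) reduces, for coordinate "kernels", to plain matching pursuits. The sketch below is ours: synthetic data, illustrative names, and a fixed number of greedy steps rather than the stopping rule of [24].

```python
import numpy as np

# Matching pursuits over coordinate "kernels": greedily pick the coefficient
# index most correlated with the current residual and assign it a weight,
# leaving all other beta_i at zero (the sparsity noted in the text).

def matching_pursuit(W, y, n_terms):
    """W: (n_samples, d) coefficient vectors; y: labels; returns sparse beta."""
    beta = np.zeros(W.shape[1])
    resid = y.astype(float).copy()
    for _ in range(n_terms):
        corr = W.T @ resid                       # correlation with residual
        i = int(np.argmax(np.abs(corr)))         # best single coordinate
        step = corr[i] / (W[:, i] @ W[:, i])     # least-squares step size
        beta[i] += step
        resid -= step * W[:, i]                  # update residual
    return beta

rng = np.random.default_rng(0)
W = rng.normal(size=(50, 6))
y = 2.0 * W[:, 2] - 1.0 * W[:, 4]     # only coordinates 2 and 4 matter
beta = matching_pursuit(W, y, n_terms=4)
print(np.round(beta, 3))              # weight concentrates on coords 2 and 4
```

The resulting near-zero entries of `beta` are exactly the coefficients whose quantization bins the paper leaves driven by MSE alone.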


Fig. 2. The relationship between E{q_i²} and Δ_i is modeled as (14). Results are shown for three wavelet coefficients, corresponding to the data in Fig. 4.

Fig. 3. The relationship between R_i and log Δ_i can be modeled as (15). Results are shown for three wavelet coefficients, corresponding to the data in Fig. 4.

In the high-resolution region, it is well known that an optimal transform code based on scalar quantization should apply a uniform scalar quantizer with the same bin size for all coefficients. The SPIHT algorithm applies such a procedure. However, this optimality is based on a mean-squared error (MSE) distortion measure (as well as the asymptotic assumption). In the work presented here, we first weight the wavelet coefficients based on their importance for a joint (Lagrangian) MSE and classification-based distortion measure. After performing the weighting, SPIHT is employed with minimal change. While this results in uniform quantization bins for the scaled coefficients, we implicitly employ nonuniform quantization in the original coefficients. The bin size is no longer the same for all wavelet coefficients, and our task is to determine the bin size Δ_i for the ith coefficient. This is addressed in the next section.

1) Quantization Error Versus Quantization Bin Size: We model the mean-square quantization error of the double dead-zone quantizer as a function of Δ_i as

E{q_i²} = c_i Δ_i^{γ_i}    (14)

where, typically, γ_i ≈ 2 [compare to (13)], and c_i depends on the distribution of the ith wavelet coefficient.
Given an image, we compute E{q_i²} as a function of Δ_i and perform a least-squares fit to determine c_i and γ_i in (14). In Fig. 2, we compare the true E{q_i²} and the regression values obtained by (14) for three different wavelet coefficients. We can see that the model fits the data well in the range of interest.

2) Entropy Versus Quantization Bin Size: The entropy of the quantized data is also related to the quantization bin size Δ_i. Intuitively, as Δ_i increases the entropy decreases. The entropy reduces to zero as Δ_i reaches a value such that all the data are quantized to zero. It is appropriate to use a logarithmic model to fit this relationship, as shown in Fig. 3. Specifically

R_i = a_i - b_i log₂ Δ_i    (15)

where a_i and b_i are constants to be determined. This is a generalization of a uniform scalar quantizer, for which

R = h(w) - log₂ Δ    (16)

where h(w) is the differential entropy of the original continuous random variable that is quantized. We rewrite the constraint of (9) as

Σ_{i=1}^{d} (a_i - b_i log₂ Δ_i) = R_c    (17)

C. Iterative Algorithm to Find the Optimal Quantization Bin Sizes

1) Initialization: We set all the quantization bin sizes equal to a constant

Δ_i = Δ_0, i = 1, ..., d    (18)

2) Quantization: We quantize each wavelet coefficient using a double dead-zone uniform quantizer, defined in (11), with quantization step size Δ_i

ŵ_i = Q(w_i, Δ_i)    (19)

3) Classifier Design: Given the quantized data, the objective function in (8) is minimized by using the KMP algorithm. In this step, we determine the weights β_i.

4) Quantization Bin Size Update: Given the β_i, the quantization error model [(12) and (14)], and the entropy model (15), the objective function is minimized by using the Lagrangian method

J' = Σ_{i=1}^{d} c_i Δ_i^{γ_i} + λ Σ_{i=1}^{d} β_i² c_i Δ_i^{γ_i} + μ Σ_{i=1}^{d} (a_i - b_i log₂ Δ_i)    (20)

In (20), the first two terms implement (8) for fixed β, and the last term implements the constraint in (17). By setting ∂J'/∂Δ_i = 0, we obtain

Δ_i^{γ_i} = μ b_i / [γ_i c_i (1 + λ β_i²) ln 2]    (21)
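Both model fits in steps 1) and 2) above are linear least-squares problems after a log transform. The sketch below uses synthetic measurements that obey (14) and (15) exactly, with made-up constants, so the fits recover them.

```python
import numpy as np

# Fitting the two quantizer models:
#   (14)  E{q^2} = c * delta**gamma  ->  log E{q^2} = log c + gamma * log delta
#   (15)  R = a - b * log2(delta)        (already linear in log2(delta))

deltas = np.array([0.25, 0.5, 1.0, 2.0, 4.0])
mse = 0.09 * deltas ** 2.0           # synthetic measurements obeying (14)
rate = 6.0 - 1.0 * np.log2(deltas)   # synthetic measurements obeying (15)

# Fit (14): slope = gamma, intercept = log c
gamma, log_c = np.polyfit(np.log(deltas), np.log(mse), 1)
# Fit (15): slope = -b, intercept = a
neg_b, a = np.polyfit(np.log2(deltas), rate, 1)

print(gamma, np.exp(log_c))   # ≈ 2.0 and 0.09
print(a, -neg_b)              # ≈ 6.0 and 1.0
```

With real coefficient histograms the fit is only approximate, which is why the paper checks the regression quality in Figs. 2 and 3 before relying on (14) and (15) in the bin-size update.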


and from (17)

log₂ μ = [ Σ_{i=1}^{d} a_i - R_c - Σ_{i=1}^{d} (b_i/γ_i) log₂( b_i / (γ_i c_i (1 + λ β_i²) ln 2) ) ] / Σ_{i=1}^{d} (b_i/γ_i)    (22)

We repeat steps 2)-4) until the reduction of J' is smaller than a threshold.

The importance weight applied to the ith wavelet coefficient is defined as

s_i = 1/Δ_i    (23)

From (23), we can see that those wavelets with large β_i, corresponding to wavelet coefficients deemed important for the joint MSE-classification cost function, require finer quantization bins, thus corresponding to a large importance weight. With λ increasing, we assign smaller quantization bins to those wavelet coefficients deemed important for classification.

IV. MODIFIED SPIHT

A. Review of the SPIHT Algorithm

Set partitioning in hierarchical trees (SPIHT), proposed by Said and Pearlman [19], is one of the most efficient image compression algorithms. The effectiveness of the SPIHT algorithm originates from the efficient subset partitioning and the compact form of the significance information. The SPIHT algorithm defines spatial orientation trees, sets of coordinates, and recursive set partitioning rules [19]. The algorithm is composed of two passes: a sorting pass and a refinement pass. It is implemented by alternately scanning three ordered lists: the list of insignificant sets (LIS), the list of insignificant pixels (LIP), and the list of significant pixels (LSP). The LIP and LIS contain the individual coordinates and the sets of coordinates, respectively, for wavelet coefficients that are less than a threshold.
During the sorting pass, the significance of the LIP and LIS entries is tested, followed by removal (as appropriate) to the LSP and by set-splitting operations to maintain the insignificance property of the lists. In the refinement pass, the nth most significant bits of the entries in the LSP, which contains the coordinates of the significant pixels, are scanned and output. The SPIHT algorithm reduces the threshold and repeats the two passes until the bit budget is met.

Recently, many modifications have been made to the SPIHT algorithm, SPIHT-based coding representing a very active area of research. For example, Pearlman et al. proposed a set-partitioning embedded block coding algorithm [16] to extend SPIHT to block-based image coding. SPIHT has also been modified [6] for real-time image and video transmission using optimal error protection. In addition, in [12], an efficient color image compression algorithm has been proposed based on the SPIHT algorithm.

B. Modified SPIHT Algorithm

At first thought, the weighted wavelet coefficients, which have a larger dynamic range than the original ones, may make the SPIHT algorithm inefficient, because this may result in more scans of the wavelet coefficients. In fact, the performance of the modified SPIHT with λ set to zero is slightly better than
This is because we optimize the quantization bin size under the constraint that the average entropy of the quantized wavelet coefficients does not change, which assigns the larger quantization bins to the wavelet coefficients deep in the tree. We demonstrate the importance weights and the R(D) performance using high-altitude aerial imagery with two textures (see Fig. 4). This is a grayscale high-altitude optical aerial image of size 256 × 256 pixels from the USC-SIPI image database, with 8 bits/pixel. In the remainder of the text, we will return to the data in Fig. 4 to analyze various aspects of our algorithm and to provide comparisons to other approaches. The results presented here are representative of the performance we have observed in an extensive study of the full USC-SIPI database.

The importance weight is the reciprocal of the quantization bin size obtained via the algorithm proposed in Section III-C, as shown in Fig. 5, with i = 1 representing the first LL wavelet coefficient and the others sequentially representing LH, HL, or HH wavelet coefficients. The larger the index i, the deeper the corresponding wavelet coefficients reside in the spatial orientation tree defined in SPIHT. We compare the MSE at different bit rates for SPIHT and modified SPIHT (MSPIHT) with λ = 0. As shown in Fig. 6, MSPIHT with λ = 0 has a slightly smaller MSE than the original SPIHT at low bit rates.

In these and subsequent experiments, we choose the Daubechies 9/7 biorthogonal wavelet transform, because it is symmetric, almost orthogonal, and gives the best results for dyadically sampled images [23].
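The weighting step itself (and its inversion at the decoder, which must receive the weights) amounts to an elementwise scaling of the coefficients before SPIHT coding. A minimal sketch, with function names of our own choosing:

```python
def weight_coefficients(coeffs, bin_sizes):
    """Scale each wavelet coefficient by its importance weight w_i = 1/Delta_i
    (the reciprocal of its quantization bin size) before SPIHT coding."""
    return [c / d for c, d in zip(coeffs, bin_sizes)]

def unweight_coefficients(weighted, bin_sizes):
    """Decoder side: undo the weighting before the inverse wavelet transform."""
    return [c * d for c, d in zip(weighted, bin_sizes)]
```

A coefficient with a small bin size (high classification importance) is magnified, so the SPIHT bit-plane scan reaches it earlier; the decoder divides the weights back out before inverting the wavelet transform.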
Since the wavelet is symmetric, we use a nonexpansive symmetric-extension wavelet transform to handle the boundary problem [2].

As λ increases, we modify the quantization bin sizes to better fit the classification task. This will deteriorate the zero-tree coding efficiency. However, the sparseness of KMP implies that if only a small number of coefficients are important for classification, only a small number of wavelet-coefficient bins will be inconsistent with SPIHT. Specifically, examining (20) reveals that when the classification term vanishes for the ith wavelet coefficient, the corresponding bin size is explicitly driven only by the conventional MSE. However, all bin sizes are determined jointly, based on the


overall entropy constraint defined by the last term in (20). In this manner, all bin sizes are directly or indirectly affected by the classification objective. In Section V, we present several results to explain the performance of the modified-SPIHT algorithm for λ > 0.

Fig. 5. Importance weights of MSPIHT (λ = 0) based on 8 × 8 blocks. Results are shown for the data in Fig. 4. i = 1 represents the LL wavelet coefficient and the others represent LH, HL, or HH wavelet coefficients from coarse to fine scale.

Fig. 6. MSE compared between SPIHT and MSPIHT (λ = 0) using the importance weights shown in Fig. 5. Results are shown for the data in Fig. 4.

Fig. 7. MDL value as a function of the texture number. We set the number of textures to M = 2 by choosing the value with minimum MDL. Results are shown for the data in Fig. 4.

Fig. 8. Posterior probability estimated using the 2-HMT-mixture model for the data in Fig. 4. The dark regions are characterized as being less urban.

V. EXPERIMENTS AND RESULTS

A. Automatic Image Segmentation

We again consider the data from Fig. 4. After a three-level wavelet decomposition, we obtain wavelet trees of size 64. We train the parameters of the 2-HMT-mixture model by the EM algorithm described in Section II. We set the number of textures to M = 2 based on the MDL criterion (see Fig. 7). In Fig. 8, we show the posterior probabilities of one texture component. We see that most of the urban areas are bright, which represents higher likelihood, and the rural areas are darker, indicating lower likelihood.
The segmentation is consistent with human visual recognition.

While the segmentation results presented above are interesting, in that they correspond to an actual image, it is useful to consider an example for which the data is simulated and the number of textures is known (i.e., there is no subjectivity in assigning texture labels). We have, therefore, synthesized M = 3 textures. To synthesize these three textures, we use three different HMT models (the model parameters include the initial-state probability vectors of the root nodes for the LH, HL, and HH bands, the state-transition matrices from parent nodes to children nodes, the mean and standard deviation of each state for each wavelet coefficient, and the LL-band probability-distribution parameters). We first synthesize HMT state sequences using the initial-state and state-transition probabilities. Given the state sequences and the state-dependent probability-distribution parameters (Gaussian distributions), we synthesize blocks of LH, HL, and HH wavelet coefficients by a random generation process (driven by the HMT state-dependent statistics). The LL coefficients are synthesized separately according to their probability distributions. By an inverse wavelet transform, we obtain the three synthesized textures in the image domain. The example images are shown in Fig. 9, as is the MDL value as a function of M. It is observed that M = 3 is correctly predicted.
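The synthesis procedure just described (sample states down the quadtree, then draw coefficients from the state-conditional Gaussians) can be sketched as follows. This is a simplified two-state illustration with parameter names of our own choosing, not the paper's exact model:

```python
import random

def sample_hmt_tree(levels, pi, A, mu, sigma, rng=None):
    """Sample one wavelet quadtree from a two-state HMT.
    pi        - initial-state probabilities of the root node
    A         - parent-to-child state-transition matrix, A[parent][child]
    mu, sigma - per-state Gaussian mean / std of the coefficients
    Returns a list of (level, state, coefficient) tuples."""
    rng = rng or random.Random(0)

    def draw(probs):
        u, acc = rng.random(), 0.0
        for s, p in enumerate(probs):
            acc += p
            if u < acc:
                return s
        return len(probs) - 1

    out = []
    def recurse(level, parent_state):
        state = draw(pi) if parent_state is None else draw(A[parent_state])
        out.append((level, state, rng.gauss(mu[state], sigma[state])))
        if level + 1 < levels:
            for _ in range(4):       # quadtree: four children per node
                recurse(level + 1, state)
    recurse(0, None)
    return out
```

One such tree is drawn per LH, HL, and HH band (assumed statistically independent); the LL coefficients are drawn separately, and the inverse wavelet transform produces the texture image.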


Fig. 9. The top two figures and the bottom-left one are the synthesized textures using three different HMT models. The bottom-right figure plots the MDL value as a function of M using the mixture of the three textures. M = 3 textures gives the minimum MDL value.

As in all examples presented here, distinct HMTs are defined for each texture and for the LH, HL, and HH wavelet quadtrees. The different wavelet bands are assumed to be statistically independent. For these synthesized images, we have considered three wavelet levels, and the probability of correct HMT segmentation was 98.9%.

B. Robustness and Sparsity of KMP Classifier

Having demonstrated the utility of the automatic segmentation algorithm, using measured (Fig. 4) and synthesized (Fig. 9) data, we now address example results for weighting the wavelet coefficients based on their importance for segmentation and MSE (using algorithmic details from Section III). In this section, we examine the effectiveness of KMP for achieving this task. To demonstrate the ability of KMP to select "informative" predictor variables in the presence of irrelevant ones, we generate d-dimensional data vectors from two Gaussian classes, with means given in (24). In both cases, the covariance matrix is the identity matrix. Regardless of the dimensionality d, the Bayes error rate is 0.159. The optimal classifier is linear and uses only the first two dimensions of the data.
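A sketch of such a data generator follows. The specific means below are our own assumption (the paper's exact choice is given in (24)): they differ only in the first two dimensions and are a distance 2 apart, which reproduces the stated Bayes error Φ(−1) ≈ 0.159 regardless of d.

```python
import math, random

def make_data(n, d, rng=None):
    """Draw n samples per class from two d-dimensional Gaussians with
    identity covariance. Class means (an assumption here) are +/-m in the
    first two dimensions and 0 elsewhere, separated by distance 2."""
    rng = rng or random.Random(0)
    m = 1.0 / math.sqrt(2.0)
    means = ([-m, -m] + [0.0] * (d - 2), [m, m] + [0.0] * (d - 2))
    X, y = [], []
    for label, mean in enumerate(means):
        for _ in range(n):
            X.append([mu + rng.gauss(0.0, 1.0) for mu in mean])
            y.append(label)
    return X, y

def bayes_error():
    """Phi(-||mean1 - mean0|| / 2) for equal-covariance Gaussians;
    here the separation is 2, so the error is Phi(-1)."""
    return 0.5 * (1.0 + math.erf(-1.0 / math.sqrt(2.0)))
```

Because the irrelevant dimensions carry no class information, the Bayes error is unchanged as d grows; only the classifier's ability to ignore them is tested.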
The average error rate of the KMP method is compared with the Bayes error, the classification performance of the relevance vector machine (RVM) [22], and the performance of the linear least-squares regression model (LSRM) [9].

Fig. 10. Feature-selection ability of KMP: The classification error rates are compared among the RVM, KMP, Bayes, and LSRM classifiers as a function of the total feature dimension d defined in (24). Results are shown for the data synthesized in Section V-B.

The average errors with respect to different feature dimensions are plotted in Fig. 10. All three of these approaches yield a linear classifier in feature space, with the RVM weights [22] applied directly to the feature components. More generally, a nonlinear RVM could be employed using a kernel [22], but in this case the sparseness would not be on the feature components; it is possible to extend the kernel-based RVM such that sparseness is manifested on the features [13], but this algorithm was


not found to be significantly better than the nonkernel RVM for the data considered here.

Fig. 11. Feature-selection ability of KMP: The weight amplitudes of the features are compared between the KMP and LSRM classifiers. Results are shown for the same data as used in Fig. 10.

Fig. 12. Classification errors as a function of MSE at bit rates of 0.2, 0.3, 0.4, 0.5, and 0.6 bpp are compared among MSPIHT (λ = 50000), MSPIHT (λ = 0), and the original SPIHT. Results are shown for the data in Fig. 4.

These results show that the KMP algorithm is robust in the presence of irrelevant feature variables, as compared to the linear least-squares regression model and also the RVM, a popular sparse classifier. We plot the weight amplitude of each feature dimension in Fig. 11. We see that most of the features are removed by a zero weight, and the first two features have the dominant weight values. For LSRM, the weight amplitudes of many irrelevant features are large, resulting in poor generalization performance. These results, and others like them, motivate using KMP to select features and, hence, to weight the importance of wavelet coefficients. We iteratively select the wavelet coefficients that reduce the label-regression error most and estimate the wavelet-coefficient weights jointly over all the selected coefficients.
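The select-then-refit loop just described can be sketched in a matching-pursuit style. This is a plain (non-kernel) illustration with names of our own choosing: each iteration greedily picks the feature most correlated with the current residual, then refits the weights of all selected features jointly by least squares.

```python
def matching_pursuit(X, y, k):
    """Greedy feature selection: pick the feature best correlated with the
    residual, then jointly refit all selected weights (back-fitting)."""
    n, d = len(X), len(X[0])
    selected, residual, w = [], y[:], []
    for _ in range(k):
        scores = [abs(sum(X[i][j] * residual[i] for i in range(n)))
                  for j in range(d)]
        best = max((j for j in range(d) if j not in selected),
                   key=lambda j: scores[j])
        selected.append(best)
        w = _lstsq([[X[i][j] for j in selected] for i in range(n)], y)
        fit = [sum(wj * X[i][j] for wj, j in zip(w, selected)) for i in range(n)]
        residual = [y[i] - fit[i] for i in range(n)]
    return selected, w

def _lstsq(A, b):
    """Solve the normal equations (A^T A) w = A^T b by Gaussian elimination."""
    m = len(A[0])
    G = [[sum(A[i][p] * A[i][q] for i in range(len(A))) for q in range(m)]
         for p in range(m)]
    c = [sum(A[i][p] * b[i] for i in range(len(A))) for p in range(m)]
    for p in range(m):                      # forward elimination
        for q in range(p + 1, m):
            f = G[q][p] / G[p][p]
            G[q] = [gq - f * gp for gq, gp in zip(G[q], G[p])]
            c[q] -= f * c[p]
    w = [0.0] * m
    for p in range(m - 1, -1, -1):          # back substitution
        w[p] = (c[p] - sum(G[p][q] * w[q] for q in range(p + 1, m))) / G[p][p]
    return w
```

In the paper's setting, the "features" are wavelet coefficients, and the fitted weights feed the importance weighting of Section III; the kernel variant [24] replaces the raw features with kernel evaluations.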
In the compression example presented below, we choose the first 10% most important wavelet coefficients and calculate the wavelet-coefficient weights.

In the last series of results, we have considered examples of automatic segmentation and the subsequent reweighting of the wavelet coefficients, based on their importance for a cost function that accounts for classification accuracy and MSE. We now put these two steps together, integrated with modified SPIHT, to examine overall algorithm performance.

C. Modified SPIHT Versus Original SPIHT

In our first example, we consider the measured imagery in Fig. 4. In Fig. 12, we compare the classification errors as a function of MSE. The definition of "truth" is guided by the results of the HMT segmentation, with a threshold of 0.5 applied to the results in Fig. 8, to yield the "true" segmentation used for scoring. For the modified SPIHT (MSPIHT) algorithm, we must first send the relative weighting of the wavelet coefficients (so that the decoder can undo the wavelet-coefficient weighting and perform an inverse wavelet transform to obtain an approximation to the original image). For the three levels considered here, this corresponds to 64 coefficients. These coefficients are sent using 4 bits per coefficient. When quantizing the image at 0.5 bpp, 0.8% of the total bits sent correspond to the wavelet-coefficient weights.

The MSPIHT results with λ = 0 correspond to accounting for MSE alone, as indicated in (8). We observe that MSPIHT with λ = 0 provides slightly better classification performance than SPIHT.
Recall that MSPIHT employs nonuniform quantization bins, even when λ = 0. Considering λ = 50000, for which significant importance is placed on classification performance, we note a marked reduction in classification error. The points on the curves in Fig. 12 correspond to 0.2, 0.3, 0.4, 0.5, and 0.6 bits per pixel (bpp). While λ = 50000 yields improved classification performance, we suffer in the context of MSE: the MSE at 0.6 bpp is comparable to that at 0.4 bpp, when comparing MSPIHT (λ = 50000) and SPIHT, respectively.

To avoid issues inherent to defining the "true" segmentation of measured imagery, and to have better control over the properties of the wavelet coefficients, we consider synthetic data. We now take an original texture (right half of Fig. 13), perform a wavelet decomposition, add white Gaussian noise (WGN) to wavelet coefficients in particular subbands, and then synthesize the image via an inverse wavelet transform. In this example, we consider three wavelet levels, adding WGN to half of the 16 finest LH wavelet coefficients. Since the coefficients with additive noise are the only ones that distinguish the statistics of the two synthesized textures, we expect these to be deemed important for the classification task. The average signal-to-noise ratio (SNR) for these wavelet coefficients is −3 dB, defined as 10 log10(P/σ²), where P is the average squared amplitude of the coefficients to which noise is added, and σ² is the noise variance. We compare the performance at bit rates of 0.2, 0.3, 0.4, 0.5, and 0.6 bpp, with results presented in Fig. 14.
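Injecting noise at a prescribed SNR under this definition amounts to choosing the noise variance from the coefficients' average squared amplitude. A minimal sketch (function name ours):

```python
import math, random

def add_noise_at_snr(coeffs, snr_db, rng=None):
    """Add white Gaussian noise to wavelet coefficients so that
    10*log10(P / var) = snr_db, where P is the average squared coefficient
    amplitude and var is the noise variance (the SNR definition above)."""
    rng = rng or random.Random(0)
    p = sum(c * c for c in coeffs) / len(coeffs)
    var = p / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(var)
    return [c + rng.gauss(0.0, sigma) for c in coeffs]
```

At −3 dB, the noise variance is roughly twice the average squared coefficient amplitude, so the injected texture difference is substantial relative to the signal.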
The MSPIHT algorithm with λ = 50000 demonstrates significantly improved classification performance, relative to the original SPIHT and to MSPIHT with λ = 0 (for which only MSE is considered).


Fig. 13. Gaussian noise at −3 dB SNR is added to the finest LH wavelet coefficients of the right-half texture, to obtain the synthetic texture shown in the left half.

Fig. 14. Classification errors as a function of MSE are compared among MSPIHT (λ = 50000), MSPIHT (λ = 0), and the original SPIHT. The bit rates are 0.2, 0.3, 0.4, 0.5, and 0.6 bpp. Results are shown for the data in Fig. 13.

Fig. 15. Gaussian noise at −3.5 dB SNR is added to the second-level LH wavelet coefficients of the right-half texture, to obtain the synthetic texture shown in the left half.

As indicated above, there is potential for disturbance of zero trees via the coefficient reweighting. This implies that the performance of the MSPIHT algorithm may be impacted by which particular wavelet coefficients are scaled to be large (where in the wavelet tree). In the previous example, the significant wavelet coefficients (to which noise was added) were at the finest level. We now consider a synthetic image as in the left half of Fig. 15, but now the white Gaussian noise is added to the coefficients in the second LH band (for three levels, as in the previous example). In this case, the SNR, as defined above, is −3.5 dB. Results are presented in Fig. 16. We again consider λ = 50000 for the MSPIHT results for which classification is emphasized. In Fig. 16, we note that MSPIHT with λ = 0 provides excellent classification performance for low MSE (highest bpp), better than MSPIHT with λ = 50000. This is attributed to the fact that the coefficients that are important are relatively large in amplitude (lower scale than in Fig. 14). These large-amplitude
These large-amplitudeFig. 16.Classification errors as the function of <strong>MSE</strong> are compared amongM<strong>SPIHT</strong> ( = 50000), M<strong>SPIHT</strong> ( =0), <strong>and</strong> the original <strong>SPIHT</strong>. The bitrates are from 0.2 to 0.6 bpp at the interval of 0.05 bpp. Results are shown <strong>for</strong>synthesized data in Fig. 15.coefficients, which are also important <strong>for</strong> classification arereconstructedwell by <strong>SPIHT</strong><strong>and</strong> M<strong>SPIHT</strong> with . However,at lower bpp (higher <strong>MSE</strong>), the classification improvementof M<strong>SPIHT</strong> withis more evident. Note that inFig. 14, <strong>for</strong> which the small-amplitude finest-level coefficientsare important <strong>for</strong> classification, the M<strong>SPIHT</strong> withyields better classification per<strong>for</strong>mance <strong>for</strong> all <strong>MSE</strong> (bpp)considered.D. <strong>Modified</strong> <strong>SPIHT</strong> <strong>and</strong> Bayes VQThe Bayes tree-structured vector quantization (B-TSVQ) algorithmintroduced by Oehler <strong>and</strong> Gray [15] is a joint compression<strong>and</strong> classification technique. It combines classification <strong>and</strong>compression into a single vector quantizer by incorporating aBayes risk term into the distortion measure. For large block


sizes, B-TSVQ performance approaches the theoretical rate-distortion bound [5]. TSVQ has the limitations of computational complexity and of requiring knowledge of the posterior probability, and it also requires the availability of a large training set. In image processing, the VQ block size is usually 4 × 4 or smaller because of computational constraints [20].

Fig. 17. Training imagery for B-TSVQ, with labeled "urban" and "rural" classes.

In our first comparison, we consider the measured imagery in Fig. 4, with the "true" segmentation defined as discussed above (dictated by the results of the HMT segmentation). The MSPIHT results are extensions from Fig. 12, for λ = 0 and λ = 50000. The MSPIHT results are computed adaptively on the imagery in Fig. 4, without any a priori training data. By contrast, B-TSVQ requires training data to design the tree-structured codebook and to build the associated classifier (a look-up table that maps a code to a texture). In Fig. 17, we present separate training data, from the "rural" and "urban" classes, used to train the B-TSVQ algorithm. These data are distinct examples from the same USC-SIPI database from which Fig. 4 was acquired. We consider a bit rate of 0.35 bit/pixel. To achieve this bit rate, we run the required number of MSPIHT iterations, while for B-TSVQ the bit rate is dictated by the number of codes and the size of each block. We here consider a codebook of size 49, and each block is of size 4 × 4. The MSPIHT classification is based upon two wavelet levels, to be consistent with the 4 × 4 blocks used by B-TSVQ.
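As a quick check of the stated operating point, the VQ bit rate follows directly from the codebook size and block size: log2(49) ≈ 5.61 bits per 4 × 4 block, i.e., about 0.35 bpp. A one-line sketch:

```python
import math

def vq_bit_rate(codebook_size, block_w, block_h):
    """Bits per pixel for a VQ: log2(codebook) bits index one block,
    spread over block_w * block_h pixels."""
    return math.log2(codebook_size) / (block_w * block_h)

rate = vq_bit_rate(49, 4, 4)   # codebook of 49 codes, 4 x 4 blocks
```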
However, to improve coding efficiency, MSPIHT is run for five levels (only two of which are used in the classifier). To run MSPIHT with λ > 0, we first run the iterative algorithm in Section III-C for the two classifier levels. The wavelet and scaling coefficients are then weighted as so determined. The subsequent three wavelet levels are then performed on these weighted coefficients.

We also show MSPIHT results for a classifier based on three levels, corresponding to 8 × 8 blocks. The results in Fig. 18, for both MSPIHT and B-TSVQ, are computed by controlling the Lagrange multiplier that dictates the balance between concentrating on MSE and classification error. The results indicate that MSPIHT has better compression performance than B-TSVQ (smaller MSE), but B-TSVQ has more sensitivity to the Lagrangian-driven tradeoff between MSE and classification (although in these results the B-TSVQ MSE does not change substantially as the Lagrange multiplier changes).

Fig. 18. Classification error as a function of MSE, at a bit rate of 0.35 bpp. For both B-TSVQ and MSPIHT, the variation in classification error and MSE is controlled by adjusting a respective Lagrange multiplier. Results are shown for the data in Fig. 4.

Fig. 19. Classification error as a function of MSE, with increasing Lagrange multipliers, compared between B-TSVQ and MSPIHT at a bit rate of 0.35 bpp. Results are shown for the data in Fig. 13.

To complement the results discussed above (Fig.
4), for which there may be some uncertainty as to actual "truth," we also consider synthetic data. In Fig. 19, we study the tradeoff between MSE and classification error for B-TSVQ and MSPIHT at a bit rate of 0.35 bit/pixel, for the data in Fig. 13, which for B-TSVQ corresponds to a codebook of size 49, for blocks of size 4 × 4. Again, the MSPIHT classification is based upon two wavelet levels, to be consistent with the 4 × 4 blocks used by B-TSVQ, and to improve coding efficiency MSPIHT is run for five levels. We also show MSPIHT results for a classifier based on three levels, corresponding to 8 × 8 blocks. Separate training data with the same statistics were used to build the codes for B-TSVQ. When comparing results for 4 × 4 blocks, we note that the MSPIHT algorithm performs best for high classification error (lower MSE), with this attributed to the fact that the MSPIHT algorithm is effectively employing larger block sizes


than the B-TSVQ. Recall that, although classification is performed with two levels, five wavelet levels are used in the MSPIHT algorithm. For low classification error (high MSE), the B-TSVQ algorithm performs better for the 4 × 4 blocks, which may be attributed to the fact that B-TSVQ exploits training data. While B-TSVQ is limited to blocks of size 4 × 4 (for computational reasons), the MSPIHT algorithm is readily applied to larger blocks. In Fig. 19, we also show MSPIHT results for 8 × 8 blocks, for which we note that these results are superior to those provided by either algorithm when considering 4 × 4 blocks.

TABLE I. CPU (Pentium 4, 1.8-GHz) time of the HMT segmentation and weight-estimation processes.

E. Computation Cost of MSPIHT

It has been demonstrated that using the MSPIHT algorithm one may reduce the probability of classification error at a given bit rate, with an associated increase in MSE. However, there is a computational cost associated with performing the HMT-based segmentation and wavelet-coefficient weighting. Clearly, this cost must be sufficiently small to warrant doing something other than conventional coding (SPIHT). The HMT segmentation cost scales linearly with the number of wavelet trees, and the number of wavelet trees scales linearly with the number of image pixels; hence, the cost of HMT segmentation scales linearly with the image size. The cost of estimating the wavelet weights is typically far less than that of HMT-based segmentation. In Table I, we tabulate the CPU time (Pentium 4, 1.8 GHz) for HMT segmentation and coefficient weighting, for typical images (like those shown here), as a function of image size.
All codes were written in C. It is important to emphasize that one need not perform HMT segmentation and computation of new coefficient weights for each new image. Initially, one may perform these tasks; subsequently, the coefficient weights can be retained until new data warrant a change. A change in the image statistics may be detected via the current set of HMTs. If new data are found to be associated with these HMTs with low likelihood, the segmentation can be refined and the coefficient weighting recomputed. In this manner, the coding algorithm adapts to new imagery, always adjusting to achieve a desired balance between classification and MSE accuracy. This would be very difficult with a B-TSVQ algorithm, since it is often prohibitively expensive to refine the codebook as data with new statistics are observed.

VI. CONCLUSION

We have designed a scheme for pruning and weighting the wavelet coefficients before wavelet coding, with the ultimate goal of image classification. We demonstrated improved segmentation performance of the decoded image, with little decrease in MSE image quality at low bit rates. We also proposed a modified SPIHT algorithm, using the importance-weight information, to focus SPIHT on wavelet coefficients of importance for classification. We tested the method on high-altitude two-texture aerial photographic imagery and also on synthesized data. B-TSVQ is a widely used joint compression and classification technique. Using measured and synthesized data, we found that the compression and classification performance of the modified SPIHT algorithm is comparable to that of B-TSVQ.
The modified SPIHT algorithm has the advantages of not requiring codebook design, and it is not limited in the size of the blocks used for classification.

We have considered the measured imagery in Fig. 4 for several of the example results. These data were acquired from the USC-SIPI database and represent only one example from the numerous test cases we have considered. The results presented here are typical of the performance observed on such data. We have also considered canonical textures from the USC-SIPI database (see http://sipi.usc.edu/services/database/database.cgi?volumetextures). The segmentation performance on such textural data was observed to be consistent with the results presented here for data synthesized via HMT models. We have not shown texture-segmentation results on USC-SIPI canonical textures, in an effort to conserve space and because the results are analogous to those observed for the HMT-generated synthetic data (see, for example, Fig. 9). Moreover, by utilizing the HMT-generated synthetic textures, we have "truth" as to which wavelet coefficients are most relevant for the classification task (of interest for examining the performance of the KMP component of the algorithm). For the data in the aforementioned USC-SIPI database, it is difficult to define truth as to the most-relevant wavelet coefficients.

It is of interest to connect the work reported here to TSVQ [7] and B-TSVQ [15]. In TSVQ and B-TSVQ, one performs a sequence of binary questions, with a 1 or 0 used to represent the result of each test.
The sequence of ones and zeros defines a traversal through the tree, thereby defining a code (the sequence of ones and zeros provides the index for a corresponding code). One requires training data to build the codebook and corresponding tree, and the codebook and tree must be known at the decoder. These facts and computational requirements necessitate relatively small block sizes (typically 4 × 4 [20]). As Gray and Neuhoff [8] have pointed out, zero-tree wavelet coding is closely related to VQ. In fact, the SPIHT algorithm may be viewed as a generalized TSVQ algorithm. In particular, in SPIHT, one asks a series of binary questions as the LIS, LIP, and LSP lists are built and refined. Moreover, binary questions are sequentially asked during the sorting and refinement passes. If we fix the average bit rate (bpp) of the SPIHT algorithm, there are a finite number of ways the SPIHT algorithm may answer these binary questions and yield the desired bpp. This may be viewed as a finite (but very large) codebook, with the sequence of binary questions used to define a desired code. The major advantage of wavelets and SPIHT is that we do not need to explicitly build, store, or search the codebook, because it is inherent to the structure of the wavelet transform, known at both the decoder and the encoder. Consequently, SPIHT requires no training data. Further, the modified SPIHT algorithm may be viewed as a generalization of B-TSVQ. The significant advantage of modified SPIHT is that no training data are required and that it is, therefore, adaptive to new data. As the data change,


the encoder may inexpensively send the decoder the updated wavelet weights, and implicitly the modified SPIHT algorithm refines its B-TSVQ-like codebook automatically. As Gray and Neuhoff [8] explain, this connection of wavelet coding to VQ plays an important role in the performance of SPIHT and of the modified SPIHT algorithm presented here. Additional advantages of SPIHT and modified SPIHT vis-à-vis TSVQ and B-TSVQ are that SPIHT is an embedded encoder and, hence, a partial reconstruction can be performed if the bit stream is stopped. Moreover, the implicit size of the SPIHT codes, for a given bpp, extends over the support of the entire image (not limited to 4 × 4, for example).

REFERENCES

[1] J. Bilmes, "A gentle tutorial on the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models," Univ. California, Berkeley, Tech. Rep. ICSI-TR-97-021, 1997.
[2] C. M. Brislawn, "Classification of nonexpansive symmetric extension transforms for multirate filter banks," Appl. Comput. Harmon. Anal., vol. 3, 1996.
[3] P. C. Cosman, C. Tseng, R. M. Gray, R. A. Olshen, L. E. Moses, H. C. Davidson, C. J. Bergin, and E. A. Riskin, "Tree-structured vector quantization of CT chest scans: Image quality and diagnostic accuracy," IEEE Trans. Med. Imag., vol. 12, no. 12, pp. 727-739, Dec. 1993.
[4] M. Crouse, R. Nowak, and R. Baraniuk, "Wavelet-based statistical signal processing using hidden Markov models," IEEE Trans.
Signal Process.,vol. 46, no. 4, pp. 886–902, Apr. 1998.[5] Y. Dong, “Rate Distortion Analysis of <strong>Joint</strong> Compression <strong>and</strong> Classification:Application to HMM State (Pose) Estimation via Multi-AspectScattering Data,” Ph.D. dissertation, Duke Univ., Durham, NC, 2002.[6] M. Farshchian, S. Cho, <strong>and</strong> W. A. Pearlman, “Optimal error protection<strong>for</strong> real-time image <strong>and</strong> video transmission,” IEEE Signal Process. Lett.,vol. 11, no. 10, pp. 780–783, Oct. 2004.[7] A. Gersho <strong>and</strong> R. M. Gray, Vector Quantization <strong>and</strong> Signal Compression.Norwell, MA: Kluwer, 1992.[8] R. M. Gray <strong>and</strong> D. L. Neuhoff, “Quantization,” IEEE Trans. Inf. Theory,vol. 44, no. 5, pp. 2325–1283, Oct. 1998.[9] T. Hastie, R. Tibshirani, <strong>and</strong> J. Friedman, The Elements of StatisticalLearning:Data Mining, Inference, <strong>and</strong> Prediction. New York:Springer-Verlag, 2001.[10] M. Inoue <strong>and</strong> N. Ueda, “Exploitation of unlabeled sequences in hiddenMarkov models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no.12, pp. 1570–1581, Dec. 2003.[11] N. S. Jayant, J. D. Johnston, <strong>and</strong> R. J. Safranek, “Signal compressionbased on models of human perception,” Proc. IEEE, vol. 81, no. 10, pp.1385–1424, Oct. 1993.[12] A. A. Kassim <strong>and</strong> W. S. Lee, “Embedded color image coding using spihtwith partially linked spatial orientation trees,” IEEE Trans. Circuits Syst.Video Technol., vol. 13, no. 2, pp. 203–206, Feb. 2003.[13] B. Krishnapuram, A. Harternink, L. Carin, <strong>and</strong> M. Figueiredo, “Abayesian approach to joint feature selection <strong>and</strong> classifier design,” IEEETrans. Pattern Anal. Mach. Intell., vol. 26, no. 9, pp. 1105–1111, Sep.2004.[14] S. G. Mallat <strong>and</strong> Z. Zhang, “Matching pursuits with time-frequency dictionaries,”IEEE Trans. Signal Process., vol. 41, no. 12, pp. 3397–3415,Dec. 
1993.[15] K. L. Oehler <strong>and</strong> R. M. Gray, “Combining image compression <strong>and</strong> classificationusing vector quantization,” IEEE Trans. Pattern Anal. Mach.Intell., vol. 17, no. 5, pp. 461–473, May 1995.[16] W. A. Pearlman, A. Islam, N. Nagaraj, <strong>and</strong> A. Said, “Efficient,low-complexity image coding with a set-partitioning embedded blockcoder,” IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 11, pp.1219–1235, Nov. 2004.[17] K. O. Perlmutter, S. M. Perlmutter, <strong>and</strong> R. M. Gray, “Bayes riskweighted vector quantization with posterior estimation <strong>for</strong> imagecompression <strong>and</strong> classification,” IEEE Trans. <strong>Image</strong> Process., vol. 5,no. 2, pp. 347–360, Feb. 1996.[18] J. Rissanen, “Modeling by shortest data description,” Automatica, vol.14, 1978.[19] A. Said <strong>and</strong> W. A. Pearlman, “A new fast <strong>and</strong> efficient image codec basedon set partitioning in hierarchical trees,” IEEE Trans. Circuits Syst. VideoTechnol., vol. 6, no. 2, pp. 243–250, Feb. 1996.[20] M. J. T. Smith <strong>and</strong> A. Docef, A Study Guide <strong>for</strong> Digital <strong>Image</strong> Processing,1st ed. Singapore: World Scientific, 1998.[21] D. S. Taubman <strong>and</strong> M. W. Marcellin, JPEG2000: <strong>Image</strong> CompressionFundamentals, St<strong>and</strong>ards, <strong>and</strong> Practice. Norwell, MA: Kluwer, 2001.[22] M. E. Tipping, “Sparse bayesian learning <strong>and</strong> the relevance vector machine,”J. Mach. Learn. Res., vol. 1, pp. 211–244, 2001.[23] J. D. Villasenor, B. Belzer, <strong>and</strong> J. Liao, “Wavelet filter evaluation <strong>for</strong>image compression,” IEEE Trans. <strong>Image</strong> Process., vol. 4, no. 8, pp.1053–1060, Aug. 1995.[24] P. Vincent <strong>and</strong> Y. Bengio, “Kernel matching pursuit,” Mach. Learn., vol.48, no. 1–3, pp. 165–187, 2002.[25] Z. Wang <strong>and</strong> A. C. 
Bovik, “Embedded foveation image coding,” IEEETrans. <strong>Image</strong> Process., vol. 10, pp. 1397–1410, Oct. 2001.[26] R. S. Zemel, “A Minimum Description Length Framework <strong>for</strong> UnsupervisedLearning,” Ph.D. dissertation, Univ. Toronto, Toronto, ON,Canada, 1993.Shaorong Chang was born in Henan, China, in May1978. She received the B.S. degree in electronic engineeringfrom Tsinghua University, Beijing, China,in 2000, <strong>and</strong> the M.S. degree in electrical engineeringfrom Duke University, Durham, NC, in 2001, whereshe is currently pursuing the Ph.D. degree.Her primary research interests are in image compression<strong>and</strong> classification.Lawrence Carin (SM’96–F’01) was born on March 25, 1963, in Washington,DC. He received the the B.S., M.S., <strong>and</strong> Ph.D. degrees in electrical engineeringfrom the University of Maryl<strong>and</strong>, College Park, in 1985, 1986, <strong>and</strong> 1989, respectively.In 1989, he joined the Electrical Engineering Department, Polytechnic Universityof Brooklyn, Brooklyn, NY, as an Assistant Professor, where he becamean Associate Professor in 1994. In September 1995, he joined the ElectricalEngineering Department, Duke University, Durham, NC, where he is now theWilliam H. Younger Professor of Engineering. He was the principal investigator(PI) on a Multidisciplinary University Research Initiative (MURI) on demining(1996–2001) <strong>and</strong> he is currently the PI of a MURI dedicated to multimodal inversion.His current research interests include short-pulse scattering, subsurfacesensing, <strong>and</strong> wave-based signal processing.Dr. Carin is a member of the Tau Beta Pi <strong>and</strong> Eta Kappa Nu honor societies.He was an Associate Editor of the IEEE TRANSACTIONS ON ANTENNAS ANDPROPAGATION from 1996 to 2004.
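The embedded, binary-question view of SPIHT discussed in this section can be made concrete with a small sketch. This is an illustrative toy, not the authors' implementation or the actual SPIHT codec: the function names, the eight-bitplane depth, and the example weight of 8 are assumptions introduced here for clarity. It shows why scaling a classification-relevant subband by a weight greater than one moves that subband's coefficients earlier in an embedded bit stream.

```python
# Toy sketch (illustrative only): an embedded bitplane coder asks, pass by
# pass, the binary question "is |coefficient| >= 2^p?" for decreasing
# thresholds 2^(n_bits-1), ..., 2^0. A coefficient's bits first enter the
# stream at the earliest pass answered "yes", so scaling a subband by an
# importance weight w > 1 lets its low-amplitude coefficients survive
# truncation of the stream at low bit rates.

def first_significant_pass(coeff, n_bits=8):
    """Return the 0-based index of the pass at which coeff becomes
    significant, or None if it never does within n_bits bitplanes."""
    for k, p in enumerate(range(n_bits - 1, -1, -1)):
        if abs(coeff) >= (1 << p):
            return k
    return None

def weight_subband(coeffs, weight):
    """Scale one subband's coefficients by its importance weight
    (in the paper this weight is derived via KMP; here it is arbitrary)."""
    return [c * weight for c in coeffs]

# A low-amplitude but (hypothetically) classification-relevant coefficient:
c = 5
late = first_significant_pass(c)                           # a late pass
early = first_significant_pass(weight_subband([c], 8)[0])  # an earlier pass
```

Truncating the embedded stream after a few passes keeps only coefficients that became significant early; the weighting simply changes which coefficients those are, which is the tradeoff between MSE and classification error controlled by the Lagrangian distortion.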
