
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 15, NO. 3, MARCH 2006 713

A Modified SPIHT Algorithm for Image Coding With a Joint MSE and Classification Distortion Measure

Shaorong Chang and Lawrence Carin, Fellow, IEEE

Abstract—The set partitioning in hierarchical trees (SPIHT) algorithm is an efficient wavelet-based progressive image-compression technique, designed to minimize the mean-squared error (MSE) between the original and decoded imagery. However, the MSE-based distortion measure is not in general well correlated with image-recognition quality, especially at low bit rates. Specifically, low-amplitude wavelet coefficients that may be important for classification are given low priority by conventional SPIHT. In this paper, we use the kernel matching pursuits (KMP) method to autonomously estimate the importance of each wavelet subband for distinguishing between different textures, with textural segmentation first performed via a hidden Markov tree. Based on subband importance determined via KMP, we scale the wavelet coefficients prior to SPIHT coding, with the goal of minimizing a Lagrangian distortion based jointly on the MSE and classification error. For comparison we consider Bayes tree-structured vector quantization (B-TSVQ), also designed to obtain a tradeoff between MSE and classification error.
The performances of the original SPIHT, the modified SPIHT, and B-TSVQ are compared.

Index Terms—Classification, hidden Markov tree (HMT), image segmentation, set partitioning in hierarchical trees (SPIHT), vector quantization (VQ).

I. INTRODUCTION

WHEN performing compression at relatively low bit rates, there is in general information lost between the original image and that recovered after decoding. Most compression schemes are based on minimizing the mean-square error (MSE) between the original and compressed imagery. While this is a natural direction in many applications, there are problems for which one will ultimately make a classification decision based on the decoded imagery. For example, in medical-image compression, for transmission or storage, an expert will often make a diagnosis based on the decoded imagery [3]. In remote sensing, one often collects very large quantities of data (e.g., infrared or synthetic-aperture-radar imagery), necessitating low-bit-rate compression. In the remote-sensing problem, humans will also often make decisions based on the decoded imagery. It is therefore desirable to encode the original imagery in a manner that accounts for the ultimate classification task, this motivating consideration of non-MSE distortion measures and, hence, modification of the associated encoders/decoders.

Manuscript received June 4, 2004; revised February 15, 2005. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Fernando M. B.
Pereira. The authors are with the Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708-0291 USA (e-mail: chshrong@ee.duke.edu; lcarin@ee.duke.edu). Digital Object Identifier 10.1109/TIP.2005.860595

It should be noted that, in a related application, coding algorithms developed for acoustic data have been modified to accentuate those frequencies that play an important role in human hearing [11]. Related work in perceptually weighted quantization has demonstrated significant improvement in visual quality [25]. In the work presented here, we extend these ideas to emphasize wavelet coefficients of importance to image-based classification.

The idea of developing encoders/decoders that account for both MSE and classification performance has been considered previously in the context of vector quantization (VQ) [15], [17]. Specifically, rather than simply basing VQ on a squared-distance distortion measure, a Lagrangian distortion measure was developed in which one can adjust the relative importance placed on MSE and Bayes risk. This results in a Bayes-VQ algorithm, implemented efficiently (and approximately) via a Bayes tree-structured vector quantization (B-TSVQ) formalism [15].
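The Lagrangian idea can be illustrated with a toy encoder: each codeword is scored by its squared error plus a multiplier times an estimated misclassification risk. This is only a sketch of the general principle; the names (`codebook`, `risk`) and the numbers are illustrative, not taken from the B-TSVQ of [15].

```python
import numpy as np

# Hypothetical sketch of a Lagrangian (Bayes-VQ-style) distortion measure:
# each codeword is scored by squared error plus lam times an estimated
# misclassification risk, and the cheapest codeword wins.

def encode_block(x, codebook, risk, lam):
    """Pick the codeword index minimizing MSE + lam * estimated Bayes risk."""
    mse = np.sum((codebook - x) ** 2, axis=1)   # squared distance to each codeword
    cost = mse + lam * risk                     # joint Lagrangian distortion
    return int(np.argmin(cost))

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
risk = np.array([0.9, 0.0])   # codeword 0 often causes misclassification
x = np.array([0.4, 0.4])      # nearer codeword 0 in MSE alone
print(encode_block(x, codebook, risk, 0.0))  # pure MSE choice
print(encode_block(x, codebook, risk, 2.0))  # risk-weighted choice
```

Raising `lam` shifts the encoder from pure MSE toward classification-aware choices, which is the tradeoff the Lagrangian multiplier controls throughout this paper.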
The B-TSVQ is particularly well suited for medical imaging applications, for which one may have a large amount of training data (to learn the codebook) and the statistics of the imagery to be compressed are relatively stationary (e.g., for a given part of the anatomy).

By contrast, in remote-sensing applications it is often difficult to predict a priori what imagery may be encountered, and therefore design of a robust codebook will often be challenging. Moreover, it is computationally expensive and requires a large bit budget to adaptively augment the codebook as new data are acquired. This limitation of VQ-based compression schemes has motivated the recent focus on wavelet-based compression, such as via SPIHT [19], these algorithms typically not requiring an a priori codebook. The SPIHT algorithm prioritizes all of the wavelet coefficients in a given image simultaneously, and in this sense it has some connection to VQ (block-like coding). However, zero-tree-type encoding is employed for the large number of small-amplitude wavelet coefficients, and an embedded scalar quantizer is employed for the "important" coefficients.

For the applications of interest here, the principal limitation of SPIHT is that the associated importance map, which delineates the significant wavelet coefficients, is based on MSE, and therefore priority is given to the large-amplitude wavelet coefficients. While the small-amplitude coefficients are of relatively diminished importance for MSE distortion, they may be of significant importance in a classification task.
Moreover, large-amplitude coefficients which may be unimportant for classification are given priority by SPIHT. In the work presented here we extend SPIHT such that it places importance on those wavelet coefficients that are of importance for classification, with the relative importance placed on MSE and classification accommodated using a Lagrangian distortion measure, analogous to that employed in B-TSVQ.

1057-7149/$20.00 © 2006 IEEE

Fig. 1. Overall modified SPIHT (MSPIHT) coding scheme. A scaling step is added before SPIHT coding to balance the classification error and the reconstruction error.

The rescaling of coefficients is a well-known technique for adjusting their relative importance prior to encoding. Therefore, the principal challenge considered here involves developing a technique to autonomously and adaptively learn the appropriate wavelet-coefficient scalings, accounting for a Lagrangian distortion measure of the type discussed above. The difficulty is that we do not want to make any assumptions about what may be important for a classification task, since in many applications (e.g., remote sensing) the imagery is often too variable. We, therefore, proceed under the assumption that segmenting the image into distinct textural classes will play an important role in a subsequent classification task. For example, in remote-sensing classification, an anomaly or target (to be distinguished within a classification stage) is typically defined by its textural contrast relative to the texture of the background. Therefore, the first stage of our algorithm is to segment the image into textures, with the number of textures determined adaptively via a minimum-description-length (MDL) framework [26]. Once the segmentation is performed, a second algorithm is employed to scale the importance of the wavelet coefficients in the context of realizing this segmentation.
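The scaling idea itself is simple and invertible, as the following sketch shows; `weights` stands in for the per-subband importance factors that are later derived via KMP, and all names and values here are illustrative.

```python
import numpy as np

# Minimal sketch of the scaling idea: wavelet coefficients deemed important
# for classification are amplified before an MSE-driven encoder sees them,
# and the known scaling is divided back out after decoding.

def scale_coeffs(coeffs, weights):
    return coeffs * weights          # boost classification-relevant subbands

def unscale_coeffs(scaled, weights):
    return scaled / weights          # decoder inverts the known scaling

coeffs = np.array([10.0, 0.5, 3.0])   # one coefficient per subband (toy)
weights = np.array([1.0, 8.0, 1.0])   # subband 1 matters for classification
scaled = scale_coeffs(coeffs, weights)
print(scaled)                          # the 0.5 coefficient now ranks higher
```

Because the scale factors are known at the decoder, the round trip is lossless apart from whatever quantization the encoder applies in between.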
The MSE is also accounted for, in a Lagrangian setting, with the relative importance placed on classification and MSE dictated by the chosen Lagrange multiplier. Note that accounting for MSE in addition to importance for classification is necessary because the final decoded image will likely be viewed by a human, who may not wish to rely overly on the results of the (likely imperfect) classifier at the encoder.

After performing segmentation and determining the relative importance of the wavelet coefficients, the coefficients are scaled appropriately, with the coefficient-dependent scale factor sent to the decoder. The encoding and decoding procedure then employs a slightly modified form of SPIHT, as detailed below. The overall coding scheme is shown in Fig. 1.

Since wavelets are to be employed in the subsequent SPIHT-based encoder/decoder, we employ a wavelet-based segmentation algorithm. Hidden Markov trees (HMTs) are well suited to capturing the multiscale statistical dependence of wavelet coefficients [4]. We propose an unsupervised image segmentation method using an HMT mixture model, the parameter estimation problem for which is solved by a generalized expectation-maximization (EM) algorithm [1]. The posterior probability distribution across the mixture components yields the image segmentation. The segmentation is performed autonomously at the encoder.

After performing segmentation into textures, the final step prior to encoding is to scale the wavelet coefficients based on their importance for classification, while also accounting in a Lagrangian sense for MSE. This is effected by implementing kernel matching pursuits (KMP) [24].
The wavelet coefficients are scaled based on the KMP results, and then encoded using a modified version of SPIHT, as detailed below. We compare the performance of B-TSVQ and modified SPIHT.

The remainder of the paper is organized as follows. In Section II, we present the definition of the HMT mixture model along with the expectation-maximization (EM) training algorithm. We consider in Section III an additive regression model to estimate the importance of the wavelet coefficients for texture recognition, employing a KMP solution. In Section IV, the modified SPIHT coding scheme is discussed. Typical results of the algorithms are presented in Section V, with conclusions presented in Section VI.

II. IMAGE SEGMENTATION

Image segmentation is a fundamental low-level operation in image analysis for object identification. The encoding strategy adopted here is based on a wavelet decomposition of the image, and, therefore, we utilize a wavelet-based segmentation procedure. To account for the variability of anticipated imagery, the segmentation algorithm autonomously determines the number of textures as well as their statistical characteristics.

A. HMT-Based Block Classification

The image is analyzed with a two-dimensional wavelet transform, employing a one-dimensional transform in each of the two principal directions. For a given number of wavelet levels we yield a contiguous set of wavelet quadtrees, each corresponding to a block in the original image. Based on the persistence statistical property of wavelet coefficients, which observes that large (or small) wavelet coefficient magnitudes tend to propagate through the scales corresponding to the same spatial location, Crouse et al.
[4] have introduced the wavelet-domain HMT model to capture the joint wavelet statistics.

The HMT models the marginal probability density function (pdf) of each wavelet coefficient as a Gaussian mixture with a hidden state variable. It assumes that the key dependency between the hidden state variables of the wavelet coefficients is


tree-structured and Markovian, tied to the wavelet quadtree, employing a state transition matrix to quantify the degree of persistence between scales. The iterative EM algorithm for HMTs [4] was proposed to train the parameters (the mixture density parameters and the probabilistic graph transition probabilities) to match the data in the maximum likelihood (ML) sense. The trained HMT provides a good approximation to the joint probability of the wavelet coefficients, yielding good classification performance.

The conventional HMT training process requires the availability of labeled imagery. Specifically, data must be provided for each texture class, followed by HMT training. The trained HMTs can then be applied to segment new imagery that might be observed, assuming that this new imagery is characterized by similar textures. In our problem we assume little or no knowledge of the anticipated image textural properties, and, therefore, determination of the textural classes is performed jointly with HMT training.

B. HMT Mixture Model

We assume that the statistics of the wavelet coefficients from the overall image may be represented as a mixture of HMTs, analogous to the well-known Gaussian mixture model (GMM) [1]. The probabilistic model for K mixture components is given by

f(w | Θ) = Σ_{k=1}^{K} α_k f(w | θ_k)

where w are the wavelet coefficients of a wavelet tree, α_k, k = 1, ..., K, are the mixing coefficients of the K textures, which may also be interpreted as the prior probabilities, and θ_k represents the parameters of the kth HMT component. The vector Θ represents the cumulative set of model parameters, specifically α_k and θ_k, k = 1, ..., K. We employ an iterative training procedure, analogous to that found in GMM design [1].
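The responsibility and mixing-weight updates of such a mixture EM loop can be sketched with a minimal Gaussian-mixture analogue: the update formulas have the same shape whether each component is a Gaussian or an HMT. All names below are illustrative; a real HMT component would replace `component_pdf` with the upward-downward likelihood evaluation of [4].

```python
import numpy as np

# Minimal Gaussian-mixture analogue of one EM iteration for a mixture model:
# E-step computes per-sample component responsibilities, M-step re-estimates
# the mixing coefficients and (responsibility-weighted) component parameters.

def component_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def em_step(x, alpha, mu, var):
    # E-step: responsibility of component k for each sample
    lik = np.stack([a * component_pdf(x, m, v) for a, m, v in zip(alpha, mu, var)])
    resp = lik / lik.sum(axis=0)
    # M-step: mixing coefficients = average responsibility
    new_alpha = resp.mean(axis=1)
    # M-step: responsibility-weighted mean update (variance kept fixed here)
    new_mu = (resp * x).sum(axis=1) / resp.sum(axis=1)
    return new_alpha, new_mu, resp

x = np.array([-2.0, -1.8, 2.1, 1.9, 2.0])        # two obvious clusters
alpha = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])
alpha, mu, resp = em_step(x, alpha, mu, var)
print(alpha, mu)   # weights and means move toward the two clusters
```

A hard segmentation, as in the MAP rule used by the paper, is then simply `resp.argmax(axis=0)` per sample.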
Let α_k^t and θ_k^t represent the model parameters for mixture component k after iteration t. We estimate the probability that the ith wavelet tree w_i is generated by texture k (corresponding to the kth HMT) as

p(k | w_i, Θ^t) = α_k^t f(w_i | θ_k^t) / Σ_{l=1}^{K} α_l^t f(w_i | θ_l^t)    (1)

The parameters of each HMT are updated by an augmented form of the EM algorithm in [4]. In [4], the wavelet trees are each used separately within an "upward-downward" algorithm to update the parameters of each individual HMT. On each iteration, HMT model parameters are initiated using the parameters from the previous step. Let λ_{k,i}^{t+1} represent an arbitrary parameter from θ_k, updated using wavelet tree w_i alone. Then, the associated cumulative parameter, based on all N wavelet trees, is expressed as

λ_k^{t+1} = [ Σ_{i=1}^{N} p(k | w_i, Θ^t) λ_{k,i}^{t+1} ] / [ Σ_{i=1}^{N} p(k | w_i, Θ^t) ]    (3)

Equation (3) represents an approximation to the conditional expectation of the parameter update, computed using the probabilities in (1). The samples that are associated with texture k with higher likelihood make a greater contribution to the parameters of that texture component. The cumulative set of HMT parameters θ_k^{t+1}, e.g., state-transition probabilities, state-dependent parameters, etc., define the overall set of parameters for the kth HMT.

The mixing coefficients are updated as

α_k^{t+1} = (1/N) Σ_{i=1}^{N} p(k | w_i, Θ^t)    (4)

where we have again assumed N wavelet trees.

For initialization, we use all the data with equal probability to train an initial (single) HMT with parameters θ_0. We then cluster the data into K (scalar) Gaussian mixtures (denoting textures) based on f(w_i | θ_0), assuming that the data from the same texture have similar probability values. In this manner, we assign the initial probabilities p(k | w_i, Θ^0). The same idea has been applied to effectively use unlabeled sequential data in learning hidden Markov models [10], with this termed the extended Baum-Welch (EBW) algorithm.

We segment the image via a maximum a posteriori (MAP) estimator, that is

k̂_i = argmax_k p(k | w_i, Θ)    (5)

where Θ represents the HMT model parameters after convergence is achieved for the aforementioned training algorithm. The parameter K, representing the number of textures in the image, is selected autonomously via an information-theoretic model-selection method called the MDL principle, derived by Rissanen [18]. The MDL principle states that the best model is the one that minimizes the summed description length of the model and the likelihood of the data with respect to the model, making a trade-off between model accuracy and model succinctness [26]. In our case, we calculate the MDL value as

MDL(K) = -Σ_{i=1}^{N} log f(w_i | Θ_K) + (p_K / 2) log N    (6)

where the first term denotes the accuracy of the mixture model, the second term reflects the model complexity, and p_K is the number of free parameters estimated. We choose K to minimize (6).

III. QUANTIZATION BINS

The purpose of rescaling the wavelet coefficients is to help the encoder order the output bit stream with consideration of the ultimate recognition task, with the balance between MSE and segmentation performance driven by a Lagrangian metric. Wavelet coefficients that play an important role in defining the segmented texture class labels (determined automatically, as discussed in Section II) should be represented by a relatively


large number of bits, even if the associated wavelet coefficient is small. In the context of an algorithm such as SPIHT, bits are sent sequentially for each wavelet coefficient, with the number of bits associated with a given wavelet coefficient dictated by the coefficient's "importance," defined in conventional SPIHT by its amplitude. SPIHT exploits the fact that many of the wavelet coefficients are of small amplitude, yielding efficient coding via importance maps and zero trees. Since SPIHT is a state-of-the-art algorithm, rather than developing an entirely new technique for encoding wavelet coefficients while accounting for MSE and classification, we modify the wavelet-coefficient amplitudes such that they may then be applied directly in a slightly modified form of SPIHT. Specifically, we develop an algorithm that scales the wavelet coefficients based on their importance to the aforementioned Lagrangian cost function. Wavelet coefficients that are important, as defined by this metric, are scaled to have larger relative amplitudes, such that they are given priority by SPIHT. Similarly, wavelet coefficients that are deemed to be less important, as defined by the Lagrangian metric, are scaled to have lower relative amplitudes. This rescaling modifies the importance placed on the wavelet coefficients (beyond simply basing importance on the original amplitude of the coefficient), after which SPIHT proceeds with only minor changes.

Before proceeding we note an issue that must be examined when presenting results. The efficiency of the original SPIHT algorithm is dictated by the persistence property of the wavelet coefficients with scale.
Specifically, small-amplitude coefficients tend to be clustered as a function of scale, and these can be efficiently coded with zero trees or importance maps. The aforementioned rescaling has the potential of disturbing the zero trees, which are exploited so effectively via SPIHT. Using actual measured remote-sensing imagery, we demonstrate below that the rescaling still yields a highly efficient modified-SPIHT algorithm.

A. Objective Function

The KMP [24] algorithm is employed to prioritize the importance of the wavelet coefficients. The KMP algorithm is an extension of matching pursuits [14] to kernels. Here, we employ a kernel that selects wavelet coefficients, and, therefore, this is a special case of KMP, essentially reducing to matching pursuits.

Assume we have a vector w of wavelet coefficients from a wavelet tree, and we wish to develop a classifier that, based on w, predicts the texture label y (e.g., y ∈ {-1, +1} for two textures). Note that we performed a similar task with HMT-based segmentation. The distinction is that here we are not simply interested in segmenting the data; we are interested in quantifying which coefficients in the wavelet tree are most important for this task. The KMP classifier [24] is formulated as

f(w) = Σ_{i=1}^{d} β_i w_i    (7)

where d is the dimension of the wavelet vector w and β_i is the corresponding kernel weight, reflecting the importance of wavelet coefficient w_i to the cost function. Because of the sparsity of the KMP classifier, most of the β_i are equal to zero.

Our objective is to minimize the cost function

J = Σ_{n=1}^{N} [ ||w_n - ŵ_n||² + λ (y_n - f(ŵ_n))² ]    (8)

where ŵ_n are the quantized wavelet coefficients associated with the nth wavelet tree, and the scalar λ is a Lagrange multiplier that constitutes a compromise between the quantization error and the classification error.
This optimization is performed under the constraint that the entropy of the quantized coefficients is constant, specifically

Σ_{i=1}^{d} R_i = R_c    (9)

where R_i is the entropy of the ith wavelet coefficient.

For the examples reported below, we have found that the linear algorithm in (7) yields good results. However, in general, it may be desirable to design an algorithm that is nonlinear in feature space, based for example on a kernel algorithm. For such purposes one may employ a kernel-based technique that allows general nonlinear decision surfaces in feature space, while simultaneously allowing one to determine the importance of the respective features. The interested reader is referred to [13] for details on such an approach.

B. Uniform Scalar Dead-Zone Quantization

The uniform dead-zone quantizer [21] quantizes coefficients with magnitude less than the threshold T to zero. The coefficients outside of [-T, T] are uniformly quantized with step size Δ. That is

Q(x) = 0 for |x| < T;  Q(x) = sign(x) [ T + (⌊(|x| - T)/Δ⌋ + 1/2) Δ ] otherwise    (10)

Since small variations in input signals around zero are usually caused by noise, quantization with a dead zone around zero eliminates the noise around zero and improves the signal quality. For this reason, this quantizer is widely used in image compression. Part I of the JPEG2000 standard includes uniform scalar dead-zone quantization, where the dead zone is equal to two times the other quantization bin sizes [that is, T = Δ, as shown in (11)]. The SPIHT algorithm progressively quantizes the wavelet coefficients using a double dead-zone uniform scalar quantizer, with the quantization bin size reducing to one half of that of the previous iteration

Q(x) = 0 for |x| < Δ;  Q(x) = sign(x) (⌊|x|/Δ⌋ + 1/2) Δ otherwise    (11)

In this study, we use the double dead-zone uniform quantizer to quantize the wavelet coefficients. We model the quantization step as adding noise to the original signal

ŵ_i = w_i + q_i    (12)

where q_i = Q(w_i) - w_i is the quantization noise.
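The double dead-zone quantizer of (11) can be sketched in a few lines; the function name and the uniform test signal are ours, and midpoint reconstruction is assumed. In the high-resolution regime the empirical error variance comes out close to the classical Δ²/12 rule of thumb.

```python
import numpy as np

# Sketch of the double dead-zone uniform quantizer of (11): inputs with
# |x| < delta map to zero; larger inputs fall into uniform bins of width
# delta and are reconstructed at the bin midpoint.

def dead_zone_quantize(x, delta):
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    big = np.abs(x) >= delta
    out[big] = np.sign(x[big]) * (np.floor(np.abs(x[big]) / delta) + 0.5) * delta
    return out

rng = np.random.default_rng(1)
x = rng.uniform(-100.0, 100.0, size=200_000)
delta = 0.5                                   # small bin: high-resolution regime
q = dead_zone_quantize(x, delta) - x          # quantization noise
print(np.mean(q ** 2), delta ** 2 / 12)       # empirical vs. nominal error variance
```

Halving `delta` on each pass, as SPIHT effectively does, quarters this error variance while costing roughly one extra bit per surviving coefficient.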
For a high-resolution uniform quantizer with a quantization bin size of Δ, the mean-square quantization error of the quantized variable is

E{q²} = Δ²/12    (13)
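The greedy fit used to obtain the sparse weights β_i of (7) reduces, for coordinate "kernels", to plain matching pursuits. The sketch below is ours: synthetic data, illustrative names, and a fixed number of greedy steps rather than the stopping rule of [24].

```python
import numpy as np

# Matching pursuits over coordinate "kernels": greedily pick the coefficient
# index most correlated with the current residual and assign it a weight,
# leaving all other beta_i at zero (the sparsity noted in the text).

def matching_pursuit(W, y, n_terms):
    """W: (n_samples, d) coefficient vectors; y: labels; returns sparse beta."""
    beta = np.zeros(W.shape[1])
    resid = y.astype(float).copy()
    for _ in range(n_terms):
        corr = W.T @ resid                       # correlation with residual
        i = int(np.argmax(np.abs(corr)))         # best single coordinate
        step = corr[i] / (W[:, i] @ W[:, i])     # least-squares step size
        beta[i] += step
        resid -= step * W[:, i]                  # update residual
    return beta

rng = np.random.default_rng(0)
W = rng.normal(size=(50, 6))
y = 2.0 * W[:, 2] - 1.0 * W[:, 4]     # only coordinates 2 and 4 matter
beta = matching_pursuit(W, y, n_terms=4)
print(np.round(beta, 3))              # weight concentrates on coords 2 and 4
```

The resulting near-zero entries of `beta` are exactly the coefficients whose quantization bins the paper leaves driven by MSE alone.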


Fig. 2. The relationship between E{q_i²} and Δ_i is modeled as (14). Results are shown for three wavelet coefficients, corresponding to the data in Fig. 4.

Fig. 3. The relationship between R_i and log Δ_i can be modeled as (15). Results are shown for three wavelet coefficients, corresponding to the data in Fig. 4.

In the high-resolution region, it is well known that an optimal transform code based on scalar quantization should apply a uniform scalar quantizer with the same bin size for all coefficients. The SPIHT algorithm applies such a procedure. However, this optimality is based on a mean-squared error (MSE) distortion measure (as well as the asymptotic assumption). In the work presented here, we first weight the wavelet coefficients based on their importance for a joint (Lagrangian) MSE and classification-based distortion measure. After performing the weighting, SPIHT is employed with minimal change. While this results in uniform quantization bins for the scaled coefficients, we implicitly employ nonuniform quantization in the original coefficients. The bin size is no longer the same for all wavelet coefficients, and our task is to determine the bin size Δ_i for the ith coefficient. This is addressed in the next section.

1) Quantization Error Versus Quantization Bin Size: We model the mean-square quantization error of the double dead-zone quantizer as a function of Δ_i as

E{q_i²} = c_i Δ_i^{γ_i}    (14)

where, typically, γ_i ≈ 2 [compare to (13)], and c_i depends on the distribution of the ith wavelet coefficient.
Given an image, we compute E{q_i²} as a function of Δ_i and perform a least-squares fit to determine c_i and γ_i in (14). In Fig. 2, we compare the true E{q_i²} and the regression values obtained by (14) for three different wavelet coefficients. We can see that the model fits the data well in the range of interest.

2) Entropy Versus Quantization Bin Size: The entropy of the quantized data is also related to the quantization bin size Δ_i. Intuitively, as Δ_i increases the entropy decreases. The entropy reduces to zero as Δ_i reaches a value such that all the data are quantized to zero. It is appropriate to use a logarithmic model to fit this relationship, as shown in Fig. 3. Specifically

R_i = a_i - b_i log₂ Δ_i    (15)

where a_i and b_i are constants to be determined. This is a generalization of a uniform scalar quantizer, for which

R = h(w) - log₂ Δ    (16)

where h(w) is the differential entropy of the original continuous random variable that is quantized. We rewrite the constraint of (9) as

Σ_{i=1}^{d} (a_i - b_i log₂ Δ_i) = R_c    (17)

C. Iterative Algorithm to Find the Optimal Quantization Bin Sizes

1) Initialization: We set all the quantization bin sizes equal to a constant

Δ_i = Δ_0, i = 1, ..., d    (18)

2) Quantization: We quantize each wavelet coefficient using a double dead-zone uniform quantizer, defined in (11), with quantization step size Δ_i

ŵ_i = Q(w_i, Δ_i)    (19)

3) Classifier Design: Given the quantized data, the objective function in (8) is minimized by using the KMP algorithm. In this step, we determine the weights β_i.

4) Quantization Bin Size Update: Given the β_i, the quantization error model [(12) and (14)], and the entropy model (15), the objective function is minimized by using the Lagrangian method

J' = Σ_{i=1}^{d} c_i Δ_i^{γ_i} + λ Σ_{i=1}^{d} β_i² c_i Δ_i^{γ_i} + μ Σ_{i=1}^{d} (a_i - b_i log₂ Δ_i)    (20)

In (20), the first two terms implement (8) for fixed β, and the last term implements the constraint in (17). By setting ∂J'/∂Δ_i = 0, we obtain

Δ_i^{γ_i} = μ b_i / [γ_i c_i (1 + λ β_i²) ln 2]    (21)
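Both model fits in steps 1) and 2) above are linear least-squares problems after a log transform. The sketch below uses synthetic measurements that obey (14) and (15) exactly, with made-up constants, so the fits recover them.

```python
import numpy as np

# Fitting the two quantizer models:
#   (14)  E{q^2} = c * delta**gamma  ->  log E{q^2} = log c + gamma * log delta
#   (15)  R = a - b * log2(delta)        (already linear in log2(delta))

deltas = np.array([0.25, 0.5, 1.0, 2.0, 4.0])
mse = 0.09 * deltas ** 2.0           # synthetic measurements obeying (14)
rate = 6.0 - 1.0 * np.log2(deltas)   # synthetic measurements obeying (15)

# Fit (14): slope = gamma, intercept = log c
gamma, log_c = np.polyfit(np.log(deltas), np.log(mse), 1)
# Fit (15): slope = -b, intercept = a
neg_b, a = np.polyfit(np.log2(deltas), rate, 1)

print(gamma, np.exp(log_c))   # ≈ 2.0 and 0.09
print(a, -neg_b)              # ≈ 6.0 and 1.0
```

With real coefficient histograms the fit is only approximate, which is why the paper checks the regression quality in Figs. 2 and 3 before relying on (14) and (15) in the bin-size update.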


and from (17)

log₂ μ = [ Σ_{i=1}^{d} a_i - R_c - Σ_{i=1}^{d} (b_i/γ_i) log₂( b_i / (γ_i c_i (1 + λ β_i²) ln 2) ) ] / Σ_{i=1}^{d} (b_i/γ_i)    (22)

We repeat steps 2)-4) until the reduction of J' is smaller than a threshold.

The importance weight applied to the ith wavelet coefficient is defined as

s_i = 1/Δ_i    (23)

From (23), we can see that those wavelets with large β_i, corresponding to wavelet coefficients deemed important for the joint MSE-classification cost function, require finer quantization bins, thus corresponding to a large importance weight. With λ increasing, we assign smaller quantization bins to those wavelet coefficients deemed important for classification.

IV. MODIFIED SPIHT

A. Review of the SPIHT Algorithm

Set partitioning in hierarchical trees (SPIHT), proposed by Said and Pearlman [19], is one of the most efficient image compression algorithms. The effectiveness of the SPIHT algorithm originates from the efficient subset partitioning and the compact form of the significance information. The SPIHT algorithm defines spatial orientation trees, sets of coordinates, and recursive set partitioning rules [19]. The algorithm is composed of two passes: a sorting pass and a refinement pass. It is implemented by alternately scanning three ordered lists: the list of insignificant sets (LIS), the list of insignificant pixels (LIP), and the list of significant pixels (LSP). The LIP and LIS contain the individual coordinates and the sets of coordinates, respectively, for wavelet coefficients that are less than a threshold.
During the sorting pass, the significance of the LIP and LIS entries is tested, followed by removal (as appropriate) to the LSP and by set-splitting operations to maintain the insignificance property of the lists. In the refinement pass, the nth most significant bits of the entries in the LSP, which contains the coordinates of the significant pixels, are scanned and output. The SPIHT algorithm reduces the threshold and repeats the two passes until the bit budget is met.

Recently, many modifications have been made to the SPIHT algorithm, SPIHT-based coding representing a very active area of research. For example, Pearlman et al. proposed a set-partitioning embedded block coding algorithm [16] to extend SPIHT to block-based image coding. SPIHT has also been modified [6] for real-time image and video transmission using optimal error protection. In addition, in [12], an efficient color image compression algorithm has been proposed based on the SPIHT algorithm.

B. Modified SPIHT Algorithm

At first thought, the weighted wavelet coefficients, which have a larger dynamic range than the original ones, may make the SPIHT algorithm inefficient, because this may result in more scans of the wavelet coefficients. In fact, the performance of the modified SPIHT with λ set to zero is slightly better than
This is because we optimize the quantization bin size under the constraint that the average entropy of the quantized wavelet coefficients does not change, which assigns the larger quantization bins to the wavelet coefficients deep in the tree. We demonstrate the importance weights and the R(D) performance using high-altitude aerial imagery with two textures (see Fig. 4). This is a grayscale high-altitude optical aerial image of size 256 × 256 pixels from the USC-SIPI image database, with 8 bits/pixel. In the remainder of the text, we will return to the data in Fig. 4 to analyze various aspects of our algorithm and to provide comparisons to other approaches. The results presented here are representative of the performance we have observed in an extensive study of the full USC-SIPI database.

The importance weight is the reciprocal of the quantization bin size obtained via the algorithm proposed in Section III-C, as shown in Fig. 5, with i = 1 representing the first LL wavelet coefficient and the others sequentially representing LH, HL, or HH wavelet coefficients. The larger the index i, the deeper the corresponding wavelet coefficients reside in the spatial orientation tree defined in SPIHT. We compare the MSE at different bit rates for SPIHT and modified SPIHT (MSPIHT) with λ = 0. As shown in Fig. 6, MSPIHT with λ = 0 has a slightly smaller MSE than the original SPIHT at low bit rates.

In these and subsequent experiments, we choose the Daubechies 9/7 biorthogonal wavelet transform, because it is symmetric, almost orthogonal, and gives the best results for dyadically sampled images [23].
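The weighting step itself (and its inversion at the decoder, which must receive the weights) amounts to an elementwise scaling of the coefficients before SPIHT coding. A minimal sketch, with function names of our own choosing:

```python
def weight_coefficients(coeffs, bin_sizes):
    """Scale each wavelet coefficient by its importance weight w_i = 1/Delta_i
    (the reciprocal of its quantization bin size) before SPIHT coding."""
    return [c / d for c, d in zip(coeffs, bin_sizes)]

def unweight_coefficients(weighted, bin_sizes):
    """Decoder side: undo the weighting before the inverse wavelet transform."""
    return [c * d for c, d in zip(weighted, bin_sizes)]
```

A coefficient with a small bin size (high classification importance) is magnified, so the SPIHT bit-plane scan reaches it earlier; the decoder divides the weights back out before inverting the wavelet transform.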
Since the wavelet is symmetric, we use a nonexpansive symmetric-extension wavelet transform to handle the boundary problem [2].

As λ increases, we modify the quantization bin sizes to better fit the classification task. This will deteriorate the zero-tree coding efficiency. However, the sparseness of KMP implies that if only a small number of coefficients are important for classification, only a small number of wavelet-coefficient bins will be inconsistent with SPIHT. Specifically, examining (20) reveals that when the classification term vanishes for the ith wavelet coefficient, the corresponding bin size is explicitly driven only by the conventional MSE. However, all bin sizes are determined jointly, based on the


overall entropy constraint defined by the last term in (20). In this manner, all bin sizes are directly or indirectly affected by the classification objective. In Section V, we present several results to explain the performance of the modified-SPIHT algorithm for λ > 0.

Fig. 5. Importance weights of MSPIHT (λ = 0) based on 8 × 8 blocks. Results are shown for the data in Fig. 4. i = 1 represents the LL wavelet coefficient and the others represent LH, HL, or HH wavelet coefficients from coarse to fine scale.

Fig. 6. MSE compared between SPIHT and MSPIHT (λ = 0) using the importance weights shown in Fig. 5. Results are shown for the data in Fig. 4.

Fig. 7. MDL value as a function of the texture number. We set the number of textures to M = 2 by choosing the value with minimum MDL. Results are shown for the data in Fig. 4.

Fig. 8. Posterior probability estimated using the 2-HMT-mixture model for the data in Fig. 4. The dark regions are characterized as being less urban.

V. EXPERIMENTS AND RESULTS

A. Automatic Image Segmentation

We again consider the data from Fig. 4. After a three-level wavelet decomposition, we obtain wavelet trees of size 64. We train the parameters of the 2-HMT-mixture model by the EM algorithm described in Section II. We set the number of textures to M = 2 based on the MDL criterion (see Fig. 7). In Fig. 8, we show the posterior probabilities of one texture component. We see that most of the urban areas are bright, which represents higher likelihood, and the rural areas are darker, indicating lower likelihood.
The segmentation is consistent with human visual recognition.

While the segmentation results presented above are interesting, in that they correspond to an actual image, it is useful to consider an example for which the data is simulated and the number of textures is known (i.e., there is no subjectivity in assigning texture labels). We have, therefore, synthesized M = 3 textures. To synthesize these three textures, we use three different HMT models (the model parameters include the initial-state probability vectors of the root nodes for the LH, HL, and HH bands, the state-transition matrices from parent nodes to children nodes, the mean and standard deviation of each state for each wavelet coefficient, and the LL-band probability-distribution parameters). We first synthesize HMT state sequences using the initial-state and state-transition probabilities. Given the state sequences and the state-dependent probability-distribution parameters (Gaussian distributions), we synthesize blocks of LH, HL, and HH wavelet coefficients by a random generation process (driven by the HMT state-dependent statistics). The LL coefficients are synthesized separately according to their probability distributions. By an inverse wavelet transform, we obtain the three synthesized textures in the image domain. The example images are shown in Fig. 9, as is the MDL value as a function of M. It is observed that M = 3 is correctly predicted.
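The synthesis procedure just described (sample states down the quadtree, then draw coefficients from the state-conditional Gaussians) can be sketched as follows. This is a simplified two-state illustration with parameter names of our own choosing, not the paper's exact model:

```python
import random

def sample_hmt_tree(levels, pi, A, mu, sigma, rng=None):
    """Sample one wavelet quadtree from a two-state HMT.
    pi        - initial-state probabilities of the root node
    A         - parent-to-child state-transition matrix, A[parent][child]
    mu, sigma - per-state Gaussian mean / std of the coefficients
    Returns a list of (level, state, coefficient) tuples."""
    rng = rng or random.Random(0)

    def draw(probs):
        u, acc = rng.random(), 0.0
        for s, p in enumerate(probs):
            acc += p
            if u < acc:
                return s
        return len(probs) - 1

    out = []
    def recurse(level, parent_state):
        state = draw(pi) if parent_state is None else draw(A[parent_state])
        out.append((level, state, rng.gauss(mu[state], sigma[state])))
        if level + 1 < levels:
            for _ in range(4):       # quadtree: four children per node
                recurse(level + 1, state)
    recurse(0, None)
    return out
```

One such tree is drawn per LH, HL, and HH band (assumed statistically independent); the LL coefficients are drawn separately, and the inverse wavelet transform produces the texture image.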


Fig. 9. The top two figures and the bottom-left one are the synthesized textures using three different HMT models. The bottom-right figure plots the MDL value as a function of M using the mixture of the three textures. M = 3 textures gives the minimum MDL value.

As in all examples presented here, distinct HMTs are defined for each texture and for the LH, HL, and HH wavelet quadtrees. The different wavelet bands are assumed to be statistically independent. For these synthesized images, we have considered three wavelet levels, and the probability of correct HMT segmentation was 98.9%.

B. Robustness and Sparsity of KMP Classifier

Having demonstrated the utility of the automatic segmentation algorithm, using measured (Fig. 4) and synthesized (Fig. 9) data, we now address example results for weighting the wavelet coefficients based on their importance for segmentation and MSE (using algorithmic details from Section III). In this section, we examine the effectiveness of KMP for achieving this task. To demonstrate the ability of KMP to select "informative" predictor variables in the presence of irrelevant ones, we generate d-dimensional data vectors from two Gaussian classes, with means given in (24). In both cases, the covariance matrix is the identity matrix. Regardless of the dimensionality d, the Bayes error rate is 0.159. The optimal classifier is linear and uses only the first two dimensions of the data.
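A sketch of such a data generator follows. The specific means below are our own assumption (the paper's exact choice is given in (24)): they differ only in the first two dimensions and are a distance 2 apart, which reproduces the stated Bayes error Φ(−1) ≈ 0.159 regardless of d.

```python
import math, random

def make_data(n, d, rng=None):
    """Draw n samples per class from two d-dimensional Gaussians with
    identity covariance. Class means (an assumption here) are +/-m in the
    first two dimensions and 0 elsewhere, separated by distance 2."""
    rng = rng or random.Random(0)
    m = 1.0 / math.sqrt(2.0)
    means = ([-m, -m] + [0.0] * (d - 2), [m, m] + [0.0] * (d - 2))
    X, y = [], []
    for label, mean in enumerate(means):
        for _ in range(n):
            X.append([mu + rng.gauss(0.0, 1.0) for mu in mean])
            y.append(label)
    return X, y

def bayes_error():
    """Phi(-||mean1 - mean0|| / 2) for equal-covariance Gaussians;
    here the separation is 2, so the error is Phi(-1)."""
    return 0.5 * (1.0 + math.erf(-1.0 / math.sqrt(2.0)))
```

Because the irrelevant dimensions carry no class information, the Bayes error is unchanged as d grows; only the classifier's ability to ignore them is tested.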
The average error rate of the KMP method is compared with the Bayes error, the classification performance of the relevance vector machine (RVM) [22], and the performance of the linear least-squares regression model (LSRM) [9].

Fig. 10. Feature-selection ability of KMP: The classification error rates are compared among the RVM, KMP, Bayes, and LSRM classifiers as a function of the total feature dimension d defined in (24). Results are shown for the data synthesized in Section V-B.

The average errors with respect to different feature dimensions are plotted in Fig. 10. All three of these approaches yield a linear classifier in feature space, with the RVM weights [22] applied directly to the feature components. More generally, a nonlinear RVM could be employed using a kernel [22], but in this case the sparseness would not be on the feature components; it is possible to extend the kernel-based RVM such that sparseness is manifested on the features [13], but this algorithm was


not found to be significantly better than the nonkernel RVM for the data considered here.

Fig. 11. Feature-selection ability of KMP: The weight amplitudes of the features are compared between the KMP and LSRM classifiers. Results are shown for the same data as used in Fig. 10.

Fig. 12. Classification errors as a function of MSE at bit rates of 0.2, 0.3, 0.4, 0.5, and 0.6 bpp are compared among MSPIHT (λ = 50000), MSPIHT (λ = 0), and the original SPIHT. Results are shown for the data in Fig. 4.

These results show that the KMP algorithm is robust in the presence of irrelevant feature variables, as compared to the linear least-squares regression model and also the RVM, a popular sparse classifier. We plot the weight amplitude of each feature dimension in Fig. 11. We see that most of the features are removed by a zero weight, and the first two features have the dominant weight values. For LSRM, the weight amplitudes of many irrelevant features are large, resulting in poor generalization performance. These results, and others like them, motivate using KMP to select features and, hence, to weight the importance of wavelet coefficients. We iteratively select the wavelet coefficients that reduce the label-regression error most and estimate the wavelet-coefficient weights jointly over all the selected coefficients.
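The select-then-refit loop just described can be sketched in a matching-pursuit style. This is a plain (non-kernel) illustration with names of our own choosing: each iteration greedily picks the feature most correlated with the current residual, then refits the weights of all selected features jointly by least squares.

```python
def matching_pursuit(X, y, k):
    """Greedy feature selection: pick the feature best correlated with the
    residual, then jointly refit all selected weights (back-fitting)."""
    n, d = len(X), len(X[0])
    selected, residual, w = [], y[:], []
    for _ in range(k):
        scores = [abs(sum(X[i][j] * residual[i] for i in range(n)))
                  for j in range(d)]
        best = max((j for j in range(d) if j not in selected),
                   key=lambda j: scores[j])
        selected.append(best)
        w = _lstsq([[X[i][j] for j in selected] for i in range(n)], y)
        fit = [sum(wj * X[i][j] for wj, j in zip(w, selected)) for i in range(n)]
        residual = [y[i] - fit[i] for i in range(n)]
    return selected, w

def _lstsq(A, b):
    """Solve the normal equations (A^T A) w = A^T b by Gaussian elimination."""
    m = len(A[0])
    G = [[sum(A[i][p] * A[i][q] for i in range(len(A))) for q in range(m)]
         for p in range(m)]
    c = [sum(A[i][p] * b[i] for i in range(len(A))) for p in range(m)]
    for p in range(m):                      # forward elimination
        for q in range(p + 1, m):
            f = G[q][p] / G[p][p]
            G[q] = [gq - f * gp for gq, gp in zip(G[q], G[p])]
            c[q] -= f * c[p]
    w = [0.0] * m
    for p in range(m - 1, -1, -1):          # back substitution
        w[p] = (c[p] - sum(G[p][q] * w[q] for q in range(p + 1, m))) / G[p][p]
    return w
```

In the paper's setting, the "features" are wavelet coefficients, and the fitted weights feed the importance weighting of Section III; the kernel variant [24] replaces the raw features with kernel evaluations.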
In the compression example presented below, we choose the first 10% most important wavelet coefficients and calculate the wavelet-coefficient weights.

In the last series of results, we have considered examples of automatic segmentation and the subsequent reweighting of the wavelet coefficients, based on their importance for a cost function that accounts for classification accuracy and MSE. We now put these two steps together, integrated with modified SPIHT, to examine overall algorithm performance.

C. Modified SPIHT Versus Original SPIHT

In our first example, we consider the measured imagery in Fig. 4. In Fig. 12, we compare the classification errors as a function of MSE. The definition of "truth" is guided by the results of the HMT segmentation, with a threshold of 0.5 applied to the results in Fig. 8, to yield the "true" segmentation used for scoring. For the modified SPIHT (MSPIHT) algorithm, we must first send the relative weighting of the wavelet coefficients (so that the decoder can undo the wavelet-coefficient weighting and perform an inverse wavelet transform to obtain an approximation to the original image). For the three levels considered here, this corresponds to 64 coefficients. These coefficients are sent using 4 bits per coefficient. When quantizing the image at 0.5 bpp, 0.8% of the total bits sent correspond to the wavelet-coefficient weights.

The MSPIHT results with λ = 0 correspond to accounting for MSE alone, as indicated in (8). We observe that MSPIHT with λ = 0 provides slightly better classification performance than SPIHT.
Recall that MSPIHT employs nonuniform quantization bins, even when λ = 0. Considering λ = 50000, for which significant importance is placed on classification performance, we note a marked reduction in classification error. The points on the curves in Fig. 12 correspond to 0.2, 0.3, 0.4, 0.5, and 0.6 bits per pixel (bpp). While λ = 50000 yields improved classification performance, we suffer in the context of MSE: the MSE at 0.6 bpp is comparable to that at 0.4 bpp, when comparing MSPIHT (λ = 50000) and SPIHT, respectively.

To avoid issues inherent to defining the "true" segmentation of measured imagery, and to have better control over the properties of the wavelet coefficients, we consider synthetic data. We now take an original texture (right half of Fig. 13), perform a wavelet decomposition, add white Gaussian noise (WGN) to wavelet coefficients in particular subbands, and then synthesize the image via an inverse wavelet transform. In this example, we consider three wavelet levels, adding WGN to half of the 16 finest LH wavelet coefficients. Since the coefficients with additive noise are the only ones that distinguish the statistics of the two synthesized textures, we expect these to be deemed important for the classification task. The average signal-to-noise ratio (SNR) for these wavelet coefficients is −3 dB, defined as 10 log10(P/σ²), where P is the average squared amplitude of the coefficients to which noise is added, and σ² is the noise variance. We compare the performance at bit rates of 0.2, 0.3, 0.4, 0.5, and 0.6 bpp, with results presented in Fig. 14.
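Injecting noise at a prescribed SNR under this definition amounts to choosing the noise variance from the coefficients' average squared amplitude. A minimal sketch (function name ours):

```python
import math, random

def add_noise_at_snr(coeffs, snr_db, rng=None):
    """Add white Gaussian noise to wavelet coefficients so that
    10*log10(P / var) = snr_db, where P is the average squared coefficient
    amplitude and var is the noise variance (the SNR definition above)."""
    rng = rng or random.Random(0)
    p = sum(c * c for c in coeffs) / len(coeffs)
    var = p / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(var)
    return [c + rng.gauss(0.0, sigma) for c in coeffs]
```

At −3 dB, the noise variance is roughly twice the average squared coefficient amplitude, so the injected texture difference is substantial relative to the signal.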
The MSPIHT algorithm with λ = 50000 demonstrates significantly improved classification performance, relative to the original SPIHT and to MSPIHT with λ = 0 (for which only MSE is considered).


Fig. 13. Gaussian noise at −3 dB SNR is added to the finest LH wavelet coefficients of the right-half texture, to obtain the synthetic texture shown in the left half.

Fig. 14. Classification errors as a function of MSE are compared among MSPIHT (λ = 50000), MSPIHT (λ = 0), and the original SPIHT. The bit rates are 0.2, 0.3, 0.4, 0.5, and 0.6 bpp. Results are shown for the data in Fig. 13.

Fig. 15. Gaussian noise at −3.5 dB SNR is added to the second-level LH wavelet coefficients of the right-half texture, to obtain the synthetic texture shown in the left half.

As indicated above, there is potential for disturbance of zero trees via the coefficient reweighting. This implies that the performance of the MSPIHT algorithm may be impacted by which particular wavelet coefficients are scaled to be large (where in the wavelet tree). In the previous example, the significant wavelet coefficients (to which noise was added) were at the finest level. We now consider a synthetic image as in the left half of Fig. 15, but now the white Gaussian noise is added to the coefficients in the second LH band (for three levels, as in the previous example). In this case, the SNR, as defined above, is −3.5 dB. Results are presented in Fig. 16. We again consider λ = 50000 for the MSPIHT results for which classification is emphasized. In Fig. 16, we note that MSPIHT with λ = 0 provides excellent classification performance for low MSE (highest bpp), better than MSPIHT with λ = 50000. This is attributed to the fact that the coefficients that are important are relatively large in amplitude (lower scale than in Fig. 14). These large-amplitude
These large-amplitudeFig. 16.Classification errors as the function of <strong>MSE</strong> are compared amongM<strong>SPIHT</strong> ( = 50000), M<strong>SPIHT</strong> ( =0), <strong>and</strong> the original <strong>SPIHT</strong>. The bitrates are from 0.2 to 0.6 bpp at the interval of 0.05 bpp. Results are shown <strong>for</strong>synthesized data in Fig. 15.coefficients, which are also important <strong>for</strong> classification arereconstructedwell by <strong>SPIHT</strong><strong>and</strong> M<strong>SPIHT</strong> with . However,at lower bpp (higher <strong>MSE</strong>), the classification improvementof M<strong>SPIHT</strong> withis more evident. Note that inFig. 14, <strong>for</strong> which the small-amplitude finest-level coefficientsare important <strong>for</strong> classification, the M<strong>SPIHT</strong> withyields better classification per<strong>for</strong>mance <strong>for</strong> all <strong>MSE</strong> (bpp)considered.D. <strong>Modified</strong> <strong>SPIHT</strong> <strong>and</strong> Bayes VQThe Bayes tree-structured vector quantization (B-TSVQ) algorithmintroduced by Oehler <strong>and</strong> Gray [15] is a joint compression<strong>and</strong> classification technique. It combines classification <strong>and</strong>compression into a single vector quantizer by incorporating aBayes risk term into the distortion measure. For large block


sizes, B-TSVQ performance approaches the theoretical rate-distortion bound [5]. TSVQ has the limitations of computational complexity and of requiring knowledge of the posterior probability, and it also requires the availability of a large training set. In image processing, the VQ block size is usually 4 × 4 or smaller because of computational constraints [20].

Fig. 17. Training imagery for B-TSVQ, with labeled "urban" and "rural" classes.

In our first comparison, we consider the measured imagery in Fig. 4, with the "true" segmentation defined as discussed above (dictated by the results of the HMT segmentation). The MSPIHT results are extensions from Fig. 12, for λ = 0 and λ = 50000. The MSPIHT results are computed adaptively on the imagery in Fig. 4, without any a priori training data. By contrast, B-TSVQ requires training data to design the tree-structured codebook and to build the associated classifier (a look-up table that maps a code to a texture). In Fig. 17, we present separate training data, from the "rural" and "urban" classes, used to train the B-TSVQ algorithm. These data are distinct examples from the same USC-SIPI database from which Fig. 4 was acquired. We consider a bit rate of 0.35 bit/pixel. To achieve this bit rate, we run the required number of MSPIHT iterations, while for B-TSVQ the bit rate is dictated by the number of codes and the size of each block. We here consider a codebook of size 49, and each block is of size 4 × 4. The MSPIHT classification is based upon two wavelet levels, to be consistent with the 4 × 4 blocks used by B-TSVQ.
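As a quick check of the stated operating point, the VQ bit rate follows directly from the codebook size and block size: log2(49) ≈ 5.61 bits per 4 × 4 block, i.e., about 0.35 bpp. A one-line sketch:

```python
import math

def vq_bit_rate(codebook_size, block_w, block_h):
    """Bits per pixel for a VQ: log2(codebook) bits index one block,
    spread over block_w * block_h pixels."""
    return math.log2(codebook_size) / (block_w * block_h)

rate = vq_bit_rate(49, 4, 4)   # codebook of 49 codes, 4 x 4 blocks
```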
However, to improve coding efficiency, MSPIHT is run for five levels (only two of which are used in the classifier). To run MSPIHT with λ > 0, we first run the iterative algorithm in Section III-C for the two classifier levels. The wavelet and scaling coefficients are then weighted as so determined. The subsequent three wavelet levels are then performed on these weighted coefficients.

We also show MSPIHT results for a classifier based on three levels, corresponding to 8 × 8 blocks. The results in Fig. 18, for both MSPIHT and B-TSVQ, are computed by controlling the Lagrange multiplier that dictates the balance between concentrating on MSE and classification error. The results indicate that MSPIHT has better compression performance than B-TSVQ (smaller MSE), but B-TSVQ has more sensitivity to the Lagrangian-driven tradeoff between MSE and classification (although in these results the B-TSVQ MSE does not change substantially as the Lagrange multiplier changes).

Fig. 18. Classification error as a function of MSE, at a bit rate of 0.35 bpp. For both B-TSVQ and MSPIHT, the variation in classification error and MSE is controlled by adjusting a respective Lagrange multiplier. Results are shown for the data in Fig. 4.

Fig. 19. Classification error as a function of MSE, with increasing Lagrange multipliers, compared between B-TSVQ and MSPIHT at a bit rate of 0.35 bpp. Results are shown for the data in Fig. 13.

To complement the results discussed above (Fig.
4), for which there may be some uncertainty as to actual "truth," we also consider synthetic data. In Fig. 19, we study the tradeoff between MSE and classification error for B-TSVQ and MSPIHT at a bit rate of 0.35 bit/pixel, for the data in Fig. 13, which for B-TSVQ corresponds to a codebook of size 49, for blocks of size 4 × 4. Again, the MSPIHT classification is based upon two wavelet levels, to be consistent with the 4 × 4 blocks used by B-TSVQ, and to improve coding efficiency MSPIHT is run for five levels. We also show MSPIHT results for a classifier based on three levels, corresponding to 8 × 8 blocks. Separate training data with the same statistics were used to build the codes for B-TSVQ. When comparing results for 4 × 4 blocks, we note that the MSPIHT algorithm performs best for high classification error (lower MSE), with this attributed to the fact that the MSPIHT algorithm is effectively employing larger block sizes


than the B-TSVQ. Recall that, although classification is performed with two levels, five wavelet levels are used in the MSPIHT algorithm. For low classification error (high MSE), the B-TSVQ algorithm performs better for the 4 × 4 blocks, which may be attributed to the fact that B-TSVQ exploits training data. While B-TSVQ is limited to blocks of size 4 × 4 (for computational reasons), the MSPIHT algorithm is readily applied to larger blocks. In Fig. 19, we also show MSPIHT results for 8 × 8 blocks, for which we note that these results are superior to those provided by either algorithm when considering 4 × 4 blocks.

TABLE I. CPU (Pentium 4, 1.8-GHz) time of the HMT segmentation and weight-estimation processes.

E. Computation Cost of MSPIHT

It has been demonstrated that using the MSPIHT algorithm one may reduce the probability of classification error at a given bit rate, with an associated increase in MSE. However, there is a computational cost associated with performing the HMT-based segmentation and wavelet-coefficient weighting. Clearly, this cost must be sufficiently small to warrant doing something other than conventional coding (SPIHT). The HMT segmentation cost scales linearly with the number of wavelet trees, and the number of wavelet trees scales linearly with the number of image pixels; hence, the cost of HMT segmentation scales linearly with the image size. The cost of estimating the wavelet weights is typically far less than that of HMT-based segmentation. In Table I, we tabulate the CPU time (Pentium 4, 1.8 GHz) for HMT segmentation and coefficient weighting, for typical images (like those shown here), as a function of image size.
All codes were written in C. It is important to emphasize that one need not perform HMT segmentation and computation of new coefficient weights for each new image. Initially, one may perform these tasks; subsequently, the coefficient weights can be retained until new data warrant a change. A change in the image statistics may be detected via the current set of HMTs. If new data are found to be associated with these HMTs with low likelihood, the segmentation can be refined and the coefficient weighting recomputed. In this manner, the coding algorithm adapts to new imagery, always adjusting to achieve a desired balance between classification and MSE accuracy. This would be very difficult with a B-TSVQ algorithm, since it is often prohibitively expensive to refine the codebook as data with new statistics are observed.

VI. CONCLUSION

We have designed a scheme for pruning and weighting the wavelet coefficients before wavelet coding, with the ultimate goal of image classification. We demonstrated improved segmentation performance of the decoded image, with little decrease in MSE image quality at low bit rates. We also proposed a modified SPIHT algorithm, using the importance-weight information, to focus SPIHT on wavelet coefficients of importance for classification. We tested the method on high-altitude two-texture aerial photographic imagery and also on synthesized data. B-TSVQ is a widely used joint compression and classification technique. Using measured and synthesized data, we found that the compression and classification performance of the modified SPIHT algorithm is comparable to that of B-TSVQ.
The modified SPIHT algorithm has the advantages of not requiring codebook design, and it is not limited in the size of the blocks used for classification.

We have considered the measured imagery in Fig. 4 for several of the example results. These data were acquired from the USC-SIPI database and represent only one example from the numerous test cases we have considered. The results presented here are typical of the performance observed on such data. We have also considered canonical textures from the USC-SIPI database (see http://sipi.usc.edu/services/database/database.cgi?volumetextures). The segmentation performance on such textural data was observed to be consistent with the results presented here for data synthesized via HMT models. We have not shown texture-segmentation results on USC-SIPI canonical textures, in an effort to conserve space and because the results are analogous to those observed for the HMT-generated synthetic data (see, for example, Fig. 9). Moreover, by utilizing the HMT-generated synthetic textures, we have "truth" as to which wavelet coefficients are most relevant for the classification task (of interest for examining the performance of the KMP component of the algorithm). For the data in the aforementioned USC-SIPI database, it is difficult to define truth as to the most-relevant wavelet coefficients.

It is of interest to connect the work reported here to TSVQ [7] and B-TSVQ [15]. In TSVQ and B-TSVQ, one performs a sequence of binary questions, with a 1 or 0 used to represent the result of each test.
The sequence of ones and zeros defines a traversal through the tree, thereby defining a code (the sequence of ones and zeros provides the index for a corresponding code). One requires training data to build the codebook and corresponding tree, and the codebook and tree must be known at the decoder. These facts and computational requirements necessitate relatively small block sizes (typically 4 × 4 [20]). As Gray and Neuhoff [8] have pointed out, zero-tree wavelet coding is closely related to VQ. In fact, the SPIHT algorithm may be viewed as a generalized TSVQ algorithm. In particular, in SPIHT, one asks a series of binary questions as the LIS, LIP, and LSP lists are built and refined. Moreover, binary questions are sequentially asked during the sorting and refinement passes. If we fix the average bit rate (bpp) of the SPIHT algorithm, there are a finite number of ways the SPIHT algorithm may answer these binary questions and yield the desired bpp. This may be viewed as a finite (but very large) codebook, with the sequence of binary questions used to define a desired code. The major advantage of wavelets and SPIHT is that we do not need to explicitly build, store, or search the codebook, because it is inherent to the structure of the wavelet transform, known at both the decoder and the encoder. Consequently, SPIHT requires no training data. Further, the modified SPIHT algorithm may be viewed as a generalization of B-TSVQ. The significant advantage of modified SPIHT is that no training data are required and that it is, therefore, adaptive to new data. As the data change,


the encoder may inexpensively send the decoder the updated wavelet weights, and implicitly the modified SPIHT algorithm refines its B-TSVQ-like codebook automatically. As Gray and Neuhoff [8] explain, this connection of wavelet coding to VQ plays an important role in the performance of SPIHT and of the modified SPIHT algorithm presented here. Additional advantages of SPIHT and modified SPIHT vis-à-vis TSVQ and B-TSVQ are that SPIHT is an embedded encoder and, hence, a partial reconstruction can be performed if the bit stream is stopped. Moreover, the implicit size of the SPIHT codes, for a given bpp, extends over the support of the entire image (not limited to 4 × 4, for example).

REFERENCES

[1] J. Bilmes, "A gentle tutorial on the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models," Univ. California, Berkeley, Tech. Rep. ICSI-TR-97-021, 1997.
[2] C. M. Brislawn, "Classification of nonexpansive symmetric extension transforms for multirate filter banks," Appl. Comput. Harmon. Anal., vol. 3, 1996.
[3] P. C. Cosman, C. Tseng, R. M. Gray, R. A. Olshen, L. E. Moses, H. C. Davidson, C. J. Bergin, and E. A. Riskin, "Tree-structured vector quantization of CT chest scans: Image quality and diagnostic accuracy," IEEE Trans. Med. Imag., vol. 12, no. 12, pp. 727-739, Dec. 1993.
[4] M. Crouse, R. Nowak, and R. Baraniuk, "Wavelet-based statistical signal processing using hidden Markov models," IEEE Trans.
Signal Process.,vol. 46, no. 4, pp. 886–902, Apr. 1998.[5] Y. Dong, “Rate Distortion Analysis of <strong>Joint</strong> Compression <strong>and</strong> Classification:Application to HMM State (Pose) Estimation via Multi-AspectScattering Data,” Ph.D. dissertation, Duke Univ., Durham, NC, 2002.[6] M. Farshchian, S. Cho, <strong>and</strong> W. A. Pearlman, “Optimal error protection<strong>for</strong> real-time image <strong>and</strong> video transmission,” IEEE Signal Process. Lett.,vol. 11, no. 10, pp. 780–783, Oct. 2004.[7] A. Gersho <strong>and</strong> R. M. Gray, Vector Quantization <strong>and</strong> Signal Compression.Norwell, MA: Kluwer, 1992.[8] R. M. Gray <strong>and</strong> D. L. Neuhoff, “Quantization,” IEEE Trans. Inf. Theory,vol. 44, no. 5, pp. 2325–1283, Oct. 1998.[9] T. Hastie, R. Tibshirani, <strong>and</strong> J. Friedman, The Elements of StatisticalLearning:Data Mining, Inference, <strong>and</strong> Prediction. New York:Springer-Verlag, 2001.[10] M. Inoue <strong>and</strong> N. Ueda, “Exploitation of unlabeled sequences in hiddenMarkov models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no.12, pp. 1570–1581, Dec. 2003.[11] N. S. Jayant, J. D. Johnston, <strong>and</strong> R. J. Safranek, “Signal compressionbased on models of human perception,” Proc. IEEE, vol. 81, no. 10, pp.1385–1424, Oct. 1993.[12] A. A. Kassim <strong>and</strong> W. S. Lee, “Embedded color image coding using spihtwith partially linked spatial orientation trees,” IEEE Trans. Circuits Syst.Video Technol., vol. 13, no. 2, pp. 203–206, Feb. 2003.[13] B. Krishnapuram, A. Harternink, L. Carin, <strong>and</strong> M. Figueiredo, “Abayesian approach to joint feature selection <strong>and</strong> classifier design,” IEEETrans. Pattern Anal. Mach. Intell., vol. 26, no. 9, pp. 1105–1111, Sep.2004.[14] S. G. Mallat <strong>and</strong> Z. Zhang, “Matching pursuits with time-frequency dictionaries,”IEEE Trans. Signal Process., vol. 41, no. 12, pp. 3397–3415,Dec. 
1993.[15] K. L. Oehler <strong>and</strong> R. M. Gray, “Combining image compression <strong>and</strong> classificationusing vector quantization,” IEEE Trans. Pattern Anal. Mach.Intell., vol. 17, no. 5, pp. 461–473, May 1995.[16] W. A. Pearlman, A. Islam, N. Nagaraj, <strong>and</strong> A. Said, “Efficient,low-complexity image coding with a set-partitioning embedded blockcoder,” IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 11, pp.1219–1235, Nov. 2004.[17] K. O. Perlmutter, S. M. Perlmutter, <strong>and</strong> R. M. Gray, “Bayes riskweighted vector quantization with posterior estimation <strong>for</strong> imagecompression <strong>and</strong> classification,” IEEE Trans. <strong>Image</strong> Process., vol. 5,no. 2, pp. 347–360, Feb. 1996.[18] J. Rissanen, “Modeling by shortest data description,” Automatica, vol.14, 1978.[19] A. Said <strong>and</strong> W. A. Pearlman, “A new fast <strong>and</strong> efficient image codec basedon set partitioning in hierarchical trees,” IEEE Trans. Circuits Syst. VideoTechnol., vol. 6, no. 2, pp. 243–250, Feb. 1996.[20] M. J. T. Smith <strong>and</strong> A. Docef, A Study Guide <strong>for</strong> Digital <strong>Image</strong> Processing,1st ed. Singapore: World Scientific, 1998.[21] D. S. Taubman <strong>and</strong> M. W. Marcellin, JPEG2000: <strong>Image</strong> CompressionFundamentals, St<strong>and</strong>ards, <strong>and</strong> Practice. Norwell, MA: Kluwer, 2001.[22] M. E. Tipping, “Sparse bayesian learning <strong>and</strong> the relevance vector machine,”J. Mach. Learn. Res., vol. 1, pp. 211–244, 2001.[23] J. D. Villasenor, B. Belzer, <strong>and</strong> J. Liao, “Wavelet filter evaluation <strong>for</strong>image compression,” IEEE Trans. <strong>Image</strong> Process., vol. 4, no. 8, pp.1053–1060, Aug. 1995.[24] P. Vincent <strong>and</strong> Y. Bengio, “Kernel matching pursuit,” Mach. Learn., vol.48, no. 1–3, pp. 165–187, 2002.[25] Z. Wang <strong>and</strong> A. C. 
Bovik, “Embedded foveation image coding,” IEEETrans. <strong>Image</strong> Process., vol. 10, pp. 1397–1410, Oct. 2001.[26] R. S. Zemel, “A Minimum Description Length Framework <strong>for</strong> UnsupervisedLearning,” Ph.D. dissertation, Univ. Toronto, Toronto, ON,Canada, 1993.Shaorong Chang was born in Henan, China, in May1978. She received the B.S. degree in electronic engineeringfrom Tsinghua University, Beijing, China,in 2000, <strong>and</strong> the M.S. degree in electrical engineeringfrom Duke University, Durham, NC, in 2001, whereshe is currently pursuing the Ph.D. degree.Her primary research interests are in image compression<strong>and</strong> classification.Lawrence Carin (SM’96–F’01) was born on March 25, 1963, in Washington,DC. He received the the B.S., M.S., <strong>and</strong> Ph.D. degrees in electrical engineeringfrom the University of Maryl<strong>and</strong>, College Park, in 1985, 1986, <strong>and</strong> 1989, respectively.In 1989, he joined the Electrical Engineering Department, Polytechnic Universityof Brooklyn, Brooklyn, NY, as an Assistant Professor, where he becamean Associate Professor in 1994. In September 1995, he joined the ElectricalEngineering Department, Duke University, Durham, NC, where he is now theWilliam H. Younger Professor of Engineering. He was the principal investigator(PI) on a Multidisciplinary University Research Initiative (MURI) on demining(1996–2001) <strong>and</strong> he is currently the PI of a MURI dedicated to multimodal inversion.His current research interests include short-pulse scattering, subsurfacesensing, <strong>and</strong> wave-based signal processing.Dr. Carin is a member of the Tau Beta Pi <strong>and</strong> Eta Kappa Nu honor societies.He was an Associate Editor of the IEEE TRANSACTIONS ON ANTENNAS ANDPROPAGATION from 1996 to 2004.
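The embedded, binary-question view of SPIHT discussed in this section can be made concrete with a small sketch. This is an illustrative toy, not the authors' implementation or the actual SPIHT codec: the function names, the eight-bitplane depth, and the example weight of 8 are assumptions introduced here for clarity. It shows why scaling a classification-relevant subband by a weight greater than one moves that subband's coefficients earlier in an embedded bit stream.

```python
# Toy sketch (illustrative only): an embedded bitplane coder asks, pass by
# pass, the binary question "is |coefficient| >= 2^p?" for decreasing
# thresholds 2^(n_bits-1), ..., 2^0. A coefficient's bits first enter the
# stream at the earliest pass answered "yes", so scaling a subband by an
# importance weight w > 1 lets its low-amplitude coefficients survive
# truncation of the stream at low bit rates.

def first_significant_pass(coeff, n_bits=8):
    """Return the 0-based index of the pass at which coeff becomes
    significant, or None if it never does within n_bits bitplanes."""
    for k, p in enumerate(range(n_bits - 1, -1, -1)):
        if abs(coeff) >= (1 << p):
            return k
    return None

def weight_subband(coeffs, weight):
    """Scale one subband's coefficients by its importance weight
    (in the paper this weight is derived via KMP; here it is arbitrary)."""
    return [c * weight for c in coeffs]

# A low-amplitude but (hypothetically) classification-relevant coefficient:
c = 5
late = first_significant_pass(c)                           # a late pass
early = first_significant_pass(weight_subband([c], 8)[0])  # an earlier pass
```

Truncating the embedded stream after a few passes keeps only coefficients that became significant early; the weighting simply changes which coefficients those are, which is the tradeoff between MSE and classification error controlled by the Lagrangian distortion.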
