Transform coding techniques for lossy hyperspectral data compression

Barbara Penna, Member, IEEE, Tammam Tillo, Member, IEEE, 

Enrico Magli, Member, IEEE, Gabriella Olmo, Member, IEEE 

Abstract 

Transform-based lossy compression has a huge potential for hyperspectral data reduction. Hyperspectral 

data are three-dimensional, and the nature of their correlation is different in each dimension. 

This calls for a careful design of the 3D transform to be used for compression. 

In this paper we investigate the transform design and rate allocation stage for lossy compression 

of hyperspectral data. Firstly, we select a set of 3D transforms, obtained by combining in various ways 

wavelets, wavelet packets, the discrete cosine transform, and the Karhunen-Loève transform (KLT), 

and evaluate the coding efficiency of these combinations. Secondly, we propose a low-complexity 

version of the KLT, in which complexity and performance can be balanced in a scalable way, allowing 

one to design the transform that better matches a specific application. Thirdly, we integrate this, as 

well as other existing transforms, in the framework of Part 2 of the JPEG 2000 standard, taking 

advantage of the high coding efficiency of JPEG 2000, and exploiting the interoperability of an 

international standard. 

We report experimental results on AVIRIS scenes. It is shown that the scheme based on the 

proposed low-complexity KLT significantly outperforms previous schemes as to rate-distortion performance. 

We also carry out some experiments in order to evaluate the effect of lossy compression on 

image classification using the spectral angle mapper method. It turns out that classification accuracy 

is reasonably correlated with the mean squared-error; therefore, the proposed scheme exhibits considerably 

better performance than the state-of-the-art also in terms of this more application-related 

quality assessment. 

Index Terms 

Lossy compression; hyperspectral data; wavelets; wavelet packets; KLT; DCT; JPEG 2000; 

SPIHT 3D. 

The authors are with CERCOM (Center for Multimedia Radio Communications), Dip. di Elettronica, Politecnico 

di Torino, Corso Duca degli Abruzzi 24 - 10129 Torino - Italy - Ph.: +39-011-5644195 - FAX: +39-011-5644099 

- E-mail: barbara.penna(tammam.tillo,enrico.magli,gabriella.olmo)@polito.it. Corresponding 

author: Enrico Magli.

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING (SUBMITTED DEC. 2005)

I. INTRODUCTION 

Hyperspectral imaging amounts to collecting the energy reflected or emitted by ground targets at 

a typically very high number of wavelengths, resulting in a data cube containing tens to hundreds of 

bands. These data have become increasingly popular, since they enable plenty of new applications, 

including detection and identification of surface and atmospheric constituents, analysis of soil type, 

agriculture and forest monitoring, environmental studies and military surveillance. The data are usually 

acquired by a remote platform (a satellite or an aircraft), and then downlinked to a ground station. 

Due to the huge size of the datasets, compression is necessary to match the available transmission 

bandwidth. 

In the past, scientific data have been almost exclusively compressed by means of lossless methods, 

in order to preserve their full quality. However, more recently, there has been an increasing interest 

in their lossy compression. In fact, two of the most recent satellites, SPOT 4 and IKONOS, employ 

on-board lossy compression prior to downlinking the data to ground stations. As lossy compression 

allows for higher scene acquisition rates, several lossy algorithms have been designed for multispectral 

and hyperspectral images. 

Many of these techniques are based on decorrelating transforms, in order to exploit spatial and 

inter-band (i.e., spectral) correlation, followed by a quantization stage and an entropy coder. Examples 

include the JPEG 2000 standard [1], and set partitioning methods such as SPIHT and its variations 

(SPIHT-2D, SPIHT-3D, SPECK). Some authors have also proposed to employ the 3D Discrete Wavelet 

Transform (DWT) [2], [3], [4] and the 3D Discrete Cosine Transform (DCT) [5], [6]. 

Moreover, several methods that treat spectral and spatial redundancy differently have been investigated. 

A popular approach involves the combination of a one-dimensional spectral decorrelator such 

as the Karhunen-Loève Transform (KLT), the DWT, or the DCT, followed by JPEG 2000 employed as 

spatial decorrelator, rate allocator, and entropy coder (see e.g. [7], [8]); SPIHT has also been used for 

the same purpose [9]. In [10] and [11] a 3D version of SPIHT and a low complexity image encoder 

with similar features (Set Partitioned Embedded BloCK, SPECK) are proposed for hyperspectral 

image compression, exploiting the wavelet packet transform. 

Although there has been a large amount of work on 3D coders, the proposed techniques have been 

often tested under different conditions and using different datasets; this makes it difficult to evaluate


the best combination of spatial and spectral transforms for a given application. On a related note, 

regardless of the fact that the KLT is the optimal transform in the coding gain sense, its practical 

application has been somewhat limited because of its complexity and of the fact that the transform 

is signal-adaptive; however, a few recent works have rediscovered the KLT and attempted to exploit 

its superior decorrelation capabilities [12]. 

In [13] vector quantization and spectral KLT are employed to exploit the correlation between 

multispectral bands. In [14], an efficient adaptive KLT algorithm for multispectral image compression 

is presented. The proposed technique exploits an adaptive algorithm to continuously adjust eigenvalues 

and eigenvectors when input image data are received sequentially. 

In [15], an integer implementation of the KLT followed by JPEG 2000 applied to each transformed 

component is presented. In [16] a hybrid 3D wavelet transform is used, employing JPEG 2000 

as spatial decorrelator, with full 3D post-compression rate-distortion optimization; this technique 

significantly outperforms state-of-the-art schemes. 

The highest-performance existing schemes take advantage of the high coding efficiency of the 

KLT as spectral decorrelator, and of JPEG 2000 as spatial decorrelator and entropy coder. However, 

a few issues are still open. Firstly, although the KLT is the optimal decorrelating transform, its 

complexity is very high, due to the need to estimate covariance matrices, solve eigenvector problems, 

and compute matrix-vector products. Secondly, most schemes employ JPEG 2000 separately on 

each band, allocating the same rate to each of them; this approach is obviously suboptimal, since the 

spectral transform unbalances the energy in different bands, and this can be exploited by differentiating 

the rate allocation. 

Although lossy compression allows for much higher compression ratios than lossless compression, 

it introduces degradation in the data. Therefore, as lossy compression has become more popular, 

researchers have started to investigate the quality issues associated with such information losses. Two 

important questions are 1) whether the metrics based on the mean-squared error (MSE) can adequately 

capture the effects of this degradation in typical remote sensing applications, and 2) whether other 

simple metrics exist, which are better than MSE at capturing these effects. Recently, a comprehensive 

investigation of this problem has been reported in [17], where the authors consider a set of quality 

metrics and a set of image degradations, including lossy compression. The sensitivity of each metric 

to each degradation is evaluated on hyperspectral data, and some general conclusions are drawn. It 

is shown that, taking e.g. spectral angle mapper (SAM) classification [18] as reference application, 

MSE turns out to be a reasonably good metric; however, more than one metric should be used if an 

accurate characterization of the degradation is required. 

This paper attempts to solve the transform coding problems outlined above, and builds on the


state-of-the-art by providing the following contributions. First of all, we report the results of a 

comprehensive experiment aimed at comparing the coding efficiency of several combinations of 

spatial and spectral transforms. The experiment is carried out in the framework of lossy compression 

of hyperspectral data, since these data have become very popular, and are more amenable to spectral 

decorrelation than multispectral data; in particular, a few AVIRIS scenes have been used to evaluate 

the selected transforms. We consider different transforms such as DCT, rectangular, square and hybrid 

wavelet and wavelet packet transforms [16], KLT, and various spatial/spectral combinations of these 

transforms; the evaluation procedure is designed so as to simulate global 3D rate allocation, as opposed 

to assigning the same rate to each transformed band. Secondly, we propose a new low-complexity 

version of the KLT, which provides performance similar to the full-featured KLT, with significantly 

lower computational complexity. Thirdly, we integrate the low-complexity KLT, as well as a few of 

the best existing transforms, into a practical compression scheme based on the JPEG 2000 standard. 

This choice allows us to define a compression scheme combining the flexibility and interoperability 

of an international standard with the high coding efficiency of JPEG 2000 and the KLT. In particular, 

the resulting scheme is compliant with the multicomponent transformation extension defined in Part 

2 of JPEG 2000 [19], and significantly outperforms the best existing lossy compression techniques. 

The comparison is carried out using MSE-based metrics, as well as investigating the effect of lossy 

compression on SAM classification. 

This paper is organized as follows. In Sect. II we analyze various decorrelating transforms. In 

Sect. III we outline a transform evaluation procedure and provide evaluation results on hyperspectral 

data. In Sect. IV we define the proposed KLT-based algorithm, whereas in Sect. V we report its 

performance evaluation. Finally, in Sect. VI we draw some conclusions. 

II. 3D TRANSFORM CODING STUDY 

Transform coding techniques are very attractive for image coding thanks to their good energy 

compaction characteristics. Since hyperspectral images exhibit both spatial and spectral redundancy, 

a 3D transform is a natural approach. To choose an efficient coding scheme, a comprehensive study 

with different transforms has been carried out. The transforms employed in the development of the 

proposed compression schemes are described in the following. Only separable extensions to multiple 

dimensions have been considered for simplicity. 

A. Karhunen-Loève Transform 

The KLT is the optimal block-based transform (in a statistical sense) for data compression, because 

it approximates a signal in the transform domain using the smallest number of coefficients, minimizing


the MSE between the reconstructed and original image. Defining the covariance matrix $C_X$ of a random column vector $X$ with mean value $\mu_X$ as

$$C_X = E[(X - \mu_X)(X - \mu_X)^T] \qquad (1)$$

the KLT transform matrix $V$ is obtained by aligning columnwise the eigenvectors of $C_X$. It can be shown that the transformed random vector $Y = V^T X$ has uncorrelated components, i.e. $C_Y = V^T C_X V$ is a diagonal matrix.

Although the KLT is provably optimal, it has a few drawbacks. First, the transform matrix V is 

obtained as the solution of a numerically intensive eigenvector problem. Moreover, the transform is 

signal-adaptive; hence it has to be recomputed for each input vector, and it has to be transmitted 

along with the compressed data, thus causing a significant overhead. 
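
The decorrelation property can be verified in one line. The following short derivation is a sketch under the assumption that the eigenvectors are normalized so that $V^T V = I$; the symbol $\Lambda$ (the diagonal matrix of eigenvalues of $C_X$) is introduced here only for illustration.

```latex
\begin{aligned}
C_Y &= E\left[(Y - \mu_Y)(Y - \mu_Y)^T\right]
     = V^T \, E\left[(X - \mu_X)(X - \mu_X)^T\right] V
     = V^T C_X V, \\
C_X V &= V \Lambda
  \;\Longrightarrow\;
  C_Y = V^T V \Lambda = \Lambda .
\end{aligned}
```

The diagonal entries of $\Lambda$ are the variances of the transformed components, which is why an unbalanced eigenvalue spectrum translates into good energy compaction.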

B. Discrete Cosine Transform 

The DCT is a popular technique for converting a signal into elementary frequency components. In 

particular, the DCT represents the input signal as a linear combination of weighted basis functions 

that are related to its frequency components. It is widely used in image compression, and is known 

to be close to optimal in terms of its energy compaction capabilities for Gauss-Markov processes; 

moreover, a number of fast algorithms have been developed to speed up the computation of this 

transform. A description of the DCT and its applications in signal coding can be found in several 

books, e.g. [20]. 
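
As an illustration of the energy compaction property, the following minimal sketch evaluates an orthonormal DCT-II directly from its definition on a smooth toy signal. It is not a fast algorithm and not the implementation used in this work; the function name `dct2_orthonormal` and the ramp signal are assumptions made for the example.

```python
import math

def dct2_orthonormal(x):
    """Orthonormal DCT-II of a 1D sequence (direct O(n^2) evaluation)."""
    n = len(x)
    out = []
    for k in range(n):
        # Scaling that makes the transform matrix orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out

# Smooth, highly correlated toy signal (hypothetical data).
ramp = [float(i) for i in range(8)]
coeffs = dct2_orthonormal(ramp)

# An orthonormal transform preserves the total energy.
total_energy = sum(c * c for c in coeffs)

# Fraction of the energy carried by the two lowest-frequency coefficients.
low_energy_share = (coeffs[0] ** 2 + coeffs[1] ** 2) / total_energy
```

For this ramp, almost all of the energy ends up in the first two coefficients, which is the behaviour that makes the DCT attractive as a decorrelator for smooth, strongly correlated signals.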

C. Discrete Wavelet Transform 

The DWT [20] is widely used in many signal processing fields, thanks to its ability to accurately 

describe both small-scale and large-scale components of a signal. The DWT is based on the principle 

that efficient decorrelation can be achieved by splitting the data into two half-rate subsequences, 

carrying information respectively on the approximation and detail of the original signal, or equivalently 

on the low- and high-frequency half-bands of its spectrum. Since most of the energy of real-world signals is typically concentrated in the low frequencies, this process splits the signal into a highly significant part and a less significant part, leading to good energy compaction. The procedure can 

be iterated on the lowpass subsequence by means of the filter bank configuration shown in Fig. 1. 
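
The two-channel split and its iteration on the lowpass branch can be sketched as follows. For brevity the sketch assumes the orthonormal Haar filter pair rather than the filters used in this work, and an input whose length is a multiple of 2^levels; all function names are hypothetical.

```python
import math

def haar_analysis(x):
    """One filter-bank stage: half-rate lowpass and highpass sequences."""
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return low, high

def haar_synthesis(low, high):
    """Inverse of haar_analysis: interleave the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(low, high):
        x.extend([(a + d) / s, (a - d) / s])
    return x

def haar_dwt(x, levels):
    """Iterate the split on the lowpass branch only.

    Returns [approximation, coarsest detail, ..., finest detail].
    """
    details = []
    approx = list(x)
    for _ in range(levels):
        approx, d = haar_analysis(approx)
        details.insert(0, d)
    return [approx] + details

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]  # toy data
bands = haar_dwt(signal, 2)
```

Each iteration halves the length of the approximation, so two levels on eight samples yield subbands of length 2, 2 and 4.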

D. Discrete Wavelet Packet Transform 

The discrete wavelet packet transform (DWPT) is a generalization of the DWT that offers a richer range of options for signal analysis. The DWPT is implemented by a filter bank in which the highpass outputs are also allowed to be further split into approximation and detail. If both the lowpass and highpass sequences are always split, the system is said to be a complete wavelet packet transform. However, it is not necessary for the transform to be complete; for any given input signal, there exists an optimal choice of highpass and lowpass iterations that captures most of the input signal correlation, which is known as best basis wavelet packet transform. Given an appropriate cost function, a search algorithm adaptively selects the best basis for a given signal.

Fig. 1. Implementation of wavelet transforms by means of a filter bank scheme with lowpass and highpass filters denoted as $H_l(z)$ and $H_h(z)$ respectively. (a) DWT and (b) DWPT. The circles denote subsampling by a factor of two.

Different cost functions can be employed, e.g. entropy, minimum distortion, minimum number of 

coefficients above a certain threshold [21], or rate-distortion optimization [22]. Our purpose is to 

select the best 3D transforms in terms of energy compaction; hence, the cost function in [22] is 

not suitable because it explicitly takes quantization into account, while our analysis aims at being 

independent of the specific quantization scheme employed. We have found that the coding gain, 

which is a performance measure of transform efficiency [20], is a very good and theoretically sound 

objective function for seeking the best decomposition tree. Assuming a DWPT with $l$ decomposition levels, the coding gain $G_{(l-1)\to l}$ is defined as

$$G_{(l-1)\to l} = \frac{\sigma_{l-1}^2}{\left( \prod_{i=1}^{N} (\sigma_i^l)^2 \right)^{1/N}}$$

where $\sigma_{l-1}$ is the standard deviation of the transformed coefficients in a subband at level $l-1$, and $\sigma_i^l$ are the standard deviations of the $i = 1, \ldots, N$ subbands stemming from a further decomposition; note that the denominator is the geometric mean of the subband variances. For example, N is equal to two for a one-dimensional transform, and to four and eight for a 2D and 3D transform respectively. The energy distribution of the transformed subbands is supposed to be highly unbalanced, i.e. a small percentage of the subbands concentrates a high percentage of the total energy. The more unbalanced the distribution, the lower the geometric mean. If $G_{(l-1)\to l} > 1$ the decomposition at level $l$ is retained, otherwise the current subband at level $l-1$ is kept.
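
The split-or-keep rule can be sketched for the one-dimensional case (N = 2) as a recursive search. This is only an illustration under stated assumptions: the Haar filter stands in for the filters used in this work, the input length is a power of two, and the names `coding_gain`, `best_basis` and the tolerance `EPS` are hypothetical. The handling of constant subbands (zero variance) is also a choice made for this sketch.

```python
import math

EPS = 1e-12  # variances below this are treated as zero (assumption)

def variance(x):
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)

def haar_split(x):
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return low, high

def coding_gain(parent, children):
    """Parent variance over the geometric mean of the child variances."""
    child_vars = [variance(c) for c in children]
    if min(child_vars) <= EPS:
        # A constant child drives the geometric mean to zero.
        return math.inf if variance(parent) > EPS else 1.0
    log_geo = sum(math.log(v) for v in child_vars) / len(child_vars)
    return variance(parent) / math.exp(log_geo)

def best_basis(x, max_levels):
    """Return the decomposition tree.

    A leaf is the list of coefficients; a split is a (low, high) tuple.
    """
    if max_levels == 0 or len(x) < 4:
        return x
    low, high = haar_split(x)
    if coding_gain(x, [low, high]) > 1.0:
        return (best_basis(low, max_levels - 1),
                best_basis(high, max_levels - 1))
    return x

ramp = [float(i) for i in range(1, 9)]  # toy data
tree = best_basis(ramp, 3)
```

On this toy ramp the lowpass branch keeps being split, whereas the (constant) highpass branch is kept as a single subband, since splitting it yields no gain.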

This procedure has a very intuitive interpretation in terms of average distortion after reconstruction. 

Given a total coding rate, high-resolution quantization theory can be applied to the bit-allocation 

problem among the eight subbands at level $l$. This yields [23] that the optimal average distortion for a Gaussian source is $D = \gamma \left( \prod_{i=1}^{N} (\sigma_i^l)^2 \right)^{1/N} 2^{-2R}$, with $\gamma$ a given constant. Moreover, the distortion if the representation at level $l-1$ is retained is proportional to $\sigma_{l-1}^2$. Therefore, the coding gain is 

simply the ratio between the distortions respectively obtained by keeping the current representation 

and performing an additional decomposition step. As a consequence, under the high-rate and Gaussian 

assumptions, maximizing the coding gain amounts to picking the branch of the decomposition tree 

that minimizes the average distortion of the reconstructed data. It is worth noting that, since this 

rate-distortion model embeds the quantizer effect in the constant γ, using the coding gain allows 

one to obtain the optimal decomposition tree taking quantization into account, but without making 

any explicit assumption on the employed quantizer. This is very important, as we are interested in 

comparing only the transforms and not the quantizers, although in a practical compression algorithm 

the transform will have to be followed by a quantizer. 
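
The interpretation above can be written out explicitly. The following is a sketch under the stated high-rate and Gaussian assumptions, further assuming that the same constant $\gamma$ and the same rate $R$ apply to both representations; the symbols $D_{l-1}$ and $D_l$ are introduced here only for illustration.

```latex
\begin{aligned}
D_{l-1} &= \gamma \, \sigma_{l-1}^2 \, 2^{-2R}, \qquad
D_{l} = \gamma \left( \prod_{i=1}^{N} (\sigma_i^l)^2 \right)^{1/N} 2^{-2R}, \\
\frac{D_{l-1}}{D_{l}}
  &= \frac{\sigma_{l-1}^2}{\left( \prod_{i=1}^{N} (\sigma_i^l)^2 \right)^{1/N}}
   = G_{(l-1)\to l} .
\end{aligned}
```

Both $\gamma$ and $2^{-2R}$ cancel in the ratio, so the decision $G_{(l-1)\to l} > 1$ depends only on the subband variances and not on the quantizer.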

E. Multidimensional extensions of wavelet transforms 

When the DWT and DWPT have to be applied to a 3D data set, multidimensional extensions of 

either transform are required. In the following we will consider three possible extensions, which 

are referred to as square, rectangular and hybrid transforms. Our description refers to the DWT; the 

generalization to DWPT is straightforward. 

A square 2D transform is such that first one decomposition level is computed in all dimensions. 

Then, the (multidimensional) approximation subband is considered, and a new iteration is applied to 

it.


A rectangular 2D transform is such that first the complete 1D wavelet transform (i.e., all 

decomposition levels) is computed in one dimension, and then the complete transform is applied 

to the second dimension. 

In 3D, a square transform is obtained by first computing one decomposition level in all dimensions, 

and then iterating on the LLL cube. Conversely, the rectangular transform is obtained by first applying 

the complete transform along the first dimension, then along the second one, and finally along the 

third one. 

In 3D, hybrid transforms can also be obtained as in [16] by first applying the complete transform 

in one dimension, and then taking a 2D square transform in the other two dimensions. The obtained 

transform is referred to as 3D hybrid rectangular/square DWT. 
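
The difference between the square and rectangular extensions is easiest to see in 2D. The following minimal sketch assumes the orthonormal Haar filter (not the filters used in this work) and a square image whose side is a multiple of 2^levels; function names are hypothetical.

```python
import math

def haar_step(x):
    """One Haar level on an even-length list: [lowpass half | highpass half]."""
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return low + high

def haar_full(x, levels):
    """Complete 1D DWT: iterate on the lowpass part only."""
    out, n = list(x), len(x)
    for _ in range(levels):
        out[:n] = haar_step(out[:n])
        n //= 2
    return out

def transpose(img):
    return [list(col) for col in zip(*img)]

def rectangular_2d(img, levels):
    """All levels along the rows first, then all levels along the columns."""
    rows = [haar_full(r, levels) for r in img]
    cols = [haar_full(c, levels) for c in transpose(rows)]
    return transpose(cols)

def square_2d(img, levels):
    """One level along rows and columns, then iterate on the LL block only."""
    out, n = [list(r) for r in img], len(img)
    for _ in range(levels):
        block = [haar_step(r[:n]) for r in out[:n]]
        block = transpose([haar_step(c) for c in transpose(block)])
        for i in range(n):
            out[i][:n] = block[i]
        n //= 2
    return out

def energy(img):
    return sum(v * v for row in img for v in row)

img = [[float(4 * i + j + 1) for j in range(4)] for i in range(4)]  # toy data
rect = rectangular_2d(img, 2)
sq = square_2d(img, 2)
```

Both outputs preserve the image energy and share the same approximation coefficient, but the rectangular transform further decomposes the mixed lowpass/highpass subbands that the square transform leaves untouched. The hybrid 3D transform applies `haar_full`-style processing along one dimension and `square_2d`-style processing along the other two.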

F. 3D transforms selected for evaluation 

The previously described one-dimensional transforms have been combined in various ways to obtain 

3D transforms for hyperspectral data. The most relevant combinations are reported in the following. 

As for filter selection in the DWT and DWPT, the (9,7) biorthogonal wavelet filter pair has been 

used throughout this work; this filter is known to provide excellent compression performance, and 

has been selected for inclusion in the JPEG 2000 standard. 

1) 3D square DWT: This method is based on the wavelet transform applied in all three dimensions 

simultaneously. In particular, one level of wavelet decomposition is applied along each of the three 

dimensions. This procedure is repeated on the obtained LLL cube, as opposed to the rectangular 

transform described in Sect. II-F.3. As an example, Fig. 2 shows in a pictorial way the subbands 

obtained by performing three levels of 3D-decomposition on a data cube as described above. The 

obtained decomposition has cubic subbands in 3D. 

Fig. 2. Subbands obtained by performing three levels of 3D square DWT on a data cube.


2) 3D square DWPT: In the 3D square DWPT, one level of wavelet packet decomposition is applied 

along the three dimensions; then, all the sub-cubes obtained by the wavelet packet decomposition may 

be further split so as to minimize an appropriate cost function. This procedure is repeated iteratively 

on each obtained cube for a given number of decomposition levels. 

3) Hybrid rectangular/square 3D transform: As hyperspectral data carry a lot of information in the 

spectral dimension, it is interesting to think of transforms that operate differently in the spectral and 

spatial directions, in order to match the different nature of those correlations. A way to obtain such 

multidimensional transform is the 3D hybrid rectangular/square wavelet transform. Fig. 3 depicts 

a hybrid 3D wavelet transform obtained with three decomposition levels as described. As can be 

seen, the subbands generated by this transform are rectangles in 2D and parallelepipeds in 3D. The 

number of subbands is higher than that of the classical square transform; in particular the frequency 

partitioning is such that high horizontal and low vertical frequencies lie in subbands where the basis 

functions are short and long in the horizontal and vertical dimensions respectively. The obtained 

frequency tessellation is finer, and has more radial symmetry than the square transform. 

Fig. 3. Hybrid rectangular/square 3D DWT subband decomposition of a data cube. 

4) Hybrid spectral wavelet packet and spatial square 2D wavelet transform: In this method a 

DWPT is first applied in the spectral dimension, while a 2D square DWT is applied in the spatial 

dimension. In other terms, this is equivalent to the hybrid rectangular/square 3D wavelet, in which 

the spectral DWT is replaced by a spectral DWPT. The cost function for DWPT is minimized in 

the third dimension considering the whole cube obtained by the 1D packet decomposition, and not 

the single 1D vectors. In fact, the separated optimization for each single vector could yield different 

bases. In this case, the transformed data cube would present discontinuities, which would penalize the 

performance of the next spatial 2D DWT stage. Another advantage is the reduced overhead, since a


single spectral decomposition tree has to be transmitted, instead of a separate tree for each spectral 

vector. 

Best basis selection using the coding gain yields the decomposition represented in Fig. 4. 

Consistently with the notion that spectral vectors have a significant information content, the obtained 

decomposition is finer than the classical dyadic wavelet tree in the high-frequency portion of the 

spectrum, and almost resembles a Fourier transform. 

Fig. 4. Best basis for the spectral DWPT (decomposition tree with lowpass (L) and highpass (H) branches). 

5) Hybrid spectral wavelet packet transform and spatial wavelet packet transform: In this method 

a wavelet packet decomposition is applied separately in the spectral and spatial dimensions. The cost 

function is minimized in the third dimension considering the cube obtained by the 1D DWPT; then, 

a 2D DWPT is evaluated on each single band. 

6) Hybrid spectral discrete wavelet transform and spatial wavelet packet transform: In this method 

a DWT is applied in the spectral dimension, while a 2D DWPT follows in the spatial dimension. 

7) Spectral discrete cosine transform and spatial discrete wavelet transform: This method applies 

a one-dimensional DCT transform to each spectral vector, and a 2D square DWT on each single band 

of the obtained transformed cube. 

8) Spectral KLT and spatial DWT: This method applies the KLT in the spectral dimension followed 

by the 2D square DWT along the spatial dimensions. In order to evaluate the transform matrix which 

optimally decorrelates the spectral dimension, we estimate the covariance matrix of the hyperspectral 

data cube assuming that each spectral vector, containing the radiance of a pixel at a given spatial 

location in all the bands, is a realization of the random process that has to be decorrelated.


In particular, given the hyperspectral data cube, i.e. B bands containing M lines and N samples, we form the MN column vectors $X_{ij} = [x_{ij}^1, x_{ij}^2, \ldots, x_{ij}^B]^T$, for $i = 1, \ldots, M$ and $j = 1, \ldots, N$, where $x_{ij}^k$ is the pixel with spatial coordinates $(i, j)$ in band $k$. We employ the sample mean vector $M_x = [m_x^1, m_x^2, \ldots, m_x^B]^T$, with $m_x^k = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} x_{ij}^k$, as estimate of the ensemble averages of each band.

For each spectral vector, we estimate its covariance matrix using one single realization, i.e. $C_{X,i,j} = (X_{ij} - M_x)(X_{ij} - M_x)^T$. Finally, we compute the average covariance matrix $C_X = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} C_{X,i,j}$.

We solve the eigenvector problem for the symmetric matrix $C_X$, obtaining the eigenvalues $\lambda_i$ and eigenvectors $u_i$ that satisfy $C_X u_i = \lambda_i u_i$. The KLT kernel is a unitary matrix $V$, whose columns are the eigenvectors $u_i$ arranged in descending order of eigenvalue magnitude. This matrix is used to transform each spectral vector, after subtracting its mean value, as $Y_{ij} = V^T (X_{ij} - M_x)$.

The complexity of the decorrelation transform is the sum of three contributions. The first one is the evaluation of the covariance matrix ($O(B^2 MN)$); the second one is the solution of the eigenvector problem ($O(B^3)$, [24]); the third one is the computation of transform coefficients ($O(B^2 MN)$). It can be observed that M, N and B are of the same order of magnitude, hence the second term is negligible with respect to the first and third. The overall computational complexity is very high, and has so far limited the use of the KLT in practical applications.
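
The three steps of the spectral KLT (covariance estimation, eigenvector problem, projection) can be sketched with NumPy as follows. This is an illustrative sketch, not the implementation used in this work: the function name `spectral_klt`, the (B, M, N) array layout and the synthetic test cube are assumptions.

```python
import numpy as np

def spectral_klt(cube):
    """Spectral KLT of a cube of shape (B, M, N): bands, lines, samples."""
    B, M, N = cube.shape
    # One spectral vector per column.
    X = cube.reshape(B, M * N).astype(np.float64)
    # Sample mean of each band.
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean
    # Average of the per-pixel outer products: O(B^2 MN).
    cov = (Xc @ Xc.T) / (M * N)
    # Symmetric eigenvector problem: O(B^3).
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns ascending eigenvalues; reorder to descending.
    order = np.argsort(eigvals)[::-1]
    V = eigvecs[:, order]
    # Transform coefficients: O(B^2 MN).
    Y = V.T @ Xc
    return Y.reshape(B, M, N), V, mean

# Synthetic cube with strongly correlated bands (hypothetical data).
rng = np.random.default_rng(0)
base = rng.normal(size=(1, 8, 8))
cube = np.concatenate([base * g for g in (1.0, 0.9, 0.8, 0.7)], axis=0)
cube = cube + 0.01 * rng.normal(size=cube.shape)

Y, V, mean = spectral_klt(cube)
# Covariance of the transformed bands; it should be diagonal.
cov_y = np.cov(Y.reshape(4, -1), bias=True)
```

To give a feeling for the relative weight of the three terms: for a 256 × 256 × 224 cube, $B^2 MN \approx 3.3 \times 10^9$ whereas $B^3 \approx 1.1 \times 10^7$, so the covariance estimation and the projection dominate the cost.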

III. RESULTS OF TRANSFORM EVALUATION 

A. Evaluation procedure 

All the transforms described in Sect. II-F have been compared in terms of their energy compaction 

capability, which is a measure of the fraction of signal energy contained in a given number of transform 

coefficients. It is a very important property in image compression, since it provides an estimation of 

the effect of quantization in a practical coding scheme. The energy compaction property is evaluated 

for each transform by performing the following steps: 

• computing the 3D transform on a few hyperspectral scenes; 

• zeroing out a given percentage of transform coefficients, taken as those with the smallest 

magnitude; 

• computing the inverse 3D transform; 

• computing the peak signal-to-noise ratio (PSNR) with respect to the original image. 

Of particular importance is the fact that the coefficients to be zeroed out are taken in arbitrary 

order within the complete 3D set of transform coefficients, and not on a band by band basis. This is


akin to performing a 3D rate-distortion optimization, which is known to provide significantly better 

results than band-by-band optimization [16]. 
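
The evaluation steps above can be sketched as follows. The key point is that the coefficients are ranked in a single flattened list, so the selection is global rather than band by band. The sketch is illustrative only: a one-level Haar transform on a 1D toy signal stands in for the 3D transforms, the peak value of 65535 for 16-bit data is an assumption, and all function names are hypothetical.

```python
import math

def haar_forward(x):
    """One orthonormal Haar level: [lowpass half | highpass half]."""
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return low + high

def haar_inverse(c):
    s, half = math.sqrt(2.0), len(c) // 2
    x = []
    for a, d in zip(c[:half], c[half:]):
        x.extend([(a + d) / s, (a - d) / s])
    return x

def zero_smallest(coeffs, fraction):
    """Zero the given fraction of coefficients, chosen globally by magnitude."""
    n_zero = int(round(fraction * len(coeffs)))
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]))
    kept = list(coeffs)
    for i in order[:n_zero]:
        kept[i] = 0.0
    return kept

def psnr(original, reconstructed, peak):
    """Peak signal-to-noise ratio in dB; infinite for a perfect match."""
    mse = sum((a - b) ** 2
              for a, b in zip(original, reconstructed)) / len(original)
    return math.inf if mse == 0.0 else 10.0 * math.log10(peak ** 2 / mse)

samples = [1000.0, 1010.0, 2000.0, 2020.0, 500.0, 498.0, 3000.0, 3006.0]
coeffs = haar_forward(samples)                 # step 1: forward transform
kept = zero_smallest(coeffs, 0.5)              # step 2: zero 50% of coefficients
rec = haar_inverse(kept)                       # step 3: inverse transform
quality_db = psnr(samples, rec, peak=65535.0)  # step 4: PSNR (assumed peak)
```

Sweeping `fraction` over a range of values and plotting `quality_db` against it gives curves of the kind reported in the figures of this section.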

The performance evaluation is carried out on 16-bit radiance AVIRIS data cubes. AVIRIS scenes 

have 224 bands and 614 × 512 pixels resolution, but each scene has been cropped to 256 × 256 × 224 

pixels. Scene 4 of Cuprite and scene 3 of Jasper Ridge have been employed; for brevity, we only 

report the set of results for Cuprite. 

B. Energy compaction results 

For clarity, in Tab. I we summarize the acronyms used in the figures to identify the eight 3D 

transforms that have been evaluated. 

TABLE I

ACRONYMS OF THE EVALUATED TRANSFORMS

Acronym         Sect.
DWT3D           II-F.1
DWP3D           II-F.2
DWT1D2D         II-F.3
DWP1D-DWT2D     II-F.4
DWP1D-DWP2D     II-F.5
DWT1D-DWP2D     II-F.6
DCT1D-DWT2D     II-F.7
KLT1D-DWT2D     II-F.8

We anticipate that, not surprisingly, we have found that the spectral correlation plays a crucial role 

for compression, since the transforms that are better able to capture this correlation are those that 

rank best for compression. It is already known for lossless compression (see e.g. [25]) that large 

bit-rate reductions can be achieved by employing an efficient model of the spectral correlation. As 

will be seen, exploiting this correlation in the lossy case calls for the use of separate spectral and 

spatial transforms. 

In Fig. 5 we compare the DWT3D and DWP3D transforms. Neither transform is computed 

separately in the spectral dimension. As can be seen, the rate-distortion curve of the DWP3D transform 

is significantly better than that of the DWT3D. This is due to the fact that the 3D square wavelet 

transform is isotropic in all three dimensions. This may not be the most appropriate correlation model 

of a hyperspectral dataset, since the subband decomposition in the spectral dimension is not as fine as 

it could be. Hence, because of the rather rough tessellation of 3D frequency space, the DWT3D turns


out to have poor performance as to energy compaction of spectral vectors. The DWP3D transform 

performs much better, since its ability to adaptively select the frequency tessellation allows it to 

refine the signal description along the spectral dimension, and hence to exploit much better the 

spectral correlation. 

The obtained decomposition tree is depicted in Fig. 6. It is possible to observe that both low and high 

spatial frequency components have been finely decomposed in the low spectral frequency components. 

Moreover, almost all the low spatial frequencies (along both the pixels and lines direction) have been 

finely decomposed. On the other hand, all the high frequency components along the three dimensions 

have not been further decomposed, as in the case of the 3D square DWT. 

Fig. 5. Performance comparison of different transforms - DWT3D vs DWP3D (axes: PSNR (dB) versus % of transform coefficients set to zero). 

Fig. 7 compares the performance of wavelet and wavelet packet transforms computed separately in 

the spectral direction. Namely, we first compute a full 1D DWT or DWPT in the spectral direction, 

followed by a square 2D DWT or DWPT. The following remarks can be made. Spatial decorrelation is 

performed more effectively by the DWT than by the DWPT. This is somewhat counterintuitive, since 

one would expect that the optimized decomposition provides better results. As a matter of fact, in our 

evaluation procedure we zero out the least significant coefficients taken from the complete 3D set of 

transform coefficients, whereas the DWPT transform is optimized separately in the spectral and spatial 

directions. Thus, our procedure closely simulates a 3D rate allocation, which would work better with a three-dimensionally optimized transform. As an example, the 2D DWPT does not decompose the bands in the water absorption region because they contain little information; however, those bands contain many high-valued coefficients that have to be retained, at the expense of other coefficients that are discarded. Therefore, the mismatch between the 3D coefficient selection and the separate optimization of the DWPT makes it useless, and even disadvantageous, to perform best basis selection.

Fig. 6. Sub-cubes obtained by the 3D wavelet packet decomposition minimizing the cost function based on the coding gain (axes: lines, bands, pixels; the LLL sub-cube is marked).

Comparing Fig. 7 and Fig. 5, it can be seen that the performance of the DWT1D2D transform is 

very similar to that of the DWP3D transform, but with a significantly reduced computational effort, 

since it is not necessary to compute the optimal basis. In fact, this transform has been selected in 

[16] for its favorable trade-off between performance and complexity. 
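The evaluation procedure behind Figs. 5, 7 and 8 (zero out the least significant coefficients of the complete 3D set, then measure the PSNR of the reconstruction) can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `zeroing_psnr`, the `(bands, lines, pixels)` layout, and the random orthonormal matrix used as a toy stand-in for a spectral transform are all assumptions made for the example.

```python
import numpy as np

def zeroing_psnr(cube, forward, inverse, fraction_zeroed, peak):
    """PSNR after zeroing the smallest-magnitude transform coefficients.

    forward / inverse are callables implementing a 3D transform pair;
    the selection is made over the complete 3D set of coefficients.
    """
    coeffs = forward(cube)
    flat = coeffs.reshape(-1).astype(np.float64)  # astype returns a copy
    n_zero = int(round(fraction_zeroed * flat.size))
    if n_zero > 0:
        # Indices of the n_zero coefficients with the smallest magnitude.
        smallest = np.argpartition(np.abs(flat), n_zero - 1)[:n_zero]
        flat[smallest] = 0.0
    recon = inverse(flat.reshape(coeffs.shape))
    mse = np.mean((np.asarray(cube, dtype=np.float64) - recon) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)


# Toy stand-in for a spectral transform (hypothetical): a random orthonormal
# matrix applied along the band axis of a (bands, lines, pixels) cube.
rng = np.random.default_rng(0)
cube = rng.normal(size=(8, 16, 16))
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
forward = lambda c: np.tensordot(Q.T, c, axes=1)
inverse = lambda c: np.tensordot(Q, c, axes=1)

curve = [zeroing_psnr(cube, forward, inverse, f, peak=1.0) for f in (0.5, 0.75, 0.9)]
```

Because the coefficients are ranked jointly over all three dimensions, a transform that was optimized separately per dimension is not necessarily favored by this test, which is the effect discussed above for the DWPT.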

Continuing the study of spectrally separable transforms, Fig. 8 compares the DWT1D2D, DCT1D- 

DWT2D, and KLT1D-DWT2D transforms. Following the results described above, these transforms 

have been selected in order to compare different spectral decorrelators, using the 2D DWT for spatial 

decorrelation because of its effectiveness. Not surprisingly, the KLT turns out to be the best transform; 

since we are employing the same transform for all spectral vectors, the overhead of describing the 

transform matrix in the compressed file is negligible. The performance gain of the KLT1D-DWT2D 

with respect to the DWT1D2D is about 2 dB at high quality levels, and significantly more at low bitrates. 

However, as outlined in Sect. II-F.8, the KLT1D-DWT2D requires the estimation (and averaging) of as many covariance matrices as samples per band, followed by the solution of the eigenvector problem. As expected, the DCT performs almost always worse than the DWT, except for a range in the very low bit-rate region.

Fig. 7. Performance comparison of different transforms, with a separable transform in the spectral direction. (Plot of PSNR (dB) versus % of transform coefficients set to zero; curves: DWT1D2D, DWP1D-DWT2D, DWT1D-DWP2D, DWP1D2D.)

In summary, this analysis shows that the schemes with highest performance are based on hybrid 

rectangular/square transforms. Among these schemes, the 2D DWT should be preferred for spatial 

decorrelation. As far as spectral decorrelation is concerned, the 1D DWT provides good performance 

with limited complexity. The KLT achieves significantly better performance at all bit-rates, and 

especially in the low bit-rate region. It should be noticed that this KLT employs a single transform 

matrix for all spectral vectors. This approach is effective because, along the spectral dimension, the 

signal depends almost exclusively on pixel land cover; since only a few land covers are typically 

present in an image, the KLT works better than the other transforms. In the spatial dimension the 

signal depends on the scene geometry, which is less predictable with many discontinuities near region 

boundaries. In this case, due to the high degree of nonstationarity, the single KLT becomes far from 

optimal, and the DWT works better; the use of the optimal KLT would require the solution to multiple 

eigenvector problems, which is not realistic in a practical scenario. 

Fig. 8. Performance comparison of different transforms, with a separable transform in the spectral direction: KLT, DWT and DCT. (Plot of PSNR (dB) versus % of transform coefficients set to zero; curves: KLT1D-DWT2D, DWT1D2D, DCT1D-DWT2D.)

IV. PROPOSED LOW-COMPLEXITY KLT

As has been seen, the spectral KLT can provide performance gains in excess of 2 dB with respect to the wavelet transform using a single “average” covariance matrix. However, although this is a somewhat simplified version of the transform, because it assumes that the spectral vectors are samples of a stationary signal, its complexity is still high for real-time applications, for the reasons outlined in Sect. II-F.8. In the following we propose a low-complexity version of the KLT that alleviates this problem with virtually no performance loss with respect to the full-complexity transform. In particular, in Sect. IV-A we define the low-complexity one-dimensional KLT; in Sect. IV-B we define our proposed 3D transform based on the low-complexity KLT, and evaluate its energy compaction capability following the same procedure used with the other transforms; in Sect. IV-C we provide a brief overview of JPEG 2000, and in Sect. IV-D we describe the integration of the proposed transform within Part 2 of JPEG 2000 [26].

A. One-dimensional transform 

The KLT applies principal components analysis to the spectral dimension evaluating the average 

correlation matrix over all spectral vectors. For an AVIRIS scene, this amounts to computing and 

averaging over 300000 such matrices. To simplify this process, we note that convergence of the 

estimation process may be achieved using fewer matrices. 

Using the notation defined in Sect. II-F.8, in the proposed low-complexity transform the processing is not carried out on the complete set of spectral vectors, but rather on a subset of vectors selected at random. Hence, the sample mean vector is defined as $M'_x = [m'_{x_1}, m'_{x_2}, \ldots, m'_{x_B}]$, where

$$m'_{x_k} = \frac{1}{M'N'} \sum_{i \in I} \sum_{j \in J} x^k_{ij},$$

and $I$ and $J$ are sets containing respectively $M'$ and $N'$ different indexes picked at random in the intervals $[1, M]$ and $[1, N]$, with $M' \le M$ and $N' \le N$. This process is also depicted in Fig. 9, where the different sets of spectral vectors are highlighted.


Fig. 9. Computation of the covariance matrix for the full-complexity and low-complexity KLT. (Diagram: the KLT transform matrix maps correlated components to decorrelated components; the legend distinguishes the spectral vectors employed in the evaluation of the covariance matrix in the low-complexity KLT method from those employed in the full-complexity KLT method.)

The covariance matrix is obtained as

$$C'_X = \frac{1}{M'N'} \sum_{i \in I} \sum_{j \in J} (X_{ij} - M'_x)^T (X_{ij} - M'_x).$$

It is used to form the eigenvector problem $C'_X u'_i = \lambda'_i u'_i$, where $u'_i$ are the eigenvectors associated with the eigenvalues $\lambda'_i$. Aligning the eigenvectors columnwise we obtain the low-complexity KLT matrix $V'$. The transformed vector is computed as $Y_{ij} = (V')^T (X_{ij} - M'_x)$. We also denote as $\rho = \frac{M'N'}{MN}$ the fraction of spectral vectors employed to evaluate the covariance matrix. Obviously, the smaller $\rho$ is, the lower the complexity of this KLT.
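The steps above (random index sets $I$ and $J$, sample mean, sample covariance, eigendecomposition, projection) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `low_complexity_klt`, the `(lines, pixels, bands)` array layout, and the choice of splitting $\rho$ evenly between the two index sets (each scaled by $\sqrt{\rho}$) are hypothetical, since the text does not specify how $M'$ and $N'$ are chosen.

```python
import numpy as np

def low_complexity_klt(cube, rho, rng=None):
    """Spectral KLT estimated from a random subset of spectral vectors.

    cube: array of shape (M, N, B) = (lines, pixels, bands).
    rho:  target fraction M'N'/(MN) of spectral vectors used for estimation.
    Returns the transformed cube, the KLT matrix V' and the sample mean.
    """
    rng = np.random.default_rng() if rng is None else rng
    M, N, B = cube.shape
    # Assumption: rho is split evenly between lines and pixels.
    scale = np.sqrt(rho)
    m_sub = max(1, int(round(scale * M)))
    n_sub = max(1, int(round(scale * N)))
    I = rng.choice(M, size=m_sub, replace=False)
    J = rng.choice(N, size=n_sub, replace=False)

    # Spectral vectors at the positions I x J, one per row.
    sample = cube[np.ix_(I, J)].reshape(-1, B).astype(np.float64)
    mean = sample.mean(axis=0)
    centered = sample - mean
    cov = centered.T @ centered / sample.shape[0]

    # eigh handles symmetric matrices; sort eigenvectors by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(cov)
    V = eigvecs[:, np.argsort(eigvals)[::-1]]

    # Row-vector form of Y_ij = V'^T (X_ij - M'_x), applied to ALL spectral vectors.
    Y = (cube.reshape(-1, B) - mean) @ V
    return Y.reshape(M, N, B), V, mean
```

Since `V` is orthonormal, the inverse transform in this sketch is simply `Y @ V.T + mean`; only the estimation stage uses the subset, while the projection still touches every spectral vector.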

The complexity of the first stage of the new transform, i.e. the evaluation of the covariance matrix, 

becomes $O(\rho B^2 MN)$, i.e. it is reduced by a factor ρ, as the number of covariance matrices to be 

computed decreases linearly with ρ; the other two terms remain unchanged. Therefore, the proposed 

scheme is able to significantly reduce the complexity of the first stage, while no advantage is achieved 

in the third stage, because the resulting transform matrix does not exhibit any specific structure that 

can be exploited to reduce its complexity. Some numerical results on the complexity of the complete 

JPEG 2000 based algorithm are given in Sect. V-A. 
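To make the stage-by-stage argument concrete, the sketch below tabulates order-of-magnitude operation counts with all constants dropped. The covariance term follows the $O(\rho B^2 MN)$ expression above, and the transform term follows from one $B \times B$ matrix-vector product per spectral vector; the $B^3$ term for the eigenvector problem is an assumption (typical of dense symmetric eigensolvers) and is not taken from this section. The function name is hypothetical, and the example sizes ($B = 224$, $M = 256$, $N = 512$) are those quoted in Sect. V.

```python
def klt_operation_counts(B, M, N, rho):
    """Rough operation counts for the three KLT stages (constants dropped)."""
    covariance = rho * B ** 2 * M * N   # stage 1: scales linearly with rho
    eigen = B ** 3                      # stage 2: assumed cubic in B
    transform = B ** 2 * M * N          # stage 3: matrix-vector product per pixel
    return covariance, eigen, transform


full = klt_operation_counts(224, 256, 512, rho=1.0)
low = klt_operation_counts(224, 256, 512, rho=0.01)

for label, (cov, eig, tra) in (("rho=1", full), ("rho=0.01", low)):
    total = cov + eig + tra
    print(f"{label}: covariance share {cov / total:.3f}, transform share {tra / total:.3f}")
```

Under these assumptions the transform stage does not shrink with ρ, so the overall saving is bounded even when the covariance stage becomes negligible.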

Fig. 10 shows an example of covariance matrix of all the spectral vectors. This matrix is clearly symmetric, and is depicted as an image with 256 grey levels. The tone is proportional to the absolute value of the correlation. The elements of the main diagonal have very high values, as do a large number of their neighbors, since a high degree of inter-band correlation is present. It can be seen that some elements of the matrix are close to zero, because some pairs of bands are poorly correlated. This behavior is particularly evident for the bands around 160, characterized by the water absorption region.

Fig. 10. Example of correlation matrix of the spectral vectors, depicted as an image with 256 grey levels. The tone is proportional to the absolute value of the correlation. The elements of the main diagonal have very high values, as also a large number of the main diagonal neighbors do, since a high degree of inter-band correlation is present. (Both axes: band index.)

B. 3D transform evaluation: spectral low-complexity KLT and spatial 2D DWT 

In order to obtain a 3D transform that can be applied to a hyperspectral data cube, we employ 

the proposed low-complexity KLT as a spectral decorrelator, followed by the 2D DWT for spatial 

decorrelation. In other words, this transform is equivalent to the KLT1D-DWT2D, which turned out 

to be the highest performance transform, but employs the low-complexity version of the KLT. 

We have evaluated the performance of this transform, following the procedure outlined in Sect. 

III-A, for several values of ρ, in order to understand how many spectral vectors are actually needed 

to obtain convergence in the estimate of the covariance matrix, and hence optimal performance. The 

results are reported in Fig. 11. As can be seen, taking ρ =0.1 we obtain a negligible performance loss 

with respect to the full-complexity KLT; in particular, setting to zero 50% and 96% of the transform 

coefficients yields a PSNR loss of 0.21 and 0.16 dB respectively. Taking ρ =0.01 the loss is still 

very small, with an even larger computational saving; in this case, the PSNR decreases by 0.89 and 

1.12 dB when 50% and 96% of the transform coefficients are respectively set to zero.


Fig. 11. Performance of the 3D transform employing the low-complexity KLT, for several values of ρ. (Plot of PSNR (dB) versus % of transform coefficients set to zero; KLT1D-DWT2D curves for different values of ρ, with DWT1D2D as reference.)

C. Overview of JPEG 2000 

The architecture of the JPEG 2000 core coding system (Part 1) is based on transform coding. An 

image may be divided into several sub-images (tiles), to reduce memory and computing requirements; 

in the following we disregard color transformations, as they are of no particular interest in the 

hyperspectral image scenario. A biorthogonal discrete wavelet transform is first applied to each tile, 

whose output is a series of versions of the tile at different resolution levels (subbands); then, the 

transform coefficients are quantized, independently for each subband, with an embedded dead-zone 

quantizer. Each subband of the wavelet decomposition is divided into rectangular blocks (codeblocks), 

which are independently encoded with the EBCOT (Embedded Block Coding with Optimized 

Truncation) entropy coding engine; EBCOT is based on a bit-plane approach, context modeling 

and arithmetic coding. The bit stream output by EBCOT is organized by the rate allocator into a 

sequence of layers, each layer containing contributions from each code-block; the block truncation 

points associated with each layer are optimized in the rate-distortion sense. The final JPEG 2000 

codestream consists of a main header, followed by one or more sections corresponding to individual 

tiles. Each tile comprises a tile header and a layered representation of the included code-blocks, 

organized into packets. In order to form a progressive bitstream, the layers are formed and ordered 

in such a way that the most important information is placed at the beginning of the bitstream. The 

JPEG 2000 decoder performs exactly the same steps (except for rate allocation), in reverse order: 

syntax parsing, codeblock decoding by EBCOT, inverse quantization, inverse wavelet transform, and


tile mosaicking. 
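The quantization step mentioned above can be illustrated with a scalar dead-zone quantizer. The sketch below is a generic textbook form, not an excerpt from a JPEG 2000 codec: the function names are hypothetical, and the reconstruction offset `delta = 0.5` (mid-point reconstruction) is an assumed, commonly used value.

```python
import numpy as np

def deadzone_quantize(coeffs, step):
    """Dead-zone scalar quantizer: q = sign(x) * floor(|x| / step)."""
    coeffs = np.asarray(coeffs, dtype=np.float64)
    return (np.sign(coeffs) * np.floor(np.abs(coeffs) / step)).astype(np.int64)

def deadzone_dequantize(indices, step, delta=0.5):
    """Reconstruct sign(q) * (|q| + delta) * step; zero indices map to zero."""
    indices = np.asarray(indices)
    return np.sign(indices) * (np.abs(indices) + delta) * step


coeffs = np.array([-3.7, -0.4, 0.0, 0.9, 2.5])
q = deadzone_quantize(coeffs, step=1.0)
rec = deadzone_dequantize(q, step=1.0)
```

The quantizer is "embedded" in the sense that dropping the least significant bit of each index magnitude gives the same indices as quantizing with twice the step size, which is what allows bit-plane coding to truncate the stream at different qualities.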

Part 2 of the standard provides specific tools that can be applied to hyperspectral images. In 

particular, the multicomponent transformation feature allows for spectral decorrelation by means of an 

external transform, followed by the application of JPEG 2000 to a whole block of decorrelated bands; 

the bands are separately decorrelated in the spatial directions by means of the 2D wavelet transform, 

whereas the rate allocation is optimized across the whole block. Since JPEG 2000 standardizes the 

decoder, Part 2 provides the syntax (i.e. the MCC, MCT, and MCO marker segments) to embed 

into the codestream the inverse spectral transform that must be carried out after performing JPEG 

2000 decoding of each component. Three types of spectral transformations are supported, namely 

i) array-based transformations (i.e., those that can be described by a set of linear equations in the 

input coefficients, e.g. the DCT or the KLT); ii) dependency transformations (i.e., those of the causal 

predictive type, like causal DPCM); iii) wavelet transforms. For each class, reversible and irreversible 

modes are foreseen. Irreversible transforms are specified for example by storing the transform matrix 

coefficients in floating-point format in the relevant marker segments within the codestream. Reversible 

transforms are defined as a set of single element linear transformations and rounding operations; this 

structure can accommodate lifting-based integer implementations of classical transforms such as DCT 

and wavelets (see e.g. [27], [28]). 

D. Integration of low-complexity KLT within JPEG 2000 

The proposed technique employs a hybrid 3D transform; it first applies the low-complexity KLT 

as multicomponent extension to JPEG 2000, and then the JPEG 2000 2D DWT, rate allocation and 

entropy coding to the spectrally transformed bands. Three decomposition levels are performed for 

the 2D spatial transform, employing the (9,7) filter. The inverse KLT transform matrix is written 

in an MCT marker segment in the compressed file. Notably, the post-compression rate-distortion 

optimization is operated on the complete 3D set of transformed coefficients, ensuring optimal 

performance. 

On a related note, a very desirable feature of a compression system for remote sensing images 

is the ability to generate quicklook images without having to fully decode the compressed file. In a 

typical scenario, a user would download a low spatial resolution false-color quicklook of the scene. 

To do so, full spectral decorrelation is necessary in order to extract the three false-color bands, and 

then reduced resolution decoding of each band has to be carried out. This procedure is impractical, 

because it requires performing the full spectral decorrelation to extract only a few channels. Moreover, it 

is not compliant with the JPEG 2000 standard, which requires that the spatial inverse transforms 

are performed before the spectral one. On the other hand, JPEG 2000 Part 2 provides an interesting


feature, in that, through suitable marker segments, it is possible to specify different transformations 

for selected groups of bands [26]. For example, the three bands to be used to generate false-color 

quicklooks can be skipped by the spectral decorrelator and compressed in intraband mode; this yields 

a slight performance loss, but allows increased flexibility in the access to selected portions of the data. 

This procedure can be extended to the proposed scheme, where the bands to be used to generate the 

quicklooks could be canceled from the spectral vectors in the computation of the covariance matrix, 

and then of the transform coefficients. However, this goes beyond the scope of the present paper, and 

is left for further work. 

V. EXPERIMENTAL RESULTS 

The proposed scheme, based on the low-complexity KLT and JPEG 2000, has been compared 

with other state-of-the-art lossy compression schemes. First, in Sect. V-A we evaluate the complexity 

of the low- and full-complexity KLT; this allows one to assess the actual computational advantage in a 

realistic compression setting. Then, in Sect. V-B we compare the compression performance of various 

algorithms. Finally, in Sect. V-C we report the results of some experiments aimed at evaluating the 

effect of lossy compression on remote sensing applications, and specifically on SAM classification. 

In the results described in this section, we have employed a set of AVIRIS radiance data using 256 

lines with 512 pixels and all bands, unless otherwise noted. The AVIRIS sensor is a representative 

hyperspectral one, and the data are publicly available on the Internet at aviris.jpl.nasa.gov; 

since these data are widely used in the literature, comparisons with other techniques are facilitated. In 

particular, the Cuprite, Jasper Ridge and Moffett Field scenes have been used. PSNR has been used 

as quality metric for lossy compression (footnote 1). JPEG 2000 has been run without error resilience options, 

and no quality layers have been formed. 
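For reference, the PSNR used as quality metric can be computed as in the following sketch, which takes $2^{15} - 1$ as the peak value, as stated in the footnote on PSNR computation. The function name and the handling of the zero-error case are choices made for this example.

```python
import numpy as np

def psnr(original, reconstructed, peak=2 ** 15 - 1):
    """PSNR in dB: 10 * log10(peak^2 / MSE), computed over the whole cube."""
    diff = np.asarray(original, dtype=np.float64) - np.asarray(reconstructed, dtype=np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical data: no distortion
    return 10.0 * np.log10(peak ** 2 / mse)
```

Casting to floating point before subtracting avoids overflow when the cubes are stored as 16-bit integers.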

A. Complexity 

Fig. 12 shows the compression performance and the computation time of the proposed algorithm as 

a function of ρ, on the Cuprite scene. The computation times have been measured on a Pentium IV PC 

at 3 GHz, and refer only to the evaluation of the covariance matrix. As can be seen, the performance 

loss is very smooth as ρ decreases, allowing one to select the best performance-complexity tradeoff 

for a given application. As had been noted in the previous experiment, the values ρ =0.1 and 

ρ =0.01 yield a very small loss, and can be used as starting point for a fine optimization. These 

values provide a complexity reduction of about 20 and 100 times with respect to the full-complexity 

transform. 

1 Since the data have only 15 significant bits in modulus, $2^{15} - 1$ has been used as peak value for PSNR computation.


Fig. 12. Top: Performance of the low-complexity KLT as a function of ρ. The curve refers to an encoding rate of 1 bpp. Bottom: computation time for the evaluation of the covariance matrix, as function of ρ. The results refer to the Cuprite scene. (Top panel: PSNR (dB) versus ρ; bottom panel: time [s] versus ρ; ρ on a logarithmic axis from 10^-5 to 10^0.)

Clearly, the covariance matrix evaluation is only one source of complexity, since the solution 

to the eigenvector problem, the computation of transform coefficients, as well as the quantization, 

entropy coding and rate allocation, have to be taken into account. Tab. II compares the end-to-end 

computation time for the full-complexity KLT, the low-complexity KLT with ρ equal to 0.1 and 

0.01, and the technique proposed in [16], which employs the DWT1D2D transform, using JPEG 

2000 for the spatial wavelet transform, quantization, entropy coding and rate allocation; the time 

spent in the covariance matrix evaluation has also been reported. As can be seen, using ρ =0.1 

and ρ =0.01 yields an end-to-end computational saving of 2.57 and 3.01 times respectively, with a 

minor performance loss. As ρ decreases the computation time tends to settle on an asymptotic value 

which is larger than the value of the DWT1D2D. This is due to the fact that the solution to the 

eigenvector problem, and especially the computation of transform coefficients, are more demanding 

than the spectral DWT. This is somewhat obvious because the transform coefficients are computed as 

a full matrix-vector product, since the KLT matrix does not exhibit any structure that can be exploited 

to reduce the number of operations. However, the low-complexity KLT with ρ =0.01 is only about 

40% more complex than the DWT1D2D, and provides a significant performance gain, as will be seen 

in the following.


TABLE II
COMPLEXITY COMPARISON OF END-TO-END COMPRESSION ALGORITHMS. RUNNING TIMES ARE EXPRESSED IN SECONDS.

Algorithm                    Total time    C_X evaluation time
full-complexity KLT          816.12        557.35
low-compl. KLT, ρ = 0.1      318.01        59.58
low-compl. KLT, ρ = 0.01     270.57        8.73
DWT1D2D                      195.86        n.a.
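The ratios quoted in the text can be rechecked directly from the Table II entries. The short script below is only an arithmetic aid; the dictionary keys and variable names are chosen for this example.

```python
# Running times from Table II, in seconds: (total time, covariance evaluation time).
timings = {
    "full-complexity KLT": (816.12, 557.35),
    "low-compl. KLT, rho=0.1": (318.01, 59.58),
    "low-compl. KLT, rho=0.01": (270.57, 8.73),
}
dwt1d2d_total = 195.86

full_total = timings["full-complexity KLT"][0]
summary = {}
for name, (total, cov_time) in timings.items():
    summary[name] = {
        "speedup_vs_full": full_total / total,
        "time_without_covariance": total - cov_time,
        "ratio_to_dwt1d2d": total / dwt1d2d_total,
    }

for name, row in summary.items():
    print(f"{name}: speed-up {row['speedup_vs_full']:.2f}x, "
          f"non-covariance time {row['time_without_covariance']:.2f} s, "
          f"{row['ratio_to_dwt1d2d']:.2f}x the DWT1D2D time")
```

The time left after subtracting the covariance evaluation stays close to 260 s for all three KLT variants, which is the asymptotic value, above the DWT1D2D total, that the text attributes to the eigenvector problem and the computation of the transform coefficients.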

B. Compression performance 

The compression performance of the proposed scheme has been compared with that of other state-of-the-art 

schemes. The results are shown in Fig. 13 for the Cuprite scene. The following algorithms 

are compared: 1) the proposed scheme with the low-complexity KLT (ρ =0.01); 2) the scheme with 

the full-complexity KLT; 3) the DWT1D2D scheme employing JPEG 2000 and 3D rate-distortion 

optimization, as proposed in [16]; 4) the 3D-SPIHT scheme proposed in [10], [29]. 

As expected, the performance of the low-complexity KLT is very close to that of the full-complexity 

transform, with a maximum loss of 0.27 dB at high bit-rates. The performance gap between the 

DWT1D2D transform and 3D-SPIHT had already been noticed in [16], where it was pointed out 

that the hybrid square/rectangular 3D wavelet transform performs significantly better than the 3D 

square transform, mainly thanks to the finer frequency tessellation, which is a better match to the 

high information content of the spectral vectors. It should be noted that, with respect to the technique 

in [16], the proposed KLT-based scheme achieves a significant PSNR gain, ranging between 2.5 and 

6.7 dB. The gain with respect to 3D-SPIHT is even larger, and reflects the transform evaluation results 

in Sect. III, where it has been observed that the 3D square DWT is not able to effectively capture all 

the correlations of a hyperspectral dataset. 

Similar results have been achieved for other scenes. In particular, in Fig. 14 we report performance 

results for the Jasper Ridge scene. The gain with respect to the technique in [16] is between 5 

and 8.1 dB, and even larger gains are achieved with respect to 3D-SPIHT. 

In order to compare the performance of the proposed KLT-based technique with the most up-to-date compression technology, a comparison with SPECK [30], [11], [31] has been carried out. The results in Tab. III are worked out in the same conditions as [30], [11], [31], i.e. using only 512 samples per line of the AVIRIS reflectance data set, 512 lines, and coding all 224 bands as a whole. The proposed scheme with full-complexity KLT, low-complexity KLT (ρ = 0.01), the DWT1D2D scheme in [16], and SPECK are compared; note that, in the table, results are given in terms of signal-to-noise ratio (SNR) rather than PSNR.

Fig. 13. Performance evaluation of the proposed JPEG 2000 based technique: rate-distortion curve for the Cuprite scene. Dashed: Full-complexity KLT. Solid: low-complexity KLT, ρ = 0.01. Dotted+star: DWT1D2D as proposed in [16]. Solid+star: 3D-SPIHT. (Plot of PSNR (dB) versus rate (bpp).)

Consistently with the results reported above, also on the reflectance data the performance loss of 

the low-complexity KLT with respect to the full-complexity one does not exceed 0.5 dB. The low-complexity 

KLT has a gain of about 7 dB with respect to the scheme in [16] employing the hybrid 

rectangular/square DWT, even though for high bit-rate the performance gap decreases. The proposed 

scheme exhibits a significant gain also with respect to SPECK; the SNR gain is even more remarkable, 

ranging from 5 to more than 10 dB. This gain is mainly due to two factors. The first is the improved 

coding efficiency of the KLT with respect to the spectral DWPT employed in SPECK. The second is 

the 3D post-compression rate-distortion optimization, which is more flexible in selecting the portions 

of the 3D set of transform coefficients that contribute more significantly to the reconstructed image 

quality. 

Fig. 14. Performance evaluation of the proposed JPEG 2000 based technique: rate-distortion curve for the Jasper Ridge scene. Dashed: Full-complexity KLT. Solid: low-complexity KLT, ρ = 0.01. Dotted+star: DWT1D2D as proposed in [16]. Solid+star: 3D-SPIHT. (Plot of PSNR (dB) versus rate (bpp).)

C. Impact on image exploitation

As is well-known, quality metrics such as PSNR, which are based on the MSE, measure the fidelity of the reconstructed image with respect to the original image. However, higher PSNR may not necessarily yield higher quality of a remote sensing lossy-compressed image for a given application. In fact, some artifacts, e.g. tiling, which may have little effect on PSNR, could heavily bias the analysis results of the reconstructed images. Therefore, it is necessary to validate the compression results also from the remote sensing application standpoint. Although this is an important research topic, there is no widely accepted protocol to evaluate remote sensing image quality in a general way; this is partly caused by the conspicuous number of existing remote sensing applications, which makes it difficult to work out a reasonable set of quality metrics.

In [17] a study of various quality metrics has been carried out. It is shown that MSE is reasonably 

good at capturing the effect of lossy compression on SAM classification, although more than one 

metric is needed to accurately analyze the quality degradation. A similar approach has been followed 

in [30], [11], [31], where SAM classification is used as benchmark application to evaluate the impact 

of lossy compression. In this paper, we also adopt the SAM classification method; SAM permits 

rapid mapping by calculating the spectral similarity between the image spectral vectors and reference 

vectors. These reference vectors can either be taken from laboratory or field measurements or extracted 

directly from the image. SAM measures the spectral similarity by calculating the angle between the 

two spectral vectors, treating them as vectors in a B-dimensional space; small angles between the 

two vectors indicate high similarity, and high angles indicate low similarity. It computes the arccosine of the normalized dot product between the test vector $t$ and a reference vector $r$ with the following equation:

$$\arccos\left( \frac{\sum_{i=1}^{B} t_i r_i}{\left( \sum_{i=1}^{B} t_i^2 \right)^{1/2} \left( \sum_{i=1}^{B} r_i^2 \right)^{1/2}} \right) \qquad (2)$$

where $B$ is the number of bands of the hyperspectral image cube, $t_i$ are the components of the test vector, and $r_i$ those of the reference vector.

TABLE III
COMPARISON BETWEEN THE PROPOSED TECHNIQUE AND SPECK - SNR (DB).

Coding scheme / Rate (bpp)              0.1    0.2    0.5    1      2      3      4

Jasper Ridge - scene 1
Proposed scheme (KLT)                   27.97  34.02  40.77  45.50  51.24  56.86  57.77
Proposed scheme (low-complexity KLT)    27.90  33.85  40.69  45.35  50.99  56.59  57.84
DWT1D2D [16]                            22.31  26.55  34.31  40.41  47.31  52.56  57.56
3D-SPECK                                19.70  23.66  31.75  38.55  46.00  48.59  52.36

Moffett Field - scene 1
DWT1D2D [16]                            17.24  22.34  32.22  40.85  48.76  53.98  59.25
3D-SPECK                                16.67  21.52  29.91  38.60  47.18  51.27  55.57

Moffett Field - scene 3
DWT1D2D [16]                            12.86  17.91  27.53  36.37  45.09  50.75  55.87
3D-SPECK                                12.60  17.98  26.99  35.37  40.10  46.71  50.79

Following the procedure in [30], [11], [31], we have selected an area in scene 1 of Jasper Ridge, 

and have applied the k-means clustering method [32], [33] to evaluate the centroids of three clusters, 

namely asphalt, water and vegetation. Subsequently, we have employed these three selected centroids 

as reference vectors in the SAM method. 
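The SAM rule of Eq. (2), together with the agreement measure later reported in Tab. IV, can be sketched as follows. This is an illustrative sketch only: the function names and the `(pixels, bands)` array layout are assumptions, the reference vectors are passed in directly rather than obtained by k-means, and all spectral vectors are assumed to be nonzero.

```python
import numpy as np

def spectral_angles(pixels, references):
    """Angles (radians) between each pixel spectrum and each reference spectrum.

    pixels: (P, B) array; references: (K, B) array. Returns a (P, K) array.
    """
    pixels = np.asarray(pixels, dtype=np.float64)
    references = np.asarray(references, dtype=np.float64)
    dots = pixels @ references.T
    norms = (np.linalg.norm(pixels, axis=1)[:, None]
             * np.linalg.norm(references, axis=1)[None, :])
    # Clip to guard against rounding slightly outside [-1, 1].
    return np.arccos(np.clip(dots / norms, -1.0, 1.0))

def sam_classify(pixels, references):
    """Assign each pixel to the reference with the smallest spectral angle."""
    return np.argmin(spectral_angles(pixels, references), axis=1)

def agreement_percentage(labels_original, labels_reconstructed):
    """Percentage of pixels assigned to the same cluster in both images."""
    return 100.0 * np.mean(np.asarray(labels_original) == np.asarray(labels_reconstructed))
```

Because the angle depends only on the direction of the spectral vectors, scaling a pixel spectrum by a positive constant does not change its class, which makes the method insensitive to overall brightness differences.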

In Tab. IV we report the classification performance in terms of percentage of pixels assigned to 

the same cluster in the reconstructed image with respect to the original one. It can be noticed that 

the performance reflects quite closely the result discussed above in terms of PSNR, in that, in the 

vast majority of cases, a higher PSNR results in a smaller classification error; this is consistent 

with the results in [17], and confirms that MSE-based metrics are reasonably good indicators of 

the performance degradation caused by compression artifacts. These results, although without any 

presumption of being exhaustive, indeed indicate that for this application the proposed scheme yields 

improved performance also in terms of classification results, making the proposed technique very 

competitive in terms of complexity, compression performance, and remote sensing image quality. 

In Fig. 16 one can see the results of the classification procedure applied to the original image, to the reconstructed image with full-complexity KLT at rate of 1 bpp, and to the reconstructed image with low-complexity KLT (ρ = 0.01) at rate of 1 bpp. The original image is displayed in Fig. 15. It can be observed that the thematic maps obtained from the compressed images are almost identical to the reference map.

TABLE IV
COMPARISON BETWEEN THE PROPOSED TECHNIQUE AND SPECK, IN TERMS OF THE PERCENTAGE OF PIXELS THAT ARE ASSIGNED TO THE SAME CLUSTER IN THE ORIGINAL AND THE RECONSTRUCTED IMAGES.

Coding scheme / Rate (bpp)       0.1    0.2    0.5    1      2      3      4

Jasper Ridge - scene 1
Full-complexity KLT              98.98  99.73  99.93  99.97  99.98  99.99  99.99
Low-complexity KLT, ρ = 0.1      98.96  99.74  99.93  99.97  99.98  99.99  99.99
Low-complexity KLT, ρ = 0.01     98.98  99.73  99.93  99.97  99.98  99.99  99.99
DWT1D2D                          98.74  99.36  99.84  99.95  99.98  99.99  99.99
3D-SPIHT                         97.64  98.69  99.66  99.90  99.95  99.98  99.99

Fig. 15. Jasper Ridge - scene 1 (reflectance).

VI. CONCLUSIONS 

In this paper we have carried out an extensive study of 3D transforms for lossy compression of hyperspectral 

data. It has been found that, among wavelet-based transforms, a hybrid rectangular/square 

transform is highly suitable, and achieves performance similar to wavelet packets.


Fig. 16. Classification result respectively on the original image (left), on the compressed image with full-complexity KLT 

at 1 bpp (center), on the compressed image with low-complexity KLT (ρ =0.01) at 1 bpp (right). The number of clusters 

is equal to 3. 

The best spectral transform has turned out to be the KLT. In order to make this transform 

computationally feasible, we have proposed a low-complexity version with comparable performance. 

The degree of computational saving and the related performance loss can be tuned to the specific 

needs of each application. 

The low-complexity KLT, along with a hybrid wavelet-based scheme, has been integrated 

into a JPEG 2000 Part 2 compliant scheme. Tests have been carried out on AVIRIS data, and 

comparisons have been performed with respect to 3D-SPIHT and SPECK. The proposed KLT-based 

scheme achieves significant performance gains with respect to the hybrid schemes and 3D-SPIHT; 

it outperforms SPECK by 5 to 10 dB in PSNR. An end-to-end complexity reduction of about three 

times can be achieved using the low-complexity KLT, with a minor performance loss (about 0.5 

dB). This transform is only about 40% more complex than 3D wavelets, but has significantly better 

performance. 

A quality assessment of compressed images has also been carried out by evaluating the effects of 

several lossy compression schemes on the results of SAM classification. It turns out that, for this 

application, PSNR is a good indicator of classification performance, so that the proposed scheme is 

still the highest-performance one by a large margin. 

REFERENCES 

[1] D.S. Taubman and M.W. Marcellin, JPEG2000: Image Compression Fundamentals, Standards, and Practice, Kluwer, 

2001. 

[2] S. Lim, K. Sohn, and C. Lee, “Compression for hyperspectral images using three dimensional wavelet transform,” in 

Proc. of IGARSS - IEEE International Geoscience and Remote Sensing Symposium, Sydney, Australia, 2001. 

[3] Y. Tseng, H. Shih, and P. Hsu, “Hyperspectral image compression using three-dimensional wavelet transformation,” 

in Proceedings of the 21st Asian Conference on Remote Sensing (ACRS), Taipei, Taiwan, 2000.


[4] A. Kaarna and J. Parkkinen, “Comparison of compression methods for multispectral images,” in Proc. of NORSIG - 

Nordic Signal Processing Symposium, Kolmarden, Sweden, 2000, vol. 2, pp. 251–254. 

[5] G.P. Abousleman, M.W. Marcellin, and B.R. Hunt, “Compression of hyperspectral imagery using the 3-D DCT and 

hybrid DPCM-DCT,” IEEE Transactions on Geoscience and Remote Sensing, vol. 33, no. 1, pp. 26–34, Jan. 1995. 

[6] D. Markman and D. Malah, “Hyperspectral image coding using 3D transforms,” in Proc. of ICIP - IEEE International 

Conference on Image Processing, Thessaloniki, Greece, 2001. 

[7] M.D. Pal, C.M. Brislawn, and S.R. Brumby, “Feature extraction from hyperspectral images compressed using the 

JPEG-2000 standard,” in Proc. of SSIAI - IEEE Southwest Symposium on Image Analysis and Interpretation, Santa 

Fe, New Mexico, 2002. 

[8] M.D. Pal and C.M. Brislawn S.P. Brumby, “Feature extraction form hyperspectral images compressed using the 

JPEG-2000 standard,” in Proc. of SSIAI Southwest Symposium on Image Analysis and Interpretation, Santa Fe, New 

Mexico, 2002. 

[9] S. Lim, K.H. Sohn, and C. Lee, “Principal component analysis for compression of hyperspectral images,” in Proc. of 

IGARSS - IEEE International Geoscience and Remote Sensing Symposium, Sydney, Australia, 2001. 

[10] X. Tang, C. Sungdae, and W.A. Pearlman, “3D set partitioning coding methods in hyperspectral image compression,” 

in Proc. of ICIP - IEEE International Conference on Image Processing, Barcelona, Spain, 2003. 

[11] X. Tang and W.A. Pearlman, “Three-dimensional wavelet-based compression of hyperspectral images,” in 

Hyperspectral Data Compression. Kluwer Academic Publishers, 2005. 

[12] J.A. Sagri, A.G. Tescher, and J.T. Reagan, “Practical transform coding of multispectral imagery,” IEEE Signal 

Processing Magazine, pp. 32–43, Jan. 1995. 

[13] P.L. Dragotti, G. Poggi, and A.R.P. Ragozini, “Compression of multispectral images by three-dimensional SPIHT 

algorithm,” IEEE Transactions on Geoscience and Remote Sensing, vol. 38, no. 1, pp. 416–428, Jan. 2000. 

[14] L. Chang, C. Cheng, and T. Chen, “An efficient adaptive KLT for multispectral image compression,” in Proceedings 

of 4th IEEE Southwest Symposium on Image Analysis and Interpretation, Austin, TX, 2000. 

[15] P. Hao and Q. Shi, “Reversible integer KLT for progressive-to-lossless compression of multiple component images,” 

in Proc of IEEE International Conference on Image Processing, 2003, Barcelona, Spain, 2003. 

[16] B. Penna, T.Tillo, E. Magli, and G. Olmo, “Progressive 3D coding of hyperspectral images based on JPEG 2000,” 

IEEE Geoscience and Remote Sensing Letters, to appear Jan. 2006. 

[17] E. Christophe, D. Léger, and C. Mailhes, “Quality criteria benchmark for hyperspectral imagery,” IEEE Transactions 

on Geoscience and Remote Sensing, vol. 43, no. 9, pp. 2103–2114, Sept. 2005. 

[18] V. Guralnik and G. Karypis, “A scalable algorithm for clustering protein sequences,” in Workshop on Data Mining 

in Bioinformatics, 2001. 

[19] JPEG 2000 Part 2 - Extensions, Document ISO/IEC 15444-2. 

[20] M. Vetterli and J. Kovacevic, Wavelet and Subband Coding, Pretince Hall, 1995. 

[21] R.R. Coifman and M.V. Wickerhauser, “Entropy-based algorithms for best basis selection,” IEEE Transactions on 

Information Theory, vol. 38, no. 2, pp. 713–718, Mar. 1992. 

[22] K. Ramchandran and M. Vetterli, “Best wavelet packet bases in a rate-distortion sense,” IEEE Transactions on Image 

Processing, vol. 2, no. 2, pp. 160–175, Apr. 1993. 

[23] V.K. Goyal, “Theoretical foundations of transform coding,” IEEE Signal Processing Magazine, vol. 18, no. 5, pp. 

9–21, 2001. 

[24] I.S. Dhillon, ANewO(N 2 ) Algorithm for the Symmetric Tridiagonal Eigenvalue/Eigenvector Problem, Ph.D. Thesis, 

University of California, Berkeley, 1997.


[25] X. Wu and N. Memon, “Context-based lossless interband compression - extending CALIC,” IEEE Transactions on 

Image Processing, vol. 9, no. 6, pp. 994–1001, June 2000. 

[26] JPEG 2000 Part 2 - Extensions, Document ISO/IEC 15444-2. 

[27] T.D. Tran, “The binDCT: fast multiplierless approximation of the DCT,” IEEE Signal Processing Letters, vol. 7, no. 

6, pp. 141–144, June 2000. 

[28] P. Hao and Q. Shi, “Matrix factorizations for reversible integer mapping,” IEEE Transactions on Signal Processing, 

vol. 49, no. 10, pp. 2314–2324, Oct. 2001. 

[29] X. Tang, C. Sungdae, and W.A. Pearlman, “Comparison of 3D set partitioning methods in hyperspectral image 

compression featuring an improved 3D-SPIHT,” in Proceedings of the IEEE Data Compression Conference (DCC), 

2003. 

[30] X. Tang, W.A. Pearlman, and J.W. Modestino, “Hyperspectral image compression using three-dimensional wavelet 

coding: A lossy-to-lossless solution,” submitted to IEEE Transactions on Geoscience and Remote Sensing, available 

at http://www.cipr.rpi.edu/ pearlman/ , 2004. 

[31] X. Tang and W.A. Pearlman, “Lossy-to-lossless block-based compression of hyperspectral volumetric data,” in Proc 

of IEEE International Conference on Image Processing, 2004. 

[32] F.A. Kruse, A.B. Lefkoff, J.B. Boardman, K.B. Heidebrecht, A.T. Shapiro, P.J. Barloon, and A.F.H. Goetz, “The 

spectral image processing system (sips) interactive visualization and analysis of imaging spectrometer data,” Remote 

Sensing of Environment, vol. 44, pp. 145–163, 1993. 

[33] J.W. Boardman, F.A. Kruse, and R.O. Green, “Mapping target signatures via partial unmixing of AVIRIS data,” in 

Fifth JPL Airborne Earth Science Workshop, JPL Publication, 1995, pp. 23–26.
