
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 4, NO. 2, APRIL 2007 201 

Hyperspectral Image Compression Using JPEG2000 

and Principal Component Analysis 

Qian Du, Member, IEEE, and James E. Fowler, Senior Member, IEEE 

Abstract—Principal component analysis (PCA) is deployed in 

JPEG2000 to provide spectral decorrelation as well as spectral 

dimensionality reduction. The proposed scheme is evaluated in 

terms of rate-distortion performance as well as in terms of information 

preservation in an anomaly-detection task. Additionally, 

the proposed scheme is compared to the common approach 

of JPEG2000 coupled with a wavelet transform for spectral 

decorrelation. Experimental results reveal that, not only does the 

proposed PCA-based coder yield rate-distortion and information-preservation

performance superior to that of the wavelet-based 

coder, the best PCA performance occurs when a reduced number 

of PCs are retained and coded. A linear model to estimate the 

optimal number of PCs to use in such dimensionality reduction 

is proposed. 

Index Terms—Hyperspectral data compression, JPEG2000, 

principal component analysis (PCA), wavelet transforms. 

I. INTRODUCTION 

HYPERSPECTRAL images typically possess a high 

degree of spectral as well as spatial correlation. As a 

consequence, data compression can significantly reduce hyperspectral 

data volumes to a more manageable size for storage and

communication. Wavelet-based lossy compression techniques 

are of particular interest due to their long history of providing 

excellent rate-distortion performance for traditional 2-D imagery. 

Consequently, a number of prominent 2-D compression 

algorithms have been extended to 3-D; these 3-D wavelet-based

techniques include 3D-SPIHT, 3D-SPECK, and 3-D tarp 

(see [1] for a review). In addition, the JPEG2000 standard has 

been widely applied to 3-D hyperspectral image coding (e.g., 

[2]–[4]) owing to its ability to code multiple image components. 

However, it has been argued that such a direct extension from 

2-D to 3-D, without the consideration of special characteristics 

of hyperspectral imagery, may be problematic [5], as data 

analysis applied subsequent to compression may be affected. 

Typically, the development of a 3-D compression algorithm 

involves coupling a decorrelating transform in the spectral 

direction with a spatial wavelet transform plus a coding algorithm 

suitably modified for a 3-D data array. Most often, 

a wavelet transform is also deployed spectrally, so as to implement 

a 3-D wavelet decomposition. Indeed, Annex I of 

Part 2 of the JPEG2000 standard supports a spectral discrete 

wavelet transform (DWT) in this manner, and it has been shown 

that JPEG2000 plus a spectral DWT generally achieves rate-distortion

performance superior to that of other wavelet-based

techniques [1]. However, Annex I supports other transform

structures as well, including arbitrary transforms in the form

of unitary-matrix decompositions.

Manuscript received June 29, 2006; revised October 8, 2006.

The authors are with the Department of Electrical and Computer Engineering,

GeoResources Institute (GRI)—Mississippi State HPC², Mississippi State

University, Mississippi State, MS 39762 USA (e-mail: du@ece.msstate.edu;

fowler@ece.msstate.edu).

Digital Object Identifier 10.1109/LGRS.2006.888109

Fig. 1. Rate-distortion performance for PCA + JPEG2000 using all principal

components (P = 224) compared to that of DWT + JPEG2000. Rate is

expressed in terms of bits per pixel per band. Distortion is measured as the

signal-to-noise ratio (SNR), being the log ratio of signal variance to mean-square

error, as in [1].

TABLE I

SNR (IN DECIBELS) AT 1.0 bpppb

Fig. 2. SNR performance as the number of principal components coded P

varies for PCA + JPEG2000 on the Jasper Ridge dataset.

1545-598X/$25.00 © 2007 IEEE


TABLE II 

NUMBER OF PCS P ∗ THAT YIELDS MAXIMUM SNR 

In this letter, we investigate the performance of principal 

component analysis (PCA) for spectral decorrelation in conjunction 

with JPEG2000 coding. This letter has four primary 

objectives. First, we compare this PCA-based spectral decorrelation 

to the more common approach of a DWT spectral 

transform. Second, we investigate the impact of using PCA for 

dimensionality reduction in addition to spectral decorrelation, 

observing the effect that the number of principal components 

(PCs) used in compression has on rate-distortion performance. 

Third, we evaluate the performance of JPEG2000 with PCA-based

spectral decorrelation in terms of information preservation, 

i.e., in terms of the usefulness of the reconstructed data 

in analysis, such as detection and classification. Finally, using 

our observations of both rate-distortion as well as data-analysis 

performance, we develop a heuristic that estimates the optimal 

number of PCs to be retained during dimensionality reduction. 

Our experiments indicate that, not only does PCA outperform 

the DWT for spectral decorrelation, the best performance occurs 

when only a small subset of PCs is coded instead of the 

entire set. The simple linear heuristic we develop estimates the 

optimal number of PCs, from both the rate-distortion as well as 

information-preservation perspectives. 

II. JPEG2000 AND PCA-BASED 

SPECTRAL DECORRELATION 

A. Implementation 

PCA, a well-known technique in multivariate data analysis, 

has been widely used in the hyperspectral setting for spectral 

decorrelation as well as spectral dimensionality reduction. Capable 

of optimal decorrelation in a statistical sense, PCA often 

provides excellent decorrelation in practice. When all PCs are 

retained, PCA is commonly known as the Karhunen–Loève 

transform (KLT), in which case a hyperspectral image with 

N spectral bands produces an N × N unitary KLT transform 

matrix. As this matrix is data dependent, it must be communicated 

to the decoder in any KLT-based compression system. 
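As a rough illustration of the spectral transform just described, the following NumPy sketch estimates the spectral covariance of a hyperspectral cube, takes its eigenvectors as the transform basis, and applies the forward and inverse transforms. The function names, the (rows, cols, bands) array layout, and the optional `num_pcs` argument (which keeps only the leading components, the dimensionality-reduction case) are assumptions made for this sketch; it is not the authors' implementation and involves no JPEG2000 coding.

```python
import numpy as np

def fit_spectral_pca(cube, num_pcs=None):
    """Fit a spectral PCA to a (rows, cols, bands) cube.

    Returns the per-band mean and a (bands, P) matrix whose columns are the
    eigenvectors of the spectral covariance with the P largest eigenvalues.
    """
    bands = cube.shape[-1]
    pixels = cube.reshape(-1, bands).astype(np.float64)
    mean = pixels.mean(axis=0)
    cov = np.cov(pixels - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending
    if num_pcs is None:
        num_pcs = bands                      # keep everything: the full KLT
    return mean, eigvecs[:, order[:num_pcs]]

def forward_pca(cube, mean, basis):
    """Project every pixel spectrum onto the retained eigenvectors."""
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(np.float64)
    return ((pixels - mean) @ basis).reshape(rows, cols, basis.shape[1])

def inverse_pca(pcs, mean, basis):
    """Map principal components back to the original spectral domain."""
    rows, cols, p = pcs.shape
    pixels = pcs.reshape(-1, p) @ basis.T + mean
    return pixels.reshape(rows, cols, basis.shape[0])
```

Both `mean` and `basis` are data dependent, so in a complete codec they would have to accompany the coded components as side information.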

Alternatively, PCA can effectuate dimensionality reduction by 

retaining in the KLT transform matrix only those eigenvectors 

corresponding to the P largest eigenvalues. The data volume

passed to the encoder then has P

DU AND FOWLER: HYPERSPECTRAL IMAGE COMPRESSION USING JPEG2000 AND PCA 203 

TABLE III 

DETECTION PERFORMANCE OF PCA + JPEG2000 AND DWT + JPEG2000 

standard. For both PCA + JPEG2000 and DWT + JPEG2000, 

a spatial 9/7 transform with five levels is used. 

Fig. 1 shows rate-distortion performance when using PCA + 

JPEG2000 and DWT + JPEG2000 to encode the three AVIRIS 

scenes. Table I tabulates the SNR for PCA + JPEG2000 and 

DWT + JPEG2000 at a fixed rate; also shown is the performance 

when JPEG2000 is used with no spectral decorrelation. 

We see that, in all cases, although DWT-based spectral decorrelation 

improves SNR by around 15 dB with respect to no 

spectral decorrelation, PCA-based spectral decorrelation results 

in a further 5-dB increase. From a statistical perspective, PCA 

offers optimal decorrelation while highly structured correlation 

is known to exist between DWT coefficients, both within 

subbands and across subbands. While JPEG2000 exploits this 

DWT correlation structure spatially, no attempt is made to 

exploit residual correlation across components, i.e., spectrally. 

As a consequence, a spectral DWT leaves a significant degree of 

correlation present in the spectral direction; the spectral PCA, 

with its optimal decorrelation, thus performs better. 
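For reference, the SNR figure used in these comparisons (defined in the caption of Fig. 1 as the log ratio of signal variance to mean-square error) can be computed as in the sketch below. The function name is hypothetical, and the 10·log10 scaling is an assumption consistent with the SNR being reported in decibels.

```python
import numpy as np

def snr_db(original, reconstructed):
    """SNR in dB: 10 * log10(variance of the original / mean-square error)."""
    original = np.asarray(original, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    mse = np.mean((original - reconstructed) ** 2)
    if mse == 0.0:
        return float("inf")   # lossless reconstruction
    return float(10.0 * np.log10(original.var() / mse))
```

The variance here is taken over the whole data volume; whether the original experiments computed it globally or per band is not stated in the text.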

III. JPEG2000 AND PCA-BASED 

DIMENSIONALITY REDUCTION 

A. Rate-Distortion Considerations 

In the use of PCA above, we retained and coded the full 

number of PCs. Theoretically, PCRD optimization results in an 

optimal rate allocation and SNR, thereby obviating the need for 

explicit dimensionality reduction. In theory, one should code 

all PCs and let the PCRD optimization allocate rate across the 

PCs, as PCs that are unneeded will be automatically allocated 

zero rate. However, in practice, rate-distortion performance 

is improved if fewer than the full complement of PCs is 

coded (i.e., for P


Fig. 4. ROC curves for the Moffett dataset at 0.05 bpppb. P_d =

probability of detection. P_f = probability of false alarm.

in the data. In this letter, unsupervised detection is considered. 

We focus on anomaly detection, wherein an anomaly is defined 

as a small object or material whose spectral signature is very 

different from the background. In hyperspectral applications, 

such an anomaly is most likely an unknown target. Anomaly 

detection provides a good test of the performance of lossy 

compression in information preservation because these small 

objects or materials are prone to be sacrificed during compression, 

although they may be, in fact, quite important to the 

hyperspectral application. 

The well-known RX algorithm is employed for anomaly 

detection [7]. Since there is no real ground truth available, we 

consider the detection map resulting from the original data to be 

ground truth; the detection map resulting from the reconstructed 

data is then compared to it. The similarity measure

is the spatial correlation coefficient ρ; the closer ρ is to

1.0, the better the detection performance is considered to be. 
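The evaluation procedure above can be sketched as follows. The text does not say which variant of the RX detector was used, so this example assumes the common global form, in which each pixel is scored by its Mahalanobis distance from the scene mean under the scene covariance. The function names and the use of a pseudo-inverse are likewise assumptions of the sketch.

```python
import numpy as np

def rx_detection_map(cube):
    """Global RX score for each pixel of a (rows, cols, bands) cube."""
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(np.float64)
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    cov_inv = np.linalg.pinv(cov)   # pseudo-inverse tolerates a singular covariance
    # Quadratic form (x - mu)^T K^{-1} (x - mu), evaluated for every pixel.
    scores = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return scores.reshape(rows, cols)

def map_correlation(map_a, map_b):
    """Correlation coefficient between two detection maps of equal shape."""
    a = np.asarray(map_a, dtype=np.float64).ravel()
    b = np.asarray(map_b, dtype=np.float64).ravel()
    return float(np.corrcoef(a, b)[0, 1])
```

In this setup, ρ would be obtained as `map_correlation(rx_detection_map(original), rx_detection_map(reconstructed))`, with the map from the original data standing in for ground truth.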

Fig. 3 indicates that the detection performance varies with 

P , which is the number of PCs used for compression, with the 

maximal correlation coefficient occurring in all cases for fewer 

than 224 PCs. Once again, this experiment demonstrates that 

it is unnecessary, even undesirable, to retain the minor PCs, 

since they contain noise only and will not contribute to target 

detection. 

Table III presents P † , which is the number of PCs that 

achieves maximum detection performance, for each of the 

three AVIRIS datasets. Additionally, detection performance 

for PCA + JPEG2000 using all 224 PCs as well as the 

performance for DWT + JPEG2000 is given. We see that, in all 

cases, the maximum ρ occurs for strictly fewer than 224 PCs; 

additionally, this maximal PCA + JPEG2000 ρ is greater than 

that of DWT + JPEG2000 in all cases. 

Table III also examines the effect that compression has 

on anomalous pixels as measured using the Spectral Angle 

Mapper (SAM); such anomalous pixels typically are among 

the worst for spectral distortion in the dataset. In general, PCA 

+ JPEG2000 using all the PCs results in less spectral distortion 

than DWT + JPEG2000 for the anomalous pixels. Table III 

also shows the mean and standard deviation of the SAM over 

the entire image, and from these values, we observe that the 

average spectral distortion is typically quite small, even for low 

rate. On average, DWT + JPEG2000 incurs the most

spectral distortion, while PCA + JPEG2000 using P † PCs

incurs the least.

Fig. 5. P ∗ and ˜P ∗ versus rate R.

TABLE IV

COMPARISON BETWEEN P ∗ AND ˜P ∗ AND THE RESULTING SNRS
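The SAM distortion discussed above is the angle between an original pixel spectrum and its reconstruction. A minimal sketch follows, assuming cubes stored as (rows, cols, bands) arrays and angles expressed in radians; the function name and the handling of all-zero spectra are choices made for this example only.

```python
import numpy as np

def spectral_angle_map(original, reconstructed):
    """Per-pixel spectral angle in radians between two (rows, cols, bands) cubes."""
    a = np.asarray(original, dtype=np.float64)
    b = np.asarray(reconstructed, dtype=np.float64)
    dot = np.sum(a * b, axis=-1)
    norms = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    # Guard against division by zero; an all-zero spectrum yields an angle of pi/2.
    cos = np.clip(dot / np.maximum(norms, np.finfo(np.float64).tiny), -1.0, 1.0)
    return np.arccos(cos)
```

The image-wide statistics mentioned in the text would then be `angles.mean()` and `angles.std()`, while the values at anomalous pixels can be read out with a boolean mask.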

In comparing Table II with Table III, it is interesting to see 

that P ∗ , which is the number of PCs yielding maximal SNR 

in Table II, is typically close to P † , which is the number of 

PCs yielding maximal detection performance in Table III. 

Often, these two values are identical. Consequently, we argue 

that, if we code the dataset using P ∗ PCs, we can use the 

reconstructed image for practical data analysis, such as target 

detection, because this same number of PCs should provide 

good performance in that sense as well. We use this observation 

as a motivation for developing a heuristic that estimates P ∗ in 

Section III-C. 

In order to further evaluate the detection performance, 

analysis of the receiver operating characteristic (ROC) is 

conducted. Target pixels are defined by thresholding the 

detection map produced from the original data with a threshold 

of half the maximum. Detection maps from the reconstructed 

data are then thresholded with thresholds ranging from 10% 

to 90% of maximum, and the detection and false-alarm 

probabilities are calculated for each threshold using the 

previously defined target pixels. Fig. 4 shows the resulting 

ROC curves for the Moffett dataset, which has a sufficient 

number of anomalous pixels for meaningful ROC analysis. 

To quantify the performance, the areas under the ROC curves 

are given in Table III for Moffett, with a larger area indicating

superior detection performance. We see that the ROC areas

generally match the correlation coefficients in Table III.

TABLE V

RATE-DISTORTION AND ANOMALY-DETECTION PERFORMANCE
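The ROC procedure described above can be sketched as follows. Target pixels come from thresholding the original detection map at half its maximum, and the reconstructed map is thresholded at 10% to 90% of its own maximum. The text does not say how the area under the curve was computed, so the trapezoidal rule with added (0, 0) and (1, 1) endpoints is an assumption of this example, as are the function names.

```python
import numpy as np

def roc_points(original_map, reconstructed_map, fractions=None):
    """Return (P_f, P_d) arrays, one entry per threshold fraction."""
    if fractions is None:
        fractions = np.linspace(0.1, 0.9, 9)   # 10% to 90% of the maximum
    original_map = np.asarray(original_map, dtype=np.float64)
    reconstructed_map = np.asarray(reconstructed_map, dtype=np.float64)
    targets = original_map >= 0.5 * original_map.max()
    background = ~targets   # assumed non-empty
    pd, pf = [], []
    for frac in fractions:
        detected = reconstructed_map >= frac * reconstructed_map.max()
        pd.append(np.count_nonzero(detected & targets) / np.count_nonzero(targets))
        pf.append(np.count_nonzero(detected & background) / np.count_nonzero(background))
    return np.array(pf), np.array(pd)

def roc_area(pf, pd):
    """Trapezoidal area under the ROC curve, with (0, 0) and (1, 1) appended."""
    pf = np.concatenate(([0.0], pf, [1.0]))
    pd = np.concatenate(([0.0], pd, [1.0]))
    order = np.lexsort((pd, pf))   # sort by P_f, breaking ties by P_d
    pf, pd = pf[order], pd[order]
    return float(np.sum(np.diff(pf) * (pd[1:] + pd[:-1]) / 2.0))
```

With this convention, a reconstructed map that reproduces the original targets without false alarms gives an area of 1.0, and a map with no discriminating power gives 0.5.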

C. Linear Model for Dimensionality Reduction 

Let P ∗ be the number of PCs that yields maximal SNR for 

a given dataset. From Fig. 2, P ∗ clearly varies with the rate 

R of the coding. We thus propose a simple linear model on 

R as an estimate of P ∗ . This is illustrated in Fig. 5 in which 

we fit a line to the P ∗ versus R curves for the three AVIRIS

datasets; specifically, we estimate P ∗ as ˜P ∗ = αR + β, where

α = 87.7778 and β = 16.2222 are constants. Even though the

P ∗ versus R curves are rather nonlinear in Fig. 5, this linear 

model fits the curves quite closely at the lower rates, which is 

crucial, since, from Fig. 2, the performance suffers the most 

from not coding at P ∗ when the rate is low. 
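In code, the linear model amounts to a one-line estimate. The sketch below uses the constants α and β quoted above; rounding to the nearest integer and clamping to the range from 1 to the number of bands are assumptions added here, since the text does not say how a fractional estimate is converted to a PC count. The function name and the default of 224 bands (the full AVIRIS complement referred to in the text) are also illustrative.

```python
ALPHA = 87.7778   # slope quoted in the text
BETA = 16.2222    # intercept quoted in the text

def estimate_num_pcs(rate_bpppb, num_bands=224, alpha=ALPHA, beta=BETA):
    """Estimate how many PCs to code at a given rate in bpppb."""
    estimate = alpha * rate_bpppb + beta
    # Rounding and clamping are assumptions of this sketch, not from the text.
    return int(min(max(round(estimate), 1), num_bands))

if __name__ == "__main__":
    for rate in (0.1, 0.25, 1.0):
        print(rate, estimate_num_pcs(rate))
```

The clamp only matters at high rates, where the straight line would otherwise exceed the number of available bands.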

As shown in Table IV, this linear model yields rather accurate 

results in some cases. For instance, when the bit rate is 

0.25 bpppb, the estimated ˜P ∗ values are very close to the true 

P ∗ values. In some cases, however, ˜P ∗ can be quite different 

from P ∗ . For instance, the Cuprite scene yields the P ∗ versus 

R curve furthest from the linear model, as shown in Fig. 5; at 

0.5 bpppb, the model gives ˜P ∗ = 75, while P ∗ = 57 for this

dataset. Fortunately, the difference in SNR between these two 

values is only about 0.05 dB, which is relatively negligible. As 

a consequence, we conclude that the model provides a simple 

yet effective estimate of P ∗ . 

To further evaluate the proposed linear model, Table V 

summarizes the SNR and ρ values resulting from using P ∗ 

(maximum SNR), P † (maximum ρ), and ˜P ∗ from the linear 

model. In many cases, P ∗ is close to P † ; in other cases, these 

two numbers are different, but the resulting performance is still 

close. Most importantly, ˜P ∗ , from the linear model, always 

yields comparable SNR and ρ. 

IV. CONCLUSION 

In this letter, we investigated the performance of spectral 

PCA in conjunction with JPEG2000 for hyperspectral image 

compression, comparing to the common approach of a spectral 

DWT followed by JPEG2000. Experimental results on AVIRIS 

datasets yield the following observations: 1) spectral decorrelation 

via PCA results in rate-distortion performance superior 

to that of a spectral DWT; 2) rate-distortion performance is 

further improved by retaining and coding only a subset of PCs 

rather than using all the PCs; 3) data-analysis performance, 

such as anomaly detection, is similarly maximized by coding 

a subset of PCs rather than all the PCs; 4) the number of 

PCs that yields maximum SNR for a given dataset is usually 

close to the number of PCs that yields maximum detection 

performance; and 5) a simple linear model can effectively 

estimate the optimal number of PCs. Consequently, we conclude 

that the capabilities of PCA for decorrelation simultaneous 

with dimensionality reduction offer the potential for an 

excellent data-compression performance, significantly superior 

to that obtained by other techniques based on 3-D wavelet 

transforms. 

REFERENCES 

[1] J. E. Fowler and J. T. Rucker, “3D wavelet-based compression of hyperspectral 

imagery,” in Hyperspectral Data Exploitation: Theory and Applications, 

C.-I Chang, Ed. Hoboken, NJ: Wiley, 2007. 

[2] J. T. Rucker, J. E. Fowler, and N. H. Younan, “JPEG2000 coding strategies 

for hyperspectral data,” in Proc. Int. Geosci. and Remote Sens. Symp., 

Seoul, Korea, Jul. 2005, vol. 1, pp. 128–131. 

[3] B. Penna, T. Tillo, E. Magli, and G. Olmo, “Progressive 3-D coding of 

hyperspectral images based on JPEG2000,” IEEE Geosci. Remote Sens. 

Lett., vol. 3, no. 1, pp. 125–129, Jan. 2006. 

[4] P. Kulkarni, A. Bilgin, M. W. Marcellin, J. C. Dagher, J. H. Kasner, 

T. J. Flohr, and J. C. Rountree, “Compression of earth science data with 

JPEG2000,” in Hyperspectral Image Compression, G. Motta, F. Rizzo, and 

J. A. Storer, Eds. New York: Springer-Verlag, 2006, ch. 12, pp. 347–378. 

[5] M. R. Pickering and M. J. Ryan, “An architecture for the compression of 

hyperspectral imagery,” in Hyperspectral Image Compression, G. Motta, 

F. Rizzo, and J. A. Storer, Eds. New York: Springer-Verlag, 2006, ch. 1, 

pp. 1–34. 

[6] D. Taubman, “High performance scalable image compression with 

EBCOT,” IEEE Trans. Image Process., vol. 9, no. 7, pp. 1158–1170, 

Jul. 2000. 

[7] C.-I Chang and S.-S. Chiang, “Anomaly detection and classification for 

hyperspectral imagery,” IEEE Trans. Geosci. Remote Sens., vol. 40, no. 6, 

pp. 1314–1325, Jun. 2002.
