Spectral Feature Extraction - Cornell University

CEE 615: Digital Image Processing 1 

W. Philpot, Cornell University 

Feature Extraction 

General Goal: Classification / Pattern Recognition 

Features are functions of the original measurement variables that are useful for classification 

and/or pattern recognition. 

Feature extraction is the process of defining a set of features, or image characteristics, which 

will most efficiently or meaningfully represent the information that is important for 

analysis and classification. 

• In remote sensing the spectral bands and the spectral and spatial resolution of satellite 

images (and most aircraft images) are predetermined. 

• The user may only have the option of selecting a subset of the data. 

• Much of the information in the data set may be of little value for discrimination. Indeed, 

pattern recognition using the original measurements is frequently inefficient and may 

even obscure interpretation. 

1. Spatial vs. spectral information Gray values represent only a small portion of the 

information content of most images. The bulk of the image information is usually spatial in 

nature. 

2. Redundant information In spectral data, much of the information is repeated from image to 

image. This redundancy complicates analysis and classification unnecessarily. 

3. Information vs. useful information Much of the real variability (information?) in the 

image data is of little or no value for a given classification problem: it may represent 

random or systematic variability in the target of interest, or it may be due to changes in the 

"background" targets which are not of immediate interest. 

The goal of feature extraction is to improve the effectiveness and efficiency of analysis and 

classification. This may be done by: 

1) eliminating redundancy in the image data. 

2) eliminating variability in the image data that is of little of no value in classification -- even 

discarding entire images if that is appropriate. 

3) restructuring the data (in feature space) in order to optimize the performance of the 

classifier. 

4) extracting spatial information (texture, size, shape, ...) which is crucial to target 

identification. 

That is, one would like to 

1) minimize the number (and detail) of the features, 

2) maximize pattern discrimination.



Spatial pattern ==> shape, size, position, orientation, texture, ... 

In considering spatial pattern will concentrate on texture as the primary feature of interest. 

- shape, size, position, orientation are not terribly consistent in much of remote sensing 

(forest, agricultural fields, ...) 

- texture is one of the more important features in visual pattern recognition 

Comments: 

• No formal, universally accepted definition of texture exists. 

• Yet texture is intuitively recognizable and is an attribute that is crucial in 

photointerpretation. 

• Texture is usually described in very qualitative terms: 

smooth, uniform, flat - coarse, grainy 

even - uneven 

regular - irregular 

periodic - random 

Smoothing + thresholding 

Smoothing + gradient 

Gradient + smoothing



Spectral Feature Extraction 

Band (color) space vs. Feature space 

band 3 

texture feature 

band 1 

band 2 

Band (color) space 

band 1 

Single image operations 

• For a single image, "spectral" feature extraction refers to 

selecting a gray value or a range of gray values. 

• This is done by compressing the data in some manner 

(e.g., density slicing, thresholding, etc.) The result is a 

smaller feature space. 

• The ultimate in simplicity is a binary image (1-bit). The 

binary image identifies only the presence of absence 

of a single feature (e.g., water, clouds) 

band 2/band 3 

Feature space 

1-D histogram 

1-dimensional image histogram, h(k), maps the 

frequency of occurrence of a gray value, k, in an image. 

1-D pdf 

1-dimensional image histogram, h(k), maps the 

frequency of occurrence of a gray value, k, in an image. 

The probability density function (pdf), specifies the 

probability that a pixel picked at 

random from the image will have a 

particular gray value. 

p(k) ≈ h(k)/N 

where N = # pixels in the image



2-D histogram & pdf 

A 2-dimensional histogram, h(k 1 ,k 2 ) is a count of the # of pixels having a gray value of k 1 in 

band 1 and a value of k 2 in band 2. 

A 2-dimensional probability density function, p(k 1 ,k 2 ) represents the frequency with which 

pixels have a gray value of k 1 in band 1 and a value of k 2 in band 2: 

p(k) = p(k 1 , k 2 ) = h(k 1 , k 2 )/N 

where: p(k 1 , k 2 ) is the joint pdf, and 

k is a vector 

n-D pdf 

An n-dimensional probability density function, p(k) represents the frequency with which pixels 

have a particular set of gray values in an n-band image set: 

p(k) = p(k 1 , k 2 , … , k n ) = h(k)/N 

• Histograms contain NO spatial information 

• The size of a histogram is independent of the size of the image -- it depends only on the bit 

depth of the image. 

1. Selecting Spectral Features 2. Eliminate redundancy among variables



3. Algebraic Operations 

Algebraic operations produce a single output image as a result of a pixel-by-pixel sum, 

difference, product or quotient of two or more input images. 

Note: input images must be accurately registered. 

Concepts: 2-Dimensional histogram 

Image addition - tends to extract information that is positively correlated from image to image. 

Removes information that is uncorrelated from image to image. The output image gray value, k', 

is given by: 

k' = a 0 + a 1 k 1 + a 2 k 2 + . . . + a n k n 

where: 

k i = the gray value of a pixel in the i th image. 

a i = weighting coefficient for the i th image with a i > 0 

• Addition is often applied to data when the important information is correlated from band to 

band. 

• If the uncorrelated information is random noise, addition will tend to remove the random 

noise. 

• If the data are not noisy and image 1 and image 2 are different spectral bands, then: 

uncorrelated data ==> "color" information 

correlated data ==> intensity information. 

Example 1: Removal of random noise: 

If k 1 is the gray value of an image 

contaminated with random noise, then we 

might write: 

k 1 = k 0 + n 1 

signal+ noise 

Note that k 0 > 0 ; n 0 may be positive or 

negative. Similarly, for a second image of 

the same scene: 

k 2 = k 0 + n 2 

Taking the average value of the two images gives: 

Figure 1. Telescope photograph of a galaxy 

1 1 

k' = ( k + k ) = k + ( n + n ) 

1 2 0 1 2 

2 2 

Generalizing to the case when t images of the same scene are available: 

1 1 

k' = ( k + k + + k ) = k + ( n + n + + n ) 

1 2 n 0 1 2 t 

2 t 

Since n i may be positive or negative with a randomly varying magnitude, 

1 

noise = ( n + n + + n ) →0 as t →∞ 

1 2 t 

t



Example 2: Spectral Addition: 

Adding gray values from different spectral bands. 

2-band case: k' = a 0 + a 1 k 1 + a 2 k 2 

• The vector, a = (a 1 ,a 2 ), defines the direction of changing gray value. 

• The coefficients may be defined by choosing 

(k 11 , k 21 ) => k min = minimum gray value in the new image, and 

(k 12 , k 22 ) => k max = maximum gray value in the new image. 

The coefficients are: 

a 2 = m a 1 where: m = slope of the best fit straight line. 

= (k 22 -k 21 )/(k 12 -k 11 ) 

a 

1 

= 

k 

max 

− k 

min 

( k − k ) + m( k − k ) 

12 11 22 21 

a 0 = k min – a 1 (k 11 + m k 21 ) 

• Spectral addition can be regarded as a partitioning of the measurement space. 

• The space is partitioned into plane-parallel regions which are perpendicular to the direction 

of the coefficient vector a = (a 1 , a 2 ). 

• Intensity information extraction will be optimized if the coefficient vector, is parallel to the 

direction of correlation.



Image subtraction 

Image subtraction - tends to extract information that is uncorrelated (or negatively correlated 

from image to image. The output image gray value, k', is given by: 

k' = a 1 k 1 + a 2 k 2 

where: k 1,k 2 = gray values of pixel in image 1 & 2. 

a 1,a 2 = weighting coefficients, with 

a 1 > 0 and a 2 < 0 

(or vice versa) 

• Subtraction is often applied to data when the important information is uncorrelated from 

band to band. 

• For two different spectral bands, uncorrelated 

information is the spectral (color) difference 

between the two images. 

• If the two images collected at two different times, 

the uncorrelated information represents the 

temporal difference between the two images. 

Motion 

detection 

Castleman (1979) Digital Image Processing 

Motion detection (capillaries) 

Subtraction: Change detection 

difference image threshold image 

Moik (1980) Digital Processing of Remotely Sensed Images



Spectral Differencing (subtraction) 

Subtracting gray values from different spectral bands. 

k' = a 0 + a 1 k 1 + a 2 k 2 

• The vector, a = (a 1 ,a 2 ), defines the direction of changing gray value: a 2 = m a 1 

• The coefficients may be defined by choosing 

(k 11 , k 21 ) => k min = minimum gray value in the new image, and 

(k 12 , k 22 ) => k max = maximum gray value in the new image. 

The coefficients are given by: 

a 2 = m a 1 

where: 

m = slope of a line parallel to a 

a 

1 

= 

k 

max 

− k 

min 

( k − k ) + m( k − k ) 

12 11 22 21 

a 0 = k min – a 1 (k 11 + m k 21 ) 

k min = a 0 + a 1 k 11 + a 2 k 21 

k max = a 0 + a 1 k 12 + a 2 k 22 

• Spectral differencing is another partitioning of the measurement space. 

• The space is partitioned into plane-parallel regions which are perpendicular to the direction 

of the coefficient vector a = (a 1 , a 2 ). 

• Color information extraction will be optimized if the coefficient vector, is perpendicular 

to the direction of highest correlation. 

Band 1 Band 2 Band 2 – Band 1 

• Pixels that are bright in the difference image are brighter in band 2. 

• Pixels that are dark in the difference image are brighter in band 1. 

• Gray indicates little or no color change.



Image ratioing (division) 

• Ratioing tends to extract information that is uncorrelated from image to image. 

• Ideally, ratioing adjusts for differences in intensity while emphasizing color differences. 

• The output image gray value, k o , is given by: 

k 0 = ak 1 /k 2 

where: k 1 , k 2 = gray value for image 1 and image 2, respectively 

a = coefficient 

• Ratioing partitions measurement space into radial segments. 

• A simple ratio will partition the space into unequal radial segments which emanate from an 

origin at (0,0). 

• The implicit assumption is that (0,0) represents black (zero intensity) and that distance 

from (0,0) corresponds to an increase in intensity. 

Consider the assumption that: 

- (0,0) represents black (zero intensity) and that 

- distance from (0,0) corresponds to an increase in intensity. 

• In remote sensing imagery this is frequently not the case, 

largely due to atmospheric effects on the observed 

radiation. The true zero intensity point is usually offset in 

measurement space. 

• It would also be more effective to make the partitioning 

uniform 

A reasonable scaling is given by: 

⎡ ⎛ ⎞ ⎤ 

min ⎥ 

⎣ ⎝ ⎠ ⎦ 

n − n 

f 0 −1 

k − b 

2 2 

k ' = tan 

−θ 

( θ −θ ) 

⎢ ⎜ ⎟ 

k −b 

max min 1 1 

where: 

q min = minimum angle 

q max = maximum angle 

n f = maximum gray value in output image (usually 255) 

n 0 = minimum gray value in output image (usually 0) 

b 1 , b 0 = offsets of the zero-intensity point in measurement space.

Spectral Feature Extraction - Cornell University

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?