Modified Fisher's Linear Discriminant Analysis for ... - IEEE Xplore
504 IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 4, NO. 4, OCTOBER 2007

II. MFLDA
Let the total scatter matrix S_T be defined as

S_T = \sum_{i=1}^{n} (r_i - \mu)(r_i - \mu)^T    (4)

and it can be related to S_W and S_B by [1]

S_T = S_W + S_B.    (5)

So the maximization of (3) is equivalent to maximizing

q' = \frac{w^T S_B w}{w^T S_T w}.    (6)
Following the same idea as FLDA, the solution is given by the eigenvectors of the generalized eigenproblem S_B w = \lambda S_T w.
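As a sanity check, the identity in (5) can be verified numerically. The NumPy sketch below uses a made-up two-class data set (not from the letter's experiments) and the usual class-size-weighted between-class scatter, and confirms that S_T = S_W + S_B holds exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy example: two classes of 3-band "pixels" (rows are samples)
X1 = rng.normal(0.0, 1.0, size=(40, 3))
X2 = rng.normal(2.0, 1.0, size=(60, 3))
X = np.vstack([X1, X2])
mu = X.mean(axis=0)                      # global sample mean

# total scatter matrix, Eq. (4)
S_T = (X - mu).T @ (X - mu)

# within-class and between-class scatter matrices
S_W = np.zeros((3, 3))
S_B = np.zeros((3, 3))
for Xj in (X1, X2):
    mu_j = Xj.mean(axis=0)
    S_W += (Xj - mu_j).T @ (Xj - mu_j)
    S_B += len(Xj) * np.outer(mu_j - mu, mu_j - mu)

# Eq. (5): the decomposition holds exactly
print(np.allclose(S_T, S_W + S_B))  # True
```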
When the only available information is the class signatures {s_1, s_2, ..., s_p}, they can be treated as the class means, i.e., M = [\mu_1 \mu_2 \cdots \mu_p] \approx [s_1 s_2 \cdots s_p]. The S_B in (2) becomes

\hat{S}_B = \sum_{j=1}^{p} (s_j - \hat{\mu})(s_j - \hat{\mu})^T    (7)

where \hat{\mu} = (1/p) \sum_{i=1}^{p} s_i is the mean of the class signatures. S_T in (4) can be replaced by the data covariance matrix Σ, i.e.,

\hat{S}_T = \Sigma = \sum_{i=1}^{N} (r_i - \tilde{\mu})(r_i - \tilde{\mu})^T    (8)

where \tilde{\mu} = (1/N) \sum_{i=1}^{N} r_i is the sample mean of the entire data set with N pixels. Then the solution is given by the eigenvectors of the generalized eigenproblem \hat{S}_B w = \lambda \Sigma w, or, equivalently, the eigenvectors of \Sigma^{-1} \hat{S}_B.
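A minimal sketch of the MFLDA transform under these substitutions, using random stand-ins for the N pixel vectors r_i and the p class signatures s_j (the array names R, S, and W here are illustrative, not from the letter):

```python
import numpy as np

rng = np.random.default_rng(1)
N, bands, p = 500, 5, 3

# hypothetical inputs: N pixel vectors r_i (rows) and p class signatures s_j
R = rng.normal(size=(N, bands))
S = rng.normal(size=(p, bands))

# between-class scatter estimated from the signatures only, Eq. (7)
mu_hat = S.mean(axis=0)
S_B_hat = (S - mu_hat).T @ (S - mu_hat)

# data covariance in place of S_T, Eq. (8)
mu_tilde = R.mean(axis=0)
Sigma = (R - mu_tilde).T @ (R - mu_tilde)

# eigenvectors of Sigma^{-1} S_B_hat; keep the (p - 1) largest eigenvalues,
# since S_B_hat has rank (p - 1)
evals, evecs = np.linalg.eig(np.linalg.solve(Sigma, S_B_hat))
order = np.argsort(evals.real)[::-1]
W = evecs[:, order[: p - 1]].real        # MFLDA transform matrix

Y = R @ W                                # data projected to (p - 1) dimensions
print(Y.shape)  # (500, 2)
```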
Regardless of the actual classes present in the data, replacing S_T with Σ represents an extreme case, which means all the pixels are separated into the classes they belong to and selected as samples. Using Ŝ_B as S_B represents another extreme case, which means there is only one sample in each class. So the discrepancy incurred comes from two factors: only one sample (i.e., the class signature) for each of the p classes is used to estimate S_B, and all the pixels are used to estimate S_T, with the implicit assumption that pixels are put into all the existing classes, including unknown background classes (i.e., the actual number of classes p_T may be greater than p). In the experiments, it will be shown that the term Σ^{-1} is very effective in background suppression.
Since the rank of Ŝ_B is the same as that of S_B, which is (p − 1), the dimensionality of the MFLDA-transformed data is (p − 1), the same as for FLDA. After the data are projected onto this (p − 1)-dimensional space, an algorithm is needed for tasks such as classification or detection. A less powerful distance-based classifier, such as the spectral angle mapper (SAM), can be applied, or a more powerful filter, such as the target-constrained interference-minimized filter (TCIMF), may be used [6].
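For instance, SAM scores a projected pixel by the spectral angle between it and each class signature and assigns the class with the smallest angle. A minimal sketch (the 2-D vectors below are hypothetical placeholders for (p − 1)-dimensional projected data):

```python
import numpy as np

def spectral_angle(x, s):
    """Angle (radians) between a pixel vector x and a signature s."""
    cos = np.dot(x, s) / (np.linalg.norm(x) * np.linalg.norm(s))
    return np.arccos(np.clip(cos, -1.0, 1.0))  # clip guards rounding error

# assign a projected pixel to the signature with the smallest angle
signatures = np.array([[1.0, 0.0],   # hypothetical projected class signatures
                       [0.0, 1.0]])
pixel = np.array([0.9, 0.1])
label = int(np.argmin([spectral_angle(pixel, s) for s in signatures]))
print(label)  # 0
```

SAM depends only on the direction of the vectors, not their magnitude, which is why it is commonly described as a less powerful but illumination-insensitive matcher.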
III. RELATIONSHIP BETWEEN LDA-BASED APPROACHES

A. Relationship Between FLDA and CFLDA
The CFLDA in [5] imposed a constraint to align the class centers along different directions [4], i.e.,

w_l^T \mu_j = \delta_{lj},  for 1 ≤ l, j ≤ p.    (9)

This also means that the jth transform vector w_j is for the jth class. So the CFLDA-transformed data are actually classification maps. It can be derived that, when the constraint is satisfied, w^T S_B w is a constant. Thus, the constrained problem is to minimize w^T S_W w in (3) while satisfying the constraint in (9). Using the Lagrange multiplier approach, it was shown that the desired transform matrix W, including all p transform vectors, is

W_{CFLDA} = S_W^{-1} M (M^T S_W^{-1} M)^{-1}.    (10)

Obviously, the implementation of CFLDA requires knowledge of the training samples of each class in order to compute S_W.
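Equation (10) can be sketched directly in NumPy. The within-class scatter and class-mean matrix below are synthetic placeholders (CFLDA would estimate S_W from labeled training samples); the check confirms that the constraint (9), W^T M = I, holds by construction:

```python
import numpy as np

def cflda_transform(S_W, M):
    """W_CFLDA = S_W^{-1} M (M^T S_W^{-1} M)^{-1}, Eq. (10)."""
    SinvM = np.linalg.solve(S_W, M)          # S_W^{-1} M without explicit inverse
    return SinvM @ np.linalg.inv(M.T @ SinvM)

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
S_W = A @ A.T + 4.0 * np.eye(4)   # hypothetical SPD within-class scatter
M = rng.normal(size=(4, 2))       # two class means as columns
W = cflda_transform(S_W, M)

# the constraint (9), W^T M = I, holds by construction
print(np.allclose(W.T @ M, np.eye(2)))  # True
```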
B. Relationship Between CFLDA, CLDA, and MFLDA

Following the same idea of FLDA in maximizing the class separability, the CLDA in [2] and [3] imposed the same constraint that different classes are aligned along different directions, as in (9). To make the constrained problem easier to solve, it employed the ratio of within-class and between-class distances instead of the Rayleigh quotient [4]. It was proved that the transformed within-class distance is a constant when the constraint in (9) is satisfied. It also used the data covariance matrix Σ to substitute for S_T, as in MFLDA. It was proved that the transform matrix W is equivalent to [3]

W_{CLDA} = \Sigma^{-1} M (M^T \Sigma^{-1} M)^{-1}.    (11)

Equation (11) is similar to (10) except that S_W is replaced with Σ. Therefore, CLDA does not require training samples for each class; it needs the class signatures only. Similar to CFLDA, CLDA was designed for classification, so the classification maps are obtained right after the transform.
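A sketch of (11), emphasizing that CLDA needs only the unlabeled data (to estimate Σ) and the signature matrix M; both inputs below are random placeholders. Note that any positive scaling of Σ cancels out in W_CLDA, so whether a normalized covariance or a raw scatter matrix is used makes no difference:

```python
import numpy as np

rng = np.random.default_rng(3)
N, bands, p = 300, 4, 2
R = rng.normal(size=(N, bands))   # unlabeled pixels: no training samples needed
M = rng.normal(size=(bands, p))   # class signatures as columns

# data covariance (scatter form); any positive scaling cancels in W_clda
mu = R.mean(axis=0)
Sigma = (R - mu).T @ (R - mu)

SinvM = np.linalg.solve(Sigma, M)
W_clda = SinvM @ np.linalg.inv(M.T @ SinvM)   # Eq. (11)

# each column of W_clda targets one class, so R @ W_clda yields one
# classification map per class directly after the transform
maps = R @ W_clda
print(maps.shape, np.allclose(W_clda.T @ M, np.eye(p)))  # (300, 2) True
```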
C. Use of Σ and S_W

Both CFLDA and CLDA apply the constraint in (9), resulting in the similar operators in (10) and (11), with the difference that CLDA uses Σ while CFLDA uses S_W. So CLDA does not require the training samples, which is the same as in MFLDA. There is another benefit of using Σ. As mentioned earlier, the true number of classes present in an image scene p_T is greater than p, due to the difficulty of exhausting all the present classes, in particular the background classes. In the ideal case, when all the pixels in an image scene are put into the p_T classes, S_T = Σ. Therefore, using Σ in LDA-based approaches represents the best situation for S_T, which means