Download Full Issue in PDF - Academy Publisher
1522 JOURNAL OF COMPUTERS, VOL. 8, NO. 6, JUNE 2013
category. However, this is only a rough analysis of leaf shapes, so the next step is a refined analysis.
C. Classification Model of Leaves 2
Factor analysis is then performed on the tree leaf shapes within one category to calculate factor scores, which are used for clustering. This clustering analysis is the refined step. Several dozen factors describe leaf shapes, such as leaf shape, leaf width, leaf length, and leaf vein, but the length of the veins determines leaf length and leaf width to a certain extent, and some factors can be completely described by other factors. We therefore reduce the dimension first and cluster afterwards: factor analysis reduces the dimension of the influential factors and yields the factor scores used for clustering. This method not only distinguishes leaf shapes well but also reduces the complexity of the problem being analyzed.
The mathematical model for factor analysis is as follows:
$$
\begin{cases}
X_1 = a_{11}F_1 + a_{12}F_2 + \cdots + a_{1m}F_m + \varepsilon_1 \\
X_2 = a_{21}F_1 + a_{22}F_2 + \cdots + a_{2m}F_m + \varepsilon_2 \\
\quad\vdots \\
X_P = a_{P1}F_1 + a_{P2}F_2 + \cdots + a_{Pm}F_m + \varepsilon_P
\end{cases}
\tag{1}
$$
represented in matrix form:
$$
\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_P \end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1m} \\
a_{21} & a_{22} & \cdots & a_{2m} \\
\vdots & \vdots & & \vdots \\
a_{P1} & a_{P2} & \cdots & a_{Pm}
\end{bmatrix}
\begin{bmatrix} F_1 \\ F_2 \\ \vdots \\ F_m \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_P \end{bmatrix}.
$$
Simply recorded as:
$$
X = AF + \varepsilon, \tag{2}
$$
which satisfies:
1) $m \le P$;
2) $\operatorname{cov}(F, \varepsilon) = 0$;
3) $D(F) = \begin{bmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{bmatrix} = I_m$, i.e., $F_1, \ldots, F_m$ are uncorrelated and each has variance 1;
4) $D(\varepsilon) = \begin{bmatrix} \sigma_1^2 & & 0 \\ & \ddots & \\ 0 & & \sigma_P^2 \end{bmatrix}$, i.e., $\varepsilon_1, \ldots, \varepsilon_P$ are uncorrelated and have different variances.
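As a worked step that is not spelled out in the text, conditions 2) to 4) determine the covariance structure implied by model (2). Because $F$ and $\varepsilon$ are uncorrelated, the cross terms vanish, and with $D(F) = I_m$ one obtains (writing $A'$ for the transpose of $A$):
$$
D(X) = D(AF + \varepsilon) = A\,D(F)\,A' + D(\varepsilon) = AA' + \operatorname{diag}(\sigma_1^2, \ldots, \sigma_P^2),
$$
so that, for each variable, $\operatorname{Var}(X_i) = \sum_{j=1}^{m} a_{ij}^2 + \sigma_i^2$. The first term is the part of the variance explained by the common factors and the second is the part specific to $X_i$.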
Here $X = (X_1, \ldots, X_P)'$ is the $P$-dimensional random vector made up of the $P$ indexes obtained in actual observation. $F = (F_1, \ldots, F_m)'$ is an unobservable vector called the common factor of $X$, i.e., the integrated variable mentioned above. $A$ is the factor loading matrix. A maximum-variance rotation is applied to $A$ so that its structure is simplified; in other words, the squared elements in each column of the loading matrix are pushed toward 0 or 1. The more dispersed the contribution rates of the common factors, the better the result.
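One common way to make "pushing the squared loadings toward 0 or 1" precise is Kaiser's raw varimax criterion. The text does not say which variant of the rotation is used, so the formula below is an assumption given for orientation only. The rotation is chosen to maximize the sum, over the $m$ columns, of the variance of the squared loadings:
$$
V = \sum_{j=1}^{m}\left[\frac{1}{P}\sum_{i=1}^{P} a_{ij}^{4} - \left(\frac{1}{P}\sum_{i=1}^{P} a_{ij}^{2}\right)^{2}\right],
$$
where $a_{ij}$ are the entries of the rotated loading matrix. A column whose squared loadings are all equal contributes 0 to $V$, whereas a column with a few squared loadings near 1 and the rest near 0 contributes a large value.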
The variables in factor analysis are represented as linear combinations of the common factors:
$$
X_i = a_{i1}F_1 + a_{i2}F_2 + \cdots + a_{im}F_m + \varepsilon_i, \qquad i = 1, 2, \ldots, P. \tag{3}
$$
In practice, however, it is usually more convenient to describe the characteristics of the research object in terms of the common factors. The common factors are therefore represented as linear combinations of the variables, i.e., the factor score function:
$$
F_j' = \beta_{j1}X_1 + \beta_{j2}X_2 + \cdots + \beta_{jP}X_P, \qquad j = 1, 2, \ldots, m. \tag{4}
$$
We calculate $m$ factor scores for each leaf sample. These $m$ factor scores are then used as the variable values for clustering the different leaves with the K-means clustering method.
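The clustering step on the factor scores can be sketched as follows. This is a minimal illustration using plain Lloyd-style K-means in standard-library Python, not the authors' implementation; the function name `kmeans`, the random initialization, and the six two-dimensional score vectors in `scores` are all hypothetical.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain K-means on a list of equal-length score vectors (illustrative)."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct samples chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each sample goes to the nearest center
        # (squared Euclidean distance on the factor scores).
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are stable
        labels = new_labels
        # Update step: each center becomes the mean of its assigned samples.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical factor scores (m = 2) for six leaf samples.
scores = [(-1.2, 0.3), (-1.0, 0.5), (-1.1, 0.1),
          (1.3, -0.2), (1.1, -0.4), (0.9, -0.1)]
labels, centers = kmeans(scores, k=2)
print(labels)
```

With these well-separated scores, the first three samples fall into one cluster and the last three into the other; which cluster receives index 0 depends on the initialization.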
D. Clustering Error Estimation
We now give the evaluation method for judging the clustering effect. The back substitution misjudgment probability and the cross misjudgment probability are commonly used. If $N_1$ samples belonging to $G_1$ are misjudged as belonging to $G_2$, $N_2$ samples belonging to $G_2$ are misjudged as belonging to $G_1$, and the total number of samples in the two populations is $n$, then the misjudgment probability is:
$$
p = \frac{N_1 + N_2}{n} \tag{5}
$$
Back substitution misjudgment probability
Let $G_1$ and $G_2$ be two populations, and let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ be training samples from $G_1$ and $G_2$, respectively. All the training samples are treated as $m+n$ new samples and substituted one by one into the established criterion to judge which population each of them belongs to. This process is called back substitution. If $N_1$ samples belonging to $G_1$ are misjudged as belonging to $G_2$, and $N_2$ samples belonging to $G_2$ are misjudged as belonging to $G_1$, then the misjudgment probability is:
$$
\hat{p} = \frac{N_1 + N_2}{m + n}
$$
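The back substitution estimate can be illustrated with a short sketch. The text does not fix the judgment criterion at this point, so a nearest-class-mean rule is assumed here purely for illustration; the helper names (`mean`, `sq_dist`, `back_substitution_error`) and the sample values in `g1` and `g2` are hypothetical.

```python
def mean(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def back_substitution_error(g1, g2):
    """Estimate (N1 + N2) / (m + n) by re-judging the training samples.

    Assumed criterion (not from the paper): assign a sample to the
    population whose training mean is closer.
    """
    # The criterion is built from ALL training samples ...
    mu1, mu2 = mean(g1), mean(g2)
    # ... and the same samples are then substituted back into it.
    n1 = sum(1 for x in g1 if sq_dist(x, mu2) < sq_dist(x, mu1))  # G1 judged as G2
    n2 = sum(1 for y in g2 if sq_dist(y, mu1) < sq_dist(y, mu2))  # G2 judged as G1
    return (n1 + n2) / (len(g1) + len(g2))


# Hypothetical training samples: m = 4 from G1, n = 4 from G2.
g1 = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (1.1, 1.0)]
g2 = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2)]
p_hat = back_substitution_error(g1, g2)
print(p_hat)  # 0.125: one G1 sample lies closer to the G2 mean
```

Because the criterion is built from the same samples it is tested on, this estimate tends to be optimistic.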
Cross misjudgment probability
The cross misjudgment probability is obtained by eliminating one sample at a time, using the remaining $m+n-1$ training samples to establish a judgment criterion, and then using that criterion to judge the deleted sample. This analysis is carried out for every training sample, and the proportion of misjudged samples is taken as the misjudgment probability. The specific procedure is as follows:
1) From the training samples in population $G_1$, eliminate one of the samples, and use the rest of the
© 2013 ACADEMY PUBLISHER