
Download Full Issue in PDF - Academy Publisher


1522 JOURNAL OF COMPUTERS, VOL. 8, NO. 6, JUNE 2013

category. However, this gives only a rough analysis of leaf shapes, so the next step is a refined analysis.

C. Classification Model of Leaves 2

Factor analysis is then applied to the tree leaf shapes within one category to calculate factor scores, which are used for clustering. This gives a refined clustering analysis. Several dozen factors describe a leaf, such as leaf shape, leaf width, leaf length, and leaf veins, but these factors are not independent: the length of the veins determines, to a certain extent, the leaf length and leaf width, and some factors can be completely described by other factors. We therefore reduce the dimension first and cluster afterwards, using factor analysis to reduce the dimension of the influential factors and to obtain factor scores for clustering. This method not only distinguishes leaf shapes well but also reduces the complexity of the problem.

The mathematical model for factor analysis is as follows:

\[
\begin{cases}
X_1 = a_{11}F_1 + a_{12}F_2 + \cdots + a_{1m}F_m + \varepsilon_1 \\
X_2 = a_{21}F_1 + a_{22}F_2 + \cdots + a_{2m}F_m + \varepsilon_2 \\
\quad\vdots \\
X_P = a_{P1}F_1 + a_{P2}F_2 + \cdots + a_{Pm}F_m + \varepsilon_P
\end{cases}
\tag{1}
\]

Represented in matrix form:

\[
\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_P \end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1m} \\
a_{21} & a_{22} & \cdots & a_{2m} \\
\vdots & \vdots & & \vdots \\
a_{P1} & a_{P2} & \cdots & a_{Pm}
\end{bmatrix}
\begin{bmatrix} F_1 \\ F_2 \\ \vdots \\ F_m \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_P \end{bmatrix}.
\]

This is written simply as:

\[ X = AF + \varepsilon. \tag{2} \]

The model satisfies:

1) $m \le P$;

2) $\operatorname{cov}(F, \varepsilon) = 0$;

3) $D(F) = \begin{bmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{bmatrix} = I_m$; that is, $F_1, \dots, F_m$ are uncorrelated and each has variance 1;

4) $D(\varepsilon) = \begin{bmatrix} \sigma_1^2 & & 0 \\ & \ddots & \\ 0 & & \sigma_P^2 \end{bmatrix}$; that is, $\varepsilon_1, \dots, \varepsilon_P$ are uncorrelated and have different variances.
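
As a short worked step, conditions 2) to 4) can be combined with $X = AF + \varepsilon$ to show how the model splits the covariance of the observed variables. The symbol $\Sigma$ for $D(X)$ is our own notation and does not appear in the original.

```latex
\begin{aligned}
\Sigma = D(X) &= D(AF + \varepsilon) \\
  &= A\,D(F)\,A' + A\,\operatorname{cov}(F,\varepsilon)
     + \operatorname{cov}(\varepsilon,F)\,A' + D(\varepsilon) \\
  &= A A' + D(\varepsilon), \\[4pt]
\operatorname{Var}(X_i) &= \sum_{j=1}^{m} a_{ij}^{2} + \sigma_i^{2},
  \qquad i = 1, 2, \dots, P .
\end{aligned}
```

The first term of $\operatorname{Var}(X_i)$ is the part of the variance explained by the common factors, and $\sigma_i^2$ is the part left to the specific error term.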

Here $X = (X_1, \dots, X_P)'$ is the $P$-dimensional random vector made up of the $P$ indexes obtained in actual observation, and $F = (F_1, \dots, F_m)'$ is an unobservable vector called the common factors of $X$. $A$ is the factor loading matrix. A maximum-variance (varimax) rotation is applied to $A$ so that its structure is simplified. In other words, the squared values of the elements in every column of the loading matrix are polarized toward 0 or 1; the more dispersed the contribution rates of the common factors, the better the result.
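
For reference, one standard way to express this goal is the raw varimax criterion, which sums, over the columns of the rotated loading matrix, the variance of the squared loadings. The symbols $V$ and $b_{ij}$ (the rotated loadings) are our own notation, and the source does not say which variant of the criterion was used, so this is a sketch rather than the authors' exact formula.

```latex
V = \sum_{j=1}^{m} \left[
      \frac{1}{P} \sum_{i=1}^{P} b_{ij}^{4}
      - \left( \frac{1}{P} \sum_{i=1}^{P} b_{ij}^{2} \right)^{2}
    \right]
```

Maximizing $V$ over orthogonal rotations spreads the squared loadings within each column as far apart as possible, which is the polarization described above.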

In factor analysis, the variables are represented as linear combinations of the common factors:

\[ X_i = a_{i1}F_1 + a_{i2}F_2 + \cdots + a_{im}F_m + \varepsilon_i, \quad i = 1, 2, \dots, P. \tag{3} \]

In practice, however, it is usually more convenient to describe the characteristics of the research object with the common factors than with the original variables. Therefore, the common factors are in turn represented as linear combinations of the variables, i.e., the factor score function:

\[ F_j = \beta_{j1}X_1 + \beta_{j2}X_2 + \cdots + \beta_{jP}X_P, \quad j = 1, 2, \dots, m. \tag{4} \]
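
The factor score function (4) is a plain weighted sum, which the following minimal Python sketch makes concrete. The function name `factor_scores`, the coefficient values in `beta`, and the sample `x` are all hypothetical and chosen only for illustration; the source does not report the estimated coefficients.

```python
def factor_scores(beta, x):
    """Return [F_1, ..., F_m] where F_j = sum_i beta[j][i] * x[i], as in Eq. (4)."""
    if any(len(row) != len(x) for row in beta):
        raise ValueError("each row of beta needs one coefficient per variable")
    return [sum(b * xi for b, xi in zip(row, x)) for row in beta]


# Hypothetical example: P = 3 standardized leaf measurements, m = 2 factors.
beta = [
    [0.5, 0.5, 0.0],    # illustrative coefficients for F_1
    [0.0, 0.25, 0.75],  # illustrative coefficients for F_2
]
x = [1.0, -1.0, 2.0]

scores = factor_scores(beta, x)
print(scores)  # [0.0, 1.25]
```

Each row of `beta` corresponds to one common factor, so a sample with $P$ measurements is reduced to $m$ scores.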

We calculate the $m$ factor scores for each leaf sample and use these $m$ scores as the variable values for clustering the leaves with the K-means method.
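
The clustering step can be sketched as follows, using a plain Lloyd-style K-means written with the Python standard library only. The source does not state the number of clusters, the distance measure, or the initialization, so the Euclidean distance, the hand-picked starting centroids, and the sample values in `score_points` are all assumptions made for this illustration.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd-style K-means on tuples of factor scores; returns (labels, centroids)."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical factor scores (m = 2) for six leaf samples.
score_points = [(0.1, 0.2), (0.0, 0.3), (0.2, 0.1),
                (2.0, 2.1), (2.2, 1.9), (1.9, 2.0)]
labels, centroids = kmeans(score_points, centroids=[score_points[0], score_points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because the clustering runs on $m$ factor scores rather than on all original measurements, each distance computation involves fewer coordinates.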

D. Clustering Error Estimation

We now give the evaluation method for judging the clustering effect. Usually the back substitution misjudgment probability and the cross misjudgment probability are used. If the number of samples belonging to $G_1$ that are misjudged as belonging to $G_2$ is $N_1$, the number of samples belonging to $G_2$ that are misjudged as belonging to $G_1$ is $N_2$, and the total number of samples in the two populations (general classifications) is $n$, then the misjudgment probability is:

\[ p = \frac{N_1 + N_2}{n} \tag{5} \]
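
Formula (5) amounts to counting the two kinds of errors and dividing by the sample total. A minimal Python sketch follows; the function name `misjudgment_probability` and the label lists are hypothetical, with the groups encoded as the integers 1 and 2.

```python
def misjudgment_probability(true_groups, assigned_groups):
    """Return (N1, N2, p) for groups labelled 1 and 2, with p = (N1 + N2) / n."""
    if len(true_groups) != len(assigned_groups):
        raise ValueError("both label lists must have the same length")
    n = len(true_groups)
    pairs = list(zip(true_groups, assigned_groups))
    n1 = sum(1 for t, a in pairs if t == 1 and a == 2)  # G1 judged as G2
    n2 = sum(1 for t, a in pairs if t == 2 and a == 1)  # G2 judged as G1
    return n1, n2, (n1 + n2) / n


# Hypothetical labels for n = 10 samples.
true_groups = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
assigned_groups = [1, 1, 2, 1, 2, 2, 1, 1, 2, 2]
result = misjudgment_probability(true_groups, assigned_groups)
print(result)  # (1, 2, 0.3)
```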

Back substitution misjudgment probability

Let $G_1$ and $G_2$ be two populations, and let $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$ be training samples from $G_1$ and $G_2$ respectively. All the training samples are treated as $m + n$ new samples and substituted one by one into the established criterion to judge which population each belongs to. This process is called back substitution. If the number of samples belonging to $G_1$ that are misjudged as belonging to $G_2$ is $N_1$, and the number of samples belonging to $G_2$ that are misjudged as belonging to $G_1$ is $N_2$, then the misjudgment probability is:

\[ \hat{p} = \frac{N_1 + N_2}{m + n} \]
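
Back substitution can be sketched in a few lines of Python. This section of the source does not specify the judging criterion, so the sketch swaps in a nearest-mean rule purely as a stand-in; the function names and the sample coordinates are likewise hypothetical.

```python
import math


def mean_point(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return [sum(col) / len(points) for col in zip(*points)]


def back_substitution_rate(g1, g2):
    """Fit a nearest-mean rule on all samples, then re-judge the same samples."""
    c1, c2 = mean_point(g1), mean_point(g2)
    # N1: samples from G1 judged as G2; N2: samples from G2 judged as G1.
    # A sample exactly halfway between the means stays in its own group.
    n1 = sum(1 for x in g1 if math.dist(x, c2) < math.dist(x, c1))
    n2 = sum(1 for y in g2 if math.dist(y, c1) < math.dist(y, c2))
    return (n1 + n2) / (len(g1) + len(g2))


# Hypothetical training samples: m = 4 from G1, n = 4 from G2.
g1 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (4.0, 4.0)]
g2 = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (6.0, 6.0)]
rate = back_substitution_rate(g1, g2)
print(rate)  # 0.125
```

Because the same samples are used both to build and to test the criterion, this estimate tends to be optimistic.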

Cross misjudgment probability

The cross misjudgment probability is obtained by removing one sample at a time, using the remaining $m + n - 1$ training samples to establish a criterion, and then using that criterion to judge the removed sample. This analysis is carried out for every training sample, and the proportion of misjudged samples is used as the misjudgment probability. The specific procedure is as follows:

1) From the training samples in population $G_1$, remove one of the samples, and use the rest of the

© 2013 ACADEMY PUBLISHER
