12.07.2015 Views

Predicting Cardiovascular Risks using Pattern Recognition and Data ...



$$d(C, Q) = \sum_{i=1}^{c} d(X_i, Q) \tag{4.16}$$

is minimized.

According to the theory in Huang (1998), Equation (4.16) is minimized if the frequency of value $q_k$ in data set $C$, for the $k$-th attribute, is greater than or equal to the frequency of every different value $x_{ik}$ such that $x_{ik} \neq q_k$. Therefore, we can choose the mode vectors of the $m - p$ attributes as the highest-frequency values in these attributes. Their forms are as follows:

$$\{q_{jk}\} = \mathrm{mode}_k = \{\text{"max freq" } \mathrm{Val}^{C}_{k}\}, \quad k = p+1, \ldots, m. \tag{4.17}$$

Accuracy Measure

The accuracy (Acc) for measuring the quality of a clustering algorithm is given by:

$$\mathrm{Acc} = \frac{\sum_{i=1}^{K} a_i}{n} \tag{4.18}$$

where $n$ is the number of samples in the dataset, $a_i$ is the number of data samples occurring in both cluster $i$ and its corresponding class, and $K$ is the number of clusters.

Consequently, the clustering error (err) is defined as:

$$\mathrm{err} = 1 - \mathrm{Acc}. \tag{4.19}$$

4.3.2.2 KMIX Algorithm

Step 1: Initialise $K$ clusters according to $K$ partitions of the data set.

Step 2: Update the $K$ centre vectors in the new data set (the first time, the centre vectors are calculated):

$$Q_j = (q^N_{j1}, q^N_{j2}, \ldots, q^N_{jp}, q^C_{j,p+1}, \ldots, q^C_{jm}), \quad j = 1, 2, \ldots, k$$

where $\{q^N_{ji}\}_{i=1,2,\ldots,p} = \{\mathrm{mean}^N_{ji}\}$ (the mean of the $i$-th attribute in cluster $j$), and $\{q^C_{ji}\}_{i=p+1,\ldots,m} = \{\mathrm{mode}^C_{ji}\}$ (the most frequent value of the $i$-th attribute in cluster $j$).

Step 3: Update clusters as the following tasks:

Calculate the distance between $X_i$ in the $i$-th cluster to the $K$ centre vectors:

$$d(X_i, Q_j) = d_N(X_i, Q_j) + d_C(X_i, Q_j), \quad j = 1, 2, \ldots, k$$
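The centre update of Step 2, the combined distance of Step 3, and the accuracy measure of Equation (4.18) can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names (`update_centre`, `mixed_distance`, `clustering_accuracy`) are hypothetical, and three points are assumptions because this page does not define them: the numerical distance $d_N$ is taken to be squared Euclidean, the categorical distance $d_C$ is taken to be a simple mismatch count, and the "corresponding class" of a cluster is taken to be its majority class.

```python
from collections import Counter


def update_centre(cluster, p):
    """Centre vector Q_j of one non-empty cluster (Step 2).

    Each row holds p numerical attributes followed by m - p categorical ones.
    """
    m = len(cluster[0])
    size = len(cluster)
    # Numerical part: mean of each of the first p attributes.
    means = [sum(row[i] for row in cluster) / size for i in range(p)]
    # Categorical part: most frequent value (mode) of each remaining attribute.
    modes = [Counter(row[i] for row in cluster).most_common(1)[0][0]
             for i in range(p, m)]
    return means + modes


def mixed_distance(x, q, p):
    """d(X, Q) = d_N(X, Q) + d_C(X, Q) (Step 3).

    ASSUMPTION: d_N is squared Euclidean distance and d_C counts
    mismatching categorical values; the source page defines neither.
    """
    d_n = sum((x[i] - q[i]) ** 2 for i in range(p))
    d_c = sum(1 for i in range(p, len(x)) if x[i] != q[i])
    return d_n + d_c


def clustering_accuracy(cluster_labels, class_labels):
    """Acc = (sum of a_i over clusters) / n, as in Equation (4.18).

    ASSUMPTION: the class corresponding to a cluster is its majority class,
    so a_i is the size of the largest class inside cluster i.
    """
    n = len(class_labels)
    per_cluster = {}
    for cluster_id, class_id in zip(cluster_labels, class_labels):
        per_cluster.setdefault(cluster_id, Counter())[class_id] += 1
    matched = sum(counts.most_common(1)[0][1] for counts in per_cluster.values())
    return matched / n


# Toy data: two numerical attributes followed by two categorical ones.
rows = [[1.0, 3.0, "a", "x"], [3.0, 5.0, "a", "y"], [2.0, 4.0, "b", "y"]]
centre = update_centre(rows, p=2)
distance = mixed_distance(rows[0], centre, p=2)
acc = clustering_accuracy([0, 0, 0, 1, 1], ["A", "A", "B", "B", "B"])
err = 1 - acc  # Equation (4.19)
```

When two categorical values are equally frequent, `Counter.most_common` returns the one seen first, so a tie-breaking rule would have to be chosen explicitly if the order of the data should not matter.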
