12.07.2015 Views

Predicting Cardiovascular Risks using Pattern Recognition and Data ...



Discrete (categorical) attributes: The similarity between two patterns depends on the number of categorical attributes in which they share the same value (Kaufman & Rousseeuw, 1990). The dissimilarity is therefore the number of categorical attributes in which the two patterns differ. It is given by:

$$\mathrm{dissim}(x_i, x_j) = d(x_i, x_j) = \sum_{k=1}^{m_2} \delta(x_{ik}, x_{jk}), \quad i, j = 1, 2, \ldots, n \qquad (4.15)$$

where

$$\delta(x_{ik}, x_{jk}) = \begin{cases} 0 & \text{if } x_{ik} = x_{jk} \\ 1 & \text{if } x_{ik} \neq x_{jk} \end{cases}, \quad k = 1, 2, \ldots, m_2; \; i, j = 1, 2, \ldots, n,$$

and $m_2$ is the number of categorical attributes.

Boolean attributes: The dissimilarity measures are calculated as for categorical or continuous attributes, according to the interpretation of the attribute.

Centre Vectors

Assume that the data attribute set includes continuous and discrete attributes. Note that Boolean attributes are treated as continuous or discrete as indicated above. Therefore, there are two types of centre vectors. Assume that the first $p$ of the $m$ attributes are continuous and the remaining $m-p$ are discrete. This means each pattern X in the input space can be seen as:

$$X = (x_{i1}, x_{i2}, \ldots, x_{ip}, x_{i,p+1}, x_{i,p+2}, \ldots, x_{im})$$

If Q is a centre vector for the sub data set C, Q can be represented as:

$$Q = (q_{j1}, q_{j2}, \ldots, q_{jp}, q_{j,p+1}, q_{j,p+2}, \ldots, q_{jm})$$

The task now is to find $p$ continuous attribute values and $m-p$ discrete attribute values for the centre vector Q. According to Han (1981), these centre attribute values can be calculated as follows:

Continuous attribute: The centre values $\{q_{jk}\}_{k=1,\ldots,p} = \{\mathrm{mean}_k\}$, where $\mathrm{mean}_k$ is the average of the $k$th attribute.

Discrete attribute: The centre values $\{q_{jk}\}_{k=p+1,\ldots,m} = \{\mathrm{mode}_k\}$, where $\mathrm{mode}_k$ is the “mode” of the $k$th attribute.

Definition 1: A vector Q is a “mode vector” of a data set $C = (X_1, X_2, \ldots, X_c)$, c
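
The two rules described in this section, mismatch counting for categorical attributes (Eq. 4.15) and mean/mode centre values, can be illustrated with a minimal Python sketch. The function names `categorical_dissimilarity` and `centre_vector`, the toy records, and the tie-breaking rule for the mode (the first value encountered wins) are assumptions made for illustration; they are not taken from the original text.

```python
from collections import Counter
from statistics import mean


def categorical_dissimilarity(x, y):
    """Count the categorical attributes on which two patterns differ (cf. Eq. 4.15)."""
    if len(x) != len(y):
        raise ValueError("patterns must have the same number of attributes")
    # Each attribute contributes 0 if the values match and 1 otherwise.
    return sum(1 for a, b in zip(x, y) if a != b)


def centre_vector(patterns, p):
    """Centre of a sub data set whose first p attributes are continuous.

    Continuous attributes use the mean; the remaining discrete attributes
    use the mode. Ties for the mode go to the value seen first (assumption).
    """
    if not patterns:
        raise ValueError("the sub data set must contain at least one pattern")
    columns = list(zip(*patterns))  # one tuple of values per attribute
    continuous = [mean(col) for col in columns[:p]]
    discrete = [Counter(col).most_common(1)[0][0] for col in columns[p:]]
    return continuous + discrete


# Hypothetical records: (age, systolic pressure, sex, smoker)
records = [
    (60.0, 120.0, "M", "yes"),
    (70.0, 140.0, "F", "yes"),
    (50.0, 130.0, "M", "no"),
]

# Dissimilarity over the categorical part of the first two records.
d = categorical_dissimilarity(records[0][2:], records[1][2:])  # 1

# Centre vector with p = 2 continuous attributes.
q = centre_vector(records, p=2)  # [60.0, 130.0, "M", "yes"]
```

Keeping the continuous and discrete attributes in a fixed order (continuous first) mirrors the layout of X and Q in the text and lets a single index `p` separate the two kinds of centre value.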
