12.07.2015 Views

Predicting Cardiovascular Risks using Pattern Recognition and Data ...


The results in Figure 8.3 show that some attributes are ranked highly by the Relief algorithm but have low ranks according to the mutual information calculations. For example, the Relief algorithm ranked the attribute “R1 PAT” first, whereas the attribute ranked first by mutual information is “CARDIAC_FAIL”.

Overall, in Figures 8.2 and 8.3, the measurements between each attribute in the data set (CM2, CM3aD) and the outcomes are very similar using either the Relief or the mutual information algorithm. This means the ranking obtained from the mutual information calculations is nearly the same as that of the popular Relief algorithm, apart from exceptions such as the one noted above. The advantage of using mutual information over Relief is that it shows the weight of each attribute directly with respect to the outcome classes, whereas Relief weight values are based on distinguishing samples that are near each other in the same class. Hence, mutual information seems simpler to use than the Relief algorithm. Moreover, the mutual information algorithm represents an interesting combination of pattern recognition concepts (a pattern is represented in the attribute dimensional space), Bayes' theory, and mutual information.

8.5. Mutual Information and Clustering

This section demonstrates the use of mutual information in the KMIX clustering algorithm. The hope is that the KMIX results can be improved by using the attribute weights (mutual information values) inside the clustering process.

8.5.1. The Weighted KMIX Algorithm (WKMIX)

The idea behind the Weighted KMIX Algorithm (WKMIX) is derived from the contributions of Huang (1997), where weights are applied to the categorical attributes. According to Huang (1997), the choice of the weight depends on how many numeric attributes there are in the data domain. The weight is normally chosen as the overall average standard deviation of the numeric attributes. Therefore, the weights do not clearly reflect the relationship between the data attributes and the clusters. Moreover, in medical domains the data attributes always have some influence on the outcome risks for patients. Therefore, the data attributes will have an influence on the clusters in the clustering process. The use of mutual information highlights the differing significance levels of the data attributes' contributions to the outcomes, as shown in the previous section. It is suggested that combining the KMIX algorithm with weights derived from mutual information might improve the clustering process. This is like supervised clustering, where the attributes' contributions to the outcomes are
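
The weighting idea described in Sections 8.4 and 8.5.1 can be illustrated with a minimal sketch. It is not the thesis's WKMIX implementation: the function names, the squared-difference and 0/1 mismatch distance terms, and the assumption that every attribute has already been discretised before mutual information is computed are all illustrative choices. The first function estimates the mutual information I(X;Y) = sum over (x, y) of p(x,y) log2( p(x,y) / (p(x) p(y)) ) between one attribute and the outcome class. The second uses such values as per-attribute weights in a mixed numeric and categorical distance of the kind a weighted k-means-style algorithm could use.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate mutual information (in bits) between two discrete sequences.

    xs: values of one attribute (assumed already discretised).
    ys: outcome class labels, same length as xs.
    """
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), expressed with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def weighted_mixed_distance(a, b, weights, numeric):
    """Hypothetical weighted distance between two records (dicts).

    weights: attribute name -> weight (e.g. a mutual information value).
    numeric: set of attribute names treated as numeric; all others are
             treated as categorical and compared by simple mismatch.
    """
    total = 0.0
    for name, w in weights.items():
        if name in numeric:
            total += w * (a[name] - b[name]) ** 2
        else:
            total += w * (0.0 if a[name] == b[name] else 1.0)
    return total
```

In use, one would compute `mutual_information(column, outcomes)` for each attribute column, collect the results into the `weights` dictionary, and pass that dictionary to the distance function during the cluster assignment step. An attribute that carries no information about the outcome gets a weight near zero and therefore has little effect on cluster membership.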
