12.07.2015 Views

Predicting Cardiovascular Risks using Pattern Recognition and Data ...

Step 1 (Selection): The data set is selected from the Dundee site, with 18 attributes and 341 cases.

Step 2 (Clean/Transform/Filter):

Cleaning task: Missing values are filled using the same method as in the previous section.

Transformation task: This task requires only numerical attributes. Hence, the continuous data are rescaled into the range [0, 1] using the linear equation indicated above. The other (categorical and Boolean) attributes are ignored in this step.

Filtering task: The following heuristic decision rules are applied based on the two attributes “PATIENT STATUS” and “COMBINE”. The model CM3aD has two levels of risk (“High risk”, “Low risk”), given by:

(PATIENT STATUS, COMBINE) = 0 → “Low risk”
(PATIENT STATUS, COMBINE) ≥ 1 → “High risk”

The model CM3bD has three levels of risk (“High risk”, “Medium risk”, “Low risk”), given by:

(PATIENT STATUS, COMBINE) = 0 → “Low risk”
(PATIENT STATUS, COMBINE) = 1 → “Medium risk”
(PATIENT STATUS, COMBINE) = 2 → “High risk”

Step 3 (Data Mining Techniques): The clustering algorithm KMIX is applied to both models in this step, without the expected outputs indicated above. The number of required clusters is 2 for the model CM3aD and 3 for the model CM3bD.

Step 4 (Compare/Evaluation): The clustering results are compared to the expected outcomes defined in step 2 using standard measures such as the confusion matrix, sensitivity and specificity rates, and so on.

Step 5 (Building Clustering Models): The new clustering models CM3aDC and CM3bDC are built based on the KMIX results. The input set is the same as in the models CM3aD and CM3bD, but the outcomes of the new models are derived from the KMIX results. Both new models, CM3aDC and CM3bDC, are then applied again from step 3 in the thesis framework. A new process is created in which supervised neural network techniques are used. The results are then measured and evaluated with standard measures.
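The transformation and filtering tasks of step 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions. The "linear equation" is not reproduced in this section, so ordinary min-max scaling is assumed. The way PATIENT STATUS and COMBINE are merged into a single value is not stated here either, so the functions take a hypothetical integer `score`. The two-level rule is assumed to treat any nonzero score as "High risk". All function names are made up for this example.

```python
def rescale_min_max(values):
    """Rescale a list of numbers into [0, 1] (assumed min-max formula)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def risk_two_level(score):
    """CM3aD-style rule: 0 is low risk, anything else is assumed high risk."""
    return "Low risk" if score == 0 else "High risk"


def risk_three_level(score):
    """CM3bD-style rule: 0, 1, 2 map to low, medium, high risk."""
    labels = {0: "Low risk", 1: "Medium risk", 2: "High risk"}
    return labels[score]  # raises KeyError for scores outside 0..2


# Hypothetical values, not taken from the Dundee data set.
ages = [10, 20, 30]
scaled_ages = rescale_min_max(ages)
two_level = [risk_two_level(s) for s in (0, 1, 2)]
three_level = [risk_three_level(s) for s in (0, 1, 2)]
```

Only the numerical columns would pass through `rescale_min_max`; categorical and Boolean attributes are left untouched, as the transformation task describes.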
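The comparison in step 4 can be illustrated as follows. KMIX itself is not implemented here; the sketch starts from cluster ids that some clustering run is assumed to have produced. Because cluster ids carry no risk label, each cluster is given the most frequent expected label among its members. That majority-vote mapping is an assumption of this example, not necessarily the procedure used in the thesis. The function names and the sample data are hypothetical.

```python
from collections import Counter


def map_clusters_to_labels(cluster_ids, expected):
    """Give each cluster the most common expected label of its members."""
    votes = {}
    for cluster, label in zip(cluster_ids, expected):
        votes.setdefault(cluster, Counter())[label] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}


def binary_confusion(predicted, expected, positive="High risk"):
    """Return (tp, fp, tn, fn) with `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for p, y in zip(predicted, expected):
        if y == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """True positive rate: tp / (tp + fn)."""
    return tp / (tp + fn) if (tp + fn) else float("nan")


def specificity(tn, fp):
    """True negative rate: tn / (tn + fp)."""
    return tn / (tn + fp) if (tn + fp) else float("nan")


# Made-up example: six cases, two clusters.
cluster_ids = [0, 0, 0, 1, 1, 1]
expected = ["High risk", "High risk", "Low risk",
            "Low risk", "Low risk", "High risk"]

mapping = map_clusters_to_labels(cluster_ids, expected)
predicted = [mapping[c] for c in cluster_ids]
tp, fp, tn, fn = binary_confusion(predicted, expected)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
```

The `predicted` labels in this sketch play the role of the cluster-derived outcomes that step 5 attaches to the unchanged input set when building CM3aDC and CM3bDC.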
