12.07.2015 Views

Predicting Cardiovascular Risks using Pattern Recognition and Data ...



Scoring values: The original data included the physiological score and operative severity score values taken from the POSSUM and PPOSSUM systems. Furthermore, the data from this site includes enough information for use separately in the POSSUM and PPOSSUM calculations.

The valid and missing value frequencies of some significant attributes can be seen in Table 5.1. These attributes are labelled "PATIENT_STATUS", "Heart Disease", "Diabetes", and "Stroke". They are highlighted in the research of collaborative clinicians (Kuhan et al., 2001) as some of the main factors expected to contribute to the outcomes for patient risks.

                   Diabetes   Heart Disease   Stroke   PATIENT_STATUS
Number   Valid        497          497          497          498
         Missing        1            1            1            0

Table 5.1: The frequencies of significant attributes in the Hull site data.

It is clear from Table 5.1 that there is one case (1 out of 498) with missing values in all the significant attributes except "PATIENT_STATUS". However, the "PATIENT_STATUS" attribute is the most significant, and it will be the main factor for outcome calculations in later chapters. Therefore, this case will not be eliminated. Its missing values will be filled by the use of a data mining method (see the "Data Selection Strategy" section for details).

Dundee Site Data

This data includes 57 attributes and 341 cases from cardiovascular patients at the Dundee site. The detailed structure of the original data can be seen in Appendix A. The data from this site has similar characteristics to the Hull site data, such as redundant attributes, missing values, and noisy values. The data treatment methods, such as the elimination of redundant attributes and the filling of missing data, are based on the strategy indicated in the "Data Selection Strategy" section below. The main characteristics are as follows:

Redundant attributes: For example, the attribute "ADMISSION_DATE" shows the patient's operation date, and the two attributes "Surgeon name1" and "Surgeon name2" represent the names of the operating doctors. Their values might be helpful in a general evaluation, but offer little relevance to the specific purposes of this thesis.
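
The two checks described above, counting valid and missing values per attribute and setting aside redundant attributes, can be sketched in a few lines of Python. This is a minimal illustration and not the procedure used in the thesis. The three toy records and their values are invented. The helper names (is_missing, frequency_table, drop_redundant) are hypothetical. Treating None or an empty string as "missing" is an assumption. Only the attribute names come from the text.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]

# Hypothetical toy records: the values are invented for illustration only.
records: List[Record] = [
    {"PATIENT_STATUS": "status_a", "Diabetes": "N", "Heart Disease": "Y",
     "Stroke": "N", "ADMISSION_DATE": "d1",
     "Surgeon name1": "s1", "Surgeon name2": "s2"},
    {"PATIENT_STATUS": "status_b", "Diabetes": None, "Heart Disease": None,
     "Stroke": "", "ADMISSION_DATE": "d2",
     "Surgeon name1": "s1", "Surgeon name2": "s3"},
    {"PATIENT_STATUS": "status_a", "Diabetes": "Y", "Heart Disease": "N",
     "Stroke": "N", "ADMISSION_DATE": "d3",
     "Surgeon name1": "s2", "Surgeon name2": "s3"},
]

# Attribute names taken from the text.
SIGNIFICANT = ["Diabetes", "Heart Disease", "Stroke", "PATIENT_STATUS"]
REDUNDANT = {"ADMISSION_DATE", "Surgeon name1", "Surgeon name2"}


def is_missing(value: Any) -> bool:
    """Assumption: None or a blank string counts as a missing value."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def frequency_table(rows: List[Record],
                    attributes: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Count valid and missing values for each attribute."""
    table: Dict[str, Dict[str, int]] = {}
    for name in attributes:
        missing = sum(1 for row in rows if is_missing(row.get(name)))
        table[name] = {"valid": len(rows) - missing, "missing": missing}
    return table


def drop_redundant(rows: List[Record], redundant: Iterable[str]) -> List[Record]:
    """Return copies of the rows without the redundant attributes."""
    redundant = set(redundant)
    return [{k: v for k, v in row.items() if k not in redundant}
            for row in rows]


freq = frequency_table(records, SIGNIFICANT)
cleaned = drop_redundant(records, REDUNDANT)

for name, counts in freq.items():
    print(f"{name}: valid={counts['valid']}, missing={counts['missing']}")
```

The case with missing values stays in the cleaned list, which mirrors the decision above to keep it rather than eliminate it. Filling the missing values is left out here because the text defers that method to the "Data Selection Strategy" section.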
