12.07.2015 Views

Predicting Cardiovascular Risks using Pattern Recognition and Data ...

attributes are best removed. By the use of this method, the attributes in the Hull site are reduced from 98 (original) attributes to 86 "meaningful" attributes. Similarly, the original 57 attributes in the Dundee site are reduced to 36 "meaningful" attributes.

LOWEST_BP
N    Valid      4
     Missing    494

Table 5.3: The frequencies of the LOWEST_BP attribute in the Hull site.

Data Cleaning Stage

The most significant work in this stage is dealing with missing data values. There are many methods for dealing with missing data values, such as linear regression, decision tree computation, standard deviation, the mean-mode method, and so on. Only the mean-mode method is described here; details of the other methods can be found in Pyle (1999) and Han and Kamber (2001). The mean-mode method, for each type of attribute, is as follows:

Numerical attributes: Fill missing values with the mean of the "non-missing" values (Pyle, 1999).

Categorical attributes: Fill missing values with the mode (Han and Kamber, 2001). This is the value that occurs most frequently among the "non-missing" categorical values for the attribute.

Boolean attributes: The missing values here are treated as for categorical attributes. This means missing values are filled with the mode of the attribute.

Attribute Name   Description               Type          Missing Values (Hull Site)   Missing Values (Dundee Site)
Heart disease    Any heart disease         Boolean       1/498 (0.20%)                7/341 (2.05%)
ECG              Electrocardiogram         Categorical   0                            16/341 (4.69%)
Blood loss       Blood loss in operation   Continuous    8/498 (1.61%)                243/341 (71.26%)

Table 5.4: The missing value rates of some attributes for both the Hull and Dundee sites.

For example, Table 5.4 shows the rates of missing values in both the Hull and Dundee sites.
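The mean-mode method described above can be illustrated with a minimal sketch. This is not the implementation used in the study: the function names `missing_rate` and `mean_mode_impute`, the use of `None` to mark a missing value, and the small sample lists are all assumptions made for illustration only.

```python
from statistics import mean, mode


def missing_rate(values):
    """Return the percentage of entries that are missing (marked as None)."""
    return 100.0 * sum(v is None for v in values) / len(values)


def mean_mode_impute(values, kind):
    """Fill None entries using the mean-mode rule.

    kind == "numerical": fill with the mean of the non-missing values.
    kind == "categorical" or "boolean": fill with the mode of the
    non-missing values. On Python 3.8+, statistics.mode returns the
    first value encountered when several values tie for most frequent.
    """
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("cannot impute an attribute with no observed values")
    if kind == "numerical":
        fill = mean(present)
    elif kind in ("categorical", "boolean"):
        fill = mode(present)
    else:
        raise ValueError(f"unknown attribute type: {kind!r}")
    return [fill if v is None else v for v in values]


# Hypothetical sample data, not taken from the Hull or Dundee sites.
blood_loss = [300, None, 500, 400, None]        # numerical (continuous)
ecg = ["normal", "abnormal", None, "normal"]    # categorical
heart_disease = [True, False, False, None]      # Boolean

blood_loss_filled = mean_mode_impute(blood_loss, "numerical")
ecg_filled = mean_mode_impute(ecg, "categorical")
heart_disease_filled = mean_mode_impute(heart_disease, "boolean")
```

With a missing rate as high as that of blood loss in the Dundee site (243 of 341 values), most of the filled column would hold the same mean value, so checking the rate with something like `missing_rate` before imputing is a reasonable precaution.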
