12.07.2015

Predicting Cardiovascular Risks using Pattern Recognition and Data ...

Missing values: After deletion of the redundant attributes described above, 1912 of the 12311 data cells (16%) contain missing values.

Noisy and inconsistent data: As an example of numerical outliers, the attribute "PACK YRS" shows a large gap between its maximum value (160) and its minimum value (2). Such extreme values affect the transformation process because they unduly shift the mean of the attribute.

Scoring values: The site does not include the scored values (the physiological score and the operative severity score) from the POSSUM and PPOSSUM systems. Furthermore, the data at this site are insufficient for use with the POSSUM and PPOSSUM scoring systems, as they lack information on these systems' variables.

To complete an analysis similar to that for the Hull site, the valid and missing value frequencies for some significant attributes are given in Table 5.2 below.

                  30 D stroke/death   Heart Disease   Diabetes   Stroke
  N   Valid       341                 334             341        340
      Missing     0                   7               0          1

Table 5.2: The frequencies of the significant attributes for the Dundee site.

Table 5.2 shows that the attribute "Heart Disease" has 7 missing values, whereas the "Stroke" attribute has only one. Missing values are handled in the same way as described for the Hull site.

5.3.2. Thesis Experimental Steps

The detailed steps of the thesis methodology are summarized in Figure 5.4. The process flow and the individual steps in Figure 5.4 are described in detail as follows:

Step 1 (Selection): A data set is selected from one or both of the Hull and Dundee sites and stored in the "Data Warehouse". Note that the data from both sites were collected and stored in various (Excel) computer files in earlier studies. These data are regarded as "raw data" in the knowledge discovery from data process. Therefore, further process steps, such as pre-processing and transformation, are needed.
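The Step 1 pooling of site data and the valid/missing counts reported in Table 5.2 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the tooling used in the thesis: the record layout (one dict per patient, with None or an absent key marking a missing value), the added "Site" tag, the toy records, and the function names `select_data`, `frequency_table` and `missing_cell_rate` are all hypothetical.

```python
from typing import Any, Dict, List, Optional

# Hypothetical record layout: one dict per patient, attribute name -> value,
# where None (or an absent key) marks a missing value.
Record = Dict[str, Optional[Any]]


def select_data(sites: Dict[str, List[Record]], chosen: List[str]) -> List[Record]:
    """Step 1 (Selection): pool the records of the chosen sites into one table,
    tagging each record with the site it came from (hypothetical 'Site' key)."""
    warehouse: List[Record] = []
    for site in chosen:
        for rec in sites[site]:
            warehouse.append({**rec, "Site": site})
    return warehouse


def frequency_table(records: List[Record], attributes: List[str]) -> Dict[str, Dict[str, int]]:
    """Count valid and missing values per attribute, in the layout of Table 5.2."""
    table: Dict[str, Dict[str, int]] = {}
    for attr in attributes:
        missing = sum(1 for rec in records if rec.get(attr) is None)
        table[attr] = {"valid": len(records) - missing, "missing": missing}
    return table


def missing_cell_rate(records: List[Record], attributes: List[str]) -> float:
    """Fraction of all (record, attribute) cells that are missing."""
    total = len(records) * len(attributes)
    missing = sum(1 for rec in records for attr in attributes if rec.get(attr) is None)
    return missing / total if total else 0.0


if __name__ == "__main__":
    # Toy records for illustration only; these are not thesis data.
    sites = {
        "Hull": [{"Heart Disease": "Y", "Stroke": "N"}],
        "Dundee": [
            {"Heart Disease": None, "Stroke": "N"},
            {"Heart Disease": "N", "Stroke": None},
            {"Heart Disease": None, "Stroke": "Y"},
        ],
    }
    data = select_data(sites, ["Hull", "Dundee"])
    attrs = ["Heart Disease", "Stroke"]
    print(frequency_table(data, attrs))
    print(missing_cell_rate(data, attrs))  # 3 of 8 cells -> 0.375
    # The Dundee figure quoted in the text: 1912 missing of 12311 cells.
    print(round(100 * 1912 / 12311))  # -> 16 (percent)
```

With the toy records, "Heart Disease" has 2 valid and 2 missing values and "Stroke" has 3 valid and 1 missing, while the Dundee ratio of 1912 missing cells out of 12311 rounds to the 16% stated in the text.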
