13.07.2015

data consistency, completeness and cleaning - The INCLEN Trust

Another Approach: Using Frequencies


Checking for Invalid Character Values (1)
• Run frequencies on all character variables that represent a limited number of categories, such as gender, residence, hospital department, occupation, etc.

GENDER          Frequency
2               1
F               300
M               440
X               1
f               3
Missing values  5
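A frequency count like the GENDER table above takes only a few lines of code. The sketch below uses Python (an assumption; the slides do not name a tool) with hypothetical sample data that reproduces the counts on the slide. The helper name `frequency_table` and the convention that `None` or a blank string means "missing" are illustrative choices, not part of the original material.

```python
from collections import Counter
from typing import Iterable, Optional

def frequency_table(values: Iterable[Optional[str]]) -> dict:
    """Count each category; report missing values (None or blank) separately."""
    counts: Counter = Counter()
    missing = 0
    for v in values:
        if v is None or v.strip() == "":
            missing += 1
        else:
            counts[v] += 1
    # Sort categories so the output order is stable and easy to scan
    return {"counts": dict(sorted(counts.items())), "missing": missing}

# Hypothetical data reproducing the GENDER frequencies on the slide
gender = ["F"] * 300 + ["M"] * 440 + ["2", "X"] + ["f"] * 3 + [None] * 5

table = frequency_table(gender)
for category, n in table["counts"].items():
    print(f"{category:<15} {n}")
print(f"Missing values  {table['missing']}")
```

Because the counts are case-sensitive, the lower-case f shows up as its own category, which is exactly what makes it visible as a possible error.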


Checking for Invalid Character Values (2)
• Three categories in the GENDER frequency table do not fit the expected coding (F or M): 2, X and f.


Checking for Invalid Character Values (3)
• The 2 and the X are inappropriate values; each occurs only once.
• The lower-case f may or may not be considered an error, depending on the situation.


Correcting Invalid Character Values
• If the lower-case values were entered into the file by mistake but the value, aside from the case, was correct, we consider the value correct and change each lower-case value to upper case.
• For the 2 and X values, we need to identify the location of these errors and correct them after checking the medical records.
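The two correction rules can be sketched as follows: values that are valid apart from their case are upper-cased, and everything else is left unchanged and listed for checking against the medical records. This is a minimal illustration; the record layout (case ID, gender pairs), the allowed set {"F", "M"}, the case IDs and the function name `clean_gender` are all hypothetical.

```python
from typing import List, Optional, Tuple

VALID_GENDER = {"F", "M"}  # assumed coding scheme

Record = Tuple[int, Optional[str]]

def clean_gender(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Upper-case values that are valid apart from case; collect the rest for review."""
    cleaned: List[Record] = []
    to_review: List[Record] = []  # (case_id, original value) to check in medical records
    for case_id, value in records:
        if value is None:
            cleaned.append((case_id, None))  # missing values are handled separately
            continue
        fixed = value.upper()
        if fixed in VALID_GENDER:
            cleaned.append((case_id, fixed))
        else:
            cleaned.append((case_id, value))  # leave unchanged until verified
            to_review.append((case_id, value))
    return cleaned, to_review

# Hypothetical records
records = [(1001, "F"), (1002, "f"), (1003, "2"), (1004, "M"), (1005, "X"), (1006, None)]
cleaned, to_review = clean_gender(records)
print(cleaned)
print(to_review)
```

Invalid codes are deliberately not overwritten: the sketch only produces the list of cases whose records need to be pulled.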


Checking Missing Data
• Check each of the cases with missing data (here, on gender).
• See whether there is other information in the case that allows the missing value to be entered (e.g., the patient's name will generally indicate gender).
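A minimal sketch of the missing-data check, assuming each case is a dictionary with the hypothetical keys `case_id`, `gender` and `name`. It lists every case with a missing gender together with the patient's name so that a person can review it and fill in the value; inferring gender from names automatically is not attempted. The names and case IDs are invented for illustration.

```python
from typing import Dict, List, Optional

Case = Dict[str, Optional[str]]

def cases_missing(records: List[Case], field: str) -> List[Case]:
    """Return the records whose `field` is missing (None or blank)."""
    return [r for r in records if r.get(field) is None or str(r.get(field)).strip() == ""]

# Hypothetical cases
records: List[Case] = [
    {"case_id": "6001", "gender": "F", "name": "Mona Hassan"},
    {"case_id": "6002", "gender": None, "name": "Omar Said"},
    {"case_id": "6003", "gender": "", "name": "Sara Adel"},
]

for r in cases_missing(records, "gender"):
    # Review the name (and other fields) by hand before entering a value
    print(r["case_id"], r["name"])
```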


Checking for Invalid Numeric Values
• The techniques for checking invalid numeric data are quite different from the techniques used with character data:
  – Examine the minimum and maximum values of each numeric variable.
  – Use internal consistency methods: if most of the data values fall within a certain range, then any values that fall far enough outside that range may be data errors.
  – Run a univariate analysis, focusing especially on:
    • Number of non-missing observations, number of observations not equal to zero and number of observations greater than zero, which are of most interest at this stage
    • Extremes: the five lowest and five highest values of each numeric variable
    • Quantiles
    • Mean
    • Standard deviation, to decide what constitutes reasonable cutoffs for low and high data values
    • Range
    • Graphic displays: a stem-and-leaf plot, a box plot and a normal probability plot
• Check the medical records for the extreme values and write a note to the data center about the findings to help in further cleaning of these data.
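Most of the univariate checks listed above can be approximated with Python's standard `statistics` module. This is a sketch, not the procedure used in the slides: the variable (patient age), the sample values and the cutoff of mean ± 3 standard deviations are assumptions chosen for illustration. The graphic displays are not reproduced here.

```python
import statistics
from typing import Dict, List, Optional

def univariate_summary(values: List[Optional[float]], k: float = 3.0) -> Dict[str, object]:
    """Summarize a numeric variable and flag values outside mean +/- k*SD."""
    present = [v for v in values if v is not None]  # drop missing values
    ordered = sorted(present)
    mean = statistics.mean(present)
    sd = statistics.stdev(present)  # sample standard deviation
    low, high = mean - k * sd, mean + k * sd  # assumed screening cutoffs
    return {
        "n_nonmissing": len(present),
        "n_nonzero": sum(1 for v in present if v != 0),
        "n_positive": sum(1 for v in present if v > 0),
        "min": ordered[0],
        "max": ordered[-1],
        "range": ordered[-1] - ordered[0],
        "lowest5": ordered[:5],
        "highest5": ordered[-5:],
        "quartiles": statistics.quantiles(present, n=4),  # requires Python 3.8+
        "mean": mean,
        "sd": sd,
        "outside_cutoffs": [v for v in present if v < low or v > high],
    }

# Hypothetical ages: 450 looks like a typing error, None is a missing value
ages = [34, 41, 29, 52, 47, 38, 61, 45, 33, 56,
        40, 27, 49, 36, 58, 44, 31, 50, 39, 450, None]

summary = univariate_summary(ages)
for key, value in summary.items():
    print(f"{key}: {value}")
```

A fixed 3-SD rule is only a screening device: the extreme value itself inflates the standard deviation, so in small samples a genuine error can stay inside the cutoffs. The extremes list is often the more useful starting point, and every flagged value still has to be checked against the medical records.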


Dates: Hospitalization (1)
• We can create a variable by subtracting the date of admission from the date of discharge, and call it total hospitalization 1.
• This variable helps detect wrong data entry for dates, such as case number 6014.
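Computing total hospitalization 1 amounts to subtracting the admission date from the discharge date. The sketch below uses Python's `datetime.date`, whose subtraction yields a `timedelta`. The case numbers, dates and the 180-day upper limit are hypothetical; a negative length of stay (discharge before admission) is flagged as a certain entry error, and a very long stay as a probable one.

```python
from datetime import date
from typing import Dict, List, Tuple

MAX_PLAUSIBLE_DAYS = 180  # hypothetical upper limit; set it from local knowledge

def total_hospitalization_1(admission: date, discharge: date) -> int:
    """Length of stay in days: discharge date minus admission date."""
    return (discharge - admission).days

# Hypothetical cases: case number -> (admission date, discharge date)
cases: Dict[int, Tuple[date, date]] = {
    7001: (date(2014, 3, 2), date(2014, 3, 9)),
    7002: (date(2014, 5, 20), date(2014, 5, 18)),  # discharge before admission
    7003: (date(2014, 6, 1), date(2015, 6, 4)),    # year probably mistyped
}

flagged: List[int] = []
for case_id, (admitted, discharged) in cases.items():
    days = total_hospitalization_1(admitted, discharged)
    if days < 0 or days > MAX_PLAUSIBLE_DAYS:
        flagged.append(case_id)
    print(case_id, days)

print("Check dates for cases:", flagged)
```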


Dates: Hospitalization (2)
• We can create a variable by adding the days the patient spent in the ICU, the ward and a private room, and call it total hospitalization 2.


Dates: Hospitalization (3)
• To check for inconsistency, we can create a variable, let's call it difference, by subtracting total hospitalization 2 (the sum of days spent in the ICU, ward and private room) from total hospitalization 1 (the date of discharge minus the date of admission).
• We need to check any value other than zero, for example by using the auto-filter command, and recheck the medical records for those cases.
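The cross-check between the two totals can be sketched in the same way: total hospitalization 2 is the sum of ICU, ward and private-room days, and its difference from total hospitalization 1 should be zero. The `Stay` record, its field names and the sample data are hypothetical; the filter on non-zero differences plays the role of the auto-filter step.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

@dataclass
class Stay:
    case_id: int
    admission: date
    discharge: date
    icu_days: int
    ward_days: int
    private_days: int

    @property
    def total_1(self) -> int:
        # Date of discharge minus date of admission, in days
        return (self.discharge - self.admission).days

    @property
    def total_2(self) -> int:
        # Days in ICU + ward + private room
        return self.icu_days + self.ward_days + self.private_days

    @property
    def difference(self) -> int:
        return self.total_1 - self.total_2

def inconsistent(stays: List[Stay]) -> List[Stay]:
    """Keep only cases whose difference is not zero, for rechecking the records."""
    return [s for s in stays if s.difference != 0]

# Hypothetical stays
stays = [
    Stay(8001, date(2014, 1, 10), date(2014, 1, 20), 2, 8, 0),  # 10 days both ways
    Stay(8002, date(2014, 2, 1), date(2014, 2, 6), 1, 3, 3),    # 5 days vs 7 days
]

for s in inconsistent(stays):
    print(s.case_id, s.total_1, s.total_2, s.difference)
```

Both totals must count days the same way (for example, whether the admission day itself is counted); otherwise every case shows a constant non-zero difference that is an artefact of the convention, not a data error.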


ACKNOWLEDGEMENT
We thank
• Prof. Donald and Miss Yara
