12.07.2015 Views

Predicting Cardiovascular Risks using Pattern Recognition and Data ...



chosen by the user. The methods for choosing m features are not described in detail in this thesis. More detail about this method, and the definition of m features, can be found in Liu and Motoda (1998).

8.2.2. Wrapper Method

The wrapper method uses an inductive algorithm to estimate the value of a given feature subset (e.g. via cross-validation). Its goal is to return the subset of features that gives the lowest prediction error. However, according to Dash and Liu (1997), the algorithms exhibit moderate complexity, and the number of executions they require carries a high computational cost, in particular when used with exhaustive search strategies. Therefore, this method is not used for the thesis data. Further details about the wrapper method can be found in Dash and Liu (1997), Liu and Motoda (1998), and Talavera (2005).

8.2.3. Relief Algorithm

    Algorithm Relief
    Input: for each training instance, a vector of attribute values and the class value
    Output: the vector W of estimations of the qualities of attributes

    set all weights W[A] := 0.0;
    for i := 1 to m do begin
        randomly select an instance R_i;
        find nearest hit H and nearest miss M;
        for A := 1 to a do
            W[A] := W[A] - diff(A, R_i, H)/m + diff(A, R_i, M)/m;
    end;

Figure 8.1: Pseudocode of the original Relief algorithm (Kira and Rendell, 1992).

The Relief algorithm (Kira and Rendell, 1992) is a filter method that estimates the usefulness of attributes according to how well their values distinguish samples that are near each other. For each randomly selected sample, the algorithm searches the data domain for its two nearest neighbors: one from the same class (the “nearest hit”) and one from a different class (the “nearest miss”). It then updates the quality estimate of each attribute according to the “miss” and the “hit” values: as in Figure 8.1, a difference from the nearest hit lowers the attribute's weight, and a difference from the nearest miss raises it. This process is repeated
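The pseudocode in Figure 8.1 can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the implementation used in the thesis: it assumes numeric attributes, a two-class problem with at least two instances per class, a range-normalized `diff`, and a Manhattan distance over those differences for the neighbor search. The names `relief`, `diff`, `ranges`, and `seed`, as well as the toy data, are hypothetical and chosen only for this example.

```python
import random


def diff(a, x, y, ranges):
    """Range-normalized difference of numeric attribute a between x and y (assumed form)."""
    if ranges[a] == 0:
        return 0.0
    return abs(x[a] - y[a]) / ranges[a]


def relief(X, y, m, seed=0):
    """Estimate attribute weights W following the update rule in Figure 8.1."""
    rng = random.Random(seed)
    n_attrs = len(X[0])
    # Value range of each attribute, used to scale diff into [0, 1].
    ranges = [
        max(row[a] for row in X) - min(row[a] for row in X)
        for a in range(n_attrs)
    ]

    def distance(i, j):
        # Assumption: Manhattan distance over the normalized differences.
        return sum(diff(a, X[i], X[j], ranges) for a in range(n_attrs))

    W = [0.0] * n_attrs
    for _ in range(m):
        r = rng.randrange(len(X))  # randomly select an instance R_i
        hits = [j for j in range(len(X)) if j != r and y[j] == y[r]]
        misses = [j for j in range(len(X)) if y[j] != y[r]]
        h = min(hits, key=lambda j: distance(r, j))      # nearest hit H
        mi = min(misses, key=lambda j: distance(r, j))   # nearest miss M
        for a in range(n_attrs):
            W[a] += (diff(a, X[r], X[mi], ranges) - diff(a, X[r], X[h], ranges)) / m
    return W


# Hypothetical toy data: attribute 0 separates the classes, attribute 1 does not.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.2], [1.0, 0.8], [0.8, 0.4]]
y = [0, 0, 0, 1, 1, 1]
W = relief(X, y, m=20)
print(W)
```

On this toy data the class-separating attribute receives a positive weight and the uninformative one a negative weight, which is the behavior the update rule is meant to produce.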
