01.03.2013 Views

Applied Statistics Using SPSS, STATISTICA, MATLAB and R


6 Statistical Classification

Statistical software products such as SPSS and STATISTICA allow the selection of the cases used for training and for testing linear discriminant classifiers. With SPSS, it is possible to use a selection variable, which eases the task of specifying randomly selected samples. SPSS also supports leave-one-out classification. With STATISTICA, one can initially select the cases used for training (Selection Conditions option in the Tools menu) and, once the classifier is designed, specify the test cases (Select Cases button in the Classification tab of the command window). In MATLAB and R one may create a case-selecting vector, called a filter, with random 0s and 1s.
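The filter idea carries over to any language. The following is a minimal sketch in Python rather than MATLAB or R. The names `make_filter` and `split_cases`, the 50% training fraction and the 100 case indices are illustrative assumptions, not taken from the text.

```python
import random

def make_filter(n_cases, train_fraction=0.5, seed=0):
    """Return a list of random 0s and 1s (1 = training case, 0 = test case).

    train_fraction and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    return [1 if rng.random() < train_fraction else 0 for _ in range(n_cases)]

def split_cases(cases, case_filter):
    """Split the cases into training and test sets according to the filter."""
    train = [c for c, f in zip(cases, case_filter) if f == 1]
    test = [c for c, f in zip(cases, case_filter) if f == 0]
    return train, test

cases = list(range(100))               # hypothetical case indices
case_filter = make_filter(len(cases))  # random 0/1 filter, one entry per case
train, test = split_cases(cases, case_filter)
print(len(train), "training cases,", len(test), "test cases")
```

Fixing the seed makes the random selection reproducible, which is convenient when the same split must be reused to compare classifiers.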

Example 6.14<br />

Q: Consider the two-class cork-stopper classifier, with two features, presented in section 6.2.2 (see classification matrix in Table 6.3). Evaluate the performance of this classifier using the partition method with k = 3, and the leave-one-out method.

A: Using the partition method with k = 3, a test set estimate of Pe_t = 9.9% was obtained, which is near the training set error estimate of 10%. The leave-one-out method also produces Pe_t = 10% (see Table 6.11; the "Original" matrix is the training set estimate, the "Cross-validated" matrix is the test set estimate). The closeness of these figures indicates reliable error estimation for this classification problem, which has a high dimensionality ratio (n/d = 25). Using formula 6.28, the 95% confidence limits for these error estimates are: s = 0.03 ⇒ Pe = 10% ± 5.9%.
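The figures s = 0.03 and ± 5.9% can be checked numerically. Formula 6.28 is not reproduced on this page, so the sketch below assumes the usual normal approximation for a proportion, s = sqrt(Pe(1 − Pe)/n), with n = 100 cases (50 per class) and the 95% limits taken as ± 1.96 s. The function name is hypothetical.

```python
import math

def error_confidence_limits(pe, n, z=1.96):
    """Standard deviation and half-width of the confidence interval of an
    error estimate, assuming the normal approximation to the binomial."""
    s = math.sqrt(pe * (1.0 - pe) / n)
    return s, z * s

# Pe = 10% estimated on n = 100 cases (assumed: 50 cases in each of the two classes)
s, half_width = error_confidence_limits(pe=0.10, n=100)
print(f"s = {s:.2f}, Pe = 10% +/- {100 * half_width:.1f}%")
# prints: s = 0.03, Pe = 10% +/- 5.9%
```

Under these assumptions the sketch reproduces the values quoted in the answer.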

Table 6.11. Listing of the classification matrices obtained with SPSS, using the leave-one-out method in the classification of the first two classes of the cork-stopper data with two features.

                                Predicted Group Membership
                          C         1           2          Total
Original          Count   1        49           1            50
                          2         9          41            50
                  %       1        98.0         2.0         100
                          2        18.0        82.0         100
Cross-validated   Count   1        49           1            50
                          2         9          41            50
                  %       1        98.0         2.0         100
                          2        18.0        82.0         100
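The error estimates follow directly from the counts in Table 6.11. As a minimal sketch (the function name `error_rates` is hypothetical), the per-class and overall error rates can be computed from a classification matrix whose rows are the true classes and whose columns are the predicted classes:

```python
# Counts from Table 6.11 (rows: true class C, columns: predicted class).
# The "Original" and "Cross-validated" matrices are identical in this example.
matrix = [[49, 1],
          [9, 41]]

def error_rates(m):
    """Return (per-class error rates, overall error rate) of a classification matrix."""
    per_class = [(sum(row) - row[i]) / sum(row) for i, row in enumerate(m)]
    total = sum(sum(row) for row in m)
    errors = total - sum(row[i] for i, row in enumerate(m))
    return per_class, errors / total

per_class, overall = error_rates(matrix)
print([f"{100 * p:.1f}%" for p in per_class], f"{100 * overall:.1f}%")
# prints: ['2.0%', '18.0%'] 10.0%
```

The per-class values match the off-diagonal percentages of the table, and the overall value is the 10% error estimate quoted in Example 6.14.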

Example 6.15<br />

Q: Consider the three-class, cork-stopper classifier, with four features, determined in Example 6.13. Evaluate the performance of this classifier using the leave-one-out method.
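For readers unfamiliar with the procedure, the leave-one-out method can be sketched as follows: each case in turn is held out, the classifier is designed with the remaining cases, and the held-out case is then classified. This sketch does not use the cork-stopper data or the classifier of Example 6.13. The one-feature toy data and the nearest-class-mean classifier are purely illustrative assumptions.

```python
def nearest_mean_predict(train, x):
    """Assign x to the class whose training mean is closest.

    train is a list of (value, label) pairs. This simple classifier is a
    stand-in for illustration, not the discriminant classifier of the text.
    """
    sums, counts = {}, {}
    for value, label in train:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    means = {label: sums[label] / counts[label] for label in sums}
    return min(means, key=lambda label: abs(x - means[label]))

def leave_one_out_error(data):
    """Hold out each case once, design with the rest, count misclassifications."""
    errors = 0
    for i, (value, label) in enumerate(data):
        train = data[:i] + data[i + 1:]   # all cases except case i
        if nearest_mean_predict(train, value) != label:
            errors += 1
    return errors / len(data)

# Hypothetical toy data: (feature value, class label), three classes
toy = [(1.0, 1), (1.2, 1), (0.8, 1), (2.6, 1),
       (3.0, 2), (3.2, 2), (2.8, 2),
       (5.0, 3), (5.2, 3), (4.8, 3)]
print("Leave-one-out error estimate:", leave_one_out_error(toy))
```

Because every case is used once for testing and n − 1 times for design, the method makes full use of small data sets at the cost of designing the classifier n times.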
