01.03.2013 Views

Applied Statistics Using SPSS, STATISTICA, MATLAB and R


Exercises

6.2 Repeat the previous exercise for the three classes of the Cork Stoppers’ dataset, using features N, PRM and ARTG.

6.3 Consider the problem of classifying cardiotocograms (CTG dataset) into three classes: N (normal), S (suspect) and P (pathological).

a) Determine which features are most discriminative and appropriate for a Mahalanobis classifier approach for this problem.

b) Design the classifier and estimate its performance using a partition method for the test set error estimation.
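As an illustration of the workflow in 6.3, here is a minimal sketch of a minimum Mahalanobis distance classifier evaluated with a partition (holdout) split. It assumes a pooled covariance matrix shared by all classes, and it runs on synthetic two-feature data rather than the CTG dataset. The function names `fit_mahalanobis` and `predict_mahalanobis` are hypothetical.

```python
import numpy as np

def fit_mahalanobis(X, y):
    """Estimate class means and the inverse pooled covariance matrix."""
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    # Pooled covariance (assumption): within-class scatter summed over
    # classes, divided by (n - number of classes).
    scatter = sum((X[y == c] - means[c]).T @ (X[y == c] - means[c])
                  for c in classes)
    pooled = scatter / (len(y) - len(classes))
    return classes, means, np.linalg.inv(pooled)

def predict_mahalanobis(model, X):
    """Assign each row of X to the class with the nearest mean."""
    classes, means, inv_cov = model
    # Squared Mahalanobis distance of every case to every class mean.
    d2 = np.column_stack([
        np.einsum("ij,jk,ik->i", X - means[c], inv_cov, X - means[c])
        for c in classes
    ])
    return classes[d2.argmin(axis=1)]

# Synthetic stand-in for a 3-class, 2-feature problem (NOT the CTG data).
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X = np.vstack([rng.normal(c, 1.0, size=(100, 2)) for c in centres])
y = np.repeat(np.array(["N", "S", "P"]), 100)

# Partition method: a random half for design, the other half for testing.
idx = rng.permutation(len(y))
train, test = idx[:150], idx[150:]
model = fit_mahalanobis(X[train], y[train])
test_error = np.mean(predict_mahalanobis(model, X[test]) != y[test])
print(f"holdout test error: {test_error:.3f}")
```

With real data, the feature columns chosen in part a) would replace the synthetic `X`, and the split could be stratified by class if the classes are unbalanced.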

6.4 Repeat the previous exercise using the Rocks’ dataset and two classes: {granites} vs. {limestones, marbles}.

6.5 A physician would like to have a very simple rule available for screening out carcinoma situations from all other situations using the same diagnostic means and measurements as in the Breast Tissue dataset.

a) Using the Breast Tissue dataset, find a linear Bayesian classifier with only one feature for the discrimination of carcinoma versus all other cases (relax the normality and equal variance requirements). Use forward and backward search and estimate the priors from the training set sizes of the classes.

b) Obtain training set and test set error estimates of this classifier, and 95% confidence intervals.

c) Using the SC Size program, assess the deviation of the error estimate from the true Bayesian error, assuming that the normality and equal variance requirements were satisfied.

d) Suppose that the risk of missing a carcinoma is three times higher than the risk of misclassifying a non-carcinoma. How should the classifying rule be reformulated in order to reflect these risks, and what is the performance of the new rule?
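To make the effect of unequal risks in part d) concrete, the sketch below computes the decision threshold of a one-feature, two-class Bayesian rule under an equal-variance Gaussian model, with and without a 3:1 loss ratio. The means, standard deviation and priors are invented for illustration and are not estimates from the Breast Tissue dataset. The helper names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def decision_threshold(m1, m2, sigma, p1, p2, loss1=1.0, loss2=1.0):
    """Threshold for two equal-variance Gaussian classes on one feature.

    Class 1 is chosen on the side of the threshold where m1 lies.
    loss1 is the loss of assigning a class-1 case to class 2,
    loss2 the loss of assigning a class-2 case to class 1.
    """
    midpoint = 0.5 * (m1 + m2)
    shift = sigma ** 2 / (m1 - m2) * math.log((loss2 * p2) / (loss1 * p1))
    return midpoint + shift

# Hypothetical values (assumptions, not taken from the dataset).
m_car, m_other, sigma = 2.0, 0.0, 1.0
p_car, p_other = 0.2, 0.8

t_plain = decision_threshold(m_car, m_other, sigma, p_car, p_other)
t_risk = decision_threshold(m_car, m_other, sigma, p_car, p_other, loss1=3.0)

# The error formulas below assume m_car > m_other, so that a case is
# labelled carcinoma when its feature value exceeds the threshold.
for name, t in (("equal risks", t_plain), ("3:1 risks", t_risk)):
    miss = normal_cdf((t - m_car) / sigma)
    false_alarm = 1.0 - normal_cdf((t - m_other) / sigma)
    print(f"{name}: threshold={t:.3f}, missed carcinoma={miss:.3f}, "
          f"false alarm={false_alarm:.3f}")
```

Weighting the carcinoma misses more heavily moves the threshold towards the non-carcinoma mean, so fewer carcinomas are missed at the cost of more false alarms.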

6.6 Design a linear discriminant classifier for the three classes of the Clays’ dataset and evaluate its performance.

6.7 Explain why all ROC curves start at (0,0) and finish at (1,1) by analysing what kind of situations these points correspond to.
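One way to explore 6.7 numerically is to sweep a decision threshold over a set of scores and record the (false positive rate, true positive rate) pairs. The following is a minimal sketch on made-up scores and labels. The function name `roc_points` and the convention that a case is called positive when its score is at or above the threshold are assumptions.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs for thresholds swept from +inf to -inf.

    labels are 1 (positive) or 0 (negative); a case is called positive
    when its score is >= the threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    thresholds = ([float("inf")]
                  + sorted(set(scores), reverse=True)
                  + [float("-inf")])
    points = []
    for t in thresholds:
        tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= t)
        fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= t)
        points.append((fp / neg, tp / pos))
    return points

# Made-up scores and labels for illustration only.
scores = [0.9, 0.8, 0.7, 0.55, 0.5, 0.3]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
print(points[0], points[-1])
```

The first point comes from a threshold that no case reaches and the last from a threshold that every case reaches, which are the two extreme situations the exercise asks about.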

6.8 Consider the Breast Tissue dataset. Use the ROC curve approach to determine single features that will discriminate carcinoma cases from all other cases. Compare the alternative methods using the ROC curve areas.
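For comparing single features by ROC curve area, one possible approach is to use the fact that the area equals the proportion of (positive, negative) pairs in which the positive case has the larger feature value, with ties counted as one half. The sketch below applies this to two invented features; the values are not from the Breast Tissue dataset and the name `roc_area` is hypothetical.

```python
def roc_area(pos_values, neg_values):
    """Area under the ROC curve via pairwise comparison of the two groups."""
    wins = 0.0
    for p in pos_values:
        for n in neg_values:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos_values) * len(neg_values))

# Invented feature values: positives = carcinoma, negatives = all others.
feature_a = {"pos": [5, 6, 7, 8], "neg": [1, 2, 3, 4]}
feature_b = {"pos": [3, 5, 6, 2], "neg": [1, 4, 2, 5]}

area_a = roc_area(feature_a["pos"], feature_a["neg"])
area_b = roc_area(feature_b["pos"], feature_b["neg"])
print(f"feature A area: {area_a:.4f}, feature B area: {area_b:.4f}")
```

An area below 0.5 would indicate that low values of the feature point to the positive class, in which case 1 minus the area is the figure to compare.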

6.9 Repeat the ROC curve experiments illustrated in Figure 6.20 for the FHR Apgar dataset, using combinations of features.

6.10 Increase the amplitude of the signal impulses by 20% in the Signal & Noise dataset. Consider the following impulse detection rule:

An impulse is detected at time n when s(n) is bigger than $\alpha \sum_{i=1}^{2} \big( s(n-i) + s(n+i) \big)$.

Determine the ROC curve corresponding to several α values, and determine the best α for the impulse/noise discrimination. How does this method compare with the amplitude threshold method described in section 6.4?
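The detection rule of 6.10 can be sketched as follows, reading the sum as running over i = 1, 2 (the two neighbours on each side of sample n). The signal and the impulse positions below are invented for illustration and are not the Signal & Noise dataset. The function names, and the choice to skip the two samples at each end that lack a full neighbourhood, are assumptions.

```python
def detect_impulses(s, alpha):
    """Flag sample n when s[n] > alpha * sum_{i=1..2} (s[n-i] + s[n+i])."""
    flags = [False] * len(s)
    # The first and last two samples have no full neighbourhood; skip them.
    for n in range(2, len(s) - 2):
        neighbours = sum(s[n - i] + s[n + i] for i in (1, 2))
        flags[n] = s[n] > alpha * neighbours
    return flags

def roc_point(s, truth, alpha):
    """(FPR, TPR) of the rule for one alpha, over the interior samples."""
    flags = detect_impulses(s, alpha)
    interior = range(2, len(s) - 2)
    tp = sum(1 for n in interior if flags[n] and truth[n])
    fp = sum(1 for n in interior if flags[n] and not truth[n])
    pos = sum(1 for n in interior if truth[n])
    neg = len(interior) - pos
    return fp / neg, tp / pos

# Invented signal with impulses at positions 4 and 9.
s = [0.1, 0.2, 0.1, 0.3, 2.0, 0.2, 0.1, 0.2, 0.1, 1.8, 0.3, 0.2, 0.1]
truth = [n in (4, 9) for n in range(len(s))]

detected = [n for n, f in enumerate(detect_impulses(s, 1.0)) if f]
print("detected at alpha = 1.0:", detected)

# One ROC point per alpha value.
for alpha in (0.05, 0.1, 0.5, 1.0, 3.0):
    fpr, tpr = roc_point(s, truth, alpha)
    print(f"alpha={alpha}: FPR={fpr:.3f}, TPR={tpr:.3f}")
```

Plotting the (FPR, TPR) pairs for a finer grid of α values would give the ROC curve requested in the exercise.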
