08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho

Building Machine Learning Systems with Python - Richert, Coelho

Building Machine Learning Systems with Python - Richert, Coelho

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<strong>Learning</strong> How to Classify <strong>with</strong> Real-world Examples<br />

Independent of what the original values were, after Z-scoring, a value of zero is the<br />

mean and positive values are above the mean and negative values are below it.<br />

Now every feature is in the same unit (technically, every feature is now<br />

dimensionless; it has no units) and we can mix dimensions more confidently. In fact,<br />

if we now run our nearest neighbor classifier, we obtain 94 percent accuracy!<br />

Look at the decision space again in two dimensions; it looks as shown in the<br />

following screenshot:<br />

The boundaries are now much more complex and there is interaction between the<br />

two dimensions. In the full dataset, everything is happening in a seven-dimensional<br />

space that is very hard to visualize, but the same principle applies: where before a<br />

few dimensions were dominant, now they are all given the same importance.<br />

The nearest neighbor classifier is simple, but sometimes good enough. We can<br />

generalize it to a k-nearest neighbor classifier by considering not just the closest<br />

point but the k closest points. All k neighbors vote to select the label. k is typically<br />

a small number, such as 5, but can be larger, particularly if the dataset is very large.<br />

[ 46 ]

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!