08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho


Chapter 5

Creating our first classifier

Let us start with the simple and beautiful nearest neighbor method from the previous chapter. Although it is not as advanced as other methods, it is very powerful. Because it is not model-based, it can fit nearly any data. However, this beauty comes with a clear disadvantage, which we will find out very soon.
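To make the "not model-based" point concrete, here is a minimal sketch, assuming a tiny XOR-style toy dataset that is not part of the original text. No straight line separates these two classes, yet a 1-nearest-neighbor classifier reproduces the labels because it simply memorizes the training points. The variable names (`X_xor`, `y_xor`, `knn_xor`) are illustrative only.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: XOR pattern, not linearly separable
X_xor = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_xor = [0, 1, 1, 0]

# With k=1 the classifier returns the label of the single closest training point
knn_xor = KNeighborsClassifier(n_neighbors=1)
knn_xor.fit(X_xor, y_xor)

train_pred = knn_xor.predict(X_xor)                       # labels of the stored points
nearby_pred = knn_xor.predict([[0.1, 0.9], [0.9, 0.9]])   # queries close to (0,1) and (1,1)

print(train_pred)
print(nearby_pred)
```

The flip side, hinted at in the text, is that the classifier has to keep every training sample around in order to answer a query.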

Starting with the k-nearest neighbor (kNN) algorithm

This time, we won't implement it ourselves, but rather take it from the sklearn toolkit. There, the classifier resides in sklearn.neighbors. Let us start with a simple 2-nearest neighbor classifier:

>>> from sklearn import neighbors
>>> knn = neighbors.KNeighborsClassifier(n_neighbors=2)
>>> print(knn)
KNeighborsClassifier(algorithm='auto', leaf_size=30, n_neighbors=2, p=2,
warn_on_equidistant=True, weights='uniform')

It provides the same interface as all the other estimators in sklearn: we train it using fit(), after which we can predict the classes of new data instances using predict():

>>> knn.fit([[1],[2],[3],[4],[5],[6]], [0,0,0,1,1,1])
>>> # Each query is wrapped as a 2-D array (one row per sample); newer scikit-learn releases reject bare scalars
>>> knn.predict([[1.5]])
array([0])
>>> knn.predict([[37]])
array([1])
>>> knn.predict([[3]])
NeighborsWarning: kneighbors: neighbor k+1 and neighbor k have the
same distance: results will be dependent on data order.
neigh_dist, neigh_ind = self.kneighbors(X)
array([0])
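The equidistance warning for the query 3 is easier to understand when we look at the neighbors themselves. The following is a minimal sketch using the estimator's kneighbors() method, which returns the distances to, and the training-set indices of, the k closest samples. The variable names are illustrative, and the queries are passed as 2-D arrays on the assumption of a reasonably recent scikit-learn.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = [[1], [2], [3], [4], [5], [6]]
y = [0, 0, 0, 1, 1, 1]
knn = KNeighborsClassifier(n_neighbors=2).fit(X, y)

# Query 1.5 sits between the samples 1 and 2, both at distance 0.5
dist_mid, idx_mid = knn.kneighbors([[1.5]])

# Query 3 coincides with a training sample (distance 0); the second
# neighbor is at distance 1, which both 2 and 4 satisfy
dist_tie, idx_tie = knn.kneighbors([[3]])

print(dist_mid, idx_mid)
print(dist_tie, idx_tie)
```

For the query 3, the samples 2 (class 0) and 4 (class 1) are equally far away, so which of them is chosen as the second neighbor depends on the order of the training data.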

To get the class probabilities, we can use predict_proba(). Here, with two classes, 0 and 1, it returns one row per query containing two elements, the estimated probability of each class:

>>> knn.predict_proba([[1.5]])
array([[ 1., 0.]])
>>> knn.predict_proba([[37]])
array([[ 0., 1.]])
>>> knn.predict_proba([[3.5]])
array([[ 0.5, 0.5]])
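With uniform weights, these probabilities are simply the fraction of the k neighbors that belong to each class. The sketch below recomputes them by hand from kneighbors() and compares the result with predict_proba(). The helper vote_fractions() is hypothetical and written only for this illustration; it is not part of sklearn.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([0, 0, 0, 1, 1, 1])
knn = KNeighborsClassifier(n_neighbors=2).fit(X, y)

def vote_fractions(model, labels, queries):
    """Hypothetical helper: share of each class among the k nearest neighbors."""
    _, idx = model.kneighbors(queries)     # idx has shape (n_queries, k)
    neighbor_labels = labels[idx]          # labels of the neighbors, same shape
    return np.array([[np.mean(row == c) for c in model.classes_]
                     for row in neighbor_labels])

queries = np.array([[1.5], [37], [3.5]])
manual = vote_fractions(knn, y, queries)
builtin = knn.predict_proba(queries)

print(manual)
print(builtin)
```

For the query 3.5, one neighbor (3) belongs to class 0 and the other (4) to class 1, which is where the 0.5/0.5 split comes from.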
