10.11.2016 Views

Learning Data Mining with Python


Chapter 8

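From these predictions, we can use scikit-learn to compute the F1 score:

from sklearn.metrics import f1_score
# f1_score expects the true labels first, then the predictions.
# Recent scikit-learn versions also need an explicit average for multiclass targets.
print("F-score: {0:.2f}".format(
    f1_score(y_test.argmax(axis=1), predictions, average="weighted")))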

The score here is 0.97, which is a great result for such a relatively simple model. Recall that our features were simple pixel values only; the neural network worked out how to use them.
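To make the metric concrete, here is a minimal hand-rolled sketch of a weighted F1 score on made-up data. The helper names (`argmax_rows`, `per_class_f1`, `weighted_f1`) and the toy labels are assumptions for illustration only, and the weighted averaging scheme is one common choice rather than something the text above specifies.

```python
def argmax_rows(rows):
    """Return the index of the largest value in each row (one-hot -> class id)."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


def per_class_f1(y_true, y_pred, label):
    """F1 for a single class: the harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighted by each class's true count."""
    labels = sorted(set(y_true))
    weighted_sum = sum(per_class_f1(y_true, y_pred, label) * y_true.count(label)
                       for label in labels)
    return weighted_sum / len(y_true)


# Toy data (made up): one-hot targets for three classes and predicted class ids.
toy_targets = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
toy_predictions = [0, 1, 1, 1]

toy_true = argmax_rows(toy_targets)
toy_score = weighted_f1(toy_true, toy_predictions)
print("F-score: {0:.2f}".format(toy_score))
```

The `argmax_rows` step plays the same role as `y_test.argmax(axis=1)`: it converts each one-hot row back into a single class index so that it can be compared with the predicted class.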

Now that we have a classifier with good accuracy on letter prediction, we can start putting together words for our CAPTCHAs.

Predicting words

We want to predict the letter in each of these segments and put those predictions together to form the predicted word for a given CAPTCHA.

Our function will accept a CAPTCHA and the trained neural network, and it will return the predicted word:

def predict_captcha(captcha_image, neural_network):

We first extract the sub-images using the segment_image function we created earlier:

    subimages = segment_image(captcha_image)

We will be building our word from each of the letters. The sub-images are ordered according to their location, so usually this will place the letters in the correct order:

    predicted_word = ""
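As an illustration of why ordering by location usually yields the right letter order, the sketch below sorts some made-up bounding boxes by their left edge. The box format `(min_row, min_col, max_row, max_col)`, the `left_to_right` helper, and the sample values are assumptions; the actual `segment_image` function is defined earlier and is not reproduced here.

```python
# Hypothetical bounding boxes as (min_row, min_col, max_row, max_col),
# listed in an arbitrary order, as a labelling step might report them.
toy_boxes = {
    "N": (2, 48, 22, 66),
    "G": (1, 5, 21, 23),
    "E": (3, 27, 23, 44),
}


def left_to_right(named_boxes):
    """Return the box names ordered by the left edge (min_col) of each box."""
    ordered = sorted(named_boxes.items(), key=lambda item: item[1][1])
    return [name for name, _box in ordered]


toy_word = "".join(left_to_right(toy_boxes))
print(toy_word)  # GEN
```

The ordering can still go wrong when a single letter is split into two regions or two letters merge into one, which is why the text says "usually".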

Next we iterate over the sub-images:

    for subimage in subimages:

Each sub-image is unlikely to be exactly 20 pixels by 20 pixels, so we will need to resize it in order to have the correct size for our neural network:

        subimage = resize(subimage, (20, 20))
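To show what the resizing step achieves, here is a small dependency-free sketch that scales a made-up sub-image to 20 by 20 pixels and flattens it into 400 input values. The `resize_nearest` and `flatten_pixels` helpers are hypothetical stand-ins: nearest-neighbour sampling is a simpler technique than the interpolating `resize` call used in the function above and is swapped in here only to keep the example self-contained.

```python
def resize_nearest(image, shape):
    """Resize a 2-D list of pixels to `shape` using nearest-neighbour sampling."""
    out_rows, out_cols = shape
    in_rows, in_cols = len(image), len(image[0])
    return [
        [image[r * in_rows // out_rows][c * in_cols // out_cols]
         for c in range(out_cols)]
        for r in range(out_rows)
    ]


def flatten_pixels(image):
    """Concatenate the rows of a 2-D list into one flat list of pixel values."""
    return [pixel for row in image for pixel in row]


# A made-up 3x5 sub-image; real segments would be larger and irregular.
toy_subimage = [
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
]

toy_resized = resize_nearest(toy_subimage, (20, 20))
toy_inputs = flatten_pixels(toy_resized)
print(len(toy_resized), len(toy_resized[0]), len(toy_inputs))  # 20 20 400
```

Whatever the original size of a segment, the network always receives the same number of inputs, one per pixel of the 20 by 20 grid.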

We will activate our neural network by sending the sub-image data into the input layer. This propagates through our neural network and returns the resulting output. All this happened in our testing of the neural network earlier, but we didn't have to explicitly call it. The code is as follows:

        outputs = neural_network.activate(subimage.flatten())
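Conceptually, activation is a forward pass: the inputs are multiplied by each layer's weights, passed through an activation function, and handed to the next layer. The sketch below is a toy illustration under stated assumptions, not the library's implementation: the `TinyNetwork` class, its weights, the sigmoid activation, the absence of bias terms, and the three-letter alphabet are all made up. Picking the largest output as the predicted letter is one plausible way to read the result.

```python
import math


def sigmoid(x):
    """Squash a value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


class TinyNetwork:
    """Hypothetical stand-in that exposes an activate() method."""

    def __init__(self, hidden_weights, output_weights):
        self.hidden_weights = hidden_weights  # one row of weights per hidden unit
        self.output_weights = output_weights  # one row of weights per output unit

    @staticmethod
    def _layer(weights, values):
        # Each unit takes a weighted sum of its inputs, then applies the sigmoid.
        return [sigmoid(sum(w * v for w, v in zip(row, values)))
                for row in weights]

    def activate(self, inputs):
        hidden = self._layer(self.hidden_weights, inputs)
        return self._layer(self.output_weights, hidden)


# Made-up weights: 4 inputs -> 2 hidden units -> 3 outputs (one per letter).
toy_net = TinyNetwork(
    hidden_weights=[[1.0, -1.0, 0.5, 0.0], [-0.5, 0.5, 1.0, 1.0]],
    output_weights=[[2.0, -1.0], [-1.0, 2.0], [0.5, 0.5]],
)
toy_letters = "ABC"

toy_outputs = toy_net.activate([1.0, 0.0, 0.0, 0.0])
# Take the index of the strongest output as the predicted class.
toy_best = max(range(len(toy_outputs)), key=toy_outputs.__getitem__)
print(toy_letters[toy_best])
```

A real network for this task would take 400 inputs (one per pixel of the resized sub-image) and have one output per letter in the alphabet.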
