10.11.2016 Views

Learning Data Mining with Python

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Chapter 8<br />

Next we set the font of the CAPTCHA we will use. You will need a font file and the<br />

filename in the following code (Coval.otf) should point to it (I just placed the file in<br />

the Notebook's directory.<br />

font = ImageFont.truetype(r"Coval.otf", 22)<br />

draw.text((2, 2), text, fill=1, font=font)<br />

You can get the Coval font I used from the Open Font Library at<br />

http://openfontlibrary.org/en/font/bretan.<br />

We convert the PIL image to a NumPy array, which allows us to use scikit-image<br />

to perform a shear on it. The scikit-image library tends to use NumPy arrays for<br />

most of its computation. The code is as follows:<br />

image = np.array(im)<br />

We then apply the shear transform and return the image:<br />

affine_tf = tf.AffineTransform(shear=shear)<br />

image = tf.warp(image, affine_tf)<br />

return image / image.max()<br />

In the last line, we normalize by dividing by the maximum value, ensuring our<br />

feature values are in the range 0 to 1. This normalization can happen in the data<br />

preprocessing stage, the classification stage, or somewhere else.<br />

From here, we can now generate images quite easily and use pyplot to display<br />

them. First, we use our inline display for the matplotlib graphs and import pyplot.<br />

The code is as follows:<br />

%matplotlib inline<br />

from matplotlib import pyplot as plt<br />

Then we create our first CAPTCHA and show it:<br />

image = create_captcha("GENE", shear=0.5)<br />

plt.imshow(image, cmap='Greys')<br />

The result is the image shown at the start of this section: our CAPTCHA.<br />

Splitting the image into individual letters<br />

Our CAPTCHAs are words. Instead of building a classifier that can identify the<br />

thousands and thousands of possible words, we will break the problem down into<br />

a smaller problem: predicting letters.<br />

[ 167 ]

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!