10.11.2016 Views

Learning Data Mining with Python

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Classifying Objects in Images Using Deep <strong>Learning</strong><br />

The last line, the break, is to test the code—this will drastically reduce the number of<br />

training examples, allowing you to quickly see if your code is working. I'll prompt<br />

you later to remove this line, after you have tested that the code works.<br />

Next, create a dataset by stacking these batches on top of each other. We use<br />

NumPy's vstack, which can be visualized as adding rows to the end of the array:<br />

X = np.vstack([batch['data'] for batch in batches])<br />

We then normalize the dataset to the range 0 to 1 and then force the type to be a<br />

32-bit float (this is the only datatype the GPU-enabled virtual machine can run <strong>with</strong>):<br />

X = np.array(X) / X.max()<br />

X = X.astype(np.float32)<br />

We then do the same <strong>with</strong> the classes, except we perform a hstack, which is similar<br />

to adding columns to the end of the array. We then use the OneHotEncoder to turn<br />

this into a one-hot array:<br />

from sklearn.preprocessing import OneHotEncoder<br />

y = np.hstack(batch['labels'] for batch in batches).flatten()<br />

y = OneHotEncoder().fit_transform(y.reshape(y.shape[0],1)).todense()<br />

y = y.astype(np.float32)<br />

Next, we split the dataset into training and testing sets:<br />

X_train, X_test, y_train, y_test = train_test_split(X, y, test_<br />

size=0.2)<br />

Next, we reshape the arrays to preserve the original data structure. The original<br />

data was 32 by 32 pixel images, <strong>with</strong> 3 values per pixel (for the red, green, and<br />

blue values);<br />

X_train = X_train.reshape(-1, 3, 32, 32)<br />

X_test = X_test.reshape(-1, 3, 32, 32)<br />

We now have a familiar training and testing dataset, along <strong>with</strong> the target classes<br />

for each. We can now build the classifier.<br />

Creating the neural network<br />

We will be using the nolearn package to build the neural network, and therefore<br />

will follow a pattern that is similar to our replication experiment in Chapter 8,<br />

Beating CAPTCHAs <strong>with</strong> Neural Networks replication.<br />

[ 264 ]

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!