08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho


Chapter 12

Jug is also specially optimized to work with numpy arrays. So, whenever your tasks return or receive numpy arrays, you are taking advantage of this optimization. Jug is another piece of this ecosystem where everything works together.

We will now look back at Chapter 10, Computer Vision–Pattern Recognition. There, we learned how to compute features on images. Remember that we were loading image files, computing features, combining these, normalizing them, and finally learning how to create a classifier. We are going to redo that exercise, but this time with the use of jug. The advantage of this version is that it is now possible to add a new feature without having to recompute all of the previous ones.
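To make the recomputation claim concrete, here is a toy sketch of the general idea: each result is stored under a key derived from the function name and its arguments, so a later run that adds a new feature only computes what is missing. This is an illustration using the standard library only, not jug's actual implementation; the names `cached`, `name_length`, and `vowel_count` are hypothetical.

```python
import hashlib
import pickle

_store = {}   # stands in for an on-disk result store
calls = []    # records which computations actually ran


def cached(func):
    """Toy memoizer: key each result by a hash of function name and arguments."""
    def wrapper(*args):
        key = hashlib.sha1(pickle.dumps((func.__name__, args))).hexdigest()
        if key not in _store:
            calls.append((func.__name__, args))
            _store[key] = func(*args)
        return _store[key]
    return wrapper


@cached
def name_length(fname):   # hypothetical "feature"
    return len(fname)


@cached
def vowel_count(fname):   # hypothetical second feature, added later
    return sum(fname.count(v) for v in "aeiou")


filenames = ["a.jpg", "bb.jpg"]

# First run: only the original feature exists.
first = [name_length(f) for f in filenames]

# Second run, after adding a feature: old results are read from the store,
# so only vowel_count is actually computed.
second = [(name_length(f), vowel_count(f)) for f in filenames]
```

After both runs, `calls` holds four entries (two per feature), showing that `name_length` was not recomputed the second time.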

We start with a few imports as follows:

from glob import glob
from jug import TaskGenerator

Now we define the first task generator, the feature computation:

@TaskGenerator
def hfeatures(fname):
    import mahotas as mh
    import numpy as np
    # Load the image as greyscale and stretch it to the full 0-255 range
    im = mh.imread(fname, as_grey=True)
    im = mh.stretch(im)
    # Haralick texture features, one row per direction
    h = mh.features.haralick(im)
    # Summarize across directions: range and mean of each feature
    return np.hstack([np.ptp(h, axis=0), h.mean(0)])

Note how we only imported numpy and mahotas inside the function. This is a small optimization: the modules are loaded only if the task is actually run. Now we set up the image filenames as follows:

filenames = glob('dataset/*.jpg')
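The deferred-import point above can be seen in isolation with a small standard-library sketch. The function `average` and the choice of the `statistics` module are only for illustration; the same pattern applies to heavier modules such as mahotas.

```python
import sys


def average(values):
    # Deferred import: `statistics` is loaded the first time this function
    # runs, not when the enclosing module is imported.
    import statistics
    return statistics.mean(values)


result = average([1, 2, 3])

# Once the function has been called, the module is registered in sys.modules.
loaded = "statistics" in sys.modules
```

A script that defines many such functions but runs only a few of them pays the import cost only for the ones it actually calls.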

We can use TaskGenerator on any function, even on the ones that we did not write, such as numpy.array:

import numpy as np
as_array = TaskGenerator(np.array)

# compute all features:
features = as_array([hfeatures(f) for f in filenames])

# get labels as an array as well
labels = [label_for(f) for f in filenames]
res = perform_cross_validation(features, labels)
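The pattern used above, wrapping an existing function so that calling it describes work instead of doing it, can be sketched with a toy stand-in. The `Task` class, `resolve`, and `task_generator` below are hypothetical and deliberately simplified; they are not jug's classes, and the built-in `sorted` plays the role that `np.array` plays in the text.

```python
class Task:
    """Toy placeholder for a deferred call (not jug's actual Task class)."""

    def __init__(self, func, args):
        self.func = func
        self.args = args

    def run(self):
        # Run any tasks this one depends on, then call the wrapped function.
        resolved = [resolve(a) for a in self.args]
        return self.func(*resolved)


def resolve(value):
    """Replace tasks (also inside lists) by their computed values."""
    if isinstance(value, Task):
        return value.run()
    if isinstance(value, list):
        return [resolve(v) for v in value]
    return value


def task_generator(func):
    """Wrap func so that calling it builds a Task instead of running it."""
    def make_task(*args):
        return Task(func, args)
    return make_task


@task_generator
def word_count(text):   # hypothetical stand-in for a feature function
    return len(text.split())


# Wrap a function we did not write, as the text does with np.array.
as_sorted = task_generator(sorted)

# Building the graph computes nothing yet; `counts` is only a description.
counts = as_sorted([word_count(t) for t in ["a b c", "d", "e f"]])

# Running the final task runs its dependencies first.
result = counts.run()
```

Here `counts` is a `Task` whose argument is a list of other tasks; only the call to `run()` triggers any computation.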

@TaskGenerator
