08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho

Building Machine Learning Systems with Python - Richert, Coelho

Building Machine Learning Systems with Python - Richert, Coelho

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Chapter 12<br />

A task could be "call double <strong>with</strong> argument 3". Another task would be "call double<br />

<strong>with</strong> argument 642.34". Using jugs, we can build these tasks as follows:<br />

from jug import Task<br />

t1 = Task(double, 3)<br />

t2 = Task(double, 642.34)<br />

Save this to a file called jugfile.py (which is just a regular <strong>Python</strong> file). Now, we<br />

can run jug execute to run the tasks. This is something you run on the command<br />

line, not at the <strong>Python</strong> prompt! Instead of <strong>Python</strong>'s jugfile.py file (which should<br />

do nothing), you run jug execute.<br />

You will also get some feedback on the tasks (jug will say that two tasks named<br />

"double" were run). Run jug execute again and it will tell you that it did nothing!<br />

It does not need to. In this case, we gained little, but if the tasks took a long time to<br />

compute, it would be very useful.<br />

You may notice that a new directory named jugfile.jugdata also appeared on<br />

your hard drive, <strong>with</strong> a few other weirdly named files. This is the memoization<br />

cache. If you remove it, jug execute will run all your tasks (both of them) again.<br />

Often it is good to distinguish between pure functions, which simply<br />

take their inputs and return a result, from more general functions<br />

that can perform actions such as reading from files, writing to files,<br />

accessing global variables, modifying their arguments, or anything that<br />

the language allows. Some programming languages, such as Haskell,<br />

even have syntactic ways in which to distinguish pure from impure<br />

functions.<br />

With jug, your tasks do not need to be perfectly pure. It is even<br />

recommended that you use tasks to read your data or write your<br />

results. However, accessing and modifying global variables will not<br />

work well; the tasks may be run in any order in different processors.<br />

The exceptions are global constants, but even this may confuse the<br />

memoization system (if the value is changed between runs). Similarly,<br />

you should not modify the input values. Jug has a debug mode (use<br />

jug execute—debug), which slows down your computation, but<br />

would give you useful error messages if you make this sort of mistake.<br />

The preceding code works but it is a bit cumbersome; you are always repeating the<br />

Task(function, argument) construct. Using a bit of <strong>Python</strong> magic, we can make<br />

the code even more natural:<br />

from jug import TaskGenerator<br />

from time import sleep<br />

@TaskGenerator<br />

def double(x):<br />

[ 243 ]

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!