08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho


Getting Started with Python Machine Learning

>>> c[~np.isnan(c)]
array([ 1., 2., 3., 4.])
>>> np.mean(c[~np.isnan(c)])
2.5

Comparing runtime behaviors

Let us compare the runtime behavior of NumPy with normal Python lists. In the following code, we calculate the sum of the squares of all integers from 0 to 999 and see how much time the calculation takes. We run it 10,000 times and report the total time so that our measurement is accurate enough.

import timeit

# Plain Python: the generator expression is evaluated by the interpreter.
# (range replaces the Python 2-only xrange so the snippet runs on Python 3.)
normal_py_sec = timeit.timeit('sum(x*x for x in range(1000))',
                              number=10000)
# Naive NumPy: the built-in sum() still iterates element by element.
naive_np_sec = timeit.timeit('sum(na*na)',
                             setup="import numpy as np; na=np.arange(1000)",
                             number=10000)
# Good NumPy: the whole computation runs inside NumPy's dot().
good_np_sec = timeit.timeit('na.dot(na)',
                            setup="import numpy as np; na=np.arange(1000)",
                            number=10000)

print("Normal Python: %f sec" % normal_py_sec)
print("Naive NumPy: %f sec" % naive_np_sec)
print("Good NumPy: %f sec" % good_np_sec)

Normal Python: 1.157467 sec
Naive NumPy: 4.061293 sec
Good NumPy: 0.033419 sec

We make two interesting observations. First, just using NumPy as data storage (Naive NumPy) takes 3.5 times longer, which is surprising, since one would expect it to be much faster given that NumPy is written as a C extension. One reason is that accessing individual elements from Python itself is rather costly: the built-in sum() iterates over the array one element at a time. Second, only when we are able to apply algorithms inside the optimized extension code do we get speed improvements, and tremendous ones at that: using the dot() function of NumPy, we are more than 25 times faster than plain Python (about 35 times, going by the timings above). In summary, in every algorithm we are about to implement, we should always look at how we can move loops over individual elements from Python to some of the highly optimized NumPy or SciPy extension functions.
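As a minimal sketch of that advice, the following snippet computes the same sum of squares three ways and checks that the results agree before timing them. The np.sum() variant and the time_it helper are illustrative additions that do not appear in the original listing, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np

na = np.arange(1000)

# Loop over individual elements, driven by the Python interpreter.
python_loop = sum(int(x) * int(x) for x in na)

# Vectorized: both the multiplication and the reduction run inside NumPy.
# (np.sum is an addition for illustration; the text itself uses dot().)
vectorized_sum = int(np.sum(na * na))

# Vectorized: a single dot product, as in the text.
vectorized_dot = int(na.dot(na))

# All three approaches must produce the same value.
assert python_loop == vectorized_sum == vectorized_dot


def time_it(stmt, number=1000):
    """Return the total seconds needed to run stmt `number` times.

    Hypothetical helper, not part of the original listing.
    """
    return timeit.timeit(stmt, globals={"np": np, "na": na}, number=number)


for label, stmt in [("built-in sum", "sum(na * na)"),
                    ("np.sum", "np.sum(na * na)"),
                    ("dot", "na.dot(na)")]:
    print("%-12s %f sec" % (label, time_it(stmt)))
```

Checking for equal results first is a cheap safeguard: a faster variant is only worth adopting if it computes the same thing.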
