10.11.2016 Views

Learning Data Mining with Python


Next Steps…

Extending the IPython Notebook

http://ipython.org/ipython-doc/1/interactive/public_server.html

The IPython Notebook is a powerful tool. It can be extended in many ways, and one of those is to create a server that runs your Notebooks separately from your main computer. This is very useful if your main computer is low-powered, such as a small laptop, but you have more powerful computers at your disposal. In addition, you can set up nodes to perform parallelized computations.

More datasets are available at:

http://archive.ics.uci.edu/ml/

There are many datasets available on the Internet from a number of different sources, including academic, commercial, and government datasets. A collection of well-labelled datasets is available at the UCI Machine Learning Repository, which is one of the best places to find datasets for testing your algorithms.

Try out the OneR algorithm with some of these different datasets.
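As a starting point for that experiment, here is a minimal OneR sketch in plain Python. It assumes every feature is already categorical, so continuous columns from a downloaded dataset would need to be discretized first. The function names `train_one_r` and `predict_one_r` and the toy weather-style rows are hypothetical and used for illustration only.

```python
from collections import Counter, defaultdict


def train_one_r(rows, labels):
    """Return (feature_index, {value: predicted_class}, error_count)."""
    best = None
    for feature in range(len(rows[0])):
        # Count how often each class occurs for each value of this feature.
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[feature]][label] += 1
        # The rule for a value is its most frequent class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors are the samples that fall outside that most frequent class.
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        # Keep the feature whose rule makes the fewest errors.
        if best is None or errors < best[2]:
            best = (feature, rule, errors)
    return best


def predict_one_r(model, rows, default=None):
    """Apply the single-feature rule; unseen values get `default`."""
    feature, rule, _ = model
    return [rule.get(row[feature], default) for row in rows]


# Hypothetical toy data: (outlook, windy) -> play
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "no"),
        ("rainy", "yes"), ("overcast", "no"), ("overcast", "yes")]
labels = ["yes", "yes", "no", "no", "yes", "yes"]

model = train_one_r(rows, labels)
print("chosen feature:", model[0], "training errors:", model[2])
print(predict_one_r(model, [("rainy", "yes")]))
```

To try a real dataset, load its rows into the same list-of-tuples shape and hold some rows back for testing, since the error count above is measured on the training data only.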

Chapter 2 – Classifying with scikit-learn

Estimators

Scalability with the nearest neighbor

https://github.com/jnothman/scikit-learn/tree/pr2532

A naïve implementation of the nearest neighbor algorithm is quite slow: it checks all pairs of points to find those that are close together. Better implementations exist, and some of them are available in scikit-learn. For instance, a kd-tree can be built to speed up the search (this is already included in scikit-learn).
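The difference between the two search strategies can be tried out with the `algorithm` parameter of scikit-learn's `KNeighborsClassifier`. The sketch below is illustrative only: the synthetic data, the labelling rule, and the variable names are assumptions, not part of the original text. It fits one brute-force model and one kd-tree model and checks that both report the same neighbor distances.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical synthetic data: 500 points in 3 dimensions, two classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Same classifier, two search strategies.
brute = KNeighborsClassifier(n_neighbors=5, algorithm="brute").fit(X, y)
kd = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree").fit(X, y)

# Distances to the 5 nearest training points for 10 new query points.
queries = rng.normal(size=(10, 3))
dist_brute, _ = brute.kneighbors(queries)
dist_kd, _ = kd.kneighbors(queries)

# The kd-tree changes how neighbors are found, not which ones are found.
print(np.allclose(dist_brute, dist_kd))
```

On a dataset this small the speed difference is negligible; timing both models on a much larger, low-dimensional dataset is a reasonable next experiment.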

Another way to speed up this search is to use Locality-Sensitive Hashing (LSH). This is a proposed improvement for scikit-learn, and it had not made it into the package at the time of writing. The above link gives a development branch of scikit-learn that will allow you to test out LSH on a dataset. Read through the documentation attached to this branch for details on doing this.
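To get a feel for the idea before working with that branch, the following is a toy sketch of one common LSH scheme, random hyperplane hashing for cosine similarity. It is not the implementation in the linked branch and not a scikit-learn API; the class `RandomHyperplaneLSH`, its parameters, and the random data are all hypothetical. Each point is hashed to a bit pattern recording which side of several random hyperplanes it falls on, and a query is compared only against points that share its pattern.

```python
from collections import defaultdict

import numpy as np


class RandomHyperplaneLSH:
    """Toy LSH index: points with the same bit pattern share a bucket."""

    def __init__(self, n_features, n_bits=6, seed=0):
        rng = np.random.default_rng(seed)
        # One random hyperplane (through the origin) per hash bit.
        self.planes = rng.normal(size=(n_bits, n_features))
        self.buckets = defaultdict(list)
        self.points = None

    @staticmethod
    def _unit(x):
        # Scale to unit length; assumes x is not the zero vector.
        x = np.asarray(x, dtype=float)
        return x / np.linalg.norm(x)

    def _key(self, x):
        # One bit per hyperplane: which side of the plane x lies on.
        return tuple((self.planes @ x > 0).tolist())

    def fit(self, X):
        self.points = np.array([self._unit(x) for x in X])
        for i, x in enumerate(self.points):
            self.buckets[self._key(x)].append(i)
        return self

    def query(self, x, k=1):
        """Return indices of up to k most similar points in x's bucket."""
        x = self._unit(x)
        candidates = self.buckets.get(self._key(x), [])
        if not candidates:
            return []
        # Exact cosine similarity, but only against the bucket's members.
        sims = self.points[candidates] @ x
        order = np.argsort(-sims)[:k]
        return [candidates[i] for i in order]


# Hypothetical random data for illustration.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
index = RandomHyperplaneLSH(n_features=5, n_bits=6).fit(X)
print(index.query(X[3], k=3))
```

The result is approximate: a true nearest neighbor that lands in a different bucket is missed. Practical LSH indexes reduce this risk by building several hash tables with different random hyperplanes and merging their candidates.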
