08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho


Clustering – Finding Related Posts

In the previous chapter, we learned how to find the classes or categories of individual data points. From a handful of training data items paired with their respective classes, we learned a model that we can now use to classify future data items. We called this supervised learning, as the learning was guided by a teacher; in our case, the teacher took the form of correct classifications.

Let us now imagine that we do not possess the labels from which we could learn the classification model. This could be, for example, because they were too expensive to collect. What could we do in that case?

Well, of course, we would not be able to learn a classification model. Still, we could find some pattern within the data itself. This is what we will do in this chapter, where we consider the challenge of a "question and answer" website. When a user browses our site looking for some particular information, the search engine will most likely point him/her to a specific answer. To improve the user experience, we now want to show all related questions with their answers. If the presented answer is not what he/she was looking for, he/she can easily see the other available answers and hopefully stay on our site.

The naive approach would be to take the post, calculate its similarity to all other posts, and display the top N most similar posts as links on the page. This quickly becomes very costly, because every lookup has to compare the post against the entire collection. Instead, we need a method that quickly finds all related posts.
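To make the cost of the naive approach concrete, here is a minimal sketch of it. The text does not say how similarity is measured, so the sketch assumes a simple bag-of-words representation compared by cosine similarity; the function names (`bag_of_words`, `cosine_similarity`, `top_n_related`) and the sample posts are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count each word."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_related(query_index, posts, n=2):
    """Compare one post against every other post and return the n most similar.

    Each query is a full linear scan, so relating every post to every
    other post needs on the order of len(posts) ** 2 comparisons.
    """
    vectors = [bag_of_words(p) for p in posts]
    query = vectors[query_index]
    scored = [
        (cosine_similarity(query, vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    # Highest similarity first; keep only the indices of the top n posts.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:n]]


# Hypothetical sample posts, assumed purely for illustration.
posts = [
    "how do I sort a list in python",
    "sort a python list in reverse order",
    "what is the best pizza topping",
    "how to sort a dictionary by value in python",
]
related = top_n_related(0, posts, n=2)
```

With these four sample posts, the two sorting questions are returned as related to the first post and the unrelated pizza question is left out. The linear scan inside `top_n_related` is exactly the part that stops scaling as the number of posts grows.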
