10.11.2016 Views

Learning Data Mining with Python


Discovering Accounts to Follow Using Graph Mining

The aim of this analysis was to recommend users, and cluster analysis allowed us to find clusters of similar users. To do this, we found connected components on a weighted graph that we built from a similarity metric between users. We used the NetworkX package for creating the graphs, working with them, and finding these connected components.
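
As a reminder of how this fits together, here is a minimal sketch of finding clusters as connected components with NetworkX. The user names, the similarity values, and the helper names `build_graph` and `find_clusters` are hypothetical and chosen only for illustration. The idea of dropping edges below a threshold is likewise an assumption of this sketch.

```python
import networkx as nx

# Hypothetical pairwise similarity scores between users (made-up data).
similarities = {
    ("alice", "bob"): 0.8,
    ("bob", "carol"): 0.6,
    ("carol", "dave"): 0.1,
    ("dave", "erin"): 0.7,
}


def build_graph(similarities, threshold=0.0):
    """Build a weighted graph, keeping only edges at or above the threshold."""
    graph = nx.Graph()
    for (user1, user2), weight in similarities.items():
        # Add both users as nodes even if their edge is filtered out.
        graph.add_node(user1)
        graph.add_node(user2)
        if weight >= threshold:
            graph.add_edge(user1, user2, weight=weight)
    return graph


def find_clusters(similarities, threshold):
    """Return each connected component of the thresholded graph as a set."""
    graph = build_graph(similarities, threshold)
    return [set(component) for component in nx.connected_components(graph)]


clusters = find_clusters(similarities, threshold=0.5)
for cluster in clusters:
    print(sorted(cluster))
```

With a threshold of 0.5, the weak link between carol and dave is dropped, so the users split into two components. Raising or lowering the threshold changes how many clusters are found.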

We then used the Silhouette Coefficient, a metric that evaluates how good a clustering solution is. Scores range from -1 to 1, and higher scores indicate a better clustering, according to the concepts of intra-cluster and inter-cluster distance. SciPy's optimize module was used to find the solution that maximises this value.
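
The following sketch illustrates the general pattern of maximising the Silhouette Coefficient by minimising its negation. It is not the chapter's exact code: the one-dimensional `points`, the gap-based `label_points` clustering, and the choice of `minimize_scalar` with the bounded method are all assumptions made to keep the example small. It uses `silhouette_score` from scikit-learn.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.metrics import silhouette_score

# Hypothetical one-dimensional data, already sorted (made-up values).
points = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1])


def label_points(max_gap):
    """Start a new cluster whenever the gap to the previous point exceeds max_gap."""
    gaps = np.diff(points)
    return np.concatenate([[0], np.cumsum(gaps > max_gap)])


def inverted_silhouette(max_gap):
    """Negated Silhouette Coefficient, so that lower values are better."""
    labels = label_points(max_gap)
    n_clusters = len(set(labels))
    # The Silhouette Coefficient is undefined for a single cluster or for
    # one cluster per sample, so return the worst possible value instead.
    if n_clusters < 2 or n_clusters >= len(points):
        return 1.0
    return -silhouette_score(points.reshape(-1, 1), labels)


# SciPy minimises, so we minimise the negated score to maximise the score.
result = minimize_scalar(inverted_silhouette, bounds=(0.01, 12.0), method="bounded")
best_gap = result.x
best_score = -result.fun
print(best_gap, best_score)
```

Because the score only changes when the threshold crosses a gap in the data, the function being optimised is piecewise constant. An optimiser can therefore stop at any threshold within a flat region, which is worth keeping in mind when interpreting the returned parameter.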

In this chapter, we also compared a few pairs of opposites. Similarity is a measure between two objects, where higher values indicate more similarity between those objects. In contrast, distance is a measure where lower values indicate more similarity. Another contrast we saw was the loss function, where lower scores are considered better (that is, we lost less). Its opposite is the score function, where higher scores are considered better.
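
These opposites can be converted into one another. The sketch below assumes a Jaccard-style similarity on sets, which is bounded between 0 and 1, so that one minus the similarity gives a distance. The function names and the `as_loss` wrapper are hypothetical and used only for illustration.

```python
def jaccard_similarity(a, b):
    """Similarity in [0, 1]: higher values mean the two sets are more alike."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # Treat two empty sets as identical.
    return len(a & b) / len(a | b)


def jaccard_distance(a, b):
    """Distance in [0, 1]: lower values mean the two sets are more alike."""
    return 1.0 - jaccard_similarity(a, b)


def as_loss(score_function):
    """Wrap a score function (higher is better) as a loss (lower is better)."""
    def loss(*args, **kwargs):
        return -score_function(*args, **kwargs)
    return loss


friends1 = {1, 2, 3}
friends2 = {2, 3, 4}
print(jaccard_similarity(friends1, friends2))
print(jaccard_distance(friends1, friends2))
print(as_loss(jaccard_similarity)(friends1, friends2))
```

Negating a score is the usual way to hand a score function to an optimiser that only minimises.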

In the next chapter, we will see how to extract features from another new type of data: images. We will discuss how to use neural networks to identify numbers in images and develop a program to automatically beat CAPTCHA images.

