08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho


Classification II – Sentiment Analysis

For companies, it is vital to closely monitor the public reception of key events such as product launches or press releases. With real-time, easy access to user-generated content on Twitter, it is now possible to do sentiment classification of tweets. Sometimes also called opinion mining, it is an active field of research in which several companies are already selling products. Since this shows that a market exists, we have good reason to flex the classification muscles we built in the previous chapter and build our own home-grown sentiment classifier.

Sketching our roadmap

Sentiment analysis of tweets is particularly hard because of Twitter's size limit of 140 characters. This leads to a special syntax, creative abbreviations, and sentences that are seldom well formed. The typical approach of analyzing sentences, aggregating their sentiment information per paragraph, and then calculating the overall sentiment of a document therefore does not work here.

Clearly, we will not try to build a state-of-the-art sentiment classifier. Instead, we want to:

• Use this scenario as a vehicle to introduce yet another classification algorithm: Naive Bayes
• Explain how part-of-speech (POS) tagging works and how it can help us
• Show some more tricks from the scikit-learn toolbox that come in handy from time to time
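To give a first feel for the Naive Bayes idea named in the roadmap, here is a minimal sketch in plain Python. It is not the implementation this chapter develops: the toy tweets, the labels, and the helper names `train_naive_bayes` and `predict` are all invented for illustration, and it assumes simple whitespace tokenization with add-one smoothing. In practice one would more likely reach for scikit-learn's `MultinomialNB`.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and words per class (hypothetical helper)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()  # assumption: whitespace tokenization
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def predict(text, class_counts, word_counts, vocabulary):
    """Return the class with the highest log-posterior, using add-one smoothing."""
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        # Start with the log prior of the class.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # skip words never seen during training
            # Add the smoothed log likelihood of the word given the class.
            smoothed = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        scores[label] = score
    return max(scores, key=scores.get)


# Toy data, invented for illustration only.
tweets = [
    "love this phone",
    "great launch today",
    "hate the new update",
    "awful battery life",
]
sentiments = ["positive", "positive", "negative", "negative"]

model = train_naive_bayes(tweets, sentiments)
print(predict("love the launch", *model))
```

The sketch works in log space so that multiplying many small probabilities does not underflow, and the smoothing keeps a word that is unseen for one class from zeroing out that class's score.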
