01.04.2015 Views



Classification II – Sentiment Analysis

For companies, it is vital to closely monitor the public reception of key events such as product launches or press releases. With easy, real-time access to user-generated content on Twitter, it is now possible to do sentiment classification of tweets. Sometimes also called opinion mining, sentiment analysis is an active field of research in which several companies are already selling their products. As this shows that a market obviously exists, we have the motivation to use the classification muscles we built in the previous chapter to build our own home-grown sentiment classifier.

Sketching our roadmap

Sentiment analysis of tweets is particularly hard because of Twitter's size limitation of 140 characters. This leads to a special syntax, creative abbreviations, and rarely well-formed sentences. The typical approach of analyzing sentences, aggregating their sentiment information per paragraph, and then calculating the overall sentiment of a document therefore does not work here.

Clearly, we will not try to build a state-of-the-art sentiment classifier. Instead, we want to:

• Use this scenario as a vehicle to introduce yet another classification algorithm: Naive Bayes
• Explain how Part Of Speech (POS) tagging works and how it can help us
• Show some more tricks from the scikit-learn toolbox that come in handy from time to time
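
To give a first flavor of the Naive Bayes idea named in the roadmap, here is a minimal sketch in plain Python. It is not the classifier we will build in this chapter: the toy tweets are invented purely for illustration, the helper names `train_naive_bayes` and `predict` are hypothetical, and we assume the simplest possible setup (whitespace tokenization, word counts, add-one smoothing).

```python
import math
from collections import Counter, defaultdict

# Toy, made-up "tweets" purely for illustration (not real data).
TRAIN = [
    ("i love this phone", "pos"),
    ("great launch love it", "pos"),
    ("awesome product great job", "pos"),
    ("i hate this phone", "neg"),
    ("terrible launch hate it", "neg"),
    ("awful product bad job", "neg"),
]


def train_naive_bayes(samples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(text, model):
    """Return the class with the highest log-posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        # Log prior: how frequent is this class in the training data?
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            if word not in vocab:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids log(0) for words
            # that never occurred in this class.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train_naive_bayes(TRAIN)
prediction_pos = predict("love the great phone", model)
prediction_neg = predict("hate the awful launch", model)
```

The "naive" part is the assumption that each word contributes to the score independently of all others, which is why the per-word log probabilities can simply be added up.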
