Lexicon-Based Methods for Sentiment Analysis - Simon Fraser ...

Computational Linguistics Volume 37, Number 2

corpus), including a word in the analysis only if it appeared at least five times in both corpora. That gave us a collection of 483 commonly occurring adjectives; these, as well as our intensifier dictionary, were the focus of our evaluation. We also investigated a set of nouns chosen using the same rubric. In most cases, the results were comparable to those for adjectives; however, with a smaller test set (only 184 words), the data were generally messier, and so we omit the details here.
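The frequency filter described in the paragraph above can be sketched in a few lines. This is a toy illustration with made-up tokens, not the actual corpus data:

```python
from collections import Counter

def common_words(corpus_a, corpus_b, min_count=5):
    """Return words appearing at least `min_count` times in BOTH corpora.

    corpus_a / corpus_b: iterables of tokens (here, candidate adjectives).
    """
    counts_a = Counter(corpus_a)
    counts_b = Counter(corpus_b)
    return sorted(
        w for w in counts_a
        if counts_a[w] >= min_count and counts_b[w] >= min_count
    )

# Toy illustration (invented tokens, not the paper's data):
a = ["good"] * 6 + ["fine"] * 5 + ["rare"] * 2
b = ["good"] * 7 + ["fine"] * 5 + ["odd"] * 9
print(common_words(a, b))  # ['fine', 'good']
```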

The basic premise behind our evaluation technique can be described as follows: The distributional spread of answers in a simple, three-way decision task should directly reflect the relative distance of words on an SO (semantic orientation) scale. In particular, we can validate our dictionaries without forcing Turkers to use our 11-point scale (including zero), making their task significantly less onerous as well as less subject to individual bias. We chose to derive the data in two ways: one task where the goal is to decide whether a word is positive, negative, or neutral; and another where two polar words are presented for comparison (Is one stronger, or are they the same strength?). In the former task, we would expect to see more “errors” — that is, cases where a polar term in our dictionary is labeled neutral — for words that are in fact more nearly neutral (SO value 1 versus SO value 5). Similarly, in the second task we would expect to see more “equal” judgments of words which are only 1 SO unit apart than of those which are 3 SO units apart. More formally, we predict that a good ranking of words subjected to this testing should have the following characteristics:

- The percentage of “equal/neutral” judgments should be maximized at exactly the point we would expect, given the SO value assigned to the word(s) in question. The number of polar responses should be relatively balanced.

- As we move away from this point (SO = 0) in either direction, we would expect a linear increase in the appropriate polar response, and a corresponding drop in the other responses.

- The percentage of polar responses may hit a maximum, after which we would expect a consistent distribution (within a small margin of error; this maximum might not be 100%, due to noise in the data).
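The characteristics above amount to a simple piecewise-linear response model; the sketch below uses an illustrative slope and ceiling, not values fitted by the authors:

```python
def predicted_rates(so, slope=0.2, ceiling=0.9):
    """Predicted three-way response distribution for a word with semantic
    orientation `so` on the [-5, 5] scale: neutral responses peak at
    SO = 0, the matching polar response rises linearly with |SO| and
    plateaus at `ceiling` (< 1.0, reflecting annotator noise).
    `slope` and `ceiling` are illustrative placeholders."""
    pos = min(max(slope * so, 0.0), ceiling)
    neg = min(max(-slope * so, 0.0), ceiling)
    neu = 1.0 - pos - neg
    return {"positive": pos, "neutral": neu, "negative": neg}

# Inspect the predicted shape across the SO range:
for so in (-5, -1, 0, 1, 5):
    print(so, predicted_rates(so))
```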

For the first task, that is, the neutral/negative/positive single-word decision task, we included neutral words from our Epinions corpus which had originally been excluded from our dictionaries. Purely random sampling would have resulted in very little data from the high-SO end of the spectrum, so we randomly selected first by SO (neutral SO = 0) and then by word, re-sampling from the first step if we had randomly selected a word that had been used before. In the end, we tested 400 adjectives chosen using this method. For each word, we solicited six judgments through Mechanical Turk.
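The two-step sampling procedure described above can be sketched as follows; the `words_by_so` mapping and the function itself are an assumed reconstruction for illustration, not the authors' actual script:

```python
import random

def stratified_sample(words_by_so, n, seed=0):
    """Sample `n` distinct words by first drawing an SO stratum uniformly,
    then a word within it, re-drawing whenever the word was already
    selected.  This avoids under-sampling the sparse high-SO end that
    purely random word sampling would produce.

    words_by_so: dict mapping SO value -> list of candidate words.
    """
    total = len({w for ws in words_by_so.values() for w in ws})
    assert n <= total, "cannot sample more distinct words than exist"
    rng = random.Random(seed)
    so_values = list(words_by_so)
    chosen = set()
    while len(chosen) < n:
        so = rng.choice(so_values)          # step 1: pick an SO value
        word = rng.choice(words_by_so[so])  # step 2: pick a word with that SO
        if word not in chosen:              # else re-sample from step 1
            chosen.add(word)
    return chosen
```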

Preparing data for the word comparison task was slightly more involved, because we did not want to remove words from consideration just because they had been used once. Note that we first segregated the words by polarity: We compared positive words with positive words and negative words with negative words. This first test yielded a nice wide range of comparisons by randomly selecting first by SO and then by word, as well as by lowering the probability of picking high (absolute) SO words and discounting words which had been used recently (in early attempts we saw high absolute-SO words like great and terrible appearing over and over again, sometimes in consecutive queries). Though the odds of this occurring were low, we explicitly disallowed duplicate

