Lexicon-Based Methods for Sentiment Analysis - Simon Fraser ...
Computational Linguistics Volume 37, Number 2
corpus), including a word in the analysis only if it appeared at least five times in both corpora. That gave us a collection of 483 commonly occurring adjectives; these, as well as our intensifier dictionary, were the focus of our evaluation. We also investigated a set of nouns chosen using the same rubric. In most cases, the results were comparable to those for adjectives; however, with a smaller test set (only 184 words), the data were generally messier, and so we omit the details here.
The basic premise behind our evaluation technique can be described as follows: The distributional spread of answers in a simple, three-way decision task should directly reflect the relative distance of words on an SO (semantic orientation) scale. In particular, we can validate our dictionaries without forcing Turkers to use our 11-point scale (including zero), making their task significantly less onerous as well as less subject to individual bias. We chose to derive the data in two ways: one task where the goal is to decide whether a word is positive, negative, or neutral; and another where two polar words are presented for comparison (Is one stronger, or are they the same strength?). In the former task, we would expect to see more "errors," that is, cases where a polar term in our dictionary is labeled neutral, in words that are in fact more neutral (SO value 1 versus SO value 5). Similarly, in the second task we would expect to see more "equal" judgments of words which are only 1 SO unit apart than those which are 3 SO units apart. More formally, we predict that a good ranking of words subjected to this testing should have the following characteristics:
- The percentage of "equal/neutral" judgments should be maximized at exactly the point we would expect, given the SO value assigned to the word(s) in question. The number of polar responses should be relatively balanced.
- As we move away from this point (SO = 0) in either direction, we would expect a linear increase in the appropriate polar response, and a corresponding drop in the other responses.
- The percentage of polar responses may hit a maximum, after which we would expect a consistent distribution (within a small margin of error; this maximum might not be 100%, due to noise in the data).
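These predictions can be illustrated with a simple piecewise-linear response model. The functional form, slope, and plateau below are our own illustrative assumptions, not parameters taken from the evaluation itself:

```python
def expected_responses(so, slope=0.18, plateau=0.9):
    """Hypothetical piecewise-linear model of the three response rates
    for a word with semantic orientation `so` on the -5..5 scale.
    Returns (p_negative, p_neutral, p_positive)."""
    # The appropriate polar response grows linearly with |SO|,
    # then hits a plateau (which may be below 100% due to noise).
    dominant = min(plateau, slope * abs(so))
    # "Equal/neutral" judgments are maximized at SO = 0 and drop
    # as the polar response rises.
    neutral = 1.0 - dominant
    # Opposite-polarity responses are idealized as zero here.
    other = 0.0
    if so >= 0:
        return (other, neutral, dominant)
    return (dominant, neutral, other)

# Neutral responses peak at SO = 0; positive responses rise with SO.
for so in (0, 1, 3, 5):
    print(so, expected_responses(so))
```

Under this sketch, a word at SO = 5 draws the plateau rate of positive judgments, while a word at SO = 1 still draws mostly "neutral" answers, matching the expectation that near-neutral polar terms produce more apparent "errors."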
For the first task, that is, the neutral/negative/positive single word decision task, we included neutral words from our Epinions corpus which had originally been excluded from our dictionaries. Purely random sampling would have resulted in very little data from the high SO end of the spectrum, so we randomly selected first by SO (neutral SO = 0) and then by word, re-sampling from the first step if we had randomly selected a word that had been used before. In the end, we tested 400 adjectives chosen using this method. For each word, we solicited six judgments through Mechanical Turk.
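The two-stage sampling just described (draw an SO value, then a word with that value, re-drawing from the first stage on a repeat) can be sketched as follows; the dictionary contents here are hypothetical:

```python
import random

def sample_words(so_dict, n):
    """Two-stage sampling: stage 1 picks an SO value uniformly, stage 2
    picks a word with that value; if the word was already used, we
    re-sample from stage 1. so_dict maps SO value -> list of words."""
    used, chosen = set(), []
    so_values = list(so_dict)
    while len(chosen) < n:
        so = random.choice(so_values)       # stage 1: sample an SO value
        word = random.choice(so_dict[so])   # stage 2: sample a word at that SO
        if word in used:
            continue                        # repeat: re-sample from stage 1
        used.add(word)
        chosen.append((word, so))
    return chosen

# Hypothetical toy dictionary spanning the SO scale; uniform sampling over
# SO values prevents the rare high-SO words from being drowned out.
so_dict = {-5: ["terrible"], 0: ["beige", "annual"], 5: ["great"]}
print(sample_words(so_dict, 3))
```

Sampling SO values first gives each band of the scale equal weight, which is why this yields usable data from the sparse high-SO end where purely word-level random sampling would not.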
Preparing data for the word comparison task was slightly more involved, because we did not want to remove words from consideration just because they had been used once. Note that we first segregated the words by polarity: we compared positive words with positive words and negative words with negative words. We obtained a wide range of comparisons for this first test by randomly selecting first by SO and then by word, as well as by lowering the probability of picking high (absolute) SO words and discounting words which had been used recently (in early attempts we saw high absolute SO words like great and terrible appearing over and over again, sometimes in consecutive queries). Though the odds of this occurring were low, we explicitly disallowed duplicate