Debt Analysts' Views of Debt-Equity Conflicts of Interest


manual coding (positive, neutral, or negative) for each extraction.14 We then use the text and the manually coded tone from the training dataset to generate a Naïve Bayes model that predicts the analyst's tone conditional on the words in the text extraction.

Step 4. We use the model to generate predictions of the tone for all the text extractions in our sample. This prediction equals negative one if the Naïve Bayes algorithm classifies the text extraction as negative, zero if it is classified as neutral, and one if it is classified as positive. To determine the accuracy of the Naïve Bayes algorithm, we use a procedure similar to that of Li (2010) and Antweiler and Frank (2004) to calculate both the "within-sample" and "out-of-sample" accuracies of the classifications by comparing the algorithmic classifications with the manual classifications (from Step 3). To calculate the out-of-sample accuracy, we randomly partition the manually coded training dataset into two parts. One part is used to train and estimate the model, while the other is used to test the classification accuracy. For expositional purposes, in Table 1, Panel A we present the results of partitioning the sample into two equal halves (50% each). For example, consider the first row of Panel A: of a total of 262 manually classified negative text extractions, 124 were correctly classified as negative by the model, an accuracy rate of 47%. The out-of-sample accuracy is 77% for neutral tone extractions, 31% for positive tone extractions, and 67% across all extractions. To evaluate the within-sample accuracy, we use the entire manually coded training dataset to estimate the Naïve Bayes model and then compare the algorithmic classifications with the manual classifications. Panel B of Table 1 presents these within-sample accuracies, which, as expected, are generally higher than the out-of-sample accuracies.
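The procedure described in Steps 3 and 4 can be sketched as follows. This is a minimal illustration, not the authors' code: the toy texts and tone labels are hypothetical, and it uses scikit-learn's multinomial Naïve Bayes with a bag-of-words representation as one standard way to implement the kind of classifier described above, including the 50/50 out-of-sample partition and the within-sample check.

```python
# Sketch of Naive Bayes tone classification with out-of-sample and
# within-sample accuracy checks (hypothetical data, not the paper's).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Manually coded training data: text extractions with tone labels
# (-1 = negative, 0 = neutral, 1 = positive), as in Step 3.
texts = [
    "credit metrics deteriorated and leverage rose sharply",
    "the issuer maintained stable cash flows this quarter",
    "strong free cash flow supports rapid deleveraging",
    "covenant headroom is shrinking amid weak earnings",
    "guidance was unchanged and in line with expectations",
    "the upgrade reflects improved liquidity and lower debt",
] * 10  # repeated so the random split has enough observations
tones = [-1, 0, 1, -1, 0, 1] * 10

# Out-of-sample accuracy: randomly partition the coded data into two
# equal halves; train on one, test on the other (cf. Table 1, Panel A).
X_train, X_test, y_train, y_test = train_test_split(
    texts, tones, test_size=0.5, random_state=0)

vectorizer = CountVectorizer()
model = MultinomialNB()
model.fit(vectorizer.fit_transform(X_train), y_train)
pred = model.predict(vectorizer.transform(X_test))

# Per-class accuracy (diagonal of the confusion matrix over row totals)
# and overall accuracy across all test extractions.
cm = confusion_matrix(y_test, pred, labels=[-1, 0, 1])
per_class = cm.diagonal() / cm.sum(axis=1)
overall = cm.diagonal().sum() / cm.sum()

# Within-sample accuracy: estimate the model on the entire coded
# dataset, then classify that same dataset (cf. Table 1, Panel B).
full_X = vectorizer.fit_transform(texts)
model.fit(full_X, tones)
within = (model.predict(full_X) == np.array(tones)).mean()
```

As in the paper's Table 1, `per_class` gives one accuracy rate per tone category and `overall` the accuracy across all extractions; the within-sample figure is generally at least as high as the out-of-sample one because the model is evaluated on the data it was fit to.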

14 We use the services of a team of ten senior-year business school undergraduate and MBA research assistants to complete this step. The task is carried out in a standardized manner with the help of a process document we developed. Given the subjective nature of the task, we engage two different research assistants to independently code each text extraction. We then engage a third research assistant to reconcile the differences when they arise. Untabulated analyses indicate that the accuracy of our model using this "three RA" coding approach is similar to an alternative approach in which we limit the sample to those text extractions in which the first two RAs coding the extraction agreed on the classification.
