Thematic Sessions
ST4: Econometrics
Chair: Marcelo Fernandes
Thursday, September 5th
10:30
A novel framework for semi-automatic text classification
Rodrigo Targino
FGV/RJ
The Economic Policy Uncertainty (EPU) index, proposed by Baker et al. (2016), is computed as the
proportion of news articles related to economic policy uncertainty. In the original paper, the
classification of a news article as EPU related is performed automatically: if the article contains
a set of specific words, it is assigned to the topic. Motivated by the need for a deeper
understanding of the drivers of the EPU index, we propose a semi-automatic algorithm for text
classification and analysis. Given a set of manually classified documents, we are able to (1) infer
the time series of the importance of each word for the classification and (2) compute the
probability that an unclassified document belongs to the category. As an example, we apply the
algorithm to millions of Brazilian news articles, of which a few hundred were manually classified
as EPU related or not.
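
To make the two classification routes concrete, the following is a minimal Python sketch, not the algorithm presented in the talk. It assumes that the automatic rule flags an article when it contains at least one term from each of three word lists, and it uses a naive Bayes classifier with add-one smoothing as a simple stand-in for computing the probability that an unclassified document is EPU related. The term lists, the function names and the choice of naive Bayes are all illustrative assumptions; the abstract does not specify them, and the time series of word importance is not attempted here.

```python
import math
import re
from collections import Counter

# Hypothetical term lists for illustration only; the lists actually used
# in the talk are not given in the abstract.
ECONOMY = {"economy", "economic"}
POLICY = {"congress", "deficit", "regulation", "legislation"}
UNCERTAINTY = {"uncertain", "uncertainty"}


def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_rule(text):
    """Automatic rule: at least one term from each of the three lists."""
    words = set(tokenize(text))
    return bool(words & ECONOMY) and bool(words & POLICY) and bool(words & UNCERTAINTY)


def epu_share(articles):
    """Proportion of articles flagged by the keyword rule."""
    return sum(keyword_rule(a) for a in articles) / len(articles)


def train_naive_bayes(labeled):
    """Collect word counts per class from (text, is_epu) pairs.

    Naive Bayes is an assumed stand-in, not the authors' method.
    """
    counts = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, label in labeled:
        counts[label].update(tokenize(text))
        docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, docs, vocab


def prob_epu(text, model):
    """Posterior probability that an unlabeled text is EPU related."""
    counts, docs, vocab = model
    log_post = {}
    for label in (True, False):
        total = sum(counts[label].values())
        # Class prior with add-one smoothing so neither class has zero mass.
        lp = math.log((docs[label] + 1) / (sum(docs.values()) + 2))
        for w in tokenize(text):
            if w in vocab:  # ignore words never seen in the labeled set
                lp += math.log((counts[label][w] + 1) / (total + len(vocab)))
        log_post[label] = lp
    # Normalize in log space for numerical stability.
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return math.exp(log_post[True] - m) / z
```

A usage sketch under the same assumptions: calling `train_naive_bayes` on a small list of `(text, is_epu)` pairs returns a model that `prob_epu` can apply to any unlabeled article, while `epu_share` gives the keyword-based proportion for a batch of articles (for example, all articles published in one month).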