Thematic Sessions

ST4: Econometrics

Chair: Marcelo Fernandes

Thursday, September 5th

10:30

A novel framework for semi-automatic text classification

Rodrigo Targino

FGV/RJ

The Economic Policy Uncertainty (EPU) index, proposed by Baker et al. (2016), is computed as the proportion of news articles related to economic policy uncertainty. In the original paper, the classification of a news article as EPU-related is performed automatically: if the article contains a set of specific words, it is assigned to the topic. Motivated by the goal of a deeper understanding of the drivers of the EPU index, we propose a semi-automatic algorithm for text classification and analysis. Given a set of manually classified documents, we are able to (1) infer the time series of the importance of each word for the specific classification and (2) compute the probability that an unclassified document belongs to the category. As an example, we apply the algorithm to millions of Brazilian news articles, of which a few hundred are manually classified as EPU-related or not.
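The contrast drawn in the abstract, between a fixed keyword rule and a classifier learned from a few hand-labelled articles, can be illustrated with a small sketch. Everything below is an assumption made for illustration: the term sets are shortened English placeholders rather than the lists used for any country, the toy articles are invented, and a multinomial naive Bayes model with Laplace smoothing stands in for the algorithm of the talk, which the abstract does not specify. The per-word log-odds is only a static stand-in for "word importance"; the time-series aspect is not covered.

```python
import math
import re
from collections import Counter

# Hypothetical, shortened term sets (placeholders, not the published lists).
ECONOMY = {"economy", "economic"}
UNCERTAINTY = {"uncertain", "uncertainty"}
POLICY = {"congress", "deficit", "legislation", "regulation"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_rule(text):
    """Fully automatic rule: flag the article if every term set is matched."""
    tokens = set(tokenize(text))
    return all(tokens & terms for terms in (ECONOMY, UNCERTAINTY, POLICY))


def fit_naive_bayes(docs, labels):
    """Collect per-class word counts from manually labelled documents.

    Assumes both classes (True and False) appear at least once in `labels`.
    """
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter(labels)
    for text, label in zip(docs, labels):
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, class_counts, vocab


def word_log_prob(word, label, model):
    """Laplace-smoothed log-probability of `word` under class `label`."""
    word_counts, _, vocab = model
    total = sum(word_counts[label].values())
    return math.log((word_counts[label][word] + 1) / (total + len(vocab)))


def word_log_odds(word, model):
    """Positive values mean the word points towards the EPU class."""
    return word_log_prob(word, True, model) - word_log_prob(word, False, model)


def prob_epu(text, model):
    """Posterior probability that an unlabelled document is EPU-related."""
    _, class_counts, vocab = model
    n_docs = sum(class_counts.values())
    log_post = {}
    for label in (True, False):
        score = math.log(class_counts[label] / n_docs)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += word_log_prob(tok, label, model)
        log_post[label] = score
    # Normalise in log space for numerical stability.
    top = max(log_post.values())
    norm = sum(math.exp(v - top) for v in log_post.values())
    return math.exp(log_post[True] - top) / norm


# Invented toy corpus standing in for the manually classified articles.
docs = [
    "congress deficit fight leaves the economy uncertain",
    "uncertainty over new regulation clouds the economic outlook",
    "local team wins the football championship",
    "new restaurant opens downtown this weekend",
]
labels = [True, True, False, False]

model = fit_naive_bayes(docs, labels)
p_policy = prob_epu("economic uncertainty grows as congress debates the deficit", model)
p_sports = prob_epu("football team opens the season this weekend", model)
print(round(p_policy, 3), round(p_sports, 3))
```

Unlike `keyword_rule`, which returns a yes/no answer from a fixed list, `prob_epu` returns a probability and `word_log_odds` shows which words drive it, which is the kind of diagnostic the abstract motivates.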
