Data tools tipsheet - Investigative Reporters and Editors
OpenRefine
http://openrefine.org/
Price: Free
OpenRefine (formerly Google Refine) allows rapid cleaning of data by combining Excel-like
formulas with text faceting and clustering. The downloadable application, which runs in a Web
browser, groups similar words together using multiple algorithms and lets users quickly
standardize names, businesses and other data.
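To give a feel for how this kind of clustering can work, here is a minimal Python sketch of a "fingerprint" key-collision approach, one of the families of methods OpenRefine offers. It is a simplified illustration, not OpenRefine's own code; the function names and the sample names are made up for the example.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key so that near-duplicate spellings collide."""
    # Fold accented characters to their closest ASCII form.
    ascii_text = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Trim, lowercase and drop punctuation.
    cleaned = re.sub(r"[^\w\s]", "", ascii_text.strip().lower())
    # Sort the unique tokens so word order does not matter.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values by fingerprint and keep groups with more than one spelling."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return {key: members for key, members in groups.items() if len(set(members)) > 1}


# Hypothetical sample data for illustration only.
names = ["Acme Corp.", "acme corp", "Corp Acme", "Smith, John", "John Smith", "Widget LLC"]
clusters = cluster(names)
for key, members in clusters.items():
    print(key, "->", members)
```

Each printed group is a set of spellings that probably refer to the same name; a reporter would still review every group before merging it into one standard value.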
MySQL
http://dev.mysql.com/downloads/installer/5.6.html
Price: Free
Although it’s a powerful (and free) tool for building databases, MySQL isn’t particularly
user-friendly. Its open-source community consists mostly of hardcore developers.
Navicat
http://www.navicat.com/
Price: $100
Navicat’s $100 price tag can be worth it if you work with MySQL regularly. It
provides a user-friendly front end for the database service and reduces how much SQL
you need to know. A free trial is available.
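For a sense of the SQL that a graphical front end helps you avoid writing by hand, here is a small Python sketch. It uses the sqlite3 module from Python's standard library as a stand-in so it runs without a database server; MySQL's syntax differs in some details, and the table, columns and values are hypothetical.

```python
import sqlite3

# In-memory database used as a stand-in for a real MySQL server.
conn = sqlite3.connect(":memory:")

# Hypothetical table of campaign donations.
conn.execute("CREATE TABLE donations (donor TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO donations (donor, amount) VALUES (?, ?)",
    [("Acme Corp", 500.0), ("Jane Doe", 250.0), ("Acme Corp", 1500.0)],
)

# Total given by each donor, largest first.
rows = conn.execute(
    "SELECT donor, SUM(amount) AS total "
    "FROM donations GROUP BY donor ORDER BY total DESC"
).fetchall()

for donor, total in rows:
    print(donor, total)

conn.close()
```

The SELECT statement is the part a reporter would otherwise type by hand; a point-and-click query builder assembles a statement like it for you.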
Muse
http://mobisocial.stanford.edu/muse/
Price: Free
This experimental research tool from a Stanford computer scientist was built to help users
browse large email archives. Although it was originally meant for people to browse their own
archives, it’s been adapted to import mailbox files from Outlook and other clients.
OTHER COOL STUFF
Mr. Data Converter
http://shancarter.github.io/mr-data-converter/
Price: Free
This open-source tool, built by Shan Carter, converts Excel data into one of several web-friendly
structured formats, including HTML, JSON and XML.
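The same kind of conversion can be sketched in a few lines of Python. This is not Mr. Data Converter's code; it is a minimal illustration that assumes the data has been copied out of Excel as tab-separated text with a header row, and the sample rows and the function name are hypothetical.

```python
import csv
import io
import json

# Hypothetical rows as they might look when copied out of a spreadsheet.
pasted = "name\tcity\tamount\nAcme Corp\tDenver\t500\nJane Doe\tAustin\t250\n"


def rows_to_json(text: str, delimiter: str = "\t") -> str:
    """Turn delimited text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return json.dumps(list(reader), indent=2)


output = rows_to_json(pasted)
print(output)
```

Every value comes out as a string here, including the amounts; converting numbers or dates would be an extra step.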
Data Science Toolkit
http://www.datasciencetoolkit.org/
Price: Free
This toolkit features a suite of easy-to-use Web apps for doing all kinds of cool things to
data, like converting PDFs to plain text and converting street addresses to coordinates. It also
offers an open API for more advanced users.
Jigsaw
http://www.cc.gatech.edu/gvu/ii/jigsaw/
Price: Free
Another experimental tool born out of academia, this Java application helps users make sense of
large collections of documents with the help of text analysis algorithms. It offers a variety of
ways to look at the documents, from topic clustering to entity extraction.