12.09.2013

Programme booklet (pdf)



POSTER ABSTRACTS

Abstract

"Pattern", a web mining module for Python

De Smedt, Tom and Daelemans, Walter

CLiPS, University of Antwerp

"Pattern" is a mash-up package for the Python programming language that bundles fast, regular-expression-based functionality for natural language processing (NLP) and data-mining tasks. It consists of the following modules:

1) pattern.web: provides easy access to Google, Yahoo, Bing, Twitter, Wikipedia, Flickr and RSS feeds, plus a robust HTML DOM parser.

2) pattern.en: tools for verb inflection, noun pluralization/singularization, a WordNet interface, and a fast tagger/chunker based on regular expressions.

3) pattern.table: tools for working with datasheets (e.g. MS Excel) and CSV files.

4) pattern.search: regular expressions for syntax and semantics. For example, "BRAND|NP VP JJ+" matches any sentence in which a noun phrase containing a brand name is followed by a verb phrase and then by one or more adjectives, e.g. "the new iPhone will be amazing" or "Doritos taste cheesy".

5) pattern.vector: corpus tools for tf-idf, cosine similarity, vector space search and LSA (latent semantic analysis).

6) pattern.graph: for exploring graphs and semantic networks.
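The kind of matching described for pattern.search can be approximated with Python's standard `re` module. The sketch below is not the pattern.search API: it assumes each sentence has already been tagged and chunked into `(label, words)` pairs by some other tool, uses a hypothetical `BRANDS` lexicon, and rewrites the constraint `BRAND|NP VP JJ+` as a regex over a space-separated string of chunk labels.

```python
import re

# Hypothetical brand lexicon (lower-case); not Pattern's own taxonomy.
BRANDS = {"iphone", "doritos"}

# Regex over a space-separated string of labels, approximating
# "BRAND|NP VP JJ+": a brand-bearing NP, a VP, then one or more JJ.
PATTERN = re.compile(r"\bBRAND_NP VP(?: JJ)+\b")


def labels(chunks):
    """Turn (label, words) chunks into one label each.

    An NP containing a word from BRANDS is relabelled BRAND_NP so that
    the regex can require a brand name inside the noun phrase.
    """
    out = []
    for label, words in chunks:
        if label == "NP" and any(w.lower() in BRANDS for w in words):
            out.append("BRAND_NP")
        else:
            out.append(label)
    return out


def matches(chunks):
    """True if the chunk sequence contains the pattern anywhere."""
    return PATTERN.search(" ".join(labels(chunks))) is not None


# Sentences assumed to be pre-tagged and chunked by some other tool.
iphone = [("NP", ["the", "new", "iPhone"]), ("VP", ["will", "be"]), ("JJ", ["amazing"])]
doritos = [("NP", ["Doritos"]), ("VP", ["taste"]), ("JJ", ["cheesy"])]
weather = [("NP", ["the", "weather"]), ("VP", ["is"]), ("JJ", ["nice"])]

print(matches(iphone), matches(doritos), matches(weather))  # True True False
```

Collapsing each chunk to a single label keeps the regex simple; a fuller matcher would also constrain individual words and tags inside each chunk.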
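The tf-idf weighting, cosine similarity and vector space search listed for pattern.vector can be sketched in plain Python. This assumes one common tf-idf variant (term frequency normalized by document length, idf = log(N/df)); Pattern's own weighting and API may differ, and the function names `tf_idf`, `cosine` and `most_similar` are illustrative only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    A term occurring in every document gets weight 0.
    """
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(i, vectors):
    """Rank all other documents by cosine similarity to document i."""
    scores = [(cosine(vectors[i], v), j) for j, v in enumerate(vectors) if j != i]
    return sorted(scores, reverse=True)


docs = [
    "the cat sat on the mat".split(),
    "the cat ate the fish".split(),
    "stock markets fell sharply today".split(),
]
vectors = tf_idf(docs)
print(most_similar(0, vectors))  # document 1 ranks above document 2
```

Storing vectors as sparse dicts keeps the dot product proportional to the number of non-zero terms, which matters for large vocabularies; LSA would additionally reduce these vectors to a lower-dimensional space before comparing them.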

The package can be used and extended for harvesting online data, opinion mining, building semantic networks using a machine learning approach, and so on.

Corresponding author: tom.desmedt@ua.ac.be

