12.07.2015

Big Ideas & Big Data - Faculty of Computer Science - Dalhousie ...

Computer Science

Q & A with Stan Matwin

What is your line of research?

I work in Machine Learning and Data Mining. Machine Learning is a research area in which a computer is given examples of something (e.g. what is and what isn’t an oil spill in a satellite image of the sea) and, from these examples, learns how to classify or predict new examples of that ‘something’ (e.g. to recognize oil spills in new, unseen images). This is an old idea, dating back to the 1950s, and it was part of the original Artificial Intelligence manifesto. Everybody agrees that learning is an inherent part of intelligence, but I like to see it more pragmatically. I am interested in using learning programs to learn practical things: to predict who in the emergency room will need hospitalization, to recognize oil spills, to categorize medical articles, or to catch emerging trends in a political campaign or in public opinion.

Of particular interest to me is learning from text data: papers, blogs, tweets, notes, etc. I believe that such data calls for methods that take its linguistic character into account: we will have stronger methods if they understand the lexical, syntactic and semantic character of such data. This is the main topic of my Canada Research Chair here at Dal.

Data Mining, for me, is Machine Learning in the large. First, one is dealing with large data sets: millions of records and terabytes of volume. Second, in data mining it is recognized that one spends most of one’s effort not in the “model building” phase but in the data cleaning and data preparation phase (e.g. doing “attribute engineering”).

To do this, the data miner must learn the basics of the domain the data comes from: they have to build in their head a fundamental “ontology” of that domain, that is, what the main entities are and what the relationships between those entities are.

I am also interested in data privacy. I work on methods that make it hard, or practically impossible, to identify a given person in a dataset.

How did you get interested in that?

Well, it all started many years ago when I was involved in one of the early projects in Expert Systems (ES), a joint project with Cognos. At that time, we were trying to build an ES that would process (or assist in processing) government travel claims. I got to learn more than I ever wished about that “fascinating” topic! One question that arose was: how does one acquire the rules that form the knowledge base of an expert system? Somebody suggested that I look at Machine Learning; indeed, one of its early goals was to replace the classical “Knowledge Acquisition” approach with learning the rules from examples. I went to spend a sabbatical with one of the
