Big Ideas & Big Data - Faculty of Computer Science - Dalhousie ...

Computer Science

Q & A with Stan Matwin

What is your line of research?

I work in Machine Learning and Data Mining. Machine Learning is a research area in which a computer is given examples of something (e.g. what is and what isn’t an oil spill in a satellite image of the sea) and, from these examples, it learns how to classify or predict new examples of that ‘something’ (e.g. to recognize oil spills in new, unseen images). This is an old idea, dating back to the 1950s, and it was part of the original Artificial Intelligence manifesto. Everybody agrees that learning is an inherent part of intelligence, but I like to see it more pragmatically. I am interested in the use of learning programs to learn practical things: to predict who in the emergency room will need hospitalization, to recognize oil spills, to categorize medical articles or to catch emerging trends in a political campaign or in public opinion.

Of particular interest for me is learning from text data: papers, blogs, tweets, notes, etc. I believe that such data calls for methods that take into account its linguistic character – we will have stronger methods if they understand the lexical, syntactic and semantic character of such data. This is the main topic of my Canada Research Chair here at Dal.

Data Mining, for me, is Machine Learning in the large. First, one is dealing with large data sets in millions of records and terabytes of volume. Second – in data mining – it is recognized that one spends most of one’s effort not in the “model building” phase, but instead in the data cleaning and data preparation phase (e.g. doing “attribute engineering”). In order to do this, the data miner must learn the basics of the domain from which the data is coming: they will have to create in their head a fundamental “ontology” of that domain: what are the main entities and what are the relationships between those entities.

I am also interested in data privacy. I work on methods that make it hard, or practically impossible, to identify a given person in a dataset.

How did you get interested in that?

Well, it all started many years ago when I was involved in one of the early projects in Expert Systems (ES), a joint project with Cognos. At that time, we were trying to build an ES that would process (or assist in processing) government travel claims. I got to learn more than I ever wished about that “fascinating” topic! A question which arose was, how does one acquire rules which form the knowledge base of an expert system? Somebody suggested that I look at Machine Learning – indeed, one of its early goals was to replace the classical “Knowledge Acquisition” approach with learning the rules from examples. I went to spend a sabbatical with one of the leading centres of Machine Learning at that time, George Mason University in Virginia, and I caught the bug. I liked the fact that Machine Learning was drawing on a variety of disciplines (AI, logic, databases and statistics) to build its tools. I also liked the fact that it was directly applicable almost everywhere.

I am always interested in applications – they are an opportunity to learn about something completely new, from neuro-ophthalmology to forestry to electronic components (to name a few applications I was involved in). Applications also attract students and, last but not least, research funds. Done well, they often present a general research problem that can be shared with the community and initiate a new line of research. That happened with our work on oil spill detection with R.C. Holte and M. Kubat, which opened the active field of learning from imbalanced data.

My interest in data privacy is a little different. I am concerned about the fact that modern computers may become a tool that can be used to breach and violate people’s privacy more easily and on a much larger scale than was possible, say, 30 years ago. I believe that since the computer research community invented the tools that make it possible – databases, the internet, image and voice recognition, barcodes, etc. – it is then our moral obligation to at least think about tools that would make privacy easier and that would avoid many privacy-averse incidents.

What do you hope to achieve in the next five years?

I have several goals. First and foremost, I hope to create – together with colleagues from Dal – an active, dynamic centre of excellence in our joint field of research, which we call Big Data Analytics. We have recently created the Institute for Big Data Analytics to focus research on this area. The Institute will attract talent, ideas and applications, and will make Dalhousie a globally visible centre for this type of research. We’re getting a very powerful computer, IBM Netezza, a unique machine not only here but on campuses generally, which will provide an excellent infrastructure for Big Data applications.

At the research level, I hope to make inroads into a linguistically informed but still scalable text model (“representation”). I want to complete several real-life, deployed applications of data and text mining techniques. I also want to continue with a start-up, DeveraLogic, that I founded several years ago with colleagues in Ottawa in the area of computer security, and to bring it to a fruitful completion.

Who else is involved in this research?

Here at Dal there are several excellent researchers involved in this type of research. My closest collaborators in text analytics will be Dr. Vlado Keselj, Dr. Evangelos Milios and Dr. Mike Shepherd. I will also collaborate with other faculty members at Dal in the areas of visualization, HCI, databases and data structures and privacy: Dr. Raza Abidi, Dr. Dirk Arnold, Dr. Robert Beiko, Dr. Jamie Blustein, Dr. Stephen Brooks, Dr. Qigang Gao, Dr. Kirstie Hawkey, Dr. Andrew Rau-Chaplin, Dr. Derek Reilly, Dr. Thomas Trappenberg, Dr. Carolyn Watters, Dr. Norbert Zeh and Dr. Nur Zincir-Heywood.

In Canada, I cooperate actively with several researchers across the country: Dr. Nick Cercone (former FCS Dean at Dal, now at York), Dr. Fred Popowich at Simon Fraser, Dr. Diana Inkpen, Dr. Nathalie Japkowicz at the University of Ottawa, Dr. Chris Drummond at NRC and Dr. Guy Lapalme at Université de Montréal.

I also plan to continue and further develop my international collaboration, mainly with Brazil where I already have a very active, ongoing cooperation; with France and Spain through Dal’s partnership in the DMKM Erasmus Mundus program; and with my native Poland, where I hold a Professorship with the Academy of Sciences and have many contacts with several leading academic and research centres.

What attracts your interest outside your research area?

I am interested in current affairs and politics – I believe we have to be informed to influence decision makers on matters that concern us. I spend a lot of time reading (online) newspapers in at least three languages – English, French and Polish. I am also an avid reader of literature in these three languages. Classical music is my main hobby – I have a large CD collection, I go to concerts wherever I can, also during my frequent travel. I like hiking and swimming, but I do not do enough of that.
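
For readers unfamiliar with the term: “learning from imbalanced data”, which the interview mentions in connection with the oil spill work, concerns problems where one class is far rarer than the other. The snippet below is only a minimal sketch of why that is hard. It is not taken from the interview or from the research it describes, and the counts (10 spill images among 1,000) and the function names are invented for illustration. It shows that a “classifier” which never reports a spill can look excellent on plain accuracy while finding no spills at all.

```python
# Hypothetical data: 1,000 satellite images, only 10 of which show a spill.
# These counts are assumptions made up for this example.
labels = [1] * 10 + [0] * 990  # 1 = spill, 0 = no spill


def always_no_spill(image):
    """A trivial 'classifier' that ignores its input and never reports a spill."""
    return 0


def accuracy(y_true, y_pred):
    """Fraction of all examples labelled correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of the actual positives (here: spills) that were found."""
    found = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual = sum(1 for t in y_true if t == positive)
    return found / actual if actual else 0.0


# The image content is irrelevant to this classifier, so None stands in for it.
predictions = [always_no_spill(None) for _ in labels]

print(f"accuracy: {accuracy(labels, predictions):.2f}")  # prints "accuracy: 0.99"
print(f"recall:   {recall(labels, predictions):.2f}")    # prints "recall:   0.00"
```

Because the rare class is usually the one that matters, work on imbalanced data tends to look at measures such as recall on that class instead of overall accuracy alone.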
Page 1 and 2: 5 Spring 2013 Computer Science Redefin