12.07.2015 Views

Mining Big Data - Department of Mathematics - University of ...


could be valuable to farmers, who would be able to avoid “blanket spraying” pesticides because they would know the type and location of insects.

Stone Age Data

Among Keogh’s collaborators is Sang-Hee Lee, an associate professor of anthropology. She met Keogh in 2001, when they both arrived at UCR. He talked to her about collaborating by using data mining algorithms to classify objects, such as arrowheads and petroglyphs.

She was hesitant.

“I never thought in my life that I’d be talking about big data because I do fossils,” Lee says.

But he was persuasive. She opted in. It was a good move.

In 2008, they were awarded a grant of more than $800,000 from the National Science Foundation.

Currently, Lee and one of her Ph.D. students, Jessica Cade, are using Keogh’s algorithms to examine Clovis fluted projectile points, which resemble arrowheads. They were found throughout North America and are believed to be 12,000 to 14,000 years old.

With help from Keogh’s algorithms, they are looking to decipher subtle differences in the shape of the stone tools. This is done by turning the outline of each tool into a set of 150,000 data points, which are represented by a line graph. Studying series of these line graphs can show slight variations in the shapes.

By studying differences in the shapes, they are trying to determine whether the Clovis tool technology was spread by a population of people migrating or by a group of previous settlers coming into contact with a group of Clovis people, seeing the Clovis design and bringing it back to their home base.

This is a significant question in anthropology.
If their hypothesis that variation in stone tools would be high in the case of cultural transmission is true, it could help answer a longtime debate about when and how the Americas were first settled.

Faster Obsolescence

Wessler, the professor who early in her career was called a “database predator,” says today the people in her lab spend 80 percent of their time on computers and 20 percent in a traditional “wet” lab. Twenty years ago, those numbers were flipped.

That flip has occurred because genetics is rapidly changing due to advancing technology.

For example, in 2000, after 10 years of work, scientists announced they had sequenced the more than 3 billion base pairs of the human genome. Today, Wessler says, that project would take one week.

At UCR, in 2008, the campus received its first so-called next generation DNA sequencer, says Glenn Hicks, the academic administrator of the Institute for Integrative Genome Biology and an associate research plant cell biologist.

In 2010, when the university upgraded to a newer instrument, that original sequencer was essentially obsolete. The new refrigerator-size machine, which is used by more than 35 labs on campus, ranging from psychology to bioengineering to entomology, can process nearly 10 times as much data up to 25 percent faster than the original one, Hicks said.

“It really is revolutionary,” Hicks says. “And the revolution is technology driven.”

For Wessler, the impact is being felt with her research on transposable elements, that abundant component of genomes that may help plants and animals adapt to a changing environment.

Scientists now recognize that transposable elements, once thought of as “junk DNA,” play vital roles, from guiding developmental processes to contributing to correct gene regulation.
This is largely due to the ability to identify and characterize transposable elements through genome sequencing.

Wessler focuses on active transposable elements, also known as “jumping genes,” because they can move from one location in the genome to another. Active transposable elements generate genetic diversity, the raw material for evolution and adaptation, which could allow some members of a population to survive, for example, with less water or in a colder region.

“This could guarantee, in the case of a catastrophe, the survival of an organism,” Wessler says.

Excitement about the possibilities of harnessing big data is being felt across campus.

As papers are published and word spreads about the mysteries to be unlocked by data mining, researchers in disciplines that were rarely considered data-driven are using this tool to consider problems with fresh eyes. As anthropologist Sang-Hee Lee put it: “This is opening new horizons for anthropology research. We think we have nothing to do with big data, but look, from one stone tool, there’s big data. Now it’s time to apply that big data into answering really interesting questions.

“That work is just starting,” she says.

To read more about Sue Wessler, Eamonn Keogh and other UCR technology innovators, visit PROMISE.UCR.EDU

UCR Spring 2012 | 13
