14.07.2015 Views

Science - Mark S. Boguski


What is Bioinformatics?
Mark Boguski, M.D., Ph.D.
Bio 2001


Origin of the term “Bioinformatics”
• “Informatics” coined in the early 1990s
• Arose in the context of early discussions of electronic publishing
• Definition: Informatics = Information Science & Technology
• Variations:
  • Medical Informatics
  • Bioinformatics (circa 1992)


“Bioinformatics” extends back to the 1960s
J Theor Biol 1965 Jan;8(1):97-112
Computer aids to protein sequence determination.
Dayhoff MO.
-----------------
Sci Am 1969 Jul;221(1):86-95
Computer analysis of protein evolution.
Dayhoff MO.


Bioinformatics in the 1970s
Ann N Y Acad Sci 1974 Nov 29;241(0):439-48
A comparison between evolutionary substitutions and variants in human hemoglobins.
Fitch WM.
-----------------
J Mol Evol 1975 Jun 9;5(1):1-24
Phylogenies from amino acid sequences aligned with gaps: the problem of gap weighting.
Fitch WM, Yasunobu KT.


A watershed event in the early 1980s
Science 1983 Jul 15;221(4607):275-7
Simian sarcoma virus onc gene, v-sis, is derived from the gene (or genes) encoding a platelet-derived growth factor.
Doolittle RF, Hunkapiller MW, Hood LE, Devare SG, Robbins KC, Aaronson SA, Antoniades HN.
-----------------
Nature 1983 Jul 7-13;304(5921):35-9
Platelet-derived growth factor is structurally related to the putative transforming protein p28sis of simian sarcoma virus.
Waterfield MD, Scrace GT, Whittle N, Stroobant P, Johnsson A, Wasteson A, Westermark B, Heldin CH, Huang JS, Deuel TF.


But what is Bioinformatics?
[Diagram labels: Laboratory Information Management Systems; Data management and Analysis; Hypothesis-driven Basic Research]
• Practical definitions
• Value first recognized by industry
• Universities slow to respond with training programs
• Initially not viewed as an academic discipline


“NIH Urged to Train Biologists on Computers”
Headline in The Washington Post, Monday, June 7, 1999
Recommendation of Federal Advisory Panel to NIH Director Varmus: Establish 20 new U.S. centers to teach computer-based biomedical research at a cost of US$8M per center per year.
[Photo: Dr. Harold Varmus]
Why?


“It’s sink or swim as a tidal wave of data approaches”
Nature 399:517, 10 June 1999


The Accelerating Human Genome Project
• Nature (September, 1998)
• Science (October, 1998)
• Nature (March, 1999)
• Science (March, 1999)
[Names shown on the slide: Collins, Waterston, Gibbs, Lander]


The rate at which DNA sequences began accumulating was exponential
[Chart: GenBank sequence entries by year, 1965 to 2001; vertical axis from 0 to 14,000,000. Annotations: “Rapid DNA sequencing invented”, “Human Genome Project begun”. National Library of Medicine]
• Over 12 million sequence entries in GenBank
• Nearly 13 billion bases from ~50,000 species


Survey of the field in 1998


How do we bridge the gap between sequence and function?
[Chart: publications and DNA sequences by year, 1975 to 2000; vertical axis from 0 to 6,000,000. Annotations: “DNA Sequencing Invented”, “Human Genome Project Begun”, “The Gap”]
Science (Genome Issue), 15 Oct. 1999
National Library of Medicine


Universities finally respond to the demand


Three of the seven core courses at B.U.


Directions in Post-Genome Biology
• Genetic variation and human disease
• Comparative genomics
• Proteomics
• Microdevices and microsystems
• Innovative microarray applications
• Informatics
• Modeling and simulation


Informatics
• Service functions
  – Periodic re-assembly of genome sequence and updating of annotation
• Infrastructure development
  – New tools for visualization & comprehension
• New theoretical formulations of complex units of genetic information
• Do we have enough compute power?


“Anticipated advances in computer speed will be unable to keep up with the growing [DNA] sequence databases and the demand for homology searches of the data.”
Charles DeLisi, 1988
U.S. Department of Energy


Luckily, DeLisi’s dire prediction has not (yet) come true
[Chart: Moore’s Law vs. Growth of GenBank; logarithmic vertical axis from 1.00 to 100,000,000.00; horizontal axis 1970 to 2000; series: Transistors/chip, DNA Sequences]
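The comparison in this chart comes down to doubling times: whichever curve doubles faster eventually wins. A minimal sketch of that arithmetic follows, assuming pure exponential growth between two observations. The function name `doubling_time` and the sample figures are illustrative placeholders, not values read off the slide.

```python
import math


def doubling_time(t0, n0, t1, n1):
    """Return the doubling time (in the units of t) for a quantity that
    grows exponentially from n0 at time t0 to n1 at time t1."""
    if n0 <= 0 or n1 <= n0 or t1 <= t0:
        raise ValueError("need 0 < n0 < n1 and t0 < t1")
    # N(t) = n0 * 2 ** ((t - t0) / T)  =>  T = (t1 - t0) * ln 2 / ln(n1 / n0)
    return (t1 - t0) * math.log(2) / math.log(n1 / n0)


# Hypothetical series: a count that quadruples in four years
# doubles roughly every two years.
print(doubling_time(1990, 1000, 1994, 4000))
```

Applying the same function to the transistor counts and the sequence counts would give two doubling times that can be compared directly, which is the question the slide poses.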


New Problems: Modeling and Simulation
• Single pathways & genetic networks
• Whole organisms
• Whole organs

New Skills Required
• Probability and statistics
• Applied math & computer science
• Biomedical engineering


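Computational model of heart failure
• Model based on aberrant behaviour of cardiac ion transporter genes
• Computation requires days of time on a large, multiprocessor computer

$$
\frac{\partial v(x,t)}{\partial t} = \frac{1}{C_m}\left[-I_{ion}\big(v(x,t)\big) - I_{app}(x,t) + \frac{1}{\beta}\left(\frac{\kappa}{\kappa+1}\right)\nabla\cdot\big(M_i(x)\,\nabla v(x,t)\big)\right], \quad \forall x \in H
$$

[Term labels on the slide: Total Membrane Current; Coupling Current]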
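To give a feel for what integrating an equation of this form involves, here is a toy one-dimensional sketch using explicit finite differences. It is not the model from the slide. The cubic `toy_ionic_current`, the scalar `diffusion` constant standing in for the whole coupling term, the stimulus, and every parameter value are assumptions chosen only so the example runs in a fraction of a second.

```python
def toy_ionic_current(v, threshold=0.1):
    """Hypothetical bistable ionic current (not a cardiac cell model).
    Rest is at v = 0 and the excited state at v = 1."""
    return v * (v - threshold) * (v - 1.0)


def simulate_cable(n_cells=60, steps=3000, dt=0.01, dx=1.0,
                   diffusion=1.0, c_m=1.0):
    """Integrate dv/dt = (1/C_m) * (-I_ion(v) - I_app + D * d2v/dx2)
    on a 1D cable with no-flux ends, using forward Euler in time."""
    v = [0.0] * n_cells
    for step in range(steps):
        t = step * dt
        new_v = v[:]
        for i in range(n_cells):
            # No-flux boundaries: mirror the end cells.
            left = v[i - 1] if i > 0 else v[i]
            right = v[i + 1] if i < n_cells - 1 else v[i]
            coupling = diffusion * (left - 2.0 * v[i] + right) / dx ** 2
            # Assumed stimulus: a brief negative applied current at one end,
            # which depolarizes under the sign convention of the equation.
            i_app = -1.0 if (i < 5 and t < 2.0) else 0.0
            dv_dt = (-toy_ionic_current(v[i]) - i_app + coupling) / c_m
            new_v[i] = v[i] + dt * dv_dt
        v = new_v
    return v


final_v = simulate_cable()
# Print every tenth cell: excited near the stimulated end, at rest far away.
print([round(x, 2) for x in final_v[::10]])
```

Even this toy version performs a nested loop over time steps and cells. A three-dimensional heart geometry with a detailed ionic model at every point multiplies that cost enormously, which is consistent with the slide's remark about days of multiprocessor time.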
