1. Introduction - Algorithms in Bioinformatics
Bioinformatics I, WS’09-10, D. Huson, November 26, 2009
1 General Introduction
Bioinformatics I
Module “Bioinformatics I” M.Sc. Bioinformatik
Prof. Daniel Huson, Dr. Johannes Fischer,
and Dr. Stefan Henz
WS 2009/10
Office hours: see personal webpages.
1.1 Time and place
Lectures:
Mondays 10ct-12h A301, Sand 1
Wednesdays 10ct-12h A301, Sand 1
Problem sessions:
Day Time Where
Email:
Daniel Huson huson at informatik.uni-tuebingen.de
Stefan Henz stefan at henz@tuebingen.mpg.de
Johannes Fischer fischer at informatik.uni-tuebingen.de
Website: www-ab.informatik.uni-tuebingen.de/teaching/ws09/bioinformatics-i
1.2 How to get credit for this course
To pass this course you must:
• Always participate in the weekly problem sessions and present your results regularly.
• Obtain at least 60% of all attainable points on the assignment sheets.
• Pass the mid-term exam.
• Pass the final exam.
You may work on and hand in assignments and projects in groups of up to two people.
Dates (tentative): Mid-term exam on 14.11.2009, final exam on 17.2.2010
Grade calculation: 50% mid-term exam, 50% final exam.
1.3 Course notes and assignments
When possible, course notes (“the script”) will be handed out at the beginning of each lecture. The
lecture notes will also be made available on the course website.
Assignment sheets will usually be handed out and published on the course website on Mondays.
Assignments are due a week later. Solutions should be sent to the tutor by email or handed in before
the beginning of the lecture.
1.4 Contents of the lecture
Bioinformatics I: Sequences and machine learning
Bioinformatics II: Structures and systems biology
1.4.1 Overview Bioinformatics I
• Builds on “Grundlagen der Bioinformatik” (Foundations of Bioinformatics)
• Mandatory lecture for M.Sc. bioinformatics students
• Focuses on algorithms for the analysis of biological primary sequences
• Algorithms: dynamic programming, heuristics, machine learning
1.4.2 Textbooks
Introduction to Computational Biology by Michael Waterman
Introduction to Computational Molecular Biology by Setubal and Meidanis
Biological Sequence Analysis by Durbin, Eddy, Krogh and Mitchison
Bioinformatics: The Machine Learning Approach by Pierre Baldi and Søren Brunak
1.5 Summary of the “Grundlagen der Bioinformatik” lecture
• Pairwise alignments (Scoring matrices, NW, SW)
• BLAST
• Multiple alignments (SP score, star, progressive)
• Phylogeny (UPGMA, NJ, Maximum Parsimony)
• HMMs (CpG, Viterbi, supervised training)
• Gene finding (ORF prediction in prokaryotes, GenScan)
• RNA structures (Nussinov, Zuker)
• Protein secondary structures (classification, Chou-Fasman, SSP)
• Protein tertiary structures (classification, threading, de novo, comparison)
• Bioinformatics databases
• Microarray (Technology, Normalization, Clustering, Statistics)
1.6 Overview Bioinformatics I
1. Pairwise alignment (quick reminder, affine gaps, k-band, linear space)
2. Multiple alignment (T-Coffee, MUSCLE)
3. BLAST and PSI-BLAST, BLAT
4. Phylogeny (ML and Bayesian, network methods)
5. Suffix trees (Generation, searches, repeats)
6. Motif finding
7. Hidden Markov Models (Training, Viterbi Training, Baum-Welch)
8. Gene finding (GenScan, Twinscan)
9. Support Vector Machines (subcellular location)
10. Physical mapping (3 protocols, 3 algorithms)
11. Sequencing and assembly
12. Population genetics