11.07.2015

Bioinformatics Slides (PDF) - Genomics & Medicine - Stanford ...

Bioinformatics
Genomics & Medicine
http://biochem118.stanford.edu/
Doug Brutlag, Professor Emeritus of Biochemistry & Medicine (by courtesy)
Stanford University School of Medicine


What is Bioinformatics?
(Diagram labels: Biological Information; DNA, RNA, Protein, Phenotype; Individuals, Populations; Selection, Evolution)


Computational Goals of Bioinformatics
• Learn & Generalize: Discover conserved patterns (models) of sequences, structures, metabolism and chemistries from well-studied examples.
• Prediction: Infer the function or structure of newly sequenced genes, genomes, proteomes or proteins from these generalizations.
• Organize & Integrate: Develop a systematic and genomic approach to molecular interactions, metabolism, cell signaling, gene expression…
• Simulate: Model gene expression, gene regulation, protein folding, protein-protein interaction, protein-ligand binding, catalytic function, metabolism…
• Engineer: Construct novel organisms, novel functions, or novel regulation of genes and proteins.
• Target: Direct mutations or RNAi to specific genes and transcripts, or drugs to specific protein targets.


Central Paradigm of Molecular Biology
DNA → RNA → Protein → Phenotype
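The DNA → RNA → Protein flow can be illustrated with a minimal Python sketch. It assumes the standard genetic code (NCBI translation table 1) and an input that is an intron-free coding strand starting at its first codon; `transcribe` and `translate` are hypothetical helper names, not part of any library.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# bases ordered T, C, A, G at positions 1, 2 and 3; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """DNA -> RNA: the mRNA matches the coding strand, with U in place of T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """RNA -> Protein: read codons in frame from the start until a stop codon."""
    seq = mrna.upper().replace("U", "T")  # the table is keyed by DNA letters
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    gene = "ATGGTGCACCTGACTCCTGAGGAGAAGTAA"  # hypothetical coding sequence
    mrna = transcribe(gene)
    print(mrna)             # AUGGUGCACCUGACUCCUGAGGAGAAGUAA
    print(translate(mrna))  # MVHLTPEEK
```

The made-up input encodes MVHLTPEEK, which matches the first nine residues of the sequence on the Central Paradigm of Bioinformatics slide.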


Central Paradigm of Medicine
DNA → RNA → Protein → Symptoms → Opinions


Central Paradigm of Bioinformatics
Genetic Information → Molecular Structure → Biochemical Function → Phenotype (Symptoms)
MVHLTPEEKTAVNALWGKVNVDAVGGEALGRLLVVYPWTQRFFESFGDLSSPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFSQLSELHCDKLHVDPENFRLLGNVLVCVLARNFGKEFTPQMQAAYQKVVAGVANALAHKYH


Soybean Leghemoglobin and Sperm Whale Myoglobin
(Figure panels: Soybean Leghemoglobin; Sperm Whale Myoglobin)


Challenges in Understanding Genetic Information
Genetic Information → Molecular Structure → Biochemical Function → Phenotype
• Genetic information is redundant
• Structural information is redundant
• Genes and proteins are one-dimensional, but their function depends on three-dimensional structure
• Genes and proteins are metastable


Discovering Function from Protein Sequence
(Diagram: Query, Database, Sequences of Common Structure or Function; method shown: Sequence Similarity)
Sequence Similarity (pairwise alignment; "-" marks a gap):
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF------DLSHGS
HLTPEEKSAVTALWGKV--NVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGN
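One simple way to quantify the similarity in this alignment is to count identical residues. A minimal sketch, assuming percent identity is taken over gap-free columns (tools differ here; some divide by the full alignment length) and ignoring the PAM-style scoring of similar residues; `alignment_stats` is a hypothetical helper.

```python
def alignment_stats(seq_a: str, seq_b: str, gap: str = "-") -> dict:
    """Count identities, gap-free columns and gap columns in a pairwise alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    identities = aligned = gaps = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            gaps += 1          # at least one sequence has a gap in this column
        else:
            aligned += 1
            if a == b:
                identities += 1
    return {
        "identities": identities,
        "aligned_columns": aligned,
        "gap_columns": gaps,
        "percent_identity": 100.0 * identities / aligned if aligned else 0.0,
    }


if __name__ == "__main__":
    query = "VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF------DLSHGS"
    subject = "HLTPEEKSAVTALWGKV--NVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGN"
    print(alignment_stats(query, subject))
```

For the two sequences on the slide this gives 21 identities over 50 gap-free columns (42% identity), with 8 columns containing a gap.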


Dayhoff’s PAM 250 Amino Acid Replacement Matrix (1978)


Discovering Function from Protein Sequence
(Diagram: Query, Database, Sequences of Common Structure or Function; methods shown: Sequence Similarity, Consensus Sequences or Sequence Motifs)
Consensus Sequences or Sequence Motifs
Zinc Finger (C2H2 type): C X{2,4} C X{12} H X{3,5} H


Protein Motifs from Multiple Sequence Alignments
EBI Course on Protein Motifs/Signatures
http://www.ebi.ac.uk/training/online/course/introduction-protein-classification-ebi


A Typical Motif: Zinc Finger DNA-Binding Motif
C..C............H....H
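The zinc finger pattern translates directly into a regular expression. A minimal sketch using Python's `re` module, assuming the slide's C X{2,4} C X{12} H X{3,5} H definition as-is; `find_c2h2_motifs` is a hypothetical helper, and because matching is greedy it reports at most one hit per starting cysteine.

```python
import re

# The slide's motif C X{2,4} C X{12} H X{3,5} H as a Python regular expression.
# Wrapping it in a lookahead (?=...) lets finditer report overlapping hits.
C2H2_MOTIF = re.compile(r"(?=(C.{2,4}C.{12}H.{3,5}H))")


def find_c2h2_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched segment) for every motif hit."""
    return [(m.start() + 1, m.group(1)) for m in C2H2_MOTIF.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical sequence built to contain exactly one motif.
    demo = "MS" + "CAAC" + "K" * 12 + "HAAAH" + "GG"
    print(find_c2h2_motifs(demo))
```

On the hypothetical demo sequence it reports a single hit starting at position 3.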


Discovering Function from Protein Sequence
(Diagram: Query, Database, Sequences of Common Structure or Function; methods shown: Sequence Similarity, Consensus Sequences or Sequence Motifs, PSSMs or Weight Matrices)
Consensus Sequences or Sequence Motifs
Zinc Finger (C2H2 type): C X{2,4} C X{12} H X{3,5} H
PSSMs or Weight Matrices (rows: amino acids; columns: Position 1-12)
      1   2   3   4   5   6   7   8   9  10  11  12
A     2   1   3  13  10  12  67   4  13   9   1   2
R     7   5   8   9   4   0   1  16   7   0   1   0
N     0   8   0   1   0   0   0   2   1   1  10   0
D     0   1   0   1  13   0   0  12   1   0   4   0
C     0   0   1   0   0   0   0   0   0   2   2   1
Q     1   1  21   8  10   0   0   7   6   0   0   2
E     2   0   0   9  21   0   0  15   7   3   3   0
G     9   7   1   4   0   0   8   0   0   0  46   0
H     4   3   1   1   2   0   0   2   2   0   5   0
I    10   0  11   1   2  10   0   4   9   3   0  16
L    16   1  17   0   1  31   0   3  11  24   0  14
K     3   4   5  10  11   1   1  13  10   0   5   2
M     7   1   1   0   0   0   0   0   5   7   1   8
F     4   0   3   0   0   4   0   0   0  10   0   0
P     0   6   0   1   0   0   0   0   0   0   0   0
S     1  17   0   8   3   1   3   0   2   2   2   0
T     5  22   3  11   1   5   0   2   2   2   0   5
W     2   0   0   0   0   0   0   0   0   1   0   1
Y     1   0   4   2   0   1   0   0   2   4   0   1
V     6   3   1   1   2  15   0   0   2  12   0  28
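A counts table like this one becomes a scoring matrix once each count is converted into a log-odds score. The sketch below assumes a uniform background frequency of 1/20 per amino acid and a pseudocount of 1 per cell; the lecture does not say how its matrix was scored, so these choices and the helper names are illustrative only.

```python
import math

# Residue counts per position (1-12) from the weight-matrix table on this slide;
# every column sums to 80 sequences.
COUNTS = {
    "A": [2, 1, 3, 13, 10, 12, 67, 4, 13, 9, 1, 2],
    "R": [7, 5, 8, 9, 4, 0, 1, 16, 7, 0, 1, 0],
    "N": [0, 8, 0, 1, 0, 0, 0, 2, 1, 1, 10, 0],
    "D": [0, 1, 0, 1, 13, 0, 0, 12, 1, 0, 4, 0],
    "C": [0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 2, 1],
    "Q": [1, 1, 21, 8, 10, 0, 0, 7, 6, 0, 0, 2],
    "E": [2, 0, 0, 9, 21, 0, 0, 15, 7, 3, 3, 0],
    "G": [9, 7, 1, 4, 0, 0, 8, 0, 0, 0, 46, 0],
    "H": [4, 3, 1, 1, 2, 0, 0, 2, 2, 0, 5, 0],
    "I": [10, 0, 11, 1, 2, 10, 0, 4, 9, 3, 0, 16],
    "L": [16, 1, 17, 0, 1, 31, 0, 3, 11, 24, 0, 14],
    "K": [3, 4, 5, 10, 11, 1, 1, 13, 10, 0, 5, 2],
    "M": [7, 1, 1, 0, 0, 0, 0, 0, 5, 7, 1, 8],
    "F": [4, 0, 3, 0, 0, 4, 0, 0, 0, 10, 0, 0],
    "P": [0, 6, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "S": [1, 17, 0, 8, 3, 1, 3, 0, 2, 2, 2, 0],
    "T": [5, 22, 3, 11, 1, 5, 0, 2, 2, 2, 0, 5],
    "W": [2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1],
    "Y": [1, 0, 4, 2, 0, 1, 0, 0, 2, 4, 0, 1],
    "V": [6, 3, 1, 1, 2, 15, 0, 0, 2, 12, 0, 28],
}
WIDTH = 12


def log_odds_matrix(counts, pseudocount=1.0, background=0.05):
    """log2(observed frequency / background frequency) per residue and position.

    Assumes a uniform background (1/20 per amino acid) and a flat pseudocount.
    """
    totals = [sum(row[i] for row in counts.values()) for i in range(WIDTH)]
    return {
        aa: [
            math.log2((row[i] + pseudocount)
                      / (totals[i] + pseudocount * len(counts)) / background)
            for i in range(WIDTH)
        ]
        for aa, row in counts.items()
    }


def score(matrix, segment):
    """Sum the position-specific scores of a WIDTH-residue segment."""
    return sum(matrix[aa][i] for i, aa in enumerate(segment))


def best_window(matrix, protein):
    """Return (score, 1-based start) of the highest-scoring window in protein."""
    return max((score(matrix, protein[i:i + WIDTH]), i + 1)
               for i in range(len(protein) - WIDTH + 1))


def consensus(counts):
    """Most frequent residue at each position."""
    return "".join(max(counts, key=lambda aa: counts[aa][i]) for i in range(WIDTH))


if __name__ == "__main__":
    pssm = log_odds_matrix(COUNTS)
    print(consensus(COUNTS))                                    # LTQAELARALGV
    print(best_window(pssm, "MKKPP" + "LTQAELARALGV" + "PPKK"))  # (score, start)
```

The most frequent residue at each position spells LTQAELARALGV, and on the hypothetical protein the best-scoring window starts at position 6, where that consensus was embedded.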


Position-Specific Scoring Matrix for Prokaryotic Helix-Turn-Helix Motifs
Sequence    Helix - Turn - Helix
RCRO_LAMBD  F G Q T K T A K D L G V Y Q S A I N K A I H
RCRO_BP434  M T Q T E L A T K A G V K Q Q S I Q L I E A
RCRO_BPP22  G T Q R A V A K A L G I S D A A V S Q W K E
RPC1_LAMBD  L S Q E S V A D K M G M G Q S G V G A L F N
RPC1_BP434  L N Q A E L A Q K V G T T Q Q S I E Q L E N
RPC1_BPP22  I R Q A A L G K M V G V S N V A I S Q W E R
RPC2_LAMBD  L G T E K T A E A V G V D K S Q I S R W K R
LACR_ECOLI  V T L Y D V A E Y A G V S Y Q T V S R V V N
CRP_ECOLI   I T Q Q E I G Q I V G C S R E T V G R I L K
TRPR_ECOLI  M S Q R E L K N E L G A G I A T I T R G S N
RPC1_CPP22  R G Q R K V A D A L G I N E S Q I S R W K G
GALR_ECOLI  A T I K D V A R L A G V S V A T V S R V I N
Y77_BPT7    L S H R S L G E L Y G V S Q S T I T R I L Q
TER3_ECOLI  L T T R K L A Q K L G V E Q P T L Y W H V K
VIVB_BPT7   D Y Q A I F A Q Q L G G T Q S A A S Q I D E
DEOR_ECOLI  L H L K D A A A L L G V S E M T I R R D L N
RP32_BACSU  R T L E E V G K V F G V T R E R I R Q I E A
Y28_BPT7    E S N V S L A R T Y G V S Q Q T I C D I R K
IMMRE_BPPH  S T L E A V A G A L G I Q V S A I V G E E T
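Column-by-column residue counts are the first step in turning an alignment like this into a position-specific scoring matrix. A minimal sketch using `collections.Counter` on the segments listed on this slide; the helper names and the 90% conservation cut-off in `conserved_positions` are arbitrary choices for illustration.

```python
from collections import Counter

# Helix-turn-helix segments from the alignment on this slide, spaces removed.
HTH_ALIGNMENT = {
    "RCRO_LAMBD": "FGQTKTAKDLGVYQSAINKAIH",
    "RCRO_BP434": "MTQTELATKAGVKQQSIQLIEA",
    "RCRO_BPP22": "GTQRAVAKALGISDAAVSQWKE",
    "RPC1_LAMBD": "LSQESVADKMGMGQSGVGALFN",
    "RPC1_BP434": "LNQAELAQKVGTTQQSIEQLEN",
    "RPC1_BPP22": "IRQAALGKMVGVSNVAISQWER",
    "RPC2_LAMBD": "LGTEKTAEAVGVDKSQISRWKR",
    "LACR_ECOLI": "VTLYDVAEYAGVSYQTVSRVVN",
    "CRP_ECOLI": "ITQQEIGQIVGCSRETVGRILK",
    "TRPR_ECOLI": "MSQRELKNELGAGIATITRGSN",
    "RPC1_CPP22": "RGQRKVADALGINESQISRWKG",
    "GALR_ECOLI": "ATIKDVARLAGVSVATVSRVIN",
    "Y77_BPT7": "LSHRSLGELYGVSQSTITRILQ",
    "TER3_ECOLI": "LTTRKLAQKLGVEQPTLYWHVK",
    "VIVB_BPT7": "DYQAIFAQQLGGTQSAASQIDE",
    "DEOR_ECOLI": "LHLKDAAALLGVSEMTIRRDLN",
    "RP32_BACSU": "RTLEEVGKVFGVTRERIRQIEA",
    "Y28_BPT7": "ESNVSLARTYGVSQQTICDIRK",
    "IMMRE_BPPH": "STLEAVAGALGIQVSAIVGEET",
}


def position_counts(alignment):
    """One Counter of residues per alignment column."""
    seqs = list(alignment.values())
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("all aligned segments must have the same length")
    return [Counter(s[i] for s in seqs) for i in range(width)]


def conserved_positions(counts, n_seqs, min_fraction=0.9):
    """(1-based position, residue) for columns dominated by a single residue."""
    hits = []
    for i, column in enumerate(counts):
        residue, n = column.most_common(1)[0]
        if n / n_seqs >= min_fraction:
            hits.append((i + 1, residue))
    return hits


if __name__ == "__main__":
    counts = position_counts(HTH_ALIGNMENT)
    print(counts[10])  # column 11
    print(conserved_positions(counts, len(HTH_ALIGNMENT)))
```

With these 19 segments, column 11 is glycine in every row, and it is the only column that passes the 90% cut-off.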


Blocks or Fingerprints from Multiple Sequence Alignments
EBI Course on Protein Motifs/Signatures
http://www.ebi.ac.uk/training/online/course/introduction-protein-classification-ebi


Discovering Function from Protein Sequence
(Diagram: Query, Database, Sequences of Common Structure or Function; methods shown:)
• Sequence Similarity (pairwise alignment)
• Consensus Sequences or Sequence Motifs, e.g. Zinc Finger (C2H2 type): C X{2,4} C X{12} H X{3,5} H
• PSSMs or Weight Matrices (amino acids by Position 1-12)
• Profiles, PSI-BLAST
• Hidden Markov Models (states AA1-AA6, I1-I5, D2-D5)


Hidden Markov Models from Multiple Sequence Alignments
EBI Course on Protein Motifs/Signatures
http://www.ebi.ac.uk/training/online/course/introduction-protein-classification-ebi


Data Mining: The Search for Buried Treasure


PROSITE Patterns
http://expasy.org/prosite/
• Active site of trypsin-like serine proteases: G-D-S-G-G
• Zinc Finger (C2H2 type): C-X(2,4)-C-X(12)-H-X(3,5)-H
• N-Glycosylation Site: N-[^P]-[ST]-[^P]
• Homeobox Domain Signature: [LIVMF]-X(5)-[LIVM]-X(4)-[IV]-[RKQ]-X-W-X(8)-[RK]
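Patterns written in this notation can be searched with ordinary regular expressions. This sketch converts only the syntax used on this slide, plus `{..}` exclusions and the `<`/`>` terminal anchors; `prosite_to_regex` is a hypothetical helper, not the ScanProsite implementation, and it does not handle rarer forms such as `>` inside brackets.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate PROSITE-style pattern syntax into a Python regular expression.

    Covers: '-' between elements, X or x for any residue, (n) and (n,m) repeat
    counts, [..] allowed residues, {..} or [^..] forbidden residues, and the
    '<' / '>' N- and C-terminal anchors.
    """
    parts = []
    for element in pattern.replace(" ", "").rstrip(".").split("-"):
        prefix = suffix = repeat = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        counted = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if counted:
            element, repeat = counted.group(1), "{" + counted.group(2) + "}"
        if element in ("x", "X"):
            core = "."
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"
        else:
            core = element  # single residues, [ST] and [^P] are already valid regex
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


if __name__ == "__main__":
    zinc_finger = prosite_to_regex("C-X(2,4)-C-X(12)-H-X(3,5)-H")
    glycosylation = prosite_to_regex("N-[^P]-[S T]-[^P]")
    print(zinc_finger)
    print(glycosylation)
    print(re.search(glycosylation, "AANKSA"))  # hypothetical peptide
```

The zinc finger pattern becomes `C.{2,4}C.{12}H.{3,5}H` and the N-glycosylation site becomes `N[^P][ST][^P]`.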


Swiss Institute of Bioinformatics
http://www.isb-sib.ch/


Expasy Bioinformatics Resource Portal
http://expasy.org/


UniProt Knowledge Base
http://www.uniprot.org/


UniProt Opsin Entries
http://www.uniprot.org/


UniProt Human Opsin Advanced Search
http://www.uniprot.org/


UniProt Human Opsin Entries (Reviewed)
http://www.uniprot.org/


UniProt Human Opsin OPN1MW Entry
http://www.uniprot.org/uniprot/P04001


BLAST UniProt Human Opsin OPN1MW Entry
http://www.uniprot.org/uniprot/P04001


BLAST UniProt Human OPN1MW Results
http://www.uniprot.org/uniprot/P04001


NCBI BLAST Home Page
http://blast.ncbi.nlm.nih.gov/


NCBI BLAST Parameters
http://blast.ncbi.nlm.nih.gov/


Entrez Gene search for Colorblindness


Entrez Gene search for Opsins


BLAST Similarity Search
http://www.ncbi.nlm.nih.gov/BLAST/


Choose Standard Protein-Protein BLAST
http://www.ncbi.nlm.nih.gov/BLAST/


Paste Sequence, Choose SwissProt Database and BLAST!


Optional Parameters


BLAST Conserved Domain Output


Sequence Aligned with Domain


Most Significant Similarity Hits


Bovine Blue Opsin Similarity


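Evaluation of Profiles
(Figure: Negative Proteins and Positive Proteins, with a threshold T)


Evaluation of Profiles
(Figure: Negative Proteins and Positive Proteins divided by the threshold into TN, FP, FN and TP, i.e. true negatives, false positives, false negatives and true positives)
Sensitivity = TP/(TP+FN)
Specificity = TN/(TN+FP)
Positive Predictive Value = TP/(TP+FP)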
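The three measures can be computed from a set of scored positive and negative proteins once a threshold T is chosen. A minimal sketch; the scores in the example are invented, the names are hypothetical, and a protein scoring exactly T is counted as a predicted family member (an assumption).

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # positive proteins at or above the threshold (true positives)
    fp: int  # negative proteins at or above the threshold (false positives)
    tn: int  # negative proteins below the threshold (true negatives)
    fn: int  # positive proteins below the threshold (false negatives)

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def positive_predictive_value(self) -> float:
        return self.tp / (self.tp + self.fp)


def evaluate(positive_scores, negative_scores, threshold):
    """Classify every protein scoring >= threshold as a predicted family member."""
    tp = sum(score >= threshold for score in positive_scores)
    fp = sum(score >= threshold for score in negative_scores)
    return ConfusionCounts(tp=tp, fp=fp,
                           tn=len(negative_scores) - fp,
                           fn=len(positive_scores) - tp)


if __name__ == "__main__":
    positives = [12.0, 15.0, 9.0, 20.0, 7.0]     # hypothetical profile scores
    negatives = [1.0, 3.0, 8.0, 2.0, 10.0, 4.0]  # hypothetical profile scores
    counts = evaluate(positives, negatives, threshold=8.5)
    print(counts)
    print(counts.sensitivity(), counts.specificity(), counts.positive_predictive_value())
```

With T = 8.5 the example gives 4 TP, 1 FP, 5 TN and 1 FN, so sensitivity and PPV are 0.8 and specificity is 5/6. Raising T generally trades sensitivity for specificity.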


MyHits Local Motifs Search
http://myhits.isb-sib.ch/


MyHits Local Motifs Query
http://myhits.isb-sib.ch/


MyHits Local Motifs Summary
http://myhits.isb-sib.ch/


MyHits Local Motif Hits
http://myhits.isb-sib.ch/


MyHits Local Motif Hits (cont.)
http://myhits.isb-sib.ch/


InterPro
http://www.ebi.ac.uk/interpro/


InterProScan
http://www.ebi.ac.uk/interpro/


InterProScan
http://www.ebi.ac.uk/Tools/pfa/iprscan/


InterProScan Hourglass
http://www.ebi.ac.uk/InterProScan/


InterProScan Results
http://www.ebi.ac.uk/InterProScan/


GO: Gene Ontology Database
http://www.geneontology.org/


GO: Gene Ontology for Opsin OPN1MW
http://www.geneontology.org/


GO: Sequence Information for OPN1MW
http://www.geneontology.org/


GO: Annotations for OPN1MW
http://www.geneontology.org/


GO: Gene Ontology Terms for OPN1MW
http://www.geneontology.org/


GO: Gene Ontology Term GPCR
http://www.geneontology.org/


GO: Gene Ontology GPCR Term
http://www.geneontology.org/


Bioinformatics Homework
http://biochem118.stanford.edu/bioinformatics.html
Homework Assignment
1) Select a protein related to the disease of interest to you from OMIM, Entrez Gene or UniProt. Copy and save the protein file in FASTA format.
2) Search your protein for motifs with the MyHits Motif Scan query. Be sure to include PROSITE Patterns, PROSITE Frequent Patterns, PROSITE Profiles, Prefiles and Pfam HMMs (local models) in your search. Please send me the MyHits hits you think are biologically significant and at least one or two hits you think are not statistically or biologically significant. Note that only the Profiles have expectation values; the Patterns have no measure of statistical significance.
3) Search your protein for blocks using the InterPro database. Please send me a few of the InterPro domain hits you think are significant and at least one or two hits you think are not statistically or biologically significant. Note that the default graphical output of InterPro does not list expectation values; you must switch to the Tabular view to see the statistical significance.
4) Search your protein for homologs using BLAST. Report two or three hits that are both statistically and biologically significant, and two or three hits you think are neither statistically nor biologically significant. If your protein family is very large, you may have to ask BLAST to return more hits before you find statistically insignificant ones.
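Step 1 leaves you with a FASTA file. If you want to handle it programmatically, a minimal reader needs only the standard library. This sketch assumes one record per '>' header line and keeps the last record if a header is repeated; `read_fasta` is a hypothetical helper.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header line without '>': sequence}."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())  # sequences may be wrapped over many lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


if __name__ == "__main__":
    example = ">my_protein hypothetical example\nMVHLTPEEK\nSAVTALWGKV\n"
    for name, seq in read_fasta(example).items():
        print(name, len(seq), seq)
```

For a saved file (the name is hypothetical): `with open("my_protein.fasta") as handle: records = read_fasta(handle.read())`.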


Statistical vs. Biological Significance
Assignment
First, for each search (MyHits, InterPro and BLAST), report some significant hits and describe why you think they are significant both statistically and biologically. Also report some statistically insignificant hits, explain why they are insignificant, and say whether any of these statistically insignificant hits are still biologically significant. To remind you what I said in class: a statistically significant finding in a database search is always biologically significant, but a biologically significant result is not necessarily statistically significant.
Statistical significance and expectation values
Statistical significance is determined by the expectation value (E-value), which estimates how many hits at least this good you would expect to find by pure chance in a search of this size. A finding with an E-value of 1 or greater is not significant because it could occur by pure chance. A finding with an E-value less than 10^-3 (one chance in a thousand) is generally considered statistically significant (unless, of course, you are doing 1,000 searches!). So the lower the expectation value, the more significant the finding. Findings with E-values between 10^-3 and 1 are in the so-called twilight zone and require further analysis or experiments to determine their validity.
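The thresholds above fit in a small helper. This sketch assumes that multiplying the E-value by the number of searches is an acceptable rough correction (a Bonferroni-style adjustment), and uses the relation P = 1 - e^(-E), which holds under the Poisson model behind BLAST statistics; the function names are hypothetical.

```python
import math


def classify_evalue(e_value: float, n_searches: int = 1) -> str:
    """Bin an E-value using the thresholds in these notes.

    n_searches scales the E-value (a simple Bonferroni-style correction) to
    reflect the caveat about running many searches.
    """
    adjusted = e_value * n_searches
    if adjusted < 1e-3:
        return "significant"
    if adjusted < 1:
        return "twilight zone"
    return "not significant"


def evalue_to_pvalue(e_value: float) -> float:
    """Chance of at least one hit this good arising by chance: 1 - exp(-E)."""
    return -math.expm1(-e_value)  # expm1 keeps precision when E is tiny


if __name__ == "__main__":
    for e in (1e-50, 1e-4, 0.02, 3.0):
        print(f"E = {e:g}: {classify_evalue(e)}, P = {evalue_to_pvalue(e):.3g}")
    print(classify_evalue(1e-5, n_searches=1000))
```

An E-value of 10^-5 is significant for a single search but falls into the twilight zone if 1,000 searches were run.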


Statistical vs. Biological Significance (cont.)
InterPro
Unlike most of the other methods, InterPro requires a very high level of significance before it reports a finding. This means that you will usually not find any statistically insignificant hits in this particular search.
Biological Significance
To determine biological significance, you must read about the biological properties of your protein (ontology terms are the most useful) and the biological properties of your findings. A finding may be significant because it defines a very closely related protein family (opsins, for example), a very broad family (G protein-coupled receptors or 7-transmembrane proteins), a common structure (protein fold), a specific function (retinal binding site) or a very specific catalytic activity. You should describe in words the level of biological significance.


Statistical vs. Biological Significance (cont.)
MyHits
If you ask MyHits to return PATTERNs as well as motifs, you will notice that PATTERNs have no E-values associated with them, so there is no easy way to judge their statistical significance. For pattern findings you can only judge biological significance. Also, none of the Frequent Patterns from MyHits are statistically significant.
BLAST
If your BLAST search returns no insignificant hits, your protein family is very large and you have to ask BLAST to return more results using the Advanced Options at the bottom of the form. Only when you see hits with E-values > 0.001 do you have insignificant findings.
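When BLAST is run locally with the BLAST+ command-line tools, `-outfmt 6` writes one tab-separated line per hit and `-max_target_seqs` raises the number of hits returned. A minimal sketch that reads such a table and splits hits at E = 0.001; the column names are labels chosen here for the standard 12 columns, and the two result lines are made up.

```python
import csv
import io

# Standard 12 columns of BLAST+ tabular output, with labels chosen for this sketch.
FIELDS = ["query", "subject", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def read_blast_tabular(text: str) -> list[dict]:
    """Parse tab-separated hit lines; '#' comment lines are skipped."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(FIELDS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        hits.append(hit)
    return hits


def split_by_evalue(hits, cutoff=1e-3):
    """Separate hits with E <= cutoff from insignificant ones with E > cutoff."""
    significant = [h for h in hits if h["evalue"] <= cutoff]
    insignificant = [h for h in hits if h["evalue"] > cutoff]
    return significant, insignificant


if __name__ == "__main__":
    # Two made-up result lines, for illustration only.
    demo = ("query\thitA\t98.5\t364\t5\t0\t1\t364\t1\t364\t2e-150\t700\n"
            "query\thitB\t25.0\t80\t55\t3\t10\t85\t40\t120\t0.45\t30.1\n")
    significant, insignificant = split_by_evalue(read_blast_tabular(demo))
    print([h["subject"] for h in significant], [h["subject"] for h in insignificant])
```

If the insignificant list comes back empty, that is the situation described above: request more hits before judging significance.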


Multiple Enhancer Sequences


Structure of 5’ Cap
© Michael W. King


Polyadenylation of mRNAs
© Michael W. King


Intron Splicing Mechanism
© Michael W. King


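Splicing, Capping & Polyadenylation Yield Mature mRNA
(Diagram: Gene = Promoter, TSS (transcription start site), Exon, Intron, Exon, Intron, Exon, TTS (transcription termination site), Terminator; the 5’→3’ Transcript is spliced, capped at the 5’ end and given a poly-A tail at the 3’ end to yield the mRNA)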


GENSCAN Gene Model
http://genes.mit.edu/GENSCAN.html
Hidden Markov models of gene structure
© Christopher Burge
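GENSCAN's actual model has many states (several exon types, introns, UTRs, promoter and poly-A signals) and far richer emission models. The sketch below is only a two-state toy with made-up probabilities, meant to show how the Viterbi algorithm picks the most probable exon/intron labelling of a sequence; none of its names or numbers come from GENSCAN.

```python
import math

# Hypothetical two-state toy: "E" = exon-like (GC-rich), "I" = intron-like (AT-rich).
# These numbers are illustrative and are not GENSCAN's parameters.
STATES = ("E", "I")
START = {"E": 0.5, "I": 0.5}
TRANSITION = {"E": {"E": 0.9, "I": 0.1}, "I": {"E": 0.1, "I": 0.9}}
EMISSION = {
    "E": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "I": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def viterbi(seq: str) -> str:
    """Most probable state path for seq, computed in log space."""
    # best[s]: log-probability of the best path ending in state s at the current base
    best = {s: math.log(START[s]) + math.log(EMISSION[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        previous, best, pointers = best, {}, {}
        for s in STATES:
            came_from = max(STATES, key=lambda r: previous[r] + math.log(TRANSITION[r][s]))
            pointers[s] = came_from
            best[s] = (previous[came_from] + math.log(TRANSITION[came_from][s])
                       + math.log(EMISSION[s][base]))
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    dna = "GCGCGC" + "ATATAT" + "GCGC"
    print(dna)
    print(viterbi(dna))
```

The demo labels the six GC bases E, the six AT bases I and the last four GC bases E again; switching states costs less than emitting six bases from the wrong state.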


Alternative Splicing Generates Distinct Proteins
(Diagram: Gene = Promoter, Exon, Intron, Exon, Intron, Exon, Terminator; the same Transcript is spliced two ways, and each product is capped and polyadenylated, giving mRNA-1 and, by alternate splicing, mRNA-2)
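Exon joining and exon skipping can be sketched with plain string slicing. This assumes 1-based inclusive exon coordinates on a hypothetical pre-mRNA, uses the text marker 'm7G-' for the 5’ cap and a fixed-length poly-A tail (real tail lengths vary); the helper names are illustrative.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]], tail_length: int = 20) -> str:
    """Join exons (1-based, inclusive coordinates), mark the 5' cap, add a poly-A tail.

    'm7G-' is only a text marker for the 7-methylguanosine cap.
    """
    body = "".join(pre_mrna[start - 1:end] for start, end in exons)
    return "m7G-" + body + "A" * tail_length


def removed_segments(pre_mrna: str, exons: list[tuple[int, int]]) -> list[str]:
    """Sequence removed between consecutive exons (introns, plus any skipped exon)."""
    return [pre_mrna[end:next_start - 1]
            for (_, end), (next_start, _) in zip(exons, exons[1:])]


if __name__ == "__main__":
    # Hypothetical three-exon transcript; both introns start with GU and end with AG.
    exon1, intron1, exon2, intron2, exon3 = "AUGGCU", "GUAAGUCUAG", "GAAACC", "GUGAGUUUAG", "UAA"
    pre_mrna = exon1 + intron1 + exon2 + intron2 + exon3
    mrna_1 = [(1, 6), (17, 22), (33, 35)]  # all three exons
    mrna_2 = [(1, 6), (33, 35)]            # alternative splice: exon 2 skipped
    for name, exons in (("mRNA-1", mrna_1), ("mRNA-2", mrna_2)):
        print(name, splice(pre_mrna, exons, tail_length=10), removed_segments(pre_mrna, exons))
```

Skipping exon 2 removes intron 1, exon 2 and intron 2 as one segment, so mRNA-2 is six bases shorter than mRNA-1 before the tail.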


ESTs, Full-Length cDNA, UniGene & RefSeq Databases
(Diagram: Gene = Promoter, Exon, Intron, Exon, Intron, Exon, Terminator, with 5’ UTR and 3’ UTR; the spliced, capped and polyadenylated mRNA is the source of 5’ ESTs, 3’ ESTs and Full-Length cDNA, and encodes the Protein)


Alternative Splicing Detected in EST Libraries


Gene Loci
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene
(Diagram: Protein Sequences, mRNA Sequences and EST Sequences, together with gene predictions from GrailEXP, FGENESH and Genscan, mapped to a Gene Locus)


Genomics, Bioinformatics & Computational Biology
(Diagram: Genomics, Bioinformatics, Structural Genomics, Proteomics, Computational Molecular Biology, Computational Biology)


Genomics, Bioinformatics & Computational Biology
(Diagram: Genomics, Bioinformatics, Systems Biology, Structural Genomics, Proteomics, Computational Molecular Biology, Computational Biology)


Genomics, Bioinformatics & Computational Biology
(Diagram: Genomics, Bioinformatics, Structural Genomics, Proteomics, Computational Molecular Biology, Computational Biology, drawing on Robotics, Machine Learning, Databases, Statistics & Probability, Artificial Intelligence, Information Theory, Algorithms and Graph Theory)


Redundancy in Genomic & Protein Sequences
• DNA is double-stranded
• Genetic code
• Acceptable amino-acid replacements
• Intron-exon variation
• Alternative splicing
• Strain variations (SNPs)
• Sequencing errors
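Two of these sources of redundancy, the double strand and the genetic code, are easy to show in code. A minimal sketch assuming the standard genetic code (NCBI translation table 1); the helper names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
# Standard genetic code (NCBI table 1): one letter per codon, codons in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def reverse_complement(dna: str) -> str:
    """The opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def codons_per_amino_acid() -> Counter:
    """How many of the 64 codons encode each amino acid ('*' = stop)."""
    return Counter(AMINO_ACIDS)


if __name__ == "__main__":
    print(reverse_complement("ATGCGT"))  # ACGCAT
    degeneracy = codons_per_amino_acid()
    print(degeneracy["L"], degeneracy["M"], degeneracy["*"])  # 6 1 3
```

Leucine, serine and arginine each have six codons, methionine and tryptophan one, and three codons are stops.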


Hidden Markov Models (after Haussler)
http://www.cse.ucsc.edu/compbio/sam.html
(Diagram: states AA1-AA6, insert states I1-I5, delete states D2-D5)
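The diagram's state layout can be written down as a transition graph. This sketch assumes a simplified topology chosen for illustration (no begin/end states, no insert-delete moves, probabilities omitted); implementations such as SAM or HMMER differ in these details, and the function names are hypothetical.

```python
def profile_hmm_transitions(n_match: int) -> dict[str, list[str]]:
    """Allowed state-to-state moves of a simple profile HMM (probabilities omitted).

    State names follow the slide: match states AA1..AAn, insert states
    I1..I(n-1) between neighbouring match states, and delete states
    D2..D(n-1) that let a sequence skip an interior match state.
    """
    moves: dict[str, list[str]] = {}
    for k in range(1, n_match + 1):
        match = f"AA{k}"
        moves[match] = []
        if k < n_match:
            moves[match] += [f"AA{k + 1}", f"I{k}"]
            moves[f"I{k}"] = [f"I{k}", f"AA{k + 1}"]  # inserts may repeat
            if k + 1 <= n_match - 1:
                moves[match].append(f"D{k + 1}")
        if 2 <= k <= n_match - 1:
            moves[f"D{k}"] = [f"AA{k + 1}"]
            if k + 1 <= n_match - 1:
                moves[f"D{k}"].append(f"D{k + 1}")
    return moves


def count_paths(moves, state, end):
    """Count match/delete paths from state to end (insert states are excluded)."""
    if state == end:
        return 1
    return sum(count_paths(moves, nxt, end) for nxt in moves[state] if not nxt.startswith("I"))


if __name__ == "__main__":
    moves = profile_hmm_transitions(6)
    for state, targets in moves.items():
        print(f"{state:>4} -> {', '.join(targets) or '(end)'}")
    print(count_paths(moves, "AA1", "AA6"))
```

Counting only match and delete moves gives 16 paths from AA1 to AA6, one for each way of matching or deleting positions 2 to 5.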


Pfam at Sanger Center (UK)
http://www.sanger.ac.uk/Software/Pfam/


Pfam at Sanger Center (UK)
http://pfam.sanger.ac.uk/

