Homology Modeling in Biology and Medicine - LSIR - EPFL
01/12/2004

The goal of Homology Modeling
- Use homologous sequences to construct a model of the 3D structure
- Analyze relationships between the DNA sequence and the 3D structure of proteins
- Know a protein's 3D structure to understand its interactions with other molecules
- Support computer-aided drug design, mutagenesis and protein engineering

Homologous sequences
Starting point: gene A in a DNA sequence
- Paralogy: duplication of gene A (gene A duplicated into gene A and gene A')
- Orthology: mutation involving speciation (gene A in species 1 and gene A in species 2)
- Xenology: a copy transferred to the species from another organism (gene A in a different organism)

How to determine the protein structure?
- By experimentation
  - X-ray crystallography
  - NMR (nuclear magnetic resonance spectroscopy)
- Today, sequence analysis has exploded
  - We have the data
  - We need to construct 3D models
- The idea: use a similar structure to identify constraints and build the corresponding fold
  - Homology Modeling

Where to find the data?
- Protein Data Bank (PDB): http://www.rcsb.org/pdb/
- > 10,000 protein structures
- Each text file contains the coordinates of every heavy (non-hydrogen) atom, from the first residue to the last

    ATOM      1  N   SER A   2      29.089   9.397  51.904  1.00 81.75
    ATOM      2  CA  SER A   2      27.883  10.162  52.185  1.00 79.71
    ATOM      3  C   SER A   2      26.659   9.634  51.463  1.00 82.64
    ATOM      4  O   SER A   2      26.718   8.686  50.686  1.00 81.02
    ATOM      5  CB  SER A   2      28.039  11.660  51.932  1.00 75.59
    ATOM      6  OG  SER A   2      27.582  12.038  50.639  1.00 43.28
    -------
    ATOM   1737  CD1 ILE A 229      39.535  21.584  52.346  1.00 41.62
    TER    1738      ILE A 229
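
To make the record layout concrete, here is a minimal Python sketch that reads one ATOM line by its fixed column positions. The `Atom` type and `parse_atom_line` function are illustrative names, not part of any library, and the sketch assumes the standard fixed-width PDB column layout; it ignores alternate locations, insertion codes and HETATM records.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    """Illustrative container for the fields shown on the slide."""
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def parse_atom_line(line: str) -> Atom:
    """Parse one fixed-width ATOM record (assumed standard PDB columns)."""
    if not line.startswith("ATOM"):
        raise ValueError("not an ATOM record")
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain=line[21],                # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
        occupancy=float(line[54:60]),  # columns 55-60
        b_factor=float(line[60:66]),   # columns 61-66
    )


record = "ATOM      1  N   SER A   2      29.089   9.397  51.904  1.00 81.75"
atom = parse_atom_line(record)
print(atom.name, atom.res_name, atom.chain, atom.res_seq, atom.x, atom.y, atom.z)
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in real PDB files can touch each other with no space in between.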

How to visualize the protein
- It is practically impossible to read this text file without the help of graphic viewers such as RASMOL (http://www.bernstein-plus-sons.com/software/rasmol)
- Different ways to visualize:
  - All-atom model, in ball-and-stick representation
  - Cα trace
  - Coloring by structure
  - Space-filling model

Structure and homologous sequences
- With at least 30% identity between two sequences, a definite correlation exists between sequence and structure
- In particular, homologous sequences show very similar structures, with strong conservation in secondary structural elements
- Some folds are preferred by vastly different sequences to conserve the structure of the active site
- On the other hand, some proteins adopt very similar structures with no obvious sequence similarity
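
The 30% threshold refers to sequence identity over an alignment. As a rough sketch, identity can be computed from two already aligned sequences as below. The function name and the choice of denominator (columns where neither sequence has a gap) are assumptions; other conventions divide by the alignment length or by the shorter sequence, and the example sequences are invented.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # gap columns are not counted in this convention
        compared += 1
        if res_a == res_b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Invented toy alignment: 6 gap-free columns, 5 of them identical.
identity = percent_identity("MKT-AYIA", "MRTLAY-A")
print(f"{identity:.1f}% identity")
```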

Why homology modeling?
- Other ways to construct a 3D model: prediction methods
  - Ab initio
  - Threading
- But: expensive in time and in computation
- The solution of Homology Modeling
  - Start from a known 3D structure for each protein family
  - Construct the model from this known structure (the template structure)

Before building a model…
- Elements of sequence analysis that are essential for building a molecular model will be considered
  - Multiple sequence alignment
  - Alignment checks
  - Protein domains, …

Some problems have to be solved
- Homologous sequences are identified using database search methods (BLAST)
- To build a model, we require the alignment of complete protein sequences, collected from database searches
  - Identical residues must be lined up
  - The rest should be arranged based on
    - observed substitutions in protein families
    - chemical similarity
    - charge similarity
  - Where none of the 19 other residues is suitable, the alignment simply skips that position: a 'gap' (insertion/deletion region)
- Tools: CLUSTALW/CLUSTALX, MAXHOM, MALIGN (MSA methods)
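
The idea of lining up identical residues and inserting gaps elsewhere can be illustrated with a small global alignment by dynamic programming (Needleman-Wunsch). This is a teaching sketch only, not what CLUSTALW or the other tools implement: the scores (+1 match, -1 mismatch, -2 per gap position) are arbitrary assumptions, whereas real tools use substitution matrices that reflect the chemical and charge similarity mentioned above.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global pairwise alignment; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aln_a, aln_b, aln_score = needleman_wunsch("GATTACA", "GATACA")
print(aln_a)
print(aln_b)
print("score:", aln_score)
```

When several alignments share the best score, the traceback returns just one of them, which is one reason the later manual checks of gap regions are worthwhile.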

After alignment, check the result
- The function of a protein depends on the localization in space of a few key residues
- Some residues are critical for the stability of the protein fold or for the formation of functional quaternary structures
- Residues conserved across all sequences usually indicate a conserved structural or functional role, especially buried charges
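
One simple way to start such a check is to list the alignment columns that are identical in every sequence. The sketch below assumes the multiple alignment is given as equal-length strings with '-' for gaps; the function name and the toy sequences are invented for illustration.

```python
def conserved_columns(msa: list[str]) -> list[int]:
    """Return 0-based indices of columns with one identical, non-gap residue."""
    if not msa:
        return []
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for col in range(length):
        residues = {seq[col] for seq in msa}
        if len(residues) == 1 and "-" not in residues:
            conserved.append(col)
    return conserved


# Invented toy alignment of three sequences.
toy_msa = ["MKDLH", "MRDLH", "MK-LH"]
conserved = conserved_columns(toy_msa)
print(conserved)
```

Columns flagged this way are candidates for key residues; whether they are buried or functionally important still has to be judged against the structure.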

The most important step: find a homologous structure
The criteria:
- Alignment score and E value (discard hits with low scores and high E values (> 0.005))
- Domain coverage (at least 60% of the domain)
- Gaps (the fewer the gaps, the better the structural model)
- For small proteins, a specific search (disulfide bonds)
- No structure found: use a prediction method (secondary and tertiary structure prediction)
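
The numeric criteria above translate directly into a filter over search hits. The sketch below is only an illustration: the `Hit` record, its field names and the template identifiers are hypothetical, and ranking the survivors by fewest gaps and then by E value is one possible reading of "the fewer the gaps, the better".

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical summary of one database search hit."""
    template_id: str
    e_value: float
    coverage: float  # fraction of the target domain covered, 0.0 to 1.0
    gaps: int        # number of gap positions in the alignment


def select_templates(hits: list[Hit], max_e_value: float = 0.005,
                     min_coverage: float = 0.60) -> list[Hit]:
    """Keep hits passing the E value and coverage cutoffs, fewest gaps first."""
    kept = [h for h in hits
            if h.e_value <= max_e_value and h.coverage >= min_coverage]
    return sorted(kept, key=lambda h: (h.gaps, h.e_value))


# Invented example hits.
hits = [
    Hit("tmplA", e_value=1e-30, coverage=0.95, gaps=12),
    Hit("tmplB", e_value=1e-12, coverage=0.80, gaps=3),
    Hit("tmplC", e_value=0.2, coverage=0.90, gaps=1),     # E value too high
    Hit("tmplD", e_value=1e-20, coverage=0.40, gaps=0),   # coverage too low
]
selected = select_templates(hits)
print([h.template_id for h in selected])
```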

Selection of the template sequence
- Single structural homologue: one unique choice for template selection
- Several equally good structural homologues are identified: how many and which one(s) should we choose?
- Refine the choice of template by looking at:
  - A simple phylogenetic tree (shows the most similar structure)
  - Completeness of structural information (view the PDB information with RASMOL and verify the completeness of the structure)
  - X-ray and NMR entries

One or many templates?
- When we have selected many templates with the same quality and similarity:
  - Compare the 3D structures to check the unique information each template provides
  - Structural alignment of Cα atoms
  - If two templates are very close, keep only one
  - Keep templates that provide new information
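
The "keep only one of two very close templates" rule can be sketched as a greedy filter on Cα RMSD. The sketch makes strong simplifying assumptions: the templates are already superposed, their Cα atoms are already paired one to one, and the 1.0 Å cutoff is an arbitrary illustrative value. A real comparison would first compute the structural alignment and optimal superposition.

```python
import math

Coords = list[tuple[float, float, float]]


def ca_rmsd(coords_a: Coords, coords_b: Coords) -> float:
    """RMSD between two equal-length, already superposed Cα coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def drop_redundant(templates: dict[str, Coords], cutoff: float = 1.0) -> list[str]:
    """Greedily keep a template only if it differs from every kept one."""
    kept: list[str] = []
    for name, coords in templates.items():
        if all(ca_rmsd(coords, templates[other]) > cutoff for other in kept):
            kept.append(name)
    return kept


# Invented two-atom "structures": t2 is nearly identical to t1, t3 is different.
toy_templates = {
    "t1": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)],
    "t2": [(0.1, 0.0, 0.0), (3.9, 0.0, 0.0)],
    "t3": [(0.0, 5.0, 0.0), (3.8, 5.0, 0.0)],
}
kept_templates = drop_redundant(toy_templates)
print(kept_templates)
```

The result depends on the order in which templates are visited, so in practice the best-scoring template would be listed first.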

Align the template and the target sequences
- In case of high homology (> 40%), the alignment is consistent and any method can be used
- In the other cases, using multiple sequences improves the quality
- Some checks are needed to increase confidence in the model
  - Residue conservation checks (pattern and function)
  - Visual inspection of indel regions (RASMOL)

And finally… Build the model
- It is the moment to use a program
- Input: the target and template sequences and their alignment
- Output: the 3D structure satisfying the constraints
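
As an illustration of what the input alignment can look like on disk, the sketch below writes entries in the PIR-style layout that, to my understanding, MODELLER reads: a `>P1;code` line, a header line of ten colon-separated fields, then the aligned sequence ending in `*`. The helper name `pir_entry`, the codes and the sequences are all hypothetical, and the exact field conventions should be checked against the MODELLER manual before use.

```python
def pir_entry(code: str, aligned_seq: str, is_structure: bool,
              first: str = "", last: str = "", chain: str = "") -> str:
    """Format one alignment entry in an assumed MODELLER-style PIR layout."""
    kind = "structureX" if is_structure else "sequence"
    # Ten fields: type, code, first residue, chain, last residue, chain,
    # then four descriptive fields left empty in this sketch.
    fields = [kind, code, first, chain, last, chain, "", "", "", ""]
    return f">P1;{code}\n{':'.join(fields)}\n{aligned_seq}*\n"


# Hypothetical template and target with an invented toy alignment.
template_entry = pir_entry("tmplA", "MKTLAYIA", True, first="2", last="9", chain="A")
target_entry = pir_entry("target", "MRT-AYLA", False)
alignment_text = template_entry + "\n" + target_entry
print(alignment_text)
```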

Which program to choose?
- WHATIF (1990)
- SWISSMODEL (1993)
- MODELLER (1994)
- ICM (1994)
- CPH Models (1997)
- SDSC1 (2000)
- 3D-JIGSAW (2001)

Advantages of Modeller
- Implements comparative protein structure modelling by satisfaction of spatial restraints (2,3)
- Can perform many additional tasks, including:
  - De novo modelling of loops in protein structures
  - Optimization of various models of protein structure with respect to a flexibly defined objective function
  - Multiple alignment of protein sequences and/or structures
  - Searching sequence databases
  - Comparison of protein structures

Optimization with iteration

To conclude
- Protein structure determines function
- Knowing the protein structure is important for applications in biology and medicine
- Homology modeling:
  - Start from a known structure in the protein family
  - Build a model of the homologous sequence