12.07.2015 Views

Homology Modeling in Biology and Medicine - LSIR - EPFL



The goal of Homology Modeling
- Use homologous sequences to construct a model of the 3D structure
- Analyze relationships between DNA sequence and 3D structure of proteins
- Know a protein's 3D structure to understand its interactions with other molecules
- Support computer-aided drug design, mutagenesis and protein engineering
01/12/2004, slide 3


Homologous sequences
Starting point: gene A in a DNA sequence
- Paralogy: duplication of gene A; the duplicated gene A gives gene A and gene A'
- Orthology: mutation involving speciation; gene A in species 1 and gene A in species 2
- Xenology: a copy is transferred to the species from another organism; gene A in a different organism


How to determine the protein structure?
- By experimentation
  - X-ray
  - NMR (nuclear magnetic resonance spectroscopy)
- Today, sequence analysis has exploded
  - We have the data
  - We need to construct 3D models
- The idea: use a similar structure to identify constraints and build the corresponding fold. This is Homology Modeling


Where to find the data?
- Protein Data Bank (PDB): http://www.rcsb.org/pdb/
- > 10,000 structures of proteins
- Each text file contains coordinates for each heavy (non-hydrogen) atom, from the first residue to the last

    ATOM      1  N   SER A   2  29.089   9.397  51.904  1.00 81.75
    ATOM      2  CA  SER A   2  27.883  10.162  52.185  1.00 79.71
    ATOM      3  C   SER A   2  26.659   9.634  51.463  1.00 82.64
    ATOM      4  O   SER A   2  26.718   8.686  50.686  1.00 81.02
    ATOM      5  CB  SER A   2  28.039  11.660  51.932  1.00 75.59
    ATOM      6  OG  SER A   2  27.582  12.038  50.639  1.00 43.28
    -------
    ATOM   1737  CD1 ILE A 229  39.535  21.584  52.346  1.00 41.62
    TER    1738      ILE A 229
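To make the record layout concrete, here is a minimal Python sketch that reads ATOM lines like the ones above. It assumes whitespace-separated fields, which works for these sample lines; real PDB files are fixed-column, so a production parser should slice by column instead. The `Atom` type and `parse_atom_line` function are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int       # atom serial number
    name: str         # atom name, e.g. "CA" for the alpha carbon
    res_name: str     # three-letter residue name
    chain: str        # chain identifier
    res_seq: int      # residue number
    x: float          # orthogonal coordinates in angstroms
    y: float
    z: float
    occupancy: float
    b_factor: float   # temperature factor


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM record, assuming whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 11 or fields[0] != "ATOM":
        raise ValueError(f"not a simple ATOM record: {line!r}")
    return Atom(
        int(fields[1]), fields[2], fields[3], fields[4], int(fields[5]),
        float(fields[6]), float(fields[7]), float(fields[8]),
        float(fields[9]), float(fields[10]),
    )


records = """\
ATOM 1 N SER A 2 29.089 9.397 51.904 1.00 81.75
ATOM 2 CA SER A 2 27.883 10.162 52.185 1.00 79.71
ATOM 3 C SER A 2 26.659 9.634 51.463 1.00 82.64"""

atoms = [parse_atom_line(line) for line in records.splitlines()]
# Keep only alpha carbons, as a C-alpha trace viewer would.
ca_atoms = [atom for atom in atoms if atom.name == "CA"]
```

Filtering on the atom name `CA` is the same selection a Cα trace display relies on.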


The way to visualize the protein
- It is impossible to read this text file without the help of graphic viewers such as RASMOL: http://www.bernstein-plus-sons.com/software/rasmol
- Different ways to visualize:
  - All-atom model, in ball-and-stick representation
  - Cα trace
  - Coloring: by structure
  - Space-filling model


Structure and homologous sequences
- With at least 30% identity between two sequences, a definite correlation exists between sequence and structure
- In particular, homologous sequences show very similar structures, with strong conservation in secondary structural elements
- Some folds are preferred by vastly different sequences to conserve the structure of the active site
- On the other hand, some proteins adopt very similar structures, with no obvious sequence similarity
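The 30% threshold refers to sequence identity measured on an alignment. A minimal Python sketch of that measurement follows. It assumes the two sequences are already aligned with `-` as the gap character, and it counts identity over columns where neither sequence has a gap; other conventions for the denominator exist. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over the columns where both sequences have a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip gap columns
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Made-up aligned fragments: 5 identical residues out of 7 compared columns.
identity = percent_identity("MKT-AILV", "MRTQAVLV")
above_threshold = identity >= 30.0
```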


Why homology modeling?
- Other ways to construct a 3D model
  - Prediction methods: ab initio, threading
  - But: expensive in time and in computation
- The solution of Homology Modeling
  - Start from a known 3D structure for each protein family
  - Construct the model from this known structure: the template structure


Before building a model…
Elements of sequence analysis, essential for building a molecular model, will be considered:
- Multiple sequence alignment
- Alignment checks
- Protein domain,…


Some problems have to be solved
- Homologous sequences are identified by using database search methods (BLAST)
- To build a model, we require the alignment of complete protein sequences, collected from database searches
- Identical residues must be lined up
- The rest should be arranged, based on
  - observed substitution in protein families
  - chemical similarity
  - charge similarity
- Where none of the 19 other residues is suitable, the alignment simply skips that position: a 'gap' (insertion/deletion regions)
- CLUSTALW/CLUSTALX, MAXHOM, MALIGN (MSA methods)
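The idea of lining up identical residues and inserting gaps where nothing fits can be sketched with a pairwise global alignment (Needleman-Wunsch dynamic programming). This is a simplification, not what CLUSTALW or the other tools do: it aligns only two sequences, and it uses plain match/mismatch scores instead of scores based on observed substitutions, chemical similarity or charge. The score values are illustrative assumptions.

```python
def global_align(seq_a: str, seq_b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return (aligned_a, aligned_b, score) for a global pairwise alignment."""
    n, m = len(seq_a), len(seq_b)
    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two residues
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + pair:
                out_a.append(seq_a[i - 1])
                out_b.append(seq_b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_a, aligned_b, best_score = global_align("GATTACA", "GATACA")
```

Replacing the match/mismatch pair with a lookup in a substitution matrix would bring the sketch closer to the criteria listed on the slide.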


After alignment, check the result
- The function of a protein depends on the localization in space of a few key residues
- Some residues are critical for the stability of the protein fold or for the formation of functional quaternary structures
- Residues conserved in all sequences usually indicate a conserved structural or functional role, especially buried charges
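One simple version of this check is to list the alignment columns where every sequence carries the same residue. The Python sketch below assumes the multiple alignment is given as equal-length strings with `-` for gaps; the function name and the example alignment are made up for illustration.

```python
def conserved_columns(alignment: list[str]) -> list[tuple[int, str]]:
    """Return (column index, residue) for columns identical in every sequence."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for col in range(length):
        residues = {seq[col] for seq in alignment}
        # A column is conserved if it holds one residue type and no gap.
        if len(residues) == 1 and "-" not in residues:
            conserved.append((col, alignment[0][col]))
    return conserved


msa = [
    "MKC-DE",
    "MRCADE",
    "MKCGDQ",
]
conserved = conserved_columns(msa)
```

Conserved positions found this way are candidates to inspect in the structure, for example a conserved cysteine or a buried charged residue.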


The most important step: find a homologous structure
The criteria:
- Alignment score and E value (discard hits with low scores and high E values (> 0.005))
- Domain coverage (at least 60% of the domain)
- Gaps (the fewer the gaps, the better the structural model)
- For small proteins, specific search (disulfide bond)
- No structure found: use a prediction method (secondary and tertiary structure prediction)
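These criteria translate directly into a filter over search hits. The Python sketch below applies the two thresholds from the slide (E value at most 0.005, coverage at least 60%) and then prefers hits with fewer gaps. The `Hit` record, the tie-breaking by score, and the example hits are assumptions made for illustration, not the output format of any search tool.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    template_id: str   # hypothetical identifier of the candidate structure
    score: float       # alignment score
    e_value: float     # expectation value reported by the search
    coverage: float    # fraction of the target domain covered, 0.0 to 1.0
    gaps: int          # number of gaps in the alignment


def select_templates(hits, max_e_value=0.005, min_coverage=0.60):
    """Discard weak hits, then rank the rest by fewest gaps and highest score."""
    kept = [
        hit for hit in hits
        if hit.e_value <= max_e_value and hit.coverage >= min_coverage
    ]
    return sorted(kept, key=lambda hit: (hit.gaps, -hit.score))


# Made-up hits for illustration only.
hits = [
    Hit("tmplA", 210.0, 1e-40, 0.95, 3),
    Hit("tmplB", 180.0, 1e-30, 0.80, 1),
    Hit("tmplC", 35.0, 0.2, 0.90, 0),     # E value too high
    Hit("tmplD", 150.0, 1e-20, 0.45, 2),  # coverage too low
]
ranked = [hit.template_id for hit in select_templates(hits)]
```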


Selection of template sequence
- Single structural homologue: one unique choice for template selection
- Several equally good structural homologues are identified: how many and which one(s) should we choose?
- Narrow the choice to one template by viewing:
  - A simple phylogenetic tree (shows the most similar structure)
  - Completeness of structural information (view the PDB information with RASMOL and verify the completeness of the structure)
  - X-ray and NMR entries


One or many templates?
- When we have selected many templates with the same quality and similarity
  - Compare the 3D structures to check the unique information each template provides
  - Structure alignment of Cα atoms
- If 2 templates are very close, keep only one
- Keep templates that provide new information
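A common way to quantify how close two templates are is the root-mean-square deviation (RMSD) over matched Cα atoms. The Python sketch below assumes the two coordinate lists are already superimposed and matched residue by residue; it does not perform the superposition itself. The 0.5 Å cutoff, the function names and the toy coordinates are illustrative assumptions.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD in angstroms between two equally long, pre-superimposed C-alpha lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def drop_redundant(templates, cutoff=0.5):
    """Keep a template only if it differs from every kept one by more than cutoff."""
    kept = {}
    for name, coords in templates.items():
        if all(ca_rmsd(coords, other) > cutoff for other in kept.values()):
            kept[name] = coords
    return list(kept)


# Toy C-alpha traces: t2 is nearly identical to t1, t3 differs at the last residue.
t1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
t2 = [(0.0, 0.1, 0.0), (3.8, 0.1, 0.0), (7.6, 0.1, 0.0)]
t3 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 3.0, 0.0)]
selected = drop_redundant({"t1": t1, "t2": t2, "t3": t3})
```

Here `t2` is dropped because it adds nothing beyond `t1`, while `t3` is kept because it provides different information for the last residue.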


Align the template and the target sequences
- In case of strong homology (>40%), the alignment is consistent and any method can be used
- In the other cases, the use of multiple sequence alignment improves the quality
- Some checks are needed to increase confidence in the model
  - Residue conservation checks (pattern and function)
  - Visual inspection of indel regions (RASMOL)


And finally… Build the model
- It is the moment to use a program
- Input: the target and template sequences and their alignment
- Output: the 3D structure satisfying the constraints


Which program to choose?
- WHATIF (1990)
- SWISSMODEL (1993)
- MODELLER (1994)
- ICM (1994)
- CPH Models (1997)
- SDSC1 (2000)
- 3D-JIGSAW (2001)


Advantages of Modeller
- Implements comparative protein structure modelling by satisfaction of spatial restraints (2,3)
- Can perform many additional tasks, including
  - De novo modelling of loops in protein structures
  - Optimizing various models of protein structure with respect to a flexibly defined objective function
  - Performing multiple alignment of protein sequences and/or structures
  - Searching sequences in databases
  - Comparing protein structures


Optimization with iteration


To conclude
- Protein structure determines function
- It is important to know protein structure for applications in biology and medicine
- Homology modeling:
  - Start from a known structure in the protein family
  - Build a model of the homologous sequence
