12.07.2015 Views

From Protein Structure to Function with Bioinformatics.pdf


8 3D Motifs

clustering similar sequences and targeting a representative from each cluster, allowing comparative models to be generated for the remaining sequences. The total number of structures in the Protein Data Bank (PDB) (Berman et al. 2000) and the number from structural genomics initiatives have continued to grow at an increasing rate over the past several years. Many of the structures are of unknown function. These trends suggest that 3D motif methods will become more prevalent and useful as more structures are determined and modelled.

8.1.1 What Is Function?

Function can be described at many levels and from many perspectives. Objective classifications of function are needed for training and testing any method of functional annotation. The gene ontology (GO) system (Ashburner et al. 2000) is a hierarchical set of functional descriptors ranging from broad to specific in each of three categories: biological process, cellular component, and molecular function. For the specific molecular functions of enzymes, GO embeds the Enzyme Commission (EC) system (International Union of Biochemistry and Molecular Biology: Nomenclature Committee and Webb 1992) which is also hierarchical: catalyzed reactions are described with four integers, where the first number refers to a broad class of reactions and the last number refers to a specific substrate. GO also includes molecular function terms for stable binding relationships (where binding is not functionally associated with membrane transport or catalytic activity).

Because 3D motifs are based on atomic coordinates, they relate most naturally to detailed molecular functions such as catalysis of a particular reaction or binding of a particular ligand.
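
The four-integer EC hierarchy described above can be illustrated with a minimal Python sketch. It assumes fully specified EC numbers written as four dot-separated integers. The helper names `parse_ec` and `shared_ec_levels` are hypothetical and belong to no published tool, and the EC numbers in the usage lines (trypsin, chymotrypsin, alcohol dehydrogenase) are chosen only for illustration.

```python
from typing import Tuple


def parse_ec(ec: str) -> Tuple[int, ...]:
    """Parse a fully specified EC number such as '3.4.21.4' into four integers.

    Hypothetical helper: partial EC numbers (e.g. '3.4.-.-') are rejected.
    """
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()  # tolerate an optional 'EC' prefix
    parts = text.split(".")
    if len(parts) != 4 or not all(p.isdecimal() for p in parts):
        raise ValueError(f"expected four dot-separated integers, got {ec!r}")
    return tuple(int(p) for p in parts)


def shared_ec_levels(a: str, b: str) -> int:
    """Count the leading EC levels (0 to 4) on which two EC numbers agree."""
    shared = 0
    for x, y in zip(parse_ec(a), parse_ec(b)):
        if x != y:
            break  # the hierarchy diverges here; deeper levels are not comparable
        shared += 1
    return shared


# Illustrative usage: trypsin (3.4.21.4) vs chymotrypsin (3.4.21.1)
print(shared_ec_levels("3.4.21.4", "3.4.21.1"))  # agree on the first three levels
# Trypsin vs alcohol dehydrogenase (1.1.1.1): differ from the first integer on
print(shared_ec_levels("3.4.21.4", "1.1.1.1"))
```

A comparison of this kind measures only agreement in the reaction classification; it says nothing about structure or catalytic mechanism.
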
However, GO and EC do not include any details on enzymatic mechanism or which parts of a structure are directly responsible for a function (Babbitt 2003). For example, two enzymes that catalyze the same overall reaction will be assigned the same EC number even if their overall structures and catalytic mechanisms are very different. Conversely, enzymes that are clearly homologous, and that share mechanistic features (such as a common partial reaction), may catalyze different overall reactions with EC numbers that differ in all four integers.

Besides functional classifications, structural classifications are also frequently used for training and testing annotation methods. SCOP (Structural Classification of Proteins) (Murzin et al. 1995) and CATH (Class, Architecture, Topology, and Homologous superfamily) (Orengo et al. 1997) are hierarchical classifications of protein domains, or compact units of structure (Richardson 1981) observed to have been “mixed and matched” in evolution (Chothia et al. 2003). In SCOP, domains are classified into families, superfamilies, folds, and classes. Family assignments are often evident from sequence data alone, representing “easy” cases for annotation. Most benchmarking with SCOP has focused on superfamily assignments. Superfamily membership provides many clues to a protein’s function, but it is not functionally specific. A structure may perform any of several functions known for other superfamily members, or perform a related function
