12.07.2015 Views

From Protein Structure to Function with Bioinformatics.pdf

152 B.H. Dessailly and C.A. Orengo

6.3.1.2 Practical Definitions

Only databases that consider structural data are described here.

CATH and Gene3D – In the CATH classification, domains in a given topology (see Section 2.1.2) are further classified in the same Homologous superfamily (H-level) if they are believed to have a common ancestor. Two domains are considered homologous if they satisfy at least two of the following criteria: (a) structural similarity, assessed using empirically-derived cut-offs; (b) sequence similarity, assessed using standard sequence comparison methods and HMM sequence searches; and (c) functional similarity, identified using manual analysis. Gene3D expands this classification to proteins of unknown structure by scanning sequences against a library of CATH profile-HMMs, thus matching parts of these sequences to CATH homologous superfamilies (Yeats et al. 2008). CATH superfamilies are further divided into sequence families that are defined at different cut-offs of sequence identity. A cut-off of 35% sequence identity is used to define non-redundant groups of proteins (s35 families).

SCOP and Superfamily – For SCOP superfamilies, homologies are determined by sequence similarity or by manual comparison of structural and functional features (Andreeva et al. 2008). This manual assignment provides the community with a curated expert classification of domain structures, but suffers from the concomitant drawback that any manual process is inevitably prone to subjective decisions. Domains are classified into the same SCOP family if they are “clearly evolutionarily related”. In practice, this definition generally means that protein domains are grouped into the same family if they share pairwise residue identities of more than 30%. However, some domains are classified into the same SCOP family in the absence of high sequence identity if similar structures and functions provide definitive evidence of common ancestry. This allows some flexibility in the assignment of homology relationships but also gives more room for subjectivity in the process. The Superfamily database expands SCOP to proteins of unknown structure by annotating sequences with SCOP descriptions at the family and superfamily level (Wilson et al. 2007). As with Gene3D, Superfamily uses SCOP-based HMM profiles to assign matches in sequences.

SFLD – The Structure-Function Linkage Database has been developed more recently with the specific aim of studying the structure-function relationships amongst homologous enzymes. It currently covers a relatively small set of superfamilies, as compared with CATH and SCOP, but provides a detailed description of the evolution of function within these superfamilies. The SFLD requires that enzymes within a superfamily not only be homologous but also share a mechanistic attribute of the catalytic reaction that uses conserved structural elements (Pegg et al. 2006). SFLD families consist of enzymes that perform the same overall reaction in a given superfamily.

6.3.2 Evolution of Protein Superfamilies

Ultimately, the criterion to group proteins together in superfamilies is that the genes encoding them descend from a common ancestor gene. The processes by which an ancestor gene gives rise to two (or more) copies of itself are commonly referred to as duplication.
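
As an illustration of the practical definitions in Section 6.3.1.2, the following minimal Python sketch encodes two of the rules quoted there: the CATH "at least two of three criteria" test for homology, and the "more than 30% pairwise residue identity" rule of thumb for SCOP families. It is not code from CATH or SCOP. The names PairEvidence, is_cath_homologous, percent_identity and exceeds_identity_cutoff are hypothetical, the three criteria are assumed to have been evaluated beforehand and reduced to booleans, and percent identity is assumed to be computed over the non-gap columns of two pre-aligned sequences, which is only one of several conventions in use.

```python
from dataclasses import dataclass


@dataclass
class PairEvidence:
    """Hypothetical record of pre-computed evidence for one domain pair."""
    structural: bool  # (a) structural similarity passed its cut-off
    sequence: bool    # (b) sequence or HMM comparison found similarity
    functional: bool  # (c) manual analysis found functional similarity


def is_cath_homologous(evidence: PairEvidence) -> bool:
    """Apply the 'at least two of three criteria' rule described in the text."""
    satisfied = sum([evidence.structural, evidence.sequence, evidence.functional])
    return satisfied >= 2


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment and use '-'
    for gaps. Other denominators (e.g. the shorter sequence length) are also
    used in practice and give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def exceeds_identity_cutoff(aligned_a: str, aligned_b: str, cutoff: float = 30.0) -> bool:
    """True if the pair shares more than `cutoff` percent residue identity."""
    return percent_identity(aligned_a, aligned_b) > cutoff
```

The identity rule covers only the sequence-based route into a SCOP family. As the text notes, domains with low sequence identity can still be placed in the same family on structural and functional grounds, a curated judgement that a threshold test of this kind does not capture.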
