12.07.2015 Views

From Protein Structure to Function with Bioinformatics.pdf



294 I.A. Cymerman et al.

types, scientists often focus on seeking functions for the vast numbers of sequences collected in the databases. Obviously, the change from the descriptive to the more predictive applications requires the development of novel tools.

The most basic information of interest regarding a particular gene or protein is its associated function. The most common approach to function prediction starts from the observation that proteins with similar sequences frequently have similar functions. Particularly with the recent increase in the number of available sequences, related sequences can be aligned and grouped into families. If the function of one of the family members is known, the remaining sequences are hypothetically assigned the same function “through inheritance”. This raises the question as to whether one needs a protein structure to predict protein functionality or whether sequence information is sufficient. At first sight one could answer that it depends on the sequence similarity between the two compared protein sequences. It is widely accepted that sequence identity higher than about 30% is a strong indicator that the proteins may share a very similar structure that can be predicted by comparative approaches (see Chapter 3), yielding generally accurate models. Below this threshold, however, assignment of protein structure requires more sophisticated approaches (such as those outlined in Chapters 1 and 2) and is no longer as accurate. Since function depends on structure, one could envisage that a similar dependency also holds for transfer of functional annotations. This, however, is not necessarily the case, as significant functional variation can be observed even for proteins with very similar sequences and structures. For instance, functional annotations based on Gene Ontology are conserved in only 80% of protein pairs even for proteins sharing 90–100% identity; this value drops below 50% for proteins with less than 30% sequence identity. Some aspects of function appear to be more conserved than others, e.g. if the enzymatic function is considered according to the EC system, all four EC numbers tend to be conserved in nearly 100% of cases above 70% pairwise sequence identity. For sequences with pairwise identity below 30%, the conservation of EC numbers also drops below 50% (Tress et al. 2008).

Conservation of function is more complex than conservation of structure, because functional overlap (e.g. identical function in two copies of one gene after a duplication) is subject to evolutionary rate acceleration, which however depends on the pre-existing functional utility of the protein encoded by the ancestral singleton (Jordan et al. 2004). Thus, a duplication that gives rise to paralogous proteins with very similar sequences and structures results either in a loss, by inactivating mutation or deletion, of one of the copies (thus a reversal to the ancestral state) or in functional divergence of one or both copies, so as to reduce the overlap. On the other hand, orthologous sequences tend to retain identical function, often despite considerable sequence divergence. Pairwise sequence comparisons are however unable to distinguish between orthologous and paralogous sequences, which argues against their use for function annotation. There are a number of methods that carry out function prediction based on evolutionary analyses and discrimination of paralogous vs orthologous sequences (e.g. FlowerPower (Krishnamurthy et al. 2007)). However, these methods require the availability of
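The identity thresholds quoted on this page (about 30% for structural similarity, 70% for EC number conservation) can be illustrated with a minimal Python sketch. It is not a method from the chapter. The function names `percent_identity` and `transfer_note` are hypothetical, and the choice of denominator (aligned columns with no gap in either sequence) is an assumption; other conventions for percent identity exist and give different values. The sketch expects two sequences that have already been aligned, with `-` marking gaps.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: gap-free columns form the denominator; this is one of
    several conventions in use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def transfer_note(identity: float) -> str:
    """Map a percent identity to the rough bands quoted in the text.

    The cut-offs (30% and 70%) come from the text; the wording of the
    notes is illustrative only.
    """
    if identity > 70.0:
        return "EC numbers usually conserved; GO annotations less reliably"
    if identity >= 30.0:
        return "similar structure likely; function transfer uncertain"
    return "below 30%: EC and GO conservation fall under 50% of pairs"


if __name__ == "__main__":
    ident = percent_identity("AC-DE", "ACQDF")
    print(f"{ident:.1f}% identity: {transfer_note(ident)}")
```

As the text stresses, a pairwise identity value cannot tell orthologues from paralogues, so a note like this is at most a first filter before any evolutionary analysis.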
