12.07.2015 Views

From Protein Structure to Function with Bioinformatics.pdf


L.A. Kelley

in ever greater detail. Profile-based approaches attempt to generate a single statistical representation for a set of related proteins – a kind of ‘average’ representative. However, this discards much of the information present in the network of relationships. A simple approach to retrieve some of this information was used by Bateman and Finn (2007). Their method compares the output of two profile searches and asks whether there are more sequences found in common between the two outputs than expected by chance. For highly related query profiles, there will be a large number of sequences in common. For unrelated queries, the outputs will only share sequence regions in common due to chance. This approach is analogous to investigating the first order structure of the homology network, i.e. comparing the neighbours of one sequence to the neighbours of another. This simple approach was found to be highly effective at homology detection (there is no alignment generated by this method), significantly surpassing state-of-the-art profile-profile comparison methods.

Weston et al. (2004) used more of the global structure of the homology network with their Rankprop algorithm. The critical innovation that led to the success of the Google search engine is its ability to exploit global structure by inferring it from the local hyperlink structure of the Web. Google’s PageRank algorithm models the behaviour of a random web surfer who clicks on successive links at random and also periodically jumps to a random page. The web pages are ranked according to the probability distribution of the resulting random walk. The Rankprop algorithm begins with a precomputed protein similarity network defined on the entire protein database. Analogous to a diffusion process, a query protein is added to the network and link information from the query to its direct sequence neighbours is propagated through the network to the neighbours of the neighbours, and so on. After propagation, database proteins are ranked according to the amount of link information they received from the query. This approach is shown to outperform standard sequence-profile searching and is comparable to profile-profile searching, despite using PSI-BLAST to generate the initial similarity network.

Finally, Heger et al. (2008) have developed an algorithm called Maxflow which is capable of traversing large homology networks at the level of individual residues. It searches across the network of pairwise alignments for consistently aligned pairs of residues. This method stands out from the others because of its focus on alignment generation, which is critical for protein modelling.

All of these new network-centric approaches are exciting developments in homology detection. One serious drawback is the enormous computational burden required to generate all versus all similarity networks. It seems clear that their performance would increase if it were possible to create truly complete networks from the current databases of ∼6 million sequences. However, reduced databases containing sequences with
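The diffusion idea described above for Rankprop can be illustrated with a small sketch. This is not the published Rankprop implementation: the function name `propagate_scores`, the damping factor `alpha`, the fixed iteration count, the weight normalisation and the toy network are all assumptions made for illustration. It shows only the general principle that link weight from a query spreads to neighbours of neighbours, after which proteins are ranked by the score they received.

```python
def propagate_scores(similarity, query_links, alpha=0.9, iterations=20):
    """Rank proteins by query link weight diffused through a similarity network.

    similarity  -- dict: protein -> {neighbour: edge weight}; a hypothetical
                   stand-in for a precomputed all-versus-all network
    query_links -- dict: protein -> weight of its direct link to the query
    alpha       -- assumed damping factor (< 1) so that scores stay bounded
    """
    # Collect every protein mentioned anywhere in the inputs.
    proteins = set(similarity) | set(query_links)
    for neighbours in similarity.values():
        proteins.update(neighbours)

    # Normalise outgoing weights so each protein passes on at most its own score.
    normalised = {}
    for source, neighbours in similarity.items():
        total = sum(neighbours.values())
        if total > 0:
            normalised[source] = {
                target: weight / total for target, weight in neighbours.items()
            }

    # Start from the direct query links only.
    scores = {p: query_links.get(p, 0.0) for p in proteins}

    for _ in range(iterations):
        # Each round re-injects the direct links, then adds damped neighbour scores.
        updated = {p: query_links.get(p, 0.0) for p in proteins}
        for source, neighbours in normalised.items():
            for target, weight in neighbours.items():
                updated[target] += alpha * weight * scores[source]
        scores = updated

    # Highest accumulated score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy network: A-B-C form a chain; D and E are connected only to each other.
network = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0},
    "D": {"E": 1.0},
    "E": {"D": 1.0},
}
ranked = propagate_scores(network, {"A": 1.0})  # the query hits only A directly
```

In this toy network, protein C has no direct link to the query but still receives a score through B, whereas D and E, which sit in a disconnected part of the network, receive none. That is the behaviour the text attributes to using the global structure of the network, not only direct neighbours.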
