24.10.2014 Views

Automatic functional annotation of predicted active sites - European ...


Side chain interaction model.

The determination of residue interactions requires transforming the full-atom model into a simpler representation, because a mathematical model that describes all combinations of atom interactions between two residues would be too complex. The solution is to replace the all-atom structure model with a coarse-grained model in which each residue is reduced to a single point. In principle, a residue point can be calculated either as the centre of mass or as the geometric centre (centroid). Each representation can be calculated from main chain atoms, main and side chain atoms, or side chain atoms only.

The focus of this study is on side chain interactions within residue triplet configurations. For this reason, a protein structure is represented as a set of side chain centroid points.
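As a minimal sketch of this reduction, the following Python code computes a side chain centroid from named atom coordinates. The function names, the atom naming (PDB-style `N`, `CA`, `C`, `O`, `OXT` for the main chain) and the fallback to the alpha carbon for glycine are assumptions made for illustration; the text does not specify how residues without side chain atoms are handled.

```python
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float, float]

# Assumed PDB-style names for main chain (backbone) atoms.
BACKBONE_ATOMS = {"N", "CA", "C", "O", "OXT"}


def centroid(coords: List[Coord]) -> Coord:
    """Geometric centre: the unweighted mean of the coordinates."""
    n = len(coords)
    return (
        sum(c[0] for c in coords) / n,
        sum(c[1] for c in coords) / n,
        sum(c[2] for c in coords) / n,
    )


def side_chain_centroid(atoms: Dict[str, Coord]) -> Optional[Coord]:
    """Reduce one residue to a single point using side chain atoms only.

    `atoms` maps atom names to (x, y, z) coordinates.
    """
    side_chain = [xyz for name, xyz in atoms.items() if name not in BACKBONE_ATOMS]
    if not side_chain:
        # Assumption: a residue without side chain atoms (e.g. glycine)
        # falls back to its alpha carbon, or None if that is missing too.
        return atoms.get("CA")
    return centroid(side_chain)


# Hypothetical serine residue with made-up coordinates.
serine = {
    "N": (0.0, 0.0, 0.0),
    "CA": (1.0, 1.0, 1.0),
    "C": (2.0, 1.0, 0.0),
    "O": (3.0, 1.0, 0.0),
    "CB": (1.0, 2.0, 3.0),
    "OG": (3.0, 4.0, 5.0),
}
print(side_chain_centroid(serine))  # (2.0, 3.0, 4.0)
```

A centre of mass would differ only in weighting each coordinate by its atomic mass before averaging.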

Protein structure triangulation.

The extraction of residue triplets from a protein is based on triangulation of structures. Here, structures are triangulated on the basis of three criteria. The first is the compositional constraint: each residue in a triplet must be one of the 20 natural amino acids, and hetero atoms are excluded. One prominent reason is that the dataset contains too few examples of residue-hetero atom interactions to support a statistical analysis.
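The compositional constraint could be sketched as a simple filter. The code below assumes PDB-style input, where each residue carries a record type (`ATOM` or `HETATM`) and a three-letter residue name; both the input layout and the function names are illustrative assumptions, not the procedure used in the study.

```python
from typing import Iterable, List, Tuple

# Three-letter codes of the 20 natural amino acids.
STANDARD_AMINO_ACIDS = frozenset({
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
})

# Assumed layout: (record_type, residue_name).
ResidueRecord = Tuple[str, str]


def is_standard_residue(record_type: str, res_name: str) -> bool:
    """True for one of the 20 natural amino acids; hetero atoms are excluded."""
    return record_type == "ATOM" and res_name.upper() in STANDARD_AMINO_ACIDS


def filter_residues(residues: Iterable[ResidueRecord]) -> List[ResidueRecord]:
    """Keep only the residues that may take part in a triplet."""
    return [r for r in residues if is_standard_residue(r[0], r[1])]


# Hypothetical input: water (HOH) and selenomethionine (MSE) are dropped.
records = [("ATOM", "SER"), ("HETATM", "HOH"), ("HETATM", "MSE"), ("ATOM", "GLY")]
print(filter_residues(records))  # [('ATOM', 'SER'), ('ATOM', 'GLY')]
```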

The second condition of triplet extraction requires that no two residues in a triplet are direct neighbours in the protein sequence. The assumption made here is that two covalently bonded residues are more likely to be next to each other in space than two residues that are not bonded. Similarly, the probability of finding three connected residues close together in space is higher than that of finding an unconnected triplet. Consequently, such residues would be over-represented in the distribution of interacting residues in space. The definition of residue neighbourhood affects the data mining result; e.g. by requiring a pair interaction in the triplet to have a distance of more than one residue, patches of residues on one side of a beta-sheet may not be discovered. While tuning this parameter can modify the result of the data mining, the objective here is to discover new knowledge
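The sequence-neighbour condition described above can be sketched as a check on every pair within a candidate triplet. The `min_gap` parameter, the `(chain_id, sequence_number)` representation, and the treatment of residues on different chains as never being sequence neighbours are assumptions made for illustration. The sketch also assumes consecutive sequence numbering without gaps or insertion codes.

```python
from itertools import combinations
from typing import Tuple

# Assumed representation of a residue position: (chain_id, sequence_number).
Residue = Tuple[str, int]


def sequence_separated(a: Residue, b: Residue, min_gap: int = 2) -> bool:
    """True if a and b are at least `min_gap` positions apart in sequence.

    min_gap=2 excludes only direct neighbours (i, i+1).
    min_gap=3 also excludes (i, i+2) pairs, which on a beta-strand
    point to the same side of the sheet.
    """
    if a[0] != b[0]:
        # Assumption: residues on different chains are never sequence neighbours.
        return True
    return abs(a[1] - b[1]) >= min_gap


def is_valid_triplet(triplet: Tuple[Residue, Residue, Residue], min_gap: int = 2) -> bool:
    """Apply the sequence-neighbour condition to all three pairs of a triplet."""
    return all(sequence_separated(a, b, min_gap) for a, b in combinations(triplet, 2))


print(is_valid_triplet((("A", 10), ("A", 11), ("A", 50))))             # False
print(is_valid_triplet((("A", 10), ("A", 12), ("A", 50))))             # True
print(is_valid_triplet((("A", 10), ("A", 12), ("A", 50)), min_gap=3))  # False
```

Raising `min_gap` illustrates the trade-off noted in the text: a stricter neighbourhood definition removes more sequence-local pairs, at the cost of missing patches formed by residues two positions apart.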
