
98 Crabtree et al.

Fig. 3. Computing the Jaccard similarity coefficient for P1 and P2. In each of the three graphs the labeled circles represent proteins and the edges between the circles indicate which proteins have a BLASTP match above the preset thresholds; the edges are nondirectional and an edge is drawn if either protein matches the other at the requisite percent identity and E-value. In addition every protein is assumed to match itself (these edges are not shown). In this example there are six proteins with BLASTP matches as shown (left panel). There are three proteins that match both P1 and P2 (highlighted in the middle panel) and five proteins that match either P1 or P2 (highlighted in the right panel). The Jaccard coefficient for P1 and P2 is therefore 3/5 or 0.6.

clusters are the output of the first phase of the clustering process (see Fig. 4, right panel).

3.1.3. Clustering Phase 2: Bidirectional Best (BLASTP) Hit Clustering

The second phase of the clustering algorithm consists of a bidirectional best (BLASTP) hit analysis (see Note 9):

1. Identify pairs of JACs (JAC1 and JAC2) that satisfy the following conditions:
   a. Each of the two clusters (JAC1 and JAC2) is from a different input genome.
   b. The highest-scoring BLASTP match (see Note 10) of at least one polypeptide in JAC1 is to a polypeptide in JAC2, and vice versa.
   An optional filtering step limits the BLASTP matches considered in condition b to those with an E-value that falls below a given threshold. In practice, this threshold is typically set to the same one that is applied in both the all-vs-all BLASTP and Jaccard clustering steps.
2. Transform the pairs of JACs found in step 1 into a graph whose nodes are the individual JACs. An edge should be drawn between two nodes JAC1 and JAC2 only if JAC1 and JAC2 are among the pairs of JACs with bidirectional best hits from step 1.
3. The connected components (see Note 11) of the graph constructed in step 2 are referred to as "Jaccard orthologous clusters," or "JOCs." Although these clusters are actually clusters of JACs, they can be easily converted to polypeptide clusters, by taking the union of the polypeptides in the JACs. These polypeptide clusters are the output of the second and final phase of the clustering process (see Fig. 6).
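
The three steps of phase 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function names, the input layout (best hits as `(query, subject, e_value)` tuples, with at most one best hit per query and target genome), and the toy data are assumptions made for illustration. The sketch also assumes that JACs without any bidirectional best hit are simply left out of the graph, which the text above does not specify.

```python
from collections import defaultdict


def find_bbh_jac_pairs(best_hits, jac_of, genome_of, max_evalue=None):
    """Step 1: return the JAC pairs linked by bidirectional best hits."""
    directed = set()
    for query, subject, evalue in best_hits:
        # Optional filter: keep only matches whose E-value is below the threshold.
        if max_evalue is not None and evalue >= max_evalue:
            continue
        jac_a, jac_b = jac_of[query], jac_of[subject]
        # Condition a: the two JACs must come from different genomes.
        if genome_of[jac_a] != genome_of[jac_b]:
            directed.add((jac_a, jac_b))
    # Condition b: a best hit must exist in both directions.
    return {frozenset((a, b)) for a, b in directed if (b, a) in directed}


def connected_components(pairs):
    """Steps 2 and 3: build the JAC graph and return its connected components."""
    adjacency = defaultdict(set)
    for pair in pairs:
        a, b = tuple(pair)
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in list(adjacency):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


def to_polypeptide_clusters(components, jac_members):
    """Convert clusters of JACs to polypeptide clusters by taking the union."""
    return [set().union(*(jac_members[j] for j in comp)) for comp in components]


# Hypothetical toy data: five JACs from three genomes.
jac_members = {"J1": {"a1", "a2"}, "J2": {"b1"}, "J3": {"c1"},
               "J4": {"a3"}, "J5": {"b2"}}
genome_of = {"J1": "g1", "J2": "g2", "J3": "g3", "J4": "g1", "J5": "g2"}
jac_of = {p: j for j, members in jac_members.items() for p in members}
best_hits = [("a1", "b1", 1e-50), ("b1", "a2", 1e-40), ("b1", "c1", 1e-30),
             ("c1", "b1", 1e-30), ("a3", "b2", 1e-20), ("b2", "a1", 1e-10)]

pairs = find_bbh_jac_pairs(best_hits, jac_of, genome_of, max_evalue=1e-5)
jocs = to_polypeptide_clusters(connected_components(pairs), jac_members)
print([sorted(joc) for joc in jocs])  # [['a1', 'a2', 'b1', 'c1']]
```

In this toy input J1 and J2 have best hits in both directions, as do J2 and J3, so the three JACs form one connected component even though J1 and J3 share no direct edge. J4 points to J5, but J5's best hit goes to J1 instead, so neither joins a cluster.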
