Cami 1..10 - SAMSI

RESEARCH ARTICLE

For comparison, if the prediction threshold were chosen such that the

pairs in Table 2 were the only true positives, the model would predict

30 false positives, resulting in a positive predictive value (PPV) of 0.25.
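The arithmetic behind this PPV can be checked with a short Python sketch. The count of 10 true positives is not stated in the passage. It is inferred here from PPV = TP / (TP + FP) with 30 false positives and a PPV of 0.25, and the function names are illustrative only.

```python
def positive_predictive_value(true_positives, false_positives):
    """PPV = TP / (TP + FP)."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        raise ValueError("PPV is undefined when nothing is predicted positive")
    return true_positives / predicted_positives


def true_positives_from_ppv(ppv, false_positives):
    """Invert PPV = TP / (TP + FP) to get TP = PPV * FP / (1 - PPV)."""
    if not 0.0 <= ppv < 1.0:
        raise ValueError("ppv must lie in [0, 1)")
    return ppv * false_positives / (1.0 - ppv)


# Assumption: 10 true positives, inferred from the reported PPV and FP count.
inferred_tp = true_positives_from_ppv(0.25, 30)
ppv = positive_predictive_value(10, 30)
```

Under this reading, 10 correct predictions out of 40 predicted positives give the reported PPV of 0.25.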

Adding taxonomic covariates in the model

Next, we investigated the use of taxonomic covariates in the model.

These covariates were based on the ATC taxonomy of drugs and the

MedDRA taxonomy of adverse events. As a preliminary step, we computed

for every pair (drug 1, drug 2) the minimum distance d_ATC(drug 1,

drug 2), denoting the minimum over all possible ATC positions of

drug 1 and drug 2 of the length of the shortest path between drug 1 and

drug 2 in the ATC taxonomy. Similarly, we computed for every pair of

adverse events (ADE 1, ADE 2) the distance d_MedDRA(ADE 1, ADE 2),

denoting the length of the shortest path between ADE 1 and ADE 2

in the MedDRA taxonomy. Using the above distance measures, we

constructed four taxonomic covariates: atc-min, atc-KL, meddra-min,

and meddra-KL (table S1). ATC distance–based covariates were found

to be predictive of drug-ADE associations and drug-drug interactions

using an ERG model. For reference, Perlman et al. used a different

ATC-based metric to predict drug targets (24). The MedDRA-based

covariates defined above are counterparts of the ATC-based covariates.

Fig. 3. Illustration of selected covariate effects. (A to E) Probability of the existence of an edge as a function

of covariates degree-prod (A), jackard-ADE-max (B), atc-min (C), meddra-min (D), and euclid-min (E). (F to M)

Mean distribution of variables jackard-ADE (F and G), jackard-drug (H and I), ATC distance (J and K), and

Euclidean distance (L and M) over the edges and non-edges for both training and validation sets.

The motivation behind atc-KL and meddra-KL was the same as the

motivation behind jackard-ADE-KL, discussed earlier.
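As an illustration of the ATC distance described above, the following Python sketch computes a shortest-path distance between two ATC codes and then takes the minimum over all codes assigned to each drug. It rests on assumptions not stated in the text: the standard five-level ATC code layout (prefix lengths 1, 3, 4, 5, and 7 characters) is treated as a tree, and a single root node joins the top-level groups. The function names and example codes are hypothetical and are not taken from the study.

```python
from itertools import product

# Assumption: prefix lengths of the five standard ATC levels.
ATC_LEVEL_LENGTHS = (1, 3, 4, 5, 7)


def atc_path(code):
    """Return the chain of ATC nodes from level 1 down to `code`."""
    if len(code) not in ATC_LEVEL_LENGTHS:
        raise ValueError(f"unexpected ATC code length: {code!r}")
    return [code[:n] for n in ATC_LEVEL_LENGTHS if n <= len(code)]


def atc_tree_distance(code_a, code_b):
    """Number of edges on the tree path between two ATC codes.

    Assumption: a root node sits above level 1, so codes from different
    top-level groups are connected through that root.
    """
    path_a, path_b = atc_path(code_a), atc_path(code_b)
    shared = 0  # depth of the deepest common ancestor
    for node_a, node_b in zip(path_a, path_b):
        if node_a != node_b:
            break
        shared += 1
    return (len(path_a) - shared) + (len(path_b) - shared)


def min_atc_distance(codes_drug_1, codes_drug_2):
    """Minimum tree distance over all ATC positions of the two drugs."""
    if not codes_drug_1 or not codes_drug_2:
        raise ValueError("each drug needs at least one ATC code")
    return min(
        atc_tree_distance(a, b)
        for a, b in product(codes_drug_1, codes_drug_2)
    )


# Hypothetical example: a drug with two ATC positions and a drug with one.
d = min_atc_distance(["N02BA01", "B01AC06"], ["B01AC04"])
```

In this sketch, two level-5 codes that share the same level-4 group are at distance 2, which is consistent with the "distance of 2" example mentioned in the text.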

Table 1 shows the results of univariate and multivariate analysis

for the taxonomic covariates. It can be seen that edges are more

likely to exist in drug-ADE pairs with smaller values of atc-min,

meddra-min, atc-KL, and meddra-KL. Figure 3 illustrates the effect

of atc-min (Fig. 3C) and meddra-min (Fig. 3D) on the probability

of an edge. The means of the distribution of ATC distances over the

edge and non-edge groups are also shown for both the training

(Fig. 3J) and the validation (Fig. 3K) sets. It may be seen that when

a drug-ADE pair (i, j) denotes a true association, the neighborhood

N(j) typically contains more drugs that are at a small minimum ATC

distance—such as at a distance of 2—to the drug i than when the pair

denotes a non-association. The results in Table 1 show that a model

containing all taxonomic (TAX) covariates performs reasonably well

[validation set AUROC (area under the receiver operating characteristic

curve), 0.838], but less well than the network (NET) model

(validation set AUROC, 0.862). A model combining network and

taxonomic covariates (NET + TAX) has a better performance (validation

set AUROC, 0.869) than the network-only or the taxonomy-only

models.
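For readers less familiar with the metric used in this comparison, the following Python sketch computes an AUROC from predicted edge probabilities using the pairwise-ranking formulation: the fraction of (edge, non-edge) pairs in which the edge receives the higher score, with ties counted as one half. The labels and scores below are made-up toy values, not data from the study, and the function name is illustrative.

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative.

    labels: iterable of 1 (edge) or 0 (non-edge)
    scores: predicted probabilities, in the same order as labels
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy example only: two edges and two non-edges.
toy_auc = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The quadratic loop is adequate for small examples. For validation sets of realistic size, a rank-based implementation would be preferable.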

www.ScienceTranslationalMedicine.org 21 December 2011 Vol 3 Issue 114 114ra127 5
