09.07.2015 Views

Ontology engineering



© 2010 Nature America, Inc. All rights reserved.

Figure 2 Predictive power of AraNet for conserved and plant-specific biological processes. AraNet’s predictive capacity was measured using cross-validated receiver operating characteristic (ROC) curve analysis, as illustrated in (a). For a given process, each gene in the Arabidopsis genome is rank-ordered by the sum of its network linkage scores to the set of ‘bait’ genes already associated with that process (omitting each bait gene from the bait set for purposes of evaluation). High-scoring genes are most tightly connected to the bait set and are the most likely new candidates to participate in that process. This trend is evident in a ROC plot measuring recovery of bait genes as a function of rank, calculating the true-positive prediction rate (sensitivity; TP/(TP+FN)) versus the false-positive prediction rate (1−specificity; FP/(FP+TN)). If bait genes are highly interconnected (red circles), unlike random genes (blue circles), additional genes connected to the bait genes (green circles) are more likely to be involved in the same process. The area under the cross-validated ROC curve (AUC) provides a measure of predictability, ranging from ~0.5 for random expectation (blue curve) to 1 for perfect predictions (red curve). (b) Distributions of AUC values are plotted for network-based identification of genes for each of the 318 GO biological process terms with annotations, (c) for each of the 151 biological process terms with annotations shared between plant and animal or between plant and yeast and (d) for each of the 167 biological process terms with annotations found in plants but absent from animals and fungi.
In bar-and-whiskers plots, the central horizontal line in the box indicates the median AUC and the boundaries of the box indicate the first and third quartiles of the AUC distribution. Whiskers indicate the 10th and 90th percentiles, and circles indicate individual outliers. AraNet specifically identified genes associated with (e) plant abiotic stress response genes and (f) organ developmental processes, as annotated by GO. AUC values are indicated in parentheses.

[Figure 2 panels. (a) ROC schematic, true-positive rate versus false-positive rate. (b) GO BP (all), (c) GO BP (conserved) and (d) GO BP (plant-specific): area under ROC curve for Random, AraNet, AraNet plant data only and AraNet no plant data. (e) Abiotic response, true-positive rate versus false-positive rate: response to oxidative stress (0.82), response to heat (0.80), response to high light intensity (0.79), response to water deprivation (0.73), response to hydrogen peroxide (0.73), cold acclimation (0.72), random (0.50). (f) Organ development, true-positive rate versus false-positive rate: ovule development (0.81), carpel development (0.75), trichome morphogenesis (0.73), stomatal complex morphogenesis (0.71), stamen development (0.64), root development (0.62), cuticle development (0.61), random (0.50).]

Linked genes share cell type–specific expression patterns

Many traits in multicellular organisms pertain to specific tissues or cell types. The predictive strength shown by AraNet for such processes raises the question of how a global gene network, incorporating diverse samples and data from orthologs, can correctly identify genes for cell type– and tissue-specific processes.
Using measurements of transcript observations in 20 root cell types [31] that were not used in building AraNet, we measured the extent to which genes linked in AraNet were spatiotemporally co-expressed in these cells. We find that linked genes show strong cell-specific co-expression in Arabidopsis (Fig. 3c), indeed far stronger than in previous networks of Arabidopsis genes (Supplementary Table 3) [27–30], with linked genes four times more likely to be expressed in the same cell types than expected by chance. Thus, although different individual networks were not constructed for each cell type, such cell and tissue specificity is nonetheless at least in part implicitly encoded in AraNet linkages. This correlation between functional association and spatiotemporal co-expression of genes likely enhances prediction strength for many traits, and is evident even for linkages between characterized and uncharacterized genes (Fig. 3c), supporting applicability of AraNet to uncharacterized genes.

Associating genes with specific mutant phenotypes

Because linked genes in AraNet tend to operate in the same processes (Figs. 1–4), we might expect that they often affect the same phenotypic traits [3,5]. This allows association of new candidate genes with traits of interest based on network connections. To test this, we used results from large-scale mutant seed phenotyping [32] and analyzed genes whose disruption induced embryonic lethality or changes in seed (embryo) pigmentation. Genes involved in each trait were interlinked significantly more often compared to chance (P < 10^−31 for embryonic lethality and P < 10^−10 for seed pigmentation, normal distribution) (Fig. 3d). Unlike AraNet, previous Arabidopsis gene networks [27–30] do not significantly predict either phenotype (Supplementary Fig. 4). Thus, AraNet offers a feasible approach for selecting genes likely to be associated with specific plant traits.

Tenfold enrichment for seed pigmentation genes

To experimentally test the association of new genes with a trait, we used 23 known seed pigmentation genes (Supplementary Table 4) to search AraNet for new pigmentation genes. Genes in this phenotypic class generally affect chloroplast development or photomorphogenesis, and mutant seedlings show early developmental defects, with albino, pale green, purple or variegated leaves [33].

From AraNet’s top 200 candidate genes, we screened all genes with available homozygous T-DNA insertional mutant lines (Supplementary Table 5). We screened 90 candidate genes (represented by 118 mutant lines), of which 14 genes (represented by 17 lines) exhibited color and morphology defects in young seedlings, reminiscent of seed pigmentation mutants (Supplementary Tables 6 and 7). This represents a tenfold enrichment in the discovery rate of the mutant phenotype (P ≤ 10^−12, binomial distribution) over that observed during screens of T-DNA insertional lines [33] (see Online Methods). This discovery rate compares well to animal networks; for example, in C. elegans, 16 tumor suppressor effectors were identified from 170 candidates [5].

nature biotechnology VOLUME 28 NUMBER 2 FEBRUARY 2010 151
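
The cross-validated ROC analysis described in the Figure 2 caption can be illustrated with a minimal sketch. It is not the AraNet implementation: the toy network, the gene names (A to G), the linkage scores and the function names are all hypothetical, chosen only to show the ranking step (sum of linkage scores to the bait set, with each bait gene left out of its own score) and the AUC computed as the probability that a bait gene outranks a non-bait gene, with ties counted as one half.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Turn (gene_a, gene_b, score) triples into a symmetric adjacency map."""
    adj = defaultdict(dict)
    for a, b, score in edges:
        adj[a][b] = score
        adj[b][a] = score
    return adj


def bait_linkage_scores(adj, genes, baits):
    """Score each gene by the sum of its linkage scores to the bait set.

    A bait gene never contributes to its own score, which mirrors omitting
    each bait gene from the bait set when it is being evaluated.
    """
    baits = set(baits)
    return {
        g: sum(s for nbr, s in adj.get(g, {}).items() if nbr in baits and nbr != g)
        for g in genes
    }


def auc(scores, positives):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy network: A, B, C are interconnected bait genes,
# D is linked to two baits, E, F, G are only weakly connected.
edges = [
    ("A", "B", 3.0), ("A", "C", 2.5), ("B", "C", 2.0),
    ("A", "D", 1.5), ("B", "D", 1.0),
    ("E", "F", 0.5), ("F", "G", 0.4), ("C", "E", 0.3),
]
genes = ["A", "B", "C", "D", "E", "F", "G"]
baits = {"A", "B", "C"}

adj = build_adjacency(edges)
scores = bait_linkage_scores(adj, genes, baits)
auc_value = auc(scores, baits)

# Non-bait genes ranked by score are the new candidates for the process.
candidates = sorted((g for g in genes if g not in baits),
                    key=lambda g: scores[g], reverse=True)
print(f"AUC = {auc_value:.2f}, top candidate = {candidates[0]}")
```

In this toy case every bait gene scores higher than every non-bait gene, so the AUC is 1; a bait set no better connected than random genes would give a value near 0.5, matching the range stated in the caption.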
