09.07.2015 Views

Ontology engineering

Ontology engineering

Ontology engineering

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

e s o u r c eRational association of genes with traits using agenome-scale gene network for Arabidopsis thalianaInsuk Lee 1,2,5 , Bindu Ambaru 4,5 , Pranjali Thakkar 4 , Edward M Marcotte 2,3 & Seung Y Rhee 4© 2010 Nature America, Inc. All rights reserved.We introduce a rational approach for associating genes with plant traits by combined use of a genome-scale functional networkand targeted reverse genetic screening. We present a probabilistic network (AraNet) of functional associations among 19,647(73%) genes of the reference flowering plant Arabidopsis thaliana. AraNet associations are predictive for diverse biologicalpathways, and outperform predictions derived only from literature-based protein interactions, achieving 21% precision for 55%of genes. AraNet prioritizes genes for limited-scale functional screening, resulting in a hit-rate tenfold greater than screens ofrandom insertional mutants, when applied to early seedling development as a test case. By interrogating network neighborhoods,we identify AT1G80710 (now DROUGHT SENSITIVE 1; DRS1) and AT3G05090 (now LATERAL ROOT STIMULATOR 1; LRS1) asregulators of drought sensitivity and lateral root development, respectively. AraNet (http://www.functionalnet.org/aranet/) providesa resource for plant gene function identification and genetic dissection of plant traits.Manipulating plant traits that affect the production of food, fiberand renewable energy has important agricultural and economic consequences.What is the best approach for identifying genes for importantplant traits? Forward genetics is limited as mutations in manygenes may generate only moderate or weak phenotypes. Similarly,although reverse genetics allows for directed assay of gene perturbations1 , saturated phenotyping for many plant traits is impractical. Apragmatic near-term solution is the computational identification oflikely candidate genes for desired traits, allowing for focused, efficientuse of reverse genetics. This solution is not unlike rational drug designin which computer-assisted and expert knowledge are combined withtargeted screening for the desired drug, or in this case, trait.One emerging approach for prioritizing candidate genes is networkguidedguilt by association. In this approach, functional associationsare first determined between genes in a genome on the basis of extensiveexperimental data sets, encompassing millions of individual observations.Such a map of functional associations is often represented as agraph model and referred to as a functional gene network 2 .Probabilistic functional gene networks integrate heterogeneousbiological data into a single model, enhancing both model accuracyand coverage. Once a suitable network is generated, new candidategenes are proposed for traits based upon network associations withgenes previously linked to these traits. Such network-guided screeninghas been successfully applied to unicellular organisms 3,4 andCaenorhabditis elegans 5,6 , and is a proposed strategy for identifyinghuman disease genes 4,5,7–10 .We demonstrate here that this approach successfully identifies genesaffecting specified traits for a reference flowering plant, A. thaliana,and we introduce a genome-wide, functional gene network forArabidopsis suitable for prioritizing candidate genes for a wide varietyof traits of economic and agricultural importance.RESULTSReconstruction of an Arabidopsis gene networkWe integrated diverse functional genomics, proteomics and comparativegenomics data sets into a genome-wide functional gene network,using data integration and benchmarking methods customized forArabidopsis genes (Supplementary Methods). The data sets includedmRNA co-expression patterns measured from DNA microarray datasets (Supplementary Table 1), known Arabidopsis protein-proteininteractions 11–14 , protein sequence features including sharing ofprotein domains, similarity of phylogenetic profiles 15–17 or genomiccontext of bacterial or archaebacterial homologs 18–20 , and diversegene-gene associations (mRNA co-expression, physical proteininteractions, multiprotein complexes, genetic interactions, literaturemining) transferred from yeast 21 , fly 11,22–24 , worm 5 and humangenes based on orthology 25 (Supplementary Table 2). In total,24 distinct types of gene-gene associations, encompassing >50 millionindividual experimental or computational observations, were scoredfor their ability to correctly reconstruct shared membership inArabidopsis biological processes. Then, these were incorporated intoa single integrated network model, dubbed AraNet. AraNet contains1,062,222 functional linkages among 19,647 genes (~73% of the totalArabidopsis genes), with each linkage weighted by the log likelihood ofthe linked genes to participate in the same biological processes.Integrating data improves network coverage and accuracy,as tested by recovery of known functional associations (Fig. 1aand Supplementary Fig. 1). AraNet extends substantially beyond1 Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seodaemun-gu, Seoul, Korea. 2 Center for Systems and Synthetic Biology,and 3 Department of Chemistry and Biochemistry, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas, USA. 4 Department of PlantBiology, Carnegie Institution for Science, Stanford, California, USA. 5 These authors contributed equally to this work. Correspondence should be addressed to I.L.(insuklee@yonsei.ac.kr) or E.M.M. (marcotte@icmb.utexas.edu) or S.Y.R. (rhee@acoma.stanford.edu).Received 10 June 2009; accepted 23 December 2009; published online 31 January 2010; doi:10.1038/nbt.1603nature biotechnology VOLUME 28 NUMBER 2 FEBRUARY 2010 149

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!