13.07.2015 Views

The Genom of Homo sapiens.pdf


230 ASHBURNER ET AL.

[...]man protein annotation (some columns removed for clarity):

PRODUCT ID   PRODUCT SYMBOL   GO ID
Q14116       IL13_HUMAN       GO:0042231 interleukin-13 biosynthesis

This would be replaced with:

PRODUCT ID   PRODUCT SYMBOL   GO ID                              PROPERTY (VALUE)
Q14116       IL13_HUMAN       GO:0042089 cytokine biosynthesis   biosynthesizes (interleukin-13)

Of course, the property “biosynthesizes” will be restricted to specific terms, and it will be the task of the ontology curators to specify the allowable properties for any given term. Any child term connected purely by isa links (but not partof) will inherit the properties of its parent. We anticipate that the immediate need is for a small number of properties applicable to a limited number of terms. Despite the small number, the list will be maintained in a computable form and thus allow software to automatically check the gene association input files for invalid property data and avoid nonsensical annotations (e.g., “prolactin receptor synthesizes pyrimidines”). This list of primary terms and their supporting role-playing properties can be represented as a 4-column table:

TERM                   PROPERTY      PROPERTY-FILLER-TERM        CARDINALITY
transport              transports    compound                    1
biosynthesis           synthesizes   compound                    1
biosynthesis           mediated-by   compound                    0 or 1
protein-binding        binds         protein-or-protein-family   1
transcription factor   regulates     gene                        1

The property-filler-class is used to restrict the vocabulary from which the terms may be drawn for the property. If, for example, a gene product is annotated with the term “protein-binding” then the permissible value for the “binds” property must always be a protein family or a certain protein. The protein family would be represented with an ID from an appropriate database; for example, UniProt.

As shown in the table above, some terms can have more than one property type (e.g., “biosynthesis”).
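The automatic check described above could be sketched roughly as follows. This is only an illustration, not the authors' software: the `ALLOWED` dictionary transcribes the table, while the function name `check_annotation` and the convention that the caller already knows each value's filler class (e.g., that a pyrimidine is a "compound") are assumptions made for the example. Inheritance of properties along isa links is not modeled.

```python
from collections import Counter

# (term, property) -> (filler class, min count, max count), transcribed from the table.
ALLOWED = {
    ("transport", "transports"): ("compound", 1, 1),
    ("biosynthesis", "synthesizes"): ("compound", 1, 1),
    ("biosynthesis", "mediated-by"): ("compound", 0, 1),
    ("protein-binding", "binds"): ("protein-or-protein-family", 1, 1),
    ("transcription factor", "regulates"): ("gene", 1, 1),
}


def check_annotation(term, properties):
    """Validate one annotation (hypothetical helper).

    properties is a list of (property, filler_class) pairs attached to `term`.
    Returns a list of human-readable error strings; empty means valid.
    """
    errors = []
    counts = Counter()
    for prop, filler_class in properties:
        rule = ALLOWED.get((term, prop))
        if rule is None:
            errors.append(f"property {prop!r} is not allowed on term {term!r}")
            continue
        if filler_class != rule[0]:
            errors.append(f"{prop!r} on {term!r} expects a {rule[0]}, got {filler_class}")
        counts[prop] += 1
    # Cardinality: every property defined for this term must occur min..max times.
    for (rule_term, prop), (_, lowest, highest) in ALLOWED.items():
        if rule_term == term and not lowest <= counts[prop] <= highest:
            errors.append(f"{prop!r} occurs {counts[prop]} time(s) on {term!r}")
    return errors


# The nonsensical case from the text: a binding term given a "synthesizes" value.
bad = check_annotation("protein-binding", [("synthesizes", "compound")])
good = check_annotation("biosynthesis", [("synthesizes", "compound")])
```

Here `bad` reports both the disallowed property and the missing required “binds” value, while `good` is empty.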
Properties can have different cardinalities (e.g., the mediated-by property is optional). More specific terms can have more specific property-filler terms (e.g., “protein biosynthesis” terms can only synthesize proteins as opposed to the more general “compound”).

One disadvantage of this system is that we will no longer represent annotations as a simple 2-dimensional matrix of gene product and GO term; rather, we will have a matrix of gene product and “annotation phrase,” which can have unbounded dimensionality. This has implications for everyone who writes software that uses GO. For example, to implement a search for “pyrimidine biosynthesis,” the software would need to decompose the phrase and traverse both the function graph and the protein family/compound ontology (pyrimidine isa nucleotide). However, one can simply instantiate the phrases temporarily by using the existing annotations. That is, for every leading term T, find all the supporting values, V1, V2 ... Vn, in the current annotations and generate “temporary phrase terms” TV1, TV2 ... TVn. In an AmiGO-type view (Lewis et al. 2002), one would then see these phrases appearing beneath the primary term.
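A minimal sketch of this temporary instantiation is given below. It is an illustration under stated assumptions rather than the GO implementation: the `PHRASE_RULES` mapping, the function name `instantiate_phrases`, the tuple layout of an annotation, and the product IDs `PROD_A` and `PROD_B` are all hypothetical, and the phrase is built by simply prefixing a head word with the property value.

```python
from collections import defaultdict

# Hypothetical rules: leading term -> (property that completes it, head word to prefix).
PHRASE_RULES = {"cytokine biosynthesis": ("synthesizes", "biosynthesis")}


def instantiate_phrases(annotations):
    """Group products under temporary phrase terms.

    annotations is an iterable of (product_id, term, property, value) tuples.
    Returns {leading term: {temporary phrase term: set of product IDs}}.
    """
    phrases = defaultdict(lambda: defaultdict(set))
    for product, term, prop, value in annotations:
        rule = PHRASE_RULES.get(term)
        if rule is not None and rule[0] == prop:
            # Build the phrase TVi for leading term T and supporting value Vi.
            phrases[term][f"{value} {rule[1]}"].add(product)
    return {term: dict(children) for term, children in phrases.items()}


annotations = [
    ("Q14116", "cytokine biosynthesis", "synthesizes", "interleukin-13"),
    ("PROD_A", "cytokine biosynthesis", "synthesizes", "interleukin-2"),  # placeholder ID
    ("PROD_B", "cytokine biosynthesis", "synthesizes", "interleukin-2"),  # placeholder ID
]
temporary_terms = instantiate_phrases(annotations)
```

Each key of the inner dictionary is a phrase that a browser could list beneath its leading term, together with the products annotated to it.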
The existence of simple phrase-construction rules is assumed for clarity; for example, any dynamically constructed “biosynthesis” phrase could be written by prefixing the word “biosynthesis” with the value filled in the “synthesizes” property.

[is-a] metabolism ; GO:0008152
  [is-a] biosynthesis ; GO:0009058 (2000 products)
    [is-a] cytokine biosynthesis ; GO:0042089 (500 products)
      [is-a] interleukin-1 biosynthesis (100 products)
      [is-a] interleukin-2 biosynthesis (120 products)
      [is-a] interleukin-3 biosynthesis (140 products)
      [is-a] interleukin-4 biosynthesis (200 products)

This approach is both simple enough to use and powerful enough to solve many related problems and also opens the door to other uses. As an example, consider the following term now present in the GO: “positive regulation of vulval development.” This is clearly a compound term composed of terms from many separate vocabularies; it also illustrates the recursive use of other compound terms. The linguistic composition of this term is shown in Figure 1.

When the worm gene CE08399 is to be annotated to “positive regulation of vulval development,” the underlying association data would look like this:

PRODUCT   GO ID                      COMPLETION PROPERTY                  INFORMATIONAL PROPERTY
CE08399   GO:nnnnnnn (regulation)    regulates (development of (vulva))   type(+)

This would apply in other situations as well:

Figure 1. The linguistic composition of the GO term “positive regulation of vulval development.”
