The Genom of Homo sapiens.pdf

ONTOLOGIES FOR BIOLOGISTS

PRODUCT   TOPIC TERM   COMPLETION PROPERTY                              INFORMATIONAL PROPERTY
Gene 1    regulation   of(growth of (eye))                              type(+)
Gene 2    regulation   of(growth of (leg))                              type(+)
Gene 3    regulation   of(development of (B800-850 antenna complex))    type(-)

The last entry would appear in text (including browsers such as AmiGO) as “negative regulation of development of B800-850 antenna complex.” Although this appears as an English-like phrase, internally this structured grammar is maintained as an inverted tree.

One unanswered issue is how we will attach definitions and synonyms to these composed phrases. We believe that this can be solved by having the phrase inherit the union of the definitions from the terms that it comprises, as has been described by Hill et al. (2002). The use of properties to indicate prepositions and direct objects will enable other compound phrases to indicate with, from, by, and so forth. For example, a phenotypic observation such as “black-marking on(ventral-surface of (abdomen))” is composed of the topic term (“marking”), an optional descriptive term (“black”), and the sub-phrase “ventral-surface of abdomen,” which in turn is composed of a position term with a property for an anatomical term.

Although we do not intend to make the proposed solution dependent on any particular technology or tool, or to force any complicated recasting of GO, it is worth noting that our approach is very compatible with that being developed by the W3C to support the “Semantic Web,” i.e., the Web Ontology Language (OWL) (OWL 2003).
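
The composed-phrase scheme described above can be illustrated with a small sketch. It is not the GO Consortium's implementation: the class names `Phrase` and `Property`, the `english` method, and the choice to store the informational property type(+)/type(-) as a plain field are all assumptions made for illustration. The sketch only shows how a tree of terms linked by preposition-like properties can be flattened into the English-like phrase a browser would display.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Phrase:
    """One node of the phrase tree: a topic term plus optional modifiers."""
    term: str
    descriptor: Optional[str] = None        # optional descriptive term, e.g. "black"
    regulation_type: Optional[str] = None   # "+" or "-", mirroring type(+) / type(-)
    properties: List["Property"] = field(default_factory=list)

    def english(self) -> str:
        """Flatten the tree into an English-like phrase."""
        words = []
        if self.regulation_type is not None:
            words.append({"+": "positive", "-": "negative"}[self.regulation_type])
        if self.descriptor:
            words.append(self.descriptor)
        words.append(self.term)
        for prop in self.properties:
            words.append(prop.name)             # the preposition: "of", "on", ...
            words.append(prop.value.english())  # recurse into the sub-phrase
        return " ".join(words)


@dataclass
class Property:
    """A preposition-like link from a term to a sub-phrase (hypothetical name)."""
    name: str
    value: Phrase


# Gene 3 from the table: regulation of(development of (B800-850 antenna complex)) type(-)
gene3 = Phrase(
    "regulation",
    regulation_type="-",
    properties=[Property("of", Phrase(
        "development",
        properties=[Property("of", Phrase("B800-850 antenna complex"))],
    ))],
)

# The phenotypic observation: black-marking on(ventral-surface of (abdomen))
marking = Phrase(
    "marking",
    descriptor="black",
    properties=[Property("on", Phrase(
        "ventral-surface",
        properties=[Property("of", Phrase("abdomen"))],
    ))],
)

print(gene3.english())
print(marking.english())
```

Because each sub-phrase is itself a `Phrase`, the same structure covers both the regulation examples and the phenotype example, and further prepositions (with, from, by) would need no new machinery.
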
In the Semantic Web Activity Statement, the W3C states “For the Web to reach its full potential, it must evolve into a Semantic Web, providing a universally accessible platform that allows data to be shared and processed by automated tools as well as by people” (Semantic Web 2003). We cannot guarantee when, if ever, the Semantic Web will become more than a dream, but the Web Ontology Language that is being developed is proving useful as a Web-worthy syntax able to fully describe the complex semantic content in a computable form.

We have developed tools for converting the GO format into a description logic format, DAML+OIL, the predecessor of OWL. This has also been done independently by Wroe et al. (2003). The mechanics of the conversion are simple: isa relationships are converted into a subClassOf relationship, and all other relationship types are converted using has-type restrictions on the relationship.

Here are two examples of GO information in a precursor to OWL, the Ontology Inference Layer (OIL), which we use here because it is slightly more human-readable:

    class-def defined biosynthesis subclass-of metabolism
      property-constraint synthesizes has-type biological_entity
    class-def defined cytokine_biosynthesis subclass-of biosynthesis
      property-constraint synthesizes has-type cytokine

    class-def defined regulation subclass-of biological_process
      property-constraint regulates has-type biological_process
      property-constraint has-regulation-type has-type regulation_type
      property-constraint has-body-part has-type body_part
    class-def defined regulation_type
    class-def defined positive_regulation_type
    class-def defined negative_regulation_type

THE CONTENT, USE, AND AVAILABILITY OF GO

The core of GO is represented by the three graphs of concepts for the annotation of the function, biological role, and cellular component
of gene products. These now include about 14,000 terms, of which about 80% are defined. These are maintained by the GO editorial team and their close collaborators working as curators for the model organism databases. However, anyone can suggest new GO terms, or point out errors in the GO graphs, through a site maintained at SourceForge (http://sourceforge.net/projects/geneontology).

The primary use of GO is for the annotation of gene products within the context of model organism or protein databases. There are two major modes of annotation. The first of these is by literature curation, a model used by most databases for the capture of information of many classes. When annotating a gene product, a database curator will relate one or more GO concepts to that gene product with an attribution to a particular publication and with an indication of the evidence used by that publication for the assertion for the relationship. This evidence may be the result of an experiment (“inferred from direct assay”) or may be an inference from sequence comparison (“inferred from sequence similarity”). If the latter evidence code is used, then the object to which the annotated sequence is similar is recorded, e.g., “inferred from sequence similarity to Swiss-Prot:P12345.” This allows GO curators (and others) to detect transitive errors of annotation (see Gilks et al.
2002).

GO concepts may also be inferred by electronic annotation. For example, the GO concepts relevant to a particular gene product may be inferred automatically by a program that compares its sequence with a set of protein sequences that have previously been annotated with GO concepts (other than those whose annotations are themselves “inferred from electronic annotation”). M. Yandell’s LOVEATFIRSTSIGHT program was developed for the Drosophila genome annotation (Adams et al. 2000) for this purpose, and others have subsequently been developed (see, e.g., Pouliot et al. 2001; Xie et al. 2002; Mi et al. 2003). The GO Consortium maintains a database of annotated proteins that can be used by automatic prediction programs (ftp://ftp.geneontology.org/pub/go/gp2protein/).

Methods other than direct sequence comparison have been used to predict GO terms associated with gene products. Some of these are indirect, that is, via the literature or microarray data (see below); others use properties of proteins other than their primary sequence (e.g., posttranslational modifications; see, e.g., Jensen et al. 2003) or recognized protein domains (Schug et al. 2002). King et al. (2003; see also Berriz et al. 2003) used the GO annotations of both FlyBase and the Saccharomyces Genome Database (SGD) to model relationships among GO terms with decision trees and Bayesian networks to predict GO terms that had been missed by the curators of these databases.

Members of the GO Consortium contribute tables of
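
The electronic-annotation rule described earlier in this section can be sketched in a few lines. This is only an illustration under stated assumptions, not the LOVEATFIRSTSIGHT algorithm or any GO Consortium tool: the `Annotation` record, the `transfer_annotations` function, and the identifiers beginning with `EXAMPLE:` are hypothetical, and the list of similar sequences is assumed to have been computed elsewhere. The sketch shows two points from the text: annotations that are themselves “inferred from electronic annotation” (evidence code IEA) are not propagated, and every inferred annotation records the sequence it was derived from, so that transitive errors can be traced.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Annotation:
    gene_product: str
    go_term: str
    evidence: str         # evidence code, e.g. "IDA" (direct assay), "ISS", "IEA"
    with_from: str = ""   # the object the inference was based on, if any


def transfer_annotations(
    query: str,
    similar_ids: List[str],
    reference: Dict[str, List[Annotation]],
) -> List[Annotation]:
    """Infer GO terms for `query` from already-annotated similar sequences."""
    inferred: List[Annotation] = []
    seen = set()
    for ref_id in similar_ids:
        for ann in reference.get(ref_id, []):
            if ann.evidence == "IEA":
                continue  # never build on annotations that were themselves electronic
            if ann.go_term in seen:
                continue  # keep one inferred annotation per GO term
            seen.add(ann.go_term)
            inferred.append(Annotation(query, ann.go_term, "IEA", with_from=ref_id))
    return inferred


# Hypothetical reference set; Swiss-Prot:P12345 is the placeholder used in the text.
reference = {
    "Swiss-Prot:P12345": [
        Annotation("Swiss-Prot:P12345", "cytokine biosynthesis", "IDA"),
    ],
    "EXAMPLE:0002": [
        Annotation("EXAMPLE:0002", "regulation", "IEA"),
    ],
}

result = transfer_annotations(
    "EXAMPLE:query", ["Swiss-Prot:P12345", "EXAMPLE:0002"], reference
)
for ann in result:
    print(ann)
```

Only the experimentally supported annotation is carried over; the IEA-derived one is skipped, which is what prevents one electronic guess from seeding another.
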

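
The conversion rule stated earlier (isa relationships become subclass relationships, and every other relationship type becomes a has-type restriction) can also be sketched. The input format, a list of (term, relationship, target) triples, and the function name `to_oil` are assumptions made for illustration; the output merely imitates the OIL-like notation of the examples in the text and has not been checked against an OIL or DAML+OIL parser.

```python
from typing import Dict, Iterable, List, Tuple

# (term, relationship type, target term); the triple layout is an assumption.
Relation = Tuple[str, str, str]


def to_oil(relations: Iterable[Relation]) -> str:
    """Render GO-style relationships in the OIL-like notation used in the text."""
    classes: Dict[str, Dict[str, List]] = {}  # dicts keep insertion order
    for term, rel, target in relations:
        entry = classes.setdefault(term, {"parents": [], "constraints": []})
        if rel == "isa":
            entry["parents"].append(target)             # becomes subclass-of
        else:
            entry["constraints"].append((rel, target))  # becomes has-type

    lines: List[str] = []
    for name, entry in classes.items():
        header = f"class-def defined {name}"
        if entry["parents"]:
            # Listing several parents after one subclass-of is an assumption.
            header += " subclass-of " + " ".join(entry["parents"])
        lines.append(header)
        for rel, target in entry["constraints"]:
            lines.append(f"  property-constraint {rel} has-type {target}")
    return "\n".join(lines)


relations = [
    ("biosynthesis", "isa", "metabolism"),
    ("biosynthesis", "synthesizes", "biological_entity"),
    ("cytokine_biosynthesis", "isa", "biosynthesis"),
    ("cytokine_biosynthesis", "synthesizes", "cytokine"),
]

print(to_oil(relations))
```

Run on the four triples above, the sketch reproduces the first of the two OIL examples given in the text.
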