13.07.2015 Views

The Genom of Homo sapiens.pdf



ONTOLOGIES FOR BIOLOGISTS 229

al. 2003). Another way to view this is a progression beyond a vocabulary of fixed phrases toward a combination of a vocabulary and a grammar for the creation of phrases (see, e.g., WordNet 2003).

GO will use a Description Logic (DL) to allow certain GO terms to have properties (i.e., attributes or slots). Properties provide a formalism for dealing with the classification of finely granular terms by allowing the flexible creation of phrases. This approach will move GO away from being merely a word-turned-phrase-based ontology toward a property-based ontology. This will offer the biological curators a structured means of composing phrases to use during annotation. A phrase is composed of a primary term (e.g., “biosynthesis”) which dictates what other types of terms must interact with it in supporting or secondary roles (e.g., “interleukin-13”). The term “biosynthesis” would include a property for a required term indicating the thing-being-synthesized and thus an explicit GO term for “interleukin-13 biosynthesis” would no longer be necessary. When annotating, if the curator selected “biosynthesis,” she would then be obligated to fill in the “thing-being-synthesized” property, in this case, “interleukin-13” or another identifier from a biochemical moiety ontology. From the gene product curator’s perspective, the annotations become phrases, composed of a primary term and some number of other terms that are necessary to complete the explanation. The number of properties required for a term may vary from zero to many.
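The composition rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `REQUIRED_PROPERTIES` table, the property name `thing_being_synthesized`, the placeholder term `example_term_without_properties`, and the `compose_phrase` helper are hypothetical names invented here, not part of GO or of any GO software.

```python
# Hypothetical table: each primary term lists the properties a curator must fill in.
# The entries are illustrative assumptions, not real GO data.
REQUIRED_PROPERTIES = {
    "biosynthesis": ["thing_being_synthesized"],
    "example_term_without_properties": [],  # a term may require zero properties
}


def compose_phrase(primary_term, fillers):
    """Build an annotation phrase, rejecting it if a required property is missing."""
    required = REQUIRED_PROPERTIES[primary_term]
    missing = [name for name in required if name not in fillers]
    if missing:
        raise ValueError(f"{primary_term!r} requires: {', '.join(missing)}")
    # Keep the primary term plus its fillers, in the order the term declares them.
    return (primary_term, tuple((name, fillers[name]) for name in required))


# Selecting "biosynthesis" obliges the curator to say what is being synthesized.
phrase = compose_phrase("biosynthesis", {"thing_being_synthesized": "interleukin-13"})
```

Under this sketch, calling `compose_phrase("biosynthesis", {})` raises a `ValueError`, which mirrors the obligation placed on the curator, and no separate entry for “interleukin-13 biosynthesis” is needed in the table.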
As an example to illustrate this, consider the term “localization.” A curator would never use this term without indicating both the object and a prepositional phrase. Neither “localization,” nor “localization of mRNA,” nor “localization to the cell membrane” suffices; to make sense one must say “localization of mRNA to the cell membrane.” From the ontology curator’s perspective, defining a term will now include specifying the required properties for that term, to indicate whether or not, and what type of, supporting terms are required when a gene product is actually being annotated. The use of properties appears robust and extensible, and gives both ontology curators and biological curators flexibility, while retaining semantic rigor. It is entirely consistent with both Description Logics and frame-based ontologies, both of which are widely used in the knowledge representation–artificial intelligence (AI) community. This approach is also very modular because, as phrases are constructed, they may be used recursively to create more complex phrases.

Terms that are modifiers also may be added to the phrase, the difference being that modifying terms are not demanded by the primary term, but simply provide additional information. So, for the term “biosynthesis,” although it is obligatory that a curator supply a term for the thing-being-synthesized property, they would not be obliged to fill in the property for a mediator of this biosynthesis.
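The distinction between required properties and optional modifiers, and the recursive reuse of phrases, might be modelled roughly as follows. This is a sketch under assumptions: the class names `TermDef` and `Phrase`, the property names `object`, `destination`, and `mediator`, and the `regulation` term used to show nesting are all hypothetical; the text does not define a concrete schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TermDef:
    """A primary term with the properties it demands and those it merely allows."""
    name: str
    required: tuple = ()
    optional: tuple = ()


@dataclass
class Phrase:
    """A primary term plus fillers; a filler is a plain string or another Phrase."""
    term: TermDef
    fillers: dict = field(default_factory=dict)

    def __post_init__(self):
        # Required properties must be filled; optional ones may be left out.
        missing = [p for p in self.term.required if p not in self.fillers]
        if missing:
            raise ValueError(f"{self.term.name}: missing required {missing}")
        allowed = set(self.term.required) | set(self.term.optional)
        unknown = [p for p in self.fillers if p not in allowed]
        if unknown:
            raise ValueError(f"{self.term.name}: unknown properties {unknown}")

    def render(self):
        """Flatten the phrase to text, rendering nested phrases recursively."""
        parts = [self.term.name]
        for prop in self.term.required + self.term.optional:
            if prop in self.fillers:
                value = self.fillers[prop]
                text = value.render() if isinstance(value, Phrase) else value
                parts.append(f"{prop}={text}")
        return "(" + " ".join(parts) + ")"


# Hypothetical term definitions loosely based on the examples in the text.
LOCALIZATION = TermDef("localization", required=("object", "destination"))
BIOSYNTHESIS = TermDef("biosynthesis", required=("object",), optional=("mediator",))
REGULATION = TermDef("regulation", required=("object",))

simple = Phrase(LOCALIZATION, {"object": "mRNA", "destination": "cell membrane"})
# The optional mediator is simply omitted here.
no_mediator = Phrase(BIOSYNTHESIS, {"object": "interleukin-13"})
# Recursion: a whole phrase serves as the filler of another phrase's property.
nested = Phrase(REGULATION, {"object": no_mediator})
```

In this sketch, `Phrase(LOCALIZATION, {"object": "mRNA"})` is rejected because the destination is missing, whereas leaving out `mediator` for “biosynthesis” is accepted.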
An annotation for a protein could then be made for biosynthesis (primary term) of GPI anchor (mandatory completion property) via N-glycyl-glycosylphosphatidylinositolethanolamine (optional informational property).

To illustrate the difference of this approach from that now used, this is a current entry from Swiss-Prot for a hu

the GO concept “antigen binding activity” in the GO database for utilitarian reasons.

All GO concepts have a unique identifier number and a definition that explicitly states their precise meaning. There is a strict one-to-one correspondence between a GO_identifier and its definition (rather than to the lexical string used to refer to the concept). Thus, a change in a concept’s lexical string (but not its definition) will not alter its GO_identifier; conversely, if there is a change in the definition that alters the meaning of a GO concept then, even if its lexical string remains identical, this will result in the new concept having a new GO_identifier. If a GO concept is found to be incorrect or irrelevant it is not simply discarded, it is marked with the attribute is_obsolete and it, its GO_identifier, and definition remain in the database.

Finally, GO concepts may well be equivalent or have a close relationship with a term in some other database. For example, a GO concept describing the activity of an enzyme can be considered to have a close relationship with the name of that enzyme within the Enzyme Commission’s database. This relationship may be expressed as a database cross-reference.
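The identifier rules above (rename keeps the identifier, a change of meaning creates a new one, obsolete concepts are flagged rather than deleted) can be made concrete with a toy Python model. Everything named here is an assumption for illustration: the `ConceptStore` class, its method names, and the `ID:` counter format are not a description of the actual GO database, and the example names and definitions are placeholders.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Concept:
    identifier: str
    name: str          # lexical string; may change without affecting the identifier
    definition: str    # the identifier is tied to this meaning
    is_obsolete: bool = False


class ConceptStore:
    """Toy store; identifier format and method names are illustrative assumptions."""

    def __init__(self):
        self._concepts = {}
        self._next = count(1)

    def add(self, name, definition):
        identifier = f"ID:{next(self._next):07d}"
        self._concepts[identifier] = Concept(identifier, name, definition)
        return identifier

    def rename(self, identifier, new_name):
        # Changing only the lexical string keeps the same identifier.
        self._concepts[identifier].name = new_name

    def redefine(self, identifier, new_definition):
        # A change of meaning yields a new concept with a new identifier.
        # The text does not say what happens to the old record; it is left untouched here.
        old = self._concepts[identifier]
        return self.add(old.name, new_definition)

    def mark_obsolete(self, identifier):
        # Obsolete concepts are flagged, never deleted.
        self._concepts[identifier].is_obsolete = True

    def get(self, identifier):
        return self._concepts[identifier]


store = ConceptStore()
first = store.add("example activity", "Original definition text.")
store.rename(first, "example activity (renamed)")
second = store.redefine(first, "Definition with a changed meaning.")
store.mark_obsolete(first)
```

After these calls the first record still exists with its original definition and `is_obsolete` set, while the redefined concept carries a different identifier despite sharing the same lexical string.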
GO also maintains a number of tables that represent semantic mappings of GO concepts to concepts in other databases; e.g., between GO concepts and Swiss-Prot keywords and between GO concepts and the functional catalog of MIPS (MIPS 2003).

LIMITATIONS OF THE GO MODEL

During the early stages of the development of GO, a decision was made to restrict GO to using only the relationships of subsumption and meronomy. This decision was made for pragmatic reasons, and it was accepted that at some time in the future the implicit limitation in the expressive power of GO might become limiting. That future has arrived: GO, and similar artifacts, need to be property-based ontologies to be sufficiently expressive.

Currently, GO includes a large number of terms that can be exemplified by “biosynthesis of interleukin-13.” It is obvious that this is a compound term, composed of a noun describing the action “biosynthesis” and a prepositional phrase “of interleukin-13,” describing what is being biosynthesized. The same idea could also be expressed using the verb “biosynthesizes” and the direct object “interleukin-13.” In either expression there is a central topic, “biosynthesis” or “biosynthesizes,” that requires an additional supporting term, “interleukin-13,” to completely convey the entire concept.

At present, GO terms are indivisible; there is no notion of the decomposition of phrases into either individual words or concepts and properties. If GO continues to introduce compound terms, the system will become progressively more redundant, less flexible, and increasingly difficult to manage.
This is apparent in the parts of GO that involve implicit cross-products (e.g., between a process like “transport” and an ontology of chemical compounds). We must therefore provide a solution that both disambiguates fundamental concepts and also enables accurate annotation. One approach is to migrate to a property-based ontology (a Description Logic; see Baader et
