14.06.2013 Views

Databases and Systems


screens to identify direct or indirect interactions based on phenotypes or gene
expression patterns. It is thus important for FlyBase to recognize and support data
representations and reports based on relationships among gene products, in addition
to those based on chromosomal location. Some aspects of this need can be addressed
now; others present substantial technical hurdles.

The FlyBase architecture supports the curation of different versions of a gene
product (RNAs, polypeptides, or molecular complexes) as different data objects,
so that annotations can be attached to the appropriate objects. This is an essential
part of an organism-specific data model, since much of the regulation of cellular
function boils down to gene products that can be toggled between alternative states
based on allosteric interactions, subunit modifications, or differential subunit
interactions.
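
As a rough illustration of the idea of separate data objects per product version, the following minimal Python sketch attaches annotations to individual product versions rather than to the gene as a whole. The class names, fields, and identifiers (`GeneProduct`, `Gene`, `geneX-P1`) are hypothetical and chosen only for this example; they are not FlyBase's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class ProductKind(Enum):
    """Kinds of gene product named in the text."""
    RNA = "RNA"
    POLYPEPTIDE = "polypeptide"
    COMPLEX = "molecular complex"


@dataclass
class GeneProduct:
    """One curated version of a gene product (hypothetical schema)."""
    identifier: str
    kind: ProductKind
    state: str  # free-text state label in this sketch, e.g. "phosphorylated"
    annotations: List[str] = field(default_factory=list)

    def annotate(self, note: str) -> None:
        """Attach an annotation to this specific product version."""
        self.annotations.append(note)


@dataclass
class Gene:
    """A gene that owns any number of distinct product objects."""
    symbol: str
    products: Dict[str, GeneProduct] = field(default_factory=dict)

    def add_product(self, product: GeneProduct) -> GeneProduct:
        """Register a product version under its identifier and return it."""
        self.products[product.identifier] = product
        return product


# Usage with placeholder names: two states of one polypeptide are separate objects.
gene = Gene("geneX")
unmodified = gene.add_product(
    GeneProduct("geneX-P1", ProductKind.POLYPEPTIDE, "unmodified"))
modified = gene.add_product(
    GeneProduct("geneX-P1-phos", ProductKind.POLYPEPTIDE, "phosphorylated"))
modified.annotate("annotation that applies only to the modified form")
```

Because each state is its own object, an annotation recorded for the modified form does not leak onto the unmodified form.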

Describing the interactions and the pathways is an even larger and more difficult
task. Much of the available physical interaction data comes from in vitro assays,
usually in heterologous systems. These data are often hints or suggestions of possible
interactions rather than readily verifiable ones. Genetic interaction data have their
own set of pitfalls. While the individual observations can be represented, our ability
to compile them into computed pathways is impaired by the inherent limitations of
the current data sets. Thus, we need to capture and represent data in a manner that
reflects the current state of knowledge but that will still be of value once better
standards and methods are available. This represents a considerable challenge at the
strategic and computational levels.
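
One way to read this requirement is to store each observation together with its evidence type and assay context, and to defer any pathway computation. The sketch below is only an assumption about how such records might look; the field names, the two evidence categories, and the placeholder product names are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InteractionObservation:
    """A single reported interaction, kept as an observation, not a conclusion."""
    product_a: str
    product_b: str
    evidence: str       # assumed categories: "physical" or "genetic"
    assay_context: str  # e.g. "in vitro, heterologous system"
    source: str         # placeholder for a literature reference


def observations_for(
    records: List[InteractionObservation],
    a: str,
    b: str,
    evidence: Optional[str] = None,
) -> List[InteractionObservation]:
    """Return all observations for an unordered pair, optionally by evidence type."""
    pair = frozenset((a, b))
    return [
        r for r in records
        if frozenset((r.product_a, r.product_b)) == pair
        and (evidence is None or r.evidence == evidence)
    ]


# Usage with placeholder data: both observations are retained side by side.
records = [
    InteractionObservation("protA", "protB", "physical",
                           "in vitro, heterologous system", "ref-1"),
    InteractionObservation("protB", "protA", "genetic",
                           "in vivo screen", "ref-2"),
]
physical_only = observations_for(records, "protA", "protB", evidence="physical")
```

Keeping the assay context on every record means the observations can be re-evaluated later, when better standards exist, without having been collapsed into a computed pathway.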

Another aspect of the problem is that of spatial pattern: descriptions of
anatomical phenotypes and gene expression patterns. Were rigorous representations
of spatial pattern possible, these could be used in combination with interaction data
to distinguish among possible interactions. (For example, two proteins that are shown
to physically interact but that are never expressed in the same tissues are unlikely
to interact in a biologically meaningful way.) FlyBase has developed an extensive
ontology of anatomical parts, and phenotypes and expression patterns are captured
using this vocabulary. Either the authors or the curators, however, end up throwing
away a great deal of data in turning two- or three-dimensional spatial information into
text. Similarly, dependence on text terms to support user queries places inherent
limitations on the depth of questions that can be answered. Ultimately, it will be
important to develop tools that can effectively capture quantitative spatial
information. Only in this way can these data be directly queried without
imposing a strong filter on the data set through its conversion into much coarser
textual objects. This is obviously a major long-term issue that is already receiving
attention, and we can expect that it will continue to be an important area for
computational research.
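
The parenthetical example above, in which interacting proteins that share no tissue are set aside, can be sketched as a simple set intersection over anatomy terms. This is a minimal illustration under stated assumptions: the function name, the data layout, and the protein and tissue names are hypothetical, and the term-overlap test is exactly the kind of coarse, text-based filter the passage describes as limited.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Interaction = Tuple[str, str]


def coexpressed_interactions(
    interactions: List[Interaction],
    expression: Dict[str, Set[str]],
) -> List[Tuple[str, str, FrozenSet[str]]]:
    """Keep interactions whose partners share at least one anatomy term.

    A product with no expression record is treated as sharing nothing,
    which conflates "unknown" with "not expressed" in this sketch.
    """
    kept = []
    for a, b in interactions:
        shared = expression.get(a, set()) & expression.get(b, set())
        if shared:
            kept.append((a, b, frozenset(shared)))
    return kept


# Usage with placeholder proteins and illustrative anatomy terms.
expression = {
    "protA": {"wing disc", "eye disc"},
    "protB": {"eye disc"},
    "protC": {"embryonic gut"},
}
candidates = [("protA", "protB"), ("protA", "protC")]
plausible = coexpressed_interactions(candidates, expression)
```

Here only the `protA`/`protB` pair survives, because it is the only pair with a shared term. A quantitative spatial representation would allow a finer test than term equality, which cannot tell whether two products overlap within the same tissue.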
