05.06.2013 Views

PNNL-13501 - Pacific Northwest National Laboratory

Bioinformatics for High-Throughput Proteomic and Genomic Analysis

Study Control Number: PN00015/1422

William Cannon, Heidi Sophia, Kristin Jarmin

Our goal is to develop the underlying science and technology for analyzing high-throughput genomic and proteomic data, and for identifying proteins and elucidating the underlying cellular networks. Biology as a system has a logical and mathematical foundation. Proteomics and DNA microarrays will provide the prerequisite data necessary for such a systematic understanding of the cell.

Project Description

The long-term objective for this project is to provide the tools for analyzing proteomic data from high-throughput mass spectrometry experiments and for elucidating cellular networks. Initial work focused on 1) developing optimal clustering tools for examining either proteomic data or DNA microarray data, 2) creating a library of graph algorithms for use in the network analysis, and 3) developing the concept of probability networks for analyzing proteomic data.

Introduction

The goal of this project is to develop the underlying science and technology for analyzing high-throughput genomic and proteomic data, especially elucidating the underlying cellular networks. We are taking a two-pronged approach. First, statistical and/or mathematical modeling of the biological data in the form of protein/gene expression levels will inferentially determine networks of interactions. Second, previous biological knowledge in the form of known regulatory motifs at the DNA and protein levels will be incorporated to complement the mathematical modeling. These tools will be usable in isolation or in combination. It is anticipated that the combined use of these tools will provide the synergy to bootstrap our way to biological networks much more powerfully than the use of either set of tools in isolation. The tools developed in the initial stage of the project will be expanded in subsequent stages to 1) incorporate the ability to statistically infer cellular networks from high-throughput data and 2) sort the high-throughput data by cellular regulatory function as determined by, for example, cis-regulatory elements of DNA and signal peptides.
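
As an illustration of the first prong, the sketch below proposes candidate interactions from expression levels by thresholding pairwise Pearson correlation. This is one simple statistical approach chosen for illustration, not necessarily the modeling used in this project; the gene names, expression values, and the 0.9 threshold are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-variation signal
    return cov / sqrt(var_x * var_y)


def infer_edges(profiles, threshold=0.9):
    """Return (gene, gene, r) for pairs whose |r| meets the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Hypothetical expression levels across five experimental conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],  # rises with geneA
    "geneC": [5.0, 4.1, 2.9, 2.0, 1.1],  # falls as geneA rises
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],  # no clear relationship
}
edges = infer_edges(profiles)
for g1, g2, r in edges:
    print(g1, g2, r)
```

Correlation alone cannot distinguish direct from indirect interactions, which is one reason prior knowledge such as regulatory motifs is a useful complement.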

Results and Accomplishments

New Clustering Techniques

The development of new clustering techniques for proteomic data has focused on the use of NWGrid to generate clusters. The distance metric is based on Delaunay triangulation and Voronoi polyhedra routines within NWGrid. Initial results using a fixed number of clusters have been encouraging. We are now proceeding to develop methods by which the program can determine the optimal number of clusters from the data itself.
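
One common way to choose the number of clusters from the data is to cluster at several candidate values and keep the one with the best mean silhouette width. The sketch below does this with plain k-means and Euclidean distance; it is a stand-in for illustration only and does not reproduce NWGrid or its Delaunay/Voronoi-based metric. The data points and all function names are hypothetical.

```python
from math import dist


def kmeans(points, k, iterations=50):
    """Plain k-means with deterministic farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < k:
        # Seed the next center at the point farthest from existing centers.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette width; higher means tighter, better-separated clusters."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def best_k(points, k_values):
    """Pick the candidate k (each k >= 2) with the highest mean silhouette."""
    return max(k_values, key=lambda k: silhouette(points, kmeans(points, k)))


# Hypothetical 2-D measurements forming three well-separated groups.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (5.0, 5.0), (5.1, 4.8), (4.9, 5.2),
    (9.0, 1.0), (9.2, 1.1), (8.8, 0.9),
]
chosen = best_k(points, range(2, 6))
print(chosen)
```

The silhouette criterion is only one of several options (the gap statistic is another); which works best depends on cluster shape and the distance metric in use.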

Development of Software Library of Graph Algorithms for Network Analysis

Work to date has been in optimizing existing algorithms for use with expression data, and in making robust versions of the algorithms.
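
The report does not list which algorithms the library contains. As a representative example of the kind of routine such a library might offer, the sketch below finds connected components of an undirected interaction graph by breadth-first search, which groups proteins into candidate modules. The function names and the edge list are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def connected_components(adjacency):
    """Return the connected components as sorted lists, in sorted order."""
    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        queue = deque([start])
        seen.add(start)
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            # .get() tolerates nodes that appear only as someone's neighbor.
            for neighbor in adjacency.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components


# Hypothetical interactions among five proteins.
edges = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
modules = connected_components(build_adjacency(edges))
print(modules)
```

Using an explicit queue rather than recursion keeps the traversal safe on large graphs, which is one small example of what a "robust version" of a textbook algorithm can mean in practice.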

Conceptual Development of Probability Networks for Analyzing Proteomic Data and Strategic Planning

Much of the time during this period was spent developing the concepts needed for long-term success. This included expanding the proposal with clear near- and long-term goals. Also, a considerable amount of time is being spent early in this project to develop a probability network model that will serve as the foundation of our cellular networks. Issues regarding connectivity of the network, nature of the data, and conditional independence of the data are being addressed.
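
To make the conditional independence issue concrete, the sketch below builds a minimal three-node probability network in which a regulator R influences two targets X and Y. The joint distribution factorizes as P(R) P(X|R) P(Y|R), so X and Y are correlated marginally but independent once R is known. The network structure and every probability value here are hypothetical and serve only to illustrate the concept.

```python
from itertools import product

# Hypothetical conditional probability tables for the network R -> X, R -> Y
# (R: regulator active; X, Y: target proteins detected).
P_R = {True: 0.3, False: 0.7}
P_X_GIVEN_R = {True: 0.9, False: 0.2}  # P(X=True | R)
P_Y_GIVEN_R = {True: 0.8, False: 0.1}  # P(Y=True | R)


def bernoulli(p_true, value):
    """Probability of a binary outcome given P(outcome is True)."""
    return p_true if value else 1.0 - p_true


def joint(r, x, y):
    """P(R=r, X=x, Y=y) via the factorization P(R) P(X|R) P(Y|R)."""
    return P_R[r] * bernoulli(P_X_GIVEN_R[r], x) * bernoulli(P_Y_GIVEN_R[r], y)


def prob(**fixed):
    """Probability of the given assignments, summing out all other variables."""
    total = 0.0
    for r, x, y in product([True, False], repeat=3):
        state = {"r": r, "x": x, "y": y}
        if all(state[name] == value for name, value in fixed.items()):
            total += joint(r, x, y)
    return total


# Marginally, X and Y are dependent because they share the parent R.
p_x = prob(x=True)
p_y = prob(y=True)
p_xy = prob(x=True, y=True)
print(p_xy, "vs", p_x * p_y)

# Given R, they are conditionally independent.
p_xy_given_r = prob(x=True, y=True, r=True) / prob(r=True)
p_x_given_r = prob(x=True, r=True) / prob(r=True)
p_y_given_r = prob(y=True, r=True) / prob(r=True)
print(p_xy_given_r, "vs", p_x_given_r * p_y_given_r)
```

The connectivity of the network determines which such independence statements hold, which is why the two issues have to be settled together.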

Summary and Conclusion

In establishing a project foundation, we initiated new clustering algorithms and a software library for analyzing graph structures.
