12.07.2015 Views

View - ResearchGate

20 Kirov et al.

way to associate genes with the existing knowledge on some of the genes’ major characteristics, such as function and cellular localization. GO has a controlled vocabulary that is understandable to both humans and computers, which makes it extremely useful for associative inference analysis. Biocarta and Kyoto Encyclopedia of Genes and Genomes (KEGG) (5) pathways are also routinely used in expression data analysis. Lin et al. (6) used GO, Biocarta, and KEGG to identify regulatory networks involved in cancer progression; Kluger et al. (7) used GO, Biocarta, and KEGG in combination with expression patterns to create a matrix capable of discriminating the developmental choice of hematopoietic cells. An alternative to GO, KEGG, and Biocarta is the PANTHER project (8), which relies on its own ontologies and pathway data. Other ontologies are in their developmental stage as part of the Open Biomedical Ontologies (OBO) project and might expand the inference analysis that is described in this chapter (9).

Finding other associations, such as transcription factor-binding sites, 3′-UTR signals, and so on, is of very high interest, yet the existing knowledge in this area is too limited for high-throughput analysis. The recent development of different high-throughput techniques such as ChIP-chip (10,11) and genome-wide DNase footprinting (12,13) might lead to the accumulation of the critical volume of data necessary for transcription factor-binding site association studies.

As gene set interpretation is becoming a critical step in high-throughput biological studies, many bioinformatics tools have been developed for this purpose. Table 1 lists some common software packages for gene set functional association analysis. Note that this list is not exhaustive. In this chapter, the application of WebGestalt (14) to the management and association analysis of large-scale gene set data will be illustrated.
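To make the idea of functional association analysis concrete, the sketch below tests whether one annotation category is over-represented in a gene set. It assumes a hypergeometric test, which is one common choice for this kind of analysis; the passage above does not specify the statistics used by any particular tool. The function name `enrichment_p_value` and the gene identifiers are hypothetical and are not taken from the example gene sets.

```python
from math import comb


def enrichment_p_value(reference_size, category_size, set_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    reference_size: genes in the reference (background) list
    category_size:  reference genes annotated to the category
    set_size:       genes in the gene set of interest
    overlap:        genes in the set that carry the annotation
    """
    max_overlap = min(category_size, set_size)
    # Count the ways to draw a set with at least `overlap` annotated genes.
    favorable = sum(
        comb(category_size, k) * comb(reference_size - category_size, set_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return favorable / comb(reference_size, set_size)


# Toy data with hypothetical identifiers (an assumption for illustration only).
reference = {f"gene{i}" for i in range(1, 21)}             # 20 background genes
category = {"gene1", "gene2", "gene3", "gene4", "gene5"}   # annotated to one category
gene_set = {"gene1", "gene2", "gene3", "gene10"}           # gene set under study

hits = gene_set & category
p = enrichment_p_value(len(reference), len(category), len(gene_set), len(hits))
print(f"{len(hits)} of {len(gene_set)} genes annotated, p = {p:.4f}")
```

A small p-value suggests that the category appears in the gene set more often than expected by chance. In practice the test is repeated for every category, so a multiple-testing correction would normally follow.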
A typical analysis includes three steps: (1) identifier (ID) conversion, (2) gene set management, and (3) gene set analysis. Some distinct functions of other software packages will also be discussed.

2. Materials

Typically, any personal computer (Linux, Mac [Apple, Inc., Cupertino, CA], Windows [Microsoft, Inc., Redmond, WA], and so on) with a recent Internet browser should be sufficient. Certain analyses generate comma-separated files, which are best viewed with a spreadsheet application, such as OpenOffice, KOffice, Microsoft Excel, and so on. A PDF reader is required in order to read some of the tool documentation. A high-speed Internet connection (such as T1, Digital Subscriber Line [DSL], or cable modem) is highly desirable.

The example data consist of four gene sets: “lymph node,” “cerebellum,” “cerebrum” (15), and “brain embryo imprint” (16). These gene sets can be downloaded from http://bioinfo.vanderbilt.edu/mp/gene_sets. The “lymph node,” “cerebellum,” and “cerebrum” sets include genes that are overexpressed in the corresponding tissues (15). The “brain embryo imprint” set is
