12.07.2015 Views

View - ResearchGate


Estimating Gene Function With LS-NMF

The annotation information ClutrFree needs can be prepared by using the ASAP system (16) or by generating tab-delimited GO files. For this example, sample GO files are provided in the download (cdc25-sep1_GO.txt).

1. Move to the directory (folder) in which the cdc25-sep1.txt file is located (i.e., where LS-NMF was run). Review the recorded χ² values. Move all directories that are not from the run with the lowest value to another directory outside the directory hierarchy. Alternatively, one might keep all directories that contain runs with similarly low χ² values, and ClutrFree will analyze all of these simultaneously (as described in Chapter 1). For this demonstration, there are two files.
2. Place the cdc25-sep1_GO.txt file in the directory and rename it annot.txt.
3. Start ClutrFree by double-clicking on the ClutrFree.jar icon on a Macintosh or Windows computer. On a Linux or Solaris computer, use the command java -jar ClutrFree.jar. The user interface will appear.
4. In ClutrFree, click on the File menu and choose Import Data. In the file chooser, move to the folder containing the annot.txt file, highlight that folder, and click on the Choose button. This will load the data and GO annotations. A new window will appear for viewing the cluster shapes and, for each analysis, a tree relating the clusters to each other. The >> button allows the user to view the individual cluster shapes (or patterns).
5. Next, click on the gene table button. A new window will open with the genes in the analysis listed together with their assignment to each pattern (yellow bars) and their persistence along the tree (blue bars).
6. For a pattern of interest (here the third pattern is chosen, which in this simulation is related to the G1 phase of the cell cycle), click on the number of the pattern above the yellow bars (see Fig. 4). This will reorder the genes by their strength within the pattern (see Note 8).
7. Use an appropriate website or annotation service to get specifics on each gene. For the S. pombe data, details for each gene that is strongly tied to a pattern can be retrieved from GeneDB. Alternatively, one can use automated systems to do this.
8. For GeneDB, enter the gene ID in the search field; when the gene page appears, add the gene to the basket. Do this for each gene that is strongly tied to the pattern. Unfortunately, this requires setting a cutoff, and there is no reliable way to do this. In general, for this manual method, choosing the top 10 or 15 genes will typically give a list of genes with known and unknown functions.
9. Using the genes with known function, or the behavior of the pattern (here a G1-linked cell cycle pattern), predict the function of the genes with unknown function (see Note 9). For this case, it is predicted that the gene SPAC1006.08 is involved in the G1 phase of the cell cycle, even though it is also involved in other patterns (2 and 6), which appear related to background processes (see refs. 17 and 18 for examples of analyses with such processes). In addition, one would predict that SPAP14E8.02, a predicted transcription factor, is uniquely involved in the cell cycle, as its entire behavior is explained by pattern 3.
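
The ranking in step 6, the top-10-or-15 cutoff in step 8, and the "entire behavior is explained by pattern 3" reasoning in step 9 can be sketched in a few lines of Python. This is only an illustration of the idea, not ClutrFree's implementation: the dictionary layout, the function names top_genes and pattern_share, and the gene names and weights are all hypothetical, and it assumes each gene has one non-negative weight per pattern.

```python
from typing import Dict, List, Tuple

# Hypothetical input: gene ID -> weight of that gene in each pattern.
# The values are made up for illustration; they are not LS-NMF output.
Weights = Dict[str, List[float]]


def top_genes(weights: Weights, pattern: int, n: int = 10) -> List[Tuple[str, float]]:
    """Return the n genes with the largest weight in the given pattern (0-based index)."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1][pattern], reverse=True)
    return [(gene, w[pattern]) for gene, w in ranked[:n]]


def pattern_share(weights: Weights, gene: str, pattern: int) -> float:
    """Fraction of a gene's total weight that falls in one pattern.

    A value near 1.0 means the gene's behavior is explained almost
    entirely by that pattern; lower values mean it is shared with others.
    """
    total = sum(weights[gene])
    return weights[gene][pattern] / total if total > 0 else 0.0


example: Weights = {
    "geneA": [0.1, 0.0, 0.9],  # mostly third pattern, some first pattern
    "geneB": [0.0, 0.0, 0.5],  # third pattern only
    "geneC": [0.6, 0.3, 0.1],  # mostly first and second patterns
}

# Rank by the third pattern (index 2) and keep the two strongest genes.
for gene, weight in top_genes(example, pattern=2, n=2):
    print(gene, weight, round(pattern_share(example, gene, 2), 2))
```

The cutoff n remains a judgment call, as the protocol notes; the share value only helps separate genes tied to a single pattern from genes that also contribute to others.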
