04.04.2013 Views

Transcriptional Characterization of Glioma Neural Stem Cells Diva ...


5.1 Tag-sequencing Data Processing Methods

the TagDust program [258] with a target false discovery rate (FDR) of 1%. Tags matching ribosomal sequence were identified by using the Bowtie program [257] to align against a database consisting of all rRNA genes, including pseudogenes, from Ensembl 56 [147] and all ribosomal repeats in the UCSC Genome Browser RepeatMasker track for genome assembly GRCh37 [152]; only perfect matches to the extended 21 nt tag sequence, consisting of the NlaIII site CATG followed by the observed 17 nt tag, were accepted. Mitochondrial tags were similarly excluded by searching for perfect matches to the mitochondrial chromosome sequence. The parameters used to run the Bowtie program were the following:

-f indicates that the query input files are FASTA files;

-n 0 means that alignments may have no more than n mismatches (Bowtie applies this limit to the seed, which at the default seed length of 28 covers the whole 21 nt tag); since we are only looking for perfect matches, n = 0;

-y instructs the program to try as hard as possible to find valid alignments when they exist, which makes the run much slower;

-k 2 instructs the program to report up to two valid alignments per read;

-m 1 instructs the program to refrain from reporting any alignments for reads having more than one reportable alignment. This option is useful when the user wants to guarantee that reported alignments are unique, which is our case.
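
As an illustration only, the filtering step described above could be prepared along the following lines. This is a minimal sketch, not the pipeline actually used: the function names, the FASTA record names, and the index and file names in the usage note are hypothetical, and the positional argument order (index, reads, output) follows the classic Bowtie 1 command line, which may differ between Bowtie versions.

```python
import shlex

NLAIII_SITE = "CATG"  # NlaIII recognition sequence, as given in the text
TAG_LENGTH = 17       # length of the observed tag, as given in the text


def extend_tag(tag: str) -> str:
    """Prefix a 17 nt tag with the NlaIII site to form the 21 nt query."""
    tag = tag.upper()
    if len(tag) != TAG_LENGTH:
        raise ValueError(f"expected a {TAG_LENGTH} nt tag, got {len(tag)} nt")
    return NLAIII_SITE + tag


def to_fasta(tags: list[str]) -> str:
    """Render extended tags as FASTA records (record names are arbitrary)."""
    return "".join(f">tag{i}\n{extend_tag(t)}\n" for i, t in enumerate(tags, 1))


def bowtie_command(index: str, reads_fasta: str, hits: str) -> list[str]:
    """Assemble a Bowtie command line using the options listed in the text."""
    return [
        "bowtie",
        "-f",        # query input is FASTA
        "-n", "0",   # no mismatches allowed: perfect matches only
        "-y",        # try as hard as possible to find valid alignments
        "-k", "2",   # report up to two valid alignments
        "-m", "1",   # drop reads with more than one reportable alignment
        index,       # hypothetical index name supplied by the caller
        reads_fasta,
        hits,
    ]
```

For example, `shlex.join(bowtie_command("rrna_index", "tags.fa", "rrna_hits.txt"))` gives a shell-ready string, and the same list can be passed directly to `subprocess.run`. Building the command as a list rather than a single string avoids quoting problems with file names.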

To assign tags to genes, we employed a hierarchical strategy based on the expectation that tags are most likely to originate from the 3’-most NlaIII site in known transcripts. Tags were assigned to transcripts using virtual tag data from the SAGE Genie database [65] and virtual tags extracted from Ensembl transcript models. The SAGE Genie annotation consisted of 105 sets of virtual tags obtained by scanning for NlaIII sites in cDNAs from RefSeq, Mammalian Gene Collection (MGC) and GenBank, then Expressed Sequence Tags (ESTs), UniGene consensus sequences and transfrags. The virtual tag sets are further classified based on the position of the tag relative to the 3’ end of the transcript and indicators of 3’ end reliability such as polyadenylation signal and poly-A tail. Since SAGE Genie does not cover Ensembl transcripts, we also extracted virtual tags from the 3’-most cut site in each Ensembl transcript. We used the CATG recognition sequence as a "signal" that indicated where to start counting 17 nt ahead. This allowed us to extract that 17 nt tag and know exactly from
