An Integrated Data Analysis Suite and Programming ... - TOBIAS-lib
2.4. A FLEXIBLE SEQUENCING READ DEMULTIPLEXING SYSTEM
 1  #?lane sample barcode_group read barcode extbarcode barcode_type
 2
 3  # Bar codes for the Ler-1 sample.
 4  1 Ler-1 0 1 AACT TGCAG 5prime
 5  1 Ler-1 0 1 TAGC TGCAG 5prime
 6  1 Ler-1 0 2 AACT * read
 7  1 Ler-1 0 2 CCCT * read
 8
 9  # Bar codes for the Col-0 sample.
10  1 Col-0 0 1 AACTT GCAG 5prime
11  1 Col-0 0 2 GGAC * read
12  1 Col-0 1 1 TTGCT GCAG 5prime
13  1 Col-0 1 2 CCCT * read
14
15  # Third read has the same bar codes for all valid samples.
16  1 * * 3 GGAC * 5prime
17  1 * * 3 TTGC * 5prime
18
19  # Lane 2 is not multiplexed, discard the index read.
20  2 Bur-0 0 2 * * read
Listing 2.2: Example of a Full Demultiplexing Sheet
lane (e. g. lines 16–17).
For certain applications, the identity of several nucleotides immediately following the 5′ bar code
sequence is known, for example the restriction site sequence in RAD-Seq (section 1.3). While
such sequences can be exploited to correctly assign each read to the appropriate sample, they
usually should, in contrast to the bar code oligomers, not be removed from the output. Parts of the
recognition sequence that should not be clipped from the read can be specified in the extended
bar code (extbarcode) field of the table. Internally, the sequence to be recognized is constructed
by concatenating the barcode and extbarcode fields, while the division of the sequence between
both fields is translated into a bar code cut position. For bar code types other than 5prime, the
split into bar code and extended bar code has no effect. The bar code cut position is determined
in the context of the respective bar code tuple, i. e. for the same recognition sequence different
samples may define a different split between bar code and extended bar code, as demonstrated
by lines 4 and 10 of listing 2.2. If no part of the recognition sequence is to be removed from the
output, then the entire oligomer can be provided as extended bar code, with column barcode
either omitted or set to *.
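The concatenation and cut position described above can be illustrated with a minimal Python sketch. It is not the implementation of the demultiplexer: the function names recognition_and_cut and clip_5prime are hypothetical, matching is assumed to be an exact prefix comparison (the actual tool may tolerate mismatches), and returning None as cut position for non-5prime types is merely one way to express that the split has no effect there.

```python
WILDCARD = "*"  # field value meaning "no sequence given"


def recognition_and_cut(barcode, extbarcode, barcode_type):
    """Return (recognition sequence, cut position) for one sheet entry.

    Hypothetical helper: the recognition sequence is barcode + extbarcode,
    and for 5prime bar codes the cut position is the barcode length.
    """
    bc = "" if barcode in ("", WILDCARD) else barcode
    ext = "" if extbarcode in ("", WILDCARD) else extbarcode
    recognition = bc + ext
    # Assumption: for other bar code types the split is irrelevant,
    # which is expressed here as "no cut position".
    cut = len(bc) if barcode_type == "5prime" else None
    return recognition, cut


def clip_5prime(read_seq, recognition, cut):
    """Return the read with the bar code removed, or None if no match.

    Assumes exact prefix matching; the extended bar code stays in the read.
    """
    if not read_seq.startswith(recognition):
        return None
    return read_seq[cut:]


# Lines 4 and 10 of listing 2.2: same recognition sequence, different split.
ler = recognition_and_cut("AACT", "TGCAG", "5prime")
col = recognition_and_cut("AACTT", "GCAG", "5prime")
read = "AACTTGCAGGATTACA"  # made-up read for illustration
print(ler, clip_5prime(read, *ler))
print(col, clip_5prime(read, *col))
```

Both entries recognize the same nine nucleotides, but the first removes four and the second five of them from the read.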
If neither bar code nor extended bar code is provided with a value other than *, then the
respective sample sheet entry will match any read sequence. With bar code type either none
or 5prime, such an entry can be utilized for assigning a certain sample identifier to an entire
sequencing lane. On the other hand, this property may be exploited to completely remove all
reads with a certain read index from the output (e. g. listing 2.2, line 20).
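Why an entry without any sequence matches every read can be seen in a short sketch. The function entry_matches is hypothetical and assumes exact prefix matching, which is a simplification: with both fields set to *, the recognition sequence is empty, and an empty prefix is contained in every read.

```python
def entry_matches(read_seq, barcode="*", extbarcode="*"):
    """Hypothetical check whether a read matches a sample sheet entry.

    Fields equal to "*" contribute nothing to the recognition sequence.
    Assumes exact prefix matching without mismatch tolerance.
    """
    parts = [field for field in (barcode, extbarcode) if field != "*"]
    recognition = "".join(parts)
    # str.startswith("") is True for any string, so an entry without
    # bar code and extended bar code matches every read.
    return read_seq.startswith(recognition)


print(entry_matches("ACGTACGT"))            # catch-all entry
print(entry_matches("GGACTTAA", "GGAC"))    # bar code present in read
print(entry_matches("TTGCTTAA", "GGAC"))    # bar code absent from read
```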
The sample sheet column lane serves to allow independent demultiplexing specifications for
different sequencing lanes in a single sample sheet file. Rows with differing sequencing lane fields
are completely independent of each other. If the sequencing lane column is omitted, then the
entire demultiplexing specification is considered valid for all lanes of the instrument run.
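The per-lane independence can be sketched by grouping sheet rows by their lane field. The following Python fragment is an illustration only: parse_sheet and ALL_LANES are hypothetical names, and the parsing rules (whitespace-separated fields, a header line introduced by #?, other lines starting with # treated as comments) are assumptions read off listing 2.2 rather than a documented format.

```python
from collections import defaultdict

ALL_LANES = None  # hypothetical key for sheets without a lane column


def parse_sheet(text):
    """Group sample sheet rows by sequencing lane (illustrative sketch)."""
    columns = None
    by_lane = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#?"):
            # Assumed header syntax: column names follow the "#?" prefix.
            columns = line[2:].split()
            continue
        if line.startswith("#"):
            continue  # ordinary comment line
        if columns is None:
            raise ValueError("data row found before the header line")
        entry = dict(zip(columns, line.split()))
        # Without a lane column, the row applies to all lanes of the run.
        lane = entry.pop("lane", ALL_LANES)
        by_lane[lane].append(entry)
    return dict(by_lane)


sheet = """\
#?lane sample barcode_group read barcode extbarcode barcode_type
1 Ler-1 0 1 AACT TGCAG 5prime
1 Col-0 0 1 AACTT GCAG 5prime
# Lane 2 is not multiplexed, discard the index read.
2 Bur-0 0 2 * * read
"""
for lane, entries in parse_sheet(sheet).items():
    print(lane, [e["sample"] for e in entries])
```

Each lane key then holds a self-contained specification that can be processed without reference to the rows of any other lane.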