An Integrated Data Analysis Suite and Programming ... - TOBIAS-lib
36 CHAPTER 2. A HIGH-THROUGHPUT DNA SEQUENCING DATA ANALYSIS SUITE<br />
#?sample	read	barcode<br />
Col-0	2	TTCACG<br />
Col-0	2	GGATGT<br />
Ler-1	2	CTAGGC<br />
Ler-1	2	AGACCA<br />
Listing 2.1: Example of a Simple Demultiplexing Sheet<br />
2.4.2 A Format for Description of Multiplexing Setups<br />
A SHORE demultiplexing sample sheet is a simple tab-delimited text table with named columns.<br />
The recognized column names are lane, sample, barcode_group, read, barcode, extbarcode<br />
and barcode_type.<br />
The sample sheet is parsed using a generic table input API provided by the SHORE library.<br />
Column names are defined in the header line, which is either the first non-empty line of the file that is<br />
not a line comment or a line introduced with the special comment tag #?. Columns are recognized by name<br />
and may occur in arbitrary order; further columns may be present but are ignored.<br />
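The header rules above can be illustrated with a minimal Python sketch. This is not the SHORE table input API; the function name `parse_sample_sheet` and the choice of `#` as the line comment character are assumptions made for illustration only.

```python
# Hypothetical sketch of the header rules described above; not SHORE code.
RECOGNIZED = ("lane", "sample", "barcode_group", "read",
              "barcode", "extbarcode", "barcode_type")


def parse_sample_sheet(text, comment="#", header_tag="#?"):
    """Parse a tab-delimited sample sheet into a list of dicts.

    Assumption: '#' starts a line comment, and '#?' marks a header line.
    """
    header = None
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        if line.startswith(header_tag):
            if header is None:
                header = [n.strip() for n in line[len(header_tag):].split("\t")]
            continue
        if line.startswith(comment):
            continue  # ordinary line comment
        fields = [f.strip() for f in line.split("\t")]
        if header is None:
            header = fields  # first non-empty, non-comment line
            continue
        record = dict(zip(header, fields))
        # Columns are matched by name; unrecognized columns are ignored.
        rows.append({k: v for k, v in record.items() if k in RECOGNIZED})
    if header is None or "sample" not in header:
        raise ValueError("sample sheet must define a 'sample' column")
    return rows
```

Applied to the content of listing 2.1, this sketch would yield one dictionary per entry with the keys `sample`, `read` and `barcode`, regardless of the column order in the header.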
The only mandatory column, named sample, defines the sample identifiers that bar codes are<br />
to be translated to. All further columns may be added in arbitrary combinations and order. A<br />
typical simple index read demultiplexing configuration is described by additionally specifying the<br />
read and barcode columns (listing 2.1), with read specifying the index of the read that contains<br />
the bar code tag and barcode the actual sequence of the bar code oligomer. The read index<br />
column may be omitted, indicating that all sequence reads of a pair are attached to identical bar<br />
codes.<br />
Index read and 5′ bar coding require suitable treatment of the respective bar code oligomers.<br />
While 5′ bar codes must be clipped, with the remaining part of the read sequence retained,<br />
index reads must be removed completely from the data set. The default implemented in SHORE<br />
is to apply bar code clipping if the sample sheet entry refers to the first read or if the read<br />
index was omitted, and to filter bar-coded reads where the read index is greater than one. The<br />
barcode_type column allows this behavior to be controlled explicitly. SHORE accepts the bar code types<br />
read, 5prime and none, where the type is associated with the read index, i.e. all of a lane's<br />
entries with the same read index must also specify the same bar code type. Bar codes of type<br />
read will trigger filtering of the entire bar code associated read, whereas 5prime bar codes will<br />
be clipped.<br />
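The default rule and the consistency requirement described above might be sketched as follows. The function names and the dictionary layout of the entries are hypothetical and chosen for illustration; the behavior of the none type is not described here and is therefore not modeled beyond passing the value through.

```python
# Hypothetical sketch of the default bar code type rule; not SHORE code.
VALID_TYPES = ("read", "5prime", "none")


def resolve_barcode_type(read_index=None, barcode_type=None):
    """Return the effective bar code type for a sample sheet entry.

    An explicit barcode_type wins. Otherwise the default applies:
    clip (5prime) for the first read or an omitted read index,
    filter the whole read (read) for read indexes greater than one.
    """
    if barcode_type is not None:
        if barcode_type not in VALID_TYPES:
            raise ValueError("unknown bar code type: %r" % barcode_type)
        return barcode_type
    if read_index is None or read_index == 1:
        return "5prime"
    if read_index > 1:
        return "read"
    raise ValueError("read index must be a positive integer")


def check_types_consistent(entries):
    """Ensure entries sharing lane and read index share one bar code type."""
    seen = {}
    for entry in entries:
        key = (entry.get("lane"), entry.get("read"))
        effective = resolve_barcode_type(entry.get("read"),
                                         entry.get("barcode_type"))
        if seen.setdefault(key, effective) != effective:
            raise ValueError("conflicting bar code types for lane/read %r"
                             % (key,))
```

Under these assumptions, an entry with read index 2 and no explicit type resolves to read, while the same entry with barcode_type set to 5prime would be clipped instead.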
For configurations with bar code information distributed across multiple sequence reads, several<br />
entries with different read index values are added to the definition for each sample (listing 2.2,<br />
lines 4–7). The sample sheet entries will then be grouped on the sample identifier, with each<br />
possible combination of bar codes with differing read indexes considered a valid bar code tuple<br />
for the respective sample. Sample sheet entries with the special sample identifier * are interpreted<br />
as valid for each sample specified for the respective sequencing lane (e.g. lines 16–17).<br />
For example, listing 2.2 defines two valid bar codes for the Ler-1 sample for each of the three<br />
sequencing reads, and thus 2³ = 8 valid 3-tuples of bar codes that can be resolved to that sample.<br />
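The tuple count can be reproduced with a short Python sketch that groups entries by sample and read index and forms the Cartesian product across read indexes. The function name `barcode_tuples` and the bar code sequences are placeholders invented for this example and are not taken from listing 2.2; the * wildcard and bar code groups are deliberately left out.

```python
# Hypothetical sketch of bar code tuple enumeration; not SHORE code.
from collections import defaultdict
from itertools import product


def barcode_tuples(entries):
    """Map each sample to all valid bar code tuples.

    entries: iterable of (sample, read_index, barcode) triples.
    One bar code is chosen per read index; tuples are ordered by read index.
    """
    by_sample = defaultdict(lambda: defaultdict(list))
    for sample, read_index, barcode in entries:
        by_sample[sample][read_index].append(barcode)
    result = {}
    for sample, by_read in by_sample.items():
        choices = [by_read[r] for r in sorted(by_read)]
        result[sample] = list(product(*choices))
    return result


# Placeholder sequences: two bar codes for each of three reads.
entries = [
    ("Ler-1", 1, "AAAA"), ("Ler-1", 1, "CCCC"),
    ("Ler-1", 2, "GGGG"), ("Ler-1", 2, "TTTT"),
    ("Ler-1", 3, "ACAC"), ("Ler-1", 3, "GTGT"),
]
print(len(barcode_tuples(entries)["Ler-1"]))  # 8
```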
Automatic combination of all entries listed for a sample can be controlled by the user via<br />
specification of the barcode_group column. Sample sheet entries with different bar code group<br />
values are not able to form a valid bar code tuple (e.g. listing 2.2, lines 10–13). For example, the<br />
first read bar code specified in line 10 for the Col-0 sample may only be combined with the second<br />
read entry from line 11 and not the one from line 13. Furthermore, a combination of lines 10 and<br />
13 would collide with the bar code tuples including entries from lines 4 and 7, which are already<br />
specified to resolve to the Ler-1 sample. As with the sample column, the bar code group may be<br />
set to * to specify an entry that refers to all groups of the respective sample in the respective