28.02.2014 Views

An Integrated Data Analysis Suite and Programming ... - TOBIAS-lib


CHAPTER 2. A HIGH-THROUGHPUT DNA SEQUENCING DATA ANALYSIS SUITE

#?sample  read  barcode
Col-0     2     TTCACG
Col-0     2     GGATGT
Ler-1     2     CTAGGC
Ler-1     2     AGACCA

Listing 2.1: Example of a Simple Demultiplexing Sheet

2.4.2 A Format for Description of Multiplexing Setups

A SHORE demultiplexing sample sheet is a simple tab-delimited text table with named columns. The recognized column names are lane, sample, barcode_group, read, barcode, extbarcode and barcode_type.

The sample sheet is parsed using a generic table input API provided by the SHORE library. Column names are defined in the header line, which is either the first non-empty line of the file that is not a line comment or a line introduced by the special comment tag #?. Columns are recognized by name and may occur in arbitrary order; further columns may be present, but are ignored.

The only mandatory column, named sample, defines the sample identifiers that bar codes are to be translated to. All further columns may be added in arbitrary combinations and order. A typical simple index read demultiplexing configuration is described by additionally specifying the read and barcode columns (listing 2.1), with read specifying the index of the read that contains the bar code tag and barcode the actual sequence of the bar code oligomer. The read index column may be omitted, indicating that all sequence reads of a pair are attached to identical bar codes.
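The header and column rules described above can be illustrated with a minimal Python sketch. It is not the SHORE table input API; the function name `parse_sample_sheet` is hypothetical, and the sketch assumes that line comments start with `#`, which the text does not state explicitly.

```python
# Column names recognized in a sample sheet, as listed in the text.
RECOGNIZED = {"lane", "sample", "barcode_group", "read",
              "barcode", "extbarcode", "barcode_type"}


def parse_sample_sheet(text):
    """Parse a tab-delimited sample sheet into a list of dicts.

    Assumption: lines starting with '#' are comments, except for a
    header line introduced by the special tag '#?'.
    """
    header = None
    rows = []
    for raw in text.splitlines():
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip empty lines
        if header is None:
            if line.startswith("#?"):
                header = line[2:].split("\t")
            elif line.startswith("#"):
                continue  # ordinary comment before the header
            else:
                header = line.split("\t")
            if "sample" not in header:
                raise ValueError("mandatory column 'sample' is missing")
            continue
        if line.startswith("#"):
            continue  # comment after the header
        fields = line.split("\t")
        # Columns are matched by name; unrecognized columns are ignored.
        rows.append({name: value for name, value in zip(header, fields)
                     if name in RECOGNIZED})
    return rows


# The content of listing 2.1, written with explicit tab characters.
LISTING_2_1 = (
    "#?sample\tread\tbarcode\n"
    "Col-0\t2\tTTCACG\n"
    "Col-0\t2\tGGATGT\n"
    "Ler-1\t2\tCTAGGC\n"
    "Ler-1\t2\tAGACCA\n"
)

entries = parse_sample_sheet(LISTING_2_1)
```

In this sketch, every entry is a plain dictionary keyed by column name, so columns may appear in any order without affecting later processing.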

Index read and 5′ bar coding require suitable treatment of the respective bar code oligomers. While 5′ bar codes must be clipped, with the remaining part of the read sequence retained, index reads must be removed completely from the data set. The default implemented in SHORE is to apply bar code clipping if the sample sheet entry refers to the first read or if the read index was omitted, and to filter bar-coded reads where the read index is greater than one. The barcode_type column allows this behavior to be controlled explicitly. SHORE accepts bar code types read, 5prime and none, where the type is associated with the read index, i.e. all of a lane's entries with the same read index must also specify the same bar code type. Bar codes of type read will trigger filtering of the entire bar code associated read, whereas 5prime bar codes will be clipped.
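The default rule and the consistency requirement can be sketched as follows. This is an illustration of the behavior described above, not SHORE code; the function names `resolve_barcode_type` and `check_types_consistent` are hypothetical, and entries are assumed to be dictionaries with an integer or missing `read` value.

```python
VALID_TYPES = ("read", "5prime", "none")


def resolve_barcode_type(read_index=None, barcode_type=None):
    """Return the effective bar code type for a sample sheet entry.

    An explicit barcode_type wins. Otherwise the described default
    applies: clip ('5prime') for the first read or an omitted read
    index, filter the whole read ('read') for read indexes above one.
    """
    if barcode_type is not None:
        if barcode_type not in VALID_TYPES:
            raise ValueError("unknown bar code type: %r" % barcode_type)
        return barcode_type
    if read_index is None or read_index == 1:
        return "5prime"
    return "read"


def check_types_consistent(entries):
    """Raise if entries of one lane and read index disagree on the type."""
    seen = {}
    for entry in entries:
        key = (entry.get("lane"), entry.get("read"))
        btype = resolve_barcode_type(entry.get("read"),
                                     entry.get("barcode_type"))
        # setdefault stores the first type seen for this lane and read.
        if seen.setdefault(key, btype) != btype:
            raise ValueError("conflicting bar code types for %r" % (key,))
```

For example, `resolve_barcode_type(2)` yields `"read"`, while `resolve_barcode_type(2, "5prime")` overrides the default so that a bar code in the second read is clipped instead.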

For configurations with bar code information distributed across multiple sequence reads, several entries with different read index values are added to the definition for each sample (listing 2.2, lines 4–7). The sample sheet entries will then be grouped on the sample identifier, with each possible combination of bar codes with differing read indexes considered a valid bar code tuple for the respective sample. Sample sheet entries with the special sample identifier * are interpreted as valid for each sample specified for the respective sequencing lane (e.g. lines 16–17).

For example, listing 2.2 defines two valid bar codes for the Ler-1 sample for each of the three sequencing reads, and thus 2³ = 8 valid 3-tuples of bar codes that can be resolved to that sample.
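The counting argument corresponds to a Cartesian product over the bar codes listed per read index. The following Python sketch illustrates this under stated assumptions: the function name `barcode_tuples` is hypothetical, the bar code sequences are made-up placeholders rather than the contents of listing 2.2, and neither the * wildcard nor the barcode_group column is handled.

```python
from collections import defaultdict
from itertools import product


def barcode_tuples(entries):
    """Map every valid bar code tuple to its sample identifier.

    entries: iterable of (sample, read_index, barcode) triples.
    Each tuple is a sequence of (read_index, barcode) pairs, one pair
    per read index listed for the sample.
    """
    by_sample = defaultdict(lambda: defaultdict(list))
    for sample, read, barcode in entries:
        by_sample[sample][read].append(barcode)

    lookup = {}
    for sample, by_read in by_sample.items():
        reads = sorted(by_read)
        # One bar code per read index, in all possible combinations.
        for combo in product(*(by_read[r] for r in reads)):
            key = tuple(zip(reads, combo))
            if lookup.get(key, sample) != sample:
                raise ValueError("tuple %r resolves to two samples" % (key,))
            lookup[key] = sample
    return lookup


# Placeholder bar codes (not taken from listing 2.2): two per read,
# three reads, all for one sample.
example = [
    ("Ler-1", 1, "AAAA"), ("Ler-1", 1, "CCCC"),
    ("Ler-1", 2, "GGGG"), ("Ler-1", 2, "TTTT"),
    ("Ler-1", 3, "ACAC"), ("Ler-1", 3, "GTGT"),
]
tuples = barcode_tuples(example)
```

With two bar codes for each of three reads, the resulting dictionary holds eight keys, all resolving to the same sample.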

Automatic combination of all entries listed for a sample can be controlled by the user via specification of the barcode_group column. Sample sheet entries with different bar code group values are not able to form a valid bar code tuple (e.g. listing 2.2, lines 10–13). For example, the first read bar code specified in line 10 for the Col-0 sample may only be combined with the second read entry from line 11 and not the one from line 13. Furthermore, a combination of lines 10 and 13 would collide with the bar code tuples including entries from lines 4 and 7, which are already specified to resolve to the Ler-1 sample. As with the sample column, the bar code group may be set to * to specify an entry that refers to all groups of the respective sample in the respective
