29.01.2013 Views

Applied Biosystems SOLiD™ 4 System SETS Software User Guide ...


Applied Biosystems SOLiD 4 System SETS Software User Guide

Appendix B

Advanced Topic: Data Analysis

Overview

The topics provided in this appendix are intended for advanced users of the SOLiD 4 System and do not apply to the typical user.

Fundamentals of color-space analysis

The 2-base color coding scheme

The Applied Biosystems SOLiD 4 System sequencing technology is based on sequential ligation of dye-labeled oligonucleotides. This technology makes possible massively parallel sequencing of clonally amplified DNA fragments. Features of this system, such as mate-paired analysis and 2-base encoding, enable studies of complex genomes by providing a greater degree of accuracy. This section describes the principles of 2-base encoding and the benefits of performing analysis in the di-base alphabet, known as color-space analysis.

Until recently, most DNA sequencing was performed using the chain termination method developed by Frederick Sanger. (Refer to the paper by Sanger F., Coulson A. R., 1975, A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J Mol Biol. 94(3): 441-448.) This type of sequencing is often referred to as Sanger sequencing. Sanger sequencing data is also encoded in color-space by the four fluorescent dyes used in the sequencing chemistry and displayed as peaks in an electropherogram. In Sanger sequencing, each color, representing only a single nucleotide, is automatically translated to A, C, G, or T. With the SOLiD 4 System, each color represents four potential 2-base combinations (see Figure 1). The conversion into nucleotide base space is usually done after the sequence is aligned to a reference genome transcribed in color-space. As an alternative, translation can occur following the generation of a consensus sequence.
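The relationship between one color and four 2-base combinations, and the translation between base space and color-space, can be illustrated with a minimal Python sketch. It assumes the commonly published SOLiD color assignment (Figure 1 is not reproduced here), in which each base has a 2-bit code (A=0, C=1, G=2, T=3) and the color of a base pair is the XOR of the two codes. The function names are hypothetical and are not part of the SETS software.

```python
# Assumed 2-bit base codes; under the commonly published SOLiD scheme the
# color of a dinucleotide equals the XOR of the codes of its two bases.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = {code: base for base, code in BASE_CODE.items()}


def encode_color_space(bases):
    """Return the leading base followed by one color digit per adjacent base pair."""
    if not bases:
        return ""
    colors = [
        str(BASE_CODE[first] ^ BASE_CODE[second])
        for first, second in zip(bases, bases[1:])
    ]
    return bases[0] + "".join(colors)


def decode_color_space(read):
    """Translate a read (known leading base plus color digits) back to bases."""
    if not read:
        return ""
    bases = [read[0]]
    for color in read[1:]:
        previous = BASE_CODE[bases[-1]]
        # XOR is its own inverse, so the same operation recovers the next base.
        bases.append(CODE_BASE[previous ^ int(color)])
    return "".join(bases)


def dinucleotides_for_color(color):
    """List the four 2-base combinations that share the given color (0 to 3)."""
    return [
        first + second
        for first in "ACGT"
        for second in "ACGT"
        if BASE_CODE[first] ^ BASE_CODE[second] == color
    ]


if __name__ == "__main__":
    print(encode_color_space("ATGGC"))   # A3103
    print(decode_color_space("A3103"))   # ATGGC
    print(dinucleotides_for_color(1))    # ['AC', 'CA', 'GT', 'TG']
```

Because decoding depends on the previous base at every step, a single wrong color changes every base after it. This is one reason translation is usually deferred until after alignment against a reference that has itself been transcribed in color-space, for example with a function like `encode_color_space`.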
