Applied Biosystems SOLiD™ 4 System SETS Software User Guide ...

Appendix B Advanced Topic: Data Analysis Overview

Data analysis considerations

SNP error rates

A SNP must have two adjacent color changes. Single-color mismatches are not evidence of a SNP. This feature allows SOLiD software to distinguish errors in measurement from true SNPs.

When you use 2-base encoding, any SNP in the original sequence is represented as two adjacent mismatches in color-space. Only three of the nine possible adjacent mismatches can correspond to a real SNP.

Suppose the raw sequencing error rate is 1% per base and the SNP rate for the sequenced species is about 0.1%. For a sequencing project of M total bases, there are about 0.001M real SNP occurrences in the data set. Among these SNPs, 0.001M × 0.02 appear as single mismatches or as two invalid mismatches because a sequencing error occurs at one of the two colors that encode the SNP. That is, only 2% of the real SNPs fail to appear as two adjacent, valid mismatches. If only two adjacent, valid mismatches are treated as candidates for SNP detection, the false negative rate is 2%.

In any data set, two adjacent, valid mismatches can also be caused by sequencing errors. For a total of M bases sequenced, however, sequencing errors produce only about 0.00003M such occurrences (0.01 × 0.01 × 3/9 ≈ 0.00003), compared with about 0.001M from real SNPs. Of all two adjacent, valid mismatches observed in a particular data set, 97% are therefore from real SNPs and only 3% may be from sequencing errors. This is a 97% true discovery rate for that data set.

The data set contains about 0.01M total sequencing errors. Because of 2-base encoding and the software's ability to remove all single-color mismatches, the error count is reduced from 0.01M to 0.00003M. This is a reduction of about 300 times, making the effective error rate 0.003%. This calculation illustrates the power of 2-base encoding in resequencing and finding SNPs.
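The arithmetic in this section can be reproduced with a short Python sketch. It is an illustration only, not part of the SOLiD software. It assumes the commonly described 2-base color code, in which each base has a 2-bit code (A=0, C=1, G=2, T=3) and the color of a base pair is the XOR of the two codes. The function names `colors`, `adjacent_mismatch_counts`, and `snp_error_summary` are hypothetical. The false negative rate uses the same first-order approximation as the text (two colors, each with a 1% error chance).

```python
# 2-bit base codes (assumed standard 2-base encoding); the color of a
# dinucleotide is the XOR of the codes of its two bases.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def colors(seq):
    """Return the color calls (0-3) for each adjacent base pair in seq."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]


def adjacent_mismatch_counts(ref_triplet):
    """Return (valid, possible) two-color mismatch counts for a 3-base context."""
    left, middle, right = ref_triplet
    ref = tuple(colors(ref_triplet))
    # Every way in which both colors differ from the reference: 3 x 3 = 9.
    possible = {
        (c1, c2)
        for c1 in range(4)
        for c2 in range(4)
        if c1 != ref[0] and c2 != ref[1]
    }
    # Color pairs actually produced by substituting the middle base (a SNP).
    valid = {
        tuple(colors(left + alt + right))
        for alt in BASE_CODE
        if alt != middle
    }
    return len(valid & possible), len(possible)


def snp_error_summary(error_rate=0.01, snp_rate=0.001, valid_fraction=3 / 9):
    """Per-base rates from the worked example (multiply by M for counts)."""
    # A SNP is missed if either of its two colors is miscalled (first order).
    false_negative_rate = 2 * error_rate
    # Two adjacent errors that also happen to form a valid SNP signature.
    error_pairs_per_base = error_rate ** 2 * valid_fraction
    true_discovery_rate = snp_rate / (snp_rate + error_pairs_per_base)
    error_reduction = error_rate / error_pairs_per_base
    return {
        "false_negative_rate": false_negative_rate,
        "error_pairs_per_base": error_pairs_per_base,
        "true_discovery_rate": true_discovery_rate,
        "error_reduction": error_reduction,
    }


if __name__ == "__main__":
    print(adjacent_mismatch_counts("ACG"))
    for name, value in snp_error_summary().items():
        print(f"{name}: {value:.6g}")
```

With the default inputs, the summary gives a false negative rate of 0.02, about 0.00003 error-induced valid pairs per base, a true discovery rate of about 0.97, and an error reduction of about 300, which matches the figures in the text.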
Sampling

If coverage at a genome position is low, the second allele is often not sampled even when it is present. Heterozygotes are therefore expected to be underrepresented at low coverage. (They can be called as homozygotes for one of the two alleles.)

It is important to have a low false positive rate for SNP detection, because SNPs are expected to exist at only 1 in 1,000 positions for humans and some other species. The false positive rate in an individual species should be at least an order of magnitude lower than this to avoid a high false discovery rate.

An error model that incorporates quality values (QVs) can increase the overall accuracy of SNP detection. The SNP detection algorithms of the SOLiD 4 System use explicit error models generated from each run and biologically meaningful prior probabilities to evaluate evidence of a SNP. 2-base encoding provides a built-in way to distinguish errors from true alleles and so manage the false positive rate.

Allele ratio

If the sample preparation method or some other cause produces allele ratios that differ considerably from the expected 50:50 ratio, heterozygosity is more difficult to detect and requires more sequence coverage.

Find polymorphisms in color-space

The SOLiD 4 System can detect complicated genomic variations such as adjacent SNPs, insertions, deletions, and structural rearrangements. For other analysis tools, visit the AB SOLiD Software Community website:

http://www3.appliedbiosystems.com/AB_Home/applicationstechnologies/SOLiDSystemSequencing/SoftwareCommunityDataAnalysisResourcesforScientistsDevelopers/index.htm

An example of various polymorphisms in color-space is shown in the figure below.
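The sampling and allele-ratio effects described in this section can be illustrated with a small Python sketch. It is a simplified model, not the SOLiD SNP detection algorithm: it assumes that each read at a heterozygous position independently samples one of the two alleles, ignores sequencing errors, and treats an allele as detected if it is seen at least once. The function names `prob_one_allele_only` and `min_coverage` are hypothetical.

```python
def prob_one_allele_only(coverage, minor_fraction=0.5):
    """Chance that every read at a heterozygous site shows the same allele.

    coverage: number of reads covering the position (at least 1).
    minor_fraction: expected share of reads from the less frequent allele
    (0.5 for the ideal 50:50 ratio).
    """
    if coverage < 1:
        raise ValueError("coverage must be at least 1")
    if not 0.0 < minor_fraction <= 0.5:
        raise ValueError("minor_fraction must be in (0, 0.5]")
    major_fraction = 1.0 - minor_fraction
    # Either all reads come from the minor allele or all from the major allele.
    return minor_fraction ** coverage + major_fraction ** coverage


def min_coverage(max_miss_prob, minor_fraction=0.5, limit=10_000):
    """Smallest coverage at which the miss probability is at most max_miss_prob."""
    for coverage in range(1, limit + 1):
        if prob_one_allele_only(coverage, minor_fraction) <= max_miss_prob:
            return coverage
    raise ValueError("target not reached within the coverage limit")


if __name__ == "__main__":
    for fraction in (0.5, 0.2):
        for coverage in (2, 4, 8, 16):
            miss = prob_one_allele_only(coverage, fraction)
            print(f"ratio {fraction:.1f}, coverage {coverage:2d}: {miss:.4f}")
        print("coverage for <=1% miss:", min_coverage(0.01, fraction))
```

Under these assumptions, a skewed allele ratio raises the chance of missing the second allele at any given coverage, so more coverage is needed to reach the same detection level. A real caller typically requires more than one supporting read per allele, so actual coverage needs are higher than this lower bound suggests.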
Part Number 4448411 Rev. B 04/2010