PDF - White Rose Etheses Online

nucleotides to a strand is detected by a highly sensitive ion sensor at the base of the sequencing chip. This proton detection identifies the fragments that have been extended at each step.

In Ion Torrent and 454 sequencing, homopolymers (multiple sequential instances of a particular nucleotide) may be incorporated at once if the complementary strand being sequenced contains several of the same base in succession. The addition of a homopolymer is detected as a proportional increase in either brightness (454 sequencing) or ion concentration (Ion Torrent) at the site of the multiple addition. This is precluded in Illumina and SOLiD sequencing, where the addition of further nucleotides is blocked until a cleavage step has taken place following the previous addition step.

The similarities between these massively parallel sequencing methods are reflected in the data they produce. The product of a sequencing analysis in each case is a large number of short sequence reads, each corresponding to a single sequencing reaction in the run. Quality information is also produced for each read, providing a measure of the confidence with which nucleotide identity was called at each position.

The length of the reads produced depends on the sequencing platform. At the shorter end of the spectrum are reads from the SOLiD and Illumina platforms, with SOLiD reads ~75 bp in length, older Illumina machines producing reads ~30 bp in length, and more recent models ~150 bp. Reads produced in 454 and Ion Torrent sequencing are longer, with Ion Torrent and the early 454 GS FLX technology producing sequences ~250-400 bp in length, and more recently upgraded 454 systems increasing the maximum read length to ~1000 bp (Mardis 2011).

Although the individual reads are very short relative to the size of a genome, considerable coverage can be obtained in a single run by virtue of the huge number of reads that can be produced.
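The link between read number, read length and coverage can be made concrete with the commonly used mean-coverage estimate, c = N × L / G (number of reads × read length / target size). The sketch below is illustrative only: the function names and the 5 Mbp genome are assumptions chosen for the example, not values from this study, and the estimate ignores uneven sampling and read quality.

```python
import math


def expected_coverage(num_reads: int, read_length_bp: float, genome_size_bp: float) -> float:
    """Mean depth of coverage: total sequenced bases divided by target size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return num_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage: float, read_length_bp: float, genome_size_bp: float) -> int:
    """Smallest whole number of reads giving at least the target mean coverage."""
    if read_length_bp <= 0:
        raise ValueError("read_length_bp must be positive")
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


if __name__ == "__main__":
    genome = 5_000_000  # hypothetical 5 Mbp bacterial genome (assumption)
    # One million 150 bp reads over a 5 Mbp target
    print(expected_coverage(1_000_000, 150, genome))  # 30.0
    # Reads required for the same 30x depth at two read lengths
    print(reads_needed(30, 150, genome))  # 1000000
    print(reads_needed(30, 30, genome))   # 5000000
```

Under these assumptions, dropping the read length from 150 bp to 30 bp requires five times as many reads for the same mean depth, which is why the number of reads per run matters as much as the length of each read.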
A single massively parallel sequencing run produces hundreds of thousands or millions of reads, providing a massive amount of sequencing data for analysis.

Chapter 1 - DNA sequencing - an overview

Each platform produces reads with a different sequencing error profile. For example, SOLiD systems have high accuracy in terms of nucleotide identity as each position in the target sequence is interrogated twice, although differences in intensity of emitted fluorescence can make base-calling more difficult in later cycles of a run. As described previously, Ion Torrent and 454 sequencing can introduce multiple copies of the same nucleotide into a sequence in a single cycle, and errors in identifying the true length of these homopolymers are more common on these platforms. The signal observed from the incorporation of a homopolymer scales with the number of nucleotides included, but imprecision in this proportionality can result in an erroneous call of the true length of the homopolymer, especially where this length is larger than just a few nucleotides (Le and Durbin 2011). The position of a base in a fragment and the nucleotides that surround it have both been shown to influence the likelihood of an erroneous nucleotide identity call (Gilles, Meglecz et al. 2011; Nakamura, Oshima et al. 2011).

Although the huge yield of high-throughput platforms has facilitated a dramatic decrease in the cost of sequencing, and a huge increase in the rate at which sequencing can be completed, the size of the datasets and the short length of the reads produced have introduced a new set of challenges. Where the pace of genomic research was previously limited by the time required to determine the sequence of interest, the limiting factor is now the time taken to effectively store, extract and analyse the sequencing data produced by these methods (Mardis 2011; Scholz, Lo et al. 2012).

Sequence assembly

The short length of sequencing reads constitutes one of the greatest obstacles to effective analysis of genomic sequences targeted with massively parallel approaches (Miller, Koren et al. 2010; Mardis 2011). Longer sequences can be reconstituted from the short reads through a process known as assembly, where sections of identical sequence are used to identify overlaps between reads and subsequently join them together.
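The overlap-and-join idea can be sketched with a toy greedy procedure: find the pair of sequences whose suffix and prefix share the longest identical stretch, merge them, and repeat. This is a minimal illustration under strong assumptions (error-free reads, a single strand, no repeats), not the algorithm used by any particular assembler; the function names and the `min_overlap` parameter are hypothetical.

```python
from __future__ import annotations


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest exact overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_size, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(a, b, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to join
        # Join the two sequences, writing the shared stretch only once
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]  # made-up reads for illustration
    print(greedy_assemble(reads))  # ['ATGGCGTACCTGA']
```

The `min_overlap` threshold guards against joining reads on a chance match of one or two bases. Shorter reads leave less room for an overlap that clears such a threshold.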
The shorter the reads produced, the more difficult it is to identify these overlaps, and the more reads are required to produce a given coverage of a target sequence (Scheibye-Alsing, Hoffmann et al. 2009; Miller, Koren et al. 2010).

Massively parallel sequencing of a genome or large section of sequence is usually performed using the ‘whole genome shotgun’ (WGS) approach. In WGS sequencing, the target genome is first sheared into small fragments and

