Supporting Information (SI) Appendix - Proceedings of the National ...
Supplementary Methods
Genotype data quality filters
To verify genotypes, we compared genotype calls from the BeadStudio software to Illumina GA
sequence data from 17 accessions of diverse origin for which a strict set of rules was used to
call genotypes [3]. We only considered genotypes above the following SNP quality thresholds:
GenTrain Score ≥ 0.3 and GenCall ≥ 0.2. At these thresholds, we observed 93.18% concordance
for 60411 genotypes called on both platforms. This is an underestimate of genotyping accuracy
because reduced representation libraries (RRLs) were sequenced with the Illumina GA, which results
in heterozygotes being called as homozygotes at an unknown rate. In addition, lower concordance
rates are expected in highly diverse species, where numerous unknown flanking polymorphisms
cause hybridization issues on the genotyping arrays. Based on 145 pairwise comparisons
between replicate samples genotyped with the Vitis9KSNP array, we discarded SNPs with
replication rates < 97%. The mean replication rate for the remaining 6507 SNPs was 0.9981. We
discarded 307 excessively heterozygous SNPs with HWE p-values < 1e-4 within a group of
vinifera that was pruned so that no two accessions had an Identity-by-State (IBS) > 0.95. SNPs
with significant excess homozygosity were retained after visual inspection of cluster plots. An
additional 727 SNPs were removed because they were monomorphic in the sample analysed here,
and 86 poor-quality SNPs were removed after visual inspection of cluster plots. The total number
of SNPs remaining for analysis was 5387.
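
The concordance calculation and the SNP tally described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the function names (`passes_quality`, `concordance`, `remaining_snps`) and the genotype encoding (two-letter strings, `None` for a missing call) are hypothetical, and only the thresholds and counts are taken from the text.

```python
from typing import Optional, Sequence

# Assumed encoding (not from the source): a genotype is a two-letter
# string such as "AA" or "AG"; None marks a missing call.
Genotype = Optional[str]


def passes_quality(gentrain: float, gencall: float) -> bool:
    """Apply the thresholds quoted in the text: GenTrain >= 0.3, GenCall >= 0.2."""
    return gentrain >= 0.3 and gencall >= 0.2


def concordance(array_calls: Sequence[Genotype],
                seq_calls: Sequence[Genotype]) -> float:
    """Fraction of matching genotypes among those called on both platforms.

    Allele order is ignored, so "AG" and "GA" count as the same genotype.
    """
    both = [(a, b) for a, b in zip(array_calls, seq_calls)
            if a is not None and b is not None]
    if not both:
        return float("nan")
    agree = sum(1 for a, b in both if sorted(a) == sorted(b))
    return agree / len(both)


def remaining_snps(start: int, removed: Sequence[int]) -> int:
    """Subtract each filter's removals from the starting SNP count."""
    return start - sum(removed)


# Tally from the text: 6507 SNPs passed the replication filter; 307 were
# removed for excess heterozygosity, 727 as monomorphic, and 86 after
# visual inspection of cluster plots.
final_count = remaining_snps(6507, [307, 727, 86])
print(final_count)
```

The tally reproduces the 5387 SNPs reported as remaining for analysis, which confirms that the three removal steps were applied to the 6507 SNPs that passed the replication filter.
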
The species, cultivar name and cultivar type (wine or table grape) of each sample were obtained
from the Germplasm Resources Information Network (GRIN) database of the USDA
(http://www.ars-grin.gov/). Twenty-three samples labeled as vinifera in GRIN were excluded
from analysis because they were identified as wild Vitis species or wild/vinifera hybrids based on
multi-dimensional scaling (MDS) plots of an IBS matrix that included hundreds of samples from
numerous wild Vitis species and wild/vinifera hybrids. A geographic region of origin was
assigned to 811 vinifera accessions based mostly on information from GRIN, and the
geographic origin of each sylvestris accession was assigned based on its collection location
(Table S1). Samples with genotype call rates < 0.7 were excluded. In total, 950 vinifera and 59
sylvestris remained for analysis.
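
As a rough illustration of the sample-level filters mentioned above, the sketch below computes a per-sample call rate and a pairwise IBS value, then prunes samples so that no retained pair exceeds an IBS threshold. Several things here are assumptions rather than details from the text: genotypes are coded as 0, 1 or 2 copies of one allele (`None` for a missing call), IBS is taken as the mean proportion of alleles shared over SNPs called in both samples, and the greedy pruning order is only one possible way to enforce the 0.95 limit. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Assumed encoding (not from the source): 0, 1 or 2 copies of one allele;
# None marks a missing call.
Geno = Optional[int]


def call_rate(genotypes: Sequence[Geno]) -> float:
    """Fraction of SNPs with a non-missing genotype call."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def ibs(g1: Sequence[Geno], g2: Sequence[Geno]) -> float:
    """Mean proportion of alleles shared identical-by-state.

    Each SNP called in both samples contributes 2 - |a - b| shared
    alleles out of 2; SNPs missing in either sample are skipped.
    """
    shared = 0
    total = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        shared += 2 - abs(a - b)
        total += 2
    return shared / total if total else float("nan")


def prune_by_ibs(samples: Dict[str, Sequence[Geno]],
                 threshold: float = 0.95) -> List[str]:
    """Greedily keep samples so that no kept pair has IBS above threshold."""
    kept: List[str] = []
    for name, genos in samples.items():
        # An undefined (NaN) IBS never exceeds the threshold, so such
        # pairs do not trigger removal.
        if not any(ibs(genos, samples[k]) > threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: "b" duplicates "a", and "d" has too many missing calls.
toy = {
    "a": [0, 1, 2, 0],
    "b": [0, 1, 2, 0],
    "c": [2, 1, 0, 2],
    "d": [None, None, 1, 0],
}
passing = {n: g for n, g in toy.items() if call_rate(g) >= 0.7}
unrelated = prune_by_ibs(passing, threshold=0.95)
print(sorted(passing), unrelated)
```

The call-rate filter keeps samples at or above 0.7, matching the exclusion of samples with call rates < 0.7. Which sample of a near-identical pair survives depends on iteration order in this sketch, a choice the text does not specify.
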