
E.A. Thompson

39.5 The 2000s: Association studies and gene expression

STR markers are highly variable, but expensive to type, and occur relatively sparsely in the genome. The advent of single nucleotide polymorphism (SNP) markers in essentially unlimited numbers (International HapMap Consortium, 2005) brought a new dimension and new issues to the genetic mapping of complex traits. Genome-wide association studies (GWAS) became highly popular due to their expected ability to locate causal genes without the need for pedigree data. However, early GWAS were underpowered. Only with large-scale studies (Wellcome Trust, 2007) and better methods to control for population structure and heterogeneity (Price et al., 2006) did association methods start to have success. Modern GWAS typically consider a few thousand individuals, each typed for up to one million SNPs. New molecular technologies also provided new measures of gene expression variation based on the abundance of mRNA transcripts (Schena et al., 1995). Again the statistical question is one of association of a trait or sample phenotype with the expression of some small subset of many thousands of genes.

The need to make valid statistical inferences from both GWAS and from gene expression studies prompted the development of new general statistical approaches. Intrinsic to these problems is that the truth may violate the null hypothesis in many (albeit a small fraction) of the tests of significance made. This leads to a focus on false discovery rates rather than p-values (Storey, 2002, 2003).
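To make the contrast between per-test p-values and false discovery rates concrete, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure. This is a swapped-in illustration, not the q-value approach of Storey (2002, 2003) cited above, although both aim to control the false discovery rate. The function name, the level alpha = 0.05, and the p-values are hypothetical and chosen only for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the test is declared a discovery.

    Benjamini-Hochberg step-up rule: with m tests and sorted p-values
    p_(1) <= ... <= p_(m), find the largest rank k such that
    p_(k) <= k * alpha / m, and reject the k smallest p-values.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the tests, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest 1-based rank whose p-value falls under its threshold.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank

    # Reject every test ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from ten tests (illustration only).
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

naive = [pv <= 0.05 for pv in p]   # unadjusted threshold on each test
fdr = benjamini_hochberg(p, 0.05)  # FDR-controlled discoveries

print(sum(naive), "tests pass the unadjusted 0.05 threshold")
print(sum(fdr), "tests are discoveries under Benjamini-Hochberg")
```

In this toy input, five tests fall below the unadjusted 0.05 threshold but only the two smallest p-values survive the step-up rule. With a million SNPs or many thousands of transcripts the gap between the two counts is the practical reason for controlling the false discovery rate.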
Both GWAS and gene expression studies also exhibit the modern phenomenon of high-dimensional data (p ≫ n) or very large numbers of observations on relatively few subjects (Cai and Shen, 2010), giving scope for new methods for dimension reduction and inducing sparsity (Tibshirani et al., 2005).

Genomic technologies continue to develop, with cDNA sequence data replacing SNPs (Mardis, 2008) and RNAseq data (Shendure, 2008) replacing the more traditional microarray expression data. Both raise new statistical challenges. The opportunities for new statistical modeling and inference are immense:

“... next-generation [sequencing] platforms are helping to open entirely new areas of biological inquiry, including the investigation of ancient genomes, the characterization of ecological diversity, and the identification of unknown etiologic agents.” (Mardis, 2008)

But so also are the challenges:

“Although these new [RNAseq] technologies may improve the quality of transcriptome profiling, we will continue to face what has probably been the larger challenge with microarrays — how best to generate biologically meaningful interpretations of complex datasets that are
