11.07.2015 Views

2DkcTXceO


X. Lin

1950's. The association between a region consisting of the $p$ rare variants $G_i$ and the phenotype $Y$ can be tested by evaluating the null hypothesis $H_0: \beta = (\beta_1, \ldots, \beta_p)^\top = 0$. As the genotype matrix $G$ is very sparse and $p$ might be moderate or large, estimation of $\beta$ is difficult. Hence the standard $p$-DF Wald and LR tests are difficult to carry out and also lose power when $p$ is large. Further, if the alternative hypothesis is sparse, i.e., only a small fraction of the $\beta$'s are non-zero but one does not know which ones, the classical tests do not effectively take the knowledge of the sparse alternative and the sparse design matrix into account.

18.4.2 Building risk prediction models using whole genome data

Accurate and individualized prediction of risk and treatment response plays a central role in successful disease prevention and treatment. GWAS and genome-wide Next Generation Sequencing (NGS) studies present rich opportunities to develop a risk prediction model using massive numbers of common and rare genetic variants across the genome together with well-known risk factors. These massive genetic data hold great potential for population risk prediction, as well as for improving prediction of clinical outcomes and advancing personalized medicine tailored to individual patients. It is a very challenging statistical task to develop a reliable and reproducible risk prediction model using millions or billions of common and rare variants, as a vast majority of these variants are likely to be null variants, and the signals of individual variants are often weak.

The simple strategy of building risk prediction models using only the variants that are significantly associated with diseases and traits after scanning the genome misses a substantial amount of information. For breast cancer, GWASs have identified over 32 SNPs that are associated with breast cancer risk.
Although a risk model based on these markers alone can discriminate cases and controls better than risk models incorporating only non-genetic factors (Hüsing et al., 2012), the genetic risk model still falls short of what should be possible if all the genetic variants driving the observed familial aggregation of breast cancer were known: the AUC is .58 (Hüsing et al., 2012) versus the expected maximum of .89 (Wray et al., 2010). Early efforts to include a large number of non-significant variants from GWAS in estimating heritability models show encouraging promise (Yang et al., 2010).

The recent advancement in NGS holds great promise for overcoming such difficulties. The missing heritability could potentially be uncovered by rare and uncommon genetic variants that are missed by GWAS (Cirulli and Goldstein, 2010). However, building risk prediction models using NGS data presents substantial challenges. First, there is a massive number of rare variants across the genome. Second, as variants are rare and the data dimension is large, their effects are difficult to estimate using standard MLEs. It is of substan-
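
The AUC values quoted in this section (.58 and .89) measure how well a risk score separates cases from controls: the AUC is the probability that a randomly chosen case receives a higher score than a randomly chosen control. As a minimal sketch of that definition, and not the procedure used in the cited studies, the Python function below computes the AUC by direct pairwise comparison. The function name `auc_from_scores` and the toy scores are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def auc_from_scores(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Empirical AUC: fraction of (case, control) pairs in which the case
    has the higher risk score; tied pairs count as one half.

    labels use 1 for cases and 0 for controls (an assumed coding).
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy data (hypothetical): two controls followed by two cases.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
print(auc_from_scores(toy_scores, toy_labels))  # prints 0.75
```

An AUC of .5 corresponds to a score that ranks cases and controls no better than chance, which gives a sense of how modest a value of .58 is. The pairwise loop is quadratic in the sample size; a rank-based formula would be preferable for large cohorts.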
