

N. Chatterjee

17.4 Genome-wide association studies (GWAS): Introduction to big science

BRCA1/2 mutations, which confer high risks of breast and ovarian cancer but are rare in the general population, were originally discovered in the early 1990s through linkage studies, which analyze the co-segregation of genetic mutations and disease within highly affected families (Hall et al., 1990). At the beginning of the 21st century, after the Human Genome Project was completed and large-scale genotyping technologies evolved, the genetics community started focusing on genome-wide association studies (GWAS). The purpose of these studies was to identify genetic variants that confer more modest risks of diseases such as breast cancer but are more common in the general population. Early in this effort, the leadership of our Division decided to launch two such studies, one for breast cancer and one for prostate cancer, under the rubric of the Cancer Genetic Markers of Susceptibility (CGEMS) studies (Hunter et al., 2007; Yeager et al., 2007).

I participated in these studies as part of a four-member team of statisticians who provided oversight of the quantitative issues in the design and analysis of these studies. For me, this was my first exposure to large “team science,” where progress could only be made through the collaboration of researchers with diverse backgrounds, such as genomics, epidemiology, bioinformatics, and statistics. Getting into the nitty-gritty of the studies gave me an appreciation of the complexities of large-scale genomic studies.
I realized that while we statisticians are prone to focus on developing an “even more optimal” method of analysis, some of the most fundamental and interesting quantitative issues in these types of studies lie elsewhere, in particular in the areas of study design, quality control, and characterization following discovery (see next section for more on the last topic).

I started thinking seriously about study design when I was helping one of my epidemiologic collaborators put together a proposal for conducting a genome-wide association study for lung cancer. As a principled statistician, I felt some responsibility to show that the proposed study was likely to make new discoveries beyond those of three GWAS of lung cancer that had just been published in high-profile journals such as Nature and Nature Genetics. I realized that standard power calculations, where investigators typically show that the study has 80–90% power to detect certain effect sizes, are not satisfactory for ever-growing GWA studies. I realized that if I had to do a more intelligent power calculation, I first needed to assess what the underlying genetic architecture of the trait might be, in particular how many genetic variants might be associated with the trait and what their effect sizes are.

I made a very simple observation that the discoveries made in an existing study can be thought of as a random sample from the underlying “population” of susceptibility markers where the probability of discovery of any
