Gao X, Starmer J, Martin ER. A multiple testing correction method for ...

Genetic Epidemiology 32: 361–369 (2008) 

A Multiple Testing Correction Method for Genetic Association Studies Using Correlated Single Nucleotide Polymorphisms

Xiaoyi Gao,1 Joshua Starmer,2,3 and Eden R. Martin1

1 Center for Genetic Epidemiology and Statistical Genetics, Miami Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, Florida

2 Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina

3 Curriculum in Toxicology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina

Multiple testing is a challenging issue in genetic association studies using large numbers of single nucleotide polymorphism 

(SNP) markers, many of which exhibit linkage disequilibrium (LD). Failure to adjust for multiple testing appropriately may 

produce excessive false positives or overlook true positive signals. The Bonferroni method of adjusting for multiple 

comparisons is easy to compute, but is well known to be conservative in the presence of LD. On the other hand, 

permutation-based corrections can correctly account for LD among SNPs, but are computationally intensive. In this work, 

we propose a new multiple testing correction method for association studies using SNP markers. We show that it is simple, 

fast and more accurate than the recently developed methods and is comparable to permutation-based corrections using 

both simulated and real data. We also demonstrate how it might be used in whole-genome association studies to control 

type I error. The efficiency and accuracy of the proposed method make it an attractive choice for multiple testing adjustment 

when there is high intermarker LD in the SNP data set. Genet. Epidemiol. 32:361–369, 2008. © 2008 Wiley-Liss, Inc.

Key words: single nucleotide polymorphism; composite linkage disequilibrium; multiple testing correction; principal 

component analysis; eigenvalues 

Contract grant sponsor: NIH; Contract grant numbers: NS39764, AG019757, AG20135; Contract grant sponsor: NIEHS; Contract grant 

numbers: T32 ES007126. 

Correspondence to: Xiaoyi Gao, Center for Genetic Epidemiology and Statistical Genetics, Miami Institute for Human Genomics, 

University of Miami Miller School of Medicine, Miami, FL 33136. E-mail: xgao@med.miami.edu 

Received 10 July 2007; Revised 28 November 2007; Accepted 20 December 2007 

Published online 12 February 2008 in Wiley InterScience (www.interscience.wiley.com). 

DOI: 10.1002/gepi.20310 

INTRODUCTION 

Multiple testing is a challenging issue for genetic 

data analysis. Candidate gene and genome-wide 

association studies involve statistical testing of not 

just a single hypothesis, but many. Even when the 

point-wise error rate (PWER, $\alpha_p$) is set to a low level,

the experiment-wise error rate (EWER, $\alpha_e$) increases

with the number of tests carried out. For this reason, 

strict significance thresholds have been recommended 

to control EWER [Risch and Merikangas, 

1996]. However, an overly conservative approach 

may result in overlooking true positive signals, 

while an overly liberal criterion could produce 

excessive false positives. Šidák and Bonferroni corrections are popular approaches for controlling $\alpha_e$ by specifying what $\alpha_p$ value should be used for each individual test. The Šidák correction is calculated as

$$\alpha_p = 1 - (1 - \alpha_e)^{1/N},$$

where $N$ is the number of individual hypotheses to be tested [Šidák, 1967]. This correction assumes that the hypothesis tests are independent. Noting that $(1 - \alpha_p)^N \approx 1 - N\alpha_p$ for small $\alpha_p$, we obtain the Bonferroni correction as

$$\alpha_p = \alpha_e / N$$

[Bonferroni, 1935, 1936], which is an approximation to the Šidák correction.
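The two corrections can be compared numerically. The following is a minimal sketch in Python; the function names are illustrative and do not come from the paper.

```python
def sidak_threshold(alpha_e: float, n_tests: int) -> float:
    """Point-wise threshold alpha_p = 1 - (1 - alpha_e)^(1/N) for N independent tests."""
    return 1.0 - (1.0 - alpha_e) ** (1.0 / n_tests)


def bonferroni_threshold(alpha_e: float, n_tests: int) -> float:
    """Point-wise threshold alpha_p = alpha_e / N (first-order approximation to Sidak)."""
    return alpha_e / n_tests


if __name__ == "__main__":
    # The Bonferroni threshold is always slightly smaller (more conservative).
    for n in (10, 100, 1000):
        print(n, sidak_threshold(0.05, n), bonferroni_threshold(0.05, n))
```

For the values of N typical of association studies the two thresholds differ only in the third significant digit, which is why the simpler Bonferroni form is usually preferred.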

Recently, single nucleotide polymorphisms 

(SNPs), which are often densely genotyped, have 

become popular markers for genetic association 

studies. The closely spaced SNPs frequently yield 

high correlation because of extensive linkage disequilibrium 

(LD) among them [Wall and Pritchard, 

2003]. Therefore, when association studies are conducted 

with many SNPs, the tests performed on 

each SNP are usually not independent, depending 

on the correlation structure among the SNPs. This 

violation of the independence assumption limits the 

Šidák and Bonferroni corrections’ ability to control

type I error effectively, and the PWER has to be 

adjusted in order to keep the EWER at a nominal 

level. 

In practice, many researchers use permutation-based

methods to control the EWER when the tests 

are correlated. For example, Churchill and Doerge 

[1994] used a permutation test for estimating threshold 

values in quantitative trait mapping. Ritchie 

et al. [2001] and Hoh et al. [2001] used a permutation

test to control significance level for dichotomous 

traits. Permutation test correction is very robust and 

has the advantage of drawing the threshold directly 

from the experimental data [Cheverud, 2001]. However, 

permutation tests are computationally intensive. 

Churchill and Doerge [1994] suggested that at 

least 10,000 shuffles are needed to estimate a 0.01 

threshold and 1,000 shuffles to estimate a 0.05 

threshold. 

If the number of independent tests can be correctly 

inferred, we can still use the standard Bonferroni 

correction to rapidly adjust for multiple testing. 

Based on this idea, several researchers have tried to 

derive the effective number of independent tests, 

Meff [Cheverud, 2001; Nyholt, 2004; Li and Ji, 2005]. 

Cheverud [2001] was the first to propose this idea 

for multiple testing correction and published a 

formula for calculating Meff when SNP markers are 

correlated. However, Cheverud’s Meff is still overly 

conservative when there is high LD among SNPs [Li 

and Ji, 2005; Salyakina et al., 2005]. Nyholt [2005] 

suggested excluding all SNPs in perfect LD except 

one prior to using Cheverud’s Meff as a means to 

improve the adjustment accuracy, but this method 

remains overly conservative. Li and Ji [2005] 

proposed another Meff formula and demonstrated 

its improvement over Cheverud’s. However, Li and 

Ji’s approach, partitioning eigenvalues into integer 

and fractional parts, is an intuitive solution, and it 

was tested only on a small number of SNPs (<15 for

each gene) in their single-locus analyses [Li and Ji,

2005]. It is not clear how their method performs on

relatively large SNP data sets (≥100 SNPs). With

these limitations in mind, we have developed a new 

approach for estimating Meff, denoted Meff_G,

which improves on existing methods. 

The first step in calculating Meff for SNP data is 

constructing a correlation matrix, along with the 

corresponding eigenvalues, for the SNP loci. For 

example, Nyholt [2004] used LD correlation. However, 

a problem with calculating LD correlation is 

that the haplotype phase information is not usually 

available and needs to be derived. A common 

technique for inferring LD when the haplotype 

phase is unknown is to use the expectation-maximization 

algorithm under the assumption of Hardy- 

Weinberg equilibrium (HWE) [Excoffier and Slatkin, 

1995]. The potential problem with this approach is 

that HWE may not hold when sample individuals 

are chosen based on phenotypes [Zaykin et al., 2006], 

and HWE can be distorted between cases and 

controls in regions of association [Nielsen et al., 

1999; Wittke-Thompson et al., 2005]. Furthermore, 

this method requires the additional step of estimating 

haplotype frequencies, which may not be 

necessary if our goal is only to capture the correlation 

structure of SNPs. In contrast, the composite LD 

(CLD) correlation, which is calculated directly from 

SNP genotypes, describes the SNP correlation well 

and is simpler to calculate. Recently, Weir et al. [2004], Schaid [2004], Zaykin [2004] and Zaykin et al. [2006] showed that CLD captures the relationships among SNPs about as well as gametic LD does, without requiring HWE.

With the above improvements in mind, we 

propose a new multiple testing correction, 

simpleM, which uses CLD to create the correlation 

matrix and Meff_G to calculate the effective number

of independent tests. We then show that the new 

approach can successfully control the type I error 

rate based on both simulated and real data. 

Compared with either Bonferroni or Li and Ji’s 

approach, the adjusted thresholds from simpleM are 

more accurate, i.e., closer to the permutation-based 

corrections. Moreover, the proposed method can also 

be used to address multiple testing correction issues 

in genome-wide association studies using correlated 

SNPs. 

METHODS 

NOTATION 

For SNP markers, we consider only biallelic cases. 

Each SNP marker has two alleles and correspondingly 

three genotypes. For example, take two SNPs, 

A and B, which both have two alleles: A and a for 

SNP A, and B and b for SNP B. The three genotypes 

are AA, Aa and aa for SNP A, and similarly BB, Bb 

and bb for SNP B. For SNP A, the allele frequencies are denoted as $p_A$ and $p_a$ for the A and a alleles, respectively, and the genotype frequencies are denoted as $P_{AA}$, $P_{Aa}$ and $P_{aa}$ for the AA, Aa and aa genotypes, respectively. Similarly, $p_B$, $p_b$, $P_{BB}$, $P_{Bb}$ and $P_{bb}$ are the respective frequencies for SNP B. $P_{AB}$ denotes the gametic frequency and $P_{A/B}$ denotes the non-gametic frequency between SNPs A and B. Li and Ji’s Meff and the proposed Meff are denoted as Meff_L and Meff_G, respectively. Keeping the EWER ($\alpha_e$) at a nominal significance level, the adjusted PWERs calculated using Meff_L and Meff_G are denoted as $\alpha_L$ and $\alpha_G$, respectively. The Bonferroni and the permutation-based point-wise correction thresholds are denoted as $\alpha_B$ and $\alpha_P$, respectively. $M$ is the total number of SNPs in the data set.

COMPOSITE LD 

The CLD coefficient is defined as

$$\Delta_{AB} = P_{AB} + P_{A/B} - 2p_A p_B = D_{AB} + D_{A/B},$$

where $D_{AB} = P_{AB} - p_A p_B$ and $D_{A/B} = P_{A/B} - p_A p_B$ [Weir, 1996].

The composite correlation is defined as

$$r_{AB} = \frac{\Delta_{AB}}{\sqrt{(p_A(1 - p_A) + D_A)(p_B(1 - p_B) + D_B)}},$$

where $D_A = P_{AA} - p_A^2$ and $D_B = P_{BB} - p_B^2$ [Weir, 1979, 2004]. The composite estimator works well in

capturing the LD correlation among SNPs and it is 

robust to violations in HWE [Weir et al., 2004; 

Schaid, 2004]. It is also easier to compute than LD 

correlation (see the Appendix). The CLD correlation 

can be calculated in R using the cor() function [Team, 

2007] when the SNP genotypes are numerically 

coded as 2, 1 and 0 for wild-type allele homozygotes, 

heterozygotes and variant-type allele homozygotes, 

respectively, which is shown in the Appendix. 
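The equivalence between the composite correlation and the ordinary correlation of genotype codes can be checked numerically. The sketch below is written in Python rather than R and uses hypothetical genotype vectors. It assumes the usual count-based estimate of $P_{AB} + P_{A/B}$, namely $(2n_{AABB} + n_{AABb} + n_{AaBB} + n_{AaBb}/2)/n$, which is not spelled out in this section.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def composite_r(geno_a, geno_b):
    """Composite correlation r_AB; genotypes are coded as copies of allele A (or B): 2, 1, 0."""
    n = len(geno_a)
    counts = Counter(zip(geno_a, geno_b))
    p_a = sum(geno_a) / (2 * n)                 # allele frequency of A
    p_b = sum(geno_b) / (2 * n)                 # allele frequency of B
    p_aa = sum(1 for g in geno_a if g == 2) / n  # genotype frequency P_AA
    p_bb = sum(1 for g in geno_b if g == 2) / n  # genotype frequency P_BB
    # Assumed count-based estimate of P_AB + P_A/B (no phase information needed).
    n_ab = 2 * counts[(2, 2)] + counts[(2, 1)] + counts[(1, 2)] + 0.5 * counts[(1, 1)]
    delta_ab = n_ab / n - 2 * p_a * p_b
    d_a = p_aa - p_a ** 2                        # departure from HWE at SNP A
    d_b = p_bb - p_b ** 2                        # departure from HWE at SNP B
    return delta_ab / math.sqrt((p_a * (1 - p_a) + d_a) * (p_b * (1 - p_b) + d_b))


if __name__ == "__main__":
    snp_a = [2, 2, 1, 1, 0, 0, 2, 1, 0, 1]  # hypothetical genotypes
    snp_b = [2, 1, 1, 0, 0, 1, 2, 2, 0, 1]
    print(composite_r(snp_a, snp_b), pearson(snp_a, snp_b))
```

Both functions divide by zero for a monomorphic SNP, so such markers would need to be filtered out beforehand.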

Meff_G ESTIMATION 

Principal component analysis (PCA) is a classical 

statistical approach for reducing dimensionality in 

multivariate analysis [Mardia et al., 1979]. The PCA 

approach has been applied to many recent genetic 

studies, such as haplotype tagging SNP selection 

[Meng et al., 2003; Lin and Altman, 2004] and 

correction for population stratification [Price et al., 

2006]. It is a data-driven approach that allows 

researchers to consider all the SNPs simultaneously, 

which is ideal for inferring Meff for a particular data 

set. For simpleM, we compute eigenvalues from the 

pair-wise SNP correlation matrix created with CLD 

and then derive Meff_G using PCA. Each eigenvalue can be interpreted as the amount of variance explained by the corresponding principal component. The eigenvalues, $\{\lambda_i\}$, are usually arranged in descending order, $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_M$, where $M$ is the number of SNPs in the data. Generally, for correlated data, a relatively small number of eigenvalues, $x$, contribute a high percentage of the sum of the variances of all of the components. That is to say, only the first $x$ eigenvalues are needed, such that

$$\sum_{i=1}^{x} \lambda_i \Big/ \sum_{i=1}^{M} \lambda_i > C,$$

where the percentage cutoff, $C$, is determined by the researcher. There are rules of thumb for choosing this threshold in PCA [Mardia et al., 1979], and in line with these, we propose defining $x$ so that the corresponding eigenvalues explain 99.5% of the variation for SNP data, and setting Meff_G $= x$. It should be noted, however, that too large or too small a value of $C$ may cause Meff_G to be either overly conservative or overly liberal.
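The cutoff rule can be sketched as a short Python function. The function name and the eigenvalues in the usage example are hypothetical, and the sketch takes x to be the smallest count whose cumulative share exceeds C.

```python
def meff_g(eigenvalues, cutoff=0.995):
    """Smallest x such that the x largest eigenvalues explain more than `cutoff` of the total."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for x, lam in enumerate(ordered, start=1):
        running += lam
        if running / total > cutoff:
            return x
    return len(ordered)  # fallback if rounding keeps the ratio at or below the cutoff


if __name__ == "__main__":
    eig = [3.2, 1.0, 0.5, 0.2, 0.08, 0.02]  # hypothetical eigenvalues of a 6-SNP matrix
    for c in (0.90, 0.995, 0.999):
        print(c, meff_g(eig, cutoff=c))
```

Running the example with different values of C illustrates the sensitivity noted above: a smaller C yields a smaller effective number of tests and hence a more liberal threshold.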

THE simpleM METHOD 

The simpleM method involves four steps: 

Step 1: Derive the CLD correlation matrix from the 

SNP data set. This can be done using the cor() 

function in R (see the Appendix). 

Step 2: Calculate the eigenvalues, for example, by 

the R function eigen(). 

Step 3: Infer Meff_G through PCA to estimate the effective number of independent tests (see the Meff_G ESTIMATION section).

Step 4: Apply the Bonferroni correction formula to calculate the adjusted point-wise significance level as $\alpha_G = \alpha_e / M_{\mathrm{eff\_G}}$.
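The four steps can be sketched end to end in plain Python. This is an illustration under stated assumptions, not the authors' R implementation: the Pearson correlation of 0/1/2 genotype codes stands in for cor(), a small cyclic Jacobi routine stands in for eigen(), and all function names and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for a monomorphic SNP."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(snps):
    """Step 1: pair-wise correlation of genotype codes (one list per SNP)."""
    m = len(snps)
    return [[1.0 if i == j else pearson(snps[i], snps[j]) for j in range(m)]
            for i in range(m)]


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Step 2: eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def simple_m(snps, alpha_e=0.05, cutoff=0.995):
    """Steps 3 and 4: effective number of tests and adjusted point-wise threshold."""
    eig = jacobi_eigenvalues(correlation_matrix(snps))
    total = sum(eig)
    running, meff = 0.0, len(eig)
    for x, lam in enumerate(eig, start=1):
        running += lam
        if running / total > cutoff:
            meff = x
            break
    return meff, alpha_e / meff


if __name__ == "__main__":
    snp1 = [0, 1, 2, 0, 1, 2, 0, 1, 2]
    snp2 = list(snp1)                    # in perfect LD with snp1
    snp3 = [0, 0, 0, 1, 1, 1, 2, 2, 2]   # uncorrelated with snp1
    print(simple_m([snp1, snp2, snp3]))  # three SNPs, two effective tests
```

In the toy data, two of the three SNPs are identical, so the correlation matrix has only two non-zero eigenvalues and the adjusted threshold is 0.05/2 rather than the Bonferroni value 0.05/3.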

PERMUTATION TESTS 

All adjusted thresholds were validated by permutation tests. Because we wanted to test the validity of our algorithm at significance levels of both $\alpha_e = 0.05$ and 0.01, we performed 100,000 permutations on our data sets. In each permutation shuffle, half of the individuals were randomly assigned as cases and the other half as controls in the balanced data sets we simulated. For each permuted case-control sample, Armitage’s trend test [Armitage, 1955; Sasieni, 1997] was used to test for association at each SNP. Thus, a total of $M$ test statistics and their corresponding P-values were calculated for each permutation repeat, and the smallest P-value was recorded. The smallest P-values were then arranged in ascending order, and the fifth percentile was taken as the permutation-based empirical critical value for an overall type I error rate of 0.05. Similarly, the first percentile was taken as the threshold for an overall type I error rate of 0.01.
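The minimum-P permutation procedure can be sketched as follows. This Python illustration makes two assumptions that are not stated in the text: the trend statistic is computed in its equivalent form $N r^2$, where $r$ is the correlation between case status and the 0/1/2 genotype score, and the P-value is taken from the 1-df chi-square distribution. Function names, the percentile indexing and the small default number of shuffles are illustrative only.

```python
import math
import random


def trend_p_value(status, genotypes):
    """Armitage trend test P-value via chi2 = N * r^2 with 1 degree of freedom."""
    n = len(status)
    my, mg = sum(status) / n, sum(genotypes) / n
    sxy = sum((y - my) * (g - mg) for y, g in zip(status, genotypes))
    sxx = sum((g - mg) ** 2 for g in genotypes)
    syy = sum((y - my) ** 2 for y in status)
    if sxx == 0 or syy == 0:
        return 1.0  # monomorphic SNP or a single phenotype class: no evidence
    chi2 = n * sxy * sxy / (sxx * syy)
    return math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df


def permutation_threshold(snps, n_cases, alpha_e=0.05, n_perm=1000, seed=0):
    """Empirical point-wise threshold: the alpha_e quantile of per-shuffle minimum P-values."""
    rng = random.Random(seed)
    n = len(snps[0])
    status = [1] * n_cases + [0] * (n - n_cases)
    min_p = []
    for _ in range(n_perm):
        rng.shuffle(status)  # random reassignment of case and control labels
        min_p.append(min(trend_p_value(status, g) for g in snps))
    min_p.sort()
    return min_p[max(0, int(alpha_e * n_perm) - 1)]


if __name__ == "__main__":
    g1 = [0, 1, 2, 1, 0, 2, 1, 1, 0, 2, 1, 0, 2, 1, 0, 1, 2, 0, 1, 1]  # hypothetical SNP
    g2 = [2, 1, 0, 1, 1, 0, 2, 1, 0, 0, 1, 2, 1, 0, 2, 1, 1, 0, 2, 1]  # hypothetical SNP
    print(permutation_threshold([g1, g2], n_cases=10, n_perm=200, seed=1))
```

Because only the minimum P-value per shuffle is kept, adding SNPs in perfect LD with existing ones leaves the threshold unchanged, which is the behavior an effective-number-of-tests method tries to approximate without the shuffles.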

SIMULATION DATA 

Four simulation studies were designed to study 

the performance of simpleM. Our simulation was 

similar to Rinaldo et al. [2005] except that the 

simulated regions were larger. To be more specific, 

we used Wall and Pritchard’s simulation program 

[Wall and Pritchard, 2003], which is a variation of 

Hudson’s MS program [Hudson, 2002], to generate 

recombination cold regions and hot spots/hot regions. 

In simulation 1, eight cold regions (10 kb each) 

were separated by hot spots (1 kb each) giving 

recombination cold regions interlaced with hot 

spots. The mutation rate was $\theta = 4N_e\mu$, where $N_e$ is the effective population size, set to be 10,000, and $\mu$ is the mutation rate per basepair, per generation, set to be $1.4 \times 10^{-8}$. The recombination rate was $\rho = 4N_e\delta$, where $\delta = 2.5 \times 10^{-8}$ is the recombination rate per basepair, per generation. Values for $\mu$ and $\delta$

were chosen because they yield results similar to the 

empirical data in the SeattleSNP database [Rinaldo 

et al., 2005]. The corresponding scaled recombination 

rate for the entire simulated region was the 

product of $\rho$ and the length of the region in

basepairs. The per basepair recombination rate in 

hot spots was chosen to be 100 times greater than in 

cold regions. 
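As a worked example of these scaled parameters, the following Python sketch computes the per-basepair rates and the scaled rates for a single 10 kb cold region and a single 1 kb hot spot. The variable names are illustrative, and how the values are passed to the simulation program is not shown here.

```python
N_E = 10_000           # effective population size
MU = 1.4e-8            # mutation rate per basepair, per generation
DELTA = 2.5e-8         # recombination rate per basepair, per generation (cold regions)
HOT_SPOT_FACTOR = 100  # hot spots recombine 100 times faster per basepair

theta_per_bp = 4 * N_E * MU    # scaled mutation rate per basepair
rho_per_bp = 4 * N_E * DELTA   # scaled recombination rate per basepair


def scaled_rate(per_bp_rate, length_bp):
    """Scaled rate for a region: per-basepair rate times region length in basepairs."""
    return per_bp_rate * length_bp


theta_cold_region = scaled_rate(theta_per_bp, 10_000)               # one 10 kb cold region
rho_cold_region = scaled_rate(rho_per_bp, 10_000)                   # one 10 kb cold region
rho_hot_spot = scaled_rate(rho_per_bp * HOT_SPOT_FACTOR, 1_000)     # one 1 kb hot spot

if __name__ == "__main__":
    print(theta_per_bp, rho_per_bp, theta_cold_region, rho_cold_region, rho_hot_spot)
```

Under these values a 1 kb hot spot contributes ten times as much scaled recombination as a 10 kb cold region, which is what breaks LD between neighboring cold regions.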

In contrast to simulation 1, simulation 2 had four 

cold regions (10 kb each) separated by hot regions 

(15 kb each); thus, cold regions are interlaced with 

hot regions. The recombination rate, $\delta$, was chosen to

be $9 \times 10^{-8}$/bp for the hot regions. Again, the

population parameters were chosen because they 

generate LD patterns similar to that observed in the 

SeattleSNP database [Rinaldo et al., 2005]. 

For simulations 1 and 2, 100 SNP data sets were 

generated, each with 400 individuals (200 cases vs. 

200 controls, randomly assigned in the permutation 


tests). We used only common SNPs in each data set, 

which had a minor allele frequency >0.10. The

resulting number of SNPs ranged from approximately 

70 to 140, with about 100 SNPs per simulation 

on average. 

Simulations 3 and 4 were identical to simulations 1 

and 2, except the number of individuals simulated 

increased to 1,000 (500 cases vs. 500 controls, 

randomly assigned in the permutation tests) to 

address how sample size affects simpleM’s performance. 

REAL DATA 

To evaluate simpleM with real data, we used a 

partial SNP data set from an Alzheimer whole-genome

association project. We randomly chose 

1,723 SNPs spanning a region of 8 Mb on chromosome 

22. Five hundred unrelated unaffected individuals 

were used. Missing genotypes accounted for less than 1% of the values for each SNP and for each individual. The minor allele frequency for each SNP was >0.05. The total missing value rate for these data was 0.065%.

RESULTS 

SIMULATION RESULTS 

For simulations 1 and 2, the CLD correlation matrix and its eigenvalues were calculated for each simulated SNP data set, along with $\alpha_L$ and $\alpha_G$. The permutation-based correction threshold, $\alpha_P$, was derived using 100,000 random shuffles to serve as the true cutoff. The Bonferroni correction, $\alpha_B$, was also calculated for comparison purposes. The results for the 100 simulations are plotted in Figure 1: (a) and (b) for $\alpha_e = 0.05$, and (c) and (d) for $\alpha_e = 0.01$. For each simulated data set, the adjusted PWER for each method is plotted in a separate color (black, red, blue and purple) and marked with a different letter (B, P, G and L) to denote the estimated $\alpha_B$, $\alpha_P$, $\alpha_G$ and $\alpha_L$, respectively. The data sets are arranged in ascending order of the number of SNPs. Finally, all the points (adjusted PWER thresholds) for each method are connected to aid visualization. In all plots, the Bonferroni correction cutoff is too conservative relative to the permutation-based correction threshold. $\alpha_G$ gives the adjusted PWER closest to the permutation-based threshold, almost overlapping it, while $\alpha_L$ is too liberal. It appears that $\alpha_L$ is sensitive to whether cold regions are interspersed with hot spots (Fig. 1(a)) or with hot regions (Fig. 1(b)); the same holds for Figure 1(c) vs. Figure 1(d).

The adjusted PWERs from simulations 3 and 4 are plotted in Figure 2 in the same way as the results from simulations 1 and 2. Our metric, $\alpha_G$, continued to be the closest adjustment to $\alpha_P$. The general trends in simulations 3 and 1, and in simulations 4 and 2, agree with each other, which indicates that the simpleM method is not sensitive to the sample size used in these examples. Again, we observed that $\alpha_L$ differs depending on whether cold regions are interlaced with hot regions or with hot spots. Both Figures 1 and 2 show that $\alpha_L$ is sensitive to the underlying LD structure.

The results from simulations 1, 2, 3 and 4 show that $\alpha_G$ can give a multiple testing adjustment that nearly overlaps $\alpha_P$. Furthermore, the simpleM method tremendously reduces the computing time for each data set compared with the permutation method. Calculating $\alpha_P$ required over 3 hr per data set (1,000 individuals) using 100,000 permutation shuffles, whereas calculating $\alpha_G$ required only about 0.1 sec on our desktop computer (Intel Core2 2.4 GHz CPU with 2 GB memory), which is at least 100,000 times faster. Increasing the number of SNPs in the data sets makes the differences in speed even more dramatic. For example, a 100,000 shuffle permutation test on a data set of 1,000 individuals with 1,000 SNPs takes more than a day to finish, while it takes only about a second for the simpleM method to derive the adjustment threshold. Assuming the computation time for the permutation test is proportional to the number of SNPs, it would take over 12 days, 4 months, 20 months and 3 years for 10K, 100K, 500K and 1M SNP data sets, respectively, with 1,000 individuals using 100,000 shuffles on a single PC, while the simpleM method could finish any of these calculations in less than 1 hr.

EXPERIMENTAL DATA RESULTS 

We used an Alzheimer whole-genome association 

SNP data set to both validate the simpleM method 

on real data and show how it can be applied to 

whole-genome analysis. In the presence of a large 

number of SNPs, it is challenging to calculate 

eigenvalues efficiently and effectively. Therefore, 

we partitioned the large SNP data set into 13 smaller sets of 133 SNPs each (127 in the last set). Set 1 consisted of SNPs 1–133, set 2 of SNPs 134–266, and so on, up to set 13, which consisted of SNPs 1,597–1,723. Because PCA

requires complete data matrices, we filled the 

missing data in the Alzheimer SNP data using the 

K nearest-neighbor algorithm [Hastie et al., 2001]. 

We then applied the simpleM method to each set and obtained the following series of Meff_G values: 95 (133), 101 (133), 92 (133), 90 (133), 91 (133), 92 (133), 68 (133), 71 (133), 90 (133), 85 (133), 85 (133), 89 (133) and 83 (127), where the number outside the parentheses is the estimated effective number of independent SNPs and the number within the parentheses is the original number of SNPs in the set. Thus, for the entire set of 1,723 SNPs, Meff_G suggests that there are 1,132 independent SNPs, while Meff_L gives 837. To compare our method with the permutation critical value, we calculated the permutation test threshold with 100,000 shuffles. If we set the nominal significance level to be 0.05, the derived permutation-based PWER is $\alpha_P = 4.58 \times 10^{-5}$, the adjusted PWER thresholds are $\alpha_G = 0.05/1132 = 4.42 \times 10^{-5}$ and $\alpha_L = 0.05/837 = 5.97 \times 10^{-5}$, and the Bonferroni correction is $\alpha_B = 0.05/1723 = 2.90 \times 10^{-5}$. In this case $\alpha_G$ is very close to $\alpha_P$, while the Bonferroni correction is much more conservative and Li and Ji’s method is too liberal. If we set the nominal significance level to be 0.01, then $\alpha_P = 9.01 \times 10^{-6}$, $\alpha_G = 0.01/1132 = 8.83 \times 10^{-6}$ and $\alpha_L = 0.01/837 = 1.19 \times 10^{-5}$, and the Bonferroni correction is $\alpha_B = 0.01/1723 = 5.80 \times 10^{-6}$. Again, $\alpha_G$ is very close to $\alpha_P$, while $\alpha_B$ is too conservative and $\alpha_L$ is too liberal.
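The arithmetic behind these thresholds is easy to reproduce. The Python sketch below sums the per-set Meff_G values reported above and divides the nominal level by each effective number of tests; the variable and function names are illustrative, and the Meff_L total (837) is taken as reported rather than recomputed.

```python
# Per-set Meff_G values and set sizes as reported for the fixed-size partition.
meff_g_sets = [95, 101, 92, 90, 91, 92, 68, 71, 90, 85, 85, 89, 83]
snps_per_set = [133] * 12 + [127]

meff_g_total = sum(meff_g_sets)  # effective number of independent SNPs
n_snps = sum(snps_per_set)       # total number of SNPs (Bonferroni denominator)
meff_l_total = 837               # Li and Ji's estimate, as reported


def threshold(alpha_e, n_effective):
    """Bonferroni-style point-wise threshold for a given effective number of tests."""
    return alpha_e / n_effective


if __name__ == "__main__":
    for alpha_e in (0.05, 0.01):
        print(alpha_e,
              f"G={threshold(alpha_e, meff_g_total):.2e}",
              f"L={threshold(alpha_e, meff_l_total):.2e}",
              f"B={threshold(alpha_e, n_snps):.2e}")
```

Summing per-set estimates treats the sets as mutually independent, so any LD that spans a set boundary is ignored and the total errs slightly on the conservative side.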

Because data with large numbers of SNPs need to be divided into sets for PCA, we investigated the use of alternative methods for choosing block sizes. The simplest method was to define a fixed size for the blocks, as in the preceding results. We also used the Haploview software [Barrett et al., 2005] with Gabriel et al.’s definition of haplotype blocks [Gabriel et al., 2002], because they take advantage of the LD structure among SNPs. Based on the haplotype block output from Haploview, we divided the SNP data into 13 blocks, each with about 100–140 SNPs (cutting at block boundaries), and then applied our simpleM method to each block. This gave us 102 (141), 103 (141), 98 (146), 84 (119), 94 (140), 86 (128), 65 (133), 57 (110), 96 (142), 92 (142), 88 (138), 88 (132) and 71 (111), where the number outside the parentheses is the estimated effective number of independent SNPs and the number within the parentheses is the original number of SNPs in the block. The sum of the inferred Meff_G values over the blocks was 1,124 for the whole data set, while Meff_L gave 818. For a nominal significance level of 0.05, the adjusted PWER thresholds are $\alpha_G = 0.05/1124 = 4.45 \times 10^{-5}$ and $\alpha_L = 0.05/818 = 6.11 \times 10^{-5}$ (compared to $\alpha_P = 4.58 \times 10^{-5}$ and, for the fixed length blocks, $\alpha_G = 0.05/1132 = 4.42 \times 10^{-5}$ and $\alpha_L = 0.05/837 = 5.97 \times 10^{-5}$). If we set the nominal significance level to be 0.01, then $\alpha_G = 0.01/1124 = 8.90 \times 10^{-6}$ and $\alpha_L = 0.01/818 = 1.22 \times 10^{-5}$ (compared to $\alpha_P = 9.01 \times 10^{-6}$ and, for the fixed length blocks, $\alpha_G = 0.01/1132 = 8.83 \times 10^{-6}$ and $\alpha_L = 0.01/837 = 1.19 \times 10^{-5}$). With variable block sizes, $\alpha_G$ improved slightly over that from the fixed length partition. This improvement, however, is mitigated by the fact that Haploview assumes HWE in the estimation of gamete frequencies.

Fig. 1. Comparison of adjusted PWER thresholds for simulations 1 and 2 (400 individuals, 200 cases vs. 200 controls). The adjusted PWER thresholds for Bonferroni, permutation, Li and Ji’s approach, and the proposed method are marked in black, red, purple and blue, respectively. The data sets are sorted in order of ascending number of SNPs. Data points are connected to aid visualization. (a) corresponds to simulation 1 with the EWER equal to 0.05. SNPs were generated as recombination cold regions interlaced with hot spots. Four hundred individuals were simulated. (b) corresponds to simulation 2 with EWER = 0.05. SNPs were generated as recombination cold regions interlaced with hot regions. Four hundred individuals were simulated. (c) and (d) are duplicates of (a) and (b), respectively, except that the EWER is equal to 0.01. PWER, point-wise error rate; SNPs, single nucleotide polymorphisms; EWER, experiment-wise error rate.

Fig. 2. Comparison of adjusted PWER thresholds for simulations 3 and 4 (1,000 individuals, 500 cases vs. 500 controls). The adjusted PWER thresholds for Bonferroni, permutation, Li and Ji’s approach, and the proposed method are marked in black, red, purple and blue, respectively. The data sets are sorted in order of ascending number of SNPs. Data points are connected to aid visualization. (a) corresponds to simulation 3 with the EWER equal to 0.05. SNPs were generated as recombination cold regions interlaced with hot spots. One thousand individuals were simulated. (b) corresponds to simulation 4 with EWER = 0.05. SNPs were generated as recombination cold regions interlaced with hot regions. One thousand individuals were simulated. (c) and (d) are duplicates of (a) and (b), respectively, except that the EWER is equal to 0.01. PWER, point-wise error rate; SNPs, single nucleotide polymorphisms; EWER, experiment-wise error rate.

DISCUSSION 

In our simpleM approach, we use CLD correlation instead of LD correlation. The advantages that CLD has over LD correlation are that it does not require HWE and that the calculation is simpler and faster. The correlation structure among SNPs can be derived from their genotypes directly, and no haplotype frequency estimation is necessary. Cheverud’s Meff may not capture the correlation among SNPs well [Salyakina et al., 2005; Li and Ji, 2005]. To improve the Meff, Nyholt suggested removing all SNPs in perfect correlation except one from the data set [Nyholt, 2005]. Although this remedy may be effective on some small SNP data sets, it shows that Cheverud’s Meff does not adjust effectively in many situations. In contrast, Meff_L showed a significant improvement over Cheverud’s Meff_C [Li and Ji, 2005]. We evaluated Cheverud’s Meff in our adjustment comparisons, but it did not offer much improvement over the Bonferroni correction on our data (results not shown). Therefore, we did not include it in our comparison. Another multiple testing method that we did not consider is the false discovery rate (FDR) approach. FDR is commonly used in microarray data analysis, where studies involve a large number of true alternative hypotheses (differentially expressed genes). However, in genetic association studies, most of the hypotheses are null (SNPs not associated with the disease). Moreover, FDR assumes that the P-values corresponding to true null hypothesis tests are independent and uniformly distributed, or can be considered as approximately independent [Benjamini and Hochberg, 1995; Storey et al., 2004], which is likely to be violated when there is high LD among SNPs in genetic association studies. Therefore, we compared our method only to the Bonferroni correction, Li and Ji’s approach and the permutation method, which is considered a gold standard in multiple testing correction. Among all the adjustment methods considered, the simpleM method gave the best approximation to the permutation-based correction threshold using either the simulated or the real data set in the presence of high intermarker LD correlations. In the extreme case, if the SNPs are nearly independent, there should be little difference among these adjustment methods.

There are two possible ways to compare different multiple testing methods. First, fix the EWER at a nominal value and find the corresponding PWER threshold. Whichever adjustment is closest to the permutation-based PWER is considered the best, as we did in our comparison (see Figs. 1 and 2). Second, given the PWER, derive the corresponding EWER and then compare it to the nominal type I error rate. These two methods are equivalent, but calculated in opposite directions. While PWER is useful for determining the threshold for accepting or rejecting hypothesis tests, EWER can be useful for appreciating how significant small changes in PWER are. For example, with the Alzheimer SNP data we calculated $\alpha_P = 4.58 \times 10^{-5}$, which corresponds to the permutation-based EWER of 0.05. We then approximate the “true” effective number of independent tests with Meff_P, using the formula $M_{\mathrm{eff\_P}} = 0.05/(4.58 \times 10^{-5}) = 1{,}092$. From the same data set we calculated $\alpha_G = 4.42 \times 10^{-5}$, $\alpha_L = 5.97 \times 10^{-5}$ and $\alpha_B = 2.90 \times 10^{-5}$. We can now derive the EWERs by multiplying Meff_P by the various values for $\alpha$, giving us 0.048, 0.065 and 0.032 for simpleM, Li and Ji’s method and the Bonferroni approach, respectively. Thus, the small differences in the PWERs resulted in rather large changes in EWERs. Since the difference between 0.05 and the EWER for $\alpha_G$ is the smallest of the three methods, we conclude that it is the most accurate.
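This back-calculation can be reproduced in a few lines of Python. The variable names are illustrative; the inputs are the thresholds quoted above for the fixed-size partition.

```python
NOMINAL_EWER = 0.05
alpha_p = 4.58e-5  # permutation-based point-wise threshold at EWER = 0.05

# "True" effective number of independent tests implied by the permutation threshold.
meff_p = round(NOMINAL_EWER / alpha_p)

pwer = {"simpleM": 4.42e-5, "Li and Ji": 5.97e-5, "Bonferroni": 2.90e-5}

# EWER implied by each point-wise threshold, and its distance from the nominal level.
ewer = {name: meff_p * a for name, a in pwer.items()}
gap = {name: abs(value - NOMINAL_EWER) for name, value in ewer.items()}

if __name__ == "__main__":
    for name in pwer:
        print(f"{name}: EWER = {ewer[name]:.3f}, |EWER - 0.05| = {gap[name]:.3f}")
```

The product of Meff_P and the point-wise threshold is the same first-order approximation that underlies the Bonferroni correction, so it is adequate at these small error rates but would overstate the EWER if the thresholds were large.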

In analyzing SNP markers, several tests have been proposed, such as the $\chi^2$ test, the allelic-based test and Armitage’s trend test. There are several suggestions for which test procedure should be used [Sasieni, 1997; Schaid and Jacobsen, 1999; Deng, 2000; Knapp, 2001; Zou, 2006]. Here, we adopted the suggestion by Sasieni [1997] to analyze by genotypes and used Armitage’s trend test. Our Meff estimation should also apply to the $\chi^2$ test based on alleles. However, given the relationship $\chi^2_G = \chi^2_a / (1 + \hat{f})$, where $\chi^2_a$ is the allele-based $\chi^2$ test statistic, $\chi^2_G$ is the trend test statistic and $\hat{f}$ is the estimated inbreeding coefficient [Sasieni, 1997; Zou, 2006], the permutation-based correction threshold, $\alpha_P$, may vary slightly with the test employed because $\hat{f}$ is unlikely to be equal from locus to locus. The $\alpha_G$ calculated here is only an approximation of the permutation-based correction threshold and not a replacement. We should be aware that the precision of the permutation-based critical value depends on the number of shuffles used. For more precise estimates, a larger number of shuffles should be performed. For example, for a critical value of 0.05, 10,000 shuffles gave a good permutation estimate on our data. However, 100,000 shuffles were required to get a relatively stable permutation estimate for a critical value of 0.01 in our tests.

There may be several limitations for the simpleM 

method. It is not uncommon to have missing data in 

SNP data sets. However, PCA requires complete 

data matrices; otherwise, the CLD correlation matrix 

may not be positive semi-definite. For the Alzheimer 

data, the missing value rate is only 0.065%. Before 

using PCA, we filled the missing values with 

inferred genotypes using the K nearest-neighbor 

method. Missing genotypes can also be filled in by 

re-genotyping. With new developments in genotyping 

technology and statistical imputation methods, a 

small number of missing values is unlikely to hinder 

simpleM’s inference. Another problem with PCA is 

that it becomes inefficient with a large number of SNPs 

(>1,000). In such situations, the eigenvalue calculation 

is unable to produce enough non-zero eigenvalues 

for either Meff G or Meff L to work well. This is 

not surprising since it was pointed out by Schäfer 

and Strimmer [2005] that a growing number of zero 

eigenvalues will be observed in situations where 

there are a small number of samples and a large 

number of variables, i.e., the usual “small n and 

large p” hurdle. In practice, when using a large 

number of SNPs, the data set has to be divided into 

smaller blocks. We tested two methods for dividing 

data sets: a simple fixed block size and Haploview 

with Gabriel et al.’s definition of haplotype blocks. 

Blocks created with Haploview resulted in a slightly 

better adjusted cutoff than with fixed blocks. This 

improvement, however, is mitigated by the fact that 

Haploview requires HWE to estimate haplotype 

frequencies. 
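As an illustration of the fixed-block option, the sketch below splits SNP indices into consecutive blocks and reports the largest number of non-zero eigenvalues that each block's sample correlation matrix can have, which is at most min(p, n - 1) for p SNPs and n individuals after centering. The helper names, the block size of 100 and the sample size of 60 are assumptions chosen for illustration, not values taken from the study.

```python
def fixed_blocks(n_snps, block_size):
    """Split SNP indices 0..n_snps-1 into consecutive fixed-size blocks."""
    return [list(range(start, min(start + block_size, n_snps)))
            for start in range(0, n_snps, block_size)]

def max_nonzero_eigenvalues(n_samples, n_snps_in_block):
    """Upper bound on the rank of a sample correlation matrix."""
    return min(n_snps_in_block, n_samples - 1)

# Hypothetical data set: 250 SNPs typed in 60 individuals.
blocks = fixed_blocks(n_snps=250, block_size=100)
print([len(block) for block in blocks])
print([max_nonzero_eigenvalues(60, len(block)) for block in blocks])
```

With fewer individuals than SNPs in a block, the bound is set by the sample size, which is the "small n and large p" situation; smaller blocks keep p below that limit.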

This study is concerned mainly with single-locus 

association tests. Here, we give some suggestions for 

two popular designs: candidate gene and genome-wide 

association studies in human genetics. In 

candidate gene association studies, we can analyze 



the SNPs all together if the eigenvalues can be 

derived. In the situations where the high dimensionality 

prohibits the calculation of eigenvalues, we can 

analyze the SNPs on each chromosome separately or 

according to the gene functions and then sum all of 

the Meff values together. The total Meff can be used to 

calculate the adjusted PWER. In genome-wide 

association studies, we have to partition the SNPs 

into several parts and analyze them separately. Since 

SNPs on different chromosomes are expected to be 

in linkage equilibrium in general populations, the 

genome-wide effective number of independent tests 

can be obtained by summing the chromosome-specific 

Meff values. For each chromosome, we may 

use the partition-ligation approach by dividing the 

SNPs into several parts, and then sum the Meff 

values from each partition, similar to how we tested 

our Alzheimer SNP data set. The total Meff is used in 

the final adjustment calculation. Due to the interblock 

correlations that are unlikely to be captured in 

this partition-ligation approach, the total Meff may 

be slightly conservative. However, the interblock 

correlations may be reduced if we partition SNPs 

according to their haplotype block structure. 
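The summation of block-wise Meff values can be sketched as follows. The sketch assumes that the eigenvalues of each block's correlation matrix are already available and that Meff for a block is the number of leading eigenvalues needed to explain a fixed share of the variance; the 0.995 default, the function names and the eigenvalues are illustrative assumptions. Because this section does not restate which adjustment formula is applied to the total Meff, both the Bonferroni and the Šidák forms are shown.

```python
def meff_from_eigenvalues(eigenvalues, var_explained=0.995):
    """Count the leading eigenvalues needed to reach the given share of variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= var_explained:
            return count
    return len(ordered)

def adjusted_thresholds(alpha, meff_total):
    """Point-wise thresholds for a family-wise level alpha and meff_total tests."""
    bonferroni = alpha / meff_total
    sidak = 1.0 - (1.0 - alpha) ** (1.0 / meff_total)
    return bonferroni, sidak

# Hypothetical eigenvalues of the correlation matrices of two SNP blocks.
block_eigenvalues = [
    [3.2, 0.79, 0.01, 0.0],   # four SNPs in strong LD
    [1.0, 1.0, 1.0],          # three uncorrelated SNPs
]
meff_total = sum(meff_from_eigenvalues(e) for e in block_eigenvalues)
print(meff_total, adjusted_thresholds(0.05, meff_total))
```

In this toy case the seven SNPs count as five effective tests, so the adjusted point-wise threshold is less strict than a correction over all seven SNPs would be.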

In summary, the simpleM algorithm provides a 

highly accurate approximation to the permutation-based 

correction threshold and is easily implemented. 

It is shown to be simple, fast and more accurate than 

recently developed methods and is comparable to the 

permutation-based correction threshold using both 

simulated and real SNP data. The efficiency and 

accuracy of the simpleM method make it an attractive 

choice for multiple testing adjustment when there is 

high intermarker LD in the SNP data set as in 

candidate gene or genome-wide association studies. 

ACKNOWLEDGMENTS 

This work was supported in part by NIH grants 

NS39764, AG019757 and AG20135 and NIEHS T32 

ES007126. We thank Dr. Gary Beecham who prepared 

the Alzheimer data for us. We thank Dr. 

Richard Morris for initial inspiration. 

REFERENCES 

Armitage P. 1955. Tests for linear trends in proportions and 

frequencies. Biometrics 11:375–386. 

Barrett JC, Fry B, Maller J, Daly MJ. 2005. Haploview: analysis and 

visualization of LD and haplotype maps. Bioinformatics 

21:263–265. 

Benjamini Y, Hochberg Y. 1995. Controlling the false discovery 

rate: a practical and powerful approach to multiple testing. J R 

Stat Soc B 57:289–300. 

Bonferroni CE. 1935. Il calcolo delle assicurazioni su gruppi di 

teste, chapter “Studi in Onore del Professore Salvatore Ortu 

Carboni”. Rome. p 13–60. 

Bonferroni CE. 1936. Teoria statistica delle classi e calcolo delle 

probabilità. Pubblicazioni del Istituto Superiore di Scienze 

Economiche e Commerciali di Firenze 8:3–62. 


Cheverud JM. 2001. A simple correction for multiple comparisons 

in interval mapping genome scans. Heredity 87:52–58. 

Churchill GA, Doerge RW. 1994. Empirical threshold values for 

quantitative trait mapping. Genetics 138:963–971. 

Deng HW. 2000. Re: “biased tests of association: comparisons of 

allele frequencies when departing from Hardy-Weinberg 

proportions”. Am J Epidemiol 151:335–336. 

Excoffier L, Slatkin M. 1995. Maximum-likelihood estimation of 

molecular haplotype frequencies in a diploid population. Mol 

Biol Evol 12:921–927. 

Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel 

B, Higgins J, DeFelice M, Lochner A, Faggart M, Liu-Cordero 

SN, Rotimi C, Adeyemo A, Cooper R, Ward R, Lander ES, Daly 

MJ, Altshuler D. 2002. The structure of haplotype blocks in the 

human genome. Science 296:2225–2229. 

Hastie T, Tibshirani R, Friedman J. 2001. The Elements of 

Statistical Learning. Berlin: Springer. 

Hoh J, Wille A, Ott J. 2001. Trimming, weighting, and grouping 

SNPs in human case-control association studies. Genome Res 

11:2115–2119. 

Hudson RR. 2002. Generating samples under a Wright-Fisher 

neutral model of genetic variation. Bioinformatics 18:337–338. 

Knapp M. 2001. Re: “biased tests of association: comparisons of 

allele frequencies when departing from Hardy-Weinberg 

proportions”. Am J Epidemiol 154:287–288. 

Li J, Ji L. 2005. Adjusting multiple testing in multilocus analyses using 

the eigenvalues of a correlation matrix. Heredity 95:221–227. 

Lin Z, Altman RB. 2004. Finding haplotype tagging SNPs by use of 

principal components analysis. Am J Hum Genet 75:850–861. 

Mardia KV, Kent JT, Bibby JM. 1979. Multivariate Analysis. 

London: Academic Press. 

Meng Z, Zaykin DV, Xu CF, Wagner M, Ehm MG. 2003. Selection 

of genetic markers for association analyses, using linkage 

disequilibrium and haplotypes. Am J Hum Genet 73:115–130. 

Nielsen DM, Ehm MG, Weir BS. 1999. Detecting marker-disease 

association by testing for Hardy-Weinberg disequilibrium at a 

marker locus. Am J Hum Genet 63:1531–1540. 

Nyholt DR. 2004. A simple correction for multiple testing for 

single-nucleotide polymorphisms in linkage disequilibrium 

with each other. Am J Hum Genet 74:765–769. 

Nyholt DR. 2005. Evaluation of Nyholt’s procedure for multiple 

testing correction—author’s reply. Hum Hered 60:61–62. 

Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich 

D. 2006. Principal components analysis corrects for stratification in 

genome-wide association studies. Nat Genet 38:904–909. 

Rinaldo A, Bacanu SA, Devlin B, Sonpar V, Wasserman L, Roeder 

K. 2005. Characterization of multilocus linkage disequilibrium. 

Genet Epidemiol 28:193–206. 

Risch N, Merikangas K. 1996. The future of genetic studies of 

complex human diseases. Science 273:1516–1517. 

Ritchie MD, Hahn LW, Roodi N, Bailey LR, Dupont WD, 

Parl FF, Moore JH. 2001. Multifactor-dimensionality reduction 

reveals high-order interactions among estrogen-metabolism 

genes in sporadic breast cancer. Am J Hum Genet 69:138–147. 

Salyakina D, Seaman SR, Browning BL, Dudbridge F, Müller-Myhsok 

B. 2005. Evaluation of Nyholt’s procedure for multiple 

testing correction. Hum Hered 60:19–25. 

Sasieni PD. 1997. From genotypes to genes: doubling the sample 

size. Biometrics 53:1253–1261. 

Schäfer J, Strimmer K. 2005. A shrinkage approach to large scale 

covariance-matrix estimation and implications for functional 

genomics. Stat Appl Genet Mol Biol 4:32. 

Schaid DJ. 2004. Linkage disequilibrium testing when linkage 

phase is unknown. Genetics 166:505–512.

Schaid DJ, Jacobsen SJ. 1999. Biased tests of association: 

comparisons of allele frequencies when departing from 

Hardy-Weinberg proportions. Am J Epidemiol 149:706–711. 

Storey JD, Taylor JE, Siegmund D. 2004. Strong control, conservative 

point estimation, and simultaneous conservative consistency of 

false discovery rates: A unified approach. J R Stat Soc B 66:187–205. 

R Development Core Team. 2007. R: A Language and Environment for Statistical 

Computing. Vienna, Austria: R Foundation for Statistical 

Computing, ISBN 3-900051-07-0. 

Šidák Z. 1967. Rectangular confidence regions for the means of 

multivariate normal distributions. J Am Stat Assoc 62:626–633. 

Wall JD, Pritchard JK. 2003. Assessing the performance of the 

haplotype block model of linkage disequilibrium. Am J Hum 

Genet 73:502–515. 

Weir BS. 1979. Inferences about linkage disequilibrium. Biometrics 

35:235–254. 

Weir BS. 1996. Genetic Data Analysis, vol. II. Sunderland, MA: 

Sinauer Associates Inc. 

Weir BS, Hill WG, Cardon LR. 2004. Allelic association patterns for 

a dense SNP map. Genet Epidemiol 27:442–450. 

Wittke-Thompson JK, Pluzhnikov A, Cox NJ. 2005. Rational 

inferences about departures from Hardy-Weinberg equilibrium. 

Am J Hum Genet 76:967–986. 

Zaykin DV. 2004. Bounds and normalization of the composite 

linkage disequilibrium coefficient. Genet Epidemiol 

27:252–257. 

Zaykin DV, Meng Z, Ehm MG. 2006. Contrasting linkage-disequilibrium 

patterns between cases and controls as a 

novel association-mapping method. Am J Hum Genet 

78:737–746. 

Zou GY. 2006. Statistical methods for the analysis of genetic 

association studies. Ann Hum Genet 70:262–276. 

APPENDIX 

CALCULATING PAIR-WISE CLD CORRELATION FOR BIALLELIC SNPS 

We first transform the genotypes into numerical 

coding as 

$$
\text{numerical coding} =
\begin{cases}
0 & \text{if the genotype is variant type allele homozygote;} \\
1 & \text{if the genotype is heterozygote;} \\
2 & \text{if the genotype is wild type allele homozygote.}
\end{cases}
$$

A pair of SNPs, A and B, are represented by two 

vectors, x and y. Denote the number of individuals 

as n, and let $n_{uv}$ be the number of individuals who carry 

the uv genotypes. 

The covariance between x and y is 

$$
\begin{aligned}
\mathrm{cov}(x, y) &= \frac{1}{n}\sum x_i y_i - \frac{1}{n^2}\sum x_i \sum y_i \\
&= \frac{1}{n}(2n_{AaBB} + n_{AaBb} + 4n_{AABB} + 2n_{AABb}) - \frac{1}{n^2}(n_{Aa} + 2n_{AA})(n_{Bb} + 2n_{BB}),
\end{aligned}
$$

and the CLD coefficient is 

$$
\begin{aligned}
\Delta_{AB} &= P_{AB} + P_{A/B} - 2p_A p_B \\
&= 2P^{AB}_{AB} + P^{AB}_{Ab} + P^{AB}_{aB} + \frac{1}{2}\left(P^{AB}_{ab} + P^{Ab}_{aB}\right) - 2p_A p_B \\
&= \frac{2n_{AABB}}{n} + \frac{n_{AABb}}{n} + \frac{n_{AaBB}}{n} + \frac{1}{2}\,\frac{n_{AaBb}}{n} - 2\,\frac{2n_{AA} + n_{Aa}}{2n}\cdot\frac{2n_{BB} + n_{Bb}}{2n} \\
&= \frac{1}{2n}(4n_{AABB} + 2n_{AABb} + 2n_{AaBB} + n_{AaBb}) - \frac{1}{2n^2}(2n_{AA} + n_{Aa})(2n_{BB} + n_{Bb}).
\end{aligned}
$$

Therefore, $\mathrm{cov}(x, y) = 2\Delta_{AB}$. 

We know that $p_A = (2n_{AA} + n_{Aa})/2n$ and 

$p_B = (2n_{BB} + n_{Bb})/2n$, and then the variance of x is 

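$$
\begin{aligned}
\mathrm{var}(x) &= \frac{1}{n}\sum x_i^2 - \left(\frac{\sum x_i}{n}\right)^2 \\
&= \frac{1}{n}(n_{Aa} + 4n_{AA}) - \left(\frac{n_{Aa} + 2n_{AA}}{n}\right)^2 \\
&= \frac{n_{Aa} + 2n_{AA}}{n} + \frac{2n_{AA}}{n} - \left(\frac{n_{Aa} + 2n_{AA}}{n}\right)^2 \\
&= 2p_A + 2P_{AA} - 4p_A^2 \\
&= 2[p_A(1 - p_A) + P_{AA} - p_A^2] = 2[p_A(1 - p_A) + D_A].
\end{aligned}
$$

Similarly, the variance of y is 

$$
\mathrm{var}(y) = 2[p_B(1 - p_B) + D_B].
$$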

Therefore, the CLD correlation given by Weir [1979, 

2004], 

$$
r_{AB} = \frac{\Delta_{AB}}{\sqrt{(p_A(1 - p_A) + D_A)(p_B(1 - p_B) + D_B)}},
$$

can be computed from the 0, 1 and 2 genotype 

numerical coding as the correlation 

$$
r_{AB} = \frac{\mathrm{cov}(x, y)}{\sqrt{\mathrm{var}(x)\,\mathrm{var}(y)}}.
$$

The CLD correlation can be calculated simply by 

using the R function cor(). 
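The derivation can be checked numerically. The following sketch uses plain Python in place of R's cor() and made-up genotypes for six individuals; the function names are hypothetical. It codes genotypes as the number of wild-type alleles, computes the CLD coefficient from the two-locus genotype counts, and compares it with the covariance of the coded vectors.

```python
import math

def code_genotype(genotype, wild_allele):
    """Count copies of the wild-type allele: 2, 1 or 0."""
    return sum(1 for allele in genotype if allele == wild_allele)

def mean(values):
    return sum(values) / len(values)

def cov(x, y):
    """Covariance with divisor n, as in the derivation above."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def cld_from_counts(x, y):
    """Composite LD coefficient from two-locus genotype counts."""
    n = len(x)
    pairs = list(zip(x, y))
    n_AABB = pairs.count((2, 2))
    n_AABb = pairs.count((2, 1))
    n_AaBB = pairs.count((1, 2))
    n_AaBb = pairs.count((1, 1))
    p_A = sum(x) / (2 * n)
    p_B = sum(y) / (2 * n)
    return (4 * n_AABB + 2 * n_AABb + 2 * n_AaBB + n_AaBb) / (2 * n) - 2 * p_A * p_B

# Hypothetical genotypes for six individuals at SNPs A and B.
snp_a = ["AA", "Aa", "aa", "AA", "Aa", "AA"]
snp_b = ["BB", "Bb", "bb", "Bb", "bb", "BB"]
x = [code_genotype(g, "A") for g in snp_a]
y = [code_genotype(g, "B") for g in snp_b]

delta_ab = cld_from_counts(x, y)
r_ab = cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))
print(cov(x, y), 2 * delta_ab, r_ab)
```

The covariance equals twice the CLD coefficient, as derived. The divisor (n or n - 1) cancels in the ratio, so a standard Pearson correlation routine should return the same value of r.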
