

Buried treasures

measurement technology and on different tissues. The findings lead to a new hypothesis about how HPV+/− tumors differentially deregulate the cell-cycle processes during tumorigenesis as well as to biomarkers for HPV-associated cancers (Pyeon et al., 2011). Figure 36.1 shows a summary of gene-level differential expression scores between HPV+ and HPV− cancers (so-called log fold changes), for all genes in the genome (left), as well as for $m = 99$ genes from a cell-cycle regulatory pathway.

A key statistical issue in this case was how to standardize a sample variance statistic. The gene-level data were first reduced to the log-scale fold change between HPV+ and HPV− cell types; these $x_g$, for genes $g$, were then considered fixed in subsequent calculations. For a known functional category $c \subseteq \{1, \ldots, G\}$ of size $m$, the statistic $u(x, c)$ measured the sample variance of the $x_g$'s within $c$. This statistic was standardized by imagining the distribution of $u(x, C)$, for random sets $C$, considered to be drawn uniformly from among all $\binom{G}{m}$ possible size-$m$ subsets of the genome. Well, forgetting about all the genomics, the statistical question concerned the distribution of the sample variance in without-replacement finite-population sampling; in particular, I needed an expected value and variance of $u(x, C)$ under this sampling.

Not being especially well versed in the findings of finite-population sampling, I approached these moment questions from first principles and with a novice's vigor, figuring that something simple was bound to emerge. I did not make much progress on the variance of $u(x, C)$, but was delighted to discover a beautiful solution in Tukey (1950, p. 517), which had been developed far from the context of genomics and which was not widely cited. Tukey's buried treasure used so-called K functions, which are set-level statistics whose expected value equals the same statistic computed on the whole population.
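The defining property of a K function can be checked by brute force on a small population. The sketch below is only an illustration, not the calculation used in the chapter: the values in `x`, the subset size `m`, and the function names are hypothetical. It enumerates every size-$m$ subset, and the average of the subset sample variances matches the sample variance of the whole population (with the usual $n - 1$ divisor). The variance of $u(x, C)$ is obtained here by enumeration rather than from Tukey's closed form, which is feasible only for small $G$.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pvariance, variance


def subset_variance_moments(x, m):
    """Exact mean and variance of u(x, C) over all size-m subsets C.

    Brute-force enumeration: only practical for small populations.
    """
    # statistics.variance uses the n - 1 divisor (sample variance).
    u_values = [variance(subset) for subset in combinations(x, m)]
    # Every subset is equally likely, so these are the exact moments.
    return mean(u_values), pvariance(u_values)


def standardized_score(x, c, m_moments):
    """Standardize u(x, c) for an index set c using the enumerated moments."""
    expected_u, variance_u = m_moments
    u_observed = variance([x[g] for g in c])
    return (u_observed - expected_u) / sqrt(variance_u)


# Hypothetical log fold changes for a toy "genome" of G = 8 genes.
x = [0.3, -1.2, 0.8, 2.1, -0.4, 0.0, 1.5, -0.9]
m = 3

moments = subset_variance_moments(x, m)
expected_u, variance_u = moments

# Expected subset variance equals the whole-population sample variance.
print(expected_u, variance(x))

# Standardized score for one hypothetical category c = {0, 3, 6}.
z = standardized_score(x, (0, 3, 6), moments)
print(z)
```

For a genome-scale $G$ the enumeration is out of reach, which is why a closed-form variance such as the one cited above matters in practice.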
Subsequently I learned that earlier R.A. Fisher had also derived this variance; see also Cho et al. (2005). In any case, I was glad to have gained some insight from Tukey's general framework.

36.1.2 Bootstrapping and rank statistics

Researchers were actively probing the limits of bootstrap theory when I began my statistics career. A case of interest concerned generalized bootstrap means. From a real-valued random sample $X_1, \ldots, X_n$, one studied the conditional distribution of the randomized statistic

$$\bar{X}_n^W = \frac{1}{n} \sum_{i=1}^{n} W_{n,i} X_i,$$

conditional on the data $X_i$, and where the random weights $W_{n,i}$ were generated by the statistician to enable the conditional distribution of $\bar{X}_n^W$ to approximate the marginal sampling distribution of $\bar{X}_n$. Efron's bootstrap corresponds to weights having a certain multinomial distribution, but indications were that useful approximations were available beyond the multinomial.
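A minimal sketch of the weighted mean above, under stated assumptions: the multinomial weights are taken to be the counts from $n$ equally likely draws (the usual resampling form of Efron's bootstrap), and the Dirichlet-based alternative is included only as one example of a non-multinomial weight scheme; it is not named in the text. All function names and the data are hypothetical.

```python
import random
from collections import Counter


def multinomial_weights(n, rng):
    """Counts from n draws over n equally likely cells (weights sum to n)."""
    counts = Counter(rng.randrange(n) for _ in range(n))
    return [counts.get(i, 0) for i in range(n)]


def dirichlet_weights(n, rng):
    """n times a flat Dirichlet vector, built from normalized exponentials.

    An assumed example of non-multinomial weights; also sums to n.
    """
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [n * d / total for d in draws]


def weighted_mean(x, w):
    """(1/n) * sum_i W_{n,i} X_i, matching the displayed statistic."""
    return sum(wi * xi for wi, xi in zip(w, x)) / len(x)


def bootstrap_means(x, weight_fn, reps, seed=0):
    """Draw `reps` weighted means, holding the data x fixed."""
    rng = random.Random(seed)
    n = len(x)
    return [weighted_mean(x, weight_fn(n, rng)) for _ in range(reps)]


# Hypothetical data; in practice x would be the observed sample.
x = [2.1, 3.4, 1.9, 5.0, 4.2, 2.8]
efron_draws = bootstrap_means(x, multinomial_weights, reps=2000)
smooth_draws = bootstrap_means(x, dirichlet_weights, reps=2000)
```

Because the data stay fixed and only the weights are random, the spread of `efron_draws` or `smooth_draws` is a draw-based picture of the conditional distribution described in the text.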
