M. van der Laan

should also be concerned with data generated by single experiments that have a lot of structure. This requires the development of TMLE for such statistical models, integration of the state of the art in weak convergence theory for dependent data, and advances in computing due to the additional complexity of estimators and statistical inference in such data-generating experiments. We refer to a few recent examples of targeted learning in adaptive designs, and of estimating effects of interventions on a single network of individuals; see van der Laan (2008), Chambaz and van der Laan (2010, 2011a,b), van der Laan (2012), and van der Laan et al. (2012).

40.6.3 Big Data

Targeted learning involves super-learning, complex targeted update steps, evaluation of an often complex estimand, and estimation of the asymptotic variance of the estimator. In addition, since the estimation is tailored to each question separately, assessing, for example, the effect of a variable (such as the effect of a DNA mutation on a phenotype) across a large collection of variables requires repeating these computer-intensive estimation procedures many times. Even for normal-size data sets, such analyses can already be computationally very challenging.

However, nowadays many applications involve gigantic data sets. For example, one might collect complete genomic profiles on each individual, so that one collects hundreds of thousands or even millions of measurements on one individual, possibly at various time points. In addition, there are various initiatives to build large comprehensive databases, such as the Sentinel project, which builds a database covering all American citizens that is used to evaluate drug safety issues. Such data sets cover hundreds of millions of individuals.
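The super-learning step mentioned above can be illustrated with a minimal cross-validated ensemble. This is only a sketch, not the chapter's implementation: the two base learners (a grand mean and ordinary least squares), the simulated data, and the grid search over convex weights are all assumptions made for the example.

```python
# Minimal super-learner sketch (illustrative assumptions, not the chapter's code):
# combine two base learners by the convex weight minimizing cross-validated MSE.
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

def fit_mean(Xtr, ytr):
    # Base learner 1: predict the training-set grand mean everywhere.
    m = ytr.mean()
    return lambda Xte: np.full(len(Xte), m)

def fit_ols(Xtr, ytr):
    # Base learner 2: ordinary least squares with an intercept.
    A = np.column_stack([np.ones(len(Xtr)), Xtr])
    beta, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    return lambda Xte: np.column_stack([np.ones(len(Xte)), Xte]) @ beta

learners = [fit_mean, fit_ols]

# Cross-validated predictions for each learner (V = 5 folds).
V = 5
folds = np.array_split(np.arange(n), V)
cv_pred = np.zeros((n, len(learners)))
for fold in folds:
    train = np.setdiff1d(np.arange(n), fold)
    for j, fit in enumerate(learners):
        cv_pred[fold, j] = fit(X[train], y[train])(X[fold])

# Choose the convex combination with the smallest cross-validated risk.
grid = np.linspace(0.0, 1.0, 101)
losses = [np.mean((y - (w * cv_pred[:, 0] + (1 - w) * cv_pred[:, 1])) ** 2)
          for w in grid]
w_best = grid[int(np.argmin(losses))]
print(f"weight on mean learner: {w_best:.2f}")
```

Repeating this entire procedure once per variable in a genomic screen, followed by a targeted update and variance estimation for each, is what makes the workflow so computationally demanding at scale.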
Many companies are involved in analyzing data on the internet, which can result in data sets with billions of records.

40.7 Concluding remarks

The biggest mistake we can make is to give up on sound statistics and be satisfied with the application of algorithms that can handle these data sets in one way or another, without addressing a well-defined statistical estimation problem. As we have seen, the genomic era has resulted in an erosion of sound statistics, and as a counterforce many advocate applying only very simple statistics such as sample means and univariate regressions. Neither approach is satisfying, and fortunately it is not necessary to give up on sound and complex statistical estimation procedures targeting interesting questions. Instead, we need to integrate more fully with computer science, train our students in software that can handle these immense computational and
