
43
Features of Big Data and sparsest solution in high confidence set

Jianqing Fan
Department of Operations Research and Financial Engineering
Princeton University, Princeton, NJ

This chapter summarizes some of the unique features of Big Data analysis. These features are shared neither by low-dimensional data nor by small samples. Big Data pose new computational challenges and hold great promise for understanding population heterogeneity, as in personalized medicine or services. High dimensionality introduces spurious correlations, incidental endogeneity, noise accumulation, and measurement error. These features are highly distinctive, and statistical procedures should be designed with them in mind. To illustrate, a method called the sparsest solution in a high-confidence set is introduced, which is generally applicable to high-dimensional statistical inference. This method, whose properties are briefly examined, is natural because the information about the parameters contained in the data is summarized by high-confidence sets, and the sparsest solution is a way to deal with the noise accumulation issue.

43.1 Introduction

The first decade of this century has seen an explosion of data collection in this age of information and technology. The technological revolution has made information acquisition easy and cheap through automated data collection processes. Massive data and high dimensionality characterize many contemporary statistical problems, from the biomedical sciences to engineering and the social sciences. For example, in disease classification using microarray or proteomics data, tens of thousands of expressions of molecules or proteins are potential predictors; in genome-wide association studies, hundreds of thousands of SNPs are potential covariates; in machine learning, tens of thousands of features are extracted from documents, images and other objects; in spatial-temporal