
We eventually developed methods which were state-of-the-art in the field, but the main lesson we drew was that just using patch marginal distributions (a procedure sometimes known as naive Bayes) worked better than trying to estimate joint distributions of patch pixels.

The texture problem was too complex to analyze further, so we turned to a simpler problem in which explicit asymptotic comparisons could be made: classifying a new p-dimensional multivariate observation into one of two unknown Gaussian populations with equal covariance matrices, on the basis of a sample of n observations from each of the two populations (Bickel and Levina, 2004). In this context we compared the performance of

(i) Fisher's linear discriminant function;

(ii) Naive Bayes: replace the empirical covariance matrix in Fisher's function by the diagonal matrix of estimated variances, and proceed as usual.

We found that if the means and covariance matrices range over a sparsely approximable set and we let p increase with n, so that p/n → ∞, then Fisher's rule (using the Moore–Penrose inverse) performed no better than random guessing, while naive Bayes performed well, though not optimally, as long as n⁻¹ log p → 0.

The reason for this behavior was that, with Fisher's rule, we were unnecessarily trying to estimate too many covariances. These results led us, Levina and me with coworkers (Bickel and Levina, 2008), to study a number of methods for estimating covariance matrices optimally under sparse approximation assumptions. Others, such as Cai et al. (2010), established minimax bounds on possible performance.

At the same time as this work there was a sharp rise of activity in trying to understand sparsity in the linear model with many predictors, and a number of important generalizations of the lasso were proposed and studied, such as the group lasso and the elastic net. I had, despite appearing as first author, at most a supporting part in this endeavor on a paper with Ritov and Tsybakov (Bickel et al., 2009) in which we showed the equivalence of a procedure introduced by Candès and Tao, the Dantzig selector, with the more familiar lasso.

Throughout this period I was (and continue to be) interested in semiparametric models and methods. An example I was pleased to work on with my then student, Aiyou Chen, was Independent Component Analysis, a methodology arising in electrical engineering, which had some clear advantages over classical PCA (Chen and Bickel, 2006). Reconciling ICA and an extension with sparsity and high dimension is a challenge I'm addressing with another student.

A more startling and important analysis is one that is joint with Bin Yu, several students, and Noureddine El Karoui (Bean et al., 2013; El Karoui et al., 2013), whose result appears in PNAS. We essentially studied robust regression when p/n → c for some c ∈ (0, 1), and showed that, contrary to
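The contrast between the two rules in (i) and (ii) is easy to reproduce in simulation. The sketch below is not from Bickel and Levina (2004); the sample size, dimension, sparse mean shift, and identity covariance are illustrative choices made here. It simply implements Fisher's rule with the Moore–Penrose inverse of the pooled sample covariance alongside the naive Bayes variant that keeps only the diagonal, so that the degradation of the former when p greatly exceeds n can be observed directly.

```python
# Minimal sketch: Fisher's linear discriminant (Moore–Penrose inverse) vs.
# naive Bayes (diagonal covariance) when p is large relative to n.
# Dimensions and the sparse mean difference are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 400                       # far more features than observations per class
mu = np.zeros(p)
mu[:20] = 0.6                        # sparse mean shift separating the two classes

def simulate(m, mean):
    # Draw m observations from N(mean, I_p).
    return rng.normal(loc=mean, scale=1.0, size=(m, p))

X0, X1 = simulate(n, np.zeros(p)), simulate(n, mu)
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
pooled = np.cov(np.vstack([X0 - m0, X1 - m1]), rowvar=False)   # pooled sample covariance

def classify(x, sigma_inv):
    # Fisher-type rule: assign class 1 when the linear discriminant is positive.
    w = sigma_inv @ (m1 - m0)
    return ((x - (m0 + m1) / 2) @ w > 0).astype(int)

fisher_inv = np.linalg.pinv(pooled)                              # Moore–Penrose inverse (singular when p > n)
naive_inv = np.diag(1.0 / np.maximum(np.diag(pooled), 1e-12))    # diagonal of estimated variances only

Xtest = np.vstack([simulate(500, np.zeros(p)), simulate(500, mu)])
ytest = np.repeat([0, 1], 500)
for name, S in [("Fisher (pinv)", fisher_inv), ("naive Bayes (diagonal)", naive_inv)]:
    err = np.mean(classify(Xtest, S) != ytest)
    print(f"{name}: test error {err:.2f}")
```

With these settings the pseudoinverse rule hovers near the 0.5 error of random guessing, while the diagonal rule stays well below it, which is the qualitative phenomenon described above.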
