J. Fan

FIGURE 43.5: Scatter plot of projections of observed data (n = 100 from each class) onto the first two principal components of the m-dimensional selected feature space. Panels: (a) m = 2, (b) m = 100, (c) m = 500, (d) m = 4500.

43.7.1 A general setup

We now elaborate the idea. Assume that the Big Data are collected in the form (X₁, Y₁), ..., (Xₙ, Yₙ), which can be regarded as a random sample from the population (X, Y). We wish to find an estimate of the sparse vector β₀ ∈ ℝᵖ that minimizes L(β) = E{L(X⊤β, Y)}, in which the loss function is assumed convex in the first argument, so that L(β) is convex. The setup encompasses the generalized linear models (McCullagh and Nelder, 1989), with L(θ, y) = b(θ) − θy under the canonical link, where b(θ) is a model-dependent convex function; robust regression, with L(θ, y) = |y − θ|; the hinge loss L(θ, y) = (1 − θy)₊ in the support vector machine (Vapnik, 1999); and the exponential loss L(θ, y) = exp(−θy) in AdaBoost (Freund and Schapire, 1997; Breiman, 1998) for classification, in which y takes values ±1; among others. Let

    Lₙ(β) = (1/n) ∑ᵢ₌₁ⁿ L(Xᵢ⊤β, Yᵢ)

be the empirical loss and L′ₙ(β) be its gradient. Given that L′(β₀) = 0, a natural confidence set is of the form

    Cₙ = {β ∈ ℝᵖ : ‖L′ₙ(β)‖∞ ≤ γₙ}.
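As a concrete illustration of this setup, the sketch below instantiates the empirical loss Lₙ(β), its gradient L′ₙ(β), and the sup-norm membership check defining Cₙ for one of the examples above: logistic regression, a canonical GLM with L(θ, y) = b(θ) − θy and b(θ) = log(1 + exp(θ)), y ∈ {0, 1}. The function names, toy data, and the threshold γₙ = 1.0 are illustrative choices, not from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_loss(beta, X, Y):
    """L_n(beta) = (1/n) * sum_i L(X_i' beta, Y_i) with the canonical
    logistic loss L(theta, y) = log(1 + exp(theta)) - theta * y."""
    theta = X @ beta
    return np.mean(np.logaddexp(0.0, theta) - theta * Y)

def empirical_gradient(beta, X, Y):
    """Gradient L'_n(beta) = (1/n) * X' (b'(X beta) - Y),
    where b'(theta) is the logistic sigmoid."""
    theta = X @ beta
    return X.T @ (1.0 / (1.0 + np.exp(-theta)) - Y) / len(Y)

def in_confidence_set(beta, X, Y, gamma_n):
    """Membership test for C_n = {beta : ||L'_n(beta)||_inf <= gamma_n}."""
    return np.max(np.abs(empirical_gradient(beta, X, Y))) <= gamma_n

# Toy data: n = 200 observations, p = 5 features, sparse beta_0.
n, p = 200, 5
beta0 = np.array([1.5, -2.0, 0.0, 0.0, 0.0])
X = rng.standard_normal((n, p))
Y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta0))).astype(float)

grad = empirical_gradient(beta0, X, Y)
print("||L'_n(beta_0)||_inf =", np.max(np.abs(grad)))
```

At the true sparse β₀ the population gradient vanishes, so ‖L′ₙ(β₀)‖∞ is small (of stochastic order n^{-1/2} per coordinate here), and β₀ falls inside Cₙ for a suitably chosen γₙ; that is precisely what makes the sup-norm constraint a natural confidence set.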
