… defines a Euclidean distance measure. Advice: think about how you measure distance or dissimilarity in any problem involving pairwise relationships; it can be important.

41.2 Regularization methods, RKHS and sparse models

The optimization problems in RKHS are a rich subclass of what can be called regularization methods, which solve an optimization problem that trades fit to the data against complexity or constraints on the solution. My first encounter with the term "regularization" was Tikhonov (1963), in the context of finding numerical solutions to integral equations. There the $L_i$ of (41.2) were noisy integrals of an unknown function one wishes to reconstruct, but the observations contained only a limited amount of information regarding the unknown function. The basic, and possibly revolutionary, idea at the time was to find a solution that fits the data while constraining the solution by what amounted to an RKHS seminorm, $\int \{f''(t)\}^2\,dt$, standing in for the missing information through the assumption that the solution is "smooth" (O'Sullivan, 1986; Wahba, 1977). Where once RKHS were a niche subject, they are now a major component of the statistical model building/machine learning literature. (A small numerical sketch of this kind of smoothness-penalized fitting is given at the end of this section.)

However, RKHS do not generally provide sparse models, that is, models in which a large number of coefficients are estimated but only a small, unknown number are believed to be non-zero. Many problems in the "Big Data" paradigm are believed to have, or are desired to have, sparse solutions; for example, genetic data vectors may have many thousands of components while the number of subjects is modest, as in a case-control study. The most popular method for ensuring sparsity is probably the lasso (Chen et al., 1998; Tibshirani, 1996). Here a very large dictionary of basis functions $(B_j(t),\ j = 1, 2, \ldots)$ is given and the unknown function is estimated as $f(t) = \sum_j \beta_j B_j(t)$, with the penalty functional $\lambda \sum_j |\beta_j|$ replacing an RKHS squared norm. This induces many zeroes among the $\beta_j$, depending, among other things, on the size of $\lambda$. (A minimal lasso sketch also follows below.) Since then, researchers have commented that there is a "zoo" of proposed variants of sparsity-inducing penalties, many involving assumptions on structure in the data; one popular example is Yuan and Lin (2006). Other recent models involve mixtures of RKHS and sparsity-inducing penalty functionals. One of our contributions to this "zoo" deals with the situation where the data vectors amount to very large "bar codes," and it is desired to find patterns in the bar codes relevant to some outcome. An innovative algorithm that handles a humongous number of interacting patterns, assuming that only a small number of coefficients are non-zero, is given in Shi et al. (2012), Shi et al. (2008) and Wright (2012).
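As promised above, here is a minimal numerical sketch of the smoothness-penalized fitting idea. It is not the original integral-equation setting of Tikhonov (1963); instead it assumes the simpler problem of recovering a function from noisy evaluations $y_i = f(t_i) + \varepsilon_i$ on a grid, with the penalty $\int \{f''(t)\}^2\,dt$ approximated by a discrete second-difference operator. The test function, noise level, and values of $\lambda$ are illustrative choices, not taken from the chapter.

```python
# Sketch: minimize  ||y - f||^2 + lam * ||D2 f||^2  over grid values f,
# where D2 is the second-difference matrix approximating the seminorm
# \int {f''(t)}^2 dt.  The minimizer solves (I + lam * D2'D2) f = y.
import numpy as np

def smooth_fit(y, lam):
    """Penalized least-squares fit with a discrete smoothness penalty."""
    n = len(y)
    # (D2 f)_i = f_i - 2 f_{i+1} + f_{i+2}: discrete second derivative
    D2 = np.diff(np.eye(n), n=2, axis=0)
    A = np.eye(n) + lam * D2.T @ D2
    return np.linalg.solve(A, y)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal(t.size)  # noisy data

f_rough = smooth_fit(y, lam=1.0)    # small lam: follows the data closely
f_smooth = smooth_fit(y, lam=1e4)   # large lam: heavily smoothed solution
```

The parameter `lam` plays the role of $\lambda$ above: it trades fidelity to the noisy observations against the smoothness constraint that stands in for the missing information.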

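And here is the minimal lasso sketch referred to above. It implements the penalized criterion $\tfrac{1}{2}\|y - X\beta\|^2 + \lambda \sum_j |\beta_j|$ from the text by coordinate descent with soft thresholding, which is one standard algorithm for the lasso (not the specific algorithm of Shi et al.). The simulated design, true coefficient vector, and value of $\lambda$ are illustrative assumptions.

```python
# Sketch: lasso via coordinate descent.  Each coordinate update is a
# soft-thresholded least-squares step, which is what produces exact
# zeroes in beta, depending on the size of lam.
import numpy as np

def soft_threshold(z, gamma):
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)            # precompute ||x_j||^2
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove all other coordinates' fit
            r_j = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r_j, lam) / col_sq[j]
    return beta

rng = np.random.default_rng(1)
n, p = 100, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]             # sparse truth: 3 non-zeros
y = X @ beta_true + 0.5 * rng.standard_normal(n)

beta_hat = lasso_cd(X, y, lam=5.0)
print("indices of non-zero estimates:", np.flatnonzero(beta_hat))
```

With a suitably chosen `lam`, the estimate recovers a small set of non-zero coefficients out of the fifty estimated, which is the sparsity property that distinguishes the $\lambda \sum_j |\beta_j|$ penalty from an RKHS squared norm.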