Rise of the machines

44.4.2 Case study II: Statistical topology

Computational topologists and researchers in machine learning have developed methods for analyzing the shape of functions and data. Here I'll briefly review some of our work on estimating manifolds (Genovese et al., 2012b,a,c).

Suppose that $M$ is a manifold of dimension $d$ embedded in $\mathbb{R}^D$. Let $X_1, \ldots, X_n$ be a sample from a distribution $P$ supported on $M$. We observe
$$
Y_i = X_i + \epsilon_i, \qquad i \in \{1, \ldots, n\}, \tag{44.4}
$$
where $\epsilon_1, \ldots, \epsilon_n \sim \Phi$ are noise variables.

Machine learning researchers have derived many methods for estimating the manifold $M$. But this leaves open an important statistical question: how well do these estimators work? One approach to answering this question is to find the minimax risk under some loss function. Let $\hat{M}$ be an estimator of $M$. A natural loss function for this problem is the Hausdorff loss
$$
H(M, \hat{M}) = \inf\bigl\{ \epsilon : M \subset \hat{M} \oplus \epsilon \ \text{and}\ \hat{M} \subset M \oplus \epsilon \bigr\}, \tag{44.5}
$$
where $A \oplus \epsilon$ denotes the union of all balls of radius $\epsilon$ centered at points of $A$.

Let $\mathcal{P}$ be a set of distributions. The parameter of interest is $M = \mathrm{support}(P)$, which we assume is a $d$-dimensional manifold. The minimax risk is
$$
R_n = \inf_{\hat{M}} \sup_{P \in \mathcal{P}} \mathbb{E}_P\bigl[ H(\hat{M}, M) \bigr]. \tag{44.6}
$$
Of course, the risk depends on what conditions we assume on $M$ and on the noise $\Phi$.

Our main findings are as follows. When there is no noise, so the data fall on the manifold, we get $R_n \asymp n^{-2/d}$. When the noise is perpendicular to $M$, the risk is $R_n \asymp n^{-2/(2+d)}$. When the noise is Gaussian, the rate is $R_n \asymp 1/\log n$. The latter is not surprising when one considers the similar problem of estimating a function when there are errors in variables.

The implication for machine learning is that the best these algorithms can do depends strongly on the type of noise.

How do we actually estimate these manifolds in practice? In Genovese et al. (2012c) we take the following point of view: if the noise is not too large, then the manifold should be close to a $d$-dimensional hyper-ridge of the density $p(y)$ for $Y$.
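As a concrete illustration (not part of the original analysis), the Hausdorff loss in (44.5) can be computed directly for finite point clouds, where the dilation-based definition reduces to a max-min over pairwise distances. A minimal NumPy sketch, using sample points on the unit circle as a stand-in for a 1-manifold in $\mathbb{R}^2$:

```python
import numpy as np

def hausdorff(A, B):
    """Hausdorff distance between finite point sets A, B of shape (n, D).

    For finite sets, the dilation-based definition in (44.5) reduces to
    max( max_a min_b |a - b|, max_b min_a |b - a| ).
    """
    # Pairwise Euclidean distances, shape (len(A), len(B)).
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

# Points on the unit circle (a 1-manifold in R^2) and a noisy copy.
t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
M = np.column_stack([np.cos(t), np.sin(t)])
rng = np.random.default_rng(0)
M_hat = M + 0.05 * rng.standard_normal(M.shape)
print(hausdorff(M, M_hat))  # roughly the scale of the added noise
```

The brute-force pairwise matrix is $O(nm)$ in memory; for large clouds a KD-tree nearest-neighbor query would be the usual substitute.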
Ridge finding is an extension of mode finding, which is a common task in computer vision. Let $p$ be a density on $\mathbb{R}^D$. Suppose that $p$ has $k$ modes $m_1, \ldots, m_k$. An integral curve, or path of steepest ascent, is a path $\pi : \mathbb{R} \to \mathbb{R}^D$ such that
$$
\pi'(t) = \frac{d\pi(t)}{dt} = \nabla p\{\pi(t)\}. \tag{44.7}
$$
Under weak conditions, the paths $\pi$ partition the space and are disjoint except at the modes (Irwin, 1980; Chacón, 2012).
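To make the steepest-ascent idea in (44.7) concrete: a standard way to follow these paths for an estimated density is the mean-shift algorithm, whose iterates move along the gradient of a kernel density estimate and converge to one of its modes. The sketch below uses synthetic two-cluster data (an assumption for illustration, not data from the cited papers):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic sample in R^2: two Gaussian clusters, so the density
# estimate has two modes, near (-2, 0) and (2, 0).
data = np.vstack([
    rng.normal([-2.0, 0.0], 0.3, size=(100, 2)),
    rng.normal([2.0, 0.0], 0.3, size=(100, 2)),
])

def mean_shift(y0, data, h=0.3, iters=100):
    """Mean-shift iteration with a Gaussian kernel of bandwidth h.

    Each update moves y in the direction of the gradient of the kernel
    density estimate, so the iterates trace a discretized path of
    steepest ascent (44.7) and converge to a mode of the estimate.
    """
    y = np.asarray(y0, dtype=float)
    for _ in range(iters):
        w = np.exp(-np.sum((data - y) ** 2, axis=1) / (2.0 * h ** 2))
        y = (w[:, None] * data).sum(axis=0) / w.sum()
    return y

print(mean_shift([-1.5, 0.5], data))  # ends near the left mode
print(mean_shift([1.0, -1.0], data))  # ends near the right mode
```

Starting points in different basins of attraction reach different modes, which is exactly the partition of the space by integral curves described above.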
