
1 PROJECTS UNDERTAKEN

The goal is to find β from knowledge of the received string Y, the dictionary X, and the set B. The main difference between this and the traditional high-dimensional regression model, analyzed extensively in the statistical literature, is the added knowledge of the set B of choices of β. In particular, we arrange each β in B to be a sparse vector consisting of L non-zeroes with ‖β‖₂² = P, where P represents the signal strength. Here ‖·‖₂ is the ℓ₂-norm. Details of our construction of the set B of choices of β are described in Figure 1(a).

[Figure 1 about here: panel (a) shows the matrix X with sections labelled Section 1, Section 2, …, Section L, each of M columns, together with the coefficient vector β; panel (b) plots g(x) against x on [0, 1], with annotations B = 2^16, L = B, P/σ² = 7, R = 0.495C, No. of steps = 16.]

Figure 1: (a) Schematic rendering of the dictionary matrix X_{n×N} and coefficient vector β. We assume N = LM and partition the X matrix into L sections, each of size M. The set B of choices for β is then assumed to consist of all vectors with one non-zero, of known value √P_l, in section l, for l = 1, …, L. The quantities P_l are free to be chosen by the user, subject to the constraint P_1 + … + P_L = P, where P represents the signal strength. The vertical bars in the X matrix indicate the selected columns from a section. (b) Plot demonstrating the progression of the algorithm. The dots indicate the proportion of correct detections after a particular number of steps. For example, the y-axis coordinate of the first dot (from the left) represents the proportion of correct detections after the first step.

The above can also be viewed as a problem of multiple hypothesis testing. Indeed, given Y, one has to select the correct β (namely β*) among the elements in B. There are information-theoretic limits, which give lower bounds on the sample size n, as a function of the sparsity L, the dimension N, and the signal-to-noise ratio P/σ², needed to reliably distinguish between these various hypotheses. In particular, assume

\[
n = \frac{L \log(N/L)}{R} \qquad (1)
\]

for some R > 0, where R is also known as the communication rate. Defining C = ½ log(1 + P/σ²), the goal is to have a scheme for the recovery of β* for all R < C. The demands of communication also require that the probability of failure be exponentially small in n.

We remark that the possibility of recovery for R > C is ruled out by a converse theorem of Shannon (see Cover and Thomas [2008]). Analogous converses for signal recovery in the regression setup, for example by Wainwright [2009a] and Akçakaya and Tarokh [2010], are also relevant.
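To make the construction of Figure 1(a) concrete, here is a minimal sketch in Python. It assumes an i.i.d. standard Gaussian dictionary and the equal power allocation P_l = P/L; the function name, the toy dimensions, and the equal allocation are illustrative choices, not taken from the papers.

```python
import numpy as np

def make_code(n, L, M, P, rng):
    """Sketch of the setup in Figure 1(a): a Gaussian dictionary X (n x LM) and a
    sectioned coefficient vector beta with one non-zero per section.
    Equal power allocation P_l = P/L is assumed for simplicity."""
    N = L * M
    X = rng.standard_normal((n, N))        # i.i.d. N(0, 1) dictionary entries
    beta = np.zeros(N)
    sent = rng.integers(0, M, size=L)      # the message: one column index per section
    for l, j in enumerate(sent):
        beta[l * M + j] = np.sqrt(P / L)   # non-zero of known value sqrt(P_l)
    return X, beta, sent

rng = np.random.default_rng(0)
L, M, P, sigma, n = 8, 16, 4.0, 1.0, 32    # toy sizes, for illustration only
X, beta, sent = make_code(n, L, M, P, rng)
Y = X @ beta + sigma * rng.standard_normal(n)  # received string Y = X beta + noise
assert np.isclose(beta @ beta, P)              # ||beta||_2^2 = P
```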
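As a quick arithmetic companion to the annotations in Figure 1(b), the snippet below evaluates the capacity C = ½ log(1 + P/σ²) at P/σ² = 7 and the sample size implied by equation (1) at rate R = 0.495C. The use of natural logarithms and the particular L and M are assumptions made purely for illustration.

```python
import numpy as np

snr = 7.0                     # P / sigma^2, as annotated in Figure 1(b)
C = 0.5 * np.log(1.0 + snr)   # capacity; natural logarithms (nats) assumed
R = 0.495 * C                 # the rate used in the figure, R = 0.495C
L, M = 2 ** 10, 2 ** 10       # illustrative section count and size, N = L * M
n = L * np.log(M) / R         # equation (1): n = L log(N/L) / R
print(f"C = {C:.4f}, R = {R:.4f}, n = {n:.0f}")
```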

In Barron and Joseph [2010b], we analyze the performance of the optimal maximum-likelihood (minimum-distance) estimator, which searches over all β ∈ B, choosing the β which minimizes ‖Y − Xβ‖₂. We demonstrate that for n as in (1), one can recover the support of β* with high probability, for any R < C. (A brute-force sketch of this search appears at the end of this section.)

When N is large, which is typically the case, the maximum-likelihood decoder is computationally infeasible. Accordingly, in Barron and Joseph [2010a], we propose an iterative thresholding algorithm for the estimation of the support of the coefficient vector. The performance can be summarized as follows:

Theorem 1. (Barron and Joseph [2010a]) For any R < C and

\[
n = \frac{L \log(N/L)}{R},
\]

it is possible to recover the support of β*, with error probability that is exponentially small in L and computational complexity of order nN.

Taking X to have i.i.d. standard Gaussian entries, a key feature of our analysis is that we are able to characterize the distribution of the statistics involved in successive iterates. As a result, we show that there is a function g : [0, 1] → [0, 1] which characterizes, with high probability, the proportion of correct detections after any step. This is shown in Figure 1(b). (A simplified sketch of the thresholding idea is also given at the end of this section.)

Performance of the Orthogonal Matching Pursuit (OMP) for variable selection with random designs. The Orthogonal Matching Pursuit (Pati et al. [1993], Mallat and Zhang [1993]) is an iterative algorithm for signal recovery. Unlike penalized procedures, the regularization in this algorithm appears through the stopping criterion. In Joseph [2011], I analyze the orthogonal matching pursuit for the general problem of sparse recovery in high dimensions with random designs. For random designs, since the performance is measured after averaging over the distribution of the design matrix, one can ensure support recovery with far less stringent constraints on sparsity than in the case with deterministic X, as analyzed for the OMP by Cai and Wang [2010] and Zhang [2009a]. The stopping criterion I used, which was motivated by my work on the communication problem, is different from what is traditionally used in the literature. For correlated Gaussian designs and exactly sparse vectors, I show that support recovery holds under conditions similar to those known for the Lasso by Wainwright [2009b]. Moreover, variable selection under a more relaxed assumption on sparsity, whereby one has control only on the ℓ₁ norm of the smaller coefficients, is also addressed. In particular, I was able to demonstrate recovery of coefficients with minimum magnitude ≈ √(log p/n), where p represents the dimension and n the sample size, even under this more general notion of sparsity. As a consequence of these results, if β̂ is the estimate obtained after running the algorithm, it is shown that

\[
\|\hat{\beta} - \beta\|_2^2 \;\le\; C \sum_{j=1}^{p} \min\!\Big(\beta_j^2, \; \frac{2\sigma^2 \log p}{n}\Big),
\]

with high probability. Such oracle inequalities have been shown for the Lasso by Zhang [2009b] and for the Dantzig selector by Candes and Tao [2007].
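The minimum-distance search of Barron and Joseph [2010b] is simple to state, and a brute-force rendering makes its infeasibility plain: the sketch below enumerates all M^L candidates in B, so it runs only at toy sizes. Equal power allocation is again assumed for illustration.

```python
import itertools
import numpy as np

def ml_decode(Y, X, L, M, P):
    """Brute-force minimum-distance decoder: search every beta in B for the one
    minimizing ||Y - X beta||_2. Cost grows as M**L, hence infeasible in practice."""
    best, best_choice = np.inf, None
    val = np.sqrt(P / L)                       # equal power allocation assumed
    for choice in itertools.product(range(M), repeat=L):
        beta = np.zeros(L * M)
        for l, j in enumerate(choice):
            beta[l * M + j] = val
        r = Y - X @ beta
        if r @ r < best:
            best, best_choice = r @ r, choice
    return np.array(best_choice)
```

On a toy instance (say L = 3, M = 4, generated as in the earlier sketch), the decoded column indices typically match the sent message; the M^L blow-up is exactly what motivates the iterative algorithm of Theorem 1.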
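The algorithm of Barron and Joseph [2010a] is analyzed precisely in the paper; the sketch below conveys only the iterative-thresholding idea in simplified form. The threshold, the normalization of the test statistics, and the acceptance rule here are illustrative stand-ins, not the paper's tuned choices.

```python
import numpy as np

def iterative_threshold_decode(Y, X, L, M, P, max_steps=16):
    """Simplified iterative thresholding: at each step, compute the inner products
    of the residual with all columns, and in each undecided section accept the
    best column if its normalized statistic clears a threshold. The union-bound
    threshold below is illustrative, not the one tuned in the paper."""
    val = np.sqrt(P / L)                     # equal power allocation assumed
    tau = np.sqrt(2 * np.log(L * M))         # illustrative threshold for N(0,1) stats
    decided, resid = {}, Y.copy()            # decided maps section -> accepted column
    for _ in range(max_steps):
        stats = (X.T @ resid) / np.linalg.norm(resid)
        progressed = False
        for l in range(L):
            if l in decided:
                continue
            sec = stats[l * M:(l + 1) * M]
            j = int(np.argmax(sec))
            if sec[j] > tau:                 # accept and strip this column's contribution
                decided[l] = j
                resid = resid - val * X[:, l * M + j]
                progressed = True
        if not progressed or len(decided) == L:
            break
    return decided
```

When n comfortably exceeds L log(N/L)/R, the fraction len(decided)/L after each pass grows step by step, mimicking the quantity that the function g tracks in Figure 1(b).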
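For concreteness, here is a standard OMP loop in which, as described above, regularization enters only through the stopping rule. The residual-norm criterion used below is a common illustrative choice and is not the rate-motivated criterion of Joseph [2011]; the short check at the end compares the two sides of the oracle inequality (leading constant aside) on simulated data.

```python
import numpy as np

def omp(Y, X, sigma, max_iter=None):
    """Standard Orthogonal Matching Pursuit: greedily add the column most
    correlated with the residual, then re-fit by least squares on the support.
    Stops when the residual norm reaches the noise level (illustrative rule)."""
    n, p = X.shape
    support, resid, coef = [], Y.copy(), np.zeros(0)
    stop = sigma * np.sqrt(n)                   # roughly E||noise||_2; illustrative
    for _ in range(max_iter or n):
        if np.linalg.norm(resid) <= stop:
            break
        j = int(np.argmax(np.abs(X.T @ resid)))  # most correlated column
        if j in support:
            break
        support.append(j)
        coef, *_ = np.linalg.lstsq(X[:, support], Y, rcond=None)  # re-fit on support
        resid = Y - X[:, support] @ coef
    beta_hat = np.zeros(p)
    beta_hat[support] = coef
    return beta_hat

# Toy comparison of the two sides of the oracle inequality (constants aside),
# with coefficients at the ~sqrt(log p / n) magnitude scale discussed above:
rng = np.random.default_rng(1)
n, p, k, sigma = 200, 1000, 5, 1.0
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k] = 3 * np.sqrt(np.log(p) / n)
Y = X @ beta + sigma * rng.standard_normal(n)
beta_hat = omp(Y, X, sigma)
err = np.sum((beta_hat - beta) ** 2)
bound = np.sum(np.minimum(beta ** 2, 2 * sigma ** 2 * np.log(p) / n))
print(f"||beta_hat - beta||^2 = {err:.3f}; sum_j min(beta_j^2, 2 sigma^2 log p / n) = {bound:.3f}")
```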
