Nonnegativity Constraints in Numerical Analysis - CiteSeer

Furthermore, if T is sparse, the product $T^{(m \times np)}(Z \odot Y)$ can be computed efficiently without forming the Khatri-Rao product $Z \odot Y$. Thus, computing X essentially reduces to several matrix inner products, a tensor-matrix multiplication of Y and Z into T, and the inversion of an $R \times R$ matrix.

Analogous least squares steps may be used to update Y and Z. The complete NTF algorithm is summarized below.

ALS algorithm for NTF:

1. Group the $x_i$'s, $y_i$'s and $z_i$'s as columns in $X \in \mathbb{R}_+^{m \times r}$, $Y \in \mathbb{R}_+^{n \times r}$ and $Z \in \mathbb{R}_+^{p \times r}$, respectively.
2. Initialize X and Y.
   (a) Compute a nonnegative matrix factorization of the mean slice,
   $$\min_{X, Y \ge 0} \|A - XY^T\|_F^2, \qquad (25)$$
   where A is the mean of T across the 3rd dimension.
3. Iterative tri-alternating minimization:
   (a) Fix T, X, Y and fit Z by solving an NMF problem in an alternating fashion.
   (b) Fix T, X, Z and fit Y.
   (c) Fix T, Y, Z and fit X.

The corresponding multiplicative updates are

$$X_{i\rho} \leftarrow X_{i\rho}\,\frac{(T^{(m \times np)} C)_{i\rho}}{(X C^T C)_{i\rho} + \epsilon}, \qquad C = Z \odot Y, \qquad (26)$$

$$Y_{j\rho} \leftarrow Y_{j\rho}\,\frac{(T^{(n \times mp)} C)_{j\rho}}{(Y C^T C)_{j\rho} + \epsilon}, \qquad C = Z \odot X, \qquad (27)$$

$$Z_{k\rho} \leftarrow Z_{k\rho}\,\frac{(T^{(p \times mn)} C)_{k\rho}}{(Z C^T C)_{k\rho} + \epsilon}, \qquad C = Y \odot X. \qquad (28)$$

Here $\epsilon$ is a small number, such as $10^{-9}$, that adds stability to the calculation and guards against introducing a negative number through numerical underflow.

If T is sparse, the procedure above can be simplified. Each matricized version of T is a sparse matrix, and the matrix C from each step should not be formed explicitly because it would be a large, dense matrix. Instead, the product of a matricized T with C should be computed directly, exploiting the Kronecker product structure inherent in C so that only the elements of C that multiply nonzero elements of T are ever computed.
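The multiplicative updates (26)-(28), together with the sparse evaluation of $T^{(m \times np)} C$ described above, can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the tensor is stored as a dictionary mapping nonzero index triples (i, j, k) to values, factors are lists of rows, and, as a simplification, X, Y and Z start from random positive values instead of the mean-slice NMF of step 2. The helper names `gram`, `mttkrp`, `ntf_multiplicative` and `rel_error` are hypothetical. Two facts do the work: each nonzero $T_{ijk}$ contributes $T_{ijk} Y_{j\rho} Z_{k\rho}$ to $(T^{(m \times np)} C)_{i\rho}$, so C is never formed, and for a Khatri-Rao product $C^T C = (Z^T Z) * (Y^T Y)$ (elementwise product), so each denominator needs only $r \times r$ Gram matrices.

```python
import random

def gram(F):
    """Gram matrix F^T F (r x r) of a factor stored as a list of rows."""
    r = len(F[0])
    return [[sum(row[a] * row[b] for row in F) for b in range(r)] for a in range(r)]

def mttkrp(T, factors, mode, size, r):
    """Matricized T times the Khatri-Rao product of the other two factors.

    For mode 0 this is T^(m x np) (Z ⊙ Y). C is never formed: each stored
    nonzero T[i, j, k] adds T[i, j, k] * Y[j][rho] * Z[k][rho] to row i."""
    p, q = [d for d in range(3) if d != mode]
    A, B = factors[p], factors[q]
    out = [[0.0] * r for _ in range(size)]
    for idx, v in T.items():
        row, a, b = out[idx[mode]], A[idx[p]], B[idx[q]]
        for rho in range(r):
            row[rho] += v * a[rho] * b[rho]
    return out

def ntf_multiplicative(T, shape, r, iters=100, eps=1e-9, seed=0):
    """Multiplicative updates in the form of (26)-(28) for a sparse 3-way T.

    T: dict {(i, j, k): value} of nonzeros; shape: (m, n, p).
    Returns [X, Y, Z] as lists of rows (m x r, n x r, p x r)."""
    rng = random.Random(seed)
    # Assumption: random positive start, not an NMF of the mean slice.
    factors = [[[rng.uniform(0.1, 1.0) for _ in range(r)] for _ in range(s)]
               for s in shape]
    for _ in range(iters):
        for mode in (2, 1, 0):  # fit Z, then Y, then X, as in step 3
            p, q = [d for d in range(3) if d != mode]
            G1, G2 = gram(factors[p]), gram(factors[q])
            # C^T C for a Khatri-Rao product C: elementwise product of Grams.
            G = [[G1[a][b] * G2[a][b] for b in range(r)] for a in range(r)]
            N = mttkrp(T, factors, mode, shape[mode], r)
            F = factors[mode]
            for i in range(shape[mode]):
                FG = [sum(F[i][a] * G[a][b] for a in range(r)) for b in range(r)]
                F[i] = [F[i][b] * N[i][b] / (FG[b] + eps) for b in range(r)]
    return factors

def rel_error(T, factors, shape):
    """Relative Frobenius error ||T - sum_rho x_rho o y_rho o z_rho|| / ||T||."""
    X, Y, Z = factors
    r = len(X[0])
    num = den = 0.0
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                t = T.get((i, j, k), 0.0)
                approx = sum(X[i][q] * Y[j][q] * Z[k][q] for q in range(r))
                num += (t - approx) ** 2
                den += t * t
    return (num / den) ** 0.5
```

On an exactly rank-one nonnegative tensor with r = 1, a single sweep of these updates already reproduces T up to the small perturbation introduced by $\epsilon$, which makes such a tensor a convenient sanity check. The dense loop in `rel_error` is only for checking small examples; it is not part of the sparse computation.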

5 Some Applications of Nonnegativity Constraints

5.1 Support Vector Machines

Support vector machines were introduced by Vapnik and co-workers [13, 23], theoretically motivated by Vapnik-Chervonenkis theory (also known as VC theory [87, 88]). Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. They belong to the family of generalized linear classifiers and are based on the following idea: input points are mapped to a high-dimensional feature space, where a separating hyperplane can be found. The hyperplane is chosen to maximize the distance to the closest patterns, a quantity called the margin. This is achieved by reducing the problem to a quadratic programming problem,

$$\min_{v} F(v) = \frac{1}{2} v^T A v + b^T v \quad \text{subject to } v \ge 0. \qquad (29)$$

Here we assume that the matrix A is symmetric and positive semidefinite. Problem (29) is then usually solved with optimization routines from numerical libraries. SVMs have shown impressive performance on a number of real-world problems, such as optical character recognition and face detection.

We briefly review the problem of computing the maximum margin hyperplane in SVMs [87]. Let $\{(x_i, y_i)\}_{i=1}^{N}$ denote labeled examples with binary class labels $y_i = \pm 1$, and let $K(x_i, x_j)$ denote the kernel dot product between inputs. For brevity, we consider only the simple case where, in the high-dimensional feature space, the classes are linearly separable and the hyperplane is required to pass through the origin. In this case, the maximum margin hyperplane is obtained by minimizing the loss function

$$L(\alpha) = -\sum_i \alpha_i + \frac{1}{2} \sum_{ij} \alpha_i \alpha_j y_i y_j K(x_i, x_j), \qquad (30)$$

subject to the nonnegativity constraints $\alpha_i \ge 0$. Let $\alpha^*$ denote the minimizer of equation (30).
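Since (29) is usually handed to a library routine, it may help to see how small a self-contained solver can be. The sketch below uses projected gradient descent, a standard method substituted here purely for illustration; it is not a method the text prescribes. The function names are hypothetical, and the fixed step size $1/L$ relies on the bound $\lambda_{\max}(A) \le \max_i \sum_j |A_{ij}|$, which holds for symmetric A.

```python
def nqp_projected_gradient(A, b, iters=2000):
    """Approximately minimize F(v) = 0.5 v^T A v + b^T v subject to v >= 0.

    Projected gradient descent: a gradient step followed by clipping at 0.
    Assumes A is symmetric positive semidefinite (given as lists of lists)."""
    n = len(b)
    # For symmetric A the largest absolute row sum bounds the largest
    # eigenvalue, so 1/L is a safe constant step size.
    L = max(sum(abs(x) for x in row) for row in A)
    step = 1.0 / L if L > 0 else 1.0
    v = [0.0] * n
    for _ in range(iters):
        grad = [sum(A[i][j] * v[j] for j in range(n)) + b[i] for i in range(n)]
        v = [max(0.0, v[i] - step * grad[i]) for i in range(n)]
    return v

def kkt_violation(A, b, v):
    """Largest violation of the optimality conditions of (29):
    g = Av + b >= 0 and v_i * g_i = 0 (v >= 0 holds by construction)."""
    n = len(b)
    g = [sum(A[i][j] * v[j] for j in range(n)) + b[i] for i in range(n)]
    return max(max(-gi, abs(vi * gi)) for vi, gi in zip(v, g))
```

For example, with A = [[2, 1], [1, 2]] and b = [-1, 1], the unconstrained minimizer (1, -1) is infeasible; the constrained solution is v = (0.5, 0), where the gradient Av + b equals (0, 1.5), so the optimality conditions hold with the second constraint active.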

The maximal margin hyperplane has normal vector $w = \sum_i \alpha_i^* y_i x_i$ and satisfies the margin constraints $y_i K(w, x_i) \ge 1$ for all examples in the training set.

The loss function in equation (30) is a special case of the nonnegative quadratic programming problem (29) with $A_{ij} = y_i y_j K(x_i, x_j)$ and $b_i = -1$. Thus, the multiplicative updates in [79] are easily adapted to SVMs. This algorithm for training SVMs is known as Multiplicative Margin Maximization (M³). The algorithm can be generalized to data that are not linearly separable and to separating hyperplanes that do not pass through the origin.

Many iterative algorithms have been developed for nonnegative quadratic programming in general and for SVMs as a special case. Benchmarking experiments have shown that M³ is a feasible algorithm for small to moderately sized data sets. On the other hand, it does not converge as fast as leading subset methods on large data sets. Nevertheless, the extreme simplicity and convergence guarantees of M³ make it a useful starting point for experimenting with SVMs.
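To make the reduction concrete, the sketch below builds A and b from the training data exactly as stated above and applies a multiplicative update to (29). The update rule written in the docstring is our reading of the multiplicative updates cited as [79]; treat it as an assumption rather than a quotation of that paper. The helper names (`nqp_multiplicative`, `m3_train`, `normal_vector`, `linear_kernel`), the all-ones starting point and the fixed iteration count are illustrative choices, and only the separable, through-the-origin case described above is handled.

```python
import math

def nqp_multiplicative(A, b, iters=500):
    """Multiplicative updates for min 0.5 v^T A v + b^T v, v >= 0.

    Split A = Ap - Am into elementwise positive and negative parts and use
        v_i <- v_i * (-b_i + sqrt(b_i^2 + 4 (Ap v)_i (Am v)_i)) / (2 (Ap v)_i).
    The factor is never negative, so a positive start stays nonnegative."""
    n = len(b)
    Ap = [[max(x, 0.0) for x in row] for row in A]
    Am = [[max(-x, 0.0) for x in row] for row in A]
    v = [1.0] * n
    for _ in range(iters):
        ap = [sum(Ap[i][j] * v[j] for j in range(n)) for i in range(n)]
        am = [sum(Am[i][j] * v[j] for j in range(n)) for i in range(n)]
        # Guard: the rule is undefined when (Ap v)_i == 0; leave v_i unchanged.
        v = [v[i] * (-b[i] + math.sqrt(b[i] ** 2 + 4 * ap[i] * am[i])) / (2 * ap[i])
             if ap[i] > 0 else v[i]
             for i in range(n)]
    return v

def linear_kernel(u, w):
    return sum(a * c for a, c in zip(u, w))

def m3_train(X, y, kernel=linear_kernel, iters=500):
    """Hard-margin SVM through the origin: minimize (30) as an instance
    of (29) with A_ij = y_i y_j K(x_i, x_j) and b_i = -1."""
    N = len(y)
    A = [[y[i] * y[j] * kernel(X[i], X[j]) for j in range(N)] for i in range(N)]
    return nqp_multiplicative(A, [-1.0] * N, iters)

def normal_vector(alpha, X, y):
    """w = sum_i alpha_i y_i x_i (explicit only for the linear kernel)."""
    return [sum(a * yi * xi[d] for a, yi, xi in zip(alpha, y, X))
            for d in range(len(X[0]))]
```

For instance, with the linear kernel, points (1, 2) and (3, 0) labeled +1 and (-1, 2) labeled -1, the multipliers should approach (0.5, 0.5, 0), giving w = (1, 0). Both margin constraints for the first two points are then active, while (3, 0) lies strictly outside the margin and its multiplier decays to zero.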