Dattorro  CONVEX OPTIMIZATION & EUCLIDEAN DISTANCE GEOMETRY  Mεβoo


Convex Optimization & Euclidean Distance Geometry

Jon Dattorro

Mεβoo Publishing


Meboo Publishing USA
PO Box 12
Palo Alto, California 94302

Jon Dattorro, Convex Optimization & Euclidean Distance Geometry, Mεβoo, 2005.
ISBN 0976401304 (English)
ISBN 9781847280640 (International)

Version 2007.11.26 available in print as conceived in color.

cybersearch:
  I.   convex optimization
  II.  semidefinite programming
  III. rank constraint
  IV.  convex geometry
  V.   distance matrix
  VI.  convex cones

programs by Matlab
typesetting by LaTeX, with donations from SIAM and AMS.

This searchable electronic color pdfBook is click-navigable within the text by page, section, subsection, chapter, theorem, example, definition, cross reference, citation, equation, figure, table, and hyperlink. A pdfBook has no electronic copy protection and can be read and printed by most computers. The publisher hereby grants the right to reproduce this work in any format but limited to personal use.

© 2005 Jon Dattorro
All rights reserved.


for Jennie Columba
♦ Antonio
♦♦ & Sze Wan


EDM^N = S^N_h ∩ (S^{N⊥}_c − S^N_+)


Prelude

    The constant demands of my department and university and the ever increasing work needed to obtain funding have stolen much of my precious thinking time, and I sometimes yearn for the halcyon days of Bell Labs.
                −Steven Chu, Nobel laureate [57]

Convex Analysis is the calculus of inequalities while Convex Optimization is its application. Analysis is inherently the domain of the mathematician while Optimization belongs to the engineer.


There is a great race under way to determine which important problems can be posed in a convex setting. Yet, that skill acquired by understanding the geometry and application of Convex Optimization will remain more an art for some time to come; the reason being, there is generally no unique transformation of a given problem to its convex equivalent. This means, two researchers pondering the same problem are likely to formulate the convex equivalent differently; hence, one solution is likely different from the other for the same problem. Any presumption of only one right or correct solution becomes nebulous. Study of equivalence, sameness, and uniqueness therefore pervades study of Optimization.

Tremendous benefit accrues when an optimization problem can be transformed to its convex equivalent, primarily because any locally optimal solution is then guaranteed globally optimal. Solving a nonlinear system, for example, by instead solving an equivalent convex optimization problem is therefore highly preferable.^{0.1} Yet it can be difficult for the engineer to apply theory without an understanding of Analysis.

These pages comprise my journal over a seven year period bridging gaps between engineer and mathematician; they constitute a translation, unification, and cohering of about two hundred papers, books, and reports from several different fields of mathematics and engineering. Beacons of historical accomplishment are cited throughout. Much of what is written here will not be found elsewhere. Care to detail, clarity, accuracy, consistency, and typography accompanies removal of ambiguity and verbosity out of respect for the reader. Consequently there is much cross-referencing and background material provided in the text, footnotes, and appendices so as to be self-contained and to provide understanding of fundamental concepts.

                −Jon Dattorro
                Stanford, California
                2007

^{0.1} That is what motivates a convex optimization known as geometric programming [46, p.188] [45] which has driven great advances in the electronic circuit design industry. [27, §4.7] [182] [293] [296] [67] [130] [138] [139] [140] [141] [142] [143] [195] [196] [204]


Convex Optimization & Euclidean Distance Geometry

1  Overview  19

2  Convex geometry  33
   2.1  Convex set  34
   2.2  Vectorized-matrix inner product  45
   2.3  Hulls  53
   2.4  Halfspace, Hyperplane  59
   2.5  Subspace representations  73
   2.6  Extreme, Exposed  76
   2.7  Cones  81
   2.8  Cone boundary  90
   2.9  Positive semidefinite (PSD) cone  97
   2.10 Conic independence (c.i.)  120
   2.11 When extreme means exposed  126
   2.12 Convex polyhedra  126
   2.13 Dual cone & generalized inequality  134

3  Geometry of convex functions  183
   3.1  Convex function  184
   3.2  Matrix-valued convex function  216
   3.3  Quasiconvex  221
   3.4  Salient properties  224


4  Semidefinite programming  225
   4.1  Conic problem  226
   4.2  Framework  234
   4.3  Rank reduction  247
   4.4  Rank-constrained semidefinite program  256
   4.5  Constraining cardinality  273
   4.6  Cardinality and rank constraint examples  282
   4.7  Convex Iteration rank-1  300

5  Euclidean Distance Matrix  305
   5.1  EDM  306
   5.2  First metric properties  307
   5.3  ∃ fifth Euclidean metric property  307
   5.4  EDM definition  312
   5.5  Invariance  342
   5.6  Injectivity of D & unique reconstruction  345
   5.7  Embedding in affine hull  351
   5.8  Euclidean metric versus matrix criteria  356
   5.9  Bridge: Convex polyhedra to EDMs  364
   5.10 EDM-entry composition  372
   5.11 EDM indefiniteness  374
   5.12 List reconstruction  382
   5.13 Reconstruction examples  386
   5.14 Fifth property of Euclidean metric  393

6  Cone of distance matrices  403
   6.1  Defining EDM cone  405
   6.2  Polyhedral bounds  407
   6.3  √EDM cone is not convex  409
   6.4  a geometry of completion  411
   6.5  EDM definition in 11ᵀ  416
   6.6  Correspondence to PSD cone S^{N−1}_+  425
   6.7  Vectorization & projection interpretation  430
   6.8  Dual EDM cone  435
   6.9  Theorem of the alternative  449
   6.10 postscript  451


7  Proximity problems  453
   7.1  First prevalent problem:  461
   7.2  Second prevalent problem:  472
   7.3  Third prevalent problem:  483
   7.4  Conclusion  493

A  Linear algebra  495
   A.1  Main-diagonal δ operator, λ, trace, vec  495
   A.2  Semidefiniteness: domain of test  499
   A.3  Proper statements  502
   A.4  Schur complement  514
   A.5  eigen decomposition  518
   A.6  Singular value decomposition, SVD  521
   A.7  Zeros  526

B  Simple matrices  533
   B.1  Rank-one matrix (dyad)  534
   B.2  Doublet  539
   B.3  Elementary matrix  540
   B.4  Auxiliary V-matrices  542
   B.5  Orthogonal matrix  547

C  Some analytical optimal results  551
   C.1  properties of infima  551
   C.2  diagonal, trace, singular and eigen values  552
   C.3  Orthogonal Procrustes problem  558
   C.4  Two-sided orthogonal Procrustes  560

D  Matrix calculus  565
   D.1  Directional derivative, Taylor series  565
   D.2  Tables of gradients and derivatives  586

E  Projection  595
   E.1  Idempotent matrices  598
   E.2  I − P, Projection on algebraic complement  603
   E.3  Symmetric idempotent matrices  604
   E.4  Algebra of projection on affine subsets  610
   E.5  Projection examples  610


   E.6  Vectorization interpretation  617
   E.7  on vectorized matrices of higher rank  624
   E.8  Range/Rowspace interpretation  628
   E.9  Projection on convex set  628
   E.10 Alternating projection  642

F  Matlab programs  661
   F.1  isedm()  661
   F.2  conic independence, conici()  667
   F.3  Map of the USA  670
   F.4  Rank reduction subroutine, RRf()  675
   F.5  Sturm's procedure  679
   F.6  Convex Iteration demonstration  681
   F.7  fast max cut  684
   F.8  Signal dropout problem  686

G  Notation and a few definitions  689

Bibliography  705

Index  735


List of Figures

1  Overview  19
   1   Orion nebula  20
   2   Application of trilateration is localization  21
   3   Molecular conformation  22
   4   Face recognition  23
   5   Swiss roll  24
   6   USA map reconstruction  26
   7   Honeycomb  27
   8   Robotic vehicles  28
   9   Reconstruction of David  32
   10  David by distance geometry  32

2  Convex geometry  33
   11  Slab  35
   12  Intersection of line with boundary  38
   13  Tangentials  41
   14  Convex hull of a random list of points  53
   15  Hulls  55
   16  Two Fantopes  58
   17  A simplicial cone  60
   18  Hyperplane illustrated: ∂H is a line partially bounding  61
   19  Hyperplanes in R²  64
   20  Affine independence  67
   21  {z ∈ C | aᵀz = κᵢ}  68
   22  Hyperplane supporting closed set  69
   23  Closed convex set illustrating exposed and extreme points  78


   24  Two-dimensional nonconvex cone  82
   25  Nonconvex cone made from lines  82
   26  Nonconvex cone is convex cone boundary  83
   27  Cone exterior is convex cone  83
   28  Truncated nonconvex cone X  84
   29  Not a cone  85
   30  Minimum element. Minimal element.  88
   31  K is a pointed polyhedral cone having empty interior  92
   32  Exposed and extreme directions  96
   33  Positive semidefinite cone  99
   34  Convex Schur-form set  101
   35  Projection of the PSD cone  104
   36  Circular cone showing axis of revolution  109
   37  Circular section  111
   38  Polyhedral inscription  112
   39  Conically (in)dependent vectors  121
   40  Pointed six-faceted polyhedral cone and its dual  123
   41  Minimal set of generators for halfspace about origin  125
   42  Unit simplex  130
   43  Two views of a simplicial cone and its dual  132
   44  Two equivalent constructions of dual cone  136
   45  Dual polyhedral cone construction by right angle  137
   46  K is a halfspace about the origin  138
   47  Iconic primal and dual objective functions  139
   48  Orthogonal cones  144
   49  Blades K and K*  145
   50  Shrouded polyhedral cone  155
   51  Simplicial cone K in R² and its dual  160
   52  Monotone nonnegative cone K_{M+} and its dual  170
   53  Monotone cone K_M and its dual  172
   54  Two views of monotone cone K_M and its dual  173
   55  First-order optimality condition  176

3  Geometry of convex functions  183
   56  Convex functions having unique minimizer  185
   57  Affine function  194
   58  Supremum of affine functions  196


   59  Epigraph  196
   60  Gradient in R² evaluated on grid  204
   61  Quadratic function convexity in terms of its gradient  209
   62  Plausible contour plot  213
   63  Iconic quasiconvex function  222

4  Semidefinite programming  225
   64  Visualizing positive semidefinite cone in high dimension  229
   65  2-lattice of sensors and anchors for localization example  262
   66  3-lattice  263
   67  4-lattice  264
   68  5-lattice  265
   69  ellipsoids of orientation and eccentricity  266
   70  a 2-lattice solution for localization  271
   71  a 3-lattice solution  271
   72  a 4-lattice solution  272
   73  a 5-lattice solution  272
   74  Signal dropout  278
   75  Signal dropout reconstruction  279
   76  max cut problem  290
   77  MIT logo  296
   78  One-pixel camera  297

5  Euclidean Distance Matrix  305
   79  Convex hull of three points  306
   80  Complete dimensionless EDM graph  309
   81  Fifth Euclidean metric property  311
   82  Arbitrary hexagon in R³  320
   83  Kissing number  322
   84  Trilateration  326
   85  This EDM graph provides unique isometric reconstruction  330
   86  Two sensors • and three anchors ◦  330
   87  Two discrete linear trajectories of sensors  331
   88  Coverage in cellular telephone network  334
   89  Contours of equal signal power  334
   90  Depiction of molecular conformation  336
   91  Orthogonal complements in S^N abstractly oriented  346


   92  Elliptope E³  365
   93  Elliptope E² interior to S²₊  366
   94  Smallest eigenvalue of −V_Nᵀ D V_N  371
   95  Some entrywise EDM compositions  371
   96  Map of United States of America  385
   97  Largest ten eigenvalues of −V_Nᵀ O V_N  388
   98  Relative-angle inequality tetrahedron  395
   99  Nonsimplicial pyramid in R³  399

6  Cone of distance matrices  403
   100  Relative boundary of cone of Euclidean distance matrices  406
   101  Intersection of EDM cone with hyperplane  408
   102  Neighborhood graph  410
   103  Trefoil knot untied  413
   104  Trefoil ribbon  415
   105  Example of V_X selection to make an EDM  418
   106  Vector V_X spirals  420
   107  Three views of translated negated elliptope  428
   108  Halfline T on PSD cone boundary  432
   109  Vectorization and projection interpretation example  433
   110  Orthogonal complement of geometric center subspace  438
   111  EDM cone construction by flipping PSD cone  439
   112  Decomposing a member of polar EDM cone  444
   113  Ordinary dual EDM cone projected on S³_h  450

7  Proximity problems  453
   114  Pseudo-Venn diagram  456
   115  Elbow placed in path of projection  457
   116  Convex envelope  477

A  Linear algebra  495
   117  Geometrical interpretation of full SVD  524

B  Simple matrices  533
   118  Four fundamental subspaces of any dyad  535
   119  Four fundamental subspaces of a doublet  539


   120  Four fundamental subspaces of elementary matrix  540
   121  Gimbal  548

D  Matrix calculus  565
   122  Convex quadratic bowl in R² × R  576

E  Projection  595
   123  Nonorthogonal projection of x ∈ R³ on R(U) = R²  601
   124  Biorthogonal expansion of point x ∈ aff K  612
   125  Dual interpretation of projection on convex set  631
   126  Projection product on convex set in subspace  640
   127  von Neumann-style projection of point b  643
   128  Alternating projection on two halfspaces  644
   129  Distance, optimization, feasibility  646
   130  Alternating projection on nonnegative orthant and hyperplane  649
   131  Geometric convergence of iterates in norm  649
   132  Distance between PSD cone and iterate in A  654
   133  Dykstra's alternating projection algorithm  655
   134  Polyhedral normal cones  657
   135  Normal cone to elliptope  658


List of Tables

2  Convex geometry
   Table 2.9.2.3.1, rank versus dimension of S³₊ faces  106
   Table 2.10.0.0.1, maximum number of c.i. directions  121
   Cone Table 1  167
   Cone Table S  168
   Cone Table A  169
   Cone Table 1*  174

4  Semidefinite programming
   faces of S³₊ correspond to faces of S³₊  231

5  Euclidean Distance Matrix
   affine dimension in terms of rank, Précis 5.7.2  354

B  Simple matrices
   Auxiliary V-matrix Table B.4.4  546

D  Matrix calculus
   Table D.2.1, algebraic gradients and derivatives  587
   Table D.2.2, trace Kronecker gradients  588
   Table D.2.3, trace gradients and derivatives  589
   Table D.2.4, log determinant gradients and derivatives  591
   Table D.2.5, determinant gradients and derivatives  592
   Table D.2.6, logarithmic derivatives  593
   Table D.2.7, exponential gradients and derivatives  593


Chapter 1

Overview

Convex Optimization
Euclidean Distance Geometry

    People are so afraid of convex analysis.
                −Claude Lemaréchal, 2003

In layman's terms, the mathematical science of Optimization is the study of how to make a good choice when confronted with conflicting requirements. The qualifier convex means: when an optimal solution is found, then it is guaranteed to be a best solution; there is no better choice.

Any convex optimization problem has geometric interpretation. If a given optimization problem can be transformed to a convex equivalent, then this interpretive benefit is acquired. That is a powerful attraction: the ability to visualize geometry of an optimization problem. Conversely, recent advances in geometry and in graph theory hold convex optimization within their proofs' core. [304] [242]

This book is about convex optimization, convex geometry (with particular attention to distance geometry), and nonconvex, combinatorial, and geometrical problems that can be relaxed or transformed into convex problems. A virtual flood of new applications follow by epiphany that many problems, presumed nonconvex, can be so transformed. [8] [9] [44] [63] [100] [102] [206] [223] [231] [273] [274] [301] [304] [27, §4.3, p.316-322]


Figure 1: Orion nebula. (astrophotography by Massimo Robberto)

Euclidean distance geometry is, fundamentally, a determination of point conformation (configuration, relative position or location) by inference from interpoint distance information. By inference we mean: e.g., given only distance information, determine whether there corresponds a realizable conformation of points; a list of points in some dimension that attains the given interpoint distances. Each point may represent simply location or, abstractly, any entity expressible as a vector in finite-dimensional Euclidean space; e.g., distance geometry of music [72].

It is a common misconception to presume that some desired point conformation cannot be recovered in the absence of complete interpoint distance information. We might, for example, want to realize a constellation given only interstellar distance (or, equivalently, distance from Earth and relative angular measurement; the Earth as vertex to two stars). At first it may seem O(N²) data is required, yet there are many circumstances where this can be reduced to O(N).

If we agree that a set of points can have a shape (three points can form a triangle and its interior, for example, four points a tetrahedron), then we can ascribe shape of a set of points to their convex hull.


Figure 2: Application of trilateration (§5.4.2.2.4) is localization (determining position) of a radio signal source in 2 dimensions; more commonly known by radio engineers as the process "triangulation". In this scenario, anchors x̌₂, x̌₃, x̌₄ are illustrated as fixed antennae. [154] The radio signal source (a sensor • x₁) anywhere in affine hull of three antenna bases can be uniquely localized by measuring distance to each (dashed white arrowed line segments). Ambiguity of lone distance measurement to sensor is represented by circle about each antenna. So & Ye proved trilateration is expressible as a semidefinite program; hence, a convex optimization problem. [241]

It should be apparent: from distance, these shapes can be determined only to within a rigid transformation (rotation, reflection, translation).

Absolute position information is generally lost, given only distance information, but we can determine the smallest possible dimension in which an unknown list of points can exist; that attribute is their affine dimension (a triangle in any ambient space has affine dimension 2, for example). In circumstances where fixed points of reference are also provided, it becomes possible to determine absolute position or location; e.g., Figure 2.


Figure 3: [137] [80] Distance data collected via nuclear magnetic resonance (NMR) helped render this 3-dimensional depiction of a protein molecule. At the beginning of the 1980s, Kurt Wüthrich [Nobel laureate] developed an idea about how NMR could be extended to cover biological molecules such as proteins. He invented a systematic method of pairing each NMR signal with the right hydrogen nucleus (proton) in the macromolecule. The method is called sequential assignment and is today a cornerstone of all NMR structural investigations. He also showed how it was subsequently possible to determine pairwise distances between a large number of hydrogen nuclei and use this information with a mathematical method based on distance-geometry to calculate a three-dimensional structure for the molecule. [209] [288] [132]

Geometric problems involving distance between points can sometimes be reduced to convex optimization problems. Mathematics of this combined study of geometry and optimization is rich and deep. Its application has already proven invaluable discerning organic molecular conformation by measuring interatomic distance along covalent bonds; e.g., Figure 3. [60] [267] [132] [288] [95] Many disciplines have already benefitted and simplified consequent to this theory; e.g., distance-based pattern recognition (Figure 4), localization in wireless sensor networks by measurement of intersensor distance along channels of communication, [35, §5] [299] [33] wireless location of a radio-signal source such as a cell phone by multiple measurements of signal strength, the global positioning system (GPS), and multidimensional scaling which is a numerical representation of qualitative data by finding a low-dimensional scale.


Figure 4: This coarsely discretized triangulated algorithmically flattened human face (made by Kimmel & the Bronsteins [166]) represents a stage in machine recognition of human identity; called face recognition. Distance geometry is applied to determine discriminating features.

Euclidean distance geometry together with convex optimization have also found application in artificial intelligence:

  • to machine learning by discerning naturally occurring manifolds in:
    – Euclidean bodies (Figure 5, §6.4.0.0.1),
    – Fourier spectra of kindred utterances [156],
    – and image sequences [283],

  • to robotics; e.g., automatic control of vehicles maneuvering in formation. (Figure 8)

by chapter

We study the pervasive convex Euclidean bodies and their various representations. In particular, we make convex polyhedra, cones, and dual cones more visceral through illustration in chapter 2, Convex geometry, and we study the geometric relation of polyhedral cones to nonorthogonal bases (biorthogonal expansion).


Figure 5: Swiss roll from Weinberger & Saul [283]. The problem of manifold learning, illustrated for N = 800 data points sampled from a "Swiss roll" ①. A discretized manifold is revealed by connecting each data point and its k = 6 nearest neighbors ②. An unsupervised learning algorithm unfolds the Swiss roll while preserving the local geometry of nearby data points ③. Finally, the data points are projected onto the two-dimensional subspace that maximizes their variance, yielding a faithful embedding of the original manifold ④.


We explain conversion between halfspace- and vertex-description of a convex cone, we motivate the dual cone and provide formulae for finding it, and we show how first-order optimality conditions or alternative systems of linear inequality or linear matrix inequality [44] can be explained by generalized inequalities with respect to convex cones and their duals. The conic analogue to linear independence, called conic independence, is introduced as a new tool in the study of cones; the logical next step in the progression: linear, affine, conic.

Any convex optimization problem can be visualized geometrically. Desire to visualize in high dimension is deeply embedded in the mathematical psyche. Chapter 2 provides tools to make visualization easier, and we teach how to visualize in high dimension. The concepts of face, extreme point, and extreme direction of a convex Euclidean body are explained here; crucial to understanding convex optimization. The convex cone of positive semidefinite matrices, in particular, is studied in depth:

  • We interpret, for example, inverse image of the positive semidefinite cone under affine transformation. (Example 2.9.1.0.2)

  • Subsets of the positive semidefinite cone, discriminated by rank exceeding some lower bound, are convex. In other words, high-rank subsets of the positive semidefinite cone boundary united with its interior are convex. (Theorem 2.9.2.6.3) There is a closed form for projection on those convex subsets.

  • The positive semidefinite cone is a circular cone in low dimension, while Geršgorin discs specify inscription of a polyhedral cone into that positive semidefinite cone. (Figure 38)

Chapter 3, Geometry of convex functions, observes analogies between convex sets and functions: We explain, for example, how the real affine function relates to convex functions as the hyperplane relates to convex sets. Partly a cookbook for the most useful of convex functions and optimization problems, methods are drawn from matrix calculus for determining convexity and discerning geometry.

Semidefinite programming is reviewed in chapter 4 with particular attention to optimality conditions for prototypical primal and dual programs, their interplay, and the perturbation method of rank reduction of optimal solutions (extant but not well-known).


Figure 6 (panels: original, reconstruction): About five thousand points along the borders constituting the United States were used to create an exhaustive matrix of interpoint distance for each and every pair of points in the ordered set (a list); called a Euclidean distance matrix. From that noiseless distance information, it is easy to reconstruct the map exactly via the Schoenberg criterion (753). (§5.13.1.0.1, confer Figure 96) Map reconstruction is exact (to within a rigid transformation) given any number of interpoint distances; the greater the number of distances, the greater the detail (just as it is for all conventional map preparation).

Positive definite Farkas' lemma is derived, and we also show how to determine if a feasible set belongs exclusively to a positive semidefinite cone boundary. A three-dimensional polyhedral analogue for the positive semidefinite cone of 3×3 symmetric matrices is introduced. This analogue is a new tool for visualizing coexistence of low- and high-rank optimal solutions in 6 dimensions. We find a minimum-cardinality Boolean solution to an instance of Ax = b:

    minimize_x   ‖x‖_0
    subject to   Ax = b
                 x_i ∈ {0,1},  i = 1... n        (577)

The sensor-network localization problem is solved in any dimension in this chapter. We introduce the method of convex iteration for constraining rank in the form rank G ≤ ρ for some semidefinite programs in G.
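Problem (577) is combinatorial at heart; for a tiny instance it can be checked by exhaustive search. A throwaway Matlab sketch (the matrix A and vector b are hypothetical data, not from the book, and enumeration obviously does not scale the way the convex method of chapter 4 does):

    A = [1 1 0; 0 1 1];  b = [1; 1];       % hypothetical instance of Ax = b
    n = size(A,2);  best = inf;  xbest = [];
    for m = 0:2^n-1                        % enumerate every Boolean x
       x = double(dec2bin(m,n).' == '1');
       if norm(A*x - b) < 1e-9 && sum(x) < best
          best = sum(x);  xbest = x;       % smallest cardinality found so far
       end
    end
    xbest                                  % here [0;1;0], cardinality 1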


Figure 7: These bees construct a honeycomb by solving a convex optimization problem. (§5.4.2.2.3) The most dense packing of identical spheres about a central sphere in two dimensions is 6. Packed sphere centers describe a regular lattice.

The EDM is studied in chapter 5, Euclidean distance matrix, its properties and relationship to both positive semidefinite and Gram matrices. We relate the EDM to the four classical properties of the Euclidean metric; thereby, observing existence of an infinity of properties of the Euclidean metric beyond the triangle inequality. We proceed by deriving the fifth Euclidean metric property and then explain why furthering this endeavor is inefficient because the ensuing criteria (while describing polyhedra in angle or area, volume, content, and so on ad infinitum) grow linearly in complexity and number with problem size.

Reconstruction methods are explained and applied to a map of the United States; e.g., Figure 6. We also generate a distorted but recognizable isotonic map using only comparative distance information (only ordinal distance data). We demonstrate an elegant method for including dihedral (or torsion) angle constraints into a molecular conformation problem. More geometrical problems solvable via EDMs are presented with the best methods for posing them, EDM problems are posed as convex optimizations, and we show how to recover exact relative position given incomplete noiseless interpoint distance information.

The set of all Euclidean distance matrices forms a pointed closed convex cone called the EDM cone, EDM^N. We offer a new proof of Schoenberg's seminal characterization of EDMs:

    D ∈ EDM^N  ⇔  −V_Nᵀ D V_N ⪰ 0  and  D ∈ S^N_h        (753)

Our proof relies on fundamental geometry; assuming, any EDM must correspond to a list of points contained in some polyhedron (possibly at its vertices) and vice versa. It is known, but not obvious, this Schoenberg criterion implies nonnegativity of the EDM entries; proved here.
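Criterion (753) is directly computable. What follows is a minimal Matlab sketch of the test, far simpler than (and not to be confused with) the book's isedm() of appendix F; V_N is the auxiliary matrix of §B.4, and the example D holds squared distances among three collinear points:

    D = [0 1 4; 1 0 1; 4 1 0];  N = size(D,1);  tol = 1e-9;
    hollow = isequal(D,D.') && all(abs(diag(D)) < tol);   % D ∈ S^N_h
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);                % auxiliary V_N (§B.4)
    isEDM = hollow && min(eig(-VN.'*D*VN)) > -tol         % true for this D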


Figure 8: Robotic vehicles in concert can move larger objects, guard a perimeter, perform surveillance, find land mines, or localize a plume of gas, liquid, or radio waves. [94]

We characterize the eigenvalue spectrum of an EDM, and then devise a polyhedral spectral cone for determining membership of a given matrix (in Cayley-Menger form) to the convex cone of Euclidean distance matrices; id est, a matrix is an EDM if and only if its nonincreasingly ordered vector of eigenvalues belongs to a polyhedral spectral cone for EDM^N;

    D ∈ EDM^N  ⇔  λ([0 1ᵀ; 1 −D]) ∈ [R^N_+; R_−] ∩ ∂H  and  D ∈ S^N_h        (958)

We will see: spectral cones are not unique.
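Membership test (958) also reduces to a few lines of Matlab. In the sketch below I take the hyperplane ∂H to be automatic: for hollow D the Cayley-Menger form [0 1ᵀ; 1 −D] has zero trace, so its eigenvalues already sum to zero. That reading of (958), and the data, are mine:

    D = [0 1 4; 1 0 1; 4 1 0];  N = size(D,1);  tol = 1e-9;
    CM = [0 ones(1,N); ones(N,1) -D];           % Cayley-Menger form of (958)
    e = sort(eig(CM),'descend');                % nonincreasingly ordered spectrum
    isEDM = all(e(1:N) > -tol) && e(N+1) < tol ...   % λ ∈ [R^N_+; R_-]
            && all(abs(diag(D)) < tol)               % D ∈ S^N_h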


In chapter 6, EDM cone, we explain the geometric relationship between the cone of Euclidean distance matrices, two positive semidefinite cones, and the elliptope. We illustrate geometric requirements, in particular, for projection of a given matrix on a positive semidefinite cone that establish its membership to the EDM cone. The faces of the EDM cone are described, but still open is the question whether all its faces are exposed as they are for the positive semidefinite cone. The Schoenberg criterion (753), relating the EDM cone and a positive semidefinite cone, is revealed to be a discretized membership relation (dual generalized inequalities, a new Farkas'-like lemma) between the EDM cone and its ordinary dual, EDM^{N*}. A matrix criterion for membership to the dual EDM cone is derived that is simpler than the Schoenberg criterion:

    D* ∈ EDM^{N*}  ⇔  δ(D*1) − D* ⪰ 0        (1106)

We derive a new concise equality of the EDM cone to two subspaces and a positive semidefinite cone;

    EDM^N = S^N_h ∩ (S^{N⊥}_c − S^N_+)        (1100)

In chapter 7, Proximity problems, we explore methods of solution to a few fundamental and prevalent Euclidean distance matrix proximity problems; the problem of finding that distance matrix closest, in some sense, to a given matrix H:

    minimize_D  ‖−V(D − H)V‖²_F      subject to  rank V DV ≤ ρ ,  D ∈ EDM^N
    minimize_D  ‖D − H‖²_F           subject to  rank V DV ≤ ρ ,  D ∈ EDM^N
    minimize_D  ‖∘√D − H‖²_F         subject to  rank V DV ≤ ρ ,  ∘√D ∈ √EDM^N
    minimize_D  ‖−V(∘√D − H)V‖²_F    subject to  rank V DV ≤ ρ ,  ∘√D ∈ √EDM^N
                                                                          (1142)

We apply the new convex iteration method for constraining rank. Known heuristics for solving the problems when compounded with rank minimization are also explained. We offer a new geometrical proof of a famous result discovered by Eckart & Young in 1936 [85] regarding Euclidean projection of a point on that generally nonconvex subset of the positive semidefinite cone boundary comprising all positive semidefinite matrices having rank not exceeding a prescribed bound ρ. We explain how this problem is transformed to a convex optimization for any rank ρ.
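Reaching back to (1106): dual-cone membership is a single eigenvalue test. A sketch taking (1106) at face value (the candidate D* is hypothetical; δ(·) places a vector on the main diagonal per §A.1):

    Dstar = ones(3);                          % hypothetical candidate D*
    N = size(Dstar,1);
    M = diag(Dstar*ones(N,1)) - Dstar;        % δ(D*1) − D*  of (1106)
    inDualEDMcone = min(eig(M)) > -1e-9       % true: 3I − 11' is positive semidefinite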


novelty - Convex Optimization & Euclidean Distance Geometry

p.120  Conic independence is introduced as a natural extension to linear and affine independence; a new tool in convex analysis most useful for manipulation of cones.

p.148  Arcane theorems of alternative generalized inequality are simply derived from cone membership relations; generalizations of Farkas' lemma translated to the geometry of convex cones.

p.229  We present an arguably good 3-dimensional polyhedral analogue, to the isomorphically 6-dimensional positive semidefinite cone, as an aid to understand semidefinite programming.

p.256, p.273  We show how to constrain rank in the form rank G ≤ ρ and cardinality in the form card x ≤ k. We show how to transform a rank constrained problem to a rank-1 problem.

p.321, p.325  Kissing-number of sphere packing (first solved by Isaac Newton) and trilateration or localization are shown to be convex optimization problems.

p.336  We show how to elegantly include torsion or dihedral angle constraints into the molecular conformation problem.

p.367  Geometrical proof: Schoenberg criterion for a Euclidean distance matrix.

p.385  We experimentally demonstrate a conjecture of Borg & Groenen by reconstructing a map of the USA using only ordinal (comparative) distance information.

p.6, p.403  There is an equality, relating the convex cone of Euclidean distance matrices to the positive semidefinite cone, apparently overlooked in the literature; an equality between two large convex Euclidean bodies.

p.447  The Schoenberg criterion for a Euclidean distance matrix is revealed to be a discretized membership relation (or dual generalized inequalities) between the EDM cone and its dual.


appendices

Provided so as to be more self-contained:

  • linear algebra (appendix A is primarily concerned with proper statements of semidefiniteness for square matrices),

  • simple matrices (dyad, doublet, elementary, Householder, Schoenberg, orthogonal, etcetera, in appendix B),

  • a collection of known analytical solutions to some important optimization problems (appendix C),

  • matrix calculus remains somewhat unsystematized when compared to ordinary calculus (appendix D concerns matrix-valued functions, matrix differentiation and directional derivatives, Taylor series, and tables of first- and second-order gradients and matrix derivatives),

  • an elaborate exposition offering insight into orthogonal and nonorthogonal projection on convex sets (the connection between projection and positive semidefiniteness, for example, or between projection and a linear objective function in appendix E),

  • software in appendix F to discriminate EDMs, to determine conic independence, to reduce or constrain rank of an optimal solution to a semidefinite program, and two distinct methods of reconstructing a map of the United States: one given only distance information, the other given only relative distance data (greater or less than).


Figure 9: Three-dimensional reconstruction of David from distance data.

Figure 10: Digital Michelangelo Project, Stanford University. Measuring distance to David by laser rangefinder. Spatial resolution is 0.29mm.


Chapter 2

Convex geometry

    Convexity has an immensely rich structure and numerous applications. On the other hand, almost every "convex" idea can be explained by a two-dimensional picture.
                −Alexander Barvinok [20, p.vii]

As convex geometry and linear algebra are inextricably bonded, we provide much background material on linear algebra (especially in the appendices) although a reader is assumed comfortable with [251] [253] [150] or any other intermediate-level text. The essential references to convex analysis are [148] [232]. The reader is referred to [249] [20] [282] [30] [46] [229] [270] for a comprehensive treatment of convexity. There is relatively less published pertaining to matrix-valued convex functions. [158] [151, §6.6] [220]


2.1 Convex set

A set C is convex iff for all Y, Z ∈ C and 0 ≤ µ ≤ 1

    µY + (1 − µ)Z ∈ C        (1)

Under that defining condition on µ, the linear sum in (1) is called a convex combination of Y and Z. If Y and Z are points in Euclidean real vector space R^n or R^{m×n} (matrices), then (1) represents the closed line segment joining them. All line segments are thereby convex sets. Apparent from the definition, a convex set is a connected set. [190, §3.4, §3.5] [30, p.2]

An ellipsoid centered at x = a (Figure 12, p.38), given matrix C ∈ R^{m×n}

    {x ∈ R^n | ‖C(x − a)‖² = (x − a)ᵀCᵀC(x − a) ≤ 1}        (2)

is a good icon for a convex set.

2.1.1 subspace

A nonempty subset of Euclidean real vector space R^n is called a subspace (formally defined in §2.5) if every vector^{2.1} of the form αx + βy, for α, β ∈ R, is in the subset whenever vectors x and y are. [183, §2.3] A subspace is a convex set containing the origin 0, by definition. [232, p.4] Any subspace is therefore open in the sense that it contains no boundary, but closed in the sense [190, §2]

    R^n + R^n = R^n        (3)

It is not difficult to show

    R^n = −R^n        (4)

as is true for any subspace R, because x ∈ R^n ⇔ −x ∈ R^n.

The intersection of an arbitrary collection of subspaces remains a subspace. Any subspace not constituting the entire ambient vector space R^n is a proper subspace; e.g.,^{2.2} any line through the origin in two-dimensional Euclidean space R². The vector space R^n is itself a conventional subspace, inclusively, [167, §2.1] although not proper.

^{2.1} A vector is assumed, throughout, to be a column vector.
^{2.2} We substitute the abbreviation e.g. in place of the Latin exempli gratia.
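Definition (1) and icon (2) invite a quick numerical sanity check: a convex combination of two members of an ellipsoid stays inside it. The following Matlab sketch illustrates rather than proves, and the ellipsoid parameters are hypothetical:

    C = [2 0; 0 1];  a = [1; 0];              % hypothetical C and center a for (2)
    inE = @(x) norm(C*(x - a))^2 <= 1;        % membership test for ellipsoid (2)
    Y = a + [0.3; 0.2];  Z = a - [0.1; 0.5];  % two members of the ellipsoid
    mu = rand;                                % any 0 <= mu <= 1
    inE(Y) && inE(Z) && inE(mu*Y + (1-mu)*Z)  % true, consistent with (1)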


Figure 11: A slab is a convex Euclidean body infinite in extent but not affine. Illustrated in R², it may be constructed by intersecting two opposing halfspaces whose bounding hyperplanes are parallel but not coincident. Because number of halfspaces used in its construction is finite, slab is a polyhedron. (Cartesian axes and vector inward-normal to each halfspace-boundary are drawn for reference.)

2.1.2 linear independence

Arbitrary given vectors in Euclidean space {Γ_i ∈ R^n, i = 1... N} are linearly independent (l.i.) if and only if, for all ζ ∈ R^N

    Γ₁ζ₁ + · · · + Γ_{N−1}ζ_{N−1} + Γ_N ζ_N = 0        (5)

has only the trivial solution ζ = 0; in other words, iff no vector from the given set can be expressed as a linear combination of those remaining.

2.1.2.1 preservation

Linear independence can be preserved under linear transformation. Given matrix Y ≜ [y₁ y₂ · · · y_N] ∈ R^{N×N}, consider the mapping

    T(Γ) : R^{n×N} → R^{n×N} ≜ ΓY        (6)

whose domain is the set of all matrices Γ ∈ R^{n×N} holding a linearly independent set columnar. Linear independence of {Γy_i ∈ R^n, i = 1... N} demands, by definition, there exists no nontrivial solution ζ ∈ R^N to

    Γy₁ζ₁ + · · · + Γy_{N−1}ζ_{N−1} + Γy_N ζ_N = 0        (7)

By factoring Γ, we see that is ensured by linear independence of {y_i ∈ R^N}.
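In floating point, Matlab's rank function gives a workable test of (5); a minimal sketch of §2.1.2.1's preservation claim, assuming {y_i} linearly independent so that Y is invertible (random matrices are almost surely full rank):

    n = 5;  N = 3;
    Gamma = randn(n,N);           % columns l.i. with probability 1
    Y = randn(N,N);               % {y_i} l.i., hence Y invertible
    rank(Gamma) == N              % (5) has only the trivial solution ζ = 0
    rank(Gamma*Y) == N            % independence preserved under T(Γ) = ΓY in (6)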


2.1.3 Orthant:

name given to a closed convex set that is the higher-dimensional generalization of quadrant from the classical Cartesian partition of R². The most common is the nonnegative orthant R^n_+ or R^{n×n}_+ (analogue to quadrant I) to which membership denotes nonnegative vector- or matrix-entries respectively; e.g.,

    R^n_+ ≜ {x ∈ R^n | x_i ≥ 0 ∀i}        (8)

The nonpositive orthant R^n_− or R^{n×n}_− (analogue to quadrant III) denotes negative and 0 entries. Orthant convexity^{2.3} is easily verified by definition (1).

2.1.4 affine set

A nonempty affine set (from the word affinity) is any subset of R^n that is a translation of some subspace. An affine set is convex and open so contains no boundary: e.g., ∅, point, line, plane, hyperplane (§2.4.2), subspace, etcetera. For some parallel^{2.4} subspace M and any point x ∈ A

    A is affine  ⇔  A = x + M = {y | y − x ∈ M}        (9)

The intersection of an arbitrary collection of affine sets remains affine. The affine hull of a set C ⊆ R^n (§2.3.1) is the smallest affine set containing it.

2.1.5 dimension

Dimension of an arbitrary set S is the dimension of its affine hull; [282, p.14]

    dim S ≜ dim aff S = dim aff(S − s),  s ∈ S        (10)

the same as dimension of the subspace parallel to that affine set aff S when nonempty. Hence dimension (of a set) is synonymous with affine dimension. [148, §A.2.1]

^{2.3} All orthants are self-dual simplicial cones. (§2.13.5.1, §2.12.3.1.1)
^{2.4} Two affine sets are said to be parallel when one is a translation of the other. [232, p.4]
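Affine dimension (10) is computable as the rank of a matrix of point differences; a small Matlab sketch with hypothetical points held columnar:

    P = [0 1 2; 0 2 4; 1 1 1];                 % three points of R^3, columnar
    s = P(:,1);                                % any member s ∈ S in (10)
    dimAff = rank(P - s*ones(1,size(P,2)))     % = 1 here: the points are collinear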


2.1.6 empty set versus empty interior

Emptiness ∅ of a set is handled differently than interior in the classical literature. It is common for a nonempty convex set to have empty interior; e.g., paper in the real world. Thus the term relative is the conventional fix to this ambiguous terminology:^{2.5}

    An ordinary flat sheet of paper is an example of a nonempty convex set in R³ having empty interior but relatively nonempty interior.

2.1.6.1 relative interior

We distinguish interior from relative interior throughout. [249] [282] [270] The relative interior rel int C of a convex set C ⊆ R^n is its interior relative to its affine hull.^{2.6} Thus defined, it is common (though confusing) for int C the interior of C to be empty while its relative interior is not: this happens whenever dimension of its affine hull is less than dimension of the ambient space (dim aff C < n, e.g., were C a flat piece of paper in R³) or in the exception when C is a single point; [190, §2.2.1]

    rel int{x} ≜ aff{x} = {x},  int{x} = ∅,  x ∈ R^n        (11)

In any case, closure of the relative interior of a convex set C always yields the closure of the set itself;

    cl(rel int C) = cl C        (12)

If C is convex then rel int C and cl C are convex, [148, p.24] and it is always possible to pass to a smaller ambient Euclidean space where a nonempty set acquires an interior. [20, §II.2.3]

Given the intersection of convex set C with an affine set A

    rel int(C ∩ A) = rel int(C) ∩ A        (13)

If C has nonempty interior, then rel int C = int C.

^{2.5} Superfluous mingling of terms as in relatively nonempty set would be an unfortunate consequence. From the opposite perspective, some authors use the term full or full-dimensional to describe a set having nonempty interior.
^{2.6} Likewise for relative boundary (§2.6.1.3), although relative closure is superfluous. [148, §A.2.1]


Figure 12: (a) Ellipsoid in R is a line segment whose boundary comprises two points. Intersection of line with ellipsoid in R, (b) in R², (c) in R³. Each ellipsoid illustrated has entire boundary constituted by zero-dimensional faces; in fact, by vertices (§2.6.1.0.1). Intersection of line with boundary is a point at entry to interior. These same facts hold in higher dimension.

2.1.7 classical boundary

(confer §2.6.1.3) Boundary of a set C is the closure of C less its interior presumed nonempty; [41, §1.1]

    ∂C = cl C \ int C        (14)

which follows from the fact

    cl(int C) = cl C        (15)

assuming nonempty interior.^{2.7} One implication is: an open set has a boundary defined although not contained in the set.

^{2.7} Otherwise, for x ∈ R^n as in (11), [190, §2.1, §2.3]

    cl(int{x}) = cl ∅ = ∅

the empty set is both open and closed.


2.1.7.1 Line intersection with boundary

A line can intersect the boundary of a convex set in any dimension at a point demarcating the line's entry to the set interior. On one side of that entry-point along the line is the exterior of the set, on the other side is the set interior. In other words, starting from any point of a convex set, a move toward the interior is an immediate entry into the interior. [20, §II.2]

When a line intersects the interior of a convex body in any dimension, the boundary appears to the line to be as thin as a point. This is intuitively plausible because, for example, a line intersects the boundary of the ellipsoids in Figure 12 at a point in R, R², and R³. Such thinness is a remarkable fact when pondering visualization of convex polyhedra (§2.12, §5.14.3) in four dimensions, for example, having boundaries constructed from other three-dimensional convex polyhedra called faces.

We formally define face in (§2.6). For now, we observe the boundary of a convex body to be entirely constituted by all its faces of dimension lower than the body itself. For example: The ellipsoids in Figure 12 have boundaries composed only of zero-dimensional faces. The two-dimensional slab in Figure 11 is a polyhedron having one-dimensional faces making its boundary. The three-dimensional bounded polyhedron in Figure 14 has zero-, one-, and two-dimensional polygonal faces constituting its boundary.

2.1.7.1.1 Example. Intersection of line with boundary in R⁶.
The convex cone of positive semidefinite matrices S³₊ (§2.9) in the ambient subspace of symmetric matrices S³ (§2.2.2.0.1) is a six-dimensional Euclidean body in isometrically isomorphic R⁶ (§2.2.1). The boundary of the positive semidefinite cone in this dimension comprises faces having only the dimensions 0, 1, and 3; id est, {ρ(ρ+1)/2, ρ = 0, 1, 2}.

Unique minimum-distance projection PX (§E.9) of any point X ∈ S³ on that cone S³₊ is known in closed form (§7.1.2). Given, for example, λ ∈ int R³₊ and diagonalization (§A.5.2) of exterior point

    X = QΛQᵀ ∈ S³,  Λ ≜ [λ₁ 0 0; 0 λ₂ 0; 0 0 −λ₃]        (16)


where Q ∈ R^{3×3} is an orthogonal matrix, then the projection on S³₊ in R⁶ is

    PX = Q [λ₁ 0 0; 0 λ₂ 0; 0 0 0] Qᵀ ∈ S³₊        (17)

This positive semidefinite matrix PX nearest X thus has rank 2, found by discarding all negative eigenvalues. The line connecting these two points is {X + (PX − X)t | t ∈ R} where t = 0 ⇔ X and t = 1 ⇔ PX. Because this line intersects the boundary of the positive semidefinite cone S³₊ at point PX and passes through its interior (by assumption), then the matrix corresponding to an infinitesimally positive perturbation of t there should reside interior to the cone (rank 3). Indeed, for ε an arbitrarily small positive constant,

    X + (PX − X)t |_{t=1+ε} = Q(Λ + (PΛ − Λ)(1 + ε))Qᵀ = Q [λ₁ 0 0; 0 λ₂ 0; 0 0 ελ₃] Qᵀ ∈ int S³₊        (18)

2.1.7.2 Tangential line intersection with boundary

A higher-dimensional boundary ∂C of a convex Euclidean body C is simply a dimensionally larger set through which a line can pass when it does not intersect the body's interior. Still, for example, a line existing in five or more dimensions may pass tangentially (intersecting no point interior to C [161, §15.3]) through a single point relatively interior to a three-dimensional face on ∂C. Let's understand why by inductive reasoning.

Figure 13(a) shows a vertical line-segment whose boundary comprises its two endpoints. For a line to pass through the boundary tangentially (intersecting no point relatively interior to the line-segment), it must exist in an ambient space of at least two dimensions. Otherwise, the line is confined to the same one-dimensional space as the line-segment and must pass along the segment to reach the end points.

Figure 13(b) illustrates a two-dimensional ellipsoid whose boundary is constituted entirely by zero-dimensional faces. Again, a line must exist in at least two dimensions to tangentially pass through any single arbitrarily chosen point on the boundary (without intersecting the ellipsoid interior).
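Circling back to Example 2.1.7.1.1 for a moment: the closed-form projection (17) amounts to discarding negative eigenvalues, a two-line computation in Matlab (random symmetric data; the formal treatment is §7.1.2 and §E.9):

    X = randn(3);  X = (X + X.')/2;    % a symmetric, generally indefinite point
    [Q,L] = eig(X);                    % diagonalization X = Q*Λ*Q'  (§A.5.2)
    PX = Q*max(L,0)*Q.';               % zero the negative eigenvalues, as in (17)
    min(eig(PX)) >= -1e-9              % true: PX is a member of the PSD cone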


Figure 13: Line tangential: (a) (b) to relative interior of a zero-dimensional face in R², (c) (d) to relative interior of a one-dimensional face in R³.


Now let's move to an ambient space of three dimensions. Figure 13(c) shows a polygon rotated into three dimensions. For a line to pass through its zero-dimensional boundary (one of its vertices) tangentially, it must exist in at least the two dimensions of the polygon. But for a line to pass tangentially through a single arbitrarily chosen point in the relative interior of a one-dimensional face on the boundary as illustrated, it must exist in at least three dimensions.

Figure 13(d) illustrates a solid circular pyramid (upside-down) whose one-dimensional faces are line-segments emanating from its pointed end (its vertex). This pyramid's boundary is constituted solely by these one-dimensional line-segments. A line may pass through the boundary tangentially, striking only one arbitrarily chosen point relatively interior to a one-dimensional face, if it exists in at least the three-dimensional ambient space of the pyramid.

From these few examples, we deduce a general rule (without proof):

    A line may pass tangentially through a single arbitrarily chosen point relatively interior to a k-dimensional face on the boundary of a convex Euclidean body if the line exists in dimension at least equal to k+2.

Now the interesting part, with regard to Figure 14 showing a bounded polyhedron in R³; call it P: A line existing in at least four dimensions is required in order to pass tangentially (without hitting int P) through a single arbitrary point in the relative interior of any two-dimensional polygonal face on the boundary of polyhedron P. Now imagine that polyhedron P is itself a three-dimensional face of some other polyhedron in R⁴. To pass a line tangentially through polyhedron P itself, striking only one point from its relative interior rel int P as claimed, requires a line existing in at least five dimensions.

This rule can help determine whether there exists unique solution to a convex optimization problem whose feasible set is an intersecting line; e.g., the trilateration problem (§5.4.2.2.4).

2.1.8 intersection, sum, difference, product

2.1.8.0.1 Theorem. Intersection. [46, §2.3.1] [232, §2]
The intersection of an arbitrary collection of convex sets is convex. ⋄


This theorem in converse is implicitly false in so far as a convex set can be formed by the intersection of sets that are not.

Vector sum of two convex sets C₁ and C₂

    C₁ + C₂ = {x + y | x ∈ C₁, y ∈ C₂}        (19)

is convex.

By additive inverse, we can similarly define vector difference of two convex sets

    C₁ − C₂ = {x − y | x ∈ C₁, y ∈ C₂}        (20)

which is convex. Applying this definition to nonempty convex set C₁, its self-difference C₁ − C₁ is generally nonempty, nontrivial, and convex; e.g., for any convex cone K, (§2.7.2) the set K − K constitutes its affine hull. [232, p.15]

Cartesian product of convex sets

    C₁ × C₂ = {[x; y] | x ∈ C₁, y ∈ C₂} = [C₁; C₂]        (21)

remains convex. The converse also holds; id est, a Cartesian product is convex iff each set is. [148, p.23]

Convex results are also obtained for scaling κC of a convex set C, rotation/reflection QC, or translation C + α; all similarly defined.

Given any operator T and convex set C, we are prone to write T(C) meaning

    T(C) ≜ {T(x) | x ∈ C}        (22)

Given linear operator T, it therefore follows from (19),

    T(C₁ + C₂) = {T(x + y) | x ∈ C₁, y ∈ C₂}
               = {T(x) + T(y) | x ∈ C₁, y ∈ C₂}
               = T(C₁) + T(C₂)        (23)


2.1.9 inverse image

While epigraph and sublevel sets (§3.1.7) of a convex function must be convex, it generally holds that image and inverse image of a convex function are not. Although there are many examples to the contrary, the most prominent are the affine functions:

2.1.9.0.1 Theorem. Image, Inverse image. [232, §3] [46, §2.3.2]
Let f be a mapping from R^{p×k} to R^{m×n}.

The image of a convex set C under any affine function (§3.1.6)

    f(C) = {f(X) | X ∈ C} ⊆ R^{m×n}    (24)

is convex.

The inverse image^2.8 of a convex set F,

    f^{−1}(F) = {X | f(X) ∈ F} ⊆ R^{p×k}    (25)

a single- or many-valued mapping, under any affine function f is convex. ⋄

In particular, any affine transformation of an affine set remains affine. [232, p.8] Ellipsoids are invariant to any [sic] affine transformation.

Each converse of this two-part theorem is generally false; id est, given f affine, a convex image f(C) does not imply that set C is convex, and neither does a convex inverse image f^{−1}(F) imply set F is convex. A counterexample is easy to visualize when the affine function is an orthogonal projector [251] [183]:

2.1.9.0.2 Corollary. Projection on subspace. [232, §3]^2.9
Orthogonal projection of a convex set on a subspace is another convex set. ⋄

Again, the converse is false. Shadows, for example, are umbral projections that can be convex when the body providing the shade is not.

^2.8 See §2.9.1.0.2 for an example.
^2.9 The corollary holds more generally for projection on hyperplanes (§2.4.2); [282, §6.6] hence, for projection on affine subsets (§2.3.1, nonempty intersections of hyperplanes). Orthogonal projection on affine subsets is reviewed in §E.4.0.0.1.


2.2 Vectorized-matrix inner product

Euclidean space R^n comes equipped with a linear vector inner-product

    ⟨y, z⟩ ≜ y^T z    (26)

We prefer those angle brackets to connote a geometric rather than algebraic perspective; e.g., vector y might represent a hyperplane normal (§2.4.2). Two vectors are orthogonal (perpendicular) to one another if and only if their inner product vanishes;

    y ⊥ z ⇔ ⟨y, z⟩ = 0    (27)

When orthogonal vectors each have unit norm, then they are orthonormal. A vector inner-product defines Euclidean norm (vector 2-norm)

    ‖y‖_2 = ‖y‖ ≜ √(y^T y),    ‖y‖ = 0 ⇔ y = 0    (28)

For linear operation A on a vector, represented by a real matrix, the adjoint operation A^T is transposition and defined for matrix A by [167, §3.10]

    ⟨y, A^T z⟩ ≜ ⟨Ay, z⟩    (29)

The vector inner-product for matrices is calculated just as it is for vectors; by first transforming a matrix in R^{p×k} to a vector in R^{pk} by concatenating its columns in the natural order. For lack of a better term, we shall call that linear bijective (one-to-one and onto [167, App.A1.2]) transformation vectorization. For example, the vectorization of Y = [y_1 y_2 · · · y_k] ∈ R^{p×k} [116] [247] is

    vec Y ≜ [y_1 ; y_2 ; ... ; y_k] ∈ R^{pk}    (30)

Then the vectorized-matrix inner-product is trace of matrix inner-product; for Z ∈ R^{p×k}, [46, §2.6.1] [148, §0.3.1] [292, §8] [277, §2.2]

    ⟨Y, Z⟩ ≜ tr(Y^T Z) = vec(Y)^T vec Z    (31)

where (§A.1.1)

    tr(Y^T Z) = tr(Z Y^T) = tr(Y Z^T) = tr(Z^T Y) = 1^T (Y ◦ Z) 1    (32)


and where ◦ denotes the Hadamard product^2.10 of matrices [150] [110, §1.1.4]. The adjoint operation A^T on a matrix can therefore be defined in like manner:

    ⟨Y, A^T Z⟩ ≜ ⟨AY, Z⟩    (33)

For example, take any element C_1 from a matrix-valued set in R^{p×k}, and consider any particular dimensionally compatible real vectors v and w. Then vector inner-product of C_1 with vw^T is

    ⟨vw^T, C_1⟩ = v^T C_1 w = tr(w v^T C_1) = 1^T ((vw^T) ◦ C_1) 1    (34)

2.2.0.0.1 Example. Application of the image theorem.
Suppose the set C ⊆ R^{p×k} is convex. Then for any particular vectors v ∈ R^p and w ∈ R^k, the set of vector inner-products

    Y ≜ v^T C w = ⟨vw^T, C⟩ ⊆ R    (35)

is convex. This result is a consequence of the image theorem. Yet it is easy to show directly that convex combination of elements from Y remains an element of Y.^2.11

More generally, vw^T in (35) may be replaced with any particular matrix Z ∈ R^{p×k} while convexity of the set ⟨Z, C⟩ ⊆ R persists. Further, by replacing v and w with any particular respective matrices U and W of dimension compatible with all elements of convex set C, then set U^T C W is convex by the image theorem because it is a linear mapping of C.

^2.10 The Hadamard product is a simple entrywise product of corresponding entries from two matrices of like size; id est, not necessarily square.
^2.11 To verify that, take any two elements C_1 and C_2 from the convex matrix-valued set C, and then form the vector inner-products (35) that are two elements of Y by definition. Now make a convex combination of those inner products; videlicet, for 0 ≤ µ ≤ 1

    µ⟨vw^T, C_1⟩ + (1 − µ)⟨vw^T, C_2⟩ = ⟨vw^T, µC_1 + (1 − µ)C_2⟩

The two sides are equivalent by linearity of inner product. The right-hand side remains a vector inner-product of vw^T with an element µC_1 + (1 − µ)C_2 from the convex set C; hence it belongs to Y. Since that holds true for any two elements from Y, then it must be a convex set.
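The identities (31), (32), and (34) are easy to corroborate numerically. A sketch assuming Python with NumPy (dimensions and seed are illustrative only):

    import numpy as np

    rng = np.random.default_rng(1)
    p, k = 4, 3
    Y = rng.standard_normal((p, k))
    Z = rng.standard_normal((p, k))

    # Vectorization (30): stack columns in the natural (Fortran) order.
    vec = lambda A: A.reshape(-1, order="F")

    # (31): trace of matrix inner-product equals vectorized inner-product.
    assert np.isclose(np.trace(Y.T @ Z), vec(Y) @ vec(Z))

    # (32): trace identities, including the Hadamard form 1^T (Y o Z) 1.
    assert np.isclose(np.trace(Y.T @ Z), np.trace(Z @ Y.T))
    assert np.isclose(np.trace(Y.T @ Z), np.ones(p) @ (Y * Z) @ np.ones(k))

    # (34): <v w^T, C1> = v^T C1 w for compatible vectors v, w.
    v, w = rng.standard_normal(p), rng.standard_normal(k)
    C1 = rng.standard_normal((p, k))
    assert np.isclose(np.trace(np.outer(v, w).T @ C1), v @ C1 @ w)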


2.2.1 Frobenius'

2.2.1.0.1 Definition. Isomorphic.
An isomorphism of a vector space is a transformation equivalent to a linear bijective mapping. The image and inverse image under the transformation operator are then called isomorphic vector spaces. △

Isomorphic vector spaces are characterized by preservation of adjacency; id est, if v and w are points connected by a line segment in one vector space, then their images will also be connected by a line segment. Two Euclidean bodies may be considered isomorphic if there exists an isomorphism of their corresponding ambient spaces. [278, §I.1]

When Z = Y ∈ R^{p×k} in (31), Frobenius' norm is resultant from vector inner-product; (confer (1493))

    ‖Y‖_F² = ‖vec Y‖_2² = ⟨Y, Y⟩ = tr(Y^T Y) = Σ_{i,j} Y_{ij}² = Σ_i λ(Y^T Y)_i = Σ_i σ(Y)_i²    (36)

where λ(Y^T Y)_i is the i-th eigenvalue of Y^T Y, and σ(Y)_i the i-th singular value of Y. Were Y a normal matrix (§A.5.2), then σ(Y) = |λ(Y)| [303, §8.1] thus

    ‖Y‖_F² = Σ_i λ(Y)_i² = ‖λ(Y)‖_2²    (37)

The converse (37) ⇒ normal matrix Y also holds. [150, §2.5.4]

Because the metrics are equivalent

    ‖vec X − vec Y‖_2 = ‖X − Y‖_F    (38)

and because vectorization (30) is a linear bijective map, then vector space R^{p×k} is isometrically isomorphic with vector space R^{pk} in the Euclidean sense and vec is an isometric isomorphism on R^{p×k}.^2.12 Because of this Euclidean structure, all the known results from convex analysis in Euclidean space R^n carry over directly to the space of real matrices R^{p×k}.

^2.12 Given matrix A, its range R(A) (§2.5) is isometrically isomorphic with its vectorized range vec R(A) but not with R(vec A).
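A quick NumPy check of (36) and (38), illustrative only (any real matrix will do):

    import numpy as np

    rng = np.random.default_rng(2)
    Y = rng.standard_normal((4, 3))
    Z = rng.standard_normal((4, 3))

    fro2 = np.linalg.norm(Y, "fro") ** 2
    # (36): squared Frobenius norm as entry sum, eigenvalue sum of Y^T Y,
    # and squared singular-value sum.
    assert np.isclose(fro2, (Y ** 2).sum())
    assert np.isclose(fro2, np.linalg.eigvalsh(Y.T @ Y).sum())
    assert np.isclose(fro2, (np.linalg.svd(Y, compute_uv=False) ** 2).sum())

    # (38): vec is an isometry; ||vec X - vec Y||_2 = ||X - Y||_F.
    assert np.isclose(np.linalg.norm((Y - Z).reshape(-1)),
                      np.linalg.norm(Y - Z, "fro"))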


2.2.1.0.2 Definition. Isometric isomorphism.
An isometric isomorphism of a vector space having a metric defined on it is a linear bijective mapping T that preserves distance; id est, for all x, y ∈ dom T

    ‖Tx − Ty‖ = ‖x − y‖    (39)

Then the isometric isomorphism T is a bijective isometry. △

Unitary linear operator Q : R^n → R^n representing orthogonal matrix Q ∈ R^{n×n} (§B.5), for example, is an isometric isomorphism. Yet isometric operator T : R^2 → R^3 representing T = [1 0 ; 0 1 ; 0 0] on R^2 is injective but not a surjective map [167, §1.6] to R^3.

The Frobenius norm is orthogonally invariant; meaning, for X, Y ∈ R^{p×k} and dimensionally compatible orthonormal matrix^2.13 U and orthogonal matrix Q

    ‖U(X − Y)Q‖_F = ‖X − Y‖_F    (40)

2.2.2 Symmetric matrices

2.2.2.0.1 Definition. Symmetric matrix subspace.
Define a subspace of R^{M×M}: the convex set of all symmetric M×M matrices;

    S^M ≜ {A ∈ R^{M×M} | A = A^T} ⊆ R^{M×M}    (41)

This subspace comprising symmetric matrices S^M is isomorphic with the vector space R^{M(M+1)/2} whose dimension is the number of free variables in a symmetric M×M matrix. The orthogonal complement [251] [183] of S^M is

    S^{M⊥} ≜ {A ∈ R^{M×M} | A = −A^T} ⊂ R^{M×M}    (42)

the subspace of antisymmetric matrices in R^{M×M}; id est,

    S^M ⊕ S^{M⊥} = R^{M×M}    (43)

where unique vector sum ⊕ is defined on page 692. △

^2.13 Any matrix U whose columns are orthonormal with respect to each other (U^T U = I); these include the orthogonal matrices.


All antisymmetric matrices are hollow by definition (have 0 main diagonal). Any square matrix A ∈ R^{M×M} can be written as the sum of its symmetric and antisymmetric parts: respectively,

    A = (1/2)(A + A^T) + (1/2)(A − A^T)    (44)

The symmetric part is orthogonal in R^{M²} to the antisymmetric part; videlicet,

    tr((A^T + A)(A − A^T)) = 0    (45)

In the ambient space of real matrices, the antisymmetric matrix subspace can be described

    S^{M⊥} ≜ {(1/2)(A − A^T) | A ∈ R^{M×M}} ⊂ R^{M×M}    (46)

because any matrix in S^M is orthogonal to any matrix in S^{M⊥}. Further confined to the ambient subspace of symmetric matrices, because of antisymmetry, S^{M⊥} would become trivial.

2.2.2.1 Isomorphism on symmetric matrix subspace

When a matrix is symmetric in S^M, we may still employ the vectorization transformation (30) to R^{M²}; vec, an isometric isomorphism. We might instead choose to realize in the lower-dimensional subspace R^{M(M+1)/2} by ignoring redundant entries (below the main diagonal) during transformation. Such a realization would remain isomorphic but not isometric. Lack of isometry is a spatial distortion due now to disparity in metric between R^{M²} and R^{M(M+1)/2}. To realize isometrically in R^{M(M+1)/2}, we must make a correction: For Y = [Y_{ij}] ∈ S^M we introduce the symmetric vectorization

    svec Y ≜ [Y_11, √2 Y_12, Y_22, √2 Y_13, √2 Y_23, Y_33, ..., Y_MM]^T ∈ R^{M(M+1)/2}    (47)


where all entries off the main diagonal have been scaled. Now for Z ∈ S^M

    ⟨Y, Z⟩ ≜ tr(Y^T Z) = vec(Y)^T vec Z = svec(Y)^T svec Z    (48)

Then because the metrics become equivalent, for X ∈ S^M

    ‖svec X − svec Y‖_2 = ‖X − Y‖_F    (49)

and because symmetric vectorization (47) is a linear bijective mapping, then svec is an isometric isomorphism on the symmetric matrix subspace. In other words, S^M is isometrically isomorphic with R^{M(M+1)/2} in the Euclidean sense under transformation svec.

The set of all symmetric matrices S^M forms a proper subspace in R^{M×M}, so for it there exists a standard orthonormal basis in isometrically isomorphic R^{M(M+1)/2}

    {E_ij ∈ S^M} = { e_i e_i^T,  i = j = 1 ... M ;  (1/√2)(e_i e_j^T + e_j e_i^T),  1 ≤ i < j ≤ M }    (50)

where M(M+1)/2 standard basis matrices E_ij are formed from the standard basis vectors

    e_i = [ 1 if i = j, 0 if i ≠ j ;  j = 1 ... M ] ∈ R^M    (51)

Thus we have a basic orthogonal expansion for Y ∈ S^M

    Y = Σ_{j=1}^M Σ_{i=1}^j ⟨E_ij, Y⟩ E_ij    (52)

whose coefficients

    ⟨E_ij, Y⟩ = { Y_ii,  i = 1 ... M ;  Y_ij √2,  1 ≤ i < j ≤ M }    (53)

correspond to entries of the symmetric vectorization (47).
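A small NumPy sketch (illustrative, not from the text) implementing svec (47) and verifying (48), (49), and the basis expansion (52):

    import numpy as np

    def svec(Y):
        """Symmetric vectorization (47): columnwise upper triangle with
        off-diagonal entries scaled by sqrt(2)."""
        M = Y.shape[0]
        return np.array([Y[i, j] * (1.0 if i == j else np.sqrt(2.0))
                         for j in range(M) for i in range(j + 1)])

    rng = np.random.default_rng(3)
    M = 4
    A = rng.standard_normal((M, M)); X = (A + A.T) / 2
    B = rng.standard_normal((M, M)); Y = (B + B.T) / 2

    # (48): <X, Y> = tr(X^T Y) = svec(X)^T svec(Y)
    assert np.isclose(np.trace(X.T @ Y), svec(X) @ svec(Y))
    # (49): svec is an isometry on S^M
    assert np.isclose(np.linalg.norm(svec(X) - svec(Y)),
                      np.linalg.norm(X - Y, "fro"))

    # (50)-(52): orthonormal basis expansion Y = sum <E_ij, Y> E_ij
    I = np.eye(M)
    E = [np.outer(I[:, i], I[:, i]) if i == j
         else (np.outer(I[:, i], I[:, j]) + np.outer(I[:, j], I[:, i])) / np.sqrt(2.0)
         for j in range(M) for i in range(j + 1)]
    assert np.allclose(Y, sum(np.trace(Eij.T @ Y) * Eij for Eij in E))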


2.2.3 Symmetric hollow subspace

2.2.3.0.1 Definition. Hollow subspaces. [268]
Define a subspace of R^{M×M}: the convex set of all (real) symmetric M×M matrices having 0 main diagonal;

    R_h^{M×M} ≜ {A ∈ R^{M×M} | A = A^T, δ(A) = 0} ⊂ R^{M×M}    (54)

where the main diagonal of A ∈ R^{M×M} is denoted (§A.1)

    δ(A) ∈ R^M    (1246)

Operating on a vector, linear operator δ naturally returns a diagonal matrix; δ(δ(A)) is a diagonal matrix. Operating recursively on a vector Λ ∈ R^N or diagonal matrix Λ ∈ S^N, operator δ(δ(Λ)) returns Λ itself;

    δ²(Λ) ≡ δ(δ(Λ)) ≜ Λ    (1248)

The subspace R_h^{M×M} (54) comprising (real) symmetric hollow matrices is isomorphic with subspace R^{M(M−1)/2}. The orthogonal complement of R_h^{M×M} is

    R_h^{M×M ⊥} ≜ {A ∈ R^{M×M} | A = −A^T + 2δ²(A)} ⊆ R^{M×M}    (55)

the subspace of antisymmetric antihollow matrices in R^{M×M}; id est,

    R_h^{M×M} ⊕ R_h^{M×M ⊥} = R^{M×M}    (56)

Yet defined instead as a proper subspace of S^M

    S_h^M ≜ {A ∈ S^M | δ(A) = 0} ⊂ S^M    (57)

the orthogonal complement S_h^{M⊥} of S_h^M in ambient S^M

    S_h^{M⊥} ≜ {A ∈ S^M | A = δ²(A)} ⊆ S^M    (58)

is simply the subspace of diagonal matrices; id est,

    S_h^M ⊕ S_h^{M⊥} = S^M    (59)
△


Any matrix A ∈ R^{M×M} can be written as the sum of its symmetric hollow and antisymmetric antihollow parts: respectively,

    A = ((1/2)(A + A^T) − δ²(A)) + ((1/2)(A − A^T) + δ²(A))    (60)

The symmetric hollow part is orthogonal in R^{M²} to the antisymmetric antihollow part; videlicet,

    tr( ((1/2)(A + A^T) − δ²(A)) ((1/2)(A − A^T) + δ²(A)) ) = 0    (61)

In the ambient space of real matrices, the antisymmetric antihollow subspace is described

    S_h^{M⊥} ≜ {(1/2)(A − A^T) + δ²(A) | A ∈ R^{M×M}} ⊆ R^{M×M}    (62)

because any matrix in S_h^M is orthogonal to any matrix in S_h^{M⊥}. Yet in the ambient space of symmetric matrices S^M, the antihollow subspace is nontrivial;

    S_h^{M⊥} ≜ {δ²(A) | A ∈ S^M} = {δ(u) | u ∈ R^M} ⊆ S^M    (63)

In anticipation of their utility with Euclidean distance matrices (EDMs) in §5, for symmetric hollow matrices we introduce the linear bijective vectorization dvec that is the natural analogue to symmetric matrix vectorization svec (47): for Y = [Y_{ij}] ∈ S_h^M

    dvec Y ≜ √2 [Y_12, Y_13, Y_23, Y_14, Y_24, Y_34, ..., Y_{M−1,M}]^T ∈ R^{M(M−1)/2}    (64)

Like svec (47), dvec is an isometric isomorphism on the symmetric hollow subspace.
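A companion NumPy sketch for the hollow decomposition (60), its orthogonality (61), and dvec (64); names and dimensions are illustrative only:

    import numpy as np

    def dvec(Y):
        """Hollow vectorization (64): strict upper triangle, columnwise,
        scaled by sqrt(2)."""
        M = Y.shape[0]
        return np.sqrt(2.0) * np.array([Y[i, j]
                                        for j in range(1, M) for i in range(j)])

    rng = np.random.default_rng(4)
    A = rng.standard_normal((4, 4))

    # (60): symmetric hollow part plus antisymmetric antihollow part
    sym_hollow = (A + A.T) / 2 - np.diag(np.diag(A))
    anti_antihollow = (A - A.T) / 2 + np.diag(np.diag(A))
    assert np.allclose(A, sym_hollow + anti_antihollow)

    # (61): the two parts are orthogonal under the trace inner-product
    assert np.isclose(np.trace(sym_hollow.T @ anti_antihollow), 0.0)

    # dvec is an isometry on the symmetric hollow subspace
    assert np.isclose(np.linalg.norm(dvec(sym_hollow)),
                      np.linalg.norm(sym_hollow, "fro"))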


[Figure 14: Convex hull of a random list of points in R^3. Some points from that generating list reside in the interior of this convex polyhedron. [284, Convex Polyhedron] (Avis-Fukuda-Mizukoshi)]

The set of all symmetric hollow matrices S_h^M forms a proper subspace in R^{M×M}, so for it there must be a standard orthonormal basis in isometrically isomorphic R^{M(M−1)/2}

    {E_ij ∈ S_h^M} = { (1/√2)(e_i e_j^T + e_j e_i^T),  1 ≤ i < j ≤ M }    (65)

where M(M−1)/2 standard basis matrices E_ij are formed from the standard basis vectors e_i ∈ R^M.

The symmetric hollow majorization corollary on page 498 characterizes eigenvalues of symmetric hollow matrices.

2.3 Hulls

2.3.1 Affine hull, affine dimension

Affine dimension of any set in R^n is the dimension of the smallest affine set (empty set, point, line, plane, hyperplane (§2.4.2), subspace, R^n) that contains it. For nonempty sets, affine dimension is the same as dimension of the subspace parallel to that affine set. [232, §1] [148, §A.2.1]

Ascribe the points in a list {x_l ∈ R^n, l = 1 ... N} to the columns of matrix X:

    X = [x_1 · · · x_N] ∈ R^{n×N}    (66)


In particular, we define affine dimension r of the N-point list X as dimension of the smallest affine set in Euclidean space R^n that contains X;

    r ≜ dim aff X    (67)

Affine dimension r is a lower bound sometimes called embedding dimension. [268] [134] That affine set A in which those points are embedded is unique and called the affine hull [46, §2.1.2] [249, §2.1];

    A ≜ aff{x_l ∈ R^n, l = 1 ... N} = aff X = x_1 + R{x_l − x_1, l = 2 ... N} = {Xa | a^T 1 = 1} ⊆ R^n    (68)

parallel to subspace

    R{x_l − x_1, l = 2 ... N} = R(X − x_1 1^T) ⊆ R^n    (69)

where

    R(A) = {Ax | ∀x}    (121)

Given some arbitrary set C and any x ∈ C

    aff C = x + aff(C − x)    (70)

where aff(C − x) is a subspace.

2.3.1.0.1 Definition. Affine subset.
We analogize affine subset to subspace,^2.14 defining it to be any nonempty affine set (§2.1.4). △

    aff ∅ ≜ ∅    (71)

The affine hull of a point x is that point itself;

    aff{x} = {x}    (72)

The affine hull of two distinct points is the unique line through them. (Figure 15) The affine hull of three noncollinear points in any dimension is that unique plane containing the points, and so on. The subspace of symmetric matrices S^m is the affine hull of the cone of positive semidefinite matrices; (§2.9)

    aff S^m_+ = S^m    (73)

^2.14 The popular term affine subspace is an oxymoron.
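Affine dimension (67) is readily computed via (69) as a matrix rank. A NumPy sketch under the stated construction (the plane and sample sizes are hypothetical):

    import numpy as np

    rng = np.random.default_rng(5)
    # N = 5 points in R^3 drawn from a 2-dimensional affine set (a plane):
    # each column of X is B u + offset for some u in R^2.
    B = rng.standard_normal((3, 2))
    X = B @ rng.standard_normal((2, 5)) + rng.standard_normal((3, 1))

    # (67) via (69): r = dim R(X - x_1 1^T) = rank(X - x_1 1^T)
    r = np.linalg.matrix_rank(X - X[:, [0]])
    print(r)    # 2, with probability one for this construction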


[Figure 15: Given two points in Euclidean vector space of any dimension, their various hulls are illustrated: A affine hull (drawn truncated), C convex hull, K conic hull (truncated), R range or span is a plane (truncated). Each hull is a subset of range; generally, A, C, K ⊆ R. (Cartesian axes drawn for reference.)]


2.3.1.0.2 Example. Affine hull of rank-1 correlation matrices. [160]
The set of all m×m rank-1 correlation matrices is defined by all the binary vectors y in R^m (confer §5.9.1.0.1)

    {yy^T ∈ S^m_+ | δ(yy^T) = 1}    (74)

Affine hull of the rank-1 correlation matrices is equal to the set of normalized symmetric matrices; id est,

    aff{yy^T ∈ S^m_+ | δ(yy^T) = 1} = {A ∈ S^m | δ(A) = 1}    (75)

2.3.1.0.3 Exercise. Affine hull of correlation matrices.
Prove (75) via definition of affine hull. Find the convex hull instead.

2.3.1.1 Comparison with respect to R^N_+ and S^M_+

The notation a ≽ 0 means vector a belongs to the nonnegative orthant R^N_+, whereas a ≽ b denotes comparison of vector a to vector b on R^N with respect to the nonnegative orthant; id est, a ≽ b means a − b belongs to the nonnegative orthant, but neither a nor b necessarily belongs to that orthant. With particular respect to the nonnegative orthant, a ≽ b ⇔ a_i ≥ b_i ∀i (321). More generally, a ≽_K b denotes comparison with respect to pointed closed convex cone K. (§2.7.2.2)

The symbol ≥ is reserved for scalar comparison on the real line R with respect to the nonnegative real line R_+ as in a^T y ≥ b. Comparison of matrices with respect to the positive semidefinite cone S^M_+, like I ≽ A ≽ 0 in Example 2.3.2.0.1, is explained in §2.9.0.1.

2.3.2 Convex hull

The convex hull [148, §A.1.4] [46, §2.1.4] [232] of any bounded^2.15 list (or set) of N points X ∈ R^{n×N} forms a unique convex polyhedron (§2.12.0.0.1) whose vertices constitute some subset of that list;

    P ≜ conv{x_l, l = 1 ... N} = conv X = {Xa | a^T 1 = 1, a ≽ 0} ⊆ R^n    (76)

^2.15 A set in R^n is bounded if and only if it can be contained in a Euclidean ball having finite radius. [77, §2.2] (confer §5.7.3.0.1)


The union of relative interior and relative boundary (§2.6.1.3) of the polyhedron comprise the convex hull P, the smallest closed convex set that contains the list X; e.g., Figure 14. Given P, the generating list {x_l} is not unique.

Given some arbitrary set C ⊆ R^n, its convex hull conv C is equivalent to the smallest closed convex set containing it. (confer §2.4.1.1.1) The convex hull is a subset of the affine hull;

    conv C ⊆ aff C = aff C̄ = aff conv C    (77)

Any closed bounded convex set C is equal to the convex hull of its boundary;

    C = conv ∂C    (78)

    conv ∅ ≜ ∅    (79)

2.3.2.0.1 Example. Hull of outer product. [214] [9, §4.1] [218, §3] [176, §2.4]
Convex hull of the set comprising outer product of orthonormal matrices has equivalent expression: for 1 ≤ k ≤ N (§2.9.0.1)

    conv{UU^T | U ∈ R^{N×k}, U^T U = I} = {A ∈ S^N | I ≽ A ≽ 0, ⟨I, A⟩ = k} ⊂ S^N_+    (80)

This important convex body we call Fantope (after mathematician Ky Fan). In case k = 1, there is slight simplification: ((1429), Example 2.9.2.4.1)

    conv{UU^T | U ∈ R^N, U^T U = I} = {A ∈ S^N | A ≽ 0, ⟨I, A⟩ = 1}    (81)

In case k = N, the Fantope is identity matrix I. More generally, the set

    {UU^T | U ∈ R^{N×k}, U^T U = I}    (82)

comprises the extreme points (§2.6.0.0.1) of its convex hull. By (1294), each and every extreme point UU^T has only k nonzero eigenvalues λ and they all equal 1; id est, λ(UU^T)_{1:k} = λ(U^T U) = 1. So the Frobenius norm of each and every extreme point equals the same constant

    ‖UU^T‖_F² = k    (83)

Each extreme point simultaneously lies on the boundary of the positive semidefinite cone (when k < N, §2.9) and on the boundary of a hypersphere


[Figure 16: Two Fantopes. Circle (radius 1/√2), shown here on boundary of positive semidefinite cone S^2_+ in isometrically isomorphic R^3 from Figure 33, comprises boundary of a Fantope (80) in this dimension (k = 1, N = 2). Lone point illustrated is identity matrix I and is that Fantope corresponding to k = 2, N = 2. (View is from inside PSD cone looking toward origin.)]


of dimension k(N − (k+1)/2) and radius √(k(1 − k/N)) centered at (k/N)I along the ray (base 0) through the identity matrix I in isomorphic vector space R^{N(N+1)/2} (§2.2.2.1).

Figure 16 illustrates extreme points (82) comprising the boundary of a Fantope, the boundary of a disc corresponding to k = 1, N = 2; but that circumscription does not hold in higher dimension. (§2.9.2.5)

2.3.3 Conic hull

In terms of a finite-length point list (or set) arranged columnar in X ∈ R^{n×N} (66), its conic hull is expressed

    K ≜ cone{x_l, l = 1 ... N} = cone X = {Xa | a ≽ 0} ⊆ R^n    (84)

id est, every nonnegative combination of points from the list. The conic hull of any finite-length list forms a polyhedral cone [148, §A.4.3] (§2.12.1.0.1; e.g., Figure 17); the smallest closed convex cone that contains the list. By convention, the aberration [249, §2.1]

    cone ∅ ≜ {0}    (85)

Given some arbitrary set C, it is apparent

    conv C ⊆ cone C    (86)

2.3.4 Vertex-description

The conditions in (68), (76), and (84) respectively define an affine combination, convex combination, and conic combination of elements from the set or list. Whenever a Euclidean body can be described as some hull or span of a set of points, then that representation is loosely called a vertex-description.

2.4 Halfspace, Hyperplane

A two-dimensional affine subset is called a plane. An (n−1)-dimensional affine subset in R^n is called a hyperplane. [232] [148] Every hyperplane partially bounds a halfspace (which is convex but not affine).


[Figure 17: A simplicial cone (§2.12.3.1.1) in R^3 whose boundary is drawn truncated; constructed using A ∈ R^{3×3} and C = 0 in (247). By the most fundamental definition of a cone (§2.7.1), entire boundary can be constructed from an aggregate of rays emanating exclusively from the origin. Each of three extreme directions corresponds to an edge (§2.6.0.0.3); they are conically, affinely, and linearly independent for this cone. Because this set is polyhedral, exposed directions are in one-to-one correspondence with extreme directions; there are only three. Its extreme directions give rise to what is called a vertex-description of this polyhedral cone; simply, the conic hull of extreme directions. Obviously this cone can also be constructed by intersection of three halfspaces; hence the equivalent halfspace-description.]


[Figure 18: Hyperplane illustrated ∂H = {y | a^T(y − y_p) = 0} is a line partially bounding halfspaces H_− = {y | a^T(y − y_p) ≤ 0} and H_+ = {y | a^T(y − y_p) ≥ 0} in R^2. Shaded is a rectangular piece of semi-infinite H_− with respect to which vector a is outward-normal to bounding hyperplane; vector a is inward-normal with respect to H_+. Halfspace H_− contains nullspace N(a^T) (dashed line through origin) because a^T y_p > 0. Hyperplane, halfspace, and nullspace are each drawn truncated. Points c and d are equidistant from hyperplane, and vector c − d is normal to it. ∆ is distance from origin to hyperplane.]


2.4.1 Halfspaces H_+ and H_−

Euclidean space R^n is partitioned into two halfspaces by any hyperplane ∂H; id est, H_− + H_+ = R^n. The resulting (closed convex) halfspaces, both partially bounded by ∂H, may be described

    H_− = {y | a^T y ≤ b} = {y | a^T(y − y_p) ≤ 0} ⊂ R^n    (87)
    H_+ = {y | a^T y ≥ b} = {y | a^T(y − y_p) ≥ 0} ⊂ R^n    (88)

where nonzero vector a ∈ R^n is an outward-normal to the hyperplane partially bounding H_− while an inward-normal with respect to H_+. For any vector y − y_p that makes an obtuse angle with normal a, vector y will lie in the halfspace H_− on one side (shaded in Figure 18) of the hyperplane while acute angles denote y in H_+ on the other side.

An equivalent more intuitive representation of a halfspace comes about when we consider all the points in R^n closer to point d than to point c or equidistant, in the Euclidean sense; from Figure 18,

    H_− = {y | ‖y − d‖ ≤ ‖y − c‖}    (89)

This representation, in terms of proximity, is resolved with the more conventional representation of a halfspace (87) by squaring both sides of the inequality in (89);

    H_− = { y | (c − d)^T y ≤ (‖c‖² − ‖d‖²)/2 } = { y | (c − d)^T (y − (c + d)/2) ≤ 0 }    (90)

2.4.1.1 PRINCIPLE 1: Halfspace-description of convex sets

The most fundamental principle in convex geometry follows from the geometric Hahn-Banach theorem [183, §5.12] [16, §1] [88, §I.1.2] which guarantees any closed convex set to be an intersection of halfspaces.

2.4.1.1.1 Theorem. Halfspaces. [46, §2.3.1] [232, §18] [148, §A.4.2(b)] [30, §2.4]
A closed convex set in R^n is equivalent to the intersection of all halfspaces that contain it. ⋄
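The equivalence of the proximity representation (89) with the conventional representation (90) can be checked numerically. A NumPy sketch (points c, d and the sampling are hypothetical):

    import numpy as np

    rng = np.random.default_rng(6)
    c, d = rng.standard_normal(2), rng.standard_normal(2)

    # (89) and (90) describe the same halfspace H-: membership tests agree.
    for _ in range(1000):
        y = 3.0 * rng.standard_normal(2)
        closer_to_d = np.linalg.norm(y - d) <= np.linalg.norm(y - c)   # (89)
        halfspace = (c - d) @ y <= (c @ c - d @ d) / 2                 # (90)
        assert closer_to_d == halfspace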


Intersection of multiple halfspaces in R^n may be represented using a matrix constant A;

    ∩_i H_i− = {y | A^T y ≼ b} = {y | A^T(y − y_p) ≼ 0}    (91)
    ∩_i H_i+ = {y | A^T y ≽ b} = {y | A^T(y − y_p) ≽ 0}    (92)

where b is now a vector, and the i-th column of A is normal to a hyperplane ∂H_i partially bounding H_i. By the halfspaces theorem, intersections like this can describe interesting convex Euclidean bodies such as polyhedra and cones, giving rise to the term halfspace-description.

2.4.2 Hyperplane ∂H representations

Every hyperplane ∂H is an affine set parallel to an (n−1)-dimensional subspace of R^n; it is itself a subspace if and only if it contains the origin.

    dim ∂H = n − 1    (93)

so a hyperplane is a point in R, a line in R^2, a plane in R^3, and so on. Every hyperplane can be described as the intersection of complementary halfspaces; [232, §19]

    ∂H = H_− ∩ H_+ = {y | a^T y ≤ b, a^T y ≥ b} = {y | a^T y = b}    (94)

a halfspace-description. Assuming normal a ∈ R^n to be nonzero, then any hyperplane in R^n can be described as the solution set to vector equation a^T y = b (illustrated in Figure 18 and Figure 19 for R^2)

    ∂H ≜ {y | a^T y = b} = {y | a^T(y − y_p) = 0} = {Zξ + y_p | ξ ∈ R^{n−1}} ⊂ R^n    (95)

All solutions y constituting the hyperplane are offset from the nullspace of a^T by the same constant vector y_p ∈ R^n that is any particular solution to a^T y = b; id est,

    y = Zξ + y_p    (96)

where the columns of Z ∈ R^{n×n−1} constitute a basis for the nullspace N(a^T) = {x ∈ R^n | a^T x = 0}.^2.16

^2.16 We will later find this expression for y in terms of nullspace of a^T (more generally, of matrix A^T (123)) to be a useful device for eliminating affine equality constraints, much as we did here.


[Figure 19: (a)-(d) Hyperplanes in R^2 (truncated), e.g., {y | a^T y = 1} and {y | a^T y = −1} for normals a = [1 ; 1], b = [−1 ; −1], c = [−1 ; 1], d = [1 ; −1], and e = [1 ; 0]. Movement in normal direction increases vector inner-product. This visual concept is exploited to attain analytical solution of linear programs; e.g., Example 2.4.2.6.2, Example 3.1.6.0.1, [46, exer.4.8-exer.4.20]. Each graph is also interpretable as a contour plot of a real affine function of two variables as in Figure 57. (e) Ratio |b|/‖a‖ from {x | a^T x = b} represents radius of hypersphere about 0 supported by hyperplane whose normal is a.]


Conversely, given any point y_p in R^n, the unique hyperplane containing it having normal a is the affine set ∂H (95) where b equals a^T y_p and where a basis for N(a^T) is arranged in Z columnar. Hyperplane dimension is apparent from the dimensions of Z; that hyperplane is parallel to the span of its columns.

2.4.2.0.1 Exercise. Hyperplane scaling.
Given normal y, draw a hyperplane {x ∈ R^2 | x^T y = 1}. Suppose z = (1/2)y. On the same plot, draw the hyperplane {x ∈ R^2 | x^T z = 1}. Now suppose z = 2y, then draw the last hyperplane again with this new z. What is the apparent effect of scaling normal y?

2.4.2.0.2 Example. Distance from origin to hyperplane.
Given the (shortest) distance ∆ ∈ R_+ from the origin to a hyperplane having normal vector a, we can find its representation ∂H by dropping a perpendicular. The point thus found is the orthogonal projection of the origin on ∂H (§E.5.0.0.5), equal to a∆/‖a‖ if the origin is known a priori to belong to halfspace H_− (Figure 18), or equal to −a∆/‖a‖ if the origin belongs to halfspace H_+; id est, when H_− ∋ 0

    ∂H = {y | a^T(y − a∆/‖a‖) = 0} = {y | a^T y = ‖a‖∆}    (97)

or when H_+ ∋ 0

    ∂H = {y | a^T(y + a∆/‖a‖) = 0} = {y | a^T y = −‖a‖∆}    (98)

Knowledge of only distance ∆ and normal a thus introduces ambiguity into the hyperplane representation.

2.4.2.1 Matrix variable

Any halfspace in R^{mn} may be represented using a matrix variable. For variable Y ∈ R^{m×n}, given constants A ∈ R^{m×n} and b = ⟨A, Y_p⟩ ∈ R,

    H_− = {Y ∈ R^{mn} | ⟨A, Y⟩ ≤ b} = {Y ∈ R^{mn} | ⟨A, Y − Y_p⟩ ≤ 0}    (99)
    H_+ = {Y ∈ R^{mn} | ⟨A, Y⟩ ≥ b} = {Y ∈ R^{mn} | ⟨A, Y − Y_p⟩ ≥ 0}    (100)

Recall vector inner-product from §2.2, ⟨A, Y⟩ = tr(A^T Y).
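Returning to Example 2.4.2.0.2, the foot of the perpendicular in (97) is trivial to verify numerically; a minimal NumPy sketch (the normal and distance below are arbitrary, chosen only for illustration):

    import numpy as np

    a = np.array([3.0, 4.0])     # hyperplane normal
    delta = 2.0                  # given distance from origin

    # Orthogonal projection of the origin on dH when 0 belongs to H-:
    y = a * delta / np.linalg.norm(a)
    # y satisfies a^T y = ||a|| * delta, so it lies on dH per (97) ...
    assert np.isclose(a @ y, np.linalg.norm(a) * delta)
    # ... and it is exactly delta from the origin.
    assert np.isclose(np.linalg.norm(y), delta)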


Hyperplanes in R^{mn} may, of course, also be represented using matrix variables.

    ∂H = {Y | ⟨A, Y⟩ = b} = {Y | ⟨A, Y − Y_p⟩ = 0} ⊂ R^{mn}    (101)

Vector a from Figure 18 is normal to the hyperplane illustrated. Likewise, nonzero vectorized matrix A is normal to hyperplane ∂H;

    A ⊥ ∂H in R^{mn}    (102)

2.4.2.2 Vertex-description of hyperplane

Any hyperplane in R^n may be described as the affine hull of a minimal set of points {x_l ∈ R^n, l = 1 ... n} arranged columnar in a matrix X ∈ R^{n×n} (66):

    ∂H = aff{x_l ∈ R^n, l = 1 ... n},          dim aff{x_l ∀l} = n−1
       = aff X,                                 dim aff X = n−1
       = x_1 + R{x_l − x_1, l = 2 ... n},       dim R{x_l − x_1, l = 2 ... n} = n−1
       = x_1 + R(X − x_1 1^T),                  dim R(X − x_1 1^T) = n−1    (103)

where

    R(A) = {Ax | ∀x}    (121)

2.4.2.3 Affine independence, minimal set

For any particular affine set, a minimal set of points constituting its vertex-description is an affinely independent descriptive set and vice versa. Arbitrary given points {x_i ∈ R^n, i = 1 ... N} are affinely independent (a.i.) if and only if, over all ζ ∈ R^N with ζ^T 1 = 1, ζ_k = 0 (confer §2.1.2)

    x_i ζ_i + · · · + x_j ζ_j − x_k = 0,    i ≠ · · · ≠ j ≠ k = 1 ... N    (104)

has no solution ζ; in words, iff no point from the given set can be expressed as an affine combination of those remaining. We deduce

    l.i. ⇒ a.i.    (105)

Consequently, {x_i, i = 1 ... N} is an affinely independent set if and only if {x_i − x_1, i = 2 ... N} is a linearly independent (l.i.) set. [153, §3] (Figure 20) This is equivalent to the property that the columns of [X ; 1^T] (for X ∈ R^{n×N} as in (66)) form a linearly independent set. [148, §A.1.3]


[Figure 20: Any one particular point of three points illustrated does not belong to affine hull A_i (i ∈ 1, 2, 3, each drawn truncated) of points remaining. Three corresponding vectors in R^2 are, therefore, affinely independent (but neither linearly nor conically independent).]

2.4.2.4 Preservation of affine independence

Independence in the linear (§2.1.2.1), affine, and conic (§2.10.1) senses can be preserved under linear transformation. Suppose a matrix X ∈ R^{n×N} (66) holds an affinely independent set in its columns. Consider a transformation

    T(X) : R^{n×N} → R^{n×N} ≜ XY    (106)

where the given matrix Y ≜ [y_1 y_2 · · · y_N] ∈ R^{N×N} is represented by linear operator T. Affine independence of {Xy_i ∈ R^n, i = 1 ... N} demands (by definition (104)) there exists no solution ζ ∈ R^N with ζ^T 1 = 1, ζ_k = 0, to

    Xy_i ζ_i + · · · + Xy_j ζ_j − Xy_k = 0,    i ≠ · · · ≠ j ≠ k = 1 ... N    (107)

By factoring X, we see that this is ensured by affine independence of {y_i ∈ R^N} and by R(Y) ∩ N(X) = 0 where

    N(A) = {x | Ax = 0}    (122)


[Figure 21: Each shaded line segment {z ∈ C | a^T z = κ_i} belonging to set C ⊂ R^2 shows intersection with hyperplane parametrized by scalar κ_i, where 0 > κ_3 > κ_2 > κ_1; each shows a (linear) contour in vector z of equal inner product with normal vector a. Cartesian axes drawn for reference. (confer Figure 57)]

2.4.2.5 affine maps

Affine transformations preserve affine hulls. Given any affine mapping T of vector spaces and some arbitrary set C [232, p.8]

    aff(T C) = T(aff C)    (108)

2.4.2.6 PRINCIPLE 2: Supporting hyperplane

The second most fundamental principle of convex geometry also follows from the geometric Hahn-Banach theorem [183, §5.12] [16, §1] that guarantees existence of at least one hyperplane in R^n supporting a convex set (having


[Figure 22: (a) Hyperplane ∂H_− (109) supporting closed set Y ∈ R^2. Vector a is inward-normal to hyperplane with respect to halfspace H_+, but outward-normal with respect to set Y. A supporting hyperplane can be considered the limit of an increasing sequence in the normal-direction like that in Figure 21. (b) Hyperplane ∂H_+ nontraditionally supporting Y. Vector ã is inward-normal to hyperplane now with respect to both halfspace H_+ and set Y. Tradition [148] [232] recognizes only positive normal polarity in support function σ_Y as in (109); id est, normal a, figure (a). But both interpretations of supporting hyperplane are useful.]


nonempty interior)^2.17 at each point on its boundary.

2.4.2.6.1 Definition. Supporting hyperplane ∂H.
The partial boundary ∂H of a closed halfspace that contains arbitrary set Y is called a supporting hyperplane ∂H to Y when the hyperplane contains at least one point of Y. [232, §11] Specifically, given normal a ≠ 0 (belonging to H_+ by definition), the supporting hyperplane to Y at y_p ∈ ∂Y [sic] is

    ∂H_− = {y | a^T(y − y_p) = 0, y_p ∈ Y, a^T(z − y_p) ≤ 0 ∀z ∈ Y}
         = {y | a^T y = sup{a^T z | z ∈ Y}}    (109)

where normal a and set Y reside in opposite halfspaces. (Figure 22(a)) Real function

    σ_Y(a) ≜ sup{a^T z | z ∈ Y}    (459)

is called the support function for Y.

An equivalent but nontraditional representation^2.18 for a supporting hyperplane is obtained by reversing polarity of normal a; (1485)

    ∂H_+ = {y | ã^T(y − y_p) = 0, y_p ∈ Y, ã^T(z − y_p) ≥ 0 ∀z ∈ Y}
         = {y | ã^T y = −inf{ã^T z | z ∈ Y} = sup{−ã^T z | z ∈ Y}}    (110)

where normal ã and set Y now both reside in H_+. (Figure 22(b))

When a supporting hyperplane contains only a single point of Y, that hyperplane is termed strictly supporting (and termed tangent to Y if the supporting hyperplane is unique there [232, §18, p.169]). △

A closed convex set C ⊂ R^n, for example, can be expressed as the intersection of all halfspaces partially bounded by hyperplanes supporting it; videlicet, [183, p.135]

    C = ∩_{a ∈ R^n} {y | a^T y ≤ σ_C(a)}    (111)

by the halfspaces theorem (§2.4.1.1.1).

^2.17 It is conventional to speak of a hyperplane supporting set C but not containing C; called nontrivial support. [232, p.100] Hyperplanes in support of lower-dimensional bodies are admitted.
^2.18 useful for constructing the dual cone; e.g., Figure 44(b). Tradition recognizes the polar cone; which is the negative of the dual cone.


There is no geometric difference^2.19 between supporting hyperplane ∂H_+ or ∂H_− and an ordinary hyperplane ∂H coincident with them.

2.4.2.6.2 Example. Minimization on the unit cube.
Consider minimization of a linear function on a hypercube, given vector c

    minimize_x   c^T x
    subject to   −1 ≼ x ≼ 1    (112)

This convex optimization problem is called a linear program^2.20 because the objective of minimization is linear and the constraints describe a polyhedron (intersection of a finite number of halfspaces and hyperplanes). Applying graphical concepts from Figure 19, Figure 21, and Figure 22, an optimal solution can be shown to be x⋆ = −sgn(c) but is not necessarily unique. Because an optimal solution always exists at a hypercube vertex (§2.6.1.0.1) regardless of the value of nonzero vector c [64], mathematicians see this geometry as a means to relax a discrete problem (whose desired solution is integer, confer Example 4.2.3.1.1). [176, §3.1]

2.4.2.6.3 Exercise. Unbounded below.
Suppose instead we minimize over the unit hypersphere in Example 2.4.2.6.2; ‖x‖ ≤ 1. What is an expression for optimal solution now? Is that program still linear?

Now suppose we instead minimize absolute value in (112). Are the following programs equivalent for some arbitrary real convex set C? (confer (434))

    minimize_{x ∈ R}  |x|                  minimize_{x+, x−}  x_+ + x_−
    subject to  −1 ≤ x ≤ 1        ≡        subject to  1 ≥ x_− ≥ 0
                x ∈ C                                  1 ≥ x_+ ≥ 0
                                                       x_+ − x_− ∈ C    (113)

Many optimization problems of interest and some older methods of solution require nonnegative variables.

^2.19 If vector-normal polarity is unimportant, we may instead signify a supporting hyperplane by ∂H.
^2.20 The term program has its roots in economics. It was originally meant with regard to a plan or to efficient organization of some production process. [64, §2]
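The closed-form optimum x⋆ = −sgn(c) of the linear program (112) is easy to corroborate with an off-the-shelf solver. A sketch assuming Python with SciPy available (the dimension, seed, and solver choice are illustrative only):

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(7)
    c = rng.standard_normal(4)      # entries nonzero with probability one

    # (112): minimize c^T x subject to -1 <= x <= 1
    res = linprog(c, bounds=[(-1, 1)] * 4)
    assert res.status == 0
    assert np.allclose(res.x, -np.sign(c))   # optimal vertex x* = -sgn(c)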


The method illustrated below splits a variable into its nonnegative and negative parts; x = x_+ − x_− (extensible to vectors). Under what conditions on vector a and scalar b is an optimal solution x⋆ negative infinity?

    minimize_{x+ ∈ R, x− ∈ R}   x_+ − x_−
    subject to   x_− ≥ 0
                 x_+ ≥ 0
                 a^T [x_+ ; x_−] = b    (114)

Minimization of the objective function^2.21 entails maximization of x_−.

2.4.2.7 PRINCIPLE 3: Separating hyperplane

The third most fundamental principle of convex geometry again follows from the geometric Hahn-Banach theorem [183, §5.12] [16, §1] [88, §I.1.2] that guarantees existence of a hyperplane separating two nonempty convex sets in R^n whose relative interiors are nonintersecting. Separation intuitively means each set belongs to a halfspace on an opposing side of the hyperplane. There are two cases of interest:

1) If the two sets intersect only at their relative boundaries (§2.6.1.3), then there exists a separating hyperplane ∂H containing the intersection but containing no points relatively interior to either set. If at least one of the two sets is open, conversely, then the existence of a separating hyperplane implies the two sets are nonintersecting. [46, §2.5.1]

2) A strictly separating hyperplane ∂H intersects the closure of neither set; its existence is guaranteed when the intersection of the closures is empty and at least one set is bounded. [148, §A.4.1]

2.4.3 Angle between hyperspaces

Given halfspace descriptions, the dihedral angle between hyperplanes and halfspaces is defined as the angle between their defining normals. Given

^2.21 The objective is the function that is argument to minimization or maximization.


normals a and b respectively describing ∂H_a and ∂H_b, for example

    ∠(∂H_a, ∂H_b) ≜ arccos( ⟨a, b⟩ / (‖a‖ ‖b‖) )  radians    (115)

2.5 Subspace representations

There are two common forms of expression for Euclidean subspaces, both coming from elementary linear algebra: range form and nullspace form; a.k.a. vertex-description and halfspace-description respectively.

The fundamental vector subspaces associated with a matrix A ∈ R^{m×n} [251, §3.1] are ordinarily related

    R(A^T) ⊥ N(A),    N(A^T) ⊥ R(A)    (116)

with complementarity

    R(A^T) ⊕ N(A) = R^n,    N(A^T) ⊕ R(A) = R^m    (117)

and of dimension:

    dim R(A^T) = dim R(A) = rank A ≤ min{m, n}    (118)
    dim N(A) = n − rank A,    dim N(A^T) = m − rank A    (119)

From these four fundamental subspaces, the rowspace and range identify one form of subspace description (range form or vertex-description (§2.3.4))

    R(A^T) ≜ span A^T = {A^T y | y ∈ R^m} = {x ∈ R^n | A^T y = x, y ∈ R(A)}    (120)
    R(A) ≜ span A = {Ax | x ∈ R^n} = {y ∈ R^m | Ax = y, x ∈ R(A^T)}    (121)

while the nullspaces identify the second common form (nullspace form or halfspace-description (94))

    N(A) ≜ {x ∈ R^n | Ax = 0}    (122)
    N(A^T) ≜ {y ∈ R^m | A^T y = 0}    (123)

Range forms (120) (121) are realized as the respective span of the column vectors in matrices A^T and A, whereas nullspace form (122) or (123) is the solution set to a linear equation similar to hyperplane definition (95). Yet because matrix A generally has multiple rows, halfspace-description N(A) is actually the intersection of as many hyperplanes through the origin; for (122), each row of A is normal to a hyperplane while each row of A^T is a normal for (123).
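The fundamental-subspace relations (116)-(119) can be demonstrated directly; a sketch assuming Python with NumPy and SciPy (shapes and seed are illustrative only):

    import numpy as np
    from scipy.linalg import null_space

    rng = np.random.default_rng(8)
    m, n = 3, 5
    A = rng.standard_normal((m, n))
    r = np.linalg.matrix_rank(A)

    Z = null_space(A)        # orthonormal basis for N(A)
    W = null_space(A.T)      # orthonormal basis for N(A^T)

    assert np.linalg.matrix_rank(A.T) == r                  # (118)
    assert Z.shape[1] == n - r and W.shape[1] == m - r      # (119)
    assert np.allclose(A @ Z, 0)    # rows of A, spanning R(A^T), are orthogonal to N(A)    (116)
    assert np.allclose(A.T @ W, 0)  # columns of A, spanning R(A), are orthogonal to N(A^T) (116)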


2.5.1 Subspace or affine subset ...

Any particular vector subspace R^p can be described as N(A), the nullspace of some matrix A, or as R(B), the range of some matrix B.

More generally, we have the choice of expressing an (n−m)-dimensional affine subset in R^n as the intersection of m hyperplanes, or as the offset span of n−m vectors:

2.5.1.1 ...as hyperplane intersection

Any affine subset A of dimension n−m can be described as an intersection of m hyperplanes in R^n; given fat (m ≤ n) full-rank (rank = min{m, n}) matrix

    A ≜ [a_1^T ; ... ; a_m^T] ∈ R^{m×n}    (124)

and vector b ∈ R^m,

    A ≜ {x ∈ R^n | Ax = b} = ∩_{i=1}^m {x | a_i^T x = b_i}    (125)

a halfspace-description. (94)

For example: The intersection of any two independent^2.22 hyperplanes in R^3 is a line, whereas three independent hyperplanes intersect at a point. In R^4, the intersection of two independent hyperplanes is a plane (Example 2.5.1.2.1), whereas three hyperplanes intersect at a line, four at a point, and so on.

For n > k

    A ∩ R^k = {x ∈ R^n | Ax = b} ∩ R^k = ∩_{i=1}^m {x ∈ R^k | a_i(1:k)^T x = b_i}    (126)

The result in §2.4.2.2 is extensible; id est, any affine subset A also has a vertex-description:

^2.22 Hyperplanes are said to be independent iff the normals defining them are linearly independent.


2.5.1.2 ...as span of nullspace basis

Alternatively, we may compute a basis for the nullspace of matrix A in (124) and then equivalently express the affine subset as its range plus an offset: Define

    Z ≜ basis N(A) ∈ R^{n×n−m}    (127)

so AZ = 0. Then we have the vertex-description,

    A = {x ∈ R^n | Ax = b} = {Zξ + x_p | ξ ∈ R^{n−m}} ⊆ R^n    (128)

the offset span of n−m column vectors, where x_p is any particular solution to Ax = b.

2.5.1.2.1 Example. Intersecting planes in 4-space.
Two planes can intersect at a point in four-dimensional Euclidean vector space. It is easy to visualize intersection of two planes in three dimensions; a line can be formed. In four dimensions it is harder to visualize. So let's resort to the tools acquired.

Suppose an intersection of two hyperplanes in four dimensions is specified by a fat full-rank matrix A_1 ∈ R^{2×4} (m = 2, n = 4) as in (125):

    A_1 ≜ { x ∈ R^4 | [a_11 a_12 a_13 a_14 ; a_21 a_22 a_23 a_24] x = b_1 }    (129)

The nullspace of A_1 is two dimensional (from Z in (128)), so A_1 represents a plane in four dimensions. Similarly define a second plane in terms of A_2 ∈ R^{2×4}:

    A_2 ≜ { x ∈ R^4 | [a_31 a_32 a_33 a_34 ; a_41 a_42 a_43 a_44] x = b_2 }    (130)

If the two planes are independent (meaning any line in one is linearly independent of any line from the other), they will intersect at a point because [A_1 ; A_2] is then invertible;

    A_1 ∩ A_2 = { x ∈ R^4 | [A_1 ; A_2] x = [b_1 ; b_2] }    (131)
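A numerical rendition of Example 2.5.1.2.1, assuming NumPy (the random planes stand in for the generic coefficients a_ij):

    import numpy as np

    rng = np.random.default_rng(9)
    # Two planes in R^4, each a fat full-rank 2x4 system as in (129), (130)
    A1, A2 = rng.standard_normal((2, 4)), rng.standard_normal((2, 4))
    b1, b2 = rng.standard_normal(2), rng.standard_normal(2)

    A = np.vstack([A1, A2])
    assert np.linalg.matrix_rank(A) == 4    # independent planes: invertible

    # (131): the intersection is the single point solving the stacked system
    x = np.linalg.solve(A, np.concatenate([b1, b2]))
    assert np.allclose(A1 @ x, b1) and np.allclose(A2 @ x, b2)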


2.5.2 Intersection of subspaces

The intersection of nullspaces associated with two matrices A ∈ R^{m×n} and B ∈ R^{k×n} can be expressed most simply as

    N(A) ∩ N(B) = N([A ; B]) ≜ { x ∈ R^n | [A ; B] x = 0 }    (132)

the nullspace of their rowwise concatenation.

Suppose the columns of a matrix Z constitute a basis for N(A) while the columns of a matrix W constitute a basis for N(BZ). Then [110, §12.4.2]

    N(A) ∩ N(B) = R(ZW)    (133)

If each basis is orthonormal, then the columns of ZW constitute an orthonormal basis for the intersection.

In the particular circumstance A and B are each positive semidefinite [18, §6], or in the circumstance A and B are two linearly independent dyads (§B.1.1), then

    N(A) ∩ N(B) = N(A + B),    where A, B ∈ S^M_+  or  A + B = u_1 v_1^T + u_2 v_2^T (l.i.)    (134)

2.6 Extreme, Exposed

2.6.0.0.1 Definition. Extreme point.
An extreme point x_ε of a convex set C is a point, belonging to its closure C̄ [30, §3.3], that is not expressible as a convex combination of points in C distinct from x_ε; id est, for x_ε ∈ C and all x_1, x_2 ∈ C \ x_ε

    µ x_1 + (1 − µ) x_2 ≠ x_ε,    µ ∈ [0, 1]    (135)
△


In other words, x_ε is an extreme point of C if and only if x_ε is not a point relatively interior to any line segment in C. [270, §2.10]

Borwein & Lewis offer: [41, §4.1.6] An extreme point of a convex set C is a point x_ε in C whose relative complement C \ x_ε is convex.

The set consisting of a single point C = {x_ε} is itself an extreme point.

2.6.0.0.2 Theorem. Extreme existence. [232, §18.5.3] [20, §II.3.5]
A nonempty closed convex set containing no lines has at least one extreme point. ⋄

2.6.0.0.3 Definition. Face, edge. [148, §A.2.3]

A face F of convex set C is a convex subset F ⊆ C such that every closed line segment x_1 x_2 in C, having a relatively interior point (x ∈ rel int x_1 x_2) in F, has both endpoints in F. The zero-dimensional faces of C constitute its extreme points. The empty set and C itself are conventional faces of C. [232, §18]

All faces F are extreme sets by definition; id est, for F ⊆ C and all x_1, x_2 ∈ C \ F

    µ x_1 + (1 − µ) x_2 ∉ F,    µ ∈ [0, 1]    (136)

A one-dimensional face of a convex set is called an edge. △

Dimension of a face is the penultimate number of affinely independent points (§2.4.2.3) belonging to it;

    dim F = sup_ρ dim{x_2 − x_1, x_3 − x_1, ..., x_ρ − x_1 | x_i ∈ F, i = 1 ... ρ}    (137)

The point of intersection in C with a strictly supporting hyperplane identifies an extreme point, but not vice versa. The nonempty intersection of any supporting hyperplane with C identifies a face, in general, but not vice versa. To acquire a converse, the concept exposed face requires introduction:


[Figure 23: Closed convex set in R^2. Point A is exposed hence extreme; a classical vertex. Point B is extreme but not an exposed point. Point C is exposed and extreme; zero-dimensional exposure makes it a vertex. Point D is neither an exposed nor an extreme point although it belongs to a one-dimensional exposed face. [148, §A.2.4] [249, §3.6] Closed face AB is exposed; a facet. The arc is not a conventional face, yet it is composed entirely of extreme points. Union of all rotations of this entire set about its vertical edge produces another convex set in three dimensions having no edges; but that convex set produced by rotation about horizontal edge containing D has edges.]


2.6.1 Exposure

2.6.1.0.1 Definition. Exposed face, exposed point, vertex, facet. [148, §A.2.3, A.2.4]

F is an exposed face of an n-dimensional convex set C iff there is a supporting hyperplane ∂H to C such that

    F = C ∩ ∂H    (138)

Only faces of dimension −1 through n−1 can be exposed by a hyperplane.

An exposed point, the definition of vertex, is equivalent to a zero-dimensional exposed face; the point of intersection with a strictly supporting hyperplane.

A facet is an (n−1)-dimensional exposed face of an n-dimensional convex set C; in one-to-one correspondence with the (n−1)-dimensional faces.^2.23

{exposed points} = {extreme points}
{exposed faces} ⊆ {faces}
△

2.6.1.1 Density of exposed points

For any closed convex set C, its exposed points constitute a dense subset of its extreme points; [232, §18] [254] [249, §3.6, p.115] dense in the sense [284] that closure of that subset yields the set of extreme points.

For the convex set illustrated in Figure 23, point B cannot be exposed because it relatively bounds both the facet AB and the closed quarter circle, each bounding the set. Since B is not relatively interior to any line segment in the set, then B is an extreme point by definition. Point B may be regarded as the limit of some sequence of exposed points beginning at vertex C.

^2.23 This coincidence occurs simply because the facet's dimension is the same as the dimension of the supporting hyperplane exposing it.


2.6.1.2 Face transitivity and algebra

Faces of a convex set enjoy transitive relation. If F_1 is a face (an extreme set) of F_2 which in turn is a face of F_3, then it is always true that F_1 is a face of F_3. (The parallel statement for exposed faces is false. [232, §18]) For example, any extreme point of F_2 is an extreme point of F_3; in this example, F_2 could be a face exposed by a hyperplane supporting polyhedron F_3. [164, def.115/6, p.358] Yet it is erroneous to presume that a face, of dimension 1 or more, consists entirely of extreme points, nor is a face of dimension 2 or more entirely composed of edges, and so on.

For the polyhedron in R^3 from Figure 14, for example, the nonempty faces exposed by a hyperplane are the vertices, edges, and facets; there are no more. The zero-, one-, and two-dimensional faces are in one-to-one correspondence with the exposed faces in that example.

Define the smallest face F that contains some element G of a convex set C:

    F(C ∋ G)    (139)

videlicet, C ⊇ F(C ∋ G) ∋ G. An affine set has no faces except itself and the empty set. The smallest face that contains G of the intersection of convex set C with an affine set A [176, §2.4]

    F((C ∩ A) ∋ G) = F(C ∋ G) ∩ A    (140)

equals the intersection of A with the smallest face that contains G of set C.

2.6.1.3 Boundary

The classical definition of boundary of a set C presumes nonempty interior:

    ∂C = C̄ \ int C    (14)

More suitable for the study of convex sets is the relative boundary; defined [148, §A.2.1.2]

    rel ∂C = C̄ \ rel int C    (141)

the boundary relative to the affine hull of C, conventionally equivalent to:


2.6.1.3.1 Definition. Conventional boundary of convex set. [148, §C.3.1]
The relative boundary ∂C of a nonempty convex set C is the union of all the exposed faces of C. △

Equivalence of this definition to (141) comes about because it is conventionally presumed that any supporting hyperplane, central to the definition of exposure, does not contain C. [232, p.100]

Any face F of convex set C (that is not C itself) belongs to rel ∂C. (§2.8.2.1) In the exception when C is a single point {x}, (11)

    rel ∂{x} = {x} \ {x} = ∅,    x ∈ R^n    (142)

A bounded convex polyhedron (§2.12.0.0.1) having nonempty interior, for example, in R has a boundary constructed from two points, in R^2 from at least three line segments, in R^3 from convex polygons, while a convex polychoron (a bounded polyhedron in R^4 [284]) has a boundary constructed from three-dimensional convex polyhedra.

By Definition 2.6.1.3.1, an affine set has no relative boundary.

2.7 Cones

In optimization, convex cones achieve prominence because they generalize subspaces. Most compelling is the projection analogy: Projection on a subspace can be ascertained from projection on its orthogonal complement (§E), whereas projection on a closed convex cone can be determined from projection instead on its algebraic complement (§2.13, §E.9.2.1); called the polar cone.

2.7.0.0.1 Definition. Ray.
The one-dimensional set

    {ζΓ + B | ζ ≥ 0, Γ ≠ 0} ⊂ R^n    (143)

defines a halfline called a ray in nonzero direction Γ ∈ R^n having base B ∈ R^n. When B = 0, a ray is the conic hull of direction Γ; hence a convex cone. △

The conventional boundary of a single ray, base 0, in any dimension is the origin because that is the union of all exposed faces not containing the entire set. Its relative interior is the ray itself excluding the origin.


[Figure 24: (a) Two-dimensional nonconvex cone drawn truncated. Boundary of this cone is itself a cone. Each polar half is itself a convex cone. (b) This convex cone (drawn truncated) is a line through the origin in any dimension. It has no relative boundary, while its relative interior comprises entire line.]

[Figure 25: This nonconvex cone in R^2 is a pair of lines through the origin. [183, §2.4]]


[Figure 26: Boundary of a convex cone in R^2 is a nonconvex cone; a pair of rays emanating from the origin.]

[Figure 27: Nonconvex cone X drawn truncated in R^2. Boundary is also a cone. [183, §2.4] Cone exterior is convex cone.]


[Figure 28: Truncated nonconvex cone X = {x ∈ R^2 | x_1 ≥ x_2, x_1 x_2 ≥ 0}. Boundary is also a cone. [183, §2.4] Cartesian axes drawn for reference. Each half (about the origin) is itself a convex cone.]

2.7.1 Cone defined

A set X is called, simply, cone if and only if

    Γ ∈ X ⇒ ζΓ ∈ X̄ for all ζ ≥ 0    (144)

where X̄ denotes closure of cone X. An example of such a cone is the union of two opposing quadrants; e.g., X = {x ∈ R^2 | x_1 x_2 ≥ 0}, which is not convex. [282, §2.5] Similar examples are shown in Figure 24 and Figure 28.

All cones can be defined by an aggregate of rays emanating exclusively from the origin (but not all cones are convex). Hence all closed cones contain the origin and are unbounded, excepting the simplest cone {0}. The empty set ∅ is not a cone, but its conic hull is;

    cone ∅ ≜ {0}    (85)
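A tiny NumPy check that the opposing-quadrants set above is a cone per (144) yet fails convexity (the specific test points are illustrative):

    import numpy as np

    # X = {x in R^2 | x1 * x2 >= 0}, the union of two opposing quadrants
    in_X = lambda x: x[0] * x[1] >= 0

    # X is a cone: membership survives nonnegative scaling, as in (144)
    x = np.array([2.0, 3.0])
    assert all(in_X(z * x) for z in [0.0, 0.5, 1.0, 10.0])

    # ...but X is not convex: a convex combination can leave the set
    u, v = np.array([1.0, 2.0]), np.array([-2.0, -1.0])   # both in X
    assert in_X(u) and in_X(v)
    assert not in_X(0.5 * u + 0.5 * v)   # midpoint (-0.5, 0.5): x1*x2 < 0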


2.7.2 Convex cone

We call the set K ⊆ R^M a convex cone iff

    Γ_1, Γ_2 ∈ K ⇒ ζΓ_1 + ξΓ_2 ∈ K for all ζ, ξ ≥ 0    (145)

Apparent from this definition, ζΓ_1 ∈ K and ξΓ_2 ∈ K for all ζ, ξ ≥ 0. The set K is convex since, for any particular ζ, ξ ≥ 0

    µζΓ_1 + (1 − µ)ξΓ_2 ∈ K    ∀µ ∈ [0, 1]    (146)

because µζ, (1 − µ)ξ ≥ 0. Obviously,

    {X} ⊃ {K}    (147)

the set of all convex cones is a proper subset of all cones. The set of convex cones is a narrower but more familiar class of cone, any member of which can be equivalently described as the intersection of a possibly (but not necessarily) infinite number of hyperplanes (through the origin) and halfspaces whose bounding hyperplanes pass through the origin; a halfspace-description (§2.4). The interior of a convex cone is possibly empty.

[Figure 29: Not a cone; ironically, the three-dimensional flared horn (with or without its interior) resembling the mathematical symbol ≻ denoting cone membership and partial order.]


Familiar examples of convex cones include an unbounded ice-cream cone united with its interior (a.k.a: second-order cone, quadratic cone, circular cone (§2.9.2.5.1), Lorentz cone (confer Figure 36) [46, exmps.2.3 & 2.25]),

    K_l = { [x ; t] ∈ R^n × R | ‖x‖_l ≤ t },    l = 2    (148)

and any polyhedral cone (§2.12.1.0.1); e.g., any orthant generated by Cartesian half-axes (§2.1.3). Esoteric examples of convex cones include the point at the origin, any line through the origin, any ray having the origin as base such as the nonnegative real line R_+ in subspace R, any halfspace partially bounded by a hyperplane through the origin, the positive semidefinite cone S^M_+ (161), the cone of Euclidean distance matrices EDM^N (736) (§6), any subspace, and Euclidean vector space R^n.

2.7.2.1 cone invariance

(confer Figures: 17, 24, 25, 26, 27, 28, 29, 31, 33, 40, 43, 46, 48, 49, 50, 51, 52, 53, 54, 100, 113, 135) More Euclidean bodies are cones, it seems, than are not. This class of convex body, the convex cone, is invariant to nonnegative scaling, vector summation, affine and inverse affine transformation, Cartesian product, and intersection, [232, p.22] but is not invariant to projection; e.g., Figure 35.

2.7.2.1.1 Theorem. Cone intersection. [232, §2, §19]
The intersection of an arbitrary collection of convex cones is a convex cone. Intersection of an arbitrary collection of closed convex cones is a closed convex cone. [190, §2.3] Intersection of a finite number of polyhedral cones (§2.12.1.0.1, Figure 40 p.123) is polyhedral. ⋄

The property pointedness is associated with a convex cone.

2.7.2.1.2 Definition. Pointed convex cone. (confer §2.12.2.2)
A convex cone K is pointed iff it contains no line. Equivalently, K is not pointed iff there exists any nonzero direction Γ ∈ K such that −Γ ∈ K. [46, §2.4.1] If the origin is an extreme point of K or, equivalently, if

    K ∩ −K = {0}    (149)

then K is pointed, and vice versa. [249, §2.10] △


Thus the simplest and only bounded [282, p.75] convex cone K = {0} ⊆ R^n is pointed, by convention, but has empty interior. Its relative boundary is the empty set (142) while its relative interior is the point itself (11). The pointed convex cone that is a halfline emanating from the origin in R^n has the origin as relative boundary while its relative interior is the halfline itself, excluding the origin.

2.7.2.1.3 Theorem. Pointed cones. [41, §3.3.15, exer.20]
A closed convex cone K ⊂ R^n is pointed if and only if there exists a normal α such that the set

    C ≜ {x ∈ K | ⟨x, α⟩ = 1}    (150)

is closed, bounded, and K = cone C. Equivalently, K is pointed if and only if there exists a vector β and positive scalar ǫ such that

    ⟨x, β⟩ ≥ ǫ‖x‖    ∀x ∈ K    (151)
⋄

If closed convex cone K is not pointed, then it has no extreme point. Yet a pointed closed convex cone has only one extreme point; it resides at the origin. [30, §3.3]

From the cone intersection theorem it follows that an intersection of convex cones is pointed if at least one of the cones is.

2.7.2.2 Pointed closed convex cone and partial order

A pointed closed convex cone K induces partial order [284] on R^n or R^{m×n}, [18, §1] [244, p.7] respectively defined by vector or matrix inequality;

    x ≼_K z ⇔ z − x ∈ K    (152)
    x ≺_K z ⇔ z − x ∈ rel int K    (153)

Neither x nor z is necessarily a member of K for these relations to hold. Only when K is the nonnegative orthant do these inequalities reduce to ordinary entrywise comparison. (§2.13.4.2.3) Inclusive of that special case, we ascribe


[Figure 30: (a) Point x is the minimum element of set C_1 with respect to cone K because cone translated to x ∈ C_1 contains set. (b) Point y is a minimal element of set C_2 with respect to cone K because negative cone translated to y ∈ C_2 contains only y. (Cones drawn truncated in R^2.)]


nomenclature generalized inequality to comparison with respect to a pointed closed convex cone.

The visceral mechanics of actually comparing points when the cone K is not an orthant is well illustrated in the example of Figure 51 which relies on the equivalent membership-interpretation in definition (152) or (153). Comparable points and the minimum element of some vector- or matrix-valued partially ordered set are thus well defined, so nonincreasing sequences with respect to cone K can therefore converge in this sense: Point x ∈ C is the (unique) minimum element of set C with respect to cone K iff for each and every z ∈ C we have x ≼ z; equivalently, iff C ⊆ x + K.^2.24

Further properties of partial ordering with respect to pointed closed convex cone K are:

    reflexivity:    x ≼ x
    antisymmetry:   x ≼ z, z ≼ x ⇒ x = z
    transitivity:   x ≼ y, y ≼ z ⇒ x ≼ z      (x ≼ y, y ≺ z ⇒ x ≺ z)
    homogeneity:    x ≼ y, λ ≥ 0 ⇒ λx ≼ λy    (x ≺ y, λ > 0 ⇒ λx ≺ λy)
    additivity:     x ≼ z, u ≼ v ⇒ x + u ≼ z + v    (x ≺ z, u ≼ v ⇒ x + u ≺ z + v)

A closely related concept, minimal element, is useful for partially ordered sets having no minimum element: Point x ∈ C is a minimal element of set C with respect to K if and only if (x − K) ∩ C = x. (Figure 30) No uniqueness is implied here, although implicit is the assumption: dim K ≥ dim aff C.

2.7.2.2.1 Definition. Proper cone: [46, §2.4.1] a cone that is

    pointed
    closed
    convex
    has nonempty interior (is full-dimensional).
△

^2.24 Borwein & Lewis [41, §3.3, exer.21] ignore possibility of equality to x + K in this condition, and require a second condition: ... and C ⊂ y + K for some y in R^n implies x ∈ y + K.


A proper cone remains proper under injective linear transformation. [62, §5.1]

Examples of proper cones are the positive semidefinite cone S^M_+ in the ambient space of symmetric matrices (§2.9), the nonnegative real line R_+ in vector space R, or any orthant in R^n.

2.8 Cone boundary

Every hyperplane supporting a convex cone contains the origin. [148, §A.4.2] Because any supporting hyperplane to a convex cone must therefore itself be a cone, then from the cone intersection theorem it follows:

2.8.0.0.1 Lemma. Cone faces. [20, §II.8]
Each nonempty exposed face of a convex cone is a convex cone. ⋄

2.8.0.0.2 Theorem. Proper-cone boundary.
Suppose a nonzero point Γ lies on the boundary ∂K of proper cone K in R^n. Then it follows that the ray {ζΓ | ζ ≥ 0} also belongs to ∂K. ⋄

Proof. By virtue of its propriety, a proper cone guarantees the existence of a strictly supporting hyperplane at the origin. [232, Cor.11.7.3] 2.25 Hence the origin belongs to the boundary of K because it is the zero-dimensional exposed face. The origin belongs to the ray through Γ, and the ray belongs to K by definition (144). By the cone faces lemma, each and every nonempty exposed face must include the origin. Hence the closed line segment 0Γ must lie in an exposed face of K because both endpoints do by Definition 2.6.1.3.1. That means there exists a supporting hyperplane ∂H to K containing 0Γ. So the ray through Γ belongs both to K and to ∂H. ∂H must therefore expose a face of K that contains the ray; id est,

    {ζΓ | ζ ≥ 0} ⊆ K ∩ ∂H ⊂ ∂K    (154)

2.25 Rockafellar's corollary yields a supporting hyperplane at the origin to any convex cone in R^n not equal to R^n.


Proper cone {0} in R^0 has no boundary (141) because (11)

    rel int{0} = {0}    (155)

The boundary of any proper cone in R is the origin.

The boundary of any convex cone whose dimension exceeds 1 can be constructed entirely from an aggregate of rays emanating exclusively from the origin.

2.8.1 Extreme direction

The property extreme direction arises naturally in connection with the pointed closed convex cone K ⊂ R^n, being analogous to extreme point. [232, §18, p.162] 2.26 An extreme direction Γ_ε of pointed K is a vector corresponding to an edge that is a ray emanating from the origin. 2.27 Nonzero direction Γ_ε in pointed K is extreme if and only if

    ζ_1 Γ_1 + ζ_2 Γ_2 ≠ Γ_ε   ∀ ζ_1, ζ_2 ≥ 0 ,  ∀ Γ_1, Γ_2 ∈ K \ {ζΓ_ε ∈ K | ζ ≥ 0}    (156)

In words, an extreme direction in a pointed closed convex cone is the direction of a ray, called an extreme ray, that cannot be expressed as a conic combination of any ray directions in the cone distinct from it.

An extreme ray is a one-dimensional face of K. By (86), extreme direction Γ_ε is not a point relatively interior to any line segment in K \ {ζΓ_ε ∈ K | ζ ≥ 0}. Thus, by analogy, the corresponding extreme ray {ζΓ_ε ∈ K | ζ ≥ 0} is not a ray relatively interior to any plane segment 2.28 in K.

2.8.1.1 extreme distinction, uniqueness

An extreme direction is unique, but its vector representation Γ_ε is not because any positive scaling of it produces another vector in the same (extreme) direction. Hence an extreme direction is unique to within a positive scaling. When we say extreme directions are distinct, we are referring to distinctness of the rays containing them. Nonzero vectors of various length in

2.26 We diverge from Rockafellar's extreme direction: "extreme point at infinity".
2.27 An edge (§2.6.0.0.3) of a convex cone is not necessarily a ray. A convex cone may contain an edge that is a line; e.g., a wedge-shaped polyhedral cone (K* in Figure 31).
2.28 A planar fragment; in this context, a planar cone.


Figure 31: K is a pointed polyhedral cone having empty interior in R^3 (drawn truncated and in a plane parallel to the floor upon which you stand). K* is a wedge whose truncated boundary is illustrated (drawn perpendicular to the floor). In this particular instance, K ⊂ int K* (excepting the origin). Cartesian coordinate axes drawn for reference.


the same extreme direction are therefore interpreted to be identical extreme directions. 2.29

The extreme directions of the polyhedral cone in Figure 17 (page 60), for example, correspond to its three edges.

The extreme directions of the positive semidefinite cone (§2.9) comprise the infinite set of all symmetric rank-one matrices. [18, §6] [145, §III] It is sometimes prudent to instead consider the less infinite but complete normalized set, for M > 0 (confer (194))

    { zz^T ∈ S^M | ‖z‖ = 1 }    (157)

The positive semidefinite cone in one dimension M = 1, S_+ the nonnegative real line, has one extreme direction belonging to its relative interior; an idiosyncrasy of dimension 1.

Pointed closed convex cone K = {0} has no extreme direction because extreme directions are nonzero by definition.

If closed convex cone K is not pointed, then it has no extreme directions and no vertex. [18, §1]

Conversely, pointed closed convex cone K is equivalent to the convex hull of its vertex and all its extreme directions. [232, §18, p.167] That is the practical utility of extreme direction; to facilitate construction of polyhedral sets, apparent from the extremes theorem:

2.8.1.1.1 Theorem. (Klee) Extremes. [249, §3.6] [232, §18, p.166]
(confer §2.3.2, §2.12.2.0.1) Any closed convex set containing no lines can be expressed as the convex hull of its extreme points and extreme rays. ⋄

It follows that any element of a convex set containing no lines may be expressed as a linear combination of its extreme elements; e.g., Example 2.9.2.4.1.

2.8.1.2 Generators

In the narrowest sense, generators for a convex set comprise any collection of points and directions whose convex hull constructs the set.

2.29 Like vectors, an extreme direction can be identified by the Cartesian point at the vector's head with respect to the origin.


When the extremes theorem applies, the extreme points and directions are called generators of a convex set. An arbitrary collection of generators for a convex set includes its extreme elements as a subset; the set of extreme elements of a convex set is a minimal set of generators for that convex set. Any polyhedral set has a minimal set of generators whose cardinality is finite.

When the convex set under scrutiny is a closed convex cone, conic combination of generators during construction is implicit as shown in Example 2.8.1.2.1 and Example 2.10.2.0.1. So, a vertex at the origin (if it exists) becomes benign.

We can, of course, generate affine sets by taking the affine hull of any collection of points and directions. We broaden, thereby, the meaning of generator to be inclusive of all kinds of hulls.

Any hull of generators is loosely called a vertex-description. (§2.3.4) Hulls encompass subspaces, so any basis constitutes generators for a vertex-description; span basis R(A).

2.8.1.2.1 Example. Application of extremes theorem.
Given an extreme point at the origin and N extreme rays, denoting the i-th extreme direction by Γ_i ∈ R^n, then the convex hull is (76)

    P = { [ 0 Γ_1 Γ_2 ··· Γ_N ] a ζ | a^T 1 = 1 , a ≽ 0 , ζ ≥ 0 }
      = { [ Γ_1 Γ_2 ··· Γ_N ] a ζ | a^T 1 ≤ 1 , a ≽ 0 , ζ ≥ 0 }
      = { [ Γ_1 Γ_2 ··· Γ_N ] b | b ≽ 0 }  ⊂ R^n    (158)

a closed convex set that is simply a conic hull like (84).

2.8.2 Exposed direction

2.8.2.0.1 Definition. Exposed point & direction of pointed convex cone. [232, §18] (confer §2.6.1.0.1)

    When a convex cone has a vertex, an exposed point, it resides at the origin; there can be only one.
    In the closure of a pointed convex cone, an exposed direction is the direction of a one-dimensional exposed face that is a ray emanating from the origin.

    {exposed directions} ⊆ {extreme directions}
△


For a proper cone in vector space R^n with n ≥ 2, we can say more:

    {exposed directions} = {extreme directions}    (159)

It follows from Lemma 2.8.0.0.1, for any pointed closed convex cone, that there is one-to-one correspondence of one-dimensional exposed faces with exposed directions; id est, there is no one-dimensional exposed face that is not a ray based at 0.

The pointed closed convex cone EDM^2, for example, is a ray in isomorphic subspace R whose relative boundary (§2.6.1.3.1) is the origin. The conventionally exposed directions of EDM^2 constitute the empty set ∅ ⊂ {extreme direction}. This cone has one extreme direction belonging to its relative interior; an idiosyncrasy of dimension 1.

2.8.2.1 Connection between boundary and extremes

2.8.2.1.1 Theorem. Exposed. [232, §18.7] (confer §2.8.1.1.1)
Any closed convex set C containing no lines (and whose dimension is at least 2) can be expressed as the closure of the convex hull of its exposed points and exposed rays. ⋄

From Theorem 2.8.1.1.1,

    rel ∂C = C \ rel int C    (141)
           = conv{exposed points and exposed rays} \ rel int C
           = conv{extreme points and extreme rays} \ rel int C    (160)

Thus each and every extreme point of a convex set (that is not a point) resides on its relative boundary, while each and every extreme direction of a convex set (that is not a halfline and contains no line) resides on its relative boundary because extreme points and directions of such respective sets do not belong to the relative interior by definition.

The relationship between extreme sets and the relative boundary actually goes deeper: Any face F of convex set C (that is not C itself) belongs to rel ∂C, so dim F < dim C. [232, §18.1.3]

2.8.2.2 Converse caveat

It is inconsequent to presume that each and every extreme point and direction is necessarily exposed, as might be erroneously inferred from the conventional


Figure 32: Properties of extreme points carry over to extreme directions. [232, §18] Four rays (drawn truncated) on boundary of conic hull of two-dimensional closed convex set from Figure 23 lifted to R^3. Ray through point A is exposed hence extreme. Extreme direction B on cone boundary is not an exposed direction, although it belongs to the exposed face cone{A, B}. Extreme ray through C is exposed. Point D is neither an exposed nor an extreme direction although it belongs to a two-dimensional exposed face of the conic hull.


boundary definition (§2.6.1.3.1); although it can correctly be inferred: each and every extreme point and direction belongs to some exposed face.

Arbitrary points residing on the relative boundary of a convex set are not necessarily exposed or extreme points. Similarly, the direction of an arbitrary ray, based at 0, on the boundary of a convex cone is not necessarily an exposed or extreme direction. For the polyhedral cone illustrated in Figure 17, for example, there are three two-dimensional exposed faces constituting the entire boundary, each composed of an infinity of rays. Yet there are only three exposed directions.

Neither is an extreme direction on the boundary of a pointed convex cone necessarily an exposed direction. Lift the two-dimensional set in Figure 23, for example, into three dimensions such that no two points in the set are collinear with the origin. Then its conic hull can have an extreme direction B on the boundary that is not an exposed direction, illustrated in Figure 32.

2.9 Positive semidefinite (PSD) cone

    The cone of positive semidefinite matrices studied in this section is arguably the most important of all non-polyhedral cones whose facial structure we completely understand.
    −Alexander Barvinok [20, p.78]

2.9.0.0.1 Definition. Positive semidefinite cone.
The set of all symmetric positive semidefinite matrices of particular dimension M is called the positive semidefinite cone:

    S^M_+ ∆= { A ∈ S^M | A ≽ 0 }
          = { A ∈ S^M | y^T A y ≥ 0  ∀ ‖y‖ = 1 }
          = ⋂_{‖y‖=1} { A ∈ S^M | 〈yy^T, A〉 ≥ 0 }    (161)

formed by the intersection of an infinite number of halfspaces (§2.4.1.1) in vectorized variable A, 2.30 each halfspace having partial boundary containing the origin in isomorphic R^{M(M+1)/2}. It is a unique immutable proper cone in the ambient space of symmetric matrices S^M.

2.30 infinite in number when M > 1. Because y^T A y = y^T A^T y, matrix A is almost always assumed symmetric. (§A.2.1)


The positive definite (full-rank) matrices comprise the cone interior, 2.31

    int S^M_+ = { A ∈ S^M | A ≻ 0 }
              = { A ∈ S^M | y^T A y > 0  ∀ ‖y‖ = 1 }
              = { A ∈ S^M_+ | rank A = M }    (162)

while all singular positive semidefinite matrices (having at least one 0 eigenvalue) reside on the cone boundary (Figure 33); (§A.7.5)

    ∂S^M_+ = { A ∈ S^M | min{λ(A)_i , i = 1 ... M} = 0 }
           = { A ∈ S^M_+ | 〈yy^T, A〉 = 0 for some ‖y‖ = 1 }
           = { A ∈ S^M_+ | rank A < M }    (163)

where λ(A) ∈ R^M holds the eigenvalues of A.

The only symmetric positive semidefinite matrix in S^M_+ having M 0-eigenvalues resides at the origin. (§A.7.3.0.1) △

2.9.0.1 Membership

Observe the notation A ≽ 0; meaning, 2.32 matrix A belongs to the positive semidefinite cone in the subspace of symmetric matrices, whereas A ≻ 0 denotes membership to that cone's interior. (§2.13.2) Notation A ≻ 0 can be read: symmetric matrix A is greater than zero with respect to the positive semidefinite cone. This notation further implies that coordinates [sic] for orthogonal expansion of a positive (semi)definite matrix must be its (nonnegative) positive eigenvalues (§2.13.7.1.1, §E.6.4.1.1) when expanded in its eigenmatrices (§A.5.1).

Generalizing comparison on the real line, the notation A ≽ B denotes comparison with respect to the positive semidefinite cone; (§A.3.1) id est, A ≽ B ⇔ A − B ∈ S^M_+ but neither matrix A nor B necessarily belongs to the positive semidefinite cone. Yet, (1312) A ≽ B, B ≽ 0 ⇒ A ≽ 0; id est, A ∈ S^M_+. (confer Figure 51)

2.31 The remaining inequalities in (161) also become strict for membership to the cone interior.
2.32 The symbol ≥ is reserved for scalar comparison on the real line R with respect to the nonnegative real line R_+ as in a^T y ≥ b, while a ≽ b denotes comparison of vectors on R^M with respect to the nonnegative orthant R^M_+ (§2.3.1.1).
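The eigenvalue characterizations (162) and (163) translate directly into a numerical membership test. The following Matlab sketch (an illustration, not the book's code; the dimension and tolerance are arbitrary choices) classifies a symmetric matrix as interior, boundary, or exterior to S^M_+:

    % Sketch: locate symmetric A relative to S^M_+ via eigenvalues.
    M = 3;  tol = 1e-10;
    Y = randn(M, M-1);  A = Y*Y';      % rank <= M-1: a boundary point (163)
    lam = eig((A + A')/2);             % symmetrize against rounding
    if     min(lam) >  tol             % all eigenvalues positive     (162)
        disp('A in int S^M_+ (positive definite)')
    elseif min(lam) > -tol             % smallest eigenvalue is zero  (163)
        disp('A on boundary of S^M_+ (singular, rank A < M)')
    else
        disp('A outside S^M_+')
    end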


Figure 33: Truncated boundary of PSD cone in S^2 plotted in isometrically isomorphic R^3 via svec (47), in coordinates (α, √2 β, γ) for A = [α β; β γ]; courtesy, Alexandre W. d'Aspremont. (Plotted is 0-contour of smallest eigenvalue (163). Lightest shading is closest. Darkest shading is furthest and inside shell.) Entire boundary can be constructed from an aggregate of rays (§2.7.0.0.1) emanating exclusively from the origin, { κ [ z_1²  √2 z_1 z_2  z_2² ]^T | κ ∈ R }. In this dimension the cone is circular (§2.9.2.5) while each and every ray on boundary corresponds to an extreme direction, but such is not the case in any higher dimension (confer Figure 17). PSD cone geometry is not as simple in higher dimensions [20, §II.12], although for real matrices it is self-dual (322) in ambient space of symmetric matrices. [145, §II] PSD cone has no two-dimensional faces in any dimension, and its only extreme point resides at the origin. Minimal set of generators are the extreme directions: svec{yy^T | y ∈ R^M}.


2.9.0.1.1 Example. Equality constraints in semidefinite program (547).
Employing properties of partial ordering (§2.7.2.2) for the pointed closed convex positive semidefinite cone, it is easy to show, given A + S = C

    S ≽ 0 ⇔ A ≼ C    (164)

2.9.1 Positive semidefinite cone is convex

The set of all positive semidefinite matrices forms a convex cone in the ambient space of symmetric matrices because any pair satisfies definition (145); [150, §7.1] videlicet, for all ζ_1, ζ_2 ≥ 0 and each and every A_1, A_2 ∈ S^M

    ζ_1 A_1 + ζ_2 A_2 ≽ 0 ⇐ A_1 ≽ 0 , A_2 ≽ 0    (165)

a fact easily verified by the definitive test for positive semidefiniteness of a symmetric matrix (§A):

    A ≽ 0 ⇔ x^T A x ≥ 0 for each and every ‖x‖ = 1    (166)

id est, for A_1, A_2 ≽ 0 and each and every ζ_1, ζ_2 ≥ 0

    ζ_1 x^T A_1 x + ζ_2 x^T A_2 x ≥ 0 for each and every normalized x ∈ R^M    (167)

The convex cone S^M_+ is more easily visualized in the isomorphic vector space R^{M(M+1)/2} whose dimension is the number of free variables in a symmetric M×M matrix. When M = 2 the PSD cone is semi-infinite in expanse in R^3, having boundary illustrated in Figure 33. When M = 3 the PSD cone is six-dimensional, and so on.

2.9.1.0.1 Example. Sets from maps of positive semidefinite cone.
The set

    C = { X ∈ S^n × x ∈ R^n | X ≽ xx^T }    (168)

is convex because it has Schur-form; (§A.4)

    X − xx^T ≽ 0 ⇔ f(X, x) ∆= [ X  x ; x^T  1 ] ≽ 0    (169)
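A numerical check of Schur-form (169) is immediate; this Matlab sketch (illustrative only; the particular X and x are fabricated so that X − xx^T = 0.1 I by construction) confirms the block test agrees with the direct one:

    % Sketch: membership to C (168) via Schur-form (169).
    n = 3;  tol = 1e-10;
    x = randn(n,1);
    X = x*x' + 0.1*eye(n);             % then X - xx' = 0.1 I, clearly PSD
    schurTest  = min(eig([X x; x' 1])) >= -tol;    % f(X,x) >= 0  (169)
    directTest = min(eig(X - x*x'))    >= -tol;    % X - xx' >= 0
    fprintf('Schur test: %d   direct test: %d\n', schurTest, directTest)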


Figure 34: Convex set C = { X ∈ S × x ∈ R | X ≽ xx^T } drawn truncated.

e.g., Figure 34. Set C is the inverse image (§2.1.9.0.1) of S^{n+1}_+ under the affine mapping f. The set { X ∈ S^n × x ∈ R^n | X ≼ xx^T } is not convex, in contrast, having no Schur-form. Yet for fixed x = x_p, the set

    { X ∈ S^n | X ≼ x_p x_p^T }    (170)

is simply the negative semidefinite cone shifted to x_p x_p^T.

2.9.1.0.2 Example. Inverse image of positive semidefinite cone.
Now consider finding the set of all matrices X ∈ S^N satisfying

    AX + B ≽ 0    (171)

given A, B ∈ S^N. Define the set

    X ∆= { X | AX + B ≽ 0 } ⊆ S^N    (172)

which is the inverse image of the positive semidefinite cone under affine transformation g(X) ∆= AX + B. Set X must therefore be convex by Theorem 2.1.9.0.1.

Yet we would like a less amorphous characterization of this set, so instead we consider its vectorization (30) which is easier to visualize:

    vec g(X) = vec(AX) + vec B = (I ⊗ A) vec X + vec B    (173)


where

    I ⊗ A ∆= QΛQ^T ∈ S^{N²}    (174)

is a block-diagonal matrix formed by Kronecker product (§A.1 no.21, §D.1.2.1). Assign

    x ∆= vec X ∈ R^{N²}
    b ∆= vec B ∈ R^{N²}    (175)

then make the equivalent problem: Find

    vec X = { x ∈ R^{N²} | (I ⊗ A)x + b ∈ K }    (176)

where

    K ∆= vec S^N_+    (177)

is a proper cone isometrically isomorphic with the positive semidefinite cone in the subspace of symmetric matrices; the vectorization of every element of S^N_+. Utilizing the diagonalization (174),

    vec X = { x | ΛQ^T x ∈ Q^T(K − b) }
          = { x | ΦQ^T x ∈ Λ† Q^T(K − b) } ⊆ R^{N²}    (178)

where † denotes matrix pseudoinverse (§E) and

    Φ ∆= Λ† Λ    (179)

is a diagonal projection matrix whose entries are either 1 or 0 (§E.3). We have the complementary sum

    ΦQ^T x + (I − Φ)Q^T x = Q^T x    (180)

So, adding (I − Φ)Q^T x to both sides of the membership within (178) admits

    vec X = { x ∈ R^{N²} | Q^T x ∈ Λ† Q^T(K − b) + (I − Φ)Q^T x }
          = { x | Q^T x ∈ Φ( Λ† Q^T(K − b) ) ⊕ (I − Φ)R^{N²} }
          = { x ∈ QΛ† Q^T(K − b) ⊕ Q(I − Φ)R^{N²} }
          = (I ⊗ A)†(K − b) ⊕ N(I ⊗ A)    (181)

where we used the facts: linear function Q^T x in x on R^{N²} is a bijection, and ΦΛ† = Λ†. Hence

    vec X = (I ⊗ A)† vec(S^N_+ − B) ⊕ N(I ⊗ A)    (182)
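The vectorization identity underlying (173) and (174), vec(AX) = (I ⊗ A) vec X, can be spot-checked numerically; a Matlab sketch (illustrative; random data):

    % Sketch: Kronecker identity behind (173): vec(AX) = (I (x) A) vec X.
    N = 3;
    A = randn(N);  A = (A + A')/2;     % A in S^N as in the example
    X = randn(N);  X = (X + X')/2;     % X in S^N
    lhs = reshape(A*X, [], 1);         % vec(AX)
    rhs = kron(eye(N), A) * X(:);      % (I (x) A) vec X
    fprintf('discrepancy: %g\n', max(abs(lhs - rhs)))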


In words, set vec X is the vector sum of the translated PSD cone (linearly mapped onto the rowspace of I ⊗ A (§E)) and the nullspace of I ⊗ A (synthesis of fact from §A.6.3 and §A.7.3.0.1). Should I ⊗ A have no nullspace, then vec X = (I ⊗ A)^{−1} vec(S^N_+ − B) which is the expected result.

2.9.2 Positive semidefinite cone boundary

For any symmetric positive semidefinite matrix A of rank ρ, there must exist a rank ρ matrix Y such that A be expressible as an outer product in Y; [251, §6.3]

    A = YY^T ∈ S^M_+ ,  rank A = ρ ,  Y ∈ R^{M×ρ}    (183)

Then the boundary of the positive semidefinite cone may be expressed

    ∂S^M_+ = { A ∈ S^M_+ | rank A < M }    (184)

Each and every rank-ρ matrix, 0 ≤ ρ < M for M > 1, resides on the positive semidefinite cone boundary; the closure of each rank-ρ subset is, for 0 ≤ ρ ≤ M,

    cl{ A ∈ S^M_+ | rank A = ρ } = { A ∈ S^M_+ | rank A ≤ ρ }    (186)

2.9.2.1.1 Exercise. Closure and rank ρ subset.
Prove equality in (186).

For example,

    ∂S^M_+ = cl{ A ∈ S^M_+ | rank A = M − 1 } = { A ∈ S^M_+ | rank A ≤ M − 1 }    (187)


Figure 35: (a) Projection of the PSD cone svec S^2_+, truncated above γ = 1, on the αβ-plane in isometrically isomorphic R^3. View is from above with respect to Figure 33. (b) Truncated above γ = 2. From these plots we may infer, for example, the line { [ 0  1/√2  γ ]^T | γ ∈ R } intercepts the PSD cone at some large value of γ; in fact, γ = ∞.


In S^2, each and every ray on the boundary of the positive semidefinite cone in isomorphic R^3 corresponds to a symmetric rank-1 matrix (Figure 33), but that does not hold in any higher dimension.

2.9.2.2 Subspace tangent to open rank-ρ subset

When the positive semidefinite cone subset in (186) is left unclosed as in

    M(ρ) ∆= { A ∈ S^M_+ | rank A = ρ }    (188)

then we can specify a subspace tangent to the positive semidefinite cone at a particular member of manifold M(ρ). Specifically, the subspace R_M tangent to manifold M(ρ) at B ∈ M(ρ) [135, §5, prop.1.1]

    R_M(B) ∆= { XB + BX^T | X ∈ R^{M×M} } ⊆ S^M    (189)

has dimension

    dim svec R_M(B) = ρ( M − (ρ − 1)/2 ) = ρ(M − ρ) + ρ(ρ + 1)/2    (190)

Tangent subspace R_M contains no member of the positive semidefinite cone S^M_+ whose rank exceeds ρ.

A good example of such a tangent subspace is given in §E.7.2.0.2 by (1794); R_M(11^T) = S^M⊥_c, orthogonal complement to the geometric center subspace. (Figure 110, p.438)

2.9.2.3 Faces of PSD cone, their dimension versus rank

Each and every face of the positive semidefinite cone, having dimension less than that of the cone, is exposed. [180, §6] [158, §2.3.4] Because each and every face of the positive semidefinite cone contains the origin (§2.8.0.0.1), each face belongs to a subspace of the same dimension.

Given positive semidefinite matrix A ∈ S^M_+, define F( S^M_+ ∋ A ) (139) as the smallest face that contains A of the positive semidefinite cone S^M_+. Then A, having ordered diagonalization QΛQ^T (§A.5.2), is relatively interior to [20, §II.12] [77, §31.5.3] [176, §2.4]

    F( S^M_+ ∋ A ) = { X ∈ S^M_+ | N(X) ⊇ N(A) }
                   = { X ∈ S^M_+ | 〈Q(I − ΛΛ†)Q^T , X〉 = 0 }
                   ≃ S^{rank A}_+    (191)


which is isomorphic with the convex cone S^{rank A}_+. Thus dimension of the smallest face containing given matrix A is

    dim F( S^M_+ ∋ A ) = rank(A)(rank(A) + 1)/2    (192)

in isomorphic R^{M(M+1)/2}, and each and every face of S^M_+ is isomorphic with a positive semidefinite cone having dimension the same as the face. Observe: not all dimensions are represented, and the only zero-dimensional face is the origin. The positive semidefinite cone has no facets, for example.

2.9.2.3.1 Table: Rank k versus dimension of S^3_+ faces

                 k    dim F(S^3_+ ∋ rank-k matrix)
                 0    0
    boundary     1    1
                 2    3
    interior     3    6

For the positive semidefinite cone S^2_+ in isometrically isomorphic R^3 depicted in Figure 33, for example, rank-2 matrices belong to the interior of the face having dimension 3 (the entire closed cone), while rank-1 matrices belong to the relative interior of a face having dimension 1 (the boundary constitutes all the one-dimensional faces, in this dimension, which are rays emanating from the origin), and the only rank-0 matrix is the point at the origin (the zero-dimensional face).

Any simultaneously diagonalizable positive semidefinite rank-k matrices belong to the same face (191). That observation leads to the following hyperplane characterization of PSD cone faces: Any rank-k < M positive semidefinite matrix A, having ordered diagonalization A = QΛQ^T (§A.5.2),


makes projectors Q(:, k+1:M)Q(:, k+1:M)^T indexed by k; in other words, each projector describing a normal svec( Q(:, k+1:M)Q(:, k+1:M)^T ) to a supporting hyperplane ∂H_+ (containing the origin) exposing (§2.11) a face of the positive semidefinite cone containing only rank-k matrices.

2.9.2.4 Extreme directions of positive semidefinite cone

Because the positive semidefinite cone is pointed (§2.7.2.1.2), there is a one-to-one correspondence of one-dimensional faces with extreme directions in any dimension M; id est, because of the cone faces lemma (§2.8.0.0.1) and the direct correspondence of exposed faces to faces of S^M_+, it follows there is no one-dimensional face of the positive semidefinite cone that is not a ray emanating from the origin.

Symmetric dyads constitute the set of all extreme directions: For M > 0

    { yy^T ∈ S^M | y ∈ R^M } ⊂ ∂S^M_+    (194)

this superset (confer (157)) of extreme directions for the positive semidefinite cone is, generally, a subset of the boundary. For two-dimensional matrices, (Figure 33)

    { yy^T ∈ S^2 | y ∈ R^2 } = ∂S^2_+    (195)

while for one-dimensional matrices, in exception, (§2.7)

    { yy^T ∈ S | y ≠ 0 } = int S_+    (196)

Each and every extreme direction yy^T makes the same angle with the identity matrix in isomorphic R^{M(M+1)/2}, dependent only on dimension; videlicet, 2.33

    ∠(yy^T, I) = arccos( 〈yy^T, I〉 / (‖yy^T‖_F ‖I‖_F) ) = arccos( 1/√M )   ∀ y ∈ R^M    (197)

2.33 Analogy with respect to the EDM cone is considered by Hayden & Wells et alii [134, p.162] where it is found: angle is not constant. The extreme directions of the EDM cone can be found in §6.5.3.1 while the cone axis is −E = 11^T − I (928).
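Equation (197) invites a quick numerical confirmation; this Matlab sketch (illustrative; M arbitrary) shows the angle is independent of y:

    % Sketch: every extreme direction yy' makes angle arccos(1/sqrt(M))
    % with the identity in isomorphic R^{M(M+1)/2}  (197).
    M = 5;
    for trial = 1:3
        y = randn(M,1);
        D = y*y';                      % extreme direction (194)
        ang = acos( trace(D) / (norm(D,'fro') * norm(eye(M),'fro')) );
        fprintf('angle = %.12f\n', ang)
    end
    fprintf('arccos(1/sqrt(M)) = %.12f\n', acos(1/sqrt(M)))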


2.9.2.4.1 Example. Positive semidefinite matrix from extreme directions.
Diagonalizability (§A.5) of symmetric matrices yields the following results: Any symmetric positive semidefinite matrix (1279) can be written in the form

    A = Σ_i λ_i z_i z_i^T = ÂÂ^T = Σ_i â_i â_i^T ≽ 0 ,   λ ≽ 0    (198)

a conic combination of linearly independent extreme directions (â_i â_i^T or z_i z_i^T where ‖z_i‖ = 1), where λ is a vector of eigenvalues.

If we limit consideration to all symmetric positive semidefinite matrices bounded such that tr A = 1

    C ∆= { A ≽ 0 | tr A = 1 }    (199)

then any matrix A from that set may be expressed as a convex combination of linearly independent extreme directions;

    A = Σ_i λ_i z_i z_i^T ∈ C ,   1^T λ = 1 ,  λ ≽ 0    (200)

Implications are:

1. set C is convex (it is an intersection of the PSD cone with a hyperplane),
2. because the set of eigenvalues corresponding to a given square matrix A is unique, no single eigenvalue can exceed 1; id est, I ≽ A.

Set C is an instance of Fantope (81).

2.9.2.5 Positive semidefinite cone is generally not circular

Extreme angle equation (197) suggests that the positive semidefinite cone might be invariant to rotation about its axis of revolution; id est, a circular cone. We investigate this now:

2.9.2.5.1 Definition. Circular cone: 2.34
a pointed closed convex cone having hyperspherical sections orthogonal to its axis of revolution about which the cone is invariant to rotation. △

2.34 A circular cone is assumed convex throughout, although not so by other authors. We also assume a right circular cone.


Figure 36: This circular cone continues upward infinitely. Axis of revolution is illustrated as vertical line segment through origin. R is the radius, the distance measured from any extreme direction to axis of revolution. Were this a Lorentz cone, any plane slice containing the axis of revolution would make a right angle.


A conic section is the intersection of a cone with any hyperplane. In three dimensions, an intersecting plane perpendicular to a circular cone's axis of revolution produces a section bounded by a circle. (Figure 36) A prominent example of a circular cone in convex analysis is the Lorentz cone (148). We also find that the positive semidefinite cone and cone of Euclidean distance matrices are circular cones, but only in low dimension.

The positive semidefinite cone has axis of revolution that is the ray (base 0) through the identity matrix I. Consider the set of normalized extreme directions of the positive semidefinite cone: for some arbitrary positive constant a ∈ R_+

    { yy^T ∈ S^M | ‖y‖ = √a } ⊂ ∂S^M_+    (201)

The distance from each extreme direction to the axis of revolution is the radius

    R ∆= inf_c ‖yy^T − cI‖_F = a √(1 − 1/M)    (202)

which is the distance from yy^T to (a/M)I; the length of vector yy^T − (a/M)I.

Because distance R (in a particular dimension) from the axis of revolution to each and every normalized extreme direction is identical, the extreme directions lie on the boundary of a hypersphere in isometrically isomorphic R^{M(M+1)/2}. From Example 2.9.2.4.1, the convex hull (excluding the vertex at the origin) of the normalized extreme directions is a conic section

    C ∆= conv{ yy^T | y ∈ R^M , y^T y = a } = S^M_+ ∩ { A ∈ S^M | 〈I , A〉 = a }    (203)

orthogonal to the identity matrix I;

    〈C − (a/M)I , I〉 = tr(C − (a/M)I) = 0    (204)

Although the positive semidefinite cone possesses some characteristics of a circular cone, we can prove it is not by demonstrating a shortage of extreme directions; id est, some extreme directions corresponding to each and every angle of rotation about the axis of revolution are nonexistent: Referring to Figure 37, [290, §1-7]


Figure 37: Illustrated is a section, perpendicular to axis of revolution, of circular cone from Figure 36. Radius R is distance from any extreme direction yy^T to axis at (a/M)I. Vector (a/M)11^T is an arbitrary reference by which to measure angle θ.

    cos θ = 〈 (a/M)11^T − (a/M)I , yy^T − (a/M)I 〉 / ( a²(1 − 1/M) )    (205)

Solving for vector y we get

    a(1 + (M − 1) cos θ) = (1^T y)²    (206)

Because this does not have real solution for every matrix dimension M and for all 0 ≤ θ ≤ 2π, then we can conclude that the positive semidefinite cone might be circular but only in matrix dimensions 1 and 2. 2.35

Because of a shortage of extreme directions, conic section (203) cannot be hyperspherical by the extremes theorem (§2.8.1.1.1).

2.35 In fact, the positive semidefinite cone is circular in matrix dimensions 1 and 2 while it is a rotation of the Lorentz cone in matrix dimension 2.
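Radius formula (202) and the infimizing value c = a/M are easy to corroborate numerically; a Matlab sketch (illustrative; uses fminbnd, with a = 1 and a search interval chosen ad hoc):

    % Sketch: corroborate (202); inf_c ||yy' - cI||_F occurs at c = a/M
    % with value R = a sqrt(1 - 1/M).
    M = 4;  a = 1;
    y = randn(M,1);  y = sqrt(a)*y/norm(y);       % ||y|| = sqrt(a)  (201)
    cstar = fminbnd(@(c) norm(y*y' - c*eye(M),'fro'), -1, 1);
    R = norm(y*y' - (a/M)*eye(M), 'fro');
    fprintf('argmin c = %.6f (a/M = %.6f)\n', cstar, a/M)
    fprintf('R = %.6f  vs  a*sqrt(1-1/M) = %.6f\n', R, a*sqrt(1 - 1/M))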


Figure 38: Polyhedral proper cone K, created by intersection of the halfspaces

    p_1 A_11 ≥ ±p_2 A_12
    p_2 A_22 ≥ ±p_1 A_12

inscribes PSD cone svec ∂S^2_+ in isometrically isomorphic R^3 (coordinates A_11, √2 A_12, A_22) as predicted by Geršgorin discs theorem for A = [A_ij] ∈ S^2; three panels drawn, for p = [1/2; 1], p = [1; 1], and p = [2; 1]. Hyperplanes supporting K intersect along boundary of PSD cone. Four extreme directions of K coincide with extreme directions of PSD cone.


2.9.2.5.2 Example. PSD cone inscription in three dimensions.

Theorem. Geršgorin discs. [150, §6.1] [276]
For p ∈ R^m_+ given A = [A_ij] ∈ S^m, then all eigenvalues of A belong to the union of m closed intervals on the real line;

    λ(A) ∈ ⋃_{i=1}^{m} { ξ ∈ R | |ξ − A_ii| ≤ ϱ_i ∆= (1/p_i) Σ_{j=1, j≠i}^{m} p_j |A_ij| } = ⋃_{i=1}^{m} [ A_ii − ϱ_i , A_ii + ϱ_i ]    (207)

Furthermore, if a union of k of these m [intervals] forms a connected region that is disjoint from all the remaining n − k [intervals], then there are precisely k eigenvalues of A in this region. ⋄

To apply the theorem to determine positive semidefiniteness of symmetric matrix A, we observe that for each i we must have

    A_ii ≥ ϱ_i    (208)

Suppose

    m = 2    (209)

so A ∈ S^2. Vectorizing A as in (47), svec A belongs to isometrically isomorphic R^3. Then we have m 2^{m−1} = 4 inequalities, in the matrix entries A_ij with Geršgorin parameters p = [p_i] ∈ R^2_+ ,

    p_1 A_11 ≥ ±p_2 A_12
    p_2 A_22 ≥ ±p_1 A_12    (210)

which describe an intersection of four halfspaces in R^{m(m+1)/2}. That intersection creates the polyhedral proper cone K (§2.12.1) whose construction is illustrated in Figure 38. Drawn truncated is the boundary of the positive semidefinite cone svec S^2_+ and the bounding hyperplanes supporting K.

Created by means of Geršgorin discs, K always belongs to the positive semidefinite cone for any nonnegative value of p ∈ R^m_+. Hence any point in K corresponds to some positive semidefinite matrix A. Only the extreme directions of K intersect the positive semidefinite cone boundary in this dimension; the four extreme directions of K are extreme directions of the


positive semidefinite cone. As p_1/p_2 increases in value from 0, two extreme directions of K sweep the entire boundary of this positive semidefinite cone. Because the entire positive semidefinite cone can be swept by K, the system of linear inequalities

    Y^T svec A ∆= [ p_1  ±p_2/√2  0 ;  0  ±p_1/√2  p_2 ] svec A ≽ 0    (211)

when made dynamic can replace a semidefinite constraint A ≽ 0; id est, for given p, where Y ∈ R^{m(m+1)/2 × m2^{m−1}},

    K = { z | Y^T z ≽ 0 } ⊂ svec S^m_+    (212)

    svec A ∈ K ⇒ A ∈ S^m_+    (213)

but

    ∃ p  Y^T svec A ≽ 0 ⇔ A ≽ 0    (214)

In other words, diagonal dominance [150, p.349, §7.2.3]

    A_ii ≥ Σ_{j=1, j≠i}^{m} |A_ij| ,  ∀ i = 1 ... m    (215)

is only a sufficient condition for membership to the PSD cone; but by dynamic weighting p in this dimension, it was made necessary and sufficient.

In higher dimension (m > 2), the boundary of the positive semidefinite cone is no longer constituted completely by its extreme directions (symmetric rank-one matrices); the geometry becomes complicated. How all the extreme directions can be swept by an inscribed polyhedral cone, 2.36 similarly to the foregoing example, remains an open question.

2.9.2.5.3 Exercise. Dual inscription.
Find dual polyhedral proper cone K* from Figure 38.

2.36 It is not necessary to sweep the entire boundary in higher dimension.
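The role of the weights p is visible numerically; in this Matlab sketch (illustrative; the matrix is contrived) a positive semidefinite A fails plain diagonal dominance (215), the p = [1; 1] case, yet a suitable p certifies it through (210):

    % Sketch: weighted Geršgorin inequalities (210) for m = 2.
    A = [2 1.2; 1.2 1];                         % PSD: eigenvalues 2.8, 0.2
    ok = @(p) p(1)*A(1,1) >= p(2)*abs(A(1,2)) && ...
              p(2)*A(2,2) >= p(1)*abs(A(1,2));  % the inequalities (210)
    fprintf('min eig = %.2f   p=[1;1]: %d   p=[2;3]: %d\n', ...
            min(eig(A)), ok([1;1]), ok([2;3]))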


2.9.2.6 Boundary constituents of the positive semidefinite cone

2.9.2.6.1 Lemma. Sum of positive semidefinite matrices.
For A, B ∈ S^M_+

    rank(A + B) = rank(µA + (1 − µ)B)    (216)

over the open interval (0, 1) of µ. ⋄

Proof. Any positive semidefinite matrix belonging to the PSD cone has an eigen decomposition that is a positively scaled sum of linearly independent symmetric dyads. By the linearly independent dyads definition in §B.1.1.0.1, rank of the sum A + B is equivalent to the number of linearly independent dyads constituting it. Linear independence is insensitive to further positive scaling by µ. The assumption of positive semidefiniteness prevents annihilation of any dyad from the sum A + B.

2.9.2.6.2 Example. Rank function quasiconcavity. (confer §3.3)
For A, B ∈ R^{m×n} [150, §0.4]

    rank A + rank B ≥ rank(A + B)    (217)

that follows from the fact [251, §3.6]

    dim R(A) + dim R(B) = dim( R(A) + R(B) ) + dim( R(A) ∩ R(B) )    (218)

For A, B ∈ S^M_+ [46, §3.4.2]

    rank A + rank B ≥ rank(A + B) ≥ min{rank A , rank B}    (219)

that follows from the fact

    N(A + B) = N(A) ∩ N(B) ,  A, B ∈ S^M_+    (134)

Rank is a quasiconcave function on S^M_+ because the right-hand inequality in (219) has the concave form (540); videlicet, Lemma 2.9.2.6.1.
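Lemma 2.9.2.6.1 is readily visualized numerically; in this Matlab sketch (illustrative; random factors of chosen ranks) the rank stays fixed over the open interval and drops only at the endpoints:

    % Sketch: rank(mu*A + (1-mu)*B) is constant over mu in (0,1)  (216).
    M = 6;
    Ya = randn(M,2);  A = Ya*Ya';      % rank A = 2 (almost surely)
    Yb = randn(M,3);  B = Yb*Yb';      % rank B = 3
    for mu = [0 0.25 0.5 0.75 1]
        fprintf('mu = %.2f   rank = %d\n', mu, rank(mu*A + (1-mu)*B))
    end                                % interior mu give rank 5; ends 2, 3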


From this example we see, unlike convex functions, quasiconvex functions are not necessarily continuous. (§3.3) We also glean:

2.9.2.6.3 Theorem. Convex subsets of positive semidefinite cone.
The subsets of the positive semidefinite cone S^M_+, for 0 ≤ ρ ≤ M

    S^M_+(ρ) ∆= { X ∈ S^M_+ | rank X ≥ ρ }    (220)

are pointed convex cones, but not closed unless ρ = 0; id est, S^M_+(0) = S^M_+. ⋄

Proof. Given ρ, a subset S^M_+(ρ) is convex if and only if convex combination of any two members has rank at least ρ. That is confirmed applying identity (216) from Lemma 2.9.2.6.1 to (219); id est, for A, B ∈ S^M_+(ρ) on the closed interval µ ∈ [0, 1]

    rank(µA + (1 − µ)B) ≥ min{rank A , rank B}    (221)

It can similarly be shown, almost identically to proof of the lemma, any conic combination of A, B in subset S^M_+(ρ) remains a member; id est, ∀ ζ, ξ ≥ 0

    rank(ζA + ξB) ≥ min{rank(ζA) , rank(ξB)}    (222)

Therefore, S^M_+(ρ) is a convex cone.

Another proof of convexity can be made by projection arguments:

2.9.2.7 Projection on S^M_+(ρ)

Because these cones S^M_+(ρ) indexed by ρ (220) are convex, projection on them is straightforward. Given a symmetric matrix H having diagonalization H ∆= QΛQ^T ∈ S^M (§A.5.2) with eigenvalues Λ arranged in nonincreasing order, then its Euclidean projection (minimum-distance projection) on S^M_+(ρ)

    P_{S^M_+(ρ)} H = QΥ*Q^T    (223)

corresponds to a map of its eigenvalues:

    Υ*_ii = max{ε , Λ_ii} ,  i = 1 ... ρ
    Υ*_ii = max{0 , Λ_ii} ,  i = ρ+1 ... M    (224)

where ε is positive but arbitrarily close to 0.


2.9.2.7.1 Exercise. Projection on open convex cones.
Prove (224) using Theorem E.9.2.0.1.

Because each H ∈ S^M has unique projection on S^M_+(ρ) (despite possibility of repeated eigenvalues in Λ), we may conclude it is a convex set by the Bunt-Motzkin theorem (§E.9.0.0.1).

Compare (224) to the well-known result regarding Euclidean projection on a rank ρ subset of the positive semidefinite cone (§2.9.2.1)

    S^M_+ \ S^M_+(ρ + 1) = { X ∈ S^M_+ | rank X ≤ ρ }    (225)

    P_{S^M_+ \ S^M_+(ρ+1)} H = QΥ*Q^T    (226)

As proved in §7.1.4, this projection of H corresponds to the eigenvalue map

    Υ*_ii = max{0 , Λ_ii} ,  i = 1 ... ρ
    Υ*_ii = 0 ,  i = ρ+1 ... M    (1173)

Together these two results (224) and (1173) mean: A higher-rank solution to projection on the positive semidefinite cone lies arbitrarily close to any given lower-rank projection, but not vice versa. Were the number of nonnegative eigenvalues in Λ known a priori not to exceed ρ, then these two different projections would produce identical results in the limit ε → 0.

2.9.2.8 Uniting constituents

Interior of the PSD cone int S^M_+ is convex by Theorem 2.9.2.6.3, for example, because all positive semidefinite matrices having rank M constitute the cone interior.

All positive semidefinite matrices of rank less than M constitute the cone boundary; an amalgam of positive semidefinite matrices of different rank. Thus each nonconvex subset of positive semidefinite matrices, for 0 < ρ < M

    { Y ∈ S^M_+ | rank Y = ρ }    (227)

comprises a constituent of the cone boundary; only in their union over all ρ < M, together with the origin, is the boundary complete.
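Eigenvalue maps (224) and (1173) are both one line of arithmetic once H is diagonalized; a Matlab sketch (illustrative; ρ, ε, and dimension are arbitrary choices):

    % Sketch: projection of H in S^M on the rank <= rho subset (225) via
    % map (1173), and on S^M_+(rho) via map (224) with small epsilon.
    M = 5;  rho = 2;  epsilon = 1e-8;
    H = randn(M);  H = (H + H')/2;
    [Q,L] = eig(H);  [lam,idx] = sort(diag(L),'descend');  Q = Q(:,idx);
    u = max(0, lam);  u(rho+1:end) = 0;                  % map (1173)
    P1 = Q*diag(u)*Q';                                   % rank P1 <= rho
    v = max(0, lam);  v(1:rho) = max(epsilon, lam(1:rho));  % map (224)
    P2 = Q*diag(v)*Q';                                   % rank P2 >= rho
    fprintf('rank P1 = %d   ||P1 - P2||_F = %g\n', rank(P1), norm(P1-P2,'fro'))

The small Frobenius discrepancy between P1 and P2 illustrates the statement above: the higher-rank projection lies arbitrarily close to the lower-rank one.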


The composite sequence, the cone interior in union with each successive constituent, remains convex at each step; id est, for 0 ≤ k ≤ M

    ⋃_{ρ=k}^{M} { Y ∈ S^M_+ | rank Y = ρ }    (229)

is convex for each k by Theorem 2.9.2.6.3.

2.9.2.9 Peeling constituents

Proceeding the other way: To peel constituents off the complete positive semidefinite cone boundary, one starts by removing the origin; the only rank-0 positive semidefinite matrix. What remains is convex. Next, the extreme directions are removed because they constitute all the rank-1 positive semidefinite matrices. What remains is again convex, and so on. Proceeding in this manner eventually removes the entire boundary leaving, at last, the convex interior of the PSD cone; all the positive definite matrices.

2.9.2.9.1 Exercise. Difference A − B.
What about the difference of matrices A, B belonging to the positive semidefinite cone? Show:

    The difference of any two points on the boundary belongs to the boundary or exterior.
    The difference A − B, where A belongs to the boundary while B is interior, belongs to the exterior.

2.9.3 Barvinok's proposition

Barvinok posits existence and quantifies an upper bound on rank of a positive semidefinite matrix belonging to the intersection of the PSD cone with an affine subset:

2.9.3.0.1 Proposition. (Barvinok) Affine intersection with PSD cone. [20, §II.13] [21, §2.2]
Consider finding a matrix X ∈ S^N satisfying

    X ≽ 0 ,  〈A_j , X〉 = b_j ,  j = 1 ... m    (230)


given nonzero linearly independent A_j ∈ S^N and real b_j. Define the affine subset

    A ∆= { X | 〈A_j , X〉 = b_j , j = 1 ... m } ⊆ S^N    (231)

If the intersection A ∩ S^N_+ is nonempty given a number m of equalities, then there exists a matrix X ∈ A ∩ S^N_+ such that

    rank X (rank X + 1)/2 ≤ m    (232)

whence the upper bound 2.37

    rank X ≤ ⌊ ( √(8m + 1) − 1 )/2 ⌋    (233)

Given desired rank instead, equivalently,

    m < (rank X + 1)(rank X + 2)/2    (234)

An extreme point of A ∩ S^N_+ satisfies (233) and (234). (confer §4.1.1.2) A matrix X ∆= R^T R is an extreme point if and only if the smallest face that contains X of A ∩ S^N_+ has dimension 0; [176, §2.4] id est, iff (139)

    dim F( (A ∩ S^N_+) ∋ X )
    = rank(X)(rank(X) + 1)/2 − rank[ svec RA_1 R^T  svec RA_2 R^T  ···  svec RA_m R^T ]    (235)

equals 0 in isomorphic R^{N(N+1)/2}.

Now the intersection A ∩ S^N_+ is assumed bounded: Assume a given nonzero upper bound ρ on rank, a number of equalities

    m = (ρ + 1)(ρ + 2)/2    (236)

and matrix dimension N ≥ ρ + 2 ≥ 3. If the intersection is nonempty and bounded, then there exists a matrix X ∈ A ∩ S^N_+ such that

    rank X ≤ ρ    (237)

This represents a tightening of the upper bound; a reduction by exactly 1 of the bound provided by (233) given the same specified number m (236) of equalities; id est,

    rank X ≤ ( √(8m + 1) − 1 )/2 − 1    (238)
⋄

2.37 §4.1.1.2 contains an intuitive explanation. This bound is itself limited above, of course, by N; a tight limit corresponding to an interior point of S^N_+.
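Bound (233) is a one-line computation; a Matlab sketch (illustrative) tabulates it for a few values of m:

    % Sketch: Barvinok's bound (233) on rank of some X in A ∩ S^N_+.
    for m = [1 3 6 10 21]
        r = floor((sqrt(8*m + 1) - 1)/2);            % bound (233)
        fprintf('m = %2d:  exists feasible X with rank X <= %d\n', m, r)
    end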


When the intersection A ∩ S^N_+ is known a priori to consist only of a single point, then Barvinok's proposition provides the greatest upper bound on its rank not exceeding N. The intersection can be a single nonzero point only if the number of linearly independent hyperplanes m constituting A satisfies 2.38

    N(N + 1)/2 − 1 ≤ m ≤ N(N + 1)/2    (239)

2.10 Conic independence (c.i.)

In contrast to extreme direction, the property conically independent direction is more generally applicable, inclusive of all closed convex cones (not only pointed closed convex cones). Similar to the definition for linear independence, arbitrary given directions {Γ_i ∈ R^n, i = 1 ... N} are conically independent if and only if, for all ζ ∈ R^N_+

    Γ_i ζ_i + ··· + Γ_j ζ_j − Γ_l ζ_l = 0 ,  i ≠ ··· ≠ j ≠ l = 1 ... N    (240)

has only the trivial solution ζ = 0; in words, iff no direction from the given set can be expressed as a conic combination of those remaining. (Figure 39, for example. A Matlab implementation of test (240) is given in §F.2.) It is evident that linear independence (l.i.) of N directions implies their conic independence;

    l.i. ⇒ c.i.

Arranging any set of generators for a particular convex cone in a matrix columnar,

    X ∆= [ Γ_1 Γ_2 ··· Γ_N ] ∈ R^{n×N}    (241)

then the relationship l.i. ⇒ c.i. suggests: the number of l.i. generators in the columns of X cannot exceed the number of c.i. generators. Denoting by k the number of conically independent generators contained in X, we have the most fundamental rank inequality for convex cones

    dim aff K = dim aff[0 X] = rank X ≤ k ≤ N    (242)

Whereas N directions in n dimensions can no longer be linearly independent once N exceeds n, conic independence remains possible:

2.38 For N > 1, N(N+1)/2 − 1 independent hyperplanes in R^{N(N+1)/2} can make a line tangent to svec ∂S^N_+ at a point because all one-dimensional faces of S^N_+ are exposed. Because a pointed convex cone has only one vertex, the origin, there can be no intersection of svec ∂S^N_+ with any higher-dimensional affine subset A that will make a nonzero point.
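The Matlab implementation referenced above lives in §F.2; independently of it, test (240) can be sketched with linear programming (assumptions: Optimization Toolbox linprog, nonzero columns; the function name is invented for illustration), asking whether any column is a conic combination of the rest:

    % Sketch: conic independence test in the spirit of (240).
    % Columns of X are the directions; assumes every column is nonzero.
    function ci = isConicallyIndependent(X)
        [n,N] = size(X);  ci = true;
        opts = optimoptions('linprog','Display','off');
        for l = 1:N
            rest = X(:, [1:l-1, l+1:N]);
            % feasibility of  rest*z = X(:,l),  z >= 0
            [~,~,flag] = linprog(zeros(N-1,1), [], [], ...
                                 rest, X(:,l), zeros(N-1,1), [], opts);
            if flag == 1, ci = false; return, end
        end
    end

Applied to the vectors of Figure 39, such a test would return true for cases (a) and (c) and false for case (b), matching the caption.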


Figure 39: Vectors in R^2: (a) affinely and conically independent, (b) affinely independent but not conically independent, (c) conically independent but not affinely independent. None of the examples exhibits linear independence. (In general, a.i. does not imply c.i., nor conversely.)

2.10.0.0.1 Table: Maximum number of c.i. directions

    n    sup k (pointed)    sup k (not pointed)
    0    0                  0
    1    1                  2
    2    2                  4
    3    ∞                  ∞
    ⋮    ⋮                  ⋮

Assuming veracity of this table, there is an apparent vastness between two and three dimensions. The finite numbers of conically independent directions indicate:

    Convex cones in dimensions 0, 1, and 2 must be polyhedral. (§2.12.1)

Conic independence is certainly one convex idea that cannot be completely explained by a two-dimensional picture. [20, p.vii]

From this table it is also evident that dimension of Euclidean space cannot exceed the number of conically independent directions possible;

    n ≤ sup k


2.10.0.0.2 Exercise. Conically independent columns and rows.
We suspect the number of conically independent columns (rows) of X to be the same for X^{†T}, where † denotes matrix pseudoinverse (§E). Prove whether it holds that: the columns (rows) of X are c.i. ⇔ the columns (rows) of X^{†T} are c.i.

2.10.1 Preservation of conic independence

Independence in the linear (§2.1.2.1), affine (§2.4.2.4), and conic senses can be preserved under linear transformation. Suppose a matrix X ∈ R^{n×N} (241) holds a conically independent set columnar. Consider the transformation

    T(X) : R^{n×N} → R^{n×N} ∆= XY    (243)

where the given matrix Y ∆= [ y_1 y_2 ··· y_N ] ∈ R^{N×N} is represented by linear operator T. Conic independence of {Xy_i ∈ R^n, i = 1 ... N} demands, by definition (240),

    Xy_i ζ_i + ··· + Xy_j ζ_j − Xy_l ζ_l = 0 ,  i ≠ ··· ≠ j ≠ l = 1 ... N    (244)

have no nontrivial solution ζ ∈ R^N_+. That is ensured by conic independence of {y_i ∈ R^N} and by R(Y) ∩ N(X) = 0; seen by factoring out X.

2.10.1.1 linear maps of cones

[18, §7] If K is a convex cone in Euclidean space R and T is any linear mapping from R to Euclidean space M, then T(K) is a convex cone in M and x ≼ y with respect to K implies T(x) ≼ T(y) with respect to T(K).

If K is closed or has nonempty interior in R, then so is T(K) in M. If T is a linear bijection, then x ≼ y ⇔ T(x) ≼ T(y). Further, if F is a face of K, then T(F) is a face of T(K).

2.10.2 Pointed closed convex K & conic independence

The following bullets can be derived from definitions (156) and (240) in conjunction with the extremes theorem (§2.8.1.1.1):

The set of all extreme directions from a pointed closed convex cone K ⊂ R^n is not necessarily a linearly independent set, yet it must be a conically independent set; (compare Figure 17 on page 60 with Figure 40(a))


Figure 40: (a) A pointed polyhedral cone (drawn truncated) in R^3 having six facets. The extreme directions, corresponding to six edges emanating from the origin, are generators for this cone; not linearly independent but they must be conically independent. (b) The boundary of dual cone K* (drawn truncated) is now added to the drawing of same K. K* is polyhedral, proper, and has the same number of extreme directions as K has facets.


    {extreme directions} ⇒ {c.i.}

Conversely, when a conically independent set of directions from pointed closed convex cone K is known a priori to comprise generators, then all directions from that set must be extreme directions of the cone;

    {extreme directions} ⇔ {c.i. generators of pointed closed convex K}

Barker & Carlson [18, §1] call the extreme directions a minimal generating set for a pointed closed convex cone. A minimal set of generators is therefore a conically independent set of generators, and vice versa, 2.39 for a pointed closed convex cone.

Any collection of n or fewer extreme directions from pointed closed convex cone K ⊂ R^n must be linearly independent;

    {≤ n extreme directions in R^n} ⇒ {l.i.}

Conversely, because l.i. ⇒ c.i.,

    {extreme directions} ⇐ {l.i. generators of pointed closed convex K}

2.10.2.0.1 Example. Vertex-description of halfspace H about origin.
From n + 1 points in R^n we can make a vertex-description of a convex cone that is a halfspace H, where {x_l ∈ R^n, l = 1 ... n} constitutes a minimal set of generators for a hyperplane ∂H through the origin. An example is illustrated in Figure 41. By demanding the augmented set {x_l ∈ R^n, l = 1 ... n+1} be affinely independent (we want x_{n+1} not parallel to ∂H), then

    H = ⋃_{ζ ≥ 0} ( ζ x_{n+1} + ∂H )
      = { ζ x_{n+1} + cone{x_l ∈ R^n, l = 1 ... n} | ζ ≥ 0 }
      = cone{x_l ∈ R^n, l = 1 ... n+1}    (245)

a union of parallel hyperplanes. Cardinality is one step beyond dimension of the ambient space, but {x_l ∀l} is a minimal set of generators for this convex cone H which has no extreme elements.

2.39 This converse does not hold for nonpointed closed convex cones as Table 2.10.0.0.1 implies; e.g., ponder four conically independent generators for a plane (case n = 2).


Figure 41: Minimal set of generators X = [ x_1 x_2 x_3 ] ∈ R^{2×3} for halfspace H about origin; ∂H is a hyperplane through the origin.

2.10.3 Utility of conic independence

Perhaps the most useful application of conic independence is determination of the intersection of closed convex cones from their halfspace-descriptions, or representation of the sum of closed convex cones from their vertex-descriptions.

    ⋂ K_i : A halfspace-description for the intersection of any number of closed convex cones K_i can be acquired by pruning normals; specifically, only the conically independent normals from the aggregate of all the halfspace-descriptions need be retained.

    Σ K_i : Generators for the sum of any number of closed convex cones K_i can be determined by retaining only the conically independent generators from the aggregate of all the vertex-descriptions.

Such conically independent sets are not necessarily unique or minimal.


2.11 When extreme means exposed

For any convex polyhedral set in R^n having nonempty interior, distinction between the terms extreme and exposed vanishes [249, §2.4] [77, §2.2] for faces of all dimensions except n; their meanings become equivalent as we saw in Figure 14 (discussed in §2.6.1.2). In other words, each and every face of any polyhedral set (except the set itself) can be exposed by a hyperplane, and vice versa; e.g., Figure 17.

Lewis [180, §6] [158, §2.3.4] claims nonempty extreme proper subsets and the exposed subsets coincide for S^n_+; id est, each and every face of the positive semidefinite cone, whose dimension is less than the dimension of the cone, is exposed. A more general discussion of cones having this property can be found in [259]; e.g., the Lorentz cone (148) [17, §II.A].

2.12 Convex polyhedra

Every polyhedron, such as the convex hull (76) of a bounded list X, can be expressed as the solution set of a finite system of linear equalities and inequalities, and vice versa. [77, §2.2]

2.12.0.0.1 Definition. Convex polyhedra, halfspace-description. [46, §2.2.4]
A convex polyhedron is the intersection of a finite number of halfspaces and hyperplanes;

    P = { y | Ay ≽ b , Cy = d } ⊆ R^n    (246)

where coefficients A and C generally denote matrices. Each row of C is a vector normal to a hyperplane, while each row of A is a vector inward-normal to a hyperplane partially bounding a halfspace. △

By the halfspaces theorem in §2.4.1.1.1, a polyhedron thus described is a closed convex set having possibly empty interior; e.g., Figure 14. Convex polyhedra 2.40 are finite-dimensional comprising all affine sets (§2.3.1), polyhedral cones, line segments, rays, halfspaces, convex polygons, solids [164, def.104/6, p.343], polychora, polytopes, 2.41 etcetera.

2.40 We consider only convex polyhedra throughout, but acknowledge the existence of concave polyhedra. [284, Kepler-Poinsot Solid]
2.41 Some authors distinguish bounded polyhedra via the designation polytope. [77, §2.2]


It follows from definition (246) by exposure that each face of a convex polyhedron is a convex polyhedron.

The projection of any polyhedron on a subspace remains a polyhedron. More generally, the image of a polyhedron under any linear transformation is a polyhedron. [20, §I.9]

When b and d in (246) are 0, the resultant is a polyhedral cone. The set of all polyhedral cones is a subset of convex cones:

2.12.1 Polyhedral cone

From our study of cones, we see: the number of intersecting hyperplanes and halfspaces constituting a convex cone is possibly but not necessarily infinite. When the number is finite, the convex cone is termed polyhedral. That is the primary distinguishing feature between the set of all convex cones and polyhedra; all polyhedra, including polyhedral cones, are finitely generated [232, §19]. We distinguish polyhedral cones in the set of all convex cones for this reason, although all convex cones of dimension 2 or less are polyhedral.

2.12.1.0.1 Definition. Polyhedral cone, halfspace-description. 2.42
(confer (253)) A polyhedral cone is the intersection of a finite number of halfspaces and hyperplanes about the origin;

    K = { y | Ay ≽ 0 , Cy = 0 } ⊆ R^n    (a)
      = { y | Ay ≽ 0 , Cy ≽ 0 , Cy ≼ 0 }    (b)
      = { y | [ A ; C ; −C ] y ≽ 0 }    (c)    (247)

where coefficients A and C generally denote matrices of finite dimension. Each row of C is a vector normal to a hyperplane containing the origin, while each row of A is a vector inward-normal to a hyperplane containing the origin and partially bounding a halfspace. △

A polyhedral cone thus defined is closed, convex, possibly has empty interior, and has only a finite number of generators (§2.8.1.2), and vice versa. (Minkowski/Weyl) [249, §2.8] [232, thm.19.1]

2.42 Rockafellar [232, §19] proposes affine sets be handled via complementary pairs of affine inequalities; e.g., Cy ≽ d and Cy ≼ d.


From the definition it follows that any single hyperplane through the origin, or any halfspace partially bounded by a hyperplane through the origin, is a polyhedral cone. The most familiar example of polyhedral cone is any quadrant (or orthant, §2.1.3) generated by Cartesian half-axes. Esoteric examples of polyhedral cone include the point at the origin, any line through the origin, any ray having the origin as base such as the nonnegative real line R_+ in subspace R, polyhedral flavors of the (proper) Lorentz cone (confer (148))

    K_l = { [ x ; t ] ∈ R^n × R | ‖x‖_l ≤ t } ,  l = 1 or ∞    (248)

any subspace, and R^n. More examples are illustrated in Figure 40 and Figure 17.

2.12.2 Vertices of convex polyhedra

By definition, a vertex (§2.6.1.0.1) always lies on the relative boundary of a convex polyhedron. [164, def.115/6, p.358] In Figure 14, each vertex of the polyhedron is located at the intersection of three or more facets, and every edge belongs to precisely two facets [20, §VI.1, p.252]. In Figure 17, the only vertex of that polyhedral cone lies at the origin.

The set of all polyhedral cones is clearly a subset of convex polyhedra and a subset of convex cones. Not all convex polyhedra are bounded, evidently, neither can they all be described by the convex hull of a bounded set of points as we defined it in (76). Hence we propose a universal vertex-description of polyhedra in terms of that same finite-length list X (66):

2.12.2.0.1 Definition. Convex polyhedra, vertex-description.
(confer §2.8.1.1.1) Denote the truncated a-vector,

    a_{i:l} = [ a_i ; ⋮ ; a_l ]    (249)

By discriminating a suitable finite-length generating list (or set) arranged columnar in X ∈ R^{n×N}, then any particular polyhedron may be described

    P = { Xa | a_{1:k}^T 1 = 1 , a_{m:N} ≽ 0 , {1 ... k} ∪ {m ... N} = {1 ... N} }    (250)

where 0 ≤ k ≤ N and 1 ≤ m ≤ N + 1. Setting k = 0 removes the affine equality condition. Setting m = N + 1 removes the inequality. △


Coefficient indices in (250) may or may not be overlapping, but all the coefficients are assumed constrained. From (68), (76), and (84), we summarize how the coefficient conditions may be applied;

    affine sets      →  a_{1:k}^T 1 = 1
    polyhedral cones →  a_{m:N} ≽ 0      } ← convex hull (m ≤ k)    (251)

It is always possible to describe a convex hull in the region of overlapping indices because, for 1 ≤ m ≤ k ≤ N

    { a_{m:k} | a_{m:k}^T 1 = 1 , a_{m:k} ≽ 0 } ⊆ { a_{m:k} | a_{1:k}^T 1 = 1 , a_{m:N} ≽ 0 }    (252)

Members of a generating list are not necessarily vertices of the corresponding polyhedron; certainly true for (76) and (250), some subset of list members reside in the polyhedron's relative interior. Conversely, when boundedness (76) applies, the convex hull of the vertices is a polyhedron identical to the convex hull of the generating list.

2.12.2.1 Vertex-description of polyhedral cone

Given closed convex cone K in a subspace of R^n having any set of generators for it arranged in a matrix X ∈ R^{n×N} as in (241), then that cone is described setting m = 1 and k = 0 in vertex-description (250):

    K = cone(X) = { Xa | a ≽ 0 } ⊆ R^n    (253)

a conic hull, like (84), of N generators. This vertex-description is extensible to an infinite number of generators; which follows from the extremes theorem (§2.8.1.1.1) and Example 2.8.1.2.1.

2.12.2.2 Pointedness

[249, §2.10] Assuming all generators constituting the columns of X ∈ R^{n×N} are nonzero, polyhedral cone K is pointed (§2.7.2.1.2) if and only if there is no nonzero a ≽ 0 that solves Xa = 0; id est, iff N(X) ∩ R^N_+ = 0. (If rank X = n, then the dual cone K* is pointed. (269)) A linear-programming sketch of this test appears below.

A polyhedral proper cone in R^n must have at least n linearly independent generators, or be the intersection of at least n halfspaces whose partial boundaries have normals that are linearly independent. Otherwise, the cone will contain at least one line and there can be no vertex; id est, the cone cannot otherwise be pointed.
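A sketch of the pointedness test (assumptions: Optimization Toolbox linprog; the function name is invented for illustration) exploits scale invariance of the nullspace condition, normalizing candidate coefficients by 1^T a = 1:

    % Sketch: pointedness of K = cone(X) per §2.12.2.2; K is pointed
    % iff {a | Xa = 0, a >= 0, 1'a = 1} is empty (scaling argument).
    function pointed = isPointed(X)
        [n,N] = size(X);
        opts = optimoptions('linprog','Display','off');
        [~,~,flag] = linprog(zeros(N,1), [], [], ...
                             [X; ones(1,N)], [zeros(n,1); 1], ...
                             zeros(N,1), [], opts);
        pointed = (flag ~= 1);     % infeasible => N(X) ∩ R^N_+ = {0}
    end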


Figure 42: Unit simplex S = { s | s ≽ 0 , 1^T s ≤ 1 } in R^3 is a unique solid tetrahedron, but is not regular.

For any pointed polyhedral cone, there is a one-to-one correspondence of one-dimensional faces with extreme directions.

Examples of pointed closed convex cones K are not limited to polyhedral cones: the origin, any 0-based ray in a subspace, any two-dimensional V-shaped cone in a subspace, the Lorentz (ice-cream) cone and its polyhedral flavors, the cone of Euclidean distance matrices EDM^N in S^N_h, the proper cones: S^M_+ in ambient S^M, any orthant in R^n or R^{m×n}; e.g., the nonnegative real line R_+ in vector space R.


2.12.3 Unit simplex

A peculiar convex subset of the nonnegative orthant having halfspace-description

    S ∆= { s | s ≽ 0 , 1^T s ≤ 1 } ⊆ R^n_+    (254)

is a unique bounded convex polyhedron called unit simplex (Figure 42) having nonempty interior, n + 1 vertices, and dimension [46, §2.2.4]

    dim S = n    (255)

The origin supplies one vertex while heads of the standard basis [150] [251] {e_i, i = 1 ... n} in R^n constitute those remaining; 2.43 thus its vertex-description:

    S = conv{ 0 , {e_i , i = 1 ... n} }
      = { [ 0 e_1 e_2 ··· e_n ] a | a^T 1 = 1 , a ≽ 0 }    (256)

2.12.3.1 Simplex

The unit simplex comes from a class of general polyhedra called simplex, having vertex-description: [64] [232] [282] [77]

    conv{ x_l ∈ R^n , l = 1 ... k+1 | dim aff{x_l} = k } ,  n ≥ k    (257)

So defined, a simplex is a closed bounded convex set having possibly empty interior. Examples of simplices, by increasing affine dimension, are: a point, any line segment, any triangle and its relative interior, a general tetrahedron, polychoron, and so on.

2.12.3.1.1 Definition. Simplicial cone.
A polyhedral proper (§2.7.2.2.1) cone K in R^n is called simplicial iff K has exactly n extreme directions; [17, §II.A] equivalently, iff proper K has exactly n linearly independent generators contained in any given set of generators. △

There are an infinite variety of simplicial cones in R^n; e.g., Figure 17, Figure 43, Figure 52. Any orthant is simplicial, as is any rotation thereof.

2.43 In R^0 the unit simplex is the point at the origin, in R the unit simplex is the line segment [0, 1], in R^2 it is a triangle and its relative interior, in R^3 it is the convex hull of a tetrahedron (Figure 42), in R^4 it is the convex hull of a pentatope [284], and so on.


Figure 43: Two views of a simplicial cone and its dual in R^3. Semi-infinite boundary of each cone is truncated for illustration. Cartesian axes are drawn for reference.




2.12.4 Converting between descriptions

Conversion between halfspace-descriptions (246) (247) and equivalent vertex-descriptions (76) (250) is nontrivial, in general, [13] [77, §2.2] but the conversion is easy for simplices. [46, §2.2] Nonetheless, we tacitly assume the two descriptions to be equivalent. [232, §19, thm.19.1] We explore conversions in §2.13.4 and §2.13.9.

2.13 Dual cone & generalized inequality & biorthogonal expansion

These three concepts, dual cone, generalized inequality, and biorthogonal expansion, are inextricably melded; meaning, it is difficult to completely discuss one without mentioning the others. The dual cone is critical in tests for convergence by contemporary primal/dual methods for numerical solution of conic problems. [301] [206, §4.5] For unique minimum-distance projection on a closed convex cone K, the negative dual cone −K^∗ plays the role that the orthogonal complement plays for subspace projection.^{2.44} (§E.9.2.1) Indeed, −K^∗ is the algebraic complement in R^n:

\[
\mathcal{K} \boxplus -\mathcal{K}^* = \mathbb{R}^n \tag{258}
\]

where ⊞ denotes unique orthogonal vector sum.

One way to think of a pointed closed convex cone is as a new kind of coordinate system whose basis is generally nonorthogonal; a conic system, very much like the familiar Cartesian system whose analogous cone is the first quadrant or nonnegative orthant. Generalized inequality ≽_K is a formalized means to determine membership to any pointed closed convex cone (§2.7.2.2), whereas biorthogonal expansion is, fundamentally, an expression of coordinates in a pointed conic system. When cone K is the nonnegative orthant, these three concepts come into alignment with the Cartesian prototype: biorthogonal expansion becomes orthogonal expansion.

^{2.44} Namely, projection on a subspace is ascertainable from its projection on the orthogonal complement.


2.13.1 Dual cone

For any set K (convex or not), the dual cone [46, §2.6.1] [73, §4.2]

\[
\mathcal{K}^* \triangleq \{\, y \in \mathbb{R}^n \mid \langle y, x\rangle \ge 0 \;\text{for all}\; x \in \mathcal{K} \,\} \tag{259}
\]

is a unique cone^{2.45} that is always closed and convex because it is an intersection of halfspaces (halfspaces theorem, §2.4.1.1.1) whose partial boundaries each contain the origin, each halfspace having inward-normal x belonging to K; e.g., Figure 44(a).

When cone K is convex, there is a second and equivalent construction: Dual cone K^∗ is the union of each and every vector y inward-normal to a hyperplane supporting K or bounding a halfspace containing K; e.g., Figure 44(b). When K is represented by a halfspace-description such as (247), for example, where

\[
A \triangleq \begin{bmatrix} a_1^T \\ \vdots \\ a_m^T \end{bmatrix} \in \mathbb{R}^{m\times n}, \qquad
C \triangleq \begin{bmatrix} c_1^T \\ \vdots \\ c_p^T \end{bmatrix} \in \mathbb{R}^{p\times n} \tag{260}
\]

then the dual cone can be represented as the conic hull

\[
\mathcal{K}^* = \operatorname{cone}\{a_1,\ldots, a_m,\; \pm c_1,\ldots, \pm c_p\} \tag{261}
\]

a vertex-description, because each and every conic combination of normals from the halfspace-description of K yields another inward-normal to a hyperplane supporting or bounding a halfspace containing K.

K^∗ can also be constructed pointwise using projection theory from §E.9.2: for P_K x the Euclidean projection of point x on closed convex cone K,

\[
-\mathcal{K}^* = \{x - P_{\mathcal{K}} x \mid x \in \mathbb{R}^n\} = \{x \in \mathbb{R}^n \mid P_{\mathcal{K}} x = 0\} \tag{1819}
\]

2.13.1.0.1 Exercise. Manual dual cone construction.
Perhaps the most instructive graphical method of dual cone construction is cut-and-try. Find the dual of each polyhedral cone from Figure 45 by using dual cone equation (259).

^{2.45} The dual cone is the negative polar cone defined by many authors; K^∗ = −K°. [148, §A.3.2] [232, §14] [29] [20] [249, §2.7]
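Construction (1819) is easy to observe for the self-dual orthant K = R^n_+, where projection is entrywise; a small sketch (an illustration, not from the text, assuming numpy):

```python
import numpy as np

# For K = R^n_+ (self-dual), projection is entrywise: P_K x = max(x, 0).
# Then x - P_K x = min(x, 0) always lies in -K* = -R^n_+,
# and P_K x = 0 exactly when x <= 0, illustrating (1819).
rng = np.random.default_rng(0)
x = rng.standard_normal(5)
P_K = np.maximum(x, 0)
residual = x - P_K                      # equals min(x, 0)
assert np.all(residual <= 1e-12)        # residual lies in -R^n_+
print(x, residual)
```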


Figure 44: Two equivalent constructions of dual cone K^∗ in R^2: (a) Showing construction by intersection of halfspaces about 0 (drawn truncated). Only those two halfspaces whose bounding hyperplanes have inward-normal corresponding to an extreme direction of this pointed closed convex cone K ⊂ R^2 need be drawn; by (320). (b) Suggesting construction by union of inward-normals y to each and every hyperplane ∂H_+ supporting K. This interpretation is valid when K is convex because existence of a supporting hyperplane is then guaranteed (§2.4.2.6).


Figure 45: Dual cone construction by right angle; x ∈ K ⇔ ⟨y, x⟩ ≥ 0 for all y ∈ G(K^∗) (318). Each extreme direction of a polyhedral cone is orthogonal to a facet of its dual cone, and vice versa, in any dimension. (§2.13.6.1) (a) In R^2, this characteristic guides graphical construction of the dual cone: it suggests finding the dual-cone boundary ∂K^∗ by making right angles with extreme directions of the polyhedral cone. The construction is then pruned so that each dual boundary vector does not exceed π/2 radians in angle with each and every vector from the polyhedral cone. Were the dual cone in R^2 to narrow, Figure 46 would be reached in limit. (b) Same polyhedral cone and its dual continued into three dimensions. (confer Figure 52)


Figure 46: K is a halfspace about the origin in R^2. K^∗ is a ray based at 0, hence has empty interior in R^2; so K cannot be pointed. (Both convex cones appear truncated.)

2.13.1.0.2 Exercise. Dual cone definitions.
What is {x ∈ R^n | x^T z ≥ 0 ∀ z ∈ R^n}?
What is {x ∈ R^n | x^T z ≥ 1 ∀ z ∈ R^n}?
What is {x ∈ R^n | x^T z ≥ 1 ∀ z ∈ R^n_+}?

As defined, dual cone K^∗ exists even when the affine hull of the original cone is a proper subspace; id est, even when the original cone has empty interior. Rockafellar formulates the dimension of K and K^∗. [232, §14]^{2.46}

To further motivate our understanding of the dual cone, consider the ease with which convergence can be observed in the following optimization problem (p):

2.13.1.0.3 Example. Dual problem. (confer §4.1)
Duality is a powerful and widely employed tool in applied mathematics for a number of reasons. First, the dual program is always convex even if the primal is not. Second, the number of variables in the dual is equal to the number of constraints in the primal, which is often less than the number of

^{2.46} His monumental work Convex Analysis has not one figure or illustration. See [20, §II.16] for a good illustration of Rockafellar's recession cone [30].


Figure 47: Although objective functions from conic problems (264p) and (264d) are linear, this is a mnemonic icon for primal and dual problems. When problems are strong duals, duality gap is 0; meaning, functions f(x) and g(z) (dotted) kiss at saddle value, as depicted at center. Otherwise, dual functions never meet (f(x) > g(z)) by (262). Drawing by http://en.wikipedia.org/wiki/User:Kieff


variables in the primal program. Third, the maximum value achieved by the dual problem is often equal to the minimum of the primal. [224, §2.1.3]

Essentially, duality theory concerns representation of a given optimization problem as half a minimax problem. [232, §36] [46, §5.4] Given any real function f(x,z),

\[
\underset{x}{\text{minimize}}\;\underset{z}{\text{maximize}}\; f(x,z) \;\ge\; \underset{z}{\text{maximize}}\;\underset{x}{\text{minimize}}\; f(x,z) \tag{262}
\]

always holds. When

\[
\underset{x}{\text{minimize}}\;\underset{z}{\text{maximize}}\; f(x,z) \;=\; \underset{z}{\text{maximize}}\;\underset{x}{\text{minimize}}\; f(x,z) \tag{263}
\]

we have strong duality and then a saddle value [104] exists. (Figure 47) [229, p.3] Consider primal conic problem (p) (over cone K) and its corresponding dual problem (d): [219, §3.3.1] [176, §2.1] given vectors α, β and matrix constant C,

\[
\text{(p)}\quad
\begin{array}{rl}
\underset{x}{\text{minimize}} & \alpha^T x\\
\text{subject to} & x \in \mathcal{K}\\
& Cx = \beta
\end{array}
\qquad\qquad
\begin{array}{rl}
\underset{y,z}{\text{maximize}} & \beta^T z\\
\text{subject to} & y \in \mathcal{K}^*\\
& C^T z + y = \alpha
\end{array}
\quad\text{(d)} \tag{264}
\]

Observe the dual problem is also conic, and its objective function value never exceeds that of the primal:

\[
\begin{array}{c}
\alpha^T x \ge \beta^T z\\
x^T(C^T z + y) \ge (Cx)^T z\\
x^T y \ge 0
\end{array} \tag{265}
\]

which holds by definition (259). Under the sufficient condition that (264p) is a convex problem and satisfies Slater's condition,^{2.47} each problem (p) and (d) attains the same optimal value of its objective, and each problem is called a strong dual to the other because the duality gap (primal−dual optimal objective difference) is 0. Then (p) and (d) are together equivalent to the minimax problem

\[
\begin{array}{rl}
\underset{x,y,z}{\text{minimize}} & \alpha^T x - \beta^T z\\
\text{subject to} & x \in \mathcal{K},\;\; y \in \mathcal{K}^*\\
& Cx = \beta,\;\; C^T z + y = \alpha
\end{array}
\qquad\text{(p)--(d)} \tag{266}
\]

^{2.47} A convex problem, essentially, has convex objective function optimized over a convex set. (§4) In this context, (p) is convex if K is a convex cone. Slater's condition is satisfied whenever any primal strictly feasible point exists. (p.235)
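A small numerical sketch of the pair (264) for the self-dual choice K = K^∗ = R^n_+, in which case both problems are linear programs; this illustration is not from the text and assumes scipy's linprog. The instance is built to satisfy Slater's condition, so the two printed optimal values coincide (zero duality gap):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, p = 5, 2
C = rng.standard_normal((p, n))
x_feas = rng.random(n) + 0.1            # strictly feasible point => Slater holds
beta = C @ x_feas
alpha = rng.random(n) + 0.1             # positive costs keep (p) bounded below

# (p): minimize alpha'x  subject to  Cx = beta, x >= 0
primal = linprog(alpha, A_eq=C, b_eq=beta, bounds=[(0, None)] * n, method="highs")
# (d): maximize beta'z  subject to  y = alpha - C'z >= 0   (y in K*)
dual = linprog(-beta, A_ub=C.T, b_ub=alpha, bounds=[(None, None)] * p, method="highs")

print(primal.fun, -dual.fun)            # equal optimal values: duality gap 0
```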


whose optimal objective always has the saddle value 0 (regardless of the particular convex cone K and other problem parameters). [271, §3.2] Thus determination of convergence for either primal or dual problem is facilitated.

Were convex cone K polyhedral (§2.12.1), then problems (p) and (d) would be linear programs. Were K a positive semidefinite cone, then problem (p) has the form of the prototypical semidefinite program^{2.48} (547). It is sometimes possible to solve a primal problem by way of its dual; advantageous when the dual problem is easier to solve than the primal, for example, because it can be solved analytically or has some special structure that can be exploited. [46, §5.5.5] (§4.2.3.1)

2.13.1.1 Key properties of dual cone

• For any cone, (−K)^∗ = −K^∗.

• For any cones K₁ and K₂, K₁ ⊆ K₂ ⇒ K₁^∗ ⊇ K₂^∗. [249, §2.7]

• (Cartesian product) For closed convex cones K₁ and K₂, their Cartesian product K = K₁ × K₂ is a closed convex cone, and
\[
\mathcal{K}^* = \mathcal{K}_1^* \times \mathcal{K}_2^* \tag{267}
\]

• (conjugation) [232, §14] [73, §4.5] When K is any convex cone, the dual of the dual cone is the closure of the original cone; K^{∗∗} = cl K. Because K^{∗∗∗} = K^∗,
\[
\mathcal{K}^* = (\operatorname{cl}\mathcal{K})^* \tag{268}
\]
When K is closed and convex, the dual of the dual cone is the original cone; K^{∗∗} = K.

• If any cone K has nonempty interior, then K^∗ is pointed:
\[
\mathcal{K}\ \text{nonempty interior} \;\Rightarrow\; \mathcal{K}^*\ \text{pointed} \tag{269}
\]

^{2.48} A semidefinite program is any convex program constraining a variable to any subset of a positive semidefinite cone. The qualifier prototypical generally means, by convention: a program having linear objective, affine equality constraints, but no inequality constraints except for cone membership. The self-dual nonnegative orthant cone yields the primal prototypical linear program and its dual.


• Conversely, if the closure of any convex cone K is pointed, then K^∗ has nonempty interior:
\[
\operatorname{cl}\mathcal{K}\ \text{pointed} \;\Rightarrow\; \mathcal{K}^*\ \text{nonempty interior} \tag{270}
\]
Given that a cone K ⊂ R^n is closed and convex, K is pointed if and only if K^∗ − K^∗ = R^n; id est, iff K^∗ has nonempty interior. [41, §3.3, exer.20]

• (vector sum) [232, thm.3.8] For convex cones K₁ and K₂,
\[
\mathcal{K}_1 + \mathcal{K}_2 = \operatorname{conv}(\mathcal{K}_1 \cup \mathcal{K}_2) \tag{271}
\]

• (dual vector-sum) [232, §16.4.2] [73, §4.6] For convex cones K₁ and K₂,
\[
\mathcal{K}_1^* \cap \mathcal{K}_2^* = (\mathcal{K}_1 + \mathcal{K}_2)^* = (\mathcal{K}_1 \cup \mathcal{K}_2)^* \tag{272}
\]

• (closure of vector sum of duals)^{2.49} For closed convex cones K₁ and K₂,
\[
(\mathcal{K}_1 \cap \mathcal{K}_2)^* = \operatorname{cl}(\mathcal{K}_1^* + \mathcal{K}_2^*) = \operatorname{cl}\operatorname{conv}(\mathcal{K}_1^* \cup \mathcal{K}_2^*) \tag{273}
\]
where closure becomes superfluous under the condition K₁ ∩ int K₂ ≠ ∅ [41, §3.3, exer.16, §4.1, exer.7].

• (Krein-Rutman) For closed convex cones K₁ ⊆ R^m and K₂ ⊆ R^n and any linear map A : R^n → R^m, provided int K₁ ∩ AK₂ ≠ ∅ [41, §3.3.13, confer §4.1, exer.9]
\[
(A^{-1}\mathcal{K}_1 \cap \mathcal{K}_2)^* = \operatorname{cl}(A^T\mathcal{K}_1^* + \mathcal{K}_2^*) \tag{274}
\]
where the dual of cone K₁ is with respect to its ambient space R^m and the dual of cone K₂ is with respect to R^n, where A^{-1}K₁ denotes the inverse image (§2.1.9.0.1) of K₁ under mapping A, and where A^T denotes the adjoint operation.

^{2.49} These parallel analogous results for subspaces R₁, R₂ ⊆ R^n: [73, §4.6]
\[
(\mathcal{R}_1 + \mathcal{R}_2)^\perp = \mathcal{R}_1^\perp \cap \mathcal{R}_2^\perp, \qquad (\mathcal{R}_1 \cap \mathcal{R}_2)^\perp = \mathcal{R}_1^\perp + \mathcal{R}_2^\perp
\]
and R^{⊥⊥} = R for any subspace R.


• K is proper if and only if K^∗ is proper.

• K is polyhedral if and only if K^∗ is polyhedral. [249, §2.8]

• K is simplicial if and only if K^∗ is simplicial. (§2.13.9.2) A simplicial cone and its dual are polyhedral proper cones (Figure 52, Figure 43), but not the converse.

• K ⊞ −K^∗ = R^n ⇔ K is closed and convex. (1818) (p.692)

• Any direction in a proper cone K is normal to a hyperplane separating K from −K^∗.

2.13.1.2 Examples of dual cone

When K is R^n, K^∗ is the point at the origin, and vice versa.

When K is a subspace, K^∗ is its orthogonal complement, and vice versa. (§E.9.2.1, Figure 48)

When cone K is a halfspace in R^n with n > 0 (Figure 46 for example), the dual cone K^∗ is a ray (based at 0) belonging to that halfspace but orthogonal to its bounding hyperplane (that contains the origin), and vice versa.

When convex cone K is a closed halfplane in R^3 (Figure 49), it is neither pointed nor of nonempty interior; hence, the dual cone K^∗ can be neither of nonempty interior nor pointed.

When K is any particular orthant in R^n, the dual cone is identical; id est, K = K^∗.

When K is any quadrant in subspace R^2, K^∗ is a wedge-shaped polyhedral cone in R^3; e.g., for K equal to quadrant I,
\[
\mathcal{K}^* = \begin{bmatrix} \mathbb{R}^2_+ \\ \mathbb{R} \end{bmatrix}
\]

When K is a polyhedral flavor of the Lorentz cone K_ℓ (248), the dual is the polyhedral proper cone K_q: for ℓ = 1 or ∞,

\[
\mathcal{K}_q = \mathcal{K}_\ell^* = \left\{ \begin{bmatrix} x \\ t \end{bmatrix} \in \mathbb{R}^n \times \mathbb{R} \;\middle|\; \|x\|_q \le t \right\} \tag{275}
\]

where ‖x‖_q is the dual norm determined via solution to 1/ℓ + 1/q = 1.


Figure 48: When convex cone K is any one Cartesian axis, its dual cone is the convex hull of all axes remaining. In R^3, dual cone K^∗ (drawn tiled and truncated) is a hyperplane through the origin; its normal belongs to line K. In R^2, dual cone K^∗ is a line through the origin while convex cone K is that line orthogonal.


Figure 49: K and K^∗ are halfplanes in R^3; blades. Both semi-infinite convex cones appear truncated. Each cone is like K in Figure 46, but embedded in a two-dimensional subspace of R^3. Cartesian coordinate axes drawn for reference.


2.13.2 Abstractions of Farkas' lemma

2.13.2.0.1 Corollary. Generalized inequality and membership relation.
[148, §A.4.2] Let K be any closed convex cone and K^∗ its dual, and let x and y belong to a vector space R^n. Then

\[
y \in \mathcal{K}^* \;\Leftrightarrow\; \langle y, x\rangle \ge 0 \;\text{for all}\; x \in \mathcal{K} \tag{276}
\]

which is, merely, a statement of fact by definition of dual cone (259). By closure we have conjugation:

\[
x \in \mathcal{K} \;\Leftrightarrow\; \langle y, x\rangle \ge 0 \;\text{for all}\; y \in \mathcal{K}^* \tag{277}
\]

which may be regarded as a simple translation of the Farkas lemma [89], as in [232, §22], to the language of convex cones, and a generalization of the well-known Cartesian fact

\[
x \succeq 0 \;\Leftrightarrow\; \langle y, x\rangle \ge 0 \;\text{for all}\; y \succeq 0 \tag{278}
\]

for which implicitly K = K^∗ = R^n_+, the nonnegative orthant.

Membership relation (277) is often written instead as dual generalized inequalities, when K and K^∗ are pointed closed convex cones:

\[
x \succeq_{\mathcal{K}} 0 \;\Leftrightarrow\; \langle y, x\rangle \ge 0 \;\text{for all}\; y \succeq_{\mathcal{K}^*} 0 \tag{279}
\]

meaning, coordinates for biorthogonal expansion of x (§2.13.8) [277] must be nonnegative when x belongs to K. By conjugation [232, thm.14.1]

\[
y \succeq_{\mathcal{K}^*} 0 \;\Leftrightarrow\; \langle y, x\rangle \ge 0 \;\text{for all}\; x \succeq_{\mathcal{K}} 0 \tag{280}
\]
⋄

When pointed closed convex cone K is not polyhedral, coordinate axes for biorthogonal expansion asserted by the corollary are taken from extreme directions of K; expansion is assured by Carathéodory's theorem (§E.6.4.1.1).

We presume, throughout, the obvious:

\[
\big(x \in \mathcal{K} \;\Leftrightarrow\; \langle y, x\rangle \ge 0 \;\text{for all}\; y \in \mathcal{K}^*\big)\;(277)
\;\;\Leftrightarrow\;\;
\big(x \in \mathcal{K} \;\Leftrightarrow\; \langle y, x\rangle \ge 0 \;\text{for all}\; y \in \mathcal{K}^*,\; \|y\| = 1\big) \tag{281}
\]


2.13.2.0.2 Exercise. Test of dual generalized inequalities.
Test Corollary 2.13.2.0.1 and (281) graphically on the two-dimensional polyhedral cone and its dual in Figure 45.

When pointed closed convex cone K is implicit from context (confer §2.7.2.2):

\[
\begin{array}{rcl}
x \succeq 0 &\Leftrightarrow& x \in \mathcal{K}\\
x \succ 0 &\Leftrightarrow& x \in \operatorname{rel\,int}\mathcal{K}
\end{array} \tag{282}
\]

Strict inequality x ≻ 0 means coordinates for biorthogonal expansion of x must be positive when x belongs to rel int K. Strict membership relations are useful; e.g., for any proper cone K and its dual K^∗:

\[
x \in \operatorname{int}\mathcal{K} \;\Leftrightarrow\; \langle y, x\rangle > 0 \;\text{for all}\; y \in \mathcal{K}^*,\; y \neq 0 \tag{283}
\]
\[
x \in \mathcal{K},\; x \neq 0 \;\Leftrightarrow\; \langle y, x\rangle > 0 \;\text{for all}\; y \in \operatorname{int}\mathcal{K}^* \tag{284}
\]

By conjugation, we also have the dual relations:

\[
y \in \operatorname{int}\mathcal{K}^* \;\Leftrightarrow\; \langle y, x\rangle > 0 \;\text{for all}\; x \in \mathcal{K},\; x \neq 0 \tag{285}
\]
\[
y \in \mathcal{K}^*,\; y \neq 0 \;\Leftrightarrow\; \langle y, x\rangle > 0 \;\text{for all}\; x \in \operatorname{int}\mathcal{K} \tag{286}
\]

Boundary-membership relations for proper cones are also useful:

\[
x \in \partial\mathcal{K} \;\Leftrightarrow\; \exists\, y \;\; \langle y, x\rangle = 0,\; y \in \mathcal{K}^*,\; y \neq 0,\; x \in \mathcal{K} \tag{287}
\]
\[
y \in \partial\mathcal{K}^* \;\Leftrightarrow\; \exists\, x \;\; \langle y, x\rangle = 0,\; x \in \mathcal{K},\; x \neq 0,\; y \in \mathcal{K}^* \tag{288}
\]

2.13.2.0.3 Example. Linear inequality. [256, §4] (confer §2.13.5.1.1)
Consider a given matrix A and closed convex cone K. By membership relation we have

\[
\begin{array}{rcl}
Ay \in \mathcal{K}^* &\Leftrightarrow& x^T A y \ge 0 \;\;\forall\, x \in \mathcal{K}\\
&\Leftrightarrow& y^T z \ge 0 \;\;\forall\, z \in \{A^T x \mid x \in \mathcal{K}\}\\
&\Leftrightarrow& y \in \{A^T x \mid x \in \mathcal{K}\}^*
\end{array} \tag{289}
\]

This implies

\[
\{y \mid Ay \in \mathcal{K}^*\} = \{A^T x \mid x \in \mathcal{K}\}^* \tag{290}
\]

If we regard A as a linear operator, then A^T is its adjoint. When, for example, K is the self-dual nonnegative orthant (§2.13.5.1), then

\[
\{y \mid Ay \succeq 0\} = \{A^T x \mid x \succeq 0\}^* \tag{291}
\]


2.13.2.1 Null certificate, Theorem of the alternative

If in particular x_p ∉ K, a closed convex cone, then the construction in Figure 44(b) suggests there exists a supporting hyperplane (having inward-normal belonging to dual cone K^∗) separating x_p from K; indeed, (277)

\[
x_p \notin \mathcal{K} \;\Leftrightarrow\; \exists\, y \in \mathcal{K}^* \;\; \langle y, x_p\rangle < 0 \tag{292}
\]

The existence of any one such y is a certificate of null membership. From a different perspective,

\[
x_p \in \mathcal{K}
\quad\text{or in the alternative}\quad
\exists\, y \in \mathcal{K}^* \;\; \langle y, x_p\rangle < 0 \tag{293}
\]

By alternative is meant: these two systems are incompatible; one system is feasible while the other is not.

2.13.2.1.1 Example. Theorem of the alternative for linear inequality.
Myriad alternative systems of linear inequality can be explained in terms of pointed closed convex cones and their duals. Beginning from the simplest Cartesian dual generalized inequalities (278) (with respect to the nonnegative orthant R^m_+),

\[
y \succeq 0 \;\Leftrightarrow\; x^T y \ge 0 \;\text{for all}\; x \succeq 0 \tag{294}
\]

Given A ∈ R^{n×m}, we make vector substitution A^T y ← y:

\[
A^T y \succeq 0 \;\Leftrightarrow\; x^T A^T y \ge 0 \;\text{for all}\; x \succeq 0 \tag{295}
\]

Introducing a new vector by calculating b ≜ Ax we get

\[
A^T y \succeq 0 \;\Leftrightarrow\; b^T y \ge 0,\;\; b = Ax \;\text{for all}\; x \succeq 0 \tag{296}
\]

By complementing sense of the scalar inequality:

\[
A^T y \succeq 0
\quad\text{or in the alternative}\quad
b^T y < 0,\;\; \exists\, b = Ax,\; x \succeq 0 \tag{297}
\]


If one system has a solution, then the other does not; define a convex cone K ≜ {y | A^T y ≽ 0}, then y ∈ K or in the alternative y ∉ K. Scalar inequality b^T y


derived from (304) simply by taking the complementary sense of the inequality in x^T b. These two systems are alternatives; if one system has a solution, then the other does not.^{2.50} [232, p.201]

By invoking a strict membership relation between proper cones (283), we can construct a more exotic alternative strengthened by demand for an interior point:

\[
b - Ay \succ_{\mathcal{K}^*} 0 \;\Leftrightarrow\; x^T b > 0,\;\; A^T x = 0 \;\;\forall\, x \succeq_{\mathcal{K}} 0,\; x \neq 0 \tag{306}
\]

From this, alternative systems of generalized inequality [46, pages 50, 54, 262]

\[
Ay \prec_{\mathcal{K}^*} b
\quad\text{or in the alternative}\quad
x^T b \le 0,\;\; A^T x = 0,\;\; x \succeq_{\mathcal{K}} 0,\; x \neq 0 \tag{307}
\]

derived from (306) by taking the complementary sense of the inequality in x^T b.

And from this, alternative systems with respect to the nonnegative orthant attributed to Gordan in 1873: [111] [41, §2.2] substituting A ← A^T and setting b = 0,

\[
A^T y \prec 0
\quad\text{or in the alternative}\quad
Ax = 0,\;\; x \succeq 0,\;\; \|x\|_1 = 1 \tag{308}
\]

^{2.50} If solutions at ±∞ are disallowed, then the alternative systems become instead mutually exclusive with respect to nonpolyhedral cones. Simultaneous infeasibility of the two systems is not precluded by mutual exclusivity; called a weak alternative. Ye provides an example illustrating simultaneous infeasibility with respect to the positive semidefinite cone: x ∈ S^2, y ∈ R,
\[
A = \begin{bmatrix} 1 & 0\\ 0 & 0 \end{bmatrix}, \qquad b = \begin{bmatrix} 0 & 1\\ 1 & 0 \end{bmatrix}
\]
where x^T b means ⟨x, b⟩. A better strategy than disallowing solutions at ±∞ is to demand an interior point as in (307) or Lemma 4.2.1.1.2. Then the question of simultaneous infeasibility is moot.


2.13.3 Optimality condition

The general first-order necessary and sufficient condition for optimality of solution x⋆ to a minimization problem ((264p) for example) with real differentiable convex objective function f(x) : R^n → R is [231, §3] (confer §2.13.10.1) (Figure 55)

\[
\nabla f(x^\star)^T (x - x^\star) \ge 0 \;\;\forall\, x \in \mathcal{C},\quad x^\star \in \mathcal{C} \tag{309}
\]

where C is the feasible set (a convex set of all variable values satisfying the problem constraints), and where ∇f(x⋆) is the gradient of f (§3.1.8) with respect to x evaluated at x⋆. Direct solution to (309) instead is known as a variation inequality from the calculus of variations. [183, p.178] [88, p.37]

2.13.3.0.1 Example. Equality constrained problem.
Given a real differentiable convex function f(x) : R^n → R defined on domain R^n, a fat full-rank matrix C ∈ R^{p×n}, and vector d ∈ R^p, the convex optimization problem

\[
\begin{array}{rl}
\underset{x}{\text{minimize}} & f(x)\\
\text{subject to} & Cx = d
\end{array} \tag{310}
\]

is characterized by the well-known necessary and sufficient optimality condition [46, §4.2.3]

\[
\nabla f(x^\star) + C^T \nu = 0 \tag{311}
\]

where ν ∈ R^p is the eminent Lagrange multiplier. [230] Feasible solution x⋆ is optimal, in other words, if and only if ∇f(x⋆) belongs to R(C^T). Via membership relation, we now derive this particular condition from the general first-order condition for optimality (309):

In this case, the feasible set is

\[
\mathcal{C} \triangleq \{x \in \mathbb{R}^n \mid Cx = d\} = \{Z\xi + x_p \mid \xi \in \mathbb{R}^{n-\operatorname{rank} C}\} \tag{312}
\]

where Z ∈ R^{n×(n−rank C)} holds basis N(C) columnar, and x_p is any particular solution to Cx = d. Since x⋆ ∈ C, we arbitrarily choose x_p = x⋆, which yields the equivalent optimality condition

\[
\nabla f(x^\star)^T Z\xi \ge 0 \;\;\forall\, \xi \in \mathbb{R}^{n-\operatorname{rank} C} \tag{313}
\]

But this is simply half of a membership relation, and the cone dual to R^{n−rank C} is the origin in R^{n−rank C}. We must therefore have

\[
Z^T \nabla f(x^\star) = 0 \;\Leftrightarrow\; \nabla f(x^\star)^T Z\xi \ge 0 \;\;\forall\, \xi \in \mathbb{R}^{n-\operatorname{rank} C} \tag{314}
\]

meaning, ∇f(x⋆) must be orthogonal to N(C). This condition

\[
Z^T \nabla f(x^\star) = 0,\quad x^\star \in \mathcal{C} \tag{315}
\]

is necessary and sufficient for optimality of x⋆.
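Condition (315) is easy to verify numerically; a small sketch (an illustration, not from the text, assuming numpy) on the hypothetical instance f(x) = ½‖x − c‖² whose constrained minimizer has closed form:

```python
import numpy as np

# f(x) = 0.5*||x - c||^2 minimized over {x | Cx = d}.
# Closed form: x* = c - C'(CC')^{-1}(Cc - d);  gradient of f is x - c.
rng = np.random.default_rng(2)
p, n = 2, 5
C = rng.standard_normal((p, n)); d = rng.standard_normal(p); c = rng.standard_normal(n)

x_star = c - C.T @ np.linalg.solve(C @ C.T, C @ c - d)
grad = x_star - c                          # lies in R(C') as (311) predicts

# Z: columnar basis for N(C), from the trailing right singular vectors of C.
Z = np.linalg.svd(C)[2][p:].T
print(np.allclose(C @ x_star, d))          # feasibility
print(np.allclose(Z.T @ grad, 0))          # optimality condition (315)
```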


2.13.4 Discretization of membership relation

2.13.4.1 Dual halfspace-description

Halfspace-description of the dual cone is equally simple (and extensible to an infinite number of generators) as vertex-description (253) for the corresponding closed convex cone: By definition (259), for X ∈ R^{n×N} as in (241), (confer (247))

\[
\begin{array}{rcl}
\mathcal{K}^* &=& \{\, y \in \mathbb{R}^n \mid z^T y \ge 0 \;\text{for all}\; z \in \mathcal{K} \,\}\\
&=& \{\, y \in \mathbb{R}^n \mid z^T y \ge 0 \;\text{for all}\; z = Xa,\; a \succeq 0 \,\}\\
&=& \{\, y \in \mathbb{R}^n \mid a^T X^T y \ge 0,\; a \succeq 0 \,\}\\
&=& \{\, y \in \mathbb{R}^n \mid X^T y \succeq 0 \,\}
\end{array} \tag{316}
\]

that follows from the generalized inequality and membership corollary (278). The semi-infinity of tests specified by all z ∈ K has been reduced to a set of generators for K constituting the columns of X; id est, the test has been discretized.

Whenever K is known to be closed and convex, then the converse must also hold; id est, given any set of generators for K^∗ arranged columnar in Y, then the consequent vertex-description of the dual cone connotes a halfspace-description for K: [249, §2.8]

\[
\mathcal{K}^* = \{Ya \mid a \succeq 0\} \;\Leftrightarrow\; \mathcal{K}^{**} = \mathcal{K} = \{\, z \mid Y^T z \succeq 0 \,\} \tag{317}
\]

2.13.4.2 First dual-cone formula

From these two results (316) and (317) we deduce a general principle: From any given vertex-description of a convex cone K, a halfspace-description of the dual cone K^∗ is immediate by matrix transposition; conversely, from any given halfspace-description, a dual vertex-description is immediate.


Various other converses are just a little trickier. (§2.13.9)

We deduce further: For any polyhedral cone K, the dual cone K^∗ is also polyhedral and K^{∗∗} = K. [249, §2.8]

The generalized inequality and membership corollary is discretized in the following theorem [18, §1]^{2.51} that follows directly from (316) and (317):

2.13.4.2.1 Theorem. Discrete membership. (confer §2.13.2.0.1)
Given any set of generators (§2.8.1.2) denoted by G(K) for closed convex cone K ⊆ R^n and any set of generators denoted G(K^∗) for its dual, let x and y belong to vector space R^n. Then discretization of the generalized inequality and membership corollary is necessary and sufficient for certifying membership:

\[
x \in \mathcal{K} \;\Leftrightarrow\; \langle \gamma^*, x\rangle \ge 0 \;\text{for all}\; \gamma^* \in \mathcal{G}(\mathcal{K}^*) \tag{318}
\]
\[
y \in \mathcal{K}^* \;\Leftrightarrow\; \langle \gamma, y\rangle \ge 0 \;\text{for all}\; \gamma \in \mathcal{G}(\mathcal{K}) \tag{319}
\]
⋄

2.13.4.2.2 Exercise. Test of discretized dual generalized inequalities.
Test Theorem 2.13.4.2.1 on Figure 45(a) using the extreme directions as generators.

From the discrete membership theorem we may further deduce a more surgical description of dual cone that prescribes only a finite number of halfspaces for its construction when polyhedral: (Figure 44(a))

\[
\mathcal{K}^* = \{y \in \mathbb{R}^n \mid \langle \gamma, y\rangle \ge 0 \;\text{for all}\; \gamma \in \mathcal{G}(\mathcal{K})\} \tag{320}
\]

2.13.4.2.3 Exercise. Comparison with respect to orthant.
When comparison is with respect to the nonnegative orthant K = R^n_+, then from the discrete membership theorem it directly follows:

\[
x \preceq z \;\Leftrightarrow\; x_i \le z_i \;\;\forall\, i \tag{321}
\]

Generate simple counterexamples demonstrating that this equivalence with entrywise inequality holds only when the underlying cone inducing partial order is the nonnegative orthant.

^{2.51} Barker & Carlson state the theorem only for the pointed closed convex case.
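A small sketch of the discrete membership test (318) in R^2 (an illustration, not from the text, assuming numpy). The dual generators follow the right-angle construction of Figure 45(a): each dual extreme direction is orthogonal to one extreme direction of K, so membership in K reduces to finitely many inner-product tests.

```python
import numpy as np

# K generated by two extreme directions (columns of G_K).
G_K = np.array([[1.0, 1.0],
                [0.0, 1.0]])                # columns: (1,0) and (1,1)
rot = np.array([[0.0, -1.0], [1.0, 0.0]])   # +90 degree rotation

# Dual generators: rotate the "lower" edge by +90 and the "upper" edge by -90,
# yielding inward-normals (0,1) and (1,-1) to the facets of K.
G_Kstar = np.column_stack([rot @ G_K[:, 0], rot.T @ G_K[:, 1]])

def in_K(x):
    return np.all(G_Kstar.T @ x >= -1e-12)  # finite test (318)

print(in_K(np.array([2.0, 1.0])))           # True: conic combination of generators
print(in_K(np.array([-1.0, 0.5])))          # False
```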


2.13.5 Dual PSD cone and generalized inequality

The dual positive semidefinite cone K^∗ is confined to S^M by convention;

\[
\mathbb{S}^{M*}_+ \triangleq \{Y \in \mathbb{S}^M \mid \langle Y, X\rangle \ge 0 \;\text{for all}\; X \in \mathbb{S}^M_+\} = \mathbb{S}^M_+ \tag{322}
\]

The positive semidefinite cone is self-dual in the ambient space of symmetric matrices [46, exmp.2.24] [28] [145, §II]; K = K^∗.

Dual generalized inequalities with respect to the positive semidefinite cone in the ambient space of symmetric matrices can therefore be simply stated: (Fejér)

\[
X \succeq 0 \;\Leftrightarrow\; \operatorname{tr}(Y^T X) \ge 0 \;\text{for all}\; Y \succeq 0 \tag{323}
\]

Membership to this cone can be determined in the isometrically isomorphic Euclidean space R^{M²} via (31). (§2.2.1) By the two interpretations in §2.13.1, positive semidefinite matrix Y can be interpreted as inward-normal to a hyperplane supporting the positive semidefinite cone.

The fundamental statement of positive semidefiniteness, y^T X y ≥ 0 ∀y (§A.3.0.0.1), evokes a particular instance of these dual generalized inequalities (323):

\[
X \succeq 0 \;\Leftrightarrow\; \langle yy^T, X\rangle \ge 0 \;\;\forall\, yy^T (\succeq 0) \tag{1271}
\]

Discretization (§2.13.4.2.1) allows replacement of positive semidefinite matrices Y with this minimal set of generators comprising the extreme directions of the positive semidefinite cone (§2.9.2.4).

2.13.5.1 Self-dual cones

From (111) (a consequence of the halfspaces theorem (§2.4.1.1.1)), where the only finite value of the support function for a convex cone is 0 [148, §C.2.3.1], or from discretized definition (320) of the dual cone, we get a rather self-evident characterization of self-duality:

\[
\mathcal{K} = \mathcal{K}^* \;\Leftrightarrow\; \mathcal{K} = \bigcap_{\gamma \in \mathcal{G}(\mathcal{K})} \{y \mid \gamma^T y \ge 0\} \tag{324}
\]

In words: Cone K is self-dual iff its own extreme directions are inward-normals to a (minimal) set of hyperplanes bounding halfspaces whose intersection constructs it. This means each extreme direction of K is normal to a hyperplane exposing one of its own faces; a necessary but insufficient condition for self-duality (Figure 50, for example).
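A quick numerical illustration of (323) and its rank-one discretization (1271) (not from the text; numpy assumed). Random vectors y stand in for the extreme directions yy^T:

```python
import numpy as np

rng = np.random.default_rng(3)
M = 4
B = rng.standard_normal((M, M))
X_psd = B @ B.T                           # PSD by construction
X_not = X_psd - 3.0 * np.eye(M)           # generally indefinite

def fejer_test(X, trials=2000):
    """X PSD  iff  <yy', X> = y'Xy >= 0 for all sampled y."""
    Y = rng.standard_normal((trials, M))
    return np.all(np.einsum('ij,jk,ik->i', Y, X, Y) >= -1e-10)

print(fejer_test(X_psd))   # True
print(fejer_test(X_not))   # very likely False for this shifted matrix
```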


Figure 50: Two (truncated) views of a polyhedral cone K and its dual in R^3. Each of four extreme directions from K belongs to a face of dual cone K^∗. Shrouded (inside) cone K is symmetrical about its axis of revolution. Each pair of diametrically opposed extreme directions from K makes a right angle. An orthant (or any rotation thereof; a simplicial cone) is not the only self-dual polyhedral cone in three or more dimensions; [17, §2.A.21] e.g., consider an equilateral with five extreme directions. In fact, every self-dual polyhedral cone in R^3 has an odd number of extreme directions. [19, thm.3]


Self-dual cones are of necessarily nonempty interior [25, §I] and invariant to rotation about the origin. Their most prominent representatives are the orthants, the positive semidefinite cone S^M_+ in the ambient space of symmetric matrices (322), and the Lorentz cone (148). [17, §II.A] [46, exmp.2.25] In three dimensions, a plane containing the axis of revolution of a self-dual cone (and the origin) will produce a slice whose boundary makes a right angle.

2.13.5.1.1 Example. Linear matrix inequality. (confer §2.13.2.0.3)
Consider a peculiar vertex-description for a closed convex cone defined over a positive semidefinite cone (instead of a nonnegative orthant as in definition (84)): for X ∈ S^n, given A_j ∈ S^n, j = 1…m,

\[
\mathcal{K} = \left\{ \begin{bmatrix} \langle A_1, X\rangle\\ \vdots\\ \langle A_m, X\rangle \end{bmatrix} \;\middle|\; X \succeq 0 \right\} \subseteq \mathbb{R}^m
= \left\{ \begin{bmatrix} \operatorname{svec}(A_1)^T\\ \vdots\\ \operatorname{svec}(A_m)^T \end{bmatrix} \operatorname{svec} X \;\middle|\; X \succeq 0 \right\}
\triangleq \{A \operatorname{svec} X \mid X \succeq 0\} \tag{325}
\]

where A ∈ R^{m×n(n+1)/2}, and where symmetric vectorization svec is defined in (47). Cone K is indeed convex because, by (145),

\[
A\operatorname{svec} X_{p1},\; A\operatorname{svec} X_{p2} \in \mathcal{K} \;\Rightarrow\; A(\zeta \operatorname{svec} X_{p1} + \xi \operatorname{svec} X_{p2}) \in \mathcal{K} \;\text{for all}\; \zeta, \xi \ge 0 \tag{326}
\]

since a nonnegatively weighted sum of positive semidefinite matrices must be positive semidefinite. (§A.3.1.0.2) Although matrix A is finite-dimensional, K is generally not a polyhedral cone (unless m equals 1 or 2) simply because X ∈ S^n_+. Provided the A_j matrices are linearly independent, then

\[
\operatorname{rel\,int}\mathcal{K} = \operatorname{int}\mathcal{K} \tag{327}
\]

meaning, the cone interior is nonempty; implying, the dual cone is pointed by (269).

If matrix A has no nullspace, on the other hand, then (by §2.10.1.1 and Definition 2.2.1.0.1) A svec X is an isomorphism in X between the positive


semidefinite cone and R(A). In that case, convex cone K has relative interior

\[
\operatorname{rel\,int}\mathcal{K} = \{A \operatorname{svec} X \mid X \succ 0\} \tag{328}
\]

and boundary

\[
\operatorname{rel}\partial\mathcal{K} = \{A \operatorname{svec} X \mid X \succeq 0,\; X \nsucc 0\} \tag{329}
\]

Now consider the (closed convex) dual cone:

\[
\begin{array}{rcl}
\mathcal{K}^* &=& \{y \mid \langle z, y\rangle \ge 0 \;\text{for all}\; z \in \mathcal{K}\} \subseteq \mathbb{R}^m\\
&=& \{y \mid \langle z, y\rangle \ge 0 \;\text{for all}\; z = A\operatorname{svec} X,\; X \succeq 0\}\\
&=& \{y \mid \langle A\operatorname{svec} X,\; y\rangle \ge 0 \;\text{for all}\; X \succeq 0\}\\
&=& \{y \mid \langle \operatorname{svec} X,\; A^T y\rangle \ge 0 \;\text{for all}\; X \succeq 0\}\\
&=& \{y \mid \operatorname{svec}^{-1}(A^T y) \succeq 0\}
\end{array} \tag{330}
\]

that follows from (323) and leads to an equally peculiar halfspace-description

\[
\mathcal{K}^* = \Big\{y \in \mathbb{R}^m \;\Big|\; \sum_{j=1}^m y_j A_j \succeq 0\Big\} \tag{331}
\]

The summation inequality with respect to the positive semidefinite cone is known as a linear matrix inequality. [44] [102] [193] [274]

When the A_j matrices are linearly independent, function g(y) ≜ Σ y_j A_j on R^m is a linear bijection. The inverse image of the positive semidefinite cone under g(y) must therefore have dimension m. In that circumstance, the dual cone interior is nonempty,

\[
\operatorname{int}\mathcal{K}^* = \Big\{y \in \mathbb{R}^m \;\Big|\; \sum_{j=1}^m y_j A_j \succ 0\Big\} \tag{332}
\]

having boundary

\[
\partial\mathcal{K}^* = \Big\{y \in \mathbb{R}^m \;\Big|\; \sum_{j=1}^m y_j A_j \succeq 0,\;\; \sum_{j=1}^m y_j A_j \nsucc 0\Big\} \tag{333}
\]
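A minimal membership test for the dual cone (331) (an illustration, not from the text; the matrices A_j below are arbitrary stand-ins): y ∈ K^∗ iff the pencil Σ y_j A_j has nonnegative minimum eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3
A1 = np.eye(n)
A2 = rng.standard_normal((n, n)); A2 = (A2 + A2.T) / 2   # symmetrize
A = [A1, A2]

def in_K_star(y):
    """Test the linear matrix inequality sum_j y_j A_j >= 0 of (331)."""
    S = sum(yj * Aj for yj, Aj in zip(y, A))
    return np.min(np.linalg.eigvalsh(S)) >= -1e-10

print(in_K_star(np.array([1.0, 0.0])))   # True: identity is PSD
print(in_K_star(np.array([0.0, 1.0])))   # likely False: generic symmetric matrix
```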


2.13.6 Dual of pointed polyhedral cone

In a subspace of R^n, now we consider a pointed polyhedral cone K given in terms of its extreme directions Γ_i arranged columnar in X:

\[
X = [\,\Gamma_1\;\; \Gamma_2\;\; \cdots\;\; \Gamma_N\,] \in \mathbb{R}^{n\times N} \tag{241}
\]

The extremes theorem (§2.8.1.1.1) provides the vertex-description of a pointed polyhedral cone in terms of its finite number of extreme directions and its lone vertex at the origin:

2.13.6.0.1 Definition. Pointed polyhedral cone, vertex-description.
(encore) (confer (253) (158)) Given pointed polyhedral cone K in a subspace of R^n, denoting its i-th extreme direction by Γ_i ∈ R^n arranged in a matrix X as in (241), then that cone may be described: (76) (confer (254))

\[
\begin{array}{rcl}
\mathcal{K} &=& \{\,[0\;\, X]\, a\zeta \mid a^T\mathbf{1} = 1,\; a \succeq 0,\; \zeta \ge 0\,\}\\
&=& \{\, Xa\zeta \mid a^T\mathbf{1} \le 1,\; a \succeq 0,\; \zeta \ge 0\,\}\\
&=& \{\, Xb \mid b \succeq 0\,\} \subseteq \mathbb{R}^n
\end{array} \tag{334}
\]

that is simply a conic hull (like (84)) of a finite number N of directions. △

Whenever cone K is pointed closed and convex (not only polyhedral), then dual cone K^∗ has a halfspace-description in terms of the extreme directions Γ_i of K:

\[
\mathcal{K}^* = \{\, y \mid \gamma^T y \ge 0 \;\text{for all}\; \gamma \in \{\Gamma_i,\; i = 1\ldots N\} \subseteq \operatorname{rel}\partial\mathcal{K} \,\} \tag{335}
\]

because when {Γ_i} constitutes any set of generators for K, the discretization result in §2.13.4.1 allows relaxation of the requirement ∀x ∈ K in (259) to ∀γ ∈ {Γ_i} directly.^{2.52} That dual cone so defined is unique, identical to (259), polyhedral whenever the number of generators N is finite,

\[
\mathcal{K}^* = \{\, y \mid X^T y \succeq 0 \,\} \subseteq \mathbb{R}^n \tag{316}
\]

and has nonempty interior because K is assumed pointed (but K^∗ is not necessarily pointed unless K has nonempty interior (§2.13.1.1)).

^{2.52} The extreme directions of K constitute a minimal set of generators.


2.13.6.1 Facet normal & extreme direction

We see from (316) that the conically independent generators of cone K (namely, the extreme directions of pointed closed convex cone K constituting the columns of X) each define an inward-normal to a hyperplane supporting K^∗ (§2.4.2.6.1) and exposing a dual facet when N is finite. Were K^∗ pointed and finitely generated, then by conjugation the dual statement would also hold; id est, the extreme directions of pointed K^∗ each define an inward-normal to a hyperplane supporting K and exposing a facet when N is finite. Examine Figure 45 or Figure 50, for example.

We may conclude: the extreme directions of polyhedral proper K are respectively orthogonal to the facets of K^∗; likewise, the extreme directions of polyhedral proper K^∗ are respectively orthogonal to the facets of K.

2.13.7 Biorthogonal expansion by example

2.13.7.0.1 Example. Relationship to dual polyhedral cone.
Simplicial cone K illustrated in Figure 51 induces a partial order on R^2. All points greater than x with respect to K, for example, are contained in the translated cone x + K. The extreme directions Γ₁ and Γ₂ of K do not make an orthogonal set; neither do extreme directions Γ₃ and Γ₄ of dual cone K^∗; rather, we have the biorthogonality condition [277]

\[
\Gamma_4^T \Gamma_1 = \Gamma_3^T \Gamma_2 = 0, \qquad
\Gamma_3^T \Gamma_1 \neq 0,\;\; \Gamma_4^T \Gamma_2 \neq 0 \tag{336}
\]

Biorthogonal expansion of x ∈ K is then

\[
x = \Gamma_1 \frac{\Gamma_3^T x}{\Gamma_3^T \Gamma_1} + \Gamma_2 \frac{\Gamma_4^T x}{\Gamma_4^T \Gamma_2} \tag{337}
\]

where Γ₃ᵀx/(Γ₃ᵀΓ₁) is the nonnegative coefficient of nonorthogonal projection (§E.6.1) of x on Γ₁ in the direction orthogonal to Γ₃, and where Γ₄ᵀx/(Γ₄ᵀΓ₂) is the nonnegative coefficient of nonorthogonal projection of x on Γ₂ in the direction orthogonal to Γ₄; they are coordinates in this nonorthogonal system. Those coefficients must be nonnegative, x ≽_K 0, because x ∈ K (282) and K is simplicial.

If we ascribe the extreme directions of K to the columns of a matrix

\[
X \triangleq [\,\Gamma_1\;\; \Gamma_2\,] \tag{338}
\]


Figure 51: (confer Figure 124) Simplicial cone K in R^2 and its dual K^∗ drawn truncated; Γ₁ ⊥ Γ₄ and Γ₂ ⊥ Γ₃. Conically independent generators Γ₁ and Γ₂ constitute extreme directions of K while Γ₃ and Γ₄ constitute extreme directions of K^∗. Dotted ray-pairs bound translated cones K. Point x is comparable to point z (and vice versa) but not to y; z ≽_K x ⇔ z − x ∈ K ⇔ z − x ≽_K 0 iff there exist nonnegative coordinates for biorthogonal expansion of z − x. Point y is not comparable to z because z does not belong to y ± K. Flipping a translated cone is quite helpful for visualization: x ≼_K z ⇔ x ∈ z − K ⇔ x − z ≼_K 0. Points need not belong to K to be comparable; e.g., all points greater than w (with respect to K) belong to w + K.


then we find that the pseudoinverse transpose matrix

\[
X^{\dagger T} = \begin{bmatrix} \Gamma_3 \dfrac{1}{\Gamma_3^T \Gamma_1} & \;\Gamma_4 \dfrac{1}{\Gamma_4^T \Gamma_2} \end{bmatrix} \tag{339}
\]

holds the extreme directions of the dual cone. Therefore,

\[
x = XX^\dagger x \tag{345}
\]

is the biorthogonal expansion (337) (§E.0.1), and the biorthogonality condition (336) can be expressed succinctly (§E.1.1)^{2.53}

\[
X^\dagger X = I \tag{346}
\]

Expansion w = XX^†w for any w ∈ R^2 is unique if and only if the extreme directions of K are linearly independent; id est, iff X has no nullspace.

2.13.7.1 Pointed cones and biorthogonality

Biorthogonality condition X^†X = I from Example 2.13.7.0.1 means Γ₁ and Γ₂ are linearly independent generators of K (§B.1.1.1); generators because every x ∈ K is their conic combination. From §2.10.2 we know that means Γ₁ and Γ₂ must be extreme directions of K.

A biorthogonal expansion is necessarily associated with a pointed closed convex cone; pointed, otherwise there can be no extreme directions (§2.8.1). We will address biorthogonal expansion with respect to a pointed polyhedral cone having empty interior in §2.13.8.

^{2.53} Possibly confusing is the fact that formula XX^†x is simultaneously the orthogonal projection of x on R(X) (1704), and a sum of nonorthogonal projections of x ∈ R(X) on the range of each and every column of full-rank X skinny-or-square (§E.5.0.0.2).
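The machinery of (339), (345), and (346) is easy to exercise numerically; a small sketch (not from the text, numpy assumed) with two hypothetical nonorthogonal extreme directions:

```python
import numpy as np

# Extreme directions of simplicial K in the columns of X;
# rows of pinv(X) hold the dual extreme directions, as in (339).
X = np.array([[2.0, 1.0],
              [0.0, 1.0]])             # columns: Gamma_1, Gamma_2 (nonorthogonal)
Xd = np.linalg.pinv(X)                 # X dagger

print(np.allclose(Xd @ X, np.eye(2)))  # biorthogonality condition (346)

x = X @ np.array([0.3, 1.2])           # x in K: nonnegative conic coordinates
coords = Xd @ x                        # coordinates Gamma*_i' x recovered exactly
print(coords)                          # nonnegative, since x in K
print(np.allclose(X @ coords, x))      # expansion x = X X† x  (345)
```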


2.13.7.1.1 Example. Expansions implied by diagonalization. (confer §6.5.3.1.1)
When matrix X ∈ R^{M×M} is diagonalizable (§A.5),

\[
X = S\Lambda S^{-1} = [\,s_1 \cdots s_M\,]\,\Lambda \begin{bmatrix} w_1^T\\ \vdots\\ w_M^T \end{bmatrix} = \sum_{i=1}^M \lambda_i s_i w_i^T \tag{1365}
\]

coordinates for biorthogonal expansion are its eigenvalues λ_i (contained in diagonal matrix Λ) when expanded in S:

\[
X = SS^{-1}X = [\,s_1 \cdots s_M\,] \begin{bmatrix} w_1^T X\\ \vdots\\ w_M^T X \end{bmatrix} = \sum_{i=1}^M \lambda_i s_i w_i^T \tag{340}
\]

Coordinate values depend upon the geometric relationship of X to its linearly independent eigenmatrices s_i w_iᵀ. (§A.5.1, §B.1.1)

Eigenmatrices s_i w_iᵀ are linearly independent dyads constituted by right and left eigenvectors of diagonalizable X and are generators of some pointed polyhedral cone K in a subspace of R^{M×M}. When S is real and X belongs to that polyhedral cone K, for example, then coordinates of expansion (the eigenvalues λ_i) must be nonnegative.

When X = QΛQᵀ is symmetric, coordinates for biorthogonal expansion are its eigenvalues when expanded in Q; id est, for X ∈ S^M,

\[
X = QQ^T X = \sum_{i=1}^M q_i q_i^T X = \sum_{i=1}^M \lambda_i q_i q_i^T \in \mathbb{S}^M \tag{341}
\]

becomes an orthogonal expansion with orthonormality condition QᵀQ = I, where λ_i is the i-th eigenvalue of X, and q_i is the corresponding i-th eigenvector arranged columnar in orthogonal matrix

\[
Q = [\,q_1\;\; q_2\;\; \cdots\;\; q_M\,] \in \mathbb{R}^{M\times M} \tag{342}
\]

and where eigenmatrix q_i q_iᵀ is an extreme direction of some pointed polyhedral cone K ⊂ S^M and an extreme direction of the positive semidefinite cone S^M_+.

Orthogonal expansion is a special case of biorthogonal expansion of X ∈ aff K occurring when polyhedral cone K is any rotation about the origin of an orthant belonging to a subspace.

Similarly, when X = QΛQᵀ belongs to the positive semidefinite cone in the subspace of symmetric matrices, coordinates for orthogonal expansion


must be its nonnegative eigenvalues (1279) when expanded in Q; id est, for X ∈ S^M_+,

\[
X = QQ^T X = \sum_{i=1}^M q_i q_i^T X = \sum_{i=1}^M \lambda_i q_i q_i^T \in \mathbb{S}^M_+ \tag{343}
\]

where λ_i ≥ 0 is the i-th eigenvalue of X. This means X simultaneously belongs to the positive semidefinite cone and to the pointed polyhedral cone K formed by the conic hull of its eigenmatrices.

2.13.7.1.2 Example. Expansion respecting nonpositive orthant.
Suppose x ∈ K, any orthant in R^n.^{2.54} Then coordinates for biorthogonal expansion of x must be nonnegative; in fact, the absolute value of the Cartesian coordinates.

Suppose, in particular, x belongs to the nonpositive orthant K = R^n_−. Then the biorthogonal expansion becomes an orthogonal expansion

\[
x = XX^T x = \sum_{i=1}^n -e_i(-e_i^T x) = \sum_{i=1}^n -e_i\, |e_i^T x| \;\in\; \mathbb{R}^n_- \tag{344}
\]

and the coordinates of expansion are nonnegative. For this orthant K we have orthonormality condition XᵀX = I where X = −I, e_i ∈ R^n is a standard basis vector, and −e_i is an extreme direction (§2.8.1) of K.

Of course, this expansion x = XXᵀx applies more broadly to domain R^n, but then the coordinates each belong to all of R.

2.13.8 Biorthogonal expansion, derivation

Biorthogonal expansion is a means for determining coordinates in a pointed conic coordinate system characterized by a nonorthogonal basis. Study of nonorthogonal bases invokes pointed polyhedral cones and their duals; extreme directions of a cone K are assumed to constitute the basis while those of the dual cone K^∗ determine coordinates.

Unique biorthogonal expansion with respect to K depends upon existence of its linearly independent extreme directions: Polyhedral cone K must be pointed; then it possesses extreme directions. Those extreme directions must be linearly independent to uniquely represent any point in their span.

^{2.54} An orthant is simplicial and self-dual.


We consider nonempty pointed polyhedral cone K having possibly empty interior; id est, we consider a basis spanning a subspace. Then we need only observe that section of dual cone K^∗ in the affine hull of K because, by expansion of x, membership x ∈ aff K is implicit, and because any breach of the ordinary dual cone into ambient space becomes irrelevant (§2.13.9.3). Biorthogonal expansion

\[
x = XX^\dagger x \in \operatorname{aff}\mathcal{K} = \operatorname{aff}\operatorname{cone}(X) \tag{345}
\]

is expressed in the extreme directions {Γ_i} of K arranged columnar in

\[
X = [\,\Gamma_1\;\; \Gamma_2\;\; \cdots\;\; \Gamma_N\,] \in \mathbb{R}^{n\times N} \tag{241}
\]

under assumption of biorthogonality

\[
X^\dagger X = I \tag{346}
\]

where † denotes matrix pseudoinverse (§E). We therefore seek, in this section, a vertex-description for K^∗ ∩ aff K in terms of linearly independent dual generators {Γ*_i} ⊂ aff K in the same finite quantity^{2.55} as the extreme directions {Γ_i} of

\[
\mathcal{K} = \operatorname{cone}(X) = \{Xa \mid a \succeq 0\} \subseteq \mathbb{R}^n \tag{253}
\]

We assume the quantity of extreme directions N does not exceed the dimension n of the ambient vector space because, otherwise, the expansion could not be unique; id est, assume N linearly independent extreme directions, hence N ≤ n (X skinny^{2.56}-or-square full-rank). In other words, fat full-rank matrix X is prohibited by uniqueness because of the existence of an infinity of right-inverses; polyhedral cones whose extreme directions number in excess of the ambient space dimension are precluded in biorthogonal expansion.

^{2.55} When K is contained in a proper subspace of R^n, the ordinary dual cone K^∗ will have more generators in any minimal set than K has extreme directions.
^{2.56} "Skinny" meaning thin; more rows than columns.


2.13.8.1 x ∈ K

Suppose x belongs to K ⊆ R^n. Then x = Xa for some a ≽ 0. Vector a is unique only when {Γ_i} is a linearly independent set.^{2.57} Vector a ∈ R^N can take the form a = Bx if R(B) = R^N. Then we require Xa = XBx = x and Bx = BXa = a. The pseudoinverse B = X^† ∈ R^{N×n} (§E) is suitable when X is skinny-or-square and full-rank. In that case rank X = N, and for all c ≽ 0 and i = 1…N,

\[
a \succeq 0 \;\Leftrightarrow\; X^\dagger X a \succeq 0 \;\Leftrightarrow\; a^T X^T X^{\dagger T} c \ge 0 \;\Leftrightarrow\; \Gamma_i^T X^{\dagger T} c \ge 0 \tag{347}
\]

The penultimate inequality follows from the generalized inequality and membership corollary, while the last inequality is a consequence of that corollary's discretization (§2.13.4.2.1).^{2.58} From (347) and (335) we deduce

\[
\mathcal{K}^* \cap \operatorname{aff}\mathcal{K} = \operatorname{cone}(X^{\dagger T}) = \{X^{\dagger T} c \mid c \succeq 0\} \subseteq \mathbb{R}^n \tag{348}
\]

is the vertex-description for that section of K^∗ in the affine hull of K because R(X^{†T}) = R(X) by definition of the pseudoinverse. From (269), we know K^∗ ∩ aff K must be pointed if rel int K is logically assumed nonempty with respect to aff K.

Conversely, suppose full-rank skinny-or-square matrix

\[
X^{\dagger T} \triangleq [\,\Gamma_1^*\;\; \Gamma_2^*\;\; \cdots\;\; \Gamma_N^*\,] \in \mathbb{R}^{n\times N} \tag{349}
\]

comprises the extreme directions {Γ*_i} ⊂ aff K of the dual-cone intersection with the affine hull of K.^{2.59} From the discrete membership theorem and

^{2.57} Conic independence alone (§2.10) is insufficient to guarantee uniqueness.
^{2.58} Intuitively, any nonnegative vector a is a conic combination of the standard basis {e_i ∈ R^N}; a ≽ 0 ⇔ a_i e_i ≽ 0 for all i. The last inequality in (347) is a consequence of the fact that x = Xa may be any extreme direction of K, in which case a is a standard basis vector; a = e_i ≽ 0. Theoretically, because c ≽ 0 defines a pointed polyhedral cone (in fact, the nonnegative orthant in R^N), we can take (347) one step further by discretizing c:
\[
a \succeq 0 \;\Leftrightarrow\; \Gamma_i^T \Gamma_j^* \ge 0 \;\text{for}\; i,j = 1\ldots N \;\Leftrightarrow\; X^\dagger X \ge 0
\]
In words, X^†X must be a matrix whose entries are each nonnegative.
^{2.59} When closed convex cone K has empty interior, K^∗ has no extreme directions.


(273) we get a partial dual to (335); id est, assuming x ∈ aff cone X,

\[
x \in \mathcal{K} \;\Leftrightarrow\; \gamma^{*T} x \ge 0 \;\text{for all}\; \gamma^* \in \{\Gamma_i^*,\; i = 1\ldots N\} \subset \partial\mathcal{K}^* \cap \operatorname{aff}\mathcal{K} \tag{350}
\]
\[
\Leftrightarrow\;\; X^\dagger x \succeq 0 \tag{351}
\]

that leads to a partial halfspace-description,

\[
\mathcal{K} = \{\, x \in \operatorname{aff}\operatorname{cone} X \mid X^\dagger x \succeq 0 \,\} \tag{352}
\]

For γ* = X^{†T}e_i, any x = Xa, and for all i, we have e_iᵀX^†Xa = e_iᵀa ≥ 0 only when a ≽ 0. Hence x ∈ K.

When X is full-rank, then the unique biorthogonal expansion of x ∈ K becomes (345)

\[
x = XX^\dagger x = \sum_{i=1}^N \Gamma_i\, \Gamma_i^{*T} x \tag{353}
\]

whose coordinates Γ*_iᵀx must be nonnegative because K is assumed pointed, closed, and convex. Whenever X is full-rank, so is its pseudoinverse X^†. (§E) In the present case, the columns of X^{†T} are linearly independent and generators of the dual cone K^∗ ∩ aff K; hence, the columns constitute its extreme directions. (§2.10) That section of the dual cone is itself a polyhedral cone (by (247) or the cone intersection theorem, §2.7.2.1.1) having the same number of extreme directions as K.

2.13.8.2 x ∈ aff K

The extreme directions of K and K^∗ ∩ aff K have a distinct relationship; because X^†X = I, then for i, j = 1…N, Γ_iᵀΓ*_i = 1, while for i ≠ j, Γ_iᵀΓ*_j = 0. Yet neither set of extreme directions, {Γ_i} nor {Γ*_i}, is necessarily orthogonal. This is, precisely, a biorthogonality condition [277, §2.2.4] [150] implying each set of extreme directions is linearly independent. (§B.1.1.1)

The biorthogonal expansion therefore applies more broadly; meaning, for any x ∈ aff K, vector x can be uniquely expressed x = Xb where b ∈ R^N because aff K contains the origin. Thus, for any such x ∈ R(X) (confer §E.1.1), biorthogonal expansion (353) becomes x = XX^†Xb = Xb.


2.13.9 Formulae, algorithm finding dual cone

2.13.9.1 Pointed K, dual, X skinny-or-square full-rank

We wish to derive expressions for a convex cone and its ordinary dual under the general assumptions: pointed polyhedral K denoted by its linearly independent extreme directions arranged columnar in matrix X such that

\[
\operatorname{rank}(X \in \mathbb{R}^{n\times N}) = N \triangleq \dim \operatorname{aff}\mathcal{K} \le n \tag{354}
\]

The vertex-description is given:

\[
\mathcal{K} = \{Xa \mid a \succeq 0\} \subseteq \mathbb{R}^n \tag{355}
\]

from which a halfspace-description for the dual cone follows directly:

\[
\mathcal{K}^* = \{y \in \mathbb{R}^n \mid X^T y \succeq 0\} \tag{356}
\]

By defining a matrix

\[
X^\perp \triangleq \operatorname{basis}\,\mathcal{N}(X^T) \tag{357}
\]

(a columnar basis for the orthogonal complement of R(X)), we can say

\[
\operatorname{aff}\operatorname{cone} X = \operatorname{aff}\mathcal{K} = \{x \mid X^{\perp T} x = 0\} \tag{358}
\]

meaning K lies in a subspace, perhaps R^n. Thus we have a halfspace-description

\[
\mathcal{K} = \{x \in \mathbb{R}^n \mid X^\dagger x \succeq 0,\; X^{\perp T} x = 0\} \tag{359}
\]

and from (273), a vertex-description^{2.60}

\[
\mathcal{K}^* = \{\,[\,X^{\dagger T}\;\; X^\perp\;\; -X^\perp\,]\,b \mid b \succeq 0\,\} \subseteq \mathbb{R}^n \tag{360}
\]

These results are summarized for a pointed polyhedral cone, having linearly independent generators, and its ordinary dual:

Cone Table 1               K                  K^∗
vertex-description         X                  X^{†T}, ±X^⊥
halfspace-description      X^†, X^{⊥T}        X^T

^{2.60} These descriptions are not unique. A vertex-description of the dual cone, for example, might use four conically independent generators for a plane (§2.10.0.0.1) when only three would suffice.
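Cone Table 1 translates directly into computation; a sketch under the table's assumptions (an illustration, not from the text), using the pseudoinverse for X^† and trailing left singular vectors of X for X^⊥:

```python
import numpy as np

rng = np.random.default_rng(4)
n, N = 4, 2
X = rng.standard_normal((n, N))           # 2 linearly independent extreme directions in R^4
Xd = np.linalg.pinv(X)                    # X dagger
Xp = np.linalg.svd(X)[0][:, N:]           # X perp: basis of N(X') = R(X) orthogonal complement

x = X @ rng.random(N)                     # a point of K
print(np.all(Xd @ x >= -1e-12), np.allclose(Xp.T @ x, 0))   # halfspace-description (359)

# Vertex-description of K* (360): any such y satisfies X'y >= 0, i.e. (356).
b = rng.random(N + 2 * (n - N))
y = np.hstack([Xd.T, Xp, -Xp]) @ b
print(np.all(X.T @ y >= -1e-12))
```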


2.13.9.2 Simplicial case

When a convex cone is simplicial (§2.12.3), Cone Table 1 simplifies because then aff cone X = R^n: For square X and assuming simplicial K such that

\[
\operatorname{rank}(X \in \mathbb{R}^{n\times N}) = N \triangleq \dim \operatorname{aff}\mathcal{K} = n \tag{361}
\]

we have

Cone Table S               K         K^∗
vertex-description         X         X^{†T}
halfspace-description      X^†       X^T

For example, vertex-description (360) simplifies to

\[
\mathcal{K}^* = \{X^{\dagger T} b \mid b \succeq 0\} \subset \mathbb{R}^n \tag{362}
\]

Now, because dim R(X) = dim R(X^{†T}) (§E), the dual cone K^∗ is simplicial whenever K is.

2.13.9.3 Cone membership relations in a subspace

It is obvious by definition (259) of the ordinary dual cone K^∗, in ambient vector space R, that its determination instead in subspace M ⊆ R is identical to its intersection with M; id est, assuming closed convex cone K ⊆ M and K^∗ ⊆ R,

\[
(\mathcal{K}^*\ \text{were ambient}\ \mathcal{M}) \;\equiv\; (\mathcal{K}^*\ \text{in ambient}\ \mathcal{R}) \cap \mathcal{M} \tag{363}
\]

because

\[
\{y \in \mathcal{M} \mid \langle y, x\rangle \ge 0 \;\text{for all}\; x \in \mathcal{K}\} = \{y \in \mathcal{R} \mid \langle y, x\rangle \ge 0 \;\text{for all}\; x \in \mathcal{K}\} \cap \mathcal{M} \tag{364}
\]

From this, a constrained membership relation for the ordinary dual cone K^∗ ⊆ R, assuming x, y ∈ M and closed convex cone K ⊆ M:

\[
y \in \mathcal{K}^* \cap \mathcal{M} \;\Leftrightarrow\; \langle y, x\rangle \ge 0 \;\text{for all}\; x \in \mathcal{K} \tag{365}
\]

By closure in subspace M we have conjugation (§2.13.1.1):

\[
x \in \mathcal{K} \;\Leftrightarrow\; \langle y, x\rangle \ge 0 \;\text{for all}\; y \in \mathcal{K}^* \cap \mathcal{M} \tag{366}
\]


This means membership determination in subspace M requires knowledge of the dual cone only in M. For sake of completeness, for proper cone K with respect to subspace M (confer (283)):

\[
x \in \operatorname{int}\mathcal{K} \;\Leftrightarrow\; \langle y, x\rangle > 0 \;\text{for all}\; y \in \mathcal{K}^* \cap \mathcal{M},\; y \neq 0 \tag{367}
\]
\[
x \in \mathcal{K},\; x \neq 0 \;\Leftrightarrow\; \langle y, x\rangle > 0 \;\text{for all}\; y \in \operatorname{int}\mathcal{K}^* \cap \mathcal{M} \tag{368}
\]

(By conjugation, we also have the dual relations.) Yet when M equals aff K for K a closed convex cone:

\[
x \in \operatorname{rel\,int}\mathcal{K} \;\Leftrightarrow\; \langle y, x\rangle > 0 \;\text{for all}\; y \in \mathcal{K}^* \cap \operatorname{aff}\mathcal{K},\; y \neq 0 \tag{369}
\]
\[
x \in \mathcal{K},\; x \neq 0 \;\Leftrightarrow\; \langle y, x\rangle > 0 \;\text{for all}\; y \in \operatorname{rel\,int}(\mathcal{K}^* \cap \operatorname{aff}\mathcal{K}) \tag{370}
\]

2.13.9.4 Subspace M = aff K

Assume now a subspace M that is the affine hull of cone K: Consider again a pointed polyhedral cone K denoted by its extreme directions arranged columnar in matrix X such that

\[
\operatorname{rank}(X \in \mathbb{R}^{n\times N}) = N \triangleq \dim \operatorname{aff}\mathcal{K} \le n \tag{354}
\]

We want expressions for the convex cone and its dual in subspace M = aff K:

Cone Table A               K                  K^∗ ∩ aff K
vertex-description         X                  X^{†T}
halfspace-description      X^†, X^{⊥T}        X^T, X^{⊥T}

When dim aff K = n, this table reduces to Cone Table S. These descriptions facilitate work in a proper subspace. The subspace of symmetric matrices S^N, for example, often serves as ambient space.^{2.61}

^{2.61} The dual cone of positive semidefinite matrices S^{N∗}_+ = S^N_+ remains in S^N by convention, whereas the ordinary dual cone would venture into R^{N×N}.


Figure 52: Simplicial cones. (a) Monotone nonnegative cone K_{M+} and its dual K^∗_{M+} (drawn truncated) in R^2. (b) Monotone nonnegative cone and boundary of its dual (both drawn truncated) in R^3, where

\[
X = \begin{bmatrix} 1 & 1 & 1\\ 0 & 1 & 1\\ 0 & 0 & 1 \end{bmatrix}, \quad
X^{\dagger T}(:,1) = \begin{bmatrix} 1\\ -1\\ 0 \end{bmatrix}, \quad
X^{\dagger T}(:,2) = \begin{bmatrix} 0\\ 1\\ -1 \end{bmatrix}, \quad
X^{\dagger T}(:,3) = \begin{bmatrix} 0\\ 0\\ 1 \end{bmatrix}
\]

Extreme directions of K^∗_{M+} are indicated.


2.13.9.4.1 Example. Monotone nonnegative cone. [46, exer.2.33] [266, §2]
Simplicial cone (§2.12.3.1.1) K_{M+} is the cone of all nonnegative vectors having their entries sorted in nonincreasing order:

\[
\begin{array}{rcl}
\mathcal{K}_{M+} &\triangleq& \{x \mid x_1 \ge x_2 \ge \cdots \ge x_n \ge 0\} \subseteq \mathbb{R}^n_+\\
&=& \{x \mid (e_i - e_{i+1})^T x \ge 0,\; i = 1\ldots n{-}1,\;\; e_n^T x \ge 0\}\\
&=& \{x \mid X^\dagger x \succeq 0\}
\end{array} \tag{371}
\]

a halfspace-description where e_i is the i-th standard basis vector, and where

\[
X^{\dagger T} \triangleq [\,e_1{-}e_2\;\; e_2{-}e_3\;\; \cdots\;\; e_n\,] \in \mathbb{R}^{n\times n} \tag{372}
\]

(With X^† in hand, we might concisely scribe the remaining vertex- and halfspace-descriptions from the tables for K_{M+} and its dual. Instead we use dual generalized inequalities in their derivation.) For any vectors x and y, simple algebra demands

\[
\begin{array}{rcl}
x^T y \;=\; \displaystyle\sum_{i=1}^n x_i y_i &=& (x_1 - x_2)y_1 + (x_2 - x_3)(y_1 + y_2) + (x_3 - x_4)(y_1 + y_2 + y_3) + \cdots\\
&& +\; (x_{n-1} - x_n)(y_1 + \cdots + y_{n-1}) + x_n(y_1 + \cdots + y_n)
\end{array} \tag{373}
\]

Because x_i − x_{i+1} ≥ 0 ∀i by assumption whenever x ∈ K_{M+}, we can employ dual generalized inequalities (280) with respect to the self-dual nonnegative orthant R^n_+ to find the halfspace-description of the dual cone K^∗_{M+}. We can say xᵀy ≥ 0 for all X^†x ≽ 0 [sic] if and only if

\[
y_1 \ge 0,\quad y_1 + y_2 \ge 0,\quad \ldots,\quad y_1 + y_2 + \cdots + y_n \ge 0 \tag{374}
\]

id est,

\[
x^T y \ge 0 \;\;\forall\, X^\dagger x \succeq 0 \;\Leftrightarrow\; X^T y \succeq 0 \tag{375}
\]

where

\[
X = [\,e_1\;\; e_1{+}e_2\;\; e_1{+}e_2{+}e_3\;\; \cdots\;\; \mathbf{1}\,] \in \mathbb{R}^{n\times n} \tag{376}
\]

Because X^†x ≽ 0 connotes membership of x to pointed K_{M+}, then by (259) the dual cone we seek comprises all y for which (375) holds; thus its halfspace-description

\[
\mathcal{K}^*_{M+} = \{y \succeq_{\mathcal{K}^*_{M+}} 0\} = \Big\{y \;\Big|\; \textstyle\sum_{i=1}^k y_i \ge 0,\; k = 1\ldots n\Big\} = \{y \mid X^T y \succeq 0\} \subset \mathbb{R}^n \tag{377}
\]
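These descriptions are easily verified numerically; a small sketch (not from the text, numpy assumed) for n = 5:

```python
import numpy as np

# Monotone nonnegative cone (371)-(377): X holds generators e1, e1+e2, ..., 1
# columnar; the rows of X† are the differencing functionals of (371).
n = 5
X = np.triu(np.ones((n, n)))                 # column k: e_1 + ... + e_k  (376)
Xd = np.linalg.inv(X)                        # X† = X^{-1}: rows e_i - e_{i+1}, e_n
print(np.allclose(Xd, np.eye(n) - np.eye(n, k=1)))   # bidiagonal difference matrix

x = np.array([5.0, 3.0, 3.0, 1.0, 0.0])      # nonincreasing and nonnegative
y = np.array([2.0, -1.0, 0.5, -0.5, 0.0])    # partial sums 2, 1, 1.5, 1, 1 >= 0
print(np.all(Xd @ x >= 0))                   # x in K_M+   by (371)
print(np.all(X.T @ y >= 0))                  # y in K*_M+  by (377): partial sums
print(x @ y >= 0)                            # dual generalized inequality honored
```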


172 CHAPTER 2. CONVEX GEOMETRYx 210.50K M−0.5−1K M−1.5K ∗ M−2−1 −0.5 0 0.5 1 1.5 2x 1Figure 53: Monotone cone K M and its dual K ∗ M (drawn truncated) in R2 .The monotone nonnegative cone and its dual are simplicial, illustrated fortwo Euclidean spaces in Figure 52.From2.13.6.1, the extreme directions of proper K M+ are respectivelyorthogonal to the facets of K ∗ M+ . Because K∗ M+ is simplicial, theinward-normals to its facets constitute the linearly independent rows of X Tby (377). Hence the vertex-description for K M+ employs the columns of Xin agreement with Cone Table S because X † =X −1 . Likewise, the extremedirections of proper K ∗ M+ are respectively orthogonal to the facets of K M+whose inward-normals are contained in the rows of X † by (371). So thevertex-description for K ∗ M+ employs the columns of X†T . 2.13.9.4.2 Example. Monotone cone.(Figure 53, Figure 54) Of nonempty interior but not pointed, the monotonecone is polyhedral and defined by the halfspace-descriptionK M ∆ = {x ∈ R n | x 1 ≥ x 2 ≥ · · · ≥ x n } = {x ∈ R n | X ∗T x ≽ 0} (378)Its dual is therefore pointed but of empty interior, having vertex-descriptionK ∗ M = {X ∗ b ∆ = [e 1 −e 2 e 2 −e 3 · · · e n−1 −e n ]b | b ≽ 0 } ⊂ R n (379)where the columns of X ∗ comprise the extreme directions of K ∗ M . BecauseK ∗ M is pointed and satisfiesrank(X ∗ ∈ R n×N ) = N ∆ = dim aff K ∗ ≤ n (380)


Figure 54: Two views of monotone cone K_M and its dual K^∗_M (drawn truncated) in R^3. Monotone cone is not pointed. Dual monotone cone has empty interior. Cartesian coordinate axes are drawn for reference.


where N = n−1, and because K_M is closed and convex, we may adapt Cone Table 1 as follows:

Cone Table 1*              K^∗                    K^{∗∗} = K
vertex-description         X^∗                    X^{∗†T}, ±X^{∗⊥}
halfspace-description      X^{∗†}, X^{∗⊥T}        X^{∗T}

The vertex-description for K_M is therefore

\[
\mathcal{K}_M = \{\,[\,X^{*\dagger T}\;\; X^{*\perp}\;\; -X^{*\perp}\,]\,a \mid a \succeq 0\,\} \subset \mathbb{R}^n \tag{381}
\]

where X^{∗⊥} = 1 and

\[
X^{*\dagger} = \frac{1}{n}\begin{bmatrix}
n{-}1 & -1 & -1 & \cdots & -1 & -1\\
n{-}2 & n{-}2 & -2 & \cdots & -2 & -2\\
n{-}3 & n{-}3 & n{-}3 & \ddots & -3 & -3\\
\vdots & & & \ddots & & \vdots\\
2 & 2 & \cdots & 2 & -(n{-}2) & -(n{-}2)\\
1 & 1 & \cdots & 1 & 1 & -(n{-}1)
\end{bmatrix} \in \mathbb{R}^{(n-1)\times n} \tag{382}
\]

while

\[
\mathcal{K}^*_M = \{y \in \mathbb{R}^n \mid X^{*\dagger} y \succeq 0,\; X^{*\perp T} y = 0\} \tag{383}
\]

is the dual monotone cone halfspace-description.
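The pattern in (382) can be confirmed against the pseudoinverse computed numerically; a sketch (not from the text, numpy assumed):

```python
import numpy as np

# Verify pseudoinverse pattern (382) for the monotone cone generators
# X* = [e1-e2  e2-e3  ...  e_{n-1}-e_n]  of (379).
n = 5
Xs = (np.eye(n) - np.eye(n, k=-1))[:, :n-1]     # columns e_i - e_{i+1}
Xsd = np.linalg.pinv(Xs)                        # X*†, shape (n-1, n)

i, j = np.indices((n - 1, n))
formula = np.where(j <= i, n - 1 - i, -(i + 1)) / n    # entries of (382)
print(np.allclose(Xsd, formula))                # True
print(np.allclose(Xsd @ np.ones(n), 0))         # rows orthogonal to 1 = X*⊥
```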


2.13. DUAL CONE & GENERALIZED INEQUALITY 175where R(Z ∈ R n×n−rank C ) ∆ = N(C) . Assuming (354) is satisfiedrankX ∆ = rank ( (AZ) † ∈ R n−rank C×m) = m − l = dim aff K ≤ n − rankC(385)where l is the number of conically dependent rows in AZ (2.10)that must be removed to make ÂZ before the cone tables becomeapplicable. 2.62 Then the results collected in the cone tables admit theassignment ˆX =(ÂZ)† ∆ ∈ R n−rank C×m−l , where Â∈ Rm−l×n , followed withlinear transformation by Z . So we get the vertex-description, for (ÂZ)†skinny-or-square full-rank,K = {Z(ÂZ)† b | b ≽ 0} (386)From this and (316) we get a halfspace-description of the dual coneK ∗ = {y ∈ R n | (Z T Â T ) † Z T y ≽ 0} (387)From this and Cone Table 1 (p.167) we get a vertex-description, (1668)Yet becauseK ∗ = {[Z †T (ÂZ)T C T −C T ]c | c ≽ 0} (388)K = {x | Ax ≽ 0} ∩ {x | Cx = 0} (389)then, by (273), we get an equivalent vertex-description for the dual coneK ∗ = {x | Ax ≽ 0} ∗ + {x | Cx = 0} ∗= {[A T C T −C T ]b | b ≽ 0}(390)from which the conically dependent columns may, of course, be removed.2.13.10 Dual cone-translateFirst-order optimality condition (309) inspires a dual-cone variant: For anyset K , the negative dual of its translation by any a∈ R n is−(K − a) ∗ ∆ = { y ∈ R n | 〈y , x − a〉≤0 for all x ∈ K }= { y ∈ R n | 〈y , x〉≤0 for all x ∈ K − a } (391)2.62 When the conically dependent rows are removed, the rows remaining must be linearlyindependent for the cone tables to apply.


Figure 55: Shown is a plausible contour plot in R^2 of some arbitrary differentiable real convex function f(x) at selected levels α, β, and γ (α ≥ β ≥ γ); id est, contours of equal level f (level sets) drawn (dashed) in function's domain. Function is minimized over convex set C at point x⋆ iff negative gradient −∇f(x⋆) belongs to normal cone to C there. In circumstance depicted, normal cone is a ray whose direction is coincident with negative gradient. From results in §3.1.9 (p.212), ∇f(x⋆) is normal to the γ-sublevel set by Definition E.9.1.0.1.

2.13.10 Dual cone-translate

First-order optimality condition (309) inspires a dual-cone variant: For any set K, the negative dual of its translation by any a ∈ R^n is

\[
\begin{array}{rcl}
-(\mathcal{K} - a)^* &\triangleq& \{\, y \in \mathbb{R}^n \mid \langle y, x - a\rangle \le 0 \;\text{for all}\; x \in \mathcal{K} \,\}\\
&=& \{\, y \in \mathbb{R}^n \mid \langle y, x\rangle \le 0 \;\text{for all}\; x \in \mathcal{K} - a \,\}
\end{array} \tag{391}
\]


2.13. DUAL CONE & GENERALIZED INEQUALITY 177(confer (309))−∇f(x ⋆ ) ∈ −(C − x ⋆ ) ∗ , x ⋆ ∈ C (393)id est, the negative gradient (3.1.8) belongs to the normal cone at x ⋆ as inFigure 55.2.13.10.1.1 Example. Normal cone to orthant.Consider proper cone K = R n + , the self-dual nonnegative orthant in R n .The normal cone to R n + at a∈ K is (1886)K ⊥ R n + (a∈ Rn +) = −(R n + − a) ∗ = −R n + ∩ a ⊥ , a∈ R n + (394)where −R n += −K ∗ is the algebraic complement of R n + , and a ⊥ is theorthogonal complement of point a . This means: When point a is interior toR n + , the normal cone is the origin. If n p represents the number of nonzeroentries in point a ∈ ∂R n + , then dim(−R n + ∩ a ⊥ )= n − n p and there is acomplementary relationship between the nonzero entries in point a and thenonzero entries in any vector x∈−R n + ∩ a ⊥ .2.13.10.1.2 Example. Optimality conditions for conic problem.Consider a convex optimization problem having real differentiable convexobjective function f(x) : R n →R defined on domain R n ;minimize f(x)xsubject to x ∈ K(395)The feasible set is a pointed polyhedral cone K possessing a linearlyindependent set of generators and whose subspace membership is madeexplicit by fat full-rank matrix C ∈ R p×n ; id est, we are given thehalfspace-descriptionK = {x | Ax ≽ 0, Cx = 0} ⊆ R n(247a)where A∈ R m×n . The vertex-description of this cone, assuming (ÂZ)†skinny-or-square full-rank, isK = {Z(ÂZ)† b | b ≽ 0} (386)where Â∈ Rm−l×n , l is the number of conically dependent rows in AZ(2.10) that must be removed, and Z ∈ R n×n−rank C holds basis N(C)columnar.


From optimality condition (309),

\[
\nabla f(x^\star)^T\big(Z(\hat{A}Z)^{\dagger} b - x^\star\big) \ge 0 \quad\forall\, b \succeq 0 \tag{396}
\]
\[
-\nabla f(x^\star)^T Z(\hat{A}Z)^{\dagger}(b - b^\star) \le 0 \quad\forall\, b \succeq 0 \tag{397}
\]

because

\[
x^\star \triangleq Z(\hat{A}Z)^{\dagger} b^\star \in \mathcal{K} \tag{398}
\]

From membership relation (392) and Example 2.13.10.1.1,

\[
\langle -(Z^T\hat{A}^T)^{\dagger} Z^T \nabla f(x^\star),\; b - b^\star\rangle \le 0 \;\;\text{for all}\; b\in\mathbb{R}^{m-\ell}_+
\;\Leftrightarrow\;
-(Z^T\hat{A}^T)^{\dagger} Z^T \nabla f(x^\star) \in -\mathbb{R}^{m-\ell}_+ \cap b^{\star\perp} \tag{399}
\]

Then the equivalent necessary and sufficient conditions for optimality of conic program (395) with pointed polyhedral feasible set $\mathcal{K}$ are: (confer (315))

\[
(Z^T\hat{A}^T)^{\dagger} Z^T \nabla f(x^\star) \underset{\mathbb{R}^{m-\ell}_+}{\succeq} 0\,,\qquad
b^\star \underset{\mathbb{R}^{m-\ell}_+}{\succeq} 0\,,\qquad
\nabla f(x^\star)^T Z(\hat{A}Z)^{\dagger} b^\star = 0 \tag{400}
\]

When $\mathcal{K}=\mathbb{R}^n_+$, in particular, then $C=0$ and $A=Z=I\in\mathbb{S}^n$; id est,

\[
\begin{array}{cl}
\underset{x}{\mbox{minimize}} & f(x)\\
\mbox{subject to} & x \succeq 0
\end{array} \tag{401}
\]

The necessary and sufficient conditions become (confer [46, §4.2.3])

\[
\nabla f(x^\star) \underset{\mathbb{R}^n_+}{\succeq} 0\,,\qquad
x^\star \underset{\mathbb{R}^n_+}{\succeq} 0\,,\qquad
\nabla f(x^\star)^T x^\star = 0 \tag{402}
\]

2.13.10.1.3 Example. Linear complementarity. [202] [235]
Given matrix $A\in\mathbb{R}^{n\times n}$ and vector $q\in\mathbb{R}^n$, the complementarity problem is a feasibility problem:

\[
\begin{array}{cl}
\mbox{find} & w,\,z\\
\mbox{subject to} & w \succeq 0\\
& z \succeq 0\\
& w^T z = 0\\
& w = q + Az
\end{array} \tag{403}
\]


Volumes have been written about this problem, most notably by Cottle [59]. The problem is not convex if both vectors $w$ and $z$ are variable. But if one of them is fixed, then the problem becomes convex with a very simple geometric interpretation: Define the affine subset

\[
\mathcal{A} \triangleq \{y\in\mathbb{R}^n \mid Ay = w - q\} \tag{404}
\]

For $w^T z$ to vanish, there must be a complementary relationship between the nonzero entries of vectors $w$ and $z$; id est, $w_i z_i = 0\ \forall\, i$. Given $w \succeq 0$, then $z$ belongs to the convex set of feasible solutions:

\[
z \in -\mathcal{K}^{\perp}_{\mathbb{R}^n_+}(w\in\mathbb{R}^n_+) \cap \mathcal{A} = \mathbb{R}^n_+ \cap w^{\perp} \cap \mathcal{A} \tag{405}
\]

where $\mathcal{K}^{\perp}_{\mathbb{R}^n_+}(w)$ is the normal cone to $\mathbb{R}^n_+$ at $w$ (394). If this intersection is nonempty, then the problem is solvable.

2.13.11 Proper nonsimplicial $\mathcal{K}$, dual, $X$ fat full-rank

Assume we are given a set of $N$ conically independent generators$^{2.63}$ (§2.10) of an arbitrary polyhedral proper cone $\mathcal{K}$ in $\mathbb{R}^n$ arranged columnar in $X\in\mathbb{R}^{n\times N}$ such that $N>n$ (fat) and $\mathrm{rank}\,X = n$. Having found formula (362) to determine the dual of a simplicial cone, the easiest way to find a vertex-description of the proper dual cone $\mathcal{K}^*$ is to first decompose $\mathcal{K}$ into simplicial parts $\mathcal{K}_i$ so that $\mathcal{K}=\bigcup\mathcal{K}_i$.$^{2.64}$ Each component simplicial cone in $\mathcal{K}$ corresponds to some subset of $n$ linearly independent columns from $X$. The key idea, here, is how the extreme directions of the simplicial parts must remain extreme directions of $\mathcal{K}$. Finding the dual of $\mathcal{K}$ amounts to finding the dual of each simplicial part:

$^{2.63}$ We can always remove conically dependent columns from $X$ to construct $\mathcal{K}$ or to determine $\mathcal{K}^*$. (§F.2)
$^{2.64}$ That proposition presupposes, of course, that we know how to perform simplicial decomposition efficiently; also called "triangulation". [228] [123, §3.1] [124, §3.1] Existence of multiple simplicial parts means expansion of $x\in\mathcal{K}$ like (353) can no longer be unique because $N$, the number of extreme directions in $\mathcal{K}$, exceeds $n$, the dimension of the space.


2.13.11.0.1 Theorem. Dual cone intersection. [249, §2.7]
Suppose proper cone $\mathcal{K}\subset\mathbb{R}^n$ equals the union of $M$ simplicial cones $\mathcal{K}_i$ whose extreme directions all coincide with those of $\mathcal{K}$. Then proper dual cone $\mathcal{K}^*$ is the intersection of $M$ dual simplicial cones $\mathcal{K}^*_i$; id est,

\[
\mathcal{K} = \bigcup_{i=1}^{M}\mathcal{K}_i \;\;\Rightarrow\;\; \mathcal{K}^* = \bigcap_{i=1}^{M}\mathcal{K}^*_i \tag{406}
\]
$\diamond$

Proof. For $X_i\in\mathbb{R}^{n\times n}$, a complete matrix of linearly independent extreme directions (p.124) arranged columnar, corresponding simplicial $\mathcal{K}_i$ (§2.12.3.1.1) has vertex-description

\[
\mathcal{K}_i = \{X_i\, c \mid c \succeq 0\} \tag{407}
\]

Now suppose

\[
\mathcal{K} = \bigcup_{i=1}^{M}\mathcal{K}_i = \bigcup_{i=1}^{M}\{X_i\, c \mid c \succeq 0\} \tag{408}
\]

The union of all $\mathcal{K}_i$ can be equivalently expressed

\[
\mathcal{K} = \left\{\, [\,X_1\ X_2\ \cdots\ X_M\,]\begin{bmatrix}a\\ b\\ \vdots\\ c\end{bmatrix} \;\middle|\; a,b,\ldots,c \succeq 0 \right\} \tag{409}
\]

Because extreme directions of the simplices $\mathcal{K}_i$ are extreme directions of $\mathcal{K}$ by assumption, then by the extremes theorem (§2.8.1.1.1),

\[
\mathcal{K} = \{\, [\,X_1\ X_2\ \cdots\ X_M\,]\, d \mid d \succeq 0 \,\} \tag{410}
\]

Defining $X \triangleq [\,X_1\ X_2\ \cdots\ X_M\,]$ (with any redundant [sic] columns optionally removed from $X$), then $\mathcal{K}^*$ can be expressed, (316) (Cone Table S, p.168)

\[
\mathcal{K}^* = \{y \mid X^T y \succeq 0\} = \bigcap_{i=1}^{M}\{y \mid X_i^T y \succeq 0\} = \bigcap_{i=1}^{M}\mathcal{K}^*_i \tag{411}
\]
$\blacksquare$
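The theorem invites a small numerical check. The sketch below (a hypothetical example, not from the text; it assumes only NumPy) builds a square-based proper cone $\mathcal{K}\subset\mathbb{R}^3$ with four extreme directions, splits it into two simplicial parts, and verifies membership agreement between $\mathcal{K}^*$ and $\mathcal{K}^*_1\cap\mathcal{K}^*_2$ per (406); it also filters the dual simplicial generators for feasibility, anticipating formula (413) on the next page.

```python
import numpy as np

# Square-based proper cone K in R^3: generators (+-1, +-1, 1), columnar.
G = np.array([[1., -1., -1.,  1.],
              [1.,  1., -1., -1.],
              [1.,  1.,  1.,  1.]])
X1, X2 = G[:, [0, 1, 2]], G[:, [0, 2, 3]]   # simplicial parts, K = K1 u K2

# Membership check of (406): K* = {y | G.T y >= 0}  versus  K1* n K2*.
rng = np.random.default_rng(0)
Y = rng.standard_normal((3, 20000))
lhs = (G.T @ Y >= 0).all(axis=0)
rhs = (X1.T @ Y >= 0).all(axis=0) & (X2.T @ Y >= 0).all(axis=0)
assert (lhs == rhs).all()

# Candidate extreme directions of K*: generators of each dual simplicial
# cone K_i* = {X_i^{-T} c | c >= 0} (362); keep only those feasible for
# all generators of K -- the internal-wall normal is discarded.
cand = np.hstack([np.linalg.inv(X1).T, np.linalg.inv(X2).T])
keep = cand[:, (G.T @ cand >= -1e-12).all(axis=0)]
print(np.round(2 * keep, 3))   # the four inward facet normals of K
```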


To find the extreme directions of the dual cone, first we observe that some facets of each simplicial part $\mathcal{K}_i$ are common to facets of $\mathcal{K}$ by assumption, and the union of all those common facets comprises the set of all facets of $\mathcal{K}$ by design. For any particular polyhedral proper cone $\mathcal{K}$, the extreme directions of dual cone $\mathcal{K}^*$ are respectively orthogonal to the facets of $\mathcal{K}$. (§2.13.6.1) Then the extreme directions of the dual cone can be found among the inward-normals to facets of the component simplicial cones $\mathcal{K}_i$; those normals are extreme directions of the dual simplicial cones $\mathcal{K}^*_i$. From the theorem and Cone Table S (p.168),

\[
\mathcal{K}^* = \bigcap_{i=1}^{M}\mathcal{K}^*_i = \bigcap_{i=1}^{M}\{X_i^{\dagger T} c \mid c \succeq 0\} \tag{412}
\]

The set of extreme directions $\{\Gamma^*_i\}$ for proper dual cone $\mathcal{K}^*$ is therefore constituted by the conically independent generators, from the columns of all the dual simplicial matrices $\{X_i^{\dagger T}\}$, that do not violate discrete definition (316) of $\mathcal{K}^*$;

\[
\{\Gamma^*_1,\,\Gamma^*_2\,\ldots\,\Gamma^*_N\}
= \operatorname{c.i.}\!\left\{ X_i^{\dagger T}(:,j)\,,\; i=1\ldots M\,,\; j=1\ldots n \;\middle|\; X_i^{\dagger}(j,:)\,\Gamma_l \ge 0\,,\; l=1\ldots N \right\} \tag{413}
\]

where c.i. denotes selection of only the conically independent vectors from the argument set, argument $(:,j)$ denotes the $j$th column while $(j,:)$ denotes the $j$th row, and $\{\Gamma_l\}$ constitutes the extreme directions of $\mathcal{K}$. Figure 40(b) (p.123) shows a cone and its dual found via this formulation.

2.13.11.0.2 Example. Dual of $\mathcal{K}$ nonsimplicial in subspace aff $\mathcal{K}$.
Given conically independent generators for pointed closed convex cone $\mathcal{K}$ in $\mathbb{R}^4$ arranged columnar in

\[
X = [\,\Gamma_1\ \Gamma_2\ \Gamma_3\ \Gamma_4\,] =
\begin{bmatrix}
1 & 1 & 0 & 0\\
-1 & 0 & 1 & 0\\
0 & -1 & 0 & 1\\
0 & 0 & -1 & -1
\end{bmatrix} \tag{414}
\]


having $\dim\mathrm{aff}\,\mathcal{K} = \mathrm{rank}\,X = 3$, then performing the most inefficient simplicial decomposition in aff $\mathcal{K}$ we find

\[
X_1 = \begin{bmatrix}1&1&0\\-1&0&1\\0&-1&0\\0&0&-1\end{bmatrix},\quad
X_2 = \begin{bmatrix}1&1&0\\-1&0&0\\0&-1&1\\0&0&-1\end{bmatrix},\quad
X_3 = \begin{bmatrix}1&0&0\\-1&1&0\\0&0&1\\0&-1&-1\end{bmatrix},\quad
X_4 = \begin{bmatrix}1&0&0\\0&1&0\\-1&0&1\\0&-1&-1\end{bmatrix} \tag{415}
\]

The corresponding dual simplicial cones in aff $\mathcal{K}$ have generators respectively columnar in

\[
4X_1^{\dagger T} = \begin{bmatrix}2&1&1\\-2&1&1\\2&-3&1\\-2&1&-3\end{bmatrix},\quad
4X_2^{\dagger T} = \begin{bmatrix}1&2&1\\-3&2&1\\1&-2&1\\1&-2&-3\end{bmatrix},\quad
4X_3^{\dagger T} = \begin{bmatrix}3&2&-1\\-1&2&-1\\-1&-2&3\\-1&-2&-1\end{bmatrix},\quad
4X_4^{\dagger T} = \begin{bmatrix}3&-1&2\\-1&3&-2\\-1&-1&2\\-1&-1&-2\end{bmatrix} \tag{416}
\]

Applying (413) we get

\[
[\,\Gamma^*_1\ \Gamma^*_2\ \Gamma^*_3\ \Gamma^*_4\,] = \frac{1}{4}
\begin{bmatrix}
1 & 2 & 3 & 2\\
1 & 2 & -1 & -2\\
1 & -2 & -1 & 2\\
-3 & -2 & -1 & -2
\end{bmatrix} \tag{417}
\]

whose rank is 3, and is the known result;$^{2.65}$ the conically independent generators for that pointed section of the dual cone $\mathcal{K}^*$ in aff $\mathcal{K}$; id est, $\mathcal{K}^*\cap\,$aff $\mathcal{K}$.

$^{2.65}$ These calculations proceed so as to be consistent with [78, §6]; as if the ambient vector space were the proper subspace aff $\mathcal{K}$ whose dimension is 3. In that ambient space, $\mathcal{K}$ may be regarded as a proper cone. Yet that author (from the citation) erroneously states the dimension of the ordinary dual cone to be 3; it is, in fact, 4.
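A quick sanity check of (417) can be scripted (a minimal sketch assuming only NumPy; the data are exactly (414) and (417)):

```python
import numpy as np

X = np.array([[ 1,  1,  0,  0],     # generators (414), columnar
              [-1,  0,  1,  0],
              [ 0, -1,  0,  1],
              [ 0,  0, -1, -1]], dtype=float)
Gd = np.array([[ 1,  2,  3,  2],    # dual generators (417), columnar
               [ 1,  2, -1, -2],
               [ 1, -2, -1,  2],
               [-3, -2, -1, -2]], dtype=float) / 4

print((X.T @ Gd >= -1e-12).all())           # True: each column obeys (316)
print(np.linalg.matrix_rank(Gd))            # 3 = dim aff K
print(np.abs(Gd.sum(axis=0)).max() < 1e-12) # True: columns lie in aff K = {x | 1.T x = 0}
```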


Chapter 3

Geometry of convex functions

The link between convex sets and convex functions is via the epigraph: A function is convex if and only if its epigraph is a convex set.
−Stephen Boyd & Lieven Vandenberghe [46, §3.1.7]

We limit our treatment of multidimensional functions$^{3.1}$ to finite-dimensional Euclidean space. Then an icon for a one-dimensional (real) convex function is bowl-shaped (Figure 61), whereas the concave icon is the inverted bowl; respectively characterized by a unique global minimum and maximum whose existence is assumed. Because of this simple relationship, usage of the term convexity is often implicitly inclusive of concavity in the literature. Despite the iconic imagery, the reader is reminded that the set of all convex, concave, quasiconvex, and quasiconcave functions contains the monotonic functions [151] [158, §2.3.5]; e.g., [46, §3.6, exer.3.46].

$^{3.1}$ vector- or matrix-valued functions including the real functions. Appendix D, with its tables of first- and second-order gradients, is the practical adjunct to this chapter.


3.1 Convex function

3.1.1 real and vector-valued function

Vector-valued function

\[
f(X) : \mathbb{R}^{p\times k}\to\mathbb{R}^M = \begin{bmatrix} f_1(X)\\ \vdots\\ f_M(X)\end{bmatrix} \tag{418}
\]

assigns each $X$ in its domain $\operatorname{dom}f$ (a subset of ambient vector space $\mathbb{R}^{p\times k}$) to a specific element [190, p.3] of its range (a subset of $\mathbb{R}^M$). Function $f(X)$ is linear in $X$ on its domain if and only if, for each and every $Y,Z\in\operatorname{dom}f$ and $\alpha,\beta\in\mathbb{R}$

\[
f(\alpha Y + \beta Z) = \alpha f(Y) + \beta f(Z) \tag{419}
\]

A vector-valued function $f(X):\mathbb{R}^{p\times k}\to\mathbb{R}^M$ is convex in $X$ if and only if $\operatorname{dom}f$ is a convex set and, for each and every $Y,Z\in\operatorname{dom}f$ and $0\le\mu\le 1$

\[
f(\mu Y + (1-\mu)Z) \underset{\mathbb{R}^M_+}{\preceq} \mu f(Y) + (1-\mu)f(Z) \tag{420}
\]

As defined, continuity is implied but not differentiability. Reversing sense of the inequality flips this definition to concavity. Linear functions are, apparently, simultaneously convex and concave.

Vector-valued functions are most often compared (152) as in (420) with respect to the $M$-dimensional self-dual nonnegative orthant $\mathbb{R}^M_+$, a proper cone.$^{3.2}$ In this case, the test prescribed by (420) is simply a comparison on $\mathbb{R}$ of each entry $f_i$ of a vector-valued function $f$. (§2.13.4.2.3) The vector-valued function case is therefore a straightforward generalization of conventional convexity theory for a real function. [46, §3, §4] This conclusion follows from theory of dual generalized inequalities (§2.13.2.0.1) which asserts

$^{3.2}$ The definition of convexity can be broadened to other (not necessarily proper) cones; referred to in the literature as K-convexity. [220]


Figure 56: Each convex real function has a unique minimizer $x^\star$ but, for $x\in\mathbb{R}$, $f_1(x)=x^2$ is strictly convex whereas $f_2(x)=\sqrt{x^2}=|x|$ is not. Strict convexity of a real function is therefore only a sufficient condition for minimizer uniqueness. [Graphic omitted: panels (a) $f_1(x)$ and (b) $f_2(x)$.]

\[
f \;\text{convex} \;\Leftrightarrow\; w^T f \;\text{convex}\;\;\forall\, w \in \mathcal{G}(\mathbb{R}^M_+) \tag{421}
\]

shown by substitution of the defining inequality (420). Discretization (§2.13.4.2.1) allows relaxation of the semi-infinite number of conditions $\forall\, w \succeq 0$ to:

\[
\forall\, w \in \mathcal{G}(\mathbb{R}^M_+) = \{e_i\,,\; i=1\ldots M\} \tag{422}
\]

(the standard basis for $\mathbb{R}^M$ and a minimal set of generators (§2.8.1.2) for $\mathbb{R}^M_+$) from which the stated conclusion follows; id est, the test for convexity of a vector-valued function is a comparison on $\mathbb{R}$ of each entry.

3.1.1.0.1 Exercise. Cone of convex functions.
Prove that relation (421) implies: the set of all vector-valued convex functions in $\mathbb{R}^M$ is a convex cone. Indeed, any nonnegatively weighted sum of (strictly) convex functions remains (strictly) convex.$^{3.3}$ Interior to the cone are the strictly convex functions.

$^{3.3}$ The strict case excludes the cone's point at the origin.


3.1.2 strict convexity

When $f(X)$ instead satisfies, for each and every distinct $Y$ and $Z$ in $\operatorname{dom}f$ and all $0<\mu<1$,

\[
f(\mu Y + (1-\mu)Z) \underset{\mathbb{R}^M_+}{\prec} \mu f(Y) + (1-\mu)f(Z) \tag{423}
\]

then we say $f$ is a strictly convex function.


3.1.3 norm functions, absolute value

A vector norm on $\mathbb{R}^n$ is a function $f:\mathbb{R}^n\to\mathbb{R}$ satisfying: for $x,y\in\mathbb{R}^n$, $\alpha\in\mathbb{R}$ [110, §2.2.1]

1. $f(x)\ge 0$ ($f(x)=0 \Leftrightarrow x=0$) (nonnegativity)
2. $f(x+y)\le f(x)+f(y)$ (triangle inequality)
3. $f(\alpha x)=|\alpha|f(x)$ (nonnegative homogeneity)

Most useful of the convex norms are the 1-, 2-, and infinity-norm:

\[
\|x\|_1 = \begin{array}[t]{cl}\underset{t\in\mathbb{R}^n}{\mbox{minimize}} & \mathbf{1}^T t\\ \mbox{subject to} & -t \preceq x \preceq t\end{array} \tag{427}
\]

where $|x| = t^\star$ (entrywise absolute value equals optimal $t$).$^{3.5}$

\[
\|x\|_2 = \begin{array}[t]{cl}\underset{t\in\mathbb{R}}{\mbox{minimize}} & t\\ \mbox{subject to} & \begin{bmatrix} tI & x\\ x^T & t\end{bmatrix} \succeq 0\end{array} \tag{428}
\]

where $\|x\|_2 = \|x\| \triangleq \sqrt{x^T x} = t^\star$.

\[
\|x\|_\infty = \begin{array}[t]{cl}\underset{t\in\mathbb{R}}{\mbox{minimize}} & t\\ \mbox{subject to} & -t\mathbf{1} \preceq x \preceq t\mathbf{1}\end{array} \tag{429}
\]

where $\max\{|x_i|\,,\; i=1\ldots n\} = t^\star$.

\[
\|x\|_1 = \begin{array}[t]{cl}\underset{\alpha\in\mathbb{R}^n,\,\beta\in\mathbb{R}^n}{\mbox{minimize}} & \mathbf{1}^T(\alpha+\beta)\\ \mbox{subject to} & \alpha,\beta \succeq 0\\ & x = \alpha - \beta\end{array} \tag{430}
\]

where $|x| = \alpha^\star + \beta^\star$ because of complementarity $\alpha^{\star T}\beta^\star = 0$ at optimality.

$^{3.5}$ Vector $\mathbf{1}$ may be replaced with any positive [sic] vector to get absolute value, theoretically, although $\mathbf{1}$ provides the 1-norm and is better from a numerical perspective.
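These reformulations are easy to exercise numerically. A minimal sketch, assuming the CVXPY modeling package (an external tool, not part of this text), confirms that linear programs (427) and (429) recover the 1- and infinity-norm of a fixed vector:

```python
import numpy as np
import cvxpy as cp   # assumed solver interface, for illustration only

x = np.array([1.5, -2.0, 0.5])
n = x.size

t = cp.Variable(n)                      # program (427): the 1-norm as an LP
p1 = cp.Problem(cp.Minimize(cp.sum(t)), [-t <= x, x <= t])
p1.solve()
assert abs(p1.value - np.abs(x).sum()) < 1e-6
assert np.allclose(t.value, np.abs(x), atol=1e-6)   # |x| = t*

s = cp.Variable()                       # program (429): the infinity-norm
pinf = cp.Problem(cp.Minimize(s),
                  [-s * np.ones(n) <= x, x <= s * np.ones(n)])
pinf.solve()
assert abs(pinf.value - np.abs(x).max()) < 1e-6
```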


Optimal solution is norm dependent. [46, p.297] Given set $\mathcal{C}$

\[
\begin{array}{cl}\underset{x\in\mathbb{R}^n}{\mbox{minimize}} & \|x\|_1\\ \mbox{subject to} & x\in\mathcal{C}\end{array}
\quad\equiv\quad
\begin{array}{cl}\underset{x\in\mathbb{R}^n,\,t\in\mathbb{R}^n}{\mbox{minimize}} & \mathbf{1}^T t\\ \mbox{subject to} & -t \preceq x \preceq t\\ & x\in\mathcal{C}\end{array} \tag{431}
\]

\[
\begin{array}{cl}\underset{x\in\mathbb{R}^n}{\mbox{minimize}} & \|x\|_2\\ \mbox{subject to} & x\in\mathcal{C}\end{array}
\quad\equiv\quad
\begin{array}{cl}\underset{x\in\mathbb{R}^n,\,t\in\mathbb{R}}{\mbox{minimize}} & t\\ \mbox{subject to} & \begin{bmatrix} tI & x\\ x^T & t\end{bmatrix}\succeq 0\\ & x\in\mathcal{C}\end{array} \tag{432}
\]

\[
\begin{array}{cl}\underset{x\in\mathbb{R}^n}{\mbox{minimize}} & \|x\|_\infty\\ \mbox{subject to} & x\in\mathcal{C}\end{array}
\quad\equiv\quad
\begin{array}{cl}\underset{x\in\mathbb{R}^n,\,t\in\mathbb{R}}{\mbox{minimize}} & t\\ \mbox{subject to} & -t\mathbf{1} \preceq x \preceq t\mathbf{1}\\ & x\in\mathcal{C}\end{array} \tag{433}
\]

In $\mathbb{R}^n$ these norms represent: $\|x\|_1$ is length measured along a grid, $\|x\|_2$ is Euclidean length, $\|x\|_\infty$ is maximum $|$coordinate$|$.

\[
\begin{array}{cl}\underset{x\in\mathbb{R}^n}{\mbox{minimize}} & \|x\|_1\\ \mbox{subject to} & x\in\mathcal{C}\end{array}
\quad\equiv\quad
\begin{array}{cl}\underset{\alpha,\,\beta,\,x\,\in\mathbb{R}^n}{\mbox{minimize}} & \mathbf{1}^T(\alpha+\beta)\\ \mbox{subject to} & \alpha,\beta \succeq 0\\ & x = \alpha-\beta\\ & x\in\mathcal{C}\end{array} \tag{434}
\]

All these problems are convex when set $\mathcal{C}$ is.

3.1.3.1 k smallest/largest entries

Sum of the $k$ smallest entries of $x\in\mathbb{R}^n$ is the optimal objective value from: for $1\le k\le n$


\[
\sum_{i=n-k+1}^{n}\pi(x)_i = \begin{array}[t]{cl}\underset{y\in\mathbb{R}^n}{\mbox{minimize}} & x^T y\\ \mbox{subject to} & 0 \preceq y \preceq \mathbf{1}\\ & \mathbf{1}^T y = k\end{array}
\quad\text{or}\quad
\sum_{i=n-k+1}^{n}\pi(x)_i = \begin{array}[t]{cl}\underset{z\in\mathbb{R}^n,\,t\in\mathbb{R}}{\mbox{maximize}} & k\,t + \mathbf{1}^T z\\ \mbox{subject to} & x \succeq t\mathbf{1} + z\\ & z \preceq 0\end{array} \tag{435}
\]

which are dual programs, where $\pi$ is the nonlinear permutation operator sorting its vector argument into nonincreasing order, so that $\pi(x)_1 = \max\{x_i\,,\; i=1\ldots n\}$. Finding the $k$ smallest entries of an $n$-length vector $x$ is expressible as an infimum of $n!/(k!(n-k)!)$ linear functions of $x$. The sum $\sum\pi(x)_i$ is therefore a concave function of $x$; in fact, monotonic (§3.1.8.1.1) in this instance.

Sum of the $k$ largest entries of $x\in\mathbb{R}^n$ is the optimal objective value from: [46, exer.5.19]

\[
\sum_{i=1}^{k}\pi(x)_i = \begin{array}[t]{cl}\underset{y\in\mathbb{R}^n}{\mbox{maximize}} & x^T y\\ \mbox{subject to} & 0 \preceq y \preceq \mathbf{1}\\ & \mathbf{1}^T y = k\end{array}
\quad\text{or}\quad
\sum_{i=1}^{k}\pi(x)_i = \begin{array}[t]{cl}\underset{z\in\mathbb{R}^n,\,t\in\mathbb{R}}{\mbox{minimize}} & k\,t + \mathbf{1}^T z\\ \mbox{subject to} & x \preceq t\mathbf{1} + z\\ & z \succeq 0\end{array} \tag{436}
\]

which are dual programs. Finding the $k$ largest entries of an $n$-length vector $x$ is expressible as a supremum of $n!/(k!(n-k)!)$ linear functions of $x$. The summation is therefore a convex function (and monotonic in this instance, §3.1.8.1.1).

Let $\Pi x$ be a permutation of entries $x_i$ such that their absolute value becomes arranged in nonincreasing order: $|\Pi x|_1 \ge |\Pi x|_2 \ge \cdots \ge |\Pi x|_n$. By properties of vector norm, [167, p.59] [110, p.52] the sum of the $k$ largest entries of $|x|\in\mathbb{R}^n$ is a norm:

\[
\|x\|_{n \choose k} \triangleq \sum_{i=1}^{k}|\Pi x|_i = \begin{array}[t]{cl}\underset{z\in\mathbb{R}^n,\,t\in\mathbb{R}}{\mbox{minimize}} & k\,t + \mathbf{1}^T z\\ \mbox{subject to} & -t\mathbf{1}-z \preceq x \preceq t\mathbf{1}+z\\ & z \succeq 0\end{array} \tag{437}
\]

where the norm subscript derives from a binomial coefficient ${n \choose k}$, and

\[
\|x\|_{n \choose n} = \|x\|_1\,,\qquad \|x\|_{n \choose 1} = \|x\|_\infty \tag{438}
\]


Finding the $k$ largest absolute entries of an $n$-length vector $x$ is expressible as a supremum of $2^k n!/(k!(n-k)!)$ linear functions of $x$; hence, this norm is convex (nonmonotonic) in $x$. [46, exer.6.3(e)]

\[
\begin{array}{cl}\underset{x\in\mathbb{R}^n}{\mbox{minimize}} & \|x\|_{n \choose k}\\ \mbox{subject to} & x\in\mathcal{C}\end{array}
\quad\equiv\quad
\begin{array}{cl}\underset{z,\,t,\,x}{\mbox{minimize}} & k\,t + \mathbf{1}^T z\\ \mbox{subject to} & -t\mathbf{1}-z \preceq x \preceq t\mathbf{1}+z\\ & z \succeq 0\\ & x\in\mathcal{C}\end{array} \tag{439}
\]

3.1.3.2 clipping

Clipping negative vector entries is accomplished:

\[
\|x_+\|_1 = \begin{array}[t]{cl}\underset{t\in\mathbb{R}^n}{\mbox{minimize}} & \mathbf{1}^T t\\ \mbox{subject to} & x \preceq t\\ & 0 \preceq t\end{array} \tag{440}
\]

where, for $x=[x_i]\in\mathbb{R}^n$

\[
x_+ = t^\star = \left[\begin{cases} x_i\,, & x_i \ge 0\\ 0\,, & x_i < 0\end{cases}\;,\;\; i=1\ldots n\right] \tag{441}
\]

(clipping)

\[
\begin{array}{cl}\underset{x\in\mathbb{R}^n}{\mbox{minimize}} & \|x_+\|_1\\ \mbox{subject to} & x\in\mathcal{C}\end{array}
\quad\equiv\quad
\begin{array}{cl}\underset{x\in\mathbb{R}^n,\,t\in\mathbb{R}^n}{\mbox{minimize}} & \mathbf{1}^T t\\ \mbox{subject to} & x \preceq t\\ & 0 \preceq t\\ & x\in\mathcal{C}\end{array} \tag{442}
\]

3.1.4 inverted

We wish to implement objectives of the form $x^{-1}$. Suppose we have a $2\times 2$ matrix

\[
T \triangleq \begin{bmatrix} x & z\\ z & y\end{bmatrix} \in \mathbb{S}^2 \tag{443}
\]

which is positive semidefinite by (1340) when

\[
T \succeq 0 \;\Leftrightarrow\; x > 0 \;\text{and}\; xy \ge z^2 \tag{444}
\]


This means we may formulate convex problems, having inverted variables, as semidefinite programs; e.g.,

\[
\begin{array}{cl}\underset{x\in\mathbb{R}}{\mbox{minimize}} & x^{-1}\\ \mbox{subject to} & x > 0\\ & x\in\mathcal{C}\end{array}
\quad\equiv\quad
\begin{array}{cl}\underset{x,\,y\,\in\mathbb{R}}{\mbox{minimize}} & y\\ \mbox{subject to} & \begin{bmatrix} x & 1\\ 1 & y\end{bmatrix} \succeq 0\\ & x\in\mathcal{C}\end{array} \tag{445}
\]

or

\[
x > 0\,,\; y \ge \frac{1}{x} \;\Leftrightarrow\; \begin{bmatrix} x & 1\\ 1 & y\end{bmatrix} \succeq 0 \tag{446}
\]

(inverted) For vector $x = [x_i\,,\; i=1\ldots n]\in\mathbb{R}^n$

\[
\begin{array}{cl}\underset{x\in\mathbb{R}^n}{\mbox{minimize}} & \displaystyle\sum_{i=1}^{n} x_i^{-1}\\ \mbox{subject to} & x \succ 0\\ & x\in\mathcal{C}\end{array}
\quad\equiv\quad
\begin{array}{cl}\underset{x\in\mathbb{R}^n,\,y\in\mathbb{R}}{\mbox{minimize}} & y\\ \mbox{subject to} & \begin{bmatrix} x_i & \sqrt{n}\\ \sqrt{n} & y\end{bmatrix} \succeq 0\,,\;\; i=1\ldots n\\ & x\in\mathcal{C}\end{array} \tag{447}
\]

\[
x \succ 0\,,\; y \ge \operatorname{tr}\!\big(\delta(x)^{-1}\big) \;\Leftrightarrow\; \begin{bmatrix} x_i & \sqrt{n}\\ \sqrt{n} & y\end{bmatrix} \succeq 0\,,\;\; i=1\ldots n \tag{448}
\]

3.1.5 fractional power

[100] To implement an objective of the form $x^\alpha$ for positive $\alpha$, we quantize $\alpha$ and work instead with that approximation. Choose nonnegative integer $q$ for adequate quantization of $\alpha$ like so:

\[
\alpha \triangleq \frac{k}{2^q} \tag{449}
\]

where $k\in\{0,1,2,\ldots,2^q{-}1\}$. Any $k$ from that set may be written $k=\sum_{i=1}^{q} b_i\, 2^{i-1}$ where $b_i\in\{0,1\}$. Define vector $y=[y_i\,,\; i=0\ldots q]$ with $y_0=1$.


3.1.5.1 positive

Then we have the equivalent semidefinite program for maximizing a concave function $x^\alpha$, for quantized $0\le\alpha<1$:

\[
\begin{array}{cl}\underset{x\in\mathbb{R}}{\mbox{maximize}} & x^{\alpha}\\ \mbox{subject to} & x > 0\\ & x\in\mathcal{C}\end{array}
\quad\equiv\quad
\begin{array}{cl}\underset{x\in\mathbb{R},\,y\in\mathbb{R}^{q+1}}{\mbox{maximize}} & y_q\\ \mbox{subject to} & \begin{bmatrix} y_{i-1} & y_i\\ y_i & x^{b_i}\end{bmatrix} \succeq 0\,,\;\; i=1\ldots q\\ & x\in\mathcal{C}\end{array} \tag{450}
\]

where nonnegativity of $y_q$ is enforced by maximization; id est,

\[
x > 0\,,\; y_q \le x^{\alpha} \;\Leftrightarrow\; \begin{bmatrix} y_{i-1} & y_i\\ y_i & x^{b_i}\end{bmatrix} \succeq 0\,,\;\; i=1\ldots q \tag{451}
\]

3.1.5.2 negative

It is also desirable to implement an objective of the form $x^{-\alpha}$ for positive $\alpha$. The technique is nearly the same as before: for quantized $0\le\alpha<1$,

\[
\begin{array}{cl}\underset{x\in\mathbb{R}}{\mbox{minimize}} & x^{-\alpha}\\ \mbox{subject to} & x > 0\\ & x\in\mathcal{C}\end{array}
\quad\equiv\quad
\begin{array}{cl}\underset{x,\,z\in\mathbb{R},\,y\in\mathbb{R}^{q+1}}{\mbox{minimize}} & z\\ \mbox{subject to} & \begin{bmatrix} y_{i-1} & y_i\\ y_i & x^{b_i}\end{bmatrix} \succeq 0\,,\;\; i=1\ldots q\\ & \begin{bmatrix} z & 1\\ 1 & y_q\end{bmatrix} \succeq 0\\ & x\in\mathcal{C}\end{array} \tag{452}
\]

\[
x > 0\,,\; z \ge x^{-\alpha} \;\Leftrightarrow\;
\begin{bmatrix} y_{i-1} & y_i\\ y_i & x^{b_i}\end{bmatrix} \succeq 0\,,\;\; i=1\ldots q\,;\qquad
\begin{bmatrix} z & 1\\ 1 & y_q\end{bmatrix} \succeq 0 \tag{453}
\]


3.1.5.3 positive inverted

Now define vector $t=[t_i\,,\; i=0\ldots q]$ with $t_0=1$. To implement an objective $x^{1/\alpha}$ for quantized $0\le\alpha<1$


Figure 57: Cartesian axes in $\mathbb{R}^3$ and three hyperplanes intersecting convex set $\mathcal{C}\subset\mathbb{R}^2$ reproduced from Figure 21. Plotted with third dimension is affine set $\mathcal{A}=f(\mathbb{R}^2)$, a plane. Sequence of hyperplanes, w.r.t. domain $\mathbb{R}^2$ of an affine function $f(z)=a^Tz+b:\mathbb{R}^2\to\mathbb{R}$, is increasing in direction of gradient $a$ (§3.1.8.0.3) because affine function increases in normal direction (Figure 19). [Graphic omitted: level lines $\{z\in\mathbb{R}^2\mid a^Tz=\kappa_1\}$, $\{a^Tz=\kappa_2\}$, $\{a^Tz=\kappa_3\}$ with normal $a$ and halfspaces $\mathcal{H}_-$, $\mathcal{H}_+$.]

3.1.6 affine function

For $X\in\mathbb{S}^M$ and matrices $A,B,Q,R$ of any compatible dimensions, for example, the expression $XAX$ is not affine in $X$ whereas

\[
g(X) = \begin{bmatrix} R & B^T X\\ XB & Q + A^T X + XA\end{bmatrix} \tag{457}
\]

is an affine multidimensional function. Such a function is typical in engineering control. [298, §2.2]$^{3.6}$ [44] [102]

$^{3.6}$ The interpretation from this citation of $\{X\in\mathbb{S}^M \mid g(X)\succeq 0\}$ as "an intersection between a linear subspace and the cone of positive semidefinite matrices" is incorrect. (See §2.9.1.0.2 for a similar example.) The conditions they state under which strong duality holds for semidefinite programming are conservative. (confer §4.2.3.0.1)


3.1.6.0.1 Example. Linear objective.
Consider minimization of a real affine function $f(z)=a^Tz+b$ over the convex feasible set $\mathcal{C}$ in its domain $\mathbb{R}^2$ illustrated in Figure 57. Since vector $b$ is fixed, the problem posed is the same as the convex optimization

\[
\begin{array}{cl}\underset{z}{\mbox{minimize}} & a^T z\\ \mbox{subject to} & z\in\mathcal{C}\end{array} \tag{458}
\]

whose objective of minimization is a real linear function. Were convex set $\mathcal{C}$ polyhedral (§2.12), then this problem would be called a linear program. Were convex set $\mathcal{C}$ an intersection with a positive semidefinite cone, then this problem would be called a semidefinite program.

There are two distinct ways to visualize this problem: one in the objective function's domain $\mathbb{R}^2$, the other including the ambient space $\mathbb{R}^2\times\mathbb{R}$ of the objective function's range. Both visualizations are illustrated in Figure 57. Visualization in the function domain is easier because of lower dimension and because level sets of any affine function are affine (§2.1.9). In this circumstance, the level sets are parallel hyperplanes with respect to $\mathbb{R}^2$. One solves optimization problem (458) graphically by finding that hyperplane intersecting feasible set $\mathcal{C}$ furthest right (in the direction of negative gradient $-a$ (§3.1.8)).

When a differentiable convex objective function $f$ is nonlinear, the negative gradient $-\nabla f$ is a viable search direction (replacing $-a$ in (458)). (§2.13.10.1, Figure 55) [104] Then the nonlinear objective function can be replaced with a dynamic linear objective; linear as in (458).

3.1.6.0.2 Example. Support function. [46, §3.2]
For arbitrary set $\mathcal{Y}\subseteq\mathbb{R}^n$, its support function $\sigma_{\mathcal{Y}}(a):\mathbb{R}^n\to\mathbb{R}$ is defined

\[
\sigma_{\mathcal{Y}}(a) \triangleq \sup_{z\in\mathcal{Y}} a^T z \tag{459}
\]

whose range contains $\pm\infty$. [183, p.135] [148, §C.2.3.1] For each $z\in\mathcal{Y}$, $a^Tz$ is a linear function of vector $a$. Because $\sigma_{\mathcal{Y}}(a)$ is the pointwise supremum of linear functions, it is convex in $a$. (Figure 58) Application of the support function is illustrated in Figure 22(a) for one particular normal $a$.
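For a polytope, the supremum in (459) is attained at a vertex, so the support function is just a maximum over finitely many inner products. A minimal NumPy sketch (hypothetical data, not from the text) spot-checks its convexity:

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.standard_normal((5, 2))          # rows are points z_i; Y = conv{z_i}
sigma = lambda a: (Z @ a).max()          # support function of the polytope

# convexity spot-check: sigma(mu a1 + (1-mu) a2) <= mu sigma(a1) + (1-mu) sigma(a2)
for _ in range(1000):
    a1, a2 = rng.standard_normal(2), rng.standard_normal(2)
    mu = rng.uniform()
    assert sigma(mu*a1 + (1-mu)*a2) <= mu*sigma(a1) + (1-mu)*sigma(a2) + 1e-12
```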


Figure 58: Pointwise supremum of convex functions remains a convex function. Illustrated is a supremum $\sup_i a_p^T z_i + b_i$ of affine functions $\{a^Tz_i+b_i \mid a\in\mathbb{R}\}$, $i=1\ldots 5$, in variable $a$ evaluated for a particular argument $a_p$. Topmost affine function is supremum for each value of $a$. [Graphic omitted.]

Figure 59: Quasiconvex function $q$ epigraph is not necessarily convex, but convex function $f$ epigraph is convex in any dimension. Sublevel sets are necessarily convex for either. [Graphic omitted: $q(x)$ quasiconvex and $f(x)$ convex versus $x$.]


3.1.7 epigraph, sublevel set

It is well established that a continuous real function is convex if and only if its epigraph makes a convex set. [148] [232] [270] [282] [183] Thereby, piecewise-continuous convex functions are admitted. Epigraph is the connection between convex sets and convex functions. Its generalization to a vector-valued function $f(X):\mathbb{R}^{p\times k}\to\mathbb{R}^M$ is straightforward: [220]

\[
\operatorname{epi}f \triangleq \{(X,\,t)\in\mathbb{R}^{p\times k}\times\mathbb{R}^M \mid X\in\operatorname{dom}f\,,\; f(X) \underset{\mathbb{R}^M_+}{\preceq} t\} \tag{460}
\]

id est,

\[
f \;\text{convex} \;\Leftrightarrow\; \operatorname{epi}f \;\text{convex} \tag{461}
\]

Necessity is proven: [46, exer.3.60] Given any $(X,u),(Y,v)\in\operatorname{epi}f$, we must show for all $\mu\in[0,1]$ that $\mu(X,u)+(1{-}\mu)(Y,v)\in\operatorname{epi}f$; id est, we must show

\[
f(\mu X + (1-\mu)Y) \underset{\mathbb{R}^M_+}{\preceq} \mu u + (1-\mu)v \tag{462}
\]

Yet this holds by definition because $f(\mu X + (1-\mu)Y) \preceq \mu f(X) + (1-\mu)f(Y)$. The converse also holds. $\blacksquare$

3.1.7.0.1 Exercise. Epigraph sufficiency.
Prove that converse: Given any $(X,u),(Y,v)\in\operatorname{epi}f$, if for all $\mu\in[0,1]$ $\mu(X,u)+(1{-}\mu)(Y,v)\in\operatorname{epi}f$ holds, then $f$ must be convex.

Sublevel sets of a real convex function are convex. Likewise, corresponding to each and every $\nu\in\mathbb{R}^M$

\[
\mathcal{L}_\nu f \triangleq \{X\in\operatorname{dom}f \mid f(X) \underset{\mathbb{R}^M_+}{\preceq} \nu\} \subseteq \mathbb{R}^{p\times k} \tag{463}
\]

sublevel sets of a vector-valued convex function are convex. As for real functions, the converse does not hold. (Figure 59)


To prove necessity of convex sublevel sets: For any $X,Y\in\mathcal{L}_\nu f$ we must show for each and every $\mu\in[0,1]$ that $\mu X + (1{-}\mu)Y\in\mathcal{L}_\nu f$. By definition,

\[
f(\mu X + (1-\mu)Y) \underset{\mathbb{R}^M_+}{\preceq} \mu f(X) + (1-\mu)f(Y) \underset{\mathbb{R}^M_+}{\preceq} \nu \tag{464}
\]
$\blacksquare$

When an epigraph (460) is artificially bounded above, $t\preceq\nu$, then the corresponding sublevel set can be regarded as an orthogonal projection of the epigraph on the function domain.

Sense of the inequality is reversed in (460), for concave functions, and we use instead the nomenclature hypograph. Sense of the inequality in (463) is reversed, similarly, with each convex set then called superlevel set.

3.1.7.0.2 Example. Matrix pseudofractional function.
Consider a real function of two variables on $\operatorname{dom}f = \mathbb{S}^n_+\times\mathcal{R}(A)$

\[
f(A,x) : \mathbb{S}^n\times\mathbb{R}^n\to\mathbb{R} = x^T A^{\dagger} x \tag{465}
\]

This function is convex simultaneously in both variables when variable matrix $A$ belongs to the entire positive semidefinite cone $\mathbb{S}^n_+$ and variable vector $x$ is confined to range $\mathcal{R}(A)$ of matrix $A$.

To explain this, we need only demonstrate that the function epigraph is convex. Consider Schur-form (1337) from §A.4: for $t\in\mathbb{R}$

\[
G(A,z,t) = \begin{bmatrix} A & z\\ z^T & t\end{bmatrix} \succeq 0
\;\Leftrightarrow\;
z^T(I - AA^{\dagger}) = 0\,,\quad t - z^T A^{\dagger} z \ge 0\,,\quad A \succeq 0 \tag{466}
\]

Inverse image of the positive semidefinite cone $\mathbb{S}^{n+1}_+$ under affine mapping $G(A,z,t)$ is convex by Theorem 2.1.9.0.1. Of the equivalent conditions for positive semidefiniteness of $G$, the first is an equality demanding vector $z$ belong to $\mathcal{R}(A)$. Function $f(A,z)=z^TA^{\dagger}z$ is convex on $\mathbb{S}^n_+\times\mathcal{R}(A)$ because the Cartesian product constituting its epigraph

\[
\operatorname{epi}f(A,z) = \{(A,z,t) \mid A\succeq 0\,,\; z\in\mathcal{R}(A)\,,\; z^TA^{\dagger}z \le t\} = G^{-1}\big(\mathbb{S}^{n+1}_+\big) \tag{467}
\]

is convex.


3.1.7.0.3 Exercise. Matrix product function.
Continue Example 3.1.7.0.2 by introducing vector variable $x$ and making the substitution $z\leftarrow Ax$. Because of matrix symmetry (§E), for all $x\in\mathbb{R}^n$

\[
f(A,z(x)) = z^T A^{\dagger} z = x^T A^T A^{\dagger} A x = x^T A x = f(A,x) \tag{468}
\]

whose epigraph is

\[
\operatorname{epi}f(A,x) = \{(A,x,t) \mid A\succeq 0\,,\; x^TAx \le t\} \tag{469}
\]

Provide two simple explanations why $f(A,x)=x^TAx$ is not a function convex simultaneously in positive semidefinite matrix $A$ and vector $x$ on $\operatorname{dom}f = \mathbb{S}^n_+\times\mathbb{R}^n$.

3.1.7.0.4 Example. Matrix fractional function. (confer §3.2.2.0.1)
Continuing Example 3.1.7.0.2, now consider a real function of two variables on $\operatorname{dom}f = \mathbb{S}^n_+\times\mathbb{R}^n$ for small positive constant $\epsilon$ (confer (1665))

\[
f(A,x) = \epsilon\, x^T (A + \epsilon I)^{-1} x \tag{470}
\]

where the inverse always exists by (1281). This function is convex simultaneously in both variables over the entire positive semidefinite cone $\mathbb{S}^n_+$ and all $x\in\mathbb{R}^n$: Consider Schur-form (1340) from §A.4: for $t\in\mathbb{R}$

\[
G(A,z,t) = \begin{bmatrix} A+\epsilon I & z\\ z^T & \epsilon^{-1}t\end{bmatrix} \succeq 0
\;\Leftrightarrow\;
t - \epsilon z^T(A+\epsilon I)^{-1}z \ge 0\,,\quad A + \epsilon I \succ 0 \tag{471}
\]

Inverse image of the positive semidefinite cone $\mathbb{S}^{n+1}_+$ under affine mapping $G(A,z,t)$ is convex by Theorem 2.1.9.0.1. Function $f(A,z)$ is convex on $\mathbb{S}^n_+\times\mathbb{R}^n$ because its epigraph is that inverse image:

\[
\operatorname{epi}f(A,z) = \{(A,z,t) \mid A+\epsilon I \succ 0\,,\; \epsilon z^T(A+\epsilon I)^{-1}z \le t\} = G^{-1}\big(\mathbb{S}^{n+1}_+\big) \tag{472}
\]
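Joint convexity of (470) is easy to spot-check by random midpoint trials (a NumPy sketch under assumed random data; trials illustrate but do not prove the claim):

```python
import numpy as np

eps, n = 0.01, 4
rng = np.random.default_rng(2)
f = lambda A, x: eps * x @ np.linalg.solve(A + eps * np.eye(n), x)  # (470)

for _ in range(1000):
    B1, B2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
    A1, A2 = B1 @ B1.T, B2 @ B2.T                  # random PSD matrices
    x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
    mid = f((A1 + A2) / 2, (x1 + x2) / 2)          # joint midpoint
    assert mid <= (f(A1, x1) + f(A2, x2)) / 2 + 1e-9
```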


3.1.7.1 matrix fractional projector function

Consider nonlinear function $f$ having orthogonal projector $W$ as argument:

\[
f(W,x) = \epsilon\, x^T (W + \epsilon I)^{-1} x \tag{473}
\]

Projection matrix $W$ has property $W^{\dagger} = W^T = W \succeq 0$ (1709). Any orthogonal projector can be decomposed into an outer product of orthonormal matrices $W = UU^T$ where $U^TU = I$ as explained in §E.3.2. From (1665) for any $\epsilon > 0$ and idempotent symmetric $W$, $\epsilon(W+\epsilon I)^{-1} = I - (1+\epsilon)^{-1}W$ from which

\[
f(W,x) = \epsilon\, x^T(W+\epsilon I)^{-1}x = x^T\big(I - (1+\epsilon)^{-1}W\big)x \tag{474}
\]

Therefore

\[
\lim_{\epsilon\to 0^+} f(W,x) = \lim_{\epsilon\to 0^+} \epsilon\, x^T(W+\epsilon I)^{-1}x = x^T(I-W)x \tag{475}
\]

where $I-W$ is also an orthogonal projector (§E.2).

We learned from Example 3.1.7.0.4 that $f(W,x)=\epsilon\,x^T(W+\epsilon I)^{-1}x$ is convex simultaneously in both variables over all $x\in\mathbb{R}^n$ when $W\in\mathbb{S}^n_+$ is confined to the entire positive semidefinite cone (including its boundary). It is now our goal to incorporate $f$ into an optimization problem such that an optimal solution returned always comprises a projection matrix $W$. The set of orthogonal projection matrices is a nonconvex subset of the positive semidefinite cone. So $f$ cannot be convex on the projection matrices, and its equivalent (for idempotent $W$)

\[
f(W,x) = x^T\big(I - (1+\epsilon)^{-1}W\big)x \tag{476}
\]

cannot be convex simultaneously in both variables on either the positive semidefinite or symmetric projection matrices.

Suppose we allow $\operatorname{dom}f$ to constitute the entire positive semidefinite cone but constrain $W$ to a Fantope (80); e.g., for convex set $\mathcal{C}$ and $0 < k < n$ as in

\[
\begin{array}{cl}
\underset{x\in\mathbb{R}^n,\,W\in\mathbb{S}^n}{\mbox{minimize}} & \epsilon\, x^T(W+\epsilon I)^{-1}x\\
\mbox{subject to} & 0 \preceq W \preceq I\\
& \operatorname{tr}W = k\\
& x\in\mathcal{C}
\end{array} \tag{477}
\]


Although this is a convex problem, there is no guarantee that optimal $W$ is a projection matrix because only extreme points of a Fantope are orthogonal projection matrices $UU^T$.

Let's try partitioning the problem into two convex parts (one for $x$ and one for $W$), substitute equivalence (474), and then iterate solution of convex problem

\[
\begin{array}{cl}\underset{x\in\mathbb{R}^n}{\mbox{minimize}} & x^T\big(I - (1+\epsilon)^{-1}W\big)x\\ \mbox{subject to} & x\in\mathcal{C}\end{array} \tag{478}
\]

with convex problem

\[
\text{(a)}\quad
\begin{array}{cl}\underset{W\in\mathbb{S}^n}{\mbox{minimize}} & x^{\star T}\big(I - (1+\epsilon)^{-1}W\big)x^{\star}\\ \mbox{subject to} & 0 \preceq W \preceq I\\ & \operatorname{tr}W = k\end{array}
\quad\equiv\quad
\begin{array}{cl}\underset{W\in\mathbb{S}^n}{\mbox{maximize}} & x^{\star T} W x^{\star}\\ \mbox{subject to} & 0 \preceq W \preceq I\\ & \operatorname{tr}W = k\end{array} \tag{479}
\]

until convergence, where $x^\star$ represents an optimal solution of (478) from any particular iteration. The idea is to optimally solve for the partitioned variables which are later combined to solve the original problem (477). What makes this approach sound is that the constraints are separable, the partitioned feasible sets are not interdependent, and the fact that the original problem (though nonlinear) is convex simultaneously in both variables.$^{3.7}$

But partitioning alone does not guarantee a projector. To make orthogonal projector $W$ a certainty, we must invoke a known analytical optimal solution to problem (479): Diagonalize optimal solution from problem (478) $x^\star x^{\star T} \triangleq Q\Lambda Q^T$ (§A.5.2) and set $U^\star = Q(:,1{:}k)\in\mathbb{R}^{n\times k}$ per (1506c);

\[
W = U^\star U^{\star T} = \frac{x^\star x^{\star T}}{\|x^\star\|^2} + Q(:,2{:}k)\,Q(:,2{:}k)^T \tag{480}
\]

Then optimal solution $(x^\star,U^\star)$ to problem (477) is found, for small $\epsilon$, by iterating solution to problem (478) with optimal (projector) solution (480) to convex problem (479), as sketched below.

$^{3.7}$ A convex problem has convex feasible set, and the objective surface has one and only one global minimum.
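A minimal NumPy sketch of this alternation follows. The feasible set $\mathcal{C}=\{x \mid \mathbf{1}^Tx=1\}$ is a hypothetical choice made only so the $x$-step (478) has the closed form $x = M^{-1}\mathbf{1}/(\mathbf{1}^TM^{-1}\mathbf{1})$; any convex $\mathcal{C}$ would do with a solver in its place.

```python
import numpy as np

n, k, eps = 6, 2, 1e-3
rng = np.random.default_rng(3)
W = (k / n) * np.eye(n)                      # feasible Fantope start
for it in range(10):
    M = np.eye(n) - W / (1 + eps)            # objective matrix of (478)
    u = np.linalg.solve(M, np.ones(n))
    x = u / np.ones(n).dot(u)                # x-step over C = {x | 1.T x = 1}
    lam, Q = np.linalg.eigh(np.outer(x, x))  # diagonalize x x.T
    U = Q[:, -k:]                            # top-k eigenvectors; U[:, -1] is x/|x|
    W = U @ U.T                              # analytic projector update (480)
    print(it, x @ M @ x)                     # objective of (478): nonincreasing
# converged W is a bona fide rank-k orthogonal projector:
print(np.allclose(W @ W, W), abs(np.trace(W) - k) < 1e-9)
```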


Proof. Optimal vector $x^\star$ is orthogonal to the last $n-1$ columns of orthogonal matrix $Q$, so

\[
f^\star_{(478)} = \|x^\star\|^2\big(1 - (1+\epsilon)^{-1}\big) \tag{481}
\]

after each iteration. Convergence of $f^\star_{(478)}$ is proven with the observation that iteration (478) (479a) is a nonincreasing sequence that is bounded below by 0. Any bounded monotonic sequence in $\mathbb{R}$ is convergent. [190, §1.2] [30, §1.1] Expression (480) for optimal projector $W$ holds at each iteration, therefore $\|x^\star\|^2(1-(1+\epsilon)^{-1})$ must also represent the optimal objective value $f^\star_{(478)}$ at convergence.

Because the objective $f_{(477)}$ from problem (477) is also bounded below by 0 on the same domain, this convergent optimal objective value $f^\star_{(478)}$ (for positive $\epsilon$ arbitrarily close to 0) is necessarily optimal for (477); id est,

\[
f^\star_{(478)} \ge f^\star_{(477)} \ge 0 \tag{482}
\]

by (1488), and

\[
\lim_{\epsilon\to 0^+} f^\star_{(478)} = 0 \tag{483}
\]

Since optimal $(x^\star,U^\star)$ from problem (478) is feasible to problem (477), and because their objectives are equivalent for projectors by (474), then converged $(x^\star,U^\star)$ must also be optimal to (477) in the limit. Because problem (477) is convex, this represents a globally optimal solution. $\blacksquare$

3.1.7.2 Semidefinite program via Schur

Schur complement (1337) can be used to convert a projection problem to an optimization problem in epigraph form. Suppose, for example, we are presented with the constrained projection problem studied by Hayden & Wells in [133] (who provide analytical solution): Given $A\in\mathbb{R}^{M\times M}$ and some full-rank matrix $S\in\mathbb{R}^{M\times L}$ with $L<M$

\[
\begin{array}{cl}\underset{X\in\mathbb{S}^M}{\mbox{minimize}} & \|A - X\|^2_F\\ \mbox{subject to} & S^T X S \succeq 0\end{array} \tag{484}
\]

Variable $X$ is constrained to be positive semidefinite, but only on a subspace determined by $S$. First we write the epigraph form:


\[
\begin{array}{cl}\underset{X\in\mathbb{S}^M,\,t\in\mathbb{R}}{\mbox{minimize}} & t\\ \mbox{subject to} & \|A - X\|^2_F \le t\\ & S^T X S \succeq 0\end{array} \tag{485}
\]

Next we use Schur complement [206, §6.4.3] [182] and matrix vectorization (§2.2):

\[
\begin{array}{cl}\underset{X\in\mathbb{S}^M,\,t\in\mathbb{R}}{\mbox{minimize}} & t\\ \mbox{subject to} & \begin{bmatrix} tI & \operatorname{vec}(A-X)\\ \operatorname{vec}(A-X)^T & 1\end{bmatrix} \succeq 0\\ & S^T X S \succeq 0\end{array} \tag{486}
\]

This semidefinite program is an epigraph form in disguise, equivalent to (484); it demonstrates how a quadratic objective or constraint can be converted to a semidefinite constraint.

Were problem (484) instead equivalently expressed without the square

\[
\begin{array}{cl}\underset{X\in\mathbb{S}^M}{\mbox{minimize}} & \|A - X\|_F\\ \mbox{subject to} & S^T X S \succeq 0\end{array} \tag{487}
\]

then we get a subtle variation:

\[
\begin{array}{cl}\underset{X\in\mathbb{S}^M,\,t\in\mathbb{R}}{\mbox{minimize}} & t\\ \mbox{subject to} & \|A - X\|_F \le t\\ & S^T X S \succeq 0\end{array} \tag{488}
\]

that leads to an equivalent for (487)

\[
\begin{array}{cl}\underset{X\in\mathbb{S}^M,\,t\in\mathbb{R}}{\mbox{minimize}} & t\\ \mbox{subject to} & \begin{bmatrix} tI & \operatorname{vec}(A-X)\\ \operatorname{vec}(A-X)^T & t\end{bmatrix} \succeq 0\\ & S^T X S \succeq 0\end{array} \tag{489}
\]

3.1.7.2.1 Example. Schur anomaly.
Consider a problem abstract in the convex constraint, given symmetric matrix $A$

\[
\begin{array}{cl}\underset{X\in\mathbb{S}^M}{\mbox{minimize}} & \|X\|^2_F - \|A - X\|^2_F\\ \mbox{subject to} & X\in\mathcal{C}\end{array} \tag{490}
\]

the minimization of a difference of two quadratic functions each convex in matrix $X$. Observe equality


\[
\|X\|^2_F - \|A - X\|^2_F = 2\operatorname{tr}(XA) - \|A\|^2_F \tag{491}
\]

So problem (490) is equivalent to the convex optimization

\[
\begin{array}{cl}\underset{X\in\mathbb{S}^M}{\mbox{minimize}} & \operatorname{tr}(XA)\\ \mbox{subject to} & X\in\mathcal{C}\end{array} \tag{492}
\]

But this problem (490) does not have Schur-form

\[
\begin{array}{cl}\underset{X\in\mathbb{S}^M,\,\alpha,\,t}{\mbox{minimize}} & t - \alpha\\ \mbox{subject to} & X\in\mathcal{C}\\ & \|X\|^2_F \le t\\ & \|A - X\|^2_F \ge \alpha\end{array} \tag{493}
\]

because the constraint in $\alpha$ is nonconvex. (§2.9.1.0.1)

Figure 60: Gradient in $\mathbb{R}^2$ evaluated on grid over some open disc in domain of convex quadratic bowl $f(Y)=Y^TY:\mathbb{R}^2\to\mathbb{R}$ illustrated in Figure 61. Circular contours are level sets; each defined by a constant function-value. [Graphic omitted: gradient field over coordinates $Y_1$, $Y_2$.]
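The Schur trick of (486) can be tried directly on random data. A minimal sketch, assuming the CVXPY modeling package (not a tool of this text), solves (486) and cross-checks against the direct norm formulation (487); the explicit symmetrization of the LMI block is only a modeling-convenience idiom:

```python
import numpy as np
import cvxpy as cp   # assumed solver interface, for illustration only

M, L = 4, 2
rng = np.random.default_rng(4)
A = rng.standard_normal((M, M)); A = (A + A.T) / 2
S = rng.standard_normal((M, L))

X = cp.Variable((M, M), symmetric=True)
t = cp.Variable()
d = cp.reshape(A - X, (M * M, 1))
lmi = cp.bmat([[t * np.eye(M * M), d],
               [d.T, np.ones((1, 1))]])
p = cp.Problem(cp.Minimize(t),
               [(lmi + lmi.T) / 2 >> 0, S.T @ X @ S >> 0])   # program (486)
p.solve()

X2 = cp.Variable((M, M), symmetric=True)                     # program (487)
q = cp.Problem(cp.Minimize(cp.norm(A - X2, "fro")), [S.T @ X2 @ S >> 0])
q.solve()
print(p.value, q.value**2)   # optimal t equals squared Frobenius distance
```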


3.1.8 gradient

Gradient $\nabla f$ of any differentiable multidimensional function $f$ (formally defined in §D.1) maps each entry $f_i$ to a space having the same dimension as the ambient space of its domain. Notation $\nabla f$ is shorthand for gradient $\nabla_x f(x)$ of $f$ with respect to $x$. $\nabla f(y)$ can mean $\nabla_y f(y)$ or gradient $\nabla_x f(y)$ of $f(x)$ with respect to $x$ evaluated at $y$; a distinction that should become clear from context.

Gradient of a differentiable real function $f(x):\mathbb{R}^K\to\mathbb{R}$ with respect to its vector domain is defined

\[
\nabla f(x) \triangleq \begin{bmatrix} \dfrac{\partial f(x)}{\partial x_1}\\[1ex] \dfrac{\partial f(x)}{\partial x_2}\\ \vdots\\ \dfrac{\partial f(x)}{\partial x_K}\end{bmatrix} \in \mathbb{R}^K \tag{1562}
\]

while the second-order gradient of the twice differentiable real function with respect to its vector domain is traditionally called the Hessian;$^{3.8}$

\[
\nabla^2 f(x) \triangleq
\begin{bmatrix}
\dfrac{\partial^2 f(x)}{\partial x_1^2} & \dfrac{\partial^2 f(x)}{\partial x_1\partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_1\partial x_K}\\
\vdots & \ddots & & \vdots\\
\dfrac{\partial^2 f(x)}{\partial x_K\partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_K\partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_K^2}
\end{bmatrix} \in \mathbb{S}^K \tag{1563}
\]

The gradient can be interpreted as a vector pointing in the direction of greatest change. [161, §15.6] The gradient can also be interpreted as that vector normal to a level set; e.g., Figure 62, Figure 55.

For the quadratic bowl in Figure 61, the gradient maps to $\mathbb{R}^2$; illustrated in Figure 60. For a one-dimensional function of real variable, for example, the gradient evaluated at any point in the function domain is just the slope (or derivative) of that function there. (confer §D.1.4.1)

For any differentiable multidimensional function, zero gradient $\nabla f = 0$ is a necessary condition for its unconstrained minimization [104, §3.2]:

$^{3.8}$ Jacobian is the Hessian transpose, so commonly confused in matrix calculus.


3.1.8.0.1 Example. Projection on a rank-1 subset.
For $A\in\mathbb{S}^N$ having eigenvalues $\lambda(A)=[\lambda_i]\in\mathbb{R}^N$, consider the unconstrained nonconvex optimization that is a projection on the rank-1 subset (§2.9.2.1) of positive semidefinite cone $\mathbb{S}^N_+$: Defining $\lambda_1 \triangleq \max_i\{\lambda(A)_i\}$ and corresponding eigenvector $v_1$,

\[
\underset{x}{\mbox{minimize}}\;\|xx^T - A\|^2_F = \underset{x}{\mbox{minimize}}\;\operatorname{tr}\!\big(xx^T(x^Tx) - 2Axx^T + A^TA\big)
= \begin{cases}\|\lambda(A)\|^2\,, & \lambda_1 \le 0\\ \|\lambda(A)\|^2 - \lambda_1^2\,, & \lambda_1 > 0\end{cases} \tag{1501}
\]

\[
\arg\underset{x}{\mbox{minimize}}\;\|xx^T - A\|^2_F = \begin{cases} 0\,, & \lambda_1 \le 0\\ v_1\sqrt{\lambda_1}\,, & \lambda_1 > 0\end{cases} \tag{1502}
\]

From (1589) and §D.2.1, the gradient of $\|xx^T-A\|^2_F$ is

\[
\nabla_x\big((x^Tx)^2 - 2x^TAx\big) = 4(x^Tx)x - 4Ax \tag{494}
\]

Setting the gradient to 0,

\[
Ax = x(x^Tx) \tag{495}
\]

is necessary for optimal solution. Replace vector $x$ with a normalized eigenvector $v_i$ of $A\in\mathbb{S}^N$, corresponding to a positive eigenvalue $\lambda_i$, scaled by square root of that eigenvalue. Then (495) is satisfied

\[
x \leftarrow v_i\sqrt{\lambda_i} \;\Rightarrow\; Av_i = v_i\lambda_i \tag{496}
\]

$xx^T = \lambda_i v_i v_i^T$ is a rank-1 matrix on the positive semidefinite cone boundary, and the minimum is achieved (§7.1.2) when $\lambda_i = \lambda_1$ is the largest positive eigenvalue of $A$. If $A$ has no positive eigenvalue, then $x=0$ yields the minimum.

For any differentiable multidimensional convex function, zero gradient $\nabla f = 0$ is a necessary and sufficient condition for its unconstrained minimization [46, §5.5.3]:


3.1.8.0.2 Example. Pseudoinverse.
The pseudoinverse matrix is the unique solution to an unconstrained convex optimization problem [110, §5.5.4]: given $A\in\mathbb{R}^{m\times n}$

\[
\underset{X\in\mathbb{R}^{n\times m}}{\mbox{minimize}}\;\|XA - I\|^2_F \tag{497}
\]

where

\[
\|XA - I\|^2_F = \operatorname{tr}\!\big(A^TX^TXA - XA - A^TX^T + I\big) \tag{498}
\]

whose gradient (§D.2.3)

\[
\nabla_X\|XA - I\|^2_F = 2\big(XAA^T - A^T\big) = 0 \tag{499}
\]

vanishes when

\[
XAA^T = A^T \tag{500}
\]

When $A$ is fat full-rank, then $AA^T$ is invertible, $X^\star = A^T(AA^T)^{-1}$ is the pseudoinverse $A^{\dagger}$, and $AA^{\dagger}=I$. Otherwise, we can make $AA^T$ invertible by adding a positively scaled identity, for any $A\in\mathbb{R}^{m\times n}$

\[
X = A^T(AA^T + t\,I)^{-1} \tag{501}
\]

Invertibility is guaranteed for any finite positive value of $t$ by (1281). Then matrix $X$ becomes the pseudoinverse $X\to A^{\dagger}\triangleq X^\star$ in the limit $t\to 0^+$. Minimizing instead $\|AX - I\|^2_F$ yields the second flavor in (1664). (A numerical check of this limit appears after the next example.)

3.1.8.0.3 Example. Hyperplane, line, described by affine function.
Consider the real affine function of vector variable,

\[
f(x) : \mathbb{R}^p\to\mathbb{R} = a^Tx + b \tag{502}
\]

whose domain is $\mathbb{R}^p$ and whose gradient $\nabla f(x)=a$ is a constant vector (independent of $x$). This function describes the real line $\mathbb{R}$ (its range), and it describes a nonvertical [148, §B.1.2] hyperplane $\partial\mathcal{H}$ in the space $\mathbb{R}^p\times\mathbb{R}$ for any particular vector $a$ (confer §2.4.2);

\[
\partial\mathcal{H} = \left\{\begin{bmatrix} x\\ a^Tx+b\end{bmatrix} \;\middle|\; x\in\mathbb{R}^p\right\} \subset \mathbb{R}^p\times\mathbb{R} \tag{503}
\]

having nonzero normal

\[
\eta = \begin{bmatrix} a\\ -1\end{bmatrix} \in \mathbb{R}^p\times\mathbb{R} \tag{504}
\]


This equivalence to a hyperplane holds only for real functions.$^{3.9}$ The epigraph of the real affine function $f(x)$ is therefore a halfspace in $\mathbb{R}^p\times\mathbb{R}$, so we have:

The real affine function is to convex functions as the hyperplane is to convex sets.

Similarly, the matrix-valued affine function of real variable $x$, for any particular matrix $A\in\mathbb{R}^{M\times N}$,

\[
h(x) : \mathbb{R}\to\mathbb{R}^{M\times N} = Ax + B \tag{505}
\]

describes a line in $\mathbb{R}^{M\times N}$ in direction $A$

\[
\{Ax + B \mid x\in\mathbb{R}\} \subseteq \mathbb{R}^{M\times N} \tag{506}
\]

and describes a line in $\mathbb{R}\times\mathbb{R}^{M\times N}$

\[
\left\{\begin{bmatrix} x\\ Ax+B\end{bmatrix} \;\middle|\; x\in\mathbb{R}\right\} \subset \mathbb{R}\times\mathbb{R}^{M\times N} \tag{507}
\]

whose slope with respect to $x$ is $A$.

$^{3.9}$ To prove that, consider a vector-valued affine function $f(x):\mathbb{R}^p\to\mathbb{R}^M = Ax + b$ having gradient $\nabla f(x)=A^T\in\mathbb{R}^{p\times M}$: The affine set $\left\{[\,x\,;\,Ax+b\,] \mid x\in\mathbb{R}^p\right\}\subset\mathbb{R}^p\times\mathbb{R}^M$ is perpendicular to $\eta \triangleq [\,\nabla f(x)\,;\,-I\,]\in\mathbb{R}^{p\times M}\times\mathbb{R}^{M\times M}$ because $\eta^T\big([\,x\,;\,Ax+b\,]-[\,0\,;\,b\,]\big)=0\;\;\forall\,x\in\mathbb{R}^p$. Yet $\eta$ is a vector (in $\mathbb{R}^p\times\mathbb{R}^M$) only when $M=1$.
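Returning to Example 3.1.8.0.2, the regularized limit (501) is easy to watch converge numerically (a NumPy sketch on hypothetical random data):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 5))            # a fat matrix
Apinv = np.linalg.pinv(A)
for t in (1e-2, 1e-6, 1e-10):
    X = A.T @ np.linalg.inv(A @ A.T + t * np.eye(3))   # (501)
    print(t, np.linalg.norm(X - Apinv))    # error shrinks as t -> 0+
```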


Figure 61: When a real function $f$ is differentiable at each point in its open domain, there is an intuitive geometric interpretation of function convexity in terms of its gradient $\nabla f$ and its epigraph: Drawn is a convex quadratic bowl in $\mathbb{R}^2\times\mathbb{R}$ (confer Figure 122, p.576); $f(Y)=Y^TY:\mathbb{R}^2\to\mathbb{R}$ versus $Y$ on some open disc in $\mathbb{R}^2$. Supporting hyperplane $\partial\mathcal{H}_-\in\mathbb{R}^2\times\mathbb{R}$ (which is tangent, only partially drawn) and its normal vector $[\,\nabla f(X)^T\ -1\,]^T$ at the particular point of support $[\,X^T\ f(X)\,]^T$ are illustrated. The interpretation: At each and every coordinate $Y$, there is such a hyperplane containing $[\,Y^T\ f(Y)\,]^T$ and supporting the epigraph. [Graphic omitted.]


3.1.8.1 monotonic function

A real differentiable function $f$ of real argument is called monotonic when its first derivative (not necessarily continuous) maintains sign over the function domain.

3.1.8.1.1 Definition. Monotonicity.
Multidimensional function $f$ is monotonic when $\operatorname{sgn}\langle f(Y)-f(X),\,Y-X\rangle$ is invariant (ignoring 0) to all $X,Y\in\operatorname{dom}f$. Nonnegative (nonpositive) sign denotes nonnegative (nonpositive) monotonicity. $\triangle$

When argument $X$ and $f(X)$ are dimensionally incompatible, the one having smaller dimension is padded with $\mathbf{1}$ to complete the test. It is necessary and sufficient for each entry $f_i$ from this monotonicity definition to be monotonic with the same sign.

A convex function can be characterized by a similar kind of nonnegative monotonicity of its gradient:

3.1.8.1.2 Theorem. Gradient monotonicity. [148, §B.4.1.4] [41, §3.1, exer.20]
Given $f(X):\mathbb{R}^{p\times k}\to\mathbb{R}$ a real differentiable function with matrix argument on open convex domain, the condition

\[
\langle \nabla f(Y) - \nabla f(X)\,,\; Y - X\rangle \ge 0 \quad\text{for each and every}\; X,Y\in\operatorname{dom}f \tag{508}
\]

is necessary and sufficient for convexity of $f$. Strict inequality and caveat distinct $Y,X$ provide necessary and sufficient conditions for strict convexity. $\diamond$

3.1.8.1.3 Example. Composition of functions. [46, §3.2.4] [148, §B.2.1]
Monotonic functions play a vital role determining convexity of functions constructed by transformation. Given functions $g:\mathbb{R}^k\to\mathbb{R}$ and $h:\mathbb{R}^n\to\mathbb{R}^k$, their composition $f = g(h):\mathbb{R}^n\to\mathbb{R}$ defined by

\[
f(x) = g(h(x))\,,\qquad \operatorname{dom}f = \{x\in\operatorname{dom}h \mid h(x)\in\operatorname{dom}g\} \tag{509}
\]

is convex when


$g$ is convex nonnegatively monotonic and $h$ is convex;
$g$ is convex nonpositively monotonic and $h$ is concave;

and composite function $f$ is concave when

$g$ is concave nonnegatively monotonic and $h$ is concave;
$g$ is concave nonpositively monotonic and $h$ is convex;

where $\infty$ ($-\infty$) is assigned to convex (concave) $g$ when evaluated outside its domain. When functions are differentiable, these rules are consequent to (1590). Convexity (concavity) of any $g$ is preserved when $h$ is affine.

3.1.9 first-order convexity condition, real function

Discretization of $w\succeq 0$ in (421) invites refocus to the real-valued function:

3.1.9.0.1 Theorem. Necessary and sufficient convexity condition.
[46, §3.1.3] [88, §I.5.2] [301, §1.2.3] [30, §1.2] [249, §4.2] [231, §3] For real differentiable function $f(X):\mathbb{R}^{p\times k}\to\mathbb{R}$ with matrix argument on open convex domain, the condition (confer §D.1.7)

\[
f(Y) \ge f(X) + \langle \nabla f(X)\,,\; Y - X\rangle \quad\text{for each and every}\; X,Y\in\operatorname{dom}f \tag{510}
\]

is necessary and sufficient for convexity of $f$. $\diamond$

When $f(X):\mathbb{R}^p\to\mathbb{R}$ is a real differentiable convex function with vector argument on open convex domain, there is simplification of the first-order condition (510); for each and every $X,Y\in\operatorname{dom}f$

\[
f(Y) \ge f(X) + \nabla f(X)^T(Y - X) \tag{511}
\]


From this we can find a unique [282, §5.5.4] nonvertical [148, §B.1.2] hyperplane $\partial\mathcal{H}_-$ (§2.4), expressed in terms of the function gradient, supporting $\operatorname{epi}f$ at $\begin{bmatrix} X\\ f(X)\end{bmatrix}$: videlicet, defining $f(Y\notin\operatorname{dom}f)\triangleq\infty$ [46, §3.1.7]

\[
\begin{bmatrix} Y\\ t\end{bmatrix} \in \operatorname{epi}f \;\Leftrightarrow\; t \ge f(Y)
\;\Rightarrow\;
[\,\nabla f(X)^T\ \ -1\,]\left(\begin{bmatrix} Y\\ t\end{bmatrix} - \begin{bmatrix} X\\ f(X)\end{bmatrix}\right) \le 0 \tag{512}
\]

This means, for each and every point $X$ in the domain of a real convex function $f(X)$, there exists a hyperplane $\partial\mathcal{H}_-$ in $\mathbb{R}^p\times\mathbb{R}$ having normal $\begin{bmatrix}\nabla f(X)\\ -1\end{bmatrix}$ supporting the function epigraph at $\begin{bmatrix} X\\ f(X)\end{bmatrix}\in\partial\mathcal{H}_-$

\[
\partial\mathcal{H}_- = \left\{\begin{bmatrix} Y\\ t\end{bmatrix}\in\begin{bmatrix}\mathbb{R}^p\\ \mathbb{R}\end{bmatrix} \;\middle|\;
[\,\nabla f(X)^T\ \ -1\,]\left(\begin{bmatrix} Y\\ t\end{bmatrix} - \begin{bmatrix} X\\ f(X)\end{bmatrix}\right) = 0 \right\} \tag{513}
\]

One such supporting hyperplane (confer Figure 22(a)) is illustrated in Figure 61 for a convex quadratic.

From (511) we deduce, for each and every $X,Y\in\operatorname{dom}f$

\[
\nabla f(X)^T(Y - X) \ge 0 \;\Rightarrow\; f(Y) \ge f(X) \tag{514}
\]

meaning, the gradient at $X$ identifies a supporting hyperplane there in $\mathbb{R}^p$

\[
\{Y\in\mathbb{R}^p \mid \nabla f(X)^T(Y - X) = 0\} \tag{515}
\]

to the convex sublevel sets of convex function $f$ (confer (463))

\[
\mathcal{L}_{f(X)}f \triangleq \{Y\in\operatorname{dom}f \mid f(Y) \le f(X)\} \subseteq \mathbb{R}^p \tag{516}
\]

illustrated for an arbitrary real convex function in Figure 62.
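Conditions (511) and (514) are immediate to spot-check for a concrete convex function; a minimal NumPy sketch using the quadratic $f(x)=x^TPx$ with $P$ positive semidefinite (hypothetical random data, illustration only):

```python
import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((4, 4)); P = B @ B.T    # P is PSD, so f is convex
f = lambda x: x @ P @ x
grad = lambda x: 2 * P @ x

for _ in range(1000):
    X, Y = rng.standard_normal(4), rng.standard_normal(4)
    assert f(Y) >= f(X) + grad(X) @ (Y - X) - 1e-9   # first-order condition (511)
    if grad(X) @ (Y - X) >= 0:                       # implication (514)
        assert f(Y) >= f(X) - 1e-9
```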


Figure 62: Shown is a plausible contour plot in $\mathbb{R}^2$ of some arbitrary real convex function $f(Z)$ at selected levels $\alpha \ge \beta \ge \gamma$; contours of equal level $f$ (level sets) drawn in the function's domain. A convex function has convex sublevel sets $\mathcal{L}_{f(X)}f$ (516). [232, §4.6] The sublevel set whose boundary is the level set at $\alpha$, for instance, comprises all the shaded regions. For any particular convex function, the family comprising all its sublevel sets is nested. [148, p.75] Were the sublevel sets not convex, we may certainly conclude the corresponding function is neither convex. Contour plots of real affine functions are illustrated in Figure 19 and Figure 57. [Graphic omitted: level set $\{Z \mid f(Z)=\alpha\}$, gradient $\nabla f(X)$, and supporting line $\{Y \mid \nabla f(X)^T(Y-X)=0,\ f(X)=\alpha\}$.]


3.1.10 first-order convexity condition, vector function

Now consider the first-order necessary and sufficient condition for convexity of a vector-valued function: Differentiable function $f(X):\mathbb{R}^{p\times k}\to\mathbb{R}^M$ is convex if and only if $\operatorname{dom}f$ is open, convex, and for each and every $X,Y\in\operatorname{dom}f$

\[
f(Y) \underset{\mathbb{R}^M_+}{\succeq} f(X) + \overset{Y-X}{df}(X)
= f(X) + \left.\frac{d}{dt}\right|_{t=0} f\big(X + t\,(Y-X)\big) \tag{517}
\]

where $\overset{Y-X}{df}(X)$ is the directional derivative$^{3.10}$ [161] [252] of $f$ at $X$ in direction $Y-X$. This, of course, follows from the real-valued function case: by dual generalized inequalities (§2.13.2.0.1),

\[
f(Y) - f(X) - \overset{Y-X}{df}(X) \underset{\mathbb{R}^M_+}{\succeq} 0
\;\Leftrightarrow\;
\left\langle f(Y) - f(X) - \overset{Y-X}{df}(X)\,,\; w\right\rangle \ge 0 \;\;\forall\, w \underset{\mathbb{R}^{M*}_+}{\succeq} 0 \tag{518}
\]

where

\[
\overset{Y-X}{df}(X) = \begin{bmatrix}
\operatorname{tr}\!\big(\nabla f_1(X)^T(Y-X)\big)\\
\operatorname{tr}\!\big(\nabla f_2(X)^T(Y-X)\big)\\
\vdots\\
\operatorname{tr}\!\big(\nabla f_M(X)^T(Y-X)\big)
\end{bmatrix} \in \mathbb{R}^M \tag{519}
\]

Necessary and sufficient discretization (421) allows relaxation of the semi-infinite number of conditions $w\succeq 0$ instead to $w\in\{e_i\,,\; i=1\ldots M\}$, the extreme directions of the nonnegative orthant. Each extreme direction picks out a real entry $f_i$ and $\overset{Y-X}{df}(X)_i$ from the vector-valued function and its directional derivative; then Theorem 3.1.9.0.1 applies.

The vector-valued function case (517) is therefore a straightforward application of the first-order convexity condition for real functions to each entry of the vector-valued function.

$^{3.10}$ We extend the traditional definition of directional derivative in §D.1.4 so that direction may be indicated by a vector or a matrix, thereby broadening the scope of the Taylor series (§D.1.7). The right-hand side of the inequality (517) is the first-order Taylor series expansion of $f$ about $X$.


3.1.11 second-order convexity condition

Again, by discretization (421), we are obliged only to consider each individual entry $f_i$ of a vector-valued function $f$; id est, the real functions $\{f_i\}$. For $f(X):\mathbb{R}^p\to\mathbb{R}^M$, a twice differentiable vector-valued function with vector argument on open convex domain,

\[
\nabla^2 f_i(X) \underset{\mathbb{S}^p_+}{\succeq} 0 \quad\forall\, X\in\operatorname{dom}f\,,\;\; i=1\ldots M \tag{520}
\]

is a necessary and sufficient condition for convexity of $f$. Obviously, when $M=1$, this convexity condition also serves for a real function. Intuitively, condition (520) precludes points of inflection, as in Figure 63 on page 222.

Strict inequality is a sufficient condition for strict convexity, but that is nothing new; videlicet, the strictly convex real function $f_i(x)=x^4$ does not have positive second derivative at each and every $x\in\mathbb{R}$. Quadratic forms constitute a notable exception where the strict-case converse is reliably true.

3.1.11.0.1 Exercise. Real fractional function. (confer §3.1.4, §3.1.7.0.4)
Prove that real function $f(x,y)=x/y$ is not convex on the nonnegative orthant. Also exhibit this in a plot of the function. (In fact, $f$ is quasilinear (p.223) on $\{y>0\}$.)

3.1.11.1 second-order $\Rightarrow$ first-order condition

For a twice-differentiable real function $f_i(X):\mathbb{R}^p\to\mathbb{R}$ having open domain, a consequence of the mean value theorem from calculus allows compression of its complete Taylor series expansion about $X\in\operatorname{dom}f_i$ (§D.1.7) to three terms: On some open interval of $\|Y\|$, so that each and every line segment $[X,Y]$ belongs to $\operatorname{dom}f_i$, there exists an $\alpha\in[0,1]$ such that [301, §1.2.3] [30, §1.1.4]

\[
f_i(Y) = f_i(X) + \nabla f_i(X)^T(Y-X) + \tfrac{1}{2}(Y-X)^T\,\nabla^2 f_i\big(\alpha X + (1-\alpha)Y\big)\,(Y-X) \tag{521}
\]

The first-order condition for convexity (511) follows directly from this and the second-order condition (520).
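One way to see the claim of Exercise 3.1.11.0.1 numerically (not a proof, merely an illustration in NumPy): the Hessian of $f(x,y)=x/y$ is $\begin{bmatrix} 0 & -1/y^2\\ -1/y^2 & 2x/y^3\end{bmatrix}$, whose determinant $-1/y^4$ is negative, so the Hessian is indefinite at every point with $y>0$ and condition (520) fails.

```python
import numpy as np

rng = np.random.default_rng(7)
for _ in range(5):
    x, y = rng.uniform(0.1, 3, size=2)          # sample points with x, y > 0
    H = np.array([[0.0,        -1 / y**2],
                  [-1 / y**2,  2 * x / y**3]])  # Hessian of f(x,y) = x/y
    print(np.linalg.eigvalsh(H))                # one negative, one positive
```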


3.2 Matrix-valued convex function

We need different tools for matrix argument: We are primarily interested in continuous matrix-valued functions $g(X)$. We choose symmetric $g(X)\in\mathbb{S}^M$ because matrix-valued functions are most often compared (522) with respect to the positive semidefinite cone $\mathbb{S}^M_+$ in the ambient space of symmetric matrices.$^{3.11}$

3.2.0.0.1 Definition. Convex matrix-valued function:

1) Matrix-definition.
A function $g(X):\mathbb{R}^{p\times k}\to\mathbb{S}^M$ is convex in $X$ iff $\operatorname{dom}g$ is a convex set and, for each and every $Y,Z\in\operatorname{dom}g$ and all $0\le\mu\le 1$ [158, §2.3.7]

\[
g(\mu Y + (1-\mu)Z) \underset{\mathbb{S}^M_+}{\preceq} \mu\, g(Y) + (1-\mu)\, g(Z) \tag{522}
\]

Reversing sense of the inequality flips this definition to concavity. Strict convexity is defined less a stroke of the pen in (522) similarly to (423).

2) Scalar-definition.
It follows that $g(X):\mathbb{R}^{p\times k}\to\mathbb{S}^M$ is convex in $X$ iff $w^Tg(X)\,w:\mathbb{R}^{p\times k}\to\mathbb{R}$ is convex in $X$ for each and every $\|w\|=1$; shown by substituting the defining inequality (522). By dual generalized inequalities we have the equivalent but more broad criterion, (§2.13.5)

\[
g \;\text{convex} \;\Leftrightarrow\; \langle W,\,g\rangle \;\text{convex}\;\;\forall\, W \underset{\mathbb{S}^M_+}{\succeq} 0 \tag{523}
\]

Strict convexity on both sides requires caveat $W\ne 0$. Because the set of all extreme directions for the positive semidefinite cone (§2.9.2.4) comprises a minimal set of generators for that cone, discretization (§2.13.4.2.1) allows replacement of matrix $W$ with symmetric dyad $ww^T$ as proposed. $\triangle$

$^{3.11}$ Function symmetry is not a necessary requirement for convexity; indeed, for $A\in\mathbb{R}^{m\times p}$ and $B\in\mathbb{R}^{m\times k}$, $g(X)=AX+B$ is a convex (affine) function in $X$ on domain $\mathbb{R}^{p\times k}$ with respect to the nonnegative orthant $\mathbb{R}^{m\times k}_+$. Symmetric convex functions share the same benefits as symmetric matrices. Horn & Johnson [150, §7.7] liken symmetric matrices to real numbers, and (symmetric) positive definite matrices to positive real numbers.


3.2.1 first-order convexity condition, matrix function

From the scalar-definition we have, for differentiable matrix-valued function $g$ and for each and every real vector $w$ of unit norm $\|w\|=1$,

\[
w^T g(Y)\, w \;\ge\; w^T g(X)\, w + w^T \overset{Y-X}{dg}(X)\, w \tag{524}
\]

that follows immediately from the first-order condition (510) for convexity of a real function because

\[
w^T \overset{Y-X}{dg}(X)\, w = \big\langle \nabla_X\, w^T g(X)\, w\,,\; Y - X\big\rangle \tag{525}
\]

where $\overset{Y-X}{dg}(X)$ is the directional derivative (§D.1.4) of function $g$ at $X$ in direction $Y-X$. By discretized dual generalized inequalities, (§2.13.5)

\[
g(Y) - g(X) - \overset{Y-X}{dg}(X) \underset{\mathbb{S}^M_+}{\succeq} 0
\;\Leftrightarrow\;
\left\langle g(Y) - g(X) - \overset{Y-X}{dg}(X)\,,\; ww^T\right\rangle \ge 0 \;\;\forall\, ww^T\,(\succeq 0) \tag{526}
\]

For each and every $X,Y\in\operatorname{dom}g$ (confer (517))

\[
g(Y) \underset{\mathbb{S}^M_+}{\succeq} g(X) + \overset{Y-X}{dg}(X) \tag{527}
\]

must therefore be necessary and sufficient for convexity of a matrix-valued function of matrix variable on open convex domain.

3.2.2 epigraph of matrix-valued function, sublevel sets

We generalize the epigraph to a continuous matrix-valued function $g(X):\mathbb{R}^{p\times k}\to\mathbb{S}^M$:

\[
\operatorname{epi}g \triangleq \{(X,\,T)\in\mathbb{R}^{p\times k}\times\mathbb{S}^M \mid X\in\operatorname{dom}g\,,\; g(X) \underset{\mathbb{S}^M_+}{\preceq} T\} \tag{528}
\]

from which it follows


\[
g \;\text{convex} \;\Leftrightarrow\; \operatorname{epi}g \;\text{convex} \tag{529}
\]

Proof of necessity is similar to that in §3.1.7 on page 197. $\blacksquare$

Sublevel sets of a matrix-valued convex function corresponding to each and every $S\in\mathbb{S}^M$ (confer (463))

\[
\mathcal{L}_S\, g \triangleq \{X\in\operatorname{dom}g \mid g(X) \underset{\mathbb{S}^M_+}{\preceq} S\} \subseteq \mathbb{R}^{p\times k} \tag{530}
\]

are convex. There is no converse.

3.2.2.0.1 Example. Matrix fractional function redux.
Generalizing Example 3.1.7.0.4 consider a matrix-valued function of two variables on $\operatorname{dom}g = \mathbb{S}^N_+\times\mathbb{R}^{n\times N}$ for small positive constant $\epsilon$ (confer (1665))

\[
g(A,X) = \epsilon\, X(A + \epsilon I)^{-1} X^T \tag{531}
\]

where the inverse always exists by (1281). This function is convex simultaneously in both variables over the entire positive semidefinite cone $\mathbb{S}^N_+$ and all $X\in\mathbb{R}^{n\times N}$: Consider Schur-form (1340) from §A.4: for $T\in\mathbb{S}^n$

\[
G(A,X,T) = \begin{bmatrix} A+\epsilon I & X^T\\ X & \epsilon^{-1}T\end{bmatrix} \succeq 0
\;\Leftrightarrow\;
T - \epsilon X(A+\epsilon I)^{-1}X^T \succeq 0\,,\quad A+\epsilon I \succ 0 \tag{532}
\]

By Theorem 2.1.9.0.1, inverse image of the positive semidefinite cone $\mathbb{S}^{N+n}_+$ under affine mapping $G(A,X,T)$ is convex. Function $g(A,X)$ is convex on $\mathbb{S}^N_+\times\mathbb{R}^{n\times N}$ because its epigraph is that inverse image:

\[
\operatorname{epi}g(A,X) = \{(A,X,T) \mid A+\epsilon I \succ 0\,,\; \epsilon X(A+\epsilon I)^{-1}X^T \preceq T\} = G^{-1}\big(\mathbb{S}^{N+n}_+\big) \tag{533}
\]


3.2.3 second-order condition, matrix function

The following line theorem is a potent tool for establishing convexity of a multidimensional function. To understand it, what is meant by line must first be solidified. Given a function $g(X):\mathbb{R}^{p\times k}\to\mathbb{S}^M$ and particular $X,Y\in\mathbb{R}^{p\times k}$ not necessarily in that function's domain, then we say a line $\{X+tY \mid t\in\mathbb{R}\}$ passes through $\operatorname{dom}g$ when $X+tY\in\operatorname{dom}g$ over some interval of $t\in\mathbb{R}$.

3.2.3.0.1 Theorem. Line theorem. [46, §3.1.1]
Matrix-valued function $g(X):\mathbb{R}^{p\times k}\to\mathbb{S}^M$ is convex in $X$ if and only if it remains convex on the intersection of any line with its domain. $\diamond$

Now we assume a twice differentiable function.

3.2.3.0.2 Definition. Differentiable convex matrix-valued function.
Matrix-valued function $g(X):\mathbb{R}^{p\times k}\to\mathbb{S}^M$ is convex in $X$ iff $\operatorname{dom}g$ is an open convex set, and its second derivative $g''(X+tY):\mathbb{R}\to\mathbb{S}^M$ is positive semidefinite on each point of intersection along every line $\{X+tY \mid t\in\mathbb{R}\}$ that intersects $\operatorname{dom}g$; id est, iff for each and every $X,Y\in\mathbb{R}^{p\times k}$ such that $X+tY\in\operatorname{dom}g$ over some open interval of $t\in\mathbb{R}$

\[
\frac{d^2}{dt^2}\, g(X+tY) \underset{\mathbb{S}^M_+}{\succeq} 0 \tag{534}
\]

Similarly, if

\[
\frac{d^2}{dt^2}\, g(X+tY) \underset{\mathbb{S}^M_+}{\succ} 0 \tag{535}
\]

then $g$ is strictly convex; the converse is generally false. [46, §3.1.4]$^{3.12}$ $\triangle$

3.2.3.0.3 Example. Matrix inverse. (confer §3.1.5)
The matrix-valued function $X^\mu$ is convex on $\operatorname{int}\mathbb{S}^M_+$ for $-1\le\mu\le 0$ or $1\le\mu\le 2$ and concave for $0\le\mu\le 1$. [46, §3.6.2] In particular, the function $g(X)=X^{-1}$ is convex on $\operatorname{int}\mathbb{S}^M_+$. For each and every $Y\in\mathbb{S}^M$ (§D.2.1, §A.3.1.0.5)

$^{3.12}$ Quadratic forms constitute a notable exception where the strict-case converse is reliably true.


\[
\frac{d^2}{dt^2}\, g(X+tY) = 2(X+tY)^{-1} Y (X+tY)^{-1} Y (X+tY)^{-1} \underset{\mathbb{S}^M_+}{\succeq} 0 \tag{536}
\]

on some open interval of $t\in\mathbb{R}$ such that $X+tY\succ 0$. Hence, $g(X)$ is convex in $X$. This result is extensible;$^{3.13}$ $\operatorname{tr}X^{-1}$ is convex on that same domain. [150, §7.6, prob.2] [41, §3.1, exer.25]

3.2.3.0.4 Example. Matrix squared.
Iconic real function $f(x)=x^2$ is strictly convex on $\mathbb{R}$. The matrix-valued function $g(X)=X^2$ is convex on the domain of symmetric matrices; for $X,Y\in\mathbb{S}^M$ and any open interval of $t\in\mathbb{R}$ (§D.2.1)

\[
\frac{d^2}{dt^2}\, g(X+tY) = \frac{d^2}{dt^2}\,(X+tY)^2 = 2Y^2 \tag{537}
\]

which is positive semidefinite when $Y$ is symmetric because then $Y^2 = Y^TY$ (1287).$^{3.14}$

A more appropriate matrix-valued counterpart for $f$ is $g(X)=X^TX$ which is a convex function on domain $X\in\mathbb{R}^{m\times n}$, and strictly convex whenever $X$ is skinny-or-square full-rank. This matrix-valued function can be generalized to $g(X)=X^TAX$ which is convex whenever matrix $A$ is positive semidefinite (p.588), and strictly convex when $A$ is positive definite and $X$ is skinny-or-square full-rank (Corollary A.3.1.0.5).

3.2.3.0.5 Example. Matrix exponential.
The matrix-valued function $g(X)=e^X:\mathbb{S}^M\to\mathbb{S}^M$ is convex on the subspace of circulant [118] symmetric matrices. Applying the line theorem, for all $t\in\mathbb{R}$ and circulant $X,Y\in\mathbb{S}^M$, from Table D.2.7 we have

\[
\frac{d^2}{dt^2}\, e^{X+tY} = Y e^{X+tY} Y \underset{\mathbb{S}^M_+}{\succeq} 0\,,\qquad (XY)^T = XY \tag{538}
\]

because all circulant matrices are commutative and, for symmetric matrices, $XY=YX \Leftrightarrow (XY)^T = XY$ (1305). Given symmetric argument, the matrix exponential always resides interior to the cone of positive semidefinite

$^{3.13}$ $d/dt\,\operatorname{tr}g(X+tY) = \operatorname{tr}\,d/dt\;g(X+tY)$. [151, p.491]
$^{3.14}$ By (1306) in §A.3.1, changing the domain instead to all symmetric and nonsymmetric positive semidefinite matrices, for example, will not produce a convex function.


matrices in the symmetric matrix subspace; $e^A\succ 0\;\;\forall A\in\mathbb{S}^M$ (1662). Then for any matrix $Y$ of compatible dimension, $Y^Te^AY$ is positive semidefinite. (§A.3.1.0.5)

The subspace of circulant symmetric matrices contains all diagonal matrices. The matrix exponential of any diagonal matrix $e^\Lambda$ exponentiates each individual entry on the main diagonal. [184, §5.3] So, changing the function domain to the subspace of real diagonal matrices reduces the matrix exponential to a vector-valued function in an isometrically isomorphic subspace $\mathbb{R}^M$; known convex (§3.1.1) from the real-valued function case [46, §3.1.5].

3.2.3.0.6 Exercise. log det function.
Show by two different methods: $\log\det X$ is concave on the interior of the positive semidefinite cone.

3.3 Quasiconvex

Quasiconvex functions [46, §3.4] [148] [249] [282] [179, §2] are useful in practical problem solving because they are unimodal (by definition when nonmonotonic); a global minimum is guaranteed to exist over any convex set in the function domain; e.g., Figure 63.

3.3.0.0.1 Definition. Quasiconvex function.
$f(X):\mathbb{R}^{p\times k}\to\mathbb{R}$ is a quasiconvex function of matrix $X$ iff $\operatorname{dom}f$ is a convex set and for each and every $Y,Z\in\operatorname{dom}f$, $0\le\mu\le 1$

\[
f(\mu Y + (1-\mu)Z) \le \max\{f(Y),\,f(Z)\} \tag{539}
\]

A quasiconcave function is determined:

\[
f(\mu Y + (1-\mu)Z) \ge \min\{f(Y),\,f(Z)\} \tag{540}
\]
$\triangle$

Unlike convex functions, quasiconvex functions are not necessarily continuous; e.g., quasiconcave $\operatorname{rank}(X)$ on $\mathbb{S}^M_+$ (§2.9.2.6.2) and $\operatorname{card}(x)$


Figure 63: Iconic unimodal differentiable quasiconvex function of two variables graphed in $\mathbb{R}^2\times\mathbb{R}$ on some open disc in $\mathbb{R}^2$. Note reversal of curvature in direction of gradient. [Graphic omitted.]


on $\mathbb{R}^M_+$. Although insufficient for convex functions, convexity of each and every sublevel set serves as a definition of quasiconvexity:

3.3.0.0.2 Definition. Quasiconvex multidimensional function.
Scalar-, vector-, or matrix-valued function $g(X):\mathbb{R}^{p\times k}\to\mathbb{S}^M$ is a quasiconvex function of matrix $X$ iff $\operatorname{dom}g$ is a convex set and the sublevel set corresponding to each and every $S\in\mathbb{S}^M$

\[
\mathcal{L}_S\, g = \{X\in\operatorname{dom}g \mid g(X) \preceq S\} \subseteq \mathbb{R}^{p\times k} \tag{530}
\]

is convex. Vectors are compared with respect to the nonnegative orthant $\mathbb{R}^M_+$ while matrices are with respect to the positive semidefinite cone $\mathbb{S}^M_+$.

Convexity of the superlevel set corresponding to each and every $S\in\mathbb{S}^M$, likewise

\[
\mathcal{L}^S g = \{X\in\operatorname{dom}g \mid g(X) \succeq S\} \subseteq \mathbb{R}^{p\times k} \tag{541}
\]

is necessary and sufficient for quasiconcavity. $\triangle$

3.3.0.0.3 Exercise. Nonconvexity of matrix product.
Consider the real function on a positive definite domain

\[
f(X) = \operatorname{tr}(X_1 X_2)\,,\qquad
\operatorname{dom}f \triangleq \begin{bmatrix} \operatorname{rel\,int}\mathbb{S}^N_+ & 0\\ 0 & \operatorname{rel\,int}\mathbb{S}^N_+\end{bmatrix} \tag{542}
\]

where

\[
X \triangleq \begin{bmatrix} X_1 & 0\\ 0 & X_2\end{bmatrix} \underset{\mathbb{S}^{2N}_+}{\succ} 0 \tag{543}
\]

with superlevel sets

\[
\mathcal{L}^s f = \{X\in\operatorname{dom}f \mid f(X) \ge s\} = \{X\in\operatorname{dom}f \mid \langle X_1,\,X_2\rangle \ge s\} \tag{544}
\]

Prove: $f(X)$ is not quasiconcave except when $N=1$, nor is it quasiconvex unless $X_1 = X_2$.

When a function is simultaneously quasiconvex and quasiconcave, it is called quasilinear. Quasilinear functions are completely determined by convex level sets. One-dimensional function $f(x)=x^3$ and vector-valued signum function $\operatorname{sgn}(x)$, for example, are quasilinear. Any monotonic function is quasilinear.$^{3.15}$

$^{3.15}$ e.g., a monotonic concave function is therefore quasiconvex, but it is best to avoid this confusion of terms.


3.4 Salient properties of convex and quasiconvex functions

1. A convex (or concave) function is assumed continuous but not necessarily differentiable on the relative interior of its domain. [232, §10]
   A quasiconvex (or quasiconcave) function is not necessarily a continuous function.

2. convexity $\Rightarrow$ quasiconvexity $\Leftrightarrow$ convex sublevel sets
   concavity $\Rightarrow$ quasiconcavity $\Leftrightarrow$ convex superlevel sets
   monotonicity $\Rightarrow$ quasilinearity $\Leftrightarrow$ convex level sets

3. (homogeneity) Convexity, concavity, quasiconvexity, and quasiconcavity are invariant to nonnegative scaling of function.
   $g$ convex $\Leftrightarrow$ $-g$ concave
   $g$ quasiconvex $\Leftrightarrow$ $-g$ quasiconcave

4. The line theorem (§3.2.3.0.1) translates identically to quasiconvexity (quasiconcavity). [46, §3.4.2]

5. (affine transformation of argument) Composition $g(h(X))$ of a convex (concave) function $g$ with any affine function $h:\mathbb{R}^{m\times n}\to\mathbb{R}^{p\times k}$ remains convex (concave) in $X\in\mathbb{R}^{m\times n}$, where $h(\mathbb{R}^{m\times n})\cap\operatorname{dom}g \ne \emptyset$. [148, §B.2.1] Likewise for the quasiconvex (quasiconcave) functions $g$.

6. A nonnegatively weighted sum of (strictly) convex (concave) functions remains (strictly) convex (concave). (§3.1.1.0.1)
   Pointwise supremum (infimum) of convex (concave) functions remains convex (concave). (Figure 58) [232, §5]
   A nonnegatively weighted maximum (minimum) of quasiconvex (quasiconcave) functions remains quasiconvex (quasiconcave).
   Pointwise supremum (infimum) of quasiconvex (quasiconcave) functions remains quasiconvex (quasiconcave).


Chapter 4

Semidefinite programming

   Prior to 1984,^{4.1} linear and nonlinear programming, one a subset of the other, had evolved for the most part along unconnected paths, without even a common terminology. (The use of "programming" to mean "optimization" serves as a persistent reminder of these differences.)
   −Forsgren, Gill, & Wright (2002) [98]

Given some application of convex analysis, it may at first seem puzzling why the search for its solution ends abruptly with a formalized statement of the problem itself as a constrained optimization. The explanation is: typically we do not seek analytical solution because there are relatively few. (§C) If a problem can be expressed in convex form, rather, then there exist computer programs providing efficient numerical global solution. [255] [117] [293] [294] [295] [301]

The goal, then, becomes conversion of a given problem (perhaps a nonconvex or combinatorial problem statement) to an equivalent convex form or to an alternation of convex subproblems convergent to a solution of the original problem: A fundamental property of convex optimization problems is that any locally optimal point is also (globally) optimal. [46, §4.2.2] [231, §1] Given convex real objective function g and convex feasible set 𝒞 ⊆ dom g , which is the set of all variable values satisfying the problem constraints, we

4.1 nascence of interior-point methods of solution [275] [291].


have the generic convex optimization problem

   minimize_X   g(X)
   subject to   X ∈ 𝒞     (545)

where constraints are abstract here in the membership of variable X to feasible set 𝒞 . Inequality constraint functions of a convex optimization problem are convex while equality constraint functions are conventionally affine, but not necessarily so. Affine equality constraint functions (necessarily convex), as opposed to the larger set of all convex equality constraint functions having convex level sets, make convex optimization tractable. Similarly, the problem

   maximize_X   g(X)
   subject to   X ∈ 𝒞     (546)

is convex were g a real concave function. As conversion to convex form is not always possible, there is much ongoing research to determine which problem classes have convex expression or relaxation. [27] [44] [102] [206] [262] [100]

4.1 Conic problem

   Still, we are surprised to see the relatively small number of submissions to semidefinite programming (SDP) solvers, as this is an area of significant current interest to the optimization community. We speculate that semidefinite programming is simply experiencing the fate of most new areas: Users have yet to understand how to pose their problems as semidefinite programs, and the lack of support for SDP solvers in popular modelling languages likely discourages submissions.
   −SIAM News, 2002. [79, p.9]

Consider a conic problem (p) and its dual (d): [219, §3.3.1] [176, §2.1] (confer p.140)

   (p)  minimize_x   c^T x          (d)  maximize_{y,s}   b^T y
        subject to   x ∈ K               subject to   s ∈ K*
                     Ax = b                           A^T y + s = c     (264)


where K is a closed convex cone, K* is its dual, matrix A is fixed, and the remaining quantities are vectors.

When K is a polyhedral cone (§2.12.1), then each conic problem becomes a linear program [64]; the self-dual nonnegative orthant cone providing the prototypical primal linear program. More generally, each optimization problem is convex when K is a closed convex cone. Unlike the optimal objective value, a solution to each problem is not necessarily unique; in other words, the optimal solution set {x⋆} or {y⋆, s⋆} is convex and may comprise more than a single point although the corresponding optimal objective value is unique when the feasible set is nonempty.

When K is the self-dual cone of positive semidefinite matrices in the subspace of symmetric matrices, then each conic problem is called a semidefinite program (SDP); [206, §6.4] primal problem (P) having matrix variable X ∈ S^n while corresponding dual (D) has matrix slack variable S ∈ S^n and vector variable y = [y_i] ∈ R^m : [8] [9, §2] [301, §1.3.8]

   (P)  minimize_{X∈S^n}   〈C , X〉      (D)  maximize_{y∈R^m, S∈S^n}   〈b , y〉
        subject to   X ≽ 0                    subject to   S ≽ 0
                     A svec X = b                          svec^{-1}(A^T y) + S = C     (547)

This is the prototypical semidefinite program and its dual, where matrix C ∈ S^n and vector b ∈ R^m are fixed, as is

   A ≜ [ svec(A_1)^T ; ... ; svec(A_m)^T ] ∈ R^{m×n(n+1)/2}     (548)

where A_i ∈ S^n, i = 1...m , are given. Thus

   A svec X = [ 〈A_1 , X〉 ; ... ; 〈A_m , X〉 ] ,   svec^{-1}(A^T y) = Σ_{i=1}^m y_i A_i     (549)

The vector inner-product for matrices is defined in the Euclidean/Frobenius sense in the isomorphic vector space R^{n(n+1)/2} ; id est,

   〈C , X〉 ≜ tr(C^T X) = svec(C)^T svec X     (31)


where svec X defined by (47) denotes symmetric vectorization.

Semidefinite programming has emerged recently to prominence primarily because it admits a new class of problem previously unsolvable by convex optimization techniques, [44] secondarily because it theoretically subsumes other convex techniques such as linear, quadratic, and second-order cone programming. Determination of the Riemann mapping function from complex analysis [213] [24, §8, §13], for example, can be posed as a semidefinite program.

4.1.1 Maximal complementarity

It has been shown that contemporary interior-point methods (developed circa 1990 [102]) [46, §11] [292] [216] [206] [9] [98] for numerical solution of semidefinite programs can converge to a solution of maximal complementarity; [126, §5] [300] [186] [109] not a vertex-solution but a solution of highest cardinality or rank among all optimal solutions.^{4.2} [301, §2.5.3]

4.1.1.1 Reduced-rank solution

A simple rank reduction algorithm, for construction of a primal optimal solution X⋆ to (547P) satisfying an upper bound on rank governed by Proposition 2.9.3.0.1, is presented in §4.3. That proposition asserts existence of feasible solutions with an upper bound on their rank; [20, §II.13.1] specifically, it asserts an extreme point (§2.6.0.0.1) of the primal feasible set 𝒜 ∩ S^n_+ satisfies upper bound

   rank X ≤ ⌊ (√(8m+1) − 1)/2 ⌋     (233)

where, given A ∈ R^{m×n(n+1)/2} and b ∈ R^m,

   𝒜 ≜ {X ∈ S^n | A svec X = b}     (550)

is the affine subset from primal problem (547P).

4.2 This characteristic might be regarded as a disadvantage to this method of numerical solution, but this behavior is not certain and depends on solver implementation.
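As a concrete sketch, prototype (547P) can be stated almost verbatim in Matlab using the CVX modeling package. This is an assumption of ours for illustration (any SDP solver interface would serve); the data here are randomly generated, with a positive definite cost to keep the program bounded and b built from a positive semidefinite point to guarantee feasibility:

    % Hypothetical sketch of prototypical primal (547P) in Matlab/CVX.
    n = 4;  m = 2;
    C  = randn(n); C  = C*C' + eye(n);        % positive definite cost matrix
    A1 = randn(n); A1 = (A1 + A1')/2;         % constraint matrices A_i
    A2 = randn(n); A2 = (A2 + A2')/2;
    X0 = randn(n); X0 = X0*X0';               % a psd point, making b feasible
    b  = [trace(A1*X0); trace(A2*X0)];
    cvx_begin sdp
        variable X(n,n) symmetric
        minimize( trace(C*X) )                % <C,X>
        subject to
            trace(A1*X) == b(1);              % <A_i,X> = b_i : A svec X = b
            trace(A2*X) == b(2);
            X >= 0;                           % X in the psd cone (sdp mode)
    cvx_end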


[Figure 64: Visualizing positive semidefinite cone in high dimension: Proper polyhedral cone 𝒮^3_+ ⊂ R^3 representing positive semidefinite cone S^3_+ ⊂ S^3 ; analogizing its intersection with hyperplane S^3_+ ∩ ∂H . Number of facets is arbitrary (analogy is not inspired by eigen decomposition). The rank-0 positive semidefinite matrix corresponds to the origin in R^3, rank-1 positive semidefinite matrices correspond to the edges of the polyhedral cone, rank-2 to the facet relative interiors, and rank-3 to the polyhedral cone interior. Vertices Γ_1 and Γ_2 are extreme points of polyhedron P = ∂H ∩ 𝒮^3_+ , and extreme directions of 𝒮^3_+ . A given vector C is normal to another hyperplane (not illustrated but independent w.r.t. ∂H) containing line segment Γ_1Γ_2 minimizing real linear function 〈C , X〉 on P . (confer Figure 19)]


4.1.1.2 Coexistence of low- and high-rank solutions; analogy

That low-rank and high-rank optimal solutions {X⋆} of (547P) coexist may be grasped with the following analogy: We compare a proper polyhedral cone 𝒮^3_+ in R^3 (illustrated in Figure 64) to the positive semidefinite cone S^3_+ in isometrically isomorphic R^6, difficult to visualize. The analogy is good:

   int S^3_+ is constituted by rank-3 matrices; int 𝒮^3_+ has three dimensions.

   Boundary ∂S^3_+ contains rank-0, rank-1, and rank-2 matrices; boundary ∂𝒮^3_+ contains 0-, 1-, and 2-dimensional faces.

   The only rank-0 matrix resides in the vertex at the origin.

   Rank-1 matrices are in one-to-one correspondence with extreme directions of S^3_+ and 𝒮^3_+ . The set of all rank-1 symmetric matrices in this dimension

      {G ∈ S^3_+ | rank G = 1}     (551)

   is not a connected set.

   In any SDP feasibility problem, an SDP feasible solution with the lowest rank must be an extreme point of the feasible set. Thus, there must exist an SDP objective function such that this lowest-rank feasible solution uniquely optimizes it. −Ye, 2006 [176, §2.4]

   Rank of a sum of members F + G in Lemma 2.9.2.6.1 and location of a difference F − G in §2.9.2.9.1 similarly hold for S^3_+ and 𝒮^3_+ .

   Euclidean distance from any particular rank-3 positive semidefinite matrix (in the cone interior) to the closest rank-2 positive semidefinite matrix (on the boundary) is generally less than the distance to the closest rank-1 positive semidefinite matrix. (§7.1.2)

   Distance from any point in ∂S^3_+ to int S^3_+ is infinitesimal (§2.1.7.1.1); distance from any point in ∂𝒮^3_+ to int 𝒮^3_+ is infinitesimal.


Faces of 𝒮^3_+ correspond to faces of S^3_+ (confer Table 2.9.2.3.1):

               k    dim F(𝒮^3_+)   dim F(S^3_+)   dim F(S^3_+ ∋ rank-k matrix)
               0        0              0              0
   boundary    1        1              1              1
               2        2              3              3
   interior    3        3              6              6

Integer k indexes k-dimensional faces F of 𝒮^3_+ . Positive semidefinite cone S^3_+ has four kinds of faces, including cone itself (k = 3, boundary + interior), whose dimensions in isometrically isomorphic R^6 are listed under dim F(S^3_+). Smallest face F(S^3_+ ∋ rank-k matrix) that contains a rank-k positive semidefinite matrix has dimension k(k+1)/2 by (192).

For 𝒜 equal to intersection of m hyperplanes having independent normals, and for X ∈ 𝒮^3_+ ∩ 𝒜 , we have rank X ≤ m ; the analogue to (233).

Proof. With reference to Figure 64: Assume one (m = 1) hyperplane 𝒜 = ∂H intersects the polyhedral cone. Every intersecting plane contains at least one matrix having rank less than or equal to 1 ; id est, from all X ∈ ∂H ∩ 𝒮^3_+ there exists an X such that rank X ≤ 1. Rank 1 is therefore an upper bound in this case.

Now visualize intersection of the polyhedral cone with two (m = 2) hyperplanes having linearly independent normals. The hyperplane intersection 𝒜 makes a line. Every intersecting line contains at least one matrix having rank less than or equal to 2, providing an upper bound. In other words, there exists a positive semidefinite matrix X belonging to any line intersecting the polyhedral cone such that rank X ≤ 2.

In the case of three independent intersecting hyperplanes (m = 3), the hyperplane intersection 𝒜 makes a point that can reside anywhere in the polyhedral cone. The upper bound on a point in 𝒮^3_+ is also the greatest upper bound: rank X ≤ 3. ∎


4.1.1.2.1 Example. Optimization on 𝒜 ∩ 𝒮^3_+ .
Consider minimization of the real linear function 〈C , X〉 on a polyhedral feasible set;

   P ≜ 𝒜 ∩ 𝒮^3_+     (552)

   f⋆_0 ≜ minimize_X 〈C , X〉   subject to X ∈ 𝒜 ∩ 𝒮^3_+     (553)

As illustrated for particular vector C and hyperplane 𝒜 = ∂H in Figure 64, this linear function is minimized (confer Figure 19) on any X belonging to the face of P containing extreme points {Γ_1 , Γ_2} and all the rank-2 matrices in between; id est, on any X belonging to the face of P

   F(P) = {X | 〈C , X〉 = f⋆_0} ∩ 𝒜 ∩ 𝒮^3_+     (554)

exposed by the hyperplane {X | 〈C , X〉 = f⋆_0}. In other words, the set of all optimal points X⋆ is a face of P

   {X⋆} = F(P) = Γ_1Γ_2     (555)

comprising rank-1 and rank-2 positive semidefinite matrices. Rank 1 is the upper bound on existence in the feasible set P for this case m = 1 hyperplane constituting 𝒜 . The rank-1 matrices Γ_1 and Γ_2 in face F(P) are extreme points of that face and (by transitivity (§2.6.1.2)) extreme points of the intersection P as well. As predicted by analogy to Barvinok's Proposition 2.9.3.0.1, the upper bound on rank of X existent in the feasible set P is satisfied by an extreme point. The upper bound on rank of an optimal solution X⋆ existent in F(P) is thereby also satisfied by an extreme point of P precisely because {X⋆} constitutes F(P) ;^{4.3} in particular,

   {X⋆ ∈ P | rank X⋆ ≤ 1} = {Γ_1 , Γ_2} ⊆ F(P)     (556)

As all linear functions on a polyhedron are minimized on a face, [64] [185] [205] [208] by analogy we so demonstrate coexistence of optimal solutions X⋆ of (547P) having assorted rank.

4.3 and every face contains a subset of the extreme points of P by the extreme existence theorem (§2.6.0.0.2). This means: because the affine subset 𝒜 and hyperplane {X | 〈C , X〉 = f⋆_0} must intersect a whole face of P , calculation of an upper bound on rank of X⋆ ignores counting the hyperplane when determining m in (233).


4.1.1.3 Previous work

Barvinok showed [21, §2.2] when given a positive definite matrix C and an arbitrarily small neighborhood of C comprising positive definite matrices, there exists a matrix C̃ from that neighborhood such that optimal solution X⋆ to (547P) (substituting C̃) is an extreme point of 𝒜 ∩ S^n_+ and satisfies upper bound (233).^{4.4} Given arbitrary positive definite C , this means nothing inherently guarantees that an optimal solution X⋆ to problem (547P) satisfies (233); certainly nothing given any symmetric matrix C , as the problem is posed. This can be proved by example:

4.1.1.3.1 Example. (Ye) Maximal Complementarity.
Assume dimension n to be an even positive number. Then the particular instance of problem (547P),

   minimize_{X∈S^n}   〈 [ I  0
                          0  2I ] , X 〉
   subject to   X ≽ 0
                〈I , X〉 = n     (557)

has optimal solution

   X⋆ = [ 2I  0
          0   0 ] ∈ S^n     (558)

with an equal number of twos and zeros along the main diagonal. Indeed, optimal solution (558) is a terminal solution along the central path taken by the interior-point method as implemented in [301, §2.5.3]; it is also a solution of highest rank among all optimal solutions to (557). Clearly, rank of this primal optimal solution exceeds by far a rank-1 solution predicted by upper bound (233).

4.1.1.4 Later developments

This rational example (557) indicates the need for a more generally applicable and simple algorithm to identify an optimal solution X⋆ satisfying Barvinok's Proposition 2.9.3.0.1. We will review such an algorithm in §4.3, but first we provide more background.

4.4 Further, the set of all such C̃ in that neighborhood is open and dense.
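Ye's example is easily checked numerically. A short sketch in base Matlab, with n = 6 chosen arbitrarily:

    % Numeric check of example (557)/(558) for any even n.
    n  = 6;
    C  = blkdiag(eye(n/2), 2*eye(n/2));      % cost matrix [I 0; 0 2I]
    Xs = blkdiag(2*eye(n/2), zeros(n/2));    % claimed optimum (558)
    [trace(Xs)  trace(C*Xs)]                 % <I,X*> = n (feasible); value n,
                                             % the optimum since lambda_min(C) = 1
    rank(Xs)                                 % = n/2, far above the bound:
    floor((sqrt(8*1+1)-1)/2)                 % (233) with m = 1 predicts rank 1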


4.2 Framework

4.2.1 Feasible sets

Denote by 𝒞 and 𝒞* the convex sets of primal and dual points respectively satisfying the primal and dual constraints in (547), each assumed nonempty;

   𝒞 = { X ∈ S^n_+ | [ 〈A_1 , X〉 ; ... ; 〈A_m , X〉 ] = b } = 𝒜 ∩ S^n_+
   𝒞* = { S ∈ S^n_+ , y = [y_i] ∈ R^m | Σ_{i=1}^m y_i A_i + S = C }     (559)

These are the primal feasible set and dual feasible set in domain intersection of the respective constraint functions. Geometrically, primal feasible 𝒜 ∩ S^n_+ represents an intersection of the positive semidefinite cone S^n_+ with an affine subset 𝒜 of the subspace of symmetric matrices S^n in isometrically isomorphic R^{n(n+1)/2}. The affine subset has dimension n(n+1)/2 − m when the A_i are linearly independent. Dual feasible set 𝒞* is the Cartesian product of the positive semidefinite cone with its inverse image (§2.1.9.0.1) under affine transformation C − Σ y_i A_i .^{4.5} Both sets are closed and convex and the objective functions on a Euclidean vector space are linear, hence (547P) and (547D) are convex optimization problems.

4.2.1.1 𝒜 ∩ S^n_+ emptiness determination via Farkas' lemma

4.2.1.1.1 Lemma. Semidefinite Farkas' lemma.
Given an arbitrary set {A_i ∈ S^n , i = 1...m} and a vector b = [b_i] ∈ R^m, define the affine subset

   𝒜 = {X ∈ S^n | 〈A_i , X〉 = b_i , i = 1...m}     (550)

Primal feasible set 𝒜 ∩ S^n_+ is nonempty if and only if y^T b ≥ 0 holds for each and every vector y = [y_i] ∈ R^m such that Σ_{i=1}^m y_i A_i ≽ 0.

4.5 The inequality C − Σ y_i A_i ≽ 0 follows directly from (547D) (§2.9.0.1.1) and is known as a linear matrix inequality. (§2.13.5.1.1) Because Σ y_i A_i ≼ C , matrix S is known as a slack variable (a term borrowed from linear programming [64]) since its inclusion raises this inequality to equality.


Equivalently, primal feasible set 𝒜 ∩ S^n_+ is nonempty if and only if y^T b ≥ 0 holds for each and every norm-1 vector ‖y‖ = 1 such that Σ_{i=1}^m y_i A_i ≽ 0.   ⋄

Semidefinite Farkas' lemma follows directly from a membership relation (§2.13.2.0.1) and the closed convex cones from linear matrix inequality example 2.13.5.1.1; given convex cone K and its dual

   K = {A svec X | X ≽ 0}     (325)
   K* = {y | Σ_{j=1}^m y_j A_j ≽ 0}     (331)

where

   A = [ svec(A_1)^T ; ... ; svec(A_m)^T ] ∈ R^{m×n(n+1)/2}     (548)

then we have membership relation

   b ∈ K ⇔ 〈y , b〉 ≥ 0 ∀y ∈ K*     (277)

and equivalents

   b ∈ K ⇔ ∃X ≽ 0 such that A svec X = b ⇔ 𝒜 ∩ S^n_+ ≠ ∅     (560)
   b ∈ K ⇔ 〈y , b〉 ≥ 0 ∀y ∈ K* ⇔ 𝒜 ∩ S^n_+ ≠ ∅     (561)

Semidefinite Farkas' lemma provides the conditions required for a set of hyperplanes to have a nonempty intersection 𝒜 ∩ S^n_+ with the positive semidefinite cone. While the lemma as stated is correct, Ye points out [301, §1.3.8] that a positive definite version of this lemma is required for semidefinite programming because any feasible point in the relative interior 𝒜 ∩ int S^n_+ is required by Slater's condition^{4.6} to achieve 0 duality gap (primal−dual objective difference, §4.2.3, Figure 47). In our circumstance, assuming a nonempty intersection, a positive definite lemma is required to insure a point of intersection closest to the origin is not at infinity;

4.6 Slater's sufficient condition is satisfied whenever any primal strictly feasible point exists; id est, any point feasible with the affine equality (or affine inequality) constraint functions and relatively interior to convex cone K . If cone K is polyhedral, then Slater's condition is satisfied when any feasible point exists relatively interior to K or on its relative boundary. [46, §5.2.3] [29, p.325]


e.g., Figure 35. Then given A ∈ R^{m×n(n+1)/2} having rank m , we wish to detect existence of a nonempty relative interior of the primal feasible set;^{4.7}

   b ∈ int K ⇔ 〈y , b〉 > 0 ∀y ∈ K*, y ≠ 0 ⇔ 𝒜 ∩ int S^n_+ ≠ ∅     (562)

A positive definite Farkas' lemma can easily be constructed from this membership relation (283) and these proper convex cones K (325) and K* (331):

4.2.1.1.2 Lemma. Positive definite Farkas' lemma.
Given a linearly independent set {A_i ∈ S^n , i = 1...m} and a vector b = [b_i] ∈ R^m, define the affine subset

   𝒜 = {X ∈ S^n | 〈A_i , X〉 = b_i , i = 1...m}     (550)

Primal feasible set relative interior 𝒜 ∩ int S^n_+ is nonempty if and only if y^T b > 0 holds for each and every vector y = [y_i] ≠ 0 such that Σ_{i=1}^m y_i A_i ≽ 0.

Equivalently, primal feasible set relative interior 𝒜 ∩ int S^n_+ is nonempty if and only if y^T b > 0 holds for each and every norm-1 vector ‖y‖ = 1 such that Σ_{i=1}^m y_i A_i ≽ 0.   ⋄

4.2.1.1.3 Example. "New" Farkas' lemma.
In 1995, Lasserre [168, §III] presented an example originally offered by Ben-Israel in 1969 [26, p.378] as evidence of failure in semidefinite Farkas' Lemma 4.2.1.1.1:

   A ≜ [ svec(A_1)^T
         svec(A_2)^T ] = [ 0 1 0
                           0 0 1 ] ,   b = [ 1
                                             0 ]     (563)

The intersection 𝒜 ∩ S^n_+ is practically empty because the solution set

   {X ≽ 0 | A svec X = b} = { [ α     1/√2
                                1/√2  0    ] ≽ 0 | α ∈ R }     (564)

4.7 Detection of 𝒜 ∩ int S^n_+ by examining K interior is a trick need not be lost.


is positive semidefinite only asymptotically (α → ∞). Yet the dual system Σ_{i=1}^m y_i A_i ≽ 0 ⇒ y^T b ≥ 0 indicates nonempty intersection; videlicet, for ‖y‖ = 1

   y_1 [ 0     1/√2
         1/√2  0    ] + y_2 [ 0 0
                              0 1 ] ≽ 0 ⇔ y = [ 0
                                                1 ] ⇒ y^T b = 0     (565)

On the other hand, positive definite Farkas' Lemma 4.2.1.1.2 shows 𝒜 ∩ int S^n_+ is empty; what we need to know for semidefinite programming.

Based on Ben-Israel's example, Lasserre suggested addition of another condition to semidefinite Farkas' Lemma 4.2.1.1.1 to make a "new" lemma. Ye recommends positive definite Farkas' Lemma 4.2.1.1.2 instead; which is simpler and obviates Lasserre's proposed additional condition.

4.2.1.2 Theorem of the alternative for semidefinite programming

Because these Farkas' lemmas follow from membership relations, we may construct alternative systems from them. Applying the method of §2.13.2.1.1, then from positive definite Farkas' lemma, for example, we get

   𝒜 ∩ int S^n_+ ≠ ∅
   or in the alternative
   y^T b ≤ 0 ,  Σ_{i=1}^m y_i A_i ≽ 0 ,  y ≠ 0     (566)

Any single vector y satisfying the alternative certifies 𝒜 ∩ int S^n_+ is empty. Such a vector can be found as a solution to another semidefinite program: for linearly independent set {A_i ∈ S^n , i = 1...m}

   minimize_y   y^T b
   subject to   Σ_{i=1}^m y_i A_i ≽ 0
                ‖y‖_2 ≤ 1     (567)

If an optimal vector y⋆ ≠ 0 can be found such that y⋆^T b ≤ 0, then relative interior of the primal feasible set 𝒜 ∩ int S^n_+ from (559) is empty.
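For Ben-Israel's data (563), certificate program (567) can be sketched in Matlab with the CVX modeling package (an assumption of ours; any SDP interface serves):

    % Hypothetical sketch of certificate program (567) on data (563)/(565).
    A1 = [0 1/sqrt(2); 1/sqrt(2) 0];   % svec(A1)' = [0 1 0]
    A2 = [0 0; 0 1];                   % svec(A2)' = [0 0 1]
    b  = [1; 0];
    cvx_begin sdp
        variable y(2)
        minimize( y'*b )
        subject to
            y(1)*A1 + y(2)*A2 >= 0;    % sum y_i A_i psd
            norm(y) <= 1;
    cvx_end
    % y = [0; 1] is optimal with y'*b = 0 <= 0 and y ~= 0, certifying
    % that the relative interior of the primal feasible set is empty.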


4.2.1.3 Boundary-membership criterion

(confer (561) (562)) From boundary-membership relation (287) for proper cones and from linear matrix inequality cones K (325) and K* (331)

   b ∈ ∂K ⇔ ∃y (〈y , b〉 = 0 , y ∈ K* , y ≠ 0 , b ∈ K) ⇔ ∂S^n_+ ⊃ 𝒜 ∩ S^n_+ ≠ ∅     (568)

Whether vector b ∈ ∂K belongs to cone K boundary, that is a determination we can indeed make; one that is certainly expressible as a feasibility problem: assuming b ∈ K (560), given linearly independent set {A_i ∈ S^n , i = 1...m}^{4.8}

   find   y ≠ 0
   subject to   y^T b = 0
                Σ_{i=1}^m y_i A_i ≽ 0     (569)

Any such feasible vector y ≠ 0 certifies that affine subset 𝒜 (550) intersects the positive semidefinite cone S^n_+ only on its boundary; in other words, nonempty feasible set 𝒜 ∩ S^n_+ belongs to the positive semidefinite cone boundary ∂S^n_+ .

4.2.2 Duals

The dual objective function evaluated at any feasible point represents a lower bound on the primal optimal objective value. We can see this by direct substitution: Assume the feasible sets 𝒜 ∩ S^n_+ and 𝒞* are nonempty. Then it is always true:

   〈C , X〉 ≥ 〈b , y〉
   ⇔ 〈 Σ_i y_i A_i + S , X 〉 ≥ [ 〈A_1 , X〉 · · · 〈A_m , X〉 ] y
   ⇔ 〈S , X〉 ≥ 0     (570)

The converse also follows because

   X ≽ 0 , S ≽ 0 ⇒ 〈S , X〉 ≥ 0     (1311)

4.8 From the results of Example 2.13.5.1.1, vector b on the boundary of K cannot be detected simply by looking for 0 eigenvalues in matrix X .


Optimal value of the dual objective thus represents the greatest lower bound on the primal. This fact is known as the weak duality theorem for semidefinite programming, [301, §1.3.8] and can be used to detect convergence in any primal/dual numerical method of solution.

4.2.3 Optimality conditions

When any primal feasible point exists relatively interior to 𝒜 ∩ S^n_+ in S^n, or when any dual feasible point exists relatively interior to 𝒞* in S^n × R^m, then by Slater's sufficient condition these two problems (547P) and (547D) become strong duals. In other words, the primal optimal objective value becomes equivalent to the dual optimal objective value: there is no duality gap (Figure 47); id est, if ∃X ∈ 𝒜 ∩ int S^n_+ or ∃S, y ∈ rel int 𝒞* then

   〈C , X⋆〉 = 〈b , y⋆〉
   ⇔ 〈 Σ_i y⋆_i A_i + S⋆ , X⋆ 〉 = [ 〈A_1 , X⋆〉 · · · 〈A_m , X⋆〉 ] y⋆
   ⇔ 〈S⋆ , X⋆〉 = 0     (571)

where S⋆, y⋆ denote a dual optimal solution.^{4.9} We summarize this:

4.2.3.0.1 Corollary. Optimality and strong duality. [271, §3.1] [301, §1.3.8]
For semidefinite programs (547P) and (547D), assume primal and dual feasible sets 𝒜 ∩ S^n_+ ⊂ S^n and 𝒞* ⊂ S^n × R^m (559) are nonempty. Then

   X⋆ is optimal for (P)
   S⋆, y⋆ are optimal for (D)
   the duality gap 〈C , X⋆〉 − 〈b , y⋆〉 is 0

if and only if

   i) ∃X ∈ 𝒜 ∩ int S^n_+ or ∃S, y ∈ rel int 𝒞*
and
   ii) 〈S⋆ , X⋆〉 = 0   ⋄

4.9 Optimality condition 〈S⋆ , X⋆〉 = 0 is called a complementary slackness condition, in keeping with the tradition of linear programming, [64] that forbids dual inequalities in (547) to simultaneously hold strictly. [231, §4]


For symmetric positive semidefinite matrices, requirement ii is equivalent to the complementarity (§A.7.4)

   〈S⋆ , X⋆〉 = 0 ⇔ S⋆X⋆ = X⋆S⋆ = 0     (572)

Commutativity of diagonalizable matrices is a necessary and sufficient condition [150, §1.3.12] for these two optimal symmetric matrices to be simultaneously diagonalizable. Therefore

   rank X⋆ + rank S⋆ ≤ n     (573)

Proof. To see that, the product of symmetric optimal matrices X⋆, S⋆ ∈ S^n must itself be symmetric because of commutativity. (1305) The symmetric product has diagonalization [9, cor.2.11]

   S⋆X⋆ = X⋆S⋆ = QΛ_{S⋆}Λ_{X⋆}Q^T = 0 ⇔ Λ_{X⋆}Λ_{S⋆} = 0     (574)

where Q is an orthogonal matrix. The product of the nonnegative diagonal Λ matrices can be 0 if their main diagonal zeros are complementary or coincide. Due only to symmetry, rank X⋆ = rank Λ_{X⋆} and rank S⋆ = rank Λ_{S⋆} for these optimal primal and dual solutions. (1290) So, because of the complementarity, the total number of nonzero diagonal entries from both Λ cannot exceed n . ∎

When equality is attained in (573)

   rank X⋆ + rank S⋆ = n     (575)

there are no coinciding main diagonal zeros in Λ_{X⋆}Λ_{S⋆} , and so we have what is called strict complementarity.^{4.10} Logically it follows that a necessary and sufficient condition for strict complementarity of an optimal primal and dual solution is

   X⋆ + S⋆ ≻ 0     (576)

4.2.3.1 Solving primal problem via dual

The beauty of Corollary 4.2.3.0.1 is its conjugacy; id est, one can solve either the primal or dual problem in (547) and then find a solution to the other via the optimality conditions. When a dual optimal solution is known, for example, a primal optimal solution belongs to the hyperplane {X | 〈S⋆ , X〉 = 0}.

4.10 distinct from maximal complementarity (§4.1.1).
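Complementarity conditions (572) through (576) are easy to check numerically. A base-Matlab sketch on a hand-built complementary pair in S^3 (illustrative only; not produced by actually solving an SDP):

    % Complementarity checks on a simultaneously diagonalizable pair.
    Q  = orth(randn(3));                 % shared eigenvectors
    Xs = Q*diag([2 1 0])*Q';             % rank-2 stand-in for X*
    Ss = Q*diag([0 0 3])*Q';             % rank-1 stand-in for S*
    norm(Ss*Xs)                          % ~ 0 : S*X* = X*S* = 0  (572)
    rank(Xs) + rank(Ss)                  % = 3 = n : equality in (573), (575)
    min(eig(Xs + Ss)) > 0                % strict complementarity (576)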


4.2.3.1.1 Example. Minimum cardinality Boolean. [63] [27, §4.3.4] [262] (confer Example 4.5.1.0.1)
Consider finding a minimum cardinality Boolean solution x to the classic linear algebra problem Ax = b given noiseless data A ∈ R^{m×n} and b ∈ R^m ;

   minimize_x   ‖x‖_0
   subject to   Ax = b
                x_i ∈ {0, 1} , i = 1...n     (577)

where ‖x‖_0 denotes cardinality of vector x (a.k.a. 0-norm; not a convex function).

A minimum cardinality solution answers the question: "Which fewest linear combination of columns in A constructs vector b?" Cardinality problems have extraordinarily wide appeal, arising in many fields of science and across many disciplines. [240] [157] [120] [121] Yet designing an efficient algorithm to optimize cardinality has proved difficult. In this example, we also constrain the variable to be Boolean. The Boolean constraint forces an identical solution were the norm in problem (577) instead the 1-norm or 2-norm; id est, the two problems

   minimize_x   ‖x‖_0                  minimize_x   ‖x‖_1
   subject to   Ax = b            =    subject to   Ax = b
                x_i ∈ {0,1} , i=1...n               x_i ∈ {0,1} , i=1...n     (578)

are the same. The Boolean constraint makes the 1-norm problem nonconvex.

Given data^{4.11}

   A = [ −1  1  8   1    1    0
         −3  2  8  1/2  1/3  1/2−1/3
         −9  4  8  1/4  1/9  1/4−1/9 ] ,   b = [ 1 ; 1/2 ; 1/4 ]     (579)

the obvious and desired solution to the problem posed,

   x⋆ = e_4 ∈ R^6     (580)

has norm ‖x⋆‖_2 = 1 and minimum cardinality; the minimum number of nonzero entries in vector x . The Matlab backslash command x=A\b ,

4.11 This particular matrix A is full-rank having three-dimensional nullspace (but the columns are not conically independent).


for example, finds

   x_M = [ 2/128 ; 0 ; 5/128 ; 0 ; 90/128 ; 0 ]     (581)

having norm ‖x_M‖_2 = 0.7044. Coincidentally, x_M is a 1-norm solution; id est, an optimal solution to

   minimize_x   ‖x‖_1
   subject to   Ax = b     (582)

The pseudoinverse solution (rounded)

   x_P = A†b = [ −0.0456 ; −0.1881 ; 0.0623 ; 0.2668 ; 0.3770 ; −0.1102 ]     (583)

has least norm ‖x_P‖_2 = 0.5165 ; id est, the optimal solution to

   minimize_x   ‖x‖_2
   subject to   Ax = b     (584)

Certainly, none of the traditional methods provide x⋆ = e_4 (580).

We can reformulate this minimum cardinality Boolean problem (577) as a semidefinite program: First transform the variable

   x ≜ (x̂ + 1)/2     (585)

so x̂_i ∈ {−1, 1} ; equivalently,

   minimize_x̂   ‖(x̂ + 1)/2‖_0
   subject to   A(x̂ + 1)/2 = b
                δ(x̂x̂^T) = 1     (586)


where δ is the main-diagonal linear operator (§A.1). By assigning (§B.1)

   G = [ x̂
         1 ] [ x̂^T 1 ] = [ X    x̂
                           x̂^T  1 ] ≜ [ x̂x̂^T  x̂
                                        x̂^T   1 ] ∈ S^{n+1}     (587)

problem (586) becomes equivalent to: (Theorem A.3.1.0.7)

   minimize_{X∈S^n, x̂∈R^n}   1^T x̂
   subject to   A(x̂ + 1)/2 = b
                G = [ X    x̂
                      x̂^T  1 ]   (G ≽ 0)
                δ(X) = 1
                rank G = 1     (588)

where solution is confined to rank-1 vertices of the elliptope in S^{n+1} (§5.9.1.0.1) by the rank constraint, the positive semidefiniteness, and the equality constraints δ(X) = 1. The rank constraint makes this problem nonconvex; by removing it^{4.12} we get the semidefinite program

   minimize_{X∈S^n, x̂∈R^n}   1^T x̂
   subject to   A(x̂ + 1)/2 = b
                G = [ X    x̂
                      x̂^T  1 ] ≽ 0
                δ(X) = 1     (589)

whose optimal solution x⋆ (585) is identical to that of minimum cardinality Boolean problem (577) if and only if rank G⋆ = 1. Hope^{4.13} of acquiring a rank-1 solution is not ill-founded because 2^n elliptope vertices have rank 1, and we are minimizing an affine function on a subset of the elliptope (Figure 92) containing rank-1 vertices; id est, by assumption that the feasible set of minimum cardinality Boolean problem (577) is nonempty,

4.12 Relaxed problem (589) can also be derived via Lagrange duality; it is a dual of a dual program [sic] to (588). [229] [46, §5, exer.5.39] [287, §IV] [101, §11.3.4] The relaxed problem must therefore be convex having a larger feasible set; its optimal objective value represents a generally loose lower bound (1488) on the optimal objective of problem (588).

4.13 A more deterministic approach to constraining rank and cardinality is developed in §4.6.0.0.7.


a desired solution resides on the elliptope relative boundary at a rank-1 vertex.^{4.14}

For the data given in (579), our semidefinite program solver (accurate to approximately 1E-8)^{4.15} finds optimal solution to (589)

   round(G⋆) = [  1  1  1 −1  1  1 −1
                  1  1  1 −1  1  1 −1
                  1  1  1 −1  1  1 −1
                 −1 −1 −1  1 −1 −1  1
                  1  1  1 −1  1  1 −1
                  1  1  1 −1  1  1 −1
                 −1 −1 −1  1 −1 −1  1 ]     (590)

near a rank-1 vertex of the elliptope in S^{n+1} ; its sorted eigenvalues,

   λ(G⋆) = [ 6.99999977799099 ; 0.00000022687241 ; 0.00000002250296 ;
             0.00000000262974 ; −0.00000000999738 ; −0.00000000999875 ;
             −0.00000001000000 ]     (591)

The negative eigenvalues are undoubtedly finite-precision effects. Because the largest eigenvalue predominates by many orders of magnitude, we can expect to find a good approximation to a minimum cardinality Boolean solution by truncating all smaller eigenvalues. By so doing we find, indeed,

   x⋆ = round( [ 0.00000000127947 ; 0.00000000527369 ; 0.00000000181001 ;
                 0.99999997469044 ; 0.00000001408950 ; 0.00000000482903 ] ) = e_4     (592)

the desired result (580).

4.14 Confinement to the elliptope can be regarded as a kind of normalization akin to matrix A column normalization suggested in [82] and explored in Example 4.2.3.1.2.

4.15 A typically ignored limitation of interior-point methods of solution is their relative accuracy of only about 1E-8 on a machine using 64-bit (double precision) floating-point arithmetic; id est, optimal solution cannot be more accurate than square root of machine epsilon (2.2204E-16).
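The computations in this example are easily reproduced. The following sketch uses base Matlab for the traditional solutions and the CVX modeling package (an assumption of ours; any SDP interface serves) for relaxation (589); exact digits vary with solver and Matlab version:

    % Data (579); column 4 of A equals b, so x = e4 solves Ax = b.
    A = [-1 1 8  1   1   0       ;
         -3 2 8 1/2 1/3 1/2-1/3  ;
         -9 4 8 1/4 1/9 1/4-1/9 ];
    b = [1; 1/2; 1/4];
    xM = A\b;        norm(xM)          % basic solution, norm 0.7044 (581)
    xP = pinv(A)*b;  norm(xP)          % least 2-norm, norm 0.5165 (583)
    n = 6;
    cvx_begin sdp                      % semidefinite relaxation (589)
        variable X(n,n) symmetric
        variable xh(n)
        minimize( sum(xh) )            % 1'*xhat
        subject to
            A*(xh + 1)/2 == b;
            [X xh; xh' 1] >= 0;        % G psd (587)
            diag(X) == 1;              % delta(X) = 1 : elliptope
    cvx_end
    x = round((xh + 1)/2)              % recover Boolean x via (585); expect e4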


4.2.3.1.2 Example. Optimization on elliptope versus 1-norm polyhedron, for minimum cardinality Boolean Example 4.2.3.1.1.
A minimum cardinality problem is typically formulated via, what is by now, the standard practice [82] of column normalization applied to a 1-norm problem surrogate like (582). Suppose we define a diagonal matrix

   Λ ≜ [ ‖A(:,1)‖_2                              0
                      ‖A(:,2)‖_2
                                   ⋱
         0                             ‖A(:,6)‖_2 ] ∈ S^6     (593)

used to normalize the columns (assumed nonzero) of given noiseless data matrix A . Then approximate the minimum cardinality Boolean problem

   minimize_x   ‖x‖_0
   subject to   Ax = b
                x_i ∈ {0, 1} , i = 1...n     (577)

as

   minimize_ỹ   ‖ỹ‖_1
   subject to   AΛ^{-1}ỹ = b
                1 ≽ Λ^{-1}ỹ ≽ 0     (594)

where optimal solution

   y⋆ = round(Λ^{-1}ỹ⋆)     (595)

The inequality in (594) relaxes Boolean constraint y_i ∈ {0, 1} from (577); serving to bound any solution y⋆ to a unit cube whose vertices are binary numbers. Convex problem (594) is justified by the convex envelope

   cenv ‖x‖_0 on {x ∈ R^n | ‖x‖_∞ ≤ κ} = (1/κ)‖x‖_1     (1197)

Donoho concurs with this particular formulation equivalently expressible as a linear program via (431).

Approximation (594) is therefore equivalent to minimization of an affine function on a bounded polyhedron, whereas semidefinite program

   minimize_{X∈S^n, x̂∈R^n}   1^T x̂
   subject to   A(x̂ + 1)/2 = b
                G = [ X    x̂
                      x̂^T  1 ] ≽ 0
                δ(X) = 1     (589)


minimizes an affine function on the elliptope intersected by hyperplanes. Although the same Boolean solution is obtained from this approximation (594) as compared with semidefinite program (589) when given that particular data from Example 4.2.3.1.1, Singer confides a counter-example: Instead, given data

   A = [ 1  0  1/√2
         0  1  1/√2 ] ,   b = [ 1
                                1 ]     (596)

then solving approximation (594) yields

   y⋆ = round( [ 1 − 1/√2
                 1 − 1/√2
                 1        ] ) = [ 0
                                  0
                                  1 ]     (597)

(infeasible, with or without rounding, with respect to original problem (577)) whereas solving semidefinite program (589) produces

   round(G⋆) = [  1  1 −1  1
                  1  1 −1  1
                 −1 −1  1 −1
                  1  1 −1  1 ]     (598)

with sorted eigenvalues

   λ(G⋆) = [ 3.99999965057264 ; 0.00000035942736 ; −0.00000000000000 ; −0.00000001000000 ]     (599)

Truncating all but the largest eigenvalue, from (585) we obtain (confer y⋆)

   x⋆ = round( [ 0.99999999625299 ; 0.99999999625299 ; 0.00000001434518 ] ) = [ 1 ; 1 ; 0 ]     (600)

the desired minimum cardinality Boolean result.

We leave pending a general performance assessment of standard-practice approximation (594) as compared with our proposed semidefinite program (589).
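Singer's counter-example is verified in one or two lines of base Matlab:

    % Data (596): rounded surrogate answer is infeasible; [1;1;0] is not.
    A = [1 0 1/sqrt(2); 0 1 1/sqrt(2)];
    b = [1; 1];
    A*[0;0;1]    % = [0.7071; 0.7071] ~= b : y* of (597) is infeasible
    A*[1;1;0]    % = [1; 1] = b : minimum cardinality Boolean solution (600)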


4.3 Rank reduction

   ...it is not clear generally how to predict rank X⋆ or rank S⋆ before solving the SDP problem.
   −Farid Alizadeh (1995) [9, p.22]

The premise of rank reduction in semidefinite programming is: an optimal solution found does not satisfy Barvinok's upper bound (233) on rank. The particular numerical algorithm solving a semidefinite program may have instead returned a high-rank optimal solution (§4.1.1; e.g., (558)) when a lower-rank optimal solution was expected.

4.3.1 Posit a perturbation of X⋆

Recall from §4.1.1.1, there is an extreme point of 𝒜 ∩ S^n_+ (550) satisfying upper bound (233) on rank. [21, §2.2] It is therefore sufficient to locate an extreme point of the intersection whose primal objective value (547P) is optimal:^{4.16} [77, §31.5.3] [176, §2.4] [5, §3] [217]

Consider again the affine subset

   𝒜 = {X ∈ S^n | A svec X = b}     (550)

where for A_i ∈ S^n

   A ≜ [ svec(A_1)^T ; ... ; svec(A_m)^T ] ∈ R^{m×n(n+1)/2}     (548)

Given any optimal solution X⋆ to

   minimize_{X∈S^n}   〈C , X〉
   subject to   X ∈ 𝒜 ∩ S^n_+     (547P)

4.16 There is no known construction for Barvinok's tighter result (238). −Monique Laurent


whose rank does not satisfy upper bound (233), we posit existence of a set of perturbations

   {t_j B_j | t_j ∈ R , B_j ∈ S^n , j = 1...n}     (601)

such that, for some 0 ≤ i ≤ n and scalars {t_j , j = 1...i} ,

   X⋆ + Σ_{j=1}^i t_j B_j     (602)

becomes an extreme point of 𝒜 ∩ S^n_+ and remains an optimal solution of (547P). Membership of (602) to affine subset 𝒜 is secured for the i-th perturbation by demanding

   〈B_i , A_j〉 = 0 ,  j = 1...m     (603)

while membership to the positive semidefinite cone S^n_+ is insured by small perturbation (612). In this manner feasibility is insured. Optimality is proved in §4.3.3.

The following simple algorithm has very low computational intensity and locates an optimal extreme point, assuming a nontrivial solution:

4.3.1.0.1 Procedure. Rank reduction. (§F.4)

   initialize: B_i = 0 ∀i
   for iteration i = 1...n
   {
      1. compute a nonzero perturbation matrix B_i of X⋆ + Σ_{j=1}^{i−1} t_j B_j
      2. maximize t_i
         subject to X⋆ + Σ_{j=1}^i t_j B_j ∈ S^n_+
   }


A rank-reduced optimal solution is then

   X⋆ ← X⋆ + Σ_{j=1}^i t_j B_j     (604)

4.3.2 Perturbation form

The perturbations are independent of constants C ∈ S^n and b ∈ R^m in primal and dual programs (547). Numerical accuracy of any rank-reduced result, found by perturbation of an initial optimal solution X⋆, is therefore quite dependent upon initial accuracy of X⋆.

4.3.2.0.1 Definition. Matrix step function. (confer §A.6.5.0.1)
Define the signum-like quasiconcave real function ψ : S^n → R

   ψ(Z) ≜ {  1 ,  Z ≽ 0
            −1 ,  otherwise     (605)

The value −1 is taken for indefinite or nonzero negative semidefinite argument.   △

Deza & Laurent [77, §31.5.3] prove: every perturbation matrix B_i , i = 1...n , is of the form

   B_i = −ψ(Z_i) R_i Z_i R_i^T ∈ S^n     (606)

where

   X⋆ ≜ R_1 R_1^T ,   X⋆ + Σ_{j=1}^{i−1} t_j B_j ≜ R_i R_i^T ∈ S^n     (607)

where the t_j are scalars and R_i ∈ R^{n×ρ} is full-rank and skinny where

   ρ ≜ rank( X⋆ + Σ_{j=1}^{i−1} t_j B_j )     (608)

and where matrix Z_i ∈ S^ρ is found at each iteration i by solving a very


simple feasibility problem:^{4.17}

   find   Z_i ∈ S^ρ
   subject to   〈Z_i , R_i^T A_j R_i〉 = 0 ,  j = 1...m     (609)

Were there a sparsity pattern common to each member of the set {R_i^T A_j R_i ∈ S^ρ , j = 1...m} , then a good choice for Z_i has 1 in each entry corresponding to a 0 in the pattern; id est, a sparsity pattern complement. At iteration i

   X⋆ + Σ_{j=1}^{i−1} t_j B_j + t_i B_i = R_i (I − t_i ψ(Z_i) Z_i) R_i^T     (610)

By fact (1279), therefore

   X⋆ + Σ_{j=1}^{i−1} t_j B_j + t_i B_i ≽ 0 ⇔ 1 − t_i ψ(Z_i) λ(Z_i) ≽ 0     (611)

where λ(Z_i) ∈ R^ρ denotes the eigenvalues of Z_i .

Maximization of each t_i in step 2 of the Procedure reduces rank of (610) and locates a new point on the boundary ∂(𝒜 ∩ S^n_+) .^{4.18} Maximization of t_i thereby has closed form;

4.17 A simple method of solution is closed-form projection of a random nonzero point on that proper subspace of isometrically isomorphic R^{ρ(ρ+1)/2} specified by the constraints. (§E.5.0.0.6) Such a solution is nontrivial assuming the specified intersection of hyperplanes is not the origin; guaranteed by ρ(ρ+1)/2 > m . Indeed, this geometric intuition about forming the perturbation is what bounds any solution's rank from below; m is fixed by the number of equality constraints in (547P) while rank ρ decreases with each iteration i . Otherwise, we might iterate indefinitely.

4.18 This holds because rank of a positive semidefinite matrix in S^n is diminished below n by the number of its 0 eigenvalues (1290), and because a positive semidefinite matrix having one or more 0 eigenvalues corresponds to a point on the PSD cone boundary (163). Necessity and sufficiency are due to the facts: R_i can be completed to a nonsingular matrix (§A.3.1.0.5), and I − t_i ψ(Z_i) Z_i can be padded with zeros while maintaining equivalence in (610).


   (t⋆_i)^{-1} = max{ ψ(Z_i) λ(Z_i)_j , j = 1...ρ }     (612)

When Z_i is indefinite, the direction of perturbation (determined by ψ(Z_i)) is arbitrary. We may take an early exit from the Procedure were Z_i to become 0 or were

   rank[ svec R_i^T A_1 R_i   svec R_i^T A_2 R_i  · · ·  svec R_i^T A_m R_i ] = ρ(ρ+1)/2     (613)

which characterizes the rank ρ of any [sic] extreme point in 𝒜 ∩ S^n_+ . [176, §2.4]

Proof. Assuming the form of every perturbation matrix is indeed (606), then by (609)

   svec Z_i ⊥ [ svec(R_i^T A_1 R_i)  svec(R_i^T A_2 R_i)  · · ·  svec(R_i^T A_m R_i) ]     (614)

By orthogonal complement we have

   rank[ svec(R_i^T A_1 R_i) · · · svec(R_i^T A_m R_i) ]^⊥
      + rank[ svec(R_i^T A_1 R_i) · · · svec(R_i^T A_m R_i) ] = ρ(ρ+1)/2     (615)

When Z_i can only be 0, then the perturbation is null because an extreme point has been found; thus

   [ svec(R_i^T A_1 R_i) · · · svec(R_i^T A_m R_i) ]^⊥ = 0     (616)

from which the stated result (613) directly follows. ∎
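A skeleton of Procedure 4.3.1.0.1 in base Matlab follows; a sketch of ours, not the author's implementation (that is in §F.4). It assumes an optimal Xstar and constraint matrices in a cell array Ac, and realizes footnote 4.17 by projecting a random symmetric point onto the subspace (609); the projection assumes the constraint rows are linearly independent:

    % Rank-reduction skeleton (Procedure 4.3.1.0.1), a hedged sketch.
    X = Xstar;  n = size(X,1);  m = numel(Ac);
    for iter = 1:n
        [Q,L] = eig((X+X')/2);  lam = max(diag(L),0);
        keep  = lam > 1e-9*max(lam);
        R     = Q(:,keep)*diag(sqrt(lam(keep)));   % X = R*R', R skinny (607)
        rho   = size(R,2);
        M = zeros(m, rho^2);                       % constraints (609), via vec
        for j = 1:m, M(j,:) = reshape(R'*Ac{j}*R, 1, []); end
        W0 = randn(rho);  z = reshape((W0+W0')/2, [], 1);
        z  = z - M'*((M*M')\(M*z));                % project onto nullspace of M
        Z  = reshape(z, rho, rho);  Z = (Z+Z')/2;  % feasible Z for (609)
        if norm(Z,'fro') < 1e-9, break, end        % extreme point reached (613)
        psiZ = 1; if min(eig(Z)) < 0, psiZ = -1; end   % step function (605)
        t = 1/max(psiZ*eig(Z));                    % closed form (612)
        X = R*(eye(rho) - t*psiZ*Z)*R';            % perturbed optimum (610)
    end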


4.3.3 Optimality of perturbed X⋆

We show that the optimal objective value is unaltered by perturbation (606); id est,

   〈C , X⋆ + Σ_{j=1}^i t_j B_j〉 = 〈C , X⋆〉     (617)

Proof. From Corollary 4.2.3.0.1 we have the necessary and sufficient relationship between optimal primal and dual solutions under the assumption of existence of a relatively interior feasible point:

   S⋆X⋆ = S⋆R_1R_1^T = X⋆S⋆ = R_1R_1^T S⋆ = 0     (618)

This means R(R_1) ⊆ N(S⋆) and R(S⋆) ⊆ N(R_1^T). From (607) and (610) we get the sequence:

   X⋆ = R_1R_1^T
   X⋆ + t_1B_1 = R_2R_2^T = R_1(I − t_1ψ(Z_1)Z_1)R_1^T
   X⋆ + t_1B_1 + t_2B_2 = R_3R_3^T = R_2(I − t_2ψ(Z_2)Z_2)R_2^T
                        = R_1(I − t_1ψ(Z_1)Z_1)(I − t_2ψ(Z_2)Z_2)R_1^T
   ⋮
   X⋆ + Σ_{j=1}^i t_j B_j = R_1( Π_{j=1}^i (I − t_jψ(Z_j)Z_j) )R_1^T     (619)

Substituting C = svec^{-1}(A^T y⋆) + S⋆ from (547),

   〈C , X⋆ + Σ_{j=1}^i t_j B_j〉 = 〈 svec^{-1}(A^T y⋆) + S⋆ , R_1( Π_{j=1}^i (I − t_jψ(Z_j)Z_j) )R_1^T 〉
      = 〈 Σ_{k=1}^m y⋆_k A_k , X⋆ + Σ_{j=1}^i t_j B_j 〉
      = 〈 Σ_{k=1}^m y⋆_k A_k + S⋆ , X⋆ 〉 = 〈C , X⋆〉     (620)

because 〈B_i , A_j〉 = 0 ∀i, j by design (603). ∎


4.3.3.0.1 Example. Aδ(X) = b .
This academic example demonstrates that a solution found by rank reduction can certainly have rank less than Barvinok's upper bound (233): Assume a given vector b ∈ R^m belongs to the conic hull of the columns of a given matrix A ∈ R^{m×n} ;

   A = [ −1  1  8   1    1
         −3  2  8  1/2  1/3
         −9  4  8  1/4  1/9 ] ,   b = [ 1 ; 1/2 ; 1/4 ]     (621)

Consider the convex optimization problem

   minimize_{X∈S^5}   tr X
   subject to   X ≽ 0
                Aδ(X) = b     (622)

that minimizes the 1-norm of the main diagonal; id est, problem (622) is the same as

   minimize_{X∈S^5}   ‖δ(X)‖_1
   subject to   X ≽ 0
                Aδ(X) = b     (623)

that finds a solution to Aδ(X) = b . Rank-3 solution X⋆ = δ(x_M) is optimal, where (confer (581))

   x_M = [ 2/128 ; 0 ; 5/128 ; 0 ; 90/128 ]     (624)

Yet upper bound (233) predicts existence of at most a

   rank-⌊ (√(8m+1) − 1)/2 ⌋ = 2     (625)

feasible solution from m = 3 equality constraints. To find a lower rank ρ optimal solution to (622) (barring combinatorics), we invoke Procedure 4.3.1.0.1:


Initialize: C = I , ρ = 3 , A_j ≜ δ(A(j,:)) , j = 1, 2, 3 , X⋆ = δ(x_M) , m = 3 , n = 5.

{
Iteration i = 1:

Step 1:

   R_1 = [ √(2/128)  0         0
           0         0         0
           0         √(5/128)  0
           0         0         0
           0         0         √(90/128) ]

   find   Z_1 ∈ S^3
   subject to   〈Z_1 , R_1^T A_j R_1〉 = 0 ,  j = 1, 2, 3     (626)

A nonzero randomly selected matrix Z_1 having 0 main diagonal is feasible and yields a nonzero perturbation matrix. Choose, arbitrarily,

   Z_1 = 11^T − I ∈ S^3     (627)

then (rounding)

   B_1 = [ 0       0  0.0247  0  0.1048
           0       0  0       0  0
           0.0247  0  0       0  0.1657
           0       0  0       0  0
           0.1048  0  0.1657  0  0      ]     (628)

Step 2: t⋆_1 = 1 because λ(Z_1) = [−1 −1 2]^T. So,

   X⋆ ← δ(x_M) + B_1 = [ 2/128   0  0.0247  0  0.1048
                         0       0  0       0  0
                         0.0247  0  5/128   0  0.1657
                         0       0  0       0  0
                         0.1048  0  0.1657  0  90/128 ]     (629)

has rank ρ ← 1 and produces the same optimal objective value.
}
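This iteration reproduces in a few lines of base Matlab, a numeric sketch confirming (628), (629):

    % Reproduce iteration i = 1 of Example 4.3.3.0.1.
    xM = [2/128; 0; 5/128; 0; 90/128];
    R1 = zeros(5,3);
    R1(1,1) = sqrt(2/128); R1(3,2) = sqrt(5/128); R1(5,3) = sqrt(90/128);
    Z1 = ones(3) - eye(3);              % arbitrary feasible choice (627)
    B1 = R1*Z1*R1';                     % = -psi(Z1)*R1*Z1*R1', psi(Z1) = -1 (606)
    t1 = 1/max(-eig(Z1));               % = 1 since lambda(Z1) = [-1 -1 2]' (612)
    X  = diag(xM) + t1*B1;              % perturbed optimal solution (629)
    [rank(X)  trace(X)  sum(xM)]        % rank 1; objective (trace) unchanged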


4.3.3.0.2 Exercise. Rank reduction of maximal complementarity.
Apply rank reduction Procedure 4.3.1.0.1 to the maximal complementarity example (§4.1.1.3.1). Demonstrate a rank-1 solution; which can certainly be found (by Barvinok's Proposition 2.9.3.0.1) because there is only one equality constraint.

4.3.4 Thoughts regarding rank reduction

Because the rank reduction procedure is guaranteed only to produce another optimal solution conforming to Barvinok's upper bound (233), the Procedure will not necessarily produce solutions of arbitrarily low rank; but if they exist, the Procedure can. Arbitrariness of search direction when matrix Z_i becomes indefinite, mentioned on page 251, and the enormity of choices for Z_i (609) are liabilities for this algorithm.

4.3.4.1 Inequality constraints

The question naturally arises: what to do when a semidefinite program (not in prototypical form (547))^{4.19} has linear inequality constraints of the form

   α_i^T svec X ≤ β_i ,  i = 1...k     (630)

where {β_i} are given scalars and {α_i} are given vectors. One expedient way to handle this circumstance is to convert the inequality constraints to equality constraints by introducing a slack variable γ ; id est,

   α_i^T svec X + γ_i = β_i ,  i = 1...k ,  γ ≽ 0     (631)

thereby converting the problem to prototypical form.

Alternatively, we say the i-th inequality constraint is active when it is met with equality; id est, when for particular i in (630), α_i^T svec X⋆ = β_i . An optimal high-rank solution X⋆ is, of course, feasible, satisfying all the constraints. But for the purpose of rank reduction, inactive inequality constraints are ignored while active inequality constraints are interpreted as

4.19 Contemporary numerical packages for solving semidefinite programs can solve a range of problems wider than prototype (547). Generally, they do so by transforming a given problem into prototypical form by introducing new constraints and variables. [9] [295] We are momentarily considering a departure from the primal prototype that augments the constraint set with linear inequalities.


equality constraints. In other words, we take the union of active inequality constraints (as equalities) with equality constraints A svec X = b to form a composite affine subset 𝒜̂ substituting for (550). Then we proceed with rank reduction of X⋆ as though the semidefinite program were in prototypical form (547P).

4.4 Rank-constrained semidefinite program

Here we introduce a technique for finding low-rank optimal solutions to semidefinite programs of a more general form:

4.4.1 Rank constraint by convex iteration

Consider a semidefinite feasibility problem of the form

   find_{G∈S^N}   G
   subject to   G ∈ 𝒞
                G ≽ 0
                rank G ≤ n     (632)

where 𝒞 is a convex set presumed to contain positive semidefinite matrices of rank n or less; id est, 𝒞 intersects the positive semidefinite cone boundary. We propose that this rank-constrained feasibility problem is equivalently expressed with convex constraints:

   minimize_{G∈S^N, W∈S^N}   〈G , W〉
   subject to   G ∈ 𝒞
                G ≽ 0
                0 ≼ W ≼ I
                tr W = N − n     (633)

This we call the underlying problem, which is very difficult to solve because of the bilinear objective function. So instead we solve the convex problem

   minimize_{G∈S^N}   〈G , W〉
   subject to   G ∈ 𝒞
                G ≽ 0     (634)


where direction matrix^{4.20} W is an optimal solution to semidefinite program

   Σ_{i=n+1}^N λ(G⋆)_i = minimize_{W∈S^N}   〈G⋆ , W〉
                         subject to   0 ≼ W ≼ I
                                      tr W = N − n     (1506a)

whose feasible set is a Fantope (§2.3.2.0.1), and where G⋆ is an optimal solution to problem (634) given some iterate W . The idea is to iterate solution of (634) and (1506a) until convergence, as defined in §4.4.1.1.^{4.21}

Optimal matrix W⋆ is defined as any direction matrix yielding an optimal solution G⋆ of rank n or less to the convex equivalent (634) of feasibility problem (632); id est, any direction matrix for which the last N − n nonincreasingly ordered eigenvalues λ of G⋆ are zero: (p.555)

   Σ_{i=n+1}^N λ(G⋆)_i = 〈G⋆ , W⋆〉 ≜ 0     (635)

We emphasize that convex problem (634) is not a relaxation of the rank-constrained feasibility problem (632); at convergence, convex iteration (634) (1506a) makes it instead an equivalent problem.^{4.22}

We make no assumption regarding uniqueness of direction matrix W . The feasible set of direction matrices in (1506a) is the convex hull of outer product of all rank-(N − n) orthonormal matrices; videlicet,

   conv{ UU^T | U ∈ R^{N×N−n} , U^T U = I } = { A ∈ S^N | I ≽ A ≽ 0 , 〈I , A〉 = N − n }     (80)

4.20 Search direction W is a hyperplane-normal pointing opposite to direction of movement describing minimization of a real linear function 〈G , W〉 (p.64).

4.21 The proposed iteration is not an alternating projection. (confer Figure 125)

4.22 Terminology equivalent problem means: optimal solution to one problem can be derived from optimal solution to another. Terminology same problem means: optimal solution set for one problem is identical to the optimal solution set of another (without transformation).
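The closed-form extreme-point solution of (1506a), discussed next, is computed from an eigen decomposition of the iterate G⋆. A base-Matlab sketch, with the SDP step (634) left to a solver and a random positive semidefinite stand-in for G⋆:

    % Closed-form direction matrix W solving (1506a) for a given G*.
    N = 5;  n = 2;
    G = randn(N); G = G*G';                  % stand-in for an iterate G*
    [Q,D]   = eig(G);
    [~,idx] = sort(diag(D),'descend');
    U = Q(:, idx(n+1:N));                    % eigvecs of the N-n smallest eigs
    W = U*U';                                % extreme point of Fantope (80)
    e = sort(eig(G));
    [trace(G*W)  sum(e(1:N-n))]              % equal: sum of N-n smallest eigs
    % Iterate: solve (634) with this W, recompute W from the new G*, and
    % repeat until <G*,W> vanishes (635).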


Set {UU^T | U ∈ R^{N×N−n} , U^T U = I} comprises the extreme points of this Fantope (80). An optimal solution W to (1506a), that is an extreme point, is known in closed form (p.555). By (193), that particular direction (−W) can be regarded as pointing toward rank-n matrices that are simultaneously diagonalizable with G⋆. When W = I , as in the trace heuristic for example, then −W points directly at the origin (the rank-0 matrix).

4.4.1.1 Convergence

We study convergence to ascertain conditions under which a direction matrix will reveal a feasible G matrix of rank n or less in semidefinite program (634). Denote by W⋆ a particular optimal direction matrix from semidefinite program (1506a) such that (635) holds. Then we define global convergence of the iteration (634) (1506a) to correspond with this vanishing vector inner-product (635) of optimal solutions.

Because this iterative technique for constraining rank is not a projection method, it can find a rank-n solution G⋆ ((635) will be satisfied) only if at least one exists in the feasible set of program (634).

4.4.1.1.1 Proof. Suppose 〈G⋆ , W〉 = τ is satisfied for some nonnegative constant τ after any particular iteration (634) (1506a) of the two minimization problems. Once a particular value of τ is achieved, it can never be exceeded by subsequent iterations because existence of feasible G and W having that vector inner-product τ has been established simultaneously in each problem. Because the infimum of vector inner-product of two positive semidefinite matrix variables is zero, the nonincreasing sequence of iterations is thus bounded below hence convergent because any bounded monotonic sequence in R is convergent. [190, §1.2] [30, §1.1] Local convergence to some τ is thereby established. ∎

When a rank-n feasible solution to (634) exists, it remains pending to show under what conditions 〈G⋆ , W⋆〉 = 0 (635) is achieved by iterative solution of semidefinite programs (634) and (1506a). Then pair (G⋆ , W⋆) becomes a fixed-point of iteration. A nonexistent feasible rank-n solution would mean failure to converge by definition (635) but, as proved, the convex iteration always converges locally if not globally. Now, an application:


4.4.1.1.2 Example. Sensor-Network Localization and Wireless Location.
Heuristic solution to a sensor-network localization problem, proposed by Carter & Jin in [51],^{4.23} is limited to two Euclidean dimensions and applies semidefinite programming (SDP) to little subproblems. There, a large network is partitioned into smaller subnetworks (as small as one sensor) and then semidefinite programming and heuristics called spaseloc are applied to localize each and every partition by two-dimensional distance geometry. Their partitioning procedure is one-pass, yet termed iterative; a term applicable only insofar as adjoining partitions can share localized sensors and anchors (absolute sensor positions known a priori). But there is no iteration on the entire network, hence the term "iterative" is perhaps inappropriate. As partitions are selected based on "rule sets" (heuristics, not geographics), they also term the partitioning adaptive. But no adaptation actually occurs once a partition has been determined.

One can reasonably argue that semidefinite programming methods are unnecessary for localization of large sensor networks. In the past, these nonlinear localization problems were solved algebraically and computed by least squares solution to hyperbolic equations; called multilateration.^{4.24} Indeed, practical contemporary numerical methods for global positioning by satellite (GPS) do not rely on semidefinite programming.

The beauty of semidefinite programming as relates to localization lies in convex expression of classical multilateration: So & Ye showed [241] that the problem of finding unique solution, to a noiseless nonlinear system describing the common point of intersection of hyperspheres in real Euclidean vector space, can be expressed as a semidefinite program via distance geometry.

But the need for SDP methods in Carter & Jin is also a question logically consequent to their reliance on complicated and extensive heuristics for partitioning a large network and for solving a partition whose intersensor measurement data is inadequate for localization by distance geometry. While partitions range in size between 2 and 10 sensors, 5 sensors optimally, heuristics provided are only for 2 spatial dimensions (no higher-dimensional algorithm is proposed). For these small numbers it remains unclarified as to precisely what advantage is gained over traditional least squares by solving

4.23 The paper constitutes Jin's dissertation for University of Toronto although her name appears as second author.

4.24 Multilateration − literally, having many sides; shape of a geometric figure formed by nearly intersecting lines of position. In navigation systems, therefore: obtaining a fix from multiple lines of position.


many little semidefinite programs.

Partitioning of large sensor networks is a logical alternative to rapid growth of SDP computational complexity with problem size. But when impact of noise on distance measurement is of most concern, one is averse to a partitioning scheme because noise-effects vary inversely with problem size. [39, §2.2] (§5.13.2) Since an individual partition's solution is not iterated in Carter & Jin and is interdependent with adjoining partitions, we expect errors to propagate from one partition to the next; the ultimate partition solved, expected to suffer most.

Heuristics often fail on real-world data because of unanticipated circumstances. When heuristics fail, generally they are repaired by adding more heuristics. Tenuous is any presumption, for example, that distance measurement errors have distribution characterized by circular contours of equal probability about an unknown sensor-location. That presumption effectively appears within Carter & Jin's optimization problem statement as affine equality constraints relating unknowns to distance measurements that are corrupted by noise. Yet in most all urban environments, this measurement noise is more aptly characterized by ellipsoids of varying orientation and eccentricity as one recedes from a sensor. Each unknown sensor must therefore instead be bound to its own particular range of distance, primarily determined by the terrain.^{4.25} The nonconvex problem we must instead solve is:

   find   {x_i , x_j} ,  i, j ∈ I
   subject to   d_ij ≤ ‖x_i − x_j‖_2 ≤ d̄_ij     (636)

where x_i represents sensor location, and where d_ij and d̄_ij respectively represent lower and upper bounds on measured distance from i-th to j-th sensor (or from sensor to anchor). Figure 69 illustrates contours of equal sensor-location uncertainty. By establishing these individual upper and lower bounds, orientation and eccentricity can effectively be incorporated into the problem statement.

4.25 A distinct contour map corresponding to each anchor is required in practice.


Generally speaking, there can be no unique solution to the sensor-network localization problem because there is no unique formulation; that is the art of optimization. Any optimal solution obtained depends on whether or how the network is partitioned and how the problem is formulated. When a particular formulation is a convex optimization problem, then the set of all optimal solutions forms a convex set containing the actual or true localization. Measurement noise precludes equality constraints representing distance. The optimal solution set is consequently expanded; necessitated by introduction of distance inequalities admitting more and higher-rank solutions. Even were the optimal solution set a single point, it is not necessarily the true localization because there is little hope of exact localization by any algorithm once significant noise is introduced.

Carter & Jin gauge performance of their heuristics to the SDP formulation of author Biswas whom they regard as vanguard to the art. [12, §1] Biswas posed localization as an optimization problem minimizing a distance measure. [35] [33] Intuitively, minimization of any distance measure yields compacted solutions; (confer §6.4.0.0.1) precisely the anomaly motivating Carter & Jin. Their two-dimensional heuristics outperformed Biswas' localizations both in execution-time and proximity to the desired result. Perhaps, instead of heuristics, Biswas' approach to localization can be improved: [32] [34].

The sensor-network localization problem is considered difficult. [12, §2] Rank constraints in optimization are considered more difficult. In what follows, we present the localization problem as a semidefinite program (equivalent to (636)) having an explicit rank constraint which controls Euclidean dimension of an optimal solution. We show how to achieve that rank constraint only if the feasible set contains a matrix of desired rank. Our problem formulation is extensible to any spatial dimension.

Proposed standardized test

Jin proposes an academic test in real Euclidean two-dimensional space R^2 that we adopt. In essence, this test is a localization of sensors and anchors arranged in a regular triangular lattice. Lattice connectivity is solely determined by sensor radio range; a connectivity graph is assumed incomplete. In the interest of test standardization, we propose adoption of a few small examples: Figure 65 through Figure 68 and their particular connectivity represented by matrices (637) through (640) respectively.


[Figure 65: 2-lattice in R^2, hand-drawn. Nodes 3 and 4 are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc.]

   [ 0 • ? •
     • 0 • •
     ? • 0 ◦
     • • ◦ 0 ]     (637)

Matrix entries dot • indicate measurable distance between nodes while unknown distance is denoted by ? (question mark). Matrix entries hollow dot ◦ represent known distance between anchors (to very high accuracy) while zero distance is denoted 0. Because measured distances are quite unreliable in practice, our solution to the localization problem substitutes a distinct range of possible distance for each measurable distance; equality constraints exist only for anchors.

Anchors are chosen so as to increase difficulty for algorithms dependent on existence of sensors in their convex hull. The challenge is to find a solution in two dimensions close to the true sensor positions given incomplete noisy intersensor distance information.


Figure 66: 3-lattice in R², hand-drawn. Nodes 7, 8, and 9 are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc.

    ⎡ 0 • • ? • ? ? • • ⎤
    ⎢ • 0 • • ? • ? • • ⎥
    ⎢ • • 0 • • • • • • ⎥
    ⎢ ? • • 0 ? • • • • ⎥
    ⎢ • ? • ? 0 • • • • ⎥          (638)
    ⎢ ? • • • • 0 • • • ⎥
    ⎢ ? ? • • • • 0 ◦ ◦ ⎥
    ⎢ • • • • • • ◦ 0 ◦ ⎥
    ⎣ • • • • • • ◦ ◦ 0 ⎦


Figure 67: 4-lattice in R², hand-drawn. Nodes 13, 14, 15, and 16 are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc.

    ⎡ 0 ? ? • ? ? • ? ? ? ? ? ? ? • • ⎤
    ⎢ ? 0 • • • • ? • ? ? ? ? ? • • • ⎥
    ⎢ ? • 0 ? • • ? ? • ? ? ? ? ? • • ⎥
    ⎢ • • ? 0 • ? • • ? • ? ? • • • • ⎥
    ⎢ ? • • • 0 • ? • • ? • • • • • • ⎥
    ⎢ ? • • ? • 0 ? • • ? • • ? ? ? ? ⎥
    ⎢ • ? ? • ? ? 0 ? ? • ? ? • • • • ⎥
    ⎢ ? • ? • • • ? 0 • • • • • • • • ⎥          (639)
    ⎢ ? ? • ? • • ? • 0 ? • • • ? • ? ⎥
    ⎢ ? ? ? • ? ? • • ? 0 • ? • • • ? ⎥
    ⎢ ? ? ? ? • • ? • • • 0 • • • • ? ⎥
    ⎢ ? ? ? ? • • ? • • ? • 0 ? ? ? ? ⎥
    ⎢ ? ? ? • • ? • • • • • ? 0 ◦ ◦ ◦ ⎥
    ⎢ ? • ? • • ? • • ? • • ? ◦ 0 ◦ ◦ ⎥
    ⎢ • • • • • ? • • • • • ? ◦ ◦ 0 ◦ ⎥
    ⎣ • • • • • ? • • ? ? ? ? ◦ ◦ ◦ 0 ⎦


Figure 68: 5-lattice in R². Nodes 21 through 25 are anchors.

    ⎡ 0 • ? ? • • ? ? • ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ⎤
    ⎢ • 0 ? ? • • ? ? ? • ? ? ? ? ? ? ? ? ? ? ? ? ? • • ⎥
    ⎢ ? ? 0 • ? • • • ? ? • • ? ? ? ? ? ? ? ? ? ? • • • ⎥
    ⎢ ? ? • 0 ? ? • • ? ? ? • ? ? ? ? ? ? ? ? ? ? ? • ? ⎥
    ⎢ • • ? ? 0 • ? ? • • ? ? • • ? ? • ? ? ? ? ? • ? • ⎥
    ⎢ • • • ? • 0 • ? • • • ? ? • ? ? ? ? ? ? ? ? • • • ⎥
    ⎢ ? ? • • ? • 0 • ? ? • • ? ? • • ? ? ? ? ? ? • • • ⎥
    ⎢ ? ? • • ? ? • 0 ? ? • • ? ? • • ? ? ? ? ? ? ? • ? ⎥
    ⎢ • ? ? ? • • ? ? 0 • ? ? • • ? ? • • ? ? ? ? ? ? ? ⎥
    ⎢ ? • ? ? • • ? ? • 0 • ? • • ? ? ? • ? ? • • • • • ⎥
    ⎢ ? ? • ? ? • • • ? • 0 • ? • • • ? ? • ? ? • • • • ⎥
    ⎢ ? ? • • ? ? • • ? ? • 0 ? ? • • ? ? • • ? • • • ? ⎥
    ⎢ ? ? ? ? • ? ? ? • • ? ? 0 • ? ? • • ? ? • • ? ? ? ⎥          (640)
    ⎢ ? ? ? ? • • ? ? • • • ? • 0 • ? • • • ? • • • • ? ⎥
    ⎢ ? ? ? ? ? ? • • ? ? • • ? • 0 • ? ? • • • • • • ? ⎥
    ⎢ ? ? ? ? ? ? • • ? ? • • ? ? • 0 ? ? • • ? • ? ? ? ⎥
    ⎢ ? ? ? ? • ? ? ? • ? ? ? • • ? ? 0 • ? ? • ? ? ? ? ⎥
    ⎢ ? ? ? ? ? ? ? ? • • ? ? • • ? ? • 0 • ? • • • ? ? ⎥
    ⎢ ? ? ? ? ? ? ? ? ? ? • • ? • • • ? • 0 • • • • ? ? ⎥
    ⎢ ? ? ? ? ? ? ? ? ? ? ? • ? ? • • ? ? • 0 • • ? ? ? ⎥
    ⎢ ? ? ? ? ? ? ? ? ? • ? ? • • • ? • • • • 0 ◦ ◦ ◦ ◦ ⎥
    ⎢ ? ? ? ? ? ? ? ? ? • • • • • • • ? • • • ◦ 0 ◦ ◦ ◦ ⎥
    ⎢ ? ? • ? • • • ? ? • • • ? • • ? ? • • ? ◦ ◦ 0 ◦ ◦ ⎥
    ⎢ ? • • • ? • • • ? • • • ? • • ? ? ? ? ? ◦ ◦ ◦ 0 ◦ ⎥
    ⎣ ? • • ? • • • ? ? • • ? ? ? ? ? ? ? ? ? ◦ ◦ ◦ ◦ 0 ⎦


Figure 69: Uncertainty ellipsoid in R² for each of 15 sensors • located within three city blocks in downtown San Francisco. Data by Polaris Wireless. [243]

problem statement

Ascribe points in a list {x_l ∈ R^n, l = 1 … N} to the columns of a matrix X :

    X = [x_1 ⋯ x_N] ∈ R^{n×N}          (66)

where N is regarded as cardinality of list X . Positive semidefinite matrix XᵀX , formed from inner product of the list, is a Gram matrix; [183, §3.6]

    G ≜ XᵀX = ⎡ ‖x_1‖²   x_1ᵀx_2  x_1ᵀx_3  ⋯  x_1ᵀx_N ⎤
              ⎢ x_2ᵀx_1  ‖x_2‖²   x_2ᵀx_3  ⋯  x_2ᵀx_N ⎥
              ⎢ x_3ᵀx_1  x_3ᵀx_2  ‖x_3‖²   ⋱  x_3ᵀx_N ⎥  ∈ S^N_+          (743)
              ⎢    ⋮        ⋮        ⋱     ⋱     ⋮    ⎥
              ⎣ x_Nᵀx_1  x_Nᵀx_2  x_Nᵀx_3  ⋯  ‖x_N‖²  ⎦

where S^N_+ is the convex cone of N×N positive semidefinite matrices in the real symmetric matrix subspace S^N.

Existence of noise precludes measured distance from the input data. We instead assign measured distance to a range estimate specified by individual upper and lower bounds: d̄_ij is an upper bound on distance-square from the i-th


to the j-th sensor, while d̲_ij is a lower bound. These bounds become the input data. Each measurement range is presumed different from the others because of measurement uncertainty; e.g., Figure 69.

Our mathematical treatment of anchors and sensors is not dichotomized.^{4.26} A sensor position x̌_i known to high accuracy is an anchor. Then the sensor-network localization problem (636) can be expressed equivalently: Given a number m of anchors and a set I of indices (corresponding to all existing distance measurements •), for 0 < n < N

    minimize_{G∈S^N, X∈R^{n×N}}   tr Z
    subject to   d̲_ij ≤ 〈G, (e_i − e_j)(e_i − e_j)ᵀ〉 ≤ d̄_ij    ∀(i,j) ∈ I
                 〈G, e_i e_iᵀ〉 = ‖x̌_i‖² ,   i = N−m+1 … N
                 〈G, (e_i e_jᵀ + e_j e_iᵀ)/2〉 = x̌_iᵀ x̌_j ,   i < j , ∀i,j ∈ {N−m+1 … N}
                 X(:, N−m+1:N) = [x̌_{N−m+1} ⋯ x̌_N]
                 Z = ⎡ I   X ⎤ ≽ 0
                     ⎣ Xᵀ  G ⎦
                 rank Z = n          (641)

where e_i is the i-th member of the standard basis for R^N. Distance-square

    d_ij = ‖x_i − x_j‖²_2 ≜ 〈x_i − x_j , x_i − x_j〉          (730)

is related to Gram matrix entries G ≜ [g_ij] by a vector inner-product

    d_ij = g_ii + g_jj − 2g_ij
         = 〈G, (e_i − e_j)(e_i − e_j)ᵀ〉 ≜ tr(Gᵀ(e_i − e_j)(e_i − e_j)ᵀ)          (745)

hence the scalar inequalities. The objective function tr Z is a heuristic whose sole purpose is to represent the convex envelope of rank Z. (§7.2.2.1.1) By Schur complement (§A.4), any feasible G and X provide a comparison with respect to the positive semidefinite cone

    G ≽ XᵀX          (778)

^{4.26} The wireless location problem is thus stated identically; the difference being: fewer sensors.


which is a convex relaxation of the desired equality constraint

    ⎡ I   X ⎤  =  ⎡ I  ⎤ [ I  X ]          (779)
    ⎣ Xᵀ  G ⎦     ⎣ Xᵀ ⎦

The rank constraint insures this equality holds, thus restricting solution to R^n.

convex equivalent problem statement

Problem statement (641) is nonconvex because of the rank constraint. We do not eliminate or ignore the rank constraint; rather, we find a convex way to enforce it: for 0 < n < N

    minimize_{G∈S^N, X∈R^{n×N}}   〈Z , W 〉
    subject to   d̲_ij ≤ 〈G, (e_i − e_j)(e_i − e_j)ᵀ〉 ≤ d̄_ij    ∀(i,j) ∈ I
                 〈G, e_i e_iᵀ〉 = ‖x̌_i‖² ,   i = N−m+1 … N
                 〈G, (e_i e_jᵀ + e_j e_iᵀ)/2〉 = x̌_iᵀ x̌_j ,   i < j , ∀i,j ∈ {N−m+1 … N}
                 X(:, N−m+1:N) = [x̌_{N−m+1} ⋯ x̌_N]
                 Z = ⎡ I   X ⎤ ≽ 0          (642)
                     ⎣ Xᵀ  G ⎦

Each linear equality constraint in G ∈ S^N represents a hyperplane in isometrically isomorphic Euclidean vector space R^{N(N+1)/2}, while each linear inequality pair represents a convex Euclidean body known as a slab (an intersection of two parallel but opposing halfspaces, Figure 11). In this convex optimization problem (642), a semidefinite program, we substitute a vector inner-product objective function for trace from nonconvex problem (641):

    〈Z , I 〉 = tr Z   ←   〈Z , W 〉          (643)

a generalization of the known trace heuristic [91] for minimizing convex envelope of rank, where W ∈ S^{N+n}_+ is constant with respect to (642). Matrix W is normal to a hyperplane in S^{N+n} minimized over the convex feasible set specified by the constraints in (642). Matrix W is chosen so that −W points in the direction of a feasible rank-n Gram matrix. Thus the purpose of vector inner-product objective (643) is to locate a feasible rank-n Gram matrix presumed existent on the boundary of positive semidefinite cone S^N_+.
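To make one solve of program (642) concrete, here is a minimal Matlab sketch assuming the CVX modeling package (an assumption; any SDP modeling tool serves). Names Dlo, Dhi, Ipairs, and Xa are hypothetical holders for the bounds d̲, d̄, the index set I, and the anchor positions x̌ :

    % One solve of semidefinite program (642), a sketch assuming CVX is installed.
    % Dlo, Dhi : NxN lower/upper bounds on distance-square (used only on pairs in I)
    % Ipairs   : Mx2 list of measured index pairs (i,j) in I
    % Xa       : n x m matrix of anchor positions
    % W        : (n+N)x(n+N) direction matrix from (1506a), initially I or 0
    cvx_begin sdp
      variable G(N,N) symmetric
      variable X(n,N)
      Z = [eye(n), X; X', G];                 % composite matrix of (642)
      minimize( trace(W'*Z) )                 % vector inner-product objective (643)
      subject to
        for k = 1:size(Ipairs,1)
          i = Ipairs(k,1);  j = Ipairs(k,2);
          dsq = G(i,i) + G(j,j) - 2*G(i,j);   % <G,(ei-ej)(ei-ej)'> , confer (745)
          Dlo(i,j) <= dsq;  dsq <= Dhi(i,j);
        end
        G(N-m+1:N, N-m+1:N) == Xa'*Xa;        % anchor Gram equalities
        X(:, N-m+1:N) == Xa;                  % anchor positions fixed
        Z >= 0;                               % positive semidefiniteness
    cvx_end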


direction matrix W

Denote by Z⋆ an optimal composite matrix from semidefinite program (642). Then for Z⋆ ∈ S^{N+n} whose eigenvalues λ(Z⋆) ∈ R^{N+n} are arranged in nonincreasing order, (Fan)

    ∑_{i=n+1}^{N+n} λ(Z⋆)_i  =  minimize_{W∈S^{N+n}}  〈Z⋆ , W 〉
                                subject to  0 ≼ W ≼ I
                                            tr W = N          (1506a)

which has an optimal solution that is known in closed form. This eigenvalue sum is zero when Z⋆ has rank n or less.

Foreknowledge of an optimal Z⋆, to make possible this search for W, implies recursion; id est, semidefinite program (642) is first solved for Z⋆ initializing W = I or W = 0. Once found, Z⋆ becomes constant in semidefinite program (1506a) where a new normal direction W is found as its optimal solution. Then the cycle (642) (1506a) iterates until convergence. When rank Z⋆ = n, solution via this convex iteration is optimal for sensor-network localization problem (636) and its equivalent (641).
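That closed-form optimal solution to (1506a) is an orthogonal projector on eigenvectors of Z⋆ belonging to its N smallest eigenvalues. A minimal Matlab sketch of the update (the SDP solve producing Zstar, a hypothetical holder for Z⋆, is assumed done elsewhere):

    % Direction-matrix update for the cycle (642)(1506a).
    % Zstar: optimal composite from (642), (N+n)x(N+n); n: target Euclidean dimension.
    [Q, L] = eig((Zstar + Zstar')/2);           % symmetrize against numerical roundoff
    [junk, order] = sort(diag(L), 'descend');   % nonincreasing eigenvalue ordering
    U = Q(:, order(n+1:end));                   % eigenvectors of the N smallest eigenvalues
    W = U*U';                                   % optimal W: 0 <= W <= I, trace(W) = N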


numerical solution

In all examples to follow, the number of anchors

    m = √N          (644)

equals the square root of cardinality N of list X. The index set I identifying all existing distance measurements • is ascertained from connectivity matrix (637), (638), (639), or (640). We solve iteration (642) (1506a) in dimension n = 2 for each respective example illustrated in Figure 65 through Figure 68.

In presence of negligible noise, actual position is reliably localized for every standardized example; noteworthy insofar as each example represents an incomplete graph. This means the set of all solutions having lowest rank is a single point, to within a rigid transformation.

To make the examples interesting and consistent with previous work, we randomize each range of distance-square that bounds 〈G, (e_i − e_j)(e_i − e_j)ᵀ〉 in (642); id est, for each and every (i,j) ∈ I

    d̄_ij = d_ij (1 + √3 rand(1) η)²
    d̲_ij = d_ij (1 − √3 rand(1) η)²          (645)

where η = 0.1 is a constant noise factor, rand(1) is the Matlab function providing one sample of uniformly distributed noise in the interval [0,1], and d_ij is actual distance-square from the i-th to the j-th sensor. Because of the separate function calls rand(1), each range of distance-square [d̲_ij , d̄_ij] is not necessarily centered on actual distance-square d_ij. The factor √3 provides unit variance on the stochastic range.
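In Matlab, one realization of the randomized bounds (645) for a given pair (i,j) is simply (dij here being a hypothetical scalar holding actual distance-square):

    % One realization of randomized bounds (645); dij holds actual distance-square.
    eta = 0.1;                                   % constant noise factor
    dij_hi = dij*(1 + sqrt(3)*rand(1)*eta)^2;    % upper bound on distance-square
    dij_lo = dij*(1 - sqrt(3)*rand(1)*eta)^2;    % lower bound, independent rand(1) call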


Figure 70 through Figure 73 each illustrate one realization of numerical solution to the standardized lattice problems posed by Figure 65 through Figure 68 respectively. Exact localization is impossible because of measurement noise. Certainly, by inspection of their published graphical data, our new results are competitive with those of Carter & Jin. Obviously our solutions do not suffer from those compaction-type errors (clustering of localized sensors) exhibited by Biswas' graphical results for the same noise factor η ; which is all we intended to demonstrate.

localization example conclusion

Solution to this sensor-network localization problem became apparent by understanding geometry of optimization. Trace of a matrix, to a student of linear algebra, is perhaps a sum of eigenvalues. But to us, trace represents the normal I to some hyperplane in Euclidean vector space.

The legacy of Carter & Jin [51] is a sobering demonstration of the need for more efficient methods for solution of semidefinite programs, while that of So & Ye [241] is the bonding of distance geometry to semidefinite programming. Elegance of our semidefinite problem statement (642) for a sensor-network localization problem in any dimension should provide some impetus to focus more research on computational intensity. Higher speed and greater accuracy from a simplex-like solver is what is required.

We numerically tested the foregoing technique for constraining rank on a wide range of problems including localization of randomized positions, stress (§7.2.2.7.1), ball packing (§5.4.2.2.3), and cardinality problems. We have had some success introducing the direction vector inner-product (643) as a regularization term (Pareto optimization) whose purpose is to constrain rank, affine dimension, or cardinality:

Figure 70: Typical solution for 2-lattice in Figure 65 with noise factor η = 0.1. Two red rightmost nodes are anchors; two remaining nodes are sensors. Radio range of sensor 1 indicated by arc; radius = 1.14. Actual sensor indicated by target while its localization is indicated by bullet •. Rank-2 solution found in 1 iteration (642) (1506a) subject to reflection error.

Figure 71: Typical solution for 3-lattice in Figure 66 with noise factor η = 0.1. Three red vertical middle nodes are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc; radius = 1.12. Actual sensor indicated by target while its localization is indicated by bullet •. Rank-2 solution found in 2 iterations (642) (1506a).


Figure 72: Typical solution for 4-lattice in Figure 67 with noise factor η = 0.1. Four red vertical middle-left nodes are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc; radius = 0.75. Actual sensor indicated by target while its localization is indicated by bullet •. Rank-2 solution found in 7 iterations (642) (1506a).

Figure 73: Typical solution for 5-lattice in Figure 68 with noise factor η = 0.1. Five red vertical middle nodes are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc; radius = 0.56. Actual sensor indicated by target while its localization is indicated by bullet •. Rank-2 solution found in 3 iterations (642) (1506a).


4.5 Constraining cardinality

4.5.1 nonnegative variable

Our goal is to reliably constrain rank in a semidefinite program. There is a direct analogy to linear programming that is simpler to present but equally hard to solve. In Optimization, that analogy is known as the cardinality problem. If we can solve the cardinality problem, then solution to the rank-constraint problem follows; and vice versa.

Consider a feasibility problem Ax = b, but with an upper bound k on cardinality ‖x‖₀ of a nonnegative solution x : for vector b ∈ R(A)

    find  x ∈ R^n
    subject to  Ax = b
                x ≽ 0
                ‖x‖₀ ≤ k          (646)

where ‖x‖₀ ≤ k means^{4.27} vector x has at most k nonzero entries; such a vector is presumed existent in the feasible set. Nonnegativity constraint x ≽ 0 is analogous to positive semidefiniteness; the notation means vector x belongs to the nonnegative orthant R^n_+. Cardinality is quasiconcave on R^n_+ just as rank is quasiconcave on S^n_+. [46, §3.4.2]

We propose that cardinality-constrained feasibility problem (646) is equivalently expressed with convex constraints:

    minimize_{x∈R^n, y∈R^n}  xᵀy
    subject to  Ax = b
                x ≽ 0
                0 ≼ y ≼ 1
                yᵀ1 = n − k          (647)

whose bilinear objective function xᵀy is quasiconcave only when n = 1. This simple-looking problem (647) is very hard to solve, yet is not hard to understand. Because the sets feasible to x and y are not interdependent,

^{4.27} Strictly speaking, cardinality ‖x‖₀ cannot be a norm (§3.1.3) because it is not positively homogeneous.


we can separate the problem half in variable y :

    ∑_{i=k+1}^{n} π(x)_i  =  minimize_{y∈R^n}  〈x , y〉
                             subject to  0 ≼ y ≼ 1
                                         yᵀ1 = n − k          (435)

which is analogous to the rank-constraint problem; (Ky Fan, confer p.257)

    ∑_{i=n+1}^{N} λ(G)_i  =  minimize_{W∈S^N}  〈G , W 〉
                             subject to  0 ≼ W ≼ I
                                         tr W = N − n          (1506a)

Linear program (435) sums the n−k smallest entries from vector x. In context of problem (647), we want n−k entries of x to sum to zero; id est, we want a globally optimal objective x⋆ᵀy⋆ to vanish:

    ∑_{i=k+1}^{n} π(x⋆)_i = 〈x⋆ , y⋆〉 ≜ 0          (648)

Because all entries in x⋆ must be nonnegative, then n−k entries are themselves zero whenever their sum is, and then cardinality of x⋆ ∈ R^n is at most k.

Ideally, one wants to solve (647) directly. But contemporary techniques for doing so are computationally intensive.^{4.28} One efficient way to solve nonconvex problem (647) is by splitting it into a sequence of convex problems:

    minimize_{x∈R^n}  xᵀy
    subject to  Ax = b
                x ≽ 0          (649)

    minimize_{y∈R^n}  x⋆ᵀy
    subject to  0 ≼ y ≼ 1
                yᵀ1 = n − k          (435)

^{4.28} e.g., a branch and bound method. Solving (647) should not be ruled out; assuming an efficient method were discovered or transformation to a convex equivalent were found.
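A minimal Matlab sketch of this alternation (649) (435), assuming the Optimization Toolbox function linprog for the LP half; the direction-vector half uses the closed-form extreme-point solution of (435) described on the next page:

    % Convex iteration for cardinality problem (646): alternate (649) and (435).
    % A (m x n), b, k given; direction vector y initialized to the ones vector.
    y = ones(n,1);
    for iter = 1:100
       % (649): minimize x'*y subject to Ax = b, x >= 0
       x = linprog(y, [], [], A, b, zeros(n,1), []);
       % (435): optimal y has a 1 in each entry corresponding to
       % the n-k smallest entries of x, zeros elsewhere
       [xs, order] = sort(x, 'descend');
       y = zeros(n,1);
       y(order(k+1:end)) = 1;
       if x'*y < 1e-9, break, end     % (648): desired cardinality achieved
    end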


This sequence is iterated until x⋆ᵀy⋆ vanishes; id est, until desired cardinality is achieved.

Vector y may be interpreted as a negative search direction; it points opposite to direction of movement of hyperplane {x | 〈x , y〉 = τ} in a minimization of real linear function 〈x , y〉 over the feasible set in linear program (649). (p.64) Direction vector y is not unique. The feasible set of direction vectors in (435) is the convex hull of all cardinality-(n−k) one-vectors; videlicet,

    conv{u ∈ R^n | card u = n−k , u_i ∈ {0,1}} = {a ∈ R^n | 1 ≽ a ≽ 0 , 〈1 , a〉 = n−k}          (650)

Set {u ∈ R^n | card u = n−k , u_i ∈ {0,1}} comprises the extreme points of set (650); a hypercube slice. An optimal solution y to (435), that is an extreme point, is known in closed form: it has 1 in each entry corresponding to the n−k smallest entries of x⋆. That particular direction (−y) can be interpreted as pointing toward cardinality-k vectors having the same ordering as x⋆. When y = 1, as in 1-norm minimization for example, then −y points directly at the origin (the cardinality-0 vector).

This technique for constraining cardinality works often and, for some problem classes (beyond Ax = b), it works all the time; meaning, optimal solution to problem (647) can often be found by this convex iteration. But examples can be found that make the iteration stall at a solution not of desired cardinality. Heuristics for breaking out of a stall can be implemented with some success:

4.5.1.0.1 Example. Sparsest solution to Ax = b.
Given data, from Example 4.2.3.1.1,

    A = ⎡ −1  1  8   1    1      0     ⎤         ⎡  1  ⎤
        ⎢ −3  2  8  1/2  1/3  1/2 − 1/3 ⎥ ,   b = ⎢ 1/2 ⎥          (579)
        ⎣ −9  4  8  1/4  1/9  1/4 − 1/9 ⎦         ⎣ 1/4 ⎦

the sparsest solution to the classical linear equation Ax = b is x = e₄ ∈ R⁶ (confer (592)). And given data, from Example 4.2.3.1.2,

    A = ⎡ 1  0  1/√2 ⎤   or   ⎡ 2  1  1 ⎤ ,   b = ⎡ 1 ⎤          (596)
        ⎣ 0  1  1/√2 ⎦        ⎣ 1  1  2 ⎦         ⎣ 1 ⎦


the most sparse solution is respectively x = [0 0 √2]ᵀ ∈ R³ (confer (597)) and x = [0 1 0]ᵀ ∈ R³. Given random data, in Matlab notation,

    A = randn(m,n) ,   index = round((n−1)∗rand(1)) + 1 ,   b = A(:,index)          (651)

where m and n are selected arbitrarily, the sparsest solution is x = e_index ∈ R^n from the standard basis. Although these sparsest solutions are recoverable by inspection, we seek to discern them instead by convex iteration; namely, by iterating problem sequence (649) (435). From the numerical data given, cardinality ‖x‖₀ = 1 is expected. Iteration continues until xᵀy vanishes (to within some numerical precision); id est, until desired cardinality is achieved.

All three examples return a correct cardinality-1 solution to within machine precision in few iterations, but are occasionally subject to stall. Stalls are remedied by reinitializing y to a random state.

Stalling is not an inevitable behavior. Convex iteration succeeds, for some problem types, almost all the time:

4.5.1.0.2 Example. Signal dropout.
Signal dropout is an old problem, well studied from both an industrial and academic perspective. Essentially dropout means momentary loss or gap in a signal, while passing through some channel, caused by some man-made or natural phenomenon. The signal lost is assumed completely destroyed somehow. What remains within the time gap is system or idle channel noise. The signal could be voice over Internet protocol (VoIP), for example, audio data from a compact disc (CD) or video data from a digital video disc (DVD), a television transmission over cable or the airwaves, or a typically ravaged cell phone communication, etcetera.

Here we consider signal dropout in a discrete-time signal corrupted by additive white noise assumed uncorrelated to the signal. The linear channel is assumed to introduce no filtering. We create a discretized windowed signal for this example by positively combining k randomly chosen vectors from a discrete cosine transform (DCT) basis denoted Ψ ∈ R^{n×n}. Frequency increases, in the Fourier sense, from DC toward Nyquist as column index of basis Ψ increases. Otherwise, details of the basis are unimportant except for its orthogonality Ψᵀ = Ψ⁻¹. Transmitted signal is denoted

    s = Ψz ∈ R^n          (652)


whose upper bound on DCT basis coefficient cardinality card z ≤ k is assumed known;^{4.29} hence a critical assumption that transmitted signal s is sparsely represented (k < n) with respect to the DCT basis. Nonzero signal coefficients in vector z are assumed to place each chosen basis vector above the noise floor.

We also assume that the gap's beginning and ending in time are precisely localized to within a sample; id est, index l locates the last sample prior to the gap's onset, while index n−l+1 locates the first sample subsequent to the gap: for rectangularly windowed received signal g possessing a time-gap loss and additive noise η ∈ R^n

    g = ⎡ s_{1:l} + η_{1:l}         ⎤
        ⎢ η_{l+1:n−l}               ⎥  ∈ R^n          (653)
        ⎣ s_{n−l+1:n} + η_{n−l+1:n} ⎦

The window is thereby centered on the gap and short enough so that the DCT spectrum of signal s can be assumed invariant over the window's duration n. Signal to noise ratio within this window is defined

    SNR ≜ 20 log ( ‖ [ s_{1:l} ; s_{n−l+1:n} ] ‖ / ‖η‖ )          (654)

^{4.29} This simplifies exposition, although it may be an unrealistic assumption in many applications.
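As a small Matlab sketch, assuming column vectors s and eta of length n, the received signal (653) and SNR (654) are formed by:

    % Windowed received signal g of (653) and its SNR (654); s, eta are n x 1 columns.
    g   = [ s(1:l) + eta(1:l)
            eta(l+1:n-l)
            s(n-l+1:n) + eta(n-l+1:n) ];
    SNR = 20*log10( norm([s(1:l); s(n-l+1:n)]) / norm(eta) );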


Figure 74: (a) Signal dropout in bandlimited signal s corrupted by noise η (SNR = 10dB, g = s + η). Flatline indicates duration of signal dropout. (b) Reconstructed signal f (red) overlaid with corrupted signal g.


Figure 75: (a) Error signal power (reconstruction f less original noiseless signal s) is 36dB below s. (b) Original noiseless bandlimited signal s overlaid with reconstruction f (red) from signal g having dropout plus noise.


best approximation:

    minimize_{x∈R^n}  ‖ [ f_{1:l} − g_{1:l} ; f_{n−l+1:n} − g_{n−l+1:n} ] ‖
    subject to  f = Ψx
                x ≽ 0
                card x ≤ k          (655)

We propose solving this nonconvex problem (655) as explained in §4.5; by iteration of two convex problems until convergence:

    minimize_{x∈R^n}  〈x , y〉 + ‖ [ f_{1:l} − g_{1:l} ; f_{n−l+1:n} − g_{n−l+1:n} ] ‖
    subject to  f = Ψx
                x ≽ 0          (656)

and

    minimize_{y∈R^n}  〈x⋆ , y〉
    subject to  0 ≼ y ≼ 1
                yᵀ1 = n − k          (435)

Signal cardinality 2l is implicit to the problem statement. When the number of samples in the dropout region exceeds half the window size, then that deficient cardinality becomes a source of severe degradation to reconstruction in presence of noise. Thus we have a rule of thumb for reconstruction of signal dropout with good noise suppression: l must exceed the maximum of cardinality bounds

    l ≥ max{k , n/4}          (657)

Figure 74 and Figure 75 show one realization of this dropout problem. Original signal s is created by adding four (k = 4) randomly selected DCT basis vectors from Ψ (n = 500 in this example), whose amplitudes are randomly selected from a uniform distribution above the noise floor; in the interval [10^{−10/20}, 1]. Then a 240-sample dropout is realized (l = 130) and Gaussian noise η added to make corrupted signal g (from which a best approximation f will be made) having 10dB signal to noise ratio (654). The time gap contains much noise, as apparent from Figure 74a. But in only a few iterations (656) (435), original signal s is recovered with relative error power 36dB down; illustrated in Figure 75. Correct cardinality is


also recovered (card x = card z) along with the basis vector indices used to make original signal s. Approximation error is due to DCT basis coefficient estimate error. When this experiment is repeated 1000 times on noisy signals averaging 10dB SNR, the correct cardinality and indices are recovered 99% of the time with average relative error power 30dB down. Without noise, we get perfect reconstruction in one iteration. (Matlab code, §F.8)

4.5.2 constraining cardinality of signed variable

Now consider a feasibility problem equivalent to the classical problem from linear algebra Ax = b, but with an upper bound k on cardinality ‖x‖₀ : for vector b ∈ R(A)

    find  x ∈ R^n
    subject to  Ax = b
                ‖x‖₀ ≤ k          (658)

where ‖x‖₀ ≤ k means vector x has at most k nonzero entries; such a vector is presumed existent in the feasible set. Convex iteration works with a nonnegative variable; we therefore need absolute value |x|. We propose that problem (658) can be equivalently written

    minimize_{x∈R^n, y∈R^n}  |x|ᵀy
    subject to  Ax = b
                0 ≼ y ≼ 1
                yᵀ1 = n − k
  ≡
    minimize_{x∈R^n, y∈R^n, t∈R^n}  tᵀy + tᵀ1
    subject to  Ax = b
                0 ≼ y ≼ 1
                yᵀ1 = n − k
                −t ≼ x ≼ t          (659)

which moves the cardinality constraint to the objective as tᵀy. The term tᵀ1 is necessary to determine |x| because vector y can take zero values in its entries; a good estimate of |x| is required for (435). To express this nonconvex problem as a convex iteration, we separate it into halves (§4.5.1)

    minimize_{x∈R^n, t∈R^n}  tᵀ(y + 1)
    subject to  Ax = b
                −t ≼ x ≼ t          (660)


by iterating with

    minimize_{y∈R^n}  〈t⋆ , y〉
    subject to  0 ≼ y ≼ 1
                yᵀ1 = n − k          (435)

to find direction vector y.

By setting the initial value of direction vector y to 0 or 1, the first iteration of problem (660) is a 1-norm problem; id est,

    minimize_{x∈R^n, t∈R^n}  tᵀ1              minimize_{x∈R^n}  ‖x‖₁
    subject to  Ax = b              ≡         subject to  Ax = b          (661)
                −t ≼ x ≼ t

Subsequent iterations of problem (660) engaging cardinality term tᵀy can be interpreted as corrections to this 1-norm problem leading to a 0-norm solution.

Because of the equivalence

    minimize_{y∈R^n}  〈t⋆ , y〉              minimize_{y∈R^n}  〈t⋆ , y + 1〉
    subject to  0 ≼ y ≼ 1          ≡         subject to  0 ≼ y ≼ 1          (662)
                yᵀ1 = n − k                              yᵀ1 = n − k

iteration (660) (435) always converges to a locally optimal solution by virtue of a monotonically nonincreasing objective sequence. [190, §1.2] [30, §1.1] There can be no proof of global optimality, defined by (648).

4.6 Cardinality and rank constraint examples

4.6.0.0.1 Example. Projection on an ellipsoid boundary. [38] [99, §5.1] [181, §2]
Consider classical linear equation Ax = b but with a constraint on norm of solution x, given matrices C, fat A, and vector b ∈ R(A)

    find  x ∈ R^N
    subject to  Ax = b
                ‖Cx‖ = 1          (663)


The set {x | ‖Cx‖ = 1} describes an ellipsoid boundary. This is a nonconvex problem because solution is constrained to that boundary. Assign

    G = ⎡ Cx ⎤ [ xᵀCᵀ  1 ]  =  ⎡ X     Cx ⎤  ≜  ⎡ CxxᵀCᵀ  Cx ⎤ ∈ S^{N+1}          (664)
        ⎣ 1  ⎦                ⎣ xᵀCᵀ  1  ⎦     ⎣ xᵀCᵀ    1  ⎦

Any rank-1 solution must have this form. (§B.1.0.2) Ellipsoidally constrained feasibility problem (663) is equivalent to:

    find_{X∈S^N}  x ∈ R^N
    subject to  Ax = b
                G = ⎡ X     Cx ⎤   (≽ 0)
                    ⎣ xᵀCᵀ  1  ⎦
                rank G = 1
                tr X = 1          (665)

This is transformed to an equivalent convex problem by moving the rank constraint to the objective: We iterate solution of

    minimize_{X∈S^N, x∈R^N}  〈G , Y 〉
    subject to  Ax = b
                G = ⎡ X     Cx ⎤ ≽ 0
                    ⎣ xᵀCᵀ  1  ⎦
                tr X = 1          (666)

with

    minimize_{Y∈S^{N+1}}  〈G⋆ , Y 〉
    subject to  0 ≼ Y ≼ I
                tr Y = N          (667)

until convergence. Direction matrix Y ∈ S^{N+1}, initially 0, controls rank. (1506a) Singular value decomposition G⋆ = UΣQᵀ ∈ S^{N+1}_+ (§A.6) provides a new direction matrix Y = U(:, 2:N+1)U(:, 2:N+1)ᵀ that optimally solves (667) at each iteration. An optimal solution to (663) is thereby found in a few iterations, making convex problem (666) its equivalent.
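In Matlab, that SVD-based update of direction matrix Y reads as below; Gstar is a hypothetical holder for the optimal composite matrix of (666):

    % Direction-matrix update (667) via singular value decomposition of G* (664).
    [U, Sigma, Q] = svd(Gstar);           % Gstar: optimal (N+1)x(N+1) matrix from (666)
    Y = U(:, 2:N+1)*U(:, 2:N+1)';         % annihilates all but the principal direction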


It remains possible for the iteration to stall, were a rank-1 G matrix not found. In that case, the current search direction is momentarily reversed with an added random element:

    Y = −U(:, 2:N+1) ( U(:, 2:N+1)ᵀ + randn(N,1) U(:, 1)ᵀ )          (668)

This heuristic is quite effective for this problem, which is exceptionally easy to solve by convex iteration.

When b ∉ R(A), then problem (663) must be restated as a projection:

    minimize_{x∈R^N}  ‖Ax − b‖
    subject to  ‖Cx‖ = 1          (669)

This is a projection of point b on an ellipsoid boundary because any affine transformation of an ellipsoid remains an ellipsoid. Problem (666) in turn becomes

    minimize_{X∈S^N, x∈R^N}  〈G , Y 〉 + ‖Ax − b‖
    subject to  G = ⎡ X     Cx ⎤ ≽ 0
                    ⎣ xᵀCᵀ  1  ⎦
                tr X = 1          (670)

We iterate this with calculation (667) of direction matrix Y as before until a rank-1 G matrix is found.

4.6.0.0.2 Example. Procrustes problem. [38]
Example 4.6.0.0.1 is extensible. An orthonormal matrix Q ∈ R^{n×p} is completely characterized by QᵀQ = I. Consider the particular case Q = [x y] ∈ R^{n×2} as variable to a Procrustes problem (§C.3): given A ∈ R^{m×n} and B ∈ R^{m×2}

    minimize_{Q∈R^{n×2}}  ‖AQ − B‖_F
    subject to  QᵀQ = I          (671)

which is nonconvex. By vectorizing matrix Q we can make the assignment:

    G = ⎡ x ⎤ [ xᵀ yᵀ 1 ]  =  ⎡ X   Z   x ⎤  ≜  ⎡ xxᵀ  xyᵀ  x ⎤ ∈ S^{2n+1}          (672)
        ⎢ y ⎥                ⎢ Zᵀ  Y   y ⎥     ⎢ yxᵀ  yyᵀ  y ⎥
        ⎣ 1 ⎦                ⎣ xᵀ  yᵀ  1 ⎦     ⎣ xᵀ   yᵀ   1 ⎦


Now Procrustes problem (671) can be equivalently restated:

    minimize_{X, Y, Z, x, y}  ‖A[x y] − B‖_F
    subject to  G = ⎡ X   Z   x ⎤   (≽ 0)
                    ⎢ Zᵀ  Y   y ⎥
                    ⎣ xᵀ  yᵀ  1 ⎦
                rank G = 1
                tr X = 1
                tr Y = 1
                tr Z = 0          (673)

To solve this, we form the convex problem sequence:

    minimize_{X, Y, Z, x, y}  ‖A[x y] − B‖_F + 〈G , W 〉
    subject to  G = ⎡ X   Z   x ⎤ ≽ 0
                    ⎢ Zᵀ  Y   y ⎥
                    ⎣ xᵀ  yᵀ  1 ⎦
                tr X = 1
                tr Y = 1
                tr Z = 0          (674)

and

    minimize_{W∈S^{2n+1}}  〈G⋆ , W 〉
    subject to  0 ≼ W ≼ I
                tr W = 2n          (675)

which has an optimal solution W that is known in closed form (page 555). These two problems are iterated until convergence and a rank-1 G matrix is found. A good initial value for direction matrix W is 0. Optimal Q⋆ equals [x⋆ y⋆].

Numerically, this Procrustes problem is easy to solve; a solution seems always to be found in one or a few iterations. This problem formulation is extensible, of course, to orthogonal (square) matrices Q.


4.6.0.0.3 Example. Tractable polynomial constraint.
The ability to handle rank constraints makes polynomial constraints (generally nonconvex) transformable to convex constraints. All optimization problems having polynomial objective and polynomial constraints can be reformulated as a semidefinite program with a rank-1 constraint. [212] Suppose we require

    3 + 2x − xy ≤ 0          (676)

Assign

    G = ⎡ x ⎤ [ x y 1 ]  =  ⎡ X   z ⎤  ≜  ⎡ x²  xy  x ⎤ ∈ S³          (677)
        ⎢ y ⎥               ⎣ zᵀ  1 ⎦     ⎢ xy  y²  y ⎥
        ⎣ 1 ⎦                             ⎣ x   y   1 ⎦

The polynomial constraint (676) is equivalent to the constraint set (§B.1.0.2)

    tr(GA) ≤ 0
    G = ⎡ X   z ⎤   (≽ 0)
        ⎣ zᵀ  1 ⎦
    rank G = 1          (678)

in symmetric variable matrix X ∈ S² and variable vector z ∈ R² where

    A = ⎡  0   −1/2  1 ⎤
        ⎢ −1/2   0   0 ⎥          (679)
        ⎣  1     0   3 ⎦

Then the method of convex iteration from §4.4.1 is applied to implement the rank constraint.
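A quick numerical sanity check in Matlab, at an arbitrary point (x, y), that the assignment (677) with matrix A from (679) reproduces polynomial (676):

    % Verify tr(GA) = 3 + 2x - xy for G from (677) and A from (679).
    x = randn;  y = randn;                   % arbitrary test point
    G = [x; y; 1]*[x y 1];                   % rank-1 assignment (677)
    A = [  0  -1/2  1
         -1/2   0   0
           1    0   3 ];
    trace(G*A) - (3 + 2*x - x*y)             % zero to machine precision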


4.6.0.0.4 Example. Boolean vector feasible to Ax ≼ b. (confer §4.2.3.1.1)
Now we consider solution to a discrete problem whose only known analytical method of solution is combinatorial in complexity: given A ∈ R^{M×N} and b ∈ R^M

    find  x ∈ R^N
    subject to  Ax ≼ b
                δ(xxᵀ) = 1          (680)

This nonconvex problem demands a Boolean solution [x_i = ±1, i = 1 … N ].

Assign a rank-1 matrix of variables; symmetric variable matrix X and solution vector x :

    G = ⎡ x ⎤ [ xᵀ 1 ]  =  ⎡ X   x ⎤  ≜  ⎡ xxᵀ  x ⎤ ∈ S^{N+1}          (681)
        ⎣ 1 ⎦             ⎣ xᵀ  1 ⎦     ⎣ xᵀ   1 ⎦

Then design an equivalent semidefinite feasibility problem to find a Boolean solution to Ax ≼ b :

    find_{X∈S^N}  x ∈ R^N
    subject to  Ax ≼ b
                G = ⎡ X   x ⎤   (≽ 0)
                    ⎣ xᵀ  1 ⎦
                rank G = 1
                δ(X) = 1          (682)

where x⋆_i ∈ {−1, 1}, i = 1 … N. The two variables X and x are made dependent via their assignment to rank-1 matrix G. By (1430), an optimal rank-1 matrix G⋆ must take the form (681).

As before, we regularize the rank constraint by introducing a direction matrix Y into the objective:

    minimize_{X∈S^N, x∈R^N}  〈G , Y 〉
    subject to  Ax ≼ b
                G = ⎡ X   x ⎤ ≽ 0
                    ⎣ xᵀ  1 ⎦
                δ(X) = 1          (683)

Solution of this semidefinite program is iterated with calculation of the direction matrix Y from semidefinite program (667). At convergence, in the sense (635), convex problem (683) becomes equivalent to nonconvex Boolean problem (680).

By (1506a), direction matrix Y can be an orthogonal projector having closed-form expression. Given randomized data A and b for a large problem, we find that stalling becomes likely (convergence of the iteration to a positive fixed point 〈G⋆ , Y 〉). To overcome this behavior, we introduce a heuristic into the implementation in §F.6 that momentarily reverses direction of search (like (668)) upon stall detection. We find that rate of convergence can be sped significantly by detecting stalls early.


4.6.0.0.5 Example. Variable-vector normalization.
Suppose, within some convex optimization problem, we want vector variables x, y ∈ R^N constrained by a nonconvex equality:

    x‖y‖ = y          (684)

id est, ‖x‖ = 1 and x points in the same direction as y ; e.g.,

    minimize_{x, y}  f(x, y)
    subject to  (x, y) ∈ C
                x‖y‖ = y          (685)

where f is some convex function and C is some convex set. We can realize the nonconvex equality by constraining rank and adding a regularization term to the objective. Make the assignment:

    G = ⎡ x ⎤ [ xᵀ yᵀ 1 ]  =  ⎡ X   Z   x ⎤  ≜  ⎡ xxᵀ  xyᵀ  x ⎤ ∈ S^{2N+1}          (686)
        ⎢ y ⎥                ⎢ Z   Y   y ⎥     ⎢ yxᵀ  yyᵀ  y ⎥
        ⎣ 1 ⎦                ⎣ xᵀ  yᵀ  1 ⎦     ⎣ xᵀ   yᵀ   1 ⎦

where X, Y ∈ S^N, also Z ∈ S^N [sic]. Any rank-1 solution must take the form of (686). (§B.1) The problem statement equivalent to (685) is then written

    minimize_{X, Y, Z, x, y}  f(x, y) + ‖X − Y ‖_F
    subject to  (x, y) ∈ C
                G = ⎡ X   Z   x ⎤   (≽ 0)
                    ⎢ Z   Y   y ⎥
                    ⎣ xᵀ  yᵀ  1 ⎦
                rank G = 1
                tr(X) = 1
                δ(Z) ≽ 0          (687)

The trace constraint on X normalizes vector x while the diagonal constraint on Z maintains sign between respective entries of x and y. Regularization term ‖X − Y ‖_F then makes x equal to y to within a real scalar. (§C.2.0.0.1) To make this program solvable by convex iteration, as explained before, we


move the rank constraint to the objective

    minimize_{X, Y, Z, x, y}  f(x, y) + ‖X − Y ‖_F + 〈G , Y 〉
    subject to  (x, y) ∈ C
                G = ⎡ X   Z   x ⎤ ≽ 0
                    ⎢ Z   Y   y ⎥
                    ⎣ xᵀ  yᵀ  1 ⎦
                tr(X) = 1
                δ(Z) ≽ 0          (688)

by introducing a direction matrix Y found from (1506a)

    minimize_{Y∈S^{2N+1}}  〈G⋆ , Y 〉
    subject to  0 ≼ Y ≼ I
                tr Y = 2N          (689)

which has an optimal solution that is known in closed form. Iteration (688) (689) terminates when rank G = 1 and regularization 〈G , Y 〉 vanishes to within some numerical tolerance in (688); typically, in two iterations. If function f competes too much with the regularization, positively weighting each regularization term will become required. At convergence, problem (688) becomes a convex equivalent to the original nonconvex problem (685).

4.6.0.0.6 Example. fast max cut. [77]

    Let Γ be an n-node graph, and let the arcs (i, j) of the graph be associated with nonnegative weights a_ij. The problem is to find a cut of the largest possible weight, i.e., to partition the set of nodes into two parts S, S′ in such a way that the total weight of all arcs linking S and S′ (i.e., with one incident node in S and the other one in S′ [Figure 76]) is as large as possible. [27, §4.3.3]

Literature on the max cut problem is vast because this problem has elegant primal and dual formulation, its solution is very difficult, and there exist many commercial applications; e.g., semiconductor design [83], quantum computing [297].


Figure 76: A cut partitions nodes {i = 1 … 16} of this graph into S and S′. Linear arcs have circled weights. The problem is to find a cut maximizing total weight of all arcs linking partitions made by the cut.


Our purpose here is to demonstrate how iteration of two simple convex problems can quickly converge to an optimal solution of the max cut problem with a 98% success rate, on average.^{4.30} max cut is stated:

    maximize_x  ∑_{1≤i<j≤n} a_ij (1 − x_i x_j) (1/2)
    subject to  δ(xxᵀ) = 1          (690)

where x ∈ R^n is a vector of ±1 labels assigning each node to S or S′. In terms of the symmetric weight matrix A = [a_ij] ∈ S^n, this is equivalently expressed

    maximize_x  (1/4)〈xxᵀ, δ(A1) − A〉
    subject to  δ(xxᵀ) = 1          (694)


Because an estimate of the upper bound to max cut is needed to ascertain convergence when vector x has large dimension, we digress to derive the dual problem: Directly from (694), the Lagrangian is [46, §5.1.5] (1249)

    L(x, ν) = (1/4)〈xxᵀ, δ(A1) − A〉 + 〈ν , δ(xxᵀ) − 1〉
            = (1/4)〈xxᵀ, δ(A1) − A〉 + 〈δ(ν), xxᵀ〉 − 〈ν , 1〉
            = (1/4)〈xxᵀ, δ(A1 + 4ν) − A〉 − 〈ν , 1〉          (695)

where quadratic xᵀ(δ(A1 + 4ν) − A)x has supremum 0 if δ(A1 + 4ν) − A is negative semidefinite, and has supremum ∞ otherwise. The finite supremum of dual function

    g(ν) = sup_x L(x, ν) = { −〈ν , 1〉 ,  A − δ(A1 + 4ν) ≽ 0
                           { ∞ ,         otherwise          (696)

is chosen to be the objective of minimization to dual (convex) problem

    minimize_ν  −νᵀ1
    subject to  A − δ(A1 + 4ν) ≽ 0          (697)

whose optimal value provides a least upper bound to max cut, but is not tight ( (1/4)〈xxᵀ, δ(A1) − A〉 < g(ν) ; duality gap is nonzero). [108] In fact, we find that the bound's variance with problem instance is too large to be useful for this problem; thus ending our digression.^{4.31}

To transform max cut to its convex equivalent, first define

    X = xxᵀ ∈ S^n          (702)

then max cut (694) becomes

    maximize_{X∈S^n}  (1/4)〈X , δ(A1) − A〉
    subject to  δ(X) = 1
                (X ≽ 0)
                rank X = 1          (698)

^{4.31} Taking the dual of dual problem (697) would provide (698) but without the rank constraint. [101]


whose rank constraint can be regularized as in

    maximize_{X∈S^n}  (1/4)〈X , δ(A1) − A〉 − w〈X , W 〉
    subject to  δ(X) = 1
                X ≽ 0          (699)

where w ≈ 1000 is a nonnegative fixed weight, and W is a direction matrix determined from

    ∑_{i=2}^{n} λ(X⋆)_i  =  minimize_{W∈S^n}  〈X⋆ , W 〉
                            subject to  0 ≼ W ≼ I
                                        tr W = n − 1          (1506a)

which has an optimal solution that is known in closed form. These two problems (699) and (1506a) are iterated until convergence as defined on page 257.

Because convex problem statement (699) is so elegant, it is numerically solvable for large binary vectors within reasonable time.^{4.32} To test our convex iterative method, we compare an optimal convex result to an actual solution of the max cut problem found by performing a brute force combinatorial search of (694)^{4.33} for a tight upper bound. Search time limits binary vector lengths to 24 bits (about five days CPU time). Accuracy obtained, 98%, is independent of binary vector length (12, 13, 20, 24) when averaged over more than 231 problem instances including planar, randomized, and toroidal graphs.^{4.34} When failure occurred, large and small errors were manifest. That same 98% average accuracy is presumed maintained when binary vector length is further increased. A Matlab program is provided in §F.7.

^{4.32} We solved for a length-250 binary vector in only a few minutes and convex iterations on a Dell Precision model M90.
^{4.33} more computationally intensive than the proposed convex iteration by many orders of magnitude. Solving max cut by searching over all binary vectors of length 100, for example, would occupy a contemporary supercomputer for a million years.
^{4.34} Existence of a polynomial-time approximation to max cut with accuracy better than 94.11% would refute proof of NP-hardness, which some researchers believe to be highly unlikely. [131, thm.8.2]
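A minimal Matlab sketch of iteration (699) (1506a), assuming the CVX modeling package for the SDP half (an assumption; any SDP solver serves) and a symmetric nonnegative weight matrix A of dimension n:

    % Convex iteration for max cut: alternate SDP (699) with closed-form (1506a).
    Qm = diag(A*ones(n,1)) - A;       % the constant matrix delta(A*1) - A of (699)
    w  = 1000;                        % fixed nonnegative weight
    W  = zeros(n);                    % initial direction matrix
    for iter = 1:30
       cvx_begin sdp quiet
         variable X(n,n) symmetric
         maximize( trace(Qm*X)/4 - w*trace(W*X) )
         diag(X) == 1;
         X >= 0;
       cvx_end
       [V, L] = eig((X + X')/2);
       [lam, order] = sort(diag(L), 'descend');
       W = V(:, order(2:end))*V(:, order(2:end))';   % closed-form solution to (1506a)
       if sum(lam(2:end)) < 1e-8, break, end         % X is rank-1
    end
    x = sign(V(:, order(1)));         % cut labels: S = {i : x_i = +1}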


4.6.0.0.7 Example. Cardinality/rank problem.
d'Aspremont et alii [65] propose approximating a positive semidefinite matrix A ∈ S^N_+ by a rank-one matrix having a constraint on cardinality c : for 0 < c < N

    minimize_z  ‖A − zzᵀ‖_F
    subject to  card z ≤ c          (700)

which, they explain, is a hard problem equivalent to

    maximize_x  xᵀAx
    subject to  ‖x‖ = 1
                card x ≤ c          (701)

where z ≜ √λ x and where optimal solution x⋆ is a principal eigenvector (1500) (§A.5) of A and λ = x⋆ᵀAx⋆ is the principal eigenvalue when c is true cardinality of that eigenvector. This is principal component analysis with a cardinality constraint which controls solution sparsity. Define the matrix variable

    X ≜ xxᵀ ∈ S^N          (702)

whose desired rank is 1, and whose desired diagonal cardinality

    card δ(X) ≡ card x          (703)

is equivalent to cardinality c of vector x. Then we can transform cardinality problem (701) to an equivalent problem in new variable X :^{4.35}

    maximize_{X∈S^N}  〈X , A〉
    subject to  〈X , I〉 = 1
                (X ≽ 0)
                rank X = 1
                card δ(X) ≤ c          (704)

We transform problem (704) to an equivalent convex problem by introducing two direction matrices: W to achieve desired cardinality

^{4.35} A semidefiniteness constraint X ≽ 0 is not required, theoretically, because positive semidefiniteness of a rank-1 matrix is enforced by symmetry. (Theorem A.3.1.0.7)


card δ(X), and Y to find an approximating rank-one matrix X :

    maximize_{X∈S^N}  〈X , A − w₁Y 〉 − w₂〈δ(X) , δ(W)〉
    subject to  〈X , I〉 = 1
                X ≽ 0          (705)

where w₁ and w₂ are positive scalars respectively weighting tr(XY) and δ(X)ᵀδ(W) just enough to insure that they vanish to within some numerical precision, where direction matrix Y is an optimal solution to semidefinite program

    minimize_{Y∈S^N}  〈X⋆ , Y 〉
    subject to  0 ≼ Y ≼ I
                tr Y = N − 1          (706)

and where diagonal direction matrix W ∈ S^N optimally solves linear program

    minimize_{W=δ²(W)}  〈δ(X⋆) , δ(W)〉
    subject to  0 ≼ δ(W) ≼ 1
                tr W = N − c          (707)

Both direction matrix programs are derived from (1506a), whose analytical solution is known but is not necessarily unique. We emphasize (confer p.257): because this iteration (705) (706) (707) (initial Y, W = 0) is not a projection method, success relies on existence of matrices in the feasible set of (705) having desired rank and diagonal cardinality. In particular, the feasible set of convex problem (705) is a Fantope (81) whose extreme points constitute the set of all normalized rank-one matrices; among those are found rank-one matrices of any desired diagonal cardinality.

Convex problem (705) is not a relaxation of cardinality problem (701); instead, problem (705) is a convex equivalent to (701) at convergence of iteration (705) (706) (707). Because the feasible set of convex problem (705) contains all normalized rank-one matrices of any desired diagonal cardinality, a constraint too low or high in cardinality will not prevent solution. An optimal solution, whose diagonal cardinality is equal to cardinality of a principal eigenvector of matrix A, will produce the lowest residual Frobenius norm (to within machine precision and other noise processes) in the original problem statement (700).
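Both direction updates have closed forms, as noted above; a Matlab sketch (Xstar being a hypothetical holder for the optimal matrix from (705)):

    % Closed-form direction updates (706) and (707) from optimal X* of (705).
    [V, L] = eig(Xstar);
    [lam, order] = sort(diag(L), 'descend');
    Y = V(:, order(2:end))*V(:, order(2:end))';   % (706): annihilate all but principal eigenvector
    [ds, jord] = sort(diag(Xstar), 'descend');
    dW = zeros(N,1);  dW(jord(c+1:end)) = 1;      % (707): ones on the N-c smallest diagonal entries
    W = diag(dW);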


Figure 77: Massachusetts Institute of Technology (MIT) logo, including its white boundary, may be interpreted as a rank-5 matrix. (Stanford University logo rank is much higher.) This constitutes Scene Y observed by the one-pixel camera in Figure 78 for Example 4.6.0.0.8.

4.6.0.0.8 Example. Compressed sensing, compressive sampling. [225]

    As our modern technology-driven civilization acquires and exploits ever-increasing amounts of data, everyone now knows that most of the data we acquire can be thrown away with almost no perceptual loss − witness the broad success of lossy compression formats for sounds, images, and specialized technical data. The phenomenon of ubiquitous compressibility raises very natural questions: Why go to so much effort to acquire all the data when most of what we get will be thrown away? Can't we just directly measure the part that won't end up being thrown away?   −David Donoho [81]

Lossy data compression techniques are popular, but it is also well known that losses become quite perceptible with signal processing that goes beyond mere playback of a compressed signal. Spatial or audible frequencies masked by a simultaneity become perceptible with significant post-filtering of the compressed signal, for example. Further, there can be no universally acceptable and unique metric of perception for gauging exactly how much data can be tossed. For these reasons, there will always be need for raw (uncompressed) data.

In this example we throw out only so much information as to leave perfect reconstruction within reach. Specifically, the MIT logo in Figure 77 is perfectly reconstructed from 700 time-sequential samples {y_i} acquired by the one-pixel camera illustrated in Figure 78. The MIT-logo image in this example effectively impinges on a 46×81 micromirror array (DMD).


Figure 78: One-pixel camera, from [281]. Compressive imaging (CI) camera block diagram. Incident lightfield (corresponding to the desired image Y) is reflected off a digital micromirror device (DMD) array whose mirror orientations are modulated in the pseudorandom pattern supplied by the random number generators (RNG). Each different mirror pattern produces a voltage at the single photodiode that corresponds to one measurement y_i.

This mirror array is modulated by a pseudonoise source that independently positions all the individual mirrors. A single photodiode (one pixel) integrates incident light from all mirrors. After stabilizing the mirrors to a fixed but pseudorandom pattern, light so collected is then digitized into one sample y_i by analog-to-digital (A/D) conversion. This sampling process is repeated with the micromirror array modulated to a new pseudorandom pattern.

The most important questions are: How many samples do we need for perfect reconstruction? Does that number of samples represent compression of the original data?

We claim that perfect reconstruction of the MIT logo can be reliably achieved with as few as 700 samples y = [y_i] ∈ R⁷⁰⁰ from this one-pixel camera. That number represents only 19% of total information from the micromirrors.^{4.36}

^{4.36} This number is reported in [225, §6] as difficult to achieve. If a minimal basis for the MIT logo were instead constructed, only five rows or columns worth of data are independent. This means a lower bound on achievable compression is about 230 samples; which corresponds to 6% of the original information.


Our approach to reconstruction is to look for a low-rank solution to an underdetermined system:

    find  X
    subject to  A vec X = y
                rank X ≤ 5          (708)

where vec X is vectorized matrix X ∈ R^{46×81} (stacked columns). Each row of fat matrix A is one realization of a pseudorandom pattern applied to the micromirrors. Since these patterns are deterministic (known), then the i-th sample y_i equals A(i,:) vec Y ; id est, y = A vec Y. Perfect reconstruction means optimal solution X⋆ equals scene Y ∈ R^{46×81} to within machine precision.

Because variable matrix X is generally not square or positive semidefinite, we constrain its rank by rewriting the problem equivalently

    find  X
    subject to  A vec X = y
                rank ⎡ W₁  X  ⎤ ≤ 5          (709)
                     ⎣ Xᵀ  W₂ ⎦

This rank constraint on the composite matrix insures rank X ≤ 5 for any choice of dimensionally compatible matrices W₁ and W₂. But to solve this problem by convex iteration, we alternate solution of semidefinite program

    minimize_{W₁, W₂, X}  tr ( ⎡ W₁  X  ⎤ Z )
                               ⎣ Xᵀ  W₂ ⎦
    subject to  A vec X = y
                ⎡ W₁  X  ⎤ ≽ 0          (710)
                ⎣ Xᵀ  W₂ ⎦

with semidefinite program

    minimize_Z  tr ( ⎡ W₁  X  ⎤⋆ Z )
                     ⎣ Xᵀ  W₂ ⎦
    subject to  0 ≼ Z ≼ I
                tr Z = 46 + 81 − 5          (711)

(which has an optimal solution that is known in closed form, p.555) until a rank-5 composite matrix is found.
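The closed-form solve of (711) is, again, a projector on minor eigenvectors of the optimal composite; a Matlab sketch (W1, X, W2 being hypothetical holders for the optimal blocks from (710)):

    % Closed-form update (711): direction matrix Z from the optimal composite of (710).
    C = [W1, X; X', W2];                          % optimal composite, (46+81) x (46+81)
    [V, L] = eig((C + C')/2);                     % symmetrize against roundoff
    [lam, order] = sort(diag(L), 'descend');
    Z = V(:, order(6:end))*V(:, order(6:end))';   % keep all but 5 principal eigenvectors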


With 1000 samples {y_i}, convergence occurs in two iterations; 700 samples require more than ten iterations, but reconstruction remains perfect. Reconstruction is independent of pseudorandom sequence parameters; e.g., binary sequences succeed with the same efficiency as Gaussian or uniformly distributed sequences.

4.6.1 rank-constraint midsummary

We find that this direction matrix idea works well and quite independently of desired rank or affine dimension. This idea of direction matrix is good principally because of its simplicity: When confronted with a problem otherwise convex if not for a rank constraint, then that constraint becomes a linear regularization term in the objective.

There exists a common thread through all these Examples; that being, convex iteration with a direction matrix as normal to a linear regularization (a generalization of the well-known trace heuristic). But each problem type (per Example) possesses its own idiosyncrasies that slightly modify how a rank-constrained optimal solution is actually obtained. The ball packing problem in §5.4.2.2.3, for example, requires a problem sequence in a progressively larger number of balls to find a good initial value for the direction matrix, whereas many of the examples in the present chapter require an initial value of 0. Finding a feasible Boolean vector in Example 4.6.0.0.4 requires a procedure to detect stalls, when other problems have no such requirement. Some problems require a careful weighting of the regularization term, whereas other problems do not; and so on.

It would be nice if there were a universally applicable method for constraining rank; one that is less susceptible to quirks of a particular problem type. Further, a poor initial value of the direction matrix from the regularization can lead to an erroneous result of convex iteration. That is because the underlying problem (633) is nonconvex with many local minima. We speculate the reason to be a simple dearth of optimal solutions;^{4.37} an unfortunate choice of initial search direction leading astray. Ease of solution by convex iteration occurs when numerous optimal solutions exist. With this speculation in mind, we now propose a further generalization of convex iteration that attempts to ameliorate quirks and unify problem types:

^{4.37} Recall that, in Convex Optimization, an optimal solution generally comes from a convex set of optimal solutions that can be large; that is more typical than a unique optimal solution.


4.7 Convex Iteration rank-1

We now develop a general method for constraining rank that first decomposes a given problem via standard diagonalization of matrices (§A.5). This method is motivated by observation that an optimal direction matrix can be diagonalizable simultaneously with an optimal variable matrix. This suggests minimization of an objective function directly in terms of eigenvalues. A second motivating observation is that variable orthogonal matrices seem easily found by convex iteration; e.g., Procrustes Example 4.6.0.0.2.

It turns out that this general method always requires solution to a rank-1 constrained problem, regardless of desired rank from the original problem. To demonstrate, we pose a semidefinite feasibility problem

    find  X ∈ S^n
    subject to  A svec X = b
                X ≽ 0
                rank X ≤ ρ          (712)

given an upper bound 0 < ρ < n on rank, vector b ∈ R^m, and typically fat full-rank

    A ≜ ⎡ svec(A₁)ᵀ ⎤
        ⎢     ⋮     ⎥  ∈ R^{m×n(n+1)/2}          (548)
        ⎣ svec(A_m)ᵀ ⎦

where A_i ∈ S^n, i = 1 … m are also given. Symmetric matrix vectorization svec is defined in (47). Thus

    A svec X = ⎡ tr(A₁X) ⎤
               ⎢    ⋮    ⎥          (549)
               ⎣ tr(A_mX) ⎦

This program (712) states the classical problem of finding a matrix of specified rank in the intersection of the positive semidefinite cone with a number m of hyperplanes in the subspace of symmetric matrices S^n. [20, §II.13] [21, §2.2]

Express the nonincreasingly ordered diagonalization of variable matrix

    X ≜ QΛQᵀ = ∑_{i=1}^{n} λ_i Q_ii ∈ S^n          (713)


which is a sum of rank-1 orthogonal projection matrices Q_ii weighted by eigenvalues λ_i, where Q_ij ≜ q_i q_jᵀ ∈ R^{n×n}, Q = [q₁ ⋯ q_n] ∈ R^{n×n}, Qᵀ = Q⁻¹, Λ_ii = λ_i ∈ R, and

    Λ = ⎡ λ₁           0  ⎤
        ⎢    λ₂           ⎥  ∈ S^n          (714)
        ⎢       ⋱         ⎥
        ⎣ 0ᵀ          λ_n ⎦

The fact

    Λ ≽ 0 ⇔ X ≽ 0          (1279)

allows splitting semidefinite feasibility problem (712) into two parts:

    minimize_Λ  ‖ A svec ( ∑_{i=1}^{ρ} λ_i Q_ii ) − b ‖
    subject to  δ(Λ) ∈ ⎡ R^ρ_+ ⎤          (715)
                       ⎣  0   ⎦

    minimize_Q  ‖ A svec ( ∑_{i=1}^{ρ} λ⋆_i Q_ii ) − b ‖
    subject to  Qᵀ = Q⁻¹          (716)

The linear equality constraint A svec X = b has been moved to the objective within a norm because these two problems (715) (716) are iterated; equality might only become feasible near convergence. This iteration always converges to a local minimum because the sequence of objective values is monotonic and nonincreasing; any real monotonically nonincreasing sequence converges. [190, §1.2] [30, §1.1] A rank-ρ matrix X feasible to the original problem (712) is found when the objective converges to 0. Positive semidefiniteness of matrix X with an upper bound ρ on rank is assured by the constraint on eigenvalue matrix Λ in convex problem (715); it means, the main diagonal of Λ must belong to the nonnegative orthant in a ρ-dimensional subspace of R^n.

The second problem (716) in the iteration is not convex. We propose


solving it by convex iteration: Make the assignment

    G = ⎡ q₁  ⎤ [ q₁ᵀ ⋯ q_ρᵀ ]  ∈ S^{nρ}
        ⎢  ⋮  ⎥
        ⎣ q_ρ ⎦

      = ⎡ Q₁₁   ⋯  Q₁ρ  ⎤     ⎡ q₁q₁ᵀ   ⋯  q₁q_ρᵀ  ⎤
        ⎢  ⋮    ⋱   ⋮   ⎥  ≜  ⎢   ⋮     ⋱    ⋮     ⎥          (717)
        ⎣ Q₁ρᵀ  ⋯  Q_ρρ ⎦     ⎣ q_ρq₁ᵀ  ⋯  q_ρq_ρᵀ ⎦

Given ρ eigenvalues λ⋆_i, then a problem equivalent to (716) is

    minimize_{Q_ij∈R^{n×n}}  ‖ A svec ( ∑_{i=1}^{ρ} λ⋆_i Q_ii ) − b ‖
    subject to  tr Q_ii = 1 ,   i = 1 … ρ
                tr Q_ij = 0 ,   i < j = 1 … ρ
                G = ⎡ Q₁₁   ⋯  Q₁ρ  ⎤   (≽ 0)
                    ⎢  ⋮    ⋱   ⋮   ⎥
                    ⎣ Q₁ρᵀ  ⋯  Q_ρρ ⎦
                rank G = 1          (718)

This problem is nonconvex only because of its rank constraint which, as before, is regularized: we iterate convex problem

    minimize_{Q_ij∈R^{n×n}}  ‖ A svec ( ∑_{i=1}^{ρ} λ⋆_i Q_ii ) − b ‖ + w〈G , W 〉
    subject to  tr Q_ii = 1 ,   i = 1 … ρ
                tr Q_ij = 0 ,   i < j = 1 … ρ
                G ≽ 0          (719)

with semidefinite program

    minimize_{W∈S^{nρ}}  〈G⋆ , W 〉
    subject to  0 ≼ W ≼ I
                tr W = nρ − 1          (720)


the latter providing direction of search W for a rank-1 matrix G in (719). An optimal solution to (720) has closed form (p.555). These convex problems (719) (720) are iterated with (715) until a rank-1 G matrix is found and the objective of (719) vanishes.

For a fixed number of equality constraints m and upper bound ρ, the feasible solution set in rank-constrained semidefinite feasibility problem (712) grows with matrix dimension n. Empirically, this semidefinite feasibility problem becomes easier to solve by the method of convex iteration as n increases. We find a good value of weight w ≈ 5 for randomized problems. Initial value of direction matrix W is not critical. With dimension n = 26 and number of equality constraints m = 10, convergence to a rank-ρ = 2 solution to semidefinite feasibility problem (712) is achieved for nearly all realizations in less than 10 iterations. (This finding is significant insofar as Barvinok's Proposition 2.9.3.0.1 predicts existence of matrices having rank 4 or less in this intersection, but his upper bound can be no tighter than 3.) For small n, stall detection is required; one numerical implementation is disclosed in §F.6.

This diagonalization decomposition technique is extensible to other problem types, of course, including optimization problems having objectives. Because of a greatly expanded feasible set, for example,

    find  X ∈ S^n
    subject to  A svec X ≽ b
                X ≽ 0
                rank X ≤ ρ          (721)

this relaxation of (712) [165, §III] is more easily solved by convex iteration rank-1. (The inequality in A remains in the constraints after diagonalization.) Because of the nonconvex nature of the underlying problem (633), more generally, there can be no proof of global convergence of convex iteration from an arbitrary initialization; although, local convergence is guaranteed by virtue of monotonically nonincreasing objective sequences. [190, §1.2] [30, §1.1]
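The closed form for (720), like its predecessors, is a projector complementary to the principal eigenvector of G⋆; a Matlab sketch (Gstar being a hypothetical holder for the optimal matrix from (719)):

    % Closed-form solution to (720): Gstar is the optimal (n*rho)x(n*rho) matrix from (719).
    [V, L] = eig((Gstar + Gstar')/2);
    [lam, order] = sort(diag(L), 'descend');
    W = eye(n*rho) - V(:, order(1))*V(:, order(1))';   % 0 <= W <= I, trace(W) = n*rho - 1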




Chapter 5

Euclidean Distance Matrix

    These results were obtained by Schoenberg (1935), a surprisingly late date for such a fundamental property of Euclidean geometry.   −John Clifford Gower [112, §3]

By itself, distance information between many points in Euclidean space is lacking. We might want to know more; such as, relative or absolute position or dimension of some hull. A question naturally arising in some fields (e.g., geodesy, economics, genetics, psychology, biochemistry, engineering) [69] asks what facts can be deduced given only distance information. What can we know about the underlying points that the distance information purports to describe? We also ask what it means when given distance information is incomplete; or suppose the distance information is not reliable, available, or specified only by certain tolerances (affine inequalities). These questions motivate a study of interpoint distance, well represented in any spatial dimension by a simple matrix from linear algebra.^{5.1} In what follows, we will answer some of these questions via Euclidean distance matrices.

^{5.1} e.g., ◦√D ∈ R^{N×N}, a classical two-dimensional matrix representation of absolute interpoint distance because its entries (in ordered rows and columns) can be written neatly on a piece of paper. Matrix D will be reserved throughout to hold distance-square.


Figure 79: Convex hull of three points (N = 3) is shaded in R³ (n = 3). Dotted lines are imagined vectors to points.

5.1 EDM

Euclidean space R^n is a finite-dimensional real vector space having an inner product defined on it, hence a metric as well. [167, §3.1] A Euclidean distance matrix, an EDM in R^{N×N}_+, is an exhaustive table of distance-square d_ij between points taken by pair from a list of N points {x_l , l = 1 … N} in R^n; the squared metric, the measure of distance-square:

    d_ij = ‖x_i − x_j‖²_2 ≜ 〈x_i − x_j , x_i − x_j〉          (722)

Each point is labelled ordinally, hence the row or column index of an EDM, i or j = 1 … N, individually addresses all the points in the list.

Consider the following example of an EDM for the case N = 3 :

    D = [d_ij] = ⎡ d₁₁  d₁₂  d₁₃ ⎤   ⎡ 0    d₁₂  d₁₃ ⎤   ⎡ 0  1  5 ⎤
                 ⎢ d₂₁  d₂₂  d₂₃ ⎥ = ⎢ d₁₂  0    d₂₃ ⎥ = ⎢ 1  0  4 ⎥          (723)
                 ⎣ d₃₁  d₃₂  d₃₃ ⎦   ⎣ d₁₃  d₂₃  0   ⎦   ⎣ 5  4  0 ⎦

Matrix D has N² entries but only N(N−1)/2 pieces of information.
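A table like (723) is built mechanically from any point list, using the Gram-matrix identity d_ij = g_ii + g_jj − 2g_ij seen in the previous chapter; a minimal Matlab sketch (X here being a hypothetical n×N list of points as columns):

    % EDM (722) from a list X of N points in R^n (columns of X): a sketch.
    G = X'*X;                               % Gram matrix of the list
    g = diag(G);                            % squared norms ||x_i||^2
    D = g*ones(1,N) + ones(N,1)*g' - 2*G;   % D(i,j) = ||x_i - x_j||^2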


correspond to $D$ in (723). Such a list is not unique because any rotation, reflection, or translation (§5.5) of the points in Figure 79 would produce the same EDM $D$.

5.2 First metric properties

For $i,j = 1\ldots N$, distance between points $x_i$ and $x_j$ must satisfy the defining requirements imposed upon any metric space: [167, §1.1] [190, §1.7] namely, for Euclidean metric $\sqrt{d_{ij}}$ (§5.4) in $\mathbb{R}^n$

1. $\sqrt{d_{ij}} \geq 0\,,\ i \neq j$ (nonnegativity)
2. $\sqrt{d_{ij}} = 0\,,\ i = j$ (self-distance)
3. $\sqrt{d_{ij}} = \sqrt{d_{ji}}$ (symmetry)
4. $\sqrt{d_{ij}} \leq \sqrt{d_{ik}} + \sqrt{d_{kj}}\,,\ i \neq j \neq k$ (triangle inequality)

Then all entries of an EDM must be in concord with these Euclidean metric properties: specifically, each entry must be nonnegative,^{5.2} the main diagonal must be 0,^{5.3} and an EDM must be symmetric. The fourth property provides upper and lower bounds for each entry. Property 4 is true more generally when there are no restrictions on indices $i,j,k$, but furnishes no new information.

5.3 ∃ fifth Euclidean metric property

The four properties of the Euclidean metric provide information insufficient to certify that a bounded convex polyhedron more complicated than a triangle has a Euclidean realization. [112, §2] Yet any list of points or the vertices of any bounded convex polyhedron must conform to the properties.

5.2 Implicit from the terminology, $\sqrt{d_{ij}} \geq 0 \Leftrightarrow d_{ij} \geq 0$ is always assumed.
5.3 What we call self-distance, Marsden calls nondegeneracy. [190, §1.6] Kreyszig calls these first metric properties axioms of the metric; [167, p.4] Blumenthal refers to them as postulates. [37, p.15]


5.3.0.0.1 Example. Triangle.
Consider the EDM in (723), but missing one of its entries:

$$ D = \begin{bmatrix} 0 & 1 & d_{13}\\ 1 & 0 & 4\\ d_{31} & 4 & 0 \end{bmatrix} \tag{724} $$

Can we determine unknown entries of $D$ by applying the metric properties? Property 1 demands $\sqrt{d_{13}}, \sqrt{d_{31}} \geq 0$, property 2 requires the main diagonal be 0, while property 3 makes $\sqrt{d_{31}} = \sqrt{d_{13}}$. The fourth property tells us

$$ 1 \leq \sqrt{d_{13}} \leq 3 \tag{725} $$

Indeed, described over that closed interval $[1, 3]$ is a family of triangular polyhedra whose angle at vertex $x_2$ varies from 0 to $\pi$ radians. So, yes we can determine the unknown entries of $D$, but they are not unique; nor should they be from the information given for this example.

5.3.0.0.2 Example. Small completion problem, I.
Now consider the polyhedron in Figure 80(b) formed from an unknown list $\{x_1, x_2, x_3, x_4\}$. The corresponding EDM less one critical piece of information, $d_{14}$, is given by

$$ D = \begin{bmatrix} 0 & 1 & 5 & d_{14}\\ 1 & 0 & 4 & 1\\ 5 & 4 & 0 & 1\\ d_{14} & 1 & 1 & 0 \end{bmatrix} \tag{726} $$

From metric property 4 we may write a few inequalities for the two triangles common to $d_{14}$; we find

$$ \sqrt{5} - 1 \leq \sqrt{d_{14}} \leq 2 \tag{727} $$

We cannot further narrow those bounds on $\sqrt{d_{14}}$ using only the four metric properties (§5.8.3.1.1). Yet there is only one possible choice for $\sqrt{d_{14}}$ because points $x_2, x_3, x_4$ must be collinear. All other values of $\sqrt{d_{14}}$ in the interval $[\sqrt{5}-1,\ 2]$ specify impossible distances in any dimension; id est, in this particular example the triangle inequality does not yield an interval for $\sqrt{d_{14}}$ over which a family of convex polyhedra can be reconstructed.


Figure 80: (a) Complete dimensionless EDM graph. (b) Emphasizing obscured segments $x_2x_4$, $x_4x_3$, and $x_2x_3$, now only five ($2N-3$) absolute distances are specified. EDM so represented is incomplete, missing $d_{14}$ as in (726), yet the isometric reconstruction (§5.4.2.2.5) is unique as proved in §5.9.3.0.1 and §5.14.4.1.1. First four properties of Euclidean metric are not a recipe for reconstruction of this polyhedron.

We will return to this simple Example 5.3.0.0.2 to illustrate more elegant methods of solution in §5.8.3.1.1, §5.9.3.0.1, and §5.14.4.1.1. Until then, we can deduce some general principles from the foregoing examples:

• Unknown $d_{ij}$ of an EDM are not necessarily uniquely determinable.
• The triangle inequality does not produce necessarily tight bounds.^{5.4}
• Four Euclidean metric properties are insufficient for reconstruction.

5.4 The term tight with reference to an inequality means equality is achievable.


5.3.1 Lookahead

There must exist at least one requirement more than the four properties of the Euclidean metric that makes them altogether necessary and sufficient to certify realizability of bounded convex polyhedra. Indeed, there are infinitely many more; there are precisely $N+1$ necessary and sufficient Euclidean metric requirements for $N$ points constituting a generating list (§2.3.2). Here is the fifth requirement:

5.3.1.0.1 Fifth Euclidean metric property. Relative-angle inequality.
(confer §5.14.2.1.1) Augmenting the four fundamental properties of the Euclidean metric in $\mathbb{R}^n$, for all $i,j,l \neq k \in \{1\ldots N\}$, $i < j < l$, and for $N \geq 4$ distinct points $\{x_k\}$, the inequalities

$$
\begin{array}{c}
\cos(\theta_{ikl} + \theta_{lkj}) \ \leq\ \cos\theta_{ikj} \ \leq\ \cos(\theta_{ikl} - \theta_{lkj})\\[2pt]
0 \leq \theta_{ikl}\,, \theta_{lkj}\,, \theta_{ikj} \leq \pi
\end{array} \tag{728}
$$

where $\theta_{ikj} = \theta_{jki}$ is the angle between vector pairs at vertex $x_k$ (Figure 81), must be satisfied at each point $x_k$ regardless of affine dimension.


Figure 81: Nomenclature for fifth Euclidean metric property (angles $\theta_{ikj}$, $\theta_{ikl}$, $\theta_{jkl}$ among four points $i,j,k,l$). Each angle $\theta$ is made by a vector pair at vertex $k$ while $i,j,k,l$ index four points. The fifth property is necessary for realization of four or more points; a reckoning by three angles in any dimension.


Now we develop some invaluable concepts, moving toward a link of the Euclidean metric properties to matrix criteria.

5.4 EDM definition

Ascribe points in a list $\{x_l \in \mathbb{R}^n,\ l = 1\ldots N\}$ to the columns of a matrix

$$ X = [\,x_1\ \cdots\ x_N\,] \in \mathbb{R}^{n\times N} \tag{66} $$

where $N$ is regarded as cardinality of list $X$. When matrix $D = [d_{ij}]$ is an EDM, its entries must be related to those points constituting the list by the Euclidean distance-square: for $i,j = 1\ldots N$ (§A.1.1 no.23)

$$
\begin{aligned}
d_{ij} &= \|x_i - x_j\|^2 = (x_i - x_j)^T(x_i - x_j) = \|x_i\|^2 + \|x_j\|^2 - 2x_i^Tx_j\\
&= \begin{bmatrix} x_i^T & x_j^T \end{bmatrix}\begin{bmatrix} I & -I\\ -I & I \end{bmatrix}\begin{bmatrix} x_i\\ x_j \end{bmatrix}\\
&= \mathrm{vec}(X)^T(\Phi_{ij} \otimes I)\,\mathrm{vec}\,X \ =\ \langle \Phi_{ij}\,,\ X^TX\rangle
\end{aligned} \tag{730}
$$

where

$$ \mathrm{vec}\,X = \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_N \end{bmatrix} \in \mathbb{R}^{nN} \tag{731} $$

and where $\Phi_{ij} \otimes I$ has $I \in \mathbb{S}^n$ in its $ii$th and $jj$th block of entries while $-I \in \mathbb{S}^n$ fills its $ij$th and $ji$th block; id est,

$$
\begin{aligned}
\Phi_{ij} &\triangleq \delta\big((e_ie_j^T + e_je_i^T)\mathbf{1}\big) - (e_ie_j^T + e_je_i^T)\ \in \mathbb{S}^N_+\\
&= e_ie_i^T + e_je_j^T - e_ie_j^T - e_je_i^T\\
&= (e_i - e_j)(e_i - e_j)^T
\end{aligned} \tag{732}
$$

where $\{e_i \in \mathbb{R}^N,\ i = 1\ldots N\}$ is the set of standard basis vectors, and $\otimes$ signifies the Kronecker product (§D.1.2.1). Thus each entry $d_{ij}$ is a convex quadratic function [46, §3, §4] of $\mathrm{vec}\,X$ (30). [232, §6]
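Identity (730) is easily exercised numerically; a Matlab sketch with arbitrary data:

    % Verify d_ij = <Phi_ij, X'X> = vec(X)'*(Phi_ij kron I)*vec(X)   (730)
    n = 3;  N = 5;  X = randn(n,N);
    i = 2;  j = 4;  e = eye(N);
    Phi  = (e(:,i)-e(:,j))*(e(:,i)-e(:,j))';     % Phi_ij   (732)
    dij1 = norm(X(:,i)-X(:,j))^2;                % direct distance-square
    dij2 = trace(Phi*(X'*X));                    % <Phi_ij, X'X>
    dij3 = X(:)'*kron(Phi,eye(n))*X(:);          % vectorized form
    % dij1, dij2, dij3 agree to machine precision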


The collection of all Euclidean distance matrices $\mathrm{EDM}^N$ is a convex subset of $\mathbb{R}^{N\times N}_+$ called the EDM cone (§6, Figure 114, p.456);

$$ 0 \in \mathrm{EDM}^N \subseteq \mathbb{S}^N_h \cap \mathbb{R}^{N\times N}_+ \subset \mathbb{S}^N \tag{733} $$

An EDM $D$ must be expressible as a function of some list $X$; id est, it must have the form

$$ D(X) \triangleq \delta(X^TX)\mathbf{1}^T + \mathbf{1}\delta(X^TX)^T - 2X^TX \ \in \mathrm{EDM}^N \tag{734} $$
$$ = [\,\mathrm{vec}(X)^T(\Phi_{ij} \otimes I)\,\mathrm{vec}\,X\ ,\quad i,j = 1\ldots N\,] \tag{735} $$

Function $D(X)$ will make an EDM given any $X \in \mathbb{R}^{n\times N}$, conversely, but $D(X)$ is not a convex function of $X$ (§5.4.1). Now the EDM cone may be described:

$$ \mathrm{EDM}^N = \big\{ D(X)\ |\ X \in \mathbb{R}^{N-1\times N} \big\} \tag{736} $$

Expression $D(X)$ is a matrix definition of EDM and so conforms to the Euclidean metric properties:

• Nonnegativity of EDM entries (property 1, §5.2) is obvious from the distance-square definition (730), so holds for any $D$ expressible in the form $D(X)$ in (734).

• When we say $D$ is an EDM, reading from (734), it implicitly means the main diagonal must be 0 (property 2, self-distance) and $D$ must be symmetric (property 3); $\delta(D) = 0$ and $D^T = D$ or, equivalently, $D \in \mathbb{S}^N_h$ are necessary matrix criteria.

5.4.0.1 homogeneity

Function $D(X)$ is homogeneous in the sense, for $\zeta \in \mathbb{R}$

$$ \sqrt[\circ]{D(\zeta X)} = |\zeta|\,\sqrt[\circ]{D(X)} \tag{737} $$

where the positive square root is entrywise.

Any nonnegatively scaled EDM remains an EDM; id est, the matrix class EDM is invariant to nonnegative scaling ($\alpha D(X)$ for $\alpha \geq 0$) because all EDMs of dimension $N$ constitute a convex cone $\mathrm{EDM}^N$ (§6, Figure 101).
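The list-form operator (734) is one line of Matlab; a sketch on a small hypothetical list:

    % List-form EDM operator D(X) (734)
    Dx = @(X) diag(X'*X)*ones(1,size(X,2)) ...
            + ones(size(X,2),1)*diag(X'*X)' - 2*(X'*X);
    D = Dx([0 1 2; 0 0 1])                   % 2-D list, N = 3
    % D is symmetric and hollow with nonnegative entries, as required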


5.4.1 $-V_N^T D(X)V_N$ convexity

We saw that EDM entries $d_{ij}$ are convex quadratic functions of $\begin{bmatrix} x_i\\ x_j \end{bmatrix}$. Yet $-D(X)$ (734) is not a quasiconvex function of matrix $X \in \mathbb{R}^{n\times N}$ because the second directional derivative (§3.3)

$$ -\frac{d^2}{dt^2}\bigg|_{t=0} D(X + tY) = 2\big({-\delta}(Y^TY)\mathbf{1}^T - \mathbf{1}\delta(Y^TY)^T + 2Y^TY\big) \tag{738} $$

is indefinite for any $Y \in \mathbb{R}^{n\times N}$ since its main diagonal is 0. [110, §4.2.8] [150, §7.1, prob.2] Hence $-D(X)$ cannot be convex in $X$ either.

The outcome is different when instead we consider

$$ -V_N^T D(X)V_N = 2V_N^T X^TX V_N \tag{739} $$

where we introduce the full-rank skinny Schoenberg auxiliary matrix (§B.4.2)

$$
V_N \triangleq \frac{1}{\sqrt{2}}\begin{bmatrix}
-1 & -1 & \cdots & -1\\
1 & & & 0\\
& 1 & &\\
& & \ddots &\\
0 & & & 1
\end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} -\mathbf{1}^T\\ I \end{bmatrix} \in \mathbb{R}^{N\times N-1} \tag{740}
$$

($\mathcal{N}(V_N) = 0$) having range

$$ \mathcal{R}(V_N) = \mathcal{N}(\mathbf{1}^T)\ ,\qquad V_N^T\mathbf{1} = 0 \tag{741} $$

Matrix-valued function (739) meets the criterion for convexity in §3.2.3.0.2 over its domain that is all of $\mathbb{R}^{n\times N}$; videlicet, for any $Y \in \mathbb{R}^{n\times N}$

$$ -\frac{d^2}{dt^2}\, V_N^T D(X + tY)V_N = 4V_N^T Y^TY V_N \succeq 0 \tag{742} $$

Quadratic matrix-valued function $-V_N^T D(X)V_N$ is therefore convex in $X$ achieving its minimum, with respect to a positive semidefinite cone (§2.7.2.2), at $X = 0$. When the penultimate number of points exceeds the dimension of the space, $n < N-1$, strict convexity of the quadratic (739) becomes impossible because (742) could not then be positive definite.
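A short numerical check of (739) and (741), in Matlab (Dx as in the previous sketch):

    % Schoenberg auxiliary matrix V_N (740) and identity (739)
    Dx = @(X) diag(X'*X)*ones(1,size(X,2)) ...
            + ones(size(X,2),1)*diag(X'*X)' - 2*(X'*X);      % (734)
    N = 5;  n = 3;  X = randn(n,N);
    Vn = [-ones(1,N-1); eye(N-1)]/sqrt(2);
    norm(Vn'*ones(N,1))                      % 0 : V_N'*1 = 0        (741)
    norm(-Vn'*Dx(X)*Vn - 2*Vn'*(X'*X)*Vn)    % 0 : identity (739)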


5.4.2 Gram-form EDM definition

Positive semidefinite matrix $X^TX$ in (734), formed from inner product of list $X$, is known as a Gram matrix; [183, §3.6]

$$
G \triangleq X^TX = \begin{bmatrix} x_1^T\\ \vdots\\ x_N^T \end{bmatrix}[\,x_1\ \cdots\ x_N\,]
= \begin{bmatrix}
\|x_1\|^2 & x_1^Tx_2 & x_1^Tx_3 & \cdots & x_1^Tx_N\\
x_2^Tx_1 & \|x_2\|^2 & x_2^Tx_3 & \cdots & x_2^Tx_N\\
x_3^Tx_1 & x_3^Tx_2 & \|x_3\|^2 & \ddots & x_3^Tx_N\\
\vdots & \vdots & \ddots & \ddots & \vdots\\
x_N^Tx_1 & x_N^Tx_2 & x_N^Tx_3 & \cdots & \|x_N\|^2
\end{bmatrix} \in \mathbb{S}^N_+
$$

$$
= \delta\!\left(\begin{bmatrix} \|x_1\|\\ \|x_2\|\\ \vdots\\ \|x_N\| \end{bmatrix}\right)
\begin{bmatrix}
1 & \cos\psi_{12} & \cos\psi_{13} & \cdots & \cos\psi_{1N}\\
\cos\psi_{12} & 1 & \cos\psi_{23} & \cdots & \cos\psi_{2N}\\
\cos\psi_{13} & \cos\psi_{23} & 1 & \ddots & \cos\psi_{3N}\\
\vdots & \vdots & \ddots & \ddots & \vdots\\
\cos\psi_{1N} & \cos\psi_{2N} & \cos\psi_{3N} & \cdots & 1
\end{bmatrix}
\delta\!\left(\begin{bmatrix} \|x_1\|\\ \|x_2\|\\ \vdots\\ \|x_N\| \end{bmatrix}\right)
\triangleq \sqrt{\delta^2(G)}\ \Psi\ \sqrt{\delta^2(G)} \tag{743}
$$

where $\psi_{ij}$ (763) is angle between vectors $x_i$ and $x_j$, and where $\delta^2$ denotes a diagonal matrix in this case. Positive semidefiniteness of interpoint angle matrix $\Psi$ implies positive semidefiniteness of Gram matrix $G$; [46, §8.3]

$$ G \succeq 0 \ \Leftarrow\ \Psi \succeq 0 \tag{744} $$

When $\delta^2(G)$ is nonsingular, then $G \succeq 0 \Leftrightarrow \Psi \succeq 0$. (§A.3.1.0.5)

Distance-square $d_{ij}$ (730) is related to Gram matrix entries $G^T = G \triangleq [g_{ij}]$:

$$ d_{ij} = g_{ii} + g_{jj} - 2g_{ij} = \langle \Phi_{ij}\,,\ G\rangle \tag{745} $$

where $\Phi_{ij}$ is defined in (732). Hence the linear EDM definition

$$
\left.\begin{array}{l}
D(G) \triangleq \delta(G)\mathbf{1}^T + \mathbf{1}\delta(G)^T - 2G\ \in \mathrm{EDM}^N\\
\phantom{D(G)} = [\,\langle \Phi_{ij}\,,\ G\rangle\ ,\quad i,j = 1\ldots N\,]
\end{array}\right\}\ \Leftarrow\ G \succeq 0 \tag{746}
$$

The EDM cone may be described, (confer (823) (829))

$$ \mathrm{EDM}^N = \big\{ D(G)\ |\ G \in \mathbb{S}^N_+ \big\} \tag{747} $$
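Decomposition (743) is readily exercised numerically; a Matlab sketch, assuming no point sits at the origin so that $\delta^2(G)$ is invertible:

    % Angle decomposition (743) of a Gram matrix
    X = randn(3,5);  G = X'*X;               % Gram matrix of a random list
    len = sqrt(diag(G));                     % point lengths ||x_l||
    Psi = diag(1./len)*G*diag(1./len);       % interpoint angle matrix
    min(eig(Psi))                            % >= 0 : here G PSD <=> Psi PSD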


5.4.2.1 First point at origin

Assume the first point $x_1$ in an unknown list $X$ resides at the origin;

$$ Xe_1 = 0 \ \Leftrightarrow\ Ge_1 = 0 \tag{748} $$

Consider the symmetric translation $(I - \mathbf{1}e_1^T)D(G)(I - e_1\mathbf{1}^T)$ that shifts the first row and column of $D(G)$ to the origin; setting Gram-form EDM operator $D(G) = D$ for convenience,

$$ -\big(D - (De_1\mathbf{1}^T + \mathbf{1}e_1^TD) + \mathbf{1}e_1^TDe_1\mathbf{1}^T\big)\tfrac12 = G - (Ge_1\mathbf{1}^T + \mathbf{1}e_1^TG) + \mathbf{1}e_1^TGe_1\mathbf{1}^T \quad \forall X \tag{749} $$

where

$$ e_1 \triangleq \begin{bmatrix} 1\\ 0\\ \vdots\\ 0 \end{bmatrix} \tag{750} $$

is the first vector from the standard basis. Then it follows for $D \in \mathbb{S}^N_h$

$$
\begin{aligned}
G &= -\big(D - (De_1\mathbf{1}^T + \mathbf{1}e_1^TD)\big)\tfrac12\ ,\quad x_1 = 0\\
&= -\begin{bmatrix} \mathbf{0} & \sqrt{2}V_N \end{bmatrix}^T D \begin{bmatrix} \mathbf{0} & \sqrt{2}V_N \end{bmatrix}\tfrac12\\
&= \begin{bmatrix} 0 & \mathbf{0}^T\\ \mathbf{0} & -V_N^TDV_N \end{bmatrix}\\
V_N^TGV_N &= -V_N^TDV_N\,\tfrac12 \quad \forall X
\end{aligned} \tag{751}
$$

where

$$ I - e_1\mathbf{1}^T = \begin{bmatrix} \mathbf{0} & \sqrt{2}V_N \end{bmatrix} \tag{752} $$

is a projector nonorthogonally projecting (§E.1) on

$$ \mathbb{S}^N_1 = \{G \in \mathbb{S}^N\ |\ Ge_1 = 0\} = \Big\{ \begin{bmatrix} \mathbf{0} & \sqrt{2}V_N \end{bmatrix}^T Y \begin{bmatrix} \mathbf{0} & \sqrt{2}V_N \end{bmatrix}\ \Big|\ Y \in \mathbb{S}^N \Big\} \tag{1796} $$

in the Euclidean sense. From (751) we get sufficiency of the first matrix criterion for an EDM proved by Schoenberg in 1935; [236]^{5.7}

5.7 From (741) we know $\mathcal{R}(V_N) = \mathcal{N}(\mathbf{1}^T)$, so (753) is the same as (729). In fact, any matrix $V$ in place of $V_N$ will satisfy (753) whenever $\mathcal{R}(V) = \mathcal{R}(V_N) = \mathcal{N}(\mathbf{1}^T)$. But $V_N$ is the matrix implicit in Schoenberg's seminal exposition.


$$
D \in \mathrm{EDM}^N \quad\Leftrightarrow\quad \begin{cases} -V_N^TDV_N \in \mathbb{S}^{N-1}_+\\ D \in \mathbb{S}^N_h \end{cases} \tag{753}
$$

We provide a rigorous, complete, more geometric proof of this Schoenberg criterion in §5.9.1.0.3.

By substituting $G = \begin{bmatrix} 0 & \mathbf{0}^T\\ \mathbf{0} & -V_N^TDV_N \end{bmatrix}$ (751) into $D(G)$ (746), assuming $x_1 = 0$,

$$
D = \begin{bmatrix} 0\\ \delta(-V_N^TDV_N) \end{bmatrix}\mathbf{1}^T + \mathbf{1}\begin{bmatrix} 0 & \delta(-V_N^TDV_N)^T \end{bmatrix} - 2\begin{bmatrix} 0 & \mathbf{0}^T\\ \mathbf{0} & -V_N^TDV_N \end{bmatrix} \tag{754}
$$

We provide details of this bijection in §5.6.2.

5.4.2.2 0 geometric center

Assume the geometric center (§5.5.1.0.1) of an unknown list $X$ is the origin;

$$ X\mathbf{1} = 0 \ \Leftrightarrow\ G\mathbf{1} = 0 \tag{755} $$

Now consider the calculation $(I - \tfrac{1}{N}\mathbf{1}\mathbf{1}^T)D(G)(I - \tfrac{1}{N}\mathbf{1}\mathbf{1}^T)$, a geometric centering or projection operation. (§E.7.2.0.2) Setting $D(G) = D$ for convenience as in §5.4.2.1,

$$
\begin{aligned}
G &= -\big(D - \tfrac{1}{N}(D\mathbf{1}\mathbf{1}^T + \mathbf{1}\mathbf{1}^TD) + \tfrac{1}{N^2}\mathbf{1}\mathbf{1}^TD\mathbf{1}\mathbf{1}^T\big)\tfrac12\ ,\quad X\mathbf{1} = 0\\
&= -VDV\,\tfrac12\\
VGV &= -VDV\,\tfrac12 \quad \forall X
\end{aligned} \tag{756}
$$

where more properties of the auxiliary (geometric centering, projection) matrix

$$ V \triangleq I - \tfrac{1}{N}\mathbf{1}\mathbf{1}^T \in \mathbb{S}^N \tag{757} $$

are found in §B.4. From (756) and the assumption $D \in \mathbb{S}^N_h$ we get sufficiency of the more popular form of Schoenberg's criterion:


$$
D \in \mathrm{EDM}^N \quad\Leftrightarrow\quad \begin{cases} -VDV \in \mathbb{S}^N_+\\ D \in \mathbb{S}^N_h \end{cases} \tag{758}
$$

Of particular utility when $D \in \mathrm{EDM}^N$ is the fact, (§B.4.2 no.20) (730)

$$
\begin{aligned}
\mathrm{tr}\big({-VDV\,\tfrac12}\big) &= \frac{1}{2N}\sum_{i,j} d_{ij} = \frac{1}{2N}\,\mathrm{vec}(X)^T\Big(\sum_{i,j}\Phi_{ij}\otimes I\Big)\mathrm{vec}\,X = \mathrm{tr}(VGV)\ ,\quad G \succeq 0\\
&= \mathrm{tr}\,G = \sum_{l=1}^N \|x_l\|^2 = \|X\|_F^2\ ,\quad X\mathbf{1} = 0
\end{aligned} \tag{759}
$$

where $\sum\Phi_{ij} \in \mathbb{S}^N_+$ (732), therefore convex in $\mathrm{vec}\,X$. We will find this trace useful as a heuristic to minimize affine dimension of an unknown list arranged columnar in $X$ (§7.2.2), but it tends to facilitate reconstruction of a list configuration having least energy; id est, it compacts a reconstructed list by minimizing total norm-square of the vertices.

By substituting $G = -VDV\,\tfrac12$ (756) into $D(G)$ (746), assuming $X\mathbf{1} = 0$ (confer §5.6.1)

$$ D = \delta\big({-VDV\,\tfrac12}\big)\mathbf{1}^T + \mathbf{1}\delta\big({-VDV\,\tfrac12}\big)^T - 2\big({-VDV\,\tfrac12}\big) \tag{760} $$

These relationships will allow combination of distance and Gram constraints in any optimization problem we may pose:

• Constraining all main diagonal entries of a Gram matrix to 1, for example, is equivalent to the constraint that all points lie on a hypersphere (§5.9.1.0.2) of radius 1 centered at the origin. This is equivalent to the EDM constraint: $D\mathbf{1} = 2N\mathbf{1}$. [61, p.116] Any further constraint on that Gram matrix then applies only to interpoint angle $\Psi$.

• More generally, interpoint angle $\Psi$ can be constrained by fixing all the individual point lengths $\delta(G)^{1/2}$; then

$$ \Psi = -\tfrac12\,\delta^2(G)^{-1/2}\, VDV\, \delta^2(G)^{-1/2} \tag{761} $$
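Criterion (758) yields a simple numerical membership test; a Matlab sketch:

    % EDM membership test from Schoenberg criterion (758)
    Dx = @(X) diag(X'*X)*ones(1,size(X,2)) ...
            + ones(size(X,2),1)*diag(X'*X)' - 2*(X'*X);
    N = 6;  D = Dx(randn(3,N));
    V = eye(N) - ones(N)/N;                  % centering matrix (757)
    isedm = norm(D-D',inf) < 1e-10 ...       % symmetry
         && norm(diag(D),inf) < 1e-10 ...    % hollowness: D in S^N_h
         && min(eig(-V*D*V/2)) > -1e-10      % -VDV positive semidefinite
    % for a centered list, trace(-V*D*V/2) returns ||X||_F^2   (759)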


5.4.2.2.1 Example. List member constraints via Gram matrix.
Capitalizing on identity (756) relating Gram and EDM $D$ matrices, a constraint set such as

$$
\left.\begin{array}{l}
\mathrm{tr}\big({-\tfrac12}VDV\, e_ie_i^T\big) = \|x_i\|^2\\
\mathrm{tr}\big({-\tfrac12}VDV\,(e_ie_j^T + e_je_i^T)\tfrac12\big) = x_i^Tx_j\\
\mathrm{tr}\big({-\tfrac12}VDV\, e_je_j^T\big) = \|x_j\|^2
\end{array}\right\} \tag{762}
$$

relates list member $x_i$ to $x_j$ to within an isometry through inner-product identity [290, §1-7]

$$ \cos\psi_{ij} = \frac{x_i^Tx_j}{\|x_i\|\,\|x_j\|} \tag{763} $$

For $M$ list members, there are a total of $M(M+1)/2$ such constraints.

Consider the academic problem of finding a Gram matrix subject to constraints on each and every entry of the corresponding EDM:

$$
\begin{array}{ll}
\underset{D\in\mathbb{S}^N_h}{\text{find}} & -VDV\,\tfrac12 \in \mathbb{S}^N\\
\text{subject to} & \big\langle D\,,\ (e_ie_j^T + e_je_i^T)\tfrac12\big\rangle = \check d_{ij}\ ,\quad i,j = 1\ldots N\ ,\ i<j\\
& -VDV \succeq 0
\end{array} \tag{764}
$$

where the $\check d_{ij}$ are given nonnegative constants. EDM $D$ can, of course, be replaced with the equivalent Gram-form (746). Requiring only the self-adjointness property (1249) of the main-diagonal linear operator $\delta$ we get, for $A \in \mathbb{S}^N$

$$ \langle D\,,\ A\rangle = \big\langle \delta(G)\mathbf{1}^T + \mathbf{1}\delta(G)^T - 2G\,,\ A\big\rangle = \big\langle G\,,\ \delta(A\mathbf{1}) - A\big\rangle\, 2 \tag{765} $$

Then the problem equivalent to (764) becomes, for $G \in \mathbb{S}^N_c \Leftrightarrow G\mathbf{1} = 0$,

$$
\begin{array}{ll}
\underset{G\in\mathbb{S}^N_c}{\text{find}} & G \in \mathbb{S}^N\\
\text{subject to} & \big\langle G\,,\ \delta\big((e_ie_j^T + e_je_i^T)\mathbf{1}\big) - (e_ie_j^T + e_je_i^T)\big\rangle = \check d_{ij}\ ,\quad i,j = 1\ldots N\ ,\ i<j\\
& G \succeq 0
\end{array} \tag{766}
$$
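A sketch of feasibility problem (766), assuming the CVX Matlab package (an assumption; the text names no solver) and stating the equality constraints in the equivalent entrywise form (745); the targets ď_ij here come from a hypothetical random list:

    % Gram-form feasibility (766) via (745)
    Dx  = @(X) diag(X'*X)*ones(1,size(X,2)) ...
             + ones(size(X,2),1)*diag(X'*X)' - 2*(X'*X);
    N = 4;  dch = Dx(randn(2,N));            % target distances d-check
    cvx_begin sdp quiet
        variable G(N,N) symmetric
        G*ones(N,1) == 0;                    % geometric center at origin
        G >= 0;                              % G positive semidefinite
        for i = 1:N-1
            for j = i+1:N
                G(i,i) + G(j,j) - 2*G(i,j) == dch(i,j);   % (745)
            end
        end
    cvx_end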


Figure 82: Arbitrary hexagon in $\mathbb{R}^3$ whose vertices are labelled clockwise.

Barvinok's Proposition 2.9.3.0.1 predicts existence for either formulation (764) or (766), such that implicit equality constraints induced by subspace membership are ignored,

$$ \mathrm{rank}\,G\,,\ \mathrm{rank}\,VDV \ \leq\ \left\lfloor \frac{\sqrt{8(N(N-1)/2)+1}-1}{2} \right\rfloor = N-1 \tag{767} $$

because, in each case, the Gram matrix is confined to a face of positive semidefinite cone $\mathbb{S}^N_+$ isomorphic with $\mathbb{S}^{N-1}_+$ (§6.7.1). (§E.7.2.0.2) This bound is tight (§5.7.1.1) and is the greatest upper bound.^{5.8}

5.4.2.2.2 Example. Hexagon.
Barvinok [22, §2.6] poses a problem in geometric realizability of an arbitrary hexagon (Figure 82) having:

1. prescribed (one-dimensional) face-lengths $l$
2. prescribed angles $\varphi$ between the three pairs of opposing faces
3. a constraint on the sum of norm-square of each and every vertex $x$;

5.8 $-VDV|_{N\leftarrow 1} = 0$ (§B.4.1)


ten affine equality constraints in all on a Gram matrix $G \in \mathbb{S}^6$ (756). Let's realize this as a convex feasibility problem (with constraints written in the same order) also assuming 0 geometric center (755):

$$
\begin{array}{ll}
\underset{D\in\mathbb{S}^6_h}{\text{find}} & -VDV\,\tfrac12 \in \mathbb{S}^6\\
\text{subject to} & \mathrm{tr}\big(D(e_ie_j^T + e_je_i^T)\tfrac12\big) = l_{ij}^2\ ,\quad j-1 = (i = 1\ldots 6)\ \mathrm{mod}\ 6\\
& \mathrm{tr}\big({-\tfrac12}VDV(A_i + A_i^T)\tfrac12\big) = \cos\varphi_i\ ,\quad i = 1,2,3\\
& \mathrm{tr}({-\tfrac12}VDV) = 1\\
& -VDV \succeq 0
\end{array} \tag{768}
$$

where, for $A_i \in \mathbb{R}^{6\times6}$ (763)

$$
\begin{aligned}
A_1 &= (e_1 - e_6)(e_3 - e_4)^T/(l_{61}l_{34})\\
A_2 &= (e_2 - e_1)(e_4 - e_5)^T/(l_{12}l_{45})\\
A_3 &= (e_3 - e_2)(e_5 - e_6)^T/(l_{23}l_{56})
\end{aligned} \tag{769}
$$

and where the first constraint on length-square $l_{ij}^2$ can be equivalently written as a constraint on the Gram matrix $-VDV\,\tfrac12$ via (765). We show how to numerically solve such a problem by alternating projection in §E.10.2.1.1. Barvinok's Proposition 2.9.3.0.1 asserts existence of a list, corresponding to Gram matrix $G$ solving this feasibility problem, whose affine dimension (§5.7.1.1) does not exceed 3 because the convex feasible set is bounded by the third constraint $\mathrm{tr}({-\tfrac12}VDV) = 1$ (759). ⋄


Figure 83: Sphere-packing illustration from [284, kissing number]. Translucent balls illustrated all have the same diameter.

5.4.2.2.3 Example. Kissing-number of sphere packing.
Two nonoverlapping Euclidean balls are said to kiss if they touch. An elementary geometrical problem can be posed: Given hyperspheres, each having the same diameter 1, how many hyperspheres can simultaneously kiss one central hypersphere? [304] The noncentral hyperspheres are allowed, but not required, to kiss.

As posed, the problem seeks the maximal number of spheres $K$ kissing a central sphere in a particular dimension. The total number of spheres is $N = K + 1$. In one dimension the answer to this kissing problem is 2. In two dimensions, 6. (Figure 7)

The question was presented in three dimensions to Isaac Newton in the context of celestial mechanics, and became controversy with David Gregory on the campus of Cambridge University in 1694. Newton correctly identified the kissing number as 12 (Figure 83) while Gregory argued for 13. Their dispute was finally resolved in 1953 by Schütte & van der Waerden. [223] In 2003, Oleg Musin tightened the upper bound on kissing number $K$ in four dimensions from 25 to $K = 24$ by refining a method by Philippe Delsarte from 1973 providing an infinite number [14] of linear inequalities necessary for converting a rank-constrained semidefinite program^{5.9} to a linear program.^{5.10} [203]

There are no proofs known for kissing number in higher dimension excepting dimensions eight and twenty four.

Translating this problem to an EDM graph realization (Figure 80, Figure 84) is suggested by Pfender & Ziegler. Imagine the centers of each sphere are connected by line segments. Then the distance between centers must obey simple criteria: Each sphere touching the central sphere has a line segment of length exactly 1 joining its center to the central sphere's center. All spheres, excepting the central sphere, must have centers separated by a distance of at least 1.

From this perspective, the kissing problem can be posed as a semidefinite program. Assign index 1 to the central sphere, and assume a total of $N$

5.9 whose feasible set belongs to that subset of an elliptope (§5.9.1.0.1) bounded above by some desired rank.
5.10 Simplex-method solvers for linear programs produce numerically better results than contemporary log-barrier (interior-point method) solvers for semidefinite programs by about 7 orders of magnitude.


spheres:

$$
\begin{array}{ll}
\underset{D\in\mathbb{S}^N}{\text{minimize}} & -\mathrm{tr}\big(W\, V_N^TDV_N\big)\\
\text{subject to} & D_{1j} = 1\ ,\quad j = 2\ldots N\\
& D_{ij} \geq 1\ ,\quad 2 \leq i < j = 3\ldots N\\
& D \in \mathrm{EDM}^N
\end{array} \tag{770}
$$

Then the kissing number

$$ K = N_{\max} - 1 \tag{771} $$

is found from the maximal number of spheres $N$ that solve this semidefinite program in a given affine dimension $r$. Matrix $W$ can be interpreted as the direction of search through the positive semidefinite cone $\mathbb{S}^{N-1}_+$ for a rank-$r$ optimal solution $-V_N^TD^\star V_N$; it is constant, in this program, determined by a method disclosed in §4.4.1. In two dimensions,

$$
W = \frac{1}{6}\begin{bmatrix}
4 & 1 & 2 & -1 & -1 & 1\\
1 & 4 & -1 & -1 & 2 & 1\\
2 & -1 & 4 & 1 & 1 & -1\\
-1 & -1 & 1 & 4 & 1 & 2\\
-1 & 2 & 1 & 1 & 4 & -1\\
1 & 1 & -1 & 2 & -1 & 4
\end{bmatrix} \tag{772}
$$

In three dimensions,

$$
W = \frac{1}{12}\begin{bmatrix}
9 & 1 & -2 & -1 & 3 & -1 & -1 & 1 & 2 & 1 & -2 & 1\\
1 & 9 & 3 & -1 & -1 & 1 & 1 & -2 & 1 & 2 & -1 & -1\\
-2 & 3 & 9 & 1 & 2 & -1 & -1 & 2 & -1 & -1 & 1 & 2\\
-1 & -1 & 1 & 9 & 1 & -1 & 1 & -1 & 3 & 2 & -1 & 1\\
3 & -1 & 2 & 1 & 9 & 1 & 1 & -1 & -1 & -1 & 1 & -1\\
-1 & 1 & -1 & -1 & 1 & 9 & 2 & -1 & 2 & -1 & 2 & 3\\
-1 & 1 & -1 & 1 & 1 & 2 & 9 & 3 & -1 & 1 & -2 & -1\\
1 & -2 & 2 & -1 & -1 & -1 & 3 & 9 & 2 & -1 & 1 & 1\\
2 & 1 & -1 & 3 & -1 & 2 & -1 & 2 & 9 & -1 & 1 & -1\\
1 & 2 & -1 & 2 & -1 & -1 & 1 & -1 & -1 & 9 & 3 & 1\\
-2 & -1 & 1 & -1 & 1 & 2 & -2 & 1 & 1 & 3 & 9 & -1\\
1 & -1 & 2 & 1 & -1 & 3 & -1 & 1 & -1 & 1 & -1 & 9
\end{bmatrix} \tag{773}
$$

A four-dimensional solution has rational direction matrix as well. Here is an optimal point list^{5.11} in Matlab output format:

5.11 An optimal five-dimensional point list is known: The answer was known at least 175 years ago. I believe Gauss knew it. Moreover, Korkine & Zolotarev proved in 1882 that $D_5$ is the densest lattice in five dimensions. So they proved that if a kissing arrangement in five dimensions can be extended to some lattice, then k(5) = 40. Of course, the conjecture in the general case also is: k(5) = 40. You would like to see coordinates? Easily. Let $A = \sqrt{2}$; then $p(1) = (A,A,0,0,0)$, $p(2) = (-A,A,0,0,0)$, $p(3) = (A,-A,0,0,0)$, ... $p(40) = (0,0,0,-A,-A)$; i.e., we are considering points with coordinates that have two $A$ and three 0 with any choice of signs and any ordering of the coordinates; the same coordinates-expression in dimensions 3 and 4. The first miracle happens in dimension 6. There are better packings than $D_6$ (Conjecture: k(6) = 72). It's a real miracle how dense the packing is in eight dimensions ($E_8$ = Korkine & Zolotarev packing that was discovered in 1880s) and especially in dimension 24, that is the so-called Leech lattice. Actually, people in coding theory have conjectures on the kissing numbers for dimensions up to 32 (or even greater?). However, sometimes they found better lower bounds. I know that Ericson & Zinoviev a few years ago discovered (by hand, no computer) in dimensions 13 and 14 better kissing arrangements than were known before. −Oleg Musin


    Columns 1 through 6
    X = 0   -0.1983  -0.4584   0.1657   0.9399   0.7416
        0    0.6863   0.2936   0.6239  -0.2936   0.3927
        0   -0.4835   0.8146  -0.6448   0.0611  -0.4224
        0    0.5059   0.2004  -0.4093  -0.1632   0.3427
    Columns 7 through 12
       -0.4815  -0.9399  -0.7416   0.1983   0.4584  -0.2832
        0        0.2936  -0.3927  -0.6863  -0.2936  -0.6863
       -0.8756  -0.0611   0.4224   0.4835  -0.8146  -0.3922
       -0.0372   0.1632  -0.3427  -0.5059  -0.2004  -0.5431
    Columns 13 through 18
        0.2832  -0.2926  -0.6473   0.0943   0.3640  -0.3640
        0.6863   0.9176  -0.6239  -0.2313  -0.0624   0.0624
        0.3922   0.1698  -0.2309  -0.6533  -0.1613   0.1613
        0.5431  -0.2088   0.3721   0.7147  -0.9152   0.9152
    Columns 19 through 25
       -0.0943   0.6473  -0.1657   0.2926  -0.5759   0.5759   0.4815
        0.2313   0.6239  -0.6239  -0.9176   0.2313  -0.2313   0
        0.6533   0.2309   0.6448  -0.1698  -0.2224   0.2224   0.8756
       -0.7147  -0.3721   0.4093   0.2088  -0.7520   0.7520   0.0372

This particular optimal solution was found by solving a problem sequence in increasing number of spheres. Numerical problems begin to arise with matrices of this cardinality due to interior-point methods of solution.
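A minimal rendering of program (770) for the two-dimensional case ($N = 7$), a sketch assuming the CVX Matlab package (an assumption; the text names no solver), with direction matrix $W$ from (772) and EDM membership imposed via Schoenberg criterion (753):

    % Kissing-number SDP (770), two dimensions
    N = 7;
    W = [ 4  1  2 -1 -1  1
          1  4 -1 -1  2  1
          2 -1  4  1  1 -1
         -1 -1  1  4  1  2
         -1  2  1  1  4 -1
          1  1 -1  2 -1  4]/6;              % direction matrix (772)
    B = [-ones(1,N-1); eye(N-1)];           % sqrt(2)*V_N      (740)
    cvx_begin sdp quiet
        variable D(N,N) symmetric
        minimize( -trace(W*B'*D*B)/2 )      % -tr(W*V_N'*D*V_N)
        subject to
            diag(D) == 0;                   % D in S^N_h
            D(1,2:N) == 1;                  % kissing the central sphere
            for i = 2:N-1
                for j = i+1:N
                    D(i,j) >= 1;            % nonoverlap
                end
            end
            -B'*D*B/2 >= 0;                 % -V_N'*D*V_N PSD  (753)
    cvx_end
    % a rank-2 factorization of -B'*D*B/2 then recovers relative positions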


By eliminating some equality constraints for this particular problem, matrix variable dimension can be reduced. From §5.8.3 we have

$$ -V_N^TDV_N = \mathbf{1}\mathbf{1}^T - \begin{bmatrix} \mathbf{0} & I \end{bmatrix} D \begin{bmatrix} \mathbf{0}^T\\ I \end{bmatrix}\frac12 \tag{774} $$

(which does not generally hold) where identity matrix $I \in \mathbb{S}^{N-1}$ has one less dimension than EDM $D$. By defining an EDM principal submatrix

$$ \hat D \triangleq \begin{bmatrix} \mathbf{0} & I \end{bmatrix} D \begin{bmatrix} \mathbf{0}^T\\ I \end{bmatrix} \in \mathbb{S}^{N-1}_h \tag{775} $$

we get a convex problem equivalent to (770)

$$
\begin{array}{ll}
\underset{\hat D\in\mathbb{S}^{N-1}}{\text{minimize}} & -\mathrm{tr}(W\hat D)\\
\text{subject to} & \hat D_{ij} \geq 1\ ,\quad 1 \leq i < j = 2\ldots N-1\\
& \mathbf{1}\mathbf{1}^T - \hat D\,\tfrac12 \succeq 0\\
& \delta(\hat D) = 0
\end{array} \tag{776}
$$

Any feasible matrix $\mathbf{1}\mathbf{1}^T - \hat D\,\tfrac12$ belongs to an elliptope (§5.9.1.0.1). ⋄

This next example shows how finding the common point of intersection for three circles in a plane, a nonlinear problem, has convex expression.

5.4.2.2.4 Example. So & Ye trilateration in wireless sensor network.
Given three known absolute point positions in $\mathbb{R}^2$ (three anchors $\check x_2, \check x_3, \check x_4$) and only one unknown point (one sensor $x_1 \in \mathbb{R}^2$), the sensor's absolute position is determined from its noiseless measured distance-square $\check d_{i1}$ to each of three anchors (Figure 2, Figure 84(a)). This trilateration can be expressed as a convex optimization problem in terms of list $X \triangleq [\,x_1\ \check x_2\ \check x_3\ \check x_4\,] \in \mathbb{R}^{2\times4}$ and Gram matrix $G \in \mathbb{S}^4$ (743):

$$
\begin{array}{ll}
\underset{G\in\mathbb{S}^4,\ X\in\mathbb{R}^{2\times4}}{\text{minimize}} & \mathrm{tr}\,G\\
\text{subject to} & \mathrm{tr}(G\Phi_{i1}) = \check d_{i1}\ ,\quad i = 2,3,4\\
& \mathrm{tr}\big(Ge_ie_i^T\big) = \|\check x_i\|^2\ ,\quad i = 2,3,4\\
& \mathrm{tr}\big(G(e_ie_j^T + e_je_i^T)/2\big) = \check x_i^T\check x_j\ ,\quad 2 \leq i < j = 3,4\\
& X(:,2{:}4) = [\,\check x_2\ \check x_3\ \check x_4\,]\\
& \begin{bmatrix} I & X\\ X^T & G \end{bmatrix} \succeq 0
\end{array} \tag{777}
$$

where $\Phi_{ij} = (e_i - e_j)(e_i - e_j)^T \in \mathbb{S}^N_+$ (732) and where the constraint on distance-square $\check d_{i1}$ is equivalently written as a constraint on the Gram matrix via (745).
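A sketch of program (777) assuming the CVX Matlab package; the anchor positions and measured distance-squares below are hypothetical stand-ins for given data:

    % Trilateration SDP (777)
    xc  = [ 1  0 -1
            0  1  0 ];                           % anchors x2,x3,x4
    xt  = [0.2; 0.3];                            % true sensor, to recover
    dch = sum((xc - xt*ones(1,3)).^2,1)';        % noiseless d_i1
    e = eye(4);
    cvx_begin sdp quiet
        variable G(4,4) symmetric
        variable X(2,4)
        minimize( trace(G) )
        subject to
            X(:,2:4) == xc;
            for i = 2:4
                Phi = (e(:,1)-e(:,i))*(e(:,1)-e(:,i))';
                trace(G*Phi) == dch(i-1);        % distance-square data
                G(i,i) == xc(:,i-1)'*xc(:,i-1);  % anchor norm-squares
            end
            G(2,3) == xc(:,1)'*xc(:,2);          % anchor inner products
            G(2,4) == xc(:,1)'*xc(:,3);
            G(3,4) == xc(:,2)'*xc(:,3);
            [eye(2) X; X' G] >= 0;               % Schur-form relaxation (778)
    cvx_end
    % X(:,1) recovers the sensor under noiseless measurement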


Figure 84: (a) Given three distances indicated with absolute point positions $\check x_2, \check x_3, \check x_4$ known and noncollinear, absolute position of $x_1$ in $\mathbb{R}^2$ can be precisely and uniquely determined by trilateration; solution to a system of nonlinear equations. Dimensionless EDM graphs (b) (c) (d) represent EDMs in various states of completion. Line segments represent known absolute distances and may cross without vertex at intersection. (b) Four-point list must always be embeddable in affine subset having dimension $\mathrm{rank}\,V_N^TDV_N$ not exceeding 3. To determine relative position of $x_2, x_3, x_4$, triangle inequality is necessary and sufficient (§5.14.1). Knowing all distance information, then (by injectivity of $D$ (§5.6)) point position $x_1$ is uniquely determined to within an isometry in any dimension. (c) When fifth point is introduced, only distances to $x_3, x_4, x_5$ are required to determine relative position of $x_2$ in $\mathbb{R}^2$. Graph represents first instance of missing distance information; $\sqrt{d_{12}}$. (d) Three distances are absent ($\sqrt{d_{12}}$, $\sqrt{d_{13}}$, $\sqrt{d_{23}}$) from complete set of interpoint distances, yet unique isometric reconstruction (§5.4.2.2.5) of six points in $\mathbb{R}^2$ is certain.


There are 9 independent affine equality constraints on that Gram matrix while the sensor is constrained only by dimensioning to lie in $\mathbb{R}^2$. Although the objective of minimization, $\mathrm{tr}\,G$,^{5.12} insures a solution on the boundary of positive semidefinite cone $\mathbb{S}^4_+$, we claim that the set of feasible Gram matrices forms a line (§2.5.1.1) in isomorphic $\mathbb{R}^{10}$ tangent (§2.1.7.2) to the positive semidefinite cone boundary. (confer §4.2.1.3)

By Schur complement (§A.4, §2.9.1.0.1) any feasible $G$ and $X$ provide

$$ G \succeq X^TX \tag{778} $$

which is a convex relaxation of the desired (nonconvex) equality constraint

$$ \begin{bmatrix} I & X\\ X^T & G \end{bmatrix} = \begin{bmatrix} I\\ X^T \end{bmatrix}\begin{bmatrix} I & X \end{bmatrix} \tag{779} $$

expected positive semidefinite rank-2 under noiseless conditions. But by (1318), the relaxation admits

$$ (3 \geq)\ \mathrm{rank}\,G \geq \mathrm{rank}\,X \tag{780} $$

(a third dimension corresponding to an intersection of three spheres (not circles) were there noise). If rank of an optimal solution equals 2,

$$ \mathrm{rank}\begin{bmatrix} I & X^\star\\ X^{\star T} & G^\star \end{bmatrix} = 2 \tag{781} $$

then $G^\star = X^{\star T}X^\star$ by Theorem A.4.0.0.4.

As posed, this localization problem does not require affinely independent (Figure 20, three noncollinear) anchors. Assuming the anchors exhibit no rotational or reflective symmetry in their affine hull (§5.5.2) and assuming the sensor $x_1$ lies in that affine hull, then sensor position solution $x_1^\star = X^\star(:,1)$ is unique under noiseless measurement. [241] ⋄

5.12 Trace ($\mathrm{tr}\,G = \langle I, G\rangle$) minimization is a heuristic for rank minimization. (§7.2.2.1) It may be interpreted as squashing $G$, which is bounded below by $X^TX$ as in (778).


This preceding transformation of trilateration to a semidefinite program works all the time ((781) holds) despite relaxation (778) because the optimal solution set is a unique point.

Proof (sketch). Only the sensor location $x_1$ is unknown. The objective function together with the equality constraints make a linear system of equations in Gram matrix variable $G$

$$
\begin{array}{l}
\mathrm{tr}\,G = \|x_1\|^2 + \|\check x_2\|^2 + \|\check x_3\|^2 + \|\check x_4\|^2\\
\mathrm{tr}(G\Phi_{i1}) = \check d_{i1}\ ,\quad i = 2,3,4\\
\mathrm{tr}\big(Ge_ie_i^T\big) = \|\check x_i\|^2\ ,\quad i = 2,3,4\\
\mathrm{tr}\big(G(e_ie_j^T + e_je_i^T)/2\big) = \check x_i^T\check x_j\ ,\quad 2 \leq i < j = 3,4
\end{array} \tag{782}
$$

which is invertible:

$$
\mathrm{svec}\,G = \begin{bmatrix}
\mathrm{svec}(I)^T\\
\mathrm{svec}(\Phi_{21})^T\\
\mathrm{svec}(\Phi_{31})^T\\
\mathrm{svec}(\Phi_{41})^T\\
\mathrm{svec}(e_2e_2^T)^T\\
\mathrm{svec}(e_3e_3^T)^T\\
\mathrm{svec}(e_4e_4^T)^T\\
\mathrm{svec}\big((e_2e_3^T + e_3e_2^T)/2\big)^T\\
\mathrm{svec}\big((e_2e_4^T + e_4e_2^T)/2\big)^T\\
\mathrm{svec}\big((e_3e_4^T + e_4e_3^T)/2\big)^T
\end{bmatrix}^{-1}
\begin{bmatrix}
\|x_1\|^2 + \|\check x_2\|^2 + \|\check x_3\|^2 + \|\check x_4\|^2\\
\check d_{21}\\
\check d_{31}\\
\check d_{41}\\
\|\check x_2\|^2\\
\|\check x_3\|^2\\
\|\check x_4\|^2\\
\check x_2^T\check x_3\\
\check x_2^T\check x_4\\
\check x_3^T\check x_4
\end{bmatrix} \tag{783}
$$

That line in the ambient space $\mathbb{S}^4$ of $G$ is traced by the right-hand side. One must show this line to be tangential (§2.1.7.2) to $\mathbb{S}^4_+$ in order to prove uniqueness. Tangency is possible for affine dimension 1 or 2 while its occurrence depends completely on the known measurement data. ⋄

But as soon as significant noise is introduced or whenever distance data is incomplete, such problems can remain convex although the set of all optimal solutions generally becomes a convex set bigger than a single point (but still containing the noiseless solution).


5.4.2.2.5 Definition. Isometric reconstruction. (confer §5.5.3)
Isometric reconstruction from an EDM means building a list $X$ correct to within a rotation, reflection, and translation; in other terms, reconstruction of relative position, correct to within an isometry, or to within a rigid transformation. △

How much distance information is needed to uniquely localize a sensor (to recover actual relative position)? The narrative in Figure 84 helps dispel any notion of distance data proliferation in low affine dimension ($r < N-1$).


Figure 85: Incomplete EDM corresponding to this dimensionless EDM graph (on six points $x_1\ldots x_6$) provides unique isometric reconstruction in $\mathbb{R}^2$. (drawn freehand, no symmetry intended)

Figure 86: Two sensors • and three anchors ◦ in $\mathbb{R}^2$. (Ye) Connecting line-segments denote known absolute distances. Incomplete EDM corresponding to this dimensionless EDM graph provides unique isometric reconstruction in $\mathbb{R}^2$.


Figure 87: Given in red are two discrete linear trajectories of sensors $x_1$ and $x_2$ in $\mathbb{R}^2$ localized by algorithm (785) as indicated by blue bullets •. Anchors $\check x_3, \check x_4, \check x_5$ corresponding to Figure 86 are indicated by ⊗. When targets and bullets • coincide under these noiseless conditions, localization is successful. On this run, two visible localization errors are due to rank-3 Gram optimal solutions. These errors can be corrected by choosing a different normal in objective of minimization.


5.4.2.2.6 Example. Tandem trilateration in wireless sensor network.
Given three known absolute point-positions in $\mathbb{R}^2$ (three anchors $\check x_3, \check x_4, \check x_5$) and two unknown sensors $x_1, x_2 \in \mathbb{R}^2$, the sensors' absolute positions are determinable from their noiseless distances-square (as indicated in Figure 86) assuming the anchors exhibit no rotational or reflective symmetry in their affine hull (§5.5.2). This example differs from Example 5.4.2.2.4 in so far as trilateration of each sensor is now in terms of one unknown position, the other sensor. We express this localization as a convex optimization problem (a semidefinite program, §4.1) in terms of list $X \triangleq [\,x_1\ x_2\ \check x_3\ \check x_4\ \check x_5\,] \in \mathbb{R}^{2\times5}$ and Gram matrix $G \in \mathbb{S}^5$ (743) via relaxation (778):

$$
\begin{array}{ll}
\underset{G\in\mathbb{S}^5,\ X\in\mathbb{R}^{2\times5}}{\text{minimize}} & \mathrm{tr}\,G\\
\text{subject to} & \mathrm{tr}(G\Phi_{i1}) = \check d_{i1}\ ,\quad i = 2,4,5\\
& \mathrm{tr}(G\Phi_{i2}) = \check d_{i2}\ ,\quad i = 3,5\\
& \mathrm{tr}\big(Ge_ie_i^T\big) = \|\check x_i\|^2\ ,\quad i = 3,4,5\\
& \mathrm{tr}\big(G(e_ie_j^T + e_je_i^T)/2\big) = \check x_i^T\check x_j\ ,\quad 3 \leq i < j = 4,5\\
& X(:,3{:}5) = [\,\check x_3\ \check x_4\ \check x_5\,]\\
& \begin{bmatrix} I & X\\ X^T & G \end{bmatrix} \succeq 0
\end{array} \tag{785}
$$

where

$$ \Phi_{ij} = (e_i - e_j)(e_i - e_j)^T \in \mathbb{S}^N_+ \tag{732} $$

This problem realization is fragile because of the unknown distances between sensors and anchors. Yet there is no more information we may include beyond the 11 independent equality constraints on the Gram matrix (nonredundant constraints not antithetical) to reduce the feasible set.^{5.14} (By virtue of their dimensioning, the sensors are already constrained to $\mathbb{R}^2$, the affine hull of the anchors.)

Exhibited in Figure 87 are two mistakes in solution $X^\star(:,1{:}2)$ due to a rank-3 optimal Gram matrix $G^\star$. The trace objective is a heuristic minimizing convex envelope of quasiconcave function^{5.15} $\mathrm{rank}\,G$. (§2.9.2.6.2, §7.2.2.1) A rank-2 optimal Gram matrix can be found and the errors

5.14 the presumably nonempty convex set of all points $G$ and $X$ satisfying the constraints.
5.15 Projection on that nonconvex subset of all $N\times N$-dimensional positive semidefinite matrices, in an affine subset, whose rank does not exceed 2 is a problem considered difficult to solve. [266, §4]


corrected by choosing a different normal for the linear objective function, now implicitly the identity matrix $I$; id est,

$$ \mathrm{tr}\,G = \langle G\,,\ I\rangle \ \leftarrow\ \langle G\,,\ \delta(u)\rangle \tag{786} $$

where vector $u \in \mathbb{R}^5$ is randomly selected. A random search for a good normal $\delta(u)$ in only a few iterations is quite easy and effective because the problem is small, an optimal solution is known a priori to exist in two dimensions, a good normal direction is not necessarily unique, and (we speculate) because the feasible affine-subset slices the positive semidefinite cone thinly in the Euclidean sense.^{5.16} ⋄

We explore ramifications of noise and incomplete data throughout; their individual effect being to expand the optimal solution set, introducing more solutions and higher-rank solutions. Hence our focus shifts in §4.4 to discovery of a reliable means for diminishing the optimal solution set by introduction of a rank constraint.

Now we illustrate how a problem in distance geometry can be solved without equality constraints representing measured distance; instead, we have only upper and lower bounds on distances measured:

5.4.2.2.7 Example. Wireless location in a cellular telephone network.
Utilizing measurements of distance, time of flight, angle of arrival, or signal power, multilateration is the process of localizing (determining absolute position of) a radio signal source • by inferring geometry relative to multiple fixed base stations ◦ whose locations are known.

We consider localization of a cellular telephone by distance geometry, so we assume distance to any particular base station can be inferred from received signal power. On a large open flat expanse of terrain, signal-power measurement corresponds well with inverse distance. But it is not uncommon for measurement of signal power to suffer 20 decibels in loss caused by factors such as multipath interference (signal reflections), mountainous terrain, man-made structures, turning one's head, or rolling the windows up in an automobile. Consequently, contours of equal signal power are no longer circular; their geometry is irregular and would more aptly be approximated

5.16 The log det rank-heuristic from §7.2.2.4 does not work here because it chooses the wrong normal. Rank reduction (§4.1.1.1) is unsuccessful here because Barvinok's upper bound (§2.9.3.0.1) on rank of $G^\star$ is 4.


Figure 88: Regions of coverage by base stations ◦ in a cellular telephone network. The term cellular arises from packing of regions best covered by neighboring base stations. Illustrated is a pentagonal cell best covered by base station $\check x_2$. Like a Voronoi diagram, cell geometry depends on base-station arrangement. In some US urban environments, it is not unusual to find base stations spaced approximately 1 mile apart. There can be as many as 20 base-station antennae capable of receiving signal from any given cell phone •; practically, that number is closer to 6.

Figure 89: Some fitted contours of equal signal power in $\mathbb{R}^2$ transmitted from a commercial cellular telephone • over about 1 mile suburban terrain outside San Francisco in 2005. Data courtesy Polaris Wireless. [243]


by translated ellipsoids of graduated orientation and eccentricity as in Figure 89.

Depicted in Figure 88 is one cell phone $x_1$ whose signal power is automatically and repeatedly measured by 6 base stations ◦ nearby.^{5.17} Those signal power measurements are transmitted from that cell phone to base station $\check x_2$ who decides whether to transfer (hand-off or hand-over) responsibility for that call should the user roam outside its cell.^{5.18}

Due to noise, at least one distance measurement more than the minimum number of measurements is required for reliable localization in practice; 3 measurements are minimum in two dimensions, 4 in three.^{5.19} Existence of noise precludes measured distance from the input data. We instead assign measured distance to a range estimate specified by individual upper and lower bounds: $\overline d_{i1}$ is the upper bound on distance-square from the cell phone to the $i$th base station, while $\underline d_{i1}$ is the lower bound. These bounds become the input data. Each measurement range is presumed different from the others. Then convex problem (777) takes the form:

$$
\begin{array}{ll}
\underset{G\in\mathbb{S}^7,\ X\in\mathbb{R}^{2\times7}}{\text{minimize}} & \mathrm{tr}\,G\\
\text{subject to} & \underline d_{i1} \leq \mathrm{tr}(G\Phi_{i1}) \leq \overline d_{i1}\ ,\quad i = 2\ldots 7\\
& \mathrm{tr}\big(Ge_ie_i^T\big) = \|\check x_i\|^2\ ,\quad i = 2\ldots 7\\
& \mathrm{tr}\big(G(e_ie_j^T + e_je_i^T)/2\big) = \check x_i^T\check x_j\ ,\quad 2 \leq i < j = 3\ldots 7\\
& X(:,2{:}7) = [\,\check x_2\ \check x_3\ \check x_4\ \check x_5\ \check x_6\ \check x_7\,]\\
& \begin{bmatrix} I & X\\ X^T & G \end{bmatrix} \succeq 0
\end{array} \tag{787}
$$

where

$$ \Phi_{ij} = (e_i - e_j)(e_i - e_j)^T \in \mathbb{S}^N_+ \tag{732} $$

5.17 Cell phone signal power is typically encoded logarithmically with 1-decibel increment and 64-decibel dynamic range.
5.18 Because distance to base station is quite difficult to infer from signal power measurements in an urban environment, localization of a particular cell phone • by distance geometry would be far easier were the whole cellular system instead conceived so cell phone $x_1$ also transmits (to base station $\check x_2$) its signal power as received by all other cell phones within range.
5.19 In Example 4.4.1.1.2, we explore how this convex optimization algorithm fares in the face of measurement noise.
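A sketch of (787) assuming the CVX Matlab package; anchors and range bounds below are hypothetical stand-ins for measured data, and only the paired inequalities distinguish the program from (777):

    % Wireless-location SDP (787) with range data
    xc = randn(2,6);                         % base stations x2...x7
    xt = randn(2,1);                         % true handset position
    d  = sum((xc - xt*ones(1,6)).^2,1)';     % true distance-squares
    d_lo = 0.9*d;  d_up = 1.1*d;             % measurement ranges
    e = eye(7);
    cvx_begin sdp quiet
        variable G(7,7) symmetric
        variable X(2,7)
        minimize( trace(G) )
        subject to
            X(:,2:7) == xc;
            for i = 2:7
                Phi = (e(:,1)-e(:,i))*(e(:,1)-e(:,i))';
                d_lo(i-1) <= trace(G*Phi);   % lower range bound
                trace(G*Phi) <= d_up(i-1);   % upper range bound
                G(i,i) == xc(:,i-1)'*xc(:,i-1);
                for j = i+1:7
                    G(i,j) == xc(:,i-1)'*xc(:,j-1);
                end
            end
            [eye(2) X; X' G] >= 0;           % relaxation (778)
    cvx_end
    % estimated location: X(:,1)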


Figure 90: A depiction of molecular conformation. [80]

This semidefinite program realizes the wireless location problem illustrated in Figure 88. Location $X^\star(:,1)$ is taken as solution, although measurement noise will often cause $\mathrm{rank}\,G^\star$ to exceed 2. Randomized search for a rank-2 optimal solution is not so easy here as in Example 5.4.2.2.6. We introduce a method in §4.4 for enforcing the stronger rank-constraint (781). To formulate this same problem in three dimensions, point list $X$ is simply redimensioned in the semidefinite program. ⋄

5.4.2.2.8 Example. (Biswas, Nigam, Ye) Molecular Conformation.
The subatomic measurement technique called nuclear magnetic resonance spectroscopy (NMR) is employed to ascertain physical conformation of molecules; e.g., Figure 3, Figure 90. From this technique, distance, angle, and dihedral angle measurements can be obtained. Dihedral angles arise consequent to a phenomenon where atom subsets are physically constrained to Euclidean planes.

    In the rigid covalent geometry approximation, the bond lengths and angles are treated as completely fixed, so that a given spatial structure can be described very compactly indeed by a list of torsion angles alone... These are the dihedral angles between the planes spanned by the two consecutive triples in a chain of four covalently bonded atoms. [60, §1.1]


Crippen & Havel recommend working exclusively with distance data because they consider angle data to be mathematically cumbersome. The present example shows instead how inclusion of dihedral angle data into a problem statement can be made elegant and convex.

As before, ascribe position information to the matrix

$$ X = [\,x_1\ \cdots\ x_N\,] \in \mathbb{R}^{3\times N} \tag{66} $$

and introduce a matrix $\aleph$ holding normals $\eta$ to planes respecting dihedral angles $\varphi$:

$$ \aleph \triangleq [\,\eta_1\ \cdots\ \eta_M\,] \in \mathbb{R}^{3\times M} \tag{788} $$

As in the other examples, we preferentially work with Gram matrices $G$ because of the bridge they provide between other variables; we define

$$
\begin{bmatrix} G_\aleph & Z\\ Z^T & G_X \end{bmatrix} \triangleq
\begin{bmatrix} \aleph^T\\ X^T \end{bmatrix}\begin{bmatrix} \aleph & X \end{bmatrix}
= \begin{bmatrix} \aleph^T\aleph & \aleph^TX\\ X^T\aleph & X^TX \end{bmatrix} \in \mathbb{R}^{N+M\times N+M} \tag{789}
$$

whose rank is 3 by assumption. So our problem's variables are the two Gram matrices $G_X$ and $G_\aleph$ and matrix $Z = \aleph^TX$ of cross products. Then measurements of distance-square $d$ can be expressed as linear constraints on $G_X$ as in (787), dihedral angle $\varphi$ measurements can be expressed as linear constraints on $G_\aleph$ by (763), and normal-vector $\eta$ conditions can be expressed by vanishing linear constraints on cross-product matrix $Z$: Consider three points $x$ labelled 1, 2, 3 assumed to lie in the $l$th plane whose normal is $\eta_l$. There might occur, for example, the independent constraints

$$ \eta_l^T(x_1 - x_2) = 0 \qquad \eta_l^T(x_2 - x_3) = 0 \tag{790} $$

which are expressible in terms of constant matrices $A \in \mathbb{R}^{M\times N}$;

$$ \langle Z\,,\ A_{l12}\rangle = 0 \qquad \langle Z\,,\ A_{l23}\rangle = 0 \tag{791} $$

Although normals $\eta$ can be constrained exactly to unit length,

$$ \delta(G_\aleph) = \mathbf{1} \tag{792} $$

NMR data is noisy; so measurements are given as upper and lower bounds. Given bounds on dihedral angles respecting $0 \leq \varphi_j \leq \pi$ and bounds on distances $d_i$, and given constant matrices $A_k$ (791) and symmetric matrices $\Phi_i$ (732) and $B_j$ per (763), then a molecular conformation problem can be expressed:


$$
\begin{array}{ll}
\underset{G_\aleph\in\mathbb{S}^M,\ G_X\in\mathbb{S}^N,\ Z\in\mathbb{R}^{M\times N}}{\text{find}} & G_X\\
\text{subject to} & \underline d_i \leq \mathrm{tr}(G_X\Phi_i) \leq \overline d_i\ ,\quad \forall\, i \in \mathcal{I}_1\\
& \cos\overline\varphi_j \leq \mathrm{tr}(G_\aleph B_j) \leq \cos\underline\varphi_j\ ,\quad \forall\, j \in \mathcal{I}_2\\
& \langle Z\,,\ A_k\rangle = 0\ ,\quad \forall\, k \in \mathcal{I}_3\\
& G_X\mathbf{1} = 0\\
& \delta(G_\aleph) = \mathbf{1}\\
& \begin{bmatrix} G_\aleph & Z\\ Z^T & G_X \end{bmatrix} \succeq 0\\
& \mathrm{rank}\begin{bmatrix} G_\aleph & Z\\ Z^T & G_X \end{bmatrix} = 3
\end{array} \tag{793}
$$

Ignoring the rank constraint would tend to force cross-product matrix $Z$ to zero. What binds these three variables is the rank constraint; we show how to satisfy it in §4.4. ⋄

5.4.3 Inner-product form EDM definition

    We might, for example, realize a constellation given only interstellar distance (or, equivalently, distance from Earth and relative angular measurement; the Earth as vertex to two stars). [p.20]

Equivalent to (730) is [290, §1-7] [251, §3.2]

$$
d_{ij} = d_{ik} + d_{kj} - 2\sqrt{d_{ik}d_{kj}}\cos\theta_{ikj}
= \begin{bmatrix} \sqrt{d_{ik}} & \sqrt{d_{kj}} \end{bmatrix}
\begin{bmatrix} 1 & -e^{\imath\theta_{ikj}}\\ -e^{-\imath\theta_{ikj}} & 1 \end{bmatrix}
\begin{bmatrix} \sqrt{d_{ik}}\\ \sqrt{d_{kj}} \end{bmatrix} \tag{794}
$$

called the law of cosines, where $\imath \triangleq \sqrt{-1}$, $i,k,j$ are positive integers, and $\theta_{ikj}$ is the angle at vertex $x_k$ formed by vectors $x_i - x_k$ and $x_j - x_k$;

$$
\cos\theta_{ikj} = \frac{\frac12(d_{ik} + d_{kj} - d_{ij})}{\sqrt{d_{ik}d_{kj}}} = \frac{(x_i - x_k)^T(x_j - x_k)}{\|x_i - x_k\|\,\|x_j - x_k\|} \tag{795}
$$


where the numerator forms an inner product of vectors. Distance-square $d_{ij}\!\left(\begin{bmatrix} \sqrt{d_{ik}}\\ \sqrt{d_{kj}} \end{bmatrix}\right)$ is a convex quadratic function^{5.20} on $\mathbb{R}^2_+$ whereas $d_{ij}(\theta_{ikj})$ is quasiconvex (§3.3), minimized over domain $-\pi \leq \theta_{ikj} \leq \pi$ by $\theta^\star_{ikj} = 0$ and maximized by $\theta^\star_{ikj} = \pm\pi$; we get the Pythagorean theorem when $\theta_{ikj} = \pm\pi/2$:

$$
\begin{array}{ll}
d_{ij} = \big(\sqrt{d_{ik}} + \sqrt{d_{kj}}\big)^2\ , & \theta_{ikj} = \pm\pi\\
d_{ij} = d_{ik} + d_{kj}\ , & \theta_{ikj} = \pm\dfrac{\pi}{2}\\
d_{ij} = \big(\sqrt{d_{ik}} - \sqrt{d_{kj}}\big)^2\ , & \theta_{ikj} = 0
\end{array} \tag{796}
$$

so

$$ \big|\sqrt{d_{ik}} - \sqrt{d_{kj}}\big| \ \leq\ \sqrt{d_{ij}} \ \leq\ \sqrt{d_{ik}} + \sqrt{d_{kj}} \tag{797} $$

Hence the triangle inequality, Euclidean metric property 4, holds for any EDM $D$.

We may construct an inner-product form of the EDM definition for matrices by evaluating (794) for $k = 1$: By defining

$$
\Theta^T\Theta \triangleq \begin{bmatrix}
d_{12} & \sqrt{d_{12}d_{13}}\cos\theta_{213} & \sqrt{d_{12}d_{14}}\cos\theta_{214} & \cdots & \sqrt{d_{12}d_{1N}}\cos\theta_{21N}\\
\sqrt{d_{12}d_{13}}\cos\theta_{213} & d_{13} & \sqrt{d_{13}d_{14}}\cos\theta_{314} & \cdots & \sqrt{d_{13}d_{1N}}\cos\theta_{31N}\\
\sqrt{d_{12}d_{14}}\cos\theta_{214} & \sqrt{d_{13}d_{14}}\cos\theta_{314} & d_{14} & \ddots & \sqrt{d_{14}d_{1N}}\cos\theta_{41N}\\
\vdots & \vdots & \ddots & \ddots & \vdots\\
\sqrt{d_{12}d_{1N}}\cos\theta_{21N} & \sqrt{d_{13}d_{1N}}\cos\theta_{31N} & \sqrt{d_{14}d_{1N}}\cos\theta_{41N} & \cdots & d_{1N}
\end{bmatrix} \in \mathbb{S}^{N-1} \tag{798}
$$

then any EDM may be expressed

$$
\begin{aligned}
D(\Theta) &\triangleq \begin{bmatrix} 0\\ \delta(\Theta^T\Theta) \end{bmatrix}\mathbf{1}^T + \mathbf{1}\begin{bmatrix} 0 & \delta(\Theta^T\Theta)^T \end{bmatrix} - 2\begin{bmatrix} 0 & \mathbf{0}^T\\ \mathbf{0} & \Theta^T\Theta \end{bmatrix}\\
&= \begin{bmatrix} 0 & \delta(\Theta^T\Theta)^T\\ \delta(\Theta^T\Theta) & \delta(\Theta^T\Theta)\mathbf{1}^T + \mathbf{1}\delta(\Theta^T\Theta)^T - 2\Theta^T\Theta \end{bmatrix} \in \mathrm{EDM}^N
\end{aligned} \tag{799}
$$

$$ \mathrm{EDM}^N = \big\{ D(\Theta)\ |\ \Theta \in \mathbb{R}^{N-1\times N-1} \big\} \tag{800} $$

for which all Euclidean metric properties hold. Entries of $\Theta^T\Theta$ result from vector inner-products as in (795); id est,

5.20 $\begin{bmatrix} 1 & -e^{\imath\theta_{ikj}}\\ -e^{-\imath\theta_{ikj}} & 1 \end{bmatrix} \succeq 0$, having eigenvalues $\{0, 2\}$. Minimum is attained for $\begin{bmatrix} \sqrt{d_{ik}}\\ \sqrt{d_{kj}} \end{bmatrix} = \begin{cases} \mu\mathbf{1}\,,\ \mu \geq 0\,, & \theta_{ikj} = 0\\ 0\,, & -\pi \leq \theta_{ikj} \leq \pi\,,\ \theta_{ikj} \neq 0 \end{cases}$ (§D.2.1, [46, exmp.4.5])


$$ \Theta = [\,x_2 - x_1\ \ x_3 - x_1\ \cdots\ x_N - x_1\,] = X\sqrt{2}\,V_N \in \mathbb{R}^{n\times N-1} \tag{801} $$

Inner product $\Theta^T\Theta$ is obviously related to a Gram matrix (743),

$$ G = \begin{bmatrix} 0 & \mathbf{0}^T\\ \mathbf{0} & \Theta^T\Theta \end{bmatrix}\ ,\qquad x_1 = 0 \tag{802} $$

For $D = D(\Theta)$ and no condition on the list $X$ (confer (751) (756))

$$ \Theta^T\Theta = -V_N^TDV_N \in \mathbb{R}^{N-1\times N-1} \tag{803} $$

5.4.3.1 Relative-angle form

The inner-product form EDM definition is not a unique definition of Euclidean distance matrix; there are approximately five flavors distinguished by their argument to operator $D$. Here is another one:

Like $D(X)$ (734), $D(\Theta)$ will make an EDM given any $\Theta \in \mathbb{R}^{n\times N-1}$; it is likewise not a convex function of $\Theta$ (§5.4.3.2), and it is homogeneous in the sense (737). Scrutinizing $\Theta^T\Theta$ (798) we find that because of the arbitrary choice $k = 1$, distances therein are all with respect to point $x_1$. Similarly, relative angles in $\Theta^T\Theta$ are between all vector pairs having vertex $x_1$. Yet picking arbitrary $\theta_{i1j}$ to fill $\Theta^T\Theta$ will not necessarily make an EDM; inner product (798) must be positive semidefinite.

$$
\Theta^T\Theta = \sqrt{\delta(d)}\ \Omega\ \sqrt{\delta(d)} \triangleq
\begin{bmatrix} \sqrt{d_{12}} & & & 0\\ & \sqrt{d_{13}} & &\\ & & \ddots &\\ 0 & & & \sqrt{d_{1N}} \end{bmatrix}
\begin{bmatrix} 1 & \cos\theta_{213} & \cdots & \cos\theta_{21N}\\ \cos\theta_{213} & 1 & \ddots & \cos\theta_{31N}\\ \vdots & \ddots & \ddots & \vdots\\ \cos\theta_{21N} & \cos\theta_{31N} & \cdots & 1 \end{bmatrix}
\begin{bmatrix} \sqrt{d_{12}} & & & 0\\ & \sqrt{d_{13}} & &\\ & & \ddots &\\ 0 & & & \sqrt{d_{1N}} \end{bmatrix} \tag{804}
$$

Expression $D(\Theta)$ defines an EDM for any positive semidefinite relative-angle matrix

$$ \Omega = [\cos\theta_{i1j}\,,\ i,j = 2\ldots N\,] \in \mathbb{S}^{N-1} \tag{805} $$

and any nonnegative distance vector

$$ d = [\,d_{1j}\,,\ j = 2\ldots N\,] = \delta(\Theta^T\Theta) \in \mathbb{R}^{N-1} \tag{806} $$


because (§A.3.1.0.5)

$$ \Omega \succeq 0 \ \Rightarrow\ \Theta^T\Theta \succeq 0 \tag{807} $$

Decomposition (804) and the relative-angle matrix inequality $\Omega \succeq 0$ lead to a different expression of an inner-product form EDM definition (799)

$$
\begin{aligned}
D(\Omega, d) &\triangleq \begin{bmatrix} 0\\ d \end{bmatrix}\mathbf{1}^T + \mathbf{1}\begin{bmatrix} 0 & d^T \end{bmatrix}
- 2\sqrt{\delta\!\left(\begin{bmatrix} 0\\ d \end{bmatrix}\right)}\begin{bmatrix} 0 & \mathbf{0}^T\\ \mathbf{0} & \Omega \end{bmatrix}\sqrt{\delta\!\left(\begin{bmatrix} 0\\ d \end{bmatrix}\right)}\\
&= \begin{bmatrix} 0 & d^T\\ d & d\mathbf{1}^T + \mathbf{1}d^T - 2\sqrt{\delta(d)}\,\Omega\sqrt{\delta(d)} \end{bmatrix} \in \mathrm{EDM}^N
\end{aligned} \tag{808}
$$

and another expression of the EDM cone:

$$ \mathrm{EDM}^N = \Big\{ D(\Omega, d)\ \big|\ \Omega \succeq 0\,,\ \sqrt{\delta(d)} \succeq 0 \Big\} \tag{809} $$

In the particular circumstance $x_1 = 0$, we can relate interpoint angle matrix $\Psi$ from the Gram decomposition in (743) to relative-angle matrix $\Omega$ in (804). Thus,

$$ \Psi \equiv \begin{bmatrix} 1 & \mathbf{0}^T\\ \mathbf{0} & \Omega \end{bmatrix}\ ,\qquad x_1 = 0 \tag{810} $$

5.4.3.2 Inner-product form $-V_N^TD(\Theta)V_N$ convexity

On page 339 we saw that each EDM entry $d_{ij}$ is a convex quadratic function of $\begin{bmatrix} \sqrt{d_{ik}}\\ \sqrt{d_{kj}} \end{bmatrix}$ and a quasiconvex function of $\theta_{ikj}$. Here the situation for inner-product form EDM operator $D(\Theta)$ (799) is identical to that in §5.4.1 for list-form $D(X)$; $-D(\Theta)$ is not a quasiconvex function of $\Theta$ by the same reasoning, and from (803)

$$ -V_N^TD(\Theta)V_N = \Theta^T\Theta \tag{811} $$

is a convex quadratic function of $\Theta$ on domain $\mathbb{R}^{n\times N-1}$ achieving its minimum at $\Theta = 0$.


5.4.3.3 Inner-product form, discussion

We deduce that knowledge of interpoint distance is equivalent to knowledge of distance and angle from the perspective of one point, $x_1$ in our chosen case. The total amount of information $N(N-1)/2$ in $\Theta^T\Theta$ is unchanged^{5.21} with respect to EDM $D$.

5.5 Invariance

When $D$ is an EDM, there exist an infinite number of corresponding $N$-point lists $X$ (66) in Euclidean space. All those lists are related by isometric transformation: rotation, reflection, and translation (offset or shift).

5.5.1 Translation

Any translation common among all the points $x_l$ in a list will be cancelled in the formation of each $d_{ij}$. Proof follows directly from (730). Knowing that translation $\alpha$ in advance, we may remove it from the list constituting the columns of $X$ by subtracting $\alpha\mathbf{1}^T$. Then it stands to reason by list-form definition (734) of an EDM, for any translation $\alpha \in \mathbb{R}^n$

$$ D(X - \alpha\mathbf{1}^T) = D(X) \tag{812} $$

In words, interpoint distances are unaffected by offset; EDM $D$ is translation invariant. When $\alpha = x_1$ in particular,

$$ [\,x_2 - x_1\ \ x_3 - x_1\ \cdots\ x_N - x_1\,] = X\sqrt{2}\,V_N \in \mathbb{R}^{n\times N-1} \tag{801} $$

and so

$$ D(X - x_1\mathbf{1}^T) = D(X - Xe_1\mathbf{1}^T) = D\big(X\begin{bmatrix} \mathbf{0} & \sqrt{2}V_N \end{bmatrix}\big) = D(X) \tag{813} $$

5.21 The reason for the amount $O(N^2)$ of information is because of the relative measurements. The use of a fixed reference in the measurement of angles and distances would reduce the required information but is antithetical. In the particular case $n = 2$, for example, ordering all points $x_l$ (in a length-$N$ list) by increasing angle of vector $x_l - x_1$ with respect to $x_2 - x_1$, $\theta_{i1j}$ becomes equivalent to $\sum_{k=i}^{j-1}\theta_{k,1,k+1} \leq 2\pi$ and the amount of information is reduced to $2N-3$; rather, $O(N)$.


5.5.1.0.1 Example. Translating geometric center to origin.
We might choose to shift the geometric center $\alpha_c$ of an $N$-point list $\{x_l\}$ (arranged columnar in $X$) to the origin; [268] [113]

$$ \alpha = \alpha_c \triangleq Xb_c \triangleq \frac{1}{N}X\mathbf{1} \in \mathcal{P} \subseteq \mathcal{A} \tag{814} $$

where $\mathcal{A}$ represents the list's affine hull. If we were to associate a point-mass $m_l$ with each of the points $x_l$ in the list, then their center of mass (or gravity) would be $(\sum x_l m_l)/\sum m_l$. The geometric center is the same as the center of mass under the assumption of uniform mass density across points. [161] The geometric center always lies in the convex hull $\mathcal{P}$ of the list; id est, $\alpha_c \in \mathcal{P}$ because $b_c^T\mathbf{1} = 1$ and $b_c \succeq 0$.^{5.22} Subtracting the geometric center from every list member,

$$ X - \alpha_c\mathbf{1}^T = X - \frac{1}{N}X\mathbf{1}\mathbf{1}^T = X\big(I - \tfrac{1}{N}\mathbf{1}\mathbf{1}^T\big) = XV \in \mathbb{R}^{n\times N} \tag{815} $$

So we have (confer (734))

$$ D(X) = D(XV) = \delta(V^TX^TXV)\mathbf{1}^T + \mathbf{1}\delta(V^TX^TXV)^T - 2V^TX^TXV \in \mathrm{EDM}^N \tag{816} $$

5.5.1.1 Gram-form invariance

Following from (816) and the linear Gram-form EDM operator (746):

$$ D(G) = D(VGV) = \delta(VGV)\mathbf{1}^T + \mathbf{1}\delta(VGV)^T - 2VGV \in \mathrm{EDM}^N \tag{817} $$

The Gram-form consequently exhibits invariance to translation by a doublet (§B.2) $u\mathbf{1}^T + \mathbf{1}u^T$;

$$ D(G) = D\big(G - (u\mathbf{1}^T + \mathbf{1}u^T)\big) \tag{818} $$

because, for any $u \in \mathbb{R}^N$, $D(u\mathbf{1}^T + \mathbf{1}u^T) = 0$. The collection of all such doublets forms the nullspace (827) to the operator; the translation-invariant subspace $\mathbb{S}^{N\perp}_c$ (1794) of the symmetric matrices $\mathbb{S}^N$. This means matrix $G$ can belong to an expanse more broad than a positive semidefinite cone; id est, $G \in \mathbb{S}^N_+ - \mathbb{S}^{N\perp}_c$. So explains Gram matrix sufficiency in EDM definition (746).

5.22 Any $b$ from $\alpha = Xb$ chosen such that $b^T\mathbf{1} = 1$, more generally, makes an auxiliary $V$-matrix. (§B.4.5)
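Both invariances are easy to confirm numerically; a Matlab sketch of (812) and (818):

    % Translation invariance (812) and doublet invariance (818)
    Dx = @(X) diag(X'*X)*ones(1,size(X,2)) ...
            + ones(size(X,2),1)*diag(X'*X)' - 2*(X'*X);      % (734)
    Dg = @(G) diag(G)*ones(1,size(G,1)) ...
            + ones(size(G,1),1)*diag(G)' - 2*G;              % (746)
    X = randn(3,6);  a = randn(3,1);  u = randn(6,1);  G = X'*X;
    norm(Dx(X - a*ones(1,6)) - Dx(X))                    % 0 : (812)
    norm(Dg(G - (u*ones(1,6) + ones(6,1)*u')) - Dg(G))   % 0 : (818)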


5.5.2 Rotation/Reflection

Rotation of the list $X \in \mathbb{R}^{n\times N}$ about some arbitrary point $\alpha \in \mathbb{R}^n$, or reflection through some affine subset containing $\alpha$, can be accomplished via $Q(X - \alpha\mathbf{1}^T)$ where $Q$ is an orthogonal matrix (§B.5).

We rightfully expect

$$ D\big(Q(X - \alpha\mathbf{1}^T)\big) = D(QX - \beta\mathbf{1}^T) = D(QX) = D(X) \tag{819} $$

Because list-form $D(X)$ is translation invariant, we may safely ignore offset and consider only the impact of matrices that premultiply $X$. Interpoint distances are unaffected by rotation or reflection; we say, EDM $D$ is rotation/reflection invariant. Proof follows from the fact, $Q^T = Q^{-1} \Rightarrow X^TQ^TQX = X^TX$. So (819) follows directly from (734).

The class of premultiplying matrices for which interpoint distances are unaffected is a little more broad than orthogonal matrices. Looking at EDM definition (734), it appears that any matrix $Q_p$ such that

$$ X^TQ_p^TQ_pX = X^TX \tag{820} $$

will have the property

$$ D(Q_pX) = D(X) \tag{821} $$

An example is skinny $Q_p \in \mathbb{R}^{m\times n}$ ($m > n$) having orthonormal columns. We call such a matrix orthonormal.

5.5.2.1 Inner-product form invariance

Likewise, $D(\Theta)$ (799) is rotation/reflection invariant;

$$ D(Q_p\Theta) = D(Q\Theta) = D(\Theta) \tag{822} $$

so (820) and (821) similarly apply.

5.5.3 Invariance conclusion

In the making of an EDM, absolute rotation, reflection, and translation information is lost. Given an EDM, reconstruction of point position (§5.12, the list $X$) can be guaranteed correct only in affine dimension $r$ and relative position. Given a noiseless complete EDM, this isometric reconstruction is unique in so far as every realization of a corresponding list $X$ is congruent:


5.6 Injectivity of D & unique reconstruction

Injectivity implies uniqueness of isometric reconstruction; hence, we endeavor to demonstrate it.

EDM operators list-form $D(X)$ (734), Gram-form $D(G)$ (746), and inner-product form $D(\Theta)$ (799) are many-to-one surjections (§5.5) onto the same range; the EDM cone (§6): (confer (747) (829))

$$
\begin{aligned}
\mathrm{EDM}^N &= \big\{ D(X) : \mathbb{R}^{N-1\times N} \to \mathbb{S}^N_h\ \big|\ X \in \mathbb{R}^{N-1\times N} \big\}\\
&= \big\{ D(G) : \mathbb{S}^N \to \mathbb{S}^N_h\ \big|\ G \in \mathbb{S}^N_+ - \mathbb{S}^{N\perp}_c \big\}\\
&= \big\{ D(\Theta) : \mathbb{R}^{N-1\times N-1} \to \mathbb{S}^N_h\ \big|\ \Theta \in \mathbb{R}^{N-1\times N-1} \big\}
\end{aligned} \tag{823}
$$

where (§5.5.1.1)

$$ \mathbb{S}^{N\perp}_c = \{u\mathbf{1}^T + \mathbf{1}u^T\ |\ u \in \mathbb{R}^N\} \subseteq \mathbb{S}^N \tag{1794} $$

5.6.1 Gram-form bijectivity

Because linear Gram-form EDM operator

$$ D(G) = \delta(G)\mathbf{1}^T + \mathbf{1}\delta(G)^T - 2G \tag{746} $$

has no nullspace [58, §A.1] on the geometric center subspace^{5.23} (§E.7.2.0.2)

$$
\begin{aligned}
\mathbb{S}^N_c &\triangleq \{G \in \mathbb{S}^N\ |\ G\mathbf{1} = 0\} \hspace{11em} (1792)\\
&= \{G \in \mathbb{S}^N\ |\ \mathcal{N}(G) \supseteq \mathbf{1}\} = \{G \in \mathbb{S}^N\ |\ \mathcal{R}(G) \subseteq \mathcal{N}(\mathbf{1}^T)\}\\
&= \{VYV\ |\ Y \in \mathbb{S}^N\} \subset \mathbb{S}^N \hspace{10.1em} (1793)\\
&\equiv \{V_NAV_N^T\ |\ A \in \mathbb{S}^{N-1}\}
\end{aligned} \tag{824}
$$

then $D(G)$ on that subspace is injective.

5.23 The equivalence ≡ in (824) follows from the fact: Given $B = VYV = V_NAV_N^T \in \mathbb{S}^N_c$ with only matrix $A \in \mathbb{S}^{N-1}$ unknown, then $V_N^\dagger BV_N^{\dagger T} = A$ or $V_N^\dagger YV_N^{\dagger T} = A$.


346 CHAPTER 5. EUCLIDEAN DISTANCE MATRIX∈ basis S N⊥hdim S N c = dim S N h = N(N−1)2in R N(N+1)/2dim S N⊥c= dim S N⊥h= N in R N(N+1)/2basis S N c = V {E ij }V (confer (50))S N cbasis S N⊥cS N h∈ basis S N⊥hFigure 91: Orthogonal complements in S N abstractly oriented in isometricallyisomorphic R N(N+1)/2 . Case N = 2 accurately illustrated in R 3 . Orthogonalprojection of basis for S N⊥h on S N⊥c yields another basis for S N⊥c . (Basisvectors for S N⊥c are illustrated lying in a plane orthogonal to S N c in thisdimension. Basis vectors for each ⊥ space outnumber those for its respectiveorthogonal complement; such is not the case in higher dimension.)


To prove injectivity of $D(G)$ on $\mathbb{S}^N_c$: Any matrix $Y \in \mathbb{S}^N$ can be decomposed into orthogonal components in $\mathbb{S}^N$;

$$ Y = VYV + (Y - VYV) \tag{825} $$

where $VYV \in \mathbb{S}^N_c$ and $Y - VYV \in \mathbb{S}^{N\perp}_c$ (1794). Because of translation invariance (§5.5.1.1) and linearity, $D(Y - VYV) = 0$ hence $\mathcal{N}(D) \supseteq \mathbb{S}^{N\perp}_c$. It remains only to show

$$ D(VYV) = 0 \ \Leftrightarrow\ VYV = 0 \tag{826} $$

($\Leftrightarrow Y = u\mathbf{1}^T + \mathbf{1}u^T$ for some $u \in \mathbb{R}^N$). $D(VYV)$ will vanish whenever $2VYV = \delta(VYV)\mathbf{1}^T + \mathbf{1}\delta(VYV)^T$. But this implies $\mathcal{R}(\mathbf{1})$ (§B.2) were a subset of $\mathcal{R}(VYV)$, which is contradictory. Thus we have

$$ \mathcal{N}(D) = \{Y\ |\ D(Y) = 0\} = \{Y\ |\ VYV = 0\} = \mathbb{S}^{N\perp}_c \tag{827} $$

Since $G\mathbf{1} = 0 \Leftrightarrow X\mathbf{1} = 0$ (755) simply means list $X$ is geometrically centered at the origin, and because the Gram-form EDM operator $D$ is translation invariant and $\mathcal{N}(D)$ is the translation-invariant subspace $\mathbb{S}^{N\perp}_c$, then EDM definition $D(G)$ (823) on^{5.24} (confer §6.6.1, §6.7.1, §A.7.4.1)

$$ \mathbb{S}^N_c \cap \mathbb{S}^N_+ = \{VYV \succeq 0\ |\ Y \in \mathbb{S}^N\} \equiv \{V_NAV_N^T\ |\ A \in \mathbb{S}^{N-1}_+\} \subset \mathbb{S}^N \tag{828} $$

must be surjective onto $\mathrm{EDM}^N$; (confer (747))

$$ \mathrm{EDM}^N = \big\{ D(G)\ |\ G \in \mathbb{S}^N_c \cap \mathbb{S}^N_+ \big\} \tag{829} $$

5.6.1.1 Gram-form operator D inversion

Define the linear geometric centering operator $\mathcal{V}$; (confer (756))

$$ \mathcal{V}(D) : \mathbb{S}^N \to \mathbb{S}^N \triangleq -VDV\,\tfrac12 \tag{830} $$

[61, §4.3]^{5.25} This orthogonal projector $\mathcal{V}$ has no nullspace on

$$ \mathbb{S}^N_h = \mathrm{aff}\,\mathrm{EDM}^N \tag{1081} $$

5.24 Equivalence ≡ in (828) follows from the fact: Given $B = VYV = V_NAV_N^T \in \mathbb{S}^N_+$ with only matrix $A$ unknown, then $V_N^\dagger BV_N^{\dagger T} = A$ and $A \in \mathbb{S}^{N-1}_+$ must be positive semidefinite by positive semidefiniteness of $B$ and Corollary A.3.1.0.5.
5.25 Critchley cites Torgerson (1958) [264, ch.11, §2] for a history and derivation of (830).


because the projection of $-D/2$ on $\mathbb{S}^N_c$ (1792) can be 0 if and only if $D \in \mathbb{S}^{N\perp}_c$; but $\mathbb{S}^{N\perp}_c \cap \mathbb{S}^N_h = 0$ (Figure 91). Projector $\mathcal{V}$ on $\mathbb{S}^N_h$ is therefore injective hence invertible. Further, $-V\mathbb{S}^N_hV/2$ is equivalent to the geometric center subspace $\mathbb{S}^N_c$ in the ambient space of symmetric matrices; a surjection,

$$ \mathbb{S}^N_c = \mathcal{V}(\mathbb{S}^N) = \mathcal{V}\big(\mathbb{S}^N_h \oplus \mathbb{S}^{N\perp}_h\big) = \mathcal{V}\big(\mathbb{S}^N_h\big) \tag{831} $$

because (63)

$$ \mathcal{V}\big(\mathbb{S}^N_h\big) \supseteq \mathcal{V}\big(\mathbb{S}^{N\perp}_h\big) = \mathcal{V}\big(\delta^2(\mathbb{S}^N)\big) \tag{832} $$

Because $D(G)$ on $\mathbb{S}^N_c$ is injective, and $\mathrm{aff}\,D\big(\mathcal{V}(\mathrm{EDM}^N)\big) = D\big(\mathcal{V}(\mathrm{aff}\,\mathrm{EDM}^N)\big)$ by property (108) of the affine hull, we find for $D \in \mathbb{S}^N_h$ (confer (760))

$$ D\big({-VDV\,\tfrac12}\big) = \delta\big({-VDV\,\tfrac12}\big)\mathbf{1}^T + \mathbf{1}\delta\big({-VDV\,\tfrac12}\big)^T - 2\big({-VDV\,\tfrac12}\big) \tag{833} $$

id est,

$$ D = D\big(\mathcal{V}(D)\big) \tag{834} $$

or

$$ -VDV = \mathcal{V}\big(D(-VDV)\big) \tag{835} $$

$$ \mathbb{S}^N_h = D\big(\mathcal{V}(\mathbb{S}^N_h)\big) \tag{836} $$

$$ -V\mathbb{S}^N_hV = \mathcal{V}\big(D(-V\mathbb{S}^N_hV)\big) \tag{837} $$

These operators $\mathcal{V}$ and $D$ are mutual inverses.

The Gram-form $D\big(\mathbb{S}^N_c\big)$ (746) is equivalent to $\mathbb{S}^N_h$;

$$ D\big(\mathbb{S}^N_c\big) = D\big(\mathcal{V}(\mathbb{S}^N_h \oplus \mathbb{S}^{N\perp}_h)\big) = \mathbb{S}^N_h + D\big(\mathcal{V}(\mathbb{S}^{N\perp}_h)\big) = \mathbb{S}^N_h \tag{838} $$

because $\mathbb{S}^N_h \supseteq D\big(\mathcal{V}(\mathbb{S}^{N\perp}_h)\big)$. In summary, for the Gram-form we have the isomorphisms [62, §2] [61, p.76, p.107] [5, §2.1]^{5.26} [4, §2] [6, §18.2.1] [1, §2.1]

$$ \mathbb{S}^N_h = D(\mathbb{S}^N_c) \tag{839} $$
$$ \mathbb{S}^N_c = \mathcal{V}(\mathbb{S}^N_h) \tag{840} $$

and from the bijectivity results in §5.6.1,

$$ \mathrm{EDM}^N = D(\mathbb{S}^N_c \cap \mathbb{S}^N_+) \tag{841} $$
$$ \mathbb{S}^N_c \cap \mathbb{S}^N_+ = \mathcal{V}(\mathrm{EDM}^N) \tag{842} $$

5.26 In [5, p.6, line 20], delete sentence: Since G is also...not a singleton set. [5, p.10, line 11] $x_3 = 2$ (not 1).


5.6.2 Inner-product form bijectivity

The Gram-form EDM operator $D(G) = \delta(G)\mathbf{1}^T + \mathbf{1}\delta(G)^T - 2G$ (746) is an injective map, for example, on the domain that is the subspace of symmetric matrices having all zeros in the first row and column

$$ \mathbb{S}^N_1 \triangleq \{G \in \mathbb{S}^N\ |\ Ge_1 = 0\} = \left\{ \begin{bmatrix} 0 & \mathbf{0}^T\\ \mathbf{0} & I \end{bmatrix} Y \begin{bmatrix} 0 & \mathbf{0}^T\\ \mathbf{0} & I \end{bmatrix}\ \middle|\ Y \in \mathbb{S}^N \right\} \tag{1796} $$

because it obviously has no nullspace there. Since $Ge_1 = 0 \Leftrightarrow Xe_1 = 0$ (748) means the first point in the list $X$ resides at the origin, then $D(G)$ on $\mathbb{S}^N_1 \cap \mathbb{S}^N_+$ must be surjective onto $\mathrm{EDM}^N$.

Substituting $\Theta^T\Theta \leftarrow -V_N^TDV_N$ (811) into inner-product form EDM definition $D(\Theta)$ (799), it may be further decomposed: (confer (754))

$$
D(D) = \begin{bmatrix} 0\\ \delta(-V_N^TDV_N) \end{bmatrix}\mathbf{1}^T + \mathbf{1}\begin{bmatrix} 0 & \delta(-V_N^TDV_N)^T \end{bmatrix} - 2\begin{bmatrix} 0 & \mathbf{0}^T\\ \mathbf{0} & -V_N^TDV_N \end{bmatrix} \tag{843}
$$

This linear operator $D$ is another flavor of inner-product form and an injective map of the EDM cone onto itself. Yet when its domain is instead the entire symmetric hollow subspace $\mathbb{S}^N_h = \mathrm{aff}\,\mathrm{EDM}^N$, $D(D)$ becomes an injective map onto that same subspace. Proof follows directly from the fact: linear $D$ has no nullspace [58, §A.1] on $\mathbb{S}^N_h = \mathrm{aff}\,D(\mathrm{EDM}^N) = D(\mathrm{aff}\,\mathrm{EDM}^N)$ (108).

5.6.2.1 Inversion of $D\big({-V_N^TDV_N}\big)$

Injectivity of $D(D)$ suggests inversion of (confer (751))

$$ \mathcal{V}_N(D) : \mathbb{S}^N \to \mathbb{S}^{N-1} \triangleq -V_N^TDV_N \tag{844} $$

a linear surjective^{5.27} mapping onto $\mathbb{S}^{N-1}$ having nullspace^{5.28} $\mathbb{S}^{N\perp}_c$;

$$ \mathcal{V}_N(\mathbb{S}^N_h) = \mathbb{S}^{N-1} \tag{845} $$

5.27 Surjectivity of $\mathcal{V}_N(D)$ is demonstrated via the Gram-form EDM operator $D(G)$: Since $\mathbb{S}^N_h = D(\mathbb{S}^N_c)$ (838), then for any $Y \in \mathbb{S}^{N-1}$, $-V_N^T\,D\big(V_N^{\dagger T}YV_N^\dagger/2\big)\,V_N = Y$.
5.28 $\mathcal{N}(\mathcal{V}_N) \supseteq \mathbb{S}^{N\perp}_c$ is apparent. There exists a linear mapping $T(\mathcal{V}_N(D)) \triangleq V_N^{\dagger T}\mathcal{V}_N(D)V_N^\dagger = -VDV\,\tfrac12 = \mathcal{V}(D)$ such that $\mathcal{N}(T(\mathcal{V}_N)) = \mathcal{N}(\mathcal{V}) \supseteq \mathcal{N}(\mathcal{V}_N) \supseteq \mathbb{S}^{N\perp}_c = \mathcal{N}(\mathcal{V})$, where the equality $\mathbb{S}^{N\perp}_c = \mathcal{N}(\mathcal{V})$ is known (§E.7.2.0.2).


injective on domain S^N_h because S^{N⊥}_c ∩ S^N_h = 0. Revising the argument of this inner-product form (843), we get another flavor

$$ \mathcal D(-V_N^TDV_N) = \begin{bmatrix}0\\ \delta(-V_N^TDV_N)\end{bmatrix}\mathbf 1^T + \mathbf 1\begin{bmatrix}0&\delta(-V_N^TDV_N)^T\end{bmatrix} - 2\begin{bmatrix}0&\mathbf 0^T\\ \mathbf 0&-V_N^TDV_N\end{bmatrix} \tag{846} $$

and we obtain mutual inversion of operators V_N and D , for D ∈ S^N_h : id est,

$$ D = \mathcal D\big(\mathcal V_N(D)\big) \tag{847} $$

or

$$ -V_N^TDV_N = \mathcal V_N\big(\mathcal D(-V_N^TDV_N)\big) \tag{848} $$
$$ \mathbb S^N_h = \mathcal D\big(\mathcal V_N(\mathbb S^N_h)\big) \tag{849} $$
$$ -V_N^T\,\mathbb S^N_h\,V_N = \mathcal V_N\big(\mathcal D(-V_N^T\,\mathbb S^N_h\,V_N)\big) \tag{850} $$

Substituting Θ^TΘ ← Φ into inner-product form EDM definition (799), any EDM may be expressed by the new flavor

$$ \mathcal D(\Phi) \triangleq \begin{bmatrix}0\\ \delta(\Phi)\end{bmatrix}\mathbf 1^T + \mathbf 1\begin{bmatrix}0&\delta(\Phi)^T\end{bmatrix} - 2\begin{bmatrix}0&\mathbf 0^T\\ \mathbf 0&\Phi\end{bmatrix} \in \mathrm{EDM}^N \;\Leftrightarrow\; \Phi \succeq 0 \tag{851} $$

where this D is a linear surjective operator onto EDM^N by definition, injective because it has no nullspace on domain S^{N−1}_+ . More broadly, aff D(S^{N−1}_+) = D(aff S^{N−1}_+) (108),

$$ \mathbb S^N_h = \mathcal D(\mathbb S^{N-1})\,,\qquad \mathbb S^{N-1} = \mathcal V_N(\mathbb S^N_h) \tag{852} $$

demonstrably isomorphisms, and by bijectivity of this inner-product form:

$$ \mathrm{EDM}^N = \mathcal D(\mathbb S^{N-1}_+) \tag{853} $$
$$ \mathbb S^{N-1}_+ = \mathcal V_N(\mathrm{EDM}^N) \tag{854} $$

such that N(T(V_N)) = N(V) ⊇ N(V_N) ⊇ S^{N⊥}_c = N(V), where the equality S^{N⊥}_c = N(V) is known (§E.7.2.0.2).
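A quick numerical illustration of the mutual inversion (847), again a Matlab sketch of our own (names ours), assuming the auxiliary matrix V_N of (740) has the form [−1^T; I]/√2:

    % check D = D(V_N(D)) (847) via (846); code ours
    N = 5;  X = randn(3,N);  G = X'*X;
    D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);  % auxiliary matrix V_N (740)
    Phi = -VN'*D*VN;                        % V_N(D), cf. (844)
    d = [0; diag(Phi)];                     % [0 ; delta(-VN'*D*VN)]
    Drec = d*ones(1,N) + ones(N,1)*d' - 2*blkdiag(0,Phi);   % (846)
    norm(Drec - D, 'fro')                   % ~1e-15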


5.7 Embedding in affine hull

The affine hull A (68) of a point list {x_l} (arranged columnar in X ∈ R^{n×N} (66)) is identical to the affine hull of that polyhedron P (76) formed from all convex combinations of the x_l ; [46, §2] [232, §17]

$$ \mathcal A = \operatorname{aff} X = \operatorname{aff}\mathcal P \tag{855} $$

Comparing hull definitions (68) and (76), it becomes obvious that the x_l and their convex hull P are embedded in their unique affine hull A ;

$$ \mathcal A \supseteq \mathcal P \supseteq \{x_l\} \tag{856} $$

Recall: affine dimension r is a lower bound on embedding, equal to dimension of the subspace parallel to that nonempty affine set A in which the points are embedded. (§2.3.1) We define dimension of the convex hull P to be the same as dimension r of the affine hull A [232, §2], but r is not necessarily equal to the rank of X (875).

For the particular example illustrated in Figure 79, P is the triangle plus its relative interior while its three vertices constitute the entire list X. The affine hull A is the unique plane that contains the triangle, so r = 2 in that example while the rank of X is 3. Were there only two points in Figure 79, then the affine hull would instead be the unique line passing through them; r would become 1 while the rank would then be 2.

5.7.1 Determining affine dimension

Knowledge of affine dimension r becomes important because we lose any absolute offset common to all the generating x_l in R^n when reconstructing convex polyhedra given only distance information. (§5.5.1) To calculate r , we first remove any offset that serves to increase dimensionality of the subspace required to contain polyhedron P ; subtracting any α ∈ A in the affine hull from every list member will work, translating A to the origin:^{5.29}

$$ X - \alpha\mathbf 1^T \tag{857} $$
$$ \mathcal A - \alpha = \operatorname{aff}(X - \alpha\mathbf 1^T) = \operatorname{aff}(X) - \alpha \tag{858} $$
$$ \mathcal P - \alpha = \operatorname{conv}(X - \alpha\mathbf 1^T) = \operatorname{conv}(X) - \alpha \tag{859} $$

^{5.29} The manipulation of hull functions aff and conv follows from their definitions.


Because (855) and (856) translate,

$$ \mathbb R^n \supseteq \mathcal A - \alpha = \operatorname{aff}(X - \alpha\mathbf 1^T) = \operatorname{aff}(\mathcal P - \alpha) \supseteq \mathcal P - \alpha \supseteq \{x_l - \alpha\} \tag{860} $$

where from the previous relations it is easily shown

$$ \operatorname{aff}(\mathcal P - \alpha) = \operatorname{aff}(\mathcal P) - \alpha \tag{861} $$

Translating A neither changes its dimension nor the dimension of the embedded polyhedron P ; (67)

$$ r \triangleq \dim\mathcal A = \dim(\mathcal A - \alpha) \triangleq \dim(\mathcal P - \alpha) = \dim\mathcal P \tag{862} $$

For any α ∈ R^n , (858)-(862) remain true. [232, p.4, p.12] Yet when α ∈ A , the affine set A − α becomes a unique subspace of R^n in which the {x_l − α} and their convex hull P − α are embedded (860), and whose dimension is more easily calculated.

5.7.1.0.1 Example. Translating first list-member to origin.
Subtracting the first member α ≜ x_1 from every list member will translate their affine hull A and their convex hull P and, in particular, x_1 ∈ P ⊆ A to the origin in R^n ; videlicet,

$$ X - x_1\mathbf 1^T = X - Xe_1\mathbf 1^T = X(I - e_1\mathbf 1^T) = X\begin{bmatrix}\mathbf 0 & \sqrt 2\,V_N\end{bmatrix} \in \mathbb R^{n\times N} \tag{863} $$

where V_N is defined in (740), and e_1 in (750). Applying (860) to (863),

$$ \mathbb R^n \supseteq \mathcal R(XV_N) = \mathcal A - x_1 = \operatorname{aff}(X - x_1\mathbf 1^T) = \operatorname{aff}(\mathcal P - x_1) \supseteq \mathcal P - x_1 \ni \mathbf 0 \tag{864} $$

where XV_N ∈ R^{n×N−1}. Hence

$$ r = \dim\mathcal R(XV_N) \tag{865} $$

Since shifting the geometric center to the origin (§5.5.1.0.1) translates the affine hull to the origin as well, then it must also be true

$$ r = \dim\mathcal R(XV) \tag{866} $$


For any matrix whose range is R(V) = N(1^T) we get the same result; e.g.,

$$ r = \dim\mathcal R(XV_N^{\dagger T}) \tag{867} $$

because

$$ \mathcal R(XV) = \{Xz \mid z \in \mathcal N(\mathbf 1^T)\} \tag{868} $$

and R(V) = R(V_N) = R(V_N^{†T}) (§E). These auxiliary matrices (§B.4.2) are more closely related;

$$ V = V_N V_N^\dagger \tag{1466} $$

5.7.1.1 Affine dimension r versus rank

Now, suppose D is an EDM as defined by

$$ \mathcal D(X) \triangleq \delta(X^TX)\mathbf 1^T + \mathbf 1\,\delta(X^TX)^T - 2X^TX \in \mathrm{EDM}^N \tag{734} $$

and we premultiply by −V_N^T and postmultiply by V_N . Then because V_N^T 1 = 0 (741), it is always true that

$$ -V_N^TDV_N = 2V_N^TX^TXV_N = 2V_N^TGV_N \in \mathbb S^{N-1} \tag{869} $$

where G is a Gram matrix. Similarly pre- and postmultiplying by V (confer (756))

$$ -VDV = 2VX^TXV = 2VGV \in \mathbb S^N \tag{870} $$

always holds because V1 = 0 (1456). Likewise, multiplying inner-product form EDM definition (799), it always holds:

$$ -V_N^TDV_N = \Theta^T\Theta \in \mathbb S^{N-1} \tag{803} $$

For any matrix A , rank A^TA = rank A = rank A^T. [150, §0.4]^{5.30} So, by (868), affine dimension

$$ \begin{array}{rl} r &= \operatorname{rank}XV = \operatorname{rank}XV_N = \operatorname{rank}XV_N^{\dagger T} = \operatorname{rank}\Theta\\ &= \operatorname{rank}VDV = \operatorname{rank}VGV = \operatorname{rank}V_N^TDV_N = \operatorname{rank}V_N^TGV_N \end{array} \tag{871} $$

^{5.30} For A ∈ R^{m×n}, N(A^TA) = N(A). [251, §3.3]
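The equalities (871) are easy to probe numerically. Below is a small Matlab sketch of our own (names ours) for a rank-deficient list: six points confined to a plane within R^3, so every expression returns r = 2.

    % affine dimension by the equivalent expressions in (871); code ours
    N = 6;  X = [randn(2,N); zeros(1,N)];   % points in a plane of R^3
    G = X'*X;
    D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;
    V  = eye(N) - ones(N)/N;
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
    r = [rank(X*V), rank(X*VN), rank(V*D*V), rank(VN'*D*VN)]   % all 2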


By conservation of dimension, (§A.7.3.0.1)

$$ r + \dim\mathcal N(V_N^TDV_N) = N - 1 \tag{872} $$
$$ r + \dim\mathcal N(VDV) = N \tag{873} $$

For D ∈ EDM^N

$$ -V_N^TDV_N \succ 0 \;\Leftrightarrow\; r = N - 1 \tag{874} $$

but −VDV ⊁ 0. The general fact^{5.31} (confer (767))

$$ r \leq \min\{n,\,N-1\} \tag{875} $$

is evident from (863) but can be visualized in the example illustrated in Figure 79. There we imagine a vector from the origin to each point in the list. Those three vectors are linearly independent in R^3 , but affine dimension r is 2 because the three points lie in a plane. When that plane is translated to the origin, it becomes the only subspace of dimension r = 2 that can contain the translated triangular polyhedron.

5.7.2 Précis

We collect expressions for affine dimension: for list X ∈ R^{n×N} and Gram matrix G ∈ S^N_+

$$ \begin{array}{rll} r \triangleq &\dim(\mathcal P - \alpha) = \dim\mathcal P = \dim\operatorname{conv}X&\\ =&\dim(\mathcal A - \alpha) = \dim\mathcal A = \dim\operatorname{aff}X&\\ =&\operatorname{rank}(X - x_1\mathbf 1^T) = \operatorname{rank}(X - \alpha_c\mathbf 1^T)&\\ =&\operatorname{rank}\Theta &(801)\\ =&\operatorname{rank}XV_N = \operatorname{rank}XV = \operatorname{rank}XV_N^{\dagger T}&\\ =&\operatorname{rank}X\,,\quad Xe_1 = 0 \text{ or } X\mathbf 1 = 0&\\ =&\operatorname{rank}V_N^TGV_N = \operatorname{rank}VGV = \operatorname{rank}V_N^\dagger GV_N&\\ =&\operatorname{rank}G\,,\quad Ge_1 = 0\ (751) \text{ or } G\mathbf 1 = 0\ (756)&\\ =&\operatorname{rank}V_N^TDV_N = \operatorname{rank}VDV = \operatorname{rank}V_N^\dagger DV_N = \operatorname{rank}V_N(V_N^TDV_N)V_N^T&\\ =&\operatorname{rank}\Lambda &(961)\\ =&N - 1 - \dim\mathcal N\!\left(\begin{bmatrix}0&\mathbf 1^T\\ \mathbf 1&-D\end{bmatrix}\right) = \operatorname{rank}\begin{bmatrix}0&\mathbf 1^T\\ \mathbf 1&-D\end{bmatrix} - 2 &(883) \end{array} \tag{876} $$

where the expressions involving D hold for D ∈ EDM^N.

^{5.31} rank X ≤ min{n, N}


5.7.3 Eigenvalues of −VDV versus −V_N^† D V_N

Suppose for D ∈ EDM^N we are given eigenvectors v_i ∈ R^N of −VDV and corresponding eigenvalues λ ∈ R^N so that

$$ -VDV\,v_i = \lambda_i v_i\,,\qquad i = 1\ldots N \tag{877} $$

From these we can determine the eigenvectors and eigenvalues of −V_N^† D V_N : Define

$$ \nu_i \triangleq V_N^\dagger v_i\,,\qquad \lambda_i \neq 0 \tag{878} $$

Then we have:

$$ -VDV_N V_N^\dagger v_i = \lambda_i v_i \tag{879} $$
$$ -V_N^\dagger VDV_N\,\nu_i = \lambda_i V_N^\dagger v_i \tag{880} $$
$$ -V_N^\dagger DV_N\,\nu_i = \lambda_i\nu_i \tag{881} $$

the eigenvectors of −V_N^† D V_N are given by (878) while its corresponding nonzero eigenvalues are identical to those of −VDV although −V_N^† D V_N is not necessarily positive semidefinite. In contrast, −V_N^T D V_N is positive semidefinite but its nonzero eigenvalues are generally different.

5.7.3.0.1 Theorem. EDM rank versus affine dimension r.
[113, §3] [133, §3] [112, §3] For D ∈ EDM^N (confer (1036))

1. r = rank(D) − 1 ⇔ 1^T D^† 1 ≠ 0. Points constituting a list X generating the polyhedron corresponding to D lie on the relative boundary of an r-dimensional circumhypersphere having

$$ \text{diameter} = \sqrt 2\,\big(\mathbf 1^TD^\dagger\mathbf 1\big)^{-1/2}\,,\qquad \text{circumcenter} = \frac{XD^\dagger\mathbf 1}{\mathbf 1^TD^\dagger\mathbf 1} \tag{882} $$

2. r = rank(D) − 2 ⇔ 1^T D^† 1 = 0. There can be no circumhypersphere whose relative boundary contains a generating list for the corresponding polyhedron.

3. In Cayley-Menger form [77, §6.2] [60, §3.3] [37, §40] (§5.11.2),

$$ r = N - 1 - \dim\mathcal N\!\left(\begin{bmatrix}0&\mathbf 1^T\\ \mathbf 1&-D\end{bmatrix}\right) = \operatorname{rank}\begin{bmatrix}0&\mathbf 1^T\\ \mathbf 1&-D\end{bmatrix} - 2 \tag{883} $$

Circumhyperspheres exist for r < rank(D) − 2. [263, §7] ⋄
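Case 1 of the theorem is easy to verify numerically. The following Matlab sketch (ours) places five points on the unit circle, so a circumhypersphere of diameter 2 centered at the origin exists and (882) should return exactly those values:

    % circumhypersphere parameters (882) for points on a unit circle; code ours
    N = 5;  t = 2*pi*(1:N)'/N;
    X = [cos(t)'; sin(t)'];                 % unit circle, circumradius 1
    G = X'*X;
    D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;
    s = ones(1,N)*pinv(D)*ones(N,1);        % 1'*D^dagger*1, nonzero here
    diameter = sqrt(2/s)                    % returns 2
    circumcenter = X*pinv(D)*ones(N,1)/s    % returns ~[0;0]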


For all practical purposes, (875)

$$ \max\{0,\,\operatorname{rank}(D) - 2\} \;\leq\; r \;\leq\; \min\{n,\,N-1\} \tag{884} $$

5.8 Euclidean metric versus matrix criteria

5.8.1 Nonnegativity property 1

When D = [d_ij] is an EDM (734), then it is apparent from (869)

$$ 2V_N^TX^TXV_N = -V_N^TDV_N \succeq 0 \tag{885} $$

because for any matrix A , A^TA ≽ 0.^{5.32} We claim nonnegativity of the d_ij is enforced primarily by the matrix inequality (885); id est,

$$ \left.\begin{array}{r}-V_N^TDV_N \succeq 0\\ D \in \mathbb S^N_h\end{array}\right\} \;\Rightarrow\; d_{ij} \geq 0\,,\; i \neq j \tag{886} $$

(The matrix inequality to enforce strict positivity differs by a stroke of the pen. (889))

We now support our claim: If any matrix A ∈ R^{m×m} is positive semidefinite, then its main diagonal δ(A) ∈ R^m must have all nonnegative entries. [110, §4.2] Given D ∈ S^N_h , the matrix −V_N^T D V_N has main-diagonal entries d_{1,i+1} and off-diagonal entries ½(d_{1,i+1} + d_{1,j+1} − d_{i+1,j+1}) in row i and column j ; in compact form,

$$ -V_N^TDV_N = \tfrac12\big(\mathbf 1 D_{1,2:N} + D_{2:N,1}\mathbf 1^T - D_{2:N,2:N}\big) \in \mathbb S^{N-1} \tag{887} $$

^{5.32} For A ∈ R^{m×n}, A^TA ≽ 0 ⇔ y^TA^TAy = ‖Ay‖² ≥ 0 for all ‖y‖ = 1. When A is full-rank skinny-or-square, A^TA ≻ 0.


where row, column indices i, j ∈ {1 ... N−1}. [236] It follows:

$$ \left.\begin{array}{r}-V_N^TDV_N \succeq 0\\ D \in \mathbb S^N_h\end{array}\right\} \;\Rightarrow\; \delta(-V_N^TDV_N) = \begin{bmatrix}d_{12}\\ d_{13}\\ \vdots\\ d_{1N}\end{bmatrix} \succeq 0 \tag{888} $$

Multiplication of V_N by any permutation matrix Ξ has null effect on its range and nullspace. In other words, any permutation of the rows or columns of V_N produces a basis for N(1^T); id est, R(Ξ_r V_N) = R(V_N Ξ_c) = R(V_N) = N(1^T). Hence, −V_N^T D V_N ≽ 0 ⇔ −V_N^T Ξ_r^T D Ξ_r V_N ≽ 0 (⇔ −Ξ_c^T V_N^T D V_N Ξ_c ≽ 0). Various permutation matrices^{5.33} will sift the remaining d_ij similarly to (888) thereby proving their nonnegativity. Hence −V_N^T D V_N ≽ 0 is a sufficient test for the first property (§5.2) of the Euclidean metric, nonnegativity.

When affine dimension r equals 1, in particular, nonnegativity, symmetry, and hollowness become necessary and sufficient criteria satisfying matrix inequality (885). (§6.6.0.0.1)

5.8.1.1 Strict positivity

Should we require the points in R^n to be distinct, then entries of D off the main diagonal must be strictly positive {d_ij > 0, i ≠ j} and only those entries along the main diagonal of D are 0. By similar argument, the strict matrix inequality is a sufficient test for strict positivity of Euclidean distance-square;

$$ \left.\begin{array}{r}-V_N^TDV_N \succ 0\\ D \in \mathbb S^N_h\end{array}\right\} \;\Rightarrow\; d_{ij} > 0\,,\; i \neq j \tag{889} $$

^{5.33} The rule of thumb is: If Ξ_r(i,1) = 1, then δ(−V_N^T Ξ_r^T D Ξ_r V_N) ∈ R^{N−1} is some permutation of the i-th row or column of D excepting the 0 entry from the main diagonal.


5.8.2 Triangle inequality property 4

In light of Kreyszig's observation [167, §1.1, prob.15] that properties 2 through 4 of the Euclidean metric (§5.2) together imply nonnegativity property 1,

$$ 2\sqrt{d_{jk}} = \sqrt{d_{jk}} + \sqrt{d_{kj}} \;\geq\; \sqrt{d_{jj}} = 0\,,\qquad j \neq k \tag{890} $$

nonnegativity criterion (886) suggests that the matrix inequality −V_N^T D V_N ≽ 0 might somehow take on the role of triangle inequality; id est,

$$ \left.\begin{array}{r}\delta(D) = 0\\ D^T = D\\ -V_N^TDV_N \succeq 0\end{array}\right\} \;\Rightarrow\; \sqrt{d_{ij}} \leq \sqrt{d_{ik}} + \sqrt{d_{kj}}\,,\quad i \neq j \neq k \tag{891} $$

We now show that is indeed the case: Let T be the leading principal submatrix in S^2 of −V_N^T D V_N (upper left 2×2 submatrix from (887));

$$ T \triangleq \begin{bmatrix} d_{12} & \tfrac12(d_{12}+d_{13}-d_{23})\\ \tfrac12(d_{12}+d_{13}-d_{23}) & d_{13} \end{bmatrix} \tag{892} $$

Submatrix T must be positive (semi)definite whenever −V_N^T D V_N is. (§A.3.1.0.4, §5.8.3) Now we have,

$$ \begin{array}{l} -V_N^TDV_N \succeq 0 \;\Rightarrow\; T \succeq 0 \;\Leftrightarrow\; \lambda_1 \geq \lambda_2 \geq 0\\ -V_N^TDV_N \succ 0 \;\Rightarrow\; T \succ 0 \;\Leftrightarrow\; \lambda_1 > \lambda_2 > 0 \end{array} \tag{893} $$

where λ_1 and λ_2 are the eigenvalues of T , real due only to symmetry of T :

$$ \begin{array}{l} \lambda_1 = \tfrac12\Big(d_{12} + d_{13} + \sqrt{d_{23}^2 - 2(d_{12}+d_{13})d_{23} + 2(d_{12}^2 + d_{13}^2)}\Big) \in \mathbb R\\ \lambda_2 = \tfrac12\Big(d_{12} + d_{13} - \sqrt{d_{23}^2 - 2(d_{12}+d_{13})d_{23} + 2(d_{12}^2 + d_{13}^2)}\Big) \in \mathbb R \end{array} \tag{894} $$

Nonnegativity of eigenvalue λ_1 is guaranteed by only nonnegativity of the d_ij which in turn is guaranteed by matrix inequality (886). Inequality between the eigenvalues in (893) follows from only realness of the d_ij . Since λ_1 always equals or exceeds λ_2 , conditions for the positive (semi)definiteness of submatrix T can be completely determined by examining λ_2 , the smaller of its two eigenvalues. A triangle inequality is made apparent when we express


T eigenvalue nonnegativity in terms of D matrix entries; videlicet,

$$ \begin{array}{rcll} T \succeq 0 &\Leftrightarrow& \det T = \lambda_1\lambda_2 \geq 0\,,\;\; d_{12}, d_{13} \geq 0 &\text{(c)}\\ &\Leftrightarrow& \lambda_2 \geq 0 &\text{(b)}\\ &\Leftrightarrow& |\sqrt{d_{12}} - \sqrt{d_{23}}| \;\leq\; \sqrt{d_{13}} \;\leq\; \sqrt{d_{12}} + \sqrt{d_{23}} &\text{(a)} \end{array} \tag{895} $$

Triangle inequality (895a) (confer (797) (907)), in terms of three rooted entries from D , is equivalent to metric property 4

$$ \begin{array}{l} \sqrt{d_{13}} \leq \sqrt{d_{12}} + \sqrt{d_{23}}\\ \sqrt{d_{23}} \leq \sqrt{d_{12}} + \sqrt{d_{13}}\\ \sqrt{d_{12}} \leq \sqrt{d_{13}} + \sqrt{d_{23}} \end{array} \tag{896} $$

for the corresponding points x_1, x_2, x_3 from some length-N list.^{5.34}

5.8.2.1 Comment

Given D whose dimension N equals or exceeds 3, there are N!/(3!(N−3)!) distinct triangle inequalities in total like (797) that must be satisfied, of which each d_ij is involved in N−2, and each point x_i is in (N−1)!/(2!(N−1−2)!). We have so far revealed only one of those triangle inequalities; namely, (895a) that came from T (892). Yet we claim if −V_N^T D V_N ≽ 0 then all triangle inequalities will be satisfied simultaneously;

$$ |\sqrt{d_{ik}} - \sqrt{d_{kj}}| \;\leq\; \sqrt{d_{ij}} \;\leq\; \sqrt{d_{ik}} + \sqrt{d_{kj}}\,,\qquad i < k < j \tag{897} $$
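A quick numerical look at (895) before the claim is argued: the Matlab sketch below (ours) fixes √d_12 = 1 and √d_23 = 2, then sweeps d_13 across and beyond the triangle-inequality interval [1, 9]; the smaller eigenvalue λ_2 of T is nonnegative exactly on that interval.

    % lambda_2 of T (892) encodes triangle inequality (895a); code ours
    d12 = 1;  d23 = 4;                      % two fixed distance-squares
    for d13 = [0.5 1 4 9 10]                % feasible iff 1 <= d13 <= 9
        T = [d12, (d12+d13-d23)/2; (d12+d13-d23)/2, d13];
        fprintf('d13 = %4.1f   lambda2 = %7.3f\n', d13, min(eig(T)));
    end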


5.8.2.1.1 Shore. The columns of Ξ_r V_N Ξ_c hold a basis for N(1^T) when Ξ_r and Ξ_c are permutation matrices. In other words, any permutation of the rows or columns of V_N leaves its range and nullspace unchanged; id est, R(Ξ_r V_N Ξ_c) = R(V_N) = N(1^T) (741). Hence, two distinct matrix inequalities can be equivalent tests of the positive semidefiniteness of D on R(V_N) ; id est, −V_N^T D V_N ≽ 0 ⇔ −(Ξ_r V_N Ξ_c)^T D (Ξ_r V_N Ξ_c) ≽ 0. By properly choosing permutation matrices,^{5.35} the leading principal submatrix T_Ξ ∈ S^2 of −(Ξ_r V_N Ξ_c)^T D (Ξ_r V_N Ξ_c) may be loaded with the entries of D needed to test any particular triangle inequality (similarly to (887)-(895)). Because all the triangle inequalities can be individually tested using a test equivalent to the lone matrix inequality −V_N^T D V_N ≽ 0, it logically follows that the lone matrix inequality tests all those triangle inequalities simultaneously. We conclude that −V_N^T D V_N ≽ 0 is a sufficient test for the fourth property of the Euclidean metric, triangle inequality.

5.8.2.2 Strict triangle inequality

Without exception, all the inequalities in (895) and (896) can be made strict while their corresponding implications remain true. The then strict inequality (895a) or (896) may be interpreted as a strict triangle inequality under which collinear arrangement of points is not allowed. [164, §24/6, p.322] Hence by similar reasoning, −V_N^T D V_N ≻ 0 is a sufficient test of all the strict triangle inequalities; id est,

$$ \left.\begin{array}{r}\delta(D) = 0\\ D^T = D\\ -V_N^TDV_N \succ 0\end{array}\right\} \;\Rightarrow\; \sqrt{d_{ij}} < \sqrt{d_{ik}} + \sqrt{d_{kj}}\,,\quad i \neq j \neq k \tag{898} $$

5.8.3 −V_N^T D V_N nesting

From (892) observe that T = −V_N^T D V_N |_{N←3}. In fact, for D ∈ EDM^N, the leading principal submatrices of −V_N^T D V_N form a nested sequence (by inclusion) whose members are individually positive semidefinite [110] [150] [251] and have the same form as T ; videlicet,^{5.36}

^{5.35} To individually test triangle inequality |√d_ik − √d_kj| ≤ √d_ij ≤ √d_ik + √d_kj for particular i, k, j, set Ξ_r(i,1) = Ξ_r(k,2) = Ξ_r(j,3) = 1 and Ξ_c = I.
^{5.36} −VDV |_{N←1} = 0 ∈ S^0_+ (§B.4.1)


$$ \begin{array}{ll}
-V_N^TDV_N|_{N\leftarrow 1} = [\,\varnothing\,] & \text{(o)}\\[4pt]
-V_N^TDV_N|_{N\leftarrow 2} = [\,d_{12}\,] \in \mathbb S_+ & \text{(a)}\\[4pt]
-V_N^TDV_N|_{N\leftarrow 3} = \begin{bmatrix} d_{12} & \tfrac12(d_{12}+d_{13}-d_{23})\\ \tfrac12(d_{12}+d_{13}-d_{23}) & d_{13} \end{bmatrix} = T \in \mathbb S^2_+ & \text{(b)}\\[4pt]
-V_N^TDV_N|_{N\leftarrow 4} = \begin{bmatrix} d_{12} & \tfrac12(d_{12}+d_{13}-d_{23}) & \tfrac12(d_{12}+d_{14}-d_{24})\\ \tfrac12(d_{12}+d_{13}-d_{23}) & d_{13} & \tfrac12(d_{13}+d_{14}-d_{34})\\ \tfrac12(d_{12}+d_{14}-d_{24}) & \tfrac12(d_{13}+d_{14}-d_{34}) & d_{14} \end{bmatrix} & \text{(c)}\\[4pt]
\qquad\vdots &\\[2pt]
-V_N^TDV_N|_{N\leftarrow i} = \begin{bmatrix} -V_N^TDV_N|_{N\leftarrow i-1} & \nu(i)\\ \nu(i)^T & d_{1i} \end{bmatrix} \in \mathbb S^{i-1}_+ & \text{(d)}\\[4pt]
\qquad\vdots &\\[2pt]
-V_N^TDV_N = \begin{bmatrix} -V_N^TDV_N|_{N\leftarrow N-1} & \nu(N)\\ \nu(N)^T & d_{1N} \end{bmatrix} \in \mathbb S^{N-1}_+ & \text{(e)}
\end{array} \tag{899} $$

where

$$ \nu(i) \triangleq \tfrac12\begin{bmatrix} d_{12}+d_{1i}-d_{2i}\\ d_{13}+d_{1i}-d_{3i}\\ \vdots\\ d_{1,i-1}+d_{1i}-d_{i-1,i} \end{bmatrix} \in \mathbb R^{i-2}\,,\qquad i > 2 \tag{900} $$

Hence, the leading principal submatrices of EDM D must also be EDMs.^{5.37}

Bordered symmetric matrices in the form (899d) are known to have intertwined [251, §6.4] (or interlaced [150, §4.3] [248, §IV.4.1]) eigenvalues; (confer §5.11.1) that means, for the particular submatrices (899a) and (899b),

^{5.37} In fact, each and every principal submatrix of an EDM D is another EDM. [172, §4.1]


$$ \lambda_2 \leq d_{12} \leq \lambda_1 \tag{901} $$

where d_12 is the eigenvalue of submatrix (899a) and λ_1, λ_2 are the eigenvalues of T (899b) (892). Intertwining in (901) predicts that should d_12 become 0, then λ_2 must go to 0.^{5.38} The eigenvalues are similarly intertwined for submatrices (899b) and (899c);

$$ \gamma_3 \leq \lambda_2 \leq \gamma_2 \leq \lambda_1 \leq \gamma_1 \tag{902} $$

where γ_1, γ_2, γ_3 are the eigenvalues of submatrix (899c). Intertwining likewise predicts that should λ_2 become 0 (a possibility revealed in §5.8.3.1), then γ_3 must go to 0. Combining results so far for N = 2, 3, 4: (901) (902)

$$ \gamma_3 \leq \lambda_2 \leq d_{12} \leq \lambda_1 \leq \gamma_1 \tag{903} $$

The preceding logic extends by induction through the remaining members of the sequence (899).

5.8.3.1 Tightening the triangle inequality

Now we apply Schur complement from §A.4 to tighten the triangle inequality from (891) in case: cardinality N = 4. We find that the gains by doing so are modest. From (899) we identify:

$$ \begin{bmatrix} A & B\\ B^T & C \end{bmatrix} \triangleq -V_N^TDV_N|_{N\leftarrow 4} \tag{904} $$
$$ A \triangleq T = -V_N^TDV_N|_{N\leftarrow 3} \tag{905} $$

both positive semidefinite by assumption, where B = ν(4) (900), and C = d_14 . Using nonstrict CC^†-form (1337), C ≽ 0 by assumption (§5.8.1) and CC^† = I. So by the positive semidefinite ordering of eigenvalues theorem (§A.3.1.0.1),

$$ -V_N^TDV_N|_{N\leftarrow 4} \succeq 0 \;\Leftrightarrow\; T \succeq d_{14}^{-1}\nu(4)\nu(4)^T \;\Rightarrow\; \left\{\begin{array}{l}\lambda_1 \geq d_{14}^{-1}\|\nu(4)\|^2\\ \lambda_2 \geq 0\end{array}\right. \tag{906} $$

where {d_14^{−1}‖ν(4)‖², 0} are the eigenvalues of d_14^{−1}ν(4)ν(4)^T while λ_1, λ_2 are the eigenvalues of T.

^{5.38} If d_12 were 0, eigenvalue λ_2 becomes 0 (894) because d_13 must then be equal to d_23 ; id est, d_12 = 0 ⇔ x_1 = x_2 . (§5.4)


5.8.3.1.1 Example. Small completion problem, II.
Applying the inequality for λ_1 in (906) to the small completion problem on page 308 Figure 80, the lower bound on √d_14 (1.236 in (727)) is tightened to 1.289 . The correct value of √d_14 to three significant figures is 1.414 .

5.8.4 Affine dimension reduction in two dimensions

(confer §5.14.4) The leading principal 2×2 submatrix T of −V_N^T D V_N has largest eigenvalue λ_1 (894) which is a convex function of D.^{5.39} λ_1 can never be 0 unless d_12 = d_13 = d_23 = 0. Eigenvalue λ_1 can never be negative while the d_ij are nonnegative. The remaining eigenvalue λ_2 is a concave function of D that becomes 0 only at the upper and lower bounds of inequality (895a) and its equivalent forms: (confer (897))

$$ \begin{array}{cl} |\sqrt{d_{12}} - \sqrt{d_{23}}| \leq \sqrt{d_{13}} \leq \sqrt{d_{12}} + \sqrt{d_{23}} & \text{(a)}\\ \Leftrightarrow &\\ |\sqrt{d_{12}} - \sqrt{d_{13}}| \leq \sqrt{d_{23}} \leq \sqrt{d_{12}} + \sqrt{d_{13}} & \text{(b)}\\ \Leftrightarrow &\\ |\sqrt{d_{13}} - \sqrt{d_{23}}| \leq \sqrt{d_{12}} \leq \sqrt{d_{13}} + \sqrt{d_{23}} & \text{(c)} \end{array} \tag{907} $$

In between those bounds, λ_2 is strictly positive; otherwise, it would be negative but prevented by the condition T ≽ 0.

When λ_2 becomes 0, it means triangle △_{123} has collapsed to a line segment; a potential reduction in affine dimension r. The same logic is valid for any particular principal 2×2 submatrix of −V_N^T D V_N , hence applicable to other triangles.

^{5.39} The largest eigenvalue of any symmetric matrix is always a convex function of its entries, while the smallest eigenvalue is always concave. [46, exmp.3.10] In our particular case, say d ≜ [d_12 d_13 d_23]^T ∈ R^3. Then the Hessian (1563) ∇²λ_1(d) ≽ 0 certifies convexity whereas ∇²λ_2(d) ≼ 0 certifies concavity. Each Hessian has rank equal to 1. The respective gradients ∇λ_1(d) and ∇λ_2(d) are nowhere 0.


5.9 Bridge: Convex polyhedra to EDMs

The criteria for the existence of an EDM include, by definition (734) (799), the properties imposed upon its entries d_ij by the Euclidean metric. From §5.8.1 and §5.8.2, we know there is a relationship of matrix criteria to those properties. Here is a snapshot of what we are sure: for i, j, k ∈ {1 ... N} (confer §5.2)

$$ \left.\begin{array}{l} \sqrt{d_{ij}} \geq 0\,,\; i \neq j\\ \sqrt{d_{ij}} = 0\,,\; i = j\\ \sqrt{d_{ij}} = \sqrt{d_{ji}}\\ \sqrt{d_{ij}} \leq \sqrt{d_{ik}} + \sqrt{d_{kj}}\,,\; i \neq j \neq k \end{array}\right\} \;\Leftarrow\; \begin{array}{l}-V_N^TDV_N \succeq 0\\ \delta(D) = 0\\ D^T = D\end{array} \tag{908} $$

all implied by D ∈ EDM^N. In words, these four Euclidean metric properties are necessary conditions for D to be a distance matrix. At the moment, we have no converse. As of concern in §5.3, we have yet to establish metric requirements beyond the four Euclidean metric properties that would allow D to be certified an EDM or might facilitate polyhedron or list reconstruction from an incomplete EDM. We deal with this problem in §5.14. Our present goal is to establish ab initio the necessary and sufficient matrix criteria that will subsume all the Euclidean metric properties and any further requirements^{5.40} for all N > 1 (§5.8.3); id est,

$$ \left.\begin{array}{r}-V_N^TDV_N \succeq 0\\ D \in \mathbb S^N_h\end{array}\right\} \;\Leftrightarrow\; D \in \mathrm{EDM}^N \tag{753} $$

or for EDM definition (808),

$$ \left.\begin{array}{r}\Omega \succeq 0\\ \sqrt{\delta(d)} \succeq 0\end{array}\right\} \;\Leftrightarrow\; D = \mathcal D(\Omega, d) \in \mathrm{EDM}^N \tag{909} $$

^{5.40} In 1935, Schoenberg [236, (1)] first extolled matrix product −V_N^T D V_N (887) (predicated on symmetry and self-distance) specifically incorporating V_N , albeit algebraically. He showed: nonnegativity −y^T V_N^T D V_N y ≥ 0, for all y ∈ R^{N−1}, is necessary and sufficient for D to be an EDM. Gower [112, §3] remarks how surprising it is that such a fundamental property of Euclidean geometry was obtained so late.
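The Schoenberg criterion (753) is simple to codify. Here is a bare Matlab function of our own making (the name and tolerance are ours, a sketch rather than a production test):

    function tf = isedm_sketch(D)
    % test of (753): D is an EDM iff D is symmetric hollow
    % and -VN'*D*VN is positive semidefinite; code ours
    N  = size(D,1);
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
    hollow = isequal(D,D') && all(diag(D)==0);
    tf = hollow && min(eig(-VN'*D*VN)) > -1e-9;   % PSD up to roundoff
    end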


Figure 92: Elliptope E^3 in isometrically isomorphic R^6 (projected on R^3) is a convex body that appears to possess some kind of symmetry in this dimension; it resembles a malformed pillow in the shape of a bulging tetrahedron. Elliptope relative boundary is not smooth and comprises all set members (910) having at least one 0 eigenvalue. [175, §2.1] This elliptope has an infinity of vertices, but there are only four vertices corresponding to a rank-1 matrix. Those yy^T, evident in the illustration, have binary vector y ∈ R^3 with entries in {±1}.


Figure 93: Elliptope E^2 in isometrically isomorphic R^3 is a line segment illustrated interior to positive semidefinite cone S^2_+ (Figure 33).

5.9.1 Geometric arguments

5.9.1.0.1 Definition. Elliptope: [175] [172, §2.3] [77, §31.5]
a unique bounded immutable convex Euclidean body in S^n ; intersection of positive semidefinite cone S^n_+ with that set of n hyperplanes defined by unity main diagonal;

$$ \mathcal E^n \triangleq \mathbb S^n_+ \cap \{\Phi \in \mathbb S^n \mid \delta(\Phi) = \mathbf 1\} \tag{910} $$

a.k.a, the set of all correlation matrices of dimension

$$ \dim\mathcal E^n = n(n-1)/2 \ \text{ in } \mathbb R^{n(n+1)/2} \tag{911} $$

An elliptope E^n is not a polyhedron, in general, but has some polyhedral faces and an infinity of vertices.^{5.41} Of those, 2^{n−1} vertices are extreme directions yy^T of the positive semidefinite cone where the entries of vector y ∈ R^n each belong to {±1} while the vector exercises every combination. Each of the remaining vertices has rank belonging to the set {k > 0 | k(k+1)/2 ≤ n}. Each and every face of an elliptope is exposed. △

^{5.41} Laurent defines vertex distinctly from the sense herein (§2.6.1.0.1); she defines vertex as a point with full-dimensional (nonempty interior) normal cone (§E.10.3.2.1). Her definition excludes point C in Figure 23, for example.


The elliptope for dimension n = 2 is a line segment in isometrically isomorphic R^{n(n+1)/2} (Figure 93). Obviously, cone(E^n) ≠ S^n_+ . The elliptope for dimension n = 3 is realized in Figure 92.

5.9.1.0.2 Lemma. Hypersphere. [15, §4] (confer bullet p.318)
Matrix A = [A_ij] ∈ S^N belongs to the elliptope in S^N iff there exist N points p on the boundary of a hypersphere having radius 1 in R^{rank A} such that

$$ \|p_i - p_j\| = \sqrt 2\,\sqrt{1 - A_{ij}}\,,\qquad i, j = 1\ldots N \tag{912} $$
⋄

There is a similar theorem for Euclidean distance matrices: We derive matrix criteria for D to be an EDM, validating (753) using simple geometry; distance to the polyhedron formed by the convex hull of a list of points (66) in Euclidean space R^n.

5.9.1.0.3 EDM assertion.
D is a Euclidean distance matrix if and only if D ∈ S^N_h and distances-square from the origin

$$ \{\|p(y)\|^2 = -y^TV_N^TDV_N\,y \mid y \in \mathcal S - \beta\} \tag{913} $$

correspond to points p in some bounded convex polyhedron

$$ \mathcal P - \alpha = \{p(y) \mid y \in \mathcal S - \beta\} \tag{914} $$

having N or fewer vertices embedded in an r-dimensional subspace A − α of R^n , where α ∈ A = aff P and where the domain of linear surjection p(y) is the unit simplex S ⊂ R^{N−1}_+ shifted such that its vertex at the origin is translated to −β in R^{N−1}. When β = 0, then α = x_1 . ⋄

In terms of V_N , the unit simplex (254) in R^{N−1} has an equivalent representation:

$$ \mathcal S = \{s \in \mathbb R^{N-1} \mid \sqrt 2\,V_N\,s \succeq -e_1\} \tag{915} $$

where e_1 is as in (750). Incidental to the EDM assertion, shifting the unit-simplex domain in R^{N−1} translates the polyhedron P in R^n. Indeed,


there is a map from vertices of the unit simplex to members of the list generating P ;

$$ p\,:\,\mathbb R^{N-1} \to \mathbb R^n\,:\qquad p\left(\left\{\begin{array}{c}-\beta\\ e_1 - \beta\\ e_2 - \beta\\ \vdots\\ e_{N-1} - \beta\end{array}\right\}\right) = \left\{\begin{array}{c}x_1 - \alpha\\ x_2 - \alpha\\ x_3 - \alpha\\ \vdots\\ x_N - \alpha\end{array}\right\} \tag{916} $$

5.9.1.0.4 Proof. EDM assertion.
(⇒) We demonstrate that if D is an EDM, then each distance-square ‖p(y)‖² described by (913) corresponds to a point p in some embedded polyhedron P − α . Assume D is indeed an EDM; id est, D can be made from some list X of N unknown points in Euclidean space R^n ; D = D(X) for X ∈ R^{n×N} as in (734). Since D is translation invariant (§5.5.1), we may shift the affine hull A of those unknown points to the origin as in (857). Then take any point p in their convex hull (76);

$$ \mathcal P - \alpha = \{p = (X - Xb\mathbf 1^T)a \mid a^T\mathbf 1 = 1,\; a \succeq 0\} \tag{917} $$

where α = Xb ∈ A ⇔ b^T 1 = 1. Solutions to a^T 1 = 1 are:^{5.42}

$$ a \in \{e_1 + \sqrt 2\,V_N\,s \mid s \in \mathbb R^{N-1}\} \tag{918} $$

where e_1 is as in (750). Similarly, b = e_1 + √2 V_N β .

$$ \begin{array}{rl} \mathcal P - \alpha &= \{p = X(I - (e_1 + \sqrt 2\,V_N\beta)\mathbf 1^T)(e_1 + \sqrt 2\,V_N s) \mid \sqrt 2\,V_N\,s \succeq -e_1\}\\ &= \{p = X\sqrt 2\,V_N(s - \beta) \mid \sqrt 2\,V_N\,s \succeq -e_1\} \end{array} \tag{919} $$

that describes the domain of p(s) as the unit simplex

$$ \mathcal S = \{s \mid \sqrt 2\,V_N\,s \succeq -e_1\} \subset \mathbb R^{N-1}_+ \tag{915} $$

^{5.42} Since R(V_N) = N(1^T) and N(1^T) ⊥ R(1) , then over all s ∈ R^{N−1}, V_N s is a hyperplane through the origin orthogonal to 1. Thus the solutions {a} constitute a hyperplane orthogonal to the vector 1, and offset from the origin in R^N by any particular solution; in this case, a = e_1 .


Making the substitution s − β ← y

$$ \mathcal P - \alpha = \{p = X\sqrt 2\,V_N\,y \mid y \in \mathcal S - \beta\} \tag{920} $$

Point p belongs to a convex polyhedron P − α embedded in an r-dimensional subspace of R^n because the convex hull of any list forms a polyhedron, and because the translated affine hull A − α contains the translated polyhedron P − α (860) and the origin (when α ∈ A), and because A has dimension r by definition (862). Now, any distance-square from the origin to the polyhedron P − α can be formulated

$$ \{p^Tp = \|p\|^2 = 2y^TV_N^TX^TXV_N\,y \mid y \in \mathcal S - \beta\} \tag{921} $$

Applying (869) to (921) we get (913).

(⇐) To validate the EDM assertion in the reverse direction, we prove: If each distance-square ‖p(y)‖² (913) on the shifted unit-simplex S − β ⊂ R^{N−1} corresponds to a point p(y) in some embedded polyhedron P − α , then D is an EDM. The r-dimensional subspace A − α ⊆ R^n is spanned by

$$ p(\mathcal S - \beta) = \mathcal P - \alpha \tag{922} $$

because A − α = aff(P − α) ⊇ P − α (860). So, outside the domain S − β of linear surjection p(y) , the simplex complement \S − β ⊂ R^{N−1} must contain the domain of the distance-square ‖p(y)‖² = p(y)^T p(y) to remaining points in the subspace A − α ; id est, to the polyhedron's relative exterior \P − α . For ‖p(y)‖² to be nonnegative on the entire subspace A − α , −V_N^T D V_N must be positive semidefinite and is assumed symmetric;^{5.43}

$$ -V_N^TDV_N \triangleq \Theta_p^T\Theta_p \tag{923} $$

where^{5.44} Θ_p ∈ R^{m×N−1} for some m ≥ r . Because p(S − β) is a convex polyhedron, it is necessarily a set of linear combinations of points from some length-N list because every convex polyhedron having N or fewer vertices can be generated that way (§2.12.2). Equivalent to (913) are

$$ \{p^Tp \mid p \in \mathcal P - \alpha\} = \{p^Tp = y^T\Theta_p^T\Theta_p\,y \mid y \in \mathcal S - \beta\} \tag{924} $$

^{5.43} The antisymmetric part (−V_N^T D V_N − (−V_N^T D V_N)^T)/2 is annihilated by ‖p(y)‖². By the same reasoning, any positive (semi)definite matrix A is generally assumed symmetric because only the symmetric part (A + A^T)/2 survives the test y^T A y ≥ 0. [150, §7.1]
^{5.44} A^T = A ≽ 0 ⇔ A = R^TR for some real matrix R. [251, §6.3]


Because p ∈ P − α may be found by factoring (924), the list Θ_p is found by factoring (923). A unique EDM can be made from that list using inner-product form definition D(Θ)|_{Θ=Θ_p} (799). That EDM will be identical to D if δ(D) = 0, by injectivity of D (843).

5.9.2 Necessity and sufficiency

From (885) we learned that matrix inequality −V_N^T D V_N ≽ 0 is a necessary test for D to be an EDM. In §5.9.1, the connection between convex polyhedra and EDMs was pronounced by the EDM assertion; the matrix inequality together with D ∈ S^N_h became a sufficient test when the EDM assertion demanded that every bounded convex polyhedron have a corresponding EDM. For all N > 1 (§5.8.3), the matrix criteria for the existence of an EDM in (753), (909), and (729) are therefore necessary and sufficient and subsume all the Euclidean metric properties and further requirements.

5.9.3 Example revisited

Now we apply the necessary and sufficient EDM criteria (753) to an earlier problem.

5.9.3.0.1 Example. Small completion problem, III. (confer §5.8.3.1.1)
Continuing Example 5.3.0.0.2 pertaining to Figure 80 where N = 4, distance-square d_14 is ascertainable from the matrix inequality −V_N^T D V_N ≽ 0. Because all distances in (726) are known except √d_14 , we may simply calculate the smallest eigenvalue of −V_N^T D V_N over a range of d_14 as in Figure 94. We observe a unique value of d_14 satisfying (753) where the abscissa is tangent to the hypograph of the smallest eigenvalue. Since the smallest eigenvalue of a symmetric matrix is known to be a concave function (§5.8.4), we calculate its second partial derivative with respect to d_14 evaluated at 2 and find −1/3. We conclude there are no other satisfying values of d_14 . Further, that value of d_14 does not meet an upper or lower bound of a triangle inequality like (897), so neither does it cause the collapse of any triangle. Because the smallest eigenvalue is 0, affine dimension r of any point list corresponding to D cannot exceed N − 2. (§5.7.1.1)
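The eigenvalue scan behind Figure 94 is mechanical to reproduce. The Matlab sketch below is ours and uses our own toy completion data (four collinear points with d_14 deleted), not the entries of (726), which we do not reproduce here; the smallest eigenvalue of −V_N^T D V_N touches zero at exactly one completing value, d_14 = 9.

    % eigenvalue scan in the spirit of Figure 94; toy data ours
    D0 = [0 1 4 0; 1 0 1 4; 4 1 0 1; 0 4 1 0];   % (1,4) entry unknown
    VN = [-ones(1,3); eye(3)]/sqrt(2);
    d14 = linspace(1, 16, 301);  lam = zeros(size(d14));
    for k = 1:numel(d14)
        D = D0;  D(1,4) = d14(k);  D(4,1) = d14(k);
        lam(k) = min(eig(-VN'*D*VN));            % <= 0, zero only at 9
    end
    plot(d14, lam), xlabel('d_{14}'), ylabel('smallest eigenvalue')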


Figure 94: Smallest eigenvalue of −V_N^T D V_N makes it a PSD matrix for only one value of d_14 : 2.

Figure 95: Some entrywise EDM compositions, comparing the curves d_ij^{1/α}, log_2(1 + d_ij^{1/α}), and 1 − e^{−αd_ij}: (a) α = 2; concave nondecreasing in d_ij . (b) Trajectory convergence in α for d_ij = 2.


5.10 EDM-entry composition

Laurent [172, §2.3] applies results from Schoenberg (1938) [237] to show certain nonlinear compositions of individual EDM entries yield EDMs; in particular,

$$ \begin{array}{rcl} D \in \mathrm{EDM}^N &\Leftrightarrow& [1 - e^{-\alpha d_{ij}}] \in \mathrm{EDM}^N \quad \forall\,\alpha > 0\\ &\Leftrightarrow& [e^{-\alpha d_{ij}}] \in \mathcal E^N \quad \forall\,\alpha > 0 \end{array} \tag{925} $$

where D = [d_ij] and E^N is the elliptope (910).

5.10.0.0.1 Proof. (Laurent, 2003) [237] (confer [167])
Lemma 2.1. from A Tour d'Horizon ... on Completion Problems. [172] The following assertions are equivalent: for D = [d_ij , i, j = 1 ... N] ∈ S^N_h and E^N the elliptope in S^N (§5.9.1.0.1),

(i) D ∈ EDM^N
(ii) e^{−αD} ≜ [e^{−αd_ij}] ∈ E^N for all α > 0
(iii) 11^T − e^{−αD} ≜ [1 − e^{−αd_ij}] ∈ EDM^N for all α > 0 ⋄

1) Equivalence of Lemma 2.1 (i) (ii) is stated in Schoenberg's Theorem 1 [237, p.527].

2) (ii) ⇒ (iii) can be seen from the statement in the beginning of section 3, saying that a distance space embeds in L_2 iff some associated matrix is PSD. We reformulate it:

Let d = (d_ij)_{i,j=0,1...N} be a distance space on N+1 points (i.e., symmetric hollow matrix of order N+1) and let p = (p_ij)_{i,j=1...N} be the symmetric matrix of order N related by:

(A) 2p_ij = d_0i + d_0j − d_ij for i, j = 1 ... N

or equivalently

(B) d_0i = p_ii , d_ij = p_ii + p_jj − 2p_ij for i, j = 1 ... N

Then d embeds in L_2 iff p is a positive semidefinite matrix iff d is of negative type (second half page 525/top of page 526 in [237]).


(ii) ⇒ (iii): set p = e^{−αd} and define d′ from p using (B) above. Then d′ is a distance space on N+1 points that embeds in L_2 . Thus its subspace of N points also embeds in L_2 and is precisely 1 − e^{−αd}. Note that (iii) ⇒ (ii) cannot be read immediately from this argument since (iii) involves the subdistance of d′ on N points (and not the full d′ on N+1 points).

3) Show (iii) ⇒ (i) by using the series expansion of the function 1 − e^{−αd} : the constant term cancels, α factors out; there remains a summation of d plus a multiple of α . Letting α go to 0 gives the result. This is not explicitly written in Schoenberg, but he also uses such an argument; expansion of the exponential function then α → 0 (first proof on [237, p.526]).

Schoenberg's results [237, §6, thm.5] (confer [167, p.108-109]) also suggest certain finite positive roots of EDM entries produce EDMs; specifically,

$$ D \in \mathrm{EDM}^N \;\Leftrightarrow\; [d_{ij}^{1/\alpha}] \in \mathrm{EDM}^N \quad \forall\,\alpha > 1 \tag{926} $$

The special case α = 2 is of interest because it corresponds to absolute distance; e.g.,

$$ D \in \mathrm{EDM}^N \;\Rightarrow\; \circ\!\sqrt D \in \mathrm{EDM}^N \tag{927} $$

Assuming that points constituting a corresponding list X are distinct (889), then it follows: for D ∈ S^N_h

$$ \lim_{\alpha\to\infty}\,[d_{ij}^{1/\alpha}] = \lim_{\alpha\to\infty}\,[1 - e^{-\alpha d_{ij}}] = -E \triangleq \mathbf 1\mathbf 1^T - I \tag{928} $$

Negative elementary matrix −E (§B.3) is relatively interior to the EDM cone (§6.6) and terminal to the respective trajectories (925) and (926) as functions of α . Both trajectories are confined to the EDM cone; in engineering terms, the EDM cone is an invariant set [234] to either trajectory. Further, if D is not an EDM but for some particular α_p it becomes an EDM, then for all greater values of α it remains an EDM.
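Composition (925) invites a quick empirical check via the Schoenberg criterion (753). A Matlab sketch of ours:

    % entrywise composition (925) preserves EDM membership; code ours
    N = 6;  X = randn(3,N);  G = X'*X;
    D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
    for alpha = [0.1 1 10]
        C = ones(N) - exp(-alpha*D);        % [1 - exp(-alpha*d_ij)]
        fprintf('alpha = %5.1f   min eig = %9.2e\n', ...
                alpha, min(eig(-VN'*C*VN)));  % nonnegative up to roundoff
    end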


These preliminary findings lead one to speculate whether any concave nondecreasing composition of individual EDM entries d_ij on R_+ will produce another EDM; e.g., empirical evidence suggests that given EDM D , for each fixed α ≥ 1 [sic] the composition [log_2(1 + d_ij^{1/α})] is also an EDM. Figure 95 illustrates that composition's concavity in d_ij together with functions from (925) and (926).

5.10.1 EDM by elliptope

For some κ ∈ R_+ and C ∈ S^N_+ in the elliptope E^N (§5.9.1.0.1), Alfakih asserts any given EDM D is expressible [7] [77, §31.5]

$$ D = \kappa(\mathbf 1\mathbf 1^T - C) \in \mathrm{EDM}^N \tag{929} $$

This expression exhibits nonlinear combination of variables κ and C . We therefore propose a different expression requiring redefinition of the elliptope (910) by parametrization;

$$ \mathcal E^n_t \triangleq \mathbb S^n_+ \cap \{\Phi \in \mathbb S^n \mid \delta(\Phi) = t\mathbf 1\} \tag{930} $$

where, of course, E^n = E^n_1 . Then any given EDM D is expressible

$$ D = t\mathbf 1\mathbf 1^T - E \in \mathrm{EDM}^N \tag{931} $$

which is linear in variables t ∈ R_+ and E ∈ E^N_t .

5.11 EDM indefiniteness

By the known result in §A.7.2 regarding a 0-valued entry on the main diagonal of a symmetric positive semidefinite matrix, there can be no positive nor negative semidefinite EDM except the 0 matrix because EDM^N ⊆ S^N_h (733) and

$$ \mathbb S^N_h \cap \mathbb S^N_+ = 0 \tag{932} $$

the origin. So when D ∈ EDM^N, there can be no factorization D = A^TA nor −D = A^TA. [251, §6.3] Hence eigenvalues of an EDM are neither all nonnegative nor all nonpositive; an EDM is indefinite and possibly invertible.


5.11.1 EDM eigenvalues, congruence transformation

For any symmetric −D , we can characterize its eigenvalues by congruence transformation: [251, §6.3]

$$ -W^TDW = -\begin{bmatrix}V_N^T\\ \mathbf 1^T\end{bmatrix} D\,\begin{bmatrix}V_N & \mathbf 1\end{bmatrix} = -\begin{bmatrix}V_N^TDV_N & V_N^TD\mathbf 1\\ \mathbf 1^TDV_N & \mathbf 1^TD\mathbf 1\end{bmatrix} \in \mathbb S^N \tag{933} $$

Because

$$ W \triangleq \begin{bmatrix}V_N & \mathbf 1\end{bmatrix} \in \mathbb R^{N\times N} \tag{934} $$

is full-rank, then (1342)

$$ \operatorname{inertia}(-D) = \operatorname{inertia}\big(-W^TDW\big) \tag{935} $$

the congruence (933) has the same number of positive, zero, and negative eigenvalues as −D . Further, if we denote by {γ_i , i = 1 ... N−1} the eigenvalues of −V_N^T D V_N and denote eigenvalues of the congruence −W^T D W by {ζ_i , i = 1 ... N} and if we arrange each respective set of eigenvalues in nonincreasing order, then by theory of interlacing eigenvalues for bordered symmetric matrices [150, §4.3] [251, §6.4] [248, §IV.4.1]

$$ \zeta_N \leq \gamma_{N-1} \leq \zeta_{N-1} \leq \gamma_{N-2} \leq \cdots \leq \gamma_2 \leq \zeta_2 \leq \gamma_1 \leq \zeta_1 \tag{936} $$

When D ∈ EDM^N, then γ_i ≥ 0 ∀ i (1279) because −V_N^T D V_N ≽ 0 as we know. That means the congruence must have N−1 nonnegative eigenvalues; ζ_i ≥ 0, i = 1 ... N−1. The remaining eigenvalue ζ_N cannot be nonnegative because then −D would be positive semidefinite, an impossibility; so ζ_N < 0. By congruence, nontrivial −D must therefore have exactly one negative eigenvalue;^{5.45} [77, §2.4.5]

^{5.45} All the entries of the corresponding eigenvector must have the same sign with respect to each other [61, p.116] because that eigenvector is the Perron vector corresponding to the spectral radius; [150, §8.2.6] the predominant characteristic of negative [sic] matrices.


$$ D \in \mathrm{EDM}^N \;\Rightarrow\; \left\{\begin{array}{l} \lambda(-D)_i \geq 0\,,\quad i = 1\ldots N-1\\[2pt] \displaystyle\sum_{i=1}^N \lambda(-D)_i = 0\\[2pt] D \in \mathbb S^N_h \cap \mathbb R^{N\times N}_+ \end{array}\right. \tag{937} $$

where the λ(−D)_i are nonincreasingly ordered eigenvalues of −D whose sum must be 0 only because tr D = 0 [251, §5.1]. The eigenvalue summation condition, therefore, can be considered redundant. Even so, all these conditions are insufficient to determine whether some given H ∈ S^N_h is an EDM, as shown by counter-example.^{5.46}

5.11.1.0.1 Exercise. Spectral inequality.
Prove whether it holds: for D = [d_ij] ∈ EDM^N

$$ \lambda(-D)_1 \geq d_{ij} \geq \lambda(-D)_{N-1} \qquad \forall\, i \neq j \tag{938} $$

Terminology: a spectral cone is a convex cone containing all eigenspectra corresponding to some set of matrices. Any positive semidefinite matrix, for example, possesses a vector of nonnegative eigenvalues corresponding to an eigenspectrum contained in a spectral cone that is a nonnegative orthant.

5.11.2 Spectral cones

Denoting the eigenvalues of Cayley-Menger matrix $\begin{bmatrix}0&\mathbf 1^T\\ \mathbf 1&-D\end{bmatrix} \in \mathbb S^{N+1}$ by

$$ \lambda\!\left(\begin{bmatrix}0&\mathbf 1^T\\ \mathbf 1&-D\end{bmatrix}\right) \in \mathbb R^{N+1} \tag{939} $$

we have the Cayley-Menger form (§5.7.3.0.1) of necessary and sufficient conditions for D ∈ EDM^N from the literature: [133, §3]^{5.47} [53, §3] [77, §6.2]

^{5.46} When N = 3, for example, the symmetric hollow matrix
$$ H = \begin{bmatrix}0&1&1\\ 1&0&5\\ 1&5&0\end{bmatrix} \in \mathbb S^N_h \cap \mathbb R^{N\times N}_+ $$
is not an EDM, although λ(−H) = [5 0.3723 −5.3723]^T conforms to (937).
^{5.47} Recall: for D ∈ S^N_h , −V_N^T D V_N ≽ 0 subsumes nonnegativity property 1 (§5.8.1).


(confer (753) (729))

$$ D \in \mathrm{EDM}^N \;\Leftrightarrow\; \left\{\begin{array}{l} \lambda\!\left(\begin{bmatrix}0&\mathbf 1^T\\ \mathbf 1&-D\end{bmatrix}\right)_{\!i} \geq 0\,,\quad i = 1\ldots N\\[4pt] D \in \mathbb S^N_h \end{array}\right\} \;\Leftrightarrow\; \left\{\begin{array}{l}-V_N^TDV_N \succeq 0\\ D \in \mathbb S^N_h\end{array}\right. \tag{940} $$

These conditions say the Cayley-Menger form has one and only one negative eigenvalue. When D is an EDM, eigenvalues λ([0 1^T; 1 −D]) belong to that particular orthant in R^{N+1} having the N+1-th coordinate as sole negative coordinate^{5.48}:

$$ \begin{bmatrix}\mathbb R^N_+\\ \mathbb R_-\end{bmatrix} = \operatorname{cone}\{e_1, e_2, \cdots e_N, -e_{N+1}\} \tag{941} $$

5.11.2.1 Cayley-Menger versus Schoenberg

Connection to the Schoenberg criterion (753) is made when the Cayley-Menger form is further partitioned:

$$ \begin{bmatrix}0&\mathbf 1^T\\ \mathbf 1&-D\end{bmatrix} = \begin{bmatrix} \begin{bmatrix}0&1\\ 1&0\end{bmatrix} & \begin{bmatrix}\mathbf 1^T\\ -D_{1,2:N}\end{bmatrix}\\[8pt] \begin{bmatrix}\mathbf 1 & -D_{2:N,1}\end{bmatrix} & -D_{2:N,2:N} \end{bmatrix} \tag{942} $$

Matrix D ∈ S^N_h is an EDM if and only if the Schur complement of $\begin{bmatrix}0&1\\ 1&0\end{bmatrix}$ (§A.4) in this partition is positive semidefinite; [15, §1] [159, §3] id est, (confer (887))

D ∈ EDM^N ⇔

$$ -D_{2:N,2:N} - \begin{bmatrix}\mathbf 1 & -D_{2:N,1}\end{bmatrix}\begin{bmatrix}0&1\\ 1&0\end{bmatrix}\begin{bmatrix}\mathbf 1^T\\ -D_{1,2:N}\end{bmatrix} = -2V_N^TDV_N \succeq 0 \tag{943} $$

and D ∈ S^N_h

^{5.48} Empirically, all except one entry of the corresponding eigenvector have the same sign with respect to each other.
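Both the Cayley-Menger condition (940) and the congruence argument of §5.11.1 are cheap to confirm numerically: each form exhibits exactly one negative eigenvalue for an EDM. A Matlab sketch of ours:

    % one negative eigenvalue in each form; code ours
    N = 5;  X = randn(3,N);  G = X'*X;
    D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;
    CM = [0 ones(1,N); ones(N,1) -D];       % Cayley-Menger form (939)
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
    W  = [VN, ones(N,1)];                   % full-rank W (934)
    [sum(eig(CM) < 0), sum(eig(-D) < 0), sum(eig(-W'*D*W) < 0)]  % 1 1 1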


Positive semidefiniteness of that Schur complement insures nonnegativity (D ∈ R^{N×N}_+ , §5.8.1), whereas complementary inertia (1344) insures that lone negative eigenvalue of the Cayley-Menger form.

Now we apply results from chapter 2 with regard to polyhedral cones and their duals.

5.11.2.2 Ordered eigenspectra

Conditions (940) specify eigenvalue membership to the smallest pointed polyhedral spectral cone for $\begin{bmatrix}0&\mathbf 1^T\\ \mathbf 1&-\mathrm{EDM}^N\end{bmatrix}$ :

$$ \begin{array}{rl} \mathcal K_\lambda &\triangleq \{\zeta \in \mathbb R^{N+1} \mid \zeta_1 \geq \zeta_2 \geq \cdots \geq \zeta_N \geq 0 \geq \zeta_{N+1}\,,\; \mathbf 1^T\zeta = 0\}\\[2pt] &= \mathcal K_{\mathcal M} \cap \begin{bmatrix}\mathbb R^N_+\\ \mathbb R_-\end{bmatrix} \cap \partial\mathcal H\\[6pt] &= \lambda\!\left(\begin{bmatrix}0&\mathbf 1^T\\ \mathbf 1&-\mathrm{EDM}^N\end{bmatrix}\right) \end{array} \tag{944} $$

where

$$ \partial\mathcal H \triangleq \{\zeta \in \mathbb R^{N+1} \mid \mathbf 1^T\zeta = 0\} \tag{945} $$

is a hyperplane through the origin, and K_M is the monotone cone (§2.13.9.4.2, implying ordered eigenspectra) which has nonempty interior but is not pointed;

$$ \mathcal K_{\mathcal M} = \{\zeta \in \mathbb R^{N+1} \mid \zeta_1 \geq \zeta_2 \geq \cdots \geq \zeta_{N+1}\} \tag{378} $$
$$ \mathcal K^*_{\mathcal M} = \{[\,e_1 - e_2\;\; e_2 - e_3\;\cdots\; e_N - e_{N+1}\,]\,a \mid a \succeq 0\} \subset \mathbb R^{N+1} \tag{379} $$

So because of the hyperplane,

$$ \dim\operatorname{aff}\mathcal K_\lambda = \dim\partial\mathcal H = N \tag{946} $$

indicating K_λ has empty interior. Defining

$$ A \triangleq \begin{bmatrix}e_1^T - e_2^T\\ e_2^T - e_3^T\\ \vdots\\ e_N^T - e_{N+1}^T\end{bmatrix} \in \mathbb R^{N\times N+1}\,,\qquad B \triangleq \begin{bmatrix}e_1^T\\ e_2^T\\ \vdots\\ e_N^T\\ -e_{N+1}^T\end{bmatrix} \in \mathbb R^{N+1\times N+1} \tag{947} $$


we have the halfspace-description:

$$ \mathcal K_\lambda = \{\zeta \in \mathbb R^{N+1} \mid A\zeta \succeq 0\,,\; B\zeta \succeq 0\,,\; \mathbf 1^T\zeta = 0\} \tag{948} $$

From this and (386) we get a vertex-description for a pointed spectral cone having empty interior:

$$ \mathcal K_\lambda = \left\{ V_N\left(\begin{bmatrix}\hat A\\ \hat B\end{bmatrix} V_N\right)^{\!\dagger} b \;\middle|\; b \succeq 0 \right\} \tag{949} $$

where V_N ∈ R^{N+1×N}, and where [sic]

$$ \hat B = e_N^T \in \mathbb R^{1\times N+1} \tag{950} $$

and

$$ \hat A = \begin{bmatrix}e_1^T - e_2^T\\ e_2^T - e_3^T\\ \vdots\\ e_{N-1}^T - e_N^T\end{bmatrix} \in \mathbb R^{N-1\times N+1} \tag{951} $$

hold those rows of A and B corresponding to conically independent rows in $\begin{bmatrix}A\\ B\end{bmatrix} V_N$.

Conditions (940) can be equivalently restated in terms of a spectral cone for Euclidean distance matrices:

$$ D \in \mathrm{EDM}^N \;\Leftrightarrow\; \left\{\begin{array}{l} \lambda\!\left(\begin{bmatrix}0&\mathbf 1^T\\ \mathbf 1&-D\end{bmatrix}\right) \in \mathcal K_{\mathcal M} \cap \begin{bmatrix}\mathbb R^N_+\\ \mathbb R_-\end{bmatrix} \cap \partial\mathcal H\\[6pt] D \in \mathbb S^N_h \end{array}\right. \tag{952} $$

Vertex-description of the dual spectral cone is, (273)

$$ \begin{array}{rl} \mathcal K^*_\lambda &= \mathcal K^*_{\mathcal M} + \begin{bmatrix}\mathbb R^N_+\\ \mathbb R_-\end{bmatrix}^{\!*} + \partial\mathcal H^* \;\subseteq\; \mathbb R^{N+1}\\[6pt] &= \{[\,A^T\ B^T\ \mathbf 1\ -\!\mathbf 1\,]\,b \mid b \succeq 0\} = \{[\,\hat A^T\ \hat B^T\ \mathbf 1\ -\!\mathbf 1\,]\,a \mid a \succeq 0\} \end{array} \tag{953} $$


From (949) and (387) we get a halfspace-description:

$$ \mathcal K^*_\lambda = \{y \in \mathbb R^{N+1} \mid (V_N^T[\hat A^T\ \hat B^T])^\dagger V_N^T\,y \succeq 0\} \tag{954} $$

This polyhedral dual spectral cone K*_λ is closed, convex, has nonempty interior because K_λ is pointed, but is not pointed because K_λ has empty interior.

5.11.2.3 Unordered eigenspectra

Spectral cones are not unique; eigenspectra ordering can be rendered benign within a cone by presorting a vector of eigenvalues into nonincreasing order.^{5.49} Then things simplify: Conditions (940) now specify eigenvalue membership to the spectral cone

$$ \lambda\!\left(\begin{bmatrix}0&\mathbf 1^T\\ \mathbf 1&-\mathrm{EDM}^N\end{bmatrix}\right) = \begin{bmatrix}\mathbb R^N_+\\ \mathbb R_-\end{bmatrix} \cap \partial\mathcal H = \{\zeta \in \mathbb R^{N+1} \mid B\zeta \succeq 0\,,\; \mathbf 1^T\zeta = 0\} \tag{955} $$

where B is defined in (947), and ∂H in (945). From (386) we get a vertex-description for a pointed spectral cone having empty interior:

$$ \lambda\!\left(\begin{bmatrix}0&\mathbf 1^T\\ \mathbf 1&-\mathrm{EDM}^N\end{bmatrix}\right) = \{V_N(\tilde BV_N)^\dagger b \mid b \succeq 0\} = \left\{\begin{bmatrix}I\\ -\mathbf 1^T\end{bmatrix} b \;\middle|\; b \succeq 0\right\} \tag{956} $$

where V_N ∈ R^{N+1×N} and

$$ \tilde B \triangleq \begin{bmatrix}e_1^T\\ e_2^T\\ \vdots\\ e_N^T\end{bmatrix} \in \mathbb R^{N\times N+1} \tag{957} $$

holds only those rows of B corresponding to conically independent rows in BV_N.

^{5.49} Eigenspectra ordering (represented by a cone having monotone description such as (944)) becomes benign in (1164), for example, where projection of a given presorted vector on the nonnegative orthant in a subspace is equivalent to its projection on the monotone nonnegative cone in that same subspace; equivalence is a consequence of presorting.


For presorted eigenvalues, (940) can be equivalently restated

$$ D \in \mathrm{EDM}^N \;\Leftrightarrow\; \left\{\begin{array}{l} \lambda\!\left(\begin{bmatrix}0&\mathbf 1^T\\ \mathbf 1&-D\end{bmatrix}\right) \in \begin{bmatrix}\mathbb R^N_+\\ \mathbb R_-\end{bmatrix} \cap \partial\mathcal H\\[6pt] D \in \mathbb S^N_h \end{array}\right. \tag{958} $$

Vertex-description of the dual spectral cone is, (273)

$$ \begin{array}{rl} \lambda\!\left(\begin{bmatrix}0&\mathbf 1^T\\ \mathbf 1&-\mathrm{EDM}^N\end{bmatrix}\right)^{\!*} &= \begin{bmatrix}\mathbb R^N_+\\ \mathbb R_-\end{bmatrix}^{\!*} + \partial\mathcal H^* \;\subseteq\; \mathbb R^{N+1}\\[6pt] &= \{[\,B^T\ \mathbf 1\ -\!\mathbf 1\,]\,b \mid b \succeq 0\} = \{[\,\tilde B^T\ \mathbf 1\ -\!\mathbf 1\,]\,a \mid a \succeq 0\} \end{array} \tag{959} $$

From (387) we get a halfspace-description:

$$ \lambda\!\left(\begin{bmatrix}0&\mathbf 1^T\\ \mathbf 1&-\mathrm{EDM}^N\end{bmatrix}\right)^{\!*} = \{y \in \mathbb R^{N+1} \mid (V_N^T\tilde B^T)^\dagger V_N^T\,y \succeq 0\} = \{y \in \mathbb R^{N+1} \mid [\,I\ -\!\mathbf 1\,]\,y \succeq 0\} \tag{960} $$

This polyhedral dual spectral cone is closed, convex, has nonempty interior but is not pointed. (Notice that any nonincreasingly ordered eigenspectrum belongs to this dual spectral cone.)

5.11.2.4 Dual cone versus dual spectral cone

An open question regards the relationship of convex cones and their duals to the corresponding spectral cones and their duals. A positive semidefinite cone, for example, is self-dual. Both the nonnegative orthant and the monotone nonnegative cone are spectral cones for it. When we consider the nonnegative orthant, then that spectral cone for the self-dual positive semidefinite cone is also self-dual.
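Membership (958) is also simple to witness numerically: for an EDM, the presorted Cayley-Menger spectrum has N nonnegative entries, one negative entry, and zero sum (the ∂H condition, since the trace of the bordered matrix is 0). A Matlab sketch of ours:

    % presorted spectrum belongs to [R^N_+ ; R_-] and to dH (958); code ours
    N = 5;  X = randn(3,N);  G = X'*X;
    D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;
    zeta = sort(eig([0 ones(1,N); ones(N,1) -D]), 'descend');
    [min(zeta(1:N)) >= 0, zeta(N+1) < 0, abs(sum(zeta)) < 1e-9]  % all true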


5.12 List reconstruction

The traditional term multidimensional scaling^{5.50} [189] [71] [266] [69] [192] [61] refers to any reconstruction of a list X ∈ R^{n×N} in Euclidean space from interpoint distance information, possibly incomplete (§6.4), ordinal (§5.13.2), or specified perhaps only by bounding-constraints (§5.4.2.2.7) [267]. Techniques for reconstruction are essentially methods for optimally embedding an unknown list of points, corresponding to given Euclidean distance data, in an affine subset of desired or minimum dimension. The oldest known precursor is called principal component analysis [115] which analyzes the correlation matrix (§5.9.1.0.1); [39, §22] a.k.a, Karhunen-Loève transform in the digital signal processing literature. Isometric reconstruction (§5.5.3) of point list X is best performed by eigen decomposition of a Gram matrix; for then, numerical errors of factorization are easily spotted in the eigenvalues.

We now consider how rotation/reflection and translation invariance factor into a reconstruction.

5.12.1 x_1 at the origin. V_N

At the stage of reconstruction, we have D ∈ EDM^N and we wish to find a generating list (§2.3.2) for P − α by factoring positive semidefinite −V_N^T D V_N (923) as suggested in §5.9.1.0.4. One way to factor −V_N^T D V_N is via diagonalization of symmetric matrices; [251, §5.6] [150] (§A.5.2, §A.3)

$$ -V_N^TDV_N \triangleq Q\Lambda Q^T \tag{961} $$
$$ Q\Lambda Q^T \succeq 0 \;\Leftrightarrow\; \Lambda \succeq 0 \tag{962} $$

where Q ∈ R^{N−1×N−1} is an orthogonal matrix containing eigenvectors while Λ ∈ S^{N−1} is a diagonal matrix containing corresponding nonnegative eigenvalues ordered by nonincreasing value. From the diagonalization, identify the list using (869);

$$ -V_N^TDV_N = 2V_N^TX^TXV_N \triangleq Q\sqrt\Lambda\,Q_p^TQ_p\sqrt\Lambda\,Q^T \tag{963} $$

^{5.50} Scaling [264] means making a scale, i.e., a numerical representation of qualitative data. If the scale is multidimensional, it's multidimensional scaling. −Jan de Leeuw
When the metric is Euclidean distance, then reconstruction is termed metric multidimensional scaling.


where √Λ Q_p^T Q_p √Λ ≜ Λ = √Λ√Λ and where Q_p ∈ R^{n×N−1} is unknown as is its dimension n. Rotation/reflection is accounted for by Q_p yet only its first r columns are necessarily orthonormal.^{5.51} Assuming membership to the unit simplex y ∈ S (920), then point p = X√2 V_N y = Q_p√Λ Q^T y in R^n belongs to the translated polyhedron

$$ \mathcal P - x_1 \tag{964} $$

whose generating list constitutes the columns of (863)

$$ \begin{bmatrix}\mathbf 0 & X\sqrt 2\,V_N\end{bmatrix} = \begin{bmatrix}\mathbf 0 & Q_p\sqrt\Lambda\,Q^T\end{bmatrix} = [\,\mathbf 0\;\; x_2 - x_1\;\; x_3 - x_1\;\cdots\; x_N - x_1\,] \in \mathbb R^{n\times N} \tag{965} $$

The scaled auxiliary matrix V_N represents that translation. A simple choice for Q_p has n set to N−1; id est, Q_p = I. Ideally, each member of the generating list has at most r nonzero entries; r being, affine dimension

$$ \operatorname{rank}V_N^TDV_N = \operatorname{rank}Q_p\sqrt\Lambda\,Q^T = \operatorname{rank}\Lambda = r \tag{966} $$

Each member then has at least N−1−r zeros in its higher-dimensional coordinates because r ≤ N−1. (875) To truncate those zeros, choose n equal to affine dimension which is the smallest n possible because XV_N has rank r ≤ n (871).^{5.52} In that case, the simplest choice for Q_p is [I 0] having dimensions r×N−1.

We may wish to verify the list (965) found from the diagonalization of −V_N^T D V_N . Because of rotation/reflection and translation invariance (§5.5), EDM D can be uniquely made from that list by calculating: (734)

^{5.51} Recall r signifies affine dimension. Q_p is not necessarily an orthogonal matrix. Q_p is constrained such that only its first r columns are necessarily orthonormal because there are only r nonzero eigenvalues in Λ when −V_N^T D V_N has rank r (§5.7.1.1). Remaining columns of Q_p are arbitrary.
^{5.52} If we write Q^T = [q_1 ... q_{N−1}]^T as rowwise eigenvectors, Λ = δ([λ_1 ... λ_r , 0 ... 0]) in terms of eigenvalues, and Q_p = [q_{p1} ... q_{p,N−1}] as column vectors, then Q_p√Λ Q^T = Σ_{i=1}^r √λ_i q_{pi} q_i^T is a sum of r linearly independent rank-one matrices (§B.1.1). Hence the summation has rank r.


$$ \mathcal D(X) = \mathcal D(X[\,\mathbf 0\ \sqrt 2\,V_N\,]) = \mathcal D(Q_p[\,\mathbf 0\ \sqrt\Lambda\,Q^T\,]) = \mathcal D([\,\mathbf 0\ \sqrt\Lambda\,Q^T\,]) \tag{967} $$

This suggests a way to find EDM D given −V_N^T D V_N ; (confer (847))

$$ D = \begin{bmatrix}0\\ \delta(-V_N^TDV_N)\end{bmatrix}\mathbf 1^T + \mathbf 1\begin{bmatrix}0&\delta(-V_N^TDV_N)^T\end{bmatrix} - 2\begin{bmatrix}0&\mathbf 0^T\\ \mathbf 0&-V_N^TDV_N\end{bmatrix} \tag{754} $$

5.12.2 0 geometric center. V

Alternatively, we may perform reconstruction using instead the auxiliary matrix V (§B.4.1), corresponding to the polyhedron

$$ \mathcal P - \alpha_c \tag{968} $$

whose geometric center has been translated to the origin. Redimensioning diagonalization factors Q, Λ ∈ R^{N×N} and unknown Q_p ∈ R^{n×N}, (870)

$$ -VDV = 2VX^TXV \triangleq Q\sqrt\Lambda\,Q_p^TQ_p\sqrt\Lambda\,Q^T \triangleq Q\Lambda Q^T \tag{969} $$

where the geometrically centered generating list constitutes (confer (965))

$$ XV = \tfrac{1}{\sqrt 2}\,Q_p\sqrt\Lambda\,Q^T = [\,x_1 - \tfrac1NX\mathbf 1\;\; x_2 - \tfrac1NX\mathbf 1\;\; x_3 - \tfrac1NX\mathbf 1\;\cdots\; x_N - \tfrac1NX\mathbf 1\,] \in \mathbb R^{n\times N} \tag{970} $$

where α_c = (1/N)X1 . (§5.5.1.0.1) The simplest choice for Q_p is [I 0] ∈ R^{r×N}.

Now EDM D can be uniquely made from the list found, by calculating: (734)

$$ \mathcal D(X) = \mathcal D(XV) = \mathcal D(\tfrac{1}{\sqrt 2}Q_p\sqrt\Lambda\,Q^T) = \mathcal D(\sqrt\Lambda\,Q^T)\,\tfrac12 \tag{971} $$

This EDM is, of course, identical to (967). Similarly to (754), from −VDV we can find EDM D ; (confer (834))

$$ D = \delta(-VDV\tfrac12)\mathbf 1^T + \mathbf 1\,\delta(-VDV\tfrac12)^T - 2(-VDV\tfrac12) \tag{760} $$
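The whole round trip, EDM to list to EDM, fits in a few lines. The Matlab sketch below is our own rendering of the §5.12.2 route (the factor 1/√2 of (970) is absorbed by factoring −VDV/2 directly):

    % geometric-center reconstruction and verification; code ours
    N = 6;  X = randn(3,N);  G = X'*X;
    D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;
    V = eye(N) - ones(N)/N;
    [Q,L] = eig(-V*D*V/2);                  % cf. (969)
    [l,idx] = sort(max(diag(L),0),'descend');  Q = Q(:,idx);
    r = sum(l > 1e-9);                      % affine dimension, here 3
    Xc = diag(sqrt(l(1:r)))*Q(:,1:r)';      % centered list, cf. (970)
    Grec = Xc'*Xc;
    Drec = diag(Grec)*ones(1,N) + ones(N,1)*diag(Grec)' - 2*Grec;
    norm(D - Drec, 'fro')                   % ~1e-12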


Figure 96: Map of United States of America showing some state boundaries and the Great Lakes. All plots made using 5020 connected points. Any difference in scale in (a) through (d) is an artifact of plotting routine.
(a) shows original map made from decimated (latitude, longitude) data.
(b) Original map data rotated (freehand) to highlight curvature of Earth.
(c) Map isometrically reconstructed from the EDM.
(d) Same reconstructed map illustrating curvature.
(e)(f) Two views of one isotonic reconstruction; problem (980) with no sort constraint Πd (and no hidden line removal).


5.13 Reconstruction examples

5.13.1 Isometric reconstruction

5.13.1.0.1 Example. Map of the USA.
The most fundamental application of EDMs is to reconstruct relative point position given only interpoint distance information. Drawing a map of the United States is a good illustration of isometric reconstruction from complete distance data. We obtained latitude and longitude information for the coast, border, states, and Great Lakes from the usalo atlas data file within the Matlab Mapping Toolbox; the conversion to Cartesian coordinates (x, y, z) via:

$$ \begin{array}{rl} \phi &\triangleq \pi/2 - \text{latitude}\\ \theta &\triangleq \text{longitude}\\ x &= \sin(\phi)\cos(\theta)\\ y &= \sin(\phi)\sin(\theta)\\ z &= \cos(\phi) \end{array} \tag{972} $$

We used 64% of the available map data to calculate EDM D from N = 5020 points. The original (decimated) data and its isometric reconstruction via (963) are shown in Figure 96(a)-(d). The Matlab code is in §F.3.1. The eigenvalues computed for (961) are

$$ \lambda(-V_N^TDV_N) = [\,199.8\;\; 152.3\;\; 2.465\;\; 0\;\; 0\;\; 0\;\cdots\,]^T \tag{973} $$

The 0 eigenvalues have absolute numerical error on the order of 2E-13 ; meaning, the EDM data indicates three dimensions (r = 3) are required for reconstruction to nearly machine precision.

5.13.2 Isotonic reconstruction

Sometimes only comparative information about distance is known (Earth is closer to the Moon than it is to the Sun). Suppose, for example, the EDM D for three points is unknown:


$$ D = [d_{ij}] = \begin{bmatrix}0&d_{12}&d_{13}\\ d_{12}&0&d_{23}\\ d_{13}&d_{23}&0\end{bmatrix} \in \mathbb S^3_h \tag{723} $$

but the comparative data is available:

$$ d_{13} \geq d_{23} \geq d_{12} \tag{974} $$

With the vectorization d = [d_12 d_13 d_23]^T ∈ R^3, we express the comparative distance relationship as the nonincreasing sorting

$$ \Pi d = \begin{bmatrix}0&1&0\\ 0&0&1\\ 1&0&0\end{bmatrix}\begin{bmatrix}d_{12}\\ d_{13}\\ d_{23}\end{bmatrix} = \begin{bmatrix}d_{13}\\ d_{23}\\ d_{12}\end{bmatrix} \in \mathcal K_{\mathcal M+} \tag{975} $$

where Π is a given permutation matrix expressing the known sorting action on the entries of unknown EDM D , and K_M+ is the monotone nonnegative cone (§2.13.9.4.1)

$$ \mathcal K_{\mathcal M+} \triangleq \{z \mid z_1 \geq z_2 \geq \cdots \geq z_{N(N-1)/2} \geq 0\} \subseteq \mathbb R^{N(N-1)/2}_+ \tag{371} $$

where N(N−1)/2 = 3 for the present example. From the sorted vectorization (975) we create the sort-index matrix

$$ O = \begin{bmatrix}0&1^2&3^2\\ 1^2&0&2^2\\ 3^2&2^2&0\end{bmatrix} \in \mathbb S^3_h \cap \mathbb R^{3\times 3}_+ \tag{976} $$

generally defined

$$ O_{ij} \triangleq k^2 \;\big|\; d_{ij} = (\Xi\,\Pi d)_k\,,\quad j \neq i \tag{977} $$

where Ξ is a permutation matrix (1533) completely reversing order of vector entries.

Replacing EDM data with indices-square of a nonincreasing sorting like this is, of course, a heuristic we invented and may be regarded as a nonlinear introduction of much noise into the Euclidean distance matrix. For large data sets, this heuristic makes an otherwise intense problem computationally tractable; we see an example in relaxed problem (981).
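Constructing a sort-index matrix per (977) amounts to replacing each off-diagonal entry by the square of its rank in the ascending sort (Ξ reverses the nonincreasing sort Πd). A Matlab sketch of ours, with a random EDM standing in for the data:

    % build a sort-index matrix O per (977); code and helper names ours
    N = 4;  X = randn(2,N);  G = X'*X;
    H = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;   % stand-in data
    m = nchoosek(N,2);
    [i,j] = find(triu(ones(N),1));          % upper-triangle indices
    d = H(sub2ind([N N], i, j));            % vectorization d as in (979)
    [~,p] = sort(d,'ascend');               % ascending = Xi*Pi*d of (977)
    k = zeros(m,1);  k(p) = (1:m)';         % rank of each entry
    O = zeros(N);  O(sub2ind([N N], i, j)) = k.^2;   % indices-square
    O = O + O'                              % sort-index matrix, cf. (976)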


Figure 97: Largest ten eigenvalues λ(−V_N^T O V_N)_j of −V_N^T O V_N for map of USA, sorted by nonincreasing value. In the code (§F.3.2), we normalize O by (N(N−1)/2)².

Any process of reconstruction that leaves comparative distance information intact is called ordinal multidimensional scaling or isotonic reconstruction. Beyond rotation, reflection, and translation error, (§5.5) list reconstruction by isotonic reconstruction is subject to error in absolute scale (dilation) and distance ratio. Yet Borg & Groenen argue: [39, §2.2] reconstruction from complete comparative distance information for a large number of points is as highly constrained as reconstruction from an EDM; the larger the number, the better.

5.13.2.1 Isotonic map of the USA

To test Borg & Groenen's conjecture, suppose we make a complete sort-index matrix O ∈ S^N_h ∩ R^{N×N}_+ for the map of the USA and then substitute O in place of EDM D in the reconstruction process of §5.12. Whereas EDM D returned only three significant eigenvalues (973), the sort-index matrix O is generally not an EDM (certainly not an EDM with corresponding affine dimension 3) so returns many more. The eigenvalues, calculated with absolute numerical error approximately 5E-7, are plotted in Figure 97:

$$ \lambda(-V_N^TOV_N) = [\,880.1\;\; 463.9\;\; 186.1\;\; 46.20\;\; 17.12\;\; 9.625\;\; 8.257\;\; 1.701\;\; 0.7128\;\; 0.6460\;\cdots\,]^T \tag{978} $$


The extra eigenvalues indicate that affine dimension corresponding to an EDM near O is likely to exceed 3. To realize the map, we must simultaneously reduce that dimensionality and find an EDM D closest to O in some sense (a problem explored more in §7) while maintaining the known comparative distance relationship; e.g., given permutation matrix Π expressing the known sorting action on the entries d of unknown D ∈ S^N_h , (64)

$$ d \triangleq \tfrac{1}{\sqrt 2}\operatorname{dvec}D = \begin{bmatrix}d_{12}\\ d_{13}\\ d_{23}\\ d_{14}\\ d_{24}\\ d_{34}\\ \vdots\\ d_{N-1,N}\end{bmatrix} \in \mathbb R^{N(N-1)/2} \tag{979} $$

we can make the sort-index matrix O input to the optimization problem

$$ \begin{array}{ll} \underset{D}{\text{minimize}} & \|-V_N^T(D-O)V_N\|_F\\ \text{subject to} & \operatorname{rank}V_N^TDV_N \leq 3\\ & \Pi d \in \mathcal K_{\mathcal M+}\\ & D \in \mathrm{EDM}^N \end{array} \tag{980} $$

that finds the EDM D (corresponding to affine dimension not exceeding 3 in isomorphic dvec EDM^N ∩ Π^T K_M+) closest to O in the sense of Schoenberg (753).

Analytical solution to this problem, ignoring the sort constraint Πd ∈ K_M+ , is known [266]: we get the convex optimization [sic] (§7.1)

$$ \begin{array}{ll} \underset{D}{\text{minimize}} & \|-V_N^T(D-O)V_N\|_F\\ \text{subject to} & \operatorname{rank}V_N^TDV_N \leq 3\\ & D \in \mathrm{EDM}^N \end{array} \tag{981} $$

Only the three largest nonnegative eigenvalues in (978) need be retained to make list (965); the rest are discarded. The reconstruction from EDM D found in this manner is plotted in Figure 96(e)(f) from which it becomes obvious that inclusion of the sort constraint is necessary for isotonic reconstruction.


That sort constraint demands: any optimal solution D⋆ must possess the known comparative distance relationship that produces the original ordinal distance data O (977). Ignoring the sort constraint, apparently, violates it. Yet even more remarkable is how much the map reconstructed using only ordinal data still resembles the original map of the USA after suffering the many violations produced by solving relaxed problem (981). This suggests the simple reconstruction techniques of §5.12 are robust to a significant amount of noise.

5.13.2.2 Isotonic solution with sort constraint

Because problems involving rank are generally difficult, we will partition (980) into two problems we know how to solve and then alternate their solution until convergence:

$$ \begin{array}{llc} \underset{D}{\text{minimize}} & \|-V_N^T(D-O)V_N\|_F &\\ \text{subject to} & \operatorname{rank}V_N^TDV_N \leq 3 & \text{(a)}\\ & D \in \mathrm{EDM}^N &\\[8pt] \underset{\sigma}{\text{minimize}} & \|\sigma - \Pi d\| & \text{(b)}\\ \text{subject to} & \sigma \in \mathcal K_{\mathcal M+} & \end{array} \tag{982} $$

where the sort-index matrix O (a given constant in (a)) becomes an implicit vectorized variable o_i solving the i-th instance of (982b)

$$ o_i \triangleq \Pi^T\sigma^\star = \tfrac{1}{\sqrt 2}\operatorname{dvec}O_i \in \mathbb R^{N(N-1)/2}\,,\qquad i \in \{1, 2, 3\ldots\} \tag{983} $$

As mentioned in discussion of relaxed problem (981), a closed-form solution to problem (982a) exists. Only the first iteration of (982a) sees the original sort-index matrix O whose entries are nonnegative whole numbers; id est, O_0 = O ∈ S^N_h ∩ R^{N×N}_+ (977). Subsequent iterations i take the previous solution of (982b) as input

$$ O_i = \operatorname{dvec}^{-1}(\sqrt 2\,o_i) \in \mathbb S^N \tag{984} $$

real successors to the sort-index matrix O.

New problem (982b) finds the unique minimum-distance projection of Πd on the monotone nonnegative cone K_M+ . By defining

$$ Y^{\dagger T} \triangleq [\,e_1 - e_2\;\; e_2 - e_3\;\; e_3 - e_4\;\cdots\; e_m\,] \in \mathbb R^{m\times m} \tag{372} $$


where m ≜ N(N−1)/2, we may rewrite (982b) as an equivalent quadratic program; a convex optimization problem [46, §4] in terms of the halfspace-description of K_M+ :

$$ \begin{array}{ll} \underset{\sigma}{\text{minimize}} & (\sigma - \Pi d)^T(\sigma - \Pi d)\\ \text{subject to} & Y^\dagger\sigma \succeq 0 \end{array} \tag{985} $$

This quadratic program can be converted to a semidefinite program via Schur-form (§3.1.7.2); we get the equivalent problem

$$ \begin{array}{ll} \underset{t\in\mathbb R,\,\sigma}{\text{minimize}} & t\\ \text{subject to} & \begin{bmatrix}tI & \sigma - \Pi d\\ (\sigma - \Pi d)^T & 1\end{bmatrix} \succeq 0\\[6pt] & Y^\dagger\sigma \succeq 0 \end{array} \tag{986} $$

5.13.2.3 Convergence

In §E.10 we discuss convergence of alternating projection on intersecting convex sets in a Euclidean vector space; convergence to a point in their intersection. Here the situation is different for two reasons:

Firstly, sets of positive semidefinite matrices having an upper bound on rank are generally not convex. Yet in §7.1.4.0.1 we prove (982a) is equivalent to a projection of nonincreasingly ordered eigenvalues on a subset of the nonnegative orthant:

$$ \begin{array}{ll} \underset{D}{\text{minimize}} & \|-V_N^T(D-O)V_N\|_F\\ \text{subject to} & \operatorname{rank}V_N^TDV_N \leq 3\\ & D \in \mathrm{EDM}^N \end{array} \;\equiv\; \begin{array}{ll} \underset{\Upsilon}{\text{minimize}} & \|\Upsilon - \Lambda\|_F\\ \text{subject to} & \delta(\Upsilon) \in \begin{bmatrix}\mathbb R^3_+\\ \mathbf 0\end{bmatrix} \end{array} \tag{987} $$

where −V_N^T D V_N ≜ UΥU^T ∈ S^{N−1} and −V_N^T O V_N ≜ QΛQ^T ∈ S^{N−1} are ordered diagonalizations (§A.5). It so happens: optimal orthogonal U⋆ always equals Q given. Linear operator T(A) = U⋆^T A U⋆ , acting on square matrix A , is a bijective isometry because the Frobenius norm is orthogonally invariant (40). This isometric isomorphism T thus maps a nonconvex problem to a convex one that preserves distance.
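For concreteness, the projection (985) on the monotone nonnegative cone can be posed directly to a quadratic-program solver. A Matlab sketch of ours using quadprog from the Optimization Toolbox (b stands for the presorted data vector Πd; minimizing ½σ^Tσ − b^Tσ has the same minimizer as ‖σ − b‖²):

    % projection of b on K_M+ via the quadratic program (985); code ours
    b = [3; 1; 2; -1];                      % example data, ours
    m = numel(b);
    Ydag = [-diff(eye(m)); [zeros(1,m-1) 1]];  % rows e_i'-e_{i+1}', e_m'
    sigma = quadprog(eye(m), -b, -Ydag, zeros(m,1))
    % sigma is nonincreasing and nonnegative, here [3; 1.5; 1.5; 0]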


Secondly, the second half (982b) of the alternation takes place in a different vector space; S_h^N (versus S^{N−1}). From §5.6 we know these two vector spaces are related by an isomorphism, S^{N−1} = V_N(S_h^N) (852), but not by an isometry.

We have, therefore, no guarantee from theory of alternating projection that the alternation (982) converges to a point, in the set of all EDMs corresponding to affine dimension not in excess of 3, belonging to dvec EDM^N ∩ Π^T K_{M+} .

5.13.2.4 Interlude

We have not implemented the second half (985) of alternation (982) for USA map data because memory-demands exceed the capability of our 32-bit laptop computer.

5.13.2.4.1 Exercise. Convergence of isotonic solution by alternation.
Empirically demonstrate convergence, discussed in §5.13.2.3, on a smaller data set.

It would be remiss not to mention another method of solution to this isotonic reconstruction problem: Once again we assume only comparative distance data like (974) is available. Given known set of indices I

    minimize_D   rank V D V
    subject to   d_ij ≤ d_kl ≤ d_mn   ∀ (i,j,k,l,m,n) ∈ I
                 D ∈ EDM^N                                    (988)

this problem minimizes affine dimension while finding an EDM whose entries satisfy known comparative relationships. Suitable rank heuristics are discussed in §4.4.1 and §7.2.2 that will transform this to a convex optimization problem.

Using contemporary computers, even with a rank heuristic in place of the objective function, this problem formulation is more difficult to compute than the relaxed counterpart problem (981). That is because there exist efficient algorithms to compute a selected few eigenvalues and eigenvectors from a very large matrix. Regardless, it is important to recognize: the optimal solution set for this problem (988) is practically always different from the optimal solution set for its counterpart, problem (980).
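For the smaller data sets contemplated by Exercise 5.13.2.4.1, the projection (982b)/(985) is straightforward to prototype. The following Matlab sketch is a hedged illustration only, assuming the Optimization Toolbox function quadprog and a given vector b = Πd ; matrix Ydag encodes the halfspace-description Y^† implied by (372), whose rows are (e_1 − e_2)^T , ... , (e_{m−1} − e_m)^T , e_m^T .

    % hedged sketch: projection of b = Pi*d on the monotone nonnegative cone (985)
    m = length(b);
    E = eye(m);
    Ydag = [E(1:m-1,:) - E(2:m,:); E(m,:)];    % Y† : sigma_1 >= ... >= sigma_m >= 0
    H = 2*E;  f = -2*b;                        % (sigma-b)'(sigma-b) up to a constant
    sigma = quadprog(H, f, -Ydag, zeros(m,1)); % quadprog: min .5 s'Hs + f's  s.t.  A s <= b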


5.14 Fifth property of Euclidean metric

We continue now with the question raised in §5.3 regarding the necessity for at least one requirement more than the four properties of the Euclidean metric (§5.2) to certify realizability of a bounded convex polyhedron or to reconstruct a generating list for it from incomplete distance information. There we saw that the four Euclidean metric properties are necessary for D ∈ EDM^N in the case N = 3 , but become insufficient when cardinality N exceeds 3 (regardless of affine dimension).

5.14.1 Recapitulate

In the particular case N = 3 , −V_N^T D V_N ≽ 0 (893) and D ∈ S_h^3 are necessary and sufficient conditions for D to be an EDM. By (895), triangle inequality is then the only Euclidean condition bounding the necessarily nonnegative d_ij ; and those bounds are tight. That means the first four properties of the Euclidean metric are necessary and sufficient conditions for D to be an EDM in the case N = 3 ; for i, j ∈ {1, 2, 3}

    √d_ij ≥ 0 ,  i ≠ j
    √d_ij = 0 ,  i = j                         −V_N^T D V_N ≽ 0
    √d_ij = √d_ji                       ⇔      D ∈ S_h^3            ⇔   D ∈ EDM^3    (989)
    √d_ij ≤ √d_ik + √d_kj ,  i ≠ j ≠ k

Yet those four properties become insufficient when N > 3 .

5.14.2 Derivation of the fifth

Correspondence between the triangle inequality and the EDM was developed in §5.8.2 where a triangle inequality (895a) was revealed within the leading principal 2×2 submatrix of −V_N^T D V_N when positive semidefinite. Our choice of the leading principal submatrix was arbitrary; actually, a unique triangle inequality like (797) corresponds to any one of the (N−1)!/(2!(N−1−2)!) principal 2×2 submatrices.^{5.53} Assuming D ∈ S_h^4 and −V_N^T D V_N ∈ S^3 , then by the positive (semi)definite principal submatrices

^{5.53} There are fewer principal 2×2 submatrices in −V_N^T D V_N than there are triangles made by four or more points because there are N!/(3!(N−3)!) triangles made by point triples. The triangles corresponding to those submatrices all have vertex x_1 . (confer §5.8.2.1)


theorem (§A.3.1.0.4) it is sufficient to prove all d_ij are nonnegative, all triangle inequalities are satisfied, and det(−V_N^T D V_N) is nonnegative. When N = 4 , in other words, that nonnegative determinant becomes the fifth and last Euclidean metric requirement for D ∈ EDM^N . We now endeavor to ascribe geometric meaning to it.

5.14.2.1 Nonnegative determinant

By (803) when D ∈ EDM^4 , −V_N^T D V_N is equal to inner product (798),

            ⎡ d_12                    √(d_12 d_13) cosθ_213   √(d_12 d_14) cosθ_214 ⎤
    Θ^T Θ = ⎢ √(d_12 d_13) cosθ_213   d_13                    √(d_13 d_14) cosθ_314 ⎥    (990)
            ⎣ √(d_12 d_14) cosθ_214   √(d_13 d_14) cosθ_314   d_14                  ⎦

Because Euclidean space is an inner-product space, the more concise inner-product form of the determinant is admitted;

    det(Θ^T Θ) = −d_12 d_13 d_14 ( cos(θ_213)² + cos(θ_214)² + cos(θ_314)² − 2 cosθ_213 cosθ_214 cosθ_314 − 1 )    (991)

The determinant is nonnegative if and only if

    cosθ_214 cosθ_314 − √(sin(θ_214)² sin(θ_314)²) ≤ cosθ_213 ≤ cosθ_214 cosθ_314 + √(sin(θ_214)² sin(θ_314)²)
  ⇔
    cosθ_213 cosθ_314 − √(sin(θ_213)² sin(θ_314)²) ≤ cosθ_214 ≤ cosθ_213 cosθ_314 + √(sin(θ_213)² sin(θ_314)²)
  ⇔
    cosθ_213 cosθ_214 − √(sin(θ_213)² sin(θ_214)²) ≤ cosθ_314 ≤ cosθ_213 cosθ_214 + √(sin(θ_213)² sin(θ_214)²)    (992)

which simplifies, for 0 ≤ θ_i1l , θ_l1j , θ_i1j ≤ π and all i ≠ j ≠ l ∈ {2, 3, 4} , to

    cos(θ_i1l + θ_l1j) ≤ cosθ_i1j ≤ cos(θ_i1l − θ_l1j)    (993)

Analogously to triangle inequality (907), the determinant is 0 upon equality on either side of (993) which is tight. Inequality (993) can be equivalently written linearly as a “triangle inequality”, but between relative angles [303, §1.4];

    |θ_i1l − θ_l1j| ≤ θ_i1j ≤ θ_i1l + θ_l1j
    θ_i1l + θ_l1j + θ_i1j ≤ 2π                    (994)
    0 ≤ θ_i1l , θ_l1j , θ_i1j ≤ π


Figure 98: The relative-angle inequality tetrahedron (995) bounding EDM^4 is regular; drawn in entirety in coordinates θ_213 , θ_214 , θ_314 ∈ [0, π]. Each angle θ (795) must belong to this solid to be realizable.

Generalizing this:

5.14.2.1.1 Fifth property of Euclidean metric - restatement.
Relative-angle inequality. (confer §5.3.1.0.1) [36] [37, p.17, p.107] [172, §3.1]
Augmenting the four fundamental Euclidean metric properties in R^n , for all i, j, l ≠ k ∈ {1 ... N} , i < j < l , and for N ≥ 4 distinct points {x_k} , the inequalities

    |θ_ikl − θ_lkj| ≤ θ_ikj ≤ θ_ikl + θ_lkj    (a)
    θ_ikl + θ_lkj + θ_ikj ≤ 2π                 (b)    (995)
    0 ≤ θ_ikl , θ_lkj , θ_ikj ≤ π

where θ_ikj (795) is the relative angle at vertex x_k , must be satisfied at each point x_k regardless of affine dimension.


396 CHAPTER 5. EUCLIDEAN DISTANCE MATRIXBecause point labelling is arbitrary, this fifth Euclidean metricrequirement must apply to each of the N points as though each were in turnlabelled x 1 ; hence the new index k in (995). Just as the triangle inequalityis the ultimate test for realizability of only three points, the relative-angleinequality is the ultimate test for only four. For four distinct points, thetriangle inequality remains a necessary although penultimate test; (5.4.3)Four Euclidean metric properties (5.2).Angle θ inequality (728) or (995).⇔ −V T N DV N ≽ 0D ∈ S 4 h⇔ D = D(Θ) ∈ EDM 4The relative-angle inequality, for this case, is illustrated in Figure 98.5.14.2.2 Beyond the fifth metric property(996)When cardinality N exceeds 4, the first four properties of the Euclideanmetric and the relative-angle inequality together become insufficientconditions for realizability. In other words, the four Euclidean metricproperties and relative-angle inequality remain necessary but become asufficient test only for positive semidefiniteness of all the principal 3 × 3submatrices [sic] in −VN TDV N . Relative-angle inequality can be consideredthe ultimate test only for realizability at each vertex x k of each and everypurported tetrahedron constituting a hyperdimensional body.When N = 5 in particular, relative-angle inequality becomes thepenultimate Euclidean metric requirement while nonnegativity of thenunwieldy det(Θ T Θ) corresponds (by the positive (semi)definite principalsubmatrices theorem inA.3.1.0.4) to the sixth and last Euclidean metricrequirement. Together these six tests become necessary and sufficient, andso on.Yet for all values of N , only assuming nonnegative d ij , relative-anglematrix inequality in (909) is necessary and sufficient to certify realizability;(5.4.3.1)Euclidean metric property 1 (5.2).Angle matrix inequality Ω ≽ 0 (804).⇔ −V T N DV N ≽ 0D ∈ S N h⇔ D = D(Ω,d) ∈ EDM N(997)Like matrix criteria (729), (753), and (909), the relative-angle matrixinequality and nonnegativity property subsume all the Euclidean metricproperties and further requirements.


5.14. FIFTH PROPERTY OF EUCLIDEAN METRIC 3975.14.3 Path not followedAs a means to test for realizability of four or more points, anintuitively appealing way to augment the four Euclidean metric propertiesis to recognize generalizations of the triangle inequality: In thecase N = 4, the three-dimensional analogue to triangle & distance istetrahedron & facet-area, while in the case N = 5 the four-dimensionalanalogue is polychoron & facet-volume, ad infinitum. For N points, N + 1metric properties are required.5.14.3.1 N = 4Each of the four facets of a general tetrahedron is a triangle and itsrelative interior. Suppose we identify each facet of the tetrahedron by itsarea-squared: c 1 , c 2 , c 3 , c 4 . Then analogous to metric property 4, we maywrite a tight 5.54 area inequality for the facets√ci ≤ √ c j + √ c k + √ c l , i≠j ≠k≠l∈{1, 2, 3, 4} (998)which is a generalized “triangle” inequality [167,1.1] that follows from√ci = √ c j cos ϕ ij + √ c k cos ϕ ik + √ c l cos ϕ il (999)[178] [284, Law of Cosines] where ϕ ij is the dihedral angle at the commonedge between triangular facets i and j .If D is the EDM corresponding to the whole tetrahedron, thenarea-squared of the i th triangular facet has a convenient formula in termsof D i ∈ EDM N−1 the EDM corresponding to that particular facet: From theCayley-Menger determinant 5.55 for simplices, [284] [87] [112,4] [60,3.3]the i th facet area-squared for i∈{1... N} is (A.4.1)5.54 The upper bound is met when all angles in (999) are simultaneously 0; that occurs,for example, if one point is relatively interior to the convex hull of the three remaining.5.55 whose foremost characteristic is: the determinant vanishes [ if and ] only if affine0 1Tdimension does not equal penultimate cardinality; id est, det = 0 ⇔ r < N −11 −Dwhere D is any EDM (5.7.3.0.1). Otherwise, the determinant is negative.


    c_i = −1/(2^{N−2} ((N−2)!)²)  det [ 0    1^T
                                        1   −D_i ]                    (1000)

        = (−1)^N/(2^{N−2} ((N−2)!)²)  det(D_i) (1^T D_i^{−1} 1)        (1001)

        = (−1)^N/(2^{N−2} ((N−2)!)²)  1^T cof(D_i)^T 1                 (1002)

where D_i is the i th principal N−1 × N−1 submatrix^{5.56} of D ∈ EDM^N , and cof(D_i) is the N−1 × N−1 matrix of cofactors [251, §4] corresponding to D_i . The number of principal 3×3 submatrices in D is, of course, equal to the number of triangular facets in the tetrahedron; four (N!/(3!(N−3)!)) when N = 4 .

5.14.3.1.1 Exercise. Sufficiency conditions for an EDM of four points.
Triangle inequality (property 4) and area inequality (998) are conditions necessary for D to be an EDM. Prove their sufficiency in conjunction with the remaining three Euclidean metric properties.

5.14.3.2 N = 5

Moving to the next level, we might encounter a Euclidean body called polychoron, a bounded polyhedron in four dimensions.^{5.57} The polychoron has five (N!/(4!(N−4)!)) facets, each of them a general tetrahedron whose volume-squared c_i is calculated using the same formula; (1000) where D is the EDM corresponding to the polychoron, and D_i is the EDM corresponding to the i th facet (the principal 4×4 submatrix of D ∈ EDM^N corresponding to the i th tetrahedron). The analogue to triangle & distance is now polychoron & facet-volume. We could then write another generalized “triangle” inequality like (998) but in terms of facet volume; [289, §IV]

    √c_i ≤ √c_j + √c_k + √c_l + √c_m ,   i ≠ j ≠ k ≠ l ≠ m ∈ {1 ... 5}    (1003)

^{5.56} Every principal submatrix of an EDM remains an EDM. [172, §4.1]
^{5.57} The simplest polychoron is called a pentatope [284]; a regular simplex hence convex. (A pentahedron is a three-dimensional body having five vertices.)


Figure 99: Length of one-dimensional face a equals height h = a = 1 of this nonsimplicial pyramid in R^3 with square base inscribed in a circle of radius R centered at the origin. [284, Pyramid]

5.14.3.2.1 Exercise. Sufficiency for an EDM of five points.
For N = 5 , triangle (distance) inequality (§5.2), area inequality (998), and volume inequality (1003) are conditions necessary for D to be an EDM. Prove their sufficiency.

5.14.3.3 Volume of simplices

There is no known formula for the volume of a bounded general convex polyhedron expressed either by halfspace or vertex-description. [301, §2.1] [210, p.173] [169] [170] [123] [124] Volume is a concept germane to R^3 ; in higher dimensions it is called content. Applying the EDM assertion (§5.9.1.0.3) and a result given in [46, §8.3.1], a general nonempty simplex (§2.12.3) in R^{N−1} corresponding to an EDM D ∈ S_h^N has content

    √c = content(S) √(det(−V_N^T D V_N))    (1004)

where the content-squared of the unit simplex S ⊂ R^{N−1} is proportional to its Cayley-Menger determinant;

    content(S)² = −1/(2^{N−1} ((N−1)!)²)  det [ 0    1^T
                                                1   −D([0 e_1 e_2 · · · e_{N−1}]) ]    (1005)

where e_i ∈ R^{N−1} and the EDM operator used is D(X) (734).


5.14.3.3.1 Example. Pyramid.
A formula for volume of a pyramid is known;^{5.58} it is 1/3 the product of its base area with its height. [164] The pyramid in Figure 99 has volume 1/3 . To find its volume using EDMs, we must first decompose the pyramid into simplicial parts. Slicing it in half along the plane containing the line segments corresponding to radius R and height h we find the vertices of one simplex,

        ⎡ 1/2    1/2   −1/2    0 ⎤
    X = ⎢ 1/2   −1/2   −1/2    0 ⎥ ∈ R^{n×N}    (1006)
        ⎣  0      0      0     1 ⎦

where N = n + 1 for any nonempty simplex in R^n . The volume of this simplex is half that of the entire pyramid; id est, √c = 1/6 , found by evaluating (1004).

With that, we conclude digression of path.

5.14.4 Affine dimension reduction in three dimensions

(confer §5.8.4) The determinant of any M×M matrix is equal to the product of its M eigenvalues. [251, §5.1] When N = 4 and det(Θ^T Θ) is 0 , that means one or more eigenvalues of Θ^T Θ ∈ R^{3×3} are 0 . The determinant will go to 0 whenever equality is attained on either side of (728), (995a), or (995b), meaning that a tetrahedron has collapsed to a lower affine dimension; id est, r = rank Θ^T Θ = rank Θ is reduced below N−1 exactly by the number of 0 eigenvalues (§5.7.1.1).

In solving completion problems of any size N where one or more entries of an EDM are unknown, therefore, dimension r of the affine hull required to contain the unknown points is potentially reduced by selecting distances to attain equality in (728) or (995a) or (995b).

^{5.58} Pyramid volume is independent of the paramount vertex position as long as its height remains constant.
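A quick numerical corroboration of this example is possible. The Matlab sketch below is an illustration (not from the text) assuming the auxiliary matrix V_N = [−1^T ; I]/√2 whose range is N(1^T), and the fact that content(S) = 1/(N−1)! for the unit simplex; it builds the EDM of simplex (1006) via definition (734) and evaluates content formula (1004).

    % hedged numerical check of Example 5.14.3.3.1 via (1004)
    X = [ 1/2  1/2 -1/2  0
          1/2 -1/2 -1/2  0
           0    0    0   1 ];
    N = size(X,2);
    G = X'*X;                                          % Gram matrix
    D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;  % EDM definition (734)
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);             % auxiliary matrix
    c = (1/factorial(N-1))^2 * det(-VN'*D*VN);         % content-squared per (1004)
    sqrt(c)                                            % returns 1/6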


5.14.4.1 Exemplum redux

We now apply the fifth Euclidean metric property to an earlier problem:

5.14.4.1.1 Example. Small completion problem, IV. (confer §5.9.3.0.1)
Returning again to Example 5.3.0.0.2 that pertains to Figure 80 where N = 4 , distance-square d_14 is ascertainable from the fifth Euclidean metric property. Because all distances in (726) are known except √d_14 , then cosθ_123 = 0 and θ_324 = 0 result from identity (795). Applying (728),

    cos(θ_123 + θ_324) ≤ cosθ_124 ≤ cos(θ_123 − θ_324)
    0 ≤ cosθ_124 ≤ 0                                       (1007)

It follows again from (795) that d_14 can only be 2 . As explained in this subsection, affine dimension r cannot exceed N−2 because equality is attained in (1007).




Chapter 6

Cone of distance matrices

    For N > 3 , the cone of EDMs is no longer a circular cone and the geometry becomes complicated...
    −Hayden, Wells, Liu, & Tarazaga (1991) [134, §3]

In the subspace of symmetric matrices S^N , we know the convex cone of Euclidean distance matrices EDM^N (the EDM cone) does not intersect the positive semidefinite cone S_+^N (PSD cone) except at the origin, their only vertex; there can be no positive nor negative semidefinite EDM. (932) [172]

    EDM^N ∩ S_+^N = 0    (1008)

Even so, the two convex cones can be related. In §6.8.1 we prove the new equality

    EDM^N = S_h^N ∩ ( S_c^{N⊥} − S_+^N )    (1100)


a resemblance to EDM definition (734) where

    S_h^N ≜ { A ∈ S^N | δ(A) = 0 }    (57)

is the symmetric hollow subspace (§2.2.3) and where

    S_c^{N⊥} = { u1^T + 1u^T | u ∈ R^N }    (1794)

is the orthogonal complement of the geometric center subspace (§E.7.2.0.2)

    S_c^N ≜ { Y ∈ S^N | Y 1 = 0 }    (1792)

6.0.1 gravity

Equality (1100) is equally important as the known isomorphisms (841) (842) (853) (854) relating the EDM cone EDM^N to an N(N−1)/2-dimensional face of S_+^N (§5.6.1.1), or to S_+^{N−1} (§5.6.2.1).^{6.1} Those isomorphisms have never led to this equality (1100) relating the whole cones EDM^N and S_+^N . Equality (1100) is not obvious from the various EDM matrix definitions such as (734) or (1026) because inclusion must be proved algebraically in order to establish equality; EDM^N ⊇ S_h^N ∩ (S_c^{N⊥} − S_+^N) . We will instead prove (1100) using purely geometric methods.

6.0.2 highlight

In §6.8.1.7 we show: the Schoenberg criterion for discriminating Euclidean distance matrices

    D ∈ EDM^N   ⇔   { −V_N^T D V_N ∈ S_+^{N−1}
                       D ∈ S_h^N                    (753)

is a discretized membership relation (§2.13.4) between the EDM cone and its ordinary dual.

^{6.1} Because both positive semidefinite cones are frequently in play, dimension is notated.


6.1 Defining EDM cone

We invoke a popular matrix criterion to illustrate correspondence between the EDM and PSD cones belonging to the ambient space of symmetric matrices:

    D ∈ EDM^N   ⇔   { −V D V ∈ S_+^N
                       D ∈ S_h^N          (758)

where V ∈ S^N is the geometric centering matrix (§B.4). The set of all EDMs of dimension N×N forms a closed convex cone EDM^N because any pair of EDMs satisfies the definition of a convex cone (145); videlicet, for each and every ζ_1 , ζ_2 ≥ 0 (§A.3.1.0.2)

    ζ_1 V D_1 V + ζ_2 V D_2 V ≽ 0        ⇐    V D_1 V ≽ 0 ,  V D_2 V ≽ 0
    ζ_1 D_1 + ζ_2 D_2 ∈ S_h^N                 D_1 ∈ S_h^N ,  D_2 ∈ S_h^N       (1009)

and convex cones are invariant to inverse affine transformation [232, p.22].

6.1.0.0.1 Definition. Cone of Euclidean distance matrices.
In the subspace of symmetric matrices, the set of all Euclidean distance matrices forms a unique immutable pointed closed convex cone called the EDM cone: for N > 0

    EDM^N ≜ { D ∈ S_h^N | −V D V ∈ S_+^N }
          = ⋂_{z ∈ N(1^T)} { D ∈ S^N | ⟨zz^T , −D⟩ ≥ 0 ,  δ(D) = 0 }    (1010)

The EDM cone in isomorphic R^{N(N+1)/2} [sic] is the intersection of an infinite number (when N > 2) of halfspaces about the origin and a finite number of hyperplanes through the origin in vectorized variable D = [d_ij] . Hence EDM^N has empty interior with respect to S^N because it is confined to the symmetric hollow subspace S_h^N . The EDM cone relative interior comprises

    rel int EDM^N = ⋂_{z ∈ N(1^T)} { D ∈ S^N | ⟨zz^T , −D⟩ > 0 ,  δ(D) = 0 }
                  = { D ∈ EDM^N | rank(V D V ) = N−1 }                         (1011)

while its relative boundary comprises

    rel ∂EDM^N = { D ∈ EDM^N | ⟨zz^T , −D⟩ = 0 for some z ∈ N(1^T) }
               = { D ∈ EDM^N | rank(V D V ) < N−1 }                            (1012)
                                                                                  △
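Criterion (758) translates directly into a numerical membership test. The following Matlab sketch is illustrative only (the function name and tolerance handling are assumptions, not from the text); it checks symmetry, hollowness, and positive semidefiniteness of −V D V within tolerance.

    % hedged sketch: numerical membership test for the EDM cone via (758)
    function tf = isEDM(D, tol)
      if nargin < 2, tol = 1e-10; end
      N = size(D,1);
      V = eye(N) - ones(N)/N;          % geometric centering matrix
      S = -V*D*V;  S = (S + S')/2;     % symmetrize against roundoff
      tf = norm(D - D','fro') <= tol ...   % symmetry
        && norm(diag(D)) <= tol ...        % hollowness, D in S_h^N
        && min(eig(S)) >= -tol;            % -V D V positive semidefinite
    end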


Figure 100: Relative boundary (tiled) of EDM cone EDM^3 drawn truncated in isometrically isomorphic subspace R^3 . (a) EDM cone drawn in usual distance-square coordinates d_ij . View is from interior toward origin. Unlike positive semidefinite cone, EDM cone is not self-dual, neither is it proper in ambient symmetric subspace (dual EDM cone for this example belongs to isomorphic R^6 ). (b) Drawn in its natural coordinates √d_ij (absolute distance), cone remains convex (confer §5.10); intersection of three halfspaces (896) whose partial boundaries each contain origin. Cone geometry becomes “complicated” (nonpolyhedral) in higher dimension. [134, §3] (c) Two coordinate systems artificially superimposed. Coordinate transformation from d_ij to √d_ij appears a topological contraction. (d) Sitting on its vertex 0 , pointed EDM^3 is a circular cone having axis of revolution dvec(−E) = dvec(11^T − I) (928) (64). Rounded vertex is plot artifact.


This cone is more easily visualized in the isomorphic vector subspace R^{N(N−1)/2} corresponding to S_h^N :

In the case N = 1 point, the EDM cone is the origin in R^0 .

In the case N = 2 , the EDM cone is the nonnegative real line in R ; a halfline in a subspace of the realization in Figure 108.

The EDM cone in the case N = 3 is a circular cone in R^3 illustrated in Figure 100(a)(d); rather, the set of all matrices

        ⎡  0     d_12   d_13 ⎤
    D = ⎢ d_12    0     d_23 ⎥ ∈ EDM^3    (1013)
        ⎣ d_13   d_23    0   ⎦

makes a circular cone in this dimension. In this case, the first four Euclidean metric properties are necessary and sufficient tests to certify realizability of triangles; (989). Thus triangle inequality property 4 describes three halfspaces (896) whose intersection makes a polyhedral cone in R^3 of realizable √d_ij (absolute distance); an isomorphic subspace representation of the set of all EDMs D in the natural coordinates

           ⎡   0      √d_12   √d_13 ⎤
    ∘√D ≜  ⎢ √d_12      0     √d_23 ⎥    (1014)
           ⎣ √d_13    √d_23     0   ⎦

illustrated in Figure 100(b).

6.2 Polyhedral bounds

The convex cone of EDMs is nonpolyhedral in d_ij for N > 2 ; e.g., Figure 100(a). Still we found necessary and sufficient bounding polyhedral relations consistent with EDM cones for cardinality N = 1, 2, 3, 4:

N = 3. Transforming distance-square coordinates d_ij by taking their positive square root provides the polyhedral cone in Figure 100(b); polyhedral because an intersection of three halfspaces in natural coordinates √d_ij is provided by triangle inequalities (896). This polyhedral cone implicitly encompasses necessary and sufficient metric properties: nonnegativity, self-distance, symmetry, and triangle inequality.


Figure 101: (a) In isometrically isomorphic subspace R^3 , intersection of EDM^3 with hyperplane ∂H representing one fixed symmetric entry d_23 = κ (both drawn truncated, rounded vertex is artifact of plot). EDMs in this dimension corresponding to affine dimension 1 comprise relative boundary of EDM cone (§6.6). Since intersection illustrated includes a nontrivial subset of cone’s relative boundary, then it is apparent there exist infinitely many EDM completions corresponding to affine dimension 1. In this dimension it is impossible to represent a unique nonzero completion corresponding to affine dimension 1, for example, using a single hyperplane because any hyperplane supporting relative boundary at a particular point Γ contains an entire ray {ζΓ | ζ ≥ 0} belonging to rel ∂EDM^3 by Lemma 2.8.0.0.1. (b) d_13 = κ . (c) d_12 = κ .


N = 4. Relative-angle inequality (995) together with four Euclidean metric properties are necessary and sufficient tests for realizability of tetrahedra. (996) Albeit relative angles θ_ikj (795) are nonlinear functions of the d_ij , relative-angle inequality provides a regular tetrahedron in R^3 [sic] (Figure 98) bounding angles θ_ikj at vertex x_k consistently with EDM^4 .^{6.2}

Yet were we to employ the procedure outlined in §5.14.3 for making generalized triangle inequalities, then we would find all the necessary and sufficient d_ij-transformations for generating bounding polyhedra consistent with EDMs of any higher dimension (N > 3).

6.3 √EDM cone is not convex

For some applications, like a molecular conformation problem (Figure 3, Figure 90) or multidimensional scaling, [68] [269] absolute distance √d_ij is the preferred variable. Taking square root of the entries in all EDMs D of dimension N , we get another cone but not a convex cone when N > 3 (Figure 100(b)): [61, §4.5.2]

    √EDM^N ≜ { ∘√D | D ∈ EDM^N }    (1015)

where ∘√D is defined as in (1014). It is a cone simply because any cone is completely constituted by rays emanating from the origin: (§2.7) Any given ray { ζΓ ∈ R^{N(N−1)/2} | ζ ≥ 0 } remains a ray under entrywise square root: { √ζ Γ ∈ R^{N(N−1)/2} | ζ ≥ 0 } . Because of how √EDM^N is defined, it is obvious that (confer §5.10)

    D ∈ EDM^N ⇔ ∘√D ∈ √EDM^N    (1016)

Were √EDM^N convex, then given ∘√D_1 , ∘√D_2 ∈ √EDM^N we would expect their conic combination ∘√D_1 + ∘√D_2 to be a member of √EDM^N . That is easily proven false by counterexample via (1016), for then ( ∘√D_1 + ∘√D_2 ) ∘ ( ∘√D_1 + ∘√D_2 ) would need to be a member of EDM^N . Notwithstanding, in §7.2.1 we learn how to transform a nonconvex proximity problem in the natural coordinates √d_ij to a convex optimization.

^{6.2} Still, property-4 triangle inequalities (896) corresponding to each principal 3×3 submatrix of −V_N^T D V_N demand that the corresponding √d_ij belong to a polyhedral cone like that in Figure 100(b).


Figure 102: Neighborhood graph (dashed) with dimensionless EDM subgraph completion (solid) superimposed (but not covering dashed); panels: (a) 2 nearest neighbors, (b) 3 nearest neighbors. Local view of a few dense samples from relative interior of some arbitrary Euclidean manifold whose affine dimension appears two-dimensional in this neighborhood. All line segments measure absolute distance. Dashed line segments help visually locate nearest neighbors; suggesting, best number of nearest neighbors can be greater than value of embedding dimension after topological transformation. (confer [156, §2]) Solid line segments represent completion of EDM subgraph from available distance data for an arbitrarily chosen sample and its nearest neighbors. Each distance from EDM subgraph becomes distance-square in corresponding EDM submatrix.


6.4. A GEOMETRY OF COMPLETION 4116.4 a geometry of completionIntriguing is the question of whether the list in X ∈ R n×N (66) maybe reconstructed given an incomplete noiseless EDM, and under whatcircumstances reconstruction is unique. [1] [2] [3] [4] [6] [15] [153] [159] [171][172] [173]If one or more entries of a particular EDM are fixed, then geometricinterpretation of the feasible set of completions is the intersection of the EDMcone EDM N in isomorphic subspace R N(N−1)/2 with as many hyperplanesas there are fixed symmetric entries. (Depicted in Figure 101(a) is anintersection of the EDM cone EDM 3 with a single hyperplane representingthe set of all EDMs having one fixed symmetric entry.) Assuming anonempty intersection, then the number of completions is generally infinite,and those corresponding to particular affine dimension r


the graph. The dimensionless EDM subgraph between each sample and its nearest neighbors is completed from available data and included as input; one such EDM subgraph completion is drawn superimposed upon the neighborhood graph in Figure 102.^{6.3} The consequent dimensionless EDM graph comprising all the subgraphs is incomplete, in general, because the neighbor number is relatively small; incomplete even though it is a superset of the neighborhood graph. Remaining distances (those not graphed at all) are squared then made variables within the algorithm; it is this variability that admits unfurling.

To demonstrate, consider untying the trefoil knot drawn in Figure 103(a). A corresponding Euclidean distance matrix D = [d_ij , i, j = 1 ... N] employing only 2 nearest neighbors is banded having the incomplete form

        ⎡   0        ď_12     ď_13      ?     · · ·     ?       ď_{1,N−1}    ď_{1N}    ⎤
        ⎢  ď_12       0       ď_23     ď_24    ⋱        ?          ?         ď_{2N}    ⎥
        ⎢  ď_13      ď_23      0       ď_34    ⋱        ?          ?           ?       ⎥
    D = ⎢   ?        ď_24     ď_34      0      ⋱        ⋱          ?           ?       ⎥    (1017)
        ⎢   ⋮         ⋱        ⋱        ⋱      ⋱        ⋱          ⋱           ⋮       ⎥
        ⎢   ?         ?        ?        ⋱      ⋱        0     ď_{N−2,N−1}  ď_{N−2,N}   ⎥
        ⎢ ď_{1,N−1}   ?        ?        ?      ⋱   ď_{N−2,N−1}     0       ď_{N−1,N}   ⎥
        ⎣  ď_{1N}    ď_{2N}    ?        ?      ?    ď_{N−2,N}   ď_{N−1,N}     0        ⎦

where ď_ij denotes a given fixed distance-square. The unfurling algorithm can be expressed as an optimization problem; constrained distance-square maximization:

    maximize_D   ⟨−V , D⟩
    subject to   ⟨D , e_i e_j^T + e_j e_i^T⟩ 1/2 = ď_ij   ∀ (i,j) ∈ I
                 rank(V D V ) = 2
                 D ∈ EDM^N                                               (1018)

^{6.3} Local reconstruction of point position from the EDM submatrix corresponding to a complete dimensionless EDM subgraph is unique to within an isometry (§5.6, §5.12).


Figure 103: (a) Trefoil knot in R^3 from Weinberger & Saul [283]. (b) Topological transformation algorithm employing 4 nearest neighbors and N = 539 samples reduces affine dimension of knot to r = 2 . Choosing instead 2 nearest neighbors would make this embedding more circular.


where e_i ∈ R^N is the i th member of the standard basis, where set I indexes the given distance-square data like that in (1017), where V ∈ R^{N×N} is the geometric centering matrix (§B.4.1), and where

    ⟨−V , D⟩ = tr(−V D V ) = 2 tr G = (1/N) ∑_{i,j} d_ij    (759)

where G is the Gram matrix producing D assuming G1 = 0 .

If the (rank) constraint on affine dimension is ignored, then problem (1018) becomes convex, a corresponding solution D⋆ can be found, and a nearest rank-2 solution is then had by ordered eigen decomposition of −V D⋆ V followed by spectral projection (§7.1.3) on [ R_+^2 ; 0 ] ⊂ R^N . This two-step process is necessarily suboptimal. Yet because the decomposition for the trefoil knot reveals only two dominant eigenvalues, the spectral projection is nearly benign. Such a reconstruction of point position (§5.12) utilizing 4 nearest neighbors is drawn in Figure 103(b); a low-dimensional embedding of the trefoil knot.

This problem (1018) can, of course, be written equivalently in terms of Gram matrix G , facilitated by (765); videlicet, for Φ_ij as in (732)

    maximize_{G ∈ S_c^N}   ⟨I , G⟩
    subject to             ⟨G , Φ_ij⟩ = ď_ij   ∀ (i,j) ∈ I
                           rank G = 2
                           G ≽ 0                              (1019)

The advantage to converting EDM to Gram is: Gram matrix G is a bridge between point list X and EDM D ; constraints on any or all of these three variables may now be introduced. (Example 5.4.2.2.4) Confinement to the geometric center subspace S_c^N (implicit constraint G1 = 0) keeps G independent of its translation-invariant subspace S_c^{N⊥} (§5.5.1.1, Figure 110) so as not to become numerically unbounded.
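To make the relaxation tangible, here is a hedged CVX sketch of (1018) with the rank constraint dropped, followed by the suboptimal spectral rounding described above. Names N , I (rows holding index pairs), and Dgiven (the known distance-squares ď_ij) are illustrative assumptions, not part of the text.

    % hedged sketch: convex relaxation of unfurling problem (1018)
    V  = eye(N) - ones(N)/N;                  % geometric centering matrix
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
    cvx_begin sdp
      variable D(N,N) symmetric
      maximize( trace(-V*D*V) )               % <-V,D> per (759)
      subject to
        diag(D) == 0;
        -VN'*D*VN >= 0;                       % D in EDM cone, Schoenberg (753)
        for q = 1:size(I,1)
          D(I(q,1),I(q,2)) == Dgiven(I(q,1),I(q,2));
        end
    cvx_end
    [Q,L] = eig(-V*D*V/2);                    % spectral rounding to rank 2
    [lam,i] = sort(real(diag(L)),'descend');
    X = diag(sqrt(max(lam(1:2),0)))*Q(:,i(1:2))';   % unfurled embedding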


6.4. A GEOMETRY OF COMPLETION 415Figure 104: Trefoil ribbon; courtesy, Kilian Weinberger. Same topologicaltransformation algorithm with 5 nearest neighbors and N = 1617 samples.


6.5 EDM definition in 11^T

Any EDM D corresponding to affine dimension r has representation

    D(V_X) ≜ δ(V_X V_X^T)1^T + 1δ(V_X V_X^T)^T − 2V_X V_X^T ∈ EDM^N    (1020)

where R(V_X ∈ R^{N×r}) ⊆ N(1^T) = 1^⊥ ,

    V_X^T V_X = δ²(V_X^T V_X)   and   V_X is full-rank with orthogonal columns.    (1021)

Equation (1020) is simply the standard EDM definition (734) with a centered list X as in (816); Gram matrix X^T X has been replaced with the subcompact singular value decomposition (§A.6.2)^{6.4}

    V_X V_X^T ≡ V X^T X V ∈ S_c^N ∩ S_+^N    (1022)

This means: inner product V_X^T V_X is an r×r diagonal matrix Σ of nonzero singular values.

Vector δ(V_X V_X^T) may be decomposed into complementary parts by projecting it on orthogonal subspaces 1^⊥ and R(1) : namely,

    P_{1^⊥}( δ(V_X V_X^T) ) = V δ(V_X V_X^T)    (1023)
    P_1( δ(V_X V_X^T) ) = (1/N) 11^T δ(V_X V_X^T)    (1024)

Of course

    δ(V_X V_X^T) = V δ(V_X V_X^T) + (1/N) 11^T δ(V_X V_X^T)    (1025)

by (757). Substituting this into EDM definition (1020), we get the Hayden, Wells, Liu, & Tarazaga EDM formula [134, §2]

    D(V_X , y) ≜ y1^T + 1y^T + (λ/N) 11^T − 2V_X V_X^T ∈ EDM^N    (1026)

where

    λ ≜ 2‖V_X‖_F² = 2·1^T δ(V_X V_X^T)   and   y ≜ δ(V_X V_X^T) − (λ/2N) 1 = V δ(V_X V_X^T)    (1027)

^{6.4} Subcompact SVD: V_X V_X^T = Q√Σ √Σ Q^T ≡ V X^T X V . So V_X^T is not necessarily X V (§5.5.1.0.1), although affine dimension r = rank(V_X^T) = rank(X V ) . (866)


6.5. EDM DEFINITION IN 11 T 417and y=0 if and only if 1 is an eigenvector of EDM D . Scalar λ becomesan eigenvalue when corresponding eigenvector 1 exists. 6.5Then the particular dyad sum from (1026)y1 T + 1y T + λ N 11T ∈ S N⊥c (1028)must belong to the orthogonal complement of the geometric center subspace(p.626), whereas V X V T X ∈ SN c ∩ S N + (1022) belongs to the positive semidefinitecone in the geometric center subspace.Proof. We validate eigenvector 1 and eigenvalue λ .(⇒) Suppose 1 is an eigenvector of EDM D . Then becauseit followsV T X 1 = 0 (1029)D1 = δ(V X V T X )1T 1 + 1δ(V X V T X )T 1 = N δ(V X V T X ) + ‖V X ‖ 2 F 1⇒ δ(V X V T X ) ∝ 1 (1030)For some κ∈ R +δ(V X V T X ) T 1 = N κ = tr(V T X V X ) = ‖V X ‖ 2 F ⇒ δ(V X V T X ) = 1 N ‖V X ‖ 2 F1 (1031)so y=0.(⇐) Now suppose δ(V X VX T)= λ 1 ; id est, y=0. Then2ND = λ N 11T − 2V X V T X ∈ EDM N (1032)1 is an eigenvector with corresponding eigenvalue λ . 6.5.1 Range of EDM DFromB.1.1 pertaining to linear independence of dyad sums: If the transposehalves of all the dyads in the sum (1020) 6.6 make a linearly independent set,6.5 e.g., when X = I in EDM definition (734).6.6 Identifying columns V X∆= [v1 · · · v r ] , then V X V T X = ∑ iv i v T iis also a sum of dyads.


Figure 105: Example of V_X selection to make an EDM corresponding to cardinality N = 3 and affine dimension r = 1 ; V_X is a vector in nullspace N(1^T) ⊂ R^3 . Nullspace of 1^T is hyperplane in R^3 (drawn truncated) having normal 1 . Vector δ(V_X V_X^T) may or may not be in plane spanned by {1 , V_X } , but belongs to nonnegative orthant which is strictly supported by N(1^T) .


6.5. EDM DEFINITION IN 11 T 419then the nontranspose halves constitute a basis for the range of EDM D .Saying this mathematically: For D ∈ EDM NR(D)= R([δ(V X V T X ) 1 V X ]) ⇐ rank([δ(V X V T X ) 1 V X ])= 2 + rR(D)= R([1 V X ]) ⇐ otherwise (1033)To prove this, we need that condition under which the rank equality issatisfied: We know R(V X )⊥1, but what is the relative geometric orientationof δ(V X VX T) ? δ(V X V X T)≽0 because V X V X T ≽0, and δ(V X V X T )∝1 remainspossible (1030); this means δ(V X VX T) /∈ N(1T ) simply because it has nonegative entries. (Figure 105) If the projection of δ(V X VX T) on N(1T ) doesnot belong to R(V X ), then that is a necessary and sufficient condition forlinear independence (l.i.) of δ(V X VX T) with respect to R([1 V X ]) ; id est,V δ(V X V T X ) ≠ V X a for any a∈ R r(I − 1 N 11T )δ(V X V T X ) ≠ V X aδ(V X V T X ) − 1 N ‖V X ‖ 2 F 1 ≠ V X aδ(V X V T X ) − λ2N 1 = y ≠ V X a ⇔ {1, δ(V X V T X ), V X } is l.i.(1034)On the other hand when this condition is violated (when (1027) y=V X a pfor some particular a∈ R r ), then from (1026) we haveR ( D = y1 T + 1y T + λ N 11T − 2V X V T X)= R((VX a p + λ N 1)1T + (1a T p − 2V X )V T X= R([V X a p + λ N 1 1aT p − 2V X ])= R([1 V X ]) (1035)An example of such a violation is (1032) where, in particular, a p = 0. Then a statement parallel to (1033) is, for D ∈ EDM N (Theorem 5.7.3.0.1)rank(D) = r + 2 ⇔ y /∈ R(V X ) ( ⇔ 1 T D † 1 = 0 )rank(D) = r + 1 ⇔ y ∈ R(V X ) ( ⇔ 1 T D † 1 ≠ 0 ) (1036))


Figure 106: (a) Vector V_X ∈ R^{3×1} from Figure 105 spirals in N(1^T) ⊂ R^3 decaying toward origin. (Spiral is two-dimensional in vector space R^3 .) (b) Corresponding trajectory dvec D(V_X) ⊂ dvec rel ∂EDM^3 on EDM cone relative boundary creates a vortex also decaying toward origin. There are two complete orbits on EDM cone boundary about axis of revolution for every single revolution of V_X about origin. (Vortex is three-dimensional in isometrically isomorphic R^3 .)


6.5. EDM DEFINITION IN 11 T 4216.5.2 Boundary constituents of EDM coneExpression (1020) has utility in forming the set of all EDMs correspondingto affine dimension r :{D ∈ EDM N | rank(V DV )= r }= { D(V X ) | V X ∈ R N×r , rankV X =r , V T X V X = δ2 (V T X V X ), R(V X)⊆ N(1 T ) }whereas {D ∈ EDM N | rank(V DV )≤ r} is the closure of this same set;(1037){D ∈ EDM N | rank(V DV )≤ r } = { D ∈ EDM N | rank(V DV )= r } (1038)For example,rel ∂EDM N = { D ∈ EDM N | rank(V DV )< N −1 }= N−2⋃ {D ∈ EDM N | rank(V DV )= r } (1039)r=0None of these are necessarily convex sets, althoughEDM N = N−1 ⋃ {D ∈ EDM N | rank(V DV )= r }r=0= { D ∈ EDM N | rank(V DV )= N −1 }rel int EDM N = { D ∈ EDM N | rank(V DV )= N −1 } (1040)are pointed convex cones.When cardinality N = 3 and affine dimension r = 2, for example, therelative interior rel int EDM 3 is realized via (1037). (6.6)When N = 3 and r = 1, the relative boundary of the EDM conedvec rel∂EDM 3 is realized in isomorphic R 3 as in Figure 100(d). This figurecould be constructed via (1038) by spiraling vector V X tightly about theorigin in N(1 T ) ; as can be imagined with aid of Figure 105. Vectors closeto the origin in N(1 T ) are correspondingly close to the origin in EDM N .As vector V X orbits the origin in N(1 T ) , the corresponding EDM orbitsthe axis of revolution while remaining on the boundary of the circular conedvec rel∂EDM 3 . (Figure 106)


6.5.3 Faces of EDM cone

6.5.3.0.1 Exercise. Isomorphic faces.
Prove that in high cardinality N , any set of EDMs made via (1037) or (1038) with particular affine dimension r is isomorphic with any set admitting the same affine dimension but made in lower cardinality.

6.5.3.1 Extreme direction of EDM cone

In particular, extreme directions (§2.8.1) of EDM^N correspond to affine dimension r = 1 and are simply represented: for any particular cardinality N ≥ 2 (§2.8.2) and each and every nonzero vector z in N(1^T)

    Γ ≜ (z ∘ z)1^T + 1(z ∘ z)^T − 2zz^T ∈ EDM^N
      = δ(zz^T)1^T + 1δ(zz^T)^T − 2zz^T              (1041)

is an extreme direction corresponding to a one-dimensional face of the EDM cone EDM^N that is a ray in isomorphic subspace R^{N(N−1)/2} .

Proving this would exercise the fundamental definition (156) of extreme direction. Here is a sketch: Any EDM may be represented

    D(V_X) ≜ δ(V_X V_X^T)1^T + 1δ(V_X V_X^T)^T − 2V_X V_X^T ∈ EDM^N    (1020)

where matrix V_X (1021) has orthogonal columns. For the same reason (1298) that zz^T is an extreme direction of the positive semidefinite cone (§2.9.2.4) for any particular nonzero vector z , there is no conic combination of distinct EDMs (each conically independent of Γ) equal to Γ .

6.5.3.1.1 Example. Biorthogonal expansion of an EDM.
(confer §2.13.7.1.1) When matrix D belongs to the EDM cone, nonnegative coordinates for biorthogonal expansion are the eigenvalues λ ∈ R^N of −V D V /2 : For any D ∈ S_h^N it holds


    D = δ(−V D V /2)1^T + 1δ(−V D V /2)^T − 2(−V D V /2)    (833)

By diagonalization −V D V /2 ≜ QΛQ^T ∈ S_c^N (§A.5.2) we may write

    D = δ( ∑_{i=1}^N λ_i q_i q_i^T )1^T + 1δ( ∑_{i=1}^N λ_i q_i q_i^T )^T − 2 ∑_{i=1}^N λ_i q_i q_i^T

      = ∑_{i=1}^N λ_i ( δ(q_i q_i^T)1^T + 1δ(q_i q_i^T)^T − 2q_i q_i^T )    (1042)

where q_i is the i th eigenvector of −V D V /2 arranged columnar in orthogonal matrix

    Q = [ q_1  q_2  · · ·  q_N ] ∈ R^{N×N}    (342)

and where { δ(q_i q_i^T)1^T + 1δ(q_i q_i^T)^T − 2q_i q_i^T , i = 1 ... N } are extreme directions of some pointed polyhedral cone K ⊂ S_h^N and extreme directions of EDM^N . Invertibility of (1042)

    −V D V /2 = −V ( ∑_{i=1}^N λ_i ( δ(q_i q_i^T)1^T + 1δ(q_i q_i^T)^T − 2q_i q_i^T ) ) V /2
              = ∑_{i=1}^N λ_i q_i q_i^T    (1043)

implies linear independence of those extreme directions. Then the biorthogonal expansion is expressed

    dvec D = Y Y^† dvec D = Y λ(−V D V /2)    (1044)

where

    Y ≜ [ dvec( δ(q_i q_i^T)1^T + 1δ(q_i q_i^T)^T − 2q_i q_i^T ) , i = 1 ... N ] ∈ R^{N(N−1)/2 × N}    (1045)

When D belongs to the EDM cone in the subspace of symmetric hollow matrices, unique coordinates Y^† dvec D for this biorthogonal expansion must be the nonnegative eigenvalues λ of −V D V /2 . This means D simultaneously belongs to the EDM cone and to the pointed polyhedral cone dvec K = cone(Y ) .
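Expansion (1042) is easy to corroborate numerically. The Matlab sketch below is a hedged illustration (the random point list is an assumption of convenience): it eigendecomposes −V D V /2 for a random EDM, rebuilds D from the expansion, and confirms the coordinates are nonnegative.

    % hedged check of biorthogonal expansion (1042) for a random EDM
    N = 5;
    X = randn(3,N);  G = X'*X;
    D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;   % EDM (734)
    V = eye(N) - ones(N)/N;
    [Q,L] = eig(-V*D*V/2);  lam = diag(L);
    Drebuilt = zeros(N);
    for i = 1:N
      qi = Q(:,i);  g = qi.*qi;                         % delta(q_i q_i')
      Drebuilt = Drebuilt + lam(i)*(g*ones(1,N) + ones(N,1)*g' - 2*(qi*qi'));
    end
    norm(D - Drebuilt,'fro')    % returns 0 to machine precision
    min(lam)                    % nonnegative coordinates, since D is an EDM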


424 CHAPTER 6. CONE OF DISTANCE MATRICES6.5.3.2 Smallest faceNow suppose we are given a particular EDM D(V Xp )∈ EDM N correspondingto affine dimension r and parametrized by V Xp in (1020). The EDM cone’ssmallest face that contains D(V Xp ) isF ( EDM N ∋ D(V Xp ) )= { D(V X ) | V X ∈ R N×r , rankV X =r , V T X V X = δ2 (V T X V X ), R(V X)⊆ R(V Xp ) }(1046)which is isomorphic 6.7 with the convex cone EDM r+1 , hence of dimensiondim F ( EDM N ∋ D(V Xp ) ) = (r + 1)r/2 (1047)in isomorphic R N(N−1)/2 . Not all dimensions are represented; e.g., the EDMcone has no two-dimensional faces.When cardinality N = 4 and affine dimension r=2 so that R(V Xp ) is anytwo-dimensional subspace of three-dimensional N(1 T ) in R 4 , for example,then the corresponding face of EDM 4 is isometrically isomorphic with: (1038)EDM 3 = {D ∈ EDM 3 | rank(V DV )≤ 2} ≃ F(EDM 4 ∋ D(V Xp )) (1048)Each two-dimensional subspace of N(1 T ) corresponds to anotherthree-dimensional face.Because each and every principal submatrix of an EDM in EDM N(5.14.3) is another EDM [172,4.1], for example, then each principalsubmatrix belongs to a particular face of EDM N .6.5.3.3 Open questionThis result (1047) is analogous to that for the positive semidefinite cone,although the question remains open whether all faces of EDM N (whosedimension is less than the dimension of the cone) are exposed like they arefor the positive semidefinite cone. (2.9.2.3) [263]6.7 The fact that the smallest face is isomorphic with another (perhaps smaller) EDMcone is implicit in [134,2].


6.6 Correspondence to PSD cone S_+^{N−1}

Hayden & Wells et alii [134, §2] assert one-to-one correspondence of EDMs with positive semidefinite matrices in the symmetric subspace. Because rank(V D V ) ≤ N−1 (§5.7.1.1), that positive semidefinite cone corresponding to the EDM cone can only be S_+^{N−1} . [6, §18.2.1] To clearly demonstrate this correspondence, we invoke inner-product form EDM definition

    D(Φ) ≜ [ 0 ; δ(Φ) ] 1^T + 1 [ 0  δ(Φ)^T ] − 2 [ 0   0^T
                                                    0    Φ  ] ∈ EDM^N
    ⇔                                                                     (851)
    Φ ≽ 0

Then the EDM cone may be expressed

    EDM^N = { D(Φ) | Φ ∈ S_+^{N−1} }    (1049)

Hayden & Wells’ assertion can therefore be equivalently stated in terms of an inner-product form EDM operator

    D(S_+^{N−1}) = EDM^N    (853)
    V_N(EDM^N) = S_+^{N−1}    (854)

identity (854) holding because R(V_N) = N(1^T) (741), linear functions D(Φ) and V_N(D) = −V_N^T D V_N (§5.6.2.1) being mutually inverse.

In terms of affine dimension r , Hayden & Wells claim particular correspondence between PSD and EDM cones:

r = N−1: Symmetric hollow matrices −D positive definite on N(1^T) correspond to points relatively interior to the EDM cone.

r < N−1: Symmetric hollow matrices −D positive semidefinite on N(1^T) , where −V_N^T D V_N has at least one 0 eigenvalue, correspond to points on the relative boundary of the EDM cone.

r = 1: Symmetric hollow nonnegative matrices rank-one on N(1^T) correspond to extreme directions (1041) of the EDM cone; id est, for some nonzero vector u (§A.3.1.0.7)

    rank V_N^T D V_N = 1              D ∈ EDM^N                      −V_N^T D V_N ≡ uu^T
    D ∈ S_h^N ∩ R_+^{N×N}      ⇔     D is an extreme direction   ⇔   D ∈ S_h^N              (1050)
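A hedged numerical sanity check of this correspondence: draw a random Φ ≽ 0 , map it through inner-product form (851), and confirm that V_N(D) = −V_N^T D V_N recovers Φ exactly, as mutual inversion requires. The sketch below is illustrative only.

    % hedged check: D(Phi) of (851) and V_N(D) of (854) are mutually inverse
    N  = 6;
    Y  = randn(N-1);  Phi = Y*Y';             % random Phi in the PSD cone
    dP = [0; diag(Phi)];
    D  = dP*ones(1,N) + ones(N,1)*dP' - 2*[0 zeros(1,N-1); zeros(N-1,1) Phi];
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
    norm(-VN'*D*VN - Phi, 'fro')              % returns 0 to machine precision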


6.6.0.0.1 Proof. Case r = 1 is easily proved: From the nonnegativity development in §5.8.1, extreme direction (1041), and Schoenberg criterion (753), we need show only sufficiency; id est, prove

    rank V_N^T D V_N = 1                    D ∈ EDM^N
    D ∈ S_h^N ∩ R_+^{N×N}          ⇒        D is an extreme direction

Any symmetric matrix D satisfying the rank condition must have the form, for z, q ∈ R^N and nonzero z ∈ N(1^T) ,

    D = ±(1q^T + q1^T − 2zz^T)    (1051)

because (§5.6.2.1, confer §E.7.2.0.2)

    N(V_N(D)) = { 1q^T + q1^T | q ∈ R^N } ⊆ S^N    (1052)

Hollowness demands q = δ(zz^T) while nonnegativity demands choice of positive sign in (1051). Matrix D thus takes the form of an extreme direction (1041) of the EDM cone. ■

The foregoing proof is not extensible in rank: An EDM with corresponding affine dimension r has the general form, for { z_i ∈ N(1^T) , i = 1 ... r } an independent set,

    D = 1δ( ∑_{i=1}^r z_i z_i^T )^T + δ( ∑_{i=1}^r z_i z_i^T )1^T − 2 ∑_{i=1}^r z_i z_i^T ∈ EDM^N    (1053)

The EDM so defined relies principally on the sum ∑ z_i z_i^T having positive summand coefficients (⇔ −V_N^T D V_N ≽ 0)^{6.8}. Then it is easy to find a sum incorporating negative coefficients while meeting rank, nonnegativity, and symmetric hollowness conditions but not positive semidefiniteness on subspace R(V_N) ; e.g., from page 376,

         ⎡ 0  1  1 ⎤
    −V   ⎢ 1  0  5 ⎥ V /2  =  z_1 z_1^T − z_2 z_2^T    (1054)
         ⎣ 1  5  0 ⎦

^{6.8} (⇐) For a_i ∈ R^{N−1} , let z_i = V_N^{†T} a_i .
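The r = 1 case invites a quick hedged numerical illustration: pick any nonzero z in N(1^T) , form Γ per (1041), and observe that −V_N^T Γ V_N has rank 1 , consistent with correspondence (1050). The construction of z via the centering matrix is an assumption of convenience.

    % hedged illustration: extreme direction (1041) has affine dimension r = 1
    N  = 5;
    z  = (eye(N) - ones(N)/N)*randn(N,1);       % any nonzero z in N(1')
    G  = z*z';
    Gamma = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;   % (1041)
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
    rank(-VN'*Gamma*VN)                         % returns 1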


6.6. CORRESPONDENCE TO PSD CONE S N−1+ 4276.6.0.0.2 Example. ⎡ Extreme ⎤ rays versus rays on the boundary.0 1 4The EDM D = ⎣ 1 0 1 ⎦ is an extreme direction of EDM 3 where[ ]4 1 01u = in (1050). Because −VN 2TDV N has eigenvalues {0, 5}, the raywhose direction is D also lies on[ the relative ] boundary of EDM 3 .0 1In exception, EDM D = κ , for any particular κ > 0, is an1 0extreme direction of EDM 2 but −VN TDV N has only one eigenvalue: {κ}.Because EDM 2 is a ray whose relative boundary (2.6.1.3.1) is the origin,this conventional boundary does not include D which belongs to the relativeinterior in this dimension. (2.7.0.0.1)6.6.1 Gram-form correspondence to S N−1+With respect to D(G)=δ(G)1 T + 1δ(G) T − 2G (746) the linear Gram-formEDM operator, results in5.6.1 provide [1,2.6]EDM N = D ( V(EDM N ) ) ≡ D ( )V N S N−1+ VNT(1055)V N S N−1+ VN T ≡ V ( D ( ))V N S N−1+ VN T = V(EDM N ) = ∆ −V EDM N V 1 = 2 SN c ∩ S N +(1056)a one-to-one correspondence between EDM N and S N−1+ .6.6.2 EDM cone by elliptope(confer5.10.1) Defining the elliptope parametrized by scalar t>0then following Alfakih [7] we haveE N t = S N + ∩ {Φ∈ S N | δ(Φ)=t1} (930)EDM N = cone{11 T − E N 1 } = {t(11 T − E N 1 ) | t ≥ 0} (1057)Identification E N = E N 1 equates the standard elliptope (5.9.1.0.1, Figure 92)to our parametrized elliptope.


    EDM^N = cone{11^T − E^N} = { t(11^T − E^N) | t ≥ 0 }    (1057)

Figure 107: Three views of translated negated elliptope 11^T − E_1^3 (confer Figure 92) shrouded by truncated EDM cone. Fractal on EDM cone relative boundary is numerical artifact belonging to intersection with elliptope relative boundary. The fractal is trying to convey existence of a neighborhood about the origin where the translated elliptope boundary and EDM cone boundary intersect.


6.6. CORRESPONDENCE TO PSD CONE S N−1+ 4296.6.2.0.1 Expository. Define T E (11 T ) to be the tangent cone to theelliptope E at point 11 T ; id est,T E (11 T ) ∆ = {t(E − 11 T ) | t≥0} (1058)The normal cone K ⊥ E (11T ) to the elliptope at 11 T is a closed convex conedefined (E.10.3.2.1, Figure 135)K ⊥ E (11 T ) ∆ = {B | 〈B , Φ − 11 T 〉 ≤ 0, Φ∈ E } (1059)The polar cone of any set K is the closed convex cone (confer (259))K ◦ ∆ = {B | 〈B , A〉≤0, for all A∈ K} (1060)The normal cone is well known to be the polar of the tangent cone,and vice versa; [148,A.5.2.4]K ⊥ E (11 T ) = T E (11 T ) ◦ (1061)K ⊥ E (11 T ) ◦ = T E (11 T ) (1062)From Deza & Laurent [77, p.535] we have the EDM coneEDM = −T E (11 T ) (1063)The polar EDM cone is also expressible in terms of the elliptope. From(1061) we haveEDM ◦ = −K ⊥ E (11 T ) (1064)⋆In5.10.1 we proposed the expression for EDM DD = t11 T − E ∈ EDM N (931)where t∈ R + and E belongs to the parametrized elliptope E N t . We furtherpropose, for any particular t>0Proof. Pending.EDM N = cone{t11 T − E N t } (1065)Relationship of the translated negated elliptope with the EDM cone isillustrated in Figure 107. We speculateEDM N = limt→∞t11 T − E N t (1066)


430 CHAPTER 6. CONE OF DISTANCE MATRICES6.7 Vectorization & projection interpretationInE.7.2.0.2 we learn: −V DV can be interpreted as orthogonal projection[4,2] of vectorized −D ∈ S N h on the subspace of geometrically centeredsymmetric matricesS N c = {G∈ S N | G1 = 0} (1792)= {G∈ S N | N(G) ⊇ 1} = {G∈ S N | R(G) ⊆ N(1 T )}= {V Y V | Y ∈ S N } ⊂ S N (1793)≡ {V N AVN T | A ∈ SN−1 }(824)because elementary auxiliary matrix V is an orthogonal projector (B.4.1).Yet there is another useful projection interpretation:Revising the fundamental matrix criterion for membership to the EDMcone (729), 6.9〈zz T , −D〉 ≥ 0 ∀zz T | 11 T zz T = 0D ∈ S N h}⇔ D ∈ EDM N (1067)this is equivalent, of course, to the Schoenberg criterion−VN TDV }N ≽ 0⇔ D ∈ EDM N (753)D ∈ S N hbecause N(11 T ) = R(V N ). When D ∈ EDM N , correspondence (1067)means −z T Dz is proportional to a nonnegative coefficient of orthogonalprojection (E.6.4.2, Figure 109) of −D in isometrically isomorphicR N(N+1)/2 on the range of each and every vectorized (2.2.2.1) symmetricdyad (B.1) in the nullspace of 11 T ; id est, on each and every member ofT ∆ = { svec(zz T ) | z ∈ N(11 T )= R(V N ) } ⊂ svec ∂ S N += { svec(V N υυ T V T N ) | υ ∈ RN−1} (1068)whose dimension isdim T = N(N − 1)/2 (1069)6.9 N(11 T )= N(1 T ) and R(zz T )= R(z)


6.7. VECTORIZATION & PROJECTION INTERPRETATION 431The set of all symmetric dyads {zz T |z∈R N } constitute the extremedirections of the positive semidefinite cone (2.8.1,2.9) S N + , hence lie onits boundary. Yet only those dyads in R(V N ) are included in the test (1067),thus only a subset T of all vectorized extreme directions of S N + is observed.In the particularly simple case D ∈ EDM 2 = {D ∈ S 2 h | d 12 ≥ 0} , forexample, only one extreme direction of the PSD cone is involved:zz T =[1 −1−1 1](1070)Any nonnegative scaling of vectorized zz T belongs to the set T illustratedin Figure 108 and Figure 109.6.7.1 Face of PSD cone S N + containing VIn any case, set T (1068) constitutes the vectorized extreme directions ofan N(N −1)/2-dimensional face of the PSD cone S N + containing auxiliarymatrix V ; a face isomorphic with S N−1+ = S rank V+ (2.9.2.3).To show this, we must first find the smallest face that contains auxiliarymatrix V and then determine its extreme directions. From (191),F ( S N + ∋V ) = {W ∈ S N + | N(W) ⊇ N(V )} = {W ∈ S N + | N(W) ⊇ 1}= {V Y V ≽ 0 | Y ∈ S N } ≡ {V N BV T N | B ∈ SN−1 + }≃ S rank V+ = −V T N EDMN V N (1071)where the equivalence ≡ is from5.6.1 while isomorphic equality ≃ withtransformed EDM cone is from (854). Projector V belongs to F ( S N + ∋V )because V N V † N V †TN V N T = V . (B.4.3) Each and every rank-one matrixbelonging to this face is therefore of the form:V N υυ T V T N | υ ∈ R N−1 (1072)Because F ( S N + ∋V ) is isomorphic with a positive semidefinite cone S N−1+ ,then T constitutes the vectorized extreme directions of F , the originconstitutes the extreme points of F , and auxiliary matrix V is some convexcombination of those extreme points and directions by the extremes theorem(2.8.1.1.1).


    T ≜ { svec(zz^T) | z ∈ N(11^T) = κ[1 ; −1] , κ ∈ R } ⊂ svec ∂S_+^2

Figure 108: Truncated boundary of positive semidefinite cone S_+^2 in isometrically isomorphic R^3 (via svec (47), coordinates d_11 , √2 d_12 , d_22) is, in this dimension, constituted solely by its extreme directions. Truncated cone of Euclidean distance matrices EDM^2 in isometrically isomorphic subspace R . Relative boundary of EDM cone is constituted solely by matrix 0 . Halfline T = { κ² [ 1  −√2  1 ]^T | κ ∈ R } on PSD cone boundary depicts that lone extreme ray (1070) on which orthogonal projection of −D must be positive semidefinite if D is to belong to EDM^2 . aff cone T = svec S_c^2 . (1075) Dual EDM cone is halfspace in R^3 whose bounding hyperplane has inward-normal svec EDM^2 .


Projection of vectorized −D on range of vectorized zz^T :

    P_{svec zz^T}( svec(−D) ) = ( ⟨zz^T , −D⟩ / ⟨zz^T , zz^T⟩ ) zz^T

    D ∈ EDM^N   ⇔   { ⟨zz^T , −D⟩ ≥ 0  ∀ zz^T | 11^T zz^T = 0
                       D ∈ S_h^N                                   (1067)

Figure 109: Given-matrix D is assumed to belong to symmetric hollow subspace S_h^2 ; a line in this dimension. Negating D , we find its polar along S_h^2 . Set T (1068) has only one ray member in this dimension; not orthogonal to S_h^2 . Orthogonal projection of −D on T (indicated by half dot) has nonnegative projection coefficient. Matrix D must therefore be an EDM.


In fact, the smallest face that contains auxiliary matrix V of the PSD cone S_+^N is the intersection with the geometric center subspace (1792) (1793);

    F( S_+^N ∋ V ) = cone{ V_N υυ^T V_N^T | υ ∈ R^{N−1} }
                   = S_c^N ∩ S_+^N                            (1073)

In isometrically isomorphic R^{N(N+1)/2}

    svec F( S_+^N ∋ V ) = cone T    (1074)

related to S_c^N by

    aff cone T = svec S_c^N    (1075)

6.7.2 EDM criteria in 11^T

(confer §6.5) Laurent specifies an elliptope trajectory condition for EDM cone membership: [172, §2.3]

    D ∈ EDM^N ⇔ [1 − e^{−α d_ij}] ∈ EDM^N  ∀ α > 0    (925)

From the parametrized elliptope E_t^N in §6.6.2 we propose

    D ∈ EDM^N ⇔ ∃ t ∈ R_+ , E ∈ E_t^N :  D = t11^T − E    (1076)

Chabrillac & Crouzeix [53, §4] prove a different criterion they attribute to Finsler (1937) [97]. We apply it to EDMs: for D ∈ S_h^N (874)

    −V_N^T D V_N ≻ 0 ⇔ ∃ κ > 0 : −D + κ11^T ≻ 0
    ⇔
    D ∈ EDM^N with corresponding affine dimension r = N−1    (1077)

This Finsler criterion has geometric interpretation in terms of the vectorization & projection already discussed in connection with (1067). With reference to Figure 108, the offset 11^T is simply a direction orthogonal to T in isomorphic R^3 . Intuitively, translation of −D in direction 11^T is like orthogonal projection on T in so far as similar information can be obtained.
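A hedged numerical illustration of Finsler criterion (1077): for a generic list of N points spanning R^{N−1} (so r = N−1), the minimum eigenvalue of −D + κ11^T turns positive once κ is large enough. The particular values of κ sampled below are illustrative assumptions.

    % hedged illustration of the Finsler criterion (1077)
    N = 5;
    X = randn(N-1,N);                                  % generic list, r = N-1
    G = X'*X;
    D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;  % EDM (734)
    for kappa = [0 1 10 100 1000]
      fprintf('kappa = %5g   min eig = %g\n', kappa, min(eig(-D + kappa*ones(N))));
    end                                                % eventually positive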


6.8. DUAL EDM CONE 435When the Finsler criterion (1077) is applied despite lower affinedimension, the constant κ can go to infinity making the test −D+κ11 T ≽ 0impractical for numerical computation. Chabrillac & Crouzeix invent acriterion for the semidefinite case, but is no more practical: for D ∈ S N hD ∈ EDM N⇔(1078)∃κ p >0 ∀κ≥κ p , −D − κ11 T [sic] has exactly one negative eigenvalue6.8 Dual EDM cone6.8.1 Ambient S NWe consider finding the ordinary dual EDM cone in ambient space S N whereEDM N is pointed, closed, convex, but has empty interior. The set of all EDMsin S N is a closed convex cone because it is the intersection of halfspaces aboutthe origin in vectorized variable D (2.4.1.1.1,2.7.2):EDM N = ⋂ {D ∈ S N | 〈e i e T i , D〉 ≥ 0, 〈e i e T i , D〉 ≤ 0, 〈zz T , −D〉 ≥ 0 }z∈ N(1 T )i=1...N(1079)By definition (259), dual cone K ∗comprises each and every vectorinward-normal to a hyperplane supporting convex cone K (2.4.2.6.1) orbounding a halfspace containing K . The dual EDM cone in the ambientspace of symmetric matrices is therefore expressible as the aggregate of everyconic combination of inward-normals from (1079):EDM N∗ = cone{e i e T i , −e j e T j | i, j =1... N } − cone{zz T | 11 T zz T =0}∑= { N ∑ζ i e i e T i − N ξ j e j e T j | ζ i ,ξ j ≥ 0} − cone{zz T | 11 T zz T =0}i=1j=1= {δ(u) | u∈ R N } − cone { V N υυ T V T N | υ ∈ RN−1 , (‖v‖= 1) } ⊂ S N= {δ 2 (Y ) − V N ΨV T N | Y ∈ SN , Ψ∈ S N−1+ } (1080)The EDM cone is not self-dual in ambient S N because its affine hull belongsto a proper subspaceaff EDM N = S N h (1081)


436 CHAPTER 6. CONE OF DISTANCE MATRICESThe ordinary dual EDM cone cannot, therefore, be pointed. (2.13.1.1)When N = 1, the EDM cone is the point at the origin in R . Auxiliarymatrix V N is empty [ ∅ ] , and dual cone EDM ∗ is the real line.When N = 2, the EDM cone is a nonnegative real line in isometricallyisomorphic R 3 ; there S 2 h is a real line containing the EDM cone. Dual coneEDM 2∗ is the particular halfspace in R 3 whose boundary has inward-normalEDM 2 . Diagonal matrices {δ(u)} in (1080) are represented by a hyperplanethrough the origin {d | [ 0 1 0 ]d = 0} while the term cone{V N υυ T V T N }is represented by the halfline T in Figure 108 belonging to the positivesemidefinite cone boundary. The dual EDM cone is formed by translatingthe hyperplane along the negative semidefinite halfline −T ; the union ofeach and every translation. (confer2.10.2.0.1)When cardinality N exceeds 2, the dual EDM cone can no longer bepolyhedral simply because the EDM cone cannot. (2.13.1.1)6.8.1.1 EDM cone and its dual in ambient S NConsider the two convex conessoK ∆ 1 = S N hK ∆ 2 =⋂{A ∈ S N | 〈yy T , −A〉 ≥ 0 }y∈N(1 T )= { A ∈ S N | −z T V AV z ≥ 0 ∀zz T (≽ 0) } (1082)= { A ∈ S N | −V AV ≽ 0 }K 1 ∩ K 2 = EDM N (1083)The dual cone K ∗ 1 = S N⊥h ⊆ S N (63) is the subspace of diagonal matrices.From (1080) via (273),K ∗ 2 = − cone { V N υυ T V T N | υ ∈ R N−1} ⊂ S N (1084)Gaffke & Mathar [99,5.3] observe that projection on K 1 and K 2 havesimple closed forms: Projection on subspace K 1 is easily performed bysymmetrization and zeroing the main diagonal or vice versa, while projectionof H ∈ S N on K 2 isP K2 H = H − P S N+(V H V ) (1085)


6.8. DUAL EDM CONE 437Proof. First, we observe membership of H−P S N+(V H V ) to K 2 because()P S N+(V H V ) − H = P S N+(V H V ) − V H V + (V H V − H) (1086)The term P S N+(V H V ) − V H V necessarily belongs to the (dual) positivesemidefinite cone by Theorem E.9.2.0.1. V 2 = V , hence()−V H −P S N+(V H V ) V ≽ 0 (1087)by Corollary A.3.1.0.5.Next, we requireExpanding,〈P K2 H −H , P K2 H 〉 = 0 (1088)〈−P S N+(V H V ) , H −P S N+(V H V )〉 = 0 (1089)〈P S N+(V H V ) , (P S N+(V H V ) − V H V ) + (V H V − H)〉 = 0 (1090)〈P S N+(V H V ) , (V H V − H)〉 = 0 (1091)Product V H V belongs to the geometric center subspace; (E.7.2.0.2)V H V ∈ S N c = {Y ∈ S N | N(Y )⊇1} (1092)Diagonalize V H V ∆ =QΛQ T (A.5) whose nullspace is spanned bythe eigenvectors corresponding to 0 eigenvalues by Theorem A.7.3.0.1.Projection of V H V on the PSD cone (7.1) simply zeros negative eigenvaluesin diagonal matrix Λ . Thenfrom which it follows:N(P S N+(V H V )) ⊇ N(V H V ) (⊇ N(V )) (1093)P S N+(V H V ) ∈ S N c (1094)so P S N+(V H V ) ⊥ (V H V −H) because V H V −H ∈ S N⊥c .Finally, we must have P K2 H −H =−P S N+(V H V )∈ K ∗ 2 .From6.7.1we know dual cone K ∗ 2 =−F ( S N + ∋V ) is the negative of the positivesemidefinite cone’s smallest face that contains auxiliary matrix V . ThusP S N+(V H V )∈ F ( S N + ∋V ) ⇔ N(P S N+(V H V ))⊇ N(V ) (2.9.2.3) which wasalready established in (1093).
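The projection formula just proved lends itself to a hedged numerical check: for random symmetric H , compute P_{K_2}H = H − P_{S_+^N}(V H V ) per (1085) and verify the orthogonality condition ⟨P_{K_2}H − H , P_{K_2}H⟩ = 0 required at (1088). The sketch below is illustrative only.

    % hedged check of projection formula (1085) and orthogonality (1088)
    N = 6;
    H = randn(N);  H = (H + H')/2;          % random symmetric H
    V = eye(N) - ones(N)/N;                 % geometric centering matrix
    [Q,L] = eig(V*H*V);                     % V H V = projection of H on S_c^N
    Ppsd = Q*max(L,0)*Q';                   % zero negative eigenvalues (7.1)
    PK2H = H - Ppsd;                        % P_{K2} H per (1085)
    trace((PK2H - H)'*PK2H)                 % returns 0 to machine precision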


    EDM² = S²_h ∩ ( S²⊥_c − S²_+ )

Figure 110: Orthogonal complement S²⊥_c (1794) (B.2) of geometric center subspace (a plane in isometrically isomorphic R³ ; drawn is a tiled fragment) apparently supporting positive semidefinite cone. (Rounded vertex is artifact of plot.) Line svec S²_c = aff cone T (1075) intersects svec ∂S²_+ , also drawn in Figure 108; it runs along PSD cone boundary. (confer Figure 91) In-figure labels: svec S²⊥_c , svec S²_c , svec ∂S²_+ , svec S²_h , 0 .


    EDM² = S²_h ∩ ( S²⊥_c − S²_+ )

Figure 111: EDM cone construction in isometrically isomorphic R³ by adding polar PSD cone to svec S²⊥_c . Difference svec( S²⊥_c − S²_+ ) is halfspace partially bounded by svec S²⊥_c . EDM cone is nonnegative halfline along svec S²_h in this dimension. In-figure labels: svec S²⊥_c , svec S²_c , svec S²_h , 0 , −svec ∂S²_+ .


From the results in E.7.2.0.2, we know matrix product V H V is the orthogonal projection of H ∈ S^N on the geometric center subspace S^N_c . Thus the projection product

    P_{K_2} H = H − P_{S^N_+} P_{S^N_c} H      (1095)

6.8.1.1.1 Lemma. Projection on PSD cone ∩ geometric center subspace.

    P_{S^N_+ ∩ S^N_c} = P_{S^N_+} P_{S^N_c}      (1096)
⋄

Proof. For each and every H ∈ S^N , projection of P_{S^N_c} H on the positive semidefinite cone remains in the geometric center subspace

    S^N_c = { G ∈ S^N | G1 = 0 }      (1792)
          = { G ∈ S^N | N(G) ⊇ 1 } = { G ∈ S^N | R(G) ⊆ N(1^T) }      (1793)
          = { V Y V | Y ∈ S^N } ⊂ S^N      (824)

That is because: eigenvectors of P_{S^N_c} H corresponding to 0 eigenvalues span its nullspace N( P_{S^N_c} H ) (A.7.3.0.1); to project P_{S^N_c} H on the positive semidefinite cone, its negative eigenvalues are zeroed (7.1.2); the nullspace is thereby expanded while eigenvectors originally spanning N( P_{S^N_c} H ) remain intact. Because the geometric center subspace is invariant to projection on the PSD cone, the rule for projection on a convex set in a subspace governs (E.9.5, projectors do not commute) and statement (1096) follows directly. ∎

From the lemma it follows

    { P_{S^N_+} P_{S^N_c} H | H ∈ S^N } = { P_{S^N_+ ∩ S^N_c} H | H ∈ S^N }      (1097)

Then from (1819)

    −( S^N_c ∩ S^N_+ )^* = { H − P_{S^N_+} P_{S^N_c} H | H ∈ S^N }      (1098)

From (273) we get closure of a vector sum

    K_2 = −( S^N_c ∩ S^N_+ )^* = S^{N⊥}_c − S^N_+      (1099)
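The invariance at the heart of this proof is easy to observe numerically. A small Matlab illustration (not from the book; a generic random H is assumed):

    % projection of P_{Sc}H = V H V on the PSD cone stays in S^N_c
    N = 6;
    H = randn(N);  H = (H + H')/2;
    V = eye(N) - ones(N)/N;           % P_{S^N_c}H = V H V  (E.7.2.0.2)
    [Q,L] = eig(V*H*V);
    P = Q*max(L,0)*Q';                % zero negative eigenvalues (7.1.2)
    norm(P*ones(N,1))                 % ~ 0 : P annihilates 1, so P ∈ S^N_c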


Therefore the new equality

    EDM^N = K_1 ∩ K_2 = S^N_h ∩ ( S^{N⊥}_c − S^N_+ )      (1100)

whose veracity is intuitively evident, in hindsight, [61, p.109] from the most fundamental EDM definition (734). Formula (1100) is not a matrix criterion for membership to the EDM cone, and it is not an equivalence between EDM operators. Rather, it is a recipe for constructing the EDM cone whole from large Euclidean bodies: the positive semidefinite cone, orthogonal complement of the geometric center subspace, and symmetric hollow subspace. A realization of this construction in low dimension is illustrated in Figure 110 and Figure 111.

The dual EDM cone follows directly from (1100) by standard properties of cones (2.13.1.1):

    EDM^{N*} = K_1^* + K_2^* = S^{N⊥}_h − S^N_c ∩ S^N_+      (1101)

which bears strong resemblance to (1080).

6.8.1.2 nonnegative orthant

That EDM^N is a proper subset of the nonnegative orthant is not obvious from (1100). We wish to verify

    EDM^N = S^N_h ∩ ( S^{N⊥}_c − S^N_+ ) ⊂ R^{N×N}_+      (1102)

While there are many ways to prove this, it is sufficient to show that all entries of the extreme directions of EDM^N must be nonnegative; id est, for any particular nonzero vector z = [ z_i , i = 1...N ] ∈ N(1^T) (6.5.3.1),

    δ(zz^T)1^T + 1δ(zz^T)^T − 2zz^T ≥ 0      (1103)

where the inequality denotes entrywise comparison. The inequality holds because the i,j-th entry of an extreme direction is squared: (z_i − z_j)² .

We observe that the dyad 2zz^T ∈ S^N_+ belongs to the positive semidefinite cone, the doublet

    δ(zz^T)1^T + 1δ(zz^T)^T ∈ S^{N⊥}_c      (1104)

to the orthogonal complement (1794) of the geometric center subspace, while their difference is a member of the symmetric hollow subspace S^N_h .

Here is an algebraic method provided by Trosset to prove nonnegativity: Suppose we are given A ∈ S^{N⊥}_c and B = [B_ij] ∈ S^N_+ and A − B ∈ S^N_h . Then we have, for some vector u , A = u1^T + 1u^T = [A_ij] = [u_i + u_j] and δ(B) = δ(A) = 2u . Positive semidefiniteness of B requires nonnegativity A − B ≥ 0 because

    (e_i − e_j)^T B (e_i − e_j) = (B_ii − B_ij) − (B_ji − B_jj) = 2(u_i + u_j) − 2B_ij ≥ 0      (1105)


6.8.1.3 Dual Euclidean distance matrix criterion

Conditions necessary for membership of a matrix D^* ∈ S^N to the dual EDM cone EDM^{N*} may be derived from (1080): D^* ∈ EDM^{N*} ⇒ D^* = δ(y) − V_N A V_N^T for some vector y and positive semidefinite matrix A ∈ S^{N−1}_+ . This in turn implies δ(D^* 1) = δ(y) . Then, for D^* ∈ S^N

    D^* ∈ EDM^{N*} ⇔ δ(D^* 1) − D^* ≽ 0      (1106)

where, for any symmetric matrix D^* ,

    δ(D^* 1) − D^* ∈ S^N_c      (1107)

To show sufficiency of the matrix criterion in (1106), recall Gram-form EDM operator

    D(G) = δ(G)1^T + 1δ(G)^T − 2G      (746)

where Gram matrix G is positive semidefinite by definition, and recall the self-adjointness property of the main-diagonal linear operator δ (A.1):

    ⟨ D , D^* ⟩ = ⟨ δ(G)1^T + 1δ(G)^T − 2G , D^* ⟩ = 2 ⟨ G , δ(D^* 1) − D^* ⟩      (765)

Assuming ⟨ G , δ(D^* 1) − D^* ⟩ ≥ 0 (1311), then we have known membership relation (2.13.2.0.1)

    D^* ∈ EDM^{N*} ⇔ ⟨ D , D^* ⟩ ≥ 0 ∀ D ∈ EDM^N      (1108)

Elegance of this matrix criterion (1106) for membership to the dual EDM cone is the lack of any other assumptions except that D^* be symmetric. (Recall: the Schoenberg criterion (753) for membership to the EDM cone requires membership to the symmetric hollow subspace.)

Linear Gram-form EDM operator (746) has adjoint, for Y ∈ S^N

    D^T(Y) ≜ 2 ( δ(Y 1) − Y )      (1109)

Then we have: [61, p.111]

    EDM^{N*} = { Y ∈ S^N | δ(Y 1) − Y ≽ 0 }      (1110)

the dual EDM cone expressed in terms of the adjoint operator. A dual EDM cone determined this way is illustrated in Figure 113.
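Criterion (1106) reduces dual-cone membership to a single eigenvalue test; a hedged Matlab sketch follows (the function name and tolerance are hypothetical, not from the book):

    % (1106): Dstar ∈ EDM^{N*}  ⇔  δ(Dstar·1) − Dstar ≽ 0
    function tf = isDualEDM(Dstar, tol)
      if nargin < 2, tol = 1e-10; end
      M = diag(Dstar*ones(size(Dstar,1),1)) - Dstar;   % δ(D*1) − D*
      tf = min(eig((M + M')/2)) >= -tol;               % PSD test up to tolerance
    end

Only symmetry of Dstar is assumed, mirroring the criterion's lone hypothesis.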


6.8.1.3.1 Exercise. Dual EDM spectral cone.
Find a spectral cone as in 5.11.2 corresponding to EDM^{N*} .

6.8.1.4 Nonorthogonal components of dual EDM

Now we tie construct (1101) for the dual EDM cone together with the matrix criterion (1106) for dual EDM cone membership. For any D^* ∈ S^N it is obvious:

    δ(D^* 1) ∈ S^{N⊥}_h      (1111)

any diagonal matrix belongs to the subspace of diagonal matrices (58). We know when D^* ∈ EDM^{N*}

    δ(D^* 1) − D^* ∈ S^N_c ∩ S^N_+      (1112)

this adjoint expression (1109) belongs to that face (1073) of the positive semidefinite cone S^N_+ in the geometric center subspace. Any nonzero dual EDM

    D^* = δ(D^* 1) − ( δ(D^* 1) − D^* ) ∈ S^{N⊥}_h ⊖ S^N_c ∩ S^N_+ = EDM^{N*}      (1113)

can therefore be expressed as the difference of two linearly independent nonorthogonal components (Figure 91, Figure 112).

6.8.1.5 Affine dimension complementarity

From 6.8.1.3 we have, for some A ∈ S^{N−1}_+ (confer (1112))

    δ(D^* 1) − D^* = V_N A V_N^T ∈ S^N_c ∩ S^N_+      (1114)

if and only if D^* belongs to the dual EDM cone. Call rank( V_N A V_N^T ) dual affine dimension. Empirically, we find a complementary relationship in affine dimension between the projection of some arbitrary symmetric matrix H on the polar EDM cone, EDM^{N◦} = −EDM^{N*} , and its projection on the EDM cone; id est, the optimal solution of^6.10

    minimize_{D◦ ∈ S^N}  ‖ D◦ − H ‖_F
    subject to            D◦ − δ(D◦ 1) ≽ 0      (1115)


    D◦ = δ(D◦ 1) + ( D◦ − δ(D◦ 1) ) ∈ S^{N⊥}_h ⊕ S^N_c ∩ S^N_+ = EDM^{N◦}

Figure 112: Hand-drawn abstraction of polar EDM cone (drawn truncated). Any member D◦ of polar EDM cone can be decomposed into two linearly independent nonorthogonal components: δ(D◦ 1) and D◦ − δ(D◦ 1) . In-figure labels: S^N_c ∩ S^N_+ , EDM^{N◦} , D◦ − δ(D◦ 1) , D◦ , δ(D◦ 1) , 0 , S^{N⊥}_h .


has dual affine dimension complementary to affine dimension corresponding to the optimal solution of

    minimize_{D ∈ S^N_h}  ‖ D − H ‖_F
    subject to            −V_N^T D V_N ≽ 0      (1116)

Precisely,

    rank( D◦^⋆ − δ(D◦^⋆ 1) ) + rank( V_N^T D^⋆ V_N ) = N − 1      (1117)

and rank( D◦^⋆ − δ(D◦^⋆ 1) ) ≤ N − 1 because vector 1 is always in the nullspace of rank's argument. This is similar to the known result for projection on the self-dual positive semidefinite cone and its polar:

    rank P_{−S^N_+} H + rank P_{S^N_+} H = N      (1118)

When low affine dimension is a desirable result of projection on the EDM cone, projection on the polar EDM cone should be performed instead. Convex polar problem (1115) can be solved for D◦^⋆ by transforming to an equivalent Schur-form semidefinite program (3.1.7.2). Interior-point methods for numerically solving semidefinite programs tend to produce high-rank solutions. (4.1.1) Then D^⋆ = H − D◦^⋆ ∈ EDM^N by Corollary E.9.2.2.1, and D^⋆ will tend to have low affine dimension. This approach breaks when attempting projection on a cone subset discriminated by affine dimension or rank, because then we have no complementarity relation like (1117) or (1118). (7.1.4.1)

6.10 This dual projection can be solved quickly (without semidefinite programming) via Lemma 6.8.1.1.1; rewriting,

    minimize_{D◦ ∈ S^N}  ‖ ( D◦ − δ(D◦ 1) ) − ( H − δ(D◦ 1) ) ‖_F
    subject to            D◦ − δ(D◦ 1) ≽ 0

which is the projection of affinely transformed optimal solution H − δ(D◦^⋆ 1) on S^N_c ∩ S^N_+ ;

    D◦^⋆ − δ(D◦^⋆ 1) = P_{S^N_+} P_{S^N_c} ( H − δ(D◦^⋆ 1) )

Foreknowledge of an optimal solution D◦^⋆ as argument to projection suggests recursion.
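The PSD-cone analogue (1118) is easy to confirm numerically: for generic symmetric H (almost surely no zero eigenvalues) the eigenspaces of the two projections split exactly. A small Matlab illustration, not from the book:

    % (1118): rank P_{-S+}H + rank P_{S+}H = N
    N = 7;
    H = randn(N);  H = (H + H')/2;
    lam = eig(H);
    nnz(lam > 0) + nnz(lam < 0)      % = N with probability 1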


6.8.1.6 EDM cone duality

In 5.6.1.1, via Gram-form EDM operator

    D(G) = δ(G)1^T + 1δ(G)^T − 2G ∈ EDM^N ⇐ G ≽ 0      (746)

we established clear connection between the EDM cone and that face (1073) of positive semidefinite cone S^N_+ in the geometric center subspace:

    EDM^N = D( S^N_c ∩ S^N_+ )      (841)
    V( EDM^N ) = S^N_c ∩ S^N_+      (842)

where

    V(D) = −½ V D V      (830)

In 5.6.1 we established

    S^N_c ∩ S^N_+ = V_N S^{N−1}_+ V_N^T      (828)

Then from (1106), (1114), and (1080) we can deduce

    δ( EDM^{N*} 1 ) − EDM^{N*} = V_N S^{N−1}_+ V_N^T = S^N_c ∩ S^N_+      (1119)

which, by (841) and (842), means the EDM cone can be related to the dual EDM cone by an equality:

    EDM^N = D( δ( EDM^{N*} 1 ) − EDM^{N*} )      (1120)
    V( EDM^N ) = δ( EDM^{N*} 1 ) − EDM^{N*}      (1121)

This means projection −V( EDM^N ) of the EDM cone on the geometric center subspace S^N_c (E.7.2.0.2) is an affine transformation of the dual EDM cone: EDM^{N*} − δ( EDM^{N*} 1 ) . Secondarily, it means the EDM cone is not self-dual.


6.8.1.7 Schoenberg criterion is discretized membership relation

We show the Schoenberg criterion

    −V_N^T D V_N ∈ S^{N−1}_+
    D ∈ S^N_h
    ⇔ D ∈ EDM^N      (753)

to be a discretized membership relation (2.13.4) between a closed convex cone K and its dual K^* like

    ⟨ y , x ⟩ ≥ 0 for all y ∈ G(K^*) ⇔ x ∈ K      (318)

where G(K^*) is any set of generators whose conic hull constructs closed convex dual cone K^* :

The Schoenberg criterion is the same as

    ⟨ zz^T , −D ⟩ ≥ 0 ∀ zz^T | 11^T zz^T = 0
    D ∈ S^N_h
    ⇔ D ∈ EDM^N      (1067)

which, by (1068), is the same as

    ⟨ zz^T , −D ⟩ ≥ 0 ∀ zz^T ∈ { V_N υυ^T V_N^T | υ ∈ R^{N−1} }
    D ∈ S^N_h
    ⇔ D ∈ EDM^N      (1122)

where the zz^T constitute a set of generators G for the positive semidefinite cone's smallest face F( S^N_+ ∋ V ) (6.7.1) that contains auxiliary matrix V .

From the aggregate in (1080) we get the ordinary membership relation, assuming only D ∈ S^N [148, p.58],

    ⟨ D^* , D ⟩ ≥ 0 ∀ D^* ∈ EDM^{N*} ⇔ D ∈ EDM^N
    ⟨ D^* , D ⟩ ≥ 0 ∀ D^* ∈ { δ(u) | u ∈ R^N } − cone{ V_N υυ^T V_N^T | υ ∈ R^{N−1} } ⇔ D ∈ EDM^N      (1123)

Discretization yields (318):

    ⟨ D^* , D ⟩ ≥ 0 ∀ D^* ∈ { e_i e_i^T , −e_j e_j^T , −V_N υυ^T V_N^T | i, j = 1...N , υ ∈ R^{N−1} } ⇔ D ∈ EDM^N      (1124)


Because ⟨ { δ(u) | u ∈ R^N } , D ⟩ ≥ 0 ⇔ D ∈ S^N_h , we can restrict observation to the symmetric hollow subspace without loss of generality. Then for D ∈ S^N_h

    ⟨ D^* , D ⟩ ≥ 0 ∀ D^* ∈ { −V_N υυ^T V_N^T | υ ∈ R^{N−1} } ⇔ D ∈ EDM^N      (1125)

this discretized membership relation becomes (1122); identical to the Schoenberg criterion. Hitherto a correspondence between the EDM cone and a face of a PSD cone, the Schoenberg criterion is now accurately interpreted as a discretized membership relation between the EDM cone and its ordinary dual.

6.8.2 Ambient S^N_h

When instead we consider the ambient space of symmetric hollow matrices (1081), then still we find the EDM cone is not self-dual for N > 2. The simplest way to prove this is as follows:

Given a set of generators G = {Γ} (1041) for the pointed closed convex EDM cone, the discrete membership theorem in 2.13.4.2.1 asserts that members of the dual EDM cone in the ambient space of symmetric hollow matrices can be discerned via discretized membership relation:

    EDM^{N*} ∩ S^N_h ≜ { D^* ∈ S^N_h | ⟨ Γ , D^* ⟩ ≥ 0 ∀ Γ ∈ G(EDM^N) }
                     = { D^* ∈ S^N_h | ⟨ δ(zz^T)1^T + 1δ(zz^T)^T − 2zz^T , D^* ⟩ ≥ 0 ∀ z ∈ N(1^T) }
                     = { D^* ∈ S^N_h | ⟨ 1δ(zz^T)^T − zz^T , D^* ⟩ ≥ 0 ∀ z ∈ N(1^T) }      (1126)

By comparison

    EDM^N = { D ∈ S^N_h | ⟨ −zz^T , D ⟩ ≥ 0 ∀ z ∈ N(1^T) }      (1127)

the term δ(zz^T)^T D^* 1 foils any hope of self-duality in ambient S^N_h .


To find the dual EDM cone in ambient S^N_h per 2.13.9.4 we prune the aggregate in (1080) describing the ordinary dual EDM cone, removing any member having nonzero main diagonal:

    EDM^{N*} ∩ S^N_h = cone{ δ²( V_N υυ^T V_N^T ) − V_N υυ^T V_N^T | υ ∈ R^{N−1} }
                     = { δ²( V_N Ψ V_N^T ) − V_N Ψ V_N^T | Ψ ∈ S^{N−1}_+ }      (1128)

When N = 1, the EDM cone and its dual in ambient S_h each comprise the origin in isomorphic R⁰ ; thus, self-dual in this dimension. (confer (85))

When N = 2, the EDM cone is the nonnegative real line in isomorphic R . (Figure 108) EDM^{2*} in S²_h is identical, thus self-dual in this dimension. This result is in agreement with (1126), verified directly: for all κ ∈ R , z = κ [1 ; −1] and δ(zz^T) = κ² [1 ; 1] ⇒ d^*_{12} ≥ 0 .

The first case adverse to self-duality, N = 3, may be deduced from Figure 100; the EDM cone is a circular cone in isomorphic R³ corresponding to no rotation of the Lorentz cone (148) (the self-dual circular cone). Figure 113 illustrates the EDM cone and its dual in ambient S³_h ; no longer self-dual.

6.9 Theorem of the alternative

In 2.13.2.1.1 we showed how alternative systems of generalized inequality can be derived from closed convex cones and their duals. This section is, therefore, a fitting postscript to the discussion of the dual EDM cone.

6.9.0.0.1 Theorem. EDM alternative. [113, §1]
Given D ∈ S^N_h ,

    D ∈ EDM^N

or in the alternative

    ∃ z such that  1^T z = 1 ,  D z = 0      (1129)

In words, either N(D) intersects hyperplane { z | 1^T z = 1 } or D is an EDM; the alternatives are incompatible. ⋄
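The alternative is testable numerically: z ∈ N(D) with 1^T z = 1 exists exactly when 1 is not orthogonal to N(D). A hedged Matlab sketch (the function name is hypothetical, not from the book):

    % EDM alternative (1129): does N(D) intersect {z | 1'z = 1} ?
    function tf = edmAlternative(D, tol)
      if nargin < 2, tol = 1e-10; end
      Z = null(D);                              % orthonormal basis for N(D)
      tf = norm(Z'*ones(size(D,1),1)) > tol;    % true ⇒ D is not an EDM
    end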


    D^* ∈ EDM^{N*} ⇔ δ(D^* 1) − D^* ≽ 0      (1106)

Figure 113: Ordinary dual EDM cone projected on S³_h shrouds EDM³ ; drawn tiled in isometrically isomorphic R³ . (It so happens: intersection EDM^{N*} ∩ S^N_h (2.13.9.3) is identical to projection of dual EDM cone on S^N_h .) In-figure labels: dvec rel ∂EDM³ , dvec rel ∂( EDM^{3*} ∩ S³_h ) , 0 .


When D is an EDM [191, §2],

    N(D) ⊂ N(1^T) = { z | 1^T z = 0 }      (1130)

Because [113, §2] (E.0.1)

    D D† 1 = 1 ,  1^T D† D = 1^T      (1131)

then

    R(1) ⊂ R(D)      (1132)

6.10 postscript

We provided an equality (1100) relating the convex cone of Euclidean distance matrices to the convex cone of positive semidefinite matrices. Projection on a positive semidefinite cone, constrained by an upper bound on rank, is easy and well known; [85] simply, a matter of truncating a list of eigenvalues. Projection on a positive semidefinite cone with such a rank constraint is, in fact, a convex optimization problem. (7.1.4)

In the past, it was difficult to project on the EDM cone under a constraint on rank or affine dimension. A surrogate method was to invoke the Schoenberg criterion (753) and then project on a positive semidefinite cone under a rank constraint bounding affine dimension from above. But a solution acquired that way is necessarily suboptimal.

In 7.3.3 we present a method for projecting directly on the EDM cone under a constraint on rank or affine dimension.




Chapter 7

Proximity problems

In summary, we find that the solution to problem [(1142.3) p.459] is difficult and depends on the dimension of the space as the geometry of the cone of EDMs becomes more complex.
−Hayden, Wells, Liu, & Tarazaga (1991) [134, §3]

A problem common to various sciences is to find the Euclidean distance matrix (EDM) D ∈ EDM^N closest in some sense to a given complete matrix of measurements H under a constraint on affine dimension 0 ≤ r ≤ N−1 (2.3.1, 5.7.1.1); rather, r is bounded above by desired affine dimension ρ .


7.0.1 Measurement matrix H

Ideally, we want a given matrix of measurements H ∈ R^{N×N} to conform with the first three Euclidean metric properties (5.2); to belong to the intersection of the orthant of nonnegative matrices R^{N×N}_+ with the symmetric hollow subspace S^N_h (2.2.3.0.1). Geometrically, we want H to belong to the polyhedral cone (2.12.1.0.1)

    K ≜ S^N_h ∩ R^{N×N}_+      (1133)

Yet in practice, H can possess significant measurement uncertainty (noise).

Sometimes realization of an optimization problem demands that its input, the given matrix H , possess some particular characteristics; perhaps symmetry and hollowness or nonnegativity. When that H given does not possess the desired properties, then we must impose them upon H prior to optimization:

When measurement matrix H is not symmetric or hollow, taking its symmetric hollow part is equivalent to orthogonal projection on the symmetric hollow subspace S^N_h .

When measurements of distance in H are negative, zeroing negative entries effects unique minimum-distance projection on the orthant of nonnegative matrices R^{N×N}_+ in isomorphic R^{N²} (E.9.2.2.3).

7.0.1.1 Order of imposition

Since convex cone K (1133) is the intersection of an orthant with a subspace, we want to project on that subset of the orthant belonging to the subspace; on the nonnegative orthant in the symmetric hollow subspace that is, in fact, the intersection. For that reason alone, unique minimum-distance projection of H on K (that member of K closest to H in isomorphic R^{N²} in the Euclidean sense) can be attained by first taking its symmetric hollow part, and only then clipping negative entries of the result to 0 ; id est, there is only one correct order of projection, in general, on an orthant intersecting a subspace: project on the subspace, then project the result on the orthant in that subspace. (confer E.9.5)


In contrast, order of projection on an intersection of subspaces is arbitrary.

That order-of-projection rule applies more generally, of course, to the intersection of any convex set C with any subspace. Consider the proximity problem^7.1 over convex feasible set S^N_h ∩ C given nonsymmetric nonhollow H ∈ R^{N×N} :

    minimize_{B ∈ S^N_h}  ‖ B − H ‖²_F
    subject to            B ∈ C      (1134)

a convex optimization problem.

7.1 There are two equivalent interpretations of projection (E.9): one finds a set normal; the other, minimum distance between a point and a set. Here we realize the latter view.

Because the symmetric hollow subspace is orthogonal to the antisymmetric antihollow subspace (2.2.3), then for B ∈ S^N_h

    tr( B^T ( ½(H − H^T) + δ²(H) ) ) = 0      (1135)

so the objective function is equivalent to

    ‖ B − H ‖²_F ≡ ‖ B − ( ½(H + H^T) − δ²(H) ) ‖²_F + ‖ ½(H − H^T) + δ²(H) ‖²_F      (1136)

This means the antisymmetric antihollow part of given matrix H would be ignored by minimization with respect to symmetric hollow variable B under the Frobenius norm; id est, minimization proceeds as though given the symmetric hollow part of H .

This action of the Frobenius norm (1136) is effectively a Euclidean projection (minimum-distance projection) of H on the symmetric hollow subspace S^N_h prior to minimization. Thus minimization proceeds inherently following the correct order for projection on S^N_h ∩ C . Therefore we may either assume H ∈ S^N_h , or take its symmetric hollow part prior to optimization.
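A minimal Matlab sketch of this unique minimum-distance projection on K (the function name is hypothetical): subspace first, then orthant.

    % projection on K = S^N_h ∩ R^{N×N}_+  (1133), in the correct order
    function Hk = projectOnK(H)
      S = (H + H')/2;             % symmetrize
      S = S - diag(diag(S));      % zero main diagonal: symmetric hollow part
      Hk = max(S, 0);             % then clip negative entries to 0
    end

Reversing the two steps generally yields a point of K that is not the nearest one, which is the point of the rule above.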


Figure 114: Pseudo-Venn diagram: The EDM cone belongs to the intersection of the symmetric hollow subspace with the nonnegative orthant; EDM^N ⊆ K (733). EDM^N cannot exist outside S^N_h , but R^{N×N}_+ does. (Regions drawn: R^{N×N} , S^N , S^N_h , K = S^N_h ∩ R^{N×N}_+ , EDM^N , 0 .)

7.0.1.2 Egregious input error under nonnegativity demand

More pertinent to the optimization problems presented herein where

    C ≜ EDM^N ⊆ K = S^N_h ∩ R^{N×N}_+      (1137)

then should some particular realization of a proximity problem demand input H be nonnegative, and were we only to zero negative entries of a nonsymmetric nonhollow input H prior to optimization, then the ensuing projection on EDM^N would be guaranteed incorrect (out of order).

Now comes a surprising fact: Even were we to correctly follow the order-of-projection rule and provide H ∈ K prior to optimization, then the ensuing projection on EDM^N will be incorrect whenever input H has negative entries and some proximity problem demands nonnegative input H .


Figure 115: Pseudo-Venn diagram from Figure 114 showing elbow placed in path of projection of H on EDM^N ⊂ S^N_h by an optimization problem demanding nonnegative input matrix H . The first two line segments leading away from H result from correct order-of-projection required to provide nonnegative H prior to optimization. Were H nonnegative, then its projection on S^N_h would instead belong to K ; making the elbow disappear. (confer Figure 126) (Regions drawn: S^N_h , EDM^N , K = S^N_h ∩ R^{N×N}_+ , 0 .)


This is best understood referring to Figure 114: Suppose nonnegative input H is demanded, and then the problem realization correctly projects its input first on S^N_h and then directly on C = EDM^N . That demand for nonnegativity effectively requires imposition of K on input H prior to optimization so as to obtain correct order of projection (on S^N_h first). Yet such an imposition prior to projection on EDM^N generally introduces an elbow into the path of projection (illustrated in Figure 115) caused by the technique itself; that being, a particular proximity problem realization requiring nonnegative input.

Any procedure for imposition of nonnegativity on input H can only be incorrect in this circumstance. There is no resolution unless input H is guaranteed nonnegative with no tinkering. Otherwise, we have no choice but to employ a different problem realization; one not demanding nonnegative input.

7.0.2 Lower bound

Most of the problems we encounter in this chapter have the general form:

    minimize_B  ‖ B − A ‖_F
    subject to  B ∈ C      (1138)

where A ∈ R^{m×n} is given data. This particular objective denotes Euclidean projection (E) of vectorized matrix A on the set C , which may or may not be convex. When C is convex, then the projection is unique minimum-distance because the Frobenius norm when squared is a strictly convex function of variable B and because the optimal solution is the same regardless of the square (1492). When C is a subspace, then the direction of projection is orthogonal to C .

Denoting by A ≜ U_A Σ_A Q_A^T and B ≜ U_B Σ_B Q_B^T their full singular value decompositions (whose singular values are always nonincreasingly ordered (A.6)), there exists a tight lower bound on the objective over the manifold of orthogonal matrices;

    ‖ Σ_B − Σ_A ‖_F ≤ inf_{U_A , U_B , Q_A , Q_B} ‖ B − A ‖_F      (1139)

This least lower bound holds more generally for any orthogonally invariant norm on R^{m×n} (2.2.1) including the Frobenius and spectral norm [248, §II.3]. [150, §7.4.51]


7.0.3 Problem approach

Problems traditionally posed in terms of point position { x_i , i = 1...N } , such as

    minimize_{x_i}  Σ_{i,j ∈ I} ( ‖x_i − x_j‖ − h_ij )²      (1140)

or

    minimize_{x_i}  Σ_{i,j ∈ I} ( ‖x_i − x_j‖² − h²_ij )²      (1141)

(where I is an abstract set of indices and h_ij is given data) are everywhere converted (in this book) to the distance-square variable D or to Gram matrix G ; the Gram matrix acting as bridge between position and distance. That conversion is performed regardless of whether known data is complete. Then the techniques of chapter 5 or chapter 6 are applied to find relative or absolute position. This approach is taken because we prefer introduction of rank constraints into convex problems rather than searching an infinitude of local minima in (1140) or (1141) [70].

7.0.4 Three prevalent proximity problems

There are three statements of the closest-EDM problem prevalent in the literature, the multiplicity due primarily to choice of projection on the EDM versus positive semidefinite (PSD) cone, and vacillation between the distance-square variable d_ij versus absolute distance √d_ij . In their most fundamental form, the three prevalent proximity problems are (1142.1), (1142.2), and (1142.3): [260]

    (1)  minimize_D  ‖ −V (D − H) V ‖²_F
         subject to  rank V D V ≤ ρ , D ∈ EDM^N

    (2)  minimize_◦√D  ‖ ◦√D − H ‖²_F
         subject to  rank V D V ≤ ρ , ◦√D ∈ √EDM^N

    (3)  minimize_D  ‖ D − H ‖²_F
         subject to  rank V D V ≤ ρ , D ∈ EDM^N

    (4)  minimize_◦√D  ‖ −V (◦√D − H) V ‖²_F
         subject to  rank V D V ≤ ρ , ◦√D ∈ √EDM^N      (1142)


where we have made explicit an imposed upper bound ρ on affine dimension

    r = rank V_N^T D V_N = rank V D V      (876)

that is benign when ρ = N−1, and where D ≜ [d_ij] and ◦√D ≜ [√d_ij] .

Problems (1142.2) and (1142.3) are Euclidean projections of a vectorized matrix H on an EDM cone (6.3), whereas problems (1142.1) and (1142.4) are Euclidean projections of a vectorized matrix −V H V on a PSD cone. Problem (1142.4) is not posed in the literature because it has limited theoretical foundation.^7.2

7.2 D ∈ EDM^N ⇒ ◦√D ∈ EDM^N , −V ◦√D V ∈ S^N_+ (5.10)

Analytical solution to (1142.1) is known in closed form for any bound ρ although, as the problem is stated, it is a convex optimization only in the case ρ = N−1. We show in 7.1.4 how (1142.1) becomes a convex optimization problem for any ρ when transformed to the spectral domain. When expressed as a function of point list in a matrix X as in (1140), problem (1142.2) is a variant of what is known in statistics literature as the stress problem. [39, p.34] [68] [269] Problems (1142.2) and (1142.3) are convex optimization problems in D for the case ρ = N−1. Even with the rank constraint removed from (1142.2), we will see the convex problem remaining inherently minimizes affine dimension.

Generally speaking, each problem in (1142) produces a different result because there is no isometry relating them. Of the various auxiliary V-matrices (B.4), the geometric centering matrix V (757) appears in the literature most often although V_N (740) is the auxiliary matrix naturally consequent to Schoenberg's seminal exposition [236]. Substitution of any auxiliary matrix or its pseudoinverse into these problems produces another valid problem.

Substitution of V_N^T for left-hand V in (1142.1), in particular, produces a different result because

    minimize_D  ‖ −V (D − H) V ‖²_F
    subject to  D ∈ EDM^N      (1143)


finds D to attain Euclidean distance of vectorized −V H V to the positive semidefinite cone in ambient isometrically isomorphic R^{N(N+1)/2} , whereas

    minimize_D  ‖ −V_N^T (D − H) V_N ‖²_F
    subject to  D ∈ EDM^N      (1144)

attains Euclidean distance of vectorized −V_N^T H V_N to the positive semidefinite cone in isometrically isomorphic subspace R^{N(N−1)/2} ; quite different projections^7.3 regardless of whether affine dimension is constrained.

7.3 The isomorphism T(Y) = V_N^{†T} Y V_N^† onto S^N_c = { V X V | X ∈ S^N } relates the map in (1144) to that in (1143), but is not an isometry.

But substitution of auxiliary matrix V_W^T (B.4.3) or V_N^† yields the same result as (1142.1) because V = V_W V_W^T = V_N V_N^† ; id est,

    ‖ −V (D − H) V ‖²_F = ‖ −V_W V_W^T (D − H) V_W V_W^T ‖²_F = ‖ −V_W^T (D − H) V_W ‖²_F
                        = ‖ −V_N V_N^† (D − H) V_N V_N^† ‖²_F = ‖ −V_N^† (D − H) V_N ‖²_F      (1145)

We see no compelling reason to prefer one particular auxiliary V-matrix over another. Each has its own coherent interpretations; e.g., 5.4.2, 6.7. Neither can we say any particular problem formulation produces generally better results than another.

7.1 First prevalent problem: Projection on PSD cone

This first problem

    minimize_D  ‖ −V_N^T (D − H) V_N ‖²_F
    subject to  rank V_N^T D V_N ≤ ρ
                D ∈ EDM^N          Problem 1      (1146)


poses a Euclidean projection of −V_N^T H V_N in subspace S^{N−1} on a generally nonconvex subset (when ρ < N−1) of the positive semidefinite cone boundary. Given eigenvalue decomposition, nonincreasingly ordered (A.5.2),

    −V_N^T H V_N = Σ_{i=1}^{N−1} λ_i v_i v_i^T ∈ S^{N−1}      (1149)

an optimal solution to Problem 1 is known in closed form [85] (confer (1153), (1173)):

    −V_N^T D^⋆ V_N = Σ_{i=1}^{ρ} max{0, λ_i} v_i v_i^T      (1148)


7.1.1 Closest-EDM Problem 1, convex case

7.1.1.0.1 Proof. Solution (1148), convex case.
When desired affine dimension is unconstrained, ρ = N−1, the rank function disappears from (1146) leaving a convex optimization problem; a simple unique minimum-distance projection on the positive semidefinite cone S^{N−1}_+ : videlicet, by (753),

    minimize_{D ∈ S^N_h}  ‖ −V_N^T (D − H) V_N ‖²_F
    subject to            −V_N^T D V_N ≽ 0      (1151)

Because

    S^{N−1}_+ = −V_N^T S^N_h V_N      (845)

then the necessary and sufficient conditions for projection in isometrically isomorphic R^{N(N−1)/2} on the self-dual (322) positive semidefinite cone S^{N−1}_+ are:^7.6 (E.9.2.0.1) (1401) (confer (1826))

    −V_N^T D^⋆ V_N ≽ 0
    −V_N^T D^⋆ V_N ( −V_N^T D^⋆ V_N + V_N^T H V_N ) = 0
    −V_N^T D^⋆ V_N + V_N^T H V_N ≽ 0      (1152)

Symmetric −V_N^T H V_N is diagonalizable, hence decomposable in terms of its eigenvectors v and eigenvalues λ as in (1149). Therefore (confer (1148))

    −V_N^T D^⋆ V_N = Σ_{i=1}^{N−1} max{0, λ_i} v_i v_i^T      (1153)

satisfies (1152), optimally solving (1151). To see that, recall: these eigenvectors constitute an orthogonal set and

    −V_N^T D^⋆ V_N + V_N^T H V_N = −Σ_{i=1}^{N−1} min{0, λ_i} v_i v_i^T      (1154)
∎

7.6 These conditions for projection on a convex cone are identical to the Karush-Kuhn-Tucker (KKT) optimality conditions for problem (1151).
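Numerically, solution (1153), and its rank-ρ truncation (1148), is a few lines of Matlab. The sketch below is an illustration with a hypothetical function name, not code from the book:

    % projection of symmetric A on the PSD cone, optionally truncated
    % to the rank-rho subset: B = sum_i max{0,lambda_i} v_i v_i',  i <= rho
    function B = projPSDrank(A, rho)
      [Q,L] = eig((A + A')/2);                 % diagonalize as in (1149)
      [lam,idx] = sort(diag(L),'descend');     % nonincreasing eigenvalue order
      lam = max(lam, 0);                       % zero negative eigenvalues (1153)
      if nargin > 1, lam(rho+1:end) = 0; end   % truncate: rank <= rho (1148)
      Q = Q(:,idx);
      B = Q*diag(lam)*Q';
    end

For Problem 1, A = −V_N^T H V_N and the returned B equals −V_N^T D^⋆ V_N .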


7.1.2 Generic problem

Prior to determination of D^⋆ , analytical solution (1148) to Problem 1 is equivalent to solution of a generic rank-constrained projection problem; a Euclidean projection on a rank ρ subset of a PSD cone (on a generally nonconvex subset of the PSD cone boundary ∂S^{N−1}_+ when ρ < N−1): given symmetric A ∈ S^{N−1} ,

    minimize_{B ∈ S^{N−1}}  ‖ B − A ‖²_F
    subject to              rank B ≤ ρ
                            B ≽ 0      (1155)

where, for Problem 1, A = −V_N^T H V_N and B = −V_N^T D V_N . This generic problem is a Euclidean projection on that subset of the positive semidefinite cone S^{N−1}_+ comprising all positive semidefinite matrices having rank no greater than desired affine


dimension ρ ;^7.7 called rank ρ subset: (186) (220)

    S^{N−1}_+ \ S^{N−1}_+(ρ + 1) = { X ∈ S^{N−1}_+ | rank X ≤ ρ }      (225)

7.7 Recall: affine dimension is a lower bound on embedding (2.3.1), equal to dimension of the smallest affine set in which points from a list X corresponding to an EDM D can be embedded.

7.1.3 Choice of spectral cone

Spectral projection substitutes projection on a polyhedral cone, containing a complete set of eigenspectra, in place of projection on a convex set of diagonalizable matrices; e.g., (1168). In this section we develop a method of spectral projection for constraining rank of positive semidefinite matrices in a proximity problem like (1155). We will see why an orthant turns out to be the best choice of spectral cone, and why presorting is critical.

Define a nonlinear permutation operator π(x) : R^n → R^n that sorts its vector argument x into nonincreasing order.

7.1.3.0.1 Definition. Spectral projection.
Let R be an orthogonal matrix and Λ a nonincreasingly ordered diagonal matrix of eigenvalues. Spectral projection means unique minimum-distance projection of a rotated (R , B.5.4) nonincreasingly ordered (π) vector (δ) of eigenvalues

    π( δ( R^T Λ R ) )      (1157)

on a polyhedral cone containing all eigenspectra corresponding to a rank ρ subset of a positive semidefinite cone (2.9.2.1) or the EDM cone (in Cayley-Menger form, 5.11.2.3). △

In the simplest and most common case, projection on a positive semidefinite cone, orthogonal matrix R equals I (7.1.4.0.1) and diagonal matrix Λ is ordered during diagonalization (A.5.2). Then spectral projection simply means projection of δ(Λ) on a subset of the nonnegative orthant, as we shall now ascertain:

It is curious how nonconvex Problem 1 has such a simple analytical solution (1148). Although solution to generic problem (1155) is known since 1936 [85], Trosset [266, §2] first observed its equivalence in 1997 to projection


of an ordered vector of eigenvalues (in diagonal matrix Λ) on a subset of the monotone nonnegative cone (2.13.9.4.1)

    K_{M+} = { υ | υ_1 ≥ υ_2 ≥ ··· ≥ υ_{N−1} ≥ 0 } ⊆ R^{N−1}_+      (371)

Of interest, momentarily, is only the smallest convex subset of the monotone nonnegative cone K_{M+} containing every nonincreasingly ordered eigenspectrum corresponding to a rank ρ subset of the positive semidefinite cone S^{N−1}_+ ; id est,

    K^ρ_{M+} ≜ { υ ∈ R^ρ | υ_1 ≥ υ_2 ≥ ··· ≥ υ_ρ ≥ 0 } ⊆ R^ρ_+      (1158)

a pointed polyhedral cone, a ρ-dimensional convex subset of the monotone nonnegative cone K_{M+} ⊆ R^{N−1}_+ having property, for λ denoting eigenspectra,

    [ K^ρ_{M+} ; 0 ] = π( λ(rank ρ subset) ) ⊆ K^{N−1}_{M+} ≜ K_{M+}      (1159)

For each and every elemental eigenspectrum

    γ ∈ λ(rank ρ subset) ⊆ R^{N−1}_+      (1160)

of the rank ρ subset (ordered or unordered in λ), there is a nonlinear surjection π(γ) onto K^ρ_{M+} .


7.1.3.0.2 Exercise. Smallest spectral cone.
Prove that there is no convex subset of K_{M+} smaller than K^ρ_{M+} containing every ordered eigenspectrum corresponding to the rank ρ subset of a positive semidefinite cone (2.9.2.1).

7.1.3.0.3 Proposition. (Hardy-Littlewood-Pólya) [129, §X] [41, §1.2]
Any vectors σ and γ in R^{N−1} satisfy a tight inequality

    π(σ)^T π(γ) ≥ σ^T γ ≥ π(σ)^T Ξ π(γ)      (1161)

where Ξ is the order-reversing permutation matrix defined in (1533), and permutator π(γ) is a nonlinear function that sorts vector γ into nonincreasing order thereby providing the greatest upper bound and least lower bound with respect to every possible sorting. ⋄

7.1.3.0.4 Corollary. Monotone nonnegative sort.
Any given vectors σ , γ ∈ R^{N−1} satisfy a tight Euclidean distance inequality

    ‖ π(σ) − π(γ) ‖ ≤ ‖ σ − γ ‖      (1162)

where nonlinear function π(γ) sorts vector γ into nonincreasing order thereby providing the least lower bound with respect to every possible sorting. ⋄

Given γ ∈ R^{N−1} ,

    inf_{σ ∈ R^{N−1}_+} ‖σ − γ‖ = inf_{σ ∈ R^{N−1}_+} ‖π(σ) − π(γ)‖ = inf_{σ ∈ R^{N−1}_+} ‖σ − π(γ)‖ = inf_{σ ∈ K_{M+}} ‖σ − π(γ)‖      (1163)

Yet for γ representing an arbitrary vector of eigenvalues, because

    inf_{σ ∈ [R^ρ_+ ; 0]} ‖σ − γ‖² ≥ inf_{σ ∈ [R^ρ_+ ; 0]} ‖σ − π(γ)‖² = inf_{σ ∈ [K^ρ_{M+} ; 0]} ‖σ − π(γ)‖²      (1164)

then projection of γ on the eigenspectra of a rank ρ subset can be tightened simply by presorting γ into nonincreasing order.

Proof. Simply because π(γ)_{1:ρ} ≽ π(γ_{1:ρ}) ,

    inf_{σ ∈ [R^ρ_+ ; 0]} ‖σ − γ‖² = γ_{ρ+1:N−1}^T γ_{ρ+1:N−1} + inf_{σ ∈ R^ρ_+} ‖σ_{1:ρ} − γ_{1:ρ}‖²
                                  = γ^T γ + inf_{σ ∈ R^ρ_+} σ_{1:ρ}^T σ_{1:ρ} − 2 σ_{1:ρ}^T γ_{1:ρ}
                                  ≥ γ^T γ + inf_{σ ∈ R^ρ_+} σ_{1:ρ}^T σ_{1:ρ} − 2 σ_{1:ρ}^T π(γ)_{1:ρ}
                                  = inf_{σ ∈ [R^ρ_+ ; 0]} ‖σ − π(γ)‖²      (1165)
∎
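The corollary is easy to test numerically; a one-line Matlab check (an illustration only):

    % (1162): sorting both vectors nonincreasingly never increases distance
    s = randn(8,1);  g = randn(8,1);
    norm(sort(s,'descend') - sort(g,'descend')) <= norm(s - g)   % always 1 (true)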


7.1.3.1 Orthant is best spectral cone for Problem 1

This means unique minimum-distance projection of γ on the nearest spectral member of the rank ρ subset is tantamount to presorting γ into nonincreasing order. Only then does unique spectral projection on a subset K^ρ_{M+} of the monotone nonnegative cone become equivalent to unique spectral projection on a subset R^ρ_+ of the nonnegative orthant (which is simpler); in other words, unique minimum-distance projection of sorted γ on the nonnegative orthant in a ρ-dimensional subspace of R^N is indistinguishable from its projection on the subset K^ρ_{M+} of the monotone nonnegative cone in that same subspace.

7.1.4 Closest-EDM Problem 1, "nonconvex" case

Trosset's proof of solution (1148), for projection on a rank ρ subset of the positive semidefinite cone S^{N−1}_+ , was algebraic in nature. [266, §2] Here we derive that known result but instead using a more geometric argument via spectral projection on a polyhedral cone (subsuming the proof in 7.1.1). In so doing, we demonstrate how nonconvex Problem 1 is transformed to a convex optimization:

7.1.4.0.1 Proof. Solution (1148), nonconvex case.
As explained in 7.1.2, we may instead work with the more facile generic problem (1155). With diagonalization of unknown

    B ≜ U Υ U^T ∈ S^{N−1}      (1166)

given desired affine dimension 0 ≤ ρ ≤ N−1 and diagonalizable

    A ≜ Q Λ Q^T ∈ S^{N−1}      (1167)

having eigenvalues in Λ arranged in nonincreasing order, by (40) the generic problem is equivalent to

    minimize_{B ∈ S^{N−1}}  ‖ B − A ‖²_F          minimize_{R , Υ}  ‖ Υ − R^T Λ R ‖²_F
    subject to  rank B ≤ ρ                ≡       subject to  rank Υ ≤ ρ
                B ≽ 0                                         Υ ≽ 0
                                                              R^{−1} = R^T      (1168)


where

    R ≜ Q^T U ∈ R^{N−1×N−1}      (1169)

in U on the set of orthogonal matrices is a linear bijection. We propose solving (1168) by instead solving the problem sequence:

    (a)  minimize_Υ  ‖ Υ − R^T Λ R ‖²_F
         subject to  rank Υ ≤ ρ , Υ ≽ 0

    (b)  minimize_R  ‖ Υ^⋆ − R^T Λ R ‖²_F
         subject to  R^{−1} = R^T      (1170)

Problem (1170a) is equivalent to: (1) orthogonal projection of R^T Λ R on an N−1-dimensional subspace of isometrically isomorphic R^{N(N−1)/2} containing δ(Υ) ∈ R^{N−1}_+ , (2) nonincreasingly ordering the result, (3) unique minimum-distance projection of the ordered result on [ R^ρ_+ ; 0 ] . (E.9.5) Projection on that N−1-dimensional subspace amounts to zeroing R^T Λ R at all entries off the main diagonal; thus, the equivalent sequence leading with a spectral projection:

    (a)  minimize_Υ  ‖ δ(Υ) − π( δ(R^T Λ R) ) ‖²
         subject to  δ(Υ) ∈ [ R^ρ_+ ; 0 ]

    (b)  minimize_R  ‖ Υ^⋆ − R^T Λ R ‖²_F
         subject to  R^{−1} = R^T      (1171)

Because any permutation matrix is an orthogonal matrix, it is always feasible that δ(R^T Λ R) ∈ R^{N−1} be arranged in nonincreasing order; hence, the permutation operator π . Unique minimum-distance projection of vector π( δ(R^T Λ R) ) on the ρ-dimensional subset [ R^ρ_+ ; 0 ] of nonnegative orthant


R^{N−1}_+ requires: (E.9.2.0.1)

    δ(Υ^⋆)_{ρ+1:N−1} = 0
    δ(Υ^⋆) ≽ 0
    δ(Υ^⋆)^T ( δ(Υ^⋆) − π( δ(R^T Λ R) ) ) = 0
    δ(Υ^⋆) − π( δ(R^T Λ R) ) ≽ 0      (1172)

which are necessary and sufficient conditions. Any value Υ^⋆ satisfying conditions (1172) is optimal for (1171a). So

    δ(Υ^⋆)_i = max{ 0 , π( δ(R^T Λ R) )_i } ,  i = 1...ρ
    δ(Υ^⋆)_i = 0 ,                              i = ρ+1...N−1      (1173)

specifies an optimal solution. The lower bound on the objective with respect to R in (1171b) is tight; by (1139)

    ‖ |Υ^⋆| − |Λ| ‖_F ≤ ‖ Υ^⋆ − R^T Λ R ‖_F      (1174)

where |·| denotes absolute entry-value. For selection of Υ^⋆ as in (1173), this lower bound is attained when (confer C.4.2.2)

    R^⋆ = I      (1175)

which is the known solution. ∎

7.1.4.1 Significance

Importance of this well-known [85] optimal solution (1148) for projection on a rank ρ subset of a positive semidefinite cone should not be dismissed:

Problem 1, as stated, is generally nonconvex. This analytical solution at once encompasses projection on a rank ρ subset (225) of the positive semidefinite cone (generally, a nonconvex subset of its boundary) from either the exterior or interior of that cone.^7.8 By problem transformation to the spectral domain, projection on a rank ρ subset becomes a convex optimization problem.

7.8 Projection on the boundary from the interior of a convex Euclidean body is generally a nonconvex problem.


This solution is closed form.

This solution is equivalent to projection on a polyhedral cone in the spectral domain (spectral projection, projection on a spectral cone 7.1.3.0.1); a necessary and sufficient condition (A.3.1) for membership of a symmetric matrix to a rank ρ subset of a positive semidefinite cone (2.9.2.1).

Because U^⋆ = Q , a minimum-distance projection on a rank ρ subset of the positive semidefinite cone is a positive semidefinite matrix orthogonal (in the Euclidean sense) to direction of projection.^7.9

For the convex case problem, this solution is always unique. Otherwise, distinct eigenvalues (multiplicity 1) in Λ guarantee uniqueness of this solution by the reasoning in A.5.0.1.^7.10

7.1.5 Problem 1 in spectral norm, convex case

When instead we pose the matrix 2-norm (spectral norm) in Problem 1 (1146) for the convex case ρ = N−1, then the new problem

    minimize_D  ‖ −V_N^T (D − H) V_N ‖_2
    subject to  D ∈ EDM^N      (1176)

is convex although its solution is not necessarily unique;^7.11 giving rise to nonorthogonal projection (E.1) on the positive semidefinite cone S^{N−1}_+ . Indeed, its solution set includes the Frobenius solution (1148) for the convex case whenever −V_N^T H V_N is a normal matrix. [133, §1] [127] [46, §8.1.1] Singular value problem (1176) is equivalent to

    minimize_{μ , D}  μ
    subject to        −μI ≼ −V_N^T (D − H) V_N ≼ μI
                      D ∈ EDM^N      (1177)

7.9 But Theorem E.9.2.0.1 for unique projection on a closed convex cone does not apply here because the direction of projection is not necessarily a member of the dual PSD cone. This occurs, for example, whenever positive eigenvalues are truncated.
7.10 Uncertainty of uniqueness prevents the erroneous conclusion that a rank ρ subset (186) were a convex body by the Bunt-Motzkin theorem (E.9.0.0.1).
7.11 For each and every |t| ≤ 2, [2 0 ; 0 t] , for example, has the same spectral-norm value.


where

    μ^⋆ = max{ | λ( −V_N^T (D^⋆ − H) V_N )_i | , i = 1...N−1 } ∈ R_+      (1178)

the minimized largest absolute eigenvalue (due to matrix symmetry). For lack of a unique solution here, we prefer the Frobenius rather than spectral norm.

7.2 Second prevalent problem: Projection on EDM cone in √d_ij

Let

    ◦√D ≜ [ √d_ij ] ∈ K = S^N_h ∩ R^{N×N}_+      (1179)

be an unknown matrix of absolute distance; id est,

    D = [ d_ij ] ≜ ◦√D ∘ ◦√D ∈ EDM^N      (1180)

where ∘ denotes Hadamard product. The second prevalent proximity problem is a Euclidean projection (in the natural coordinates √d_ij ) of matrix H on a nonconvex subset of the boundary of the nonconvex cone of Euclidean absolute-distance matrices rel ∂√EDM^N : (6.3, confer Figure 100(b))

    minimize_◦√D  ‖ ◦√D − H ‖²_F
    subject to    rank V_N^T D V_N ≤ ρ
                  ◦√D ∈ √EDM^N          Problem 2      (1181)

where

    √EDM^N = { ◦√D | D ∈ EDM^N }      (1015)

This statement of the second proximity problem is considered difficult to solve because of the constraint on desired affine dimension ρ (5.7.2) and because the objective function

    ‖ ◦√D − H ‖²_F = Σ_{i,j} ( √d_ij − h_ij )²      (1182)

is expressed in the natural coordinates; projection on a doubly nonconvex set.


Our solution to this second problem prevalent in the literature requires measurement matrix H to be nonnegative;

    H = [ h_ij ] ∈ R^{N×N}_+      (1183)

If the H matrix given has negative entries, then the technique of solution presented here becomes invalid. As explained in 7.0.1, projection of H on K = S^N_h ∩ R^{N×N}_+ (1133) prior to application of this proposed solution is incorrect.

7.2.1 Convex case

When ρ = N−1, the rank constraint vanishes and a convex problem emerges:^7.12

    minimize_◦√D  ‖ ◦√D − H ‖²_F               minimize_D  Σ_{i,j} d_ij − 2 h_ij √d_ij + h²_ij
    subject to    ◦√D ∈ √EDM^N         ⇔      subject to  D ∈ EDM^N      (1184)

7.12 still thought to be a nonconvex problem as late as 1997 [269] even though discovered convex by de Leeuw in 1993. [68] [39, §13.6] Yet using methods from chapter 3, it can be easily ascertained: ‖ ◦√D − H ‖_F is not convex in D .

For any fixed i and j , the argument of summation is a convex function of d_ij because (for nonnegative constant h_ij ) the negative square root is convex in nonnegative d_ij and because d_ij + h²_ij is affine (convex). Because the sum of any number of convex functions in D remains convex [46, §3.2.1] and because the feasible set is convex in D , we have a convex optimization problem:

    minimize_D  1^T ( D − 2H ∘ ◦√D ) 1 + ‖H‖²_F
    subject to  D ∈ EDM^N      (1185)

The objective function, being a sum of strictly convex functions, is, moreover, strictly convex in D on the nonnegative orthant. Existence of a unique solution D^⋆ for this second prevalent problem depends upon nonnegativity of H and a convex feasible set (3.1.2).^7.13

7.13 The transformed problem in variable D no longer describes Euclidean projection on an EDM cone √EDM^N . Otherwise we might erroneously conclude √EDM^N were a convex body by the Bunt-Motzkin theorem (E.9.0.0.1).


7.2.1.1 Equivalent semidefinite program, Problem 2, convex case

Convex problem (1184) is numerically solvable for its global minimum using an interior-point method [46, §11] [301] [216] [206] [292] [9] [98]. We translate (1184) to an equivalent semidefinite program (SDP) for a pedagogical reason made clear in 7.2.2.2 and because there exist readily available computer programs for numerical solution [255] [117] [271] [27] [293] [294] [295].

Substituting a new matrix variable Y ≜ [ y_ij ] ∈ R^{N×N}_+ ,

    h_ij √d_ij ← y_ij      (1186)

Boyd proposes: problem (1184) is equivalent to the semidefinite program

    minimize_{D , Y}  Σ_{i,j} d_ij − 2 y_ij + h²_ij
    subject to        [ d_ij  y_ij ; y_ij  h²_ij ] ≽ 0 ,  i, j = 1...N
                      D ∈ EDM^N      (1187)

To see that, recall d_ij ≥ 0 is implicit to D ∈ EDM^N (5.8.1, (753)). So when H ∈ R^{N×N}_+ is nonnegative as assumed,

    [ d_ij  y_ij ; y_ij  h²_ij ] ≽ 0 ⇔ h_ij √d_ij ≥ | y_ij |      (1188)

Minimization of the objective function implies maximization of y_ij that is bounded above. Hence nonnegativity of y_ij is implicit to (1187) and, as desired, y_ij → h_ij √d_ij as optimization proceeds.

If the given matrix H is now assumed symmetric and nonnegative,

    H = [ h_ij ] ∈ S^N ∩ R^{N×N}_+      (1189)

then Y = H ∘ ◦√D must belong to K = S^N_h ∩ R^{N×N}_+ (1133). Because Y ∈ S^N_h (B.4.2 no.20), then

    ‖ ◦√D − H ‖²_F = Σ_{i,j} d_ij − 2 y_ij + h²_ij = −N tr( V (D − 2Y) V ) + ‖H‖²_F      (1190)


So convex problem (1187) is equivalent to the semidefinite program

    minimize_{D , Y}  −tr( V (D − 2Y) V )
    subject to        [ d_ij  y_ij ; y_ij  h²_ij ] ≽ 0 ,  j > i = 1...N−1
                      Y ∈ S^N_h
                      D ∈ EDM^N      (1191)

where the constants h²_ij and N have been dropped arbitrarily from the objective.
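For concreteness, here is a hedged modeling-level sketch of (1191), assuming the CVX package is available; D ∈ EDM^N is imposed via the Schoenberg criterion (753) with V_N built per (740). This is an illustration, not the book's code:

    % SDP (1191): H assumed symmetric nonnegative (1189)
    N = size(H,1);
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);     % auxiliary matrix V_N (740)
    V  = eye(N) - ones(N)/N;                   % geometric centering matrix (757)
    cvx_begin sdp
      variable D(N,N) symmetric
      variable Y(N,N) symmetric
      minimize( -trace(V*(D - 2*Y)*V) )
      subject to
        diag(D) == 0;  diag(Y) == 0;           % D, Y hollow
        -VN'*D*VN == semidefinite(N-1);        % Schoenberg criterion (753)
        for i = 1:N-1
          for j = i+1:N
            [D(i,j) Y(i,j); Y(i,j) H(i,j)^2] == semidefinite(2);
          end
        end
    cvx_end
    sqrtD = sqrt(max(D,0));                    % recover absolute distance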


7.2.1.2 Gram-form semidefinite program, Problem 2, convex case

There is great advantage to expressing problem statement (1191) in Gram-form because Gram matrix G is a bidirectional bridge between point list X and distance matrix D ; e.g., Example 5.4.2.2.4, Example 6.4.0.0.1. This way, problem convexity can be maintained while simultaneously constraining point list X , Gram matrix G , and distance matrix D at our discretion.

Convex problem (1191) may be equivalently written via linear bijective (5.6.1) EDM operator D(G) (746);

    minimize_{G ∈ S^N_c , Y ∈ S^N_h}  −tr( V (D(G) − 2Y) V )
    subject to  [ ⟨Φ_ij , G⟩  y_ij ; y_ij  h²_ij ] ≽ 0 ,  j > i = 1...N−1
                G ≽ 0      (1192)

where distance-square D = [ d_ij ] ∈ S^N_h (730) is related to Gram matrix entries G = [ g_ij ] ∈ S^N_c ∩ S^N_+ by

    d_ij = g_ii + g_jj − 2 g_ij = ⟨ Φ_ij , G ⟩      (745)

where

    Φ_ij = (e_i − e_j)(e_i − e_j)^T ∈ S^N_+      (732)

Confinement of G to the geometric center subspace provides numerical stability and no loss of generality (confer (1019)); implicit constraint G1 = 0 is otherwise unnecessary.

To include constraints on the list X ∈ R^{n×N} , we would first rewrite (1192)

    minimize_{G ∈ S^N_c , Y ∈ S^N_h , X ∈ R^{n×N}}  −tr( V (D(G) − 2Y) V )
    subject to  [ ⟨Φ_ij , G⟩  y_ij ; y_ij  h²_ij ] ≽ 0 ,  j > i = 1...N−1
                [ I  X ; X^T  G ] ≽ 0
                X ∈ C      (1193)

and then add the constraints, realized here in abstract membership to some convex set C . This problem realization includes a convex relaxation of the nonconvex constraint G = X^T X and, if desired, more constraints on G could be added. This technique is discussed in 5.4.2.2.4.

7.2.2 Minimization of affine dimension in Problem 2

When desired affine dimension ρ is diminished, the rank function becomes reinserted into problem (1187) that is then rendered difficult to solve because the feasible set {D , Y} loses convexity in S^N_h × R^{N×N} . Indeed, the rank function is quasiconcave (3.3) on the positive semidefinite cone; (2.9.2.6.2) id est, its sublevel sets are not convex.

7.2.2.1 Rank minimization heuristic

A remedy developed in [91] [193] [92] [90] introduces convex envelope (cenv) of the quasiconcave rank function: (Figure 116)

7.2.2.1.1 Definition. Convex envelope. [147]
The convex envelope of a function f : C → R is defined as the largest convex function g such that g ≤ f on convex domain C ⊆ R^n .^7.14 △

7.14 Provided f ≢ +∞ and there exists an affine function h ≤ f on R^n , then the convex envelope is equal to the convex conjugate (the Legendre-Fenchel transform) of the convex conjugate of f ; id est, the conjugate-conjugate function f** . [148, §E.1]


Figure 116: Abstraction of convex envelope of rank function. Rank is a quasiconcave function on the positive semidefinite cone, but its convex envelope is the smallest convex function enveloping it. Vertical bar labelled g illustrates a trace/rank gap; id est, rank found exceeds estimate (by 2). In-figure labels: rank X , g , cenv rank X .

[91] [90] Convex envelope of rank function: for σ a singular value,

    cenv(rank A) on { A ∈ R^{m×n} | ‖A‖_2 ≤ κ } = (1/κ) Σ_i σ(A)_i      (1194)
    cenv(rank A) on { A ∈ S^n_+ | ‖A‖_2 ≤ κ } = (1/κ) tr(A)      (1195)

A properly scaled trace thus represents the best convex lower bound on rank for positive semidefinite matrices. The idea, then, is to substitute the convex envelope for rank of some variable A ∈ S^M_+ (A.6.5)

    rank A ← cenv(rank A) ∝ tr A = Σ_i σ(A)_i = Σ_i λ(A)_i = ‖ λ(A) ‖_1      (1196)

which is equivalent to the sum of all eigenvalues or singular values.

[90] Convex envelope of the cardinality function is proportional to the 1-norm:

    cenv(card x) on { x ∈ R^n | ‖x‖_∞ ≤ κ } = (1/κ) ‖x‖_1      (1197)


7.2.2.2 Applying trace rank-heuristic to Problem 2

Substituting rank envelope for rank function in Problem 2, for D ∈ EDM^N (confer (876))

    cenv rank( −V_N^T D V_N ) = cenv rank( −V D V ) ∝ −tr( V D V )      (1198)

and for desired affine dimension ρ ≤ N−1 and nonnegative H [sic] we get a convex optimization problem

    minimize_D  ‖ ◦√D − H ‖²_F
    subject to  −tr( V D V ) ≤ κ ρ
                D ∈ EDM^N      (1199)

where κ ∈ R_+ is a constant determined by cut-and-try. The equivalent semidefinite program makes κ variable: for nonnegative and symmetric H

    minimize_{D , Y , κ}  κ ρ + 2 tr( V Y V )
    subject to  [ d_ij  y_ij ; y_ij  h²_ij ] ≽ 0 ,  j > i = 1...N−1
                −tr( V D V ) ≤ κ ρ
                Y ∈ S^N_h
                D ∈ EDM^N      (1200)

which is the same as (1191), the problem with no explicit constraint on affine dimension. As the present problem is stated, the desired affine dimension ρ yields to the variable scale factor κ ; ρ is effectively ignored.

Yet this result is an illuminant for problem (1191) and its equivalents (all the way back to (1184)): When the given measurement matrix H is nonnegative and symmetric, finding the closest EDM D as in problem (1184), (1187), or (1191) implicitly entails minimization of affine dimension (confer 5.8.4, 5.14.4). Those non-rank-constrained problems are each inherently equivalent to cenv(rank)-minimization problem (1200), in other words, and their optimal solutions are unique because of the strictly convex objective function in (1184).

7.2.2.3 Rank-heuristic insight

Minimization of affine dimension by use of this trace rank-heuristic (1198) tends to find the list configuration of least energy; rather, it tends to optimize


compaction of the reconstruction by minimizing total distance. (759) It is best used where some physical equilibrium implies such an energy minimization; e.g., [268, §5].

For this Problem 2, the trace rank-heuristic arose naturally in the objective in terms of V . We observe: V (in contrast to V_N^T) spreads energy over all available distances (B.4.2 no.20, confer no.22) although the rank function itself is insensitive to choice of auxiliary matrix.

7.2.2.4 Rank minimization heuristic beyond convex envelope

Fazel, Hindi, and Boyd [92] [297] [93] propose a rank heuristic more potent than trace (1196) for problems of rank minimization;

    rank Y ← log det( Y + εI )      (1201)

the concave surrogate function log det in place of quasiconcave rank Y (2.9.2.6.2) when Y ∈ S^n_+ is variable and where ε is a small positive constant. They propose minimization of the surrogate by substituting a sequence comprising infima of a linearized surrogate about the current estimate Y_i ; id est, from the first-order Taylor series expansion about Y_i on some open interval of ‖Y‖ (D.1.7)

    log det( Y + εI ) ≈ log det( Y_i + εI ) + tr( (Y_i + εI)^{−1} (Y − Y_i) )      (1202)

we make the surrogate sequence of infima over bounded convex feasible set C where, for i = 0 ...

    arg inf_{Y ∈ C} rank Y ← lim_{i→∞} Y_{i+1}      (1203)

where

    Y_{i+1} = arg inf_{Y ∈ C} tr( (Y_i + εI)^{−1} Y )      (1204)

Choosing Y_0 = I , the first step becomes equivalent to finding the infimum of tr Y ; the trace rank-heuristic (1196). The intuition underlying (1204) is the new term in the argument of trace; specifically, (Y_i + εI)^{−1} weights Y so that relatively small eigenvalues of Y found by the infimum are made even smaller.


To see that, substitute the nonincreasingly ordered diagonalizations

    Y_i + εI ≜ Q (Λ + εI) Q^T      (a)
    Y ≜ U Υ U^T                    (b)      (1205)

into (1204). Then from (1507) we have,

    inf_{Υ ∈ U^{⋆T} C U^⋆} δ( (Λ + εI)^{−1} )^T δ(Υ) = inf_{Υ ∈ U^T C U} inf_{R^T = R^{−1}} tr( (Λ + εI)^{−1} R^T Υ R ) ≤ inf_{Y ∈ C} tr( (Y_i + εI)^{−1} Y )      (1206)

where R ≜ Q^T U in U on the set of orthogonal matrices is a bijection. The role of ε is, therefore, to limit the maximum weight; the smallest entry on the main diagonal of Υ gets the largest weight.

7.2.2.5 Applying log det rank-heuristic to Problem 2

When the log det rank-heuristic is inserted into Problem 2, problem (1200) becomes the problem sequence in i

    minimize_{D , Y , κ}  κ ρ + 2 tr( V Y V )
    subject to  [ d_jl  y_jl ; y_jl  h²_jl ] ≽ 0 ,  l > j = 1...N−1
                −tr( (−V D_i V + εI)^{−1} V D V ) ≤ κ ρ
                Y ∈ S^N_h
                D ∈ EDM^N      (1207)

where D_{i+1} ≜ D^⋆ ∈ EDM^N , and D_0 ≜ 11^T − I .

7.2.2.6 Tightening this log det rank-heuristic

Like the trace method, this log det technique for constraining rank offers no provision for meeting a predetermined upper bound ρ . Yet since the eigenvalues of the sum are simply determined, λ(Y_i + εI) = δ(Λ + εI) , we may certainly force selected weights to ε^{−1} by manipulating diagonalization (1205a). Empirically we find this sometimes leads to better results, although affine dimension of a solution cannot be guaranteed.
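A hedged Matlab sketch of the surrogate iteration (1203)-(1204), assuming CVX and leaving the problem-specific constraints that define bounded convex set C as a placeholder:

    epsilon = 1e-3;  n = 5;              % example size; epsilon as in (1201)
    Y = eye(n);                          % Y_0 = I : first pass = trace heuristic
    for i = 1:20
      W = inv(Y + epsilon*eye(n));       % linearization weight from (1202)
      cvx_begin sdp quiet
        variable Y(n,n) symmetric
        minimize( trace(W*Y) )           % infimum of linearized surrogate (1204)
        subject to
          Y >= 0;
          % problem-specific convex constraints defining C go here
      cvx_end
    end

Small eigenvalues of the current iterate receive weights approaching ε^{−1} , driving them smaller still.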


7.2.2.7 Cumulative summary of rank heuristics

We have studied the perturbation method of rank reduction in 4.3, as well as the trace heuristic (convex envelope method 7.2.2.1.1) and log det heuristic in 7.2.2.4. There is another good contemporary method called LMIRank [212] based on alternating projection (E.10) that does not solve the ball packing problem presented in 5.4.2.2.3, so it is not evaluated further herein. None of these exceed performance of the convex iteration method for constraining rank developed in 4.4:

7.2.2.7.1 Example. Rank regularization enforcing affine dimension.
We apply the convex iteration method from 4.4.1 to numerically solve an instance of Problem 2; a method empirically superior to the foregoing convex envelope and log det heuristics.

Unidimensional scaling, [70] a historically practical application of multidimensional scaling (5.12), entails solution of an optimization problem having local minima whose multiplicity varies as the factorial of point-list cardinality. Geometrically, it means finding a list constrained to lie in one affine dimension. In terms of point list, the nonconvex problem is: given nonnegative symmetric matrix H = [ h_ij ] ∈ S^N ∩ R^{N×N}_+ (1189) whose entries h_ij are all known, (1140)

    minimize_{x_i ∈ R}  Σ_{i,j=1}^N ( ‖x_i − x_j‖ − h_ij )²      (1208)

called a raw stress problem [39, p.34] which has an implicit constraint on dimensional embedding of points { x_i ∈ R , i = 1...N }. This problem has proven NP-hard; e.g., [52].

As always, we first transform variables to distance-square D ∈ S^N_h ; so we begin with convex problem (1191) on page 475

    minimize_{D , Y}  −tr( V (D − 2Y) V )
    subject to        [ d_ij  y_ij ; y_ij  h²_ij ] ≽ 0 ,  j > i = 1...N−1
                      Y ∈ S^N_h
                      D ∈ EDM^N
                      rank V_N^T D V_N = 1      (1209)


that becomes equivalent to (1208) by making explicit the constraint on affine dimension via rank. The iteration is formed by moving the dimensional constraint to the objective:

    minimize_{D , Y}  −⟨ V (D − 2Y) V , I ⟩ − w ⟨ V_N^T D V_N , W ⟩
    subject to        [ d_ij  y_ij ; y_ij  h²_ij ] ≽ 0 ,  j > i = 1...N−1
                      Y ∈ S^N_h
                      D ∈ EDM^N      (1210)

where w (≈ 10) is a positive scalar just large enough to make ⟨ V_N^T D V_N , W ⟩ vanish to within some numerical precision, and where direction matrix W is an optimal solution to semidefinite program (1506a)

    minimize_W  −⟨ V_N^T D^⋆ V_N , W ⟩
    subject to  0 ≼ W ≼ I
                tr W = N − 2      (1211)

one of which is known in closed form. Semidefinite programs (1210) and (1211) are iterated until convergence in the sense defined on page 257. This iteration is not a projection method. Convex problem (1210) is neither a relaxation of unidimensional scaling problem (1209); instead, problem (1210) is a convex equivalent to (1209) at convergence of the iteration.

Jan de Leeuw provided us with some test data

    H = [ 0.000000 5.235301 5.499274 6.404294 6.486829 6.263265
          5.235301 0.000000 3.208028 5.840931 3.559010 5.353489
          5.499274 3.208028 0.000000 5.679550 4.020339 5.239842
          6.404294 5.840931 5.679550 0.000000 4.862884 4.543120
          6.486829 3.559010 4.020339 4.862884 0.000000 4.618718
          6.263265 5.353489 5.239842 4.543120 4.618718 0.000000 ]      (1212)

and a globally optimal solution

    X^⋆ = [ −4.981494 −2.121026 −1.038738 4.555130 0.764096 2.822032 ]
        = [ x^⋆_1 x^⋆_2 x^⋆_3 x^⋆_4 x^⋆_5 x^⋆_6 ]      (1213)


found by searching 6! local minima of (1208) [70]. By iterating convex problems (1210) and (1211) about twenty times (initial W = 0) we find the global infimum 98.12812 of stress problem (1208), and by (965) we find a corresponding one-dimensional point list that is a rigid transformation in R of X^⋆ .

Here we found the infimum to accuracy of the given data, but that ceases to hold as problem size increases. Because of machine numerical precision and an interior-point method of solution, we speculate, accuracy degrades quickly as problem size increases beyond this. ∎

7.3 Third prevalent problem: Projection on EDM cone in d_ij

Reformulating Problem 2 (p.472) in terms of EDM D changes the problem considerably:

    minimize_D  ‖ D − H ‖²_F
    subject to  rank V_N^T D V_N ≤ ρ
                D ∈ EDM^N          Problem 3      (1214)

This third prevalent proximity problem is a Euclidean projection of given matrix H on a generally nonconvex subset (ρ < N−1) of ∂EDM^N , the boundary of the convex cone of Euclidean distance matrices relative to subspace S^N_h (Figure 100(d)). Because coordinates of projection are distance-square and H presumably now holds distance-square measurements, numerical solution to Problem 3 is generally different than that of Problem 2. For the moment, we need make no assumptions regarding measurement matrix H .

7.3.1 Convex case

    minimize_D  ‖ D − H ‖²_F
    subject to  D ∈ EDM^N      (1215)

When the rank constraint disappears (for ρ = N−1), this third problem becomes obviously convex because the feasible set is then the entire EDM


   minimize_D   ∑_{i,j} d_ij² − 2h_ij d_ij + h_ij²
   subject to   D ∈ EDM^N        (1217)

Optimal solution D⋆ is therefore unique, as expected, for this simple projection on the EDM cone.

7.3.1.1 Equivalent semidefinite program, Problem 3, convex case

In the past, this convex problem was solved numerically by means of alternating projection. (Example 7.3.1.1.1) [106] [99] [134, §1] We translate (1217) to an equivalent semidefinite program because we have a good solver: Assume the given measurement matrix H to be nonnegative and symmetric; 7.16

   H = [h_ij] ∈ S^N ∩ R^{N×N}_+        (1189)

We then propose: Problem (1217) is equivalent to the semidefinite program, for

   ∂ ≜ [d_ij²] = D ∘ D        (1218)

a matrix of distance-square squared:

7.16 If that H given has negative entries, then the technique of solution presented here becomes invalid. Projection of H on K (1133) prior to application of this proposed technique, as explained in §7.0.1, is incorrect.


   minimize_{∂,D}   −tr(V(∂ − 2H∘D)V)
   subject to   [ ∂_ij  d_ij ; d_ij  1 ] ≽ 0 ,   j > i = 1…N−1
                D ∈ EDM^N
                ∂ ∈ S^N_h        (1219)

where

   [ ∂_ij  d_ij ; d_ij  1 ] ≽ 0  ⇔  ∂_ij ≥ d_ij²        (1220)

Symmetry of input H facilitates trace in the objective (§B.4.2 no.20), while its nonnegativity causes ∂_ij → d_ij² as optimization proceeds.

7.3.1.1.1 Example. Alternating projection on nearest EDM.
By solving (1219) we confirm the result from an example given by Glunt, Hayden, et alii [106, §6] who found an analytical solution to convex optimization problem (1215) for particular cardinality N = 3 by using the alternating projection method of von Neumann (§E.10):

   H = [ 0 1 1 ; 1 0 9 ; 1 9 0 ] ,   D⋆ = [ 0 19/9 19/9 ; 19/9 0 76/9 ; 19/9 76/9 0 ]        (1221)

The original problem (1215) of projecting H on the EDM cone is transformed to an equivalent iterative sequence of projections on the two convex cones (1082) from §6.8.1.1. Using ordinary alternating projection, input H goes to D⋆ with an accuracy of four decimal places in about 17 iterations. Affine dimension corresponding to this optimal solution is r = 1.

Obviation of semidefinite programming's computational expense is the principal advantage of this alternating projection technique.
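A short sanity check of (1221), assuming nothing beyond base Matlab/Octave: it verifies hollowness and the Schoenberg test for D⋆ (membership to the EDM cone) and reports the projection distance to H.

   H  = [0 1 1; 1 0 9; 1 9 0];
   Ds = [0 19 19; 19 0 76; 19 76 0]/9;        % claimed nearest EDM, D*
   N  = 3;  V = eye(N) - ones(N)/N;           % geometric centering matrix
   assert(all(diag(Ds) == 0))                 % D* in S^N_h
   assert(min(eig(-V*Ds*V)) > -1e-12)         % -V D V psd  <=>  D* in EDM^N
   norm(Ds - H,'fro')                         % distance achieved by D*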


7.3.1.2 Schur-form semidefinite program, Problem 3 convex case

Semidefinite program (1219) can be reformulated by moving the objective function in

   minimize_D   ‖D − H‖²_F
   subject to   D ∈ EDM^N        (1215)

to the constraints. This makes an equivalent second-order cone program: for any measurement matrix H

   minimize_{t∈R,D}   t
   subject to   ‖D − H‖²_F ≤ t
                D ∈ EDM^N        (1222)

We can transform this problem to an equivalent Schur-form semidefinite program; (§3.1.7.2)

   minimize_{t∈R,D}   t
   subject to   [ tI  vec(D − H) ; vec(D − H)ᵀ  1 ] ≽ 0
                D ∈ EDM^N        (1223)

characterized by great sparsity and structure. The advantage of this SDP is lack of conditions on input H ; e.g., negative entries would invalidate any solution provided by (1219). (§7.0.1.2)

7.3.1.3 Gram-form semidefinite program, Problem 3 convex case

Further, this problem statement may be equivalently written in terms of a Gram matrix via linear bijective (§5.6.1) EDM operator D(G) (746);

   minimize_{G∈S^N_c , t∈R}   t
   subject to   [ tI  vec(D(G) − H) ; vec(D(G) − H)ᵀ  1 ] ≽ 0
                G ≽ 0        (1224)

To include constraints on the list X ∈ R^{n×N}, we would rewrite this:

   minimize_{G∈S^N_c , t∈R , X∈R^{n×N}}   t
   subject to   [ tI  vec(D(G) − H) ; vec(D(G) − H)ᵀ  1 ] ≽ 0
                [ I  X ; Xᵀ  G ] ≽ 0
                X ∈ C        (1225)

where C is some abstract convex set. This technique is discussed in §5.4.2.2.4.
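For the convex case, a minimal CVX sketch (CVX assumed installed) solves projection (1215) directly; EDM membership is imposed through hollowness and the centered semidefiniteness test used throughout chapter 5, which a modeling layer accepts as readily as the explicit Schur form (1223).

   N = size(H,1);
   V = eye(N) - ones(N)/N;                    % geometric centering matrix
   cvx_begin sdp quiet
       variable D(N,N) symmetric
       minimize( norm(D - H,'fro') )          % same minimizer as ||D-H||_F^2
       subject to
           diag(D) == 0;                      % D in S^N_h
           -V*D*V == semidefinite(N);         % D in EDM^N
   cvx_end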


7.3.1.4 Dual interpretation, projection on EDM cone

From §E.9.1.1 we learn that projection on a convex set has a dual form. In the circumstance K is a convex cone and point x exists exterior to the cone or on its boundary, distance to the nearest point Px in K is found as the optimal value of the objective

   ‖x − Px‖ = maximize_a   aᵀx
              subject to   ‖a‖ ≤ 1
                           a ∈ K°        (1810)

where K° is the polar cone.

Applying this result to (1215), we get a convex optimization for any given symmetric matrix H exterior to or on the EDM cone boundary:

   minimize_D   ‖D − H‖²_F          maximize_{A°}   ⟨A° , H⟩
   subject to   D ∈ EDM^N      ≡    subject to   ‖A°‖_F ≤ 1
                                                 A° ∈ EDM^{N°}        (1226)

Then from (1812) projection of H on cone EDM^N is

   D⋆ = H − A°⋆⟨A°⋆ , H⟩        (1227)

Critchley proposed, instead, projection on the polar EDM cone in his 1980 thesis [61, p.113]: In that circumstance, by projection on the algebraic complement (§E.9.2.2.1),

   D⋆ = A⋆⟨A⋆ , H⟩        (1228)

which is equal to (1227) when A⋆ solves

   maximize_A   ⟨A , H⟩
   subject to   ‖A‖_F = 1
                A ∈ EDM^N        (1229)

This projection of symmetric H on polar cone EDM^{N°} can be made a convex problem, of course, by relaxing the equality constraint (‖A‖_F ≤ 1).


7.3.2 Minimization of affine dimension in Problem 3

When the desired affine dimension ρ is diminished, Problem 3 (1214) is difficult to solve [134, §3] because the feasible set in R^{N(N−1)/2} loses convexity. By substituting rank envelope (1198) into Problem 3, then for any given H we get a convex problem

   minimize_D   ‖D − H‖²_F
   subject to   −tr(V D V) ≤ κρ
                D ∈ EDM^N        (1230)

where κ ∈ R_+ is a constant determined by cut-and-try. Given κ, problem (1230) is a convex optimization having unique solution in any desired affine dimension ρ ; an approximation to Euclidean projection on that nonconvex subset of the EDM cone containing EDMs with corresponding affine dimension no greater than ρ.

The SDP equivalent to (1230) does not move κ into the variables as on page 478: for nonnegative symmetric input H and distance-square squared variable ∂ as in (1218),

   minimize_{∂,D}   −tr(V(∂ − 2H∘D)V)
   subject to   [ ∂_ij  d_ij ; d_ij  1 ] ≽ 0 ,   j > i = 1…N−1
                −tr(V D V) ≤ κρ
                D ∈ EDM^N
                ∂ ∈ S^N_h        (1231)

That means we will not see equivalence of this cenv(rank)-minimization problem to the non−rank-constrained problems (1217) and (1219) like we saw for its counterpart (1200) in Problem 2.

Another approach to affine dimension minimization is to project instead on the polar EDM cone; discussed in §6.8.1.5.


7.3.3 Constrained affine dimension, Problem 3

When one desires affine dimension diminished further below what can be achieved via cenv(rank)-minimization as in (1231), spectral projection can be considered a natural means in light of its successful application to projection on a rank ρ subset of the positive semidefinite cone in §7.1.4.

Yet it is wrong here to zero eigenvalues of −VDV or −VGV or a variant to reduce affine dimension, because that particular method comes from projection on a positive semidefinite cone (1168); zeroing those eigenvalues here in Problem 3 would place an elbow in the projection path (confer Figure 115) thereby producing a result that is necessarily suboptimal. Problem 3 is instead a projection on the EDM cone whose associated spectral cone is considerably different. (§5.11.2.3) Proper choice of spectral cone is demanded by diagonalization of that variable argument to the objective:

7.3.3.1 Cayley-Menger form

We use Cayley-Menger composition of the Euclidean distance matrix to solve a problem that is the same as Problem 3 (1214): (§5.7.3.0.1)

   minimize_D   ‖ [ 0 1ᵀ ; 1 −D ] − [ 0 1ᵀ ; 1 −H ] ‖²_F
   subject to   rank [ 0 1ᵀ ; 1 −D ] ≤ ρ + 2
                D ∈ EDM^N        (1232)

a projection of H on a generally nonconvex subset (when ρ < N−1) of the Euclidean distance matrix cone boundary rel ∂EDM^N ; id est, projection from the EDM cone interior or exterior on a subset of its relative boundary (§6.6, (1012)).

Rank of an optimal solution is intrinsically bounded above and below;

   2 ≤ rank [ 0 1ᵀ ; 1 −D⋆ ] ≤ ρ + 2 ≤ N + 1        (1233)

Our proposed strategy for low-rank solution is projection on that subset of a spectral cone λ([ 0 1ᵀ ; 1 −EDM^N ]) (§5.11.2.3) corresponding to affine dimension not in excess of that ρ desired; id est, spectral projection on

   [ R^{ρ+1}_+ ; 0 ; R_− ] ∩ ∂H  ⊂  R^{N+1}        (1234)


where

   ∂H = {λ ∈ R^{N+1} | 1ᵀλ = 0}        (945)

is a hyperplane through the origin. This pointed polyhedral cone (1234), to which membership subsumes the rank constraint, has empty interior.

Given desired affine dimension 0 ≤ ρ ≤ N−1 and diagonalization (§A.5) of unknown EDM D

   [ 0 1ᵀ ; 1 −D ] ≜ UΥUᵀ ∈ S^{N+1}_h        (1235)

and given symmetric H in diagonalization

   [ 0 1ᵀ ; 1 −H ] ≜ QΛQᵀ ∈ S^{N+1}        (1236)

having eigenvalues arranged in nonincreasing order, then by (958) problem (1232) is equivalent to

   minimize_{Υ,R}   ‖δ(Υ) − π(δ(RᵀΛR))‖²
   subject to   δ(Υ) ∈ [ R^{ρ+1}_+ ; 0 ; R_− ] ∩ ∂H
                δ(QRΥRᵀQᵀ) = 0
                R⁻¹ = Rᵀ        (1237)

where π is the permutation operator from §7.1.3 arranging its vector argument in nonincreasing order, 7.17 where

   R ≜ QᵀU ∈ R^{(N+1)×(N+1)}        (1238)

in U on the set of orthogonal matrices is a bijection, and where ∂H insures one negative eigenvalue. Hollowness constraint δ(QRΥRᵀQᵀ) = 0 makes problem (1237) difficult by making the two variables dependent.

7.17 Recall, any permutation matrix is an orthogonal matrix.


Our plan is to instead divide problem (1237) into two and then iterate their solution:

   minimize_Υ   ‖δ(Υ) − π(δ(RᵀΛR))‖²
   subject to   δ(Υ) ∈ [ R^{ρ+1}_+ ; 0 ; R_− ] ∩ ∂H        (1239a)

   minimize_R   ‖RΥ⋆Rᵀ − Λ‖²_F
   subject to   δ(QRΥ⋆RᵀQᵀ) = 0
                R⁻¹ = Rᵀ        (1239b)

We justify disappearance of the hollowness constraint in convex optimization problem (1239a): From the arguments in §7.1.3 with regard to π the permutation operator, cone membership constraint δ(Υ) ∈ [ R^{ρ+1}_+ ; 0 ; R_− ] ∩ ∂H from (1239a) is equivalent to

   δ(Υ) ∈ [ R^{ρ+1}_+ ; 0 ; R_− ] ∩ ∂H ∩ K_M        (1240)

where K_M is the monotone cone (§2.13.9.4.2). Membership of δ(Υ) to the polyhedral cone of majorization (Theorem A.1.2.0.1)

   K*_λδ = ∂H ∩ K*_M+        (1255)

where K*_M+ is the dual monotone nonnegative cone (§2.13.9.4.1), is a condition (in absence of a hollowness constraint) that would insure existence of a symmetric hollow matrix [ 0 1ᵀ ; 1 −D ]. Curiously, intersection of this feasible superset [ R^{ρ+1}_+ ; 0 ; R_− ] ∩ ∂H ∩ K_M from (1240) with the cone of majorization K*_λδ is a benign operation; id est,

   ∂H ∩ K*_M+ ∩ K_M = ∂H ∩ K_M        (1241)

verifiable by observing conic dependencies (§2.10.3) among the aggregate of halfspace-description normals.


The cone membership constraint in (1239a) therefore inherently insures existence of a symmetric hollow matrix.

Optimization (1239b) would be a Procrustes problem (§C.4) were it not for the hollowness constraint; it is, instead, a minimization over the intersection of the nonconvex manifold of orthogonal matrices with another nonconvex set in variable R specified by the hollowness constraint.

We solve problem (1239b) by a method introduced in §4.6.0.0.2: Define R = [r_1 ⋯ r_{N+1}] ∈ R^{(N+1)×(N+1)} and make the assignment

   G = [ r_1 ; ⋮ ; r_{N+1} ; 1 ][ r_1ᵀ ⋯ r_{N+1}ᵀ 1 ] ∈ S^{(N+1)²+1}

     ≜ [ R_11      ⋯  R_1,N+1      r_1
         ⋮             ⋱           ⋮
         R_1,N+1ᵀ      R_N+1,N+1   r_{N+1}
         r_1ᵀ      ⋯  r_{N+1}ᵀ     1       ]

     = [ r_1 r_1ᵀ      ⋯  r_1 r_{N+1}ᵀ      r_1
         ⋮                 ⋱                ⋮
         r_{N+1} r_1ᵀ      r_{N+1} r_{N+1}ᵀ r_{N+1}
         r_1ᵀ          ⋯  r_{N+1}ᵀ          1       ]        (1242)

where R_ij ≜ r_i r_jᵀ ∈ R^{(N+1)×(N+1)} and Υ⋆_ii ∈ R. Since RΥ⋆Rᵀ = ∑_{i=1}^{N+1} Υ⋆_ii R_ii then problem (1239b) is equivalently expressed:

   minimize_{R_ij , r_i}   ‖ ∑_{i=1}^{N+1} Υ⋆_ii R_ii − Λ ‖²_F
   subject to   tr R_ii = 1 ,   i = 1…N+1
                tr R_ij = 0 ,   i < j = 2…N+1
                δ(Q(∑_i Υ⋆_ii R_ii)Qᵀ) = 0
                G ≽ 0
                rank G = 1        (1243)


The rank constraint is regularized by method of convex iteration developed in §4.4. Problem (1243) is partitioned into two convex problems:

   minimize_{R_ij , r_i}   ‖ ∑_{i=1}^{N+1} Υ⋆_ii R_ii − Λ ‖²_F + ⟨G , W⟩
   subject to   tr R_ii = 1 ,   i = 1…N+1
                tr R_ij = 0 ,   i < j = 2…N+1
                δ(Q(∑_i Υ⋆_ii R_ii)Qᵀ) = 0
                G ≽ 0

paired, as in (1211), with a semidefinite program for direction matrix W whose optimal solution is known in closed form: minimize_W ⟨G⋆ , W⟩ subject to 0 ≼ W ≼ I , tr W = (N+1)².

7.4 Conclusion


A first recourse is spectral projection (§7.3.3): but because zeroing eigenvalues there enacts projection on a positive semidefinite cone rather than projection on the EDM cone, any solution acquired that way is necessarily suboptimal.

A second recourse is problem redesign: A presupposition to all proximity problems in this chapter is that matrix H is given. We considered H having various properties such as nonnegativity, symmetry, hollowness, or lack thereof. It was assumed that if H did not already belong to the EDM cone, then we wanted an EDM closest to H in some sense; id est, input-matrix H was assumed corrupted somehow. For practical problems, it stands to reason that such a proximity problem could instead be reformulated so that some or all entries of H were unknown but bounded above and below by known limits; the norm objective is thereby eliminated as in the development beginning on page 266. That redesign (the art, p.8), in terms of the Gram-matrix bridge between point-list X and EDM D, at once encompasses proximity and completion problems.

A third recourse is to apply the method of convex iteration just like we did in §7.2.2.7.1. This technique is applicable to any semidefinite problem requiring a rank constraint; it places a regularization term in the objective that enforces the rank constraint.

The importance and application of solving rank-constrained problems are enormous, a conclusion generally accepted gratis by the mathematics and engineering communities. For example, one might be interested in the minimal-order dynamic output feedback which stabilizes a given linear time invariant (LTI) plant (this problem is considered to be among the most important "open" problems in control). [194] Rank-constrained semidefinite programs arise in many vital feedback and control problems [122], optics [54], and communications [211] [187]. Rank-constrained problems also appear naturally in combinatorial optimization. (§4.6.0.0.6)


Appendix A

Linear algebra

A.1 Main-diagonal δ operator, λ, trace, vec

We introduce notation δ denoting the main-diagonal linear self-adjoint operator. When linear function δ operates on a square matrix A ∈ R^{N×N}, δ(A) returns a vector composed of all the entries from the main diagonal in the natural order;

   δ(A) ∈ R^N        (1246)

Operating on a vector y ∈ R^N, δ naturally returns a diagonal matrix;

   δ(y) ∈ S^N        (1247)

Operating recursively on a vector Λ ∈ R^N or diagonal matrix Λ ∈ S^N, δ(δ(Λ)) returns Λ itself;

   δ²(Λ) ≡ δ(δ(Λ)) ≜ Λ        (1248)

Defined in this manner, main-diagonal linear operator δ is self-adjoint [167, §3.10, §9.5-1]; A.1 videlicet, (§2.2)

   δ(A)ᵀy = ⟨δ(A), y⟩ = ⟨A, δ(y)⟩ = tr(Aᵀδ(y))        (1249)

A.1 Linear operator T : R^{m×n} → R^{M×N} is self-adjoint when, for each and every X_1 , X_2 ∈ R^{m×n},  ⟨T(X_1), X_2⟩ = ⟨X_1 , T(X_2)⟩.


A.1.1 Identities

This δ notation is efficient and unambiguous as illustrated in the following examples where A∘B denotes Hadamard product [150] [110, §1.1.4] of matrices of like size, ⊗ the Kronecker product (§D.1.2.1), y a vector, X a matrix, e_i the i-th member of the standard basis for R^n, S^N_h the symmetric hollow subspace, σ a vector of (nonincreasingly) ordered singular values, and λ(A) denotes a vector of nonincreasingly ordered eigenvalues of matrix A :

1. δ(A) = δ(Aᵀ)

2. tr(A) = tr(Aᵀ) = δ(A)ᵀ1

3. ⟨I, A⟩ = tr A

4. δ(cA) = c δ(A) ,  c ∈ R

5. tr(c√(AᵀA)) = c tr√(AᵀA) = c 1ᵀσ(A) ,  c ∈ R

6. tr(cA) = c tr(A) = c 1ᵀλ(A) ,  c ∈ R

7. δ(A + B) = δ(A) + δ(B)

8. tr(A + B) = tr(A) + tr(B)

9. δ(AB) = (A ∘ Bᵀ)1 = (Bᵀ ∘ A)1

10. δ(AB)ᵀ = 1ᵀ(Aᵀ ∘ B) = 1ᵀ(B ∘ Aᵀ)

11. δ(uvᵀ) = [ u_1 v_1 ; ⋮ ; u_N v_N ] = u ∘ v ,  u, v ∈ R^N

12. tr(AᵀB) = tr(ABᵀ) = tr(BAᵀ) = tr(BᵀA)
    = 1ᵀ(A ∘ B)1 = 1ᵀδ(ABᵀ) = δ(AᵀB)ᵀ1 = δ(BAᵀ)ᵀ1 = δ(BᵀA)ᵀ1

13. D = [d_ij] ∈ S^N_h ,  H = [h_ij] ∈ S^N_h ,  V = I − (1/N)11ᵀ ∈ S^N  (confer §B.4.2 no.20)
    N tr(−V(D ∘ H)V) = tr(DᵀH) = 1ᵀ(D ∘ H)1 = tr(11ᵀ(D ∘ H)) = ∑_{i,j} d_ij h_ij

14. tr(ΛA) = δ(Λ)ᵀδ(A) ,  δ²(Λ) ≜ Λ ∈ S^N


15. yᵀBδ(A) = tr(Bδ(A)yᵀ) = tr(δ(Bᵀy)A) = tr(Aδ(Bᵀy))
    = δ(A)ᵀBᵀy = tr(yδ(A)ᵀBᵀ) = tr(Aᵀδ(Bᵀy)) = tr(δ(Bᵀy)Aᵀ)

16. δ²(AᵀA) = ∑_i e_i e_iᵀ AᵀA e_i e_iᵀ

17. δ(δ(A)1ᵀ) = δ(1δ(A)ᵀ) = δ(A)

18. δ(A1)1 = δ(A11ᵀ) = A1 ,  δ(y)1 = δ(y1ᵀ) = y

19. δ(I1) = δ(1) = I

20. δ(e_i e_jᵀ 1) = δ(e_i) = e_i e_iᵀ

21. vec(AXB) = (Bᵀ ⊗ A) vec X

22. vec(BXA) = (Aᵀ ⊗ B) vec X

23. tr(AXBXᵀ) = vec(X)ᵀ vec(AXB) = vec(X)ᵀ(Bᵀ ⊗ A) vec X  [116]

24. tr(AXᵀBX) = vec(X)ᵀ vec(BXA) = vec(X)ᵀ(Aᵀ ⊗ B) vec X
    = δ(vec(X) vec(X)ᵀ(Aᵀ ⊗ B))ᵀ1

25. For ζ = [ζ_i] ∈ R^k and x = [x_i] ∈ R^k ,  ∑_i ζ_i/x_i = ζᵀδ(x)⁻¹1

26. For any permutation matrix Ξ and dimensionally compatible vector y or matrix A

    δ(Ξy) = Ξδ(y)Ξᵀ        (1250)
    δ(ΞAΞᵀ) = Ξδ(A)        (1251)

    So given any permutation matrix Ξ and any dimensionally compatible matrix B, for example,

    δ²(B) = Ξδ²(ΞᵀBΞ)Ξᵀ        (1252)

27. π(δ(A)) = λ(I ∘ A) where π is the presorting function
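Several of these identities lend themselves to quick numerical confirmation; a spot-check of no.2, no.9, and no.12 in base Matlab/Octave (no toolboxes assumed):

   A = randn(4);  B = randn(4);  one = ones(4,1);
   assert(abs(trace(A) - diag(A)'*one) < 1e-12)            % no.2
   assert(norm(diag(A*B) - (A .* B.')*one) < 1e-12)        % no.9
   assert(abs(trace(A'*B) - one'*(A .* B)*one) < 1e-12)    % no.12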


A.1.2 Majorization

A.1.2.0.1 Theorem. (Schur) Majorization. [303, §7.4] [150, §4.3] [151, §5.5]
Let λ ∈ R^N denote a given vector of eigenvalues and let δ ∈ R^N denote a given vector of main diagonal entries, both arranged in nonincreasing order. Then

   ∃ A ∈ S^N   λ(A) = λ and δ(A) = δ   ⇐   λ − δ ∈ K*_λδ        (1253)

and conversely

   A ∈ S^N   ⇒   λ(A) − δ(A) ∈ K*_λδ        (1254)

The difference belongs to the pointed polyhedral cone of majorization (empty interior, confer (272))

   K*_λδ ≜ K*_M+ ∩ {ζ1 | ζ ∈ R}*        (1255)

where K*_M+ is the dual monotone nonnegative cone (377), and where the dual of the line is a hyperplane; ∂H = {ζ1 | ζ ∈ R}* = 1^⊥.  ⋄

Majorization cone K*_λδ is naturally consequent to the definition of majorization; id est, vector y ∈ R^N majorizes vector x if and only if

   ∑_{i=1}^k x_i ≤ ∑_{i=1}^k y_i   ∀ 1 ≤ k ≤ N        (1256)

and

   1ᵀx = 1ᵀy        (1257)

Under these circumstances, rather, vector x is majorized by vector y.

In the particular circumstance δ(A) = 0, we get:

A.1.2.0.2 Corollary. Symmetric hollow majorization.
Let λ ∈ R^N denote a given vector of eigenvalues arranged in nonincreasing order. Then

   ∃ A ∈ S^N_h   λ(A) = λ   ⇐   λ ∈ K*_λδ        (1258)

and conversely

   A ∈ S^N_h   ⇒   λ(A) ∈ K*_λδ        (1259)

where K*_λδ is defined in (1255).  ⋄
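A numerical illustration of Theorem A.1.2.0.1 in base Matlab/Octave: for a random symmetric matrix, the sorted eigenvalues majorize the sorted main diagonal per (1256) and (1257).

   A   = randn(5);  A = (A + A')/2;
   lam = sort(eig(A),'descend');  d = sort(diag(A),'descend');
   assert(all(cumsum(lam) - cumsum(d) > -1e-10))   % partial sums (1256)
   assert(abs(sum(lam) - sum(d)) < 1e-10)          % equal totals (1257)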


A.2 Semidefiniteness: domain of test

The most fundamental necessary, sufficient, and definitive test for positive semidefiniteness of matrix A ∈ R^{n×n} is: [151, §1]

   xᵀAx ≥ 0 for each and every x ∈ R^n such that ‖x‖ = 1        (1260)

Traditionally, authors demand evaluation over broader domain; namely, over all x ∈ R^n, which is sufficient but unnecessary. Indeed, that standard textbook requirement is far over-reaching because if xᵀAx is nonnegative for particular x = x_p, then it is nonnegative for any αx_p where α ∈ R. Thus, only normalized x in R^n need be evaluated.

Many authors add the further requirement that the domain be complex; the broadest domain. By so doing, only Hermitian matrices (A^H = A where superscript H denotes conjugate transpose) A.2 are admitted to the set of positive semidefinite matrices (1263); an unnecessarily prohibitive condition.

A.2.1 Symmetry versus semidefiniteness

We call (1260) the most fundamental test of positive semidefiniteness. Yet some authors instead say, for real A and complex domain (x ∈ C^n), the complex test x^H Ax ≥ 0 is most fundamental. That complex broadening of the domain of test causes nonsymmetric real matrices to be excluded from the set of positive semidefinite matrices. Yet admitting nonsymmetric real matrices or not is a matter of preference A.3 unless that complex test is adopted, as we shall now explain.

Any real square matrix A has a representation in terms of its symmetric and antisymmetric parts; id est,

   A = (A + Aᵀ)/2 + (A − Aᵀ)/2        (44)

Because, for all real A, the antisymmetric part vanishes under real test,

   xᵀ((A − Aᵀ)/2)x = 0        (1261)

only the symmetric part of A, (A + Aᵀ)/2, has a role determining positive semidefiniteness.

A.2 Hermitian symmetry is the complex analogue; the real part of a Hermitian matrix is symmetric while its imaginary part is antisymmetric. A Hermitian matrix has real eigenvalues and real main diagonal.
A.3 Golub & Van Loan [110, §4.2.2], for example, admit nonsymmetric real matrices.


Hence the oft-made presumption that only symmetric matrices may be positive semidefinite is, of course, erroneous under (1260). Because eigenvalue-signs of a symmetric matrix translate unequivocally to its semidefiniteness, the eigenvalues that determine semidefiniteness are always those of the symmetrized matrix. (§A.3) For that reason, and because symmetric (or Hermitian) matrices must have real eigenvalues, the convention adopted in the literature is that semidefinite matrices are synonymous with symmetric semidefinite matrices. Certainly misleading under (1260), that presumption is typically bolstered with compelling examples from the physical sciences where symmetric matrices occur within the mathematical exposition of natural phenomena. A.4 [96, §52]

Perhaps a better explanation of this pervasive presumption of symmetry comes from Horn & Johnson [150, §7.1] whose perspective A.5 is the complex matrix, thus necessitating the complex domain of test throughout their treatise. They explain, if A ∈ C^{n×n}

   …and if x^H Ax is real for all x ∈ C^n, then A is Hermitian. Thus, the assumption that A is Hermitian is not necessary in the definition of positive definiteness. It is customary, however.

Their comment is best explained by noting, the real part of x^H Ax comes from the Hermitian part (A + A^H)/2 of A ;

   Re(x^H Ax) = x^H ((A + A^H)/2) x        (1262)

rather,

   x^H Ax ∈ R  ⇔  A^H = A        (1263)

because the imaginary part of x^H Ax comes from the anti-Hermitian part (A − A^H)/2 ;

   Im(x^H Ax) = x^H ((A − A^H)/2) x        (1264)

that vanishes for nonzero x if and only if A = A^H. So the Hermitian symmetry assumption is unnecessary, according to Horn & Johnson, not

A.4 Symmetric matrices are certainly pervasive in our chosen subject as well.
A.5 A totally complex perspective is not necessarily more advantageous. The positive semidefinite cone, for example, is not self-dual (§2.13.5) in the ambient space of Hermitian matrices. [145, §II]


because nonHermitian matrices could be regarded positive semidefinite, rather because nonHermitian (includes nonsymmetric real) matrices are not comparable on the real line under x^H Ax. Yet that complex edifice is dismantled in the test of real matrices (1260) because the domain of test is no longer necessarily complex; meaning, xᵀAx will certainly always be real, regardless of symmetry, and so real A will always be comparable.

In summary, if we limit the domain of test to x in R^n as in (1260), then nonsymmetric real matrices are admitted to the realm of semidefinite matrices because they become comparable on the real line. One important exception occurs for rank-one matrices Ψ = uvᵀ where u and v are real vectors: Ψ is positive semidefinite if and only if Ψ = uuᵀ. (§A.3.1.0.7) We might choose to expand the domain of test to x in C^n so that only symmetric matrices would be comparable. The alternative to expanding domain of test is to assume all matrices of interest to be symmetric; that is commonly done, hence the synonymous relationship with semidefinite matrices.

A.2.1.0.1 Example. Nonsymmetric positive definite product.
Horn & Johnson assert and Zhang agrees:

   If A, B ∈ C^{n×n} are positive definite, then we know that the product AB is positive definite if and only if AB is Hermitian. [150, §7.6, prob.10] [303, §6.2, §3.2]

Implicitly in their statement, A and B are assumed individually Hermitian and the domain of test is assumed complex.

We prove that assertion to be false for real matrices under (1260) that adopts a real domain of test.

   Aᵀ = A = [  3 0 −1 0
               0 5  1 0
              −1 1  4 1
               0 0  1 4 ] ,   λ(A) = [ 5.9 ; 4.5 ; 3.4 ; 2.0 ]        (1265)

   Bᵀ = B = [  4 4 −1 −1
               4 5  0  0
              −1 0  5  1
              −1 0  1  4 ] ,   λ(B) = [ 8.8 ; 5.5 ; 3.3 ; 0.24 ]        (1266)


   (AB)ᵀ ≠ AB = [ 13 12 −8 −4
                  19 25  5  1
                  −5  1 22  9
                  −5  0  9 17 ] ,   λ(AB) = [ 36. ; 29. ; 10. ; 0.72 ]        (1267)

   (1/2)(AB + (AB)ᵀ) = [  13  15.5 −6.5 −4.5
                         15.5  25    3   0.5
                         −6.5   3   22   9
                         −4.5  0.5   9  17 ] ,
   λ((1/2)(AB + (AB)ᵀ)) = [ 36. ; 30. ; 10. ; 0.014 ]        (1268)

Whenever A ∈ S^n_+ and B ∈ S^n_+ , then λ(AB) = λ(√A B √A) will always be a nonnegative vector by (1294) and Corollary A.3.1.0.5. Yet positive definiteness of the product AB is certified instead by the positive eigenvalues λ((1/2)(AB + (AB)ᵀ)) in (1268) (§A.3.1.0.1) despite the fact AB is not symmetric. A.6 Horn & Johnson and Zhang resolve the anomaly by choosing to exclude nonsymmetric matrices and products; they do so by expanding the domain of test to C^n.

A.6 It is a little more difficult to find a counterexample in R^{2×2} or R^{3×3}; which may have served to advance any confusion.
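The counterexample is easy to reproduce in base Matlab/Octave; eigenvalues of the nonsymmetric product AB and of its symmetrized part are all positive, certifying AB ≻ 0 under real test (1260):

   A = [3 0 -1 0; 0 5 1 0; -1 1 4 1; 0 0 1 4];
   B = [4 4 -1 -1; 4 5 0 0; -1 0 5 1; -1 0 1 4];
   eig(A*B)                     % matches lambda(AB) in (1267)
   eig((A*B + (A*B)')/2)        % matches (1268): positive, so AB tests > 0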


A.3 Proper statements of positive semidefiniteness

Unlike Horn & Johnson and Zhang, we never adopt the complex domain of test in regard to real matrices. So motivated is our consideration of proper statements of positive semidefiniteness under real domain of test. This restriction, ironically, complicates the facts when compared to the corresponding statements for the complex case (found elsewhere, [150] [303]). We state several fundamental facts regarding positive semidefiniteness of real matrix A and the product AB and sum A + B of real matrices under fundamental real test (1260); a few require proof as they depart from the standard texts, while those remaining are well established or obvious.

A.3.0.0.1 Theorem. Positive (semi)definite matrix.
A ∈ S^M is positive semidefinite if and only if for each and every real vector x of unit norm, ‖x‖ = 1, A.7 we have xᵀAx ≥ 0 (1260);

   A ≽ 0  ⇔  tr(xxᵀA) = xᵀAx ≥ 0        (1269)

Matrix A ∈ S^M is positive definite if and only if for each and every ‖x‖ = 1 we have xᵀAx > 0 ;

   A ≻ 0  ⇔  tr(xxᵀA) = xᵀAx > 0        (1270)  ⋄

Proof. Statements (1269) and (1270) are each a particular instance of dual generalized inequalities (§2.13.2) with respect to the positive semidefinite cone; videlicet, [272]

   A ≽ 0  ⇔  ⟨xxᵀ, A⟩ ≥ 0  ∀ xxᵀ (≽ 0)
   A ≻ 0  ⇔  ⟨xxᵀ, A⟩ > 0  ∀ xxᵀ (≽ 0) , xxᵀ ≠ 0        (1271)

This says: positive semidefinite matrix A must belong to the normal side of every hyperplane whose normal is an extreme direction of the positive semidefinite cone. Relations (1269) and (1270) remain true when xxᵀ is replaced with "for each and every" X ∈ S^M_+ [46, §2.6.1] (§2.13.5) of unit norm ‖X‖ = 1 as in

   A ≽ 0  ⇔  tr(XA) ≥ 0  ∀ X ∈ S^M_+
   A ≻ 0  ⇔  tr(XA) > 0  ∀ X ∈ S^M_+ , X ≠ 0        (1272)

but this condition is far more than what is necessary. By the discrete membership theorem in §2.13.4.2.1, the extreme directions xxᵀ of the positive semidefinite cone constitute a minimal set of generators necessary and sufficient for discretization of dual generalized inequalities (1272) certifying membership to that cone.

A.7 The traditional condition requiring all x ∈ R^M for defining positive (semi)definiteness is actually far more than what is necessary. The set of norm-1 vectors is necessary and sufficient to establish positive semidefiniteness; actually, any particular norm and any nonzero norm-constant will work.


A.3.1 Semidefiniteness, eigenvalues, nonsymmetric

When A ∈ R^{n×n}, let λ((1/2)(A + Aᵀ)) ∈ R^n denote eigenvalues of the symmetrized matrix A.8 arranged in nonincreasing order.

By positive semidefiniteness of A ∈ R^{n×n} we mean, A.9 [202, §1.3.1] (confer §A.3.1.0.1)

   xᵀAx ≥ 0 ∀ x ∈ R^n  ⇔  A + Aᵀ ≽ 0  ⇔  λ(A + Aᵀ) ≽ 0        (1273)

(§2.9.0.1)

   A ≽ 0  ⇒  Aᵀ = A        (1274)

   A ≽ B  ⇔  A − B ≽ 0  ⇏  A ≽ 0 or B ≽ 0        (1275)

   xᵀAx ≥ 0 ∀x  ⇏  Aᵀ = A        (1276)

Matrix symmetry is not intrinsic to positive semidefiniteness;

   Aᵀ = A , λ(A) ≽ 0  ⇒  xᵀAx ≥ 0 ∀x        (1277)

   λ(A) ≽ 0  ⇐  Aᵀ = A , xᵀAx ≥ 0 ∀x        (1278)

If Aᵀ = A then

   λ(A) ≽ 0  ⇔  A ≽ 0        (1279)

meaning, matrix A belongs to the positive semidefinite cone in the subspace of symmetric matrices if and only if its eigenvalues belong to the nonnegative orthant.

   ⟨A , A⟩ = ⟨λ(A) , λ(A)⟩        (1280)

For µ ∈ R, A ∈ R^{n×n}, and vector λ(A) ∈ C^n holding the ordered eigenvalues of A

   λ(µI + A) = µ1 + λ(A)        (1281)

Proof: A = MJM⁻¹ and µI + A = M(µI + J)M⁻¹ where J is the Jordan form for A ; [251, §5.6, App.B] id est, δ(J) = λ(A), so λ(µI + A) = δ(µI + J) because µI + J is also a Jordan form.

A.8 The symmetrization of A is (A + Aᵀ)/2.  λ((1/2)(A + Aᵀ)) = λ(A + Aᵀ)/2.
A.9 Strang agrees [251, p.334] it is not λ(A) that requires observation. Yet he is mistaken by proposing the Hermitian part alone x^H(A + A^H)x be tested, because the anti-Hermitian part does not vanish under complex test unless A is Hermitian. (1264)


By similar reasoning,

   λ(I + µA) = 1 + λ(µA)        (1282)

For vector σ(A) holding the singular values of any matrix A

   σ(I + µAᵀA) = π(|1 + µσ(AᵀA)|)        (1283)
   σ(µI + AᵀA) = π(|µ1 + σ(AᵀA)|)        (1284)

where π is the nonlinear permutation operator sorting its vector argument into nonincreasing order.

For A ∈ S^M and each and every ‖w‖ = 1 [150, §7.7, prob.9]

   wᵀAw ≤ µ  ⇔  A ≼ µI  ⇔  λ(A) ≼ µ1        (1285)

[150, §2.5.4] (confer (36))

   A is normal  ⇔  ‖A‖²_F = λ(A)ᵀλ(A)        (1286)

For A ∈ R^{m×n}

   AᵀA ≽ 0 ,  AAᵀ ≽ 0        (1287)

because, for dimensionally compatible vector x, xᵀAᵀAx = ‖Ax‖²_2 and xᵀAAᵀx = ‖Aᵀx‖²_2.

For A ∈ R^{n×n} and c ∈ R

   tr(cA) = c tr(A) = c 1ᵀλ(A)   (§A.1.1 no.6)

For m a nonnegative integer, (1663)

   det(A^m) = ∏_{i=1}^n λ(A)_i^m        (1288)
   tr(A^m) = ∑_{i=1}^n λ(A)_i^m        (1289)


For A diagonalizable (§A.5), A = SΛS⁻¹, (confer [251, p.255])

   rank A = rank δ(λ(A)) = rank Λ        (1290)

meaning, rank is equal to the number of nonzero eigenvalues in vector

   λ(A) ≜ δ(Λ)        (1291)

by the 0 eigenvalues theorem (§A.7.3.0.1).

(Fan) For A, B ∈ S^n [41, §1.2] (confer (1546))

   tr(AB) ≤ λ(A)ᵀλ(B)        (1292)

with equality (Theobald) when A and B are simultaneously diagonalizable [150] with the same ordering of eigenvalues.

For A ∈ R^{m×n} and B ∈ R^{n×m}

   tr(AB) = tr(BA)        (1293)

and η eigenvalues of the product and commuted product are identical, including their multiplicity; [150, §1.3.20] id est,

   λ(AB)_{1:η} = λ(BA)_{1:η} ,   η ≜ min{m, n}        (1294)

Any eigenvalues remaining are zero. By the 0 eigenvalues theorem (§A.7.3.0.1),

   rank(AB) = rank(BA) ,  AB and BA diagonalizable        (1295)

For any compatible matrices A, B [150, §0.4]

   min{rank A , rank B} ≥ rank(AB)        (1296)

For A, B ∈ S^n_+ (219)

   rank A + rank B ≥ rank(A + B) ≥ min{rank A , rank B} ≥ rank(AB)        (1297)

For A, B ∈ S^n_+ linearly independent (§B.1.1),

   rank A + rank B = rank(A + B) > min{rank A , rank B} ≥ rank(AB)        (1298)


Because R(AᵀA) = R(Aᵀ) and R(AAᵀ) = R(A), for any A ∈ R^{m×n}

   rank(AAᵀ) = rank(AᵀA) = rank A        (1299)

For A ∈ R^{m×n} having no nullspace, and for any B ∈ R^{n×k}

   rank(AB) = rank(B)        (1300)

Proof. For any compatible matrix C, N(CAB) ⊇ N(AB) ⊇ N(B) is obvious. By assumption ∃ A† with A†A = I. Let C = A†; then N(AB) = N(B) and the stated result follows by conservation of dimension (1395).

For A ∈ S^n and any nonsingular matrix Y

   inertia(A) = inertia(YAYᵀ)        (1301)

a.k.a. Sylvester's law of inertia. (1342) [77, §2.4.3]

For A, B ∈ R^{n×n} square, [150, §0.3.5]

   det(AB) = det(BA)        (1302)
   det(AB) = det A det B        (1303)

Yet for A ∈ R^{m×n} and B ∈ R^{n×m} [55, p.72]

   det(I + AB) = det(I + BA)        (1304)

For A, B ∈ S^n, product AB is symmetric if and only if AB is commutative;

   (AB)ᵀ = AB  ⇔  AB = BA        (1305)

Proof. (⇒) Suppose AB = (AB)ᵀ. (AB)ᵀ = BᵀAᵀ = BA. AB = (AB)ᵀ ⇒ AB = BA.
(⇐) Suppose AB = BA. BA = BᵀAᵀ = (AB)ᵀ. AB = BA ⇒ AB = (AB)ᵀ.

Commutativity alone is insufficient for symmetry of the product. [251, p.26]


Diagonalizable matrices A, B ∈ R^{n×n} commute if and only if they are simultaneously diagonalizable. [150, §1.3.12] A product of diagonal matrices is always commutative.

For A, B ∈ R^{n×n} and AB = BA

   xᵀAx ≥ 0 , xᵀBx ≥ 0 ∀x  ⇒  λ(A+Aᵀ)_i λ(B+Bᵀ)_i ≥ 0 ∀i  ⇏  xᵀABx ≥ 0 ∀x        (1306)

the negative result arising because of the schism between the product of eigenvalues λ(A+Aᵀ)_i λ(B+Bᵀ)_i and the eigenvalues of the symmetrized matrix product λ(AB + (AB)ᵀ)_i. For example, X² is generally not positive semidefinite unless matrix X is symmetric; then (1287) applies. Simply substituting symmetric matrices changes the outcome:

For A, B ∈ S^n and AB = BA

   A ≽ 0 , B ≽ 0  ⇒  λ(AB)_i = λ(A)_i λ(B)_i ≥ 0 ∀i  ⇔  AB ≽ 0        (1307)

Positive semidefiniteness of A and B is sufficient but not a necessary condition for positive semidefiniteness of the product AB.

Proof. Because all symmetric matrices are diagonalizable, (§A.5.2) [251, §5.6] we have A = SΛSᵀ and B = T∆Tᵀ, where Λ and ∆ are real diagonal matrices while S and T are orthogonal matrices. Because (AB)ᵀ = AB, then T must equal S, [150, §1.3] and the eigenvalues of A are ordered in the same way as those of B ; id est, λ(A)_i = δ(Λ)_i and λ(B)_i = δ(∆)_i correspond to the same eigenvector.
(⇒) Assume λ(A)_i λ(B)_i ≥ 0 for i = 1…n. AB = SΛ∆Sᵀ is symmetric and has nonnegative eigenvalues contained in diagonal matrix Λ∆ by assumption; hence positive semidefinite by (1273). Now assume A, B ≽ 0. That, of course, implies λ(A)_i λ(B)_i ≥ 0 for all i because all the individual eigenvalues are nonnegative.
(⇐) Suppose AB = SΛ∆Sᵀ ≽ 0. Then Λ∆ ≽ 0 by (1273), and so all products λ(A)_i λ(B)_i must be nonnegative; meaning, sgn(λ(A)) = sgn(λ(B)). We may, therefore, conclude nothing about the semidefiniteness of A and B.


For A, B ∈ S^n and A ≽ 0, B ≽ 0 (Example A.2.1.0.1)

   AB = BA  ⇒  λ(AB)_i = λ(A)_i λ(B)_i ≥ 0 ∀i  ⇒  AB ≽ 0        (1308)
   AB = BA  ⇒  λ(AB)_i ≥ 0 , λ(A)_i λ(B)_i ≥ 0 ∀i  ⇔  AB ≽ 0        (1309)

For A, B ∈ S^n [303, §6.2]

   A ≽ 0  ⇒  tr A ≥ 0        (1310)
   A ≽ 0 , B ≽ 0  ⇒  tr A tr B ≥ tr(AB) ≥ 0        (1311)

Because A ≽ 0, B ≽ 0 ⇒ λ(AB) = λ(√A B √A) ≽ 0 by (1294) and Corollary A.3.1.0.5, then we have tr(AB) ≥ 0.

   A ≽ 0  ⇔  tr(AB) ≥ 0  ∀ B ≽ 0        (323)

For A, B, C ∈ S^n (Löwner)

   A ≼ B , B ≼ C  ⇒  A ≼ C        (1312)
   A ≼ B  ⇔  A + C ≼ B + C        (1313)
   A ≼ B , A ≽ B  ⇒  A = B        (1314)

For A, B ∈ R^{n×n}

   xᵀAx ≥ xᵀBx ∀x  ⇒  tr A ≥ tr B        (1315)

Proof. xᵀAx ≥ xᵀBx ∀x ⇔ λ((A − B) + (A − B)ᵀ)/2 ≽ 0 ⇒ tr(A + Aᵀ − (B + Bᵀ))/2 = tr(A − B) ≥ 0. There is no converse.

For A, B ∈ S^n [303, §6.2, prob.1] (Theorem A.3.1.0.4)

   A ≽ B  ⇒  tr A ≥ tr B        (1316)
   A ≽ B  ⇒  δ(A) ≽ δ(B)        (1317)

There is no converse, and restriction to the positive semidefinite cone does not improve the situation. The all-strict versions hold. From [303, §6.2]

   A ≽ B ≽ 0  ⇒  rank A ≥ rank B        (1318)
   A ≽ B ≽ 0  ⇒  det A ≥ det B ≥ 0        (1319)
   A ≻ B ≽ 0  ⇒  det A > det B ≥ 0        (1320)

For A, B ∈ int S^n_+ [27, §4.2] [150, §7.7.4]

   A ≽ B  ⇔  A⁻¹ ≼ B⁻¹        (1321)


For A, B ∈ S^n [303, §6.2]

   A ≽ B ≽ 0  ⇒  √A ≽ √B        (1322)

For A, B ∈ S^n and AB = BA [303, §6.2, prob.3]

   A ≽ B ≽ 0  ⇒  A^k ≽ B^k ,  k = 1, 2, …        (1323)

A.3.1.0.1 Theorem. Positive semidefinite ordering of eigenvalues.
For A, B ∈ R^{M×M}, place the eigenvalues of each symmetrized matrix into the respective vectors λ((1/2)(A + Aᵀ)), λ((1/2)(B + Bᵀ)) ∈ R^M. Then, [251, §6]

   xᵀAx ≥ 0 ∀x  ⇔  λ(A + Aᵀ) ≽ 0        (1324)
   xᵀAx > 0 ∀x ≠ 0  ⇔  λ(A + Aᵀ) ≻ 0        (1325)

because xᵀ(A − Aᵀ)x = 0. (1261) Now arrange the entries of λ((1/2)(A + Aᵀ)) and λ((1/2)(B + Bᵀ)) in nonincreasing order so λ((1/2)(A + Aᵀ))_1 holds the largest eigenvalue of symmetrized A while λ((1/2)(B + Bᵀ))_1 holds the largest eigenvalue of symmetrized B, and so on. Then [150, §7.7, prob.1, prob.9] for κ ∈ R

   xᵀAx ≥ xᵀBx ∀x  ⇒  λ(A + Aᵀ) ≽ λ(B + Bᵀ)
   xᵀAx ≥ xᵀIx κ ∀x  ⇔  λ((1/2)(A + Aᵀ)) ≽ κ1        (1326)

Now let A, B ∈ S^M have diagonalizations A = QΛQᵀ and B = UΥUᵀ with λ(A) = δ(Λ) and λ(B) = δ(Υ) arranged in nonincreasing order. Then

   A ≽ B  ⇔  λ(A − B) ≽ 0        (1327)
   A ≽ B  ⇒  λ(A) ≽ λ(B)        (1328)
   A ≽ B  ⇍  λ(A) ≽ λ(B)        (1329)
   SᵀAS ≽ B  ⇐  λ(A) ≽ λ(B)        (1330)

where S = QUᵀ. [303, §7.5]  ⋄


A.3.1.0.2 Theorem. (Weyl) Eigenvalues of sum. [150, §4.3]
For A, B ∈ R^{M×M}, place the eigenvalues of each symmetrized matrix into the respective vectors λ((1/2)(A + Aᵀ)), λ((1/2)(B + Bᵀ)) ∈ R^M in nonincreasing order so λ((1/2)(A + Aᵀ))_1 holds the largest eigenvalue of symmetrized A while λ((1/2)(B + Bᵀ))_1 holds the largest eigenvalue of symmetrized B, and so on. Then, for any k ∈ {1…M}

   λ(A + Aᵀ)_k + λ(B + Bᵀ)_M  ≤  λ((A + Aᵀ) + (B + Bᵀ))_k  ≤  λ(A + Aᵀ)_k + λ(B + Bᵀ)_1        (1331)  ⋄

Weyl's theorem establishes positive semidefiniteness of a sum of positive semidefinite matrices. Because S^M_+ is a convex cone (§2.9.0.0.1), then by (145)

   A, B ≽ 0  ⇒  ζA + ξB ≽ 0 for all ζ, ξ ≥ 0        (1332)

A.3.1.0.3 Corollary. Eigenvalues of sum and difference. [150, §4.3]
For A ∈ S^M and B ∈ S^M_+ , place the eigenvalues of each matrix into the respective vectors λ(A), λ(B) ∈ R^M in nonincreasing order so λ(A)_1 holds the largest eigenvalue of A while λ(B)_1 holds the largest eigenvalue of B, and so on. Then, for any k ∈ {1…M}

   λ(A − B)_k ≤ λ(A)_k ≤ λ(A + B)_k        (1333)  ⋄

When B is rank-one positive semidefinite, the eigenvalues interlace; id est, for B = qqᵀ

   λ(A)_{k−1} ≤ λ(A − qqᵀ)_k ≤ λ(A)_k ≤ λ(A + qqᵀ)_k ≤ λ(A)_{k+1}        (1334)


A.3.1.0.4 Theorem. Positive (semi)definite principal submatrices. A.10

A ∈ S^M is positive semidefinite if and only if all M principal submatrices of dimension M−1 are positive semidefinite and det A is nonnegative.

A ∈ S^M is positive definite if and only if any one principal submatrix of dimension M−1 is positive definite and det A is positive.  ⋄

If any one principal submatrix of dimension M−1 is not positive definite, conversely, then A can neither be. Regardless of symmetry, if A ∈ R^{M×M} is positive (semi)definite, then the determinant of each and every principal submatrix is (nonnegative) positive. [202, §1.3.1]

A.3.1.0.5 Corollary. Positive (semi)definite symmetric products. [150, p.399]

If A ∈ S^M is positive definite and any particular dimensionally compatible matrix Z has no nullspace, then ZᵀAZ is positive definite.

If matrix A ∈ S^M is positive (semi)definite then, for any matrix Z of compatible dimension, ZᵀAZ is positive semidefinite.

A ∈ S^M is positive (semi)definite if and only if there exists a nonsingular Z such that ZᵀAZ is positive (semi)definite.

If A ∈ S^M is positive semidefinite and singular it remains possible, for some skinny Z ∈ R^{M×N} with N < M, that ZᵀAZ becomes positive definite.  ⋄


We can deduce from these, given nonsingular matrix Z and any particular dimensionally compatible Y : matrix A ∈ S^M is positive semidefinite if and only if [ Zᵀ ; Yᵀ ] A [ Z Y ] is positive semidefinite. In other words, from the Corollary it follows: for dimensionally compatible Z

   A ≽ 0  ⇔  ZᵀAZ ≽ 0 and Zᵀ has a left inverse

Products such as Z†Z and ZZ† are symmetric and positive semidefinite although, given A ≽ 0, Z†AZ and ZAZ† are neither necessarily symmetric nor positive semidefinite.

A.3.1.0.6 Theorem. Symmetric projector semidefinite. [17, §III] [18, §6] [163, p.55]
For symmetric idempotent matrices P and R

   P, R ≽ 0
   P ≽ R  ⇔  R(P) ⊇ R(R)  ⇔  N(P) ⊆ N(R)        (1335)

Projector P is never positive definite [253, §6.5, prob.20] unless it is the identity matrix.  ⋄

A.3.1.0.7 Theorem. Symmetric positive semidefinite.
Given real matrix Ψ with rank Ψ = 1

   Ψ ≽ 0  ⇔  Ψ = uuᵀ        (1336)

where u is some real vector; id est, symmetry is necessary and sufficient for positive semidefiniteness of a rank-1 matrix.  ⋄

Proof. Any rank-one matrix must have the form Ψ = uvᵀ. (§B.1) Suppose Ψ is symmetric; id est, v = u. For all y ∈ R^M, yᵀuuᵀy ≥ 0. Conversely, suppose uvᵀ is positive semidefinite. We know that can hold if and only if uvᵀ + vuᵀ ≽ 0 ⇔ for all normalized y ∈ R^M, 2yᵀuvᵀy ≥ 0 ; but that is possible only if v = u.

The same does not hold true for matrices of higher rank, as Example A.2.1.0.1 shows.


A.4 Schur complement

Consider the Schur-form partitioned matrix G : Given Aᵀ = A and Cᵀ = C, then [44]

   G = [ A B ; Bᵀ C ] ≽ 0
   ⇔  A ≽ 0 ,  Bᵀ(I − AA†) = 0 ,  C − BᵀA†B ≽ 0
   ⇔  C ≽ 0 ,  B(I − CC†) = 0 ,  A − BC†Bᵀ ≽ 0        (1337)

where A† denotes the Moore-Penrose (pseudo)inverse (§E). In the first instance, I − AA† is a symmetric projection matrix orthogonally projecting on N(Aᵀ). (1704) It is apparently required

   R(B) ⊥ N(Aᵀ)        (1338)

which precludes A = 0 when B is any nonzero matrix. Note that A ≻ 0 ⇒ A† = A⁻¹ ; thereby, the projection matrix vanishes. Likewise, in the second instance, I − CC† projects orthogonally on N(Cᵀ). It is required

   R(Bᵀ) ⊥ N(Cᵀ)        (1339)

which precludes C = 0 for B nonzero. Again, C ≻ 0 ⇒ C† = C⁻¹. So we get, for A or C nonsingular,

   G = [ A B ; Bᵀ C ] ≽ 0
   ⇔  A ≻ 0 , C − BᵀA⁻¹B ≽ 0   or   C ≻ 0 , A − BC⁻¹Bᵀ ≽ 0        (1340)

When A is full-rank then, for all B of compatible dimension, R(B) is in R(A). Likewise, when C is full-rank, R(Bᵀ) is in R(C). Thus the flavor, for A and C nonsingular,

   G = [ A B ; Bᵀ C ] ≻ 0
   ⇔  A ≻ 0 , C − BᵀA⁻¹B ≻ 0
   ⇔  C ≻ 0 , A − BC⁻¹Bᵀ ≻ 0        (1341)

where C − BᵀA⁻¹B is called the Schur complement of A in G, while the Schur complement of C in G is A − BC⁻¹Bᵀ. [103, §4.8]
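A quick numerical illustration of (1340) in base Matlab/Octave: a matrix G built positive semidefinite by construction has, for positive definite block A, a positive semidefinite Schur complement.

   n = 4;  M = randn(2*n);  G = M'*M;           % G >= 0 by construction
   A = G(1:n,1:n);  B = G(1:n,n+1:end);  C = G(n+1:end,n+1:end);
   min(eig(A))                                  % > 0 almost surely
   min(eig(C - B'*(A\B)))                       % >= 0, up to round-off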


Origin of the term Schur complement is from complementary inertia: [77, §2.4.4] Define

   inertia(G ∈ S^M) ≜ {p, z, n}        (1342)

where p, z, n respectively represent number of positive, zero, and negative eigenvalues of G ; id est,

   M = p + z + n        (1343)

Then, when A is invertible,

   inertia(G) = inertia(A) + inertia(C − BᵀA⁻¹B)        (1344)

and when C is invertible,

   inertia(G) = inertia(C) + inertia(A − BC⁻¹Bᵀ)        (1345)

When A = C = 0, denoting by σ(B) ∈ R^m_+ the nonincreasingly ordered singular values of matrix B ∈ R^{m×m}, then we have the eigenvalues [41, §1.2, prob.17]

   λ(G) = λ([ 0 B ; Bᵀ 0 ]) = [ σ(B) ; −Ξσ(B) ]        (1346)

and

   inertia(G) = inertia(BᵀB) + inertia(−BᵀB)        (1347)

where Ξ is the order-reversing permutation matrix defined in (1533).

A.4.0.0.1 Example. Nonnegative polynomial. [27, p.163]
Schur-form positive semidefiniteness is necessary and sufficient for quadratic polynomial nonnegativity; videlicet, for all compatible x

   [ xᵀ 1 ][ A b ; bᵀ c ][ x ; 1 ] ≥ 0  ⇔  xᵀAx + 2bᵀx + c ≥ 0        (1348)

A.4.0.0.2 Example. Sparse Schur conditions.
Setting matrix A to the identity simplifies the Schur conditions. One consequence relates the definiteness of three quantities:

   [ I B ; Bᵀ C ] ≽ 0  ⇔  C − BᵀB ≽ 0  ⇔  [ I 0 ; 0ᵀ C − BᵀB ] ≽ 0        (1349)


A.4.0.0.3 Exercise. Eigenvalues λ of sparse Schur-form.
Prove: given C − BᵀB = 0, for B ∈ R^{m×n} and C ∈ S^n

   λ([ I B ; Bᵀ C ])_i = { 1 + λ(C)_i ,  1 ≤ i ≤ n
                           1 ,           n < i ≤ m
                           0 ,           otherwise }        (1350)

A.4.0.0.4 Theorem. Rank of partitioned matrices.
When symmetric matrix A is invertible and C is symmetric,

   rank [ A B ; Bᵀ C ] = rank [ A 0 ; 0ᵀ C − BᵀA⁻¹B ]
                       = rank A + rank(C − BᵀA⁻¹B)        (1351)

equals rank of a block on the main diagonal plus rank of its Schur complement [303, §2.2, prob.7]. Similarly, when symmetric matrix C is invertible and A is symmetric,

   rank [ A B ; Bᵀ C ] = rank [ A − BC⁻¹Bᵀ 0 ; 0ᵀ C ]
                       = rank(A − BC⁻¹Bᵀ) + rank C        (1352)  ⋄

Proof. The first assertion (1351) holds if and only if [150, §0.4.6(c)]

   ∃ nonsingular X, Y   X[ A B ; Bᵀ C ]Y = [ A 0 ; 0ᵀ C − BᵀA⁻¹B ]        (1353)

Let [150, §7.7.6]

   Y = Xᵀ = [ I −A⁻¹B ; 0ᵀ I ]        (1354)

A.4.0.0.5 Lemma. Rank of Schur-form block. [92] [90]
Matrix B ∈ R^{m×n} has rank B ≤ ρ if and only if there exist matrices A ∈ S^m and C ∈ S^n such that

   rank [ A 0 ; 0ᵀ C ] ≤ 2ρ   and   G = [ A B ; Bᵀ C ] ≽ 0        (1355)  ⋄


Schur-form positive semidefiniteness alone implies rank A ≥ rank B and rank C ≥ rank B. But, even in absence of semidefiniteness, we must always have rank G ≥ rank A, rank B, rank C by fundamental linear algebra.

A.4.1 Determinant

   G = [ A B ; Bᵀ C ]        (1356)

We consider again a matrix G partitioned like (1337), but not necessarily positive (semi)definite, where A and C are symmetric.

When A is invertible,

   det G = det A det(C − BᵀA⁻¹B)        (1357)

When C is invertible,

   det G = det C det(A − BC⁻¹Bᵀ)        (1358)

When B is full-rank and skinny, C = 0, and A ≽ 0, then [46, §10.1.1]

   det G ≠ 0  ⇔  A + BBᵀ ≻ 0        (1359)

When B is a (column) vector, then for all C ∈ R and all A of dimension compatible with G

   det G = det(A)C − BᵀA_cofᵀB        (1360)

while for C ≠ 0

   det G = C det(A − (1/C)BBᵀ)        (1361)

where A_cof is the matrix of cofactors [251, §4] corresponding to A.

When B is full-rank and fat, A = 0, and C ≽ 0, then

   det G ≠ 0  ⇔  C + BᵀB ≻ 0        (1362)

When B is a row vector, then for A ≠ 0 and all C of dimension compatible with G

   det G = A det(C − (1/A)BᵀB)        (1363)

while for all A ∈ R

   det G = det(C)A − BC_cofᵀBᵀ        (1364)

where C_cof is the matrix of cofactors corresponding to C.


A.5 eigen decomposition

When a square matrix X ∈ R^{m×m} is diagonalizable, [251, §5.6] then

   X = SΛS⁻¹ = [ s_1 ⋯ s_m ] Λ [ w_1ᵀ ; ⋮ ; w_mᵀ ] = ∑_{i=1}^m λ_i s_i w_iᵀ        (1365)

where s_i ∈ C^m are linearly independent (right-)eigenvectors A.12 constituting the columns of S ∈ C^{m×m} defined by

   XS = SΛ        (1366)

w_iᵀ ∈ C^m are linearly independent left-eigenvectors of X constituting the rows of S⁻¹ defined by [150]

   S⁻¹X = ΛS⁻¹        (1367)

and where {λ_i ∈ C} are eigenvalues (populating diagonal matrix Λ ∈ C^{m×m}) corresponding to both left and right eigenvectors; id est, λ(X) = λ(Xᵀ).

There is no connection between diagonalizability and invertibility of X. [251, §5.2] Diagonalizability is guaranteed by a full set of linearly independent eigenvectors, whereas invertibility is guaranteed by all nonzero eigenvalues.

   distinct eigenvalues  ⇒  l.i. eigenvectors  ⇔  diagonalizable
   not diagonalizable  ⇒  repeated eigenvalue        (1368)

A.5.0.0.1 Theorem. Real eigenvector.
Eigenvectors of a real matrix corresponding to real eigenvalues must be real.  ⋄

Proof. Ax = λx. Given λ = λ*,  x^H Ax = λx^H x = λ‖x‖² = xᵀAx*  ⇒  x = x*, where x^H = x*ᵀ. The converse is equally simple.

A.5.0.1 Uniqueness

From the fundamental theorem of algebra it follows: eigenvalues, including their multiplicity, for a given square matrix are unique; meaning, there is no other set of eigenvalues for that matrix. (Conversely, many different matrices may share the same unique set of eigenvalues.)

Uniqueness of eigenvectors, in contrast, disallows multiplicity of the same direction.

A.12 Eigenvectors must, of course, be nonzero. The prefix eigen is from the German; in this context meaning, something akin to "characteristic". [248, p.14]


A.5.0.1.1 Definition. Unique eigenvectors.
When eigenvectors are unique, we mean: unique to within a real nonzero scaling, and their directions are distinct.  △

If S is a matrix of eigenvectors of X as in (1365), for example, then −S is certainly another matrix of eigenvectors decomposing X with the same eigenvalues.

For any square matrix, the eigenvector corresponding to a distinct eigenvalue is unique; [248, p.220]

   distinct eigenvalues  ⇒  eigenvectors unique        (1369)

Eigenvectors corresponding to a repeated eigenvalue are not unique for a diagonalizable matrix;

   repeated eigenvalue  ⇒  eigenvectors not unique        (1370)

Proof follows from the observation: any linear combination of distinct eigenvectors of diagonalizable X, corresponding to a particular eigenvalue, produces another eigenvector. For eigenvalue λ whose multiplicity A.13 dim N(X − λI) exceeds 1, in other words, any choice of independent vectors from N(X − λI) (of the same multiplicity) constitutes eigenvectors corresponding to λ.

Caveat diagonalizability insures linear independence which implies existence of distinct eigenvectors. We may conclude, for diagonalizable matrices,

   distinct eigenvalues  ⇔  eigenvectors unique        (1371)

A.5.1 eigenmatrix

The (right-)eigenvectors {s_i} are naturally orthogonal to the left-eigenvectors {w_i} except, for i = 1…m, w_iᵀs_i = 1 ; called a biorthogonality condition [277, §2.2.4] [150] because neither set of left or right eigenvectors is necessarily an orthogonal set. Consequently, each dyad from a diagonalization is an independent (§B.1.1) nonorthogonal projector because

A.13 For a diagonalizable matrix, algebraic multiplicity is the same as geometric multiplicity. [248, p.15]


   s_i w_iᵀ s_i w_iᵀ = s_i w_iᵀ        (1372)

(whereas the dyads of singular value decomposition are not inherently projectors (confer (1376))).

The dyads of eigen decomposition can be termed eigenmatrices because

   X s_i w_iᵀ = λ_i s_i w_iᵀ        (1373)

A.5.2 Symmetric matrix diagonalization

The set of normal matrices is, precisely, that set of all real matrices having a complete orthonormal set of eigenvectors; [303, §8.1] [253, prob.10.2.31] id est, any matrix X for which XXᵀ = XᵀX ; [110, §7.1.3] [248, p.3] e.g., orthogonal and circulant matrices [118]. All normal matrices are diagonalizable. A symmetric matrix is a special normal matrix whose eigenvalues must be real and whose eigenvectors can be chosen to make a real orthonormal set; [253, §6.4] [251, p.315] id est, for X ∈ S^m

   X = SΛSᵀ = [ s_1 ⋯ s_m ] Λ [ s_1ᵀ ; ⋮ ; s_mᵀ ] = ∑_{i=1}^m λ_i s_i s_iᵀ        (1374)

where δ²(Λ) = Λ ∈ S^m (§A.1) and S⁻¹ = Sᵀ ∈ R^{m×m} (orthogonal matrix, §B.5) because of symmetry: SΛS⁻¹ = S⁻ᵀΛSᵀ.

Because the arrangement of eigenvectors and their corresponding eigenvalues is arbitrary, we almost always arrange eigenvalues in nonincreasing order as is the convention for singular value decomposition. Then to diagonalize a symmetric matrix that is already a diagonal matrix, orthogonal matrix S becomes a permutation matrix.

A.5.2.1 Positive semidefinite matrix square root

When X ∈ S^m_+ , its unique positive semidefinite matrix square root is defined

   √X ≜ S√Λ Sᵀ ∈ S^m_+        (1375)

where the square root of nonnegative diagonal matrix √Λ is taken entrywise and positive. Then X = √X √X.
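The square root (1375) is immediate to compute from a symmetric diagonalization; a base Matlab/Octave sketch (clipping tiny negative eigenvalues that arise from round-off):

   X = randn(5);  X = X*X';                    % X in S^5_+
   [S,L] = eig((X + X')/2);                    % diagonalization (1374)
   sqrtX = S*diag(sqrt(max(diag(L),0)))*S';    % unique psd square root (1375)
   norm(sqrtX*sqrtX - X,'fro')                 % ~ 0 ; agrees with sqrtm(X)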


A.6 Singular value decomposition, SVD

A.6.1 Compact SVD

[110, §2.5.4] For any A ∈ R^{m×n}

   A = UΣQᵀ = [ u_1 ⋯ u_η ] Σ [ q_1ᵀ ; ⋮ ; q_ηᵀ ] = ∑_{i=1}^η σ_i u_i q_iᵀ

   U ∈ R^{m×η} ,  Σ ∈ R^{η×η} ,  Q ∈ R^{n×η}        (1376)

where U and Q are always skinny-or-square each having orthonormal columns, and where

   η ≜ min{m, n}        (1377)

Square matrix Σ is diagonal (§A.1.1)

   δ²(Σ) = Σ ∈ R^{η×η}        (1378)

holding the singular values {σ_i ∈ R} of A which are always arranged in nonincreasing order by convention and are related to eigenvalues by A.14

   σ(A)_i = σ(Aᵀ)_i = { √λ(AᵀA)_i = √λ(AAᵀ)_i = λ(√(AᵀA))_i = λ(√(AAᵀ))_i > 0 ,  1 ≤ i ≤ ρ
                        0 ,  ρ < i ≤ η }        (1379)

of which the last η − ρ are 0, A.15 where

   ρ ≜ rank A = rank Σ        (1380)

A point sometimes lost: Any real matrix may be decomposed in terms of its real singular values σ(A) ∈ R^η and real matrices U and Q as in (1376), where [110, §2.5.3]

   R{u_i | σ_i ≠ 0} = R(A)
   R{u_i | σ_i = 0} ⊆ N(Aᵀ)
   R{q_i | σ_i ≠ 0} = R(Aᵀ)
   R{q_i | σ_i = 0} ⊆ N(A)        (1381)

A.14 When A is normal, σ(A) = |λ(A)|. [303, §8.1]
A.15 For η = n, σ(A) = √λ(AᵀA) = λ(√(AᵀA)) where λ denotes eigenvalues. For η = m, σ(A) = √λ(AAᵀ) = λ(√(AAᵀ)).


A.6.2 Subcompact SVD

Some authors allow only nonzero singular values. In that case the compact decomposition can be made smaller; it can be redimensioned in terms of rank ρ because, for any A ∈ R^{m×n}

   ρ = rank A = rank Σ = max{i ∈ {1…η} | σ_i ≠ 0} ≤ η        (1382)

There are η singular values. For any flavor SVD, rank is equivalent to the number of nonzero singular values on the main diagonal of Σ. Now

   A = UΣQᵀ = [ u_1 ⋯ u_ρ ] Σ [ q_1ᵀ ; ⋮ ; q_ρᵀ ] = ∑_{i=1}^ρ σ_i u_i q_iᵀ

   U ∈ R^{m×ρ} ,  Σ ∈ R^{ρ×ρ} ,  Q ∈ R^{n×ρ}        (1383)

where the main diagonal of diagonal matrix Σ has no 0 entries, and

   R{u_i} = R(A)
   R{q_i} = R(Aᵀ)        (1384)

A.6.3 Full SVD

Another common and useful expression of the SVD makes U and Q square; making the decomposition larger than compact SVD. Completing the nullspace bases in U and Q from (1381) provides what is called the full singular value decomposition of A ∈ R^{m×n} [251, App.A]. Orthonormal matrices U and Q become orthogonal matrices (§B.5):

   R{u_i | σ_i ≠ 0} = R(A)
   R{u_i | σ_i = 0} = N(Aᵀ)
   R{q_i | σ_i ≠ 0} = R(Aᵀ)
   R{q_i | σ_i = 0} = N(A)        (1385)


For any matrix A having rank ρ (= rank Σ)

   A = UΣQᵀ = [ u_1 ⋯ u_m ] Σ [ q_1ᵀ ; ⋮ ; q_nᵀ ] = ∑_{i=1}^η σ_i u_i q_iᵀ

     = [ m×ρ basis R(A)   m×(m−ρ) basis N(Aᵀ) ] [ σ_1 ; σ_2 ; ⋱ ] [ (n×ρ basis R(Aᵀ))ᵀ ; (n×(n−ρ) basis N(A))ᵀ ]

   U ∈ R^{m×m} ,  Σ ∈ R^{m×n} ,  Q ∈ R^{n×n}        (1386)

where upper limit of summation η is defined in (1377). Matrix Σ is no longer necessarily square, now padded with respect to (1378) by m−η zero rows or n−η zero columns; the nonincreasingly ordered (possibly 0) singular values appear along its main diagonal as for compact SVD (1379).

An important geometrical interpretation of SVD is given in Figure 117 for m = n = 2 : The image of the unit sphere under any m × n matrix multiplication is an ellipse. Considering the three factors of the SVD separately, note that Qᵀ is a pure rotation of the circle. Figure 117 shows how the axes q_1 and q_2 are first rotated by Qᵀ to coincide with the coordinate axes. Second, the circle is stretched by Σ in the directions of the coordinate axes to form an ellipse. The third step rotates the ellipse by U into its final position. Note how q_1 and q_2 are rotated to end up as u_1 and u_2 , the principal axes of the final ellipse. A direct calculation shows that Aq_j = σ_j u_j. Thus q_j is first rotated to coincide with the j-th coordinate axis, stretched by a factor σ_j, and then rotated to point in the direction of u_j. All of this is beautifully illustrated for 2×2 matrices by the Matlab code eigshow.m (see [250]).

A direct consequence of the geometric interpretation is that the largest singular value σ_1 measures the "magnitude" of A (its 2-norm):

   ‖A‖_2 = sup_{‖x‖_2 = 1} ‖Ax‖_2 = σ_1        (1387)

This means that ‖A‖_2 is the length of the longest principal semiaxis of the ellipse.


Figure 117: Geometrical interpretation of full SVD [201]: Image of circle {x ∈ R² | ‖x‖_2 = 1} under matrix multiplication Ax is, in general, an ellipse. For the example illustrated, U ≜ [u_1 u_2] ∈ R^{2×2}, Q ≜ [q_1 q_2] ∈ R^{2×2}. [Plot: the circle with axes q_1 , q_2 is mapped by Qᵀ, then stretched by Σ, then rotated by U into an ellipse with principal semiaxes along u_1 , u_2.]
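Relations (1379) and (1387) check out numerically in base Matlab/Octave:

   A = randn(3,5);
   s = svd(A);                                   % eta = min(3,5) = 3 values
   l = sort(sqrt(abs(eig(A*A'))),'descend');     % sqrt eigenvalues of AA'
   norm(s - l)                                   % ~ 0   (1379)
   abs(norm(A,2) - s(1))                         % ~ 0   (1387)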


Expressions for U, Q, and Σ follow readily from (1386),

   AAᵀU = UΣΣᵀ   and   AᵀAQ = QΣᵀΣ        (1388)

demonstrating that the columns of U are the eigenvectors of AAᵀ and the columns of Q are the eigenvectors of AᵀA.  −Neil Muller et alii [201]

A.6.4 Pseudoinverse by SVD

Matrix pseudoinverse (§E) is nearly synonymous with singular value decomposition because of the elegant expression, given A = UΣQᵀ

   A† = QΣ†ᵀUᵀ ∈ R^{n×m}        (1389)

that applies to all three flavors of SVD, where Σ† simply inverts nonzero entries of matrix Σ.

Given symmetric matrix A ∈ S^n and its diagonalization A = SΛSᵀ (§A.5.2), its pseudoinverse simply inverts all nonzero eigenvalues;

   A† = SΛ†Sᵀ        (1390)

A.6.5 SVD of symmetric matrices

A.6.5.0.1 Definition. Step function. (confer §4.3.2.0.1)
Define the signum-like quasilinear function ψ : R^n → R^n that takes value 1 corresponding to a 0-valued entry in its argument:

   ψ(a) ≜ [ lim_{x_i→a_i} x_i/|x_i| = { 1 , a_i ≥ 0 ; −1 , a_i < 0 } ,  i = 1…n ] ∈ R^n        (1391)  △

Eigenvalue signs of a symmetric matrix having diagonalization A = SΛSᵀ (1374) can be absorbed either into real U or real Q from the full SVD; [265, p.34] (confer §C.4.2.1)

   A = SΛSᵀ = Sδ(ψ(δ(Λ))) |Λ| Sᵀ ≜ U ΣQᵀ ∈ S^n        (1392)

or

   A = SΛSᵀ = S|Λ| δ(ψ(δ(Λ))) Sᵀ ≜ UΣ Qᵀ ∈ S^n        (1393)

where Σ = |Λ| , denoting entrywise absolute value of diagonal matrix Λ.
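Returning to expression (1389), the pseudoinverse computed from a compact SVD coincides with Matlab's pinv; a base-language spot-check:

   A = randn(4,6);
   [U,S,Q] = svd(A,'econ');            % compact SVD (1376)
   Adag = Q*pinv(S)*U';                % (1389): pinv(S) inverts nonzero sigma
   norm(Adag - pinv(A),'fro')          % ~ 0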


A.7 Zeros

A.7.1 norm zero

For any given norm, by definition,

   ‖x‖_ℓ = 0  ⇔  x = 0        (1394)

A.7.2 0 entry

If a positive semidefinite matrix A = [A_ij] ∈ R^{n×n} has a 0 entry A_ii on its main diagonal, then A_ij + A_ji = 0 ∀j. [202, §1.3.1]

Any symmetric positive semidefinite matrix having a 0 entry on its main diagonal must be 0 along the entire row and column to which that 0 entry belongs. [110, §4.2.8] [150, §7.1, prob.2]

A.7.3 0 eigenvalues theorem

This theorem is simple, powerful, and widely applicable:

A.7.3.0.1 Theorem. Number of 0 eigenvalues.
For any matrix A ∈ R^{m×n}

   rank(A) + dim N(A) = n        (1395)

by conservation of dimension. [150, §0.4.4]

For any square matrix A ∈ R^{m×m}, the number of 0 eigenvalues is at least equal to dim N(A)

   dim N(A) ≤ number of 0 eigenvalues ≤ m        (1396)

while the eigenvectors corresponding to those 0 eigenvalues belong to N(A). [251, §5.1] A.16

A.16 We take as given the well-known fact that the number of 0 eigenvalues cannot be less than the dimension of the nullspace. We offer an example of the converse:

   A = [ 1 0 1 0 ; 0 0 1 0 ; 0 0 0 0 ; 1 0 0 0 ]

dim N(A) = 2, λ(A) = [0 0 0 1]ᵀ ; three eigenvectors in the nullspace but only two are independent. The right-hand side of (1396) is tight for nonzero matrices; e.g., (§B.1) dyad uvᵀ ∈ R^{m×m} has m 0-eigenvalues when u ∈ v^⊥.


For diagonalizable matrix A (§A.5), the number of 0 eigenvalues is precisely dim N(A) while the corresponding eigenvectors span N(A). The real and imaginary parts of the eigenvectors remaining span R(A).

(TRANSPOSE.) Likewise, for any matrix A ∈ R^{m×n}

   rank(Aᵀ) + dim N(Aᵀ) = m        (1397)

For any square A ∈ R^{m×m}, the number of 0 eigenvalues is at least equal to dim N(Aᵀ) = dim N(A) while the left-eigenvectors (eigenvectors of Aᵀ) corresponding to those 0 eigenvalues belong to N(Aᵀ).

For diagonalizable A, the number of 0 eigenvalues is precisely dim N(Aᵀ) while the corresponding left-eigenvectors span N(Aᵀ). The real and imaginary parts of the left-eigenvectors remaining span R(Aᵀ).  ⋄

Proof. First we show, for a diagonalizable matrix, the number of 0 eigenvalues is precisely the dimension of its nullspace while the eigenvectors corresponding to those 0 eigenvalues span the nullspace:

Any diagonalizable matrix A ∈ R^{m×m} must possess a complete set of linearly independent eigenvectors. If A is full-rank (invertible), then all m = rank(A) eigenvalues are nonzero. [251, §5.1]

Suppose rank(A) < m. Then dim N(A) = m − rank(A). Thus there is a set of m − rank(A) linearly independent vectors spanning N(A). Each of those can be an eigenvector associated with a 0 eigenvalue because A is diagonalizable ⇔ ∃ m linearly independent eigenvectors. [251, §5.2] Eigenvectors of a real matrix corresponding to 0 eigenvalues must be real. A.17 Thus A has at least m − rank(A) eigenvalues equal to 0.

Now suppose A has more than m − rank(A) eigenvalues equal to 0. Then there are more than m − rank(A) linearly independent eigenvectors associated with 0 eigenvalues, and each of those eigenvectors must be in N(A). Thus there are more than m − rank(A) linearly independent vectors in N(A) ; a contradiction.

A.17 Let * denote complex conjugation. Suppose A = A* and As_i = 0. Then s_i = s*_i ⇒ As_i = As*_i ⇒ As*_i = 0. Conversely, As*_i = 0 ⇒ As_i = As*_i ⇒ s_i = s*_i.


Therefore diagonalizable $A$ has $\operatorname{rank}(A)$ nonzero eigenvalues and exactly $m - \operatorname{rank}(A)$ eigenvalues equal to 0 whose corresponding eigenvectors span $\mathcal{N}(A)$.

By similar argument, the left-eigenvectors corresponding to 0 eigenvalues span $\mathcal{N}(A^T)$.

Next we show when $A$ is diagonalizable, the real and imaginary parts of its eigenvectors (corresponding to nonzero eigenvalues) span $\mathcal{R}(A)$:
The right-eigenvectors of a diagonalizable matrix $A\in\mathbb{R}^{m\times m}$ are linearly independent if and only if the left-eigenvectors are. So, matrix $A$ has a representation in terms of its right- and left-eigenvectors; from the diagonalization (1365), assuming 0 eigenvalues are ordered last,

$$A = \sum_{i=1}^m \lambda_i s_i w_i^T = \sum_{\substack{i=1\\ \lambda_i\neq 0}}^{k\,\leq\, m} \lambda_i s_i w_i^T \tag{1398}$$

From the linearly independent dyads theorem (§B.1.1.0.2), the dyads $\{s_i w_i^T\}$ must be independent because each set of eigenvectors are; hence $\operatorname{rank} A = k$, the number of nonzero eigenvalues. Complex eigenvectors and eigenvalues are common for real matrices, and must come in complex conjugate pairs for the summation to remain real. Assume that conjugate pairs of eigenvalues appear in sequence. Given any particular conjugate pair from (1398), we get the partial summation

$$\lambda_i s_i w_i^T + \lambda_i^* s_i^* w_i^{*T} = 2\operatorname{Re}(\lambda_i s_i w_i^T) = 2\left(\operatorname{Re} s_i\,\operatorname{Re}(\lambda_i w_i^T) - \operatorname{Im} s_i\,\operatorname{Im}(\lambda_i w_i^T)\right) \tag{1399}$$

where^{A.18} $\lambda_i^* \triangleq \lambda_{i+1}$, $s_i^* \triangleq s_{i+1}$, and $w_i^* \triangleq w_{i+1}$. Then (1398) is equivalently written

$$A = 2\!\!\!\sum_{\substack{i\,:\ \lambda\in\mathbb{C}\\ \lambda_i\neq 0}}\!\!\! \operatorname{Re} s_{2i}\operatorname{Re}(\lambda_{2i} w_{2i}^T) - \operatorname{Im} s_{2i}\operatorname{Im}(\lambda_{2i} w_{2i}^T) \;+\; \sum_{\substack{j\,:\ \lambda\in\mathbb{R}\\ \lambda_j\neq 0}}\!\! \lambda_j s_j w_j^T \tag{1400}$$

The summation (1400) shows: $A$ is a linear combination of real and imaginary parts of its right-eigenvectors corresponding to nonzero eigenvalues. The $k$ vectors $\{\operatorname{Re} s_i\in\mathbb{R}^m,\ \operatorname{Im} s_i\in\mathbb{R}^m \mid \lambda_i\neq 0\,,\ i\in\{1\ldots m\}\}$ must therefore span the range of diagonalizable matrix $A$.

The argument is similar regarding the span of the left-eigenvectors. ■

A.18 The complex conjugate of $w$ is denoted $w^*$, while its conjugate transpose is denoted by $w^H = w^{*T}$.


A.7.4 0 trace and matrix product

For $X, A\in\mathbb{S}^M_+$ [27, §2.6.1, exer.2.8] [271, §3.1]

$$\operatorname{tr}(XA) = 0 \;\Leftrightarrow\; XA = AX = 0 \tag{1401}$$

Proof. (⇐) Suppose $XA = AX = 0$. Then $\operatorname{tr}(XA) = 0$ is obvious.
(⇒) Suppose $\operatorname{tr}(XA) = 0$. $\operatorname{tr}(XA) = \operatorname{tr}(\sqrt{A}\,X\sqrt{A})$ whose argument is positive semidefinite by Corollary A.3.1.0.5. Trace of any square matrix is equivalent to the sum of its eigenvalues. Eigenvalues of a positive semidefinite matrix can total 0 if and only if each and every nonnegative eigenvalue is 0. The only feasible positive semidefinite matrix, having all 0 eigenvalues, resides at the origin; (confer (1425)) id est,

$$\sqrt{A}\,X\sqrt{A} = \left(\sqrt{X}\sqrt{A}\right)^T\!\sqrt{X}\sqrt{A} = 0 \tag{1402}$$

implying $\sqrt{X}\sqrt{A} = 0$ which in turn implies $\sqrt{X}(\sqrt{X}\sqrt{A})\sqrt{A} = XA = 0$. Arguing similarly yields $AX = 0$. ■

Diagonalizable matrices $A$ and $X$ are simultaneously diagonalizable if and only if they are commutative under multiplication; [150, §1.3.12] id est, iff they share a complete set of eigenvectors.

A.7.4.1 an equivalence in nonisomorphic spaces

Identity (1401) leads to an unusual equivalence relating convex geometry to traditional linear algebra: The convex sets, given $A \succeq 0$

$$\{X \mid \langle X, A\rangle = 0\} \cap \{X \succeq 0\} \;\equiv\; \{X \mid \mathcal{N}(X) \supseteq \mathcal{R}(A)\} \cap \{X \succeq 0\} \tag{1403}$$

(one expressed in terms of a hyperplane, the other in terms of nullspace and range) are equivalent only when symmetric matrix $A$ is positive semidefinite. We might apply this equivalence to the geometric center subspace, for example,

$$\mathbb{S}^M_c = \{Y\in\mathbb{S}^M \mid Y\mathbf{1} = 0\} = \{Y\in\mathbb{S}^M \mid \mathcal{N}(Y) \supseteq \mathbf{1}\} = \{Y\in\mathbb{S}^M \mid \mathcal{R}(Y) \subseteq \mathcal{N}(\mathbf{1}^T)\} \tag{1792}$$

from which we derive (confer (828))

$$\mathbb{S}^M_c \cap \mathbb{S}^M_+ \;\equiv\; \{X \succeq 0 \mid \langle X\,,\ \mathbf{1}\mathbf{1}^T\rangle = 0\} \tag{1404}$$
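A numerical illustration of (1401), as a NumPy sketch (illustrative only): two positive semidefinite matrices built on mutually orthogonal ranges have $\operatorname{tr}(XA) = 0$, and the zero products follow.

```python
import numpy as np

# Build PSD X, A with orthogonal ranges; then tr(XA) = 0 and XA = AX = 0, per (1401).
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # orthonormal basis of R^4
X = Q[:, :2] @ Q[:, :2].T                          # PSD, range = span of first 2 columns
A = Q[:, 2:] @ Q[:, 2:].T                          # PSD, range = span of last 2 columns
assert abs(np.trace(X @ A)) < 1e-12
assert np.allclose(X @ A, 0) and np.allclose(A @ X, 0)
```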


A.7.5 Zero definite

The domain over which an arbitrary real matrix $A$ is zero definite can exceed its left and right nullspaces. For any positive semidefinite matrix $A\in\mathbb{R}^{M\times M}$ (for $A + A^T \succeq 0$)

$$\{x \mid x^T\!Ax = 0\} = \mathcal{N}(A + A^T) \tag{1405}$$

because $\exists\,R\ \ A + A^T = R^T R$, $\|Rx\| = 0 \Leftrightarrow Rx = 0$, and $\mathcal{N}(A + A^T) = \mathcal{N}(R)$. Then given any particular vector $x_p$, $x_p^T\!Ax_p = 0 \Leftrightarrow x_p\in\mathcal{N}(A + A^T)$. For any positive definite matrix $A$ (for $A + A^T \succ 0$)

$$\{x \mid x^T\!Ax = 0\} = 0 \tag{1406}$$

Further, [303, §3.2, prob.5]

$$\{x \mid x^T\!Ax = 0\} = \mathbb{R}^M \;\Leftrightarrow\; A^T = -A \tag{1407}$$

while

$$\{x \mid x^H\!Ax = 0\} = \mathbb{C}^M \;\Leftrightarrow\; A = 0 \tag{1408}$$

The positive semidefinite matrix

$$A = \begin{bmatrix}1 & 2\\ 0 & 1\end{bmatrix} \tag{1409}$$

for example, has no nullspace. Yet

$$\{x \mid x^T\!Ax = 0\} = \{x \mid \mathbf{1}^T x = 0\} \subset \mathbb{R}^2 \tag{1410}$$

which is the nullspace of the symmetrized matrix. Symmetric matrices are not spared from the excess; videlicet,

$$B = \begin{bmatrix}1 & 2\\ 2 & 1\end{bmatrix} \tag{1411}$$

has eigenvalues $\{-1, 3\}$, no nullspace, but is zero definite on^{A.19}

$$\mathcal{X} \triangleq \{x\in\mathbb{R}^2 \mid x_2 = (-2\pm\sqrt{3})\,x_1\} \tag{1412}$$

A.19 These two lines represent the limit in the union of two generally distinct hyperbolae; id est, for matrix $B$ and set $\mathcal{X}$ as defined

$$\lim_{\varepsilon\to 0^+}\{x\in\mathbb{R}^2 \mid x^T\!Bx = \varepsilon\} = \mathcal{X}$$
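The zero definite example (1411)-(1412) checks out numerically; this short sketch (illustrative only) evaluates $x^T\!Bx$ on both lines, where $x^T\!Bx = x_1^2(1 + 4t + t^2)$ vanishes for slope $t = x_2/x_1 = -2\pm\sqrt{3}$.

```python
import numpy as np

# B has eigenvalues {-1, 3}, no nullspace, yet xᵀBx = 0 on both lines of (1412).
B = np.array([[1., 2.], [2., 1.]])
assert np.allclose(sorted(np.linalg.eigvalsh(B)), [-1, 3])
for t in (-2 + np.sqrt(3), -2 - np.sqrt(3)):
    x = np.array([1.0, t])
    assert abs(x @ B @ x) < 1e-12
```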


A.7.5.0.1 Proposition. (Sturm) Dyad-decompositions. [256, §5.2]
Let positive semidefinite matrix $X\in\mathbb{S}^M_+$ have rank $\rho$. Then given symmetric matrix $A\in\mathbb{S}^M$, $\langle A, X\rangle = 0$ if and only if there exists a dyad-decomposition

$$X = \sum_{j=1}^\rho x_j x_j^T \tag{1413}$$

satisfying

$$\langle A\,,\ x_j x_j^T\rangle = 0 \quad\text{for each and every}\ j\in\{1\ldots\rho\} \tag{1414}$$ ⋄

The dyad-decomposition of $X$ proposed is generally not that obtained from a standard diagonalization by eigen decomposition, unless $\rho = 1$ or the given matrix $A$ is diagonalizable simultaneously (§A.7.4) with $X$. That means, elemental dyads in decomposition (1413) constitute a generally nonorthogonal set. Sturm & Zhang give a simple procedure for constructing the dyad-decomposition; (§F.5) matrix $A$ may be regarded as a parameter.

A.7.5.0.2 Example. Dyad.
The dyad $uv^T\in\mathbb{R}^{M\times M}$ (§B.1) is zero definite on all $x$ for which either $x^T u = 0$ or $x^T v = 0$;

$$\{x \mid x^T uv^T x = 0\} = \{x \mid x^T u = 0\} \cup \{x \mid v^T x = 0\} \tag{1415}$$

id est, on $u^\perp\cup v^\perp$. Symmetrizing the dyad does not change the outcome:

$$\{x \mid x^T(uv^T + vu^T)x/2 = 0\} = \{x \mid x^T u = 0\} \cup \{x \mid v^T x = 0\} \tag{1416}$$




Appendix B

Simple matrices

"Mathematicians also attempted to develop algebra of vectors but there was no natural definition of the product of two vectors that held in arbitrary dimensions. The first vector algebra that involved a noncommutative vector product (that is, v×w need not equal w×v) was proposed by Hermann Grassmann in his book Ausdehnungslehre (1844). Grassmann's text also introduced the product of a column matrix and a row matrix, which resulted in what is now called a simple or a rank-one matrix. In the late 19th century the American mathematical physicist Willard Gibbs published his famous treatise on vector analysis. In that treatise Gibbs represented general matrices, which he called dyadics, as sums of simple matrices, which Gibbs called dyads. Later the physicist P. A. M. Dirac introduced the term "bra-ket" for what we now call the scalar product of a "bra" (row) vector times a "ket" (column) vector and the term "ket-bra" for the product of a ket times a bra, resulting in what we now call a simple matrix, as above. Our convention of identifying column matrices and vectors was introduced by physicists in the 20th century."
−Marie A. Vitulli [279]


B.1 Rank-one matrix (dyad)

Any matrix formed from the unsigned outer product of two vectors,

$$\Psi = uv^T \in \mathbb{R}^{M\times N} \tag{1417}$$

where $u\in\mathbb{R}^M$ and $v\in\mathbb{R}^N$, is rank-one and called a dyad. Conversely, any rank-one matrix must have the form $\Psi$. [150, prob.1.4.1] Product $-uv^T$ is a negative dyad. For matrix products $AB^T$, in general, we have

$$\mathcal{R}(AB^T) \subseteq \mathcal{R}(A)\,,\qquad \mathcal{N}(AB^T) \supseteq \mathcal{N}(B^T) \tag{1418}$$

with equality when $B = A$ [251, §3.3, §3.6]^{B.1} or respectively when $B$ is invertible and $\mathcal{N}(A) = 0$. Yet for all nonzero dyads we have

$$\mathcal{R}(uv^T) = \mathcal{R}(u)\,,\qquad \mathcal{N}(uv^T) = \mathcal{N}(v^T) \equiv v^\perp \tag{1419}$$

where $\dim v^\perp = N - 1$.

It is obvious a dyad can be 0 only when $u$ or $v$ is 0;

$$\Psi = uv^T = 0 \;\Leftrightarrow\; u = 0 \ \text{or}\ v = 0 \tag{1420}$$

The matrix 2-norm for $\Psi$ is equivalent to the Frobenius norm;

$$\|\Psi\|_2 = \|uv^T\|_F = \|uv^T\|_2 = \|u\|\,\|v\| \tag{1421}$$

When $u$ and $v$ are normalized, the pseudoinverse is the transposed dyad. Otherwise,

$$\Psi^\dagger = (uv^T)^\dagger = \frac{vu^T}{\|u\|^2\,\|v\|^2} \tag{1422}$$

B.1 Proof. $\mathcal{R}(AA^T) \subseteq \mathcal{R}(A)$ is obvious.

$$\mathcal{R}(AA^T) = \{AA^T y \mid y\in\mathbb{R}^m\} \supseteq \{AA^T y \mid A^T y\in\mathcal{R}(A^T)\} = \mathcal{R}(A) \quad\text{by (121)}$$ ■
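A short NumPy check (illustrative only) of the dyad facts (1417), (1421), and (1422):

```python
import numpy as np

# Dyad sanity checks: rank one, ||uvᵀ||₂ = ||uvᵀ||_F = ||u||·||v||,
# and pseudoinverse Ψ† = vuᵀ/(||u||²||v||²).
rng = np.random.default_rng(2)
u, v = rng.standard_normal(4), rng.standard_normal(3)
Psi = np.outer(u, v)
assert np.linalg.matrix_rank(Psi) == 1
assert np.isclose(np.linalg.norm(Psi, 2), np.linalg.norm(u) * np.linalg.norm(v))
assert np.isclose(np.linalg.norm(Psi, 'fro'), np.linalg.norm(u) * np.linalg.norm(v))
Psi_dag = np.outer(v, u) / (u @ u) / (v @ v)
assert np.allclose(Psi_dag, np.linalg.pinv(Psi))
```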


[Figure 118: The four fundamental subspaces [253, §3.6] of any dyad $\Psi = uv^T\in\mathbb{R}^{M\times N}$: $\mathbb{R}^N = \mathcal{R}(v)\oplus\mathcal{N}(uv^T)$ and $\mathcal{N}(u^T)\oplus\mathcal{R}(uv^T) = \mathbb{R}^M$, with $\mathcal{R}(\Psi) = \mathcal{R}(u)$ and $\mathcal{N}(\Psi) = \mathcal{N}(v^T)$. $\Psi(x) \triangleq uv^T x$ is a linear mapping from $\mathbb{R}^N$ to $\mathbb{R}^M$. The map from $\mathcal{R}(v)$ to $\mathcal{R}(u)$ is bijective. [251, §3.1]]

When dyad $uv^T\in\mathbb{R}^{N\times N}$ is square, $uv^T$ has at least $N-1$ 0-eigenvalues and corresponding eigenvectors spanning $v^\perp$. The remaining eigenvector $u$ spans the range of $uv^T$ with corresponding eigenvalue

$$\lambda = v^T u = \operatorname{tr}(uv^T) \in \mathbb{R} \tag{1423}$$

Determinant is a product of the eigenvalues; so, it is always true that

$$\det\Psi = \det(uv^T) = 0 \tag{1424}$$

When $\lambda = 1$, the square dyad is a nonorthogonal projector projecting on its range ($\Psi^2 = \Psi$, §E.6.2.1); a projector dyad. It is quite possible that $u\in v^\perp$, making the remaining eigenvalue instead 0;^{B.2} $\lambda = 0$ together with the first $N-1$ 0-eigenvalues; id est, it is possible $uv^T$ were nonzero while all its eigenvalues are 0. The matrix

$$\begin{bmatrix}1\\ -1\end{bmatrix}\begin{bmatrix}1 & 1\end{bmatrix} = \begin{bmatrix}1 & 1\\ -1 & -1\end{bmatrix} \tag{1425}$$

for example, has two 0-eigenvalues. In other words, eigenvector $u$ may simultaneously be a member of the nullspace and range of the dyad. The explanation is, simply, because $u$ and $v$ share the same dimension, $\dim u = M = \dim v = N$:

B.2 A dyad is not always diagonalizable (§A.5) because its eigenvectors are not necessarily independent.


Proof. Figure 118 shows the four fundamental subspaces for the dyad. Linear operator $\Psi:\mathbb{R}^N\to\mathbb{R}^M$ provides a map between vector spaces that remain distinct when $M = N$;

$$\begin{aligned} u &\in \mathcal{R}(uv^T)\\ u\in\mathcal{N}(uv^T) &\Leftrightarrow v^T u = 0\\ \mathcal{R}(uv^T)\cap\mathcal{N}(uv^T) &= \emptyset \end{aligned} \tag{1426}$$ ■

B.1.0.1 rank-one modification

If $A\in\mathbb{R}^{N\times N}$ is any nonsingular matrix and $1 + v^T\!A^{-1}u \neq 0$, then [162, App.6] [303, §2.3, prob.16] [103, §4.11.2] (Sherman-Morrison)

$$(A + uv^T)^{-1} = A^{-1} - \frac{A^{-1}uv^T\!A^{-1}}{1 + v^T\!A^{-1}u} \tag{1427}$$

B.1.0.2 dyad symmetry

In the specific circumstance that $v = u$, then $uu^T\in\mathbb{R}^{N\times N}$ is symmetric, rank-one, and positive semidefinite having exactly $N-1$ 0-eigenvalues. In fact, (Theorem A.3.1.0.7)

$$uv^T \succeq 0 \;\Leftrightarrow\; v = u \tag{1428}$$

and the remaining eigenvalue is almost always positive;

$$\lambda = u^T u = \operatorname{tr}(uu^T) > 0 \quad\text{unless}\ u = 0 \tag{1429}$$

The matrix

$$\begin{bmatrix}\Psi & u\\ u^T & 1\end{bmatrix} \tag{1430}$$

for example, is rank-1 positive semidefinite if and only if $\Psi = uu^T$.

B.1.1 Dyad independence

Now we consider a sum of dyads like (1417) as encountered in diagonalization and singular value decomposition:

$$\mathcal{R}\!\left(\sum_{i=1}^k s_i w_i^T\right) = \sum_{i=1}^k \mathcal{R}\!\left(s_i w_i^T\right) = \sum_{i=1}^k \mathcal{R}(s_i) \;\Leftarrow\; w_i\ \forall i\ \text{are l.i.} \tag{1431}$$
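The Sherman-Morrison formula (1427) is easily verified numerically; a NumPy sketch (illustrative only):

```python
import numpy as np

# Sherman-Morrison (1427): (A + uvᵀ)⁻¹ = A⁻¹ − A⁻¹uvᵀA⁻¹ / (1 + vᵀA⁻¹u).
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)       # comfortably nonsingular
u, v = rng.standard_normal(4), rng.standard_normal(4)
Ainv = np.linalg.inv(A)
denom = 1 + v @ Ainv @ u
assert abs(denom) > 1e-9                              # hypothesis 1 + vᵀA⁻¹u ≠ 0
lhs = np.linalg.inv(A + np.outer(u, v))
rhs = Ainv - Ainv @ np.outer(u, v) @ Ainv / denom
assert np.allclose(lhs, rhs)
```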


range of summation is the vector sum of ranges.^{B.3} (Theorem B.1.1.1.1) Under the assumption the dyads are linearly independent (l.i.), then the vector sums are unique (p.692): for $\{w_i\}$ l.i. and $\{s_i\}$ l.i.

$$\mathcal{R}\!\left(\sum_{i=1}^k s_i w_i^T\right) = \mathcal{R}\!\left(s_1 w_1^T\right) \oplus \ldots \oplus \mathcal{R}\!\left(s_k w_k^T\right) = \mathcal{R}(s_1) \oplus \ldots \oplus \mathcal{R}(s_k) \tag{1432}$$

B.1.1.0.1 Definition. Linearly independent dyads. [155, p.29, thm.11] [258, p.2] The set of $k$ dyads

$$\{\,s_i w_i^T \mid i = 1\ldots k\,\} \tag{1433}$$

where $s_i\in\mathbb{C}^M$ and $w_i\in\mathbb{C}^N$, is said to be linearly independent iff

$$\operatorname{rank}\!\left(SW^T \triangleq \sum_{i=1}^k s_i w_i^T\right) = k \tag{1434}$$

where $S \triangleq [\,s_1\cdots s_k\,]\in\mathbb{C}^{M\times k}$ and $W \triangleq [\,w_1\cdots w_k\,]\in\mathbb{C}^{N\times k}$. $\triangle$

As defined, dyad independence does not preclude existence of a nullspace $\mathcal{N}(SW^T)$, nor does it imply $SW^T$ is full-rank. In absence of an assumption of independence, generally, $\operatorname{rank} SW^T \leq k$. Conversely, any rank-$k$ matrix can be written in the form $SW^T$ by singular value decomposition. (§A.6)

B.1.1.0.2 Theorem. Linearly independent (l.i.) dyads.
Vectors $\{s_i\in\mathbb{C}^M,\ i = 1\ldots k\}$ are l.i. and vectors $\{w_i\in\mathbb{C}^N,\ i = 1\ldots k\}$ are l.i. if and only if dyads $\{s_i w_i^T\in\mathbb{C}^{M\times N},\ i = 1\ldots k\}$ are l.i. ⋄

Proof. Linear independence of $k$ dyads is identical to definition (1434).
(⇒) Suppose $\{s_i\}$ and $\{w_i\}$ are each linearly independent sets. Invoking Sylvester's rank inequality, [150, §0.4] [303, §2.4]

$$\operatorname{rank} S + \operatorname{rank} W - k \;\leq\; \operatorname{rank}(SW^T) \;\leq\; \min\{\operatorname{rank} S\,,\ \operatorname{rank} W\}\ (\leq k) \tag{1435}$$

Then $k \leq \operatorname{rank}(SW^T) \leq k$ that implies the dyads are independent.
(⇐) Conversely, suppose $\operatorname{rank}(SW^T) = k$. Then

$$k \leq \min\{\operatorname{rank} S\,,\ \operatorname{rank} W\} \leq k \tag{1436}$$

implying the vector sets are each independent. ■

B.3 Move of range $\mathcal{R}$ to inside the summation depends on linear independence of $\{w_i\}$.


B.1.1.1 Biorthogonality condition, Range and Nullspace of Sum

Dyads characterized by a biorthogonality condition $W^T S = I$ are independent; id est, for $S\in\mathbb{C}^{M\times k}$ and $W\in\mathbb{C}^{N\times k}$, if $W^T S = I$ then $\operatorname{rank}(SW^T) = k$ by the linearly independent dyads theorem because (confer §E.1.1)

$$W^T S = I \;\Leftrightarrow\; \operatorname{rank} S = \operatorname{rank} W = k \leq \min\{M, N\} \tag{1437}$$

To see that, we need only show: $\mathcal{N}(S) = 0 \Leftrightarrow \exists\,B\ \ BS = I$.^{B.4}
(⇐) Assume $BS = I$. Then $\mathcal{N}(BS) = 0 = \{x \mid BSx = 0\} \supseteq \mathcal{N}(S)$. (1418)
(⇒) If $\mathcal{N}(S) = 0$ then $S$ must be full-rank skinny-or-square.

$$\therefore\ \exists\,A, B, C\quad \begin{bmatrix}B\\ C\end{bmatrix}[\,S\ A\,] = I \quad(\text{id est, } [\,S\ A\,] \text{ is invertible}) \;\Rightarrow\; BS = I$$ ■

Left inverse $B$ is given as $W^T$ here. Because of reciprocity with $S$, it immediately follows: $\mathcal{N}(W) = 0 \Leftrightarrow \exists\,S\ \ S^T W = I$.

Dyads produced by diagonalization, for example, are independent because of their inherent biorthogonality. (§A.5.1) The converse is generally false; id est, linearly independent dyads are not necessarily biorthogonal.

B.1.1.1.1 Theorem. Nullspace and range of dyad sum.
Given a sum of dyads represented by $SW^T$ where $S\in\mathbb{C}^{M\times k}$ and $W\in\mathbb{C}^{N\times k}$

$$\begin{aligned} \mathcal{N}(SW^T) &= \mathcal{N}(W^T) &&\Leftarrow\quad \exists\,B\ \ BS = I\\ \mathcal{R}(SW^T) &= \mathcal{R}(S) &&\Leftarrow\quad \exists\,Z\ \ W^T Z = I \end{aligned} \tag{1438}$$ ⋄

Proof. (⇒) $\mathcal{N}(SW^T) \supseteq \mathcal{N}(W^T)$ and $\mathcal{R}(SW^T) \subseteq \mathcal{R}(S)$ are obvious.
(⇐) Assume the existence of a left inverse $B\in\mathbb{R}^{k\times N}$ and a right inverse $Z\in\mathbb{R}^{N\times k}$.^{B.5}

$$\mathcal{N}(SW^T) = \{x \mid SW^T x = 0\} \subseteq \{x \mid BSW^T x = 0\} = \mathcal{N}(W^T) \tag{1439}$$
$$\mathcal{R}(SW^T) = \{SW^T x \mid x\in\mathbb{R}^N\} \supseteq \{SW^T Zy \mid Zy\in\mathbb{R}^N\} = \mathcal{R}(S) \tag{1440}$$ ■

B.4 Left inverse is not unique, in general.
B.5 By counter-example, the theorem's converse cannot be true; e.g., $S = W = [\,1\ 0\,]$.


[Figure 119: Four fundamental subspaces [253, §3.6] of a doublet $\Pi = uv^T + vu^T\in\mathbb{S}^N$: $\mathbb{R}^N = \mathcal{R}([\,u\ v\,])\oplus\mathcal{N}\!\left(\begin{bmatrix}v^T\\ u^T\end{bmatrix}\right)$ with $\mathcal{R}(\Pi) = \mathcal{R}([\,u\ v\,])$ and $\mathcal{N}(\Pi) = v^\perp\cap u^\perp$. $\Pi(x) = (uv^T + vu^T)x$ is a linear bijective mapping from $\mathcal{R}([\,u\ v\,])$ to $\mathcal{R}([\,u\ v\,])$.]

B.2 Doublet

Consider a sum of two linearly independent square dyads, one a transposition of the other:

$$\Pi = uv^T + vu^T = [\,u\ v\,]\begin{bmatrix}v^T\\ u^T\end{bmatrix} = SW^T \in \mathbb{S}^N \tag{1441}$$

where $u, v\in\mathbb{R}^N$. Like the dyad, a doublet can be 0 only when $u$ or $v$ is 0;

$$\Pi = uv^T + vu^T = 0 \;\Leftrightarrow\; u = 0 \ \text{or}\ v = 0 \tag{1442}$$

By assumption of independence, a nonzero doublet has two nonzero eigenvalues

$$\lambda_1 \triangleq u^T v + \|uv^T\|\,,\qquad \lambda_2 \triangleq u^T v - \|uv^T\| \tag{1443}$$

where $\lambda_1 > 0 > \lambda_2$, with corresponding eigenvectors

$$x_1 \triangleq \frac{u}{\|u\|} + \frac{v}{\|v\|}\,,\qquad x_2 \triangleq \frac{u}{\|u\|} - \frac{v}{\|v\|} \tag{1444}$$

spanning the doublet range. Eigenvalue $\lambda_1$ cannot be 0 unless $u$ and $v$ have opposing directions, but that is antithetical since then the dyads would no longer be independent. Eigenvalue $\lambda_2$ is 0 if and only if $u$ and $v$ share the same direction, again antithetical. Generally we have $\lambda_1 > 0$ and $\lambda_2 < 0$, so $\Pi$ is indefinite.
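The doublet spectrum (1443) can be confirmed numerically (a NumPy sketch, illustrative only), using $\|uv^T\| = \|u\|\,\|v\|$ from (1421):

```python
import numpy as np

# Π = uvᵀ + vuᵀ has nonzero eigenvalues uᵀv ± ||u||·||v||; the other N−2 are 0.
rng = np.random.default_rng(4)
u, v = rng.standard_normal(5), rng.standard_normal(5)
Pi = np.outer(u, v) + np.outer(v, u)
eigs = np.sort(np.linalg.eigvalsh(Pi))
lam1 = u @ v + np.linalg.norm(u) * np.linalg.norm(v)
lam2 = u @ v - np.linalg.norm(u) * np.linalg.norm(v)
assert np.isclose(eigs[-1], lam1) and np.isclose(eigs[0], lam2)
assert np.allclose(eigs[1:-1], 0)
```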


[Figure 120: $v^T u = 1/\zeta$. The four fundamental subspaces [253, §3.6] of elementary matrix $E$ as a linear mapping $E(x) = \left(I - \frac{uv^T}{v^T u}\right)x$: $\mathbb{R}^N = \mathcal{N}(u^T)\oplus\mathcal{N}(E)$ and $\mathcal{R}(v)\oplus\mathcal{R}(E) = \mathbb{R}^N$, with $\mathcal{R}(E) = \mathcal{N}(v^T)$ and $\mathcal{N}(E) = \mathcal{R}(u)$.]

By the nullspace and range of dyad sum theorem, doublet $\Pi$ has $N-2$ zero-eigenvalues remaining and corresponding eigenvectors spanning $\mathcal{N}\!\left(\begin{bmatrix}v^T\\ u^T\end{bmatrix}\right)$. We therefore have

$$\mathcal{R}(\Pi) = \mathcal{R}([\,u\ v\,])\,,\qquad \mathcal{N}(\Pi) = v^\perp\cap u^\perp \tag{1445}$$

of respective dimension 2 and $N-2$.

B.3 Elementary matrix

A matrix of the form

$$E = I - \zeta uv^T \in \mathbb{R}^{N\times N} \tag{1446}$$

where $\zeta\in\mathbb{R}$ is finite and $u, v\in\mathbb{R}^N$, is called an elementary matrix or a rank-one modification of the identity. [152] Any elementary matrix in $\mathbb{R}^{N\times N}$ has $N-1$ eigenvalues equal to 1 corresponding to real eigenvectors that span $v^\perp$. The remaining eigenvalue

$$\lambda = 1 - \zeta v^T u \tag{1447}$$

corresponds to eigenvector $u$.^{B.6} From [162, App.7.A.26] the determinant:

$$\det E = 1 - \operatorname{tr}\!\left(\zeta uv^T\right) = \lambda \tag{1448}$$

B.6 Elementary matrix $E$ is not always diagonalizable because eigenvector $u$ need not be independent of the others; id est, $u\in v^\perp$ is possible.


If $\lambda \neq 0$ then $E$ is invertible; [103]

$$E^{-1} = I + \frac{\zeta}{\lambda}\,uv^T \tag{1449}$$

Eigenvectors corresponding to 0 eigenvalues belong to $\mathcal{N}(E)$, and the number of 0 eigenvalues must be at least $\dim\mathcal{N}(E)$ which, here, can be at most one. (§A.7.3.0.1) The nullspace exists, therefore, when $\lambda = 0$; id est, when $v^T u = 1/\zeta$; rather, whenever $u$ belongs to the hyperplane $\{z\in\mathbb{R}^N \mid v^T z = 1/\zeta\}$. Then (when $\lambda = 0$) elementary matrix $E$ is a nonorthogonal projector projecting on its range ($E^2 = E$, §E.1) and $\mathcal{N}(E) = \mathcal{R}(u)$; eigenvector $u$ spans the nullspace when it exists. By conservation of dimension, $\dim\mathcal{R}(E) = N - \dim\mathcal{N}(E)$. It is apparent from (1446) that $v^\perp\subseteq\mathcal{R}(E)$, but $\dim v^\perp = N-1$. Hence $\mathcal{R}(E) \equiv v^\perp$ when the nullspace exists, and the remaining eigenvectors span it.

In summary, when a nontrivial nullspace of $E$ exists,

$$\mathcal{R}(E) = \mathcal{N}(v^T)\,,\qquad \mathcal{N}(E) = \mathcal{R}(u)\,,\qquad v^T u = 1/\zeta \tag{1450}$$

illustrated in Figure 120, which is opposite to the assignment of subspaces for a dyad (Figure 118). Otherwise, $\mathcal{R}(E) = \mathbb{R}^N$.

When $E = E^T$, the spectral norm is

$$\|E\|_2 = \max\{1, |\lambda|\} \tag{1451}$$

B.3.1 Householder matrix

An elementary matrix is called a Householder matrix when it has the defining form, for nonzero vector $u$ [110, §5.1.2] [103, §4.10.1] [251, §7.3] [150, §2.2]

$$H = I - 2\,\frac{uu^T}{u^T u} \in \mathbb{S}^N \tag{1452}$$

which is a symmetric orthogonal (reflection) matrix ($H^{-1} = H^T = H$ (§B.5.2)). Vector $u$ is normal to an $N-1$-dimensional subspace $u^\perp$ through which this particular $H$ effects pointwise reflection; e.g., $Hu^\perp = u^\perp$ while $Hu = -u$.

Matrix $H$ has $N-1$ orthonormal eigenvectors spanning that reflecting subspace $u^\perp$ with corresponding eigenvalues equal to 1. The remaining eigenvector $u$ has corresponding eigenvalue $-1$; so

$$\det H = -1 \tag{1453}$$
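The Householder matrix (1452) and its properties, including $\det H = -1$ (1453), are verified in this NumPy sketch (illustrative only):

```python
import numpy as np

# H = I − 2uuᵀ/uᵀu is symmetric orthogonal; Hu = −u, Hx = x for x ⟂ u, det H = −1.
rng = np.random.default_rng(5)
u = rng.standard_normal(4)
H = np.eye(4) - 2 * np.outer(u, u) / (u @ u)
assert np.allclose(H, H.T) and np.allclose(H @ H.T, np.eye(4))
assert np.allclose(H @ u, -u)
x = rng.standard_normal(4)
x -= (x @ u) / (u @ u) * u          # project x onto the reflecting subspace u⊥
assert np.allclose(H @ x, x)
assert np.isclose(np.linalg.det(H), -1)
```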


Due to symmetry of $H$, the matrix 2-norm (the spectral norm) is equal to the largest eigenvalue-magnitude. A Householder matrix is thus characterized,

$$H^T = H\,,\qquad H^{-1} = H^T\,,\qquad \|H\|_2 = 1\,,\qquad H \not\succeq 0 \tag{1454}$$

For example, the permutation matrix

$$\Xi = \begin{bmatrix}1 & 0 & 0\\ 0 & 0 & 1\\ 0 & 1 & 0\end{bmatrix} \tag{1455}$$

is a Householder matrix having $u = [\,0\ 1\ {-1}\,]^T/\sqrt{2}$. Not all permutation matrices are Householder matrices, although all permutation matrices are orthogonal matrices [251, §3.4] because they are made by permuting rows and columns of the identity matrix. Neither are all symmetric permutation matrices Householder matrices; e.g.,

$$\Xi = \begin{bmatrix}0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0\\ 0 & 1 & 0 & 0\\ 1 & 0 & 0 & 0\end{bmatrix} \quad(1533)$$

is not a Householder matrix.

B.4 Auxiliary V-matrices

B.4.1 Auxiliary projector matrix V

It is convenient to define a matrix $V$ that arises naturally as a consequence of translating the geometric center $\alpha_c$ (§5.5.1.0.1) of some list $X$ to the origin. In place of $X - \alpha_c\mathbf{1}^T$ we may write $XV$ as in (815) where

$$V \triangleq I - \frac{1}{N}\mathbf{1}\mathbf{1}^T \in \mathbb{S}^N \tag{757}$$

is an elementary matrix called the geometric centering matrix.

Any elementary matrix in $\mathbb{R}^{N\times N}$ has $N-1$ eigenvalues equal to 1. For the particular elementary matrix $V$, the $N$th eigenvalue equals 0. The number of 0 eigenvalues must equal $\dim\mathcal{N}(V) = 1$, by the 0 eigenvalues theorem (§A.7.3.0.1), because $V = V^T$ is diagonalizable. Because

$$V\mathbf{1} = 0 \tag{1456}$$


the nullspace $\mathcal{N}(V) = \mathcal{R}(\mathbf{1})$ is spanned by the eigenvector $\mathbf{1}$. The remaining eigenvectors span $\mathcal{R}(V) \equiv \mathbf{1}^\perp = \mathcal{N}(\mathbf{1}^T)$ that has dimension $N-1$.

Because

$$V^2 = V \tag{1457}$$

and $V^T = V$, elementary matrix $V$ is also a projection matrix (§E.3) projecting orthogonally on its range $\mathcal{N}(\mathbf{1}^T)$ which is a hyperplane containing the origin in $\mathbb{R}^N$

$$V = I - \mathbf{1}(\mathbf{1}^T\mathbf{1})^{-1}\mathbf{1}^T \tag{1458}$$

The $\{0, 1\}$ eigenvalues also indicate diagonalizable $V$ is a projection matrix. [303, §4.1, thm.4.1] Symmetry of $V$ denotes orthogonal projection; from (1709),

$$V^T = V\,,\qquad V^\dagger = V\,,\qquad \|V\|_2 = 1\,,\qquad V \succeq 0 \tag{1459}$$

Matrix $V$ is also circulant [118].

B.4.1.0.1 Example. Relationship of auxiliary to Householder matrix.
Let $H\in\mathbb{S}^N$ be a Householder matrix (1452) defined by

$$u = \begin{bmatrix}1\\ \vdots\\ 1\\ 1+\sqrt{N}\end{bmatrix} \in \mathbb{R}^N \tag{1460}$$

Then we have [106, §2]

$$V = H\begin{bmatrix}I & 0\\ 0^T & 0\end{bmatrix}H \tag{1461}$$

Let $D\in\mathbb{S}^N_h$ and define

$$-HDH \triangleq \begin{bmatrix}A & b\\ b^T & c\end{bmatrix} \tag{1462}$$

where $b$ is a vector. Then because $H$ is nonsingular (§A.3.1.0.5) [133, §3]

$$-VDV = -H\begin{bmatrix}A & 0\\ 0^T & 0\end{bmatrix}H \succeq 0 \;\Leftrightarrow\; -A \succeq 0 \tag{1463}$$

and affine dimension is $r = \operatorname{rank} A$ when $D$ is a Euclidean distance matrix.
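Properties (1456)-(1459) of the geometric centering matrix are quick to confirm numerically; the last line of this sketch (illustrative only) also checks that $XV$ recenters a list at the origin.

```python
import numpy as np

# V = I − (1/N)11ᵀ is a symmetric orthogonal projector on N(1ᵀ):
# V² = V = Vᵀ = V†, V1 = 0, and XV subtracts the geometric center of the list X.
N = 6
V = np.eye(N) - np.ones((N, N)) / N
assert np.allclose(V @ V, V) and np.allclose(V, V.T)
assert np.allclose(V @ np.ones(N), 0)
assert np.allclose(V, np.linalg.pinv(V))
X = np.random.default_rng(6).standard_normal((3, N))
assert np.allclose(X @ V, X - X.mean(axis=1, keepdims=True))
```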


B.4.2 Schoenberg auxiliary matrix V_N

1. $V_N = \frac{1}{\sqrt{2}}\begin{bmatrix}-\mathbf{1}^T\\ I\end{bmatrix} \in \mathbb{R}^{N\times N-1}$

2. $V_N^T\mathbf{1} = 0$

3. $I - e_1\mathbf{1}^T = [\,0\ \ \sqrt{2}V_N\,]$

4. $[\,0\ \ \sqrt{2}V_N\,]\,V_N = V_N$

5. $[\,0\ \ \sqrt{2}V_N\,]\,V = V$

6. $V[\,0\ \ \sqrt{2}V_N\,] = [\,0\ \ \sqrt{2}V_N\,]$

7. $[\,0\ \ \sqrt{2}V_N\,][\,0\ \ \sqrt{2}V_N\,] = [\,0\ \ \sqrt{2}V_N\,]$

8. $[\,0\ \ \sqrt{2}V_N\,]^\dagger = \begin{bmatrix}0 & 0^T\\ 0 & I\end{bmatrix}V$

9. $[\,0\ \ \sqrt{2}V_N\,]^\dagger\,V = [\,0\ \ \sqrt{2}V_N\,]^\dagger$

10. $[\,0\ \ \sqrt{2}V_N\,][\,0\ \ \sqrt{2}V_N\,]^\dagger = V$

11. $[\,0\ \ \sqrt{2}V_N\,]^\dagger[\,0\ \ \sqrt{2}V_N\,] = \begin{bmatrix}0 & 0^T\\ 0 & I\end{bmatrix}$

12. $[\,0\ \ \sqrt{2}V_N\,]\begin{bmatrix}0 & 0^T\\ 0 & I\end{bmatrix} = [\,0\ \ \sqrt{2}V_N\,]$

13. $\begin{bmatrix}0 & 0^T\\ 0 & I\end{bmatrix}[\,0\ \ \sqrt{2}V_N\,] = \begin{bmatrix}0 & 0^T\\ 0 & I\end{bmatrix}$

14. $\left[\,V_N\ \ \frac{1}{\sqrt{2}}\mathbf{1}\,\right]^{-1} = \begin{bmatrix}V_N^\dagger\\[2pt] \frac{\sqrt{2}}{N}\mathbf{1}^T\end{bmatrix}$

15. $V_N^\dagger = \sqrt{2}\left[\,-\frac{1}{N}\mathbf{1}\quad I - \frac{1}{N}\mathbf{1}\mathbf{1}^T\,\right] \in \mathbb{R}^{N-1\times N}\,,\quad \left(I - \frac{1}{N}\mathbf{1}\mathbf{1}^T \in \mathbb{S}^{N-1}\right)$

16. $V_N^\dagger\mathbf{1} = 0$

17. $V_N^\dagger V_N = I$


18. $V^T = V = V_N V_N^\dagger = I - \frac{1}{N}\mathbf{1}\mathbf{1}^T \in \mathbb{S}^N$

19. $-V_N^\dagger(\mathbf{1}\mathbf{1}^T - I)V_N = I\,,\quad \left(\mathbf{1}\mathbf{1}^T - I \in \mathrm{EDM}^N\right)$

20. $D = [d_{ij}]\in\mathbb{S}^N_h$ (759)

$$\operatorname{tr}(-VDV) = \operatorname{tr}(-VD) = \operatorname{tr}(-V_N^\dagger D V_N) = \frac{1}{N}\mathbf{1}^T\!D\mathbf{1} = \frac{1}{N}\operatorname{tr}(\mathbf{1}\mathbf{1}^T\!D) = \frac{1}{N}\sum_{i,j} d_{ij}$$

Any elementary matrix $E\in\mathbb{S}^N$ of the particular form

$$E = k_1 I - k_2\mathbf{1}\mathbf{1}^T \tag{1464}$$

where $k_1, k_2\in\mathbb{R}$,^{B.7} will make $\operatorname{tr}(-ED)$ proportional to $\sum d_{ij}$.

21. $D = [d_{ij}]\in\mathbb{S}^N$

$$\operatorname{tr}(-VDV) = \frac{1}{N}\sum_{\substack{i,j\\ i\neq j}} d_{ij} - \frac{N-1}{N}\sum_i d_{ii} = \frac{1}{N}\mathbf{1}^T\!D\mathbf{1} - \operatorname{tr} D$$

22. $D = [d_{ij}]\in\mathbb{S}^N_h$

$$\operatorname{tr}(-V_N^T D V_N) = \sum_j d_{1j}$$

23. For $Y\in\mathbb{S}^N$

$$V(Y - \delta(Y\mathbf{1}))V = Y - \delta(Y\mathbf{1})$$

B.4.3 Orthonormal auxiliary matrix V_W

The skinny matrix

$$V_W \triangleq \begin{bmatrix} \frac{-1}{\sqrt{N}} & \frac{-1}{\sqrt{N}} & \cdots & \frac{-1}{\sqrt{N}}\\[3pt] 1+\frac{-1}{N+\sqrt{N}} & \frac{-1}{N+\sqrt{N}} & \cdots & \frac{-1}{N+\sqrt{N}}\\[3pt] \frac{-1}{N+\sqrt{N}} & 1+\frac{-1}{N+\sqrt{N}} & & \vdots\\ \vdots & & \ddots & \frac{-1}{N+\sqrt{N}}\\[3pt] \frac{-1}{N+\sqrt{N}} & \cdots & \frac{-1}{N+\sqrt{N}} & 1+\frac{-1}{N+\sqrt{N}} \end{bmatrix} \in \mathbb{R}^{N\times N-1} \tag{1465}$$

B.7 If $k_1$ is $1-\rho$ while $k_2$ equals $-\rho\in\mathbb{R}$, then all eigenvalues of $E$ for $-1/(N-1) < \rho < 1$ are guaranteed positive and therefore $E$ is guaranteed positive definite. [227]


has $\mathcal{R}(V_W) = \mathcal{N}(\mathbf{1}^T)$ and orthonormal columns. [4] We defined three auxiliary V-matrices: $V$, $V_N$ (740), and $V_W$, sharing some attributes listed in Table B.4.4. For example, $V$ can be expressed

$$V = V_W V_W^T = V_N V_N^\dagger \tag{1466}$$

but $V_W^T V_W = I$ means $V$ is an orthogonal projector (1706) and

$$V_W^\dagger = V_W^T\,,\qquad \|V_W\|_2 = 1\,,\qquad V_W^T\mathbf{1} = 0 \tag{1467}$$

B.4.4 Auxiliary V-matrix Table

$$\begin{array}{c|ccccccc} & \dim V & \operatorname{rank} V & \mathcal{R}(V) & \mathcal{N}(V^T) & V^T V & VV^T & VV^\dagger\\ \hline V & N\times N & N-1 & \mathcal{N}(\mathbf{1}^T) & \mathcal{R}(\mathbf{1}) & V & V & V\\[3pt] V_N & N\times(N-1) & N-1 & \mathcal{N}(\mathbf{1}^T) & \mathcal{R}(\mathbf{1}) & \tfrac{1}{2}(I + \mathbf{1}\mathbf{1}^T) & \tfrac{1}{2}\begin{bmatrix}N-1 & -\mathbf{1}^T\\ -\mathbf{1} & I\end{bmatrix} & V\\[3pt] V_W & N\times(N-1) & N-1 & \mathcal{N}(\mathbf{1}^T) & \mathcal{R}(\mathbf{1}) & I & V & V \end{array}$$

B.4.5 More auxiliary matrices

Mathar shows [191, §2] that any elementary matrix (§B.3) of the form

$$V_M = I - b\mathbf{1}^T \in \mathbb{R}^{N\times N} \tag{1468}$$

such that $b^T\mathbf{1} = 1$ (confer [112, §2]), is an auxiliary V-matrix having

$$\begin{aligned} \mathcal{R}(V_M^T) &= \mathcal{N}(b^T)\,, &\quad \mathcal{R}(V_M) &= \mathcal{N}(\mathbf{1}^T)\\ \mathcal{N}(V_M) &= \mathcal{R}(b)\,, &\quad \mathcal{N}(V_M^T) &= \mathcal{R}(\mathbf{1}) \end{aligned} \tag{1469}$$

Given $X\in\mathbb{R}^{n\times N}$, the choice $b = \frac{1}{N}\mathbf{1}$ ($V_M = V$) minimizes $\|X(I - b\mathbf{1}^T)\|_F$. [114, §3.2.1]
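Identities (1466) and (1467) can be verified by constructing $V_N$ and $V_W$ explicitly; a NumPy sketch (illustrative only, with $V_W$ built entrywise from (1465)):

```python
import numpy as np

# V = V_W V_Wᵀ = V_N V_N†, with V_N from §B.4.2 and the orthonormal V_W from (1465).
N = 5
V = np.eye(N) - np.ones((N, N)) / N
V_N = np.vstack([-np.ones((1, N - 1)), np.eye(N - 1)]) / np.sqrt(2)
V_W = np.full((N, N - 1), -1 / (N + np.sqrt(N)))
V_W[0, :] = -1 / np.sqrt(N)
V_W[1:, :] += np.eye(N - 1)
assert np.allclose(V_W.T @ V_W, np.eye(N - 1))     # orthonormal columns
assert np.allclose(V_W @ V_W.T, V)
assert np.allclose(V_N @ np.linalg.pinv(V_N), V)
assert np.allclose(np.linalg.pinv(V_W), V_W.T)     # V_W† = V_Wᵀ, per (1467)
```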


B.5 Orthogonal matrix

B.5.1 Vector rotation

The property $Q^{-1} = Q^T$ completely defines an orthogonal matrix $Q\in\mathbb{R}^{n\times n}$ employed to effect vector rotation; [251, §2.6, §3.4] [253, §6.5] [150, §2.1] for $x\in\mathbb{R}^n$

$$\|Qx\| = \|x\| \tag{1470}$$

The orthogonal matrix $Q$ is a normal matrix further characterized:

$$Q^{-1} = Q^T\,,\qquad \|Q\|_2 = 1 \tag{1471}$$

Applying characterization (1471) to $Q^T$ we see it too is an orthogonal matrix. Hence the rows and columns of $Q$ respectively form an orthonormal set. All permutation matrices $\Xi$, for example, are orthogonal matrices. The largest magnitude entry of any orthogonal matrix is 1; for each and every $j\in 1\ldots n$

$$\|Q(j,:)\|_\infty \leq 1\,,\qquad \|Q(:,j)\|_\infty \leq 1 \tag{1472}$$

Each and every eigenvalue of a (real) orthogonal matrix has magnitude 1

$$\lambda(Q)\in\mathbb{C}^n\,,\qquad |\lambda(Q)| = 1 \tag{1473}$$

while only the identity matrix can be simultaneously positive definite and orthogonal.

A unitary matrix is a complex generalization of the orthogonal matrix. The conjugate transpose defines it: $U^{-1} = U^H$. An orthogonal matrix is simply a real unitary matrix.

B.5.2 Reflection

A matrix for pointwise reflection is defined by imposing symmetry upon the orthogonal matrix; id est, a reflection matrix is completely defined by $Q^{-1} = Q^T = Q$. The reflection matrix is an orthogonal matrix, characterized:

$$Q^T = Q\,,\qquad Q^{-1} = Q^T\,,\qquad \|Q\|_2 = 1 \tag{1474}$$

The Householder matrix (§B.3.1) is an example of a symmetric orthogonal (reflection) matrix.


[Figure 121: Gimbal: a mechanism imparting three degrees of dimensional freedom to a Euclidean body suspended at the device's center. Each ring is free to rotate about one axis. (Courtesy of The MathWorks Inc.)]

Reflection matrices have eigenvalues equal to $\pm 1$ and so $\det Q = \pm 1$. It is natural to expect a relationship between reflection and projection matrices because all projection matrices have eigenvalues belonging to $\{0, 1\}$. In fact, any reflection matrix $Q$ is related to some orthogonal projector $P$ by [152, §1, prob.44]

$$Q = I - 2P \tag{1475}$$

Yet $P$ is, generally, neither orthogonal nor invertible. (§E.3.2)

$$\lambda(Q)\in\mathbb{R}^n\,,\qquad |\lambda(Q)| = 1 \tag{1476}$$

Reflection is with respect to $\mathcal{R}(P)^\perp$. Matrix $2P - I$ represents antireflection. Every orthogonal matrix can be expressed as the product of a rotation and a reflection. The collection of all orthogonal matrices of particular dimension does not form a convex set.

B.5.3 Rotation of range and rowspace

Given orthogonal matrix $Q$, column vectors of a matrix $X$ are simultaneously rotated by the product $QX$. In three dimensions ($X\in\mathbb{R}^{3\times N}$), the precise meaning of rotation is best illustrated in Figure 121 where the gimbal aids visualization of rotation achievable about the origin.


B.5.3.0.1 Example. One axis of revolution.
Partition an $n+1$-dimensional Euclidean space $\mathbb{R}^{n+1} \triangleq \begin{bmatrix}\mathbb{R}^n\\ \mathbb{R}\end{bmatrix}$ and define an $n$-dimensional subspace

$$\mathcal{R} \triangleq \{\lambda\in\mathbb{R}^{n+1} \mid \mathbf{1}^T\lambda = 0\} \tag{1477}$$

(a hyperplane through the origin). We want an orthogonal matrix that rotates a list in the columns of matrix $X\in\mathbb{R}^{n+1\times N}$ through the dihedral angle between $\mathbb{R}^n$ and $\mathcal{R}$ (§2.4.3)

$$\angle(\mathbb{R}^n, \mathcal{R}) = \arccos\!\left(\frac{\langle e_{n+1}, \mathbf{1}\rangle}{\|e_{n+1}\|\,\|\mathbf{1}\|}\right) = \arccos\!\left(\frac{1}{\sqrt{n+1}}\right)\ \text{radians} \tag{1478}$$

The vertex-description of the nonnegative orthant in $\mathbb{R}^{n+1}$ is

$$\{[\,e_1\ e_2\cdots e_{n+1}\,]a \mid a \succeq 0\} = \{a \succeq 0\} = \mathbb{R}^{n+1}_+ \subset \mathbb{R}^{n+1} \tag{1479}$$

Consider rotation of these vertices via orthogonal matrix

$$Q \triangleq \left[\,\mathbf{1}\tfrac{1}{\sqrt{n+1}}\quad \Xi V_W\right]\Xi \in \mathbb{R}^{n+1\times n+1} \tag{1480}$$

where permutation matrix $\Xi\in\mathbb{S}^{n+1}$ is defined in (1533), and $V_W\in\mathbb{R}^{n+1\times n}$ is the orthonormal auxiliary matrix defined in §B.4.3. This particular orthogonal matrix is selected because it rotates any point in subspace $\mathbb{R}^n$ about one axis of revolution onto $\mathcal{R}$; e.g., rotation $Qe_{n+1}$ aligns the last standard basis vector with subspace normal $\mathcal{R}^\perp = \mathbf{1}$. The rotated standard basis vectors remaining are orthonormal spanning $\mathcal{R}$.

Another interpretation of product $QX$ is rotation/reflection of $\mathcal{R}(X)$. Rotation of $X$ as in $QXQ^T$ is the simultaneous rotation/reflection of range and rowspace.^{B.8}

Proof. Any matrix can be expressed as a singular value decomposition $X = U\Sigma W^T$ (1376) where $\delta^2(\Sigma) = \Sigma$, $\mathcal{R}(U) \supseteq \mathcal{R}(X)$, and $\mathcal{R}(W) \supseteq \mathcal{R}(X^T)$. ■

B.8 The product $Q^T\!AQ$ can be regarded as a coordinate transformation; e.g., given linear map $y = Ax : \mathbb{R}^n\to\mathbb{R}^n$ and orthogonal $Q$, the transformation $Qy = AQx$ is a rotation/reflection of the range and rowspace (120) of matrix $A$ where $Qy\in\mathcal{R}(A)$ and $Qx\in\mathcal{R}(A^T)$ (121).


B.5.4 Matrix rotation

Orthogonal matrices are also employed to rotate/reflect, like vectors, other matrices: [sic] [110, §12.4.1] Given orthogonal matrix $Q$, the product $Q^T\!A$ will rotate $A\in\mathbb{R}^{n\times n}$ in the Euclidean sense in $\mathbb{R}^{n^2}$ because the Frobenius norm is orthogonally invariant (§2.2.1);

$$\|Q^T\!A\|_F = \sqrt{\operatorname{tr}(A^T QQ^T\!A)} = \|A\|_F \tag{1481}$$

(likewise for $AQ$). Were $A$ symmetric, such a rotation would depart from $\mathbb{S}^n$. One remedy is to instead form the product $Q^T\!AQ$ because

$$\|Q^T\!AQ\|_F = \sqrt{\operatorname{tr}(Q^T\!A^T QQ^T\!AQ)} = \|A\|_F \tag{1482}$$

Matrix $A$ is orthogonally equivalent to $B$ if $B = S^T\!AS$ for some orthogonal matrix $S$. Every square matrix, for example, is orthogonally equivalent to a matrix having equal entries along the main diagonal. [150, §2.2, prob.3]

B.5.4.1 bijection

Any product of orthogonal matrices $AQ$ remains orthogonal. Given any other dimensionally compatible orthogonal matrix $U$, the mapping $g(A) = U^T\!AQ$ is a linear bijection on the domain of orthogonal matrices. [176, §2.1]


Appendix C

Some analytical optimal results

C.1 properties of infima

$$\inf\emptyset \triangleq \infty \tag{1483}$$
$$\sup\emptyset \triangleq -\infty \tag{1484}$$

Given $f(x) : \mathcal{X}\to\mathbb{R}$ defined on arbitrary set $\mathcal{X}$ [148, §0.1.2]

$$\inf_{x\in\mathcal{X}} f(x) = -\sup_{x\in\mathcal{X}} -f(x)\,,\qquad \sup_{x\in\mathcal{X}} f(x) = -\inf_{x\in\mathcal{X}} -f(x) \tag{1485}$$

$$\arg\inf_{x\in\mathcal{X}} f(x) = \arg\sup_{x\in\mathcal{X}} -f(x)\,,\qquad \arg\sup_{x\in\mathcal{X}} f(x) = \arg\inf_{x\in\mathcal{X}} -f(x) \tag{1486}$$

Given $f(x) : \mathcal{X}\to\mathbb{R}$ and $g(x) : \mathcal{X}\to\mathbb{R}$ defined on arbitrary set $\mathcal{X}$ [148, §0.1.2]

$$\inf_{x\in\mathcal{X}}\left(f(x) + g(x)\right) \;\geq\; \inf_{x\in\mathcal{X}} f(x) + \inf_{x\in\mathcal{X}} g(x) \tag{1487}$$


Given $f(x) : \mathcal{X}\cup\mathcal{Y}\to\mathbb{R}$ and arbitrary sets $\mathcal{X}$ and $\mathcal{Y}$ [148, §0.1.2]

$$\mathcal{X}\subset\mathcal{Y} \;\Rightarrow\; \inf_{x\in\mathcal{X}} f(x) \geq \inf_{x\in\mathcal{Y}} f(x) \tag{1488}$$

$$\inf_{x\in\mathcal{X}\cup\mathcal{Y}} f(x) = \min\left\{\inf_{x\in\mathcal{X}} f(x)\,,\ \inf_{x\in\mathcal{Y}} f(x)\right\} \tag{1489}$$

$$\inf_{x\in\mathcal{X}\cap\mathcal{Y}} f(x) \geq \max\left\{\inf_{x\in\mathcal{X}} f(x)\,,\ \inf_{x\in\mathcal{Y}} f(x)\right\} \tag{1490}$$

Over some convex set $\mathcal{C}$ given vector constant $y$ or matrix constant $Y$

$$\arg\inf_{x\in\mathcal{C}} \|x - y\|_2 = \arg\inf_{x\in\mathcal{C}} \|x - y\|_2^2 \tag{1491}$$

$$\arg\inf_{X\in\mathcal{C}} \|X - Y\|_F = \arg\inf_{X\in\mathcal{C}} \|X - Y\|_F^2 \tag{1492}$$

C.2 diagonal, trace, singular and eigen values

For $A\in\mathbb{R}^{m\times n}$ and $\sigma(A)$ denoting its singular values, [46, §A.1.6] [91, §1] (confer (36))

$$\sum_i \sigma(A)_i = \operatorname{tr}\sqrt{A^T\!A} = \sup_{\|X\|_2\leq 1} \operatorname{tr}(X^T\!A) = \begin{array}[t]{rl}\underset{X\in\mathbb{R}^{m\times n}}{\text{maximize}} & \operatorname{tr}(X^T\!A)\\[2pt] \text{subject to} & \begin{bmatrix}I & X\\ X^T & I\end{bmatrix}\succeq 0\end{array} \;=\; \frac{1}{2}\,\begin{array}[t]{rl}\underset{X\in\mathbb{S}^m,\ Y\in\mathbb{S}^n}{\text{minimize}} & \operatorname{tr} X + \operatorname{tr} Y\\[2pt] \text{subject to} & \begin{bmatrix}X & A\\ A^T & Y\end{bmatrix}\succeq 0\end{array} \tag{1493}$$


For $X\in\mathbb{S}^m$, $Y\in\mathbb{S}^n$, $A\in\mathcal{C}\subseteq\mathbb{R}^{m\times n}$ for set $\mathcal{C}$ convex, and $\sigma(A)$ denoting the singular values of $A$ [91, §3]

$$\begin{array}[t]{rl}\underset{A}{\text{minimize}} & \sum_i \sigma(A)_i\\ \text{subject to} & A\in\mathcal{C}\end{array} \quad\equiv\quad \begin{array}[t]{rl}\underset{A,\,X,\,Y}{\text{minimize}} & \tfrac{1}{2}\left(\operatorname{tr} X + \operatorname{tr} Y\right)\\ \text{subject to} & \begin{bmatrix}X & A\\ A^T & Y\end{bmatrix}\succeq 0\\ & A\in\mathcal{C}\end{array} \tag{1494}$$

For $A\in\mathbb{S}^N_+$ and $\beta\in\mathbb{R}$

$$\beta\operatorname{tr} A = \begin{array}[t]{rl}\underset{X\in\mathbb{S}^N}{\text{maximize}} & \operatorname{tr}(XA)\\ \text{subject to} & X \preceq \beta I\end{array} \tag{1495}$$

But the following statement is numerically stable, preventing an unbounded solution in direction of a 0 eigenvalue:

$$\begin{array}[t]{rl}\underset{X\in\mathbb{S}^N}{\text{maximize}} & \operatorname{sgn}(\beta)\operatorname{tr}(XA)\\ \text{subject to} & X \preceq |\beta|\,I\\ & X \succeq -|\beta|\,I\end{array} \tag{1496}$$

where $\beta\operatorname{tr} A = \operatorname{tr}(X^\star\!A)$. If $\beta\geq 0$, then $X \succeq -|\beta|\,I \leftarrow X \succeq 0$.

For $A\in\mathbb{S}^N$ having eigenvalues $\lambda(A)\in\mathbb{R}^N$, its smallest and largest eigenvalue is respectively [9, §4.1] [31, §I.6.15] [150, §4.2] [176, §2.1]

$$\min_i\{\lambda(A)_i\} = \inf_{\|x\|=1} x^T\!Ax = \begin{array}[t]{rl}\underset{X\in\mathbb{S}^N_+}{\text{minimize}} & \operatorname{tr}(XA)\\ \text{subject to} & \operatorname{tr} X = 1\end{array} \;=\; \begin{array}[t]{rl}\underset{t\in\mathbb{R}}{\text{maximize}} & t\\ \text{subject to} & A \succeq tI\end{array} \tag{1497}$$

$$\max_i\{\lambda(A)_i\} = \sup_{\|x\|=1} x^T\!Ax = \begin{array}[t]{rl}\underset{X\in\mathbb{S}^N_+}{\text{maximize}} & \operatorname{tr}(XA)\\ \text{subject to} & \operatorname{tr} X = 1\end{array} \;=\; \begin{array}[t]{rl}\underset{t\in\mathbb{R}}{\text{minimize}} & t\\ \text{subject to} & A \preceq tI\end{array} \tag{1498}$$

The smallest eigenvalue of any symmetric matrix is always a concave function of its entries, while the largest eigenvalue is always convex. [46, exmp.3.10] For $v_1$ a normalized eigenvector of $A$ corresponding to


the largest eigenvalue, and $v_N$ a normalized eigenvector corresponding to the smallest eigenvalue,

$$v_N = \arg\inf_{\|x\|=1} x^T\!Ax \tag{1499}$$

$$v_1 = \arg\sup_{\|x\|=1} x^T\!Ax \tag{1500}$$

For $A\in\mathbb{S}^N$ having eigenvalues $\lambda(A)\in\mathbb{R}^N$, consider the unconstrained nonconvex optimization that is a projection on the rank-1 subset (§2.9.2.1) of the boundary of positive semidefinite cone $\mathbb{S}^N_+$: Defining $\lambda_1 \triangleq \max_i\{\lambda(A)_i\}$ and corresponding eigenvector $v_1$

$$\underset{x}{\text{minimize}}\ \|xx^T - A\|_F^2 = \underset{x}{\text{minimize}}\ \operatorname{tr}\!\left(xx^T(x^T x) - 2Axx^T + A^T\!A\right) = \begin{cases}\|\lambda(A)\|^2\,, & \lambda_1\leq 0\\ \|\lambda(A)\|^2 - \lambda_1^2\,, & \lambda_1 > 0\end{cases} \tag{1501}$$

$$\arg\underset{x}{\text{minimize}}\ \|xx^T - A\|_F^2 = \begin{cases}0\,, & \lambda_1\leq 0\\ v_1\sqrt{\lambda_1}\,, & \lambda_1 > 0\end{cases} \tag{1502}$$

Proof. This is simply the Eckart & Young solution from §7.1.2:

$$x^\star x^{\star T} = \begin{cases}0\,, & \lambda_1\leq 0\\ \lambda_1 v_1 v_1^T\,, & \lambda_1 > 0\end{cases} \tag{1503}$$

Given nonincreasingly ordered diagonalization $A = Q\Lambda Q^T$ where $\Lambda = \delta(\lambda(A))$ (§A.5), then (1501) has minimum value

$$\underset{x}{\text{minimize}}\ \|xx^T - A\|_F^2 = \begin{cases}\|Q\Lambda Q^T\|_F^2 = \|\delta(\Lambda)\|^2\,, & \lambda_1\leq 0\\[4pt] \left\|\,\lambda_1 Q\begin{bmatrix}1 & & & \\ & 0 & & \\ & & \ddots & \\ & & & 0\end{bmatrix}Q^T - Q\Lambda Q^T\right\|_F^2 = \left\|\begin{bmatrix}\lambda_1\\ 0\\ \vdots\\ 0\end{bmatrix} - \delta(\Lambda)\right\|^2\,, & \lambda_1 > 0\end{cases} \tag{1504}$$ ■
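The rank-1 projection (1501)-(1503) admits a direct numerical check (a NumPy sketch, illustrative only):

```python
import numpy as np

# Minimizer of ||xxᵀ − A||_F is x⋆ = v₁√λ₁ when λ₁ > 0 (else 0), with (λ₁, v₁)
# the largest eigenpair; optimal value is ||λ(A)||² − λ₁² when λ₁ > 0, per (1501).
rng = np.random.default_rng(7)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2
lam, Q = np.linalg.eigh(A)                      # ascending eigenvalues
lam1, v1 = lam[-1], Q[:, -1]
x = v1 * np.sqrt(max(lam1, 0.0))
opt = np.linalg.norm(np.outer(x, x) - A, 'fro')**2
pred = lam @ lam - (lam1**2 if lam1 > 0 else 0.0)
assert np.isclose(opt, pred)
```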


C.2.0.0.1 Exercise. Rank-1 approximation.
Given symmetric matrix $A\in\mathbb{S}^N$, prove:

$$v_1 = \arg\,\begin{array}[t]{rl}\underset{x}{\text{minimize}} & \|xx^T - A\|_F^2\\ \text{subject to} & \|x\| = 1\end{array} \tag{1505}$$

where $v_1$ is a normalized eigenvector of $A$ corresponding to its largest eigenvalue.

(Ky Fan) For $B\in\mathbb{S}^N$ whose eigenvalues $\lambda(B)\in\mathbb{R}^N$ are arranged in nonincreasing order, and for $1\leq k\leq N$ [9, §4.1] [158] [150, §4.3.18] [271, §2] [176, §2.1]

$$\sum_{i=N-k+1}^N \lambda(B)_i = \inf_{\substack{U\in\mathbb{R}^{N\times k}\\ U^T U = I}} \operatorname{tr}(UU^T\!B) = \begin{array}[t]{rl}\underset{X\in\mathbb{S}^N_+}{\text{minimize}} & \operatorname{tr}(XB)\\ \text{subject to} & X \preceq I\\ & \operatorname{tr} X = k\end{array} \qquad(a)$$

$$\phantom{\sum_{i=N-k+1}^N \lambda(B)_i} = \begin{array}[t]{rl}\underset{\mu\in\mathbb{R},\ Z\in\mathbb{S}^N_+}{\text{maximize}} & (k - N)\mu + \operatorname{tr}(B - Z)\\ \text{subject to} & \mu I + Z \succeq B\end{array} \qquad(b)$$

$$\sum_{i=1}^k \lambda(B)_i = \sup_{\substack{U\in\mathbb{R}^{N\times k}\\ U^T U = I}} \operatorname{tr}(UU^T\!B) = \begin{array}[t]{rl}\underset{X\in\mathbb{S}^N_+}{\text{maximize}} & \operatorname{tr}(XB)\\ \text{subject to} & X \preceq I\\ & \operatorname{tr} X = k\end{array} \qquad(c)$$

$$\phantom{\sum_{i=1}^k \lambda(B)_i} = \begin{array}[t]{rl}\underset{\mu\in\mathbb{R},\ Z\in\mathbb{S}^N_+}{\text{minimize}} & k\mu + \operatorname{tr} Z\\ \text{subject to} & \mu I + Z \succeq B\end{array} \qquad(d) \tag{1506}$$

Given ordered diagonalization $B = Q\Lambda Q^T$, (§A.5.2) then an optimal $U$ for the infimum is $U^\star = Q(:,\ N-k+1{:}N)\in\mathbb{R}^{N\times k}$ whereas $U^\star = Q(:,\ 1{:}k)\in\mathbb{R}^{N\times k}$ for the supremum. In both cases, $X^\star = U^\star U^{\star T}$. Optimization (a) searches the convex hull of the outer product $UU^T$ of all $N\times k$ orthonormal matrices. (§2.3.2.0.1)
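The trace form of the Ky Fan result (1506) is easy to test numerically for the supremum case; a NumPy sketch (illustrative only):

```python
import numpy as np

# Sum of the k largest eigenvalues of B equals sup tr(UUᵀB) over orthonormal U,
# attained at the eigenvectors of the k largest eigenvalues, per (1506)(c).
rng = np.random.default_rng(8)
M = rng.standard_normal((6, 6))
B = (M + M.T) / 2
lam, Q = np.linalg.eigh(B)                      # ascending order
k = 3
U = Q[:, -k:]                                   # eigenvectors of the k largest
assert np.isclose(np.trace(U @ U.T @ B), lam[-k:].sum())
Urand, _ = np.linalg.qr(rng.standard_normal((6, k)))   # any other U does no better
assert np.trace(Urand @ Urand.T @ B) <= lam[-k:].sum() + 1e-9
```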


For $B\in\mathbb{S}^N$ whose eigenvalues $\lambda(B)\in\mathbb{R}^N$ are arranged in nonincreasing order, and for diagonal matrix $\Upsilon\in\mathbb{S}^k$ whose diagonal entries are arranged in nonincreasing order where $1\leq k\leq N$, we utilize the main-diagonal $\delta$ operator's self-adjointness property (1249): [10, §4.2]

$$\sum_{i=1}^k \Upsilon_{ii}\,\lambda(B)_{N-i+1} = \inf_{\substack{U\in\mathbb{R}^{N\times k}\\ U^T U = I}} \operatorname{tr}(\Upsilon U^T\!BU) = \inf_{\substack{U\in\mathbb{R}^{N\times k}\\ U^T U = I}} \delta(\Upsilon)^T\delta(U^T\!BU) = \begin{array}[t]{rl}\underset{V_i\in\mathbb{S}^N}{\text{minimize}} & \operatorname{tr}\!\left(B\sum\limits_{i=1}^k (\Upsilon_{ii} - \Upsilon_{i+1,i+1})V_i\right)\\ \text{subject to} & \operatorname{tr} V_i = i\,,\quad i = 1\ldots k\\ & I \succeq V_i \succeq 0\,,\quad i = 1\ldots k\end{array} \tag{1507}$$

where $\Upsilon_{k+1,k+1} \triangleq 0$. We speculate,

$$\sum_{i=1}^k \Upsilon_{ii}\,\lambda(B)_i = \sup_{\substack{U\in\mathbb{R}^{N\times k}\\ U^T U = I}} \operatorname{tr}(\Upsilon U^T\!BU) = \sup_{\substack{U\in\mathbb{R}^{N\times k}\\ U^T U = I}} \delta(\Upsilon)^T\delta(U^T\!BU) \tag{1508}$$

Alizadeh shows: [9, §4.2]

$$\sum_{i=1}^k \Upsilon_{ii}\,\lambda(B)_i = \begin{array}[t]{rl}\underset{\mu\in\mathbb{R}^k,\ Z_i\in\mathbb{S}^N}{\text{minimize}} & \sum\limits_{i=1}^k i\mu_i + \operatorname{tr} Z_i\\ \text{subject to} & \mu_i I + Z_i - (\Upsilon_{ii} - \Upsilon_{i+1,i+1})B \succeq 0\,,\quad i = 1\ldots k\\ & Z_i \succeq 0\,,\quad i = 1\ldots k\end{array}$$

$$\phantom{\sum_{i=1}^k \Upsilon_{ii}\,\lambda(B)_i} = \begin{array}[t]{rl}\underset{V_i\in\mathbb{S}^N}{\text{maximize}} & \operatorname{tr}\!\left(B\sum\limits_{i=1}^k (\Upsilon_{ii} - \Upsilon_{i+1,i+1})V_i\right)\\ \text{subject to} & \operatorname{tr} V_i = i\,,\quad i = 1\ldots k\\ & I \succeq V_i \succeq 0\,,\quad i = 1\ldots k\end{array} \tag{1509}$$

where $\Upsilon_{k+1,k+1} \triangleq 0$.

The largest eigenvalue magnitude $\mu$ of $A\in\mathbb{S}^N$

$$\max_i\{|\lambda(A)_i|\} = \begin{array}[t]{rl}\underset{\mu\in\mathbb{R}}{\text{minimize}} & \mu\\ \text{subject to} & -\mu I \preceq A \preceq \mu I\end{array} \tag{1510}$$


is minimized over convex set $\mathcal{C}$ by semidefinite program: (confer §7.1.5)

$$\begin{array}[t]{rl}\underset{A}{\text{minimize}} & \|A\|_2\\ \text{subject to} & A\in\mathcal{C}\end{array} \quad\equiv\quad \begin{array}[t]{rl}\underset{\mu,\,A}{\text{minimize}} & \mu\\ \text{subject to} & -\mu I \preceq A \preceq \mu I\\ & A\in\mathcal{C}\end{array} \tag{1511}$$

id est,

$$\mu^\star \triangleq \max_i\left\{\,|\lambda(A^\star)_i|\,,\ i = 1\ldots N\,\right\} \in \mathbb{R}_+ \tag{1512}$$

For $B\in\mathbb{S}^N$ whose eigenvalues $\lambda(B)\in\mathbb{R}^N$ are arranged in nonincreasing order, let $\Pi\lambda(B)$ be a permutation of eigenvalues $\lambda(B)$ such that their absolute value becomes arranged in nonincreasing order: $|\Pi\lambda(B)|_1 \geq |\Pi\lambda(B)|_2 \geq \cdots \geq |\Pi\lambda(B)|_N$. Then, for $1\leq k\leq N$ [9, §4.3]^{C.1}

$$\sum_{i=1}^k |\Pi\lambda(B)|_i = \begin{array}[t]{rl}\underset{\mu\in\mathbb{R},\ Z\in\mathbb{S}^N_+}{\text{minimize}} & k\mu + \operatorname{tr} Z\\ \text{subject to} & \mu I + Z + B \succeq 0\\ & \mu I + Z - B \succeq 0\end{array} \;=\; \begin{array}[t]{rl}\underset{V,\,W\in\mathbb{S}^N_+}{\text{maximize}} & \langle B\,,\ V - W\rangle\\ \text{subject to} & I \succeq V,\ W\\ & \operatorname{tr}(V + W) = k\end{array} \tag{1513}$$

For diagonal matrix $\Upsilon\in\mathbb{S}^k$ whose diagonal entries are arranged in nonincreasing order where $1\leq k\leq N$

$$\sum_{i=1}^k \Upsilon_{ii}\,|\Pi\lambda(B)|_i = \begin{array}[t]{rl}\underset{\mu\in\mathbb{R}^k,\ Z_i\in\mathbb{S}^N}{\text{minimize}} & \sum\limits_{i=1}^k i\mu_i + \operatorname{tr} Z_i\\ \text{subject to} & \mu_i I + Z_i + (\Upsilon_{ii} - \Upsilon_{i+1,i+1})B \succeq 0\,,\quad i = 1\ldots k\\ & \mu_i I + Z_i - (\Upsilon_{ii} - \Upsilon_{i+1,i+1})B \succeq 0\,,\quad i = 1\ldots k\\ & Z_i \succeq 0\,,\quad i = 1\ldots k\end{array}$$

$$\phantom{\sum_{i=1}^k \Upsilon_{ii}\,|\Pi\lambda(B)|_i} = \begin{array}[t]{rl}\underset{V_i,\,W_i\in\mathbb{S}^N}{\text{maximize}} & \operatorname{tr}\!\left(B\sum\limits_{i=1}^k (\Upsilon_{ii} - \Upsilon_{i+1,i+1})(V_i - W_i)\right)\\ \text{subject to} & \operatorname{tr}(V_i + W_i) = i\,,\quad i = 1\ldots k\\ & I \succeq V_i \succeq 0\,,\quad i = 1\ldots k\\ & I \succeq W_i \succeq 0\,,\quad i = 1\ldots k\end{array} \tag{1514}$$

where $\Upsilon_{k+1,k+1} \triangleq 0$.

C.1 We eliminate a redundant positive semidefinite variable from Alizadeh's minimization. There exist typographical errors in [218, (6.49) (6.55)] for this minimization.


For $A, B\in\mathbb{S}^N$ whose eigenvalues $\lambda(A), \lambda(B)\in\mathbb{R}^N$ are respectively arranged in nonincreasing order, and for nonincreasingly ordered diagonalizations $A = W_A\Upsilon W_A^T$ and $B = W_B\Lambda W_B^T$ [149] [176, §2.1]

$$\lambda(A)^T\lambda(B) = \sup_{\substack{U\in\mathbb{R}^{N\times N}\\ U^T U = I}} \operatorname{tr}(A^T U^T\!BU) \;\geq\; \operatorname{tr}(A^T\!B) \tag{1532}$$

(confer (1537)) where optimal $U$ is

$$U^\star = W_B W_A^T \in \mathbb{R}^{N\times N} \tag{1529}$$

We can push that upper bound higher using a result in §C.4.2.1:

$$|\lambda(A)|^T|\lambda(B)| = \sup_{\substack{U\in\mathbb{C}^{N\times N}\\ U^H U = I}} \operatorname{Re}\operatorname{tr}(A^T U^H\!BU) \tag{1515}$$

For step function $\psi$ as defined in (1391), optimal $U$ becomes

$$U^\star = W_B\sqrt{\delta(\psi(\delta(\Lambda)))}^{\,H}\sqrt{\delta(\psi(\delta(\Upsilon)))}\,W_A^T \in \mathbb{C}^{N\times N} \tag{1516}$$

C.3 Orthogonal Procrustes problem

Given matrices $A, B\in\mathbb{R}^{n\times N}$, their product having full singular value decomposition (§A.6.3)

$$AB^T \triangleq U\Sigma Q^T \in \mathbb{R}^{n\times n} \tag{1517}$$

then an optimal solution $R^\star$ to the orthogonal Procrustes problem

$$\begin{array}[t]{rl}\underset{R}{\text{minimize}} & \|A - R^T\!B\|_F\\ \text{subject to} & R^T = R^{-1}\end{array} \tag{1518}$$

maximizes $\operatorname{tr}(A^T R^T\!B)$ over the nonconvex manifold of orthogonal matrices: [150, §7.4.8]

$$R^\star = QU^T \in \mathbb{R}^{n\times n} \tag{1519}$$

A necessary and sufficient condition for optimality

$$AB^T R^\star \succeq 0 \tag{1520}$$

holds whenever $R^\star$ is an orthogonal matrix. [114, §4]
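Solution (1519) of the orthogonal Procrustes problem is simple to implement; this NumPy sketch (illustrative only) also confirms the optimal trace value from (1522):

```python
import numpy as np

# With ABᵀ = UΣQᵀ, the orthogonal R⋆ = QUᵀ minimizes ||A − RᵀB||_F,
# and tr(AᵀR⋆ᵀB) = δ(Σ)ᵀ1, the sum of singular values of ABᵀ.
rng = np.random.default_rng(9)
A, B = rng.standard_normal((3, 8)), rng.standard_normal((3, 8))
U, s, Qt = np.linalg.svd(A @ B.T)
R_opt = Qt.T @ U.T
assert np.allclose(R_opt.T @ R_opt, np.eye(3))
assert np.isclose(np.trace(A.T @ R_opt.T @ B), s.sum())
Z, _ = np.linalg.qr(rng.standard_normal((3, 3)))       # a random orthogonal R
assert (np.linalg.norm(A - R_opt.T @ B, 'fro')
        <= np.linalg.norm(A - Z.T @ B, 'fro') + 1e-9)
```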


Solution to problem (1518) can reveal rotation/reflection (§5.5.2, §B.5) of one list in the columns of $A$ with respect to another list $B$. Solution is unique if $\operatorname{rank} BV_N = n$. [77, §2.4.1] The optimal value for objective of minimization is

$$\operatorname{tr}\!\left(A^T\!A + B^T\!B - 2AB^T R^\star\right) = \operatorname{tr}(A^T\!A) + \operatorname{tr}(B^T\!B) - 2\operatorname{tr}(U\Sigma U^T) = \|A\|_F^2 + \|B\|_F^2 - 2\,\delta(\Sigma)^T\mathbf{1} \tag{1521}$$

while the optimal value for corresponding trace maximization is

$$\sup_{R^T = R^{-1}} \operatorname{tr}(A^T R^T\!B) = \operatorname{tr}(A^T R^{\star T}\!B) = \delta(\Sigma)^T\mathbf{1} \;\geq\; \operatorname{tr}(A^T\!B) \tag{1522}$$

The same optimal solution $R^\star$ solves

$$\begin{array}[t]{rl}\underset{R}{\text{maximize}} & \|A + R^T\!B\|_F\\ \text{subject to} & R^T = R^{-1}\end{array} \tag{1523}$$

C.3.1 Effect of translation

Consider the impact of dc offset in known lists $A, B\in\mathbb{R}^{n\times N}$ on problem (1518). Rotation of $B$ there is with respect to the origin, so better results may be obtained if offset is first accounted. Because the geometric centers of the lists $AV$ and $BV$ are the origin, instead we solve

$$\begin{array}[t]{rl}\underset{R}{\text{minimize}} & \|AV - R^T\!BV\|_F\\ \text{subject to} & R^T = R^{-1}\end{array} \tag{1524}$$

where $V\in\mathbb{S}^N$ is the geometric centering matrix (§B.4.1). Now we define the full singular value decomposition

$$AVB^T \triangleq U\Sigma Q^T \in \mathbb{R}^{n\times n} \tag{1525}$$

and an optimal rotation matrix

$$R^\star = QU^T \in \mathbb{R}^{n\times n} \quad(1519)$$

The desired result is an optimally rotated offset list

$$R^{\star T}\!BV + A(I - V) \approx A \tag{1526}$$

which most closely matches the list in $A$. Equality is attained when the lists are precisely related by a rotation/reflection and an offset. When $R^{\star T}\!B = A$ or $B\mathbf{1} = A\mathbf{1} = 0$, this result (1526) reduces to $R^{\star T}\!B \approx A$.


C.3.1.1 Translation of extended list

Suppose an optimal rotation matrix $R^\star\in\mathbb{R}^{n\times n}$ were derived as before from matrix $B\in\mathbb{R}^{n\times N}$, but $B$ is part of a larger list in the columns of $[\,C\ B\,]\in\mathbb{R}^{n\times M+N}$ where $C\in\mathbb{R}^{n\times M}$. In that event, we wish to apply the rotation/reflection and translation to the larger list. The expression supplanting the approximation in (1526) makes $\mathbf{1}^T$ of compatible dimension;

$$R^{\star T}\!\left[\,C - B\mathbf{1}\tfrac{1}{N}\mathbf{1}^T \quad BV\,\right] + A\mathbf{1}\tfrac{1}{N}\mathbf{1}^T \tag{1527}$$

id est, $C - B\mathbf{1}\frac{1}{N}\mathbf{1}^T\in\mathbb{R}^{n\times M}$ and $A\mathbf{1}\frac{1}{N}\mathbf{1}^T\in\mathbb{R}^{n\times M+N}$.

C.4 Two-sided orthogonal Procrustes

C.4.0.1 Minimization

Given symmetric $A, B\in\mathbb{S}^N$, each having diagonalization (§A.5.2)

$$A \triangleq Q_A\Lambda_A Q_A^T\,,\qquad B \triangleq Q_B\Lambda_B Q_B^T \tag{1528}$$

where eigenvalues are arranged in their respective diagonal matrix $\Lambda$ in nonincreasing order, then an optimal solution [86]

$$R^\star = Q_B Q_A^T \in \mathbb{R}^{N\times N} \tag{1529}$$

to the two-sided orthogonal Procrustes problem

$$\begin{array}[t]{rl}\underset{R}{\text{minimize}} & \|A - R^T\!BR\|_F\\ \text{subject to} & R^T = R^{-1}\end{array} \;=\; \begin{array}[t]{rl}\underset{R}{\text{minimize}} & \operatorname{tr}\!\left(A^T\!A - 2A^T R^T\!BR + B^T\!B\right)\\ \text{subject to} & R^T = R^{-1}\end{array} \tag{1530}$$

maximizes $\operatorname{tr}(A^T R^T\!BR)$ over the nonconvex manifold of orthogonal matrices. Optimal product $R^{\star T}\!BR^\star$ has the eigenvectors of $A$ but the eigenvalues of $B$. [114, §7.5.1] The optimal value for the objective of minimization is, by (40)

$$\|Q_A\Lambda_A Q_A^T - R^{\star T} Q_B\Lambda_B Q_B^T R^\star\|_F = \|Q_A(\Lambda_A - \Lambda_B)Q_A^T\|_F = \|\Lambda_A - \Lambda_B\|_F \tag{1531}$$

while the corresponding trace maximization has optimal value

$$\sup_{R^T = R^{-1}} \operatorname{tr}(A^T R^T\!BR) = \operatorname{tr}(A^T R^{\star T}\!BR^\star) = \operatorname{tr}(\Lambda_A\Lambda_B) \;\geq\; \operatorname{tr}(A^T\!B) \tag{1532}$$
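A check of the two-sided minimization (1530)-(1531), as a NumPy sketch (illustrative only):

```python
import numpy as np

# R⋆ = Q_B Q_Aᵀ gives optimal value ||Λ_A − Λ_B||_F, and R⋆ᵀBR⋆ has the
# eigenvectors of A with the eigenvalues of B, per (1529)-(1531).
rng = np.random.default_rng(10)
Ma = rng.standard_normal((5, 5)); A = (Ma + Ma.T) / 2
Mb = rng.standard_normal((5, 5)); B = (Mb + Mb.T) / 2
la, Qa = np.linalg.eigh(A)
lb, Qb = np.linalg.eigh(B)
Qa, Qb = Qa[:, ::-1], Qb[:, ::-1]            # nonincreasing eigenvalue order
R = Qb @ Qa.T
val = np.linalg.norm(A - R.T @ B @ R, 'fro')
assert np.isclose(val, np.linalg.norm(la[::-1] - lb[::-1]))
```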


C.4.0.2 Maximization

Any permutation matrix is an orthogonal matrix. Defining a row- and column-swapping permutation matrix (a reflection matrix, §B.5.2)

$$\Xi = \Xi^T = \begin{bmatrix} & & & & 1\\ & & & \cdot & \\ & & \cdot & & \\ & 1 & & & \\ 1 & & & & \end{bmatrix} \tag{1533}$$

then an optimal solution $R^\star$ to the maximization problem [sic]

$$\begin{array}[t]{rl}\underset{R}{\text{maximize}} & \|A - R^T\!BR\|_F\\ \text{subject to} & R^T = R^{-1}\end{array} \tag{1534}$$

minimizes $\operatorname{tr}(A^T R^T\!BR)$: [149] [176, §2.1]

$$R^\star = Q_B\,\Xi\, Q_A^T \in \mathbb{R}^{N\times N} \tag{1535}$$

The optimal value for the objective of maximization is

$$\|Q_A\Lambda_A Q_A^T - R^{\star T} Q_B\Lambda_B Q_B^T R^\star\|_F = \|Q_A\Lambda_A Q_A^T - Q_A\Xi^T\Lambda_B\Xi Q_A^T\|_F = \|\Lambda_A - \Xi\Lambda_B\Xi\|_F \tag{1536}$$

while the corresponding trace minimization has optimal value

$$\inf_{R^T = R^{-1}} \operatorname{tr}(A^T R^T\!BR) = \operatorname{tr}(A^T R^{\star T}\!BR^\star) = \operatorname{tr}(\Lambda_A\Xi\Lambda_B\Xi) \tag{1537}$$

C.4.1 Procrustes' relation to linear programming

Although these two-sided Procrustes problems are nonconvex, a connection with linear programming [64] was discovered by Anstreicher & Wolkowicz [10, §3] [176, §2.1]: Given $A, B\in\mathbb{S}^N$, this semidefinite program in $S$ and $T$

$$\begin{array}[t]{rl}\underset{R}{\text{minimize}} & \operatorname{tr}(A^T R^T\!BR)\\ \text{subject to} & R^T = R^{-1}\end{array} \;=\; \begin{array}[t]{rl}\underset{S,\,T\in\mathbb{S}^N}{\text{maximize}} & \operatorname{tr}(S + T)\\ \text{subject to} & A^T\!\otimes B - I\otimes S - T\otimes I \succeq 0\end{array} \tag{1538}$$


(where $\otimes$ signifies Kronecker product (§D.1.2.1)) has optimal objective value (1537). These two problems are strong duals (§2.13.1.0.3). Given ordered diagonalizations (1528), make the observation:

$$\inf_R \operatorname{tr}(A^T R^T\!BR) = \inf_{\hat{R}} \operatorname{tr}(\Lambda_A\hat{R}^T\Lambda_B\hat{R}) \tag{1539}$$

because $\hat{R} \triangleq Q_B^T R\,Q_A$ on the set of orthogonal matrices (which includes the permutation matrices) is a bijection. This means, basically, diagonal matrices of eigenvalues $\Lambda_A$ and $\Lambda_B$ may be substituted for $A$ and $B$, so only the main diagonals of $S$ and $T$ come into play;

$$\begin{array}[t]{rl}\underset{S,\,T\in\mathbb{S}^N}{\text{maximize}} & \mathbf{1}^T\delta(S + T)\\ \text{subject to} & \delta\!\left(\Lambda_A\otimes(\Xi\Lambda_B\Xi) - I\otimes S - T\otimes I\right) \succeq 0\end{array} \tag{1540}$$

a linear program in $\delta(S)$ and $\delta(T)$ having the same optimal objective value as the semidefinite program (1538).

We relate their results to Procrustes problem (1530) by manipulating signs (1485) and permuting eigenvalues:

$$\begin{array}[t]{rl}\underset{R}{\text{maximize}} & \operatorname{tr}(A^T R^T\!BR)\\ \text{subject to} & R^T = R^{-1}\end{array} \;=\; \begin{array}[t]{rl}\underset{S,\,T\in\mathbb{S}^N}{\text{minimize}} & \mathbf{1}^T\delta(S + T)\\ \text{subject to} & \delta\!\left(I\otimes S + T\otimes I - \Lambda_A\otimes\Lambda_B\right) \succeq 0\end{array}$$

$$\phantom{xxxxxxxxxx} = \begin{array}[t]{rl}\underset{S,\,T\in\mathbb{S}^N}{\text{minimize}} & \operatorname{tr}(S + T)\\ \text{subject to} & I\otimes S + T\otimes I - A^T\!\otimes B \succeq 0\end{array} \tag{1541}$$

This formulation has optimal objective value identical to that in (1532).

C.4.2 Two-sided orthogonal Procrustes via SVD

By making left- and right-side orthogonal matrices independent, we can push the upper bound on trace (1532) a little further: Given real matrices $A, B$ each having full singular value decomposition (§A.6.3)

$$A \triangleq U_A\Sigma_A Q_A^T \in \mathbb{R}^{m\times n}\,,\qquad B \triangleq U_B\Sigma_B Q_B^T \in \mathbb{R}^{m\times n} \tag{1542}$$

then a well-known optimal solution $R^\star, S^\star$ to the problem

$$\begin{array}[t]{rl}\underset{R,\,S}{\text{minimize}} & \|A - SBR\|_F\\ \text{subject to} & R^H = R^{-1}\\ & S^H = S^{-1}\end{array} \tag{1543}$$


maximizes $\operatorname{Re}\operatorname{tr}(A^T\!SBR)$: [238] [215] [38] [144] optimal orthogonal matrices

$$S^\star = U_A U_B^H \in \mathbb{R}^{m\times m}\,,\qquad R^\star = Q_B Q_A^H \in \mathbb{R}^{n\times n} \tag{1544}$$

[sic] are not necessarily unique [150, §7.4.13] because the feasible set is not convex. The optimal value for the objective of minimization is, by (40)

$$\|U_A\Sigma_A Q_A^H - S^\star U_B\Sigma_B Q_B^H R^\star\|_F = \|U_A(\Sigma_A - \Sigma_B)Q_A^H\|_F = \|\Sigma_A - \Sigma_B\|_F \tag{1545}$$

while the corresponding trace maximization has optimal value [31, §III.6.12]

$$\sup_{\substack{R^H = R^{-1}\\ S^H = S^{-1}}} |\operatorname{tr}(A^T\!SBR)| = \sup_{\substack{R^H = R^{-1}\\ S^H = S^{-1}}} \operatorname{Re}\operatorname{tr}(A^T\!SBR) = \operatorname{Re}\operatorname{tr}(A^T\!S^\star BR^\star) = \operatorname{tr}(\Sigma_A^T\Sigma_B) \;\geq\; \operatorname{tr}(A^T\!B) \tag{1546}$$

for which it is necessary

$$A^T\!S^\star BR^\star \succeq 0\,,\qquad BR^\star A^T\!S^\star \succeq 0 \tag{1547}$$

The lower bound on inner product of singular values in (1546) is due to von Neumann. Equality is attained if $U_A^H U_B = I$ and $Q_B^H Q_A = I$.

C.4.2.1 Symmetric matrices

Now optimizing over the complex manifold of unitary matrices (§B.5.1), the upper bound on trace (1532) is thereby raised: Suppose we are given diagonalizations for (real) symmetric $A, B$ (§A.5)

$$A = W_A\Upsilon W_A^T \in \mathbb{S}^n\,,\qquad \delta(\Upsilon)\in\mathcal{K}_M \tag{1548}$$
$$B = W_B\Lambda W_B^T \in \mathbb{S}^n\,,\qquad \delta(\Lambda)\in\mathcal{K}_M \tag{1549}$$

having their respective eigenvalues in diagonal matrices $\Upsilon, \Lambda\in\mathbb{S}^n$ arranged in nonincreasing order (membership to the monotone cone $\mathcal{K}_M$ (378)). Then by splitting eigenvalue signs, we invent a symmetric SVD-like decomposition

$$A \triangleq U_A\Sigma_A Q_A^H \in \mathbb{S}^n\,,\qquad B \triangleq U_B\Sigma_B Q_B^H \in \mathbb{S}^n \tag{1550}$$

where $U_A, U_B, Q_A, Q_B\in\mathbb{C}^{n\times n}$ are unitary matrices defined by (confer §A.6.5)

$$U_A \triangleq W_A\sqrt{\delta(\psi(\delta(\Upsilon)))}\,,\qquad Q_A \triangleq W_A\sqrt{\delta(\psi(\delta(\Upsilon)))}^{\,H}\,,\qquad \Sigma_A = |\Upsilon| \tag{1551}$$
$$U_B \triangleq W_B\sqrt{\delta(\psi(\delta(\Lambda)))}\,,\qquad Q_B \triangleq W_B\sqrt{\delta(\psi(\delta(\Lambda)))}^{\,H}\,,\qquad \Sigma_B = |\Lambda| \tag{1552}$$


where step function $\psi$ is defined in (1391). In this circumstance,

$$S^\star = U_A U_B^H = R^{\star T} \in \mathbb{C}^{n\times n} \tag{1553}$$

optimal matrices (1544) now unitary are related by transposition. The optimal value of objective (1545) is

$$\|U_A\Sigma_A Q_A^H - S^\star U_B\Sigma_B Q_B^H R^\star\|_F = \|\,|\Upsilon| - |\Lambda|\,\|_F \tag{1554}$$

while the corresponding optimal value of trace maximization (1546) is

$$\sup_{\substack{R^H = R^{-1}\\ S^H = S^{-1}}} \operatorname{Re}\operatorname{tr}(A^T\!SBR) = \operatorname{tr}(|\Upsilon|\,|\Lambda|) \tag{1555}$$

C.4.2.2 Diagonal matrices

Now suppose $A$ and $B$ are diagonal matrices

$$A = \Upsilon = \delta^2(\Upsilon) \in \mathbb{S}^n\,,\qquad \delta(\Upsilon)\in\mathcal{K}_M \tag{1556}$$
$$B = \Lambda = \delta^2(\Lambda) \in \mathbb{S}^n\,,\qquad \delta(\Lambda)\in\mathcal{K}_M \tag{1557}$$

both having their respective main diagonal entries arranged in nonincreasing order:

$$\begin{array}[t]{rl}\underset{R,\,S}{\text{minimize}} & \|\Upsilon - S\Lambda R\|_F\\ \text{subject to} & R^H = R^{-1}\\ & S^H = S^{-1}\end{array} \tag{1558}$$

Then we have a symmetric decomposition from unitary matrices as in (1550) where

$$U_A \triangleq \sqrt{\delta(\psi(\delta(\Upsilon)))}\,,\qquad Q_A \triangleq \sqrt{\delta(\psi(\delta(\Upsilon)))}^{\,H}\,,\qquad \Sigma_A = |\Upsilon| \tag{1559}$$
$$U_B \triangleq \sqrt{\delta(\psi(\delta(\Lambda)))}\,,\qquad Q_B \triangleq \sqrt{\delta(\psi(\delta(\Lambda)))}^{\,H}\,,\qquad \Sigma_B = |\Lambda| \tag{1560}$$

Procrustes solution (1544) again sees the transposition relationship

$$S^\star = U_A U_B^H = R^{\star T} \in \mathbb{C}^{n\times n} \quad(1553)$$

but both optimal unitary matrices are now themselves diagonal. So,

$$S^\star\Lambda R^\star = \delta(\psi(\delta(\Upsilon)))\,\Lambda\,\delta(\psi(\delta(\Lambda))) = \delta(\psi(\delta(\Upsilon)))\,|\Lambda| \tag{1561}$$


Appendix D

Matrix calculus

"From too much study, and from extreme passion, cometh madnesse."
−Isaac Newton [105, §5]

D.1 Directional derivative, Taylor series

D.1.1 Gradients

Gradient of a differentiable real function $f(x) : \mathbb{R}^K\to\mathbb{R}$ with respect to its vector domain is defined

$$\nabla f(x) = \begin{bmatrix}\frac{\partial f(x)}{\partial x_1}\\[2pt] \frac{\partial f(x)}{\partial x_2}\\ \vdots\\ \frac{\partial f(x)}{\partial x_K}\end{bmatrix} \in \mathbb{R}^K \tag{1562}$$

while the second-order gradient of the twice differentiable real function with respect to its vector domain is traditionally called the Hessian;

$$\nabla^2 f(x) = \begin{bmatrix}\frac{\partial^2 f(x)}{\partial^2 x_1} & \frac{\partial^2 f(x)}{\partial x_1\partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_1\partial x_K}\\[2pt] \frac{\partial^2 f(x)}{\partial x_2\partial x_1} & \frac{\partial^2 f(x)}{\partial^2 x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_2\partial x_K}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial^2 f(x)}{\partial x_K\partial x_1} & \frac{\partial^2 f(x)}{\partial x_K\partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial^2 x_K}\end{bmatrix} \in \mathbb{S}^K \tag{1563}$$


The gradient of vector-valued function $v(x) : \mathbb{R}\to\mathbb{R}^N$ on real domain is a row-vector

$$\nabla v(x) \triangleq \left[\ \frac{\partial v_1(x)}{\partial x}\quad \frac{\partial v_2(x)}{\partial x}\quad \cdots\quad \frac{\partial v_N(x)}{\partial x}\ \right] \in \mathbb{R}^N \tag{1564}$$

while the second-order gradient is

$$\nabla^2 v(x) \triangleq \left[\ \frac{\partial^2 v_1(x)}{\partial x^2}\quad \frac{\partial^2 v_2(x)}{\partial x^2}\quad \cdots\quad \frac{\partial^2 v_N(x)}{\partial x^2}\ \right] \in \mathbb{R}^N \tag{1565}$$

Gradient of vector-valued function $h(x) : \mathbb{R}^K\to\mathbb{R}^N$ on vector domain is

$$\nabla h(x) \triangleq \begin{bmatrix}\frac{\partial h_1(x)}{\partial x_1} & \frac{\partial h_2(x)}{\partial x_1} & \cdots & \frac{\partial h_N(x)}{\partial x_1}\\ \frac{\partial h_1(x)}{\partial x_2} & \frac{\partial h_2(x)}{\partial x_2} & \cdots & \frac{\partial h_N(x)}{\partial x_2}\\ \vdots & \vdots & & \vdots\\ \frac{\partial h_1(x)}{\partial x_K} & \frac{\partial h_2(x)}{\partial x_K} & \cdots & \frac{\partial h_N(x)}{\partial x_K}\end{bmatrix} = \left[\ \nabla h_1(x)\quad \nabla h_2(x)\quad \cdots\quad \nabla h_N(x)\ \right] \in \mathbb{R}^{K\times N} \tag{1566}$$

while the second-order gradient has a three-dimensional representation dubbed cubix;^{D.1}

$$\nabla^2 h(x) \triangleq \left[\ \nabla\frac{\partial h_j(x)}{\partial x_i}\ \right] = \left[\ \nabla^2 h_1(x)\quad \nabla^2 h_2(x)\quad \cdots\quad \nabla^2 h_N(x)\ \right] \in \mathbb{R}^{K\times N\times K} \tag{1567}$$

where the gradient of each real entry is with respect to vector $x$ as in (1562).

D.1 The word matrix comes from the Latin for womb; related to the prefix matri- derived from mater meaning mother.


The gradient of real function $g(X) : \mathbb{R}^{K\times L}\to\mathbb{R}$ on matrix domain is

$$\nabla g(X) \triangleq \begin{bmatrix}\frac{\partial g(X)}{\partial X_{11}} & \frac{\partial g(X)}{\partial X_{12}} & \cdots & \frac{\partial g(X)}{\partial X_{1L}}\\ \frac{\partial g(X)}{\partial X_{21}} & \frac{\partial g(X)}{\partial X_{22}} & \cdots & \frac{\partial g(X)}{\partial X_{2L}}\\ \vdots & \vdots & & \vdots\\ \frac{\partial g(X)}{\partial X_{K1}} & \frac{\partial g(X)}{\partial X_{K2}} & \cdots & \frac{\partial g(X)}{\partial X_{KL}}\end{bmatrix} \in \mathbb{R}^{K\times L} = \left[\ \nabla_{X(:,1)}\,g(X)\quad \nabla_{X(:,2)}\,g(X)\ \cdots\ \nabla_{X(:,L)}\,g(X)\ \right] \in \mathbb{R}^{K\times 1\times L} \tag{1568}$$

where the gradient $\nabla_{X(:,i)}$ is with respect to the $i$th column of $X$. The strange appearance of (1568) in $\mathbb{R}^{K\times 1\times L}$ is meant to suggest a third dimension perpendicular to the page (not a diagonal matrix). The second-order gradient has representation

$$\nabla^2 g(X) \triangleq \left[\ \nabla\frac{\partial g(X)}{\partial X_{kl}}\ \right] \in \mathbb{R}^{K\times L\times K\times L} = \left[\ \nabla\nabla_{X(:,1)}\,g(X)\quad \nabla\nabla_{X(:,2)}\,g(X)\ \cdots\ \nabla\nabla_{X(:,L)}\,g(X)\ \right] \in \mathbb{R}^{K\times 1\times L\times K\times L} \tag{1569}$$

where the gradient $\nabla$ is with respect to matrix $X$.


Gradient of vector-valued function $g(X) : \mathbb{R}^{K\times L}\to\mathbb{R}^N$ on matrix domain is a cubix

$$\nabla g(X) \triangleq \left[\ \nabla_{X(:,j)}\,g_i(X)\ \right] = \left[\ \nabla g_1(X)\quad \nabla g_2(X)\ \cdots\ \nabla g_N(X)\ \right] \in \mathbb{R}^{K\times N\times L} \tag{1570}$$

while the second-order gradient has a five-dimensional representation;

$$\nabla^2 g(X) \triangleq \left[\ \nabla\nabla_{X(:,j)}\,g_i(X)\ \right] = \left[\ \nabla^2 g_1(X)\quad \nabla^2 g_2(X)\ \cdots\ \nabla^2 g_N(X)\ \right] \in \mathbb{R}^{K\times N\times L\times K\times L} \tag{1571}$$

The gradient of matrix-valued function $g(X) : \mathbb{R}^{K\times L}\to\mathbb{R}^{M\times N}$ on matrix domain has a four-dimensional representation called quartix

$$\nabla g(X) \triangleq \begin{bmatrix}\nabla g_{11}(X) & \nabla g_{12}(X) & \cdots & \nabla g_{1N}(X)\\ \nabla g_{21}(X) & \nabla g_{22}(X) & \cdots & \nabla g_{2N}(X)\\ \vdots & \vdots & & \vdots\\ \nabla g_{M1}(X) & \nabla g_{M2}(X) & \cdots & \nabla g_{MN}(X)\end{bmatrix} \in \mathbb{R}^{M\times N\times K\times L} \tag{1572}$$

while the second-order gradient has six-dimensional representation

$$\nabla^2 g(X) \triangleq \begin{bmatrix}\nabla^2 g_{11}(X) & \nabla^2 g_{12}(X) & \cdots & \nabla^2 g_{1N}(X)\\ \nabla^2 g_{21}(X) & \nabla^2 g_{22}(X) & \cdots & \nabla^2 g_{2N}(X)\\ \vdots & \vdots & & \vdots\\ \nabla^2 g_{M1}(X) & \nabla^2 g_{M2}(X) & \cdots & \nabla^2 g_{MN}(X)\end{bmatrix} \in \mathbb{R}^{M\times N\times K\times L\times K\times L} \tag{1573}$$

and so on.


D.1.2 Product rules for matrix-functions

Given dimensionally compatible matrix-valued functions of matrix variable $f(X)$ and $g(X)$

$$\nabla_X\!\left(f(X)^T g(X)\right) = \nabla_X(f)\,g + \nabla_X(g)\,f \tag{1574}$$

while [39, §8.3] [239]

$$\nabla_X \operatorname{tr}\!\left(f(X)^T g(X)\right) = \nabla_X\!\left(\operatorname{tr}\!\left(f(X)^T g(Z)\right) + \operatorname{tr}\!\left(g(X)f(Z)^T\right)\right)\Big|_{Z\leftarrow X} \tag{1575}$$

These expressions implicitly apply as well to scalar-, vector-, or matrix-valued functions of scalar, vector, or matrix arguments.

D.1.2.0.1 Example. Cubix.
Suppose $f(X) : \mathbb{R}^{2\times 2}\to\mathbb{R}^2 = X^T a$ and $g(X) : \mathbb{R}^{2\times 2}\to\mathbb{R}^2 = Xb$. We wish to find

$$\nabla_X\!\left(f(X)^T g(X)\right) = \nabla_X\, a^T\!X^2 b \tag{1576}$$

using the product rule. Formula (1574) calls for

$$\nabla_X\, a^T\!X^2 b = \nabla_X(X^T a)\,Xb + \nabla_X(Xb)\,X^T a \tag{1577}$$

Consider the first of the two terms:

$$\nabla_X(f)\,g = \nabla_X(X^T a)\,Xb = \left[\ \nabla(X^T a)_1\quad \nabla(X^T a)_2\ \right]Xb \tag{1578}$$

The gradient of $X^T a$ forms a cubix in $\mathbb{R}^{2\times 2\times 2}$.

$$\nabla_X(X^T a)\,Xb = \begin{bmatrix}\begin{bmatrix}\frac{\partial(X^T a)_1}{\partial X_{11}} & \frac{\partial(X^T a)_1}{\partial X_{12}}\\[2pt] \frac{\partial(X^T a)_1}{\partial X_{21}} & \frac{\partial(X^T a)_1}{\partial X_{22}}\end{bmatrix} & \begin{bmatrix}\frac{\partial(X^T a)_2}{\partial X_{11}} & \frac{\partial(X^T a)_2}{\partial X_{12}}\\[2pt] \frac{\partial(X^T a)_2}{\partial X_{21}} & \frac{\partial(X^T a)_2}{\partial X_{22}}\end{bmatrix}\end{bmatrix}\begin{bmatrix}(Xb)_1\\ (Xb)_2\end{bmatrix} \in \mathbb{R}^{2\times 1\times 2} \tag{1579}$$


Because gradient of the product (1576) requires total change with respect to change in each entry of matrix $X$, the $Xb$ vector must make an inner product with each vector in the second dimension of the cubix (indicated by dotted line segments in the original figure);

$$\nabla_X(X^T a)\,Xb = \begin{bmatrix}\begin{bmatrix}a_1 & 0\\ 0 & a_1\end{bmatrix} & \begin{bmatrix}a_2 & 0\\ 0 & a_2\end{bmatrix}\end{bmatrix}\begin{bmatrix}b_1 X_{11} + b_2 X_{12}\\ b_1 X_{21} + b_2 X_{22}\end{bmatrix} \in \mathbb{R}^{2\times 1\times 2}$$

$$= \begin{bmatrix}a_1(b_1 X_{11} + b_2 X_{12}) & a_1(b_1 X_{21} + b_2 X_{22})\\ a_2(b_1 X_{11} + b_2 X_{12}) & a_2(b_1 X_{21} + b_2 X_{22})\end{bmatrix} \in \mathbb{R}^{2\times 2} = ab^T\!X^T \tag{1580}$$

where the cubix appears as a complete $2\times 2\times 2$ matrix. In like manner for the second term $\nabla_X(g)\,f$

$$\nabla_X(Xb)\,X^T a = \begin{bmatrix}\begin{bmatrix}b_1 & 0\\ b_2 & 0\end{bmatrix} & \begin{bmatrix}0 & b_1\\ 0 & b_2\end{bmatrix}\end{bmatrix}\begin{bmatrix}X_{11}a_1 + X_{21}a_2\\ X_{12}a_1 + X_{22}a_2\end{bmatrix} \in \mathbb{R}^{2\times 1\times 2} = X^T ab^T \in \mathbb{R}^{2\times 2} \tag{1581}$$

The solution

$$\nabla_X\, a^T\!X^2 b = ab^T\!X^T + X^T ab^T \tag{1582}$$

can be found from Table D.2.1 or verified using (1575).

D.1.2.1 Kronecker product

A partial remedy for venturing into hyperdimensional representations, such as the cubix or quartix, is to first vectorize matrices as in (30). This device gives rise to the Kronecker product of matrices $\otimes$; a.k.a, direct product or tensor product. Although it sees reversal in the literature, [247, §2.1] we adopt the definition: for $A\in\mathbb{R}^{m\times n}$ and $B\in\mathbb{R}^{p\times q}$

$$B\otimes A \triangleq \begin{bmatrix}B_{11}A & B_{12}A & \cdots & B_{1q}A\\ B_{21}A & B_{22}A & \cdots & B_{2q}A\\ \vdots & \vdots & & \vdots\\ B_{p1}A & B_{p2}A & \cdots & B_{pq}A\end{bmatrix} \in \mathbb{R}^{pm\times qn} \tag{1583}$$
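Solution (1582) can likewise be confirmed by finite differences; a NumPy sketch (illustrative only, with a hypothetical step size h):

```python
import numpy as np

# Finite-difference check of (1582): ∇_X aᵀX²b = abᵀXᵀ + Xᵀabᵀ.
rng = np.random.default_rng(11)
a, b = rng.standard_normal(2), rng.standard_normal(2)
X = rng.standard_normal((2, 2))
G = np.outer(a, b) @ X.T + X.T @ np.outer(a, b)
h, G_num = 1e-6, np.zeros((2, 2))
for k in range(2):
    for l in range(2):
        E = np.zeros((2, 2)); E[k, l] = h
        G_num[k, l] = (a @ (X + E) @ (X + E) @ b - a @ X @ X @ b) / h
assert np.allclose(G, G_num, atol=1e-4)
```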


One advantage to vectorization is existence of a traditional two-dimensional matrix representation for the second-order gradient of a real function with respect to a vectorized matrix. For example, from §A.1.1 no.23 (§D.2.1) for square $A, B\in\mathbb{R}^{n\times n}$ [116, §5.2] [10, §3]

$$\nabla^2_{\operatorname{vec} X} \operatorname{tr}(AXBX^T) = \nabla^2_{\operatorname{vec} X}\, \operatorname{vec}(X)^T(B^T\!\otimes A)\operatorname{vec} X = B\otimes A^T + B^T\!\otimes A \in \mathbb{R}^{n^2\times n^2} \tag{1584}$$

To disadvantage is a large new but known set of algebraic rules and the fact that its mere use does not generally guarantee two-dimensional matrix representation of gradients.

Another application of the Kronecker product is to reverse order of appearance in a matrix product: Suppose we wish to weight the columns of a matrix $S\in\mathbb{R}^{M\times N}$, for example, by respective entries $w_i$ from the main-diagonal in

$$W \triangleq \begin{bmatrix}w_1 & & 0\\ & \ddots & \\ 0^T & & w_N\end{bmatrix} \in \mathbb{S}^N \tag{1585}$$

The conventional way of accomplishing that is to multiply $S$ by diagonal matrix $W$ on the right-hand side:^{D.2}

$$SW = S\begin{bmatrix}w_1 & & 0\\ & \ddots & \\ 0^T & & w_N\end{bmatrix} = \left[\ S(:,1)w_1\ \cdots\ S(:,N)w_N\ \right] \in \mathbb{R}^{M\times N} \tag{1586}$$

To reverse product order such that diagonal matrix $W$ instead appears to the left of $S$: (Sze Wan)

$$SW = (\delta(W)^T\!\otimes I)\begin{bmatrix}S(:,1) & 0 & & 0\\ 0 & S(:,2) & & \\ & & \ddots & 0\\ 0 & & 0 & S(:,N)\end{bmatrix} \in \mathbb{R}^{M\times N} \tag{1587}$$

where $I\in\mathbb{S}^M$. For any matrices of like size, $S, Y\in\mathbb{R}^{M\times N}$

$$S\circ Y = \left[\ \delta(Y(:,1))\ \cdots\ \delta(Y(:,N))\ \right]\begin{bmatrix}S(:,1) & 0 & & 0\\ 0 & S(:,2) & & \\ & & \ddots & 0\\ 0 & & 0 & S(:,N)\end{bmatrix} \in \mathbb{R}^{M\times N} \tag{1588}$$

D.2 Multiplying on the left by $W\in\mathbb{S}^M$ would instead weight the rows of $S$.
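The vectorization identity underlying (1584) is checked below in a NumPy sketch (illustrative only); note the column-major (order='F') flattening, which matches vec as in (30):

```python
import numpy as np

# With column-stacking vec, tr(AXBXᵀ) = vec(X)ᵀ(Bᵀ⊗A)vec(X),
# so its Hessian in vec X is the symmetric matrix B⊗Aᵀ + Bᵀ⊗A of (1584).
rng = np.random.default_rng(12)
n = 3
A, B, X = (rng.standard_normal((n, n)) for _ in range(3))
vecX = X.flatten(order='F')                      # column-major vec
quad = vecX @ np.kron(B.T, A) @ vecX
assert np.isclose(quad, np.trace(A @ X @ B @ X.T))
H = np.kron(B, A.T) + np.kron(B.T, A)            # the Hessian (1584)
assert np.allclose(H, H.T)
```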


which converts a Hadamard product into a standard matrix product. Hadamard product can be extracted from within the Kronecker product. [150, p.475]

D.1.3 Chain rules for composite matrix-functions

Given dimensionally compatible matrix-valued functions of matrix variable $f(X)$ and $g(X)$ [161, §15.7]

$$\nabla_X\, g\!\left(f(X)^T\right) = \nabla_X f^T\,\nabla_f\, g \tag{1589}$$

$$\nabla^2_X\, g\!\left(f(X)^T\right) = \nabla_X\!\left(\nabla_X f^T\,\nabla_f\, g\right) = \nabla^2_X f\,\nabla_f\, g + \nabla_X f^T\,\nabla^2_f\, g\,\nabla_X f \tag{1590}$$

D.1.3.1 Two arguments

$$\nabla_X\, g\!\left(f(X)^T,\ h(X)^T\right) = \nabla_X f^T\,\nabla_f\, g + \nabla_X h^T\,\nabla_h\, g \tag{1591}$$

D.1.3.1.1 Example. Chain rule for two arguments. [30, §1.1]

$$g\!\left(f(x)^T,\ h(x)^T\right) = \left(f(x) + h(x)\right)^T\!A\left(f(x) + h(x)\right) \tag{1592}$$

$$f(x) = \begin{bmatrix}x_1\\ \varepsilon x_2\end{bmatrix}\,,\qquad h(x) = \begin{bmatrix}\varepsilon x_1\\ x_2\end{bmatrix} \tag{1593}$$

$$\nabla_x\, g\!\left(f(x)^T,\ h(x)^T\right) = \begin{bmatrix}1 & 0\\ 0 & \varepsilon\end{bmatrix}(A + A^T)(f + h) + \begin{bmatrix}\varepsilon & 0\\ 0 & 1\end{bmatrix}(A + A^T)(f + h) \tag{1594}$$

$$\nabla_x\, g\!\left(f(x)^T,\ h(x)^T\right) = \begin{bmatrix}1+\varepsilon & 0\\ 0 & 1+\varepsilon\end{bmatrix}(A + A^T)\left(\begin{bmatrix}x_1\\ \varepsilon x_2\end{bmatrix} + \begin{bmatrix}\varepsilon x_1\\ x_2\end{bmatrix}\right) \tag{1595}$$

from Table D.2.1.

$$\lim_{\varepsilon\to 0}\nabla_x\, g\!\left(f(x)^T,\ h(x)^T\right) = (A + A^T)x \tag{1596}$$

These formulae remain correct when gradient produces hyperdimensional representation:


D.1.4 First directional derivative

Assume that a differentiable function $g(X) : \mathbb{R}^{K\times L}\to\mathbb{R}^{M\times N}$ has continuous first- and second-order gradients $\nabla g$ and $\nabla^2 g$ over $\operatorname{dom} g$ which is an open set. We seek simple expressions for the first and second directional derivatives in direction $Y\in\mathbb{R}^{K\times L}$: respectively, $\overset{\to Y}{dg}\in\mathbb{R}^{M\times N}$ and $\overset{\to Y}{dg^2}\in\mathbb{R}^{M\times N}$.

Assuming that the limit exists, we may state the partial derivative of the $mn$th entry of $g$ with respect to the $kl$th entry of $X$;

$$\frac{\partial g_{mn}(X)}{\partial X_{kl}} = \lim_{\Delta t\to 0}\frac{g_{mn}(X + \Delta t\,e_k e_l^T) - g_{mn}(X)}{\Delta t} \in \mathbb{R} \tag{1597}$$

where $e_k$ is the $k$th standard basis vector in $\mathbb{R}^K$ while $e_l$ is the $l$th standard basis vector in $\mathbb{R}^L$. The total number of partial derivatives equals $KLMN$ while the gradient is defined in their terms; the $mn$th entry of the gradient is

$$\nabla g_{mn}(X) = \begin{bmatrix}\frac{\partial g_{mn}(X)}{\partial X_{11}} & \frac{\partial g_{mn}(X)}{\partial X_{12}} & \cdots & \frac{\partial g_{mn}(X)}{\partial X_{1L}}\\ \frac{\partial g_{mn}(X)}{\partial X_{21}} & \frac{\partial g_{mn}(X)}{\partial X_{22}} & \cdots & \frac{\partial g_{mn}(X)}{\partial X_{2L}}\\ \vdots & \vdots & & \vdots\\ \frac{\partial g_{mn}(X)}{\partial X_{K1}} & \frac{\partial g_{mn}(X)}{\partial X_{K2}} & \cdots & \frac{\partial g_{mn}(X)}{\partial X_{KL}}\end{bmatrix} \in \mathbb{R}^{K\times L} \tag{1598}$$

while the gradient is a quartix

$$\nabla g(X) = \begin{bmatrix}\nabla g_{11}(X) & \nabla g_{12}(X) & \cdots & \nabla g_{1N}(X)\\ \nabla g_{21}(X) & \nabla g_{22}(X) & \cdots & \nabla g_{2N}(X)\\ \vdots & \vdots & & \vdots\\ \nabla g_{M1}(X) & \nabla g_{M2}(X) & \cdots & \nabla g_{MN}(X)\end{bmatrix} \in \mathbb{R}^{M\times N\times K\times L} \tag{1599}$$

By simply rotating our perspective of the four-dimensional representation of gradient matrix, we find one of three useful transpositions of this quartix (connoted $^{T_1}$):


$$\nabla g(X)^{T_1} = \left[\ \frac{\partial g(X)}{\partial X_{kl}}\ \right] \in \mathbb{R}^{K\times L\times M\times N} \tag{1600}$$

When the limit for $\Delta t\in\mathbb{R}$ exists, it is easy to show by substitution of variables in (1597)

$$\frac{\partial g_{mn}(X)}{\partial X_{kl}}\,Y_{kl} = \lim_{\Delta t\to 0}\frac{g_{mn}(X + \Delta t\,Y_{kl}\,e_k e_l^T) - g_{mn}(X)}{\Delta t} \in \mathbb{R} \tag{1601}$$

which may be interpreted as the change in $g_{mn}$ at $X$ when the change in $X_{kl}$ is equal to $Y_{kl}$, the $kl$th entry of any $Y\in\mathbb{R}^{K\times L}$. Because the total change in $g_{mn}(X)$ due to $Y$ is the sum of change with respect to each and every $X_{kl}$, the $mn$th entry of the directional derivative is the corresponding total differential [161, §15.8]

$$dg_{mn}(X)\big|_{dX\to Y} = \sum_{k,l}\frac{\partial g_{mn}(X)}{\partial X_{kl}}\,Y_{kl} = \operatorname{tr}\!\left(\nabla g_{mn}(X)^T Y\right) \tag{1602}$$

$$= \sum_{k,l}\ \lim_{\Delta t\to 0}\frac{g_{mn}(X + \Delta t\,Y_{kl}\,e_k e_l^T) - g_{mn}(X)}{\Delta t} \tag{1603}$$

$$= \lim_{\Delta t\to 0}\frac{g_{mn}(X + \Delta t\,Y) - g_{mn}(X)}{\Delta t} \tag{1604}$$

$$= \frac{d}{dt}\bigg|_{t=0}\, g_{mn}(X + tY) \tag{1605}$$

where $t\in\mathbb{R}$. Assuming finite $Y$, equation (1604) is called the Gâteaux differential [29, App.A.5] [148, §D.2.1] [270, §5.28] whose existence is implied by existence of the Fréchet differential (the sum in (1602)). [183, §7.2] Each may be understood as the change in $g_{mn}$ at $X$ when the change in $X$ is equal


in magnitude and direction to $Y$.^{D.3} Hence the directional derivative,

$$\overset{\to Y}{dg}(X) \triangleq \begin{bmatrix}dg_{11}(X) & dg_{12}(X) & \cdots & dg_{1N}(X)\\ dg_{21}(X) & dg_{22}(X) & \cdots & dg_{2N}(X)\\ \vdots & \vdots & & \vdots\\ dg_{M1}(X) & dg_{M2}(X) & \cdots & dg_{MN}(X)\end{bmatrix}\Bigg|_{dX\to Y} = \left[\ \operatorname{tr}\!\left(\nabla g_{mn}(X)^T Y\right)\ \right] = \left[\ \sum_{k,l}\frac{\partial g_{mn}(X)}{\partial X_{kl}}\,Y_{kl}\ \right] \in \mathbb{R}^{M\times N} \tag{1606}$$

from which it follows

$$\overset{\to Y}{dg}(X) = \sum_{k,l}\frac{\partial g(X)}{\partial X_{kl}}\,Y_{kl} \tag{1607}$$

Yet for all $X\in\operatorname{dom} g$, any $Y\in\mathbb{R}^{K\times L}$, and some open interval of $t\in\mathbb{R}$

$$g(X + tY) = g(X) + t\,\overset{\to Y}{dg}(X) + o(t^2) \tag{1608}$$

which is the first-order Taylor series expansion about $X$. [161, §18.4] [104, §2.3.4] Differentiation with respect to $t$ and subsequent $t$-zeroing isolates the second term of expansion. Thus differentiating and zeroing $g(X + tY)$ in $t$ is an operation equivalent to individually differentiating and zeroing every entry $g_{mn}(X + tY)$ as in (1605). So the directional derivative of $g(X) : \mathbb{R}^{K\times L}\to\mathbb{R}^{M\times N}$ in any direction $Y\in\mathbb{R}^{K\times L}$ evaluated at $X\in\operatorname{dom} g$ becomes

$$\overset{\to Y}{dg}(X) = \frac{d}{dt}\bigg|_{t=0}\, g(X + tY) \in \mathbb{R}^{M\times N} \tag{1609}$$

[206, §2.1, §5.4.5] [27, §6.3.1] which is simplest.

D.3 Although $Y$ is a matrix, we may regard it as a vector in $\mathbb{R}^{KL}$.
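Equivalence of the directional derivative forms (1609) and (1631) can be seen numerically for the simple function $g(X) = \operatorname{tr}(X^T X)$, whose gradient is $2X$; a NumPy sketch (illustrative only):

```python
import numpy as np

# d/dt g(X+tY) at t=0 should equal tr(∇g(X)ᵀY) = 2 tr(XᵀY) for g(X) = tr(XᵀX).
rng = np.random.default_rng(13)
X, Y = rng.standard_normal((3, 4)), rng.standard_normal((3, 4))
g = lambda Z: np.trace(Z.T @ Z)
t = 1e-6
dg_numeric = (g(X + t * Y) - g(X - t * Y)) / (2 * t)   # central difference in t
assert np.isclose(dg_numeric, 2 * np.trace(X.T @ Y), atol=1e-6)
```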


[Figure 122: Convex quadratic bowl in $\mathbb{R}^2\times\mathbb{R}$; $f(x) = x^T x : \mathbb{R}^2\to\mathbb{R}$ versus $x$ on some open disc in $\mathbb{R}^2$. Plane slice $\partial\mathcal{H}$ is perpendicular to function domain. Slice intersection with domain connotes bidirectional vector $y$. Slope of tangent line $T$ at point $(\alpha, f(\alpha))$ is value of directional derivative $\nabla_x f(\alpha)^T y$ (1634) at $\alpha$ in slice direction $y$. Negative gradient $-\nabla_x f(x)\in\mathbb{R}^2$ is direction of steepest descent. [285] [161, §15.6] [104] When vector $\upsilon\in\mathbb{R}^3$ entry $\upsilon_3$ is half directional derivative in gradient direction at $\alpha$, and when $\begin{bmatrix}\upsilon_1\\ \upsilon_2\end{bmatrix} = \nabla_x f(\alpha)$, then $-\upsilon$ points directly toward bowl bottom. [206, §2.1, §5.4.5] [27, §6.3.1]]

In case of a real function $g(X) : \mathbb{R}^{K\times L}\to\mathbb{R}$

$$\overset{\to Y}{dg}(X) = \operatorname{tr}\!\left(\nabla g(X)^T Y\right) \tag{1631}$$

In case $g(X) : \mathbb{R}^K\to\mathbb{R}$

$$\overset{\to Y}{dg}(X) = \nabla g(X)^T Y \tag{1634}$$

Unlike gradient, directional derivative does not expand dimension; directional derivative (1609) retains the dimensions of $g$. The derivative with respect to $t$ makes the directional derivative resemble ordinary calculus (§D.2); e.g., when $g(X)$ is linear, $\overset{\to Y}{dg}(X) = g(Y)$. [183, §7.2]


D.1.4.1 Interpretation of directional derivative

In the case of any differentiable real function $g(X):\mathbb{R}^{K\times L}\to\mathbb{R}$, the directional derivative of $g(X)$ at $X$ in any direction $Y$ yields the slope of $g$ along the line $\{X+t\,Y\mid t\in\mathbb{R}\}$ through its domain evaluated at $t=0$. For higher-dimensional functions, by (1606), this slope interpretation can be applied to each entry of the directional derivative.

Figure 122, for example, shows a plane slice of a real convex bowl-shaped function $f(x)$ along a line $\{\alpha+t\,y\mid t\in\mathbb{R}\}$ through its domain. The slice reveals a one-dimensional real function of $t$; $f(\alpha+t\,y)$. The directional derivative at $x=\alpha$ in direction $y$ is the slope of $f(\alpha+t\,y)$ with respect to $t$ at $t=0$. In the case of a real function having vector argument $h(X):\mathbb{R}^{K}\to\mathbb{R}$, its directional derivative in the normalized direction of its gradient is the gradient magnitude. (1634) For a real function of real variable, the directional derivative evaluated at any point in the function domain is just the slope of that function there scaled by the real direction. (confer §3.1.8)

Directional derivative generalizes our one-dimensional notion of derivative to a multidimensional domain. When direction $Y$ coincides with a member of the standard unit Cartesian basis $e_k e_l^{\mathrm{T}}$ (51), then a single partial derivative $\partial g(X)/\partial X_{kl}$ is obtained from directional derivative (1607); such is each entry of gradient $\nabla g(X)$ in equalities (1631) and (1634), for example.

D.1.4.1.1 Theorem. Directional derivative optimality condition. [183, §7.4]
Suppose $f(X):\mathbb{R}^{K\times L}\to\mathbb{R}$ is minimized on convex set $\mathcal{C}\subseteq\mathbb{R}^{K\times L}$ by $X^\star$, and the directional derivative of $f$ exists there. Then for all $X\in\mathcal{C}$
$$
\overset{\to X-X^\star}{df}(X^\star)\;\geq\;0\tag{1610}
$$
⋄

D.1.4.1.2 Example. Simple bowl.
Bowl function (Figure 122)
$$
f(x):\mathbb{R}^{K}\to\mathbb{R}\;\triangleq\;(x-a)^{\mathrm{T}}(x-a)-b\tag{1611}
$$
has function offset $-b\in\mathbb{R}$, axis of revolution at $x=a$, and positive definite Hessian (1563) everywhere in its domain (an open hyperdisc in $\mathbb{R}^{K}$); id est, strictly convex quadratic $f(x)$ has unique global minimum equal to $-b$ at


$x=a$. A vector $-\upsilon$ based anywhere in $\operatorname{dom}f\times\mathbb{R}$ pointing toward the unique bowl-bottom is specified:
$$
\upsilon\;\propto\;\begin{bmatrix}x-a\\ f(x)+b\end{bmatrix}\in\mathbb{R}^{K}\times\mathbb{R}\tag{1612}
$$
Such a vector is
$$
\upsilon=\begin{bmatrix}\nabla_x f(x)\\[2pt]\tfrac{1}{2}\,\overset{\to\nabla_x f(x)}{df}(x)\end{bmatrix}\tag{1613}
$$
since the gradient is
$$
\nabla_x f(x)=2(x-a)\tag{1614}
$$
and the directional derivative in the direction of the gradient is (1634)
$$
\overset{\to\nabla_x f(x)}{df}(x)=\nabla_x f(x)^{\mathrm{T}}\nabla_x f(x)=4(x-a)^{\mathrm{T}}(x-a)=4(f(x)+b)\tag{1615}
$$

D.1.5 Second directional derivative

By similar argument, it so happens: the second directional derivative is equally simple. Given $g(X):\mathbb{R}^{K\times L}\to\mathbb{R}^{M\times N}$ on open domain,
$$
\nabla\frac{\partial g_{mn}(X)}{\partial X_{kl}}=\frac{\partial\nabla g_{mn}(X)}{\partial X_{kl}}=
\begin{bmatrix}
\frac{\partial^2 g_{mn}(X)}{\partial X_{kl}\partial X_{11}} & \frac{\partial^2 g_{mn}(X)}{\partial X_{kl}\partial X_{12}} & \cdots & \frac{\partial^2 g_{mn}(X)}{\partial X_{kl}\partial X_{1L}}\\
\frac{\partial^2 g_{mn}(X)}{\partial X_{kl}\partial X_{21}} & \frac{\partial^2 g_{mn}(X)}{\partial X_{kl}\partial X_{22}} & \cdots & \frac{\partial^2 g_{mn}(X)}{\partial X_{kl}\partial X_{2L}}\\
\vdots & \vdots & & \vdots\\
\frac{\partial^2 g_{mn}(X)}{\partial X_{kl}\partial X_{K1}} & \frac{\partial^2 g_{mn}(X)}{\partial X_{kl}\partial X_{K2}} & \cdots & \frac{\partial^2 g_{mn}(X)}{\partial X_{kl}\partial X_{KL}}
\end{bmatrix}\in\mathbb{R}^{K\times L}\tag{1616}
$$
$$
\nabla^2 g_{mn}(X)=
\begin{bmatrix}
\nabla\frac{\partial g_{mn}(X)}{\partial X_{11}} & \cdots & \nabla\frac{\partial g_{mn}(X)}{\partial X_{1L}}\\
\vdots & & \vdots\\
\nabla\frac{\partial g_{mn}(X)}{\partial X_{K1}} & \cdots & \nabla\frac{\partial g_{mn}(X)}{\partial X_{KL}}
\end{bmatrix}\in\mathbb{R}^{K\times L\times K\times L}
=\begin{bmatrix}
\frac{\partial\nabla g_{mn}(X)}{\partial X_{11}} & \cdots & \frac{\partial\nabla g_{mn}(X)}{\partial X_{1L}}\\
\vdots & & \vdots\\
\frac{\partial\nabla g_{mn}(X)}{\partial X_{K1}} & \cdots & \frac{\partial\nabla g_{mn}(X)}{\partial X_{KL}}
\end{bmatrix}\tag{1617}
$$
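Before continuing, identity (1615) from Example D.1.4.1.2 admits a quick numerical check; this Matlab sketch uses arbitrary test values for $K$, $a$, $b$, and $x$:
\begin{verbatim}
% Sketch of (1615): slope of the bowl along its own gradient
% equals 4(f(x)+b); all test values arbitrary.
K = 3;  a = randn(K,1);  b = 0.7;  x = randn(K,1);
f = @(x) (x-a)'*(x-a) - b;
grad = 2*(x-a);                     % gradient (1614)
t = 1e-7;
dfnum = (f(x + t*grad) - f(x))/t;   % directional derivative along gradient
disp([dfnum, 4*(f(x)+b)])           % both ~ 4(f(x)+b)
\end{verbatim}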


Rotating our perspective, we get several views of the second-order gradient:
$$
\nabla^2 g(X)=
\begin{bmatrix}
\nabla^2 g_{11}(X) & \nabla^2 g_{12}(X) & \cdots & \nabla^2 g_{1N}(X)\\
\nabla^2 g_{21}(X) & \nabla^2 g_{22}(X) & \cdots & \nabla^2 g_{2N}(X)\\
\vdots & \vdots & & \vdots\\
\nabla^2 g_{M1}(X) & \nabla^2 g_{M2}(X) & \cdots & \nabla^2 g_{MN}(X)
\end{bmatrix}\in\mathbb{R}^{M\times N\times K\times L\times K\times L}\tag{1618}
$$
$$
\nabla^2 g(X)^{\mathrm{T}_1}=
\begin{bmatrix}
\nabla\frac{\partial g(X)}{\partial X_{11}} & \cdots & \nabla\frac{\partial g(X)}{\partial X_{1L}}\\
\vdots & & \vdots\\
\nabla\frac{\partial g(X)}{\partial X_{K1}} & \cdots & \nabla\frac{\partial g(X)}{\partial X_{KL}}
\end{bmatrix}\in\mathbb{R}^{K\times L\times M\times N\times K\times L}\tag{1619}
$$
$$
\nabla^2 g(X)^{\mathrm{T}_2}=
\begin{bmatrix}
\frac{\partial\nabla g(X)}{\partial X_{11}} & \cdots & \frac{\partial\nabla g(X)}{\partial X_{1L}}\\
\vdots & & \vdots\\
\frac{\partial\nabla g(X)}{\partial X_{K1}} & \cdots & \frac{\partial\nabla g(X)}{\partial X_{KL}}
\end{bmatrix}\in\mathbb{R}^{K\times L\times K\times L\times M\times N}\tag{1620}
$$
Assuming the limits exist, we may state the partial derivative of the $mn^{\text{th}}$ entry of $g$ with respect to the $kl^{\text{th}}$ and $ij^{\text{th}}$ entries of $X$:
$$
\frac{\partial^2 g_{mn}(X)}{\partial X_{kl}\,\partial X_{ij}}=\lim_{\Delta\tau,\Delta t\to 0}\frac{g_{mn}(X+\Delta t\,e_k e_l^{\mathrm{T}}+\Delta\tau\,e_i e_j^{\mathrm{T}})-g_{mn}(X+\Delta t\,e_k e_l^{\mathrm{T}})-\big(g_{mn}(X+\Delta\tau\,e_i e_j^{\mathrm{T}})-g_{mn}(X)\big)}{\Delta\tau\,\Delta t}\tag{1621}
$$
Differentiating (1601) and then scaling by $Y_{ij}$
$$
\begin{aligned}
\frac{\partial^2 g_{mn}(X)}{\partial X_{kl}\,\partial X_{ij}}\,Y_{kl}Y_{ij}&=\lim_{\Delta t\to 0}\frac{\partial g_{mn}(X+\Delta t\,Y_{kl}\,e_k e_l^{\mathrm{T}})-\partial g_{mn}(X)}{\partial X_{ij}\;\Delta t}\,Y_{ij}\\
&=\lim_{\Delta\tau,\Delta t\to 0}\frac{g_{mn}(X+\Delta t\,Y_{kl}\,e_k e_l^{\mathrm{T}}+\Delta\tau\,Y_{ij}\,e_i e_j^{\mathrm{T}})-g_{mn}(X+\Delta t\,Y_{kl}\,e_k e_l^{\mathrm{T}})-\big(g_{mn}(X+\Delta\tau\,Y_{ij}\,e_i e_j^{\mathrm{T}})-g_{mn}(X)\big)}{\Delta\tau\,\Delta t}
\end{aligned}\tag{1622}
$$


which can be proved by substitution of variables in (1621). The $mn^{\text{th}}$ second-order total differential due to any $Y\in\mathbb{R}^{K\times L}$ is
$$
\begin{aligned}
d^2 g_{mn}(X)\big|_{dX\to Y}&=\sum_{i,j}\sum_{k,l}\frac{\partial^2 g_{mn}(X)}{\partial X_{kl}\,\partial X_{ij}}\,Y_{kl}Y_{ij}=\operatorname{tr}\Big(\nabla_X\operatorname{tr}\!\big(\nabla g_{mn}(X)^{\mathrm{T}}Y\big)^{\mathrm{T}}Y\Big)&(1623)\\
&=\sum_{i,j}\,\lim_{\Delta t\to 0}\frac{\partial g_{mn}(X+\Delta t\,Y)-\partial g_{mn}(X)}{\partial X_{ij}\;\Delta t}\,Y_{ij}&(1624)\\
&=\lim_{\Delta t\to 0}\frac{g_{mn}(X+2\Delta t\,Y)-2g_{mn}(X+\Delta t\,Y)+g_{mn}(X)}{\Delta t^2}&(1625)\\
&=\frac{d^2}{dt^2}\bigg|_{t=0}g_{mn}(X+t\,Y)&(1626)
\end{aligned}
$$
Hence the second directional derivative,
$$
\overset{\to Y}{dg^2}(X)\;\triangleq\;
\begin{bmatrix}
d^2 g_{11}(X) & \cdots & d^2 g_{1N}(X)\\
\vdots & & \vdots\\
d^2 g_{M1}(X) & \cdots & d^2 g_{MN}(X)
\end{bmatrix}\Bigg|_{dX\to Y}\in\mathbb{R}^{M\times N}
$$
$$
=\begin{bmatrix}
\operatorname{tr}\Big(\nabla\operatorname{tr}\!\big(\nabla g_{11}(X)^{\mathrm{T}}Y\big)^{\mathrm{T}}Y\Big) & \cdots & \operatorname{tr}\Big(\nabla\operatorname{tr}\!\big(\nabla g_{1N}(X)^{\mathrm{T}}Y\big)^{\mathrm{T}}Y\Big)\\
\vdots & & \vdots\\
\operatorname{tr}\Big(\nabla\operatorname{tr}\!\big(\nabla g_{M1}(X)^{\mathrm{T}}Y\big)^{\mathrm{T}}Y\Big) & \cdots & \operatorname{tr}\Big(\nabla\operatorname{tr}\!\big(\nabla g_{MN}(X)^{\mathrm{T}}Y\big)^{\mathrm{T}}Y\Big)
\end{bmatrix}
=\begin{bmatrix}
\sum\limits_{i,j}\sum\limits_{k,l}\frac{\partial^2 g_{11}(X)}{\partial X_{kl}\partial X_{ij}}Y_{kl}Y_{ij} & \cdots & \sum\limits_{i,j}\sum\limits_{k,l}\frac{\partial^2 g_{1N}(X)}{\partial X_{kl}\partial X_{ij}}Y_{kl}Y_{ij}\\
\vdots & & \vdots\\
\sum\limits_{i,j}\sum\limits_{k,l}\frac{\partial^2 g_{M1}(X)}{\partial X_{kl}\partial X_{ij}}Y_{kl}Y_{ij} & \cdots & \sum\limits_{i,j}\sum\limits_{k,l}\frac{\partial^2 g_{MN}(X)}{\partial X_{kl}\partial X_{ij}}Y_{kl}Y_{ij}
\end{bmatrix}\tag{1627}
$$


from which it follows
$$
\overset{\to Y}{dg^2}(X)=\sum_{i,j}\sum_{k,l}\frac{\partial^2 g(X)}{\partial X_{kl}\,\partial X_{ij}}\,Y_{kl}Y_{ij}=\sum_{i,j}\frac{\partial}{\partial X_{ij}}\,\overset{\to Y}{dg}(X)\,Y_{ij}\tag{1628}
$$
Yet for all $X\in\operatorname{dom}g$, any $Y\in\mathbb{R}^{K\times L}$, and some open interval of $t\in\mathbb{R}$
$$
g(X+t\,Y)=g(X)+t\,\overset{\to Y}{dg}(X)+\frac{t^2}{2!}\,\overset{\to Y}{dg^2}(X)+o(t^3)\tag{1629}
$$
which is the second-order Taylor series expansion about $X$. [161, §18.4] [104, §2.3.4] Differentiating twice with respect to $t$ and subsequent $t$-zeroing isolates the third term of the expansion. Thus differentiating and zeroing $g(X+t\,Y)$ in $t$ is an operation equivalent to individually differentiating and zeroing every entry $g_{mn}(X+t\,Y)$ as in (1626). So the second directional derivative of $g(X):\mathbb{R}^{K\times L}\to\mathbb{R}^{M\times N}$ becomes [206, §2.1, §5.4.5] [27, §6.3.1]
$$
\overset{\to Y}{dg^2}(X)=\frac{d^2}{dt^2}\bigg|_{t=0}g(X+t\,Y)\;\in\mathbb{R}^{M\times N}\tag{1630}
$$
which is again simplest. (confer (1609)) Directional derivative retains the dimensions of $g$.

D.1.6 directional derivative expressions

In the case of a real function $g(X):\mathbb{R}^{K\times L}\to\mathbb{R}$, all its directional derivatives are in $\mathbb{R}$:
$$
\overset{\to Y}{dg}(X)=\operatorname{tr}\!\big(\nabla g(X)^{\mathrm{T}}Y\big)\tag{1631}
$$
$$
\overset{\to Y}{dg^2}(X)=\operatorname{tr}\Big(\nabla_X\operatorname{tr}\!\big(\nabla g(X)^{\mathrm{T}}Y\big)^{\mathrm{T}}Y\Big)=\operatorname{tr}\Big(\nabla_X\overset{\to Y}{dg}(X)^{\mathrm{T}}Y\Big)\tag{1632}
$$
$$
\overset{\to Y}{dg^3}(X)=\operatorname{tr}\bigg(\nabla_X\operatorname{tr}\Big(\nabla_X\operatorname{tr}\!\big(\nabla g(X)^{\mathrm{T}}Y\big)^{\mathrm{T}}Y\Big)^{\mathrm{T}}Y\bigg)=\operatorname{tr}\Big(\nabla_X\overset{\to Y}{dg^2}(X)^{\mathrm{T}}Y\Big)\tag{1633}
$$


In the case $g(X):\mathbb{R}^{K}\to\mathbb{R}$ has vector argument, they further simplify:
$$
\overset{\to Y}{dg}(X)=\nabla g(X)^{\mathrm{T}}Y\tag{1634}
$$
$$
\overset{\to Y}{dg^2}(X)=Y^{\mathrm{T}}\nabla^2 g(X)\,Y\tag{1635}
$$
$$
\overset{\to Y}{dg^3}(X)=\nabla_X\big(Y^{\mathrm{T}}\nabla^2 g(X)\,Y\big)^{\mathrm{T}}Y\tag{1636}
$$
and so on.

D.1.7 Taylor series

Series expansions of the differentiable matrix-valued function $g(X)$, of matrix argument, were given earlier in (1608) and (1629). Assuming $g(X)$ has continuous first-, second-, and third-order gradients over the open set $\operatorname{dom}g$, then for $X\in\operatorname{dom}g$ and any $Y\in\mathbb{R}^{K\times L}$ the complete Taylor series on some open interval of $\mu\in\mathbb{R}$ is expressed
$$
g(X+\mu Y)=g(X)+\mu\,\overset{\to Y}{dg}(X)+\frac{\mu^2}{2!}\,\overset{\to Y}{dg^2}(X)+\frac{\mu^3}{3!}\,\overset{\to Y}{dg^3}(X)+o(\mu^4)\tag{1637}
$$
or on some open interval of $\|Y\|$
$$
g(Y)=g(X)+\overset{\to Y-X}{dg}(X)+\frac{1}{2!}\,\overset{\to Y-X}{dg^2}(X)+\frac{1}{3!}\,\overset{\to Y-X}{dg^3}(X)+o(\|Y\|^4)\tag{1638}
$$
which are third-order expansions about $X$. The mean value theorem from calculus is what insures finite order of the series. [161] [30, §1.1] [29, App.A.5] [148, §0.4]

D.1.7.0.1 Exercise. log det. (confer [46, p.644])
Find the first two terms of the Taylor series expansion (1638) for $\log\det X$.
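The expansion (1629) can be corroborated numerically; this Matlab sketch uses the arbitrary quadratic $g(X)=w^{\mathrm{T}}X^{\mathrm{T}}Xw$, for which third- and higher-order terms vanish so the two-term expansion is exact (all test data illustrative):
\begin{verbatim}
% Sketch of (1629) for g(X) = w'*X'*X*w, an exactly quadratic function.
K = 4; L = 3;
X = randn(K,L);  Y = randn(K,L);  w = randn(L,1);
g   = @(Z) w'*(Z'*Z)*w;
dg  = 2*w'*(X'*Y)*w;           % first directional derivative at X
d2g = 2*w'*(Y'*Y)*w;           % second directional derivative
t = 0.37;                      % arbitrary; expansion is exact here
disp([g(X+t*Y), g(X) + t*dg + t^2/2*d2g])   % identical values
\end{verbatim}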


D.1.8 Correspondence of gradient to derivative

From the foregoing expressions for directional derivative, we derive a relationship between the gradient with respect to matrix $X$ and the derivative with respect to real variable $t$:

D.1.8.1 first-order

Removing from (1609) the evaluation at $t=0$,$^{\text{D.4}}$ we find an expression for the directional derivative of $g(X)$ in direction $Y$ evaluated anywhere along a line $\{X+t\,Y\mid t\in\mathbb{R}\}$ intersecting $\operatorname{dom}g$
$$
\overset{\to Y}{dg}(X+t\,Y)=\frac{d}{dt}\,g(X+t\,Y)\tag{1639}
$$
In the general case $g(X):\mathbb{R}^{K\times L}\to\mathbb{R}^{M\times N}$, from (1602) and (1605) we find
$$
\operatorname{tr}\!\big(\nabla_X g_{mn}(X+t\,Y)^{\mathrm{T}}Y\big)=\frac{d}{dt}\,g_{mn}(X+t\,Y)\tag{1640}
$$
which is valid at $t=0$, of course, when $X\in\operatorname{dom}g$. In the important case of a real function $g(X):\mathbb{R}^{K\times L}\to\mathbb{R}$, from (1631) we have simply
$$
\operatorname{tr}\!\big(\nabla_X g(X+t\,Y)^{\mathrm{T}}Y\big)=\frac{d}{dt}\,g(X+t\,Y)\tag{1641}
$$
When, additionally, $g(X):\mathbb{R}^{K}\to\mathbb{R}$ has vector argument,
$$
\nabla_X g(X+t\,Y)^{\mathrm{T}}Y=\frac{d}{dt}\,g(X+t\,Y)\tag{1642}
$$

D.4 Justified by replacing $X$ with $X+t\,Y$ in (1602)-(1604); beginning,
$$
dg_{mn}(X+t\,Y)\big|_{dX\to Y}=\sum_{k,l}\frac{\partial g_{mn}(X+t\,Y)}{\partial X_{kl}}\,Y_{kl}
$$


D.1.8.1.1 Example. Gradient.
$g(X)=w^{\mathrm{T}}X^{\mathrm{T}}Xw$, $X\in\mathbb{R}^{K\times L}$, $w\in\mathbb{R}^{L}$. Using the tables in §D.2,
$$
\begin{aligned}
\operatorname{tr}\!\big(\nabla_X g(X+t\,Y)^{\mathrm{T}}Y\big)&=\operatorname{tr}\!\big(2ww^{\mathrm{T}}(X^{\mathrm{T}}+t\,Y^{\mathrm{T}})Y\big)&(1643)\\
&=2w^{\mathrm{T}}(X^{\mathrm{T}}Y+t\,Y^{\mathrm{T}}Y)w&(1644)
\end{aligned}
$$
Applying the equivalence (1641),
$$
\begin{aligned}
\frac{d}{dt}\,g(X+t\,Y)&=\frac{d}{dt}\,w^{\mathrm{T}}(X+t\,Y)^{\mathrm{T}}(X+t\,Y)w&(1645)\\
&=w^{\mathrm{T}}\big(X^{\mathrm{T}}Y+Y^{\mathrm{T}}X+2t\,Y^{\mathrm{T}}Y\big)w&(1646)\\
&=2w^{\mathrm{T}}(X^{\mathrm{T}}Y+t\,Y^{\mathrm{T}}Y)w&(1647)
\end{aligned}
$$
which is the same as (1644); hence, equivalence is demonstrated.

It is easy to extract $\nabla g(X)$ from (1647) knowing only (1641):
$$
\begin{aligned}
\operatorname{tr}\!\big(\nabla_X g(X+t\,Y)^{\mathrm{T}}Y\big)&=2w^{\mathrm{T}}(X^{\mathrm{T}}Y+t\,Y^{\mathrm{T}}Y)w\\
&=2\operatorname{tr}\!\big(ww^{\mathrm{T}}(X^{\mathrm{T}}+t\,Y^{\mathrm{T}})Y\big)\\
\operatorname{tr}\!\big(\nabla_X g(X)^{\mathrm{T}}Y\big)&=2\operatorname{tr}\!\big(ww^{\mathrm{T}}X^{\mathrm{T}}Y\big)\\
&\Leftrightarrow\quad\nabla_X g(X)=2Xww^{\mathrm{T}}
\end{aligned}\tag{1648}
$$

D.1.8.2 second-order

Likewise removing the evaluation at $t=0$ from (1630),
$$
\overset{\to Y}{dg^2}(X+t\,Y)=\frac{d^2}{dt^2}\,g(X+t\,Y)\tag{1649}
$$
we can find a similar relationship between the second-order gradient and the second derivative: In the general case $g(X):\mathbb{R}^{K\times L}\to\mathbb{R}^{M\times N}$ from (1623) and (1626),
$$
\operatorname{tr}\Big(\nabla_X\operatorname{tr}\!\big(\nabla_X g_{mn}(X+t\,Y)^{\mathrm{T}}Y\big)^{\mathrm{T}}Y\Big)=\frac{d^2}{dt^2}\,g_{mn}(X+t\,Y)\tag{1650}
$$
In the case of a real function $g(X):\mathbb{R}^{K\times L}\to\mathbb{R}$ we have, of course,
$$
\operatorname{tr}\Big(\nabla_X\operatorname{tr}\!\big(\nabla_X g(X+t\,Y)^{\mathrm{T}}Y\big)^{\mathrm{T}}Y\Big)=\frac{d^2}{dt^2}\,g(X+t\,Y)\tag{1651}
$$
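The gradient extracted in Example D.1.8.1.1 is easy to verify numerically via (1641); a Matlab sketch with arbitrary test data:
\begin{verbatim}
% Sketch of Example D.1.8.1.1: confirm grad g(X) = 2*X*w*w'
% through identity (1641); dimensions and data arbitrary.
K = 5; L = 3;
X = randn(K,L);  Y = randn(K,L);  w = randn(L,1);
g = @(Z) w'*(Z'*Z)*w;
G = 2*X*(w*w');                 % claimed gradient 2Xww'
t = 1e-6;
lhs = trace(G'*Y);              % tr(grad(g)' * Y)
rhs = (g(X+t*Y) - g(X))/t;      % d/dt g(X+tY) at t = 0
disp([lhs rhs])                 % agree to ~1e-5
\end{verbatim}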


From (1635), the simpler case, where the real function $g(X):\mathbb{R}^{K}\to\mathbb{R}$ has vector argument,
$$
Y^{\mathrm{T}}\nabla_X^2\,g(X+t\,Y)\,Y=\frac{d^2}{dt^2}\,g(X+t\,Y)\tag{1652}
$$

D.1.8.2.1 Example. Second-order gradient.
Given real function $g(X)=\log\det X$ having domain $\operatorname{int}\mathbb{S}^{K}_{+}$, we want to find $\nabla^2 g(X)\in\mathbb{R}^{K\times K\times K\times K}$. From the tables in §D.2,
$$
h(X)\triangleq\nabla g(X)=X^{-1}\in\operatorname{int}\mathbb{S}^{K}_{+}\tag{1653}
$$
so $\nabla^2 g(X)=\nabla h(X)$. By (1640) and (1608), for $Y\in\mathbb{S}^{K}$
$$
\begin{aligned}
\operatorname{tr}\!\big(\nabla h_{mn}(X)^{\mathrm{T}}Y\big)&=\frac{d}{dt}\bigg|_{t=0}h_{mn}(X+t\,Y)&(1654)\\
&=\bigg(\frac{d}{dt}\bigg|_{t=0}h(X+t\,Y)\bigg)_{\!mn}&(1655)\\
&=\bigg(\frac{d}{dt}\bigg|_{t=0}(X+t\,Y)^{-1}\bigg)_{\!mn}&(1656)\\
&=-\big(X^{-1}\,Y\,X^{-1}\big)_{mn}&(1657)
\end{aligned}
$$
Setting $Y$ to a member of $\{e_k e_l^{\mathrm{T}}\in\mathbb{R}^{K\times K}\mid k,l=1\ldots K\}$, and employing a property (32) of the trace function we find
$$
\nabla^2 g(X)_{mnkl}=\operatorname{tr}\!\big(\nabla h_{mn}(X)^{\mathrm{T}}e_k e_l^{\mathrm{T}}\big)=\nabla h_{mn}(X)_{kl}=-\big(X^{-1}e_k e_l^{\mathrm{T}}X^{-1}\big)_{mn}\tag{1658}
$$
$$
\nabla^2 g(X)_{kl}=\nabla h(X)_{kl}=-\big(X^{-1}e_k e_l^{\mathrm{T}}X^{-1}\big)\in\mathbb{R}^{K\times K}\tag{1659}
$$

From all these first- and second-order expressions, we may generate new ones by evaluating both sides at arbitrary $t$ (in some open interval) but only after the differentiation.
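Formula (1659) also checks out numerically by perturbing a single entry of $X$; per the caveat opening §D.2, the gradient is taken ignoring symmetry. A Matlab sketch with arbitrary $K$, $k$, $l$, and test matrix:
\begin{verbatim}
% Sketch of (1659): d(X^{-1})/dX_kl = -X^{-1} e_k e_l' X^{-1},
% entries of X treated as independent (symmetry ignored).
K = 4;  k = 2;  l = 3;
A = randn(K);  X = A*A' + K*eye(K);   % X in int S_+^K
E = zeros(K);  E(k,l) = 1;            % e_k * e_l'
t = 1e-6;
Dnum = (inv(X + t*E) - inv(X))/t;     % finite-difference derivative
Dana = -X\E/X;                        % -inv(X)*E*inv(X)
disp(norm(Dnum - Dana,'fro'))         % small
\end{verbatim}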


D.2 Tables of gradients and derivatives  [116] [50]

When proving results for symmetric matrices algebraically, it is critical to take gradients ignoring symmetry and to then substitute symmetric entries afterward.

$a,b\in\mathbb{R}^{n}$, $x,y\in\mathbb{R}^{k}$, $A,B\in\mathbb{R}^{m\times n}$, $X,Y\in\mathbb{R}^{K\times L}$, $t,\mu\in\mathbb{R}$; $i,j,k,l,K,L,m,n,M,N$ are integers, unless otherwise noted.

$x^{\mu}$ means $\delta(\delta(x)^{\mu})$ for $\mu\in\mathbb{R}$; id est, entrywise vector exponentiation. $\delta$ is the main-diagonal linear operator (1246). $x^{0}\triangleq\mathbf{1}$, $X^{0}\triangleq I$ if square.

$$
\frac{d}{dx}\triangleq\begin{bmatrix}\frac{d}{dx_1}\\ \vdots\\ \frac{d}{dx_k}\end{bmatrix},\qquad
\overset{\to y}{dg}(x),\ \overset{\to y}{dg^2}(x)\ \text{(directional derivatives, §D.1)},\ \log x,\ \operatorname{sgn}x,\ x/y\ \text{(Hadamard quotient)},\ \sqrt{x}\ \text{(entrywise square root)},
$$
etcetera, are maps $f:\mathbb{R}^{k}\to\mathbb{R}^{k}$ that maintain dimension; e.g., (§A.1.1)
$$
\frac{d}{dx}\,x^{-1}\triangleq\nabla_x\mathbf{1}^{\mathrm{T}}\delta(x)^{-1}\mathbf{1}\tag{1660}
$$
For $A$ a scalar or matrix, we have the Taylor series [55, §3.6]
$$
e^{A}\triangleq\sum_{k=0}^{\infty}\frac{1}{k!}A^{k}\tag{1661}
$$
Further, [251, §5.4]
$$
e^{A}\succ 0\quad\forall A\in\mathbb{S}^{m}\tag{1662}
$$
For all square $A$ and integer $k$
$$
\operatorname{det}^{k}A=\det A^{k}\tag{1663}
$$


D.2.1 algebraic

$$\nabla_x x=\nabla_x x^{\mathrm{T}}=I\in\mathbb{R}^{k\times k}\qquad\nabla_X X=\nabla_X X^{\mathrm{T}}\triangleq I\in\mathbb{R}^{K\times L\times K\times L}\quad\text{(identity)}$$
$$\nabla_x(Ax-b)=A^{\mathrm{T}}$$
$$\nabla_x\big(x^{\mathrm{T}}A-b^{\mathrm{T}}\big)=A$$
$$\nabla_x(Ax-b)^{\mathrm{T}}(Ax-b)=2A^{\mathrm{T}}(Ax-b)$$
$$\nabla_x^2\,(Ax-b)^{\mathrm{T}}(Ax-b)=2A^{\mathrm{T}}A$$
$$\nabla_x\big(x^{\mathrm{T}}Ax+2x^{\mathrm{T}}By+y^{\mathrm{T}}Cy\big)=(A+A^{\mathrm{T}})x+2By$$
$$\nabla_x^2\big(x^{\mathrm{T}}Ax+2x^{\mathrm{T}}By+y^{\mathrm{T}}Cy\big)=A+A^{\mathrm{T}}$$
$$\nabla_X a^{\mathrm{T}}Xb=\nabla_X b^{\mathrm{T}}X^{\mathrm{T}}a=ab^{\mathrm{T}}$$
$$\nabla_X a^{\mathrm{T}}X^{2}b=X^{\mathrm{T}}ab^{\mathrm{T}}+ab^{\mathrm{T}}X^{\mathrm{T}}$$
$$\nabla_X a^{\mathrm{T}}X^{-1}b=-X^{-\mathrm{T}}ab^{\mathrm{T}}X^{-\mathrm{T}}$$
$$\nabla_X(X^{-1})_{kl}=\frac{\partial X^{-1}}{\partial X_{kl}}=-X^{-1}e_k e_l^{\mathrm{T}}X^{-1}\quad\text{confer (1600)(1659)}$$
$$\nabla_x a^{\mathrm{T}}x^{\mathrm{T}}xb=2xa^{\mathrm{T}}b\qquad\nabla_X a^{\mathrm{T}}X^{\mathrm{T}}Xb=X(ab^{\mathrm{T}}+ba^{\mathrm{T}})$$
$$\nabla_x a^{\mathrm{T}}xx^{\mathrm{T}}b=(ab^{\mathrm{T}}+ba^{\mathrm{T}})x\qquad\nabla_X a^{\mathrm{T}}XX^{\mathrm{T}}b=(ab^{\mathrm{T}}+ba^{\mathrm{T}})X$$
$$\nabla_x a^{\mathrm{T}}x^{\mathrm{T}}xa=2xa^{\mathrm{T}}a\qquad\nabla_X a^{\mathrm{T}}X^{\mathrm{T}}Xa=2Xaa^{\mathrm{T}}$$
$$\nabla_x a^{\mathrm{T}}xx^{\mathrm{T}}a=2aa^{\mathrm{T}}x\qquad\nabla_X a^{\mathrm{T}}XX^{\mathrm{T}}a=2aa^{\mathrm{T}}X$$
$$\nabla_x a^{\mathrm{T}}yx^{\mathrm{T}}b=ba^{\mathrm{T}}y\qquad\nabla_X a^{\mathrm{T}}YX^{\mathrm{T}}b=ba^{\mathrm{T}}Y$$
$$\nabla_x a^{\mathrm{T}}y^{\mathrm{T}}xb=yb^{\mathrm{T}}a\qquad\nabla_X a^{\mathrm{T}}Y^{\mathrm{T}}Xb=Yab^{\mathrm{T}}$$
$$\nabla_x a^{\mathrm{T}}xy^{\mathrm{T}}b=ab^{\mathrm{T}}y\qquad\nabla_X a^{\mathrm{T}}XY^{\mathrm{T}}b=ab^{\mathrm{T}}Y$$
$$\nabla_x a^{\mathrm{T}}x^{\mathrm{T}}yb=ya^{\mathrm{T}}b\qquad\nabla_X a^{\mathrm{T}}X^{\mathrm{T}}Yb=Yba^{\mathrm{T}}$$


algebraic continued

$$\frac{d}{dt}(X+t\,Y)=Y$$
$$\frac{d}{dt}\,B^{\mathrm{T}}(X+t\,Y)^{-1}A=-B^{\mathrm{T}}(X+t\,Y)^{-1}\,Y\,(X+t\,Y)^{-1}A$$
$$\frac{d}{dt}\,B^{\mathrm{T}}(X+t\,Y)^{-\mathrm{T}}A=-B^{\mathrm{T}}(X+t\,Y)^{-\mathrm{T}}\,Y^{\mathrm{T}}\,(X+t\,Y)^{-\mathrm{T}}A$$
$$\frac{d}{dt}\,B^{\mathrm{T}}(X+t\,Y)^{\mu}A=\ldots,\quad -1\leq\mu\leq 1,\ X,Y\in\mathbb{S}^{M}_{+}$$
$$\frac{d^2}{dt^2}\,B^{\mathrm{T}}(X+t\,Y)^{-1}A=2B^{\mathrm{T}}(X+t\,Y)^{-1}Y(X+t\,Y)^{-1}Y(X+t\,Y)^{-1}A$$
$$\frac{d}{dt}\big((X+t\,Y)^{\mathrm{T}}A(X+t\,Y)\big)=Y^{\mathrm{T}}AX+X^{\mathrm{T}}AY+2t\,Y^{\mathrm{T}}AY$$
$$\frac{d^2}{dt^2}\big((X+t\,Y)^{\mathrm{T}}A(X+t\,Y)\big)=2Y^{\mathrm{T}}AY$$
$$\frac{d}{dt}\big((X+t\,Y)A(X+t\,Y)\big)=YAX+XAY+2t\,YAY$$
$$\frac{d^2}{dt^2}\big((X+t\,Y)A(X+t\,Y)\big)=2YAY$$

D.2.2 trace Kronecker

$$\nabla_{\operatorname{vec}X}\operatorname{tr}(AXBX^{\mathrm{T}})=\nabla_{\operatorname{vec}X}\operatorname{vec}(X)^{\mathrm{T}}(B^{\mathrm{T}}\!\otimes A)\operatorname{vec}X=(B\otimes A^{\mathrm{T}}+B^{\mathrm{T}}\!\otimes A)\operatorname{vec}X$$
$$\nabla^2_{\operatorname{vec}X}\operatorname{tr}(AXBX^{\mathrm{T}})=\nabla^2_{\operatorname{vec}X}\operatorname{vec}(X)^{\mathrm{T}}(B^{\mathrm{T}}\!\otimes A)\operatorname{vec}X=B\otimes A^{\mathrm{T}}+B^{\mathrm{T}}\!\otimes A$$


D.2.3 trace

$$\nabla_x\,\mu x=\mu I\qquad\nabla_X\operatorname{tr}\mu X=\nabla_X\,\mu\operatorname{tr}X=\mu I$$
$$\nabla_x\mathbf{1}^{\mathrm{T}}\delta(x)^{-1}\mathbf{1}=\frac{d}{dx}x^{-1}=-x^{-2}\qquad\nabla_X\operatorname{tr}X^{-1}=-X^{-2\mathrm{T}}$$
$$\nabla_x\mathbf{1}^{\mathrm{T}}\delta(x)^{-1}y=-\delta(x)^{-2}y\qquad\nabla_X\operatorname{tr}(X^{-1}Y)=\nabla_X\operatorname{tr}(YX^{-1})=-X^{-\mathrm{T}}Y^{\mathrm{T}}X^{-\mathrm{T}}$$
$$\frac{d}{dx}x^{\mu}=\mu x^{\mu-1}\qquad\nabla_X\operatorname{tr}X^{\mu}=\mu X^{\mu-1},\ X\in\mathbb{S}^{M}$$
$$\nabla_X\operatorname{tr}X^{j}=jX^{(j-1)\mathrm{T}}$$
$$\nabla_x(b-a^{\mathrm{T}}x)^{-1}=(b-a^{\mathrm{T}}x)^{-2}a\qquad\nabla_X\operatorname{tr}\big((B-AX)^{-1}\big)=\big((B-AX)^{-2}A\big)^{\mathrm{T}}$$
$$\nabla_x(b-a^{\mathrm{T}}x)^{\mu}=-\mu(b-a^{\mathrm{T}}x)^{\mu-1}a$$
$$\nabla_x\,x^{\mathrm{T}}y=\nabla_x\,y^{\mathrm{T}}x=y$$
$$\nabla_X\operatorname{tr}(X^{\mathrm{T}}Y)=\nabla_X\operatorname{tr}(YX^{\mathrm{T}})=\nabla_X\operatorname{tr}(Y^{\mathrm{T}}X)=\nabla_X\operatorname{tr}(XY^{\mathrm{T}})=Y$$
$$\nabla_X\operatorname{tr}(AXBX^{\mathrm{T}})=\nabla_X\operatorname{tr}(XBX^{\mathrm{T}}A)=A^{\mathrm{T}}XB^{\mathrm{T}}+AXB$$
$$\nabla_X\operatorname{tr}(AXBX)=\nabla_X\operatorname{tr}(XBXA)=A^{\mathrm{T}}X^{\mathrm{T}}B^{\mathrm{T}}+B^{\mathrm{T}}X^{\mathrm{T}}A^{\mathrm{T}}$$
$$\nabla_X\operatorname{tr}(AXAXAX)=\nabla_X\operatorname{tr}(XAXAXA)=3(AXAXA)^{\mathrm{T}}$$
$$\nabla_X\operatorname{tr}(YX^{k})=\nabla_X\operatorname{tr}(X^{k}Y)=\sum_{i=0}^{k-1}\big(X^{i}YX^{k-1-i}\big)^{\mathrm{T}}$$
$$\nabla_X\operatorname{tr}(Y^{\mathrm{T}}XX^{\mathrm{T}}Y)=\nabla_X\operatorname{tr}(X^{\mathrm{T}}YY^{\mathrm{T}}X)=2\,YY^{\mathrm{T}}X$$
$$\nabla_X\operatorname{tr}(Y^{\mathrm{T}}X^{\mathrm{T}}XY)=\nabla_X\operatorname{tr}(XYY^{\mathrm{T}}X^{\mathrm{T}})=2XYY^{\mathrm{T}}$$
$$\nabla_X\operatorname{tr}\big((X+Y)^{\mathrm{T}}(X+Y)\big)=2(X+Y)$$
$$\nabla_X\operatorname{tr}\big((X+Y)(X+Y)\big)=2(X+Y)^{\mathrm{T}}$$
$$\nabla_X\operatorname{tr}(A^{\mathrm{T}}XB)=\nabla_X\operatorname{tr}(X^{\mathrm{T}}AB^{\mathrm{T}})=AB^{\mathrm{T}}$$
$$\nabla_X\operatorname{tr}(A^{\mathrm{T}}X^{-1}B)=\nabla_X\operatorname{tr}(X^{-\mathrm{T}}AB^{\mathrm{T}})=-X^{-\mathrm{T}}AB^{\mathrm{T}}X^{-\mathrm{T}}$$
$$\nabla_X a^{\mathrm{T}}Xb=\nabla_X\operatorname{tr}(ba^{\mathrm{T}}X)=\nabla_X\operatorname{tr}(Xba^{\mathrm{T}})=ab^{\mathrm{T}}$$
$$\nabla_X b^{\mathrm{T}}X^{\mathrm{T}}a=\nabla_X\operatorname{tr}(X^{\mathrm{T}}ab^{\mathrm{T}})=\nabla_X\operatorname{tr}(ab^{\mathrm{T}}X^{\mathrm{T}})=ab^{\mathrm{T}}$$
$$\nabla_X a^{\mathrm{T}}X^{-1}b=\nabla_X\operatorname{tr}(X^{-\mathrm{T}}ab^{\mathrm{T}})=-X^{-\mathrm{T}}ab^{\mathrm{T}}X^{-\mathrm{T}}$$
$$\nabla_X a^{\mathrm{T}}X^{\mu}b=\ldots$$


trace continued

$$\frac{d}{dt}\operatorname{tr}g(X+t\,Y)=\operatorname{tr}\frac{d}{dt}g(X+t\,Y)\qquad\text{[151, p.491]}$$
$$\frac{d}{dt}\operatorname{tr}(X+t\,Y)=\operatorname{tr}Y$$
$$\frac{d}{dt}\operatorname{tr}^{j}(X+t\,Y)=j\operatorname{tr}^{j-1}(X+t\,Y)\operatorname{tr}Y$$
$$\frac{d}{dt}\operatorname{tr}(X+t\,Y)^{j}=j\operatorname{tr}\big((X+t\,Y)^{j-1}Y\big)\quad(\forall j)$$
$$\frac{d}{dt}\operatorname{tr}\big((X+t\,Y)Y\big)=\operatorname{tr}Y^{2}$$
$$\frac{d}{dt}\operatorname{tr}\big((X+t\,Y)^{k}Y\big)=\frac{d}{dt}\operatorname{tr}\big(Y(X+t\,Y)^{k}\big)=k\operatorname{tr}\big((X+t\,Y)^{k-1}Y^{2}\big),\quad k\in\{0,1,2\}$$
$$\frac{d}{dt}\operatorname{tr}\big((X+t\,Y)^{k}Y\big)=\frac{d}{dt}\operatorname{tr}\big(Y(X+t\,Y)^{k}\big)=\operatorname{tr}\sum_{i=0}^{k-1}(X+t\,Y)^{i}\,Y\,(X+t\,Y)^{k-1-i}\,Y$$
$$\frac{d}{dt}\operatorname{tr}\big((X+t\,Y)^{-1}Y\big)=-\operatorname{tr}\big((X+t\,Y)^{-1}Y(X+t\,Y)^{-1}Y\big)$$
$$\frac{d}{dt}\operatorname{tr}\big(B^{\mathrm{T}}(X+t\,Y)^{-1}A\big)=-\operatorname{tr}\big(B^{\mathrm{T}}(X+t\,Y)^{-1}Y(X+t\,Y)^{-1}A\big)$$
$$\frac{d}{dt}\operatorname{tr}\big(B^{\mathrm{T}}(X+t\,Y)^{-\mathrm{T}}A\big)=-\operatorname{tr}\big(B^{\mathrm{T}}(X+t\,Y)^{-\mathrm{T}}Y^{\mathrm{T}}(X+t\,Y)^{-\mathrm{T}}A\big)$$
$$\frac{d}{dt}\operatorname{tr}\big(B^{\mathrm{T}}(X+t\,Y)^{-k}A\big)=\ldots,\quad k>0$$
$$\frac{d}{dt}\operatorname{tr}\big(B^{\mathrm{T}}(X+t\,Y)^{\mu}A\big)=\ldots,\quad -1\leq\mu\leq 1,\ X,Y\in\mathbb{S}^{M}_{+}$$
$$\frac{d^2}{dt^2}\operatorname{tr}\big(B^{\mathrm{T}}(X+t\,Y)^{-1}A\big)=2\operatorname{tr}\big(B^{\mathrm{T}}(X+t\,Y)^{-1}Y(X+t\,Y)^{-1}Y(X+t\,Y)^{-1}A\big)$$
$$\frac{d}{dt}\operatorname{tr}\big((X+t\,Y)^{\mathrm{T}}A(X+t\,Y)\big)=\operatorname{tr}\big(Y^{\mathrm{T}}AX+X^{\mathrm{T}}AY+2t\,Y^{\mathrm{T}}AY\big)$$
$$\frac{d^2}{dt^2}\operatorname{tr}\big((X+t\,Y)^{\mathrm{T}}A(X+t\,Y)\big)=2\operatorname{tr}\big(Y^{\mathrm{T}}AY\big)$$
$$\frac{d}{dt}\operatorname{tr}\big((X+t\,Y)A(X+t\,Y)\big)=\operatorname{tr}(YAX+XAY+2t\,YAY)$$
$$\frac{d^2}{dt^2}\operatorname{tr}\big((X+t\,Y)A(X+t\,Y)\big)=2\operatorname{tr}(YAY)$$
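Any table entry can be spot-checked against a directional finite difference; this Matlab sketch does so for $\nabla_X\operatorname{tr}(AXBX^{\mathrm{T}})=A^{\mathrm{T}}XB^{\mathrm{T}}+AXB$, with all dimensions and test matrices arbitrary:
\begin{verbatim}
% Spot-check of one trace-table entry via (1641).
m = 4; n = 3;
A = randn(m);  B = randn(n);  X = randn(m,n);
G = A'*X*B' + A*X*B;                  % tabulated gradient
t = 1e-6;  Y = randn(m,n);            % random direction
lhs = trace(G'*Y);                    % tr(grad' * Y)
rhs = (trace(A*(X+t*Y)*B*(X+t*Y)') - trace(A*X*B*X'))/t;
disp([lhs rhs])                       % agree to first order
\end{verbatim}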


D.2.4 log determinant

$x\succ 0$, $\det X>0$ on some neighborhood of $X$, and $\det(X+t\,Y)>0$ on some open interval of $t$; otherwise, $\log(\,)$ would be discontinuous.

$$\frac{d}{dx}\log x=x^{-1}\qquad\nabla_X\log\det X=X^{-\mathrm{T}}$$
$$\frac{d}{dx}\log x^{-1}=-x^{-1}\qquad\nabla^2_X\log\det(X)_{kl}=\frac{\partial X^{-\mathrm{T}}}{\partial X_{kl}}=-\big(X^{-1}e_k e_l^{\mathrm{T}}X^{-1}\big)^{\mathrm{T}}\quad\text{confer (1617)(1659)}$$
$$\frac{d}{dx}\log x^{\mu}=\mu x^{-1}\qquad\nabla_X\log\det X^{-1}=-X^{-\mathrm{T}}$$
$$\nabla_X\log\operatorname{det}^{\mu}X=\mu X^{-\mathrm{T}}\qquad\nabla_X\log\det X^{\mu}=\mu X^{-\mathrm{T}}$$
$$\nabla_X\log\det X^{k}=\nabla_X\log\operatorname{det}^{k}X=kX^{-\mathrm{T}}$$
$$\nabla_X\log\operatorname{det}^{\mu}(X+t\,Y)=\mu(X+t\,Y)^{-\mathrm{T}}$$
$$\nabla_x\log(a^{\mathrm{T}}x+b)=a\,\frac{1}{a^{\mathrm{T}}x+b}$$
$$\nabla_X\log\det(AX+B)=A^{\mathrm{T}}(AX+B)^{-\mathrm{T}}$$
$$\nabla_X\log\det(I\pm A^{\mathrm{T}}XA)=\pm A(I\pm A^{\mathrm{T}}XA)^{-\mathrm{T}}A^{\mathrm{T}}$$
$$\nabla_X\log\det(X+t\,Y)^{k}=\nabla_X\log\operatorname{det}^{k}(X+t\,Y)=k(X+t\,Y)^{-\mathrm{T}}$$
$$\frac{d}{dt}\log\det(X+t\,Y)=\operatorname{tr}\big((X+t\,Y)^{-1}Y\big)$$
$$\frac{d^2}{dt^2}\log\det(X+t\,Y)=-\operatorname{tr}\big((X+t\,Y)^{-1}Y(X+t\,Y)^{-1}Y\big)$$
$$\frac{d}{dt}\log\det(X+t\,Y)^{-1}=-\operatorname{tr}\big((X+t\,Y)^{-1}Y\big)$$
$$\frac{d^2}{dt^2}\log\det(X+t\,Y)^{-1}=\operatorname{tr}\big((X+t\,Y)^{-1}Y(X+t\,Y)^{-1}Y\big)$$
$$\frac{d}{dt}\log\det\big(\delta(A(x+t\,y)+a)^{2}+\mu I\big)=\operatorname{tr}\Big(\big(\delta(A(x+t\,y)+a)^{2}+\mu I\big)^{-1}\,2\,\delta(A(x+t\,y)+a)\,\delta(Ay)\Big)$$
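The workhorse identity $\frac{d}{dt}\log\det(X+t\,Y)=\operatorname{tr}\big((X+t\,Y)^{-1}Y\big)$ can be checked at any admissible $t$; a Matlab sketch with arbitrary positive definite test data:
\begin{verbatim}
% Sketch of d/dt log det(X+tY) = tr((X+tY)^{-1} Y),
% evaluated at an arbitrary t0 where det(X+tY) > 0.
K = 4;
A = randn(K);  X = A*A' + K*eye(K);   % positive definite
Y = randn(K);  Y = (Y+Y')/2;          % symmetric, small relative to X
t0 = 0.1;  h = 1e-6;
lhs = (log(det(X+(t0+h)*Y)) - log(det(X+t0*Y)))/h;
rhs = trace((X+t0*Y)\Y);
disp([lhs rhs])
\end{verbatim}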


D.2.5 determinant

$$\nabla_X\det X=\nabla_X\det X^{\mathrm{T}}=\det(X)\,X^{-\mathrm{T}}$$
$$\nabla_X\det X^{-1}=-\det(X^{-1})\,X^{-\mathrm{T}}=-\det(X)^{-1}X^{-\mathrm{T}}$$
$$\nabla_X\operatorname{det}^{\mu}X=\mu\operatorname{det}^{\mu}(X)\,X^{-\mathrm{T}}$$
$$\nabla_X\det X^{\mu}=\mu\det(X^{\mu})\,X^{-\mathrm{T}}$$
$$\nabla_X\det X^{k}=k\operatorname{det}^{k-1}(X)\big(\operatorname{tr}(X)I-X^{\mathrm{T}}\big),\quad X\in\mathbb{R}^{2\times 2}$$
$$\nabla_X\det X^{k}=\nabla_X\operatorname{det}^{k}X=k\det(X^{k})\,X^{-\mathrm{T}}=k\operatorname{det}^{k}(X)\,X^{-\mathrm{T}}$$
$$\nabla_X\operatorname{det}^{\mu}(X+t\,Y)=\mu\operatorname{det}^{\mu}(X+t\,Y)\,(X+t\,Y)^{-\mathrm{T}}$$
$$\nabla_X\det(X+t\,Y)^{k}=\nabla_X\operatorname{det}^{k}(X+t\,Y)=k\operatorname{det}^{k}(X+t\,Y)\,(X+t\,Y)^{-\mathrm{T}}$$
$$\frac{d}{dt}\det(X+t\,Y)=\det(X+t\,Y)\operatorname{tr}\big((X+t\,Y)^{-1}Y\big)$$
$$\frac{d^2}{dt^2}\det(X+t\,Y)=\det(X+t\,Y)\Big(\operatorname{tr}^{2}\big((X+t\,Y)^{-1}Y\big)-\operatorname{tr}\big((X+t\,Y)^{-1}Y(X+t\,Y)^{-1}Y\big)\Big)$$
$$\frac{d}{dt}\det(X+t\,Y)^{-1}=-\det(X+t\,Y)^{-1}\operatorname{tr}\big((X+t\,Y)^{-1}Y\big)$$
$$\frac{d^2}{dt^2}\det(X+t\,Y)^{-1}=\det(X+t\,Y)^{-1}\Big(\operatorname{tr}^{2}\big((X+t\,Y)^{-1}Y\big)+\operatorname{tr}\big((X+t\,Y)^{-1}Y(X+t\,Y)^{-1}Y\big)\Big)$$
$$\frac{d}{dt}\operatorname{det}^{\mu}(X+t\,Y)=\mu\operatorname{det}^{\mu}(X+t\,Y)\operatorname{tr}\big((X+t\,Y)^{-1}Y\big)$$


D.2.6 logarithmic

Matrix logarithm.
$$\frac{d}{dt}\log(X+t\,Y)^{\mu}=\mu Y(X+t\,Y)^{-1}=\mu(X+t\,Y)^{-1}Y,\qquad XY=YX\qquad\text{[151, p.493]}$$
$$\frac{d}{dt}\log(I-t\,Y)^{\mu}=-\mu Y(I-t\,Y)^{-1}=-\mu(I-t\,Y)^{-1}Y$$

D.2.7 exponential

Matrix exponential. [55, §3.6, §4.5] [251, §5.4]
$$\nabla_X\,e^{\operatorname{tr}(Y^{\mathrm{T}}X)}=\nabla_X\det e^{Y^{\mathrm{T}}X}=e^{\operatorname{tr}(Y^{\mathrm{T}}X)}\,Y\quad(\forall X,Y)$$
$$\nabla_X\operatorname{tr}e^{YX}=e^{Y^{\mathrm{T}}X^{\mathrm{T}}}Y^{\mathrm{T}}=Y^{\mathrm{T}}e^{X^{\mathrm{T}}Y^{\mathrm{T}}}$$
log-sum-exp \& geometric mean [46, p.74] \ldots
$$\frac{d^{j}}{dt^{j}}\,e^{\operatorname{tr}(X+t\,Y)}=e^{\operatorname{tr}(X+t\,Y)}\operatorname{tr}^{j}(Y)$$
$$\frac{d}{dt}\,e^{t\,Y}=e^{t\,Y}Y=Y e^{t\,Y}$$
$$\frac{d}{dt}\,e^{X+t\,Y}=e^{X+t\,Y}Y=Y e^{X+t\,Y},\qquad XY=YX$$
$$\frac{d^2}{dt^2}\,e^{X+t\,Y}=e^{X+t\,Y}Y^{2}=Y e^{X+t\,Y}Y=Y^{2}e^{X+t\,Y},\qquad XY=YX$$




Appendix E  Projection

For any $A\in\mathbb{R}^{m\times n}$, the pseudoinverse [150, §7.3, prob.9] [183, §6.12, prob.19] [110, §5.5.4] [251, App.A]
$$
A^{\dagger}\triangleq\lim_{t\to 0^{+}}(A^{\mathrm{T}}A+t\,I)^{-1}A^{\mathrm{T}}=\lim_{t\to 0^{+}}A^{\mathrm{T}}(AA^{\mathrm{T}}+t\,I)^{-1}\in\mathbb{R}^{n\times m}\tag{1664}
$$
is a unique matrix solving $\underset{X}{\text{minimize}}\ \|AX-I\|_{\mathrm{F}}^{2}$. For any $t>0$
$$
I-A(A^{\mathrm{T}}A+t\,I)^{-1}A^{\mathrm{T}}=t\,(AA^{\mathrm{T}}+t\,I)^{-1}\tag{1665}
$$
Equivalently, pseudoinverse $A^{\dagger}$ is that unique matrix satisfying the Moore-Penrose conditions: [152, §1.3] [284]

1. $AA^{\dagger}A=A$  3. $(AA^{\dagger})^{\mathrm{T}}=AA^{\dagger}$
2. $A^{\dagger}AA^{\dagger}=A^{\dagger}$  4. $(A^{\dagger}A)^{\mathrm{T}}=A^{\dagger}A$

which are necessary and sufficient to establish the pseudoinverse whose principal action is to injectively map $\mathcal{R}(A)$ onto $\mathcal{R}(A^{\mathrm{T}})$. Taken rowwise, these conditions are respectively necessary and sufficient for $AA^{\dagger}$ to be the orthogonal projector on $\mathcal{R}(A)$, and for $A^{\dagger}A$ to be the orthogonal projector on $\mathcal{R}(A^{\mathrm{T}})$.

Range and nullspace of the pseudoinverse [197] [248, §III.1, exer.1]
$$
\mathcal{R}(A^{\dagger})=\mathcal{R}(A^{\mathrm{T}}),\qquad\mathcal{R}(A^{\dagger\mathrm{T}})=\mathcal{R}(A)\tag{1666}
$$
$$
\mathcal{N}(A^{\dagger})=\mathcal{N}(A^{\mathrm{T}}),\qquad\mathcal{N}(A^{\dagger\mathrm{T}})=\mathcal{N}(A)\tag{1667}
$$
can be derived by singular value decomposition (§A.6).
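The limit (1664) is directly observable numerically; this Matlab sketch compares the regularized inverse against the built-in pseudoinverse for an arbitrary rank-deficient test matrix:
\begin{verbatim}
% Sketch of (1664): (A'A + tI)^{-1} A' approaches pinv(A) as t -> 0+.
m = 5; n = 3;
A = randn(m,2)*randn(2,n);            % rank 2 < n, so A'A is singular
for t = 10.^(-2:-2:-8)
    Apt = (A'*A + t*eye(n))\A';       % regularized pseudoinverse
    disp([t, norm(Apt - pinv(A),'fro')])
end                                   % discrepancy shrinks with t
\end{verbatim}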


The following relations reliably hold without qualification:

a. $A^{\mathrm{T}\dagger}=A^{\dagger\mathrm{T}}$
b. $A^{\dagger\dagger}=A$
c. $(AA^{\mathrm{T}})^{\dagger}=A^{\dagger\mathrm{T}}A^{\dagger}$
d. $(A^{\mathrm{T}}A)^{\dagger}=A^{\dagger}A^{\dagger\mathrm{T}}$
e. $(AA^{\dagger})^{\dagger}=AA^{\dagger}$
f. $(A^{\dagger}A)^{\dagger}=A^{\dagger}A$

Yet for arbitrary $A,B$ it is generally true that $(AB)^{\dagger}\neq B^{\dagger}A^{\dagger}$:

E.0.0.0.1 Theorem. Pseudoinverse of product. [119] [42] [177, exer.7.23]
For $A\in\mathbb{R}^{m\times n}$ and $B\in\mathbb{R}^{n\times k}$
$$
(AB)^{\dagger}=B^{\dagger}A^{\dagger}\tag{1668}
$$
if and only if
$$
\mathcal{R}(A^{\mathrm{T}}AB)\subseteq\mathcal{R}(B)\quad\text{and}\quad\mathcal{R}(BB^{\mathrm{T}}A^{\mathrm{T}})\subseteq\mathcal{R}(A^{\mathrm{T}})\tag{1669}
$$
⋄

$U^{\mathrm{T}}=U^{\dagger}$ for orthonormal (including the orthogonal) matrices $U$. So, for orthonormal matrices $U,Q$ and arbitrary $A$
$$
(UAQ^{\mathrm{T}})^{\dagger}=QA^{\dagger}U^{\mathrm{T}}\tag{1670}
$$

E.0.0.0.2 Exercise. Kronecker inverse.
Prove:
$$
(A\otimes B)^{\dagger}=A^{\dagger}\otimes B^{\dagger}\tag{1671}
$$

E.0.1 Logical deductions

When $A$ is invertible, $A^{\dagger}=A^{-1}$; so $A^{\dagger}A=AA^{\dagger}=I$. Otherwise, for $A\in\mathbb{R}^{m\times n}$ [103, §5.3.3.1] [177, §7] [221]

g. $A^{\dagger}A=I$, $A^{\dagger}=(A^{\mathrm{T}}A)^{-1}A^{\mathrm{T}}$, $\operatorname{rank}A=n$
h. $AA^{\dagger}=I$, $A^{\dagger}=A^{\mathrm{T}}(AA^{\mathrm{T}})^{-1}$, $\operatorname{rank}A=m$
i. $A^{\dagger}A\omega=\omega$, $\omega\in\mathcal{R}(A^{\mathrm{T}})$
j. $AA^{\dagger}\upsilon=\upsilon$, $\upsilon\in\mathcal{R}(A)$
k. $A^{\dagger}A=AA^{\dagger}$, $A$ normal
l. $A^{k\dagger}=A^{\dagger k}$, $A$ normal, $k$ an integer


Equivalent to the corresponding Moore-Penrose condition:

1. $A^{\mathrm{T}}=A^{\mathrm{T}}AA^{\dagger}$ or $A^{\mathrm{T}}=A^{\dagger}AA^{\mathrm{T}}$
2. $A^{\dagger\mathrm{T}}=A^{\dagger\mathrm{T}}A^{\dagger}A$ or $A^{\dagger\mathrm{T}}=AA^{\dagger}A^{\dagger\mathrm{T}}$

When $A$ is symmetric, $A^{\dagger}$ is symmetric and (§A.6)
$$
A\succeq 0\;\Leftrightarrow\;A^{\dagger}\succeq 0\tag{1672}
$$

E.0.1.0.1 Example. Solution to classical linear equation $Ax=b$.
In §2.5.1.1, the solution set to matrix equation $Ax=b$ was represented as an intersection of hyperplanes. Regardless of rank of $A$ or its shape (fat or skinny), interpretation as a hyperplane intersection describing a possibly empty affine set generally holds. If matrix $A$ is rank deficient or fat, there is an infinity of solutions $x$ when $b\in\mathcal{R}(A)$. A unique solution occurs when the hyperplanes intersect at a single point.

For any shape of matrix $A$ of any rank, and given any vector $b$ that may or may not be in $\mathcal{R}(A)$, we wish to find a best Euclidean solution $x^{\star}$ to
$$
Ax=b\tag{1673}
$$
(more generally, $Ax\approx b$ given arbitrary matrices) by solving
$$
\underset{x}{\text{minimize}}\quad\|Ax-b\|^{2}\tag{1674}
$$
Necessary and sufficient condition for optimal solution to this unconstrained optimization is the so-called normal equation that results from zeroing the convex objective's gradient: (§D.2.1)
$$
A^{\mathrm{T}}Ax=A^{\mathrm{T}}b\tag{1675}
$$
normal because error vector $b-Ax$ is perpendicular to $\mathcal{R}(A)$; id est, $A^{\mathrm{T}}(b-Ax)=0$. Given any matrix $A$ and any vector $b$, the normal equation is solvable exactly; always so, because $\mathcal{R}(A^{\mathrm{T}}A)=\mathcal{R}(A^{\mathrm{T}})$ and $A^{\mathrm{T}}b\in\mathcal{R}(A^{\mathrm{T}})$.

When $A$ is skinny-or-square full-rank, normal equation (1675) can be solved exactly by inversion:
$$
x^{\star}=(A^{\mathrm{T}}A)^{-1}A^{\mathrm{T}}b\equiv A^{\dagger}b\tag{1676}
$$


For matrix $A$ of arbitrary rank and shape, on the other hand, $A^{\mathrm{T}}A$ might not be invertible. Yet the normal equation can always be solved exactly by: (1664)
$$
x^{\star}=\lim_{t\to 0^{+}}(A^{\mathrm{T}}A+t\,I)^{-1}A^{\mathrm{T}}b=A^{\dagger}b\tag{1677}
$$
invertible for any positive value of $t$ by (1281). The exact inversion (1676) and this pseudoinverse solution (1677) each solve
$$
\lim_{t\to 0^{+}}\ \underset{x}{\text{minimize}}\quad\|Ax-b\|^{2}+t\,\|x\|^{2}\tag{1678}
$$
simultaneously providing least squares solution to (1674) and the classical least norm solution$^{\text{E.1}}$ [251, App.A.4] (confer §E.5.0.0.5)
$$
\begin{array}{ll}
\operatorname{arg}\ \underset{x}{\text{minimize}} & \|x\|^{2}\\
\text{subject to} & Ax=AA^{\dagger}b
\end{array}\tag{1679}
$$
where $AA^{\dagger}b$ is the orthogonal projection of vector $b$ on $\mathcal{R}(A)$.

E.1 Idempotent matrices

Projection matrices are square and defined by idempotence, $P^{2}=P$; [251, §2.6] [152, §1.3] equivalent to the condition, $P$ be diagonalizable [150, §3.3, prob.3] with eigenvalues $\varphi_i\in\{0,1\}$. [303, §4.1, thm.4.1] Idempotent matrices are not necessarily symmetric. The transpose of an idempotent matrix remains idempotent; $P^{\mathrm{T}}P^{\mathrm{T}}=P^{\mathrm{T}}$. Solely excepting $P=I$, all projection matrices are neither orthogonal (§B.5) nor invertible. [251, §3.4] The collection of all projection matrices of particular dimension does not form a convex set.

Suppose we wish to project nonorthogonally (obliquely) on the range of any particular matrix $A\in\mathbb{R}^{m\times n}$. All idempotent matrices projecting nonorthogonally on $\mathcal{R}(A)$ may be expressed:
$$
P=A(A^{\dagger}+BZ^{\mathrm{T}})\in\mathbb{R}^{m\times m}\tag{1680}
$$

E.1 This means: optimal solutions of lesser norm than the so-called least norm solution (1679) can be obtained (at expense of approximation $Ax\approx b$ hence, of perpendicularity) by ignoring the limiting operation and introducing finite positive values of $t$ into (1678).
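A Matlab sketch of (1677)-(1679), using an arbitrary fat rank-deficient system so that solutions are nonunique; the Tikhonov-regularized minimizer of (1678) lands on the pseudoinverse (least norm) solution as $t\to 0^{+}$:
\begin{verbatim}
% Sketch: regularized least squares converges to the least norm solution.
m = 3; n = 6;
A = randn(m,n);  b = randn(m,1);
xstar = pinv(A)*b;                    % pseudoinverse solution (1677)
t = 1e-9;
xt = (A'*A + t*eye(n))\(A'*b);        % minimizer of ||Ax-b||^2 + t||x||^2
disp(norm(xt - xstar))                % ~0
\end{verbatim}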


where $\mathcal{R}(P)=\mathcal{R}(A)$,$^{\text{E.2}}$ $B\in\mathbb{R}^{n\times k}$ for $k\in\{1\ldots m\}$ is otherwise arbitrary, and $Z\in\mathbb{R}^{m\times k}$ is any matrix whose range is in $\mathcal{N}(A^{\mathrm{T}})$; id est,
$$
A^{\mathrm{T}}Z=A^{\dagger}Z=0\tag{1681}
$$
Evidently, the collection of nonorthogonal projectors projecting on $\mathcal{R}(A)$ is an affine subset
$$
\mathcal{P}_k=\big\{A(A^{\dagger}+BZ^{\mathrm{T}})\mid B\in\mathbb{R}^{n\times k}\big\}\tag{1682}
$$
When matrix $A$ in (1680) is skinny full-rank ($A^{\dagger}A=I$) or has orthonormal columns ($A^{\mathrm{T}}A=I$), either property leads to a biorthogonal characterization of nonorthogonal projection:

E.1.1 Biorthogonal characterization

Any nonorthogonal projector $P^{2}=P\in\mathbb{R}^{m\times m}$ projecting on nontrivial $\mathcal{R}(U)$ can be defined by a biorthogonality condition $Q^{\mathrm{T}}U=I$; the biorthogonal decomposition of $P$ being (confer (1680))
$$
P=UQ^{\mathrm{T}},\qquad Q^{\mathrm{T}}U=I\tag{1683}
$$
where$^{\text{E.3}}$ (§B.1.1.1)
$$
\mathcal{R}(P)=\mathcal{R}(U),\qquad\mathcal{N}(P)=\mathcal{N}(Q^{\mathrm{T}})\tag{1684}
$$
and where generally (confer (1709))$^{\text{E.4}}$
$$
P^{\mathrm{T}}\neq P,\qquad P^{\dagger}\neq P,\qquad\|P\|_{2}\neq 1,\qquad P\not\succeq 0\tag{1685}
$$
and $P$ is not nonexpansive (1710).

E.2 Proof. $\mathcal{R}(P)\subseteq\mathcal{R}(A)$ is obvious [251, §3.6]. By (120) and (121),
$$
\mathcal{R}(A^{\dagger}+BZ^{\mathrm{T}})=\{(A^{\dagger}+BZ^{\mathrm{T}})y\mid y\in\mathbb{R}^{m}\}\supseteq\{(A^{\dagger}+BZ^{\mathrm{T}})y\mid y\in\mathcal{R}(A)\}=\mathcal{R}(A^{\mathrm{T}})
$$
$$
\mathcal{R}(P)=\{A(A^{\dagger}+BZ^{\mathrm{T}})y\mid y\in\mathbb{R}^{m}\}\supseteq\{A(A^{\dagger}+BZ^{\mathrm{T}})y\mid(A^{\dagger}+BZ^{\mathrm{T}})y\in\mathcal{R}(A^{\mathrm{T}})\}=\mathcal{R}(A)\qquad\blacksquare
$$

E.3 Proof. Obviously, $\mathcal{R}(P)\subseteq\mathcal{R}(U)$. Because $Q^{\mathrm{T}}U=I$
$$
\mathcal{R}(P)=\{UQ^{\mathrm{T}}x\mid x\in\mathbb{R}^{m}\}\supseteq\{UQ^{\mathrm{T}}Uy\mid y\in\mathbb{R}^{k}\}=\mathcal{R}(U)\qquad\blacksquare
$$

E.4 Orthonormal decomposition (1706) (confer §E.3.4) is a special case of biorthogonal decomposition (1683) characterized by (1709). So, these characteristics (1685) are not necessary conditions for biorthogonality.


(⇐) To verify assertion (1683) we observe: because idempotent matrices are diagonalizable (§A.5), [150, §3.3, prob.3] they must have the form (1365)
$$
P=S\Phi S^{-1}=\sum_{i=1}^{m}\varphi_i\,s_i w_i^{\mathrm{T}}=\sum_{i=1}^{k\leq m}s_i w_i^{\mathrm{T}}\tag{1686}
$$
that is a sum of $k=\operatorname{rank}P$ independent projector dyads (idempotent dyads, §B.1.1, §E.6.2.1) where $\varphi_i\in\{0,1\}$ are the eigenvalues of $P$ [303, §4.1, thm.4.1] in diagonal matrix $\Phi\in\mathbb{R}^{m\times m}$ arranged in nonincreasing order, and where $s_i,w_i\in\mathbb{R}^{m}$ are the right- and left-eigenvectors of $P$, respectively, which are independent and real.$^{\text{E.5}}$ Therefore
$$
U\triangleq S(:,1\!:\!k)=\big[\,s_1\ \cdots\ s_k\,\big]\in\mathbb{R}^{m\times k}\tag{1687}
$$
is the full-rank matrix $S\in\mathbb{R}^{m\times m}$ having $m-k$ columns truncated (corresponding to $0$ eigenvalues), while
$$
Q^{\mathrm{T}}\triangleq S^{-1}(1\!:\!k,:)=\begin{bmatrix}w_1^{\mathrm{T}}\\ \vdots\\ w_k^{\mathrm{T}}\end{bmatrix}\in\mathbb{R}^{k\times m}\tag{1688}
$$
is matrix $S^{-1}$ having the corresponding $m-k$ rows truncated. By the 0 eigenvalues theorem (§A.7.3.0.1), $\mathcal{R}(U)=\mathcal{R}(P)$, $\mathcal{R}(Q)=\mathcal{R}(P^{\mathrm{T}})$, and
$$
\begin{aligned}
\mathcal{R}(P)&=\operatorname{span}\{s_i\mid\varphi_i=1\ \forall i\}\\
\mathcal{N}(P)&=\operatorname{span}\{s_i\mid\varphi_i=0\ \forall i\}\\
\mathcal{R}(P^{\mathrm{T}})&=\operatorname{span}\{w_i\mid\varphi_i=1\ \forall i\}\\
\mathcal{N}(P^{\mathrm{T}})&=\operatorname{span}\{w_i\mid\varphi_i=0\ \forall i\}
\end{aligned}\tag{1689}
$$
Thus biorthogonality $Q^{\mathrm{T}}U=I$ is a necessary condition for idempotence, and so the collection of nonorthogonal projectors projecting on $\mathcal{R}(U)$ is the affine subset $\mathcal{P}_k=U\mathcal{Q}_k^{\mathrm{T}}$ where $\mathcal{Q}_k=\{Q\mid Q^{\mathrm{T}}U=I,\ Q\in\mathbb{R}^{m\times k}\}$.

(⇒) Biorthogonality is a sufficient condition for idempotence;
$$
P^{2}=\sum_{i=1}^{k}s_i w_i^{\mathrm{T}}\sum_{j=1}^{k}s_j w_j^{\mathrm{T}}=P\tag{1690}
$$
id est, if the cross-products are annihilated, then $P^{2}=P$.  ∎

E.5 Eigenvectors of a real matrix corresponding to real eigenvalues must be real. (§A.5.0.0.1)


Figure 123: Nonorthogonal projection of $x\in\mathbb{R}^{3}$ on $\mathcal{R}(U)=\mathbb{R}^{2}$ under biorthogonality condition; id est, $Px=UQ^{\mathrm{T}}x$ such that $Q^{\mathrm{T}}U=I$. Any point along imaginary line $T$ connecting $x$ to $Px$ will be projected nonorthogonally on $Px$ with respect to horizontal plane constituting $\mathbb{R}^{2}$ in this example. Extreme directions of $\operatorname{cone}(U)$ correspond to two columns of $U$; likewise for $\operatorname{cone}(Q)$. For purpose of illustration, we truncate each conic hull by truncating coefficients of conic combination at unity. Conic hull $\operatorname{cone}(Q)$ is headed upward at an angle, out of plane of page. Nonorthogonal projection would fail were $\mathcal{N}(Q^{\mathrm{T}})$ in $\mathcal{R}(U)$ (were $T$ parallel to a line in $\mathcal{R}(U)$). (Figure annotations: $T\perp\mathcal{R}(Q)$.)


Nonorthogonal projection of $x$ on $\mathcal{R}(P)$ has expression like a biorthogonal expansion,
$$
Px=UQ^{\mathrm{T}}x=\sum_{i=1}^{k}w_i^{\mathrm{T}}x\,s_i\tag{1691}
$$
When the domain is restricted to the range of $P$, say $x=U\xi$ for $\xi\in\mathbb{R}^{k}$, then $x=Px=UQ^{\mathrm{T}}U\xi=U\xi$ and expansion is unique because the eigenvectors are linearly independent. Otherwise, any component of $x$ in $\mathcal{N}(P)=\mathcal{N}(Q^{\mathrm{T}})$ will be annihilated. The direction of nonorthogonal projection is orthogonal to $\mathcal{R}(Q)$ $\Leftrightarrow$ $Q^{\mathrm{T}}U=I$; id est, for $Px\in\mathcal{R}(U)$
$$
Px-x\;\perp\;\mathcal{R}(Q)\ \text{in}\ \mathbb{R}^{m}\tag{1692}
$$

E.1.1.0.1 Example. Illustration of nonorthogonal projector.
Figure 123 shows $\operatorname{cone}(U)$, the conic hull of the columns of
$$
U=\begin{bmatrix}1&1\\-0.5&0.3\\0&0\end{bmatrix}\tag{1693}
$$
from nonorthogonal projector $P=UQ^{\mathrm{T}}$. Matrix $U$ has a limitless number of left inverses because $\mathcal{N}(U^{\mathrm{T}})$ is nontrivial. Similarly depicted is left inverse $Q^{\mathrm{T}}$ from (1680)
$$
Q=U^{\dagger\mathrm{T}}+ZB^{\mathrm{T}}=\begin{bmatrix}0.3750&0.6250\\-1.2500&1.2500\\0&0\end{bmatrix}+\begin{bmatrix}0\\0\\1\end{bmatrix}\begin{bmatrix}0.5&0.5\end{bmatrix}=\begin{bmatrix}0.3750&0.6250\\-1.2500&1.2500\\0.5000&0.5000\end{bmatrix}\tag{1694}
$$
where $Z\in\mathcal{N}(U^{\mathrm{T}})$ and matrix $B$ is selected arbitrarily; id est, $Q^{\mathrm{T}}U=I$ because $U$ is full-rank.

Direction of projection on $\mathcal{R}(U)$ is orthogonal to $\mathcal{R}(Q)$. Any point along line $T$ in the figure, for example, will have the same projection. Were matrix $Z$ instead equal to $0$, then $\operatorname{cone}(Q)$ would become the relative dual to $\operatorname{cone}(U)$ (sharing the same affine hull; §2.13.8, confer Figure 45(a)). In that case, projection $Px=UU^{\dagger}x$ of $x$ on $\mathcal{R}(U)$ becomes orthogonal projection (and unique minimum-distance).
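This example is easy to reproduce; a Matlab sketch using the values from (1693)-(1694), confirming biorthogonality, idempotence, and direction of projection (1692):
\begin{verbatim}
% Sketch of Example E.1.1.0.1.
U = [1 1; -0.5 0.3; 0 0];
Z = [0; 0; 1];                  % basis for N(U')
B = [0.5; 0.5];                 % arbitrary choice made in the text
Q = pinv(U)' + Z*B';            % left inverse construction (1694)
P = U*Q';                       % nonorthogonal projector on R(U)
disp(Q'*U)                      % identity: biorthogonality
disp(norm(P*P - P,'fro'))       % idempotence
x = randn(3,1);
disp(Q'*(P*x - x))              % ~0: direction of projection _|_ R(Q)
\end{verbatim}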


E.1.2 Idempotence summary

Nonorthogonal subspace-projector $P$ is a linear operator defined by idempotence or biorthogonal decomposition (1683), but characterized not by symmetry nor positive semidefiniteness nor nonexpansivity (1710).

E.2 $I-P$, Projection on algebraic complement

It follows from the diagonalizability of idempotent matrices that $I-P$ must also be a projection matrix because it too is idempotent, and because it may be expressed
$$
I-P=S(I-\Phi)S^{-1}=\sum_{i=1}^{m}(1-\varphi_i)\,s_i w_i^{\mathrm{T}}\tag{1695}
$$
where $(1-\varphi_i)\in\{1,0\}$ are the eigenvalues of $I-P$ (1282) whose eigenvectors $s_i,w_i$ are identical to those of $P$ in (1686). A consequence of that complementary relationship of eigenvalues is the fact, [261, §2] [257, §2] for subspace projector $P=P^{2}\in\mathbb{R}^{m\times m}$
$$
\begin{aligned}
\mathcal{R}(P)&=\operatorname{span}\{s_i\mid\varphi_i=1\ \forall i\}=\operatorname{span}\{s_i\mid(1-\varphi_i)=0\ \forall i\}=\mathcal{N}(I-P)\\
\mathcal{N}(P)&=\operatorname{span}\{s_i\mid\varphi_i=0\ \forall i\}=\operatorname{span}\{s_i\mid(1-\varphi_i)=1\ \forall i\}=\mathcal{R}(I-P)\\
\mathcal{R}(P^{\mathrm{T}})&=\operatorname{span}\{w_i\mid\varphi_i=1\ \forall i\}=\operatorname{span}\{w_i\mid(1-\varphi_i)=0\ \forall i\}=\mathcal{N}(I-P^{\mathrm{T}})\\
\mathcal{N}(P^{\mathrm{T}})&=\operatorname{span}\{w_i\mid\varphi_i=0\ \forall i\}=\operatorname{span}\{w_i\mid(1-\varphi_i)=1\ \forall i\}=\mathcal{R}(I-P^{\mathrm{T}})
\end{aligned}\tag{1696}
$$
that is easy to see from (1686) and (1695). Idempotent $I-P$ therefore projects vectors on its range, $\mathcal{N}(P)$. Because all eigenvectors of a real idempotent matrix are real and independent, the algebraic complement of $\mathcal{R}(P)$ [167, §3.3] is equivalent to $\mathcal{N}(P)$;$^{\text{E.6}}$ id est,
$$
\mathcal{R}(P)\oplus\mathcal{N}(P)=\mathcal{R}(P^{\mathrm{T}})\oplus\mathcal{N}(P^{\mathrm{T}})=\mathcal{R}(P^{\mathrm{T}})\oplus\mathcal{N}(P)=\mathcal{R}(P)\oplus\mathcal{N}(P^{\mathrm{T}})=\mathbb{R}^{m}\tag{1697}
$$
because $\mathcal{R}(P)\oplus\mathcal{R}(I-P)=\mathbb{R}^{m}$. For idempotent $P\in\mathbb{R}^{m\times m}$, consequently,
$$
\operatorname{rank}P+\operatorname{rank}(I-P)=m\tag{1698}
$$

E.6 The same phenomenon occurs with symmetric (nonidempotent) matrices, for example. When the summands in $\mathcal{A}\oplus\mathcal{B}=\mathbb{R}^{m}$ are orthogonal vector spaces, the algebraic complement is the orthogonal complement.


E.2.0.0.1 Theorem. Rank/Trace. [303, §4.1, prob.9] (confer (1714))
$$
P^{2}=P\quad\Leftrightarrow\quad\operatorname{rank}P=\operatorname{tr}P\ \text{ and }\ \operatorname{rank}(I-P)=\operatorname{tr}(I-P)\tag{1699}
$$
⋄

E.2.1 Universal projector characteristic

Although projection is not necessarily orthogonal and $\mathcal{R}(P)\not\perp\mathcal{R}(I-P)$ in general, still for any projector $P$ and any $x\in\mathbb{R}^{m}$
$$
Px+(I-P)x=x\tag{1700}
$$
must hold where $\mathcal{R}(I-P)=\mathcal{N}(P)$ is the algebraic complement of $\mathcal{R}(P)$. The algebraic complement of closed convex cone $\mathcal{K}$, for example, is the negative dual cone $-\mathcal{K}^{*}$. (1818)

E.3 Symmetric idempotent matrices

When idempotent matrix $P$ is symmetric, $P$ is an orthogonal projector. In other words, the direction of projection of point $x\in\mathbb{R}^{m}$ on subspace $\mathcal{R}(P)$ is orthogonal to $\mathcal{R}(P)$; id est, for $P^{2}=P\in\mathbb{S}^{m}$ and projection $Px\in\mathcal{R}(P)$
$$
Px-x\;\perp\;\mathcal{R}(P)\ \text{in}\ \mathbb{R}^{m}\tag{1701}
$$
Perpendicularity is a necessary and sufficient condition for orthogonal projection on a subspace. [73, §4.9]

A condition equivalent to (1701) is: Norm of direction $x-Px$ is the infimum over all nonorthogonal projections of $x$ on $\mathcal{R}(P)$; [183, §3.3] for $P^{2}=P\in\mathbb{S}^{m}$, $\mathcal{R}(P)=\mathcal{R}(A)$, matrices $A,B,Z$ and positive integer $k$ as defined for (1680), and given $x\in\mathbb{R}^{m}$
$$
\|x-Px\|_{2}=\|x-AA^{\dagger}x\|_{2}=\inf_{B\in\mathbb{R}^{n\times k}}\|x-A(A^{\dagger}+BZ^{\mathrm{T}})x\|_{2}\tag{1702}
$$
The infimum is attained for $\mathcal{R}(B)\subseteq\mathcal{N}(A)$ over any affine subset of nonorthogonal projectors (1682) indexed by $k$.


Proof is straightforward: The vector 2-norm is a convex function. Setting gradient of the norm-square to 0, applying §D.2,
$$
\big(A^{\mathrm{T}}ABZ^{\mathrm{T}}-A^{\mathrm{T}}(I-AA^{\dagger})\big)xx^{\mathrm{T}}A=0\quad\Leftrightarrow\quad A^{\mathrm{T}}ABZ^{\mathrm{T}}xx^{\mathrm{T}}A=0\tag{1703}
$$
because $A^{\mathrm{T}}=A^{\mathrm{T}}AA^{\dagger}$. Projector $P=AA^{\dagger}$ is therefore unique; the minimum-distance projector is the orthogonal projector, and vice versa.  ∎

We get $P=AA^{\dagger}$ so this projection matrix must be symmetric. Then for any matrix $A\in\mathbb{R}^{m\times n}$, symmetric idempotent $P$ projects a given vector $x$ in $\mathbb{R}^{m}$ orthogonally on $\mathcal{R}(A)$. Under either condition (1701) or (1702), the projection $Px$ is unique minimum-distance; for subspaces, perpendicularity and minimum-distance conditions are equivalent.

E.3.1 Four subspaces

We summarize the orthogonal projectors projecting on the four fundamental subspaces: for $A\in\mathbb{R}^{m\times n}$
$$
\begin{aligned}
A^{\dagger}A &:\ \mathbb{R}^{n}\ \text{on}\ \mathcal{R}(A^{\dagger}A)=\mathcal{R}(A^{\mathrm{T}})\\
AA^{\dagger} &:\ \mathbb{R}^{m}\ \text{on}\ \mathcal{R}(AA^{\dagger})=\mathcal{R}(A)\\
I-A^{\dagger}A &:\ \mathbb{R}^{n}\ \text{on}\ \mathcal{R}(I-A^{\dagger}A)=\mathcal{N}(A)\\
I-AA^{\dagger} &:\ \mathbb{R}^{m}\ \text{on}\ \mathcal{R}(I-AA^{\dagger})=\mathcal{N}(A^{\mathrm{T}})
\end{aligned}\tag{1704}
$$
For completeness:$^{\text{E.7}}$ (1696)
$$
\begin{aligned}
\mathcal{N}(A^{\dagger}A)&=\mathcal{N}(A)\\
\mathcal{N}(AA^{\dagger})&=\mathcal{N}(A^{\mathrm{T}})\\
\mathcal{N}(I-A^{\dagger}A)&=\mathcal{R}(A^{\mathrm{T}})\\
\mathcal{N}(I-AA^{\dagger})&=\mathcal{R}(A)
\end{aligned}\tag{1705}
$$

E.7 Proof is by singular value decomposition (§A.6.2): $\mathcal{N}(A^{\dagger}A)\subseteq\mathcal{N}(A)$ is obvious. Conversely, suppose $A^{\dagger}Ax=0$. Then $x^{\mathrm{T}}A^{\dagger}Ax=x^{\mathrm{T}}QQ^{\mathrm{T}}x=\|Q^{\mathrm{T}}x\|^{2}=0$ where $A=U\Sigma Q^{\mathrm{T}}$ is the subcompact singular value decomposition. Because $\mathcal{R}(Q)=\mathcal{R}(A^{\mathrm{T}})$, then $x\in\mathcal{N}(A)$ that implies $\mathcal{N}(A^{\dagger}A)\supseteq\mathcal{N}(A)$.  ∎
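Table (1704) is simple to exercise numerically; a Matlab sketch with an arbitrary rank-deficient test matrix:
\begin{verbatim}
% Sketch of (1704): the four orthogonal projectors from pinv.
m = 5; n = 4;
A  = randn(m,2)*randn(2,n);        % rank 2
Ap = pinv(A);
P  = A*Ap;                         % orthogonal projector on R(A)
disp(norm(P - P','fro') + norm(P*P - P,'fro'))   % symmetric, idempotent
x = randn(n,1);  y = randn(m,1);
disp(norm(A*((eye(n) - Ap*A)*x)))  % (I - A†A)x lands in N(A)
disp(norm(A'*((eye(m) - P)*y)))    % (I - AA†)y lands in N(A')
\end{verbatim}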


E.3.2 Orthogonal characterization

Any symmetric projector $P^{2}=P\in\mathbb{S}^{m}$ projecting on nontrivial $\mathcal{R}(Q)$ can be defined by the orthonormality condition $Q^{\mathrm{T}}Q=I$. When skinny matrix $Q\in\mathbb{R}^{m\times k}$ has orthonormal columns, then $Q^{\dagger}=Q^{\mathrm{T}}$ by the Moore-Penrose conditions. Hence, any $P$ having an orthonormal decomposition (§E.3.4)
$$
P=QQ^{\mathrm{T}},\qquad Q^{\mathrm{T}}Q=I\tag{1706}
$$
where [251, §3.3] (1418)
$$
\mathcal{R}(P)=\mathcal{R}(Q),\qquad\mathcal{N}(P)=\mathcal{N}(Q^{\mathrm{T}})\tag{1707}
$$
is an orthogonal projector projecting on $\mathcal{R}(Q)$ having, for $Px\in\mathcal{R}(Q)$ (confer (1692))
$$
Px-x\;\perp\;\mathcal{R}(Q)\ \text{in}\ \mathbb{R}^{m}\tag{1708}
$$
From (1706), orthogonal projector $P$ is obviously positive semidefinite (§A.3.1.0.6); necessarily,
$$
P^{\mathrm{T}}=P,\qquad P^{\dagger}=P,\qquad\|P\|_{2}=1,\qquad P\succeq 0\tag{1709}
$$
and $\|Px\|=\|QQ^{\mathrm{T}}x\|=\|Q^{\mathrm{T}}x\|$ because $\|Qy\|=\|y\|\ \forall y\in\mathbb{R}^{k}$. All orthogonal projectors are therefore nonexpansive because
$$
\sqrt{\langle Px,\,x\rangle}=\|Px\|=\|Q^{\mathrm{T}}x\|\leq\|x\|\quad\forall x\in\mathbb{R}^{m}\tag{1710}
$$
the Bessel inequality, [73] [167] with equality when $x\in\mathcal{R}(Q)$.

From the diagonalization of idempotent matrices (1686)
$$
P=S\Phi S^{\mathrm{T}}=\sum_{i=1}^{m}\varphi_i\,s_i s_i^{\mathrm{T}}=\sum_{i=1}^{k\leq m}s_i s_i^{\mathrm{T}}\tag{1711}
$$
orthogonal projection of point $x$ on $\mathcal{R}(P)$ has expression like an orthogonal expansion [73, §4.10]
$$
Px=QQ^{\mathrm{T}}x=\sum_{i=1}^{k}s_i^{\mathrm{T}}x\,s_i\tag{1712}
$$
where
$$
Q=S(:,1\!:\!k)=\big[\,s_1\ \cdots\ s_k\,\big]\in\mathbb{R}^{m\times k}\tag{1713}
$$
and where the $s_i$ [sic] are orthonormal eigenvectors of symmetric idempotent $P$. When the domain is restricted to the range of $P$, say $x=Q\xi$ for $\xi\in\mathbb{R}^{k}$, then $x=Px=QQ^{\mathrm{T}}Q\xi=Q\xi$ and expansion is unique. Otherwise, any component of $x$ in $\mathcal{N}(Q^{\mathrm{T}})$ will be annihilated.


E.3.2.0.1 Theorem. Symmetric rank/trace. (confer (1699) (1286))
$$
P^{\mathrm{T}}=P,\ P^{2}=P\quad\Leftrightarrow\quad\operatorname{rank}P=\operatorname{tr}P=\|P\|_{\mathrm{F}}^{2}\ \text{ and }\ \operatorname{rank}(I-P)=\operatorname{tr}(I-P)=\|I-P\|_{\mathrm{F}}^{2}\tag{1714}
$$
⋄

Proof. We take as given Theorem E.2.0.0.1 establishing idempotence. We have left only to show $\operatorname{tr}P=\|P\|_{\mathrm{F}}^{2}\Rightarrow P^{\mathrm{T}}=P$, established in [303, §7.1].  ∎

E.3.3 Summary, symmetric idempotent

In summary, orthogonal projector $P$ is a linear operator defined [148, §A.3.1] by idempotence and symmetry, and characterized by positive semidefiniteness and nonexpansivity. The algebraic complement (§E.2) to $\mathcal{R}(P)$ becomes the orthogonal complement $\mathcal{R}(I-P)$; id est, $\mathcal{R}(P)\perp\mathcal{R}(I-P)$.

E.3.4 Orthonormal decomposition

When $Z=0$ in the general nonorthogonal projector $A(A^{\dagger}+BZ^{\mathrm{T}})$ (1680), an orthogonal projector results (for any matrix $A$) characterized principally by idempotence and symmetry. Any real orthogonal projector may, in fact, be represented by an orthonormal decomposition such as (1706). [152, §1, prob.42]

To verify that assertion for the four fundamental subspaces (1704), we need only to express $A$ by subcompact singular value decomposition (§A.6.2): From pseudoinverse (1389) of $A=U\Sigma Q^{\mathrm{T}}\in\mathbb{R}^{m\times n}$
$$
\begin{aligned}
AA^{\dagger}&=U\Sigma\Sigma^{\dagger}U^{\mathrm{T}}=UU^{\mathrm{T}}, &\qquad A^{\dagger}A&=Q\Sigma^{\dagger}\Sigma Q^{\mathrm{T}}=QQ^{\mathrm{T}}\\
I-AA^{\dagger}&=I-UU^{\mathrm{T}}=U^{\perp}U^{\perp\mathrm{T}}, &\qquad I-A^{\dagger}A&=I-QQ^{\mathrm{T}}=Q^{\perp}Q^{\perp\mathrm{T}}
\end{aligned}\tag{1715}
$$
where $U^{\perp}\in\mathbb{R}^{m\times(m-\operatorname{rank}A)}$ holds columnar an orthonormal basis for the orthogonal complement of $\mathcal{R}(U)$, and likewise for $Q^{\perp}\in\mathbb{R}^{n\times(n-\operatorname{rank}A)}$. Existence of an orthonormal decomposition is sufficient to establish idempotence and symmetry of an orthogonal projector (1706).


E.3.5 Unifying trait of all projectors: direction

Relation (1715) shows: orthogonal projectors simultaneously possess a biorthogonal decomposition (confer §E.1.1) (for example, $AA^{\dagger}$ for skinny-or-square $A$ full-rank) and an orthonormal decomposition ($UU^{\mathrm{T}}$ whence $Px=UU^{\mathrm{T}}x$).

E.3.5.1 orthogonal projector, orthonormal decomposition

Consider orthogonal expansion of $x\in\mathcal{R}(U)$:
$$
x=UU^{\mathrm{T}}x=\sum_{i=1}^{n}u_i u_i^{\mathrm{T}}x\tag{1716}
$$
a sum of one-dimensional orthogonal projections (§E.6.3), where
$$
U\triangleq[\,u_1\ \cdots\ u_n\,]\quad\text{and}\quad U^{\mathrm{T}}U=I\tag{1717}
$$
and where the subspace projector has two expressions, (1715)
$$
AA^{\dagger}\triangleq UU^{\mathrm{T}}\tag{1718}
$$
where $A\in\mathbb{R}^{m\times n}$ has rank $n$. The direction of projection of $x$ on $u_j$ for some $j\in\{1\ldots n\}$, for example, is orthogonal to $u_j$ but parallel to a vector in the span of all the remaining vectors constituting the columns of $U$:
$$
\begin{aligned}
u_j^{\mathrm{T}}(u_j u_j^{\mathrm{T}}x-x)&=0\\
u_j u_j^{\mathrm{T}}x-x&=u_j u_j^{\mathrm{T}}x-UU^{\mathrm{T}}x\ \in\ \mathcal{R}(\{u_i\mid i=1\ldots n,\ i\neq j\})
\end{aligned}\tag{1719}
$$

E.3.5.2 orthogonal projector, biorthogonal decomposition

We get a similar result for the biorthogonal expansion of $x\in\mathcal{R}(A)$. Define
$$
A\triangleq[\,a_1\ a_2\ \cdots\ a_n\,]\in\mathbb{R}^{m\times n}\tag{1720}
$$
and the rows of the pseudoinverse
$$
A^{\dagger}\triangleq\begin{bmatrix}a_1^{*\mathrm{T}}\\ a_2^{*\mathrm{T}}\\ \vdots\\ a_n^{*\mathrm{T}}\end{bmatrix}\in\mathbb{R}^{n\times m}\tag{1721}
$$


under the biorthogonality condition $A^{\dagger}A=I$. In the biorthogonal expansion (§2.13.8)
$$
x=AA^{\dagger}x=\sum_{i=1}^{n}a_i\,a_i^{*\mathrm{T}}x\tag{1722}
$$
the direction of projection of $x$ on $a_j$ for some particular $j\in\{1\ldots n\}$, for example, is orthogonal to $a_j^{*}$ and parallel to a vector in the span of all the remaining vectors constituting the columns of $A$:
$$
\begin{aligned}
a_j^{*\mathrm{T}}(a_j a_j^{*\mathrm{T}}x-x)&=0\\
a_j a_j^{*\mathrm{T}}x-x&=a_j a_j^{*\mathrm{T}}x-AA^{\dagger}x\ \in\ \mathcal{R}(\{a_i\mid i=1\ldots n,\ i\neq j\})
\end{aligned}\tag{1723}
$$

E.3.5.3 nonorthogonal projector, biorthogonal decomposition

Because the result in §E.3.5.2 is independent of matrix symmetry $AA^{\dagger}=(AA^{\dagger})^{\mathrm{T}}$, we must get the same result for any nonorthogonal projector characterized by a biorthogonality condition; namely, for nonorthogonal projector $P=UQ^{\mathrm{T}}$ (1683) under biorthogonality condition $Q^{\mathrm{T}}U=I$, in the biorthogonal expansion of $x\in\mathcal{R}(U)$
$$
x=UQ^{\mathrm{T}}x=\sum_{i=1}^{k}u_i\,q_i^{\mathrm{T}}x\tag{1724}
$$
where
$$
U\triangleq[\,u_1\ \cdots\ u_k\,]\in\mathbb{R}^{m\times k},\qquad
Q^{\mathrm{T}}\triangleq\begin{bmatrix}q_1^{\mathrm{T}}\\ \vdots\\ q_k^{\mathrm{T}}\end{bmatrix}\in\mathbb{R}^{k\times m}\tag{1725}
$$
the direction of projection of $x$ on $u_j$ is orthogonal to $q_j$ and parallel to a vector in the span of the remaining $u_i$:
$$
\begin{aligned}
q_j^{\mathrm{T}}(u_j q_j^{\mathrm{T}}x-x)&=0\\
u_j q_j^{\mathrm{T}}x-x&=u_j q_j^{\mathrm{T}}x-UQ^{\mathrm{T}}x\ \in\ \mathcal{R}(\{u_i\mid i=1\ldots k,\ i\neq j\})
\end{aligned}\tag{1726}
$$


E.4 Algebra of projection on affine subsets

Let $P_{\mathcal{A}}x$ denote projection of $x$ on affine subset $\mathcal{A}\triangleq\mathcal{R}+\alpha$ where $\mathcal{R}$ is a subspace and $\alpha\in\mathcal{A}$. Then, because $\mathcal{R}$ is parallel to $\mathcal{A}$, it holds:
$$
\begin{aligned}
P_{\mathcal{A}}x=P_{\mathcal{R}+\alpha}\,x&=(I-P_{\mathcal{R}})(\alpha)+P_{\mathcal{R}}x\\
&=P_{\mathcal{R}}(x-\alpha)+\alpha
\end{aligned}\tag{1727}
$$
Subspace projector $P_{\mathcal{R}}$ is a linear operator ($P_{\mathcal{A}}$ is not), and $P_{\mathcal{R}}(x+y)=P_{\mathcal{R}}x$ whenever $y\perp\mathcal{R}$ and $P_{\mathcal{R}}$ is an orthogonal projector.

E.4.0.0.1 Theorem. Orthogonal projection on affine subset. [73, §9.26]
Let $\mathcal{A}=\mathcal{R}+\alpha$ be an affine subset where $\alpha\in\mathcal{A}$, and let $\mathcal{R}^{\perp}$ be the orthogonal complement of subspace $\mathcal{R}$. Then $P_{\mathcal{A}}x$ is the orthogonal projection of $x\in\mathbb{R}^{n}$ on $\mathcal{A}$ if and only if
$$
P_{\mathcal{A}}x\in\mathcal{A},\qquad\langle P_{\mathcal{A}}x-x,\ a-\alpha\rangle=0\quad\forall a\in\mathcal{A}\tag{1728}
$$
or if and only if
$$
P_{\mathcal{A}}x\in\mathcal{A},\qquad P_{\mathcal{A}}x-x\in\mathcal{R}^{\perp}\tag{1729}
$$
⋄

E.5 Projection examples

E.5.0.0.1 Example. Orthogonal projection on orthogonal basis.
Orthogonal projection on a subspace can instead be accomplished by orthogonally projecting on the individual members of an orthogonal basis for that subspace. Suppose, for example, matrix $A\in\mathbb{R}^{m\times n}$ holds an orthonormal basis for $\mathcal{R}(A)$ in its columns; $A\triangleq[\,a_1\ a_2\ \cdots\ a_n\,]$. Then orthogonal projection of vector $x\in\mathbb{R}^{m}$ on $\mathcal{R}(A)$ is a sum of one-dimensional orthogonal projections
$$
Px=AA^{\dagger}x=A(A^{\mathrm{T}}A)^{-1}A^{\mathrm{T}}x=AA^{\mathrm{T}}x=\sum_{i=1}^{n}a_i a_i^{\mathrm{T}}x\tag{1730}
$$
where each symmetric dyad $a_i a_i^{\mathrm{T}}$ is an orthogonal projector projecting on $\mathcal{R}(a_i)$. (§E.6.3) Because $\|x-Px\|$ is minimized by orthogonal projection, $Px$ is considered to be the best approximation (in the Euclidean sense) to $x$ from the set $\mathcal{R}(A)$. [73, §4.9]


E.5.0.0.2 Example. Orthogonal projection on span of nonorthogonal basis.
Orthogonal projection on a subspace can also be accomplished by projecting nonorthogonally on the individual members of any nonorthogonal basis for that subspace. This interpretation is in fact the principal application of the pseudoinverse we discussed. Now suppose matrix $A$ holds a nonorthogonal basis for $\mathcal{R}(A)$ in its columns,
$$
A=[\,a_1\ a_2\ \cdots\ a_n\,]\in\mathbb{R}^{m\times n}\tag{1720}
$$
and define the rows $a_i^{*\mathrm{T}}$ of its pseudoinverse $A^{\dagger}$ as in (1721). Then orthogonal projection of vector $x\in\mathbb{R}^{m}$ on $\mathcal{R}(A)$ is a sum of one-dimensional nonorthogonal projections
$$
Px=AA^{\dagger}x=\sum_{i=1}^{n}a_i\,a_i^{*\mathrm{T}}x\tag{1731}
$$
where each nonsymmetric dyad $a_i a_i^{*\mathrm{T}}$ is a nonorthogonal projector projecting on $\mathcal{R}(a_i)$, (§E.6.1) idempotent because of biorthogonality condition $A^{\dagger}A=I$. The projection $Px$ is regarded as the best approximation to $x$ from the set $\mathcal{R}(A)$, as it was in Example E.5.0.0.1.

E.5.0.0.3 Example. Biorthogonal expansion as nonorthogonal projection.
Biorthogonal expansion can be viewed as a sum of components, each a nonorthogonal projection on the range of an extreme direction of a pointed polyhedral cone $\mathcal{K}$; e.g., Figure 124.

Suppose matrix $A\in\mathbb{R}^{m\times n}$ holds a nonorthogonal basis for $\mathcal{R}(A)$ in its columns as in (1720), and the rows of pseudoinverse $A^{\dagger}$ are defined as in (1721). Assuming the most general biorthogonality condition $(A^{\dagger}+BZ^{\mathrm{T}})A=I$ with $BZ^{\mathrm{T}}$ defined as for (1680), then biorthogonal expansion of vector $x$ is a sum of one-dimensional nonorthogonal projections; for $x\in\mathcal{R}(A)$
$$
x=A(A^{\dagger}+BZ^{\mathrm{T}})x=AA^{\dagger}x=\sum_{i=1}^{n}a_i\,a_i^{*\mathrm{T}}x\tag{1732}
$$
where each dyad $a_i a_i^{*\mathrm{T}}$ is a nonorthogonal projector projecting on $\mathcal{R}(a_i)$. (§E.6.1) The extreme directions of $\mathcal{K}=\operatorname{cone}(A)$ are $\{a_1,\ldots,a_n\}$, the linearly independent columns of $A$, while the extreme directions $\{a_1^{*},\ldots,a_n^{*}\}$


Figure 124: (confer Figure 51) Biorthogonal expansion of point $x\in\operatorname{aff}\mathcal{K}$ is found by projecting $x$ nonorthogonally on range of extreme directions of polyhedral cone $\mathcal{K}\subset\mathbb{R}^{2}$. Direction of projection on extreme direction $a_1$ is orthogonal to extreme direction $a_1^{*}$ of dual cone $\mathcal{K}^{*}$ and parallel to $a_2$ (§E.3.5); similarly, direction of projection on $a_2$ is orthogonal to $a_2^{*}$ and parallel to $a_1$. Point $x$ is sum of nonorthogonal projections: $x$ on $\mathcal{R}(a_1)$ and $x$ on $\mathcal{R}(a_2)$. Expansion is unique because extreme directions of $\mathcal{K}$ are linearly independent. Were $a_1$ orthogonal to $a_2$, then $\mathcal{K}$ would be identical to $\mathcal{K}^{*}$ and nonorthogonal projections would become orthogonal. (Figure annotations: $a_1\perp a_2^{*}$, $a_2\perp a_1^{*}$, $x=y+z=P_{a_1}x+P_{a_2}x$.)


of relative dual cone $\mathcal{K}^{*}\cap\operatorname{aff}\mathcal{K}=\operatorname{cone}(A^{\dagger\mathrm{T}})$ (§2.13.9.4) correspond to the linearly independent (§B.1.1.1) rows of $A^{\dagger}$. Directions of nonorthogonal projection are determined by the pseudoinverse; id est, direction of projection $a_i a_i^{*\mathrm{T}}x-x$ on $\mathcal{R}(a_i)$ is orthogonal to $a_i^{*}$.$^{\text{E.8}}$

Because the extreme directions of this cone $\mathcal{K}$ are linearly independent, the component projections are unique in the sense: there is only one linear combination of extreme directions of $\mathcal{K}$ that yields a particular point $x\in\mathcal{R}(A)$ whenever
$$
\mathcal{R}(A)=\operatorname{aff}\mathcal{K}=\mathcal{R}(a_1)\oplus\mathcal{R}(a_2)\oplus\ldots\oplus\mathcal{R}(a_n)\tag{1733}
$$

E.5.0.0.4 Example. Nonorthogonal projection on elementary matrix.
Suppose $P_{\mathcal{Y}}$ is a linear nonorthogonal projector projecting on subspace $\mathcal{Y}$, and suppose the range of a vector $u$ is linearly independent of $\mathcal{Y}$; id est, for some other subspace $\mathcal{M}$ containing $\mathcal{Y}$ suppose
$$
\mathcal{M}=\mathcal{R}(u)\oplus\mathcal{Y}\tag{1734}
$$
Assuming $P_{\mathcal{M}}x=P_u x+P_{\mathcal{Y}}x$ holds, then it follows for vector $x\in\mathcal{M}$
$$
P_u x=x-P_{\mathcal{Y}}x,\qquad P_{\mathcal{Y}}x=x-P_u x\tag{1735}
$$
nonorthogonal projection of $x$ on $\mathcal{R}(u)$ can be determined from nonorthogonal projection of $x$ on $\mathcal{Y}$, and vice versa.

Such a scenario is realizable were there some arbitrary basis for $\mathcal{Y}$ populating a full-rank skinny-or-square matrix $A$
$$
A\triangleq[\,\text{basis}\ \mathcal{Y}\quad u\,]\in\mathbb{R}^{(n+1)\times(n+1)}\tag{1736}
$$
Then $P_{\mathcal{M}}=AA^{\dagger}$ fulfills the requirements, with $P_u=A(:,n\!+\!1)\,A^{\dagger}(n\!+\!1,:)$ and $P_{\mathcal{Y}}=A(:,1\!:\!n)\,A^{\dagger}(1\!:\!n,:)$. Observe, $P_{\mathcal{M}}$ is an orthogonal projector whereas $P_{\mathcal{Y}}$ and $P_u$ are nonorthogonal projectors.

Now suppose, for example, $P_{\mathcal{Y}}$ is an elementary matrix (§B.3); in particular,
$$
P_{\mathcal{Y}}=I-e_1\mathbf{1}^{\mathrm{T}}=\big[\,0\quad\sqrt{2}\,V_{\!\mathcal{N}}\,\big]\in\mathbb{R}^{N\times N}\tag{1737}
$$

E.8 This remains true in high dimension although only a little more difficult to visualize in $\mathbb{R}^{3}$; confer Figure 52.


where $\mathcal{Y}=\mathcal{N}(\mathbf{1}^{\mathrm{T}})$. We have $\mathcal{M}=\mathbb{R}^{N}$, $A=[\,\sqrt{2}\,V_{\!\mathcal{N}}\ \ e_1\,]$, and $u=e_1$. Thus $P_u=e_1\mathbf{1}^{\mathrm{T}}$ is a nonorthogonal projector projecting on $\mathcal{R}(u)$ in a direction parallel to a vector in $\mathcal{Y}$ (§E.3.5), and $P_{\mathcal{Y}}x=x-e_1\mathbf{1}^{\mathrm{T}}x$ is a nonorthogonal projection of $x$ on $\mathcal{Y}$ in a direction parallel to $u$.

E.5.0.0.5 Example. Projecting the origin on a hyperplane.
(confer §2.4.2.0.2) Given the hyperplane representation having $b\in\mathbb{R}$ and nonzero normal $a\in\mathbb{R}^{m}$
$$
\partial\mathcal{H}=\{y\mid a^{\mathrm{T}}y=b\}\subset\mathbb{R}^{m}\tag{95}
$$
orthogonal projection of the origin $P0$ on that hyperplane is the unique optimal solution to a minimization problem: (1702)
$$
\begin{aligned}
\|P0-0\|_{2}&=\inf_{y\in\partial\mathcal{H}}\|y-0\|_{2}\\
&=\inf_{\xi\in\mathbb{R}^{m-1}}\|Z\xi+x\|_{2}
\end{aligned}\tag{1738}
$$
where $x$ is any solution to $a^{\mathrm{T}}y=b$, and where the columns of $Z\in\mathbb{R}^{m\times(m-1)}$ constitute a basis for $\mathcal{N}(a^{\mathrm{T}})$ so that $y=Z\xi+x\in\partial\mathcal{H}$ for all $\xi\in\mathbb{R}^{m-1}$.

The infimum can be found by setting the gradient (with respect to $\xi$) of the strictly convex norm-square to $0$. We find the minimizing argument
$$
\xi^{\star}=-(Z^{\mathrm{T}}Z)^{-1}Z^{\mathrm{T}}x\tag{1739}
$$
so
$$
y^{\star}=\big(I-Z(Z^{\mathrm{T}}Z)^{-1}Z^{\mathrm{T}}\big)x\tag{1740}
$$
and from (1704)
$$
P0=y^{\star}=a(a^{\mathrm{T}}a)^{-1}a^{\mathrm{T}}x=\frac{a}{\|a\|}\frac{a^{\mathrm{T}}}{\|a\|}\,x\;\triangleq\;AA^{\dagger}x=a\,\frac{b}{\|a\|^{2}}\tag{1741}
$$
In words, any point $x$ in the hyperplane $\partial\mathcal{H}$ projected on its normal $a$ (confer (1766)) yields that point $y^{\star}$ in the hyperplane closest to the origin.


E.5.0.0.6 Example. Projection on affine subset.
The technique of Example E.5.0.0.5 is extensible. Given an intersection of hyperplanes
$$
\mathcal{A}=\{y\mid Ay=b\}\subset\mathbb{R}^{n}\tag{1742}
$$
where each row of $A\in\mathbb{R}^{m\times n}$ is nonzero and $b\in\mathcal{R}(A)$, then the orthogonal projection $Px$ of any point $x\in\mathbb{R}^{n}$ on $\mathcal{A}$ is the solution to a minimization problem:
$$
\begin{aligned}
\|Px-x\|_{2}&=\inf_{y\in\mathcal{A}}\|y-x\|_{2}\\
&=\inf_{\xi\in\mathbb{R}^{n-\operatorname{rank}A}}\|Z\xi+y_p-x\|_{2}
\end{aligned}\tag{1743}
$$
where $y_p$ is any solution to $Ay=b$, and where the columns of $Z\in\mathbb{R}^{n\times(n-\operatorname{rank}A)}$ constitute a basis for $\mathcal{N}(A)$ so that $y=Z\xi+y_p\in\mathcal{A}$ for all $\xi\in\mathbb{R}^{n-\operatorname{rank}A}$.

The infimum is found by setting the gradient of the strictly convex norm-square to $0$. The minimizing argument is
$$
\xi^{\star}=-(Z^{\mathrm{T}}Z)^{-1}Z^{\mathrm{T}}(y_p-x)\tag{1744}
$$
so
$$
y^{\star}=\big(I-Z(Z^{\mathrm{T}}Z)^{-1}Z^{\mathrm{T}}\big)(y_p-x)+x\tag{1745}
$$
and from (1704),
$$
\begin{aligned}
Px=y^{\star}&=x-A^{\dagger}(Ax-b)\\
&=(I-A^{\dagger}A)x+A^{\dagger}Ay_p
\end{aligned}\tag{1746}
$$
which is a projection of $x$ on $\mathcal{N}(A)$ then translated perpendicularly with respect to the nullspace until it meets the affine subset $\mathcal{A}$.
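Formula (1746) is one line of Matlab; a sketch with an arbitrary consistent fat system:
\begin{verbatim}
% Sketch of (1746): Px = x - pinv(A)*(A*x - b).
m = 3; n = 6;
A = randn(m,n);  b = A*randn(n,1);   % b in R(A) by construction
x = randn(n,1);
Px = x - pinv(A)*(A*x - b);
disp(norm(A*Px - b))                 % Px satisfies the hyperplanes
disp((x - Px)'*null(A))              % error _|_ N(A): row of ~zeros
\end{verbatim}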


E.5.0.0.7 Example. Projection on affine subset, vertex-description.
Suppose now we instead describe the affine subset $\mathcal{A}$ in terms of some given minimal set of generators arranged columnar in $X\in\mathbb{R}^{n\times N}$ (66); id est,
$$
\mathcal{A}\triangleq\operatorname{aff}X=\{Xa\mid a^{\mathrm{T}}\mathbf{1}=1\}\subseteq\mathbb{R}^{n}\tag{1747}
$$
Here minimal set means $XV_{\!\mathcal{N}}=[\,x_2\!-\!x_1\ \ x_3\!-\!x_1\ \cdots\ x_N\!-\!x_1\,]/\sqrt{2}$ (801) is full-rank (§2.4.2.2) where $V_{\!\mathcal{N}}\in\mathbb{R}^{N\times(N-1)}$ is the Schoenberg auxiliary matrix (§B.4.2). Then the orthogonal projection $Px$ of any point $x\in\mathbb{R}^{n}$ on $\mathcal{A}$ is the solution to a minimization problem:
$$
\begin{aligned}
\|Px-x\|_{2}&=\inf_{a^{\mathrm{T}}\mathbf{1}=1}\|Xa-x\|_{2}\\
&=\inf_{\xi\in\mathbb{R}^{N-1}}\|X(V_{\!\mathcal{N}}\xi+a_p)-x\|_{2}
\end{aligned}\tag{1748}
$$
where $a_p$ is any solution to $a^{\mathrm{T}}\mathbf{1}=1$. We find the minimizing argument
$$
\xi^{\star}=-(V_{\!\mathcal{N}}^{\mathrm{T}}X^{\mathrm{T}}XV_{\!\mathcal{N}})^{-1}V_{\!\mathcal{N}}^{\mathrm{T}}X^{\mathrm{T}}(Xa_p-x)\tag{1749}
$$
and so the orthogonal projection is [153, §3]
$$
Px=Xa^{\star}=\big(I-XV_{\!\mathcal{N}}(XV_{\!\mathcal{N}})^{\dagger}\big)Xa_p+XV_{\!\mathcal{N}}(XV_{\!\mathcal{N}})^{\dagger}x\tag{1750}
$$
a projection of point $x$ on $\mathcal{R}(XV_{\!\mathcal{N}})$ then translated perpendicularly with respect to that range until it meets the affine subset $\mathcal{A}$.

E.5.0.0.8 Example. Projecting on hyperplane, halfspace, slab.
Given the hyperplane representation having $b\in\mathbb{R}$ and nonzero normal $a\in\mathbb{R}^{m}$
$$
\partial\mathcal{H}=\{y\mid a^{\mathrm{T}}y=b\}\subset\mathbb{R}^{m}\tag{95}
$$
the orthogonal projection of any point $x\in\mathbb{R}^{m}$ on that hyperplane is
$$
Px=x-a(a^{\mathrm{T}}a)^{-1}(a^{\mathrm{T}}x-b)\tag{1751}
$$
Orthogonal projection of $x$ on the halfspace parametrized by $b\in\mathbb{R}$ and nonzero normal $a\in\mathbb{R}^{m}$
$$
\mathcal{H}_{-}=\{y\mid a^{\mathrm{T}}y\leq b\}\subset\mathbb{R}^{m}\tag{87}
$$
is the point
$$
Px=x-a(a^{\mathrm{T}}a)^{-1}\max\{0,\ a^{\mathrm{T}}x-b\}\tag{1752}
$$
Orthogonal projection of $x$ on the convex slab (Figure 11), for $c<b$
$$
\mathcal{B}\triangleq\{y\mid c\leq a^{\mathrm{T}}y\leq b\}\subset\mathbb{R}^{m}\tag{1753}
$$
is the point [99, §5.1]
$$
Px=x-a(a^{\mathrm{T}}a)^{-1}\big(\max\{0,\ a^{\mathrm{T}}x-b\}-\max\{0,\ c-a^{\mathrm{T}}x\}\big)\tag{1754}
$$
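A minimal Matlab sketch of slab projection (1754); the function handle name and all test values are illustrative:
\begin{verbatim}
% Sketch of (1754): projection on the slab {y | c <= a'y <= b}.
a = [1; 2; -1];  b = 2;  c = -1;
projslab = @(x) x - a*((max(0,a'*x-b) - max(0,c-a'*x))/(a'*a));
x  = 4*randn(3,1);
Px = projslab(x);
disp([c, a'*Px, b])     % a'*Px is clipped into the interval [c, b]
\end{verbatim}
The halfspace case (1752) is recovered by letting $c\to-\infty$, and the hyperplane case (1751) by setting $c=b$.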


E.6 Vectorization interpretation, projection on a matrix

E.6.1 Nonorthogonal projection on a vector

Nonorthogonal projection of vector $x$ on the range of vector $y$ is accomplished using a normalized dyad $P_0$ (§B.1); videlicet,
$$
\frac{\langle z,x\rangle}{\langle z,y\rangle}\,y=\frac{z^{\mathrm{T}}x}{z^{\mathrm{T}}y}\,y=\frac{yz^{\mathrm{T}}}{z^{\mathrm{T}}y}\,x\;\triangleq\;P_0\,x\tag{1755}
$$
where $\langle z,x\rangle/\langle z,y\rangle$ is the coefficient of projection on $y$. Because $P_0^{2}=P_0$ and $\mathcal{R}(P_0)=\mathcal{R}(y)$, rank-one matrix $P_0$ is a nonorthogonal projector projecting on $\mathcal{R}(y)$. The direction of nonorthogonal projection is orthogonal to $z$; id est,
$$
P_0\,x-x\;\perp\;\mathcal{R}(P_0^{\mathrm{T}})\tag{1756}
$$

E.6.2 Nonorthogonal projection on vectorized matrix

Formula (1755) is extensible. Given $X,Y,Z\in\mathbb{R}^{m\times n}$, we have the one-dimensional nonorthogonal projection of $X$ in isomorphic $\mathbb{R}^{mn}$ on the range of vectorized $Y$: (§2.2)
$$
\frac{\langle Z,X\rangle}{\langle Z,Y\rangle}\,Y,\qquad\langle Z,Y\rangle\neq 0\tag{1757}
$$
where $\langle Z,X\rangle/\langle Z,Y\rangle$ is the coefficient of projection. The inequality accounts for the fact: projection on $\mathcal{R}(\operatorname{vec}Y)$ is in a direction orthogonal to $\operatorname{vec}Z$.

E.6.2.1 Nonorthogonal projection on dyad

Now suppose we have nonorthogonal projector dyad
$$
P_0=\frac{yz^{\mathrm{T}}}{z^{\mathrm{T}}y}\in\mathbb{R}^{m\times m}\tag{1758}
$$
Analogous to (1755), for $X\in\mathbb{R}^{m\times m}$
$$
P_0 X P_0=\frac{yz^{\mathrm{T}}}{z^{\mathrm{T}}y}\,X\,\frac{yz^{\mathrm{T}}}{z^{\mathrm{T}}y}=\frac{z^{\mathrm{T}}Xy}{(z^{\mathrm{T}}y)^{2}}\,yz^{\mathrm{T}}=\frac{\langle zy^{\mathrm{T}},X\rangle}{\langle zy^{\mathrm{T}},\,yz^{\mathrm{T}}\rangle}\,yz^{\mathrm{T}}\tag{1759}
$$


is a nonorthogonal projection of matrix $X$ on the range of vectorized dyad $P_0$; from which it follows:
$$
P_0 X P_0=\frac{z^{\mathrm{T}}Xy}{z^{\mathrm{T}}y}\,\frac{yz^{\mathrm{T}}}{z^{\mathrm{T}}y}=\Big\langle\frac{zy^{\mathrm{T}}}{z^{\mathrm{T}}y},\,X\Big\rangle\frac{yz^{\mathrm{T}}}{z^{\mathrm{T}}y}=\langle P_0^{\mathrm{T}},X\rangle\,P_0=\frac{\langle P_0^{\mathrm{T}},X\rangle}{\langle P_0^{\mathrm{T}},P_0\rangle}\,P_0\tag{1760}
$$
Yet this relationship between matrix product and vector inner-product only holds for a dyad projector. When nonsymmetric projector $P_0$ is rank-one as in (1758), therefore,
$$
\mathcal{R}(\operatorname{vec}P_0 X P_0)=\mathcal{R}(\operatorname{vec}P_0)\ \text{in}\ \mathbb{R}^{m^{2}}\tag{1761}
$$
and
$$
P_0 X P_0-X\;\perp\;P_0^{\mathrm{T}}\ \text{in}\ \mathbb{R}^{m^{2}}\tag{1762}
$$

E.6.2.1.1 Example. $\lambda$ as coefficients of nonorthogonal projection.
Any diagonalization (§A.5)
$$
X=S\Lambda S^{-1}=\sum_{i=1}^{m}\lambda_i\,s_i w_i^{\mathrm{T}}\in\mathbb{R}^{m\times m}\tag{1365}
$$
may be expressed as a sum of one-dimensional nonorthogonal projections of $X$, each on the range of a vectorized eigenmatrix $P_j\triangleq s_j w_j^{\mathrm{T}}$:
$$
\begin{aligned}
X&=\sum_{i,j=1}^{m}\langle(S e_i e_j^{\mathrm{T}}S^{-1})^{\mathrm{T}},X\rangle\,S e_i e_j^{\mathrm{T}}S^{-1}\\
&=\sum_{j=1}^{m}\langle(s_j w_j^{\mathrm{T}})^{\mathrm{T}},X\rangle\,s_j w_j^{\mathrm{T}}+\!\!\sum_{\substack{i,j=1\\ j\neq i}}^{m}\!\!\langle(S e_i e_j^{\mathrm{T}}S^{-1})^{\mathrm{T}},\,S\Lambda S^{-1}\rangle\,S e_i e_j^{\mathrm{T}}S^{-1}\\
&=\sum_{j=1}^{m}\langle(s_j w_j^{\mathrm{T}})^{\mathrm{T}},X\rangle\,s_j w_j^{\mathrm{T}}\\
&\triangleq\sum_{j=1}^{m}\langle P_j^{\mathrm{T}},X\rangle\,P_j=\sum_{j=1}^{m}s_j w_j^{\mathrm{T}}X s_j w_j^{\mathrm{T}}\\
&=\sum_{j=1}^{m}\lambda_j\,s_j w_j^{\mathrm{T}}=\sum_{j=1}^{m}P_j X P_j
\end{aligned}\tag{1763}
$$
This biorthogonal expansion of matrix $X$ is a sum of nonorthogonal projections because the term outside the projection coefficient $\langle\,\rangle$ is not


identical to the inside-term. (§E.6.4) The eigenvalues $\lambda_j$ are coefficients of nonorthogonal projection of $X$, while the remaining $M(M-1)/2$ coefficients (for $i\neq j$) are zeroed by projection. When $P_j$ is rank-one as in (1763),
$$
\mathcal{R}(\operatorname{vec}P_j X P_j)=\mathcal{R}(\operatorname{vec}s_j w_j^{\mathrm{T}})=\mathcal{R}(\operatorname{vec}P_j)\ \text{in}\ \mathbb{R}^{m^{2}}\tag{1764}
$$
and
$$
P_j X P_j-X\;\perp\;P_j^{\mathrm{T}}\ \text{in}\ \mathbb{R}^{m^{2}}\tag{1765}
$$
Were matrix $X$ symmetric, then its eigenmatrices would also be. So the one-dimensional projections would become orthogonal. (§E.6.4.1.1)

E.6.3 Orthogonal projection on a vector

The formula for orthogonal projection of vector $x$ on the range of vector $y$ (one-dimensional projection) is basic analytic geometry; [11, §3.3] [251, §3.2] [277, §2.2] [290, §1-8]
$$
\frac{\langle y,x\rangle}{\langle y,y\rangle}\,y=\frac{y^{\mathrm{T}}x}{y^{\mathrm{T}}y}\,y=\frac{yy^{\mathrm{T}}}{y^{\mathrm{T}}y}\,x\;\triangleq\;P_1\,x\tag{1766}
$$
where $\langle y,x\rangle/\langle y,y\rangle$ is the coefficient of projection on $\mathcal{R}(y)$. An equivalent description is: Vector $P_1 x$ is the orthogonal projection of vector $x$ on $\mathcal{R}(P_1)=\mathcal{R}(y)$. Rank-one matrix $P_1$ is a projection matrix because $P_1^{2}=P_1$. The direction of projection is orthogonal
$$
P_1\,x-x\;\perp\;\mathcal{R}(P_1)\tag{1767}
$$
because $P_1^{\mathrm{T}}=P_1$.

E.6.4 Orthogonal projection on a vectorized matrix

From (1766), given instead $X,Y\in\mathbb{R}^{m\times n}$, we have the one-dimensional orthogonal projection of matrix $X$ in isomorphic $\mathbb{R}^{mn}$ on the range of vectorized $Y$: (§2.2)
$$
\frac{\langle Y,X\rangle}{\langle Y,Y\rangle}\,Y\tag{1768}
$$
where $\langle Y,X\rangle/\langle Y,Y\rangle$ is the coefficient of projection.

For orthogonal projection, the term outside the vector inner-products $\langle\,\rangle$ must be identical to the terms inside in three places.


E.6.4.1 Orthogonal projection on dyad

There is opportunity for insight when $Y$ is a dyad $yz^{\mathrm{T}}$ (§B.1): Instead given $X\in\mathbb{R}^{m\times n}$, $y\in\mathbb{R}^{m}$, and $z\in\mathbb{R}^{n}$
$$
\frac{\langle yz^{\mathrm{T}},X\rangle}{\langle yz^{\mathrm{T}},\,yz^{\mathrm{T}}\rangle}\,yz^{\mathrm{T}}=\frac{y^{\mathrm{T}}Xz}{y^{\mathrm{T}}y\,z^{\mathrm{T}}z}\,yz^{\mathrm{T}}\tag{1769}
$$
is the one-dimensional orthogonal projection of $X$ in isomorphic $\mathbb{R}^{mn}$ on the range of vectorized $yz^{\mathrm{T}}$. To reveal the obscured symmetric projection matrices $P_1$ and $P_2$ we rewrite (1769):
$$
\frac{y^{\mathrm{T}}Xz}{y^{\mathrm{T}}y\,z^{\mathrm{T}}z}\,yz^{\mathrm{T}}=\frac{yy^{\mathrm{T}}}{y^{\mathrm{T}}y}\,X\,\frac{zz^{\mathrm{T}}}{z^{\mathrm{T}}z}\;\triangleq\;P_1 X P_2\tag{1770}
$$
So for projector dyads, projection (1770) is the orthogonal projection in $\mathbb{R}^{mn}$ if and only if projectors $P_1$ and $P_2$ are symmetric;$^{\text{E.9}}$ in other words, for orthogonal projection on the range of a vectorized dyad $yz^{\mathrm{T}}$, the term outside the vector inner-products $\langle\,\rangle$ in (1769) must be identical to the terms inside in three places.

When $P_1$ and $P_2$ are rank-one symmetric projectors as in (1770), (30)
$$
\mathcal{R}(\operatorname{vec}P_1 X P_2)=\mathcal{R}(\operatorname{vec}yz^{\mathrm{T}})\ \text{in}\ \mathbb{R}^{mn}\tag{1771}
$$
and
$$
P_1 X P_2-X\;\perp\;yz^{\mathrm{T}}\ \text{in}\ \mathbb{R}^{mn}\tag{1772}
$$
When $y=z$ then $P_1=P_2=P_2^{\mathrm{T}}$ and
$$
P_1 X P_1=\langle P_1,X\rangle\,P_1=\frac{\langle P_1,X\rangle}{\langle P_1,P_1\rangle}\,P_1\tag{1773}
$$

E.9 For diagonalizable $X\in\mathbb{R}^{m\times m}$ (§A.5), its orthogonal projection in isomorphic $\mathbb{R}^{m^{2}}$ on the range of vectorized $yz^{\mathrm{T}}\in\mathbb{R}^{m\times m}$ becomes:
$$
P_1 X P_2=\sum_{i=1}^{m}\lambda_i\,P_1 s_i w_i^{\mathrm{T}}P_2
$$
When $\mathcal{R}(P_1)=\mathcal{R}(w_j)$ and $\mathcal{R}(P_2)=\mathcal{R}(s_j)$, the $j^{\text{th}}$ dyad term from the diagonalization is isolated but only, in general, to within a scale factor because neither set of left or right eigenvectors is necessarily orthonormal unless $X$ is normal [303, §3.2]. Yet when $\mathcal{R}(P_2)=\mathcal{R}(s_k)$, $k\neq j\in\{1\ldots m\}$, then $P_1 X P_2=0$.


meaning, $P_1 X P_1$ is equivalent to orthogonal projection of matrix $X$ on the range of vectorized projector dyad $P_1$. Yet this relationship between matrix product and vector inner-product does not hold for general symmetric projector matrices.

E.6.4.1.1 Example. Eigenvalues $\lambda$ as coefficients of orthogonal projection.
Let $\mathcal{C}$ represent any convex subset of subspace $\mathbb{S}^{M}$, and let $C_1$ be any element of $\mathcal{C}$. Then $C_1$ can be expressed as the orthogonal expansion
$$
C_1=\sum_{i=1}^{M}\sum_{\substack{j=1\\ j\geq i}}^{M}\langle E_{ij},\,C_1\rangle\,E_{ij}\;\in\mathcal{C}\subset\mathbb{S}^{M}\tag{1774}
$$
where $E_{ij}\in\mathbb{S}^{M}$ is a member of the standard orthonormal basis for $\mathbb{S}^{M}$ (50). This expansion is a sum of one-dimensional orthogonal projections of $C_1$; each projection on the range of a vectorized standard basis matrix. Vector inner-product $\langle E_{ij},C_1\rangle$ is the coefficient of projection of $\operatorname{svec}C_1$ on $\mathcal{R}(\operatorname{svec}E_{ij})$.

When $C_1$ is any member of a convex set $\mathcal{C}$ whose dimension is $L$, Carathéodory's theorem [77] [232] [148] [29] [30] guarantees that no more than $L+1$ affinely independent members from $\mathcal{C}$ are required to faithfully represent $C_1$ by their linear combination.$^{\text{E.10}}$

Dimension of $\mathbb{S}^{M}$ is $L=M(M+1)/2$ in isometrically isomorphic $\mathbb{R}^{M(M+1)/2}$. Yet because any symmetric matrix can be diagonalized, (§A.5.2) $C_1\in\mathbb{S}^{M}$ is a linear combination of its $M$ eigenmatrices $q_i q_i^{\mathrm{T}}$ (§A.5.1) weighted by its eigenvalues $\lambda_i$;
$$
C_1=Q\Lambda Q^{\mathrm{T}}=\sum_{i=1}^{M}\lambda_i\,q_i q_i^{\mathrm{T}}\tag{1775}
$$
where $\Lambda\in\mathbb{S}^{M}$ is a diagonal matrix having $\delta(\Lambda)_i=\lambda_i$, and $Q=[\,q_1\ \cdots\ q_M\,]$ is an orthogonal matrix in $\mathbb{R}^{M\times M}$ containing corresponding eigenvectors.

To derive eigen decomposition (1775) from expansion (1774), $M$ standard basis matrices $E_{ij}$ are rotated (§B.5) into alignment with the $M$ eigenmatrices

E.10 Carathéodory's theorem guarantees existence of a biorthogonal expansion for any element in $\operatorname{aff}\mathcal{C}$ when $\mathcal{C}$ is any pointed closed convex cone.


$q_i q_i^{\mathrm{T}}$ of $C_1$ by applying a similarity transformation; [251, §5.6]
$$
\{QE_{ij}Q^{\mathrm{T}}\}=
\begin{cases}
q_i q_i^{\mathrm{T}}, & i=j=1\ldots M\\[2pt]
\tfrac{1}{\sqrt{2}}\big(q_i q_j^{\mathrm{T}}+q_j q_i^{\mathrm{T}}\big), & 1\leq i<j\leq M
\end{cases}\tag{1776}
$$
which remains an orthonormal basis for $\mathbb{S}^{M}$. Then remarkably
$$
\begin{aligned}
C_1&=\sum_{\substack{i,j=1\\ j\geq i}}^{M}\langle QE_{ij}Q^{\mathrm{T}},\,C_1\rangle\,QE_{ij}Q^{\mathrm{T}}\\
&=\sum_{i=1}^{M}\langle q_i q_i^{\mathrm{T}},\,C_1\rangle\,q_i q_i^{\mathrm{T}}+\!\!\sum_{\substack{i,j=1\\ j>i}}^{M}\!\!\langle QE_{ij}Q^{\mathrm{T}},\,Q\Lambda Q^{\mathrm{T}}\rangle\,QE_{ij}Q^{\mathrm{T}}\\
&=\sum_{i=1}^{M}\langle q_i q_i^{\mathrm{T}},\,C_1\rangle\,q_i q_i^{\mathrm{T}}\\
&\triangleq\sum_{i=1}^{M}\langle P_i,\,C_1\rangle\,P_i=\sum_{i=1}^{M}q_i q_i^{\mathrm{T}}C_1\,q_i q_i^{\mathrm{T}}\\
&=\sum_{i=1}^{M}\lambda_i\,q_i q_i^{\mathrm{T}}=\sum_{i=1}^{M}P_i\,C_1\,P_i
\end{aligned}\tag{1777}
$$
this orthogonal expansion becomes the diagonalization; still a sum of one-dimensional orthogonal projections. The eigenvalues
$$
\lambda_i=\langle q_i q_i^{\mathrm{T}},\,C_1\rangle\tag{1778}
$$
are clearly coefficients of projection of $C_1$ on the range of each vectorized eigenmatrix. (confer §E.6.2.1.1) The remaining $M(M-1)/2$ coefficients ($i\neq j$) are zeroed by projection. When $P_i$ is rank-one symmetric as in (1777),
$$
\mathcal{R}(\operatorname{svec}P_i C_1 P_i)=\mathcal{R}(\operatorname{svec}q_i q_i^{\mathrm{T}})=\mathcal{R}(\operatorname{svec}P_i)\ \text{in}\ \mathbb{R}^{M(M+1)/2}\tag{1779}
$$
and
$$
P_i C_1 P_i-C_1\;\perp\;P_i\ \text{in}\ \mathbb{R}^{M(M+1)/2}\tag{1780}
$$

E.6.4.2 Positive semidefiniteness test as orthogonal projection

For any given $X\in\mathbb{R}^{m\times m}$ the familiar quadratic construct $y^{\mathrm{T}}Xy\geq 0$, over broad domain, is a fundamental test for positive semidefiniteness.


(§A.2) It is a fact that $y^{\mathrm{T}}Xy$ is always proportional to a coefficient of orthogonal projection; letting $z$ in formula (1769) become $y\in\mathbb{R}^{m}$, then $P_2=P_1=yy^{\mathrm{T}}/y^{\mathrm{T}}y=yy^{\mathrm{T}}/\|yy^{\mathrm{T}}\|_{2}$ (confer (1421)) and formula (1770) becomes
$$
\frac{\langle yy^{\mathrm{T}},X\rangle}{\langle yy^{\mathrm{T}},\,yy^{\mathrm{T}}\rangle}\,yy^{\mathrm{T}}=\frac{y^{\mathrm{T}}Xy}{y^{\mathrm{T}}y}\,\frac{yy^{\mathrm{T}}}{y^{\mathrm{T}}y}=\frac{yy^{\mathrm{T}}}{y^{\mathrm{T}}y}\,X\,\frac{yy^{\mathrm{T}}}{y^{\mathrm{T}}y}\;\triangleq\;P_1 X P_1\tag{1781}
$$
By (1768), product $P_1 X P_1$ is the one-dimensional orthogonal projection of $X$ in isomorphic $\mathbb{R}^{m^{2}}$ on the range of vectorized $P_1$ because, for $\operatorname{rank}P_1=1$ and $P_1^{2}=P_1\in\mathbb{S}^{m}$ (confer (1760))
$$
P_1 X P_1=\frac{y^{\mathrm{T}}Xy}{y^{\mathrm{T}}y}\,\frac{yy^{\mathrm{T}}}{y^{\mathrm{T}}y}=\Big\langle\frac{yy^{\mathrm{T}}}{y^{\mathrm{T}}y},\,X\Big\rangle\frac{yy^{\mathrm{T}}}{y^{\mathrm{T}}y}=\langle P_1,X\rangle\,P_1=\frac{\langle P_1,X\rangle}{\langle P_1,P_1\rangle}\,P_1\tag{1782}
$$
The coefficient of orthogonal projection $\langle P_1,X\rangle=y^{\mathrm{T}}Xy/(y^{\mathrm{T}}y)$ is also known as Rayleigh's quotient.$^{\text{E.11}}$ When $P_1$ is rank-one symmetric as in (1781),
$$
\mathcal{R}(\operatorname{vec}P_1 X P_1)=\mathcal{R}(\operatorname{vec}P_1)\ \text{in}\ \mathbb{R}^{m^{2}}\tag{1783}
$$
and
$$
P_1 X P_1-X\;\perp\;P_1\ \text{in}\ \mathbb{R}^{m^{2}}\tag{1784}
$$

E.11 When $y$ becomes the $j^{\text{th}}$ eigenvector $s_j$ of diagonalizable $X$, for example, $\langle P_1,X\rangle$ becomes the $j^{\text{th}}$ eigenvalue: [145, §III]
$$
\langle P_1,X\rangle\big|_{y=s_j}=\frac{s_j^{\mathrm{T}}\Big(\sum\limits_{i=1}^{m}\lambda_i s_i w_i^{\mathrm{T}}\Big)s_j}{s_j^{\mathrm{T}}s_j}=\lambda_j
$$
Similarly for $y=w_j$, the $j^{\text{th}}$ left-eigenvector,
$$
\langle P_1,X\rangle\big|_{y=w_j}=\frac{w_j^{\mathrm{T}}\Big(\sum\limits_{i=1}^{m}\lambda_i s_i w_i^{\mathrm{T}}\Big)w_j}{w_j^{\mathrm{T}}w_j}=\lambda_j
$$
A quandary may arise regarding the potential annihilation of the antisymmetric part of $X$ when $s_j^{\mathrm{T}}Xs_j$ is formed. Were annihilation to occur, it would imply the eigenvalue thus found came instead from the symmetric part of $X$. The quandary is resolved recognizing that diagonalization of real $X$ admits complex eigenvectors; hence, annihilation could only come about by forming $\operatorname{Re}(s_j^{\mathrm{H}}Xs_j)=s_j^{\mathrm{H}}(X+X^{\mathrm{T}})s_j/2$ [150, §7.1] where $(X+X^{\mathrm{T}})/2$ is the symmetric part of $X$, and $s_j^{\mathrm{H}}$ denotes the conjugate transpose.


The test for positive semidefiniteness, then, is a test for nonnegativity of the coefficient of orthogonal projection of X on the range of each and every vectorized extreme direction yyᵀ (§2.8.1) from the positive semidefinite cone in the ambient space of symmetric matrices.

E.6.4.3 PXP ≽ 0

In some circumstances, it may be desirable to limit the domain of test yᵀXy ≥ 0 for positive semidefiniteness; e.g., ‖y‖ = 1. Another example of limiting domain-of-test is central to Euclidean distance geometry: For R(V) = N(1ᵀ), the test −VDV ≽ 0 determines whether D ∈ S^N_h is a Euclidean distance matrix. The same test may be stated: For D ∈ S^N_h (and optionally ‖y‖ = 1)

$$D \in \mathrm{EDM}^N \;\Leftrightarrow\; -y^TDy \,=\, \langle yy^T,\,-D\rangle \,\geq\, 0\ \ \forall\, y \in R(V) \tag{1785}$$

The test −VDV ≽ 0 is therefore equivalent to a test for nonnegativity of the coefficient of orthogonal projection of −D on the range of each and every vectorized extreme direction yyᵀ from the positive semidefinite cone S^N_+ such that R(yyᵀ) = R(y) ⊆ R(V). (The validity of this result is independent of whether V is itself a projection matrix.)

E.7 Projection on vectorized matrices of higher rank

E.7.1 PXP misinterpretation for higher-rank P

For a projection matrix P of rank greater than 1, PXP is generally not commensurate with (〈P,X〉/〈P,P〉)P as is the case for projector dyads (1782). Yet for a symmetric idempotent matrix P of any rank we are tempted to say "PXP is the orthogonal projection of X ∈ S^m on R(vec P)". The fallacy is: vec PXP does not necessarily belong to the range of vectorized P; the most basic requirement for projection on R(vec P).

E.7.2 Orthogonal projection on matrix subspaces

With A₁ ∈ R^{m×n}, B₁ ∈ R^{n×k}, Z₁ ∈ R^{m×k}, A₂ ∈ R^{p×n}, B₂ ∈ R^{n×k}, Z₂ ∈ R^{p×k} as defined for nonorthogonal projector (1680), and defining

$$P_1 \triangleq A_1 A_1^\dagger \in \mathbb{S}^m\,,\qquad P_2 \triangleq A_2 A_2^\dagger \in \mathbb{S}^p \tag{1786}$$


then, given compatible X

$$\|X - P_1 X P_2\|_F \,=\, \inf_{B_1,\,B_2\,\in\,\mathbb{R}^{n\times k}} \|X - A_1(A_1^\dagger + B_1 Z_1^T)\,X\,(A_2^{\dagger T} + Z_2 B_2^T)A_2^T\|_F \tag{1787}$$

As for all subspace projectors, range of the projector is the subspace on which projection is made: {P₁YP₂ | Y ∈ R^{m×p}}. Altogether, for projectors P₁ and P₂ of any rank, this means projection P₁XP₂ is unique minimum-distance, orthogonal

$$P_1 X P_2 - X \;\perp\; \{P_1 Y P_2 \mid Y \in \mathbb{R}^{m\times p}\}\ \ \text{in}\ \mathbb{R}^{mp} \tag{1788}$$

and P₁ and P₂ must each be symmetric (confer (1770)) to attain the infimum.

E.7.2.0.1 Proof. Minimum Frobenius norm (1787).
Defining P ≜ A₁(A₁† + B₁Z₁ᵀ),

$$\begin{aligned}
&\inf_{B_1,B_2} \|X - A_1(A_1^\dagger + B_1 Z_1^T)X(A_2^{\dagger T} + Z_2 B_2^T)A_2^T\|_F^2\\
&=\, \inf_{B_1,B_2} \|X - PX(A_2^{\dagger T} + Z_2 B_2^T)A_2^T\|_F^2\\
&=\, \inf_{B_1,B_2} \operatorname{tr}\Bigl((X^T - A_2(A_2^\dagger + B_2 Z_2^T)X^TP^T)(X - PX(A_2^{\dagger T} + Z_2 B_2^T)A_2^T)\Bigr)\\
&=\, \inf_{B_1,B_2} \operatorname{tr}\Bigl(X^TX - X^TPX(A_2^{\dagger T}+Z_2 B_2^T)A_2^T - A_2(A_2^\dagger + B_2 Z_2^T)X^TP^TX\\
&\qquad\qquad\quad +\, A_2(A_2^\dagger + B_2 Z_2^T)X^TP^TPX(A_2^{\dagger T} + Z_2 B_2^T)A_2^T\Bigr)
\end{aligned} \tag{1789}$$

Necessary conditions for a global minimum are ∇_{B₁} = 0 and ∇_{B₂} = 0. Terms not containing B₂ in (1789) will vanish from gradient ∇_{B₂}; (§D.2.3)

$$\begin{aligned}
&\nabla_{B_2}\operatorname{tr}\Bigl(-X^TPXZ_2B_2^TA_2^T - A_2B_2Z_2^TX^TP^TX + A_2A_2^\dagger X^TP^TPXZ_2B_2^TA_2^T\\
&\qquad\quad +\, A_2B_2Z_2^TX^TP^TPXA_2^{\dagger T}A_2^T + A_2B_2Z_2^TX^TP^TPXZ_2B_2^TA_2^T\Bigr)\\
&=\, -2A_2^TX^TPXZ_2 + 2A_2^TA_2A_2^\dagger X^TP^TPXZ_2 + 2A_2^TA_2B_2Z_2^TX^TP^TPXZ_2\\
&=\, A_2^T\bigl(-X^T + A_2A_2^\dagger X^TP^T + A_2B_2Z_2^TX^TP^T\bigr)\,2PXZ_2\\
&=\, 0 \;\Leftrightarrow\; R(B_1)\subseteq N(A_1)\ \text{and}\ R(B_2)\subseteq N(A_2)\ (\text{or } Z_2 = 0)
\end{aligned} \tag{1790}$$

because Aᵀ = AᵀAA†. Symmetry requirement (1786) is implicit. Were instead Pᵀ ≜ (A₂†ᵀ + Z₂B₂ᵀ)A₂ᵀ and the gradient with respect to B₁ observed, then similar results are obtained. The projector is unique. ∎


Perpendicularity (1788) establishes uniqueness [73, §4.9] of projection P₁XP₂ on a matrix subspace. The minimum-distance projector is the orthogonal projector, and vice versa.

E.7.2.0.2 Example. PXP redux & N(V).
Suppose we define a subspace of m×n matrices, each elemental matrix having columns constituting a list whose geometric center (§5.5.1.0.1) is the origin in R^m:

$$\begin{aligned}
\mathbb{R}^{m\times n}_c &\triangleq\, \{Y \in \mathbb{R}^{m\times n} \mid Y\mathbf{1} = 0\}\\
&=\, \{Y \in \mathbb{R}^{m\times n} \mid N(Y) \supseteq \mathbf{1}\} \,=\, \{Y \in \mathbb{R}^{m\times n} \mid R(Y^T) \subseteq N(\mathbf{1}^T)\}\\
&=\, \{XV \mid X \in \mathbb{R}^{m\times n}\} \;\subset\; \mathbb{R}^{m\times n}
\end{aligned} \tag{1791}$$

the nonsymmetric geometric center subspace. Further suppose V ∈ S^n is a projection matrix having N(V) = R(1) and R(V) = N(1ᵀ). Then linear mapping T(X) = XV is the orthogonal projection of any X ∈ R^{m×n} on R^{m×n}_c in the Euclidean (vectorization) sense because V is symmetric, N(XV) ⊇ 1, and R(VXᵀ) ⊆ N(1ᵀ).
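A quick numerical check of this mapping in Matlab (core language; names illustrative), taking V = I − 11ᵀ/n, the geometric centering matrix of §F.1.1.3:

% X -> X*V is orthogonal projection on the geometric center
% subspace R^{mxn}_c (1791): the result annihilates 1, and the
% residual X - X*V is orthogonal to the projection.
m = 3;  n = 5;
V = eye(n) - ones(n)/n;            % projector with N(V)=R(1), R(V)=N(1')
X = randn(m,n);
P = X*V;
norm(P*ones(n,1))                  % ~0 : P1 = 0, so P belongs to R^{mxn}_c
trace((X - P)'*P)                  % ~0 : residual orthogonal to projection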


Now suppose we define a subspace of symmetric n×n matrices each of whose columns constitute a list having the origin in R^n as geometric center,

$$\begin{aligned}
\mathbb{S}^n_c &\triangleq\, \{Y \in \mathbb{S}^n \mid Y\mathbf{1} = 0\}\\
&=\, \{Y \in \mathbb{S}^n \mid N(Y) \supseteq \mathbf{1}\} \,=\, \{Y \in \mathbb{S}^n \mid R(Y) \subseteq N(\mathbf{1}^T)\}
\end{aligned} \tag{1792}$$

the geometric center subspace. Further suppose V ∈ S^n is a projection matrix, the same as before. Then VXV is the orthogonal projection of any X ∈ S^n on S^n_c in the Euclidean sense (1788) because V is symmetric, VXV1 = 0, and R(VXV) ⊆ N(1ᵀ). Two-sided projection is necessary only to remain in the ambient symmetric matrix subspace. Then

$$\mathbb{S}^n_c \,=\, \{VXV \mid X \in \mathbb{S}^n\} \;\subset\; \mathbb{S}^n \tag{1793}$$

has dim S^n_c = n(n−1)/2 in isomorphic R^{n(n+1)/2}. We find its orthogonal complement as the aggregate of all negative directions of orthogonal projection on S^n_c: the translation-invariant subspace (§5.5.1.1)

$$\begin{aligned}
\mathbb{S}^{n\perp}_c &\triangleq\, \{X - VXV \mid X \in \mathbb{S}^n\} \;\subset\; \mathbb{S}^n\\
&=\, \{u\mathbf{1}^T + \mathbf{1}u^T \mid u \in \mathbb{R}^n\}
\end{aligned} \tag{1794}$$

characterized by the doublet u1ᵀ + 1uᵀ (§B.2).^{E.12} Defining the geometric center mapping V(X) = −VXV·½ consistently with (830), then N(V) = R(I − V) on domain S^n analogously to vector projectors (§E.2); id est,

$$N(\mathcal{V}) \,=\, \mathbb{S}^{n\perp}_c \tag{1795}$$

a subspace of S^n whose dimension is dim S^{n⊥}_c = n in isomorphic R^{n(n+1)/2}. Intuitively, operator V is an orthogonal projector; any argument duplicitously in its range is a fixed point. So, this symmetric operator's nullspace must be orthogonal to its range.

Now compare the subspace of symmetric matrices having all zeros in the first row and column

$$\begin{aligned}
\mathbb{S}^n_1 &\triangleq\, \{Y \in \mathbb{S}^n \mid Y e_1 = 0\}\\
&=\, \Bigl\{\begin{bmatrix}0 & 0^T\\ 0 & I\end{bmatrix} X \begin{bmatrix}0 & 0^T\\ 0 & I\end{bmatrix} \;\Big|\; X \in \mathbb{S}^n\Bigr\}\\
&=\, \bigl\{[\,0\ \ \sqrt{2}V_{\!N}\,]^T\, Z\, [\,0\ \ \sqrt{2}V_{\!N}\,] \mid Z \in \mathbb{S}^N\bigr\}
\end{aligned} \tag{1796}$$

where P = [0 0ᵀ; 0 I] is an orthogonal projector. Then, similarly, PXP is the orthogonal projection of any X ∈ S^n on S^n_1 in the Euclidean sense (1788), and

$$\begin{aligned}
\mathbb{S}^{n\perp}_1 &\triangleq\, \Bigl\{X - \begin{bmatrix}0 & 0^T\\ 0 & I\end{bmatrix} X \begin{bmatrix}0 & 0^T\\ 0 & I\end{bmatrix} \;\Big|\; X \in \mathbb{S}^n\Bigr\} \;\subset\; \mathbb{S}^n\\
&=\, \bigl\{u e_1^T + e_1 u^T \mid u \in \mathbb{R}^n\bigr\}
\end{aligned} \tag{1797}$$

Obviously, S^n_1 ⊕ S^{n⊥}_1 = S^n.

E.12 Proof.

$$\begin{aligned}
\{X - VXV \mid X \in \mathbb{S}^n\} &= \{X - (I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^T)X(I - \mathbf{1}\mathbf{1}^T\tfrac{1}{n}) \mid X \in \mathbb{S}^n\}\\
&= \{\tfrac{1}{n}\mathbf{1}\mathbf{1}^TX + X\mathbf{1}\mathbf{1}^T\tfrac{1}{n} - \tfrac{1}{n}\mathbf{1}\mathbf{1}^TX\mathbf{1}\mathbf{1}^T\tfrac{1}{n} \mid X \in \mathbb{S}^n\}
\end{aligned}$$

Because {X1 | X ∈ S^n} = R^n,

$$\begin{aligned}
\{X - VXV \mid X \in \mathbb{S}^n\} &= \{\mathbf{1}\zeta^T + \zeta\mathbf{1}^T - \mathbf{1}\mathbf{1}^T(\mathbf{1}^T\zeta\,\tfrac{1}{n}) \mid \zeta \in \mathbb{R}^n\}\\
&= \{\mathbf{1}\zeta^T(I - \tfrac{1}{2n}\mathbf{1}\mathbf{1}^T) + (I - \tfrac{1}{2n}\mathbf{1}\mathbf{1}^T)\zeta\mathbf{1}^T \mid \zeta \in \mathbb{R}^n\}
\end{aligned}$$

where I − (1/2n)11ᵀ is invertible. ∎


E.8 Range/Rowspace interpretation

For idempotent matrices P₁ and P₂ of any rank, P₁XP₂ᵀ is a projection of R(X) on R(P₁) and a projection of R(Xᵀ) on R(P₂): For any given X = UΣQᵀ ∈ R^{m×p}, as in compact singular value decomposition (1376),

$$P_1 X P_2^T \,=\, \sum_{i=1}^{\eta} \sigma_i\, P_1 u_i q_i^T P_2^T \,=\, \sum_{i=1}^{\eta} \sigma_i\, P_1 u_i (P_2 q_i)^T \tag{1798}$$

where η ≜ min{m, p}. Recall u_i ∈ R(X) and q_i ∈ R(Xᵀ) when the corresponding singular value σ_i is nonzero. (§A.6.1) So P₁ projects u_i on R(P₁) while P₂ projects q_i on R(P₂); id est, the range and rowspace of any X are respectively projected on the ranges of P₁ and P₂.^{E.13}

E.9 Projection on convex set

Thus far we have discussed only projection on subspaces. Now we generalize, considering projection on arbitrary convex sets in Euclidean space; convex because projection is, then, unique minimum-distance and a convex optimization problem:

For projection P_C x of point x on any closed set C ⊆ R^n it is obvious:

$$\mathcal{C} \,=\, \{P_{\mathcal C}\,x \mid x \in \mathbb{R}^n\} \tag{1799}$$

If C ⊆ R^n is a closed convex set, then for each and every x ∈ R^n there exists a unique point Px belonging to C that is closest to x in the Euclidean sense. Like (1702), unique projection Px (or P_C x) of a point x on convex set C is that point in C closest to x; [183, §3.12]

$$\|x - Px\|_2 \,=\, \inf_{y\,\in\,\mathcal C} \|x - y\|_2 \tag{1800}$$

There exists a converse:

E.13 When P₁ and P₂ are symmetric and R(P₁) = R(u_j) and R(P₂) = R(q_j), then the jth dyad term from the singular value decomposition of X is isolated by the projection. Yet if R(P₂) = R(q_l), l ≠ j ∈ {1...η}, then P₁XP₂ = 0.


E.9.0.0.1 Theorem. (Bunt-Motzkin) Convex set if projections unique. [282, §7.5] [146] If C ⊆ R^n is a nonempty closed set and if for each and every x in R^n there is a unique Euclidean projection Px of x on C belonging to C, then C is convex. ⋄

Borwein & Lewis propose, for closed convex set C [41, §3.3, exer.12(d)]

$$\nabla\|x - Px\|_2^2 \,=\, 2(x - Px) \tag{1801}$$

for any point x whereas, for x ∉ C,

$$\nabla\|x - Px\|_2 \,=\, (x - Px)\,\|x - Px\|_2^{-1} \tag{1802}$$

E.9.0.0.2 Exercise. Norm gradient.
Prove (1801) and (1802). (Not proved in [41].)

A well-known equivalent characterization of projection on a convex set is a generalization of the perpendicularity condition (1701) for projection on a subspace:

E.9.1 Dual interpretation of projection on convex set

E.9.1.0.1 Definition. Normal vector. [232, p.15]
Vector z is normal to convex set C at point Px ∈ C if

$$\langle z\,,\,y - Px\rangle \,\leq\, 0\quad \forall\, y \in \mathcal{C} \tag{1803}$$

△

A convex set has a nonzero normal at each of its boundary points. [232, p.100] Hence, the normal or dual interpretation of projection:

E.9.1.0.2 Theorem. Unique minimum-distance projection. [148, §A.3.1] [183, §3.12] [73, §4.1] [56] (Figure 129(b), p.646) Given a closed convex set C ⊆ R^n, point Px is the unique projection of a given point x ∈ R^n on C (Px is that point in C nearest x) if and only if

$$Px \in \mathcal{C}\,,\qquad \langle x - Px\,,\,y - Px\rangle \,\leq\, 0\quad \forall\, y \in \mathcal{C} \tag{1804}$$

⋄


As for subspace projection, operator P is idempotent in the sense: for each and every x ∈ R^n, P(Px) = Px. Yet operator P is not linear; projector P is a linear operator if and only if convex set C (on which projection is made) is a subspace. (§E.4)

E.9.1.0.3 Theorem. Unique projection via normal cone.^{E.14} [73, §4.3]
Given closed convex set C ⊆ R^n, point Px is the unique projection of a given point x ∈ R^n on C if and only if

$$Px \in \mathcal{C}\,,\qquad Px - x \,\in\, (\mathcal{C} - Px)^* \tag{1805}$$

In other words, Px is that point in C nearest x if and only if Px − x belongs to that cone dual to translate C − Px. ⋄

E.9.1.1 Dual interpretation as optimization

Deutsch [76, thm.2.3] [75, §2] and Luenberger [183, p.134] carry forward Nirenberg's dual interpretation of projection [207] as solution to a maximization problem: Minimum distance from a point x ∈ R^n to a convex set C ⊂ R^n can be found by maximizing distance from x to hyperplane ∂H over the set of all hyperplanes separating x from C. Existence of a separating hyperplane (§2.4.2.7) presumes point x lies on the boundary or exterior to set C.

The optimal separating hyperplane is characterized by the fact it also supports C. Any hyperplane supporting C (Figure 22(a)) has form

$$\partial\mathcal{H}_- \,=\, \{y \in \mathbb{R}^n \mid a^Ty = \sigma_{\mathcal C}(a)\} \tag{109}$$

where the support function is convex, defined

$$\sigma_{\mathcal C}(a) \,\triangleq\, \sup_{z\,\in\,\mathcal C} a^Tz \tag{459}$$

When point x is finite and set C contains finite points, under this projection interpretation, if the supporting hyperplane is a separating hyperplane then the support function is finite. From Example E.5.0.0.8, projection P_{∂H−}x of x on any given supporting hyperplane ∂H− is

$$P_{\partial\mathcal{H}_-}x \,=\, x - a(a^Ta)^{-1}\bigl(a^Tx - \sigma_{\mathcal C}(a)\bigr) \tag{1806}$$

E.14 −(C − Px)* is the normal cone to set C at point Px. (§E.10.3.2)


[Figure 125: Dual interpretation of projection of point x on convex set C in R². (a) κ = (aᵀa)⁻¹(aᵀx − σ_C(a)). (b) Minimum distance from x to C is found by maximizing distance to all hyperplanes supporting C and separating it from x. A convex problem for any convex set; distance of maximization is unique. Panels show C, P_C x, supporting hyperplane ∂H−, halfspaces H− and H+, normal a scaled by κ, and projection P_{∂H−}x.]


With reference to Figure 125, identifying

$$\mathcal{H}_+ \,=\, \{y \in \mathbb{R}^n \mid a^Ty \geq \sigma_{\mathcal C}(a)\} \tag{88}$$

then

$$\|x - P_{\mathcal C}\,x\| \,=\, \sup_{\partial\mathcal{H}_-\,\mid\,x\in\mathcal{H}_+} \|x - P_{\partial\mathcal{H}_-}x\| \,=\, \sup_{a\,\mid\,x\in\mathcal{H}_+} \|a(a^Ta)^{-1}(a^Tx - \sigma_{\mathcal C}(a))\| \,=\, \sup_{a\,\mid\,x\in\mathcal{H}_+} \frac{1}{\|a\|}\,|a^Tx - \sigma_{\mathcal C}(a)| \tag{1807}$$

which can be expressed as a convex optimization, for arbitrary positive constant τ

$$\|x - P_{\mathcal C}\,x\| \,=\, \frac{1}{\tau}\ \begin{array}[t]{ll}\underset{a}{\text{maximize}} & a^Tx - \sigma_{\mathcal C}(a)\\ \text{subject to} & \|a\| \leq \tau\end{array} \tag{1808}$$

The unique minimum-distance projection on convex set C is therefore

$$P_{\mathcal C}\,x \,=\, x - a^\star\bigl(a^{\star T}x - \sigma_{\mathcal C}(a^\star)\bigr)\frac{1}{\tau^2} \tag{1809}$$

where optimally ‖a⋆‖ = τ.

E.9.1.1.1 Exercise. Dual projection technique on polyhedron.
Test that projection paradigm from Figure 125 on any convex polyhedral set.

E.9.1.2 Dual interpretation of projection on cone

In the circumstance set C is a closed convex cone K and there exists a hyperplane separating given point x from K, then optimal σ_K(a⋆) takes value 0 [148, §C.2.3.1]. So problem (1808) for projection of x on K becomes

$$\|x - P_{\mathcal K}\,x\| \,=\, \frac{1}{\tau}\ \begin{array}[t]{ll}\underset{a}{\text{maximize}} & a^Tx\\ \text{subject to} & \|a\| \leq \tau\\ & a \in \mathcal{K}^\circ\end{array} \tag{1810}$$

The norm inequality in (1810) can be handled by Schur complement (§3.1.7.2). Normals a to all hyperplanes supporting K belong to the polar cone K° = −K* by definition: (276)

$$a \in \mathcal{K}^\circ \;\Leftrightarrow\; \langle a\,,\,x\rangle \leq 0\ \ \text{for all}\ x \in \mathcal{K} \tag{1811}$$


Projection on cone K is

$$P_{\mathcal K}\,x \,=\, \Bigl(I - \frac{1}{\tau^2}\,a^\star a^{\star T}\Bigr)x \tag{1812}$$

whereas projection on the polar cone −K* is (§E.9.2.2.1)

$$P_{\mathcal K^\circ}x \,=\, x - P_{\mathcal K}\,x \,=\, \frac{1}{\tau^2}\,a^\star a^{\star T}x \tag{1813}$$

Negating vector a, this maximization problem (1810) becomes a minimization (the same problem) and the polar cone becomes the dual cone:

$$\|x - P_{\mathcal K}\,x\| \,=\, -\frac{1}{\tau}\ \begin{array}[t]{ll}\underset{a}{\text{minimize}} & a^Tx\\ \text{subject to} & \|a\| \leq \tau\\ & a \in \mathcal{K}^*\end{array} \tag{1814}$$

E.9.2 Projection on cone

When convex set C is a cone, there is a finer statement of optimality conditions:

E.9.2.0.1 Theorem. Unique projection on cone. [148, §A.3.2]
Let K ⊆ R^n be a closed convex cone, and K* its dual (§2.13.1). Then Px is the unique minimum-distance projection of x ∈ R^n on K if and only if

$$Px \in \mathcal{K}\,,\qquad \langle Px - x\,,\,Px\rangle = 0\,,\qquad Px - x \,\in\, \mathcal{K}^* \tag{1815}$$

In words, Px is the unique minimum-distance projection of x on K if and only if

1) projection Px lies in K
2) direction Px − x is orthogonal to the projection Px
3) direction Px − x lies in the dual cone K*.

As the theorem is stated, it admits projection on K having empty interior; id est, on convex cones in a proper subspace of R^n. ⋄


Projection on K of any point x ∈ −K*, belonging to the negative dual cone, is the origin. By (1815): the set of all points reaching the origin, when projecting on K, constitutes the negative dual cone; a.k.a, the polar cone

$$\mathcal{K}^\circ \,=\, -\mathcal{K}^* \,=\, \{x \in \mathbb{R}^n \mid Px = 0\} \tag{1816}$$

E.9.2.1 Relation to subspace projection

Conditions 1 and 2 of the theorem are common with orthogonal projection on a subspace R(P): Condition 1 is the most basic requirement; namely, Px ∈ R(P), the projection belongs to the subspace. Invoking perpendicularity condition (1701), we recall the second requirement for projection on a subspace:

$$Px - x \,\perp\, R(P)\quad\text{or}\quad Px - x \,\in\, R(P)^\perp \tag{1817}$$

which corresponds to condition 2. Yet condition 3 is a generalization of subspace projection; id est, for unique minimum-distance projection on a closed convex cone K, polar cone −K* plays the role R(P)⊥ plays for subspace projection (P_R x = x − P_{R⊥}x). Indeed, −K* is the algebraic complement in the orthogonal vector sum (p.692) [199] [148, §A.3.2.5]

$$\mathcal{K} \boxplus -\mathcal{K}^* \,=\, \mathbb{R}^n \;\Leftrightarrow\; \text{cone } \mathcal{K} \text{ is closed and convex} \tag{1818}$$

Also, given unique minimum-distance projection Px on K satisfying Theorem E.9.2.0.1, then by projection on the algebraic complement via I − P in §E.2 we have

$$-\mathcal{K}^* \,=\, \{x - Px \mid x \in \mathbb{R}^n\} \,=\, \{x \in \mathbb{R}^n \mid Px = 0\} \tag{1819}$$

consequent to Moreau (1822). Recalling any subspace is a closed convex cone^{E.15}

$$\mathcal{K} = R(P) \;\Leftrightarrow\; -\mathcal{K}^* = R(P)^\perp \tag{1820}$$

meaning, when a cone is a subspace R(P) then the dual cone becomes its orthogonal complement R(P)⊥. [46, §2.6.1] In this circumstance, condition 3 becomes coincident with condition 2.

The properties of projection on cones following in §E.9.2.2 further generalize to subspaces by: (4)

$$\mathcal{K} = R(P) \;\Leftrightarrow\; -\mathcal{K} = R(P) \tag{1821}$$

E.15 but a proper subspace is not a proper cone (§2.7.2.2.1).


E.9.2.2 Salient properties: Projection Px on closed convex cone K

[148, §A.3.2] [73, §5.6] For x, x₁, x₂ ∈ R^n

1. P_K(αx) = αP_K x ∀α ≥ 0 (nonnegative homogeneity)
2. ‖P_K x‖ ≤ ‖x‖
3. P_K x = 0 ⇔ x ∈ −K*
4. P_K(−x) = −P_{−K}x
5. (Jean-Jacques Moreau (1962)) [199]

$$\begin{array}{c} x = x_1 + x_2\,,\quad x_1 \in \mathcal{K}\,,\quad x_2 \in -\mathcal{K}^*,\quad x_1 \perp x_2\\[2pt] \Leftrightarrow\\[2pt] x_1 = P_{\mathcal K}\,x\,,\quad x_2 = P_{-\mathcal K^*}x \end{array} \tag{1822}$$

6. K = {x − P_{−K*}x | x ∈ R^n} = {x ∈ R^n | P_{−K*}x = 0}
7. −K* = {x − P_K x | x ∈ R^n} = {x ∈ R^n | P_K x = 0} (1819)

E.9.2.2.1 Corollary. I − P for cones. (confer §E.2)
Denote by K ⊆ R^n a closed convex cone, and call K* its dual. Then x − P_{−K*}x is the unique minimum-distance projection of x ∈ R^n on K if and only if P_{−K*}x is the unique minimum-distance projection of x on −K* the polar cone. ⋄

Proof. Assume x₁ = P_K x. Then by Theorem E.9.2.0.1 we have

$$x_1 \in \mathcal{K}\,,\qquad x_1 - x \perp x_1\,,\qquad x_1 - x \in \mathcal{K}^* \tag{1823}$$

Now assume x − x₁ = P_{−K*}x. Then we have

$$x - x_1 \in -\mathcal{K}^*,\qquad -x_1 \perp x - x_1\,,\qquad -x_1 \in -\mathcal{K} \tag{1824}$$

But these two assumptions are apparently identical. We must therefore have

$$x - P_{-\mathcal K^*}x \,=\, x_1 \,=\, P_{\mathcal K}\,x \tag{1825}$$

∎
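Moreau's decomposition (1822) is easy to see numerically; a minimal Matlab sketch for the self-dual cone K = R^n₊ (so −K* = R^n₋), names illustrative:

% Moreau decomposition (1822) for K = nonnegative orthant:
% x splits into orthogonal parts, one in K and one in -K*.
n  = 6;
x  = randn(n,1);
x1 = max(x,0);                     % P_K x
x2 = min(x,0);                     % P_{-K*} x
norm(x - (x1 + x2))                % ~0 : x = x1 + x2
x1'*x2                             % = 0 : x1 ⊥ x2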


E.9.2.2.2 Corollary. Unique projection via dual or normal cone. [73, §4.7] (§E.10.3.2, confer Theorem E.9.1.0.3) Given point x ∈ R^n and closed convex cone K ⊆ R^n, the following are equivalent statements:

1. point Px is the unique minimum-distance projection of x on K
2. Px ∈ K, x − Px ∈ −(K − Px)* = −K* ∩ (Px)⊥
3. Px ∈ K, 〈x − Px, Px〉 = 0, 〈x − Px, y〉 ≤ 0 ∀y ∈ K

⋄

E.9.2.2.3 Example. Unique projection on nonnegative orthant.
(confer (1152)) From Theorem E.9.2.0.1, to project matrix H ∈ R^{m×n} on the self-dual orthant (§2.13.5.1) of nonnegative matrices R^{m×n}₊ in isomorphic R^{mn}, the necessary and sufficient conditions are:

$$H^\star \geq 0\,,\qquad \operatorname{tr}\bigl((H^\star - H)^T H^\star\bigr) = 0\,,\qquad H^\star - H \geq 0 \tag{1826}$$

where the inequalities denote entrywise comparison. The optimal solution H⋆ is simply H having all its negative entries zeroed;

$$H^\star_{ij} \,=\, \max\{H_{ij}\,,\,0\}\,,\qquad i,j \in \{1\ldots m\}\times\{1\ldots n\} \tag{1827}$$

Now suppose the nonnegative orthant is translated by T ∈ R^{m×n}; id est, consider R^{m×n}₊ + T. Then projection on the translated orthant is [73, §4.8]

$$H^\star_{ij} \,=\, \max\{H_{ij}\,,\,T_{ij}\} \tag{1828}$$

E.9.2.2.4 Example. Unique projection on truncated convex cone.
Consider the problem of projecting a point x on a closed convex cone that is artificially bounded; really, a bounded convex polyhedron having a vertex at the origin:

$$\begin{array}[t]{ll}\underset{y\,\in\,\mathbb{R}^N}{\text{minimize}} & \|x - Ay\|_2\\ \text{subject to} & y \succeq 0\\ & \|y\|_\infty \leq 1\end{array} \tag{1829}$$


where the convex cone has vertex-description (§2.12.2.0.1), for A ∈ R^{n×N}

$$\mathcal{K} \,=\, \{Ay \mid y \succeq 0\} \tag{1830}$$

and where ‖y‖∞ ≤ 1 is the artificial bound. This is a convex optimization problem having no known closed-form solution, in general. It arises, for example, in the fitting of hearing aids designed around a programmable graphic equalizer (a filter bank whose only adjustable parameters are gain per band, each bounded above by unity). [66] The problem is equivalent to a Schur-form semidefinite program (§3.1.7.2)

$$\begin{array}[t]{ll}\underset{y\,\in\,\mathbb{R}^N,\ t\,\in\,\mathbb{R}}{\text{minimize}} & t\\[2pt] \text{subject to} & \begin{bmatrix} tI & x - Ay\\ (x - Ay)^T & t \end{bmatrix} \succeq 0\\[10pt] & 0 \preceq y \preceq \mathbf{1}\end{array} \tag{1831}$$

E.9.3 nonexpansivity

E.9.3.0.1 Theorem. Nonexpansivity. [125, §2] [73, §5.3]
When C ⊂ R^n is an arbitrary closed convex set, projector P projecting on C is nonexpansive in the sense: for any vectors x, y ∈ R^n

$$\|Px - Py\| \,\leq\, \|x - y\| \tag{1832}$$

with equality when x − Px = y − Py.^{E.16} ⋄

Proof. [40]

$$\begin{aligned}\|x - y\|^2 \,=\;& \|Px - Py\|^2 + \|(I - P)x - (I - P)y\|^2\\ &+\, 2\langle x - Px\,,\,Px - Py\rangle + 2\langle y - Py\,,\,Py - Px\rangle\end{aligned} \tag{1833}$$

Nonnegativity of the last two terms follows directly from the unique minimum-distance projection theorem (§E.9.1.0.2). ∎

E.16 This condition for equality corrects an error in [56] (where the norm is applied to each side of the condition given here) easily revealed by counterexample.


The foregoing proof reveals another flavor of nonexpansivity; for each and every x, y ∈ R^n

$$\|Px - Py\|^2 + \|(I - P)x - (I - P)y\|^2 \,\leq\, \|x - y\|^2 \tag{1834}$$

Deutsch shows yet another: [73, §5.5]

$$\|Px - Py\|^2 \,\leq\, \langle x - y\,,\,Px - Py\rangle \tag{1835}$$

E.9.4 Easy projections

• Projecting any matrix H ∈ R^{n×n} in the Euclidean/Frobenius sense orthogonally on the subspace of symmetric matrices S^n in isomorphic R^{n²} amounts to taking the symmetric part of H; (§2.2.2.0.1) id est, (H + Hᵀ)/2 is the projection. (Several of these easy projections are sketched in Matlab following this list.)

• To project any H ∈ R^{n×n} orthogonally on the symmetric hollow subspace S^n_h in isomorphic R^{n²} (§2.2.3.0.1), we may take the symmetric part and then zero all entries along the main diagonal, or vice versa (because this is projection on the intersection of two subspaces); id est, (H + Hᵀ)/2 − δ²(H).

• To project a matrix on the nonnegative orthant R^{m×n}₊, simply clip all negative entries to 0. Likewise, projection on the nonpositive orthant R^{m×n}₋ sees all positive entries clipped to 0. Projection on other orthants is equally simple with appropriate clipping.

• Clipping in excess of |1| each entry of a point x ∈ R^n is equivalent to unique minimum-distance projection of x on the unit hypercube centered at the origin. (confer §E.10.3.2)

• Projecting on hyperplane, halfspace, slab: §E.5.0.0.8.

• Projection of x ∈ R^n on a (rectangular) hyperbox: [46, §8.1.1]

$$\mathcal{C} \,=\, \{y \in \mathbb{R}^n \mid l \preceq y \preceq u\,,\ l \prec u\} \tag{1836}$$

$$P(x)_k \,=\, \begin{cases} l_k\,, & x_k \leq l_k\\ x_k\,, & l_k \leq x_k \leq u_k\\ u_k\,, & x_k \geq u_k \end{cases} \tag{1837}$$


• Unique minimum-distance projection of H ∈ S^n on the positive semidefinite cone S^n₊ in the Euclidean/Frobenius sense is accomplished by eigen decomposition (diagonalization) followed by clipping all negative eigenvalues to 0.

• Unique minimum-distance projection on the generally nonconvex subset of all matrices belonging to S^n₊ having rank not exceeding ρ (§2.9.2.1) is accomplished by clipping all negative eigenvalues to 0 and zeroing the smallest nonnegative eigenvalues keeping only the ρ largest. (§7.1.2)

• Unique minimum-distance projection of H ∈ R^{m×n} on the set of all m×n matrices of rank no greater than k in the Euclidean/Frobenius sense is the singular value decomposition (§A.6) of H having all singular values beyond the kth zeroed. [248, p.208] This is also a solution to the projection in the sense of spectral norm. [46, §8.1]

• Projection on K of any point x ∈ −K*, belonging to the polar cone, is equivalent to projection on the origin. (§E.9.2)

• Projection on Lorentz cone: [46, exer.8.3(c)]

• P_{S^N₊ ∩ S^N_c} = P_{S^N₊} P_{S^N_c} (1096)

• P_{R^{N×N}₊ ∩ S^N_h} = P_{R^{N×N}₊} P_{S^N_h} (§7.0.1.1)

• P_{R^{N×N}₊ ∩ S^N} = P_{R^{N×N}₊} P_{S^N} (§E.9.5)

E.9.4.0.1 Exercise. Largest singular value.
Find the unique minimum-distance projection on the set of all m×n matrices whose largest singular value does not exceed 1.

Deutsch [75] provides an algorithm for projection on polyhedral cones. Youla [302, §2.5] lists eleven "useful projections", of square-integrable uni- and bivariate real functions on various convex sets, in closed form. Unique minimum-distance projection on an ellipsoid: Example 4.6.0.0.1, Figure 12.
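Several of the easy projections above reduce to one or two lines of core Matlab; a sketch (all names illustrative):

% Easy projections sketched: symmetric part, symmetric hollow part,
% hyperbox clipping (1837), PSD cone eigenvalue clipping, and the
% rank-rho restriction of the PSD projection.
n = 6;  H = randn(n);
Psym    = (H + H')/2;                    % projection on S^n
Phollow = Psym - diag(diag(Psym));       % projection on S^n_h
x = randn(n,1);  l = -ones(n,1);  u = ones(n,1);
Pbox = min(max(x,l),u);                  % hyperbox projection (1837)
[Q,L] = eig(Psym);  lam = diag(L);
Ppsd = Q*diag(max(lam,0))*Q';            % projection on S^n_+
rho = 2;                                 % keep rho largest eigenvalues
[lams,id] = sort(max(lam,0),'descend');
lams(rho+1:end) = 0;
Prho = Q(:,id)*diag(lams)*Q(:,id)';      % rank-rho PSD projection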


[Figure 126: Closed convex set C belongs to subspace R^n (shown bounded in sketch and drawn without proper perspective). Point y is unique minimum-distance projection of x on C; equivalent to product of orthogonal projection of x on R^n and minimum-distance projection of result z on C. Diagram labels: x, y, z, R^n, R^{n⊥}, C, and x − z ⊥ z − y.]


E.9.5 Projection on convex set in subspace

Suppose a convex set C is contained in some subspace R^n. Then unique minimum-distance projection of any point in R^n ⊕ R^{n⊥} on C can be accomplished by first projecting orthogonally on that subspace, and then projecting the result on C; [73, §5.14] id est, the ordered product of two individual projections that is not commutable.

Proof. (⇐) To show that, suppose unique minimum-distance projection P_C x on C ⊂ R^n is y as illustrated in Figure 126;

$$\|x - y\| \,\leq\, \|x - q\|\quad \forall\, q \in \mathcal{C} \tag{1838}$$

Further suppose P_{R^n}x equals z. By the Pythagorean theorem

$$\|x - y\|^2 \,=\, \|x - z\|^2 + \|z - y\|^2 \tag{1839}$$

because x − z ⊥ z − y. (1701) [183, §3.3] Then point y = P_C x is the same as P_C z because

$$\|z - y\|^2 \,=\, \|x - y\|^2 - \|x - z\|^2 \,\leq\, \|z - q\|^2 \,=\, \|x - q\|^2 - \|x - z\|^2\quad \forall\, q \in \mathcal{C} \tag{1840}$$

which holds by assumption (1838).
(⇒) Now suppose z = P_{R^n}x and

$$\|z - y\| \,\leq\, \|z - q\|\quad \forall\, q \in \mathcal{C} \tag{1841}$$

meaning y = P_C z. Then point y is identical to P_C x because

$$\|x - y\|^2 \,=\, \|x - z\|^2 + \|z - y\|^2 \,\leq\, \|x - q\|^2 \,=\, \|x - z\|^2 + \|z - q\|^2\quad \forall\, q \in \mathcal{C} \tag{1842}$$

by assumption (1841). ∎

This proof is extensible via translation argument. (§E.4) Unique minimum-distance projection on a convex set contained in an affine subset is, therefore, similarly accomplished.

Projecting matrix H ∈ R^{n×n} on convex cone K = S^n ∩ R^{n×n}₊ in isomorphic R^{n²} can be accomplished, for example, by first projecting on S^n and only then projecting the result on R^{n×n}₊ (confer §7.0.1). This is because that projection product is equivalent to projection on the subset of the nonnegative orthant in the symmetric matrix subspace.
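That closing observation is a one-liner in Matlab; a sketch, names illustrative:

% Projection of H on K = S^n ∩ R^{nxn}_+ as a projection product:
% symmetrize first (projection on subspace S^n), then clip negative
% entries (projection on the nonnegative orthant).
H  = randn(4);
PK = max((H + H')/2, 0);           % result is symmetric and nonnegative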


E.10 Alternating projection

Alternating projection is an iterative technique for finding a point in the intersection of a number of arbitrary closed convex sets C_k, or for finding the distance between two nonintersecting closed convex sets. Because it can sometimes be difficult or inefficient to compute the intersection or express it analytically, one naturally asks whether it is possible to instead project (unique minimum-distance) alternately on the individual C_k, often easier. Once a cycle of alternating projections (an iteration) is complete, we then iterate (repeat the cycle) until convergence. If the intersection of two closed convex sets is empty, then by convergence we mean the iterate (the result after a cycle of alternating projections) settles to a point of minimum distance separating the sets.

While alternating projection can find the point in the nonempty intersection closest to a given point b, it does not necessarily find it. Dependably finding that point is solved by an elegantly simple enhancement to the alternating projection technique: this Dykstra algorithm (1880) for projection on the intersection is one of the most beautiful projection algorithms ever discovered. It is accurately interpreted as the discovery of what alternating projection originally sought to accomplish: unique minimum-distance projection on the nonempty intersection of a number of arbitrary closed convex sets C_k. Alternating projection is, in fact, a special case of the Dykstra algorithm whose discussion we defer until §E.10.3.

E.10.0.1 commutative projectors

Given two arbitrary convex sets C₁ and C₂ and their respective minimum-distance projection operators P₁ and P₂, if projectors commute for each and every x ∈ R^n then it is easy to show P₁P₂x ∈ C₁ ∩ C₂ and P₂P₁x ∈ C₁ ∩ C₂. When projectors commute (P₁P₂ = P₂P₁), a point in the intersection can be found in a finite number of steps; while commutativity is a sufficient condition, it is not necessary (§6.8.1.1.1 for example).

When C₁ and C₂ are subspaces, in particular, projectors P₁ and P₂ commute if and only if P₁P₂ = P_{C₁∩C₂} or iff P₂P₁ = P_{C₁∩C₂} or iff P₁P₂ is the orthogonal projection on a Euclidean subspace. [73, lem.9.2] Subspace projectors will commute, for example, when P₁(C₂) ⊂ C₂ or P₂(C₁) ⊂ C₁ or C₁ ⊂ C₂ or C₂ ⊂ C₁ or C₁ ⊥ C₂. When subspace projectors commute, this means we can find a point in the intersection of those subspaces in a finite


number of steps; we find, in fact, the closest point.

[Figure 127: First several alternating projections (1853) in von Neumann-style projection of point b converging on closest point Pb in intersection of two closed convex sets in R²; C₁ and C₂ are partially drawn in vicinity of their intersection. The pointed normal cone K⊥ (1882) is translated to Pb, the unique minimum-distance projection of b on intersection. For this particular example, it is possible to start anywhere in a large neighborhood of b and still converge to Pb. The alternating projections are themselves robust with respect to some significant amount of noise because they belong to translated normal cone.]

E.10.0.1.1 Theorem. Kronecker projector. [247, §2.7]
Given any projection matrices P₁ and P₂ (subspace projectors), then

$$P_1 \otimes P_2 \quad\text{and}\quad P_1 \otimes I \tag{1843}$$

are projection matrices. The product preserves symmetry if present. ⋄

E.10.0.2 noncommutative projectors

Typically, one considers the method of alternating projection when projectors do not commute; id est, when P₁P₂ ≠ P₂P₁.

The iconic example for noncommutative projectors illustrated in Figure 127 shows the iterates converging to the closest point in the intersection of two arbitrary convex sets. Yet simple examples like Figure 128 reveal that noncommutative alternating projection does not


always yield the closest point, although we shall show it always yields some point in the intersection or a point that attains the distance between two convex sets.

[Figure 128: The sets {C_k} in this example comprise two halfspaces H₁ and H₂. The von Neumann-style alternating projection in R² quickly converges to P₁P₂b (feasibility). The unique minimum-distance projection on the intersection is, of course, Pb.]

Alternating projection is also known as successive projection [128] [125] [48], cyclic projection [99] [192, §3.2], successive approximation [56], or projection on convex sets [245] [246, §6.4]. It is traced back to von Neumann (1933) [280] and later Wiener [286] who showed that higher iterates of a product of two orthogonal projections on subspaces converge at each point in the ambient space to the unique minimum-distance projection on the intersection of the two subspaces. More precisely, if R₁ and R₂ are closed subspaces of a Euclidean space and P₁ and P₂ respectively denote orthogonal projection on R₁ and R₂, then for each vector b in that space,

$$\lim_{i\to\infty}(P_1 P_2)^i\, b \,=\, P_{\mathcal{R}_1\cap\,\mathcal{R}_2}\,b \tag{1844}$$

Deutsch [73, thm.9.8, thm.9.35] shows rate of convergence for subspaces to be geometric [301, §1.4.4]; bounded above by κ^{2i+1}‖b‖, i = 0, 1, 2..., where


0 ≤ κ < 1.


[Figure 129:
(a) (distance) Intersection of two convex sets in R² is empty. Method of alternating projection would be applied to find that point in C₁ nearest C₂.
(b) (distance) Given b ∈ C₂, then P₁b ∈ C₁ is nearest b iff (y − P₁b)ᵀ(b − P₁b) ≤ 0 ∀y ∈ C₁ by the unique minimum-distance projection theorem (§E.9.1.0.2). When P₁b attains the distance between the two sets, hyperplane {y | (b − P₁b)ᵀ(y − P₁b) = 0} separates C₁ from C₂. [46, §2.5.1]
(c) (0 distance) Intersection is nonempty.
(optimization) We may want the point Pb in ⋂C_k nearest point b.
(feasibility) We may instead be satisfied with a fixed point of the projection product P₁P₂b in ⋂C_k.]


E.10.1 Distance and existence

Existence of a fixed point is established:

E.10.1.0.1 Theorem. Distance. [56]
Given any two closed convex sets C₁ and C₂ in R^n, then P₁b ∈ C₁ is a fixed point of the projection product P₁P₂ if and only if P₁b is a point of C₁ nearest C₂. ⋄

Proof. (⇒) Given fixed point a = P₁P₂a ∈ C₁ with b ≜ P₂a ∈ C₂ in tandem so that a = P₁b, then by the unique minimum-distance projection theorem (§E.9.1.0.2)

$$\begin{aligned} (b - a)^T(u - a) \,\leq\, 0\quad &\forall\, u \in \mathcal{C}_1\\ (a - b)^T(v - b) \,\leq\, 0\quad &\forall\, v \in \mathcal{C}_2 \end{aligned}\ \ \Leftrightarrow\ \ \|a - b\| \,\leq\, \|u - v\|\quad \forall\, u \in \mathcal{C}_1\ \text{and}\ \forall\, v \in \mathcal{C}_2 \tag{1849}$$

by Schwarz inequality |〈x, y〉| ≤ ‖x‖‖y‖ [167] [232].
(⇐) Suppose a ∈ C₁ and ‖a − P₂a‖ ≤ ‖u − P₂u‖ ∀u ∈ C₁. Now suppose we choose u = P₁P₂a. Then

$$\|u - P_2 u\| \,=\, \|P_1 P_2 a - P_2 P_1 P_2 a\| \,\leq\, \|a - P_2 a\| \;\Leftrightarrow\; a = P_1 P_2 a \tag{1850}$$

Thus a = P₁b (with b = P₂a ∈ C₂) is a fixed point in C₁ of the projection product P₁P₂.^{E.18} ∎

E.10.2 Feasibility and convergence

The set of all fixed points of any nonexpansive mapping is a closed convex set. [107, lem.3.4] [23, §1] The projection product P₁P₂ is nonexpansive by Theorem E.9.3.0.1 because, for any vectors x, a ∈ R^n

$$\|P_1 P_2\,x - P_1 P_2\,a\| \,\leq\, \|P_2\,x - P_2\,a\| \,\leq\, \|x - a\| \tag{1851}$$

If the intersection of two closed convex sets C₁ ∩ C₂ is empty, then the iterates converge to a point of minimum distance, a fixed point of the projection product. Otherwise, convergence is to some fixed point in their intersection

E.18 Point b = P₂a can be shown, similarly, to be a fixed point of the product P₂P₁.


(a feasible point) whose existence is guaranteed by virtue of the fact that each and every point in the convex intersection is in one-to-one correspondence with fixed points of the nonexpansive projection product.

Bauschke & Borwein [23, §2] argue that any sequence monotonic in the sense of Fejér is convergent.^{E.19}

E.10.2.0.1 Definition. Fejér monotonicity. [200]
Given closed convex set C ≠ ∅, then a sequence x_i ∈ R^n, i = 0, 1, 2..., is monotonic in the sense of Fejér with respect to C iff

$$\|x_{i+1} - c\| \,\leq\, \|x_i - c\|\ \ \text{for all } i \geq 0 \text{ and each and every } c \in \mathcal{C} \tag{1852}$$

△

Given x₀ ≜ b, if we express each iteration of alternating projection by

$$x_{i+1} \,=\, P_1 P_2\,x_i\,,\qquad i = 0, 1, 2\ldots \tag{1853}$$

and define any fixed point a = P₁P₂a, then sequence x_i is Fejér monotone with respect to fixed point a because

$$\|P_1 P_2\,x_i - a\| \,\leq\, \|x_i - a\|\quad \forall\, i \geq 0 \tag{1854}$$

by nonexpansivity. The nonincreasing sequence ‖P₁P₂x_i − a‖ is bounded below hence convergent because any bounded monotonic sequence in R is convergent; [190, §1.2] [30, §1.1] in the limit, P₁P₂x_{i+1} = P₁P₂x_i = x_{i+1}. Sequence x_i therefore converges to some fixed point. If the intersection C₁ ∩ C₂ is nonempty, convergence is to some point there by the distance theorem. Otherwise, x_i converges to a point in C₁ of minimum distance to C₂.

E.10.2.0.2 Example. Hyperplane/orthant intersection.
Find a feasible point (1847) belonging to the nonempty intersection of two convex sets: given A ∈ R^{m×n}, β ∈ R(A)

$$\mathcal{C}_1 \cap \mathcal{C}_2 \,=\, \mathbb{R}^n_+ \cap\, \mathcal{A} \,=\, \{y \mid y \succeq 0\} \cap \{y \mid Ay = \beta\} \;\subset\; \mathbb{R}^n \tag{1855}$$

the nonnegative orthant with affine subset A an intersection of hyperplanes. Projection of an iterate x_i ∈ R^n on A is calculated

$$P_2\,x_i \,=\, x_i - A^T(AA^T)^{-1}(Ax_i - \beta) \tag{1746}$$

E.19 Other authors prove convergence by different means; e.g., [125] [48].


[Figure 130: From Example E.10.2.0.2 in R², showing von Neumann-style alternating projection to find feasible point belonging to intersection of nonnegative orthant C₁ = R²₊ with hyperplane C₂ = A = {y | [1 1]y = 1}. Point Pb lies at intersection of hyperplane with ordinate axis. In this particular example, the feasible point found is coincidentally optimal. Rate of convergence depends upon angle θ; as it becomes more acute, convergence slows. [125, §3]]

[Figure 131: Geometric convergence of iterates in norm, ‖x_i − (∏_{j=1}^∞ ∏_k P_k)b‖ versus iteration i, for Example E.10.2.0.2 in R^1000.]


while, thereafter, projection of the result on the orthant is simply

$$x_{i+1} \,=\, P_1 P_2\,x_i \,=\, \max\{0\,,\,P_2\,x_i\} \tag{1856}$$

where the maximum is entrywise (§E.9.2.2.3).

One realization of this problem in R² is illustrated in Figure 130: For A = [1 1], β = 1, and x₀ = b = [−3 1/2]ᵀ, the iterates converge to the feasible point Pb = [0 1]ᵀ.

To give a more palpable sense of convergence in higher dimension, we do this example again but now we compute an alternating projection for the case A ∈ R^{400×1000}, β ∈ R^{400}, and b ∈ R^{1000}, all of whose entries are independently and randomly set to a uniformly distributed real number in the interval [−1, 1]. Convergence is illustrated in Figure 131.

This application of alternating projection to feasibility is extensible to any finite number of closed convex sets.
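The feasibility iteration of this example takes only a few lines of core Matlab; a sketch with illustrative dimensions:

% von Neumann-style alternating projection (1853): alternate between
% the affine set {y | Ay = beta} via (1746) and the nonnegative
% orthant via entrywise clipping (1856).
m = 40;  n = 100;
A = 2*rand(m,n) - 1;
beta = A*rand(n,1);                % guarantees nonempty intersection
x = 2*rand(n,1) - 1;
pinvA = pinv(A);                   % equals A'*(A*A')^-1 for full-rank A
for i = 1:300
    x = x - pinvA*(A*x - beta);    % P2 : projection on affine set
    x = max(x,0);                  % P1 : projection on orthant
end
[norm(A*x - beta)  min(x)]         % feasibility residuals, near 0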


E.10.2.0.3 Example. Under- and over-projection. [43, §3]
Consider the following variation of alternating projection: We begin with some point x₀ ∈ R^n then project that point on convex set C and then project that same point x₀ on convex set D. To the first iterate we assign x₁ = (P_C(x₀) + P_D(x₀))½. More generally,

$$x_{i+1} \,=\, \bigl(P_{\mathcal C}(x_i) + P_{\mathcal D}(x_i)\bigr)\tfrac{1}{2}\,,\qquad i = 0, 1, 2\ldots \tag{1857}$$

Because the Cartesian product of convex sets remains convex, (§2.1.8) we can reformulate this problem. Consider the convex set

$$\mathcal{S} \,\triangleq\, \begin{bmatrix}\mathcal{C}\\ \mathcal{D}\end{bmatrix} \tag{1858}$$

representing Cartesian product C × D. Now, those two projections P_C and P_D are equivalent to one projection on the Cartesian product; id est,

$$P_{\mathcal S}\left(\begin{bmatrix}x_i\\ x_i\end{bmatrix}\right) \,=\, \begin{bmatrix}P_{\mathcal C}(x_i)\\ P_{\mathcal D}(x_i)\end{bmatrix} \tag{1859}$$

Define the subspace

$$\mathcal{R} \,\triangleq\, \Bigl\{v \in \begin{bmatrix}\mathbb{R}^n\\ \mathbb{R}^n\end{bmatrix} \Big|\; [\,I\ \ {-I}\,]\,v = 0\Bigr\} \tag{1860}$$

By the results in Example E.5.0.0.6

$$P_{\mathcal R}\,P_{\mathcal S}\left(\begin{bmatrix}x_i\\ x_i\end{bmatrix}\right) \,=\, P_{\mathcal R}\left(\begin{bmatrix}P_{\mathcal C}(x_i)\\ P_{\mathcal D}(x_i)\end{bmatrix}\right) \,=\, \begin{bmatrix}P_{\mathcal C}(x_i) + P_{\mathcal D}(x_i)\\ P_{\mathcal C}(x_i) + P_{\mathcal D}(x_i)\end{bmatrix}\frac{1}{2} \tag{1861}$$

This means the proposed variation of alternating projection is equivalent to an alternation of projection on convex sets S and R. If S and R intersect, these iterations will converge to a point in their intersection; hence, to a point in the intersection of C and D.

We need not apply equal weighting to the projections, as supposed in (1857). In that case, definition of R would change accordingly.

E.10.2.1 Relative measure of convergence

Inspired by Fejér monotonicity, the alternating projection algorithm from the example of convergence illustrated by Figure 131 employs a redundant sequence: The first sequence (indexed by j) estimates point (∏_{j=1}^∞ ∏_k P_k)b in the presumably nonempty intersection, then the quantity

$$\Bigl\|\,x_i - \Bigl(\prod_{j=1}^{\infty}\prod_k P_k\Bigr)b\,\Bigr\| \tag{1862}$$

in second sequence x_i is observed per iteration i for convergence. A priori knowledge of a feasible point (1847) is both impractical and antithetical. We need another measure: Nonexpansivity implies

$$\Bigl\|\Bigl(\prod_l P_l\Bigr)x_{k,i-1} - \Bigl(\prod_l P_l\Bigr)x_{k,i}\Bigr\| \,=\, \|x_{k,i} - x_{k,i+1}\| \,\leq\, \|x_{k,i-1} - x_{k,i}\| \tag{1863}$$

where

$$x_{k,i} \,\triangleq\, P_k\,x_{k+1,i} \,\in\, \mathbb{R}^n \tag{1864}$$

represents unique minimum-distance projection of x_{k+1,i} on convex set k at iteration i. So a good convergence measure is the total monotonic sequence

$$\varepsilon_i \,\triangleq\, \sum_k \|x_{k,i} - x_{k,i+1}\|\,,\qquad i = 0, 1, 2\ldots \tag{1865}$$

where lim_{i→∞} ε_i = 0 whether or not the intersection is nonempty.


E.10.2.1.1 Example. Affine subset ∩ positive semidefinite cone.
Consider the problem of finding X ∈ S^n that satisfies

$$X \succeq 0\,,\qquad \langle A_j\,,\,X\rangle = b_j\,,\quad j = 1\ldots m \tag{1866}$$

given nonzero A_j ∈ S^n and real b_j. Here we take C₁ to be the positive semidefinite cone S^n₊ while C₂ is the affine subset of S^n

$$\begin{aligned}
\mathcal{C}_2 = \mathcal{A} &\triangleq\, \{X \mid \operatorname{tr}(A_jX) = b_j\,,\ j = 1\ldots m\} \;\subseteq\; \mathbb{S}^n\\
&=\, \Bigl\{X \;\Big|\; \begin{bmatrix}\operatorname{svec}(A_1)^T\\ \vdots\\ \operatorname{svec}(A_m)^T\end{bmatrix} \operatorname{svec}X = b\Bigr\}\\
&\triangleq\, \{X \mid A\operatorname{svec}X = b\}
\end{aligned} \tag{1867}$$

where b = [b_j] ∈ R^m, A ∈ R^{m×n(n+1)/2}, and symmetric vectorization svec is defined by (47). Projection of iterate X_i ∈ S^n on A is: (§E.5.0.0.6)

$$P_2\operatorname{svec}X_i \,=\, \operatorname{svec}X_i - A^\dagger(A\operatorname{svec}X_i - b) \tag{1868}$$

Euclidean distance from X_i to A is therefore

$$\operatorname{dist}(X_i\,,\,\mathcal{A}) \,=\, \|X_i - P_2X_i\|_F \,=\, \|A^\dagger(A\operatorname{svec}X_i - b)\|_2 \tag{1869}$$

Projection of P₂X_i ≜ Σ_j λ_j q_j q_jᵀ on the positive semidefinite cone (§7.1.2) is found from its eigen decomposition (§A.5.2);

$$P_1 P_2 X_i \,=\, \sum_{j=1}^{n}\max\{0\,,\,\lambda_j\}\,q_j q_j^T \tag{1870}$$

Distance from P₂X_i to the positive semidefinite cone is therefore

$$\operatorname{dist}(P_2X_i\,,\,\mathbb{S}^n_+) \,=\, \|P_2X_i - P_1P_2X_i\|_F \,=\, \sqrt{\sum_{j=1}^{n}\min\{0\,,\,\lambda_j\}^2} \tag{1871}$$

When the intersection is empty A ∩ S^n₊ = ∅, the iterates converge to that positive semidefinite matrix closest to A in the Euclidean sense. Otherwise, convergence is to some point in the nonempty intersection.


Barvinok (§2.9.3.0.1) shows that if a point feasible with (1866) exists, then there exists an X ∈ A ∩ S^n₊ such that

$$\operatorname{rank}X \,\leq\, \Bigl\lfloor \frac{\sqrt{8m+1}-1}{2} \Bigr\rfloor \tag{233}$$

E.10.2.1.2 Example. Semidefinite matrix completion.
Continuing Example E.10.2.1.1: When m ≤ n(n+1)/2 and the A_j matrices are distinct members of the standard orthonormal basis {E_lq ∈ S^n} (50)

$$\{A_j \in \mathbb{S}^n,\ j = 1\ldots m\} \;\subseteq\; \{E_{lq}\} \,=\, \begin{cases} e_l e_l^T, & l = q = 1\ldots n\\[2pt] \frac{1}{\sqrt{2}}(e_l e_q^T + e_q e_l^T), & 1 \leq l < q \leq n \end{cases} \tag{1872}$$

and when the constants b_j are set to constrained entries of variable X ≜ [X_lq] ∈ S^n

$$\{b_j\,,\ j = 1\ldots m\} \;\subseteq\; \begin{cases} X_{lq}\,, & l = q = 1\ldots n\\[2pt] X_{lq}\sqrt{2}\,, & 1 \leq l < q \leq n \end{cases} \,=\, \{\langle X\,,\,E_{lq}\rangle\} \tag{1873}$$

then the equality constraints in (1866) fix individual entries of X ∈ S^n. Thus the feasibility problem becomes a positive semidefinite matrix completion problem. Projection of iterate X_i ∈ S^n on A simplifies to (confer (1868))

$$P_2\operatorname{svec}X_i \,=\, \operatorname{svec}X_i - A^T(A\operatorname{svec}X_i - b) \tag{1874}$$

From this we can see that orthogonal projection is achieved simply by setting corresponding entries of P₂X_i to the known entries of X, while the remaining entries of P₂X_i are set to corresponding entries of the current iterate X_i. Using this technique, we find a positive semidefinite completion for

$$\begin{bmatrix} 4 & 3 & ? & 2\\ 3 & 4 & 3 & ?\\ ? & 3 & 4 & 3\\ 2 & ? & 3 & 4 \end{bmatrix} \tag{1875}$$

Initializing the unknown entries to 0, they all converge geometrically to 1.5858 (rounded) after about 42 iterations.
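A core-Matlab sketch of that completion, using the entrywise description of projection (1874) (a logical mask marks the known entries; names illustrative):

% Alternating projection for completion problem (1875): reimpose the
% known entries (projection on affine set A), then clip negative
% eigenvalues (projection on the PSD cone S^4_+).
D = [4 3 0 2; 3 4 3 0; 0 3 4 3; 2 0 3 4];    % unknown entries start at 0
known = logical([1 1 0 1; 1 1 1 0; 0 1 1 1; 1 0 1 1]);
X = D;
for i = 1:100
    X(known) = D(known);                     % projection on A (1874)
    [Q,L] = eig((X + X')/2);
    X = Q*diag(max(diag(L),0))*Q';           % projection on S^4_+
end
X(1,3)                                       % converges to ~1.5858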


[Figure 132: Semilog plot of dist(P₂X_i, S^n₊) versus iteration i: distance (confer (1871)) between PSD cone and iterate (1874) in affine subset A (1867) for Laurent's completion problem; initially, decreasing geometrically.]

Laurent gives a problem for which no positive semidefinite completion exists: [173]

$$\begin{bmatrix} 1 & 1 & ? & 0\\ 1 & 1 & 1 & ?\\ ? & 1 & 1 & 1\\ 0 & ? & 1 & 1 \end{bmatrix} \tag{1876}$$

Initializing unknowns to 0, by alternating projection we find the constrained matrix closest to the positive semidefinite cone,

$$\begin{bmatrix} 1 & 1 & 0.5454 & 0\\ 1 & 1 & 1 & 0.5454\\ 0.5454 & 1 & 1 & 1\\ 0 & 0.5454 & 1 & 1 \end{bmatrix} \tag{1877}$$

and we find the positive semidefinite matrix closest to the affine subset A (1867):

$$\begin{bmatrix} 1.0521 & 0.9409 & 0.5454 & 0.0292\\ 0.9409 & 1.0980 & 0.9451 & 0.5454\\ 0.5454 & 0.9451 & 1.0980 & 0.9409\\ 0.0292 & 0.5454 & 0.9409 & 1.0521 \end{bmatrix} \tag{1878}$$

These matrices (1877) and (1878) attain the Euclidean distance dist(A, S^n₊). Convergence is illustrated in Figure 132.


[Figure 133: H₁ and H₂ are the same halfspaces as in Figure 128. Dykstra's alternating projection algorithm generates the alternations b, x₂₁, x₁₁, x₂₂, x₁₂, ... The path illustrated from b to x₁₂ in R² terminates at the desired result, Pb. The alternations are not so robust in presence of noise as for the example in Figure 127.]

E.10.3 Optimization and projection

Unique projection on the nonempty intersection of arbitrary convex sets to find the closest point therein is a convex optimization problem. The first successful application of alternating projection to this problem is attributed to Dykstra [84] [47] who in 1983 provided an elegant algorithm that prevails today. In 1988, Han [128] rediscovered the algorithm and provided a primal−dual convergence proof. A synopsis of the history of alternating projection^{E.20} can be found in [49] where it becomes apparent that Dykstra's work is seminal.

E.20 For a synopsis of alternating projection applied to distance geometry, see [266, §3.1].


E.10.3.1 Dykstra's algorithm

Assume we are given some point b ∈ R^n and closed convex sets {C_k ⊂ R^n | k = 1...L}. Let x_{k,i} ∈ R^n and y_{k,i} ∈ R^n respectively denote a primal and dual vector (whose meaning can be deduced from Figure 133 and Figure 134) associated with set k at iteration i. Initialize

$$y_{k,0} = 0\ \ \forall\, k = 1\ldots L\qquad\text{and}\qquad x_{1,0} = b \tag{1879}$$

Denoting by P_k t the unique minimum-distance projection of t on C_k, and for convenience x_{L+1,i} ≜ x_{1,i−1}, calculation of the iterates x_{1,i} proceeds:^{E.21}

    for i = 1, 2, ... until convergence {
        for k = L ... 1 {
            t       = x_{k+1,i} - y_{k,i-1}
            x_{k,i} = P_k t
            y_{k,i} = P_k t - t
        }
    }                                                          (1880)

Assuming a nonempty intersection, then the iterates converge to the unique minimum-distance projection of point b on that intersection; [73, §9.24]

$$Pb \,=\, \lim_{i\to\infty} x_{1,i} \tag{1881}$$

In the case all the C_k are affine, then calculation of y_{k,i} is superfluous and the algorithm becomes identical to alternating projection. [73, §9.26] [99, §1] Dykstra's algorithm is so simple, elegant, and represents such a tiny increment in computational intensity over alternating projection, it is nearly always arguably cost-effective.

E.10.3.2 Normal cone

Glunt [106, §4] observes that the overall effect of Dykstra's iterative procedure is to drive t toward the translated normal cone to ⋂C_k at the solution Pb (translated to Pb). The normal cone gets its name from its graphical construction; which is, loosely speaking, to draw the outward-normals at Pb (Definition E.9.1.0.1) to all the convex sets C_k touching Pb. The relative interior of the normal cone subtends these normal vectors.

E.21 We reverse order of projection (k = L...1) in the algorithm for continuity of exposition.
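Returning to (1880): for L = 2 sets, Dykstra's algorithm is only a few lines of core Matlab. A sketch for two hypothetical halfspaces (all data illustrative):

% Dykstra's algorithm (1880), L = 2 : C1 = {x | a1'x <= b1},
% C2 = {x | a2'x <= b2}. Phs projects on a halfspace {x | a'x <= beta}.
Phs = @(x,a,beta) x - a*max(a'*x - beta,0)/(a'*a);
a1 = [1;0];  b1 = 0;  a2 = [1;1]/sqrt(2);  b2 = 0;
b = [2;1];                                 % point to be projected
x = b;  y1 = zeros(2,1);  y2 = zeros(2,1);
for i = 1:50
    t = x  - y2;  x2 = Phs(t,a2,b2);  y2 = x2 - t;   % k = 2
    t = x2 - y1;  x1 = Phs(t,a1,b1);  y1 = x1 - t;   % k = 1
    x = x1;
end
Pb = x        % unique minimum-distance projection of b on C1 ∩ C2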


[Figure 134: Two examples (truncated): Normal cone K⊥_{H₁∩H₂}(0) to K ≜ H₁ ∩ H₂ at the origin, and translated normal cone K⊥_{H₁∩H₂}(Pb) + Pb at point Pb on the boundary. H₁ and H₂ are the same halfspaces from Figure 133. The normal cone at the origin K⊥_{H₁∩H₂}(0) is simply −K*.]

E.10.3.2.1 Definition. Normal cone. [198] [30, p.261] [148, §A.5.2] [41, §2.1] [231, §3] The normal cone to any set S ⊆ R^n at any particular point a ∈ R^n is defined as the closed cone

$$\mathcal{K}^\perp_{\mathcal S}(a) \,\triangleq\, \{z \in \mathbb{R}^n \mid z^T(y - a) \leq 0\ \ \forall\, y \in \mathcal{S}\} \,=\, -(\mathcal{S} - a)^* \tag{1882}$$

an intersection of halfspaces about the origin in R^n hence convex regardless of the convexity of S; the negative dual cone to the translate S − a. △

Examples of normal cone construction are illustrated in Figure 134: The normal cone at the origin is the vector sum (§2.1.8) of two normal cones; [41, §3.3, exer.10] for H₁ ∩ int H₂ ≠ ∅

$$\mathcal{K}^\perp_{\mathcal{H}_1\cap\,\mathcal{H}_2}(0) \,=\, \mathcal{K}^\perp_{\mathcal{H}_1}(0) + \mathcal{K}^\perp_{\mathcal{H}_2}(0) \tag{1883}$$

This formula applies more generally to other points in the intersection.

The normal cone to any affine set A at α ∈ A, for example, is the orthogonal complement of A − α. Projection of any point in the translated normal cone K⊥_C(a ∈ C) + a on convex set C is identical to a; in other words, point a is that point in C closest to any point belonging to the translated normal cone K⊥_C(a) + a; e.g., Theorem E.4.0.0.1.
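Definition (1882) is easy to probe numerically; a Monte-Carlo sketch in core Matlab (hypothetical box and point):

% For C = [0,1]^2 and b outside C, the residual z = b - Pb belongs
% to the normal cone at Pb : z'(y - Pb) <= 0 for every y in C (1882).
b  = [2; 0.3];
Pb = min(max(b,0),1);              % projection on the box
z  = b - Pb;
Y  = rand(2,1000);                 % random samples of C
max(z'*(Y - Pb*ones(1,1000)))      % <= 0, confirming normality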


[Figure 135: A few renderings of normal cone K⊥_{E³} to elliptope E³ (Figure 92) at point 11ᵀ, projected on R³. In [174, fig.2], normal cone is claimed circular in this dimension (severe numerical artifacts corrupt boundary and make interior corporeal, drawn truncated).]


When set S is a convex cone K, then the normal cone to K at the origin

$$\mathcal{K}^\perp_{\mathcal K}(0) \,=\, -\mathcal{K}^* \tag{1884}$$

is the negative dual cone. Any point belonging to −K*, projected on K, projects on the origin. More generally, [73, §4.5]

$$\mathcal{K}^\perp_{\mathcal K}(a) \,=\, -(\mathcal{K} - a)^* \tag{1885}$$

$$\mathcal{K}^\perp_{\mathcal K}(a \in \mathcal{K}) \,=\, -\mathcal{K}^* \cap\, a^\perp \tag{1886}$$

The normal cone to ⋂C_k at Pb in Figure 128 is the ray {ξ(b − Pb) | ξ ≥ 0} illustrated in Figure 134. Applying Dykstra's algorithm to that example, convergence to the desired result is achieved in two iterations as illustrated in Figure 133. Yet applying Dykstra's algorithm to the example in Figure 127 does not improve rate of convergence, unfortunately, because the given point b and all the alternating projections already belong to the translated normal cone at the vertex of intersection.

E.10.3.3 speculation

From these few examples we surmise, unique minimum-distance projection on blunt polyhedral cones having nonempty interior may be found by Dykstra's algorithm in few iterations.




Appendix F

Matlab programs

Made by The MathWorks (http://www.mathworks.com), Matlab is a high-level programming language and graphical user interface for linear algebra.

F.1 isedm()

% Is real D a Euclidean Distance Matrix. -Jon Dattorro
%
% [Dclosest,X,isisnot,r] = isedm(D,tolerance,verbose,dimension,V)
%
% Returns: closest EDM in Schoenberg sense (default output),
%          a generating list X,
%          string 'is' or 'isnot' EDM,
%          actual affine dimension r of EDM output.
% Input:   matrix D,
%          optional absolute numerical tolerance for EDM determination,
%          optional verbosity 'on' or 'off',
%          optional desired affine dim of generating list X output,
%          optional choice of 'Vn' auxiliary matrix (default) or 'V'.

function [Dclosest,X,isisnot,r] = isedm(D,tolerance_in,verbose,dim,V);

isisnot = 'is';
N = length(D);


if nargin < 2 | isempty(tolerance_in)
    tolerance_in = eps;
end
tolerance = max(tolerance_in, eps*N*norm(D));
if nargin < 3 | isempty(verbose)
    verbose = 'on';
end
if nargin < 5 | isempty(V)
    use = 'Vn';
else
    use = 'V';
end

% is empty
if N < 1
    if strcmp(verbose,'on'), disp('Input D is empty.'), end
    X = [ ];
    Dclosest = [ ];
    isisnot = 'isnot';
    r = [ ];
    return
end

% is square
if size(D,1) ~= size(D,2)
    if strcmp(verbose,'on'), disp('An EDM must be square.'), end
    X = [ ];
    Dclosest = [ ];
    isisnot = 'isnot';
    r = [ ];
    return
end

% is real
if ~isreal(D)
    if strcmp(verbose,'on'), disp('Because an EDM is real,'), end
    isisnot = 'isnot';
    D = real(D);
end


% is nonnegative
if sum(sum(chop(D,tolerance) < 0))
    isisnot = 'isnot';
    if strcmp(verbose,'on'), disp('Because an EDM is nonnegative,'), end
end

% is symmetric
if sum(sum(abs(chop((D - D')/2,tolerance)) > 0))
    isisnot = 'isnot';
    if strcmp(verbose,'on'), disp('Because an EDM is symmetric,'), end
    D = (D + D')/2;   % only required condition
end

% has zero diagonal
if sum(abs(diag(chop(D,tolerance))) > 0)
    isisnot = 'isnot';
    if strcmp(verbose,'on')
        disp('Because an EDM has zero main diagonal,')
    end
end

% is EDM
if strcmp(use,'Vn')
    VDV = -Vn(N)'*D*Vn(N);
else
    VDV = -Vm(N)'*D*Vm(N);
end
[Evecs Evals] = signeig(VDV);
if ~isempty(find(chop(diag(Evals),...
                 max(tolerance_in,eps*N*normest(VDV))) < 0))
    isisnot = 'isnot';
    if strcmp(verbose,'on'), disp('Because -VDV < 0,'), end
end
if strcmp(verbose,'on')
    if strcmp(isisnot,'isnot')
        disp('matrix input is not EDM.')
    elseif tolerance_in == eps
        disp('Matrix input is EDM to machine precision.')
    else
        disp('Matrix input is EDM to specified tolerance.')
    end
end


% find generating list
r = max(find(chop(diag(Evals),...
             max(tolerance_in,eps*N*normest(VDV))) > 0));
if isempty(r)
    r = 0;
end
if nargin < 4 | isempty(dim)
    dim = r;
else
    dim = round(dim);
end
t = r;
r = min(r,dim);
if r == 0
    X = zeros(1,N);
else
    if strcmp(use,'Vn')
        X = [zeros(r,1) diag(sqrt(diag(Evals(1:r,1:r))))*Evecs(:,1:r)'];
    else
        X = [diag(sqrt(diag(Evals(1:r,1:r))))*Evecs(:,1:r)']/sqrt(2);
    end
end
if strcmp(isisnot,'isnot') | dim < t
    Dclosest = Dx(X);
else
    Dclosest = D;
end
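A hypothetical usage sketch of isedm(), relying only on the subroutines of §F.1.1:

% Build an EDM from a random list of 8 points in R^3, then verify it:
X = randn(3,8);
D = Dx(X);                                % EDM from point list (F.1.1.5)
[Dclosest,Xr,isisnot,r] = isedm(D);
isisnot                                   % 'is'
r                                         % affine dimension, here 3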


F.1.1 Subroutines for isedm()

F.1.1.1 chop()

% zeroing entries below specified absolute tolerance threshold
% -Jon Dattorro
function Y = chop(A,tolerance)
R = real(A);
I = imag(A);
if nargin == 1
    tolerance = max(size(A))*norm(A)*eps;
end
idR = find(abs(R) < tolerance);
idI = find(abs(I) < tolerance);
R(idR) = 0;
I(idI) = 0;
Y = R + i*I;

F.1.1.2 Vn()

function y = Vn(N)
y = [-ones(1,N-1);
     eye(N-1)]/sqrt(2);

F.1.1.3 Vm()

% returns EDM V matrix
function V = Vm(n)
V = [eye(n)-ones(n,n)/n];


F.1.1.4 signeig()

% Sorts signed real part of eigenvalues
% and applies sort to values and vectors.
% [Q, lam] = signeig(A)
% -Jon Dattorro
function [Q, lam] = signeig(A);
[q l] = eig(A);
lam = diag(l);
[junk id] = sort(real(lam));
id = id(length(id):-1:1);
lam = diag(lam(id));
Q = q(:,id);
if nargout < 2
    Q = diag(lam);
end

F.1.1.5 Dx()

% Make EDM from point list
function D = Dx(X)
[n,N] = size(X);
one = ones(N,1);
del = diag(X'*X);
D = del*one' + one*del' - 2*X'*X;


F.2 conic independence, conici()

(§2.10) The recommended subroutine lp() (§F.2.1) is a linear program solver (simplex method) from Matlab's Optimization Toolbox v2.0 (R11). Later releases of Matlab replace lp() with linprog() (interior-point method) that we find quite inferior to lp() on an assortment of problems; indeed, inherent limitation of numerical precision to 1E-8 in linprog() causes failure in programs previously working with lp().

Given an arbitrary set of directions, this c.i. subroutine removes the conically dependent members. Yet a conically independent set returned is not necessarily unique. In that case, if desired, the set returned may be altered by reordering the set input.

% Test for c.i. of arbitrary directions in rows or columns of X.
% -Jon Dattorro
function [Xci, indep_str, how_many_depend] = conici(X,rowORcol,tol);
if nargin < 3
    tol = max(size(X))*eps*norm(X);
end
if nargin < 2 | strcmp(rowORcol,'col')
    rowORcol = 'col';
    Xin = X;
elseif strcmp(rowORcol,'row')
    Xin = X';
else
    disp('Invalid rowORcol input.')
    return
end
[n, N] = size(Xin);

indep_str = 'conically independent';
how_many_depend = 0;
if rank(Xin) == N
    Xci = X;
    return
end


count = 1;
new_N = N;

% remove zero rows or columns
for i=1:N
    if chop(Xin(:,count),tol)==0
        how_many_depend = how_many_depend + 1;
        indep_str = 'conically Dependent';
        Xin(:,count) = [ ];
        new_N = new_N - 1;
    else
        count = count + 1;
    end
end

% remove conic dependencies
count = 1;
newer_N = new_N;
for i=1:new_N
    if newer_N > 1
        A = [Xin(:,1:count-1) Xin(:,count+1:newer_N); -eye(newer_N-1)];
        b = [Xin(:,count); zeros(newer_N-1,1)];
        [a, lambda, how] = lp(zeros(newer_N-1,1),A,b,[ ],[ ],[ ],n,-1);
        if ~strcmp(how,'infeasible')
            how_many_depend = how_many_depend + 1;
            indep_str = 'conically Dependent';
            Xin(:,count) = [ ];
            newer_N = newer_N - 1;
        else
            count = count + 1;
        end
    end
end
if strcmp(rowORcol,'col')
    Xci = Xin;
else
    Xci = Xin';
end


F.2.1 lp()

LP     Linear programming.
       X = LP(f,A,b) solves the linear programming problem:

            min f'x    subject to:  Ax <= b
             x


F.3 Map of the USA

F.3.1 EDM, mapusa()

(§5.13.1.0.1)

% Find map of USA using only distance information.
% -Jon Dattorro
% Reconstruction from EDM.
clear all;
close all;

load usalo;   % From Matlab Mapping Toolbox
% http://www-ccs.ucsd.edu/matlab/toolbox/map/usalo.html

% To speed-up execution (decimate map data), make
% 'factor' bigger positive integer.
factor = 1;
Mg = 2*factor;   % Relative decimation factors
Ms = factor;
Mu = 2*factor;

gtlakelat = decimate(gtlakelat,Mg);
gtlakelon = decimate(gtlakelon,Mg);
statelat  = decimate(statelat,Ms);
statelon  = decimate(statelon,Ms);
uslat     = decimate(uslat,Mu);
uslon     = decimate(uslon,Mu);

lat = [gtlakelat; statelat; uslat]*pi/180;
lon = [gtlakelon; statelon; uslon]*pi/180;
phi = pi/2 - lat;
theta = lon;
x = sin(phi).*cos(theta);
y = sin(phi).*sin(theta);
z = cos(phi);


% plot original data
plot3(x,y,z), axis equal, axis off

lengthNaN = length(lat);
id = find(isfinite(x));
X = [x(id)'; y(id)'; z(id)'];
N = length(X(1,:))

% Make the distance matrix
clear gtlakelat gtlakelon statelat statelon
clear factor x y z phi theta conus
clear uslat uslon Mg Ms Mu lat lon
D = diag(X'*X)*ones(1,N) + ones(N,1)*diag(X'*X)' - 2*X'*X;

% destroy input data
clear X

Vn = [-ones(1,N-1); speye(N-1)];
VDV = (-Vn'*D*Vn)/2;
clear D Vn
pack

[evec evals flag] = eigs(VDV, speye(size(VDV)), 10, 'LR');
if flag, disp('convergence problem'), return, end;
evals = real(diag(evals));

index = find(abs(evals) > eps*normest(VDV)*N);
n = sum(evals(index) > 0);
Xs = [zeros(n,1) diag(sqrt(evals(index)))*evec(:,index)'];

warning off;  Xsplot = zeros(3,lengthNaN)*(0/0);  warning on;
Xsplot(:,id) = Xs;

figure(2)
% plot map found via EDM.
plot3(Xsplot(1,:), Xsplot(2,:), Xsplot(3,:))
axis equal, axis off


F.3.1.1 USA map input-data decimation, decimate()

function xd = decimate(x,m)
roll = 0;
rock = 1;
for i=1:length(x)
    if isnan(x(i))
        roll = 0;
        xd(rock) = x(i);
        rock = rock+1;
    elseif ~mod(roll,m)
        xd(rock) = x(i);
        rock = rock+1;
    end
    roll = roll+1;
end
xd = xd';

F.3.2 EDM using ordinal data, omapusa()

(§5.13.2.1)

% Find map of USA using ordinal distance information.
% -Jon Dattorro
clear all;
close all;

load usalo;   % From Matlab Mapping Toolbox
% http://www-ccs.ucsd.edu/matlab/toolbox/map/usalo.html

factor = 1;
Mg = 2*factor;   % Relative decimation factors
Ms = factor;
Mu = 2*factor;

gtlakelat = decimate(gtlakelat,Mg);
gtlakelon = decimate(gtlakelon,Mg);


statelat = decimate(statelat,Ms);
statelon = decimate(statelon,Ms);
uslat    = decimate(uslat,Mu);
uslon    = decimate(uslon,Mu);

lat = [gtlakelat; statelat; uslat]*pi/180;
lon = [gtlakelon; statelon; uslon]*pi/180;
phi = pi/2 - lat;
theta = lon;
x = sin(phi).*cos(theta);
y = sin(phi).*sin(theta);
z = cos(phi);

% plot original data
plot3(x,y,z), axis equal, axis off

lengthNaN = length(lat);
id = find(isfinite(x));
X = [x(id)'; y(id)'; z(id)'];
N = length(X(1,:))

% Make the distance matrix
clear gtlakelat gtlakelon statelat
clear statelon state stateborder greatlakes
clear factor x y z phi theta conus
clear uslat uslon Mg Ms Mu lat lon
D = diag(X'*X)*ones(1,N) + ones(N,1)*diag(X'*X)' - 2*X'*X;

% ORDINAL MDS - vectorize D
count = 1;
M = (N*(N-1))/2;
f = zeros(M,1);
for j=2:N
    for i=1:j-1
        f(count) = D(i,j);
        count = count + 1;
    end
end


% sorted is f(idx)
[sorted idx] = sort(f);
clear D sorted X
f(idx) = ((1:M).^2)/M^2;

% Create ordinal data matrix
O = zeros(N,N);
count = 1;
for j=2:N
    for i=1:j-1
        O(i,j) = f(count);
        O(j,i) = f(count);
        count = count+1;
    end
end
clear f idx

Vn = sparse([-ones(1,N-1); eye(N-1)]);
VOV = (-Vn'*O*Vn)/2;
clear O Vn
pack

[evec evals flag] = eigs(VOV, speye(size(VOV)), 10, 'LR');
if flag, disp('convergence problem'), return, end;
evals = real(diag(evals));

Xs = [zeros(3,1) diag(sqrt(evals(1:3)))*evec(:,1:3)'];

warning off;  Xsplot = zeros(3,lengthNaN)*(0/0);  warning on;
Xsplot(:,id) = Xs;

figure(2)
% plot map found via Ordinal MDS.
plot3(Xsplot(1,:), Xsplot(2,:), Xsplot(3,:))
axis equal, axis off


F.4 Rank reduction subroutine, RRf()

(§4.3.1.0.1)

% Rank Reduction function -Jon Dattorro
% Inputs are:
%   Xstar matrix,
%   affine equality constraint matrix A whose rows are in svec format.
%
% Tolerance scheme needs revision...
function X = RRf(Xstar,A);
rand('seed',23);
m = size(A,1);
n = size(Xstar,1);
if size(Xstar,1)~=size(Xstar,2)
    disp('Rank Reduction subroutine: Xstar not square'), pause
end
toler = norm(eig(Xstar))*size(Xstar,1)*1e-9;
if sum(chop(eig(Xstar),toler) < 0)


% try to find sparsity pattern for Z_i
tolerance = norm(X,'fro')*size(X,1)*1e-9;
Ztem = zeros(rho,rho);
pattern = find(chop(cumu,tolerance)==0);
if isempty(pattern)   % if no sparsity, do random projection
   ranp = svect(2*(rand(rho,rho)-0.5));
   Z(1:rho,1:rho,i) ...
      = svectinv((eye(rho*(rho+1)/2)-pinv(svectRAR)*svectRAR)*ranp);
else
   disp('sparsity pattern found')
   Ztem(pattern) = 1;
   Z(1:rho,1:rho,i) = Ztem;
end

phiZ = 1;
toler = norm(eig(Z(1:rho,1:rho,i)))*rho*1e-9;
if sum(chop(eig(Z(1:rho,1:rho,i)),toler) < 0)   % (test for negative eigenvalues; source text after "<" lost in this copy)
% ... (remainder of the subroutine lost in this copy)
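A hypothetical calling sketch for RRf(), under the input conventions in its header comment (each row of A is a constraint matrix in svec format; it relies on svect() defined next; all data here invented):

% Hypothetical calling sketch for RRf() (requires svect() below).
n = 4;
U = randn(n,3);  Xstar = U*U';   % a feasible rank-3 positive semidefinite matrix
A = svect(eye(n))';              % one constraint row: fixes trace(X)
X = RRf(Xstar,A);                % candidate of reduced rank satisfying the constraint
[rank(Xstar)  rank(X)]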


F.4.1 svect()

% Map from symmetric matrix to vector
% -Jon Dattorro
function y = svect(Y,N)
if nargin == 1
   N = size(Y,1);
end
y = zeros(N*(N+1)/2,1);
count = 1;
for j=1:N
   for i=1:j
      if i~=j
         y(count) = sqrt(2)*Y(i,j);
      else
         y(count) = Y(i,j);
      end
      count = count + 1;
   end
end


F.4.2 svectinv()

% convert vector into symmetric matrix. m is dim of matrix.
% -Jon Dattorro
function A = svectinv(y)
m = round((sqrt(8*length(y)+1)-1)/2);
if length(y) ~= m*(m+1)/2
   disp('dimension error in svectinv()');
   pause
end
A = zeros(m,m);
count = 1;
for j=1:m
   for i=1:m
      if i <= j   % (loop body reconstructed to invert svect(); source text after "<" lost in this copy)
         if i == j
            A(i,i) = y(count);
         else
            A(i,j) = y(count)/sqrt(2);
            A(j,i) = A(i,j);
         end
         count = count + 1;
      end
   end
end
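svect() and svectinv() should be mutual inverses, and the √2 scaling of off-diagonal entries makes svect norm-preserving: the vector 2-norm of svect(Y) equals the Frobenius norm of Y. A quick round-trip check (sketch, hypothetical data):

% Round-trip and isometry check for svect()/svectinv() (sketch).
Y = randn(4);  Y = (Y + Y')/2;        % random symmetric matrix
y = svect(Y);
disp(norm(svectinv(y) - Y,'fro'))     % ~0 : svectinv inverts svect
disp(abs(norm(y) - norm(Y,'fro')))    % ~0 : 2-norm matches Frobenius norm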


F.5 Sturm's procedure

This is a demonstration program that can easily be transformed into a subroutine for decomposing a positive semidefinite matrix X. The procedure provides a nonorthogonal alternative (A.7.5.0.1) to eigen decomposition. The particular decomposition obtained depends on the choice of matrix A.

% Sturm procedure to find dyad-decomposition of X   -Jon Dattorro
clear all
N = 4;
r = 2;
X = 2*(rand(r,N)-0.5);
X = X'*X;
t = null(svect(X)');
A = svectinv(t(:,1));

% Suppose given matrix A is positive semidefinite
%[v,d] = signeig(X);
%d(1,1)=0; d(2,2)=0; d(3,3)=pi;
%A = v*d*v';

tol = 1e-8;
Y = X;
y = zeros(size(X,1),r);
rho = r;
for k=1:r
   [v,d] = signeig(Y);
   v = v*sqrt(chop(d,1e-14));
   viol = 0;
   j = [ ];
   for i=2:rho
      if chop((v(:,1)'*A*v(:,1))*(v(:,i)'*A*v(:,i)),tol) ~= 0
         viol = 1;
      end
      if (v(:,1)'*A*v(:,1))*(v(:,i)'*A*v(:,i)) < 0
         j = i;


         break
      end
   end
   if ~viol
      y(:,k) = v(:,1);
   else
      if isempty(j)
         disp('Sturm procedure taking default j'), j = 2; return
      end   % debug
      alpha = (-2*(v(:,1)'*A*v(:,j)) + sqrt((2*v(:,1)'*A*v(:,j)).^2 ...
              - 4*(v(:,j)'*A*v(:,j))*(v(:,1)'*A*v(:,1))))/(2*(v(:,j)'*A*v(:,j)));
      y(:,k) = (v(:,1) + alpha*v(:,j))/sqrt(1+alpha^2);
      if chop(y(:,k)'*A*y(:,k),tol) ~= 0
         alpha = (-2*(v(:,1)'*A*v(:,j)) - sqrt((2*v(:,1)'*A*v(:,j)).^2 ...
                 - 4*(v(:,j)'*A*v(:,j))*(v(:,1)'*A*v(:,1))))/(2*(v(:,j)'*A*v(:,j)));
         y(:,k) = (v(:,1) + alpha*v(:,j))/sqrt(1+alpha^2);
         if chop(y(:,k)'*A*y(:,k),tol) ~= 0
            disp('Zero problem in Sturm!'), return
         end   % debug
      end
   end
   Y = Y - y(:,k)*y(:,k)';
   rho = rho - 1;
end

z = zeros(size(y));
e = zeros(N,N);
for i=1:r
   z(:,i) = y(:,i)/norm(y(:,i));
   e(i,i) = norm(y(:,i))^2;
end
lam = diag(e);
[junk id] = sort(real(lam));
id = id(length(id):-1:1);
z = [z(:,id(1:r)) null(z')]   % Sturm
e = diag(lam(id))
[v,d] = signeig(X)            % eigenvalue decomposition
X - z*e*z'
traceAX = trace(A*X)
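Sturm's procedure returns dyads that share trace(AX) equally: each y_k satisfies y_kᵀA y_k = tr(AX)/r, which is 0 here because A is built from the nullspace of svect(X)ᵀ. A check one might append to the program above (a sketch reusing its variables y, A, X, r):

% Each dyad carries an equal share of trace(A*X), zero by construction here.
share = zeros(1,r);
for k=1:r
   share(k) = y(:,k)'*A*y(:,k);
end
share          % each entry ~ trace(A*X)/r
trace(A*X)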


F.6 Convex Iteration demonstration

We demonstrate implementation of a rank constraint in a semidefinite Boolean feasibility problem from 4.6.0.0.4. It requires CVX, [117] an intuitive Matlab interface for interior-point method solvers. There are a finite number 2^N ≈ 1E15 (N = 50) of binary vectors x. The feasible set of semidefinite program (683) is the intersection of an elliptope with M = 10 halfspaces in vectorized composite G. Size of the optimal rank-1 solution set is proportional to the positive factor scaling vector b: the smaller that optimal Boolean solution set, the harder this problem is to solve; indeed, it can be made as small as one point. That scale factor, and the initial state of the random number generators making matrix A and vector b, are selected to demonstrate Boolean solution to one instance in a few iterations (a few seconds), whereas sequential binary search takes one hour to test 25.7 million vectors before finding one Boolean solution feasible to nonconvex problem (680). (Other parameters can be selected to realize a reversal of these timings.)

% Discrete optimization problem demo.
% -Jon Dattorro, June 4, 2007
% Find x \in {-1,1}^N such that A*x <= b.   (comment tail reconstructed; text after "<" lost in this copy)
% ... (problem setup lost in this copy: random-state seeding, construction of
%      A (M x N) and b, and initialization of Y, traceGY, count, and tic)


while 1
   cvx_begin   % requires CVX Boyd
      variable X(N,N) symmetric;
      variable x(N,1);
      G = [X, x; x', 1];
      minimize(trace(G*Y));
      diag(X) == 1;
      G == semidefinite(N+1);
      A*x <= b;   % (constraint reconstructed; source text after "<" lost in this copy)
   cvx_end
   % ... (lines lost in this copy: presumably a decomposition of G giving v and d,
   %      and the direction-matrix update of Y)
   rankG = sum(diag(d) > max(diag(d))*1e-8)   % (head of this line reconstructed)
   oldtrace = traceGY;
   traceGY = trace(G*Y)
   if rankG == 1
      break
   end
   if round((oldtrace - traceGY)*1e3) == 0
      disp('STALLED');
      disp(' ');
      Y = -v(:,2:N+1)*(v(:,2:N+1)' + randn(N,1)*v(:,1)');
   end
   count = count + 1;
end
x
count
toc
disp('Ax')   % (tail of this statement and any following lines lost in this copy)


disp(' ');
disp('Combinatorial search for a feasible binary solution:')
tic
for i=1:2^N
   binary = str2num(dec2bin(i-1)');
   binary(find(~binary)) = -1;
   y = [-ones(N-length(binary),1); binary];
   if sum(A*y <= b) == M   % (feasibility test reconstructed; source text after "<" lost in this copy)
      break
   end
end
toc
% ... (any remaining display lines lost in this copy)
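The direction matrix Y realizes convex iteration: when Y is a projector on N eigenvectors of G, trace(GY) equals the sum of the corresponding N eigenvalues, and its minimum over all such projectors is the sum of the N smallest; driving that sum to 0 forces rank G = 1. A sketch of this eigenvalue view (hypothetical G, names local to the sketch):

% trace(G*Y) as a sum of N eigenvalues of G (sketch).
N = 5;
B = randn(N+1);  G = B*B';                  % hypothetical (N+1)x(N+1) PSD matrix
[v,d] = eig(G);  [lam,ix] = sort(diag(d));  v = v(:,ix);
Y = v(:,1:N)*v(:,1:N)';                     % projector on the N smallest eigenvectors
disp(trace(G*Y))                            % equals sum of the N smallest eigenvalues
disp(sum(lam(1:N)))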


F.7 fast max cut

We use rudy, a graph generator (C program) written by Giovanni Rinaldi, [226] which can be found at http://convexoptimization.com/TOOLS/RUDY together with graph data. (4.6.0.0.6)

% fast max cut, Jon Dattorro, July 2007, http://convexoptimization.com
clear all;
format short g; tic
fid = fopen('graphs12','r');
average = 0;
NN = 0;
s = fgets(fid);
cvx_precision([1e-12, 1e-4]);
cvx_quiet(true);
w = 1000;
while s ~= -1
   s = str2num(s);
   N = s(1);
   A = zeros(N);
   for i=1:s(2)
      s = str2num(fgets(fid));
      A(s(1),s(2)) = s(3);
      A(s(2),s(1)) = s(3);
   end
   Q = (diag(A*ones(N,1)) - A)/4;
   W = zeros(N);
   traceXW = 1e15;
   while 1
      cvx_begin   % CVX Boyd
         variable X(N,N) symmetric;
         maximize(trace(Q*X) - w*trace(W*X));
         X == semidefinite(N);
         diag(X) == 1;
      cvx_end
      [v,d,q] = svd(X);
      W = v(:,2:N)*v(:,2:N)';
      rankX = sum(diag(d) > max(diag(d))*1e-8)


      end
      oldtrace = traceXW;
      traceXW = trace(X*W)
      if (rankX == 1)
         break
      end
      if round((oldtrace - traceXW)*1e3)
      % ... (stall test and its handling lost after "<" in this copy)
   end
   % ... (head of the binary rounding search lost in this copy; its surviving tail
   %      keeps the best cut found:)
         if y'*Q*y > maxim
            maxim = y'*Q*y;
            ymax = y;
         end
      end
   if (maxim == 0) && (abs(trace(Q*X))
   % ... (remainder of the program lost in this copy)
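Q = (δ(A1) − A)/4 above is one quarter the graph Laplacian, so for any cut vector x ∈ {−1,1}^N the objective xᵀQx equals the total weight of edges crossing the cut. A toy check (sketch, hypothetical data):

% x'*Q*x returns the cut weight for x in {-1,1}^N (sketch).
A = [0 1 2; 1 0 3; 2 3 0];      % weighted adjacency of a 3-node graph
Q = (diag(A*ones(3,1)) - A)/4;
x = [1; 1; -1];                 % cut isolating node 3
disp(x'*Q*x)                    % = A(1,3) + A(2,3) = 5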


F.8 Signal dropout problem (4.5.1.0.2)

Requires CVX. [117]

% signal dropout problem   -Jon Dattorro
clear all
close all
N = 500;
k = 4;               % cardinality
Phi = dctmtx(N)';    % DCT basis
successes = 0;
total = 0;
logNormOfDifference_rate = 0;
SNR_rate = 0;
for i=1:1
   close all
   SNR = 0;
   na = .02;
   SNR10 = 10;
   while SNR   % ... (loop condition after "<" and the remainder of this page lost
               %      in this copy: construction of the k-sparse signal s with spike
               %      locations p, additive noise, and the dropout interval endpoints
               %      ll and ul used below)


while 1
   cvx_begin
      variable x(N);
      f = Phi*x;
      minimize(x'*y + norm([f(1:ll)-(s(1:ll)+noise(1:ll)); ...
                            f(ul:N)-(s(ul:N)+noise(ul:N))]));
      x >= 0;
   cvx_end
   end
   if ~(strcmp(cvx_status,'Solved') || ...
        strcmp(cvx_status,'Inaccurate/Solved'))
      disp('cit failed')
      return
   end
   cvx_begin
      variable y(N);
      minimize(x'*y);
      0 <= y;   % ... (remaining constraints and the rest of this page lost after "<"
                %      in this copy; likely y <= 1 and 1'*y == N-k, per 4.5.1.0.2)


count
figure(1);
plot([s(1:ll)+noise(1:ll); noise(ll+1:ul-1); s(ul:N)+noise(ul:N)]);
hold on; plot([s(1:ll)+noise(1:ll); zeros(ul-ll-1,1); s(ul:N)+noise(ul:N)],'k')
V = axis;
figure(2);
plot(f,'r');
hold on; plot([s(1:ll)+noise(1:ll); noise(ll+1:ul-1); s(ul:N)+noise(ul:N)]);
axis(V);
logNormOfDifference = 20*log10(norm(f-s)/norm(s))
SNR
figure(3);
plot(f,'r'); hold on; plot(s);
axis(V);
figure(4);
plot(f-s);
axis(V);
temp = sort([find(chop(x,max(x)*1e-8)); zeros(k-cardx,1)]) ...
       - sort(p(1:k))'
if ~sum(temp)
   successes = successes + 1;
end
total = total + 1
success_avg = successes/total
logNormOfDifference_rate = logNormOfDifference_rate ...
                           + logNormOfDifference;
SNR_rate = SNR_rate + SNR;
logNormOfDifferenceAvg = logNormOfDifference_rate/total
SNR_avg = SNR_rate/total, disp(' ')
end
successes
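The direction-vector subproblem in y has a closed form: assuming the constraints from 4.5.1.0.2 (0 ≤ y ≤ 1 with 1ᵀy = N−k, partly lost in the listing above), minimizing xᵀy over nonnegative x places ones on the N−k smallest entries of x, so xᵀy sums those entries and vanishes exactly when card x ≤ k. A direct construction (sketch, hypothetical data):

% Closed-form direction vector: ones marking the N-k smallest entries (sketch).
N = 8;  k = 3;
x = abs(randn(N,1));
[xsort,ix] = sort(x);               % ascending
y = zeros(N,1);  y(ix(1:N-k)) = 1;  % optimal y for the linear program in y
disp(x'*y - sum(xsort(1:N-k)))      % 0 : objective equals sum of N-k smallest entries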


Appendix G

Notation and a few definitions

b   vector, scalar, or logical condition (italic abcdefghijklmnopqrstuvwxyz)
b_i   i-th entry of vector b = [b_i, i=1...n], or i-th b vector from a set or list {b_j, j=1...n}, or i-th iterate of vector b
b_{i:j}   or b(i:j), truncated vector comprising i-th through j-th entry of vector b
b_k(i:j)   truncated vector comprising i-th through j-th entry of vector b_k
b^T   vector transpose
b^H   Hermitian (conjugate) transpose
A^{-2T}   matrix transpose of squared inverse
A^{T1}   first of various transpositions of a cubix or quartix A
A   matrix, scalar, or logical condition (italic ABCDEFGHIJKLMNOPQRSTUVWXYZ)
skinny   a skinny matrix; meaning, more rows than columns. When there are more equations than unknowns, we say that the system Ax = b is overdetermined. [110, 5.3]
fat   a fat matrix; meaning, more columns than rows: underdetermined


A   some set (calligraphic ABCDEFGHIJKLMNOPQRSTUVWXYZ)
F(C ∋ A)   smallest face (139) that contains element A of set C
G(K)   generators (2.8.1.2) of set K; any collection of points and directions whose hull constructs K
A^{-1}   inverse of matrix A
A^†   Moore-Penrose pseudoinverse of matrix A
√   positive square root
A^{1/2} and √A   A^{1/2} is any matrix such that A^{1/2}A^{1/2} = A. For A ∈ S^n_+, √A ∈ S^n_+ is unique and √A √A = A. [41, 1.2] (A.5.2.1)
◦√D   ≜ [√d_ij] (1179). Hadamard positive square root: D = ◦√D ∘ ◦√D.
E   elementary matrix
E_ij   member of standard orthonormal basis for symmetric (50) or symmetric hollow (65) matrices
A_ij   ij-th entry of matrix A
A_i   i-th matrix from a set
D_i   i-th principal submatrix or i-th iterate of D
A(:,i)   i-th column of matrix A [110, 1.1.8]
A(j,:)   j-th row of matrix A
A_{i:j,k:l}   or A(i:j, k:l), submatrix taken from i-th through j-th row and k-th through l-th column
e.g.   exempli gratia, from the Latin meaning for sake of example
no.   number, from the Latin numero
a.i.   affinely independent
c.i.   conically independent


l.i.   linearly independent
w.r.t   with respect to
a.k.a   also known as
Re   real part
Im   imaginary part
ı or j   √−1
⊂ ⊃ ∩ ∪   standard set theory: subset, superset, intersection, union
∈   membership, element belongs to, or element is a member of
∋   membership, contains as in C ∋ y (C contains element y)
∍   such that
∃   there exists
∴   therefore
∀   for all, or over all
∝   proportional to
∞   infinity
≡   equivalent to
≜   defined equal to, or equal by definition
≈   approximately equal to
≃   isomorphic to or with
≅   congruent to or with
x/y   Hadamard quotient: for x,y ∈ R^n, x/y ≜ [x_i/y_i, i=1...n] ∈ R^n
∘   Hadamard product of matrices: x ∘ y ≜ [x_i y_i, i=1...n] ∈ R^n


⊗   Kronecker product of matrices (D.1.2.1)
⊕   vector sum of sets: X = Y ⊕ Z where every element x ∈ X has unique expression x = y + z with y ∈ Y and z ∈ Z; [232, p.19] then the summands are algebraic complements. X = Y ⊕ Z ⇒ X = Y + Z. Now assume Y and Z are nontrivial subspaces. X = Y + Z ⇒ X = Y ⊕ Z ⇔ Y ∩ Z = 0 [233, 1.2] [73, 5.8]. Each element from a vector sum (+) of subspaces has a unique representation (⊕) when a basis from each subspace is linearly independent of bases from all the other subspaces.
⊖   likewise, the vector difference of sets
⊞   orthogonal vector sum of sets: X = Y ⊞ Z where every element x ∈ X has unique orthogonal expression x = y + z with y ∈ Y, z ∈ Z, and y ⊥ z. [249, p.51] X = Y ⊞ Z ⇒ X = Y + Z. If Z ⊆ Y^⊥ then X = Y ⊞ Z ⇔ X = Y ⊕ Z. [73, 5.8] If Z = Y^⊥ then the summands are orthogonal complements.
±   plus or minus
⊥   as in A ⊥ B, meaning A is orthogonal to B (and vice versa), where A and B are sets, vectors, or matrices. When A and B are vectors (or matrices under Frobenius norm), A ⊥ B ⇔ ⟨A,B⟩ = 0 ⇔ ‖A + B‖² = ‖A‖² + ‖B‖²
\   as in \A means logical not A, or relative complement of set A; id est, \A = {x ∉ A}; e.g., B\A ≜ {x ∈ B | x ∉ A} ≡ B ∩ \A
⇒ or ⇐   sufficiency or necessity, implies; e.g., A ⇒ B ⇔ \A ⇐ \B
⇔   if and only if (iff), or corresponds to, or necessary and sufficient, or the same as
is   as in A is B means A ⇒ B; conventional usage of English language by mathematicians
⇏   does not imply
←   is replaced with; substitution, assignment
→   goes to, or approaches, or maps to


t → 0⁺   t goes to 0 from above; meaning, from the positive [148, p.2]
:   as in f : R^n → R^m meaning f is a mapping, or a sequence of successive integers specified by bounds as in i:j (if j < i then the sequence is descending)
f : M → R   meaning f is a mapping from ambient space M to ambient R, not necessarily denoting either domain or range
|   as in f(x) | x ∈ C means with the condition(s) or such that or evaluated for, or as in {f(x) | x ∈ C} means evaluated for each and every x belonging to set C
g|_{x_p}   expression g evaluated at x_p
A, B   as in, for example, A, B ∈ S^N means A ∈ S^N and B ∈ S^N
(A, B)   open interval between A and B in R, or variable pair perhaps of disparate dimension
[A, B]   closed interval or line segment between A and B in R
( )   hierarchal, parenthetical, optional
{ }   curly braces denote a set or list; e.g., {Xa | a ≽ 0} is the set of all Xa for each and every a ≽ 0 where membership of a to some space is implicit, a union
⟨ ⟩   angle brackets denote vector inner-product (26) (31)
[ ]   matrix or vector, or quote insertion, or citation
[d_ij]   matrix whose ij-th entry is d_ij
[x_i]   vector whose i-th entry is x_i
x_p   particular value of x
x_0   particular instance of x, or initial value of a sequence x_i
x_1   first entry of vector x, or first element of a set or list {x_i}
x_ε   extreme point


x⁺   vector x whose negative entries are replaced with 0; or clipped vector x, or nonnegative part of x
x⋆   optimal value of variable x
x*   complex conjugate or dual variable
f*   convex conjugate function
P_C x or Px   projection of point x on set C; P is operator or idempotent matrix
P_k x   projection of point x on set C_k or on range of implicit vector
δ(A)   (A.1) vector made from the main diagonal of A if A is a matrix; otherwise, diagonal matrix made from vector A
δ²(A)   ≡ δ(δ(A)). For vector or diagonal matrix Λ, δ²(Λ) = Λ
δ(A)²   = δ(A)δ(A) where A is a vector
λ_i(X)   i-th entry of vector λ which is a function of X
λ(X)_i   i-th entry of vector-valued function of X
λ(A)   vector of eigenvalues of matrix A (1291), typically arranged in nonincreasing order
σ(A)   vector of singular values of matrix A (always arranged in nonincreasing order), or support function
Σ   diagonal matrix of singular values, not necessarily square
∑   sum
π(γ)   nonlinear permutation operator (or presorting function) arranges vector γ into nonincreasing order (7.1.3)
Ξ   permutation matrix
Π   doublet or permutation matrix
∏   product


ψ(Z)   signum-like step function that returns a scalar for matrix argument (605); it returns a vector for vector argument (1391)
D   symmetric hollow matrix of distance-square, or Euclidean distance matrix
D   Euclidean distance matrix operator
D^T(X)   adjoint operator
D(X)^T   transpose of D(X)
D^{-1}(X)   inverse operator
D(X)^{-1}   inverse of D(X)
D⋆   optimal value of variable D
D*   dual to variable D
D°   polar variable D
∂   partial derivative, or matrix of distance-square squared, or as in ∂K, boundary of set K
∂y   partial differential of y
√d_ij   (absolute) distance scalar
d_ij   distance-square scalar, EDM entry
V   geometric centering operator, V(D) = −V D V · ½
V_N   V_N(D) = −V_N^T D V_N
V   N×N symmetric elementary, auxiliary, projector, geometric centering matrix; R(V) = N(1^T), N(V) = R(1), V² = V (B.4.1)
V_N   N×(N−1) Schoenberg auxiliary matrix; R(V_N) = N(1^T), N(V_N^T) = R(1) (B.4.2)
V_X   V_X V_X^T ≡ V^T X^T X V (1022)


X   point list (having cardinality N) arranged columnar in R^{n×N}, or set of generators, or extreme directions, or matrix variable
G   Gram matrix X^T X
r   affine dimension
n   Euclidean dimension of list X, or integer
N   cardinality of list X, or integer
dom   function domain
on   function f(x) on A means A is dom f; or projection of x on A means A is the Euclidean body on which projection of x is made
onto   function f(x) maps onto M means f over its domain is a surjection with respect to M
epi   function epigraph
span   as in span A = R(A) = {Ax | x ∈ R^n} when A is a matrix
R(A)   the subspace: range of A, or span basis R(A); R(A) ⊥ N(A^T)
basis R(A)   columnar basis for range of A, or a minimal set constituting generators for the vertex-description of R(A), or a linearly independent set of vectors spanning R(A)
N(A)   the subspace: nullspace of A; N(A) ⊥ R(A^T)
×   Cartesian product
[R^m; R^n]   R^m × R^n = R^{m+n}
R^{m×n}   Euclidean vector space of m by n dimensional real matrices
R^n   Euclidean n-dimensional real vector space (nonnegative integer n)
C^n or C^{n×n}   Euclidean complex vector space of respective dimension n and n×n
R^n_+ or R^{n×n}_+   nonnegative orthant in Euclidean vector space of respective dimension n and n×n


R^n_− or R^{n×n}_−   nonpositive orthant in Euclidean vector space of respective dimension n and n×n
S^n   subspace comprising all (real) symmetric n×n matrices, the symmetric matrix subspace
S^{n⊥}   orthogonal complement of S^n in R^{n×n}, the antisymmetric matrices
S^n_+   convex cone comprising all (real) symmetric positive semidefinite n×n matrices, the positive semidefinite cone
int S^n_+   interior of convex cone comprising all (real) symmetric positive semidefinite n×n matrices; id est, positive definite matrices
S^n_+(ρ)   convex set of all positive semidefinite n×n matrices whose rank equals or exceeds ρ
EDM^N   cone of N×N Euclidean distance matrices in the symmetric hollow subspace
√EDM^N   nonconvex cone of N×N Euclidean absolute distance matrices in the symmetric hollow subspace
PSD   positive semidefinite
SDP   semidefinite program
SVD   singular value decomposition
SNR   signal to noise ratio
dB   decibel
EDM   Euclidean distance matrix
S^n_1   subspace comprising all symmetric n×n matrices having all zeros in first row and column (1796)
S^n_h   subspace comprising all symmetric hollow n×n matrices (0 main diagonal), the symmetric hollow subspace (57)
S^{n⊥}_h   orthogonal complement of S^n_h in S^n (58), the set of all diagonal matrices


S^n_c   subspace comprising all geometrically centered symmetric n×n matrices; geometric center subspace S^N_c ≜ {Y ∈ S^N | Y1 = 0} (1792)
S^{n⊥}_c   orthogonal complement of S^n_c in S^n (1794)
R^{m×n}_c   subspace comprising all geometrically centered m×n matrices
X^⊥   basis N(X^T)
x^⊥   N(x^T), {y | x^T y = 0}
R(P)^⊥   N(P^T)
R^⊥   orthogonal complement of R ⊆ R^n; R^⊥ ≜ {y ∈ R^n | ⟨x,y⟩ = 0 ∀x ∈ R}
K^⊥   normal cone
K   cone
K*   dual cone
K°   polar cone; K* = −K°
K_{M+}   monotone nonnegative cone
K_M   monotone cone
K_λ   spectral cone
K*_{λδ}   cone of majorization
H   halfspace
H_−   halfspace described using an outward-normal (87) to the hyperplane partially bounding it
H_+   halfspace described using an inward-normal (88) to the hyperplane partially bounding it
∂H   hyperplane; id est, partial boundary of halfspace
∂H   supporting hyperplane


∂H_−   a supporting hyperplane having outward-normal with respect to the set it supports
∂H_+   a supporting hyperplane having inward-normal with respect to the set it supports
d   vector of distance-square
d̲_ij   lower bound on distance-square d_ij
d̄_ij   upper bound on distance-square d_ij
AB   closed line segment between points A and B
AB   matrix multiplication of A and B
C̄   closure of set C
decomposition   orthonormal (1706) page 606; biorthogonal (1683) page 599
expansion   orthogonal (1716) page 608; biorthogonal (345) page 164
vector   column vector in R^n
cubix   member of R^{M×N×L}
quartix   member of R^{M×N×L×K}
feasible set   most simply, the set of all variable values satisfying all constraints of an optimization problem
solution set   most simply, the set of all optimal solutions to an optimization problem; a subset of the feasible set and not necessarily a single point
natural order   with reference to stacking columns in a vectorization, means a vector made from superposing column 1 on top of column 2 then superposing the result on column 3 and so on; as in a vector made from entries of the main diagonal, δ(A) means taken from left to right and top to bottom
tight   with reference to a bound, means a bound that can be met; with reference to an inequality, means equality is achievable


g′   first derivative of possibly multidimensional function with respect to real argument
g″   second derivative with respect to real argument
→Y dg   first directional derivative of possibly multidimensional function g in direction Y ∈ R^{K×L} (maintains dimensions of g)
→Y dg²   second directional derivative of g in direction Y
∇   gradient from calculus: ∇f is shorthand for ∇_x f(x); ∇f(y) means ∇_y f(y) or gradient ∇_x f(y) of f(x) with respect to x evaluated at y; ∇² is second-order gradient
∆   distance scalar, or infinitesimal difference operator, or diagonal matrix
△_ijk   triangle made by vertices i, j, and k
I   Roman numeral
I   identity matrix
I (calligraphic)   index set, a discrete set of indices
∅   empty set, an implicit member of every set
0   real zero
0   origin, or vector or matrix of zeros
O   sort-index matrix, or order of magnitude of information required, or computational intensity: O(N) is first order, O(N²) is second, and so on
1   real one
1   vector of ones
e_i   vector whose i-th entry is 1 (otherwise 0), or i-th member of the standard basis for R^n (51)
max   maximum [148, 0.1.1] or largest element of a totally ordered set


maximize_x   find the maximum of a function with respect to variable x
arg   argument of operator or function, or variable of optimization
sup X   supremum of totally ordered set X, least upper bound; may or may not belong to the set [148, 0.1.1]
arg sup f(x)   argument x at supremum of function f; not necessarily unique or a member of function domain
subject to   specifies constraints to an optimization problem
min   minimum [148, 0.1.1] or smallest element of a totally ordered set
minimize_x   find the function minimum with respect to variable x
inf X   infimum of totally ordered set X, greatest lower bound; may or may not belong to the set [148, 0.1.1]
arg inf f(x)   argument x at infimum of function f; not necessarily unique or a member of function domain
iff   if and only if, necessary and sufficient; also the meaning indiscriminately attached to appearance of the word "if" in the statement of a mathematical definition, an esoteric practice worthy of abolition because of the ambiguity thus conferred
rel   relative
int   interior
lim   limit
sgn   signum function or sign
round   round to nearest integer
mod   modulus function
tr   matrix trace
rank   as in rank A, rank of matrix A; dim R(A)


dim   dimension; dim R^n = n, dim(x ∈ R^n) = n, dim R(x ∈ R^n) = 1, dim R(A ∈ R^{m×n}) = rank(A)
aff   affine hull
dim aff   affine dimension
card   cardinality; card x ≜ ‖x‖_0
conv   convex hull
cenv   convex envelope
cone   conic hull
content   content of high-dimensional bounded polyhedron, volume in 3 dimensions, area in 2, and so on
cof   matrix of cofactors corresponding to matrix argument
dist   distance between point or set arguments
vec   vectorization of m×n matrix, Euclidean dimension mn (30)
svec   vectorization of symmetric n×n matrix, Euclidean dimension n(n+1)/2 (47)
dvec   vectorization of symmetric hollow n×n matrix, Euclidean dimension n(n−1)/2 (64)
(x,y)   angle between vectors x and y, or dihedral angle between affine subsets
≽   generalized inequality; e.g., A ≽ 0 means vector or matrix A must be expressible in a biorthogonal expansion having nonnegative coordinates with respect to extreme directions of some implicit pointed closed convex cone K, or comparison to the origin with respect to some implicit pointed closed convex cone; or (when K = S^n_+) matrix A belongs to the positive semidefinite cone of symmetric matrices (2.9.0.1), or (when K = R^n_+) vector A belongs to the nonnegative orthant (each vector entry is nonnegative, 2.3.1.1)
≻   strict generalized inequality


⊁   not positive definite
≥   scalar inequality, greater than or equal to; comparison of scalars, or entrywise comparison of vectors or matrices with respect to R_+
nonnegative   for α ∈ R^n, α ≽ 0
>   greater than
positive   for α ∈ R^n, α ≻ 0; id est, no zero entries when with respect to the nonnegative orthant, no vectors on boundary with respect to pointed closed convex cone K
⌊ ⌋   floor function; ⌊x⌋ is greatest integer not exceeding x
| |   entrywise absolute value of scalars, vectors, and matrices
det   matrix determinant
‖x‖   vector 2-norm or Euclidean norm ‖x‖_2
‖x‖_ℓ = (∑_{j=1}^n |x_j|^ℓ)^{1/ℓ}   vector ℓ-norm
‖x‖_∞ = max{|x_j| ∀j}   infinity-norm
‖x‖_2² = x^T x = ⟨x, x⟩
‖x‖_1 = 1^T|x|   1-norm, dual infinity-norm
‖x‖_0   0-norm, cardinality of vector x; card x ≡ ‖x‖_0, 0⁰ ≜ 0
‖X‖_2 = sup_{‖a‖=1} ‖Xa‖_2 = σ_1 = √λ(X^T X)_1   largest singular value, matrix 2-norm (spectral norm); ‖δ(x)‖_2 = ‖x‖_∞
‖X‖ = ‖X‖_F   Frobenius' matrix norm




Bibliography

[1] Suliman Al-Homidan and Henry Wolkowicz. Approximate and exact completion problems for Euclidean distance matrices using semidefinite programming, April 2004. orion.math.uwaterloo.ca/~hwolkowi/henry/reports/edmapr04.ps

[2] Abdo Y. Alfakih. On the uniqueness of Euclidean distance matrix completions. Linear Algebra and its Applications, 370:1–14, 2003.

[3] Abdo Y. Alfakih. On the uniqueness of Euclidean distance matrix completions: the case of points in general position. Linear Algebra and its Applications, 397:265–277, 2005.

[4] Abdo Y. Alfakih, Amir Khandani, and Henry Wolkowicz. Solving Euclidean distance matrix completion problems via semidefinite programming. Computational Optimization and Applications, 12(1):13–30, January 1999. http://citeseer.ist.psu.edu/alfakih97solving.html

[5] Abdo Y. Alfakih and Henry Wolkowicz. On the embeddability of weighted graphs in Euclidean spaces. Research Report CORR 98-12, Department of Combinatorics and Optimization, University of Waterloo, May 1998. http://citeseer.ist.psu.edu/alfakih98embeddability.html Erratum: p.348 herein.

[6] Abdo Y. Alfakih and Henry Wolkowicz. Matrix completion problems. In Henry Wolkowicz, Romesh Saigal, and Lieven Vandenberghe, editors, Handbook of Semidefinite Programming: Theory, Algorithms, and Applications, chapter 18. Kluwer, 2000.


[7] Abdo Y. Alfakih and Henry Wolkowicz. Two theorems on Euclidean distance matrices and Gale transform. Linear Algebra and its Applications, 340:149–154, 2002.

[8] Farid Alizadeh. Combinatorial Optimization with Interior Point Methods and Semi-Definite Matrices. PhD thesis, University of Minnesota, Minneapolis, Computer Science Department, October 1991.

[9] Farid Alizadeh. Interior point methods in semidefinite programming with applications to combinatorial optimization. SIAM Journal on Optimization, 5(1):13–51, February 1995.

[10] Kurt Anstreicher and Henry Wolkowicz. On Lagrangian relaxation of quadratic matrix constraints. SIAM Journal on Matrix Analysis and Applications, 22(1):41–55, 2000.

[11] Howard Anton. Elementary Linear Algebra. Wiley, second edition, 1977.

[12] James Aspnes, David Goldenberg, and Yang Richard Yang. On the computational complexity of sensor network localization. In Proceedings of the First International Workshop on Algorithmic Aspects of Wireless Sensor Networks: ALGOSENSORS 2004, volume 3121 of Lecture Notes in Computer Science, pages 32–44, Turku, Finland, July 2004. Springer-Verlag. cs-www.cs.yale.edu/homes/aspnes/localization-abstract.html

[13] D. Avis and K. Fukuda. A pivoting algorithm for convex hulls and vertex enumeration of arrangements and polyhedra. Discrete and Computational Geometry, 8:295–313, 1992.

[14] Christine Bachoc and Frank Vallentin. New upper bounds for kissing numbers from semidefinite programming. ArXiv.org, October 2006. http://arxiv.org/abs/math/0608426

[15] Mihály Bakonyi and Charles R. Johnson. The Euclidean distance matrix completion problem. SIAM Journal on Matrix Analysis and Applications, 16(2):646–654, April 1995.


[16] Keith Ball. An elementary introduction to modern convex geometry. In Silvio Levy, editor, Flavors of Geometry, volume 31, chapter 1, pages 1–58. MSRI Publications, 1997. www.msri.org/publications/books/Book31/files/ball.pdf

[17] George Phillip Barker. Theory of cones. Linear Algebra and its Applications, 39:263–291, 1981.

[18] George Phillip Barker and David Carlson. Cones of diagonally dominant matrices. Pacific Journal of Mathematics, 57(1):15–32, 1975.

[19] George Phillip Barker and James Foran. Self-dual cones in Euclidean spaces. Linear Algebra and its Applications, 13:147–155, 1976.

[20] Alexander Barvinok. A Course in Convexity. American Mathematical Society, 2002.

[21] Alexander I. Barvinok. Problems of distance geometry and convex properties of quadratic maps. Discrete & Computational Geometry, 13(2):189–202, 1995.

[22] Alexander I. Barvinok. A remark on the rank of positive semidefinite matrices subject to affine constraints. Discrete & Computational Geometry, 25(1):23–31, 2001. http://citeseer.ist.psu.edu/304448.html

[23] Heinz H. Bauschke and Jonathan M. Borwein. On projection algorithms for solving convex feasibility problems. SIAM Review, 38(3):367–426, September 1996.

[24] Steven R. Bell. The Cauchy Transform, Potential Theory, and Conformal Mapping. CRC Press, 1992.

[25] Jean Bellissard and Bruno Iochum. Homogeneous and facially homogeneous self-dual cones. Linear Algebra and its Applications, 19:1–16, 1978.

[26] Adi Ben-Israel. Linear equations and inequalities on finite dimensional, real or complex, vector spaces: A unified theory. Journal of Mathematical Analysis and Applications, 27:367–389, 1969.


[27] Aharon Ben-Tal and Arkadi Nemirovski. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. SIAM, 2001.

[28] Abraham Berman. Cones, Matrices, and Mathematical Programming, volume 79 of Lecture Notes in Economics and Mathematical Systems. Springer-Verlag, 1973.

[29] Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, second edition, 1999.

[30] Dimitri P. Bertsekas, Angelia Nedić, and Asuman E. Ozdaglar. Convex Analysis and Optimization. Athena Scientific, 2003.

[31] Rajendra Bhatia. Matrix Analysis. Springer-Verlag, 1997.

[32] Pratik Biswas, Tzu-Chen Liang, Kim-Chuan Toh, Yinyu Ye, and Ta-Chung Wang. Semidefinite programming approaches for sensor network localization with noisy distance measurements. IEEE Transactions on Automation Science and Engineering, 3(4):360–371, October 2006.

[33] Pratik Biswas, Tzu-Chen Liang, Ta-Chung Wang, and Yinyu Ye. Semidefinite programming based algorithms for sensor network localization, 2005. http://www.stanford.edu/~yyye/combined rev3.pdf

[34] Pratik Biswas, Kim-Chuan Toh, and Yinyu Ye. A distributed SDP approach for large-scale noisy anchor-free graph realization with applications to molecular conformation, February 2007. http://www.convexoptimization.com/TOOLS/distmolecule.pdf

[35] Pratik Biswas and Yinyu Ye. Semidefinite programming for ad hoc wireless sensor network localization, September 2003. http://www.stanford.edu/~yyye/adhocn2.pdf

[36] Leonard M. Blumenthal. On the four-point property. Bulletin of the American Mathematical Society, 39:423–426, 1933.

[37] Leonard M. Blumenthal. Theory and Applications of Distance Geometry. Oxford University Press, 1953.


[38] A. W. Bojanczyk and A. Lutoborski. The Procrustes problem for orthogonal Stiefel matrices. SIAM Journal on Scientific Computing, 21(4):1291–1304, February 1998. Date is electronic publication: http://citeseer.ist.psu.edu/500185.html

[39] Ingwer Borg and Patrick Groenen. Modern Multidimensional Scaling. Springer-Verlag, 1997.

[40] Jonathan M. Borwein and Heinz Bauschke. Projection algorithms and monotone operators, 1998. oldweb.cecm.sfu.ca/personal/jborwein/projections4.pdf

[41] Jonathan M. Borwein and Adrian S. Lewis. Convex Analysis and Nonlinear Optimization: Theory and Examples. Springer-Verlag, 2000.

[42] Richard Bouldin. The pseudo-inverse of a product. SIAM Journal on Applied Mathematics, 24(4):489–495, June 1973. http://www.convexoptimization.com/TOOLS/Pseudoinverse.pdf

[43] Stephen Boyd and Jon Dattorro. Alternating projections, 2003. http://www.stanford.edu/class/ee392o/alt proj.pdf

[44] Stephen Boyd, Laurent El Ghaoui, Eric Feron, and Venkataramanan Balakrishnan. Linear Matrix Inequalities in System and Control Theory. SIAM, 1994.

[45] Stephen Boyd, Seung-Jean Kim, Lieven Vandenberghe, and Arash Hassibi. A tutorial on geometric programming. Optimization and Engineering, 8(1):1389–4420, March 2007. http://www.stanford.edu/~boyd/reports/gp tutorial.pdf

[46] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004. http://www.stanford.edu/~boyd/cvxbook

[47] James P. Boyle and Richard L. Dykstra. A method for finding projections onto the intersection of convex sets in Hilbert spaces. In R. Dykstra, T. Robertson, and F. T. Wright, editors, Advances in Order Restricted Statistical Inference, pages 28–47. Springer-Verlag, 1986.


[48] Lev M. Brègman. The method of successive projection for finding a common point of convex sets. Soviet Mathematics, 162(3):487–490, 1965. AMS translation of Doklady Akademii Nauk SSSR, 6:688-692.

[49] Lev M. Brègman, Yair Censor, Simeon Reich, and Yael Zepkowitz-Malachi. Finding the projection of a point onto the intersection of convex sets via projections onto halfspaces, 2003. http://www.optimization-online.org/DB FILE/2003/06/669.pdf

[50] Mike Brookes. Matrix reference manual: Matrix calculus, 2002. http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/intro.html

[51] Michael Carter, Holly Jin, Michael Saunders, and Yinyu Ye. SPASELOC: An adaptive subproblem algorithm for scalable wireless sensor network localization. SIAM Journal on Optimization, 17(4):1102–1128, 2006. http://www.convexoptimization.com/TOOLS/JinSIAM.pdf Erratum: p.259 herein.

[52] Lawrence Cayton and Sanjoy Dasgupta. Robust Euclidean embedding. In Proceedings of the 23rd International Conference on Machine Learning - ICML 2006, Pittsburgh, Pennsylvania USA, 2006. icml2006.org/icml documents/camera-ready/022 Robust Euclidean Emb.pdf

[53] Yves Chabrillac and Jean-Pierre Crouzeix. Definiteness and semidefiniteness of quadratic forms revisited. Linear Algebra and its Applications, 63:283–292, 1984.

[54] Manmohan K. Chandraker, Sameer Agarwal, Fredrik Kahl, David Nistér, and David J. Kriegman. Autocalibration via rank-constrained estimation of the absolute quadric. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 2007. http://vision.ucsd.edu/~manu/cvpr07 quadric.pdf

[55] Chi-Tsong Chen. Linear System Theory and Design. Oxford University Press, 1999.

[56] Ward Cheney and Allen A. Goldstein. Proximity maps for convex sets. Proceedings of the American Mathematical Society, 10:448–450, 1959.


[57] Steven Chu. Autobiography from Les Prix Nobel, 1997. nobelprize.org/physics/laureates/1997/chu-autobio.html

[58] John B. Conway. A Course in Functional Analysis. Springer-Verlag, second edition, 1990.

[59] Richard W. Cottle, Jong-Shi Pang, and Richard E. Stone. The Linear Complementarity Problem. Academic Press, 1992.

[60] G. M. Crippen and T. F. Havel. Distance Geometry and Molecular Conformation. Wiley, 1988.

[61] Frank Critchley. Multidimensional scaling: a critical examination and some new proposals. PhD thesis, University of Oxford, Nuffield College, 1980.

[62] Frank Critchley. On certain linear mappings between inner-product and squared-distance matrices. Linear Algebra and its Applications, 105:91–107, 1988.

[63] Joachim Dahl, Bernard H. Fleury, and Lieven Vandenberghe. Approximate maximum-likelihood estimation using semidefinite programming. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume VI, pages 721–724, April 2003. www.ee.ucla.edu/faculty/papers/vandenberghe ICASSP03 apr03.pdf

[64] George B. Dantzig. Linear Programming and Extensions. Princeton University Press, 1998.

[65] Alexandre d'Aspremont, Laurent El Ghaoui, Michael I. Jordan, and Gert R. G. Lanckriet. A direct formulation for sparse PCA using semidefinite programming. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors, Advances in Neural Information Processing Systems 17, pages 41–48. MIT Press, Cambridge, MA, 2005. http://books.nips.cc/papers/files/nips17/NIPS2004 0645.pdf

[66] Jon Dattorro. Constrained least squares fit of a filter bank to an arbitrary magnitude frequency response, 1991. http://www.stanford.edu/~dattorro/Hearing.htm


[67] Joel Dawson, Stephen Boyd, Mar Hershenson, and Thomas Lee. Optimal allocation of local feedback in multistage amplifiers via geometric programming. IEEE Circuits and Systems I: Fundamental Theory and Applications, 2001.

[68] Jan de Leeuw. Fitting distances by least squares. UCLA Statistics Series Technical Report No. 130, Interdivisional Program in Statistics, UCLA, Los Angeles, CA, 1993. http://citeseer.ist.psu.edu/deleeuw93fitting.html

[69] Jan de Leeuw. Multidimensional scaling. In International Encyclopedia of the Social & Behavioral Sciences. Elsevier, 2001. http://preprints.stat.ucla.edu/274/274.pdf

[70] Jan de Leeuw. Unidimensional scaling. In Brian S. Everitt and David C. Howell, editors, Encyclopedia of Statistics in Behavioral Science, volume 4, pages 2095–2097. Wiley, 2005. http://repositories.cdlib.org/uclastat/papers/2004032701

[71] Jan de Leeuw and Willem Heiser. Theory of multidimensional scaling. In P. R. Krishnaiah and L. N. Kanal, editors, Handbook of Statistics, volume 2, chapter 13, pages 285–316. North-Holland Publishing, Amsterdam, 1982.

[72] Erik D. Demaine, Francisco Gomez-Martin, Henk Meijer, David Rappaport, Perouz Taslakian, Godfried T. Toussaint, Terry Winograd, and David R. Wood. The distance geometry of music, 2007. http://arxiv.org/abs/0705.4085

[73] Frank Deutsch. Best Approximation in Inner Product Spaces. Springer-Verlag, 2001.

[74] Frank Deutsch and Hein Hundal. The rate of convergence of Dykstra's cyclic projections algorithm: The polyhedral case. Numerical Functional Analysis and Optimization, 15:537–565, 1994.

[75] Frank Deutsch, John H. McCabe, and George M. Phillips. Some algorithms for computing best approximations from convex cones. SIAM Journal on Numerical Analysis, 12(3):390–403, June 1975.


[76] Frank R. Deutsch and Peter H. Maserick. Applications of the Hahn-Banach theorem in approximation theory. SIAM Review, 9(3):516–530, July 1967.

[77] Michel Marie Deza and Monique Laurent. Geometry of Cuts and Metrics. Springer-Verlag, 1997.

[78] Carolyn Pillers Dobler. A matrix approach to finding a set of generators and finding the polar (dual) of a class of polyhedral cones. SIAM Journal on Matrix Analysis and Applications, 15(3):796–803, July 1994. Erratum: p.182 herein.

[79] Elizabeth D. Dolan, Robert Fourer, Jorge J. Moré, and Todd S. Munson. Optimization on the NEOS server. SIAM News, 35(6):4,8,9, August 2002.

[80] Bruce Randall Donald. 3-D structure in chemistry and molecular biology, 1998. http://www.cs.dartmouth.edu/~brd/Teaching/Bio

[81] David L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, April 2006. www-stat.stanford.edu/~donoho/Reports/2004/CompressedSensing091604.pdf

[82] David L. Donoho, Michael Elad, and Vladimir Temlyakov. Stable recovery of sparse overcomplete representations in the presence of noise, February 2004. www-stat.stanford.edu/~donoho/Reports/2004/StableSparse-Donoho-etal.pdf

[83] Miguel Nuno Ferreira Fialho dos Anjos. New Convex Relaxations for the Maximum Cut and VLSI Layout Problems. PhD thesis, University of Waterloo, Ontario, Canada, Department of Combinatorics and Optimization, 2001. cheetah.vlsi.uwaterloo.ca/~anjos/MFAnjosPhDThesis.pdf

[84] Richard L. Dykstra. An algorithm for restricted least squares regression. Journal of the American Statistical Association, 78(384):837–842, 1983.


[85] Carl Eckart and Gale Young. The approximation of one matrix by another of lower rank. Psychometrika, 1(3):211–218, September 1936. www.convexoptimization.com/TOOLS/eckart&young.1936.pdf

[86] Alan Edelman, Tomás A. Arias, and Steven T. Smith. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998.

[87] John J. Edgell. Graphics calculator applications on 4-D constructs, 1996. http://archives.math.utk.edu/ICTCM/EP-9/C47/pdf/paper.pdf

[88] Ivar Ekeland and Roger Témam. Convex Analysis and Variational Problems. SIAM, 1999.

[89] Julius Farkas. Theorie der einfachen Ungleichungen. Journal für die reine und angewandte Mathematik, 124:1–27, 1902. dz-srv1.sub.uni-goettingen.de/sub/digbib/loader?ht=VIEW&did=D261364

[90] Maryam Fazel. Matrix Rank Minimization with Applications. PhD thesis, Stanford University, Department of Electrical Engineering, March 2002. http://www.cds.caltech.edu/~maryam/thesis-final.pdf

[91] Maryam Fazel, Haitham Hindi, and Stephen P. Boyd. A rank minimization heuristic with application to minimum order system approximation. In Proceedings of the American Control Conference, volume 6, pages 4734–4739. American Automatic Control Council (AACC), June 2001. http://www.cds.caltech.edu/~maryam/nucnorm.html

[92] Maryam Fazel, Haitham Hindi, and Stephen P. Boyd. Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. In Proceedings of the American Control Conference. American Automatic Control Council (AACC), June 2003. http://www.cds.caltech.edu/~maryam/acc03 final.pdf

[93] Maryam Fazel, Haitham Hindi, and Stephen P. Boyd. Rank minimization and applications in system theory. In Proceedings of the American Control Conference. American Automatic Control Council (AACC), June 2004. http://www.cds.caltech.edu/~maryam/acc04-tutorial.pdf


[94] J. T. Feddema, R. H. Byrne, J. J. Harrington, D. M. Kilman, C. L. Lewis, R. D. Robinett, B. P. Van Leeuwen, and J. G. Young. Advanced mobile networking, sensing, and controls. Sandia Report SAND2005-1661, Sandia National Laboratories, Albuquerque, New Mexico, March 2005. www.prod.sandia.gov/cgi-bin/techlib/access-control.pl/2005/051661.pdf

[95] César Fernández, Thomas Szyperski, Thierry Bruyère, Paul Ramage, Egon Mösinger, and Kurt Wüthrich. NMR solution structure of the pathogenesis-related protein P14a. Journal of Molecular Biology, 266:576–593, 1997.

[96] Richard Phillips Feynman, Robert B. Leighton, and Matthew L. Sands. The Feynman Lectures on Physics: Commemorative Issue, volume I. Addison-Wesley, 1989.

[97] Paul Finsler. Über das Vorkommen definiter und semidefiniter Formen in Scharen quadratischer Formen. Commentarii Mathematici Helvetici, 9:188–192, 1937.

[98] Anders Forsgren, Philip E. Gill, and Margaret H. Wright. Interior methods for nonlinear optimization. SIAM Review, 44(4):525–597, 2002.

[99] Norbert Gaffke and Rudolf Mathar. A cyclic projection algorithm via duality. Metrika, 36:29–54, 1989.

[100] Jérôme Galtier. Semi-definite programming as a simple extension to linear programming: convex optimization with queueing, equity and other telecom functionals. In 3ème Rencontres Francophones sur les Aspects Algorithmiques des Télécommunications (AlgoTel). INRIA, France, 2001. www-sop.inria.fr/mascotte/Jerome.Galtier/misc/Galtier01b.pdf

[101] Laurent El Ghaoui. EE 227A: Convex Optimization and Applications, Lecture 11 − October 3. University of California, Berkeley, Fall 2006. Scribe: Nikhil Shetty. http://www.convexoptimization.com/TOOLS/Ghaoui.pdf


[102] Laurent El Ghaoui and Silviu-Iulian Niculescu, editors. Advances in Linear Matrix Inequality Methods in Control. SIAM, 2000.

[103] Philip E. Gill, Walter Murray, and Margaret H. Wright. Numerical Linear Algebra and Optimization, volume 1. Addison-Wesley, 1991.

[104] Philip E. Gill, Walter Murray, and Margaret H. Wright. Practical Optimization. Academic Press, 1999.

[105] James Gleik. Isaac Newton. Pantheon Books, 2003.

[106] W. Glunt, Tom L. Hayden, S. Hong, and J. Wells. An alternating projection algorithm for computing the nearest Euclidean distance matrix. SIAM Journal on Matrix Analysis and Applications, 11(4):589–600, 1990.

[107] K. Goebel and W. A. Kirk. Topics in Metric Fixed Point Theory. Cambridge University Press, 1990.

[108] Michel X. Goemans and David P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the Association for Computing Machinery, 42(6):1115–1145, November 1995. http://www-math.mit.edu/~goemans/maxcut-jacm.pdf

[109] D. Goldfarb and K. Scheinberg. Interior point trajectories in semidefinite programming. SIAM Journal on Optimization, 8(4):871–886, 1998.

[110] Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins, third edition, 1996.

[111] P. Gordan. Ueber die auflösung linearer gleichungen mit reellen coefficienten. Mathematische Annalen, 6:23–28, 1873.

[112] John Clifford Gower. Euclidean distance geometry. The Mathematical Scientist, 7:1–14, 1982.

[113] John Clifford Gower. Properties of Euclidean and non-Euclidean distance matrices. Linear Algebra and its Applications, 67:81–97, 1985.


[114] John Clifford Gower and Garmt B. Dijksterhuis. Procrustes Problems. Oxford University Press, 2004.

[115] John Clifford Gower and David J. Hand. Biplots. Chapman & Hall, 1996.

[116] Alexander Graham. Kronecker Products and Matrix Calculus: with Applications. Wiley, 1981.

[117] Michael Grant, Stephen Boyd, and Yinyu Ye. cvx: Matlab software for disciplined convex programming, 2007. http://www.stanford.edu/~boyd/cvx

[118] Robert M. Gray. Toeplitz and circulant matrices: A review, 2002. http://www-ee.stanford.edu/~gray/toeplitz.pdf

[119] T. N. E. Greville. Note on the generalized inverse of a matrix product. SIAM Review, 8:518–521, 1966.

[120] Rémi Gribonval and Morten Nielsen. Highly sparse representations from dictionaries are unique and independent of the sparseness measure, October 2003. http://www.math.aau.dk/research/reports/R-2003-16.pdf

[121] Rémi Gribonval and Morten Nielsen. Sparse representations in unions of bases. IEEE Transactions on Information Theory, 49(12):1320–1325, December 2003. http://www.math.aau.dk/~mnielsen/papers.htm

[122] Karolos M. Grigoriadis and Eric B. Beran. Alternating projection algorithms for linear matrix inequalities problems with rank constraints. In Laurent El Ghaoui and Silviu-Iulian Niculescu, editors, Advances in Linear Matrix Inequality Methods in Control, chapter 13, pages 251–267. SIAM, 2000.

[123] Peter Gritzmann and Victor Klee. On the complexity of some basic problems in computational convexity: II. volume and mixed volumes. Technical Report TR:94-31, DIMACS, Rutgers University, 1994. ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/1994/94-31.ps


[124] Peter Gritzmann and Victor Klee. On the complexity of some basic problems in computational convexity: II. volume and mixed volumes. In T. Bisztriczky, P. McMullen, R. Schneider, and A. Ivić Weiss, editors, Polytopes: Abstract, Convex and Computational, pages 373–466. Kluwer Academic Publishers, 1994.

[125] L. G. Gubin, B. T. Polyak, and E. V. Raik. The method of projections for finding the common point of convex sets. U.S.S.R. Computational Mathematics and Mathematical Physics, 7(6):1–24, 1967.

[126] Osman Güler and Yinyu Ye. Convergence behavior of interior-point algorithms. Mathematical Programming, 60(2):215–228, 1993.

[127] P. R. Halmos. Positive approximants of operators. Indiana University Mathematics Journal, 21:951–960, 1972.

[128] Shih-Ping Han. A successive projection method. Mathematical Programming, 40:1–14, 1988.

[129] Godfrey H. Hardy, John E. Littlewood, and George Pólya. Inequalities. Cambridge University Press, second edition, 1952.

[130] Arash Hassibi and Mar Hershenson. Automated optimal design of switched-capacitor filters. Design Automation and Test in Europe Conference, 2001.

[131] Johan Håstad. Some optimal inapproximability results, 1999. http://citeseer.ist.psu.edu/280448.html

[132] Timothy F. Havel and Kurt Wüthrich. An evaluation of the combined use of nuclear magnetic resonance and distance geometry for the determination of protein conformations in solution. Journal of Molecular Biology, 182:281–294, 1985.

[133] Tom L. Hayden and Jim Wells. Approximation by matrices positive semidefinite on a subspace. Linear Algebra and its Applications, 109:115–130, 1988.

[134] Tom L. Hayden, Jim Wells, Wei-Min Liu, and Pablo Tarazaga. The cone of distance matrices. Linear Algebra and its Applications, 144:153–169, 1991.


[135] Uwe Helmke and John B. Moore. Optimization and Dynamical Systems. Springer-Verlag, 1994.

[136] Bruce Hendrickson. Conditions for unique graph realizations. SIAM Journal on Computing, 21(1):65–84, February 1992.

[137] T. Herrmann, Peter Güntert, and Kurt Wüthrich. Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA. Journal of Molecular Biology, 319(1):209–227, May 2002.

[138] Mar Hershenson. Design of pipeline analog-to-digital converters via geometric programming. International Conference on Computer Aided Design - ICCAD, 2002.

[139] Mar Hershenson. Efficient description of the design space of analog circuits. 40th Design Automation Conference, 2003.

[140] Mar Hershenson, Stephen Boyd, and Thomas Lee. Optimal design of a CMOS OpAmp via geometric programming. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2001.

[141] Mar Hershenson, Dave Colleran, Arash Hassibi, and Navraj Nandra. Synthesizable full custom mixed-signal IP. Electronics Design Automation Consortium - EDA, 2002.

[142] Mar Hershenson, Sunderarajan S. Mohan, Stephen Boyd, and Thomas Lee. Optimization of inductor circuits via geometric programming, 1999.

[143] Mar Hershenson and Xiling Shen. A new analog design flow. Cadence, 2002.

[144] Nick Higham. Matrix Procrustes problems, 1995. http://www.ma.man.ac.uk/~higham/talks Lecture notes.

[145] Richard D. Hill and Steven R. Waters. On the cone of positive semidefinite matrices. Linear Algebra and its Applications, 90:81–88, 1987.


[146] Jean-Baptiste Hiriart-Urruty. Ensembles de Tchebychev vs. ensembles convexes: l'état de la situation vu via l'analyse convexe non lisse. Annales des Sciences Mathématiques du Québec, 22(1):47–62, 1998.

[147] Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal. Convex Analysis and Minimization Algorithms II: Advanced Theory and Bundle Methods. Springer-Verlag, second edition, 1996.

[148] Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal. Fundamentals of Convex Analysis. Springer-Verlag, 2001.

[149] Alan J. Hoffman and Helmut W. Wielandt. The variation of the spectrum of a normal matrix. Duke Mathematical Journal, 20:37–40, 1953.

[150] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 1987.

[151] Roger A. Horn and Charles R. Johnson. Topics in Matrix Analysis. Cambridge University Press, 1994.

[152] Alston S. Householder. The Theory of Matrices in Numerical Analysis. Dover, 1975.

[153] Hong-Xuan Huang, Zhi-An Liang, and Panos M. Pardalos. Some properties for the Euclidean distance matrix and positive semidefinite matrix completion problems. Journal of Global Optimization, 25(1):3–21, 2003.

[154] 5W Infographic. Wireless 911. Technology Review, 107(5):78–79, June 2004. http://www.technologyreview.com

[155] Nathan Jacobson. Lectures in Abstract Algebra, vol. II - Linear Algebra. Van Nostrand, 1953.

[156] Viren Jain and Lawrence K. Saul. Exploratory analysis and visualization of speech and music by locally linear embedding. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 3, pages 984–987, 2004. http://www.cis.upenn.edu/~lsaul/papers/lle icassp04.pdf


[157] Joakim Jaldén, Cristoff Martin, and Björn Ottersten. Semidefinite programming for detection in linear systems − Optimality conditions and space-time decoding. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume VI, April 2003. http://www.s3.kth.se/signal/reports/03/IR-S3-SB-0309.pdf

[158] Florian Jarre. Convex analysis on symmetric matrices. In Henry Wolkowicz, Romesh Saigal, and Lieven Vandenberghe, editors, Handbook of Semidefinite Programming: Theory, Algorithms, and Applications, chapter 2. Kluwer, 2000.

[159] Charles R. Johnson and Pablo Tarazaga. Connections between the real positive semidefinite and distance matrix completion problems. Linear Algebra and its Applications, 223/224:375–391, 1995.

[160] Charles R. Johnson and Pablo Tarazaga. Binary representation of normalized symmetric and correlation matrices. Linear and Multilinear Algebra, 52(5):359–366, 2004.

[161] George B. Thomas, Jr. Calculus and Analytic Geometry. Addison-Wesley, fourth edition, 1972.

[162] Thomas Kailath. Linear Systems. Prentice-Hall, 1980.

[163] Tosio Kato. Perturbation Theory for Linear Operators. Springer-Verlag, 1966.

[164] Paul J. Kelly and Norman E. Ladd. Geometry. Scott, Foresman and Company, 1965.

[165] Yoonsoo Kim and Mehran Mesbahi. On the rank minimization problem. In Proceedings of the American Control Conference, volume 3, pages 2015–2020. American Automatic Control Council (AACC), June 2004.

[166] Ron Kimmel. Numerical Geometry of Images: Theory, Algorithms, and Applications. Springer-Verlag, 2003.

[167] Erwin Kreyszig. Introductory Functional Analysis with Applications. Wiley, 1989.


[168] Jean B. Lasserre. A new Farkas lemma for positive semidefinite matrices. IEEE Transactions on Automatic Control, 40(6):1131–1133, June 1995.

[169] Jean B. Lasserre and Eduardo S. Zeron. A Laplace transform algorithm for the volume of a convex polytope. Journal of the Association for Computing Machinery, 48(6):1126–1140, November 2001.

[170] Jean B. Lasserre and Eduardo S. Zeron. A new algorithm for the volume of a convex polytope. arXiv.org, June 2001. http://arxiv.org/abs/math/0106168

[171] Monique Laurent. A connection between positive semidefinite and Euclidean distance matrix completion problems. Linear Algebra and its Applications, 273:9–22, 1998.

[172] Monique Laurent. A tour d'horizon on positive semidefinite and Euclidean distance matrix completion problems. In Panos M. Pardalos and Henry Wolkowicz, editors, Topics in Semidefinite and Interior-Point Methods, pages 51–76. American Mathematical Society, 1998.

[173] Monique Laurent. Matrix completion problems. In Christodoulos A. Floudas and Panos M. Pardalos, editors, Encyclopedia of Optimization, volume III (Interior - M), pages 221–229. Kluwer, 2001. http://homepages.cwi.nl/~monique/files/opt.ps

[174] Monique Laurent and Svatopluk Poljak. On a positive semidefinite relaxation of the cut polytope. Linear Algebra and its Applications, 223/224:439–461, 1995.

[175] Monique Laurent and Svatopluk Poljak. On the facial structure of the set of correlation matrices. SIAM Journal on Matrix Analysis and Applications, 17(3):530–547, July 1996.

[176] Monique Laurent and Franz Rendl. Semidefinite programming and integer programming. Optimization Online, 2002. http://www.optimization-online.org/DB HTML/2002/12/585.html

[177] Charles L. Lawson and Richard J. Hanson. Solving Least Squares Problems. SIAM, 1995.


[178] Jung Rye Lee. The law of cosines in a tetrahedron. Journal of the Korea Society of Mathematical Education Series B: The Pure and Applied Mathematics, 4(1):1–6, 1997.

[179] Vladimir L. Levin. Quasi-convex functions and quasi-monotone operators. Journal of Convex Analysis, 2(1/2):167–172, 1995.

[180] Adrian S. Lewis. Eigenvalue-constrained faces. Linear Algebra and its Applications, 269:159–181, 1998.

[181] Anhua Lin. Projection algorithms in nonlinear programming. PhD thesis, Johns Hopkins University, 2003.

[182] Miguel Sousa Lobo, Lieven Vandenberghe, Stephen Boyd, and Hervé Lebret. Applications of second-order cone programming. Linear Algebra and its Applications, 284:193–228, November 1998. Special Issue on Linear Algebra in Control, Signals and Image Processing. http://www.stanford.edu/~boyd/socp.html

[183] David G. Luenberger. Optimization by Vector Space Methods. Wiley, 1969.

[184] David G. Luenberger. Introduction to Dynamic Systems: Theory, Models, & Applications. Wiley, 1979.

[185] David G. Luenberger. Linear and Nonlinear Programming. Addison-Wesley, second edition, 1989.

[186] Zhi-Quan Luo, Jos F. Sturm, and Shuzhong Zhang. Superlinear convergence of a symmetric primal-dual path following algorithm for semidefinite programming. SIAM Journal on Optimization, 8(1):59–81, 1998.

[187] Zhi-Quan Luo and Wei Yu. An introduction to convex optimization for communications and signal processing. IEEE Journal On Selected Areas In Communications, 24(8):1426–1438, August 2006.

[188] K. V. Mardia. Some properties of classical multi-dimensional scaling. Communications in Statistics: Theory and Methods, A7(13):1233–1241, 1978.


[189] K. V. Mardia, J. T. Kent, and J. M. Bibby. Multivariate Analysis. Academic Press, 1979.

[190] Jerrold E. Marsden and Michael J. Hoffman. Elementary Classical Analysis. Freeman, second edition, 1995.

[191] Rudolf Mathar. The best Euclidean fit to a given distance matrix in prescribed dimensions. Linear Algebra and its Applications, 67:1–6, 1985.

[192] Rudolf Mathar. Multidimensionale Skalierung. B. G. Teubner Stuttgart, 1997.

[193] Mehran Mesbahi and G. P. Papavassilopoulos. On the rank minimization problem over a positive semi-definite linear matrix inequality. IEEE Transactions on Automatic Control, 42(2):239–243, 1997.

[194] Mehran Mesbahi and G. P. Papavassilopoulos. Solving a class of rank minimization problems via semi-definite programs, with applications to the fixed order output feedback synthesis. In Proceedings of the American Control Conference, volume 1, pages 77–80. American Automatic Control Council (AACC), June 1997.

[195] Sunderarajan S. Mohan, Mar Hershenson, Stephen Boyd, and Thomas Lee. Simple accurate expressions for planar spiral inductances. IEEE Journal of Solid-State Circuits, 1999.

[196] Sunderarajan S. Mohan, Mar Hershenson, Stephen Boyd, and Thomas Lee. Bandwidth extension in CMOS with optimized on-chip inductors. IEEE Journal of Solid-State Circuits, 2000.

[197] E. H. Moore. On the reciprocal of the general algebraic matrix. Bulletin of the American Mathematical Society, 26:394–395, 1920. Abstract.

[198] B. S. Mordukhovich. Maximum principle in the problem of time optimal response with nonsmooth constraints. Journal of Applied Mathematics and Mechanics, 40:960–969, 1976.

[199] Jean-Jacques Moreau. Décomposition orthogonale d'un espace Hilbertien selon deux cônes mutuellement polaires. Comptes Rendus de l'Académie des Sciences, Paris, 255:238–240, 1962.


BIBLIOGRAPHY 725[200] T. S. Motzkin and I. J. Schoenberg. The relaxation method for linearinequalities. Canadian Journal of Mathematics, 6:393–404, 1954.[201] Neil Muller, Lourenço Magaia, and B. M. Herbst. Singular valuedecomposition, eigenfaces, and 3D reconstructions. SIAM Review,46(3):518–545, September 2004.[202] Katta G. Murty and Feng-Tien Yu. Linear Complementarity, Linearand Nonlinear Programming. Heldermann Verlag, Internet edition,1988.ioe.engin.umich.edu/people/fac/books/murty/linear complementarity webbook[203] Oleg R. Musin. An extension of Delsarte’s method. The kissingproblem in three and four dimensions. ArXiv.org, December 2005.http://arxiv.org/abs/math.MG/0512649[204] Navraj Nandra. Synthesizable analog IP. IP Based Design Workshop,2002.[205] Stephen G. Nash and Ariela Sofer. Linear and Nonlinear Programming.McGraw-Hill, 1996.[206] Yurii Nesterov and Arkadii Nemirovskii. Interior-Point PolynomialAlgorithms in <strong>Convex</strong> Programming. SIAM, 1994.[207] L. Nirenberg. Functional Analysis. New York University, New York,1961. Lectures given in 1960-1961, notes by Lesley Sibner.[208] Jorge Nocedal and Stephen J. Wright. Numerical <strong>Optimization</strong>.Springer-Verlag, 1999.[209] Royal Swedish Academy of Sciences. Nobel prize in chemistry, 2002.http://nobelprize.org/chemistry/laureates/2002/public.html[210] C. S. Ogilvy. Excursions in Geometry. Dover, 1990. Citation:Proceedings of the CUPM Geometry Conference, MathematicalAssociation of America, No.16 (1967), p.21.[211] Önder Filiz and Aylin Yener. Rank constrained temporal-spatial filtersfor CDMA systems with base station antenna arrays. In Proceedingsof the Johns Hopkins University Conference on Information Sciences


726 BIBLIOGRAPHYand Systems, March 2003.labs.ee.psu.edu/labs/wcan/Publications/filiz-yener ciss03.pdf[212] Robert Orsi, Uwe Helmke, and John B. Moore. A Newton-like methodfor solving rank constrained linear matrix inequalities, April 2006.users.rsise.anu.edu.au/∼robert/publications/OHM06 extended version.pdf[213] Brad Osgood. Notes on the Ahlfors mapping of a multiply connecteddomain, 2000.www-ee.stanford.edu/∼osgood/Ahlfors-Bergman-Szego.pdf[214] M. L. Overton and R. S. Womersley. On the sum of the largesteigenvalues of a symmetric matrix. SIAM Journal on Matrix Analysisand Applications, 13:41–45, 1992.[215] Pythagoras Papadimitriou. Parallel Solution of SVD-Related Problems,With Applications. PhD thesis, Department of Mathematics, Universityof Manchester, October 1993.[216] Panos M. Pardalos and Henry Wolkowicz, editors. Topics inSemidefinite and Interior-Point Methods. American MathematicalSociety, 1998.[217] Gábor Pataki. Cone-LP’s and semidefinite programs: Geometry anda simplex-type method. In William H. Cunningham, S. ThomasMcCormick, and Maurice Queyranne, editors, Integer Programmingand Combinatorial <strong>Optimization</strong>, Proceedings of the 5th InternationalIPCO Conference, Vancouver, British Columbia, Canada, June 3-5,1996, volume 1084 of Lecture Notes in Computer Science, pages162–174. Springer-Verlag, 1996.[218] Gábor Pataki. On the rank of extreme matrices in semidefiniteprograms and the multiplicity of optimal eigenvalues. Mathematicsof Operations Research, 23(2):339–358, 1998.http://citeseer.ist.psu.edu/pataki97rank.htmlErratum: p.557 herein.[219] Gábor Pataki. The geometry of semidefinite programming. InHenry Wolkowicz, Romesh Saigal, and Lieven Vandenberghe, editors,Handbook of Semidefinite Programming: Theory, Algorithms, andApplications, chapter 3. Kluwer, 2000.


[220] Teemu Pennanen and Jonathan Eckstein. Generalized Jacobians of vector-valued convex functions. Technical Report RRR 6-97, RUTCOR, Rutgers University, May 1997.
http://rutcor.rutgers.edu/pub/rrr/reports97/06.ps

[221] Roger Penrose. A generalized inverse for matrices. In Proceedings of the Cambridge Philosophical Society, volume 51, pages 406–413, 1955.

[222] Chris Perkins. A convergence analysis of Dykstra's algorithm for polyhedral sets. SIAM Journal on Numerical Analysis, 40(2):792–804, 2002.

[223] Florian Pfender and Günter M. Ziegler. Kissing numbers, sphere packings, and some unexpected proofs. Notices of the American Mathematical Society, 51(8):873–883, September 2004.
http://www.convexoptimization.com/TOOLS/Pfender.pdf

[224] Benjamin Recht. Convex Modeling with Priors. PhD thesis, Massachusetts Institute of Technology, Media Arts and Sciences Department, 2006.
www.media.mit.edu/physics/publications/theses/06.06.Recht.pdf

[225] Benjamin Recht, Maryam Fazel, and Pablo A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. Optimization Online, 2007.
www.optimization-online.org/DB_HTML/2007/06/1707.html

[226] Franz Rendl, Giovanni Rinaldi, and Angelika Wiegele. Solving Max-Cut to optimality by intersecting semidefinite and polyhedral relaxations, May 2007.
www.optimization-online.org/DB_HTML/2007/05/1659.html

[227] Alex Reznik. Problem 3.41, from Sergio Verdú, Multiuser Detection, Cambridge University Press, 1998.
www.ee.princeton.edu/~verdu/mud/solutions/3/3.41.areznik.pdf, 2001.

[228] Sara Robinson. Hooked on meshing, researcher creates award-winning triangulation program. SIAM News, 36(9), November 2003.
http://www.siam.org/news/news.php?id=370

[229] R. Tyrrell Rockafellar. Conjugate Duality and Optimization. SIAM, 1974.

[230] R. Tyrrell Rockafellar. Lagrange multipliers in optimization. SIAM–American Mathematical Society Proceedings, 9:145–168, 1976.

[231] R. Tyrrell Rockafellar. Lagrange multipliers and optimality. SIAM Review, 35(2):183–238, June 1993.

[232] R. Tyrrell Rockafellar. Convex Analysis. Princeton University Press, 1997. First published in 1970.

[233] C. K. Rushforth. Signal restoration, functional analysis, and Fredholm integral equations of the first kind. In Henry Stark, editor, Image Recovery: Theory and Application, chapter 1, pages 1–27. Academic Press, 1987.

[234] Shankar Sastry. Nonlinear Systems: Analysis, Stability, and Control. Springer-Verlag, 1999.

[235] Uwe Schäfer. A linear complementarity problem with a P-matrix. SIAM Review, 46(2):189–201, June 2004.

[236] Isaac J. Schoenberg. Remarks to Maurice Fréchet's article "Sur la définition axiomatique d'une classe d'espace distanciés vectoriellement applicable sur l'espace de Hilbert". Annals of Mathematics, 36(3):724–732, July 1935.
http://www.convexoptimization.com/TOOLS/Schoenberg2.pdf

[237] Isaac J. Schoenberg. Metric spaces and positive definite functions. Transactions of the American Mathematical Society, 44:522–536, 1938.
http://www.convexoptimization.com/TOOLS/Schoenberg3.pdf

[238] Peter H. Schönemann. A generalized solution of the orthogonal Procrustes problem. Psychometrika, 31(1):1–10, 1966.

[239] Peter H. Schönemann, Tim Dorcey, and K. Kienapple. Subadditive concatenation in dissimilarity judgements. Perception and Psychophysics, 38:1–17, 1985.

[240] Joshua A. Singer. Log-Penalized Linear Regression. PhD thesis, Stanford University, Department of Electrical Engineering, June 2004.
http://www.convexoptimization.com/TOOLS/Josh.pdf

[241] Anthony Man-Cho So and Yinyu Ye. Theory of semidefinite programming for sensor network localization, 2004.
http://www.stanford.edu/~yyye/local-theory.pdf

[242] Anthony Man-Cho So and Yinyu Ye. A semidefinite programming approach to tensegrity theory and realizability of graphs, 2005.
http://www.stanford.edu/~yyye/d-real-soda2.pdf

[243] Steve Spain. Polaris Wireless Corp., 2005. Personal communication.
http://www.polariswireless.com

[244] Wolfram Stadler, editor. Multicriteria Optimization in Engineering and in the Sciences. Springer-Verlag, 1988.

[245] Henry Stark, editor. Image Recovery: Theory and Application. Academic Press, 1987.

[246] Henry Stark. Polar, spiral, and generalized sampling and interpolation. In Robert J. Marks II, editor, Advanced Topics in Shannon Sampling and Interpolation Theory, chapter 6, pages 185–218. Springer-Verlag, 1993.

[247] Willi-Hans Steeb. Matrix Calculus and Kronecker Product with Applications and C++ Programs. World Scientific Publishing Co., 1997.

[248] Gilbert W. Stewart and Ji-guang Sun. Matrix Perturbation Theory. Academic Press, 1990.

[249] Josef Stoer and Christoph Witzgall. Convexity and Optimization in Finite Dimensions I. Springer-Verlag, 1970.

[250] Gilbert Strang. Course 18.06: Linear algebra, 2004.
http://web.mit.edu/18.06/www

[251] Gilbert Strang. Linear Algebra and its Applications. Harcourt Brace, third edition, 1988.

[252] Gilbert Strang. Calculus. Wellesley-Cambridge Press, 1992.

[253] Gilbert Strang. Introduction to Linear Algebra. Wellesley-Cambridge Press, second edition, 1998.

[254] S. Straszewicz. Über exponierte Punkte abgeschlossener Punktmengen. Fundamenta Mathematicae, 24:139–143, 1935.

[255] Jos F. Sturm. SeDuMi (self-dual minimization). Software for optimization over symmetric cones, 2005.
http://sedumi.mcmaster.ca

[256] Jos F. Sturm and Shuzhong Zhang. On cones of nonnegative quadratic functions. Optimization Online, April 2001.
www.optimization-online.org/DB_HTML/2001/05/324.html

[257] George P. H. Styan. A review and some extensions of Takemura's generalizations of Cochran's theorem. Technical Report 56, Stanford University, Department of Statistics, September 1982.

[258] George P. H. Styan and Akimichi Takemura. Rank additivity and matrix polynomials. Technical Report 57, Stanford University, Department of Statistics, September 1982.

[259] Chen Han Sung and Bit-Shun Tam. A study of projectionally exposed cones. Linear Algebra and its Applications, 139:225–252, 1990.

[260] Yoshio Takane. On the relations among four methods of multidimensional scaling. Behaviormetrika, 4:29–43, 1977.
http://takane.brinkster.net/Yoshio/p008.pdf

[261] Akimichi Takemura. On generalizations of Cochran's theorem and projection matrices. Technical Report 44, Stanford University, Department of Statistics, August 1980.

[262] Peng Hui Tan and Lars K. Rasmussen. The application of semidefinite programming for detection in CDMA. IEEE Journal on Selected Areas in Communications, 19(8), August 2001.

[263] Pablo Tarazaga. Faces of the cone of Euclidean distance matrices: Characterizations, structure and induced geometry. Linear Algebra and its Applications, 408:1–13, 2005.

[264] Warren S. Torgerson. Theory and Methods of Scaling. Wiley, 1958.

[265] Lloyd N. Trefethen and David Bau, III. Numerical Linear Algebra. SIAM, 1997.

[266] Michael W. Trosset. Extensions of classical multidimensional scaling: Computational theory, 2001. Revision of technical report entitled "Computing distances between convex sets and subsets of the positive semidefinite matrices" first published in 1997.
www.math.wm.edu/~trosset/r.mds.html

[267] Michael W. Trosset. Applications of multidimensional scaling to molecular conformation. Computing Science and Statistics, 29:148–152, 1998.

[268] Michael W. Trosset. Distance matrix completion by numerical optimization. Computational Optimization and Applications, 17(1):11–22, October 2000.

[269] Michael W. Trosset and Rudolf Mathar. On the existence of nonglobal minimizers of the STRESS criterion for metric multidimensional scaling. In Proceedings of the Statistical Computing Section, pages 158–162. American Statistical Association, 1997.

[270] Jan van Tiel. Convex Analysis, an Introductory Text. Wiley, 1984.

[271] Lieven Vandenberghe and Stephen Boyd. Semidefinite programming. SIAM Review, 38(1):49–95, March 1996.

[272] Lieven Vandenberghe and Stephen Boyd. Connections between semi-infinite and semidefinite programming. In R. Reemtsen and J.-J. Rückmann, editors, Semi-Infinite Programming, chapter 8, pages 277–294. Kluwer Academic Publishers, 1998.

[273] Lieven Vandenberghe and Stephen Boyd. Applications of semidefinite programming. Applied Numerical Mathematics, 29(3):283–299, March 1999.

[274] Lieven Vandenberghe, Stephen Boyd, and Shao-Po Wu. Determinant maximization with linear matrix inequality constraints. SIAM Journal on Matrix Analysis and Applications, 19(2):499–533, April 1998.

[275] Robert J. Vanderbei. Convex optimization: Interior-point methods and applications, 1999.
www.sor.princeton.edu/~rvdb/pdf/talks/pumath/talk.pdf

[276] Richard S. Varga. Geršgorin and His Circles. Springer-Verlag, 2004.

[277] Martin Vetterli and Jelena Kovačević. Wavelets and Subband Coding. Prentice Hall, 1995.

[278] È. B. Vinberg. The theory of convex homogeneous cones. Transactions of the Moscow Mathematical Society, 12:340–403, 1963. American Mathematical Society and London Mathematical Society joint translation, 1965.

[279] Marie A. Vitulli. A brief history of linear algebra and matrix theory, 2004.
darkwing.uoregon.edu/~vitulli/441.sp04/LinAlgHistory.html

[280] John von Neumann. Functional Operators, Volume II: The Geometry of Orthogonal Spaces. Princeton University Press, 1950. Reprinted from mimeographed lecture notes first distributed in 1933.

[281] Michael B. Wakin, Jason N. Laska, Marco F. Duarte, Dror Baron, Shriram Sarvotham, Dharmpal Takhar, Kevin F. Kelly, and Richard G. Baraniuk. An architecture for compressive imaging. In Proceedings of the IEEE International Conference on Image Processing (ICIP), pages 1273–1276, October 2006.
http://www.dsp.rice.edu/cs/CSCam-ICIP06.pdf

[282] Roger Webster. Convexity. Oxford University Press, 1994.

[283] Kilian Q. Weinberger and Lawrence K. Saul. Unsupervised learning of image manifolds by semidefinite programming. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, pages 988–995, 2004.
http://www.cis.upenn.edu/~lsaul/papers/sde_cvpr04.pdf

[284] Eric W. Weisstein. MathWorld – A Wolfram Web Resource, 2007.
http://mathworld.wolfram.com/search

[285] Bernard Widrow and Samuel D. Stearns. Adaptive Signal Processing. Prentice-Hall, 1985.

[286] Norbert Wiener. On factorization of matrices. Commentarii Mathematici Helvetici, 29:97–111, 1955.

[287] Ami Wiesel, Yonina C. Eldar, and Shlomo Shamai (Shitz). Semidefinite relaxation for detection of 16-QAM signaling in MIMO channels. IEEE Signal Processing Letters, 12(9):653–656, September 2005.

[288] Michael P. Williamson, Timothy F. Havel, and Kurt Wüthrich. Solution conformation of proteinase inhibitor IIA from bull seminal plasma by ¹H nuclear magnetic resonance and distance geometry. Journal of Molecular Biology, 182:295–315, 1985.

[289] Willie W. Wong. Cayley-Menger determinant and generalized N-dimensional Pythagorean theorem, November 2003. Application of Linear Algebra: Notes on Talk given to Princeton University Math Club.
http://www.princeton.edu/~wwong/papers/gp-r.pdf

[290] William Wooton, Edwin F. Beckenbach, and Frank J. Fleming. Modern Analytic Geometry. Houghton Mifflin, 1975.

[291] Margaret H. Wright. The interior-point revolution in optimization: History, recent developments, and lasting consequences. Bulletin of the American Mathematical Society, 42(1):39–56, January 2005.

[292] Stephen J. Wright. Primal-Dual Interior-Point Methods. SIAM, 1997.

[293] Shao-Po Wu. max-det Programming with Applications in Magnitude Filter Design. A dissertation submitted to the Department of Electrical Engineering, Stanford University, December 1997.

[294] Shao-Po Wu and Stephen Boyd. sdpsol: A parser/solver for semidefinite programming and determinant maximization problems with matrix structure. User's guide, 1996.
http://www.stanford.edu/~boyd/SDPSOL.html

[295] Shao-Po Wu and Stephen Boyd. sdpsol: A parser/solver for semidefinite programs with matrix structure. In Laurent El Ghaoui and Silviu-Iulian Niculescu, editors, Advances in Linear Matrix Inequality Methods in Control, chapter 4, pages 79–91. SIAM, 2000.
http://www.stanford.edu/~boyd/sdpsol.html

[296] Shao-Po Wu, Stephen Boyd, and Lieven Vandenberghe. FIR filter design via spectral factorization and convex optimization, 1997.
http://www.stanford.edu/~boyd/reports/magdes.pdf

[297] Naoki Yamamoto and Maryam Fazel. A computational approach to quantum encoder design for purity optimization, 2006.
http://arxiv.org/abs/quant-ph/0606106

[298] David D. Yao, Shuzhong Zhang, and Xun Yu Zhou. Stochastic linear-quadratic control via primal-dual semidefinite programming. SIAM Review, 46(1):87–111, March 2004. Erratum: p.194 herein.

[299] Yinyu Ye. Semidefinite programming for Euclidean distance geometric optimization. Lecture notes, 2003.
http://www.stanford.edu/class/ee392o/EE392o-yinyu-ye.pdf

[300] Yinyu Ye. Convergence behavior of central paths for convex homogeneous self-dual cones, 1996.
http://www.stanford.edu/~yyye/yyye/ye.ps

[301] Yinyu Ye. Interior Point Algorithms: Theory and Analysis. Wiley, 1997.

[302] D. C. Youla. Mathematical theory of image restoration by the method of convex projection. In Henry Stark, editor, Image Recovery: Theory and Application, chapter 2, pages 29–77. Academic Press, 1987.

[303] Fuzhen Zhang. Matrix Theory: Basic Results and Techniques. Springer-Verlag, 1999.

[304] Günter M. Ziegler. Kissing numbers: Surprises in dimension four. Emissary, pages 4–5, Spring 2004.
http://www.msri.org/communications/emissary


Index

0-norm, 241, 273, 275, 294
1-norm, 187–189
2-norm, 45, 187, 188, 282
  matrix, 471, 523, 534, 542, 703
Ax = b, 74, 241, 245, 253, 273, 275, 281, 282, 298, 597
δ, 495
∞-norm, 187–189
π, 465, 469, 490
ψ, 249, 525
k-largest norm, 189
0 eigenvalues theorem, 526

accuracy, 244
active, 255
adaptive, 259
adjacency, 47
adjoint
  operator, 45, 142, 147, 442
  self, 319, 495
affine, 193
  combination, 59, 66
  dimension, 36, 54, 354
    complementary, 443
  hull, 36, 54
  independence, 67, 121
    preservation, 67
  map, 68
  set, 36
    as hyperplane intersection, 74
    as span of nullspace basis, 75
  subset, 54, 59
    projector, 599, 610
  transformation, 44, 68, 86, 101, 224, 405, 446
affinely independent, 66
affinity, 36
algebra
  linear, 495
algebraic
  complement, 81, 134
  multiplicity, 519
Alizadeh, 247
alternating projection, 321, 642
alternation, 392
alternative, 148, 149
  weak, 150
ambient, 435, 448
  vector space, 34
anchor, 259, 325, 332
angle, 72, 338
  acute, 62
  alternating projection, 649
  brackets, 45
  dihedral, 72, 337, 397, 549
  EDM cone, 107
  halfspace, 61
  inequality, 310, 395
  matrix, 340
    inequality, 341
  interpoint, 315
  obtuse, 62
  positive semidefinite cone, 107
  right, 109, 137, 155, 156
antisymmetric, 48
  antihollow, 51, 455
artificial intelligence, 23
axioms of the metric, 307
axis of revolution, 108, 549

ball packing, 321
Barvinok, 33, 97, 118, 233, 303
base, 81
  station, 333
basis, 50, 696
  discrete cosine transform, 276
  nonorthogonal, 163, 611
    projection on, 611
  orthogonal
    projection, 610
  orthonormal, 50, 76, 607
  standard, 50, 131
bees, 27, 321
bijective, 45, 102, 562
  Gram form, 345
  inner-product form, 349
  isometry, 48, 391
  linear, 47, 48, 50, 52, 122, 157, 469, 550
biorthogonal
  decomposition, 599, 608
  expansion, 134, 163, 602
biorthogonality, 161, 164
  condition, 159, 538
blunt, 659
bound
  lower, 458
  polyhedral, 407
boundary, 38, 80, 427
  and extreme, 95
  classical, 38
  cone, 90
  conventional, 81
  membership, 147, 238
  positive semidefinite cone, 103
bounded, 56
bowl, 576, 577
box, 638
bridge, 337, 364, 414, 459, 475

calculus
  matrix, 565
  of inequalities, 7
  of variations, 151
Carathéodory's theorem, 146, 621
cardinality, 312
  minimum, 241
  problem, 241, 273, 275, 294
Cartesian
  axes, 35, 55, 68, 84, 92, 132, 144
  cone, 134, 146
  coordinates, 386
  product, 43, 141, 234, 650
Cayley-Menger, 355, 377, 397, 489
cellular, 334
  telephone network, 333
center
  of gravity, 343
  of mass, 343
central path, 233
certificate
  null, 148
chain rule, 572
  two argument, 572
chop(), 665
Chu, 7
circumhypersphere, 355
clipping, 190, 454, 638, 639
closure, 37
coefficient
  binomial, 189
  projection, 159, 430, 617, 623
cofactor, 398
combinatorial, 286
compaction, 261, 270, 318, 411, 479
comparable, 89, 160
comparison, 56
  orthant, 153
complement
  algebraic, 81, 134
    projection on, 603
  orthogonal, 48
  relative, 77, 692
complementarity, 187, 240
  linear, 178
  maximal, 228, 233, 255
  problem, 178
  strict, 240
complementary
  affine dimension, 443
  dimension, 73
  eigenvalues, 603
  halfspaces, 63
  inertia, 378, 515
  slackness, 239
  subspace, 48
completion
  geometry, 411
  problem, 308, 363, 370, 401
  semidefinite, 653
composition
  EDM, 371
compressed sensing, 296
compressive sampling, 296
concave, 183, 479, 553
concavity, 183
condition
  biorthogonality, 159, 538
  optimality, 239
cone, 59, 81, 84, 702
  blade, 145
  circular, 86, 108, 109
    right, 108
  convex, 43, 85
    positive semidefinite, 100
  dual, 134, 135, 179, 181, 381
    algorithm, 167
    construction, 136, 137
    examples, 143
    formula, 152, 167
    linear inequality, 147
    Lorentz, 143, 156
    positive semidefinite, 154
    properties, 141
    unique, 135, 158
  EDM, 403, 405, 406, 408
    boundary, 421
    construction, 420
    convexity, 409
    decomposition, 444
    dual, 435, 436, 446, 450
    extreme direction, 422
    face, 422, 424
    positive semidefinite, 425, 432
    projection, 472
  halfspace, 138
  ice cream, 86
  invariance, 86
  Lorentz, 86, 128
    dual, 156
  majorization, 498
  membership, 87, 98
    relation, 146, 168
  monotone, 172, 173
    nonnegative, 170
  nonconvex, 82–85
  normal, 175, 177, 429, 636, 656, 657
    elliptope, 658
    orthant, 177
  orthogonal, 144
  pointed, 86, 146, 161, 174
    closed convex, 87, 122
    polyhedral, 92, 129
  polar, 81, 135, 429, 634
  polyhedral, 59, 121, 127
    dual, 155, 158, 159
    halfspace description, 127
    majorization, 491
    pointed, 158
    proper, 123
  positive semidefinite, 97, 439
    boundary, 432
    circular, 108
    face, 431
    inscription, 112
    inverse image, 101
    optimization, 232
    polyhedral, 113
    rank, 230
    visualization, 229
  proper, 89, 179
  quadratic, 86
  recession, 138
  second order, 86
  self-dual, 154
  simplicial, 60, 131, 132, 160
    dual, 160, 168
  spectral, 376, 465
    dual, 381
    orthant, 468
  tangent, 429
  unique, 97, 135, 405
congruence, 375
congruent, 344
conic
  combination, 59
  hull, 59
  independence, 25, 120–122, 125
    preservation, 122
  problem, 226
  section
    circular cone, 111
conically independent, 120
conici(), 667
conservation
  dimension, 354, 526, 527
constellation, 20
constraint
  cardinality, 275, 294
  equality, 151
  Gram, 319
  inequality, 255
  nonconvex, 286
  polynomial, 286
  rank, 256, 286, 464
  sort, 390
content, 399, 702
contour plot, 176, 213
convergence, 258, 391, 642
  geometric, 644
convex
  combination, 34, 59
  cone, 43, 85
  envelope, 245, 267, 332, 476, 477
    equivalent, 268
  form, 225
  function, 183
  geometry, 33
  hull, 53, 56, 306
    of outer product, 57
  iteration, 256, 258, 275, 276, 481
    stall, 276, 284, 287
  optimization, 7, 186, 226, 325
  polyhedron, 56, 126
    halfspace description, 126
    vertices, 128
  problem, 140, 151, 177, 195, 201, 228, 234, 470, 628
    geometry, 19, 25
    tractable, 226
  set, 34
    Schur-form, 100
  simultaneously, 199, 200, 218
  strictly, 185
convexity, 183, 341
  first order, 211, 214, 215, 217
  second order, 215, 219
coordinate, 166
Crippen & Havel, 337
criteria
  matrix, 356
cubix, 566, 569, 699
curvature, 222

decimate(), 672
decomposition, 699
  biorthogonal, 599, 608
  dyad, 531
  eigen, 518
  orthonormal, 606, 607
  singular value, 521
    compact, 521
    full, 522
    subcompact, 522
Delsarte method, 322
dense, 79
derivative, 205
  directional, 214, 565, 573, 577
    dimension, 576
    second, 578
    table, 586
description
  conversion, 134
  halfspace, 62
    dual cone, 152
  vertex, 59, 158
    of halfspace, 124
determinant, 394, 505, 517, 592
diagonal, 495, 552
  dominance, 114
diagonalizable, 161, 518
  simultaneously, 240, 508, 529
diagonalization, 39, 518
  expansion by, 161
  symmetric, 520
difference, 42
  PSD matrices, 118
  vector, 43
diffusion, 411
dilation, 388
dimension, 36
  affine, 21, 36, 53, 351, 353, 363, 443
    low, 329
    minimization, 476, 488
    reduction, 400
    spectral projection, 489
  complementary, 73
  conservation, 354, 526, 527
  embedding, 54
  positive semidefinite cone, 105
  Précis, 354
  rank, 353, 354
direction, 81, 91
  exposed, 94
  extreme, 91, 159
    PSD cone, 99
  matrix, 257
disc, 59
discretization, 152, 185, 215, 447
distance
  geometry, 20, 333
  matrix, 306
  origin to hyperplane, 65
doublet, 343, 539, 627
  range, 539
dual
  affine dimension, 443
  cone, 135
  feasible set, 234
  norm, 143
  of dual, 141, 243, 292
  positive semidefinite cone, 154
  problem, 138, 227, 238–240, 292
  projection, 629
  strong, 140, 239, 562
duality gap, 139, 140, 235, 292
dvec, 52
Dx(), 666
dyad, 531, 534
  -decomposition, 531
  independence, 536, 537
  negative, 534
  projector, 535, 600, 617
  range, 535
  sum, 518, 521
    range, 538
  symmetric, 536
Dykstra algorithm, 642

edge, 77
EDM, 26, 306, 449
  closest, 463
  composition, 372
  cone, 313, 405
    projection dual, 487
  construction, 418
  criterion, 434
    dual, 442
  definition, 312, 416, 430
    Gram form, 315, 427
    inner-product form, 338
    interpoint angle, 315
    relative-angle form, 340
      dual, 443
  exponential entry, 372
  graph, 309, 322, 329, 330, 412
  projection, 433
  range, 417
  unique, 370, 383, 384, 464
eigen, 518, 552
  decomposition, 518
  matrix, 98, 519
  spectrum, 376, 465
    ordered, 378
    unordered, 380
  value, 355, 504
    coefficients, 618, 621
    EDM, 375, 388, 417
    interlaced, 361, 375
    intertwined, 361
    largest, 363
    Schur, 516
    smallest, 363, 371
    unique, 108, 518
    zero, 526
  vector
    distinct, 519
    EDM, 417
    left, 518
    unique, 519
elbow, 457, 458
element
  minimal, 89
  minimum, 89
elementary matrix, 373, 540, 542
  range, 540
ellipsoid, 34, 38, 40, 44, 283
elliptope, 245, 325, 365, 366, 374, 427, 428
embedding, 351
  dimension, 54
empty, 37
  interior, 37
  set, 37
entry, 184, 689
epigraph, 197, 217
  form, 202
errata, 182, 194, 259, 348, 557
error
  input, 456
Euclidean
  distance, 306
  geometry, 20, 326, 333, 624
  metric, 307, 358
    fifth property, 307, 310, 311, 393, 395, 401
  norm, 703
  projection, 116
  space, 34
exclusive
  mutually, 150
expansion, 164, 699
  biorthogonal, 134, 159, 163, 602, 612
    as projection, 611
    EDM, 422
    unique, 161, 163, 166, 423, 538, 612
  implied by diagonalization, 161
  orthogonal, 50, 134, 606, 608
  w.r.t. orthant, 163
exponential, 593
exposed, 76
  direction, 94
  extreme, 78, 126
  face, 77, 79
  point, 79, 94
    density, 79
extreme, 76
  and boundary, 95
  direction, 91, 96
    distinct, 91
    EDM cone, 422, 441
    positive semidefinite cone, 107
    unique, 91
  exposed, 78, 126
  point, 76, 233, 258, 275
  ray, 91

face, 39, 77
  algebra, 80
  transitivity, 80
face recognition, 23
facet, 79
Fantope, 57, 58, 108, 200, 257, 258
Farkas' lemma, 146, 149
  definite, 236
  not definite, 238
  semidefinite, 234
fat, 74, 689
  full-rank, 74
finitely generated, 127
Finsler criterion, 434
flared horn, 85
floor, 703
Forsgren & Gill, 225
Fréchet differential, 574
Frobenius
  norm, 47
    minimization, 207, 625
full, 37
  -dimensional, 37, 89
  rank, 74
function
  affine, 193, 207
    supremum, 196
  composition, 210, 224
  concave, 223
  convex, 183
    differentiable, 219
    strictly, 185
  linear, 184, 193
  matrix, 216
  monotonic, 183, 189, 210, 223
  multidimensional, 183, 194, 205, 219
  objective, 72, 140, 151, 176, 177
    bilinear, 256, 273
    linear, 195
    nonlinear, 195
  presorting, 465, 469, 490
  quadratic, 186, 203, 209, 220, 312, 339, 484, 576, 622
  quasiconcave, 115, 249, 273
  quasiconvex, 115, 221–223
  quasilinear, 223, 525
  sorting, 465, 469, 490
  step, 525
    matrix, 249
  support, 195
  vector, 184
fundamental
  convex
    geometry, 62, 68, 72, 317
    optimization, 225
  metric property, 307, 358
  subspaces, 73, 605
  test semidefiniteness, 154, 499
  theorem algebra, 518

Gâteaux differential, 574
Gale matrix, 329
generating list, 57
generator, 93
geometric
  center, 317, 343, 384
    subspace, 440, 626
  centering
    matrix, 542
    operator, 347
  Hahn-Banach theorem, 62
  multiplicity, 519
  realizability, 320
Geršgorin, 113
gimbal, 548
global positioning system, 22, 259
Gower, 305
gradient, 151, 176, 195, 204, 565, 584
  derivative, 583
  dimension, 205
  first order, 583
  monotonic, 210
  product, 569
  second order, 584, 585
  table, 586
Gram matrix, 266, 315

halfline, 81
halfspace, 59, 61, 138
  description, 60, 63
  polarity, 62
handoff, 335
  over, 335
Hardy-Littlewood-Pólya, 466
Hayden & Wells, 403, 416, 453
Hermitian matrix, 499
Hessian, 205, 565
hexagon, 320
homogeneity, 187, 224, 313
honeycomb, 27
Horn & Johnson, 500, 501
hull, 53, 55
  affine, 36, 53
    unique, 54
  conic, 59
  convex, 53, 56, 306
    of outer product, 57
    unique, 56
hyperboloid, 530
hypercube, 71, 638
  slice, 275
hyperdimensional, 570
hyperdisc, 577
hyperplane, 59, 61, 63, 207
  independent, 74
  movement, 64
  normal, 63
  radius, 64
  separating, 72
  supporting, 68, 70
    polarity, 70
    strictly, 70
    unique, 70, 212
  tangent, 70
  vertex description, 66
hypersphere, 57, 321, 367
hypograph, 198

idempotent, 598, 603
  symmetric, 604, 607
iff, 701
image
  inverse, 44, 101, 157
indefinite, 374
independence
  affine, 66, 121
    preservation, 67
  conic, 120–122, 125
    preservation, 122
  linear, 35, 121
    preservation, 35
inequality
  generalized, 25, 89, 134, 146
    dual, 146, 154
    partial order, 87
  linear, 147
    matrix, 25, 156, 157, 234, 235, 238
  spectral, 376
  triangle, 187, 358
    unique, 393
  variation, 151
inertia, 375, 515
  complementary, 378, 515
  Sylvester's law, 507
infimum, 224, 551, 604, 625, 701
inflection, 215
injective, 48, 90, 345, 595
inner product, 45
  vectorized matrix, 45
interior, 37
interior-point method, 228, 324, 445
intersection, 42
  cone, 86
  hyperplane with convex set, 68
  line with boundary, 39
  of subspaces, 76
  planes, 75
  positive semidefinite cone
    affine, 118, 238
    line, 327
  tangential, 40
invariance, 342
  Gram form, 343
  inner-product form, 344
  orthogonal, 48, 391
  translation, 342
invariant set, 373
inversion
  Gram form, 347
is, 692
isedm(), 661
isometry, 48, 391, 460
isomorphic, 47, 48, 51, 106, 320, 422, 691
  isometrically, 39, 47, 102, 154
isomorphism, 47, 156, 348, 404
  isometric, 48, 52, 391
  symmetric matrix subspace, 49
isotonic reconstruction, 388
iterate, 642
iteration
  alternating projection, 485, 642, 656
  convex, 256, 258, 276, 481
    stall, 276, 284, 287

Jacobian, 205

K-convexity, 184
Karhunen–Loève transform, 382
kissing number, 322
KKT conditions, 463
Kronecker product, 102, 312, 497, 562, 570, 588, 596, 643

Lagrange multiplier, 151
Lagrangian, 292, 669
lattice, 262–265, 271, 272
Laurent, 372
law of cosines, 338
least
  norm, 242, 598
  squares, 259, 598
Legendre-Fenchel transform, 476
Lemaréchal, 19
line, 207, 219
  tangential, 41
linear
  complementarity, 178
  function, 184, 193
  independence, 35, 121
  inequality, 147
    matrix, 25, 156, 157, 234, 235, 238
  program, 71, 141, 195, 227, 322
    prototypical, 141, 227
  transformation, 35, 67, 90, 122, 127
linearly independent, 35
localization, 22, 327
  sensor network, 259
  standardized test, 261
  unique, 21, 259, 261, 326, 329
  wireless, 333
log det, 221, 333, 479, 582, 591
logarithm, 593
Lorentz cone, 86, 128
  dual, 143, 156
lp(), 669

machine learning, 23
majorization, 498
  symmetric hollow, 498
manifold, 23, 24, 105, 410, 411, 458, 492, 558
map
  isotonic, 388
  linear
    of cone, 122
  USA, 26, 386, 670
mapusa(), 670
mater, 566
Matlab, 386, 661
  lp() versus linprog(), 667
matrix, 566
  auxiliary, 542, 546
    orthonormal, 545
    projector, 542
    Schoenberg, 544
    table, 546
  bordered, 361, 375
  calculus, 565
  circulant, 220, 520, 543
  commutative, 220, 240, 506, 507, 529, 642
  correlation, 56, 366
  diagonal, 508, 518, 520, 521, 600
  direction, 257, 269, 294, 323, 482
  doublet, 343, 539
    range, 539
  dyad, 531, 534, 617, 620
    independence, 536
    projector, 600
    range, 535
    sum, 518, 521, 538
    symmetric, 536
  elementary, 373, 540, 542
    range, 540
  Euclidean distance, 305
  exponential, 220, 593
  fat, 74, 689
    full-rank, 74
  Gale, 329
  geometric centering, 405, 414, 542
  Gram, 266, 315
  Householder, 541
    auxiliary, 543
  idempotent, 598, 604
  inverse, 219
  Jordan form, 504
  measurement, 454
  normal, 47, 505, 520
  orthogonal, 520, 547
  orthonormal, 48, 57, 257, 284, 522, 545, 555, 596
  partitioned, 514
  permutation, 520, 542, 547, 561
  positive semidefinite, 499
    from extreme directions, 108
  product, 223, 571
    determinant, 507
    fractional, 199, 200, 218
    gradient, 569
    Hadamard, 46, 472, 571, 691
    Kronecker, 570, 588, 596, 643
    positive definite, 501
    pseudofractional, 198
    trace, 506
    zero, 529
  projection, 102, 543, 598, 604, 606
    product, 642
  pseudoinverse, 102, 164, 207, 242, 514
    product, 596
    SVD, 525
    transpose, 161
    unique, 595
  rotation, 550
  Schur-form, 514
  simple, 533
  skinny, 164, 689
  sort index, 387
  square root, 520
  squared, 220
  step function, 249
  symmetric, 48
    subspace of, 48
  unitary, 547
membership relation, 146, 168
  discretized, 152, 447
  in subspace, 168
metric
  property, 307, 358
    fifth, 307, 310, 311, 393
  space, 307
minimal
  element, 89
  set, 66, 124, 615
minimax problem, 140
minimization, 151, 176, 261, 577
  on unit cube, 71
minimum
  cardinality, 241
  element, 89
  global
    unique, 183, 577
  unique, 186
molecular conformation, 22, 336
monotonic, 183, 202, 210, 258, 648, 651
  Fejér, 648
  gradient, 210
Moore-Penrose conditions, 595
Muller, 525
multidimensional
  function, 183, 194, 205, 219
  scaling, 22, 382
multilateration, 259, 333
multipath, 333

nearest neighbors, 411
neighborhood graph, 410, 411
nesting, 360
Newton, 565
nondegeneracy, 307
nonexpansive, 606, 637
nonisomorphic, 529
nonnegative, 356, 703
nonsymmetric, 504
nontrivial support, 70
nonvertical, 207
norm, 187, 273, 703
  k largest, 189
  Euclidean, 45
  Frobenius, 47
    minimization, 207, 625
  least, 242, 598
  spectral, 458, 471, 541, 639, 703
  zero, 526
normal, 63, 176, 629
  cone, 176, 636, 657
  equation, 597
  facet, 159
  inward, 62, 435
  matrix, 47, 520
  outward, 62
  vector, 629
Notation, 689
NP-hard, 481
nuclear magnetic resonance, 336
nullspace, 73, 543, 627
  form, 73
numerical precision, 244

objective
  bilinear, 256, 273
  convex, 225
    strictly, 473, 478
  function, 72, 140, 151, 176, 177
    linear, 195, 267, 327
    multidimensional, 186
    nonlinear, 195
    polynomial, 286
    quadratic, 203
    real, 186
  value, 238, 252
    unique, 227
offset, 342
omapusa(), 672
on, 696
one-dimensional projection, 619
onto, 696
operator
  adjoint, 45, 142, 147, 442, 495
  linear, 43, 48, 51, 315, 347, 349, 391, 425, 495, 627
    projector, 603, 607, 610, 630
  nullspace, 343, 345, 347, 349, 627
  permutation, 465, 469, 490
optimal
  analytical results, 551
  solution, 186
optimality, 176, 239
  conic, 177
  equality constraint, 151
  first order, 151, 176
optimization
  combinatorial, 286, 289, 494
  convex, 7, 186, 226, 325
  multicriteria, 186
  vector, 186
order
  natural, 45, 495, 699
  nonincreasing, 467, 520, 521, 694
  of projection, 454
  partial, 87, 100, 159, 160
    generalized inequality, 87
ordinal multidimensional scaling, 388
origin, 34, 352, 614, 634, 639
Orion nebula, 20
orthant, 36, 163, 177
  nonnegative, 441
orthogonal, 45
  complement, 48
  equivalence, 550
  expansion, 50, 134, 606, 608
  invariance, 48, 391
  matrix, 547
orthonormal, 45, 344
  decomposition, 606
  matrix, 48
orthonormality condition, 162
overdetermined, 689

parallel, 36
pattern recognition, 22
Penrose conditions, 595
pentahedron, 398
pentatope, 398
perpendicular, 45, 604, 625
perturbation, 247, 249
  optimality, 252
plane, 59
  segment, 91
point
  exposed, 78
  extreme, 78
  fixed, 645
  inflection, 215
polychoron, 81
polyhedral
  cone, 59, 121, 127
polyhedron, 39, 71, 126, 364
  vertex description, 128
  vertices, 128
polynomial, 515
  constraint, 286
  objective, 286
polytope, 126
positive, 703
  definite
    Farkas' lemma, 236
  semidefinite, 502
    difference, 118
    Farkas' lemma, 234
    square root, 520
  semidefinite cone, 40, 97, 99, 440
    boundary, 115
    dimension, 105
    dual, 154
    extreme direction, 107
    face, 105
    inscription, 112
    rank, 105
  strictly, 357
postulates, 307
primal
  feasible set, 234
  problem, 140, 227
    via dual, 240
principal
  component analysis, 294, 382
  eigenvector, 294
  submatrix, 325, 393, 398, 424
    leading, 358, 360
    theorem, 512
problem
  Boolean, 71, 241, 286
  cardinality, 241, 275, 294
    feasibility, 273
  completion, 308, 363, 370, 401, 411, 653, 654
  conic, 226
  convex, 140, 151, 177, 195, 201, 228, 234, 470, 628
  dual, 138, 227, 238–240, 292
  equivalent, 257
  feasibility, 178
    cardinality, 273
    semidefinite, 230, 300
  max cut, 289
  minimax, 140
  prevalent, 459
  primal, 140, 227
    via dual, 240
  Procrustes, 284
  proximity, 461
    same, 257
  stress, 460, 481
procedure
  dyad-decomposition, 679
  rank reduction, 248
  Sturm, 679
Procrustes, 284
  diagonal, 564
  linear program, 561
  maximization, 561
  orthogonal, 558
    two sided, 560, 562
  solution
    unique, 559
  symmetric, 563
  translation, 559
product, 42
  Cartesian, 43, 86, 141, 234, 650
  direct, 570
  Hadamard, 46, 472, 571, 691
  Kronecker, 102, 312, 497, 562, 570, 588, 596, 643
  matrix, 223, 571
    determinant, 507
    fractional, 199, 200, 218
    gradient, 569
    positive definite, 501
    pseudofractional, 198
    trace, 506
    zero, 529
  positive definite
    nonsymmetric, 501
  pseudoinverse, 596
  semidefinite
    symmetric, 512
  tensor, 570
program, 71
  geometric, 8
  linear, 71, 141, 195, 227, 322
    prototypical, 141, 227
  semidefinite, 141, 195, 225, 227, 474
    prototypical, 141, 227
projection, 430, 595
  algebra, 610
  alternating, 642, 643
    convergence, 647, 649, 651, 654
    distance, 645, 646
    Dykstra, 655, 656
    feasibility, 645–647
    on affine/psd cone, 652
    on EDM cone, 485
    on halfspaces, 644
    on orthant/hyperplane, 648, 649
    optimization, 645, 646, 655
    over/under, 650
  biorthogonal expansion, 611
  coefficient, 159, 430, 617, 623
  cyclic, 644
  dual, 629
    as optimization, 630
    on cone, 632
    on convex set, 629, 631
    on EDM cone, 487
  easy, 638
  Euclidean, 116, 455, 458, 462, 464, 472, 483, 629
  matrix, 598, 604, 606
  minimum-distance, 602, 605, 625, 628
  nonorthogonal, 471, 601, 617
    eigenvalue coefficients, 618
    on dyad, 617
    on elementary matrix, 613
    on matrix, 617
  oblique, 598
  on affine, 610, 615
    vertex description, 615
  on algebraic complement, 603
  on cone, 633, 635
  on convex set, 628
    in affine subset, 641
    in subspace, 640, 641
  on convex sets, 644
  on EDM cone, 473, 483
  on ellipsoid, 282
  on halfspace, 616
  on hyperplane, 616
    origin, 614
  on matrix
    vectorized, 617, 624
  on nonorthogonal basis, 611
  on orthant, 636
  on orthogonal basis, 610
  on PSD cone, 104, 116, 461, 464
    rank constrained, 464, 470
  on slab, 616
  on subspace, 634
  on truncated cone, 636
  one dimensional, 619
  order of, 454
  orthogonal, 619
    eigenvalue coefficients, 621
    on dyad, 620
    on function domain, 198
    on matrix subspace, 624
    on vectorized matrix, 619
    semidefiniteness test as, 622
  spectral, 465, 468
    unique, 468
  successive, 644
  two sided, 626
    range, 628
  unique, 602, 605, 625, 628, 629
projector
  biorthogonal, 608
  commutative, 440, 642
  direction
    universal, 608
  dyad, 535, 600, 617
  linear operator, 603, 607, 610, 630
  noncommutative, 641, 643
  nonorthogonal, 602, 609
  orthogonal, 606
  orthonormal, 608
  product, 440, 642
  range, 605
  universal, 604
prototypical, 141
proximity
  EDM
    nonconvex, 468
  in spectral norm, 471
  problem, 453, 455
  rank heuristic, 478, 480
  semidefinite program, 474, 484
    Gram form, 475
pyramid, 399, 400

quadrant, 36
quadratic
  cone, 86
  form, 215
  function, 186, 203, 209, 220, 312, 339, 484, 576, 622
    nonnegative, 515
  program, 391
quartix, 568, 699
quasilinear, 215

range, 73, 625
  form, 73
  rotation, 548
rank, 701
  constraint, 256, 286, 464, 481
  convex envelope, 477
  dimension, 354, 701
  heuristic, 478, 480, 481
    log det, 479, 480
  lowest, 230
  minimization, 476
  one, 536
    modification, 540
  partitioned matrix, 516
  Précis, 354
  quasiconcavity, 115
  reduction, 228, 247, 248, 253
    procedure, 248
  regularization, 270, 481
  Schur-form, 516
  trace, 477, 478
    heuristic, 476
rank ρ subset, 103, 206
ray, 81
  extreme, 427
Rayleigh's quotient, 623
realizable, 20
reconstruction, 382
  isometric, 329, 386
  isotonic, 386
  unique, 309, 329, 330, 344, 345
recursion, 269, 445, 495, 512
reflection, 344
regular, 27, 130, 261, 395, 398, 409
regularization, 270
relative, 37
  angle
    inequality, 310, 395
    matrix, 340
    matrix inequality, 341
  boundary, 37, 57
  complement, 77, 692
  interior, 37
relaxation, 71, 243, 245, 327, 389
Riemann, 228
robotics, 23
rotation, 344
  invariant, 344
round, 701
RRf(), 675

saddle value, 140
scaling, 382
  multidimensional, 22, 382
  unidimensional, 481
Schoenberg, 377, 447
  auxiliary matrix, 314
  criterion, 28, 317
Schur
  -form, 100, 514
    anomaly, 203
    convex set, 101
    rank, 516
    semidefinite program, 202, 637
    sparse, 515
  complement, 267, 327, 514, 515
second order
  cone, 86
  program, 228
section, 110
self
  adjoint, 495
  dual, 147, 154
semidefinite, 504
  domain, 499
  Farkas' lemma, 234, 235
  program, 141, 195, 225, 227, 474
    equality constraint, 100
    Gram form, 486
    prototypical, 141, 227
    Schur-form, 485
  versus symmetry, 499
sensor, 325, 332
sensor network
  localization, 259, 325, 332
separation, 72
sequence, 648
set
  feasible, 42, 151, 176, 234, 699
    dual, 234
    primal, 234
  invariant, 373
  level, 176, 193, 204, 205, 213, 223, 226
  minimal, 66, 615
  solution, 699
  sublevel, 197, 212, 218, 223
  superlevel, 198, 223
shape, 21
shift, 342
SIAM, 226
signal
  dropout, 276
  processing
    digital, 276, 296, 382
signeig(), 666
simplex, 130, 131, 399
  unit, 130, 131
simplicial, 131
singular value
  decomposition
    compact, 521
    full, 522
    geometrical, 524
    pseudoinverse, 525
    subcompact, 522
    symmetric, 525
  problem, 471
skinny, 164, 689
slab, 35, 268, 616
slack variable, 227
Slater's condition, 140, 235
slice, 109, 156, 333, 576, 577
smooth, 365
solid, 126
solution
  numerical, 269
  set, 699
  unique, 42, 185, 186, 230, 327, 471, 473, 478, 488
sort
  function, 465, 469, 490
  index matrix, 387
  largest entries, 189
  monotone nonnegative, 467
  smallest entries, 188
span, 696
sparsity, 241, 273, 275, 294
spectral
  cone, 378
  norm, 458, 471, 541, 639, 703
  projection, 414, 465
sphere packing, 321
standard basis
  matrix, 50, 53
  vector, 50, 131, 316
steepest descent, 576
step function, 249, 525
strict
  complementarity, 240
  positivity, 357
  triangle inequality, 360
strictly
  convex function, 186
  feasible, 235
  separating hyperplane, 72
  supporting, 70
strong
  dual, 140, 239, 562
  duality, 239
subject to, 701
subspace, 34
  as hyperplane intersection, 74
  as span of nullspace basis, 75
  complementary, 48
  geometric center, 346, 438, 626
    orthogonal complement, 105, 626
  hollow, 51, 346, 439
  projection on, 44
  proper, 34
  representation, 73
  symmetric hollow, 51
  tangent, 105
  translation invariant, 626
successive
  approximation, 644
  projection, 644
sum, 42
  vector, 43
  unique, 48, 134, 537
support
  function, 70, 195
  nontrivial, 70
supporting hyperplane, 69
supremum, 701
surjective, 48, 345, 347, 349, 466, 696
  linear, 349, 350, 367, 369
svec, 49
svect(), 677
svectinv(), 678
Swiss roll, 24
symmetrized, 500

tangent
  hyperplane, 70, 209
  line, 40, 327
  subspace, 105
tangential, 40, 328
Taylor series, 565, 582
tetrahedron
  angle inequality, 395
theorem
  0 eigenvalues, 526
  alternating projection
    distance, 647
  alternative, 148, 449
    semidefinite, 237
  Barvinok, 118
  Bunt-Motzkin, 629
  Carathéodory, 146, 621
  cone faces, 90
  convexity condition, 211
  decomposition
    dyad, 531
  directional derivative, 577
  discrete membership, 153
  dual cone intersection, 180
  EDM, 367
  eigenvalue
    0, 526
    of difference, 511
    order, 510
    sum, 511
  exposed, 95
  extreme existence, 77
  extremes, 93, 94
  Farkas' lemma, 146, 149
    definite, 236
    not definite, 238
    semidefinite, 234
  fundamental
    algebra, 518
  Geršgorin discs, 113
  gradient monotonicity, 210
  halfspaces, 62
  image, 44, 46
  intersection, 42
  inverse image, 44
  line, 219
  linearly independent dyads, 537
  mean value, 582
  monotone nonnegative sort, 467
  nonexpansivity, 637
  pointed cones, 87
  positive semidefinite, 503
    cone subsets, 116
    matrix sum, 115
  principal submatrix, 512
    symmetric, 513
  projection, 440
    algebraic complement, 635
    minimum-distance, 629
    on affine, 610
    on cone, 633
    via dual cone, 636
    via normal cone, 630
  projector
    rank/trace, 607
    semidefinite, 513
  proper-cone boundary, 90
  Pythagorean, 339, 641
  range of dyad sum, 538
  rank
    affine dimension, 355
    partitioned matrix, 516
    trace, 604
  real eigenvector, 518
  Schur, 498
  Schur-form
    rank, 516
  subspace projection, 44
  Tour, 372
tight, 309, 699
torsion, 27
trace, 45, 495, 552, 589
  rank gap, 477
trajectory, 373, 434
  sensor, 331
transformation
  affine, 44, 68, 86, 101, 224, 405, 446
  linear, 35, 67, 122, 127
    dual, 147
    injective, 90
  rigid, 21, 329
  similarity, 622
translation invariant, 342
  subspace, 343, 626
trefoil, 412, 413, 415
triangle, 308
triangulation, 21
trilateration, 21, 42, 326
  tandem, 332

unbounded
  below, 71, 149
underdetermined, 298, 689
unfolding, 411
unfurling, 411
unimodal, 221
unique, 8
  eigenvalues, 518
  extreme direction, 91
  localization, 259, 261
  minimum element, 89
  solution, 185, 328
unit simplex, 131
unitary
  linear operator, 48
  matrix, 547
unraveling, 411, 413, 415
untieing, 413, 415
USA, 385

value
  absolute, 187
  minimum
    unique, 186
  objective
    unique, 227
  singular
    largest, 639
variable
  matrix, 65
  slack, 227
vec, 45, 495
vector, 34
  binary, 56, 245, 286, 365
  direction, 275
  dual, 656
  normal, 629
  Perron, 375
  primal, 656
vectorization, 45, 430
  symmetric, 49
    hollow, 52
Venn, 456
vertex, 38, 42, 79
  description, 59, 60
    of halfspace, 125
    polyhedral cone, 129
Vitulli, 533
Vm(), 665
Vn(), 665
vortex, 420

Wüthrich, 22
weak
  alternative, 150
  duality theorem, 239
wedge, 92, 143
wireless location, 22, 259, 333
womb, 566

zero, 526
  definite, 530
  entry, 526
  matrix product, 529
  trace, 529



A searchable color pdfBookJon Dattorro, <strong>Convex</strong> <strong>Optimization</strong> & Euclidean Distance Geometry,Mεβoo, 2007is available online from Meboo Publishing.756


757


758


<strong>Convex</strong> <strong>Optimization</strong> & Euclidean Distance Geometry, DattorroMεβoo Publishing USA
