On Fisher information inequalities and score functions in non-invertible linear systems

Journal of Inequalities in Pure and Applied Mathematics
http://jipam.vu.edu.au/ Volume 4, Issue 4, Article 71, 2003

ON FISHER INFORMATION INEQUALITIES AND SCORE FUNCTIONS IN NON-INVERTIBLE LINEAR SYSTEMS

C. VIGNAT AND J.-F. BERCHER

E.E.C.S., University of Michigan, 1301 N. Beal Avenue, Ann Arbor MI 48109, USA.
vignat@univ-mlv.fr
URL: http://www-syscom.univ-mlv.fr/~vignat/

Équipe Signal et Information, ESIEE and UMLV, 93162 Noisy-le-Grand, France.
jf.bercher@esiee.fr

Received 04 June, 2003; accepted 09 September, 2003
Communicated by F. Hansen

ABSTRACT. In this note, we review score function properties and discuss inequalities on the Fisher information matrix of the output of a non-invertible linear system.

we give two alternate derivations of Zamir's FII inequalities and show how they can be related to Papathanasiou's results. Third, we examine the cases of equality and give an interpretation that highlights the concept of extractable component of the input vector of a linear system, and its relationship with the concepts of pseudoinverse and gaussianity.

2. NOTATIONS AND DEFINITIONS

In this note, we consider a linear system with an $(n \times 1)$ random vector input $X$ and an $(m \times 1)$ random vector output $Y$, represented by an $m \times n$ matrix $A$, with $m \le n$, as
\[
Y = AX.
\]
Matrix $A$ is assumed to have full row rank ($\operatorname{rank} A = m$).

Let $f_X$ and $f_Y$ denote the probability densities of $X$ and $Y$. The probability density $f_X$ is supposed to satisfy the three regularity conditions (cf. [4]):

(1) $f_X(x)$ is continuous and has continuous first and second order partial derivatives,
(2) $f_X(x)$ is defined on $\mathbb{R}^n$ and $\lim_{\|x\| \to \infty} f_X(x) = 0$,
(3) the Fisher information matrix $J_X$ (with respect to a translation parameter) is defined as
\[
J_X = E_X\!\left[\phi_X(X)\,\phi_X^T(X)\right],
\]
where $\phi_X(x) = \nabla_x \ln f_X(x)$ denotes the score function of $X$, and $J_X$ is supposed nonsingular.
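As a concrete illustration of these definitions (not part of the paper), consider a Gaussian vector $X \sim N(\mu, \Sigma)$: its score is $\phi_X(x) = -\Sigma^{-1}(x-\mu)$ and its Fisher information matrix is $J_X = E[\phi_X \phi_X^T] = \Sigma^{-1}$. A minimal Monte Carlo sketch, assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Gaussian example: X ~ N(mu, Sigma), score phi(x) = -Sigma^{-1}(x - mu),
# so the Fisher information J_X = E[phi phi^T] equals Sigma^{-1}.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

X = rng.multivariate_normal(mu, Sigma, size=200_000)  # samples, shape (N, 2)
phi = -(X - mu) @ Sigma_inv                           # scores (Sigma_inv is symmetric)
J_hat = phi.T @ phi / len(X)                          # Monte Carlo estimate of E[phi phi^T]

print(J_hat)  # close to Sigma_inv
```

The empirical matrix `J_hat` matches $\Sigma^{-1}$ up to Monte Carlo error, which makes the translation-parameter definition of $J_X$ tangible before the algebraic manipulations that follow.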

Theorem 3.1. Under the hypotheses expressed in Section 2, the solution to the minimum mean square estimation problem
\[
\hat{\phi}_X(X) = \arg\min_{w} E_{X,Y}\!\left[\|\phi_X(X) - w(Y)\|^2\right] \quad \text{subject to } Y = AX, \tag{3.1}
\]
is
\[
\hat{\phi}_X(X) = A^T \phi_Y(Y). \tag{3.2}
\]

The proof we propose here relies on elementary algebraic manipulations according to the rules expressed in the following lemma.

Lemma 3.2. If $X$ and $Y$ are two random vectors such that $Y = AX$, where $A$ is a full row rank matrix, then for any smooth functions $g_1 : \mathbb{R}^m \to \mathbb{R}$, $g_2 : \mathbb{R}^n \to \mathbb{R}$, $h_1 : \mathbb{R}^n \to \mathbb{R}^n$, $h_2 : \mathbb{R}^m \to \mathbb{R}^m$:

Rule 0: $E_X[g_1(AX)] = E_Y[g_1(Y)]$,
Rule 1: $E_X[\phi_X(X)\, g_2(X)] = -E_X[\nabla_X g_2(X)]$,
Rule 2: $E_X\!\left[\phi_X(X)\, h_1^T(X)\right] = -E_X\!\left[\nabla_X h_1^T(X)\right]$,
Rule 3: $\nabla_X h_2^T(AX) = A^T \nabla_Y h_2^T(Y)$,
Rule 4: $E_X\!\left[\nabla_X \phi_Y^T(Y)\right] = -A^T J_Y$.

Proof. Rule 0 is proved in [2, vol. 2, p. 133]. Rule 1 and Rule 2 are easily proved using integration by parts. For Rule 3, denote by $h_k$ the $k$-th component of vector $h = h_2$, and remark that
\[
\frac{\partial}{\partial x_j} h_k(AX)
= \sum_{i=1}^{m} \frac{\partial h_k(AX)}{\partial y_i} \frac{\partial y_i}{\partial x_j}
= \sum_{i=1}^{m} A_{ij} \left[\nabla_Y h_k(Y)\right]_i
= \left[A^T \nabla_Y h_k(Y)\right]_j.
\]
Now $h^T(Y) = [h_1(Y), \ldots, h_m(Y)]$, so that
\[
\nabla_X h^T(Y) = \left[\nabla_X h_1(AX), \ldots, \nabla_X h_m(AX)\right] = A^T \nabla_Y h^T(Y).
\]
Rule 4 can be deduced as follows:
\[
E_X\!\left[\nabla_X \phi_Y^T(Y)\right]
\overset{\text{Rule 3}}{=} A^T E_X\!\left[\nabla_Y \phi_Y^T(Y)\right]
\overset{\text{Rule 0}}{=} A^T E_Y\!\left[\nabla_Y \phi_Y^T(Y)\right]
\overset{\text{Rule 2}}{=} -A^T E_Y\!\left[\phi_Y(Y)\, \phi_Y^T(Y)\right] = -A^T J_Y. \qquad \Box
\]

For the proof of Theorem 3.1, we will also need the following orthogonality result.
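Rule 1 is an integration-by-parts (Stein-type) identity, and it is easy to check numerically in one dimension. For a standard Gaussian $X$, $\phi_X(x) = -x$, so Rule 1 reads $E[-X g(X)] = -E[g'(X)]$; the sketch below (an illustration, not from the paper) tests it with $g(x) = \sin x$, for which both sides equal $-e^{-1/2}$:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(500_000)

# Rule 1 for X ~ N(0,1): score phi(x) = -x, so E[phi(X) g(X)] = -E[g'(X)].
# Take g(x) = sin(x), g'(x) = cos(x); both sides equal -exp(-1/2).
lhs = np.mean(-x * np.sin(x))
rhs = -np.mean(np.cos(x))

print(lhs, rhs)  # both close to -0.6065
```

The agreement of the two Monte Carlo averages is exactly the content of the integration-by-parts argument invoked in the proof of Lemma 3.2.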

Lemma 3.3. For all multivariate functions $h : \mathbb{R}^m \to \mathbb{R}^n$, $\hat{\phi}_X(X) = A^T \phi_Y(Y)$ satisfies
\[
E_{X,Y}\!\left[\left(\phi_X(X) - \hat{\phi}_X(X)\right)^T h(Y)\right] = 0. \tag{3.3}
\]

Proof. Expand into two terms and compute the first term using the trace operator $\operatorname{tr}(\cdot)$:
\[
E_{X,Y}\!\left[\phi_X(X)^T h(Y)\right]
= \operatorname{tr} E_{X,Y}\!\left[\phi_X(X)\, h^T(Y)\right]
\overset{\text{Rules 2, 0}}{=} -\operatorname{tr} E_Y\!\left[\nabla_X h^T(Y)\right]
\overset{\text{Rule 3}}{=} -\operatorname{tr} A^T E_Y\!\left[\nabla_Y h^T(Y)\right].
\]
The second term writes
\[
E_{X,Y}\!\left[\hat{\phi}_X(X)^T h(Y)\right]
= \operatorname{tr} E_{X,Y}\!\left[\hat{\phi}_X(X)\, h^T(Y)\right]
= \operatorname{tr} E_Y\!\left[A^T \phi_Y(Y)\, h^T(Y)\right]
= \operatorname{tr} A^T E_Y\!\left[\phi_Y(Y)\, h^T(Y)\right]
\overset{\text{Rule 2}}{=} -\operatorname{tr} A^T E_Y\!\left[\nabla_Y h^T(Y)\right],
\]
thus the two terms are equal. $\Box$

Using Lemma 3.2 and Lemma 3.3, we are now in a position to prove Theorem 3.1.

Proof of Theorem 3.1. From Lemma 3.3, we have
\[
E_{X,Y}\!\left[\left(\phi_X(X) - \hat{\phi}_X(X)\right)^T h(Y)\right]
= E_{X,Y}\!\left[\left(\phi_X(X) - A^T \phi_Y(Y)\right)^T h(Y)\right]
= E_Y\!\left[E_{X|Y}\!\left[\left(\phi_X(X) - A^T \phi_Y(Y)\right)^T\right] h(Y)\right] = 0.
\]
Since this is true for all $h$, the inner expectation is null, so that
\[
E_{X|Y}\!\left[\phi_X(X)\right] = A^T \phi_Y(Y).
\]
Hence, we deduce that the estimator $\hat{\phi}_X(X) = A^T \phi_Y(Y)$ is nothing else but the conditional expectation of $\phi_X(X)$ given $Y$. Since it is well known (see [8] for instance) that the conditional expectation is the solution of the Minimum Mean Square Error (MMSE) estimation problem addressed in Theorem 3.1, the result follows. $\Box$

Theorem 3.1 not only restates Zamir's result in terms of an estimation problem, but also extends its conditions of application, since our proof does not require, as in [7], the independence of the components of $X$.

4. FISHER INFORMATION MATRIX INEQUALITIES

As was shown by Zamir [6], the result of Theorem 3.1 may be used to derive the pair of Fisher information inequalities stated in the following theorem.

Theorem 4.1. Under the assumptions of Theorem 3.1,
\[
J_X \ge A^T J_Y A \tag{4.1}
\]
and
\[
J_Y \le \left(A J_X^{-1} A^T\right)^{-1}. \tag{4.2}
\]
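For a Gaussian input, both inequalities of Theorem 4.1 can be checked directly in closed form, since then $J_X = \Sigma^{-1}$ and $J_Y = (A\Sigma A^T)^{-1}$; in that case (4.2) holds with equality, consistently with the equality analysis of Section 5. A numerical sketch (an illustration under the Gaussian assumption, not from the paper), assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(2)

n, m = 4, 2
A = rng.standard_normal((m, n))       # full row rank almost surely
S = rng.standard_normal((n, n))
Sigma = S @ S.T + n * np.eye(n)       # covariance of a Gaussian input X

J_X = np.linalg.inv(Sigma)            # Fisher information of Gaussian X
J_Y = np.linalg.inv(A @ Sigma @ A.T)  # Fisher information of Y = AX

# (4.1): J_X - A^T J_Y A must be positive semidefinite.
gap = J_X - A.T @ J_Y @ A
eigs = np.linalg.eigvalsh(gap)
print(eigs.min() >= -1e-8)            # True

# (4.2): J_Y <= (A J_X^{-1} A^T)^{-1}; equality in the Gaussian case.
rhs = np.linalg.inv(A @ np.linalg.inv(J_X) @ A.T)
print(np.allclose(J_Y, rhs))          # True
```

The `gap` matrix has rank $n - m$ here: writing $\Sigma^{-1} - A^T(A\Sigma A^T)^{-1}A = \Sigma^{-1/2}(I - P)\Sigma^{-1/2}$ with $P$ an orthogonal projector of rank $m$ shows why $m$ of its eigenvalues vanish.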

We exhibit here an extension and two alternate proofs of these results, which do not even rely on Theorem 3.1. The first proof relies on a classical matrix inequality combined with the algebraic properties of score functions as expressed by Rules 1 to 4. The second (partial) proof is deduced as a particular case of results expressed by Papathanasiou [4].

The first proof we propose is based on the well-known result expressed in the following lemma.

Lemma 4.2. If
\[
U = \begin{bmatrix} A & B \\ C & D \end{bmatrix}
\]
is a block symmetric non-negative matrix such that $D^{-1}$ exists, then
\[
A - B D^{-1} C \ge 0,
\]
with equality if and only if $\operatorname{rank}(U) = \dim(D)$.

Proof. Consider the block $L \Delta M^T$ factorization [3] of matrix $U$:
\[
U = \underbrace{\begin{bmatrix} I & B D^{-1} \\ 0 & I \end{bmatrix}}_{L}
\underbrace{\begin{bmatrix} A - B D^{-1} C & 0 \\ 0 & D \end{bmatrix}}_{\Delta}
\underbrace{\begin{bmatrix} I & 0 \\ D^{-1} C & I \end{bmatrix}}_{M^T}.
\]
We remark that the symmetry of $U$ implies that $L = M$, and thus
\[
\Delta = L^{-1} U L^{-T},
\]
so that $\Delta$ is a symmetric nonnegative definite matrix. Hence, all its principal minors are nonnegative, and
\[
A - B D^{-1} C \ge 0. \qquad \Box
\]

Using this matrix inequality, we can complete the proof of Theorem 4.1 by considering the two following $(m+n) \times (m+n)$ matrices:
\[
U_1 = E\!\left[\begin{bmatrix} \phi_X(X) \\ \phi_Y(Y) \end{bmatrix}
\begin{bmatrix} \phi_X^T(X) & \phi_Y^T(Y) \end{bmatrix}\right], \qquad
U_2 = E\!\left[\begin{bmatrix} \phi_Y(Y) \\ \phi_X(X) \end{bmatrix}
\begin{bmatrix} \phi_Y^T(Y) & \phi_X^T(X) \end{bmatrix}\right].
\]
For matrix $U_1$, we have, from Lemma 4.2,
\[
E_X\!\left[\phi_X(X)\, \phi_X^T(X)\right]
\ge E_{X,Y}\!\left[\phi_X(X)\, \phi_Y^T(Y)\right]
\left(E_Y\!\left[\phi_Y(Y)\, \phi_Y^T(Y)\right]\right)^{-1}
E_{X,Y}\!\left[\phi_Y(Y)\, \phi_X^T(X)\right]. \tag{4.3}
\]
Then, using the rules of Lemma 3.2, we can recognize that
\[
E_X\!\left[\phi_X(X)\, \phi_X^T(X)\right] = J_X, \qquad
E_Y\!\left[\phi_Y(Y)\, \phi_Y^T(Y)\right] = J_Y,
\]
\[
E_{X,Y}\!\left[\phi_X(X)\, \phi_Y^T(Y)\right] = -E_X\!\left[\nabla_X \phi_Y^T(Y)\right] = A^T J_Y, \qquad
E_{X,Y}\!\left[\phi_Y(Y)\, \phi_X^T(X)\right] = \left(A^T J_Y\right)^T = J_Y A.
\]
Replacing these expressions in inequality (4.3), we deduce the first inequality (4.1).

Applying the result of Lemma 4.2 to matrix $U_2$ yields similarly
\[
J_Y \ge J_Y^T A J_X^{-1} A^T J_Y.
\]
Multiplying both on the left and on the right by $J_Y^{-1} = \left(J_Y^{-1}\right)^T$ yields inequality (4.2).

Another proof of inequality (4.2) is now exhibited, as a consequence of a general result derived by Papathanasiou [4]. This result states as follows.
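The Schur-complement inequality of Lemma 4.2 is easy to illustrate numerically: build a symmetric nonnegative matrix $U = V V^T$, partition it into blocks, and check that $A - B D^{-1} C$ has nonnegative eigenvalues. A sketch (an illustration, not from the paper), assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(3)

p, q = 3, 2
V = rng.standard_normal((p + q, p + q + 1))
U = V @ V.T                           # symmetric nonnegative definite (a.s. full rank)

# Partition U as [[A, B], [C, D]] with D of size q x q (invertible a.s.).
A = U[:p, :p]; B = U[:p, p:]
C = U[p:, :p]; D = U[p:, p:]

schur = A - B @ np.linalg.inv(D) @ C  # Schur complement of D in U
eigs = np.linalg.eigvalsh(schur)
print(eigs.min() >= -1e-8)            # True: the Schur complement is >= 0
```

Since `U` here has full rank (greater than `dim(D)`), the Schur complement is in fact positive definite, matching the equality condition rank(U) = dim(D) of the lemma.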

Theorem 4.3 (Papathanasiou [4]). If $g(X)$ is a function $\mathbb{R}^n \to \mathbb{R}^m$ such that, for all $i \in [1, m]$, $g_i(x)$ is differentiable and $\operatorname{var}[g_i(X)] < \infty$, then the covariance matrix $\operatorname{cov}[g(X)]$ of $g(X)$ verifies:
\[
\operatorname{cov}[g(X)] \ge E_X\!\left[\nabla^T g(X)\right] J_X^{-1} E_X\!\left[\nabla g(X)\right].
\]
Now, inequality (4.2) simply results from the choice $g(X) = \phi_Y(AX)$, since in this case $\operatorname{cov}[g(X)] = J_Y$ and $E_X\!\left[\nabla^T g(X)\right] = -J_Y A$. Note that Papathanasiou's theorem does not allow us to retrieve inequality (4.1).

5. CASE OF EQUALITY IN MATRIX FII

We now make explicit the cases of equality in both inequalities (4.1) and (4.2).

with $\tilde{X} = X_0 + M X_1$. Since $A_0$ is square and invertible, it follows that
\[
\phi_Y(Y) = A_0^{-T} \phi_{\tilde{X}}\!\left(\tilde{X}\right),
\]
so that
\[
\phi_X = A^T \phi_Y(Y)
= A^T A_0^{-T} \phi_{\tilde{X}}\!\left(\tilde{X}\right)
= \begin{bmatrix} A_0^T \\ M^T A_0^T \end{bmatrix} A_0^{-T} \phi_{\tilde{X}}\!\left(\tilde{X}\right)
= \begin{bmatrix} I \\ M^T \end{bmatrix} \phi_{\tilde{X}}\!\left(\tilde{X}\right)
= \begin{bmatrix} \phi_{\tilde{X}}\!\left(\tilde{X}\right) \\ M^T \phi_{\tilde{X}}\!\left(\tilde{X}\right) \end{bmatrix}.
\]
As $X$ has independent components, $\phi_X$ can be decomposed as
\[
\phi_X = \begin{bmatrix} \phi_{X_0}(X_0) \\ \phi_{X_1}(X_1) \end{bmatrix},
\]
so that finally
\[
\begin{bmatrix} \phi_{X_0}(X_0) \\ \phi_{X_1}(X_1) \end{bmatrix}
= \begin{bmatrix} \phi_{\tilde{X}}\!\left(\tilde{X}\right) \\ M^T \phi_{\tilde{X}}\!\left(\tilde{X}\right) \end{bmatrix},
\]
from which we deduce that
\[
\phi_{X_1}(X_1) = M^T \phi_{X_0}(X_0).
\]
As $X_0$ and $X_1$ are independent, this is not possible unless $M = 0$, which is the equality condition expressed in Theorem 5.1. Reciprocally, if these conditions are met, then obviously, equality is reached in inequality (4.1). $\Box$

Case of equality in inequality (4.2). Assuming that the components of $X$ are mutually independent, the case of equality in inequality (4.2) is characterized as follows.

Theorem 5.2. Equality holds in inequality (4.2) if and only if each component $X_i$ of $X$ verifies at least one of the following conditions:
a) $X_i$ is Gaussian,
b) $X_i$ can be recovered from the observation of $Y = AX$, i.e. $X_i$ is 'extractable',
c) $X_i$ corresponds to a null column of $A$.

Proof. According to the (first) proof of inequality (4.2), equality holds, as previously, if and only if there exists a matrix $C$ such that
\[
\phi_Y(Y) = C \phi_X(X), \tag{5.1}
\]
which implies that $J_Y = C J_X C^T$. Then, as by assumption $J_Y^{-1} = A J_X^{-1} A^T$, $C = J_Y A J_X^{-1}$ is such a matrix. Denoting $\tilde{\phi}_X(X) = J_X^{-1} \phi_X(X)$ and $\tilde{\phi}_Y(Y) = J_Y^{-1} \phi_Y(Y)$, equality (5.1) writes
\[
\tilde{\phi}_Y(Y) = A \tilde{\phi}_X(X). \tag{5.2}
\]
The rest of the proof relies on the following two well-known results:
• if $X$ is Gaussian, then equality holds in inequality (4.2),

• if $A$ is a non-singular square matrix, equality holds in inequality (4.2) irrespective of $X$.

We thus need to isolate the 'invertible part' of matrix $A$. To this aim, we consider the pseudoinverse $A^{\#}$ of $A$ and form the product $A^{\#} A$. Up to a permutation of rows and columns, this matrix writes
\[
A^{\#} A = \begin{bmatrix} I & 0 & 0 \\ 0 & M & 0 \\ 0 & 0 & 0 \end{bmatrix},
\]
where $I$ is the $n_i \times n_i$ identity, $M$ is an $n_{ni} \times n_{ni}$ matrix and $0$ is an $n_z \times n_z$ matrix, with $n_z = n - n_i - n_{ni}$ ($i$ stands for invertible, $ni$ for not invertible and $z$ for zero). Remark that $n_z$ is exactly the number of null columns of $A$. Following [6, 7], $n_i$ is the number of 'extractable' components, that is, the number of components of $X$ that can be deduced from the observation $Y = AX$. We provide here an alternate characterization of $n_i$ as follows: the set of solutions of $Y = AX$ is the affine set
\[
X = A^{\#} Y + \left(I - A^{\#} A\right) Z = X_0 + \left(I - A^{\#} A\right) Z,
\]
where $X_0$ is the minimum norm solution of the linear system $Y = AX$ and $Z$ is any vector. Thus, $n_i$ is exactly the number of components shared by $X$ and $X_0$.

The expression of $A^{\#} A$ allows us to express $\mathbb{R}^n$ as the direct sum $\mathbb{R}^n = R_i \oplus R_{ni} \oplus R_z$, and to express accordingly $X$ as $X = \left[X_i^T, X_{ni}^T, X_z^T\right]^T$. Then equality in inequality (4.2) can be studied separately in the three subspaces as follows:

(1) restricted to subspace $R_i$, $A$ is an invertible operator, and thus equality holds without condition,
(2) restricted to subspace $R_{ni}$, equality (5.2) writes $M \tilde{\phi}(X_{ni}) = \tilde{\phi}(M X_{ni})$, which means that necessarily all components of $X_{ni}$ are Gaussian,
(3) restricted to subspace $R_z$, equality holds without condition. $\Box$

As a final note, remark that, although $A$ is supposed full rank, $n_i \le \operatorname{rank} A$. For instance, consider the matrix
\[
A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \end{bmatrix},
\]
for which $n_i = 1$ and $n_{ni} = 2$.
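The block structure of $A^{\#}A$ and the extractability of the first component in the example above can be reproduced numerically. A sketch (an illustration, not from the paper), assuming numpy:

```python
import numpy as np

A = np.array([[1., 0., 0.],
              [0., 1., 1.]])       # full row rank; n_i = 1, n_ni = 2, n_z = 0

P = np.linalg.pinv(A) @ A          # the product A# A
print(np.round(P, 2))
# rows: [1, 0, 0], [0, 0.5, 0.5], [0, 0.5, 0.5]
# -> an identity block of size 1 (x_1 extractable), an M block of size 2,
#    and no null column in this example.

# x_1 is the component shared by X and the minimum-norm solution X0 = A# Y:
x = np.array([0.7, -1.3, 2.1])     # an arbitrary input vector
x0 = np.linalg.pinv(A) @ (A @ x)   # minimum-norm solution of Y = AX
print(np.isclose(x0[0], x[0]))     # True: the extractable component is recovered
```

The diagonal entry equal to 1 in $A^{\#}A$ singles out the extractable coordinate, matching the alternate characterization of $n_i$ given above.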
This example shows that the notion of 'extractability' should not be confused with invertibility restricted to a subspace: $A$ is clearly invertible on the subspace $x_3 = 0$. However, such a subspace is irrelevant here since, as we deal with continuous random input vectors, $X$ belongs to this subspace with probability zero.

ACKNOWLEDGMENT

The authors wish to acknowledge Pr. Alfred O. Hero III at EECS for useful discussions and suggestions, particularly regarding Theorem 3.1. The authors also wish to thank the referee for his careful reading of the manuscript, which helped to improve its quality.

REFERENCES

[1] N.M. BLACHMAN, The convolution inequality for entropy powers, IEEE Trans. on Information Theory, IT-11 (1965), 267–271.
[2] W. FELLER, Introduction to Probability Theory and its Applications, New York, John Wiley & Sons, 1971.
[3] G. GOLUB, Matrix Computations, Johns Hopkins University Press, 1996.

[4] V. PAPATHANASIOU, Some characteristic properties of the Fisher information matrix via Cacoullos-type inequalities, J. Multivariate Anal., 44 (1993), 256–265.