
The Monotonicity of Information in the Central Limit Theorem and Entropy Power Inequalities

Mokshay Madiman and Andrew Barron
Department of Statistics, Yale University
Email: mokshay.madiman, andrew.barron @yale.edu

**Abstract** — We provide a simple proof of the monotonicity of information in the Central Limit Theorem for i.i.d. summands. Extensions to the more general case of independent, not identically distributed summands are also presented. New families of Fisher information and entropy power inequalities are discussed.

I. INTRODUCTION

Let $X_1, X_2, \ldots, X_n$ be independent random variables with densities and finite variances, and let $H$ denote the (differential) entropy. The classical entropy power inequality of Shannon [1] and Stam [2] states

$$e^{2H(X_1+\cdots+X_n)} \ge \sum_{j=1}^{n} e^{2H(X_j)}. \tag{1}$$

Recently, Artstein, Ball, Barthe and Naor [3] proved a new entropy power inequality

$$e^{2H(X_1+\cdots+X_n)} \ge \frac{1}{n-1} \sum_{i=1}^{n} e^{2H\left(\sum_{j \ne i} X_j\right)}, \tag{2}$$

where each term involves the entropy of the sum of $n-1$ of the variables excluding the $i$-th, which is an improvement over (1). Indeed, repeated application of (2) for a succession of values of $n$ yields not only (1) but also a whole family of intermediate inequalities

$$e^{2H(X_1+\cdots+X_n)} \ge \frac{1}{\binom{n-1}{m-1}} \sum_{s \in \Omega_m} e^{2H\left(\sum_{j \in s} X_j\right)}, \tag{3}$$

where we write $\Omega_m$ for the collection of all subsets of $\{1, 2, \ldots, n\}$ of size $m$.

Below, we give a simplified and direct proof of (3) (and, in particular, of (2)), and also show that equality holds if and only if the $X_i$ are normally distributed (in which case it becomes an identity for sums of variances). In fact, all these inequalities are particular cases of a generalized entropy power inequality, which we develop in [4]. Let $\mathcal{S}$ be an arbitrary collection of subsets of $\{1, \ldots, n\}$ and let $r = r(\mathcal{S}, n)$ be the maximum number of subsets in $\mathcal{S}$ in which any one index $i$ can appear, for $i = 1, \ldots, n$. Then

$$e^{2H(X_1+\cdots+X_n)} \ge \frac{1}{r} \sum_{s \in \mathcal{S}} e^{2H\left(\sum_{j \in s} X_j\right)}. \tag{4}$$

For example, if $\mathcal{S}$ consists of subsets $s$ whose elements are $m$ consecutive indices in $\{1, \ldots, n\}$, then $r = m$, whereas if $\mathcal{S} = \Omega_m$, then $r = \binom{n-1}{m-1}$. So (4) extends (3). Likewise, for general collections we have a corresponding inequality for inverse Fisher information. Details of these results can be found in [4].

These inequalities are relevant for the examination of monotonicity in central limit theorems. Indeed, if $X_1$ and $X_2$ are independent and identically distributed (i.i.d.), then (1) is equivalent to

$$H\!\left(\frac{X_1 + X_2}{\sqrt{2}}\right) \ge H(X_1). \tag{5}$$

This fact implies that the entropy of the standardized sums $Y_n = \frac{\sum_{i=1}^{n} X_i}{\sqrt{n}}$ increases along the powers-of-2 subsequence, i.e., $H(Y_{2^k})$ is non-decreasing in $k$. Characterization of the increase in entropy in (5) was used in proofs of central limit theorems by Shimizu [5], Barron [6] and Johnson and Barron [7]. In particular, Barron [6] showed that the sequence $\{H(Y_n)\}$ of entropies of the normalized sums converges to the entropy of the normal; this, incidentally, is equivalent to the convergence to 0 of the relative entropy (Kullback divergence) from a normal distribution when the $X_i$ have zero mean.

In 2004, Artstein, Ball, Barthe and Naor [3] (hereafter denoted by ABBN [3]) showed that $H(Y_n)$ is in fact a non-decreasing sequence for every $n$, solving a long-standing conjecture. In fact, (2) is equivalent in the i.i.d. case to the monotonicity property

$$H\!\left(\frac{X_1 + \cdots + X_n}{\sqrt{n}}\right) \ge H\!\left(\frac{X_1 + \cdots + X_{n-1}}{\sqrt{n-1}}\right). \tag{6}$$

Note that the presence of the factor $n-1$ (rather than $n$) in the denominator of (2) is crucial for this monotonicity. Likewise, for sums of independent random variables, our inequality (3) is equivalent to "monotonicity on average" properties for certain standardizations; for instance,

$$\exp\left\{2H\!\left(\frac{X_1 + \cdots + X_n}{\sqrt{n}}\right)\right\} \ge \frac{1}{\binom{n}{m}} \sum_{s \in \Omega_m} \exp\left\{2H\!\left(\frac{\sum_{i \in s} X_i}{\sqrt{m}}\right)\right\}.$$

A similar monotonicity also holds, as we shall show, when the sums are standardized by their variances. Here again the $\binom{n-1}{m-1}$ rather than $\binom{n}{m}$ in the denominator of (3) for the unstandardized version is crucial. We find that all the above inequalities (including (2)) as well as corresponding inequalities for Fisher information can be proved by simple tools. Two of these tools, a convolution identity for score functions and the relationship between Fisher information and entropy, are discussed below.
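Since the entropy power of a Gaussian is $e^{2H(X)} = 2\pi e\,\sigma^2$, the inequalities (1)–(4) reduce for independent Gaussian summands to identities about sums of variances (equality in (1)–(3), and a simple counting bound in (4)). The following sketch, a sanity check of our own with arbitrary variance values rather than anything from the paper, verifies this numerically:

```python
import math
from itertools import combinations

def gauss_entropy_power(var):
    """e^{2H(X)} for X ~ N(0, var), since H = (1/2) log(2*pi*e*var)."""
    return 2 * math.pi * math.e * var

sigma2 = [0.5, 1.0, 2.0, 3.5]   # arbitrary variances for X_1, ..., X_n
n = len(sigma2)
V = sum(sigma2)                  # variance of X_1 + ... + X_n
lhs = gauss_entropy_power(V)     # left side of (1)-(4)

# (1) classical EPI of Shannon and Stam: equality for Gaussians
rhs1 = sum(gauss_entropy_power(v) for v in sigma2)

# (2) ABBN leave-one-out inequality: also equality for Gaussians,
# since sum_i (V - sigma_i^2) = (n - 1) V
rhs2 = sum(gauss_entropy_power(V - v) for v in sigma2) / (n - 1)

# (3) all subsets of size m, normalized by C(n-1, m-1): equality again,
# because each index lies in exactly C(n-1, m-1) of the subsets
m = 2
rhs3 = sum(gauss_entropy_power(sum(sigma2[j] for j in s))
           for s in combinations(range(n), m)) / math.comb(n - 1, m - 1)

# (4) with S = windows of m consecutive indices, r = m; here the bound
# is strict, since boundary indices appear in fewer than m windows
windows = [range(i, i + m) for i in range(n - m + 1)]
rhs4 = sum(gauss_entropy_power(sum(sigma2[j] for j in s))
           for s in windows) / m

assert math.isclose(lhs, rhs1) and math.isclose(lhs, rhs2)
assert math.isclose(lhs, rhs3) and lhs >= rhs4
```

The counting step behind (3) is the same one that makes the factor $\binom{n-1}{m-1}$ (rather than $\binom{n}{m}$) the right normalization: each of the $n$ indices is covered by exactly $\binom{n-1}{m-1}$ of the $m$-subsets.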

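The monotonicity (6) can also be observed numerically for a concrete non-Gaussian family. The sketch below, an illustration of ours rather than part of the paper, uses i.i.d. Uniform(0,1) summands, whose sum has the closed-form Irwin–Hall density; $H(Y_n)$ is then obtained from the scaling identity $H(aX) = H(X) + \log a$ with $a = 1/\sqrt{n}$ and a midpoint-rule quadrature:

```python
import math

def irwin_hall_pdf(x, n):
    """Density of the sum of n i.i.d. Uniform(0,1) variables."""
    if x <= 0 or x >= n:
        return 0.0
    s = 0.0
    for k in range(int(math.floor(x)) + 1):
        s += (-1) ** k * math.comb(n, k) * (x - k) ** (n - 1)
    return s / math.factorial(n - 1)

def entropy_Yn(n, grid=20000):
    """H(Y_n) for Y_n = (X_1+...+X_n)/sqrt(n), X_i ~ Uniform(0,1).

    Midpoint-rule quadrature of -f log f over [0, n] for the Irwin-Hall
    density, then H(S_n / sqrt(n)) = H(S_n) - (1/2) log n.
    """
    h = n / grid
    H = 0.0
    for i in range(grid):
        f = irwin_hall_pdf((i + 0.5) * h, n)
        if f > 0:
            H -= f * math.log(f) * h
    return H - 0.5 * math.log(n)

# H(Y_1) = 0 for Uniform(0,1), and H(Y_n) climbs toward the entropy of
# the matching normal, (1/2) log(2*pi*e/12) ~ 0.176
ents = [entropy_Yn(n) for n in range(1, 6)]
assert all(b >= a for a, b in zip(ents, ents[1:]))
```

All variances here equal $1/12$, so the increase of $H(Y_n)$ is equivalent to the decrease of the relative entropy from the normal, matching the convergence result of Barron [6] quoted above.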