# Comparison between two types of large sample covariance matrices


**Guangming Pan**
Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371. Email address: gmpan@ntu.edu.sg

**Abstract.** Let $\{X_{ij}\}$, $i,j=1,2,\cdots$, be a double array of independent and identically distributed (i.i.d.) real random variables with $EX_{11}=\mu$, $E|X_{11}-\mu|^2=1$ and $E|X_{11}|^4<\infty$. Consider sample covariance matrices (with and without empirical centering)
$$\overline{\mathbf S}=\frac1n\sum_{j=1}^n(\mathbf s_j-\bar{\mathbf s})(\mathbf s_j-\bar{\mathbf s})^T\quad\text{and}\quad \mathbf S=\frac1n\sum_{j=1}^n\mathbf s_j\mathbf s_j^T,$$
where $\bar{\mathbf s}=\frac1n\sum_{j=1}^n\mathbf s_j$ and $\mathbf s_j=\mathbf T_n^{1/2}(X_{1j},\cdots,X_{pj})^T$, with $(\mathbf T_n^{1/2})^2=\mathbf T_n$ a non-random symmetric non-negative definite matrix. It is proved that the central limit theorems of eigenvalue statistics of $\overline{\mathbf S}$ and $\mathbf S$ are different as $n\to\infty$ with $p/n$ approaching a positive constant. Moreover, it is also proved that such a different behavior is not observed in the average behavior of eigenvectors.

*AMS 2000 subject classifications.* 15A52, 60F15, 62E20; 60F17.
*Key words.* Central limit theorems, eigenvectors and eigenvalues, sample covariance matrix, Stieltjes transform, strong convergence.

## 1 Introduction

Let $\{X_{ij}\}$, $i,j=1,2,\cdots$, be a double array of independent and identically distributed (i.i.d.) real random variables with $EX_{11}=\mu$, $E|X_{11}-\mu|^2=1$ and $E|X_{11}|^4<\infty$. Write $\mathbf x_j=(X_{1j},\cdots,X_{pj})^T$, $\mathbf s_j=\mathbf T_n^{1/2}\mathbf x_j$ and $\mathbf X_n=(\mathbf s_1,\cdots,\mathbf s_n)$, where $(\mathbf T_n^{1/2})^2=\mathbf T_n$ is a non-random symmetric non-negative definite matrix. It is well known that the sample covariance matrix is defined by
$$\overline{\mathbf S}=\frac1n\sum_{j=1}^n(\mathbf s_j-\bar{\mathbf s})(\mathbf s_j-\bar{\mathbf s})^T,$$
where $\bar{\mathbf s}=\frac1n\sum_{j=1}^n\mathbf s_j=\mathbf T_n^{1/2}\bar{\mathbf x}$ with $\bar{\mathbf x}=\frac1n\sum_{j=1}^n\mathbf x_j$. It lies at the roots of many methods in multivariate analysis (see [1]). However, the sample covariance matrix commonly used in random matrix theory is
$$\mathbf S=\frac1n\sum_{j=1}^n\mathbf s_j\mathbf s_j^T.$$
Since the matrix $\mathbf S$ has been well studied in the literature (see [5]), a natural question is whether the asymptotic results for the eigenvalues and/or eigenvectors of $\mathbf S$ apply to the matrix $\overline{\mathbf S}$ as well. This paper attempts to address this question.

First, observe that
$$\overline{\mathbf S}=\mathbf S-\bar{\mathbf s}\bar{\mathbf s}^T \tag{1.1}$$
and thus, by the rank inequality (see Theorem A.44 in [5]), there is no difference between the two matrices when only the limiting empirical spectral distribution (ESD) of the eigenvalues is concerned. Here the ESD of a matrix $\mathbf B$ is defined by
$$F^{\mathbf B}(x)=\frac1p\sum_{i=1}^pI(\mu_i\le x), \tag{1.2}$$
where $\mu_1,\cdots,\mu_p$ are the eigenvalues of $\mathbf B$. When $F^{\mathbf T_n}$ converges weakly to a non-random distribution $H$ and $p/n\to c>0$, Marcenko and Pastur [14], Yin [25] and Silverstein [21]

and when $c<1$,
$$\lambda_{\min}(\mathbf S)\xrightarrow{a.s.}(1-\sqrt c)^2, \tag{1.7}$$
where $\lambda_{\max}(\mathbf S)$ and $\lambda_{\min}(\mathbf S)$ denote, respectively, the maximum and minimum eigenvalues of $\mathbf S$. Moreover, it was proved in [10] that the asymptotic distribution of the maximum eigenvalue of $\mathbf S$, after centering and scaling, is the Tracy-Widom law of order 1 when $\mathbf S$ has a Wishart distribution $W_p(n,\mathbf I)$. In this case, note that $n\overline{\mathbf S}$ has the same distribution as $\sum_{j=1}^{n-1}\mathbf x_j\mathbf x_j^T$ and hence it is not difficult to prove that the maximum eigenvalues of $\overline{\mathbf S}$ and $\mathbf S$ share the same central limit theorem.

Thirdly, remarkable central limit theorems for the eigenvalues of $\mathbf S$ were established in [3]. Moreover, the constraint $EX_{11}^4=3$ imposed in [3] was removed in [16]. Since the ESDs of $\overline{\mathbf S}$ and $\mathbf S$ have the same limit, it seems that they might have the same central limit theorem for eigenvalues. Surprisingly, a detailed investigation shows that they have different central limit theorems. The main reason is that $\bar{\mathbf s}$, involved in $\overline{\mathbf S}$, contributes to the central limit theorem as well. Formally, let
$$G_n(x)=p\big(F^{\overline{\mathbf S}}(x)-F_{c_n,H_n}(x)\big),$$
where $c_n=\frac pn$ and $F_{c_n,H_n}(x)$ denotes the function obtained from $F_{c,H}(x)$ by substituting $c_n$ for $c$ and $H_n$ for $H$.

**Theorem 1** *Suppose that*

*(1) For each $n$, $X_{ij}=X_{ij}^n$, $i,j=1,2,\cdots$, are i.i.d. real random variables with $EX_{11}=\mu$, $E|X_{11}-\mu|^2=1$ and $E|X_{11}|^4<\infty$.*

*(2) $\lim_{n\to\infty}\frac pn=c\in(0,\infty)$.*

*(3) $g_1,\cdots,g_k$ are functions on $\mathbb R$, analytic on an open region $D$ of the complex plane which contains the real interval*
$$\Big[\liminf_n\lambda_{\min}(\mathbf T_n)I_{(0,1)}(c)(1-\sqrt c)^2,\ \limsup_n\lambda_{\max}(\mathbf T_n)(1+\sqrt c)^2\Big].$$

*(4) $\mathbf T_n$ is a $p\times p$ non-random symmetric non-negative definite matrix with spectral norm bounded above by a positive constant such that $H_n=F^{\mathbf T_n}$ converges weakly to a non-random distribution $H$.*

*(5) Let $\mathbf e_i$ be the $p\times1$ vector whose $i$-th element is 1 and whose other elements are 0. The matrix $\mathbf T_n$ also satisfies*
$$\frac1n\sum_{i=1}^n\mathbf e_i^T\mathbf T_n^{1/2}\big(m(z_1)\mathbf T_n+\mathbf I\big)^{-1}\mathbf T_n^{1/2}\mathbf e_i\,\mathbf e_i^T\mathbf T_n^{1/2}\big(m(z_2)\mathbf T_n+\mathbf I\big)^{-1}\mathbf T_n^{1/2}\mathbf e_i\to h_1(z_1,z_2)$$
*and*
$$\frac1n\sum_{i=1}^n\mathbf e_i^T\mathbf T_n^{1/2}\big(m(z)\mathbf T_n+\mathbf I\big)^{-1}\mathbf T_n^{1/2}\mathbf e_i\,\mathbf e_i^T\mathbf T_n^{1/2}\big(m(z)\mathbf T_n+\mathbf I\big)^{-2}\mathbf T_n^{1/2}\mathbf e_i\to h_2(z).$$

*Then $\big(\int g_1(x)dG_n(x),\cdots,\int g_k(x)dG_n(x)\big)$ converges weakly to a Gaussian vector $(X_{g_1},\cdots,X_{g_k})$ with mean*
$$EX_g=-\frac c{2\pi i}\int g(z)\frac{\int\frac{m^3(z)t^2dH(t)}{(1+tm(z))^3}}{\Big(1-c\int\frac{m^2(z)t^2dH(t)}{(1+tm(z))^2}\Big)^2}dz-\frac{c\big(E(X_{11}-\mu)^4-3\big)}{2\pi i}\int g(z)\frac{m^3(z)h_2(z)}{1-c\int\frac{m^2(z)t^2dH(t)}{(1+tm(z))^2}}dz$$
$$+\frac c{2\pi i}\int g(z)\frac{m(z)\int\frac{t\,dH(t)}{(1+tm(z))^2}}{z\Big(1-c\int\frac{(m(z))^2t^2dH(t)}{(1+tm(z))^2}\Big)}dz \tag{1.8}$$
*and covariance function*
$$\mathrm{Cov}(X_{g_1},X_{g_2})=-\frac1{2\pi^2}\int\!\!\int\frac{g_1(z_1)g_2(z_2)}{(m(z_1)-m(z_2))^2}\frac{dm(z_1)}{dz_1}\frac{dm(z_2)}{dz_2}dz_1dz_2$$
$$-\frac{c\big(E(X_{11}-\mu)^4-3\big)}{4\pi^2}\int\!\!\int g_1(z_1)g_2(z_2)\frac{\partial^2}{\partial z_1\partial z_2}\Big[m(z_1)m(z_2)h_1(z_1,z_2)\Big]dz_1dz_2. \tag{1.9}$$
*The contours in (1.8) and (1.9) are contained in the analytic region of the functions $g_1,\cdots,g_k$ and both enclose the support of $F_{c_n,H_n}(x)$ for all large $n$. Moreover, the contours in*

*(1.9) are disjoint.*

**Remark 1** From Theorem 1.1 of [3], Theorem 1.4 of [16] and Theorem 1 we see that functionals of the eigenvalues of $\overline{\mathbf S}$ and $\mathbf S$ have the same asymptotic variances but different asymptotic means. Indeed, the additional term in the asymptotic mean is the last term of (1.8), which can be further written as
$$\frac c{2\pi i}\int g(z)\frac{m(z)\int\frac{t\,dH(t)}{(1+tm(z))^2}}{z\Big(1-c\int\frac{(m(z))^2t^2dH(t)}{(1+tm(z))^2}\Big)}dz=-\frac1\pi\int_a^bg'(x)\arg\big[xm(x)\big]dx, \tag{1.10}$$
where $m(x)=\lim_{z\to x}m(z)$ for $0\ne x\in\mathbb R$. Here one should note that Theorem 1.1 of [3] did not handle the case where the common mean of the underlying random variables is not equal to zero.

**Remark 2** When developing central limit theorems it is necessary to study the limits of the products of $\big(E(X_{11}-\mu)^4-3\big)$ and the diagonal elements of $(\mathbf S-z\mathbf I)^{-1}$. For general $\mathbf T_n$ the limits of such diagonal elements are not necessarily the same. That is why Assumption 5 has to be imposed. Roughly speaking, Assumption 5 is some kind of rotation invariance. For example, it is satisfied if $\mathbf T_n$ is the inverse of another sample covariance matrix whose population matrix is the identity. However, when $E(X_{11}-\mu)^4=3$, Assumption 5 can be removed because such diagonal elements disappear in this case thanks to the special fourth moment; one may see [3]. Also, when $\mathbf T_n$ is a diagonal matrix, Assumption 5 in Theorem 1 is automatically satisfied, as pointed out in [16], and in this case
$$h_1(z_1,z_2)=\int\frac{t^2dH(t)}{(m(z_1)t+1)(m(z_2)t+1)},\qquad h_2(z)=\int\frac{t^2dH(t)}{(m(z)t+1)^3}.$$

Particularly, when $\mathbf T_n=\mathbf I$ we have
$$EX_g=\frac1{2\pi}\int_a^bg'(x)\arg\Big[1-\frac{cm^2(x)}{(1+m(x))^2}\Big]dx-\frac1\pi\int_a^bg'(x)\arg\big[xm(x)\big]dx$$
$$-\frac{c\big(E(X_{11}-\mu)^4-3\big)}\pi\int_a^bg(x)\,\Im\Big[\frac{m^3(x)}{(m(x)+1)\big[(m(x)+1)^2-cm^2(x)\big]}\Big]dx, \tag{1.11}$$
where
$$m(x)=\frac{-(x+1-c)+\sqrt{(x-a)(b-x)}\,i}{2x}$$
(see [3]). For some simplified formulas for the covariance function, refer to (1.24) in [3] and (1.24) in [16].

Finally, let us take a look at the eigenvectors of $\overline{\mathbf S}$ and $\mathbf S$. The matrix of orthonormal eigenvectors of $\mathbf S$ is Haar distributed on the orthogonal matrices when $X_{11}$ is normally distributed and $\mathbf T_n=\mathbf I$. It is conjectured that, for large $n$ and general $X_{11}$, the matrix of orthonormal eigenvectors of $\mathbf S$ is close to being Haar distributed. Silverstein (1981) created an approach to address this conjecture. Further work along this direction can be found in Silverstein [20], [22], Bai, Miao and Pan [4] and Ledoit and Péché [13]. Here we would also like to point out that there are similar interests in the eigenvectors of large Wigner matrices; one may see [6], [7], [9] and [23]. To consider the matrix $\overline{\mathbf S}$, let $\mathbf U_n\boldsymbol\Lambda_n\mathbf U_n^T$ denote the spectral decomposition of $\overline{\mathbf S}$, where $\boldsymbol\Lambda_n=\mathrm{diag}(\lambda_1,\lambda_2,\cdots,\lambda_p)$ and $\mathbf U_n=(u_{ij})$ is an orthogonal matrix consisting of the orthonormal eigenvectors of $\overline{\mathbf S}$. Assume that $\mathbf x_n\in\mathbb R^p$, $\|\mathbf x_n\|=1$, is an arbitrary non-random unit vector and let $\mathbf y=(y_1,y_2,\cdots,y_p)^T=\mathbf U_n^T\mathbf x_n$. Following their approach, we define an empirical distribution function based on the eigenvectors and eigenvalues as
$$F_1^{\overline{\mathbf S}}(x)=\sum_{i=1}^p|y_i|^2I(\lambda_i\le x), \tag{1.12}$$
where the $\lambda_i$'s are the eigenvalues of $\overline{\mathbf S}$. Based on the above empirical spectral distribution function,

we may further investigate central limit theorems of functions of the eigenvalues and eigenvectors. As will be seen below, the asymptotic properties of the eigenvector matrix of $\overline{\mathbf S}$ are the same as those of $\mathbf S$ in terms of this property (note: $EX_{11}=0$ is required for $\mathbf S$). However, the advantage of $\overline{\mathbf S}$ over $\mathbf S$ is that the common mean of the underlying random variables need not be zero for these types of properties to hold. Unfortunately, this is not the case for $\mathbf S$. For example, consider $\mathbf e_1^T\mathbf S\mathbf e_1$ when the $X_{ij}$ are i.i.d. with $EX_{11}=\mu$ and $\mathrm{Var}(X_{11})=1$, and $\mathbf T_n=\mathbf I$. Then it is a simple matter to show that
$$\mathbf e_1^T\mathbf S\mathbf e_1\xrightarrow{a.s.}1+\mu^2,$$
which depends on $\mu$; this differs from Corollary 1 in Bai, Miao and Pan [4] when $\mu\ne0$. Indeed, when dealing with central limit theorems of functions of eigenvalues and eigenvectors, [20] also proved that $EX_{11}=0$ is a necessary condition to keep this type of property for the matrix $\mathbf S$ with $\mathbf T_n=\mathbf I$.

Formally, let
$$G_{n1}(x)=\sqrt n\big(F_1^{\overline{\mathbf S}}(x)-F_{c_n,H_n}(x)\big).$$

**Theorem 2** *In addition to Conditions (1), (2) and (4) in Theorem 1, suppose that $\mathbf x_n\in\mathbb R_1^p=\{\mathbf x\in\mathbb R^p:\|\mathbf x\|=1\}$ and $\mathbf x_n^T(\mathbf T_n-z\mathbf I)^{-1}\mathbf x_n\to m_H(z)$, where $m_H(z)$ denotes the Stieltjes transform of $H(x)$. Then it holds that*
$$F_1^{\overline{\mathbf S}}(x)\to F_{c,H}(x)\quad a.s.$$

By Theorem 2, we obtain

**Corollary 1** *Let $(\overline{\mathbf S}^m)_{ii}$, $m=1,2,\cdots$, denote the $i$-th diagonal elements of the matrices $\overline{\mathbf S}^m$.*

*Under the conditions of Theorem 2 with $\mathbf x_n=\mathbf e_i$, for any fixed $m$,*
$$\lim_{n\to\infty}\Big|(\overline{\mathbf S}^m)_{ii}-\int x^mdF_{c,H}(x)\Big|=0,\quad a.s.$$

**Theorem 3** *In addition to the assumptions in Theorem 1 and Theorem 2, suppose that, as $n\to\infty$,*
$$\sup_z\sqrt n\,\Big|\mathbf x_n^T\big(m_{F_{c,H}}(z)\mathbf T_n+\mathbf I\big)^{-1}\mathbf x_n-\int\frac{dH_n(t)}{m_{F_{c,H}}(z)t+1}\Big|\to0. \tag{1.13}$$
*Then the following conclusions hold.*

*(a) The random vector*
$$\Big(\int g_1(x)dG_{n1}(x),\cdots,\int g_k(x)dG_{n1}(x)\Big)$$
*forms a tight sequence.*

*(b) If $E(X_{11}-\mu)^4=3$, the above random vector converges weakly to a Gaussian vector $(X_{g_1},\cdots,X_{g_k})$ with zero means and covariance function*
$$\mathrm{Cov}(X_{g_1},X_{g_2})=-\frac1{2\pi^2}\int\!\!\int g_1(z_1)g_2(z_2)\frac{\big(z_2m(z_2)-z_1m(z_1)\big)^2}{c^2z_1z_2(z_2-z_1)\big(m(z_2)-m(z_1)\big)}dz_1dz_2, \tag{1.14}$$
*where the contours in (1.14) are the same as those in (1.9).*

*(c) If, instead of $E(X_{11}-\mu)^4=3$, one assumes that*
$$\max_i\big|\mathbf e_i^T\mathbf T_n^{1/2}\big(zm(z)\mathbf T_n+z\mathbf I\big)^{-1}\mathbf x_n\big|\to0, \tag{1.15}$$
*then the assertions in (b) still hold.*

**Remark 3** Assumptions (1.13) and (1.15) are imposed for the same reason as given in

Remark 2. Besides, when $\mathbf T_n$ reduces to the identity matrix, (1.15) becomes
$$\max_i|x_{ni}|\to0. \tag{1.16}$$
As can be seen from (1.14), the asymptotic covariance does not depend on the fourth moment of $X_{11}$, and hence we have to impose a condition like (1.16) to ensure that the term involving the product of the fourth moment of $X_{11}$ and the diagonal elements of $(\mathbf S-z\mathbf I)^{-1}$ converges to zero in probability. One may see [16] for more details.

**Remark 4** As pointed out in [4], when $\mathbf T_n=\mathbf I$ condition (1.13) holds and formula (1.14) becomes
$$\mathrm{Cov}(X_{g_1},X_{g_2})=\frac2c\Big(\int g_1(x)g_2(x)dF_c(x)-\int g_1(x_1)dF_c(x_1)\int g_2(x_2)dF_c(x_2)\Big). \tag{1.17}$$

Theorem 1 essentially boils down to the study of the Stieltjes transform of $F^{\overline{\mathbf S}}(x)$, while Theorems 2 and 3 boil down to the study of the Stieltjes transform of $F_1^{\overline{\mathbf S}}(x)$. In view of this we conjecture that the same phenomenon holds for other objects involving $F^{\overline{\mathbf S}}(x)$ or $F_1^{\overline{\mathbf S}}(x)$. For example, we believe that the central limit theorems of the normalized empirical spectral distribution functions of $\overline{\mathbf S}$ and $\mathbf S$ are different ([17] considers central limit theorems of the smoothed empirical spectral distribution functions).

We conclude this section by stating the structure of this work. Section 2 gives the proof of Theorem 1. Sections 3 and 4 give the proofs of Theorems 2 and 3, respectively. Throughout the paper $M$ and $C$ denote constants which may change from line to line, and all matrices and vectors are denoted by boldface letters.

## 2 Proof of Theorem 1 and (1.11)

The proof of Theorem 1 is essentially based on the Stieltjes transform, following [3]. First, by the analyticity of the functions it is enough to consider the Stieltjes transforms of the empirical distribution functions of the sample covariance matrices. Moreover, note that the Stieltjes transform of the empirical distribution function of $\overline{\mathbf S}$ can be decomposed as the sum of the Stieltjes transform of the empirical distribution function of $\mathbf S$ and some random variable involving the sample mean and $\mathbf S$. Thus, our main aim is to prove that this extra random variable converges in probability, in the space $C$ of continuous functions, to some non-random variable. Once this is done, Theorem 1 follows from Slutsky's theorem, the continuous mapping theorem and the results in [3] and [16].

Before proceeding we observe the following useful fact. By the structure of $\overline{\mathbf S}$, without loss of generality we can assume $EX_{11}=0$, $EX_{11}^2=1$ in the course of establishing Theorems 1-3 (but the fourth moment will be $E(X_{11}-\mu)^4$).

### 2.1 Truncation of the underlying random variables and random processes

We begin the proof by replacing the underlying random variables $X_{ij}$ with truncated and centralized variables. To this end, write
$$\overline{\mathbf S}=\frac1n(\mathbf X_n-\mathbf B)(\mathbf X_n^T-\mathbf B^T),$$
where $\mathbf B=(\bar{\mathbf s},\bar{\mathbf s},\cdots,\bar{\mathbf s})_{p\times n}$. Since the argument for (1.8) (and the two lines below (1.8)) in [3] can be carried over directly to the present case, we can choose a positive sequence $\varepsilon_n$ such that
$$\varepsilon_n\to0,\qquad\varepsilon_n^{-4}E|X_{11}|^4I(|X_{11}|\ge\varepsilon_n\sqrt n)\to0,\qquad\varepsilon_nn^{1/4}\to\infty. \tag{2.18}$$

Let $\hat{\mathbf S}=\frac1n(\hat{\mathbf X}_n-\hat{\mathbf B})(\hat{\mathbf X}_n^T-\hat{\mathbf B}^T)$, where $\hat{\mathbf X}_n$ and $\hat{\mathbf B}$ are obtained from $\mathbf X_n$ and $\mathbf B$, respectively, with the entries $X_{ij}$ replaced by $X_{ij}I(|X_{ij}|<\varepsilon_n\sqrt n)$. We then obtain
$$P(\hat{\mathbf S}\ne\overline{\mathbf S})\le P\Big(\bigcup_{i\le p,\,j\le n}\big(|X_{ij}|\ge\varepsilon_n\sqrt n\big)\Big)\le pnP(|X_{11}|\ge\varepsilon_n\sqrt n)\le M\varepsilon_n^{-4}E|X_{11}|^4I(|X_{11}|\ge\varepsilon_n\sqrt n)\to0, \tag{2.19}$$
where, here and in what follows, $M$ denotes a constant which may change from line to line.

Define $\tilde{\mathbf S}=\frac1n(\tilde{\mathbf X}_n-\tilde{\mathbf B})(\tilde{\mathbf X}_n^T-\tilde{\mathbf B}^T)$, where $\tilde{\mathbf X}_n$ and $\tilde{\mathbf B}$ are obtained from $\mathbf X_n$ and $\mathbf B$, respectively, with the entries $X_{ij}$ replaced by $(\hat X_{ij}-E\hat X_{ij})/\sigma_n$, with $\hat X_{ij}=X_{ij}I(|X_{ij}|<\varepsilon_n\sqrt n)$ and $\sigma_n^2=E|\hat X_{ij}-E\hat X_{ij}|^2$. Denote by $\hat G_n(x)$ and $\tilde G_n(x)$ the analogues of $G_n(x)$ with the matrix $\overline{\mathbf S}$ replaced by $\hat{\mathbf S}$ and $\tilde{\mathbf S}$, respectively. As in [3] (one may also refer to the proof of Corollary A.42 of [5]), we have, for any $g\in\{g_1,\cdots,g_k\}$,
$$\Big|\int g(x)d\tilde G_n(x)-\int g(x)d\hat G_n(x)\Big|\le M\sum_{k=1}^n\big|\lambda_k^{\tilde{\mathbf S}}-\lambda_k^{\hat{\mathbf S}}\big|$$
$$\le\frac Mn\Big[\mathrm{tr}\big((\hat{\mathbf X}_n-\hat{\mathbf B})-(\tilde{\mathbf X}_n-\tilde{\mathbf B})\big)\big((\hat{\mathbf X}_n-\hat{\mathbf B})-(\tilde{\mathbf X}_n-\tilde{\mathbf B})\big)^T\Big]^{1/2}\Big[n\,\mathrm{tr}\big(\tilde{\mathbf S}+\hat{\mathbf S}\big)\Big]^{1/2}$$
$$=M\sigma_n\Big|1-\frac1{\sigma_n}\Big|\Big[\mathrm{tr}\,\tilde{\mathbf S}\Big]^{1/2}\Big[\mathrm{tr}\big(\tilde{\mathbf S}+\hat{\mathbf S}\big)\Big]^{1/2}\le M\big|\sigma_n^2-1\big|\Big[n\lambda_{\max}(\tilde{\mathbf S})\Big]^{1/2}\Big[n\lambda_{\max}(\tilde{\mathbf S})+n\lambda_{\max}(\hat{\mathbf S})\Big]^{1/2},$$
where the third step uses the fact that, by the structure of $\hat{\mathbf X}_n-\hat{\mathbf B}$,
$$\frac1n\big((\hat{\mathbf X}_n-\hat{\mathbf B})-(\tilde{\mathbf X}_n-\tilde{\mathbf B})\big)\big((\hat{\mathbf X}_n-\hat{\mathbf B})-(\tilde{\mathbf X}_n-\tilde{\mathbf B})\big)^T=\sigma_n^2\Big(1-\frac1{\sigma_n}\Big)^2\tilde{\mathbf S}.$$
From [25] and $\|\mathbf T_n\|\le M$ we obtain that $\limsup_n\lambda_{\max}(\tilde{\mathbf S})$ is bounded by $M(1+\sqrt c)^2$ with probability one. Similarly, $\limsup_n\lambda_{\max}(\hat{\mathbf S})$ is bounded by $M(1+\sqrt c)^2$ with probability one by the

structure of $\hat{\mathbf S}$ and estimate (2.20) below. Moreover, it is not difficult to prove that
$$\sigma_n^2-1=o(\varepsilon_n^2n^{-1}). \tag{2.20}$$
Summarizing the above, we obtain
$$\int g(x)d\hat G_n(x)-\int g(x)d\tilde G_n(x)\xrightarrow{i.p.}0.$$
This, together with (2.19), implies that
$$\int g(x)dG_n(x)-\int g(x)d\tilde G_n(x)\xrightarrow{i.p.}0.$$
Therefore, in what follows, we may assume that
$$|X_{ij}|\le\sqrt n\,\varepsilon_n,\qquad EX_{ij}=0,\qquad E|X_{ij}|^2=1, \tag{2.21}$$
and for simplicity we shall suppress all subscripts or superscripts on the variables.

As in [3], by Cauchy's formula, (1.6) and (1.7), with probability one for all large $n$,
$$\int g(x)dG_n(x)=-\frac1{2\pi i}\int_{\mathcal C}g(z)\Big(\mathrm{tr}(\overline{\mathbf S}-z\mathbf I)^{-1}-nm_{c_n}(z)\Big)dz, \tag{2.22}$$
where $m_{c_n}(z)$ is obtained from $m(z)$ with $c$ replaced by $c_n$, and the contour $\mathcal C$ is specified below. Let $v_0>0$ be arbitrary and set $\mathcal C_u=\{u+iv_0:u\in[u_l,u_r]\}$, where $u_r>\limsup_n\lambda_{\max}(\mathbf T_n)(1+\sqrt c)^2$ and $0<u_l<\liminf_n\lambda_{\min}(\mathbf T_n)I_{(0,1)}(c)(1-\sqrt c)^2$, or $u_l$ is any negative number if $\liminf_n\lambda_{\min}(\mathbf T_n)I_{(0,1)}(c)(1-\sqrt c)^2=0$. Then define
$$\mathcal C^+=\{u_l+iv:v\in[0,v_0]\}\cup\mathcal C_u\cup\{u_r+iv:v\in[0,v_0]\}$$

and let $\mathcal C^-$ be the part of $\mathcal C^+$ symmetric about the real axis. Then set $\mathcal C=\mathcal C^+\cup\mathcal C^-$. Moreover, since
$$\mathrm{tr}(\overline{\mathbf S}-z\mathbf I)^{-1}=\mathrm{tr}\,\mathbf A^{-1}(z)+\frac{\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}}{1-\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s}},$$
we have
$$\mathrm{tr}(\overline{\mathbf S}-z\mathbf I)^{-1}-nm_{c_n}(z)=\Big[\mathrm{tr}\,\mathbf A^{-1}(z)-nm_{c_n}(z)\Big]+\frac{\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}}{1-\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s}}, \tag{2.23}$$
where $\mathbf A^{-1}(z)=(\mathbf S-z\mathbf I)^{-1}$. The first term on the right-hand side above was investigated in [3]. We next consider the second term on the right-hand side of (2.23). To this end, introduce a truncated version of $\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}$. Define $\mathcal C_r=\{u_r+iv:v\in[n^{-1}\rho_n,v_0]\}$,
$$\mathcal C_l=\begin{cases}\{u_l+iv:v\in[n^{-1}\rho_n,v_0]\}&\text{if }u_l>0,\\[2pt]\{u_l+iv:v\in[0,v_0]\}&\text{if }u_l<0,\end{cases}$$
where
$$\rho_n\downarrow0,\qquad\rho_n\ge n^{-\alpha}\ \text{for some }\alpha\in(0,1). \tag{2.24}$$
Let $\mathcal C_n^+=\mathcal C_l\cup\mathcal C_u\cup\mathcal C_r$ and let $\mathcal C_n^-$ denote the part of $\mathcal C_n^+$ symmetric with respect to the real axis. We then define the truncated process $\widehat{\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}}$ of the process $\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}$, for $z=u+iv$, by
$$\widehat{\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}}=\begin{cases}\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}&\text{for }z\in\mathcal C_n=\mathcal C_n^+\cup\mathcal C_n^-,\\[4pt]\frac{nv+\rho_n}{2\rho_n}\bar{\mathbf s}^T\mathbf A^{-2}(z_{r1})\bar{\mathbf s}+\frac{\rho_n-nv}{2\rho_n}\bar{\mathbf s}^T\mathbf A^{-2}(z_{r2})\bar{\mathbf s}&\text{for }u=u_r,\ v\in[-n^{-1}\rho_n,n^{-1}\rho_n],\\[4pt]\frac{nv+\rho_n}{2\rho_n}\bar{\mathbf s}^T\mathbf A^{-2}(z_{l1})\bar{\mathbf s}+\frac{\rho_n-nv}{2\rho_n}\bar{\mathbf s}^T\mathbf A^{-2}(z_{l2})\bar{\mathbf s}&\text{for }u=u_l>0,\ v\in[-n^{-1}\rho_n,n^{-1}\rho_n],\end{cases}$$
where $z_{r1}=u_r+in^{-1}\rho_n$, $z_{r2}=u_r-in^{-1}\rho_n$, $z_{l1}=u_l+in^{-1}\rho_n$ and $z_{l2}=u_l-in^{-1}\rho_n$ (so that the interpolation matches the untruncated process at $v=\pm n^{-1}\rho_n$). Similarly one may define the truncated version $\widehat{\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s}}$ of $\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s}$.
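The interpolation step above can be sketched in code. The helper below is a hypothetical illustration (names and the scalar setting are not from the paper): off the strip $|v|<\rho_n/n$ it returns the process value unchanged, and inside the strip it interpolates linearly between the values at $z_{r1}$ and $z_{r2}$:

```python
def truncate_at_edge(f, z, u_r, delta):
    """Truncated version of a scalar process f near u = u_r.

    delta plays the role of rho_n / n.  For z = u_r + iv with |v| < delta,
    replace f(z) by the linear interpolation
        (nv + rho_n)/(2 rho_n) * f(z_r1) + (rho_n - nv)/(2 rho_n) * f(z_r2),
    where z_r1 = u_r + i*delta and z_r2 = u_r - i*delta."""
    u, v = z.real, z.imag
    if u != u_r or abs(v) >= delta:
        return f(z)                      # outside the strip: untouched
    w = (v + delta) / (2 * delta)        # weight (nv + rho_n)/(2 rho_n)
    return w * f(complex(u_r, delta)) + (1 - w) * f(complex(u_r, -delta))
```

At $v=\pm\,$`delta` the weights are $1$ and $0$, so the truncated process is continuous across the strip; this is exactly what makes the sup-norm error over the strip of order $\rho_n/n$ times the local oscillation of the process.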

From Theorem 2 in [15] we have
$$\bar{\mathbf s}^T\bar{\mathbf s}\le\|\mathbf T_n\|\,\bar{\mathbf x}^T\bar{\mathbf x}\le M\bar{\mathbf x}^T\bar{\mathbf x}\xrightarrow{i.p.}Mc. \tag{2.25}$$
Note that
$$\bar{\mathbf s}^T(\overline{\mathbf S}-z\mathbf I)^{-1}\bar{\mathbf s}=\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s}+\frac{\big(\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s}\big)^2}{1-\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s}},$$
where we use the identity
$$\mathbf r^T(\mathbf D+a\mathbf r\mathbf r^T)^{-1}=\frac{\mathbf r^T\mathbf D^{-1}}{1+a\mathbf r^T\mathbf D^{-1}\mathbf r}, \tag{2.26}$$
valid when $\mathbf D$ and $\mathbf D+a\mathbf r\mathbf r^T$ are both invertible, $\mathbf r\in\mathbb R^p$ and $a\in\mathbb R$. This implies that
$$\frac1{1-\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s}}=1+\bar{\mathbf s}^T(\overline{\mathbf S}-z\mathbf I)^{-1}\bar{\mathbf s}. \tag{2.27}$$
We then conclude that
$$\frac1{\big|1-\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s}\big|}\le1+\Big(\frac M{v_0}+\frac M{|\lambda_{\max}(\overline{\mathbf S})-u_r|}+\frac M{|\lambda_{\min}(\overline{\mathbf S})-u_l|}\Big)\bar{\mathbf s}^T\bar{\mathbf s}. \tag{2.28}$$
This, together with (2.25), (1.6) and (1.7), ensures that
$$\Big|\int g(z)\Big(\frac{\widehat{\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}}}{1-\widehat{\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s}}}-\frac{\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}}{1-\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s}}\Big)dz\Big| \tag{2.29}$$
$$\le\Big|\int g(z)\frac{\widehat{\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}}-\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}}{1-\widehat{\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s}}}dz\Big|+\Big|\int g(z)\frac{\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}\,\big(\widehat{\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s}}-\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s}\big)}{\big(1-\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s}\big)\big(1-\widehat{\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s}}\big)}dz\Big|$$

$$\le\frac{M\rho_n}n\Big[(\bar{\mathbf s}^T\bar{\mathbf s})^2+(\bar{\mathbf s}^T\bar{\mathbf s})^4\Big]\Big(\frac1{|\lambda_{\max}(\overline{\mathbf S})-u_r|}+\frac1{|\lambda_{\min}(\overline{\mathbf S})-u_l|}\Big)\Big(\frac1{|\lambda_{\max}(\overline{\mathbf S})-u_r|}+\frac1{|\lambda_{\min}(\overline{\mathbf S})-u_l|}\Big),$$
which converges to zero in probability.

The aim of the subsequent Sections 2.2-2.4 is to prove the convergence of $\widehat{\bar{\mathbf s}^T\mathbf A^{-j}(z)\bar{\mathbf s}}$, $j=1,2$, in probability to non-random variables in the space of continuous functions for $z\in\mathcal C$. It is well known that convergence in probability is equivalent to convergence in distribution when the limiting variable is non-random (see, for example, page 25 of [8]). Thus, instead, we shall prove the convergence of $\widehat{\bar{\mathbf s}^T\mathbf A^{-j}(z)\bar{\mathbf s}}$, $j=1,2$, in distribution to non-random variables in the space of continuous functions for $z\in\mathcal C$.

### 2.2 Convergence of finite-dimensional distributions of $\bar{\mathbf s}^T\mathbf A^{-j}(z)\bar{\mathbf s}-E(\bar{\mathbf s}^T\mathbf A^{-j}(z)\bar{\mathbf s})$, $j=1,2$

Consider $z\in\mathcal C_u$ and $\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}-E(\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s})$ first. Define the $\sigma$-field $\mathcal F_j=\sigma(\mathbf s_1,\cdots,\mathbf s_j)$, let $E_j(\cdot)=E(\cdot|\mathcal F_j)$ and let $E_0(\cdot)$ denote the unconditional expectation. Introduce $\bar{\mathbf s}_j=\bar{\mathbf s}-n^{-1}\mathbf s_j$,
$$\mathbf A_j^{-1}(z)=\big(\mathbf S-n^{-1}\mathbf s_j\mathbf s_j^T-z\mathbf I\big)^{-1},\qquad\gamma_j(z)=\frac1n\mathbf s_j^T\mathbf A_j^{-1}(z)\mathbf s_j-E\frac1n\mathrm{tr}\,\mathbf A_j^{-1}(z)\mathbf T_n,$$
$$\beta_j(z)=\frac1{1+\frac1n\mathbf s_j^T\mathbf A_j^{-1}(z)\mathbf s_j},\qquad b_1(z)=\frac1{1+\frac1nE\,\mathrm{tr}\,\mathbf A_1^{-1}(z)\mathbf T_n}.$$
Write
$$\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}-E(\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s})=\sum_{j=1}^n\Big(E_j(\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s})-E_{j-1}(\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s})\Big)$$
$$=\sum_{j=1}^n(E_j-E_{j-1})\Big[\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}-\bar{\mathbf s}_j^T\mathbf A_j^{-2}(z)\bar{\mathbf s}_j\Big]=\sum_{j=1}^n(E_j-E_{j-1})\big[d_{n1}+d_{n2}+d_{n3}\big],$$

where
$$d_{n1}=(\bar{\mathbf s}-\bar{\mathbf s}_j)^T\mathbf A^{-2}(z)\bar{\mathbf s},\qquad d_{n2}=\bar{\mathbf s}_j^T\big(\mathbf A^{-2}(z)-\mathbf A_j^{-2}(z)\big)\bar{\mathbf s},\qquad d_{n3}=\bar{\mathbf s}_j^T\mathbf A_j^{-2}(z)(\bar{\mathbf s}-\bar{\mathbf s}_j).$$
Consider $d_{n3}$ first. It is proved in (3.12) of [15] that
$$E|\mathbf x_1^T\mathbf B\bar{\mathbf x}_1|^k=O\big(n^{\frac k2-2}\varepsilon_n^{k-4}\big)\quad\text{for }k\ge4, \tag{2.30}$$
where $\mathbf x_1$ and $\bar{\mathbf x}_j=\bar{\mathbf x}-n^{-1}\mathbf x_j$ are independent of $\mathbf B$, a matrix with bounded spectral norm. This implies that
$$E|\mathbf s_1^T\mathbf A_1^{-2}(z)\bar{\mathbf s}_1|^k=E|\mathbf x_1^T\mathbf T_n^{1/2}\mathbf A_1^{-2}(z)\mathbf T_n^{1/2}\bar{\mathbf x}_1|^k=O\big(n^{\frac k2-2}\varepsilon_n^{k-4}\big)\quad\text{for }k\ge4, \tag{2.31}$$
which yields
$$\sum_{j=1}^n(E_j-E_{j-1})d_{n3}\xrightarrow{i.p.}0.$$
Furthermore, write
$$d_{n1}=d_{n1}^{(1)}+d_{n1}^{(2)}+d_{n1}^{(3)}+d_{n1}^{(4)},$$
where
$$d_{n1}^{(1)}=\frac1n\mathbf s_j^T\mathbf A_j^{-2}(z)\bar{\mathbf s}_j,\qquad d_{n1}^{(2)}=\frac1{n^2}\mathbf s_j^T\mathbf A_j^{-2}(z)\mathbf s_j,$$
$$d_{n1}^{(3)}=\frac1n\mathbf s_j^T\big(\mathbf A^{-2}(z)-\mathbf A_j^{-2}(z)\big)\bar{\mathbf s}_j,\qquad d_{n1}^{(4)}=\frac1{n^2}\mathbf s_j^T\big(\mathbf A^{-2}(z)-\mathbf A_j^{-2}(z)\big)\mathbf s_j.$$
From (2.31) we have
$$\sum_{j=1}^n(E_j-E_{j-1})d_{n1}^{(1)}\xrightarrow{i.p.}0.$$

By Lemma 2.7 in [3] and (2.21),
$$n^{-k}E\big|\mathbf s_1^T\mathbf B(z)\mathbf s_1-\mathrm{tr}\,\mathbf B\mathbf T_n\big|^k=O\big(\varepsilon_n^{2k-4}n^{-1}\big),\qquad k\ge2. \tag{2.32}$$
This gives
$$E\Big|\sum_{j=1}^n(E_j-E_{j-1})d_{n1}^{(2)}\Big|^2=\frac1{n^4}\sum_{j=1}^nE\Big|(E_j-E_{j-1})\big(\mathbf s_j^T\mathbf A_j^{-2}(z)\mathbf s_j-\mathrm{tr}\,\mathbf T_n\mathbf A_j^{-2}(z)\big)\Big|^2=O(n^{-2}).$$
Since $\|\mathbf A^{-2}(z)\|\le1/v^2$, we obtain
$$E\Big|\sum_{j=1}^n(E_j-E_{j-1})d_{n1}^{(4)}\Big|^2\le\frac M{n^4v^4}\sum_{j=1}^nE\|\mathbf s_j\|^4\le\frac Mn.$$
Note that
$$\mathbf A^{-1}(z)-\mathbf A_j^{-1}(z)=-\frac1n\mathbf A_j^{-1}(z)\mathbf s_j\mathbf s_j^T\mathbf A_j^{-1}(z)\beta_j(z). \tag{2.33}$$
Applying it we may write
$$d_{n1}^{(3)}=-\frac1{n^2}\mathbf s_j^T\mathbf A_j^{-1}(z)\mathbf s_j\,\mathbf s_j^T\mathbf A_j^{-2}(z)\bar{\mathbf s}_j\,\beta_j(z)+\frac1{n^3}\mathbf s_j^T\mathbf A_j^{-1}(z)\mathbf s_j\,\mathbf s_j^T\mathbf A_j^{-2}(z)\mathbf s_j\,\mathbf s_j^T\mathbf A_j^{-1}(z)\bar{\mathbf s}_j\,\beta_j^2(z)$$
$$-\frac1{n^2}\mathbf s_j^T\mathbf A_j^{-2}(z)\mathbf s_j\,\mathbf s_j^T\mathbf A_j^{-1}(z)\bar{\mathbf s}_j\,\beta_j(z).$$
It follows from (2.31), (2.32) and the fact that $|\beta_j(z)|\le|z|/v$ that
$$E\Big|\sum_{j=1}^n(E_j-E_{j-1})d_{n1}^{(3)}\Big|^2\le\frac Mn.$$
So far we have proved that
$$\sum_{j=1}^n(E_j-E_{j-1})d_{n1}\xrightarrow{i.p.}0.$$
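The martingale estimates above say that $\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}$ concentrates around its mean. A Monte Carlo sketch of this concentration is below; it is illustrative only (Gaussian entries, $\mathbf T_n=\mathbf I$, and small sizes chosen for speed, none of which come from the paper), and it simply shows the sample fluctuation of the quadratic form shrinking as $n$ grows:

```python
import numpy as np

def quad_form(n, p, z, rng):
    # One draw of \bar s^T A^{-2}(z) \bar s with S = (1/n) X X^T (uncentered)
    X = rng.standard_normal((p, n))
    s_bar = X.mean(axis=1)
    Ainv = np.linalg.inv(X @ X.T / n - z * np.eye(p))   # A^{-1}(z)
    return s_bar @ (Ainv @ Ainv) @ s_bar                # \bar s^T A^{-2}(z) \bar s

rng = np.random.default_rng(1)
z = 5.0 + 1.0j          # z well away from the support of the ESD
spread = {}
for n in (100, 400):
    vals = [quad_form(n, n // 2, z, rng) for _ in range(30)]
    spread[n] = np.std(vals)
    print(n, spread[n])
```

The empirical standard deviation at $n=400$ is visibly smaller than at $n=100$, consistent with the vanishing of the centered martingale sums.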

As in dealing with $d_{n1}$, one may verify that
$$\sum_{j=1}^n(E_j-E_{j-1})d_{n2}\xrightarrow{i.p.}0.$$
Summarizing the above, we have obtained
$$\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}-E(\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s})\xrightarrow{i.p.}0.$$
Applying the argument for $\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}$ to $\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s}$ yields
$$\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s}-E(\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s})\xrightarrow{i.p.}0.$$

### 2.3 Tightness of $\bar{\mathbf s}^T\mathbf A^{-j}(z)\bar{\mathbf s}-E(\bar{\mathbf s}^T\mathbf A^{-j}(z)\bar{\mathbf s})$, $j=1,2$

This section is to prove the tightness of $K_n^{(j)}(z)$, $j=1,2$, where $K_n^{(j)}(z)=\bar{\mathbf s}^T\mathbf A^{-j}(z)\bar{\mathbf s}-E(\bar{\mathbf s}^T\mathbf A^{-j}(z)\bar{\mathbf s})$. We shall use Theorem 12.3 of [8]. In what follows we consider the case $j=2$ only; the case $j=1$ can be handled similarly and is even simpler. As pointed out in [3], condition (i) of Theorem 12.3 of [8] can be replaced with the assumption of tightness at any point in the interval. From (3.18) of [15] we see that
$$E|\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}|^2\le M, \tag{2.34}$$
which ensures condition (i) of Theorem 12.3 of [8]. We next verify condition (ii) of Theorem 12.3 of [8] by proving
$$\sup_{n,\,z_1,z_2\in\mathcal C_n}\frac{E|K_n^{(2)}(z_1)-K_n^{(2)}(z_2)|^2}{|z_1-z_2|^2}<\infty. \tag{2.35}$$
Below we only prove the above inequality on $\mathcal C_n^+$; the remaining cases can be verified similarly.

Since the truncation steps are the same as those in [25], we have
$$P\big(\lambda_{\max}(\mathbf S)\ge\zeta_r\big)=o(n^{-l}),\qquad P\big(\lambda_{\min}(\mathbf S)\le\zeta_l\big)=o(n^{-l}) \tag{2.36}$$
for any $l$ (see (1.9a) and (1.9b) in [3]), where $\limsup_n\lambda_{\max}(\mathbf T_n)(1+\sqrt c)^2<\zeta_r$ and $0<\zeta_l<\liminf_n\lambda_{\min}(\mathbf T_n)I_{(0,1)}(c)(1-\sqrt c)^2$.

and
$$a_{n3}=\bar{\mathbf s}_j^T\mathbf A_j^{-2}(z_1)\mathbf A_j^{-1}(z_2)(\bar{\mathbf s}-\bar{\mathbf s}_j).$$
It is straightforward to check that $E\|\mathbf s_j\|^4=E(\mathbf s_j^T\mathbf s_j)^2=O(n^2)$. As in (3.17) of [15] one can prove that
$$E|\bar{\mathbf s}^T\bar{\mathbf s}|^k=E|\bar{\mathbf x}^T\mathbf T_n\bar{\mathbf x}|^k\le ME|\bar{\mathbf x}^T\bar{\mathbf x}|^k=O(1)\quad\text{for }k\ge4. \tag{2.38}$$
This, together with (2.37), implies that
$$E\Big|\sum_{j=1}^n(E_j-E_{j-1})(a_{n1})\Big|^2\le\frac M{n^2}\sum_{j=1}^nE\big|\mathbf s_j^T\mathbf A^{-2}(z_1)\mathbf A^{-1}(z_2)\bar{\mathbf s}\big|^2$$
$$\le\frac M{n^2}\sum_{j=1}^n\big(E\|\mathbf s_j\|^4\big)^{1/2}\big(E\|\mathbf A^{-2}(z_1)\|^{12}\,E\|\mathbf A^{-1}(z_2)\|^{12}\,E\|\bar{\mathbf s}\|^{12}\big)^{1/6}\le M.$$
This argument also works for $a_{n3}$.

We further write
$$a_{n2}=a_{n2}^{(1)}+a_{n2}^{(2)}+a_{n2}^{(3)},$$
where
$$a_{n2}^{(1)}=\bar{\mathbf s}_j^T\big(\mathbf A^{-1}(z_1)-\mathbf A_j^{-1}(z_1)\big)\mathbf A^{-1}(z_1)\mathbf A^{-1}(z_2)\bar{\mathbf s}=-\frac1n\bar{\mathbf s}_j^T\mathbf A_j^{-1}(z_1)\mathbf s_j\,\mathbf s_j^T\mathbf A_j^{-1}(z_1)\mathbf A^{-1}(z_1)\mathbf A^{-1}(z_2)\bar{\mathbf s}\,\beta_j(z_1),$$
$$a_{n2}^{(2)}=\bar{\mathbf s}_j^T\mathbf A_j^{-1}(z_1)\big(\mathbf A^{-1}(z_1)-\mathbf A_j^{-1}(z_1)\big)\mathbf A^{-1}(z_2)\bar{\mathbf s}=-\frac1n\bar{\mathbf s}_j^T\mathbf A_j^{-2}(z_1)\mathbf s_j\,\mathbf s_j^T\mathbf A_j^{-1}(z_1)\mathbf A^{-1}(z_2)\bar{\mathbf s}\,\beta_j(z_1)$$

and
$$a_{n2}^{(3)}=\bar{\mathbf s}_j^T\mathbf A_j^{-2}(z_1)\big(\mathbf A^{-1}(z_2)-\mathbf A_j^{-1}(z_2)\big)\bar{\mathbf s}=-\frac1n\bar{\mathbf s}_j^T\mathbf A_j^{-2}(z_1)\mathbf A_j^{-1}(z_2)\mathbf s_j\,\mathbf s_j^T\mathbf A_j^{-1}(z_2)\bar{\mathbf s}\,\beta_j(z_2).$$
In the second equality of each $a_{n2}^{(j)}$, $j=1,2,3$, above we also use (2.33). Note that (one may see the remark to (3.2) in [3])
$$|\beta_j(z)|=\big|1-n^{-1}\mathbf s_j^T\mathbf A^{-1}(z)\mathbf s_j\big|\le1+M\eta_r+Mn^{2+\alpha}I\big(\lambda_{\max}(\mathbf S)\ge\eta_r\ \text{or}\ \lambda_{\min}(\mathbf S)\le\eta_l\big), \tag{2.39}$$
where $\limsup_n\lambda_{\max}(\mathbf T_n)(1+\sqrt c)^2<\eta_r<u_r$ and $u_l<\eta_l<\liminf_n\lambda_{\min}(\mathbf T_n)I_{(0,1)}(c)(1-\sqrt c)^2$. Also we have
$$n^{-1/2}\big|\mathbf s_1^T\mathbf A_1^{-1}(z_1)\mathbf A^{-1}(z_1)\mathbf A^{-1}(z_2)\bar{\mathbf s}\big|\le n^{-1/2}\|\mathbf s_1\|\,\|\bar{\mathbf s}\|\,\big\|\mathbf A_1^{-1}(z_1)\mathbf A^{-1}(z_1)\mathbf A^{-1}(z_2)\big\|$$
$$\le M\eta_r\|\bar{\mathbf s}\|+Mn^{6+4\alpha}I\big(\lambda_{\max}(\mathbf S)\ge\eta_r\ \text{or}\ \lambda_{\min}(\mathbf S_1)\le\eta_l\big), \tag{2.40}$$
where we use (2.24) and the fact that $n^{-1}\|\mathbf s_1\|^2$ is smaller than $\eta_r$ if $\lambda_{\max}(\mathbf S)<\eta_r$, while $n^{-1}\|\mathbf s_1\|^2\le n$ otherwise. Here $\mathbf S_1=\mathbf S-n^{-1}\mathbf s_1\mathbf s_1^T$. One should note that (2.31) still holds when $\mathbf B$ is replaced by $\mathbf A_j^{-1}(z)$ or $\mathbf A_j^{-2}(z)$, due to (2.37). We then conclude from (2.36), (2.39), (2.31), (2.38) and (2.40) that
$$E\Big|\sum_{j=1}^n(E_j-E_{j-1})\big(a_{n2}^{(1)}\big)\Big|^2\le M\sum_{j=1}^nE\big|a_{n2}^{(1)}\big|^2$$
$$\le M\Big(E\big|\bar{\mathbf s}_j^T\mathbf A_j^{-1}(z_1)\mathbf s_j\big|^4E\|\bar{\mathbf s}\|^4\Big)^{1/2}+Mn^{20+12\alpha}P\big(\lambda_{\max}(\mathbf S)\ge\eta_r\ \text{or}\ \lambda_{\min}(\mathbf S_1)\le\eta_l\big)\le M.$$
This argument apparently applies to the terms $a_{n2}^{(j)}$, $j=2,3$. Therefore (2.35) is satisfied and $K_n^{(2)}(z)$ is tight.

### 2.4 Convergence of $E(\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s})$ and $E(\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s})$

The aim of this section is to find the limits of $E(\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s})$ and $E(\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s})$. In what follows, all bounds, including $O(\cdot)$ and $o(\cdot)$ expressions, hold uniformly on $\mathcal C_n$.

Consider $E(\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s})$ first. Applying $\bar{\mathbf s}=\frac1n\sum_{j=1}^n\mathbf s_j$, (2.26) and (2.33), we obtain
$$E\big[\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}\big]=\frac1n\sum_{j=1}^nE\big[\mathbf s_j^T\mathbf A_j^{-1}(z)\mathbf A^{-1}(z)\bar{\mathbf s}\,\beta_j(z)\big]$$
$$=\frac1n\sum_{j=1}^nE\big[\mathbf s_j^T\mathbf A_j^{-2}(z)\bar{\mathbf s}\,\beta_j(z)\big]-\frac1{n^2}\sum_{j=1}^nE\big[\mathbf s_j^T\mathbf A_j^{-2}(z)\mathbf s_j\,\mathbf s_j^T\mathbf A_j^{-1}(z)\bar{\mathbf s}\,\beta_j^2(z)\big]=q_{n1}+q_{n2}+q_{n3}+q_{n4},$$
where
$$q_{n1}=\frac1n\sum_{j=1}^nE\big[\mathbf s_j^T\mathbf A_j^{-2}(z)\bar{\mathbf s}_j\,\beta_j(z)\big],\qquad q_{n3}=-\frac1{n^2}\sum_{j=1}^nE\big[\mathbf s_j^T\mathbf A_j^{-2}(z)\mathbf s_j\,\mathbf s_j^T\mathbf A_j^{-1}(z)\bar{\mathbf s}_j\,\beta_j^2(z)\big]$$
and
$$q_{n2}=\frac1{n^2}\sum_{j=1}^nE\big[\mathbf s_j^T\mathbf A_j^{-2}(z)\mathbf s_j\,\beta_j(z)\big],\qquad q_{n4}=-\frac1{n^3}\sum_{j=1}^nE\big[\mathbf s_j^T\mathbf A_j^{-2}(z)\mathbf s_j\,\mathbf s_j^T\mathbf A_j^{-1}(z)\mathbf s_j\,\beta_j^2(z)\big].$$
Using
$$\beta_j(z)=b_1(z)-\beta_j(z)b_1(z)\gamma_j(z) \tag{2.41}$$
we further obtain
$$q_{n1}=-\frac1n\sum_{j=1}^nE\big[\mathbf s_j^T\mathbf A_j^{-2}(z)\bar{\mathbf s}_j\,\beta_j(z)b_1(z)\gamma_j(z)\big].$$
Moreover, it is proved in [3] (see (3.5) and the three lines below it in [3]) that
$$E|\gamma_1(z)|^k\le Mn^{-1}\varepsilon_n^{2k-4},\qquad E|\beta_1(z)|^k\le M,\quad k\ge2,\qquad|b_1(z)|\le M. \tag{2.42}$$

We then conclude from (2.39), (2.31), (2.40) and (2.42) that
$$|q_{n1}|\le M\Big(E\big|\mathbf s_1^T\mathbf A_1^{-2}(z)\bar{\mathbf s}_1\big|^2E|\gamma_1(z)|^2\Big)^{1/2}+Mn^{7+4\alpha}P\big(\lambda_{\max}(\mathbf S)\ge\eta_r\ \text{or}\ \lambda_{\min}(\mathbf S_1)\le\eta_l\big)\le\frac M{\sqrt n}. \tag{2.43}$$
Applying a similar approach, one can prove that
$$|q_{n3}|\le\frac M{\sqrt n}.$$
By (2.41) we have
$$q_{n2}=b_1(z)E\Big[\frac1n\mathrm{tr}\,\mathbf A_1^{-2}(z)\mathbf T_n\Big]-\frac{b_1(z)}{n^2}\sum_{j=1}^nE\big[\mathbf s_j^T\mathbf A_j^{-2}(z)\mathbf s_j\,\beta_j(z)\gamma_j(z)\big]=b_1(z)E\Big[\frac1n\mathrm{tr}\,\mathbf A_1^{-2}(z)\mathbf T_n\Big]+O\Big(\frac1{\sqrt n}\Big), \tag{2.44}$$
where the last step uses the fact that the absolute value of the second term of $q_{n2}$ is bounded by $M/\sqrt n$, by the same approach as that used in (2.43).

From (2.41) we may write
$$q_{n4}=-b_1^2(z)E\Big[\frac1n\mathrm{tr}\,\mathbf A_1^{-1}(z)\mathbf T_n\Big]E\Big[\frac1n\mathrm{tr}\,\mathbf A_1^{-2}(z)\mathbf T_n\Big]+q_{n4}^{(1)}+q_{n4}^{(2)}+q_{n4}^{(3)},$$
where
$$q_{n4}^{(1)}=-\frac1{n^3}\sum_{j=1}^nE\big[\mathbf s_j^T\mathbf A_j^{-2}(z)\mathbf s_j\,\mathbf s_j^T\mathbf A_j^{-1}(z)\mathbf s_j\,\beta_j^2(z)b_1^2(z)\gamma_j^2(z)\big],$$
$$q_{n4}^{(2)}=\frac2{n^3}\sum_{j=1}^nE\big[\mathbf s_j^T\mathbf A_j^{-2}(z)\mathbf s_j\,\mathbf s_j^T\mathbf A_j^{-1}(z)\mathbf s_j\,\beta_j(z)b_1^2(z)\gamma_j(z)\big]$$
and
$$q_{n4}^{(3)}=-\frac{b_1^2(z)}n\sum_{j=1}^nE\Big[\big(n^{-1}\mathbf s_j^T\mathbf A_j^{-2}(z)\mathbf s_j-n^{-1}E\,\mathrm{tr}\,\mathbf A_j^{-2}(z)\mathbf T_n\big)\gamma_j(z)\Big].$$

By (3.2) in [3], as in (2.43), we may prove that $|q_{n4}^{(k)}|\le M/\sqrt n$, $k=1,2,3$. It follows that
$$q_{n4}=-b_1^2(z)E\Big[\frac1n\mathrm{tr}\,\mathbf A_1^{-1}(z)\mathbf T_n\Big]E\Big[\frac1n\mathrm{tr}\,\mathbf A_1^{-2}(z)\mathbf T_n\Big]+O\Big(\frac1{\sqrt n}\Big). \tag{2.45}$$
This, together with (2.44), (2.43), (2.33) and (2.42), yields
$$E\big[\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}\big]=q_{n1}+q_{n2}+q_{n3}+q_{n4}=b_1^2(z)E\Big[\frac1n\mathrm{tr}\,\mathbf A^{-2}(z)\mathbf T_n\Big]+O\Big(\frac1{\sqrt n}\Big). \tag{2.46}$$
It was proved in Section 4 of [3] (see three lines above (4.14) there) that
$$b_1(z)+zm(z)\to0. \tag{2.47}$$
Consider $E\big[\frac1n\mathrm{tr}\,\mathbf A^{-2}(z)\mathbf T_n\big]$ now. To this end we need formula (4.13) in [3], which states
$$\mathbf A^{-1}(z)=-\mathbf H^{-1}(z)+b_1(z)\boldsymbol{\mathcal A}(z)+\boldsymbol{\mathcal B}(z)+\boldsymbol{\mathcal C}(z), \tag{2.48}$$
where $\mathbf H^{-1}(z)=(z\mathbf I-b_1(z)\mathbf T_n)^{-1}$,
$$\boldsymbol{\mathcal A}(z)=\sum_{j=1}^n\mathbf H^{-1}(z)\big(\mathbf s_j\mathbf s_j^T-n^{-1}\mathbf T_n\big)\mathbf A_j^{-1}(z),\qquad\boldsymbol{\mathcal B}(z)=\sum_{j=1}^n\big(\beta_j(z)-b_1(z)\big)\mathbf H^{-1}(z)\mathbf s_j\mathbf s_j^T\mathbf A_j^{-1}(z)$$
and
$$\boldsymbol{\mathcal C}(z)=n^{-1}b_1(z)\mathbf H^{-1}(z)\mathbf T_n\sum_{j=1}^n\beta_j(z)\mathbf A_j^{-1}(z)\mathbf s_j\mathbf s_j^T\mathbf A_j^{-1}(z).$$
It was proved in (4.15) and (4.16) of [3] that
$$\Big|\frac1n\mathrm{tr}\,\boldsymbol{\mathcal B}(z)\mathbf M\Big|\le Cn^{-1/2}\big(E\|\mathbf M\|^4\big)^{1/4},\qquad\Big|\frac1n\mathrm{tr}\,\boldsymbol{\mathcal C}(z)\mathbf M\Big|\le Cn^{-1}\big(E\|\mathbf M\|^4\big)^{1/4}. \tag{2.49}$$

Taking the matrix $\mathbf M=\mathbf I$ in (4.17)-(4.19) of [3] yields
$$E\,n^{-1}\mathrm{tr}\,\boldsymbol{\mathcal A}(z)=-b_1(z)\Big(E\,n^{-1}\mathrm{tr}\,\mathbf A^{-1}(z)\mathbf T_n\mathbf A^{-1}(z)\mathbf T_n\Big)\Big(E\,n^{-1}\mathrm{tr}\,\mathbf A^{-1}(z)\mathbf H^{-1}(z)\mathbf T_n\Big)+o(1). \tag{2.50}$$
From (2.48)-(2.49) we have
$$E\,n^{-1}\mathrm{tr}\,\mathbf A^{-1}(z)\mathbf H^{-1}(z)\mathbf T_n=n^{-1}\mathrm{tr}\Big[\big(-\mathbf H^{-1}(z)+E\boldsymbol{\mathcal B}(z)+E\boldsymbol{\mathcal C}(z)\big)\mathbf H^{-1}(z)\mathbf T_n\Big]=-\frac{c_n}{z^2}\int\frac{t\,dH_n(t)}{(1+tEm_n)^2}+o(1). \tag{2.51}$$
It follows from (2.48)-(2.50) that
$$E\Big[\frac1n\mathrm{tr}\,\mathbf A^{-2}(z)\mathbf T_n\Big]=-E\Big[\frac1n\mathrm{tr}\,\mathbf A^{-1}(z)\mathbf H^{-1}(z)\mathbf T_n\Big]-b_1^2(z)E\Big[n^{-1}\mathrm{tr}\,\mathbf A^{-1}(z)\mathbf T_n\mathbf A^{-1}(z)\mathbf T_n\Big]E\Big[n^{-1}\mathrm{tr}\,\mathbf A^{-1}(z)\mathbf H^{-1}(z)\mathbf T_n\Big]+o(1).$$
By (4.22) of [3] we obtain
$$E\Big[n^{-1}\mathrm{tr}\,\mathbf A^{-1}(z)\mathbf T_n\mathbf A^{-1}(z)\mathbf T_n\Big]=\frac{\frac{c_n}{z^2}\int\frac{t^2dH_n(t)}{(1+tEm_n)^2}}{1-c_n\int\frac{(Em_n)^2t^2dH_n(t)}{(1+tEm_n)^2}}+o(1).$$
This, together with (2.51), yields
$$E\Big[\frac1n\mathrm{tr}\,\mathbf A^{-2}(z)\mathbf T_n\Big]=\frac{\frac{c_n}{z^2}\int\frac{t\,dH_n(t)}{(1+tEm_n)^2}}{1-c_n\int\frac{(Em_n)^2t^2dH_n(t)}{(1+tEm_n)^2}}+o(1).$$
It was proved in (4.1) of [3] that
$$\sup_{z\in\mathcal C_n}|Em_n(z)-m(z)|\to0.$$

This, (2.46) and (2.47) ensure that
$$E\big[\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}\big]=\frac{cm^2(z)\int\frac{t\,dH(t)}{(1+tm(z))^2}}{1-c\int\frac{(m(z))^2t^2dH(t)}{(1+tm(z))^2}}+o(1).$$
Consider $E(\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s})$ next. As in dealing with the term $E(\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s})$, write
$$E(\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s})=\frac1n\sum_{j=1}^nE\big[\mathbf s_j^T\mathbf A_j^{-1}(z)\bar{\mathbf s}_j\,\beta_j(z)\big]+\frac1{n^2}\sum_{j=1}^nE\big[\mathbf s_j^T\mathbf A_j^{-1}(z)\mathbf s_j\,\beta_j(z)\big].$$
By (2.41) and (2.42), the absolute value of the first term on the right-hand side of the above equality converges to zero as $n\to\infty$. From (2.41), (2.42) and (2.47), the second term above becomes
$$b_1(z)E\frac1n\mathrm{tr}\,\mathbf A_1^{-1}(z)\mathbf T_n+o(1)=1-b_1(z)+o(1)\to1+zm(z).$$
Therefore
$$E(\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s})\to1+zm(z). \tag{2.52}$$

### 2.5 Convergence of $\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}/(1-\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s})$

From Sections 2.2 to 2.4 we see that, after centering, the stochastic processes $\widehat{\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}}$ and $\widehat{\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s}}$ converge in distribution, in the space $C$ of continuous functions, to zero for $z\in\mathcal C$. This implies that
$$\sup_{z\in\mathcal C}\Big|\widehat{\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}}-\frac{cm^2(z)\int\frac{t\,dH(t)}{(1+tm(z))^2}}{1-c\int\frac{(m(z))^2t^2dH(t)}{(1+tm(z))^2}}\Big|\xrightarrow{i.p.}0$$
and that
$$\sup_{z\in\mathcal C}\Big|\widehat{\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s}}-1-zm(z)\Big|\xrightarrow{i.p.}0. \tag{2.53}$$
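Limit (2.52) is easy to probe numerically when $\mathbf T_n=\mathbf I$. The sketch below is an illustration only (Gaussian entries, sizes chosen for speed): it compares a Monte Carlo average of $\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s}$ with $1+zm(z)$, where $m(z)$ is computed from the explicit Marcenko-Pastur root used in Section 2.6:

```python
import numpy as np

def m_mp(z, c):
    # Root of z m^2 + (z + 1 - c) m + 1 = 0 with Im m > 0 for Im z > 0
    s = np.sqrt((z - 1 - c) ** 2 - 4 * c + 0j)
    m = (-(z + 1 - c) + s) / (2 * z)
    return m if m.imag > 0 else (-(z + 1 - c) - s) / (2 * z)

rng = np.random.default_rng(3)
n, p, z = 1000, 500, 3 + 1j
c = p / n
reps = []
for _ in range(8):
    X = rng.standard_normal((p, n))
    s_bar = X.mean(axis=1)
    Ainv = np.linalg.inv(X @ X.T / n - z * np.eye(p))   # A^{-1}(z), S uncentered
    reps.append(s_bar @ Ainv @ s_bar)                   # \bar s^T A^{-1}(z) \bar s
approx, limit = np.mean(reps), 1 + z * m_mp(z, c)
print(approx, limit)
```

Note that for large $|z|$ one has $m(z)\approx-1/z-c/z^2$, so $1+zm(z)\approx-c/z$, matching the naive estimate $\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s}\approx-\|\bar{\mathbf s}\|^2/z$; for moderate $z$ the dependence between $\bar{\mathbf s}$ and $\mathbf A^{-1}(z)$ makes the limit $1+zm(z)$ decidedly different from that naive guess.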

Thus we conclude from (2.25), (2.28) and (2.59) below that
$$\sup_{z\in\mathcal C}\Big|\frac{\widehat{\bar{\mathbf s}^T\mathbf A^{-2}(z)\bar{\mathbf s}}}{1-\widehat{\bar{\mathbf s}^T\mathbf A^{-1}(z)\bar{\mathbf s}}}+\frac{cm(z)\int\frac{t\,dH(t)}{(1+tm(z))^2}}{z\Big(1-c\int\frac{(m(z))^2t^2dH(t)}{(1+tm(z))^2}\Big)}\Big|\xrightarrow{i.p.}0. \tag{2.54}$$
Theorem 1 follows from Lemma 1.1 of [3], the argument of Theorem 1.4 of [16], Theorems 4.4 and 5.1 of [8], (2.22), (2.23), (2.29) and (2.54). Formulas (1.8) and (1.9) follow from (1.18) and (1.19) of [16].

### 2.6 Derivations of (1.10) and (1.11)

We now verify (1.10) and (1.11). Consider (1.10) first. As in [3], we select the contour to be a rectangle with sides parallel to the axes. It intersects the real axis at $a_1\ne0$ and $b_1$ (the support of $F_{c,H}$ is a subset of $(a_1,b_1)$), and the horizontal sides are a distance $v$ away from the real axis. We let $v\to0$. By (1.3) we obtain
$$\frac{dm(z)}{dz}=\frac{m^2(z)}{1-c\int\frac{(m(z))^2t^2dH(t)}{(1+tm(z))^2}}.$$
This, together with (1.3) and integration by parts, ensures that
$$\frac1{2\pi i}\int g(z)\frac{cm(z)\int\frac{t\,dH(t)}{(1+tm(z))^2}}{z\Big(1-c\int\frac{(m(z))^2t^2dH(t)}{(1+tm(z))^2}\Big)}dz=\frac1{2\pi i}\int g(z)\frac d{dz}\Big(\mathrm{Log}\big(zm(z)\big)\Big)dz=-\frac1{2\pi i}\int g'(z)\,\mathrm{Log}\big(zm(z)\big)dz, \tag{2.55}$$
where $\mathrm{Log}$ denotes any branch of the logarithm. By the fact that $|m(z)|\le1/v$ and (5.1) in [3], the integrals on the two vertical sides in (2.55) are bounded in absolute value by

$Mv\log v^{-1}$, which converges to zero. Therefore the integral in (2.55) equals
$$-\frac1\pi\int_{a_1}^{b_1}g_i'(x+iv)\log\big|(x+iv)m(x+iv)\big|dx-\frac1\pi\int_{a_1}^{b_1}g_r'(x+iv)\arg\big[(x+iv)m(x+iv)\big]dx. \tag{2.56}$$
By the fact that $|m(z)|\le1/v$, (5.6) and (5.1) in [3], the first term in (2.56) is bounded in absolute value by $Mv\log v^{-1}$, converging to zero. Therefore we conclude from the dominated convergence theorem that
$$-\frac1{2\pi i}\int g'(z)\,\mathrm{Log}\big(zm(z)\big)dz\to-\frac1\pi\int_{a_1}^{b_1}g'(x)\arg\big[xm(x)\big]dx. \tag{2.57}$$
Consider (1.11) now. Keep in mind that $\mathbf T_n=\mathbf I$ in this case. We select the same contour for evaluating (1.11) as that for (1.10), but with $a_1$ and $b_1$ replaced by $a$ and $b$ ($a$ and $b$ are defined in the introduction). The simplified formula for the first term on the right-hand side of (1.8) is given in [3]. By Remark 2, the second term on the right-hand side of (1.8) then becomes
$$-\frac{c(EX_{11}^4-3)}{2\pi i}\int g(z)\frac{\frac{m^3(z)}{(1+m(z))^3}}{1-c\frac{m^2(z)}{(1+m(z))^2}}dz. \tag{2.58}$$
To calculate (2.58), solving (1.3) gives
$$m(z)=\frac{-(z+1-c)+\sqrt{(z-1-c)^2-4c}}{2z}.$$
It follows that $|m(z)|$ is bounded on the contour $\mathcal C$. Hence, by (1.3) and (1.5),
$$\Big|\frac1{1+cm(z)}\Big|=\big|1-c-czm(z)\big|\le M \tag{2.59}$$
and, by (1.3),
$$\Big|\frac{cm(z)}{1+m(z)}\Big|=\big|1+zm(z)\big|\le M.$$
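The explicit root and the identity $\big|cm(z)/(1+m(z))\big|=|1+zm(z)|$ can be sanity-checked numerically. The latter is exact algebra: $cm=(1+zm)(1+m)$ rearranges to the quadratic $zm^2+(z+1-c)m+1=0$ that $m(z)$ solves, so the identity holds for either root. A small check (illustrative only):

```python
import numpy as np

c = 0.5
for z in (3 + 1j, 0.5 + 0.2j, -1 + 1j):
    s = np.sqrt((z - 1 - c) ** 2 - 4 * c + 0j)
    m = (-(z + 1 - c) + s) / (2 * z)          # the root given above
    if m.imag < 0:                            # keep the Stieltjes branch Im m > 0
        m = (-(z + 1 - c) - s) / (2 * z)
    # m solves the quadratic z m^2 + (z + 1 - c) m + 1 = 0 ...
    assert abs(z * m**2 + (z + 1 - c) * m + 1) < 1e-10
    # ... hence c m / (1 + m) = 1 + z m exactly
    assert abs(abs(c * m / (1 + m)) - abs(1 + z * m)) < 1e-10
print("identities verified")
```

The branch selection mirrors the requirement that $m$ be a Stieltjes transform, i.e. $\Im m(z)>0$ when $\Im z>0$.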

One can verify that
$$1-c\frac{m^2(z)}{(1+m(z))^2}=-\frac{\sqrt{(z-1-c)^2-4c}\;m(z)}{1+m(z)},$$
where we use (1.5) and (1.3). Note that
$$\Big|\sqrt{(z-1-c)^2-4c}\Big|=\Big|\sqrt{(z-a)(z-b)}\Big|$$
and that
$$\int_a^b\frac1{\sqrt{(x-a)(b-x)}}dx<\infty. \tag{2.60}$$
We then conclude from (2.59)-(2.60) that the integrals on the vertical lines in (2.58) are bounded in absolute value by
$$Mv\int_a^b\frac1{\sqrt{(x-a)(b-x)}}dx,$$
converging to zero. The integral on the two horizontal lines equals
$$-\frac{c(EX_{11}^4-3)}\pi\int g_i(z)\,\Re\Big[\frac{\frac{m^3(z)}{(1+m(z))^3}}{1-c\frac{m^2(z)}{(1+m(z))^2}}\Big]dz-\frac{c(EX_{11}^4-3)}\pi\int g_r(z)\,\Im\Big[\frac{\frac{m^3(z)}{(1+m(z))^3}}{1-c\frac{m^2(z)}{(1+m(z))^2}}\Big]dz, \tag{2.61}$$
where $g_i(x+iv)$ and $g_r(x+iv)$ denote, respectively, the imaginary and real parts of $g(x+iv)$, and $\Re(\cdot)$ and $\Im(\cdot)$ denote, respectively, the real and imaginary parts of the corresponding function. It follows from (5.6) in [3] and (2.59)-(2.60) that the first term in (2.61) is bounded in absolute value by
$$Mv\int_a^b\frac1{\sqrt{(x-a)(b-x)}}dx,$$
converging to zero. Applying the generalized dominated convergence theorem and (2.59)-

(2.60) to the second term in (2.61) yields the third term in (1.11).

3 Proof of Theorem 2

Proof. Throughout, $\mathbf S=\frac1n\sum_{j=1}^n(s_j-\bar s)(s_j-\bar s)^T$ denotes the sample covariance matrix with empirical centering and $S=\frac1n\sum_{j=1}^n s_js_j^T$ the one without, as in the introduction. Let $\|\cdot\|$ denote the spectral norm of matrices or the Euclidean norm of vectors. Let $z=u+iv$, $v>0$. For $K>0$, let $\tilde X_n=(\tilde X_{ij})$, $\bar{\tilde s}=\frac1n\sum_{j=1}^n\tilde s_j$ and $\tilde{\mathbf S}=\frac1n\tilde X_n\tilde X_n^T-\bar{\tilde s}\bar{\tilde s}^T$, where $\tilde X_{ij}=X_{ij}I(|X_{ij}|\le K)-EX_{ij}I(|X_{ij}|\le K)$ and $\tilde s_j=(\tilde X_{1j},\cdots,\tilde X_{pj})^T$. Note that
\[
\big|x_n^T(\mathbf S-zI)^{-1}x_n-x_n^T(\tilde{\mathbf S}-zI)^{-1}x_n\big| \le \|x_n\|^2\big\|(\mathbf S-zI)^{-1}-(\tilde{\mathbf S}-zI)^{-1}\big\| \le \frac{1}{v^2}\Big\|\frac1n X_nX_n^T-\frac1n\tilde X_n\tilde X_n^T\Big\|+\frac{1}{v^2}\big\|\bar s\bar s^T-\bar{\tilde s}\bar{\tilde s}^T\big\|.
\]
Moreover, it is proven in [4] that $\|\frac1n X_nX_n^T-\frac1n\tilde X_n\tilde X_n^T\|$ can be arbitrarily small if $K$ is sufficiently large. Thus, it suffices to investigate $\|(\bar s-\bar{\tilde s})(\bar s-\bar{\tilde s})^T\|$ and $\|(\bar s-\bar{\tilde s})\bar{\tilde s}^T\|$.

Define $\hat s_j=T_n^{1/2}(X_{1j}-\tilde X_{1j},\cdots,X_{pj}-\tilde X_{pj})^T$ and then obtain
\[
\|(\bar s-\bar{\tilde s})(\bar s-\bar{\tilde s})^T\| = \frac{1}{n^2}\sum_{j=1}^n\big(\hat s_j^T\hat s_j-E(\hat s_j^T\hat s_j)\big)+\frac{1}{n^2}\sum_{j_1\ne j_2}\hat s_{j_1}^T\hat s_{j_2}+\frac{1}{n^2}\sum_{j=1}^n E(\hat s_j^T\hat s_j). \tag{3.1}
\]
Since
\[
E\Big|\frac{1}{n^2}\sum_{j=1}^n\big(\hat s_j^T\hat s_j-E(\hat s_j^T\hat s_j)\big)\Big|^2 = \frac{1}{n^4}\sum_{j=1}^n E\big|\hat s_j^T\hat s_j-E(\hat s_j^T\hat s_j)\big|^2 \le \frac{M}{n^2}, \tag{3.2}
\]
we have $\frac{1}{n^2}\sum_{j=1}^n(\hat s_j^T\hat s_j-E\hat s_j^T\hat s_j)\xrightarrow{a.s.}0$ by the Borel-Cantelli lemma. Obviously $\frac{1}{n^2}\sum_{j=1}^n E(\hat s_j^T\hat s_j)$

can be arbitrarily small by choosing $K$ sufficiently large. A direct calculation indicates that
\[
E\Big|\frac{1}{n^2}\sum_{j_1\ne j_2}\hat s_{j_1}^T\hat s_{j_2}\Big|^4 = \frac{1}{n^8}\sum_{j_1\ne j_2,\,j_3\ne j_4,\,j_5\ne j_6,\,j_7\ne j_8} E\big[\hat s_{j_1}^T\hat s_{j_2}\,\hat s_{j_3}^T\hat s_{j_4}\,\hat s_{j_5}^T\hat s_{j_6}\,\hat s_{j_7}^T\hat s_{j_8}\big]
\]
\[
\le \frac{M}{n^8}\sum_{j_1\ne j_2}E|\hat s_{j_1}^T\hat s_{j_2}|^4 + \frac{M}{n^8}\sum_{j_1\ne j_2,\,j_2\ne j_3,\,j_3\ne j_4,\,j_4\ne j_1}E\big|\hat s_{j_1}^T\hat s_{j_2}\,\hat s_{j_2}^T\hat s_{j_3}\,\hat s_{j_3}^T\hat s_{j_4}\,\hat s_{j_4}^T\hat s_{j_1}\big|
\]
\[
+ \frac{M}{n^8}\sum_{j_1\ne j_2,\,j_2\ne j_3,\,j_3\ne j_1}E\big|(\hat s_{j_1}^T\hat s_{j_2})^2\,\hat s_{j_2}^T\hat s_{j_3}\,\hat s_{j_3}^T\hat s_{j_1}\big| + \frac{M}{n^8}\sum_{j_1\ne j_2,\,j_3\ne j_2}E\big|(\hat s_{j_1}^T\hat s_{j_2})^2(\hat s_{j_2}^T\hat s_{j_3})^2\big|
\]
\[
+ \frac{M}{n^8}\sum_{j_1\ne j_2,\,j_3\ne j_4}E|\hat s_{j_1}^T\hat s_{j_2}|^2\,E|\hat s_{j_3}^T\hat s_{j_4}|^2 = O\Big(\frac{1}{n^2}\Big). \tag{3.3}
\]
Therefore $\|(\bar s-\bar{\tilde s})(\bar s-\bar{\tilde s})^T\|$ can be arbitrarily small with probability one by choosing $K$ sufficiently large. Likewise, one can also verify that $\|(\bar s-\bar{\tilde s})\bar{\tilde s}^T\|$ can be arbitrarily small by choosing $K$ sufficiently large. The re-scaling of $\tilde X_{ij}$ can be treated similarly because $E|\tilde X_{11}|^2\to 1$ as $K\to\infty$. Hence, in what follows, we may assume $|X_{ij}|\le K$, $EX_{11}=0$ and $E|X_{11}|^2=1$ (for simplicity, suppressing all super- or sub-scripts on the variables $X_{ij}$).

Recalling $A^{-1}(z)=(S-zI)^{-1}$, it is observed that
\[
x_n^T(\mathbf S-zI)^{-1}x_n = x_n^TA^{-1}(z)x_n + \frac{x_n^TA^{-1}(z)\bar s\,\bar s^TA^{-1}(z)x_n}{1-\bar s^TA^{-1}(z)\bar s}. \tag{3.4}
\]
To prove Theorem 2, according to the argument of Theorem 1 in [4], it is sufficient to show that
\[
\frac{x_n^TA^{-1}(z)\bar s\,\bar s^TA^{-1}(z)x_n}{1-\bar s^TA^{-1}(z)\bar s}\xrightarrow{a.s.}0. \tag{3.5}
\]
To this end, we first show that
\[
x_n^TA^{-1}(z)\bar s - E(x_n^TA^{-1}(z)\bar s)\xrightarrow{a.s.}0. \tag{3.6}
\]
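Identity (3.4) is a rank-one (Sherman-Morrison) update of the resolvent and can be confirmed numerically. The sketch below is mine, not the paper's (the dimensions, $z$ and the seed are arbitrary); it builds the non-centered $S$ and checks the two sides of (3.4), together with the centering identity (1.1).

```python
import numpy as np

# Numerical check (not from the paper) of the rank-one resolvent identity
# (3.4): with A(z) = S - zI built from the non-centered S, the resolvent of
# the centered matrix S - sbar sbar^T satisfies
#   x^T (S - sbar sbar^T - zI)^{-1} x
#     = x^T A^{-1} x + (x^T A^{-1} sbar)(sbar^T A^{-1} x) / (1 - sbar^T A^{-1} sbar).

rng = np.random.default_rng(1)
p, n, z = 5, 12, 0.4 + 1.0j

X = rng.standard_normal((p, n))
sbar = X.mean(axis=1)
S = X @ X.T / n                              # non-centered sample covariance
A_inv = np.linalg.inv(S - z * np.eye(p))
x = rng.standard_normal(p)

lhs = x @ np.linalg.inv(S - np.outer(sbar, sbar) - z * np.eye(p)) @ x
rhs = x @ A_inv @ x + (x @ A_inv @ sbar) * (sbar @ A_inv @ x) / (1 - sbar @ A_inv @ sbar)
assert abs(lhs - rhs) < 1e-10

# (1.1): the empirically centered matrix equals S - sbar sbar^T
Xc = X - sbar[:, None]
assert np.allclose(S - np.outer(sbar, sbar), Xc @ Xc.T / n)
```

For real symmetric matrices and $\Im z>0$ the denominator $1-\bar s^TA^{-1}(z)\bar s$ cannot vanish, since $\Im\big(\bar s^TA^{-1}(z)\bar s\big)>0$ whenever $\bar s\ne 0$.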

We use the same notation as in the proof of Theorem 1. Write
\[
x_n^TA^{-1}(z)\bar s - E(x_n^TA^{-1}(z)\bar s) = \sum_{j=1}^n\big[E_j(x_n^TA^{-1}(z)\bar s)-E_{j-1}(x_n^TA^{-1}(z)\bar s)\big] = \sum_{j=1}^n(E_j-E_{j-1})\big(x_n^TA^{-1}(z)\bar s - x_n^TA_j^{-1}(z)\bar s_j\big). \tag{3.7}
\]
Furthermore, we have
\[
x_n^TA^{-1}(z)\bar s - x_n^TA_j^{-1}(z)\bar s_j = c_{n1}+c_{n2}+c_{n3}, \tag{3.8}
\]
where, via (2.33),
\[
c_{n1} = x_n^T\big(A^{-1}(z)-A_j^{-1}(z)\big)(\bar s-\bar s_j) = -\frac{1}{n^2}x_n^TA_j^{-1}(z)s_j\,s_j^TA_j^{-1}(z)s_j\,\beta_j(z),
\]
\[
c_{n2} = x_n^T\big(A^{-1}(z)-A_j^{-1}(z)\big)\bar s_j = -\frac1n x_n^TA_j^{-1}(z)s_j\,s_j^TA_j^{-1}(z)\bar s_j\,\beta_j(z)
\]
and
\[
c_{n3} = x_n^TA_j^{-1}(z)(\bar s-\bar s_j) = \frac1n x_n^TA_j^{-1}(z)s_j.
\]
Observe that $|\beta_j(z)|\le|z|/v$ and $|b_j(z)|\le|z|/v$ (see (3.4) in [2]). Using an argument similar to (3.19) of [15], one can prove that
\[
E|x_n^TA_j^{-1}(z)s_j|^4 = O(1). \tag{3.9}
\]
By the Burkholder inequality and the fact that $|s_j^TA_j^{-1}(z)s_j/n|\le M/v$, we have
\[
E\Big|\sum_{j=1}^n(E_j-E_{j-1})c_{n1}\Big|^4 \le \frac{M}{n^4}E\Big[\sum_{j=1}^n\Big|\frac1n x_n^TA_j^{-1}(z)s_j\,s_j^TA_j^{-1}(z)s_j\Big|^2\Big]^2 \le \frac{M}{n^3}\sum_{j=1}^n E|x_n^TA_j^{-1}(z)s_j|^4 = O\Big(\frac{1}{n^2}\Big), \tag{3.10}
\]

which implies that $\sum_{j=1}^n(E_j-E_{j-1})c_{n1}\xrightarrow{a.s.}0$. The above argument also ensures that $\sum_{j=1}^n(E_j-E_{j-1})c_{n3}\xrightarrow{a.s.}0$. Similarly, by the Burkholder inequality, Lemma 2.7 in [2] and (2.38), we have
\[
E\Big|\sum_{j=1}^n(E_j-E_{j-1})c_{n2}\Big|^4 \le \frac{M}{n^3}\sum_{j=1}^n E\big|s_j^TA_j^{-1}(z)\bar s_j\,x_n^TA_j^{-1}(z)s_j\big|^4
\]
\[
\le \frac{M}{n^3}\sum_{j=1}^n\Big[E\big|s_j^TA_j^{-1}(z)\bar s_j\,x_n^TA_j^{-1}(z)s_j - x_n^TA_j^{-2}(z)\bar s_j\big|^4 + E\big|x_n^TA_j^{-2}(z)\bar s_j\big|^4\Big] \le \frac{M}{n^3}\sum_{j=1}^n E\big(\bar s_j^T\bar s_j\big)^2 = O\Big(\frac{1}{n^2}\Big). \tag{3.11}
\]
Thus (3.6) is proved, as expected.

Finally, applying $\bar s=\sum_j s_j/n$ gives
\[
E(x_n^TA^{-1}(z)\bar s) = \frac1n\sum_{j=1}^n E\big(x_n^TA^{-1}(z)s_j\big) = \frac1n\sum_{j=1}^n E\big(x_n^TA_j^{-1}(z)s_j\,\beta_j(z)\big) = -\frac1n\sum_{j=1}^n E\big(x_n^TA_j^{-1}(z)s_j\,b_1(z)\beta_j(z)\gamma_j(z)\big) = O\Big(\frac{1}{\sqrt n}\Big), \tag{3.12}
\]
where the third equality uses (2.41) and the last step uses Hölder's inequality, (2.42) and (3.9). Therefore (3.5) follows from (3.6), (3.12), (2.28) and the fact that $\bar s^T\bar s\le M\bar x^T\bar x\xrightarrow{a.s.}Mc$, which can be verified by an argument similar to (3.1)-(3.3).

4 Proof of Theorem 3

Proof. The proof of Theorem 3 is similar to that of Theorem 1. The only difference is that here the extra random variable converges to zero in probability in the space $C$.
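The negligibility asserted in (3.5) above is also easy to see in simulation. The following Monte Carlo sketch is an illustration of mine, not part of the proof ($z$, $c$, the sizes, the seed and the unit vector $x_n$ are arbitrary choices, and $T_n=I$); it computes the correction term of (3.4) at two sample sizes.

```python
import numpy as np

# Monte Carlo illustration (not from the paper) of (3.5): the correction
#   x^T A^{-1} sbar sbar^T A^{-1} x / (1 - sbar^T A^{-1} sbar)
# relating the resolvents of the centered and non-centered matrices shrinks
# as n grows with p/n ~ c.

rng = np.random.default_rng(7)
z, c = 0.5 + 1.0j, 0.5

def correction(n):
    p = int(c * n)
    X = rng.standard_normal((p, n))          # T_n = I
    sbar = X.mean(axis=1)
    A_inv = np.linalg.inv(X @ X.T / n - z * np.eye(p))
    x = np.ones(p) / np.sqrt(p)              # a deterministic unit vector x_n
    num = (x @ A_inv @ sbar) * (sbar @ A_inv @ x)
    return abs(num / (1 - sbar @ A_inv @ sbar))

small_n = np.mean([correction(100) for _ in range(6)])
large_n = np.mean([correction(900) for _ in range(6)])
assert large_n < small_n                     # the correction shrinks with n
assert large_n < 0.05
```

Averaged over a few repetitions the correction decays as $n$ grows, in line with (3.5) and with the $O(1/\sqrt n)$ bound (3.12) on the mean of its numerator's factor $x_n^TA^{-1}(z)\bar s$.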

Similar to (2.19), we can also truncate the underlying random variables at $\varepsilon_n\sqrt n$, with $\varepsilon_n$ defined as before. Thus, in view of the structure of $\mathbf S$, we may assume that the following additional conditions hold:
\[
|X_{ij}|\le\sqrt n\,\varepsilon_n,\qquad EX_{ij}=0,\qquad EX_{11}^2=1+o(p^{-1}) \tag{4.1}
\]
and, under assumption (b) of Theorem 3,
\[
E(X_{11}-\mu)^4 = 3+o(1). \tag{4.2}
\]
It is proved in Section 10.7 of [5] that Lemma 2 in [4] holds under conditions (4.1) and (4.2). Also, Theorem 1.3 of [16] remains true under conditions (4.1) and (4.2), as one sees by carefully checking the argument of Theorem 1.3 of [16]. Thus, to prove Theorem 3, by Theorems 4.4 and 5.1 of [8], (2.53) and (3.4), it is sufficient to prove that on the contour $\mathcal C$ ($\mathcal C$ is defined in Theorem 1)
\[
n^{1/4}x_n^T\hat A^{-1}(z)\bar s\xrightarrow{i.p.}0, \tag{4.3}
\]
where the truncated process $x_n^T\hat A^{-1}(z)\bar s$ is obtained from $x_n^TA^{-1}(z)\bar s$ in the same way as the truncated process $\bar s^T\hat A^{-2}(z)\bar s$ is obtained from $\bar s^TA^{-2}(z)\bar s$ in Theorem 1.

To prove (4.3), as in Theorem 1, it is sufficient to prove that
\[
n^{1/4}x_n^T\hat A^{-1}(z)\bar s\xrightarrow{d}0. \tag{4.4}
\]
For each $z\in\mathcal C_u$, from the definition of $c_{n1}$ in (3.8) we have
\[
\sum_{j=1}^n E\big|n^{1/4}(E_j-E_{j-1})c_{n1}\big|^2 \le M\sqrt n\sum_{j=1}^n E|c_{n1}|^2 \le Mn^{-1/2},
\]

because, by Hölder's inequality, (2.42), (2.20) and (3.9),
\[
E|c_{n1}|^2 \le M\Big(E\Big|\frac1n x_n^TA_j^{-1}(z)s_j\Big|^4\,E\Big|\frac1n s_j^TA_j^{-1}(z)s_j\Big|^4\Big)^{1/2} = O(n^{-2}).
\]
Appealing to (3.9) yields
\[
\sum_{j=1}^n E\big|n^{1/4}(E_j-E_{j-1})c_{n3}\big|^2 \le Mn^{-1/2}.
\]
By (2.31), (2.20) and (3.9) we obtain
\[
\sum_{j=1}^n E\big|n^{1/4}(E_j-E_{j-1})c_{n2}\big|^2 \le Mn^{-1/2}.
\]
With $M_n(z)=n^{1/4}\big(x_n^TA^{-1}(z)\bar s - Ex_n^TA^{-1}(z)\bar s\big)$ we thus have
\[
E|M_n(z)|^2 = O(n^{-1/2}). \tag{4.5}
\]
By (2.37) and (2.40), (3.9) holds for $z\in\mathcal C_n^+$. For $z\in\mathcal C_n^+$, it follows from (3.12), (3.9), (2.20), (2.36), (2.39) and (2.42) that
\[
\big|E\big(n^{1/4}x_n^TA^{-1}(z)\bar s\big)\big| = \Big|\frac{1}{n^{3/4}}\sum_{j=1}^n E\big(x_n^TA_j^{-1}(z)s_j\,b_1(z)\beta_j(z)\gamma_j(z)\big)\Big|
\]
\[
\le Mn^{1/4}\big(E|x_n^TA_1^{-1}(z)s_1|^2\,E|\gamma_1(z)|^2\big)^{1/2} + Mv^{-4}n^3P\big(\lambda_{\max}(S)>\eta_r\text{ or }\lambda_{\min}(S_1)\le\eta_l\big) \le M/n^{1/4}. \tag{4.6}
\]
From (3.12) one can verify that for $\Im(z)=v_0$ ($v_0$ is defined in the proof of Theorem 1)
\[
E\big|n^{1/4}x_n^TA^{-1}(z)\bar s\big|^2 \le M,
\]

Here we also use the fact that $\bar s=\bar s_j+s_j/n$ and the identity above (3.7) in [3]. By (2.36), (2.40), (2.39) and (3.9) we obtain
\[
E|d_{n1}|^2 \le \frac{M}{\sqrt n}E|x_n^TA_1^{-1}(z_1)s_1|^2 + Mv^{-12}n^{11/2}P\big(\lambda_{\max}(S)>\eta_r\text{ or }\lambda_{\min}(S_1)\le\eta_l\big) \le \frac{M}{\sqrt n}.
\]
This argument of course handles the terms $d_{n2}$ and $d_{n3}$ as well. Furthermore, we conclude from (2.36), (2.40), (2.39), (3.9), (2.31) and Hölder's inequality that
\[
E|d_{n4}|^2 \le \frac{M}{\sqrt n}\big(E|s_1^TA_1^{-1}(z_2)\bar s_1|^2\,E|x_n^TA_1^{-1}(z_1)s_1|^2\big)^{1/2} + Mv^{-12}n^{11/2}P\big(\lambda_{\max}(S)>\eta_r\text{ or }\lambda_{\min}(S_1)\le\eta_l\big) \le \frac{M}{\sqrt n}.
\]
Obviously, the argument for $d_{n4}$ also applies to $d_{n5}$ and $d_{n6}$. Thus, the proof of (4.3) is complete.

Acknowledgments

The author would like to thank the editor and two referees for their constructive comments, which have helped to improve the paper a great deal. G.M. Pan is partially supported by the Ministry of Education, Singapore, under grant #ARC 14/11.

References

[1] Anderson, T.W. (1984). An Introduction to Multivariate Statistical Analysis, 2nd ed. John Wiley & Sons.

[2] Bai, Z.D. and Silverstein, J.W. (1998). No eigenvalues outside the support of the limiting spectral distribution of large dimensional random matrices. Ann. Probab. 26 316-345.