

British Journal of Mathematical and Statistical Psychology (2002), 55, 247–261
© 2002 The British Psychological Society
www.bps.org.uk

Estimating the covariance function with functional data

Sik-Yum Lee¹*, Wenyang Zhang² and Xin-Yuan Song¹

¹Department of Statistics, The Chinese University of Hong Kong, Hong Kong
²Department of Statistics, London School of Economics & Political Science, UK

This paper describes a two-step procedure for estimating the covariance function and its eigenvalues and eigenfunctions in situations where the data are curves or functions. The first step produces initial estimates of eigenfunctions using a standard principal components analysis. At the second step, these initial estimates are smoothed via local polynomial fitting, with the bandwidth in the kernel function being selected by a data-driven procedure. The results of a simulation study and three real examples are presented to illustrate the performance of the proposed methodology.

1. Introduction

In many fields, the data are functions observed at certain values. Typical examples are curves of learning and forgetting, repeated test scores, and physiological responses over time. Other examples are given by Ramsay (1982). Functional data analysis, which can be regarded as a generalization of multivariate data analysis, has received a lot of attention in statistics. For example, Hart and Wehrly (1986) used a kernel regression approach to estimate the mean curve; Rice and Silverman (1991) and Ramsay and Dalzell (1991) studied the estimation of mean curves and used principal components analysis to extract salient features of curves; Leurgans, Moyeed, and Silverman (1993) extended canonical correlation analysis to random functions and showed that smoothing is needed in order to give sensible analyses; Brumback and Rice (1998) developed smoothing spline models for the analysis of nested and crossed samples of curves; Ramsay and Li (1998) proposed a nonparametric function estimation technique for identifying smooth monotone transformations. For an excellent introduction to some of these techniques, see Ramsay and Silverman (1997).

* Requests for reprints should be addressed to Sik-Yum Lee, Department of Statistics, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong (e-mail: sylee@sparc2.sta.cuhk.edu.hk).



However, except for the important contributions of Ramsay (1982) and Besse and Ramsay (1986), there are very few publications concerned with theoretical developments or applications in the psychometric literature. One reason may be the highly technical nature of the required statistical and mathematical background knowledge associated with existing methods.

The main objective of this paper is to propose a two-step procedure for estimating the covariance function with functional data as a non-technical complement to the work cited above. We calculate the raw estimates of the eigenfunctions via the standard principal components method in multivariate analysis, and then obtain smooth estimates of the eigenfunctions and eigenvalues via a one-dimensional smoothing technique. Hence, the proposed procedure is simple to understand and easy to implement. In this paper, we will use the local polynomial approach (see Cleveland, 1979; Ruppert & Wand, 1994) to complete the second step. This choice is motivated by its nice properties; for example, it is highly intuitive and simple to implement (Fan & Marron, 1994), achieves automatic boundary correction and possesses certain important optimal properties (Cheng, Fan, & Marron, 1997), as well as good empirical performance (Fan & Gijbels, 1996; Fan & Zhang, 1999). However, we emphasize that standard nonparametric methods, such as spline smoothing or cross-validation, can be applied.

The paper is organized as follows. The motivation for our method is given in Section 2. In Section 3, we propose a two-step procedure which applies local polynomial fitting to estimate the covariance function, its eigenfunctions and eigenvalues. In Section 4, the results of a simulation and three real examples are presented to illustrate the empirical performance of the proposed method. A discussion is given in Section 5.

2. Motivation

First consider a random sample of multivariate data from a population with mean zero and covariance matrix $\Sigma$. The classical statistical inference on $\Sigma$ is based on the sample covariance matrix. Since $\Sigma$ is symmetric and positive definite, we have the following orthogonal expansion:
$$\Sigma = \sum_{i=1}^{p} \lambda_i a_i a_i^T, \qquad (1)$$
where $\lambda_1 \ge \cdots \ge \lambda_p \ge 0$ are the eigenvalues of $\Sigma$, and $a_i = (a_{1i}, \ldots, a_{pi})^T$ is the normalized eigenvector corresponding to the eigenvalue $\lambda_i$. Hence, $\Sigma$ is determined by $\lambda_1, \ldots, \lambda_p$ and $a_1, \ldots, a_p$. In particular, the $(i, j)$th element of $\Sigma$ is given by
$$\sigma_{ij} = \sum_{k=1}^{p} \lambda_k a_{ik} a_{jk}. \qquad (2)$$

In addition, we have <strong>the</strong> following decomposition on <strong>the</strong> corresponding random<br />

vector X:<br />

X = Xp<br />

i = 1<br />

a i y i , (3)<br />

where y 1 , . . . , y p are uncorrelated random variables <strong>with</strong> zero mean and variances<br />

l 1 , . . . , l p respectively. It is well known that <strong>the</strong> decomposition (3) is not unique and is<br />

not identiable.<br />
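As a concrete numerical illustration of the expansion (1) (ours, not part of the original paper), the following Python sketch checks that a symmetric positive definite matrix is recovered from its eigenvalues and normalized eigenvectors; the matrix used is arbitrary:

```python
import numpy as np

# An arbitrary 3x3 symmetric positive definite matrix, standing in for Sigma.
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.5, 0.2],
                  [0.3, 0.2, 1.0]])

# eigh handles symmetric matrices; eigenvalues come back in ascending order.
lam, A = np.linalg.eigh(Sigma)
lam, A = lam[::-1], A[:, ::-1]  # reorder so that lambda_1 >= ... >= lambda_p

# Reassemble Sigma as in (1): the sum of lambda_i * a_i a_i^T.
Sigma_rebuilt = sum(l * np.outer(a, a) for l, a in zip(lam, A.T))
assert np.allclose(Sigma, Sigma_rebuilt)
```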

Now consider the situation with functional data, where we have a univariate stochastic process $X(t)$ and the data are curves. Without loss of generality, we assume $E(X(t)) = 0$. Let $\nu(u, v) = \mathrm{Cov}(X(u), X(v))$ be the covariance function. Viewing the random function $X(t)$ as a vector with infinite dimension and tracing the idea behind (1) and (2), it is natural to impose the following condition on $\nu(u, v)$: there exists a series of orthonormal functions $\phi_1(\cdot), \phi_2(\cdot), \ldots$ and $\mu_1 \ge \mu_2 \ge \cdots \ge 0$, such that the covariance function $\nu(u, v)$ is given by
$$\nu(u, v) = \sum_{i=1}^{\infty} \mu_i \phi_i(u)\phi_i(v) \qquad (4)$$
(see Loève, 1963). Here the $\mu_i$ play the role of the $\lambda_i$, and the $\phi_i(\cdot)$ play the role of the elements in the $a_i$. Moreover, if $\phi_i(\cdot)$ and $\mu_i$, $i = 1, 2, \ldots$, satisfy (4), we have a decomposition of $X(t)$ similar to (3). More specifically,
$$X(t) = \sum_{i=1}^{\infty} \langle X, \phi_i \rangle \phi_i(t) = \sum_{i=1}^{\infty} \eta_i \phi_i(t), \qquad (5)$$

where $\langle f, g \rangle = \int f(t)g(t)\,dt$. For $i \ne j$, and under some regularity conditions, we have
$$E\eta_i\eta_j = E\left\{\int X(t)\phi_i(t)\,dt \int X(s)\phi_j(s)\,ds\right\} = \iint E\{X(t)X(s)\}\,\phi_i(t)\phi_j(s)\,dt\,ds$$
$$= \iint \nu(t, s)\phi_i(t)\phi_j(s)\,dt\,ds = \sum_{k=1}^{\infty} \mu_k \iint \phi_k(t)\phi_k(s)\phi_i(t)\phi_j(s)\,dt\,ds$$
$$= \sum_{k=1}^{\infty} \mu_k \int \phi_k(t)\phi_i(t)\,dt \int \phi_k(s)\phi_j(s)\,ds = 0,$$
and $E\eta_i = 0$, so $\mathrm{Cov}(\eta_i, \eta_j) = 0$. This implies that $\eta_1, \eta_2, \ldots$ are uncorrelated. Similarly, for each $\eta_i$,
$$\mathrm{Var}(\eta_i) = \sum_{k=1}^{\infty} \mu_k \int \phi_k(t)\phi_i(t)\,dt \int \phi_k(s)\phi_i(s)\,ds = \mu_i.$$


Hence, (5) can be regarded as an extension of (3). Similarly to the multivariate case, the representations in (4) and (5) are neither unique nor identifiable. If, for any $i \ne j$, $\mu_i \ne \mu_j$, then $\phi_i(\cdot)$ is identifiable, except for a change of sign.

Our development is not hindered by the non-identification of (5), because our interest is in how to estimate $\nu(u, v)$ and how to find the orthonormal functions $\phi_1(\cdot), \phi_2(\cdot), \ldots$ that satisfy (4), from the observed functional data. We will call these orthonormal functions $\phi_1(\cdot), \phi_2(\cdot), \ldots$ the eigenfunctions, and the corresponding $\mu_1, \mu_2, \ldots$ the eigenvalues of $\nu(u, v)$.

3. A two-step estimation method

First, let us motivate our method with a $p$-dimensional random vector $X$. Let $\hat\lambda_1 \ge \cdots \ge \hat\lambda_p$ be the eigenvalues of the sample covariance matrix and $\hat a_i = (\hat a_{1i}, \ldots, \hat a_{pi})^T$ be the eigenvector corresponding to $\hat\lambda_i$. An estimate of the $(i, j)$th element of $\Sigma$ is given by
$$\hat\sigma_{ij} = \sum_{k=1}^{p} \hat\lambda_k \hat a_{ik}\hat a_{jk}.$$
This idea will be used to handle functional data as follows.



Now suppose we have a univariate stochastic process $X(t)$; without loss of generality, we assume $E(X(t)) = 0$ and $t \in [0, 1]$. Let $X_1(\cdot), \ldots, X_m(\cdot)$ be a collection of $m$ sample curves, each observed at $t_1, \ldots, t_n$. More explicitly, we have the data set $X_h(t_i)$, $i = 1, \ldots, n$, $h = 1, \ldots, m$. For simplicity, we denote $X_h(t_i)$ by $X_{ih}$. Obviously, for $i, k = 1, \ldots, n$ and $h, l = 1, \ldots, m$, we have $EX_{ih} = 0$ and
$$\mathrm{Cov}(X_{ih}, X_{kl}) = \begin{cases} 0, & h \ne l, \\ \nu(t_i, t_k), & h = l. \end{cases}$$
Our procedure for estimating $\nu(u, v)$, $\mu_i$ and $\phi_i(\cdot)$ involves the following two steps.

Step 1. Ignore temporarily the fact that the data are continuous functions and treat the problem as a standard principal components problem in multivariate analysis. Specifically, let $S$ be the sample covariance matrix with $(i, j)$th element
$$\frac{1}{m}\sum_{k=1}^{m} X_{ik}X_{jk}.$$
Compute $n\hat\mu_1 \ge \cdots \ge n\hat\mu_n$, the eigenvalues of $S$, and $n^{-1/2}\hat u_j = (n^{-1/2}\hat b_{1j}, \ldots, n^{-1/2}\hat b_{nj})^T$, the orthonormal eigenvector corresponding to the eigenvalue $n\hat\mu_j$, for $j = 1, \ldots, n$.

Step 2. For each $j$, we treat $(\hat b_{1j}, t_1), \ldots, (\hat b_{nj}, t_n)$ as a sample of size $n$ from the regression model
$$Y = \phi_j(t) + \varepsilon, \qquad (6)$$
where $Y$ plays the role of the $\hat b_{ij}$. For each $j = 1, \ldots, n$, smooth $\hat b_{1j}, \ldots, \hat b_{nj}$ with respect to $t_1, \ldots, t_n$, and obtain an estimate of the eigenfunction, $\hat\phi_j(\cdot)$.

The procedure for estimating the eigenvalue $\mu_j$ of the covariance function $\nu(\cdot, \cdot)$ is as follows. For $j = 1, 2, \ldots, n$, the $j$th eigenvalue of the covariance function of $X(t)$ is the variance of
$$\eta_j = \int X(t)\phi_j(t)\,dt \approx \frac{1}{n}\sum_{i=1}^{n} X(t_i)\phi_j(t_i),$$
and $\phi_j(t_i)$ can be estimated by $\hat\phi_j(t_i)$. Hence,
$$\eta_j \approx \tilde\eta_j = \frac{1}{n}\sum_{i=1}^{n} X(t_i)\hat\phi_j(t_i).$$
Furthermore, let
$$\tilde\eta_{kj} = \frac{1}{n}\sum_{i=1}^{n} X_{ki}\hat\phi_j(t_i), \qquad k = 1, \ldots, m.$$
The sample variance of $\tilde\eta_j$ is given by
$$\widehat{\mathrm{Var}}(\eta_j) = \frac{1}{m}\sum_{k=1}^{m}\left(\tilde\eta_{kj} - \frac{1}{m}\sum_{k=1}^{m}\tilde\eta_{kj}\right)^2.$$
This gives an estimate $\hat\mu_j$ for the variance of $\eta_j$ and hence the $j$th eigenvalue $\mu_j$ of the covariance function $\nu(u, v)$. Finally, an estimate of $\nu(u, v)$ is given by
$$\hat\nu(u, v) = \sum_{j=1}^{n} \hat\mu_j\hat\phi_j(u)\hat\phi_j(v). \qquad (7)$$

The initial estimator $\hat u_j$ is a rough estimator of $\phi_j(\cdot)$. In the second step, a much better estimator, $\hat\phi_j(\cdot)$, is obtained via an application of a smoothing technique to $(\hat b_{1j}, t_1), \ldots, (\hat b_{nj}, t_n)$. By this smoothing step, information from neighbouring points is pooled together to improve the efficiency of the raw initial estimator. Many standard nonparametric methods, such as wavelet thresholding (Donoho, 1995; Donoho & Johnstone, 1995), spline smoothing (Eubank, 1988; Wahba, 1990; Green & Silverman, 1994), or local polynomial modelling (Cleveland, 1979; Ruppert & Wand, 1994), can be used to find $\hat\phi_j(\cdot)$ in the second step. Therefore, the proposed two-step procedure is conceptually very simple; it just involves the computation of eigenvalues and eigenvectors of a sample covariance matrix using standard principal components analysis, and then a standard one-dimensional nonparametric smoothing. Implementation of this procedure is quite straightforward.
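To make the two steps concrete, here is a minimal Python sketch of the procedure for equally spaced points. It is our own illustrative reconstruction, not the authors' code: the smoothing step is delegated to a generic `smooth` callable (a local polynomial version is sketched after (10) below), and the eigenvalue estimates follow the sample-variance formula above.

```python
import numpy as np

def two_step_estimate(X, t, smooth):
    """Two-step estimation of eigenfunctions and eigenvalues.

    X      : (m, n) array; row k holds curve k observed at the points t.
    t      : (n,) array of equally spaced design points in [0, 1].
    smooth : callable (y, t) -> smoothed values, e.g. a local polynomial fit.
    Returns (mu_hat, phi_hat), where phi_hat[:, j] is the j-th smoothed eigenfunction.
    """
    m, n = X.shape

    # Step 1: standard PCA on the sample covariance matrix S.
    S = X.T @ X / m                          # (i, j) element: (1/m) sum_k X_ik X_jk
    eigval, eigvec = np.linalg.eigh(S)
    order = np.argsort(eigval)[::-1]         # descending: n*mu_1 >= ... >= n*mu_n
    b = np.sqrt(n) * eigvec[:, order]        # raw u_j, scaled so n^{-1/2} u_j is orthonormal

    # Step 2: smooth each raw eigenvector over t to obtain phi_hat_j(t_i).
    phi_hat = np.column_stack([smooth(b[:, j], t) for j in range(n)])

    # Eigenvalue estimates: sample variance of eta_kj = (1/n) sum_i X_ki phi_hat_j(t_i).
    eta = X @ phi_hat / n                    # (m, n); row k holds eta_k1, ..., eta_kn
    mu_hat = eta.var(axis=0)                 # (1/m) sum_k (eta_kj - mean_j)^2
    return mu_hat, phi_hat

def cov_estimate(mu_hat, phi_hat):
    """Covariance estimate (7), evaluated on the design grid."""
    return (phi_hat * mu_hat) @ phi_hat.T
```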

A nonparametric method for estimating eigenvalues/eigenfunctions of a covariance structure with curve data has been developed by Rice and Silverman (1991). In their method, a class of estimates was obtained by bounding the roughness of the eigenfunction, via maximizing $u^T S u$ subject to $\|u\| = 1$ and $u^T D u \le \gamma$, where $\gamma$ is a smoothing parameter and $D$ is a roughening matrix, say $F^T F$ with a second-differencing operator $F$. The smoothing parameter $\gamma$ was selected by cross-validation based on a prediction viewpoint. After obtaining the first $j$ eigenfunctions, the $(j + 1)$th eigenfunction was obtained by repeating the procedure in the orthogonal complement of the space generated by the first $j$ eigenfunctions. This process was continued until all $n$ eigenfunctions were obtained. Evaluation of $\gamma$ via cross-validation is required to obtain each of the eigenfunctions. Hence, this method is not two-step in nature and is quite different from our simple two-step procedure.

Motivated by many of its nice properties, we use the local polynomial approach to compute $\hat\phi_j(\cdot)$ in the second step of our procedure. It has been shown that odd-order polynomial fits are preferable to even-order polynomial fits (Fan & Gijbels, 1996), so we use a local polynomial of odd order $q$ to estimate the underlying function. For each point $t_0$, we approximate the underlying function $\phi_j(t)$ locally by
$$\phi_j(t) \approx \sum_{k=0}^{q} \frac{1}{k!}\,\phi_j^{(k)}(t_0)(t - t_0)^k = \sum_{k=0}^{q} \beta_k (t - t_0)^k, \qquad (8)$$
for $t$ in a neighbourhood of $t_0$. This leads to the following local least squares problem: minimize
$$\sum_{i=1}^{n} \left\{ \hat b_{ij} - \sum_{k=0}^{q} \beta_k (t_i - t_0)^k \right\}^2 K_{h_j}(t_i - t_0), \qquad (9)$$


<strong>with</strong> respect to , k , k = 0, . . . , q, for a given kernel <strong>function</strong> K and a bandwidth h j ,<br />

where K h ( · ) = K( · /h)/h. Let W(h) = diag(K h (t 1 ± t 0 ), . . . , K h (t n ± t 0 )), and<br />

0<br />

1 (t 1 ± t 0 ) . . . (t 1 ± t 0 ) q 1<br />

T q =<br />

. . .<br />

. . .<br />

.<br />

B<br />

C<br />

@<br />

. . A .<br />

1 (t n ± t 0 ) . . . (t n ± t 0 ) q<br />

The solution of <strong>the</strong> least squares problem (9) gives <strong>the</strong> following estimate of <strong>the</strong><br />

underlying <strong>function</strong> at t 0 . More specically, <strong>the</strong> estimate ˆf j (t 0 ) is equal to <strong>the</strong> rst<br />

element of <strong>the</strong> matrix:<br />

(T T q W(h j )T q ) ± 1 T T q W(h j ) ˆû j . (10)<br />

Since ˆû j , j = 1, . . . , n are orthonormal, after this local polynomial smoothing, ˆfj ( · ),<br />

j = 1, . . . , n are approximately orthonormal.
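A direct implementation of (9) and (10) might look as follows. This is a sketch of ours, assuming a fixed user-supplied bandwidth (the paper instead selects $h_j$ by a data-driven rule) and using the Epanechnikov kernel that appears in the simulation study of Section 4:

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(t) = 0.75 * (1 - t^2)_+ ."""
    return 0.75 * np.maximum(1.0 - u ** 2, 0.0)

def local_poly_smooth(y, t, h, q=1, kernel=epanechnikov):
    """Local polynomial fit of odd order q, evaluated at each design point.

    For every t0 in t, solves the weighted least squares problem (9) and
    returns the first element of (10), i.e. the fitted value at t0.
    The bandwidth h must be large enough that at least q + 1 points get
    positive weight at every t0, otherwise the normal equations are singular.
    """
    y, t = np.asarray(y, float), np.asarray(t, float)
    fitted = np.empty_like(y)
    for idx, t0 in enumerate(t):
        d = t - t0                                     # t_i - t0
        w = kernel(d / h) / h                          # K_h(t_i - t0)
        Tq = np.vander(d, q + 1, increasing=True)      # rows: 1, (t_i-t0), ..., (t_i-t0)^q
        WTq = Tq * w[:, None]                          # W(h) T_q
        beta = np.linalg.solve(Tq.T @ WTq, WTq.T @ y)  # normal equations of (9)
        fitted[idx] = beta[0]                          # first element of (10)
    return fitted
```

With this smoother, the `smooth` argument of the earlier `two_step_estimate` sketch would be, for instance, `lambda y, t: local_poly_smooth(y, t, h=0.15, q=3)`; the value 0.15 is an arbitrary placeholder for the data-driven choice.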



In the smoothing procedure, the bandwidth $h_j$ plays an important role. In general, a bandwidth $h_j = 0$ basically results in interpolating the data, and hence leads to the most complex model. A bandwidth $h_j = \infty$ corresponds to globally fitting a polynomial of degree $q$, and hence leads to the simplest model. Although we use a data-driven method (see Fan & Gijbels, 1995; Zhang & Lee, 2000) to select the bandwidth in this paper, there are other procedures, such as cross-validation, which can be implemented in S-PLUS (see Venables & Ripley, 1999).

As pointed out by Castro, Lawton, and Sylvestre (1986), if the points $t_i$ are not equally spaced, the following adjustment is required in the first step of the two-step procedure to ensure the invariance of the eigenfunctions with respect to different choices of design. Using the quadrature rule (see Baker, 1977), we obtain
$$w_i = \int_0^1 l_i(t)\,dt, \quad \text{where } l_i(t) = \prod_{j=1,\, j \ne i}^{n} \frac{t - t_j}{t_i - t_j}. \qquad (11)$$
Let $A = \mathrm{diag}(w_1, \ldots, w_n)$, and $\hat\mu_i$ the $i$th eigenvalue of $A^{1/2} S A^{1/2}$. Let $\tilde u_i$ be the $i$th orthonormal eigenvector of $A^{1/2} S A^{1/2}$ corresponding to the eigenvalue $\hat\mu_i$. The initial first-step estimates are obtained via $\hat u_i = A^{-1/2}\tilde u_i$, $i = 1, \ldots, n$. These estimates are then smoothed at the second step to obtain the final estimates of the eigenfunctions. For unequally spaced sampling all inner products must use the quadrature rule, so when we estimate the $j$th eigenvalue $\mu_j$ of the covariance function $\nu(u, v)$, $\tilde\eta_{kj}$ becomes
$$\tilde\eta_{kj} = \sum_{i=1}^{n} X_{ki}\,\hat\phi_j(t_i)\,w_i.$$

The modification of the proposed two-step procedure for dealing with the unequally spaced case is minor. In the first step, after obtaining the diagonal matrix $A$ that contains the weights $w_i$, we obtain the eigenvalues and orthonormal eigenvectors of $A^{1/2} S A^{1/2}$ and the initial first-step estimates of the eigenfunctions, $\hat u_i$, $i = 1, \ldots, n$. At the second step, for producing the final estimates, the standard one-dimensional nonparametric smoothing is conducted on the basis of the regression model (6), with the sample $(\hat b_{1j}, t_1), \ldots, (\hat b_{nj}, t_n)$ for each $j$. Again, this step can be completed by using a local polynomial data-driven method or by software packages such as SAS and S-PLUS.
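The weights in (11) are integrals of the Lagrange basis polynomials and can be computed exactly with numpy's polynomial utilities. The sketch below is our own illustration; note that the Lagrange construction is numerically ill-conditioned for large $n$, so it is best suited to small designs such as the five time points in the last example of Section 4.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def quadrature_weights(t):
    """Quadrature weights (11): w_i = integral over [0, 1] of l_i(t)."""
    t = np.asarray(t, float)
    n = len(t)
    w = np.empty(n)
    for i in range(n):
        # Build l_i(t) = prod_{j != i} (t - t_j)/(t_i - t_j) as a coefficient array.
        coeffs = np.array([1.0])
        for j in range(n):
            if j != i:
                coeffs = P.polymul(coeffs, [-t[j], 1.0]) / (t[i] - t[j])
        anti = P.polyint(coeffs)                    # antiderivative of l_i
        w[i] = P.polyval(1.0, anti) - P.polyval(0.0, anti)
    return w
```

The adjusted first step would then eigendecompose $A^{1/2} S A^{1/2}$, e.g. with `A_half = np.sqrt(quadrature_weights(t))`, and map each eigenvector `v` back via `v / A_half` before smoothing.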

4. Numerical illustrations

4.1. A simulation study

The following example is used to illustrate the performance of our method for estimating covariance functions, their eigenfunctions and eigenvalues. Consider
$$X(t) = \sqrt{2}\sin(\pi t)\,y_1 + \sqrt{2}\cos(\pi t)\,y_2, \qquad t \in [0, 1].$$
The random variables $y_1$ and $y_2$ in this example are independently and normally distributed with mean zero, and their variances are 2.0 and 1.0, respectively. The numbers of sample points for each sample curve were selected to be $n = 15$, 50, 100 and 200; and for each $n$ we took $m = 50$, 100 and 200. Sample points of the sample curve were observed at (i) equally spaced points $t_i = i/(n + 1)$, $i = 1, \ldots, n$; and (ii) unequally spaced points obtained from sorted random points simulated from a uniform distribution on $[0, 1]$. For cases (i) and (ii) with different $n$ and $m$, the number of replications was 200.


The true covariance function of $X(t)$ in this example is
$$\nu(u, v) = 2\,\bigl(\sqrt{2}\sin(\pi u)\bigr)\bigl(\sqrt{2}\sin(\pi v)\bigr) + \bigl(\sqrt{2}\cos(\pi u)\bigr)\bigl(\sqrt{2}\cos(\pi v)\bigr).$$
The eigenfunctions are equal to $\{\sqrt{2}\sin(\pi t), \sqrt{2}\cos(\pi t)\}$. Note that these eigenfunctions are not unique; they allow sign differences. The mean integrated square error (MISE),
$$D^{-1} E \iint (\hat\nu(u, v) - \nu(u, v))^2\,du\,dv,$$
where $D$ is the support set of $(u, v)$, is employed as the criterion to evaluate the performance of the estimator.
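The simulation design translates directly into code. The following Python sketch, our reconstruction reusing `two_step_estimate`, `cov_estimate` and `local_poly_smooth` from the sketches in Section 3, generates one equally spaced replication and approximates the integrated squared error on the design grid (the bandwidth below is an arbitrary placeholder for the data-driven choice):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 100
t = np.arange(1, n + 1) / (n + 1)            # equally spaced points t_i = i/(n + 1)

# One replication of X(t) = sqrt(2) sin(pi t) y1 + sqrt(2) cos(pi t) y2.
y1 = rng.normal(0.0, np.sqrt(2.0), size=m)   # Var(y1) = 2.0
y2 = rng.normal(0.0, 1.0, size=m)            # Var(y2) = 1.0
phi1 = np.sqrt(2) * np.sin(np.pi * t)
phi2 = np.sqrt(2) * np.cos(np.pi * t)
X = np.outer(y1, phi1) + np.outer(y2, phi2)  # (m, n) matrix of sample curves

nu_true = 2.0 * np.outer(phi1, phi1) + np.outer(phi2, phi2)

mu_hat, phi_hat = two_step_estimate(
    X, t, smooth=lambda y, tt: local_poly_smooth(y, tt, h=0.15, q=1))
nu_hat = cov_estimate(mu_hat, phi_hat)
ise = np.mean((nu_hat - nu_true) ** 2)       # grid approximation of the ISE; averaging
                                             # over replications approximates the MISE
```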

The proposed procedure was used to estimate the covariance functions and find their eigenfunctions and eigenvalues. A local linear fit ($q = 1$) and a local cubic fit ($q = 3$) for the regression curve are considered, and the kernel function is taken to be the Epanechnikov kernel $K(t) = 0.75(1 - t^2)_+$. The MISEs of the estimators are reported in Table 1. The following phenomena are observed from this table: (i) Generally, the MISE improved significantly with increasing sample size $m$. (ii) The MISE with $n = 50$ shows a significant improvement over the case with $n = 15$. However, changes in MISE are very small for situations with $n$ larger than 50. Most probably, the reason for this phenomenon is that, with a significantly large number of points, the information between adjacent points does not contribute to improving the accuracy of the estimator. Differences between the linear fit and the cubic fit are very small for large $n$. (iii) To achieve more accurate results, it is more important to have a large $m$ than a large $n$. (iv) The differences in MISE between the equally and unequally spaced cases are not substantial, especially with moderate sample sizes. Roughly speaking, the performance of a cubic fit for $n = 15$ and $m = 50$ is acceptable.

Table 1. Mean integrated square errors in simulation study

                                   n
                m    q     15      50      100     200
  Equally       50   1    0.316   0.274   0.275   0.275
  spaced             3    0.288   0.279   0.277   0.275
                100  1    0.198   0.137   0.137   0.137
                     3    0.145   0.139   0.138   0.138
                200  1    0.130   0.063   0.063   0.062
                     3    0.066   0.063   0.063   0.063
  Unequally     50   1    0.344   0.303   0.275   0.266
  spaced             3    0.325   0.305   0.277   0.266
                100  1    0.214   0.160   0.140   0.132
                     3    0.178   0.159   0.141   0.132
                200  1    0.139   0.083   0.066   0.061
                     3    0.093   0.080   0.066   0.061

From the 200 replications, we select the estimator with the median performance for a further illustration of the performance of the procedure. Since the results produced by the cubic fit are similar, only results obtained from the linear fit are presented. Estimates of eigenvalues with the linear fit for some choices of $n$ and $m$ are presented in Table 2. As expected, under the same $m$, results for different large $n$ are similar, so they are not presented. It seems that the eigenvalue estimates are reasonably accurate. Figure 1 depicts the resulting estimated eigenfunctions with $n = 200$ and $m = 200$ for the equally spaced and unequally spaced cases. It can be seen that the true and the estimated eigenfunctions are close to each other. To study empirically whether the orthogonality among the original components is affected by separate smoothing, we compute
$$(\hat\phi_j, \hat\phi_{j'}) = \sum_{i=1}^{n} \hat\phi_j(t_i)\,\hat\phi_{j'}(t_i)$$
for all pairs of $\hat\phi_j$ and $\hat\phi_{j'}$. For the equally spaced case, $(\hat\phi_1, \hat\phi_2)$ for $(n, m) = (15, 50)$, $(50, 50)$, $(100, 100)$ and $(200, 200)$ are equal to $-3.65 \times 10^{-4}$, $-1.88 \times 10^{-6}$, $1.18 \times 10^{-7}$ and $1.69 \times 10^{-8}$, respectively. It seems that for moderate sizes $n$ and $m$, orthogonality is not seriously affected by separate smoothing of the different components. A similar phenomenon is also observed for the unequally spaced case.

Table 2. Estimates of eigenvalues in simulation study

                    True                     (n, m)
                    value   (15, 50)  (50, 50)  (100, 100)  (200, 200)
  Equally spaced     2.0      2.17      2.42      1.89        2.20
                     1.0      1.47      0.73      0.98        1.05
  Unequally          2.0      3.05      1.77      2.07        2.12
  spaced             1.0      0.81      0.81      0.89        1.09

4.2. Real examples

We first illustrate our method by an application to the tongue data given in Besse and Ramsay (1986, pp. 288–289). The data consist of 42 records of tongue dorsum movements collected by Munhall (1984) using an ultrasound sensing technique developed by Keller and Ostry (1983). It will be assumed that the interval of observation has been normalized to $[0, \pi]$, and the sampled values are observations at 13 equally spaced points. See Besse and Ramsay (1986) for more detailed background on this data set. The proposed two-step procedure with a cubic fit at the second step is used to find the eigenfunctions of the covariance function and their eigenvalues.

Figure 2 shows the first three eigenfunctions of the sample functions. Using
$$\int \mathrm{Var}(X(t))\,dt$$
to measure the variation of the random function $X(t)$, we find that the first three eigenfunctions account for 89.5%, 8.4% and 1.4% of the overall variation of the sample functions, respectively. These three eigenfunctions account for a total of 99.3% of the overall variance. It can be seen from Fig. 2 that the first eigenfunction can be regarded as constant (compared with the other eigenfunctions), the second is close to $\sin(t)$, and the third is close to $\cos(t)$. Our results are very similar to those obtained by Besse and Ramsay (1986), in which the eigenvalues respectively account for 90.0%, 8.3% and 1.5% of the variation, and the eigenfunctions are also very similar (see Besse & Ramsay, 1986, Fig. 5). At a reviewer's suggestion, the estimated covariance function is shown in Fig. 3.



Figure 1. Estimated eigenfunctions for (a) the equally and (b) the unequally spaced cases. The solid curves are (i) $\sqrt{2}\sin(\pi t)$, (ii) $-\sqrt{2}\cos(\pi t)$.

Hence the analysis via the proposed two-step procedure also suggests the following model, as in Besse and Ramsay (1986):
$$X(t) = y_1 + y_2\sin(t) + y_3\cos(t) + e(t),$$
and leads to the similar conclusion that simple harmonic motion describes tongue dorsum behaviour adequately on $[0, \pi]$.

As a second illustrative example, we reanalyse the curves of sagittal hip angles in Rice and Silverman (1991). In this data set, $n = 20$ and $m = 39$, and observations are taken over a gait cycle consisting of one (double) step taken by each child; see Olshen, Biden, Wyatt, and Sutherland (1989) for more details. Using the proposed two-step procedure with a cubic fit at the second step, we obtained the first four eigenfunctions. The corresponding eigenvalues respectively account for 70.6%, 12.1%, 8.4% and 3.8% (a total of 94.9%) of the total variability. These results are very close to those reported in Rice and Silverman (1991). The eigenfunctions are depicted in Fig. 4. Comparing this figure with Fig. 2 in Rice and Silverman (1991), it is clear that the eigenfunctions obtained from our procedure and from theirs are similar. The fourth eigenfunction is not very close to the raw eigenvector, which may indicate that the amount of noise in the raw eigenvector filtered out by the local polynomial smoothing is substantial.



Figure 2. The first three eigenfunctions of the tongue data; the first, second and third eigenfunctions are represented by the solid, dotted and dashed curves, respectively.

Figure 3. Covariance function of the tongue data. The height of the surface indicates the amount of covariance.



Figure 4. First four raw eigenvectors (dots) and smoothed eigenfunctions (solid curves) of the hip angle data.

In any case, since it is the least important eigenfunction, statistical conclusions drawn on the basis of the analysis are not affected. For completeness, the estimated covariance function is displayed in Fig. 5. Detailed interpretations of the results, which are available in Rice and Silverman (1991), are not presented, to save space.

Our final example is based on a variable selected from a study (Mackinnon et al., 1991) of school smoking-prevention programmes based on social-psychological principles. These social influence programmes were designed to teach social skills and to create a social environment less receptive to drug use. If they work as planned, then favourable changes in mediating variables for drug use and behavioural intentions are indicators of success. One of the main objectives of the study was to evaluate the impact of a social influence based programme on the mediating variables it was designed to change. The data were obtained from public middle schools and junior high schools in midwestern states of the USA. As an illustration, we considered a variable $X(t)$ measuring students' control of their cigarette consumption (the original question was: 'If your best friend offered you a cigarette, how hard would it be to refuse the offer?'), measured at five time points for students from 50 schools. At $t = t_0$, all students were in the seventh grade; $t_1$ was six months later, and the other time intervals were one year.



Figure 5. Covariance function of the hip data. The height of the surface indicates the amount of covariance.

Hence, the time points are not equally spaced. Using the proposed procedure with the adjustment for unequally spaced time points, we found that the first two eigenvalues respectively accounted for 68.3% and 17.0% of the total variability. The corresponding eigenfunctions, obtained via local polynomial smoothing with a linear fit, are displayed in Fig. 6. Roughly, these eigenfunctions can be regarded as quadratic. They reveal the change in students' control of cigarette consumption over time. From the first eigenfunction, it seems that the time at which students' best friends have the strongest influence is between half a year and one year after seventh grade, after which the best friend's influence decreases. However, it should be pointed out that this is just an illustrative example with only five time points, and its interpretation should be treated with caution. More data at more time points are required for a more substantive conclusion to be drawn.

Several examples given in Ramsay and Silverman (1997) have also been analysed. We observe that the results obtained via the proposed two-step procedure are similar to those reported in Ramsay and Silverman (1997).

5. Discussion

Functional data analysis is particularly useful for investigating changes over time for characteristics that are measured continuously or repeatedly for each object. Examples of this type of research are longitudinal studies, and analyses of growth curves and learning curves. It is not straightforward to extend classical multivariate data analysis to functional data analysis. As a complement to the existing work, this paper proposes a simple two-step procedure to estimate the covariance function and its eigenvalues and eigenfunctions.



Figure 6. The first two eigenfunctions of the data on the school smoking-prevention programme; the first and second functions are represented by the solid and dotted curves, respectively.

At the first step, we compute the eigenvectors and eigenvalues of certain sample covariance matrices. This task can be completed easily and efficiently using standard software. The second step only involves some one-dimensional nonparametric smoothing, which is simpler than the multi-dimensional smoothing required by other methods. Our simulation results indicate that the results produced are quite accurate. The proposed procedure also gives very similar results to those in Besse and Ramsay (1986) and Rice and Silverman (1991) in analysing some real examples. Hence, it is simple to understand, easy to implement, and efficient in producing reliable results.

Owing to the optimal statistical properties associated with local polynomial smoothing, we apply it to obtain the final eigenfunctions at the second step of the proposed procedure. In fact, other methods such as spline smoothing can be used. Code for most of these more traditional methods can be found in existing software such as SAS, S-PLUS and Matlab. For practitioners, the smoothing at the second step can be completed via the above standard software. Hence, the proposed procedure can be used by those with less technical training. Owing to the pioneering work of Ramsay and Dalzell (1991), Ramsay and Silverman (1997) and others, we expect that functional data analysis will gain in popularity, and that standard software will include an option to perform local polynomial smoothing in the near future.

Acknowledgements

The research for this paper was partially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. CUHK 4346/01H), and a direct grant from CUHK. The authors are greatly indebted to the Editor and two reviewers for some valuable comments for the improvement of the paper, and thankful to Drs M. A. Pentz and C. P. Chou for providing the data in the last real example. The assistance of Esther L. S. Tam in preparing the manuscript is also acknowledged.

References

Baker, C. T. H. (1977). The numerical treatment of integral equations. Oxford: Clarendon Press.
Besse, P., & Ramsay, J. O. (1986). Principal components analysis of sampled functions. Psychometrika, 51, 285–311.
Brumback, B. A., & Rice, J. A. (1998). Smoothing spline models for the analysis of nested and crossed samples of curves (with discussion). Journal of the American Statistical Association, 93, 961–994.
Castro, P. E., Lawton, W. H., & Sylvestre, E. A. (1986). Principal modes of variation for processes with continuous sample curves. Technometrics, 28, 329–337.
Cheng, M. Y., Fan, J., & Marron, J. S. (1997). On automatic boundary corrections. Annals of Statistics, 25, 1691–1708.
Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, 74, 829–836.
Donoho, D. L. (1995). Nonlinear solution of linear inverse problems by wavelet-vaguelette decomposition. Applied and Computational Harmonic Analysis, 2, 101–126.
Donoho, D. L., & Johnstone, I. M. (1995). Adapting to unknown smoothness via wavelet shrinkage. Journal of the American Statistical Association, 90, 1200–1224.
Eubank, R. L. (1988). Spline smoothing and nonparametric regression. New York: Marcel Dekker.
Fan, J., & Gijbels, I. (1995). Data-driven bandwidth selection in local polynomial fitting: Variable bandwidth and spatial adaptation. Journal of the Royal Statistical Society, Series B, 57, 371–394.
Fan, J., & Gijbels, I. (1996). Local polynomial modelling and its applications. London: Chapman & Hall.
Fan, J., & Marron, J. S. (1994). Fast implementations of nonparametric curve estimators. Journal of Computational and Graphical Statistics, 3, 35–56.
Fan, J., & Zhang, W. (1999). Statistical estimation in varying coefficient models. Annals of Statistics, 27, 1491–1518.
Green, P. J., & Silverman, B. W. (1994). Nonparametric regression and generalized linear models: A roughness penalty approach. London: Chapman & Hall.
Hart, J. D., & Wehrly, T. E. (1986). Kernel regression estimation using repeated measurements data. Journal of the American Statistical Association, 81, 1080–1088.
Keller, E., & Ostry, D. J. (1983). Computerized measurement of tongue dorsum movements with pulsed-echo ultrasound. Journal of the Acoustical Society of America, 73, 1309–1315.
Leurgans, S. E., Moyeed, R. A., & Silverman, B. W. (1993). Canonical correlation analysis when the data are curves. Journal of the Royal Statistical Society, Series B, 55, 725–740.
Loève, M. (1963). Probability theory (3rd ed.). Princeton, NJ: D. Van Nostrand.
Mackinnon, D. P., Johnson, C. A., Pentz, M. A., Dwyer, J. H., Hansen, W. B., Flay, B. R., & Wang, E. Y.-I. (1991). Mediating mechanism in a school-based drug prevention program: First year effects of the Midwestern Prevention Project. Health Psychology, 10, 164–172.
Munhall, K. G. (1984). Temporal adjustment in speech motor control: Evidence from laryngeal kinematics. Unpublished doctoral dissertation, McGill University.
Olshen, R. A., Biden, E. N., Wyatt, M. P., & Sutherland, D. H. (1989). Gait analysis and the bootstrap. Annals of Statistics, 17, 1419–1440.
Ramsay, J. O. (1982). When the data are functions. Psychometrika, 47, 379–396.
Ramsay, J. O., & Dalzell, C. J. (1991). Some tools for functional data analysis (with discussion). Journal of the Royal Statistical Society, Series B, 53, 539–572.
Ramsay, J. O., & Li, X. (1998). Curve registration. Journal of the Royal Statistical Society, Series B, 60, 351–363.
Ramsay, J. O., & Silverman, B. W. (1997). Functional data analysis. New York: Springer-Verlag.
Rice, J. A., & Silverman, B. W. (1991). Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society, Series B, 53, 233–243.
Ruppert, D., & Wand, M. P. (1994). Multivariate locally weighted least squares regression. Annals of Statistics, 22, 1346–1370.
Venables, W. N., & Ripley, B. D. (1999). Modern applied statistics with S-PLUS (3rd ed.). New York: Springer-Verlag.
Wahba, G. (1990). Spline models for observational data. Philadelphia: Society for Industrial and Applied Mathematics.
Zhang, W., & Lee, S.-Y. (2000). Variable bandwidth selection in varying coefficient models. Journal of Multivariate Analysis, 74, 116–134.

Received 22 August 2000; revised version received 14 August 2001
