British Journal of Mathematical and Statistical Psychology (2002), 55, 247–261
© 2002 The British Psychological Society
www.bps.org.uk
Estimating the covariance function with functional data

Sik-Yum Lee 1 *, Wenyang Zhang 2 and Xin-Yuan Song 1
1 Department of Statistics, The Chinese University of Hong Kong, Hong Kong
2 Department of Statistics, London School of Economics & Political Science, UK
This paper describes a two-step procedure for estimating the covariance function and its eigenvalues and eigenfunctions in situations where the data are curves or functions. The first step produces initial estimates of the eigenfunctions using a standard principal components analysis. At the second step, these initial estimates are smoothed via local polynomial fitting, with the bandwidth in the kernel function being selected by a data-driven procedure. The results of a simulation study and three real examples are presented to illustrate the performance of the proposed methodology.
1. Introduction
In many fields, the data are functions observed at certain values. Typical examples are curves of learning and forgetting, repeated test scores, and physiological responses over time. Other examples are given by Ramsay (1982). Functional data analysis, which can be regarded as a generalization of multivariate data analysis, has received a lot of attention in statistics. For example, Hart and Wehrly (1986) used a kernel regression approach to estimate the mean curve; Rice and Silverman (1991) and Ramsay and Dalzell (1991) studied the estimation of mean curves and used principal components analysis to extract salient features of curves; Leurgans, Moyeed, and Silverman (1993) extended canonical correlation analysis to random functions and showed that smoothing is needed in order to give sensible analyses; Brumback and Rice (1998) developed smoothing spline models for the analysis of nested and crossed samples of curves; Ramsay and Li (1998) proposed a nonparametric function estimation technique for identifying smooth monotone transformations. For an excellent introduction to some of these techniques, see Ramsay and Silverman (1997). However, except for the

* Requests for reprints should be addressed to Sik-Yum Lee, Department of Statistics, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong (e-mail: sylee@sparc2.sta.cuhk.edu.hk).
important contributions of Ramsay (1982) and Besse and Ramsay (1986), there are very few publications concerned with theoretical developments or applications in the psychometric literature. One reason may be the highly technical nature of the statistical and mathematical background knowledge required by existing methods.

The main objective of this paper is to propose a two-step procedure for estimating the covariance function with functional data, as a non-technical complement to the work cited above. We calculate raw estimates of the eigenfunctions via the standard principal components method of multivariate analysis, and then obtain smooth estimates of the eigenfunctions and eigenvalues via a one-dimensional smoothing technique. Hence, the proposed procedure is simple to understand and easy to implement.
In this paper, we will use the local polynomial approach (see Cleveland, 1979; Ruppert & Wand, 1994) to complete the second step. This choice is motivated by its nice properties: for example, it is highly intuitive and simple to implement (Fan & Marron, 1994), achieves automatic boundary correction and possesses certain important optimality properties (Cheng, Fan, & Marron, 1997), as well as good empirical performance (Fan & Gijbels, 1996; Fan & Zhang, 1999). However, we emphasize that other standard nonparametric methods, such as spline smoothing with cross-validation, can also be applied.
The paper is organized as follows. The motivation for our method is given in Section 2. In Section 3, we propose a two-step procedure which applies local polynomial fitting to estimate the covariance function, its eigenfunctions and eigenvalues. In Section 4, the results of a simulation and three real examples are presented to illustrate the empirical performance of the proposed method. A discussion is given in Section 5.
2. Motivation
First consider a random sample of multivariate data from a population with mean zero and covariance matrix \Sigma. Classical statistical inference on \Sigma is based on the sample covariance matrix. Since \Sigma is symmetric and positive definite, we have the following orthogonal expansion:

    \Sigma = \sum_{i=1}^{p} \lambda_i a_i a_i^T,    (1)

where \lambda_1 \ge \cdots \ge \lambda_p \ge 0 are the eigenvalues of \Sigma, and a_i = (a_{1i}, \ldots, a_{pi})^T is the normalized eigenvector corresponding to the eigenvalue \lambda_i. Hence, \Sigma is determined by \lambda_1, \ldots, \lambda_p and a_1, \ldots, a_p. In particular, the (i, j)th element of \Sigma is given by

    \sigma_{ij} = \sum_{k=1}^{p} \lambda_k a_{ik} a_{jk}.    (2)

In addition, we have the following decomposition of the corresponding random vector X:

    X = \sum_{i=1}^{p} a_i y_i,    (3)

where y_1, \ldots, y_p are uncorrelated random variables with zero mean and variances \lambda_1, \ldots, \lambda_p, respectively. It is well known that the decomposition (3) is not unique and is not identifiable.
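As a quick numerical check of the expansion (1), the sketch below eigendecomposes an arbitrary 4 x 4 positive definite matrix (the matrix itself is illustrative, not from the paper) and reconstructs it from its eigenvalues and eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary positive definite covariance matrix, for illustration only.
A = rng.standard_normal((4, 4))
Sigma = A @ A.T

# Eigendecomposition: eigh returns eigenvalues in ascending order,
# so reverse to get lambda_1 >= ... >= lambda_p as in (1).
lam, a = np.linalg.eigh(Sigma)
lam, a = lam[::-1], a[:, ::-1]

# Reconstruct Sigma = sum_i lambda_i a_i a_i^T.
Sigma_rec = sum(lam[i] * np.outer(a[:, i], a[:, i]) for i in range(4))
```

The reconstruction recovers Sigma exactly (up to rounding), and the columns of `a` are orthonormal, mirroring the role of the a_i in (1)–(3).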
Now consider the situation with functional data, where we have a univariate stochastic process X(t) and the data are curves. Without loss of generality, we assume E(X(t)) = 0. Let \nu(u, v) = Cov(X(u), X(v)) be the covariance function. Viewing the random function X(t) as a vector of infinite dimension and tracing the idea behind (1) and (2), it is natural to impose the following condition on \nu(u, v): there exists a series of orthonormal functions f_1(\cdot), f_2(\cdot), \ldots and \mu_1 \ge \mu_2 \ge \cdots \ge 0 such that the covariance function \nu(u, v) is given by

    \nu(u, v) = \sum_{i=1}^{\infty} \mu_i f_i(u) f_i(v)    (4)

(see Loeve, 1963). Here the \mu_i play the role of the \lambda_i, and the f_i(\cdot) play the role of the elements of the a_i. Moreover, if f_i(\cdot) and \mu_i, i = 1, 2, \ldots, satisfy (4), we have a decomposition of X(t) similar to (3). More specifically,

    X(t) = \sum_{i=1}^{\infty} \langle X, f_i \rangle f_i(t) = \sum_{i=1}^{\infty} \eta_i f_i(t),    (5)
where \langle f, g \rangle = \int f(t) g(t)\, dt. For i \ne j, and under some regularity conditions, we have

    E \eta_i \eta_j = E\left[ \int X(t) f_i(t)\, dt \int X(s) f_j(s)\, ds \right]
                    = \int\!\!\int E\{X(t) X(s)\} f_i(t) f_j(s)\, dt\, ds
                    = \int\!\!\int \nu(t, s) f_i(t) f_j(s)\, dt\, ds
                    = \sum_{k=1}^{\infty} \mu_k \int\!\!\int f_k(t) f_k(s) f_i(t) f_j(s)\, dt\, ds
                    = \sum_{k=1}^{\infty} \mu_k \int f_k(t) f_i(t)\, dt \int f_k(s) f_j(s)\, ds = 0,

and E \eta_i = 0, so Cov(\eta_i, \eta_j) = 0. This implies that \eta_1, \eta_2, \ldots are uncorrelated. Similarly, for each \eta_i,

    Var(\eta_i) = \sum_{k=1}^{\infty} \mu_k \int f_k(t) f_i(t)\, dt \int f_k(s) f_i(s)\, ds = \mu_i.
Hence, (5) can be regarded as an extension of (3). As in the multivariate case, the representations in (4) and (5) are neither unique nor identifiable. If \mu_i \ne \mu_j for any i \ne j, then each f_i(\cdot) is identifiable, except for a change of sign.

Our development is not hindered by the non-identifiability of (5), because our interest is in how to estimate \nu(u, v) and how to find the orthonormal functions f_1(\cdot), f_2(\cdot), \ldots that satisfy (4) from the observed functional data. We will call these orthonormal functions f_1(\cdot), f_2(\cdot), \ldots the eigenfunctions, and the corresponding \mu_1, \mu_2, \ldots the eigenvalues, of \nu(u, v).
3. A two-step estimation method
First, let us motivate our method with a p-dimensional random vector X. Let \hat{\lambda}_1 \ge \cdots \ge \hat{\lambda}_p be the eigenvalues of the sample covariance matrix and \hat{a}_i = (\hat{a}_{1i}, \ldots, \hat{a}_{pi})^T be the eigenvector corresponding to \hat{\lambda}_i. An estimate of the (i, j)th element of \Sigma is given by

    \hat{\sigma}_{ij} = \sum_{k=1}^{p} \hat{\lambda}_k \hat{a}_{ik} \hat{a}_{jk}.

This idea will be used to handle functional data as follows.
Now suppose we have a univariate stochastic process X(t); without loss of generality, we assume E(X(t)) = 0 and t \in [0, 1]. Let X_1(\cdot), \ldots, X_m(\cdot) be a collection of m sample curves, each observed at t_1, \ldots, t_n. More explicitly, we have the data set X_h(t_i), i = 1, \ldots, n, h = 1, \ldots, m. For simplicity, we denote X_h(t_i) by X_{ih}. Obviously, for i, k = 1, \ldots, n and h, l = 1, \ldots, m, we have E X_{ih} = 0 and

    Cov(X_{ih}, X_{kl}) = \begin{cases} 0, & h \ne l, \\ \nu(t_i, t_k), & h = l. \end{cases}
Our procedure for estimating \nu(u, v), \mu_i and f_i(\cdot) involves the following two steps.

Step 1. Temporarily ignore the fact that the data are continuous functions and treat the problem as a standard principal components problem in multivariate analysis. Specifically, let S be the sample covariance matrix with (i, j)th element

    \frac{1}{m} \sum_{k=1}^{m} X_{ik} X_{jk}.

Compute n\hat{\mu}_1 \ge \cdots \ge n\hat{\mu}_n, the eigenvalues of S, and n^{-1/2} \hat{u}_j = (n^{-1/2} \hat{\beta}_{1j}, \ldots, n^{-1/2} \hat{\beta}_{nj})^T, the orthonormal eigenvector corresponding to the eigenvalue n\hat{\mu}_j, for j = 1, \ldots, n.

Step 2. For each j, treat (\hat{\beta}_{1j}, t_1), \ldots, (\hat{\beta}_{nj}, t_n) as a sample of size n from the regression model

    Y = f_j(t) + \varepsilon,    (6)

in which the \hat{\beta}_{ij} play the role of the responses Y. For each j = 1, \ldots, n, smooth \hat{\beta}_{1j}, \ldots, \hat{\beta}_{nj} with respect to t_1, \ldots, t_n to obtain an estimate \hat{f}_j(\cdot) of the eigenfunction.
The procedure for estimating the eigenvalue \mu_j of the covariance function \nu(\cdot, \cdot) is as follows. For j = 1, \ldots, n, the jth eigenvalue of the covariance function of X(t) is the variance of

    \eta_j = \int X(t) f_j(t)\, dt \approx \frac{1}{n} \sum_{i=1}^{n} X(t_i) f_j(t_i),

and f_j(t_i) can be estimated by \hat{f}_j(t_i). Hence,

    \eta_j \approx \frac{1}{n} \sum_{i=1}^{n} X(t_i) \hat{f}_j(t_i).

Furthermore, let

    \tilde{\eta}_{kj} = \frac{1}{n} \sum_{i=1}^{n} X_{ki} \hat{f}_j(t_i), \qquad k = 1, \ldots, m.

The sample variance of \tilde{\eta}_j is given by

    \widehat{Var}(\eta_j) = \frac{1}{m} \sum_{k=1}^{m} \left( \tilde{\eta}_{kj} - \frac{1}{m} \sum_{k=1}^{m} \tilde{\eta}_{kj} \right)^2.

This gives an estimate \hat{\mu}_j for the variance of \eta_j, and hence for the jth eigenvalue \mu_j of the covariance function \nu(u, v). Finally, an estimate of \nu(u, v) is given by

    \hat{\nu}(u, v) = \sum_{j=1}^{n} \hat{\mu}_j \hat{f}_j(u) \hat{f}_j(v).    (7)
The initial estimator \hat{u}_j is a rough estimator of f_j(\cdot). In the second step, a much better estimator, \hat{f}_j(\cdot), is obtained by applying a smoothing technique to (\hat{\beta}_{1j}, t_1), \ldots, (\hat{\beta}_{nj}, t_n). In this smoothing step, information from neighbouring points is pooled to improve the efficiency of the raw initial estimator. Many standard nonparametric methods, such as wavelet thresholding (Donoho, 1995; Donoho & Johnstone, 1995), spline smoothing (Eubank, 1988; Wahba, 1990; Green & Silverman, 1994) or local polynomial modelling (Cleveland, 1979; Ruppert & Wand, 1994), can be used to find \hat{f}_j(\cdot) in the second step. The proposed two-step procedure is therefore conceptually very simple: it involves only the computation of the eigenvalues and eigenvectors of a sample covariance matrix, as in standard principal components analysis, followed by a standard one-dimensional nonparametric smoothing. Implementation of the procedure is quite straightforward.
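The two steps, together with the eigenvalue estimates and (7), can be sketched in a few lines of numpy. This is only an illustration: the function name is ours, the bandwidth is fixed rather than data-driven, and a Gaussian Nadaraya-Watson smoother stands in for the local polynomial fit described later in this section:

```python
import numpy as np

def two_step_estimate(X, t, h=0.15):
    """Sketch of the two-step covariance-function estimate.

    X : (m, n) array, m sample curves observed at the n points in t
        (mean assumed zero).  h is an assumed fixed bandwidth; a
        Nadaraya-Watson smoother stands in for the local polynomial fit.
    """
    m, n = X.shape
    S = X.T @ X / m                       # sample covariance matrix
    lam, vec = np.linalg.eigh(S)          # Step 1: principal components
    lam, vec = lam[::-1], vec[:, ::-1]    # sort eigenvalues descending
    b = np.sqrt(n) * vec                  # raw eigenfunction values beta_ij

    # Step 2: smooth each raw eigenvector over t (Gaussian kernel weights).
    K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / h) ** 2)
    phi = (K @ b) / K.sum(axis=1, keepdims=True)

    # Eigenvalues: sample variance of eta_kj = (1/n) sum_i X_ki f_j(t_i).
    eta = X @ phi / n
    mu = eta.var(axis=0)

    # Covariance-function estimate (7) on the grid t.
    nu_hat = (phi * mu) @ phi.T
    return nu_hat, mu, phi
```

On curves simulated from the two-component design of Section 4.1, the two leading `mu` values track the true eigenvalues 2 and 1, up to some attenuation from the crude stand-in smoother.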
A nonparametric method for estimating the eigenvalues and eigenfunctions of a covariance structure with curve data has been developed by Rice and Silverman (1991). In their method, a class of estimates is obtained by bounding the roughness of the eigenfunction: u^T S u is maximized subject to \|u\| = 1 and u^T D u \le \gamma, where \gamma is a smoothing parameter and D is a roughening matrix, say F^T F with F a second-differencing operator. The smoothing parameter \gamma is selected by cross-validation from a prediction viewpoint. After the first j eigenfunctions have been obtained, the (j + 1)th eigenfunction is obtained by repeating the procedure in the orthogonal complement of the space generated by the first j eigenfunctions. This process continues until all n eigenfunctions are obtained. Selection of \gamma via cross-validation is required for each eigenfunction. Hence, this method is not two-step in nature and is quite different from our simple two-step procedure.
Motivated by its many nice properties, we use the local polynomial approach to compute \hat{f}_j(\cdot) in the second step of our procedure. It has been shown that odd-order polynomial fits are preferable to even-order fits (Fan & Gijbels, 1996), so we use a local polynomial of odd order q to estimate the underlying function. For each point t_0, we approximate the underlying function f_j(t) locally by

    f_j(t) \approx \sum_{k=0}^{q} \frac{1}{k!} f_j^{(k)}(t_0) (t - t_0)^k = \sum_{k=0}^{q} \gamma_k (t - t_0)^k,    (8)

for t in a neighbourhood of t_0. This leads to the following local least squares problem: minimize

    \sum_{i=1}^{n} \left\{ \hat{\beta}_{ij} - \sum_{k=0}^{q} \gamma_k (t_i - t_0)^k \right\}^2 K_{h_j}(t_i - t_0)    (9)
with respect to \gamma_k, k = 0, \ldots, q, for a given kernel function K and a bandwidth h_j, where K_h(\cdot) = K(\cdot / h)/h. Let W(h) = diag(K_h(t_1 - t_0), \ldots, K_h(t_n - t_0)) and

    T_q = \begin{pmatrix} 1 & (t_1 - t_0) & \cdots & (t_1 - t_0)^q \\ \vdots & \vdots & & \vdots \\ 1 & (t_n - t_0) & \cdots & (t_n - t_0)^q \end{pmatrix}.

The solution of the least squares problem (9) gives the following estimate of the underlying function at t_0: the estimate \hat{f}_j(t_0) is the first element of

    (T_q^T W(h_j) T_q)^{-1} T_q^T W(h_j) \hat{u}_j.    (10)

Since the vectors n^{-1/2} \hat{u}_j, j = 1, \ldots, n, are orthonormal, after this local polynomial smoothing the \hat{f}_j(\cdot), j = 1, \ldots, n, are approximately orthonormal.
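A direct transcription of (8)-(10) is straightforward. The sketch below (function name ours) solves the weighted least squares problem at a single point t_0, using the Epanechnikov kernel adopted in Section 4:

```python
import numpy as np

def local_poly_fit(b, t, t0, h, q=3):
    """Local polynomial fit of (odd) order q at t0, as in (8)-(10).

    b : raw values (the beta-hats); t : design points; h : bandwidth.
    Uses the Epanechnikov kernel K(u) = 0.75 (1 - u^2)_+.
    """
    d = t - t0
    w = 0.75 * np.clip(1.0 - (d / h) ** 2, 0.0, None) / h  # K_h(t_i - t0)
    T = np.vander(d, q + 1, increasing=True)   # rows: 1, (t_i-t0), ..., (t_i-t0)^q
    TW = T.T * w                                # T_q^T W(h)
    gamma = np.linalg.solve(TW @ T, TW @ b)     # (T^T W T)^{-1} T^T W b
    return gamma[0]                             # fitted value at t0
```

Because the local basis contains all monomials up to degree q, a local cubic fit reproduces a cubic trend exactly wherever the kernel window contains enough points.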
In the smoothing procedure, the bandwidth h_j plays an important role. A bandwidth h_j = 0 essentially interpolates the data, and hence leads to the most complex model; a bandwidth h_j = \infty corresponds to fitting a global polynomial of degree q, and hence leads to the simplest model. Although we use a data-driven method (see Fan & Gijbels, 1995; Zhang & Lee, 2000) to select the bandwidth in this paper, other procedures, such as cross-validation, can be used and are implemented in S-PLUS (see Venables & Ripley, 1999).
As pointed out by Castro, Lawton, and Sylvestre (1986), if the time points t_i are not equally spaced, the following adjustment is required in the first step of the two-step procedure to ensure the invariance of the eigenfunctions with respect to different choices of design. Using the quadrature rule (see Baker, 1977), we obtain

    w_i = \int_0^1 l_i(t)\, dt, \quad \text{where} \quad l_i(t) = \prod_{j=1,\, j \ne i}^{n} \frac{t - t_j}{t_i - t_j}.    (11)

Let A = diag(w_1, \ldots, w_n), let \hat{\mu}_i be the ith eigenvalue of A^{1/2} S A^{1/2}, and let \tilde{u}_i be the corresponding ith orthonormal eigenvector. The initial first-step estimates are obtained via \hat{u}_i = A^{-1/2} \tilde{u}_i, i = 1, \ldots, n. These estimates are then smoothed at the second step to obtain the final estimates of the eigenfunctions. For unequally spaced sampling, all inner products must use the quadrature rule, so when we estimate the jth eigenvalue \mu_j of the covariance function \nu(u, v), \tilde{\eta}_{kj} becomes

    \tilde{\eta}_{kj} = \sum_{i=1}^{n} X_{ki} \hat{f}_j(t_i) w_i.
The modification of the proposed two-step procedure for the unequally spaced case is thus minor. In the first step, after obtaining the diagonal matrix A containing the weights w_i, we compute the eigenvalues and orthonormal eigenvectors of A^{1/2} S A^{1/2}, and hence the initial first-step estimates of the eigenfunctions. At the second step, which produces the final estimates, the standard one-dimensional nonparametric smoothing is conducted on the basis of the regression model (6), with the sample (\hat{\beta}_{1j}, t_1), \ldots, (\hat{\beta}_{nj}, t_n) for each j. Again, this step can be completed using a local polynomial data-driven method or software packages such as SAS and S-PLUS.
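The weights in (11) can be computed by integrating the Lagrange basis polynomials directly. The sketch below (illustrative only; the monomial representation used here is numerically unstable for large n) does this with numpy's polynomial helpers:

```python
import numpy as np

def quadrature_weights(t):
    """Quadrature weights w_i = int_0^1 l_i(t) dt of (11), where l_i is the
    Lagrange basis polynomial at the design points t.  Fine for small n;
    for large n the monomial form below is numerically unstable, so this
    is meant only as an illustration of the rule."""
    n = len(t)
    w = np.empty(n)
    for i in range(n):
        others = np.delete(t, i)
        # Monic polynomial with roots at the t_j (j != i), normalized so
        # that l_i(t_i) = 1.
        li = np.poly(others) / np.prod(t[i] - others)
        antideriv = np.polyint(li)
        w[i] = np.polyval(antideriv, 1.0) - np.polyval(antideriv, 0.0)
    return w
```

Since the Lagrange basis sums to one identically, the weights always sum to one; with two endpoints 0 and 1 the rule reduces to the trapezoidal weights (1/2, 1/2).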
4. Numerical illustrations

4.1. A simulation study
The following example is used to illustrate the performance of our method for estimating covariance functions, their eigenfunctions and eigenvalues. Consider

    X(t) = \sqrt{2}\, \sin(\pi t)\, y_1 + \sqrt{2}\, \cos(\pi t)\, y_2, \qquad t \in [0, 1].

The random variables y_1 and y_2 in this example are independently and normally distributed with mean zero and variances 2.0 and 1.0, respectively. The number of sample points for each sample curve was selected to be n = 15, 50, 100 and 200, and for each n we took m = 50, 100 and 200. Sample points of each sample curve were observed at (i) equally spaced points t_i = i/(n + 1), i = 1, \ldots, n, and (ii) unequally spaced points obtained by sorting random points simulated from a uniform distribution on [0, 1]. For cases (i) and (ii) with different n and m, the number of replications was 200.
The true covariance function of X(t) in this example is

    \nu(u, v) = 2 \left( \sqrt{2} \sin(\pi u) \right)\left( \sqrt{2} \sin(\pi v) \right) + \left( \sqrt{2} \cos(\pi u) \right)\left( \sqrt{2} \cos(\pi v) \right).

The eigenfunctions are \{\sqrt{2} \sin(\pi t), \sqrt{2} \cos(\pi t)\}. Note that these eigenfunctions are not unique; they allow sign differences. The mean integrated square error (MISE)

    |D|^{-1}\, E \int\!\!\int_D (\hat{\nu}(u, v) - \nu(u, v))^2\, du\, dv,

where D is the support set of (u, v), is employed as the criterion to evaluate the performance of the estimator.
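The simulation design above can be reproduced as follows (function names are ours); `true_cov` encodes the true covariance function just given:

```python
import numpy as np

def simulate_curves(m, n, equally_spaced=True, rng=None):
    """Draw m sample curves of the Section 4.1 design:
    X(t) = sqrt(2) sin(pi t) y1 + sqrt(2) cos(pi t) y2,
    with y1 ~ N(0, 2) and y2 ~ N(0, 1) independent."""
    rng = np.random.default_rng() if rng is None else rng
    if equally_spaced:
        t = np.arange(1, n + 1) / (n + 1)          # t_i = i/(n+1)
    else:
        t = np.sort(rng.uniform(0.0, 1.0, n))      # sorted uniform points
    y1 = rng.normal(0.0, np.sqrt(2.0), size=m)
    y2 = rng.normal(0.0, 1.0, size=m)
    X = (np.sqrt(2) * np.outer(y1, np.sin(np.pi * t))
         + np.sqrt(2) * np.outer(y2, np.cos(np.pi * t)))
    return t, X

def true_cov(u, v):
    """nu(u, v) = 4 sin(pi u) sin(pi v) + 2 cos(pi u) cos(pi v)."""
    return (4.0 * np.sin(np.pi * u) * np.sin(np.pi * v)
            + 2.0 * np.cos(np.pi * u) * np.cos(np.pi * v))
```

With a large m, the raw sample covariance on the observation grid already sits close to `true_cov`, which is what the two-step procedure then smooths and decomposes.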
The proposed procedure was used to estimate the covariance functions and to find their eigenfunctions and eigenvalues. Local linear (q = 1) and local cubic (q = 3) fits of the regression curve were considered, and the kernel function was taken to be the Epanechnikov kernel K(t) = 0.75 (1 - t^2)_+. The MISEs of the estimators are reported in Table 1. The following phenomena are observed from this table. (i) Generally, the MISE improves significantly as the sample size m increases. (ii) The MISE with n = 50 shows a significant improvement over the case n = 15; however, changes in MISE are very small for n larger than 50. Most probably, the reason for this phenomenon is that, once the number of points is sufficiently large, the information between adjacent points contributes little to improving the accuracy of the estimator. Differences between the linear and cubic fits are very small for large n. (iii) To achieve more accurate results, it is more important to have a large m than a large n. (iv) The differences in MISE between the equally and unequally spaced cases are not substantial, especially with moderate sample sizes. Roughly speaking, the performance of a cubic fit with n = 15 and m = 50 is acceptable.
Table 1. Mean integrated square errors in simulation study

                               n
               m    q    15      50      100     200
  Equally      50   1    0.316   0.274   0.275   0.275
  spaced            3    0.288   0.279   0.277   0.275
               100  1    0.198   0.137   0.137   0.137
                    3    0.145   0.139   0.138   0.138
               200  1    0.130   0.063   0.063   0.062
                    3    0.066   0.063   0.063   0.063
  Unequally    50   1    0.344   0.303   0.275   0.266
  spaced            3    0.325   0.305   0.277   0.266
               100  1    0.214   0.160   0.140   0.132
                    3    0.178   0.159   0.141   0.132
               200  1    0.139   0.083   0.066   0.061
                    3    0.093   0.080   0.066   0.061
From the 200 replications, we select the estimator with median performance for a further illustration of the procedure. Since the results produced by the cubic fit are similar, only results obtained from the linear fit are presented. Estimates of the eigenvalues with the linear fit for some choices of n and m are given in Table 2. As expected, under the same m, results for different large n are similar, and so are not presented. The eigenvalue estimates appear reasonably accurate. Figure 1 depicts the estimated eigenfunctions with n = 200 and m = 200 for the equally spaced and unequally spaced cases; the true and estimated eigenfunctions are close to each other. To study empirically whether the orthogonality among the original components is affected by separate smoothing, we compute

    (\hat{f}_j, \hat{f}_{j'}) = \sum_{i=1}^{n} \hat{f}_j(t_i) \hat{f}_{j'}(t_i)

for all pairs \hat{f}_j and \hat{f}_{j'}. For the equally spaced case, (\hat{f}_1, \hat{f}_2) for (n, m) = (15, 50), (50, 50), (100, 100) and (200, 200) is equal to -3.65 \times 10^{-4}, -1.88 \times 10^{-6}, 1.18 \times 10^{-7} and 1.69 \times 10^{-8}, respectively. It seems that for moderate sizes of n and m, orthogonality is not seriously affected by separately smoothing the different components. A similar phenomenon is observed for the unequally spaced case.
Table 2. Estimates of eigenvalues in simulation study

                        True                    (n, m)
                        value   (15, 50)  (50, 50)  (100, 100)  (200, 200)
  Equally spaced         2.0     2.17      2.42      1.89        2.20
                         1.0     1.47      0.73      0.98        1.05
  Unequally spaced       2.0     3.05      1.77      2.07        2.12
                         1.0     0.81      0.81      0.89        1.09
4.2. Real examples

We first illustrate our method with an application to the tongue data given in Besse and Ramsay (1986, pp. 288–289). The data consist of 42 records of tongue dorsum movements collected by Munhall (1984) using an ultrasound sensing technique developed by Keller and Ostry (1983). It is assumed that the observation interval has been normalized to [0, \pi] and that the sampled values are observations at 13 equally spaced points. See Besse and Ramsay (1986) for more detailed background on this data set. The proposed two-step procedure, with a cubic fit at the second step, is used to find the eigenfunctions and eigenvalues of the covariance function.
Figure 2 shows the first three eigenfunctions of the sample functions. Using

    \int Var(X(t))\, dt

to measure the variation of the random function X(t), we find that the first three eigenfunctions account for 89.5%, 8.4% and 1.4% of the overall variation of the sample functions, respectively, for a total of 99.3% of the overall variance. It can be seen from Fig. 2 that the first eigenfunction can be regarded as constant (compared with the other eigenfunctions), the second is close to sin(t), and the third is close to cos(t). Our results are very similar to those obtained by Besse and Ramsay (1986), in which the eigenvalues respectively account for 90.0%, 8.3% and 1.5% of the variation, and the eigenfunctions are also very similar (see Besse & Ramsay, 1986, Fig. 5). At a reviewer's suggestion, the estimated covariance function is shown in Fig. 3.
Figure 1. Estimated eigenfunctions for (a) the equally and (b) the unequally spaced cases. The solid curves are (i) \sqrt{2} \sin(\pi t), (ii) -\sqrt{2} \cos(\pi t).
Hence the analysis via the proposed two-step procedure also suggests the following model, as in Besse and Ramsay (1986):

X(t) = y₁ + y₂ sin(t) + y₃ cos(t) + e(t),

and leads to the similar conclusion that simple harmonic motion describes tongue dorsum behaviour adequately on [0, π].
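A model of this form can be fitted to a sampled curve by ordinary least squares on the basis {1, sin t, cos t}. A minimal sketch with simulated data (the coefficients and noise level are invented for illustration, not taken from the tongue measurements):

```python
import numpy as np

# Fit X(t) = y1 + y2*sin(t) + y3*cos(t) + e(t) to one sampled curve by
# ordinary least squares.  The curve below is simulated for illustration.
t = np.linspace(0.0, np.pi, 50)
rng = np.random.default_rng(1)
x = 2.0 + 1.5 * np.sin(t) - 0.5 * np.cos(t) + 0.1 * rng.normal(size=t.size)

D = np.column_stack([np.ones_like(t), np.sin(t), np.cos(t)])   # design matrix
coef, *_ = np.linalg.lstsq(D, x, rcond=None)                   # (y1, y2, y3)
print(np.round(coef, 2))                       # close to (2.0, 1.5, -0.5)
```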
As a second illustrative example, we reanalyse the curves of sagittal hip angles in Rice and Silverman (1991). In this data set, n = 20 and m = 39, and observations are taken over a gait cycle consisting of one (double) step taken by each child; see Olshen, Biden, Wyatt, and Sutherland (1989) for more details. Using the proposed two-step procedure with a cubic fit at the second step, we obtained the first four eigenfunctions. The corresponding eigenvalues respectively account for 70.6%, 12.1%, 8.4% and 3.8% (a total of 94.9%) of the total variability. These results are very close to those reported in Rice and Silverman (1991). The eigenfunctions are depicted in Fig. 4. Comparing this figure with Fig. 2 in Rice and Silverman (1991), it is clear that the eigenfunctions obtained from our procedure and from theirs are also similar. The fourth eigenfunction is not very close to the raw eigenvector, which may indicate that the amount of noise in the raw eigenvector filtered out by the local polynomial smoothing is substantial. In any
Figure 2. The first three eigenfunctions of the tongue data; the first, second and third eigenfunctions are represented by the solid, dotted and dashed curves, respectively.

Figure 3. Covariance function of the tongue data. The height of the surface indicates the amount of covariance.
Figure 4. First four raw eigenvectors (dots) and smoothed eigenfunctions (solid curves) of the hip angle data.
case, since it is the least important eigenfunction, statistical conclusions drawn on the basis of the analysis are not affected. For completeness, the estimated covariance function is displayed in Fig. 5. Detailed interpretations of the results, which are available in Rice and Silverman (1991), are not presented, to save space.
Our final example is based on a variable selected from a study (Mackinnon et al., 1991) of school smoking-prevention programmes based on social-psychological principles. These social influence programmes were designed to teach social skills and to create a social environment less receptive to drug use. If they work as planned, then favourable changes in mediating variables for drug use and behavioural intentions are indicators of success. One of the main objectives of the study was to evaluate the impact of a social-influence-based programme on the mediating variables it was designed to change. The data were obtained from public middle schools and junior high schools in midwestern states of the USA. As an illustration, we considered a variable X(t) measuring students' control of their cigarette consumption (the original question is: 'If your best friend offered you a cigarette, how hard would it be to refuse the offer?'), measured at five time points for students from 50 schools. At t = t₀, all students were in the seventh grade; t₁ was six months later, and the other time intervals were one year. Hence, the
Figure 5. Covariance function of the hip data. The height of the surface indicates the amount of covariance.
time points are not equally spaced. Using the proposed procedure with an adjustment for unequally spaced time points, we found that the first two eigenvalues respectively accounted for 68.3% and 17.0% of the total variability. The corresponding eigenfunctions, obtained via local polynomial smoothing with a linear fit, are displayed in Fig. 6. Roughly, these eigenfunctions can be regarded as quadratic. They reveal the change in students' control of cigarette consumption over time. From the first eigenfunction, it seems that the time at which the best friend has the strongest influence is between half a year and one year after seventh grade, after which the best friend's influence decreases. However, it should be pointed out that this is just an illustrative example with only five time points, and its interpretation should be treated with caution. More data at more time points are required before a more substantive conclusion can be drawn.
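One common way to adjust a principal components analysis for unequally spaced time points is to weight the inner product by quadrature weights. The sketch below uses trapezoid-rule weights on simulated data; this is a standard device, but the paper's exact weighting scheme may differ.

```python
import numpy as np

t = np.array([0.0, 0.5, 1.5, 2.5, 3.5])        # five unequally spaced times
w = np.zeros_like(t)                            # trapezoid-rule weights
w[1:-1] = (t[2:] - t[:-2]) / 2.0
w[0], w[-1] = (t[1] - t[0]) / 2.0, (t[-1] - t[-2]) / 2.0

rng = np.random.default_rng(2)
n = 50
# Simulated curves with one dominant (linear-in-t) mode of variation.
X = rng.normal(size=(n, 1)) * (t - t.mean()) + 0.3 * rng.normal(size=(n, t.size))
Xc = X - X.mean(axis=0)

# Weighted PCA: eigen-decompose W^{1/2} S W^{1/2}, then map back.
S = Xc.T @ Xc / (n - 1)
Wh = np.diag(np.sqrt(w))
vals, U = np.linalg.eigh(Wh @ S @ Wh)
vals, U = vals[::-1], U[:, ::-1]
phi = np.linalg.solve(Wh, U)                    # eigenfunction values at t
print(np.round(100 * vals[:2] / vals.sum(), 1)) # leading share of variation
```

The weights make the discrete eigenproblem approximate the continuous one, so closely spaced observations are not over-counted.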
Several examples given in Ramsay and Silverman (1997) have been analysed. We observe that the results obtained via the proposed two-step procedure are similar to those reported in Ramsay and Silverman (1997).
5. Discussion

Functional data analysis is particularly useful for investigating changes over time in characteristics that are measured continuously or repeatedly for each object. Examples of this type of research are longitudinal studies and analyses of growth curves and learning curves. It is not straightforward to extend classical multivariate data analysis to functional data analysis. As a complement to the existing work, this paper proposes a simple two-step procedure to estimate the covariance function and its eigenvalues and
Figure 6. The first two eigenfunctions of the data on the school smoking-prevention programme; the first and second functions are represented by the solid and dotted curves, respectively.
eigenfunctions. At the first step, we compute the eigenvectors and eigenvalues of certain sample covariance matrices. This task can be completed easily and efficiently using standard software. The second step involves only some one-dimensional nonparametric smoothing, simpler than the multi-dimensional smoothing required by other methods. Our simulation results indicate that the estimates produced are quite accurate. The proposed procedure also gives very similar results to those in Besse and Ramsay (1986) and Rice and Silverman (1991) in analysing some real examples. Hence, it is simple to understand, easy to implement, and efficient in producing reliable results.

Owing to the optimal statistical properties associated with local polynomial smoothing, we apply it to obtain the final eigenfunctions at the second step of the proposed procedure. In fact, other methods such as spline smoothing can also be used. Code for most of these more traditional methods can be found in existing software such as SAS, S-PLUS and Matlab. For practitioners, the smoothing at the second step can therefore be completed via such standard software. Hence, the proposed procedure can be used by those with less technical training. Owing to the pioneering work of Ramsay and Dalzell (1991), Ramsay and Silverman (1997) and others, we expect that functional data analysis will gain in popularity, and that standard software will include an option to perform local polynomial smoothing in the near future.
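The one-dimensional smoothing at the second step can be illustrated with a minimal local-linear (degree-1 local polynomial) smoother applied to a noisy "raw eigenvector". The Gaussian kernel and the fixed bandwidth `h` are illustrative choices; the paper selects the bandwidth by a data-driven rule.

```python
import numpy as np

def local_linear(t_grid, t_obs, y_obs, h):
    """Local-linear fit of (t_obs, y_obs), evaluated at each point of t_grid."""
    out = np.empty(t_grid.size)
    for j, t0 in enumerate(t_grid):
        u = (t_obs - t0) / h
        k = np.exp(-0.5 * u * u)                # Gaussian kernel weights
        D = np.column_stack([np.ones_like(t_obs), t_obs - t0])
        # Weighted least squares: beta = (D'KD)^{-1} D'Ky
        beta = np.linalg.solve(D.T @ (k[:, None] * D), D.T @ (k * y_obs))
        out[j] = beta[0]                        # intercept = fitted value at t0
    return out

# A noisy "raw eigenvector" around sqrt(2)*sin(pi*t), then its smoothed version.
t = np.linspace(0.0, 1.0, 40)
rng = np.random.default_rng(3)
raw = np.sqrt(2.0) * np.sin(np.pi * t) + 0.15 * rng.normal(size=t.size)
smooth = local_linear(t, t, raw, h=0.1)
```

Local-linear fitting is attractive here because it adapts automatically at the boundaries, where kernel averages are otherwise biased.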
Acknowledgements

The research for this paper was partially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. CUHK 4346/OIH), and a direct grant from CUHK. The authors are greatly indebted to the Editor and two reviewers for some valuable comments for the improvement of the paper, and are thankful to Drs M. A. Pentz and C. P. Chou for providing the data in the last real example. The assistance of Esther L. S. Tam in preparing the manuscript is also acknowledged.
References

Baker, C. T. H. (1977). The numerical treatment of integral equations. Oxford: Clarendon Press.
Besse, P., & Ramsay, J. O. (1986). Principal components analysis of sampled functions. Psychometrika, 51, 285–311.
Brumback, B. A., & Rice, J. A. (1998). Smoothing spline models for the analysis of nested and crossed samples of curves (with discussion). Journal of the American Statistical Association, 93, 961–994.
Castro, P. E., Lawton, W. H., & Sylvestre, E. A. (1986). Principal modes of variation for processes with continuous sample curves. Technometrics, 28, 329–337.
Cheng, M. Y., Fan, J., & Marron, J. S. (1997). On automatic boundary corrections. Annals of Statistics, 25, 1691–1708.
Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, 74, 829–836.
Donoho, D. L. (1995). Nonlinear solution of linear inverse problems by wavelet–vaguelette decomposition. Applied and Computational Harmonic Analysis, 2, 101–126.
Donoho, D. L., & Johnstone, I. M. (1995). Adapting to unknown smoothness via wavelet shrinkage. Journal of the American Statistical Association, 90, 1200–1224.
Eubank, R. L. (1988). Spline smoothing and nonparametric regression. New York: Marcel Dekker.
Fan, J., & Gijbels, I. (1995). Data-driven bandwidth selection in local polynomial fitting: Variable bandwidth and spatial adaptation. Journal of the Royal Statistical Society, Series B, 57, 371–394.
Fan, J., & Gijbels, I. (1996). Local polynomial modelling and its applications. London: Chapman & Hall.
Fan, J., & Marron, J. S. (1994). Fast implementation of nonparametric curve estimators. Journal of Computational and Graphical Statistics, 3, 35–56.
Fan, J., & Zhang, W. (1999). Statistical estimation in varying coefficient models. Annals of Statistics, 27, 1491–1518.
Green, P. J., & Silverman, B. W. (1994). Nonparametric regression and generalized linear models: A roughness penalty approach. London: Chapman & Hall.
Hart, J. D., & Wehrly, T. E. (1986). Kernel regression estimation using repeated measurements data. Journal of the American Statistical Association, 81, 1080–1088.
Keller, E., & Ostry, D. J. (1983). Computerized measurement of tongue dorsum movements with pulsed-echo ultrasound. Journal of the Acoustical Society of America, 73, 1309–1315.
Leurgans, S. E., Moyeed, R. A., & Silverman, B. W. (1993). Canonical correlation analysis when the data are curves. Journal of the Royal Statistical Society, Series B, 55, 725–740.
Loève, M. (1963). Probability theory (3rd ed.). Princeton, NJ: D. Van Nostrand.
Mackinnon, D. P., Johnson, C. A., Pentz, M. A., Dwyer, J. H., Hansen, W. B., Flay, B. R., & Wang, E. Y.-I. (1991). Mediating mechanisms in a school-based drug prevention program: First-year effects of the Midwestern Prevention Project. Health Psychology, 10, 164–172.
Munhall, K. G. (1984). Temporal adjustment in speech motor control: Evidence from laryngeal kinematics. Unpublished doctoral dissertation, McGill University.
Olshen, R. A., Biden, E. N., Wyatt, M. P., & Sutherland, D. H. (1989). Gait analysis and the bootstrap. Annals of Statistics, 17, 1419–1440.
Ramsay, J. O. (1982). When the data are functions. Psychometrika, 47, 379–396.
Ramsay, J. O., & Dalzell, C. J. (1991). Some tools for functional data analysis (with discussion). Journal of the Royal Statistical Society, Series B, 53, 539–572.
Ramsay, J. O., & Li, X. (1998). Curve registration. Journal of the Royal Statistical Society, Series B, 60, 351–363.
Ramsay, J. O., & Silverman, B. W. (1997). Functional data analysis. New York: Springer-Verlag.
Rice, J. A., & Silverman, B. W. (1991). Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society, Series B, 53, 233–243.
Ruppert, D., & Wand, M. P. (1994). Multivariate locally weighted least squares regression. Annals of Statistics, 22, 1346–1370.
Venables, W. N., & Ripley, B. D. (1999). Modern applied statistics with S-PLUS (3rd ed.). New York: Springer-Verlag.
Wahba, G. (1990). Spline models for observational data. Philadelphia: Society for Industrial and Applied Mathematics.
Zhang, W., & Lee, S.-Y. (2000). Variable bandwidth selection in varying-coefficient models. Journal of Multivariate Analysis, 74, 116–134.
Received 22 August 2000; revised version received 14 August 2001