25.12.2014 Views

Time Series - STAT - EPFL


Time Series<br />

Anthony Davison<br />

© 2008<br />

http://stat.epfl.ch<br />

Introduction<br />
Motivation<br />
CO₂ at Mauna Loa<br />
Northern hemisphere temperatures<br />
Arosa ozone<br />
Eskdalemuir rainfall<br />
Japanese earthquakes<br />
Beaver body temperature<br />
Seal position<br />
Infectious diseases<br />
FTSE<br />
European stock markets<br />
Time-course microarray<br />
Comments<br />
Objectives<br />
Course details<br />

Mathematical basics<br />
Stochastic process<br />
Continuous/discrete time models<br />
Measures of dependence<br />
Stationarity<br />
White noise<br />
Moving average<br />
Random walk<br />
Autoregression<br />
Periodic series<br />
Summary<br />


Introduction<br />

Motivation<br />

□ In basic data analysis we often assume that observations are independent, or even independent and identically distributed,<br />

$$X_1,\ldots,X_n \overset{\mathrm{ind}}{\sim} F_1,\ldots,F_n, \qquad X_1,\ldots,X_n \overset{\mathrm{iid}}{\sim} N(\mu,\sigma^2).$$<br />

□ Time series is the study of observations that arise in some order (almost always time) and which<br />

as a result are dependent.<br />

□ There are many more ways to be dependent than to be independent, and almost all data are<br />

collected in time order, so time series arise in a vast range of disciplines: economics; finance;<br />

marketing; epidemiology; biomedicine; genomics; environmental science; computer science;<br />

electrical engineering; physics; ...<br />

□ Many of these disciplines have developed special techniques for their data types, and we will only<br />

scratch the surface of them in this course, by surveying some main ideas.<br />

Time Series Autumn 2008 – slide 3<br />

CO₂ at Mauna Loa<br />

Monthly levels of carbon dioxide (ppm) at Mauna Loa (Hawaii) from March 1958 to July 2007.<br />

[Figure: co2 (ppm) against time, 1958–2007.]<br />


Northern hemisphere temperatures<br />

Temperature anomaly (°C) for the years 0–1979, relative to the 1961–1990 instrumental average.<br />

[Figure: temperature anomaly (°C) against year (AD).]<br />

Arosa ozone<br />

Annual average and daily total atmospheric ozone measurements (in Dobson units) at Arosa.<br />

[Figure: two panels against time, 1940–2000: annual average ozone (DU) and daily ozone (DU).]<br />


Eskdalemuir rainfall<br />

Hourly rainfall totals at Eskdalemuir, in the south of Scotland.<br />

[Figure: hourly rainfall (0.1 mm) against year, 1975–1980.]<br />

Japanese earthquakes<br />

Times and magnitudes of earthquakes with epicentre less than 100 km in an offshore region west of<br />

the main Japanese island of Honshū and south of the northern island of Hokkaidō. The figure shows<br />

all 483 earthquakes of magnitude 6 or more on the Richter scale in the period 1885–1980, about 5<br />

tremors per year, in one of the most seismically active areas of Japan.<br />

[Figure: magnitude (Richter units) against days since 1 January 1885.]<br />


Beaver body temperature<br />

100 consecutive telemetric measurements on the body temperature of a female Canadian beaver,<br />

Castor canadensis, taken at 10-minute intervals. The animal remained in its lodge for the first 38<br />

recordings and then moved outside, at which point there was a sustained temperature rise.<br />

[Figure: body temperature (°C) against time (measurement number, 0–100).]<br />

Seal position<br />

Hawaiian monk seals, Monachus schauinslandi, which number around 1300, are endemic to the Hawaiian<br />

Islands and are the most endangered species of marine mammal that lives entirely within the<br />

jurisdiction of the United States. The species has been declining partly owing to poor juvenile survival<br />

which is evidently related to poor foraging success. Data have been collected recently on the foraging<br />

habitats, movements, and behaviors of monk seals throughout the Northwestern and main Hawaiian<br />

Islands.<br />

[Figure: map of the central Hawaiian islands (Kauai, Oahu, Molokai, Maui) in the Pacific Ocean; latitude (degrees) against longitude (degrees).]<br />


Seal position<br />

Journey of a juvenile female (4-5 years old) Hawaiian monk seal while she foraged and occasionally<br />

hauled out ashore. She was tagged and released at the southwest corner of Molokai, and tracked from<br />

13 April 2004 through 27 July 2004, using a satellite-linked radio transmitter glued to her dorsal<br />

pelage to document geographic and vertical movements as proxies of foraging behavior.<br />

[Figure: the seal's motion between pairs of the well-determined points; northing (km) against easting (km).]<br />

Infectious diseases<br />

Weekly counts of new influenza and meningococcal infections in Germany 2001–2006.<br />

[Figure: two panels of weekly counts against time, 2001–2007: meningococcus and influenza.]<br />


FTSE<br />

The Financial Times Stock Exchange Index, 1991–1998.<br />

[Figure: FTSE Index against time.]<br />

European stock markets<br />

The rise of European stock markets, 1991–1998.<br />

[Figure: EuStockMarkets; four panels (FTSE, CAC, SMI, DAX) against time.]<br />


Time-course microarray<br />

Expression levels for 2771 genes/sequence tags spotted on a cDNA microarray, relating to gene transcription in the immune response of Anopheline mosquitoes.<br />

Time-course microarray<br />

A clustering of the mosquito genes based on the time courses.<br />



Comments<br />

□ The measurements can be continuous (temperatures) or discrete (infectious disease counts) or a<br />

mixture (rainfall), scalar or vector (seal position)<br />

□ Mostly they are at regular intervals, but some are intermittent (quakes)<br />

□ Some are instantaneous values, others are integrals (rain)<br />

□ Some series exhibit strong trend and/or seasonality (CO₂, infectious diseases, stock markets)<br />

□ There can be missing observations and/or possible outliers (ozone, rainfall)<br />

□ Series can be long (rainfall, temperatures) or very short (microarray)<br />

□ May be one or a few or many series<br />

□ Focus may be<br />

– a possible change in the underlying series, so the dependence is a nuisance (e.g. ozone,<br />

temperatures, beaver, microarray)<br />

– dependence/interaction within or between series (quakes, diseases, stock markets)<br />

– rare events (stock markets, rainfall, ozone)<br />

– comparison of parallel series (microarray)<br />


Objectives<br />

Typically we have in mind one or more of the following general objectives:<br />

□ Description<br />

– Want a ‘simple’ summary of the series<br />

□ Analysis<br />

– Construct stochastic model(s), and try to answer questions with them<br />

– Model will reflect knowledge about phenomenon under study and complexity of data available<br />

– May just need to accommodate dependence as part of larger analysis<br />

□ Monitoring/Control<br />

– Blood pressure/chemical reactor temperature must be kept between $x_0$ and $x_1$<br />

– Aim to detect changes as they occur and to influence process in real time<br />

□ Forecasting<br />

– ‘What will the market do tomorrow?’ ‘What will the economy do next year?’<br />

– Sometimes the model is not so important (though economic models may be complex)<br />

– Often combine different models to get best forecasts (‘model averaging’)<br />



Course details<br />

□ Place: MA30 (= MA A3 30)<br />

□ Lectures 8.15–10.00, Monday 29 September 2008 onwards<br />

□ Exercises 10.15–12.00, Monday 29 September 2008 onwards<br />

□ Main reference: Shumway and Stoffer (2006) Time Series Analysis and its Applications. Second edition. Springer-Verlag.<br />

□ Form of exam not yet determined (probably written, may be some project component)<br />

□ Notes and exercises can be downloaded from course web page<br />

http://stat.epfl.ch/page32112.html<br />


Mathematical basics<br />

Stochastic process<br />

Definition 1 (a) A stochastic process $\{Y_t\}_{t\in T}$ with index set $T$ is a family of random variables defined on a probability space $(\Omega,\mathcal{F},P)$.<br />

(b) A realisation of $\{Y_t\}$ is the outcome $\{y_t\}_{t\in T} = \{Y_t(\omega)\}_{t\in T}$ for some $\omega\in\Omega$.<br />

The index set $T$:<br />

□ most models have index set $T=\mathbb{R}$, $\mathbb{R}_+$, or $\mathbb{Z}$;<br />

□ owing to digitisation, $T$ cannot in practice contain a sub-interval of $\mathbb{R}$, but the time step $\Delta$ can be very small in some applications;<br />

□ (almost-)continuous time series can be thinned by subsampling at the points of a grid, or, in some cases, by integration over intervals of width $\Delta$ (e.g. rainfall data);<br />

□ for general discussion of time series we take $T=\mathbb{Z}$, so that $Y_t$ is recorded at times $0,\pm1,\pm2,\ldots$, and write a realisation of the process observed for a finite period as $y_1,\ldots,y_n$;<br />

□ intermittent time series involve index sets $T$ that are not regular grids.<br />



Continuous/discrete time models<br />

Two main situations:<br />

□ available data are part of a random sequence $\{Y_t\}$, for which time $t$ takes only integer values, i.e. $t\in\mathbb{Z}$. Thus $Y_t$ does not exist at (say) $t=0.5$;<br />

□ available data are values of a random function $\{Y(t)\}$ that exists for all $t\in\mathbb{R}$ (or $\mathbb{R}_+$) but is only observed at a limited number of times.<br />

In some cases (e.g. rainfall, with the time unit being hours) the observed data are<br />

$$\int_{t-1}^{t} Y(s)\,ds.$$<br />

In this case, and particularly if we want to use different time scales, there is a good case for building a continuous-time model but estimating its parameters etc. using the cumulated/discrete-time data. Otherwise conclusions drawn for different time scales may be incoherent.<br />
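The cumulation of a continuous-time process into unit-interval totals can be sketched in a few lines. This is only an illustration and not part of the original notes: the function name `interval_totals`, the grid spacing `delta` and the toy values are all assumptions, and the integral is approximated by a left Riemann sum on a fine grid.

```python
def interval_totals(values, delta):
    """Approximate the integrals of Y over consecutive unit intervals.

    `values` holds Y on a grid of spacing `delta` (hypothetical names);
    each unit interval contains round(1 / delta) grid points and its
    integral is approximated by a left Riemann sum.
    """
    per_unit = round(1 / delta)
    totals = []
    for start in range(0, len(values) - per_unit + 1, per_unit):
        totals.append(sum(values[start:start + per_unit]) * delta)
    return totals


# Toy 'rainfall intensity' on a grid of spacing 0.25 hours, covering 4 hours.
fine = [float(i % 8) for i in range(16)]
hourly = interval_totals(fine, delta=0.25)

# Totals on a coarser scale are sums of the finer totals, which is why a
# single continuous-time model can give coherent answers at every scale.
two_hourly = [hourly[i] + hourly[i + 1] for i in range(0, len(hourly) - 1, 2)]
print(hourly, two_hourly)
```

The same fine-grid values produce both the hourly and the two-hourly series, so any conclusion drawn at one scale is tied to the other through the underlying process.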


Measures of dependence<br />

Definition 2 Let $\{Y_t\}_{t\in T}$ be a stochastic process. Then<br />

(a) if $E(|Y_t|) < \infty$ for all $t\in T$, then we define the mean (or expectation) of the process to be $\mu_t = E(Y_t)$. If non-constant, $\mu_t$ is sometimes called the trend;<br />

(b) if $\mathrm{var}(Y_t) < \infty$ for all $t\in T$, then we define the (auto)covariance function of the process to be<br />

$$\gamma(s,t) = \mathrm{cov}(Y_s,Y_t) = E\{(Y_s-\mu_s)(Y_t-\mu_t)\}, \quad s,t\in T,$$<br />

and the (auto)correlation function of the process to be<br />

$$\rho(s,t) = \frac{\gamma(s,t)}{\{\gamma(s,s)\gamma(t,t)\}^{1/2}}, \quad s,t\in T.$$<br />

□ Note that $\mathrm{var}(Y_t) = \mathrm{cov}(Y_t,Y_t) = \gamma(t,t)$.<br />

□ The Cauchy–Schwarz inequality gives $|\rho(s,t)| \le 1$ for all $s,t\in T$, with $\rho(t,t) = 1$ for all $t$.<br />

□ The function $\gamma(s,t)$ is positive semi-definite: $\sum_{i=1}^{k}\sum_{j=1}^{k} a_i a_j \gamma(t_i,t_j) \ge 0$ for any real $a_1,\ldots,a_k$ and any $\{t_1,\ldots,t_k\}\subset T$.<br />
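The semi-definiteness property is not derived on the slide. One standard argument, given here as a sketch for real coefficients, is that the double sum is the variance of a linear combination of the $Y_{t_i}$, and a variance cannot be negative:

```latex
\[
0 \;\le\; \operatorname{var}\Bigl(\sum_{i=1}^{k} a_i Y_{t_i}\Bigr)
  \;=\; \sum_{i=1}^{k}\sum_{j=1}^{k} a_i a_j \operatorname{cov}(Y_{t_i}, Y_{t_j})
  \;=\; \sum_{i=1}^{k}\sum_{j=1}^{k} a_i a_j \,\gamma(t_i, t_j).
\]
```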



Stationarity<br />

If $S$ is a set, then we use $u+S$ to denote the set $\{u+s : s\in S\}$, and $Y_S$ to denote the set of random variables $\{Y_s : s\in S\}$.<br />

Definition 3 A stochastic process $\{Y_t\}_{t\in T}$ is said to be<br />

(a) strictly stationary if for any finite subset $S\subset T$ and any $u$ such that $u+S\subset T$, the joint distributions of $Y_{u+S}$ and $Y_S$ are the same;<br />

(b) second-order stationary (or weakly stationary) if the mean $\mu_t$ is constant and the covariance function $\gamma(s,t)$ depends only on $t-s$.<br />

□ When $T=\mathbb{Z}=\{0,\pm1,\pm2,\ldots\}$ and the process is stationary, we can write<br />

$$\gamma(t,t+h) = \gamma(0,h) = \gamma(0,-h) \equiv \gamma_{|h|} = \gamma_h, \quad h\in\mathbb{Z},$$<br />

say, where $h$ is called the lag.<br />

□ Similarly, we can write $\rho(t,t+h) \equiv \rho_{|h|} = \rho_h$, say, for $h\in\mathbb{Z}$.<br />

□ Thus in the stationary case the covariance and correlation functions are symmetric around $h=0$.<br />
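For a stationary series, $\gamma_h$ and $\rho_h$ can be estimated from a single realisation $y_1,\ldots,y_n$. The slides do not give an estimator at this point, so the sketch below is an assumption: it uses the common convention of dividing by $n$ at every lag, and the function name `sample_acf` is hypothetical. Because $\rho_{-h}=\rho_h$, only lags $h\ge 0$ are computed.

```python
def sample_acf(y, max_lag):
    """Sample autocorrelations rho_0, ..., rho_max_lag of the list y.

    Uses the divisor-n estimate of the autocovariance at every lag
    (a common convention, assumed here rather than taken from the notes).
    """
    n = len(y)
    mean = sum(y) / n

    def gamma_hat(h):
        return sum((y[t] - mean) * (y[t + h] - mean) for t in range(n - h)) / n

    g0 = gamma_hat(0)
    return [gamma_hat(h) / g0 for h in range(max_lag + 1)]


# Tiny worked example that can be checked by hand.
rho = sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)
print(rho)
```

For this toy input the deviations from the mean are −2, −1, 0, 1, 2, which gives autocovariance estimates 2, 0.8 and −0.2 at lags 0, 1 and 2.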


Stationarity<br />

□ In practice strict stationarity is impossible to verify, and many computations require only<br />

second-order stationarity.<br />

□ Hereafter ‘stationary’ will mean second-order stationary, when used without comment.<br />

□ We can also define third- and higher-order stationarity by extending (b) to higher moments.<br />

□ In practice we often preprocess the data, by removing trend/seasonality, and model the processed<br />

series using a stationary stochastic process.<br />

□ However treating variation as random or as trend depends on the purpose of analysis. Consider<br />

the figure below, or the temperature data ...<br />

[Figure: two panels of Y(t) against t, one for t from 0 to 100 and one for t from 0 to 10.]<br />


White noise<br />

Definition 4 A stochastic process $\{Y_t\}$ is called white noise if its elements are all uncorrelated, with mean $E(Y_t) = 0$ and variance $\mathrm{var}(Y_t) = \sigma^2$.<br />

If in addition the $Y_t$ are normally (Gaussianly) distributed, then we have Gaussian white noise,<br />

$$Y_t \overset{\mathrm{iid}}{\sim} N(0,\sigma^2).$$<br />

The term ‘white’ comes from an analogy with white light, and indicates that all frequencies are<br />

equally present ...<br />

[Figure: two panels against time, 0–500.]<br />
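Definition 4 can be checked numerically on simulated Gaussian white noise. The sketch below is illustrative only: the seed, the sample size of 5000 and the helper name `lag1_correlation` are assumptions, and only the Python standard library is used.

```python
import random


def lag1_correlation(y):
    """Sample correlation between y[t] and y[t + 1] (divisor-n convention)."""
    n = len(y)
    mean = sum(y) / n
    c0 = sum((v - mean) ** 2 for v in y) / n
    c1 = sum((y[t] - mean) * (y[t + 1] - mean) for t in range(n - 1)) / n
    return c1 / c0


rng = random.Random(2008)   # fixed seed so the run is reproducible
sigma = 1.0
eps = [rng.gauss(0.0, sigma) for _ in range(5000)]

mean_hat = sum(eps) / len(eps)
var_hat = sum((e - mean_hat) ** 2 for e in eps) / len(eps)
rho1_hat = lag1_correlation(eps)
print(f"mean {mean_hat:.3f}, variance {var_hat:.3f}, lag-1 correlation {rho1_hat:.3f}")
```

The sample mean and lag-1 correlation should be close to 0 and the sample variance close to $\sigma^2 = 1$, up to sampling error.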

Moving average<br />

□ The panels on the previous slide showed Gaussian white noise $\{\varepsilon_t\}$ above, and a smoothed version<br />

$$Y_t = \tfrac{1}{3}(\varepsilon_t + \varepsilon_{t-1} + \varepsilon_{t-2}).$$<br />

□ Averaging reduces the variance, and introduces correlation in $\{Y_t\}$.<br />

Example 5 Compute the autocorrelation function of the above moving average and show that it is stationary. Discuss the figure below.<br />

[Figure: three scatterplots, of $Y_{t+1}$, $Y_{t+2}$ and $Y_{t+3}$ against $Y_t$.]<br />
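One way to check an answer to Example 5 is to compute the autocovariances directly from the weights. For a moving average $Y_t=\sum_j \theta_j\varepsilon_{t-j}$ of white noise, $\gamma_h = \sigma^2\sum_j \theta_j\theta_{j+|h|}$, which does not depend on $t$. The sketch below applies this with exact fractions; the function name `ma_autocovariance` is hypothetical and $\sigma^2 = 1$ is assumed for illustration.

```python
from fractions import Fraction


def ma_autocovariance(weights, sigma2, lag):
    """Autocovariance at `lag` of Y_t = sum_j weights[j] * eps_{t-j}.

    The eps_t are white noise with variance sigma2; the sum is empty, and
    the result zero, once |lag| reaches the number of weights.
    """
    lag = abs(lag)
    return sigma2 * sum(
        weights[j] * weights[j + lag] for j in range(len(weights) - lag)
    )


weights = [Fraction(1, 3)] * 3            # the three-term average from the slide
gammas = [ma_autocovariance(weights, Fraction(1), h) for h in range(5)]
rhos = [g / gammas[0] for g in gammas]
print(gammas, rhos)
```

With these weights the variance is $\sigma^2/3$, smaller than that of the noise, and the autocorrelations are $1, 2/3, 1/3$ at lags $0, 1, 2$ and zero thereafter.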


Random walk<br />

Example 6 Let $T=\{0,1,2,\ldots\}$, let $\{\varepsilon_t\}$ be white noise, let $Y_0 = 0$, and define<br />

$$Y_t = Y_{t-1} + \varepsilon_t, \quad t = 1,2,\ldots.$$<br />

Show that this is not a stationary time series.<br />

[Figure: Y against time, 0–500.]<br />
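For Example 6, note that $Y_t=\varepsilon_1+\cdots+\varepsilon_t$ is a sum of $t$ uncorrelated terms, so $\mathrm{var}(Y_t)=t\sigma^2$ grows with $t$ and the process cannot be stationary. The Monte Carlo sketch below illustrates this; the number of paths, the seed, the Gaussian noise and the name `variance_at` are assumptions made for the illustration.

```python
import random

rng = random.Random(6)            # fixed seed for reproducibility
n_paths, n_steps = 2000, 40       # illustrative sizes, not from the notes

# paths[i][t - 1] holds Y_t for path i, with Y_0 = 0 and unit-variance noise.
paths = []
for _ in range(n_paths):
    y, path = 0.0, []
    for _ in range(n_steps):
        y += rng.gauss(0.0, 1.0)
        path.append(y)
    paths.append(path)


def variance_at(t):
    """Variance of Y_t across the simulated paths, for t = 1, ..., n_steps."""
    values = [path[t - 1] for path in paths]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


var_10, var_40 = variance_at(10), variance_at(40)
print(f"var(Y_10) ~ {var_10:.2f}, var(Y_40) ~ {var_40:.2f}")
```

The two estimates should be near 10 and 40 respectively, in line with $\mathrm{var}(Y_t)=t\sigma^2$ when $\sigma^2=1$.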

Autoregression<br />

Example 7 Let $T=\mathbb{Z}$, let $\{\varepsilon_t\}$ be white noise, and define<br />

$$Y_t = \alpha Y_{t-1} + \varepsilon_t.$$<br />

This is an autoregressive model of order 1, written AR(1). Find a necessary condition for it to be stationary. The graph below shows examples with $\alpha = \pm 0.9$.<br />

[Figure: two panels of Y(t) against time, 0–200.]<br />
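A hint for Example 7, under the extra assumption that $\varepsilon_t$ is uncorrelated with $Y_{t-1}$: taking variances in the defining equation gives $\gamma_0=\alpha^2\gamma_0+\sigma^2$ for a stationary solution, so $\gamma_0=\sigma^2/(1-\alpha^2)$, which needs $|\alpha|<1$, and then $\rho_1=\alpha$. The simulation below is a sketch and not the author's code: the start value 0, the burn-in length, the Gaussian noise with $\sigma^2=1$ and the function names are all assumptions.

```python
import random


def simulate_ar1(alpha, n, rng, burn_in=500):
    """Simulate Y_t = alpha * Y_{t-1} + eps_t with unit-variance Gaussian noise.

    The start value 0 and the burn-in are assumptions of this sketch; the
    first `burn_in` values are discarded so the start has little effect.
    """
    y, out = 0.0, []
    for t in range(n + burn_in):
        y = alpha * y + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(y)
    return out


def variance_and_lag1(y):
    """Sample variance and lag-1 autocorrelation (divisor-n convention)."""
    n = len(y)
    mean = sum(y) / n
    c0 = sum((v - mean) ** 2 for v in y) / n
    c1 = sum((y[t] - mean) * (y[t + 1] - mean) for t in range(n - 1)) / n
    return c0, c1 / c0


rng = random.Random(7)
results = {}
for alpha in (0.9, -0.9):
    results[alpha] = variance_and_lag1(simulate_ar1(alpha, 20000, rng))
    print(alpha, results[alpha])

stationary_variance = 1.0 / (1.0 - 0.9 ** 2)   # sigma^2 / (1 - alpha^2), about 5.26
```

Both series should show a sample variance near 5.26, with lag-1 correlation near 0.9 for the smooth series and near −0.9 for the rapidly alternating one.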


Periodic series<br />

Example 8 Let $\{\varepsilon_t\}$ be white noise with unit variance, and define<br />

$$Y_t = \cos\left(\frac{2\pi t}{50}\right) + 5\varepsilon_t, \quad t = 1,\ldots,500.$$<br />

This is a periodic signal obscured by noise. Compute its mean and autocorrelation function.<br />

[Figure: two panels against time, 0–500.]<br />
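For Example 8 the mean is $\mu_t=\cos(2\pi t/50)$, and since $Y_t-\mu_t=5\varepsilon_t$ the covariance is $\gamma(s,t)=25$ when $s=t$ and $0$ otherwise, so $\rho(s,t)$ is 1 for $s=t$ and 0 otherwise. The mean is not constant, so the series is not second-order stationary. The sketch below simulates the series and checks the residual variance; Gaussian noise and the seed are assumptions of the illustration.

```python
import math
import random

rng = random.Random(8)
times = range(1, 501)

# Mean function mu_t = cos(2*pi*t/50); the noise term 5*eps_t has variance 25.
mu = [math.cos(2 * math.pi * t / 50) for t in times]
y = [m + 5 * rng.gauss(0.0, 1.0) for m in mu]

# Y_t - mu_t = 5*eps_t, so its sample variance should be near 25.
resid = [yt - m for yt, m in zip(y, mu)]
resid_mean = sum(resid) / len(resid)
resid_var = sum((r - resid_mean) ** 2 for r in resid) / len(resid)
print(f"residual variance {resid_var:.1f}")
```

The noise standard deviation is five times the amplitude of the signal, which is why the cosine is hard to see in a plot of the raw series.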

Summary<br />

Today we<br />

□ saw some examples of time series<br />

□ introduced some basic ideas:<br />

– use of stochastic process as model for time series<br />

– mean, covariance and correlation functions<br />

– stationary and strictly stationary series<br />

– white noise<br />

– simple examples: moving average, random walk, autoregression, periodic series<br />

Next week:<br />

□ simple approaches to removing systematic variation (trend and seasonality)<br />
