11.07.2015 Views

Preface to First Edition - lib


8.3 Analysis Using R

The R function density can be used to calculate kernel density estimators with a variety of kernels (window argument). We can illustrate the function's use by applying it to the geyser data to calculate three density estimates of the data and plot each on a histogram of the data, using the code displayed with Figure 8.4. The hist function places an ordinary histogram of the geyser data in each of the three plotting regions (lines 4, 10, 17). The density estimate for each of three different kernels (lines 8, 14, 21, with a Gaussian kernel being the default in line 8) is then added to the plot. The rug statement simply marks each observation with a vertical bar on the x-axis. All three density estimates show that the waiting times between eruptions have a distinctly bimodal form, which we will investigate further in Subsection 8.3.1.

For the bivariate star data in Table 8.2 we can estimate the bivariate density using the bkde2D function from package KernSmooth (Wand and Ripley, 2009). The resulting estimate can then be displayed as a contour plot (using contour) or as a perspective plot (using persp). The contour plot is shown in Figure 8.5 and the perspective plot in Figure 8.6. Both clearly show the presence of two separated classes of stars.

8.3.1 A Parametric Density Estimate for the Old Faithful Data

In the previous section we considered non-parametric kernel density estimators for the Old Faithful data. The estimators showed the clear bimodality of the data, and in this section this will be investigated further by fitting a parametric model based on a two-component normal mixture model. Such models are members of the class of finite mixture distributions described in great detail in McLachlan and Peel (2000).
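Before turning to the parametric model, the bimodality that motivates it can be reproduced with a short sketch of the kernel estimates described in Section 8.3. This is not the code shown with Figure 8.4. It assumes that the geyser waiting times are those in the faithful data set from R's datasets package, and the choice of the three kernels and the names x, kernels and estimates are illustrative assumptions.

```r
# Waiting times between eruptions (assumed to be the geyser data in the text)
data("faithful", package = "datasets")
x <- faithful$waiting

# Three kernels chosen for illustration; "gaussian" is the default of density()
kernels <- c("gaussian", "rectangular", "triangular")
estimates <- lapply(kernels, function(k) density(x, kernel = k))
names(estimates) <- kernels

# One plotting region per kernel: histogram, density estimate, then rug
layout(matrix(1:3, ncol = 3))
for (k in kernels) {
  hist(x, probability = TRUE, main = paste(k, "kernel"),
       xlab = "Waiting time (in min.)", ylab = "Frequency")
  lines(estimates[[k]])
  rug(x)
}
```

Setting probability = TRUE puts the histogram on the density scale, so the kernel estimate can be overlaid on it directly.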
The two-component normal mixture distribution was first considered by Karl Pearson over 100 years ago (Pearson, 1894) and is given explicitly by

$$f(x) = p\,\phi(x, \mu_1, \sigma_1^2) + (1 - p)\,\phi(x, \mu_2, \sigma_2^2)$$

where $\phi(x, \mu, \sigma^2)$ denotes a normal density with mean $\mu$ and variance $\sigma^2$. This distribution has five parameters to estimate: the mixing proportion, $p$, and the mean and variance of each component normal distribution. Pearson heroically attempted this by the method of moments, which required solving a polynomial equation of the ninth degree. Nowadays the preferred estimation approach is maximum likelihood. The following R code contains a function to calculate the relevant log-likelihood and then uses the optimiser optim to find values of the five parameters that minimise the negative log-likelihood.

R> logL
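The approach can be sketched as follows. This is a minimal illustration under stated assumptions, not the book's own listing: the names neg_loglik, start and fit, the starting values and the box constraints are all chosen for this example, and the waiting times are assumed to be those in the faithful data set from R's datasets package.

```r
# Old Faithful waiting times (assumed to be the data discussed in the text)
data("faithful", package = "datasets")
x <- faithful$waiting

# Negative log-likelihood of the two-component normal mixture.
# theta = c(p, mu1, sd1, mu2, sd2); note that dnorm() takes standard
# deviations, so the variances in the formula are sd1^2 and sd2^2.
neg_loglik <- function(theta, x) {
  d1 <- dnorm(x, mean = theta[2], sd = theta[3])
  d2 <- dnorm(x, mean = theta[4], sd = theta[5])
  -sum(log(theta[1] * d1 + (1 - theta[1]) * d2))
}

# Illustrative starting values read off the two modes of the histogram
start <- c(p = 0.5, mu1 = 50, sd1 = 3, mu2 = 80, sd2 = 3)

# Box constraints (an assumption) keep p inside (0, 1) and the
# standard deviations positive
fit <- optim(start, neg_loglik, x = x, method = "L-BFGS-B",
             lower = c(0.01, rep(1, 4)),
             upper = c(0.99, rep(200, 4)))

fit$par    # estimated p, mu1, sd1, mu2, sd2
fit$value  # minimised negative log-likelihood
```

Because optim minimises by default, the function returns the negative of the log-likelihood. The bounded L-BFGS-B method is one reasonable choice here since the mixing proportion and the standard deviations are constrained; other parameterisations, such as optimising over the logit of p and the log of each standard deviation, would avoid the need for bounds.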
