10.07.2015 Views

To the Graduate Council: I am submitting herewith a thesis written by ...


Chapter 4: Algorithm Overview

The problem with using this closed-form solution is the dependence of the optimal bandwidth on the second derivative of the density function f that we are trying to compute. By using the Gaussian kernel for our implementation we have ensured the differentiability of the estimated density and also justified the reason for not choosing the naïve estimator or its rugged counterparts.

Two popular, quick and simple bandwidth selectors are based on the normal scale rule and the maximum smoothing principle. For example, an easy approach is to use a standard family of distributions to assign a value to the second-derivative term. In Equation 4.12 we assume a normal density and compute the second derivative. This method can lead to gross errors when the data are not distributed as assumed.

$$\int f''(x)^2\,dx = \sigma^{-5}\int \varphi''(x)^2\,dx \approx 0.212\,\sigma^{-5} \tag{4.16}$$

The rationale behind the principle of cross validation is to split the same dataset into two parts, a construction set and a training set. A model is fit assuming the correctness of the training dataset and is tested for accuracy with the construction dataset. The error in the estimate is minimized by defining a cost function of the error. Depending on how the cost function is constructed, the methods are named least squares cross validation, biased cross validation and likelihood cross validation.
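As a rough illustration of the two ideas above, the following sketch computes a normal scale bandwidth for a Gaussian kernel and then scores candidate bandwidths with a least squares cross validation criterion. It is a minimal sketch under stated assumptions, not the implementation used in this work: the function names (`normal_scale_bandwidth`, `lscv_score`, `select_bandwidth_lscv`), the candidate grid, and the use of the sample standard deviation as the scale are all illustrative choices. The criterion is the standard one, the integral of the squared estimate minus twice the average leave-one-out density, which has a closed form for the Gaussian kernel.

```python
import numpy as np


def gaussian_pdf(u, s):
    """Density of N(0, s^2) evaluated at u."""
    return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2.0 * np.pi))


def normal_scale_bandwidth(x):
    """Normal scale rule for a Gaussian kernel: h = (4 / (3 n))^(1/5) * sigma.

    Assumption: sigma is taken as the sample standard deviation.
    """
    x = np.asarray(x, dtype=float)
    return (4.0 / (3.0 * x.size)) ** 0.2 * np.std(x, ddof=1)


def lscv_score(x, h):
    """Least squares cross validation score for a Gaussian kernel.

    score(h) = integral of fhat^2  -  (2 / n) * sum_i fhat_{-i}(X_i)
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x[:, None] - x[None, :]  # all pairwise differences
    # Integral of fhat^2: two Gaussian kernels of width h convolve
    # into a Gaussian of width sqrt(2) * h.
    integral_sq = gaussian_pdf(d, np.sqrt(2.0) * h).sum() / n**2
    # Average leave-one-out density: drop the diagonal (i == j) terms.
    k = gaussian_pdf(d, h)
    leave_one_out = (k.sum() - np.trace(k)) / (n * (n - 1))
    return integral_sq - 2.0 * leave_one_out


def select_bandwidth_lscv(x, candidates):
    """Return the candidate with the smallest score, plus all scores."""
    candidates = np.asarray(candidates, dtype=float)
    scores = np.array([lscv_score(x, h) for h in candidates])
    return candidates[int(np.argmin(scores))], scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.normal(size=200)  # synthetic data, for illustration only
    h0 = normal_scale_bandwidth(sample)
    grid = np.linspace(0.25 * h0, 2.0 * h0, 40)  # search around h0
    h_cv, _ = select_bandwidth_lscv(sample, grid)
    print("normal scale:", h0, "LSCV:", h_cv)
```

Centering the search grid on the normal scale value is only a convenience here. The cross validation score is computed from the data alone, so it does not inherit the normality assumption behind Equation 4.16.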
More advanced bandwidth selectors are the plug-in and the bootstrap methods, which "plug in" estimates of the unknown quantities that appear in the formulae for the asymptotically optimal bandwidth. Bootstrap methods make use of a pilot bandwidth to initialize the density estimation process and improve the pilot bandwidth based on the data. In Equation 4.15, we show the plug-in method of bandwidth selection. Plug-in methods involve the estimation of the integrated squared density derivatives, called functionals.

$$\hat{h} = \left[\frac{243\,R(K)}{35\,\mu_2(K)^2\,n}\right]^{1/5}\sigma, \quad \text{where } R(K) = \int K(t)^2\,dt, \quad \mu_2(K) = \int t^2 K(t)\,dt \tag{4.17}$$

and $\sigma = \operatorname{med}_j\,|X_j - \operatorname{med}_i(X_i)|$ is the median absolute deviation. (4.18)

We discuss implementation issues in the next chapter. Our next building block is the information measure on the accurate density of curvature estimated using the bandwidth-optimized kernel density estimators.
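The bandwidth of Equation 4.17 with the scale of Equation 4.18 can be sketched as follows for a Gaussian kernel. This is an illustrative sketch rather than the implementation discussed in the next chapter: the function names are hypothetical, and the kernel constants R(K) = 1/(2√π) and µ2(K) = 1 are the standard values for the Gaussian kernel, passed as default arguments so that another kernel could be substituted. No consistency factor is applied to the median absolute deviation, since Equation 4.18 does not state one.

```python
import numpy as np


def median_absolute_deviation(x):
    """Scale estimate of Equation 4.18: med_j |X_j - med_i(X_i)|."""
    x = np.asarray(x, dtype=float)
    return float(np.median(np.abs(x - np.median(x))))


def bandwidth_eq_4_17(x, r_k=1.0 / (2.0 * np.sqrt(np.pi)), mu2=1.0):
    """Bandwidth of Equation 4.17.

    h = [243 R(K) / (35 mu2(K)^2 n)]^(1/5) * sigma

    Defaults are the Gaussian kernel constants (an assumption made for
    this sketch): R(K) = 1 / (2 sqrt(pi)) and mu2(K) = 1.
    """
    x = np.asarray(x, dtype=float)
    sigma = median_absolute_deviation(x)
    return (243.0 * r_k / (35.0 * mu2**2 * x.size)) ** 0.2 * sigma


def gaussian_kde(points, x, h):
    """Gaussian kernel density estimate of sample x evaluated at points."""
    points = np.asarray(points, dtype=float)
    x = np.asarray(x, dtype=float)
    u = (points[:, None] - x[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (x.size * h * np.sqrt(2.0 * np.pi))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sample = rng.normal(loc=2.0, scale=0.5, size=300)  # synthetic data
    h = bandwidth_eq_4_17(sample)
    density = gaussian_kde(np.linspace(0.0, 4.0, 9), sample, h)
    print("bandwidth:", h)
    print("density on grid:", density)
```

Because the scale is a median-based statistic, a few extreme observations have little effect on the resulting bandwidth, unlike a scale based on the sample standard deviation.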
