10.07.2015 Views

The np Package - NexTag Supports Open Source Initiatives



A variety of bandwidth methods are implemented, including fixed, nearest-neighbor, and adaptive nearest-neighbor.

A variety of data-driven methods of bandwidth selection are implemented, while the user can specify their own bandwidths should they so choose (either a raw bandwidth or scaling factor).

A flexible plotting utility, npplot, facilitates graphing of multivariate objects. An example for creating postscript graphs using the npplot utility and pulling this into a LaTeX document is provided.

The function npksum allows users to create or implement their own kernel estimators or tests should they so desire.

The underlying functions are written in C for computational efficiency. Despite this, due to their nature, data-driven bandwidth selection methods involving multivariate numerical search can be time-consuming, particularly for large datasets. A version of this package using the Rmpi wrapper is under development that allows one to deploy this software in a clustered computing environment to facilitate computation involving large datasets.

To cite the np package, type citation("np") from within R for details.

Details

The kernel methods in np employ the so-called ‘generalized product kernels’ found in Hall, Racine, and Li (2004), Li and Racine (2003), Li and Racine (2004), Li and Racine (2007), Ouyang, Li, and Racine (2006), and Racine and Li (2004), among others. For details on a particular method, kindly refer to the original references listed above.

We briefly describe the particulars of various univariate kernels used to generate the generalized product kernels that underlie the kernel estimators implemented in the np package.
In a nutshell, the generalized kernel functions that underlie the kernel estimators in np are formed by taking the product of univariate kernels such as those listed below. When you cast your data as a particular type (continuous, factor, or ordered factor) in a data frame or formula, the routines will automatically recognize the type of variable being modelled and use the appropriate kernel type for each variable in the resulting estimator.

Second Order Gaussian (x is continuous): $k(z) = \exp(-z^2/2)/\sqrt{2\pi}$, where $z = (x_i - x)/h$ and $h > 0$.

Second Order Epanechnikov (x is continuous): $k(z) = 3(1 - z^2/5)/(4\sqrt{5})$ if $z^2 < 5$, 0 otherwise, where $z = (x_i - x)/h$ and $h > 0$.

Uniform (x is continuous): $k(z) = 1/2$ if $|z| < 1$, 0 otherwise, where $z = (x_i - x)/h$ and $h > 0$.

Aitchison and Aitken (x is a (discrete) factor): $l(x_i, x, \lambda) = 1 - \lambda$ if $x_i = x$, and $\lambda/(c - 1)$ if $x_i \neq x$, where $c$ is the number of (discrete) outcomes assumed by the factor x. Note that $\lambda$ must lie between 0 and $(c - 1)/c$.

Wang and van Ryzin (x is a (discrete) ordered factor): $l(x_i, x, \lambda) = 1 - \lambda$ if $|x_i - x| = 0$, and $(1 - \lambda)\lambda^{|x_i - x|}/2$ if $|x_i - x| \geq 1$. Note that $\lambda$ must lie between 0 and 1.

Li and Racine (x is a (discrete) factor): $l(x_i, x, \lambda) = 1$ if $x_i = x$, and $\lambda$ if $x_i \neq x$. Note that $\lambda$ must lie between 0 and 1.

Li and Racine (x is a (discrete) ordered factor): $l(x_i, x, \lambda) = 1$ if $|x_i - x| = 0$, and $\lambda^{|x_i - x|}$ if $|x_i - x| \geq 1$. Note that $\lambda$ must lie between 0 and 1.
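
The univariate kernels listed above can be written out directly in base R, which may help when checking a formula by hand. The following is a minimal sketch only: every function name here (k_gaussian, l_aitchison_aitken, product_weight, and so on) is hypothetical and is not part of the np package, ordered factors are assumed to be coded as integers, and the product weight shown omits any normalization an actual estimator would apply.

```r
# Illustrative base-R versions of the univariate kernels described above.
# All function names are hypothetical; none of them belong to the np package.

# Continuous kernels: z = (x_i - x) / h with h > 0.
k_gaussian <- function(z) exp(-z^2 / 2) / sqrt(2 * pi)
k_epanechnikov <- function(z) {
  ifelse(z^2 < 5, 3 * (1 - z^2 / 5) / (4 * sqrt(5)), 0)
}
k_uniform <- function(z) ifelse(abs(z) < 1, 1 / 2, 0)

# Unordered discrete kernels: n_outcomes is c, the number of outcomes of x.
l_aitchison_aitken <- function(xi, x, lambda, n_outcomes) {
  ifelse(xi == x, 1 - lambda, lambda / (n_outcomes - 1))
}
l_li_racine_unordered <- function(xi, x, lambda) ifelse(xi == x, 1, lambda)

# Ordered discrete kernels: xi and x are assumed to be integer codes.
l_wang_van_ryzin <- function(xi, x, lambda) {
  d <- abs(xi - x)
  ifelse(d == 0, 1 - lambda, (1 - lambda) * lambda^d / 2)
}
l_li_racine_ordered <- function(xi, x, lambda) lambda^abs(xi - x)

# Unnormalized product of one continuous and one unordered discrete kernel
# (an assumption made for illustration, not the package's internal code).
product_weight <- function(xc_i, xc, h, xd_i, xd, lambda) {
  k_gaussian((xc_i - xc) / h) * l_li_racine_unordered(xd_i, xd, lambda)
}
```

For example, product_weight(1.2, 1.0, h = 0.5, xd_i = "a", xd = "b", lambda = 0.1) multiplies the Gaussian weight at z = 0.4 by the mismatch weight 0.1. In the package itself this composition is handled automatically from the declared variable types, as the text above describes.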
