10.07.2015 Views

The np Package - NexTag Supports Open Source Initiatives



npplregbw

Details

npplregbw implements a variety of methods for nonparametric regression on multivariate (q-variate) explanatory data defined over a set of possibly continuous and/or discrete (unordered, ordered) data. The approach is based on Li and Racine (2003), who employ ‘generalized product kernels’ that admit a mix of continuous and discrete datatypes.

Three classes of kernel estimators for the continuous datatypes are available: fixed, adaptive nearest-neighbor, and generalized nearest-neighbor. Adaptive nearest-neighbor bandwidths change with each sample realization in the set, x_i, when estimating the density at the point x. Generalized nearest-neighbor bandwidths change with the point at which the density is estimated, x. Fixed bandwidths are constant over the support of x.

npplregbw may be invoked either with a formula-like symbolic description of variables on which bandwidth selection is to be performed or through a simpler interface whereby data is passed directly to the function via the xdat, ydat, and zdat parameters. Use of these two interfaces is mutually exclusive.

Data contained in the data frame zdat may be a mix of continuous (default), unordered discrete (to be specified in the data frame zdat using factor), and ordered discrete (to be specified in the data frame zdat using ordered). Data can be entered in an arbitrary order and data types will be detected automatically by the routine (see np for details).

Data for which bandwidths are to be estimated may be specified symbolically. A typical description has the form dependent data ~ parametric explanatory data | nonparametric explanatory data, where dependent data is a univariate response, and parametric explanatory data and nonparametric explanatory data are both series of variables specified by name, separated by the separation character '+'.
For example, y1 ~ x1 + x2 | z1 specifies that the bandwidth object for the partially linear model with response y1, linear parametric regressors x1 and x2, and nonparametric regressor z1 is to be estimated. See below for further examples.

A variety of kernels may be specified by the user. Kernels implemented for continuous datatypes include the second, fourth, sixth, and eighth order Gaussian and Epanechnikov kernels, and the uniform kernel. Unordered discrete datatypes use a variation on Aitchison and Aitken's (1976) kernel, while ordered datatypes use a variation of the Wang and van Ryzin (1981) kernel.

Value

If bwtype is set to fixed, an object containing bandwidths (or scale factors if bwscaling = TRUE) is returned. If it is set to generalized_nn or adaptive_nn, then instead the kth nearest neighbors are returned for the continuous variables while the discrete kernel bandwidths are returned for the discrete variables. Bandwidths are stored in a list under the component name bw. Each element is an rbandwidth object. The first element of the list corresponds to the regression of Y on Z. Each subsequent element is the bandwidth object corresponding to the regression of the ith column of X on Z. See examples for more information.

Usage Issues

If you are using data of mixed types, then it is advisable to use the data.frame function to construct your input data and not cbind, since cbind will typically not work as intended on mixed data types and will coerce the data to the same type.
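
The coercion described under Usage Issues can be seen with base R alone. The following is a minimal sketch, not taken from the package manual: the variable names (z_cont, z_unord, z_ord) and the toy values are hypothetical and chosen only for illustration.

```r
# Hypothetical toy data: one continuous, one unordered discrete,
# and one ordered discrete variable (names and values are illustrative).
z_cont  <- c(0.5, 1.2, 3.4, 2.2)
z_unord <- factor(c("a", "b", "a", "c"))
z_ord   <- ordered(c("low", "high", "mid", "low"),
                   levels = c("low", "mid", "high"))

# data.frame keeps each column's own type, so factor and ordered
# columns remain distinguishable from the continuous one.
zdat <- data.frame(z_cont, z_unord, z_ord)
col_classes <- sapply(zdat, function(col) class(col)[1])
print(col_classes)

# cbind builds a matrix, which can hold only one type: the factors
# are replaced by their integer codes and the type information is lost.
zmat <- cbind(z_cont, z_unord, z_ord)
print(typeof(zmat))
print(zmat)
```

A data frame built like zdat is the kind of object that would be passed through the zdat parameter; the matrix zmat no longer records which columns were unordered or ordered, so automatic type detection would have nothing to work with.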
