10.07.2015 Views

The np Package - NexTag Supports Open Source Initiatives

npcdensbw

data. The approach is based on Li and Racine (2004) who employ 'generalized product kernels' that admit a mix of continuous and discrete datatypes.

The cross-validation methods employ multivariate numerical search algorithms (direction set (Powell's) methods in multidimensions).

Bandwidths can (and will) differ for each variable which is, of course, desirable.

Three classes of kernel estimators for the continuous datatypes are available: fixed, adaptive nearest-neighbor, and generalized nearest-neighbor. Adaptive nearest-neighbor bandwidths change with each sample realization in the set, x_i, when estimating the density at the point x. Generalized nearest-neighbor bandwidths change with the point at which the density is estimated, x. Fixed bandwidths are constant over the support of x.

npcdensbw may be invoked either with a formula-like symbolic description of variables on which bandwidth selection is to be performed or through a simpler interface whereby data is passed directly to the function via the xdat and ydat parameters. Use of these two interfaces is mutually exclusive.

Data contained in the data frames xdat and ydat may be a mix of continuous (default), unordered discrete (to be specified in the data frames using factor), and ordered discrete (to be specified in the data frames using ordered). Data can be entered in an arbitrary order and data types will be detected automatically by the routine (see np for details).

Data for which bandwidths are to be estimated may be specified symbolically. A typical description has the form dependent data ~ explanatory data, where dependent data and explanatory data are both series of variables specified by name, separated by the separation character '+'. For example, y1 + y2 ~ x1 + x2 specifies that the bandwidths for the joint distribution of variables y1 and y2 conditioned on x1 and x2 are to be estimated. See below for further examples.

A variety of kernels may be specified by the user. Kernels implemented for continuous datatypes include the second, fourth, sixth, and eighth order Gaussian and Epanechnikov kernels, and the uniform kernel. Unordered discrete datatypes use a variation on Aitchison and Aitken's (1976) kernel, while ordered datatypes use a variation of the Wang and van Ryzin (1981) kernel.

Value

npcdensbw returns a conbandwidth object, with the following components:

xbw     bandwidth(s), scale factor(s) or nearest neighbours for the explanatory data, xdat
ybw     bandwidth(s), scale factor(s) or nearest neighbours for the dependent data, ydat
fval    objective function value at minimum

If bwtype is set to fixed, an object containing bandwidths (or scale factors if bwscaling = TRUE) is returned. If it is set to generalized_nn or adaptive_nn, then instead the kth nearest neighbors are returned for the continuous variables while the discrete kernel bandwidths are returned for the discrete variables.

The functions summary and plot support objects of type conbandwidth.
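
The difference between the fixed, generalized nearest-neighbor, and adaptive nearest-neighbor bandwidth types described above can be illustrated with a small univariate sketch. This is not the np implementation, which handles multivariate mixed data and selects bandwidths by cross-validation. It is a minimal Python illustration that assumes a second-order Gaussian kernel and takes the distance to the kth nearest neighbor as the bandwidth. All function names are hypothetical and chosen only for this example.

```python
import math


def gaussian(u):
    """Second-order Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kth_nn_distance(point, sample, k):
    """Distance from `point` to its k-th nearest neighbour in `sample`.

    Assumption for this sketch: the result is non-zero, i.e. fewer than k
    sample values coincide with `point`.
    """
    return sorted(abs(point - s) for s in sample)[k - 1]


def density_fixed(x, sample, h):
    """Fixed bandwidth: a single h is used everywhere on the support of x."""
    return sum(gaussian((x - xi) / h) for xi in sample) / (len(sample) * h)


def density_generalized_nn(x, sample, k):
    """Generalized NN: the bandwidth changes with the evaluation point x."""
    h = kth_nn_distance(x, sample, k)
    return sum(gaussian((x - xi) / h) for xi in sample) / (len(sample) * h)


def density_adaptive_nn(x, sample, k):
    """Adaptive NN: the bandwidth changes with each sample point x_i."""
    total = 0.0
    for i, xi in enumerate(sample):
        others = sample[:i] + sample[i + 1:]  # exclude x_i itself
        h_i = kth_nn_distance(xi, others, k)
        total += gaussian((x - xi) / h_i) / h_i
    return total / len(sample)


# Illustrative data only: dense near 0.5, sparse near 3.
sample = [0.1, 0.4, 0.5, 0.9, 1.6, 2.0, 3.5]
for x in (0.5, 3.0):
    print(x,
          density_fixed(x, sample, h=0.5),
          density_generalized_nn(x, sample, k=3),
          density_adaptive_nn(x, sample, k=2))
```

In this sketch the generalized nearest-neighbor bandwidth is wider at x = 3.0 than at x = 0.5, because the third nearest sample point is farther away where the data are sparse. The fixed estimator applies the same h at both points.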
