10.07.2015 Views

The np Package - NexTag Supports Open Source Initiatives



npudensbw

remin
    a logical value which, when set to TRUE, causes the search routine to restart from located minima for a minor gain in accuracy. Defaults to TRUE.

itmax
    integer number of iterations before failure in the numerical optimization routine. Defaults to 10000.

ftol
    tolerance on the value of the cross-validation function evaluated at located minima. Defaults to 1.19e-07 (FLT_EPSILON).

tol
    tolerance on the position of located minima of the cross-validation function. Defaults to 1.49e-08 (sqrt(DBL_EPSILON)).

small
    a small number, at about the precision of the data type used. Defaults to 2.22e-16 (DBL_EPSILON).

Details

npudensbw implements a variety of methods for choosing bandwidths for multivariate (p-variate) distributions defined over a set of possibly continuous and/or discrete (unordered, ordered) data. The approach is based on Li and Racine (2003), who employ ‘generalized product kernels’ that admit a mix of continuous and discrete datatypes.

The cross-validation methods employ multivariate numerical search algorithms (direction set (Powell’s) methods in multidimensions).

Bandwidths can (and will) differ for each variable, which is, of course, desirable.

Three classes of kernel estimators for the continuous datatypes are available: fixed, adaptive nearest-neighbor, and generalized nearest-neighbor. Adaptive nearest-neighbor bandwidths change with each sample realization in the set, x_i, when estimating the density at the point x. Generalized nearest-neighbor bandwidths change with the point at which the density is estimated, x. Fixed bandwidths are constant over the support of x.

npudensbw may be invoked either with a formula-like symbolic description of variables on which bandwidth selection is to be performed or through a simpler interface whereby data is passed directly to the function via the dat parameter. Use of these two interfaces is mutually exclusive.

Data contained in the data frame dat may be a mix of continuous (default), unordered discrete (to be specified in the data frame dat using factor), and ordered discrete (to be specified in the data frame dat using ordered). Data can be entered in an arbitrary order and data types will be detected automatically by the routine (see np for details).

Data for which bandwidths are to be estimated may be specified symbolically. A typical description has the form ~ data, where data is a series of variables specified by name, separated by the separation character ’+’. For example, ~ x + y specifies that the bandwidths for the joint distribution of variables x and y are to be estimated. See below for further examples.

A variety of kernels may be specified by the user. Kernels implemented for continuous datatypes include the second, fourth, sixth, and eighth order Gaussian and Epanechnikov kernels, and the uniform kernel. Unordered discrete datatypes use a variation on Aitchison and Aitken’s (1976) kernel, while ordered datatypes use a variation of the Wang and van Ryzin (1981) kernel.

Value

npudensbw returns a bandwidth object, with the following components:

bw
    bandwidth(s), scale factor(s) or nearest neighbours for the data, dat
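
The ‘generalized product kernel’ idea from the Details can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the np implementation. It assumes a fixed bandwidth, a second-order Gaussian kernel for the continuous variable, and the textbook Aitchison and Aitken and Wang and van Ryzin forms for the discrete variables; np uses variations of these kernels that may differ in detail. All function names, the toy sample, and the bandwidth values are hypothetical.

```python
import math


def gaussian_kernel(x, xi, h):
    """Second-order Gaussian kernel for a continuous variable (bandwidth h > 0)."""
    z = (x - xi) / h
    return math.exp(-0.5 * z * z) / (h * math.sqrt(2.0 * math.pi))


def aitchison_aitken_kernel(x, xi, lam, n_categories):
    """Textbook Aitchison-Aitken kernel for an unordered variable.

    lam = 0 reduces to an indicator function; larger lam smooths across
    categories. This is an assumed form, not necessarily np's variation.
    """
    return 1.0 - lam if x == xi else lam / (n_categories - 1)


def wang_van_ryzin_kernel(x, xi, lam):
    """Textbook Wang-van Ryzin kernel for an ordered, integer-coded variable."""
    if x == xi:
        return 1.0 - lam
    return 0.5 * (1.0 - lam) * lam ** abs(x - xi)


def product_kernel_density(point, sample, bandwidths, n_categories):
    """Fixed-bandwidth density estimate at point = (continuous, unordered, ordered).

    Each observation contributes the product of one kernel per variable,
    and each variable has its own bandwidth.
    """
    h, lam_unordered, lam_ordered = bandwidths
    xc, xu, xo = point
    total = 0.0
    for sc, su, so in sample:
        total += (
            gaussian_kernel(xc, sc, h)
            * aitchison_aitken_kernel(xu, su, lam_unordered, n_categories)
            * wang_van_ryzin_kernel(xo, so, lam_ordered)
        )
    return total / len(sample)


# Hypothetical toy data: (continuous, unordered factor, ordered level).
sample = [(0.1, "a", 0), (0.4, "b", 1), (-0.3, "a", 2), (1.2, "c", 1)]
density = product_kernel_density((0.0, "a", 1), sample, (0.5, 0.2, 0.3), n_categories=3)
```

In this sketch the three bandwidths are supplied by hand. Choosing them is the job of a bandwidth selector such as npudensbw, which searches for values that optimize a cross-validation criterion.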
