10.07.2015 Views

The np Package - NexTag Supports Open Source Initiatives

npcdensbw

Usage Issues

If you are using data of mixed types, then it is advisable to use the data.frame function to construct your input data and not cbind, since cbind will typically not work as intended on mixed data types and will coerce the data to the same type.

Caution: multivariate data-driven bandwidth selection methods are, by their nature, computationally intensive. Virtually all methods require dropping the ith observation from the data set, computing an object, repeating this for all observations in the sample, then averaging each of these leave-one-out estimates for a given value of the bandwidth vector, and only then repeating this a large number of times in order to conduct multivariate numerical minimization/maximization. Furthermore, due to the potential for local minima/maxima, restarting this procedure a large number of times may often be necessary. This can be frustrating for users possessing large datasets.

For exploratory purposes, you may wish to override the default search tolerances, say, setting ftol=.01 and tol=.01 and conduct multistarting (the default is to restart min(5, ncol(xdat,ydat)) times) as is done for a number of examples. Once the procedure terminates, you can restart search with default tolerances using those bandwidths obtained from the less rigorous search (i.e., set bws=bw on subsequent calls to this routine where bw is the initial bandwidth object).

A version of this package using the Rmpi wrapper is under development that allows one to deploy this software in a clustered computing environment to facilitate computation involving large datasets.

Author(s)

Tristen Hayfield <hayfield@phys.ethz.ch>, Jeffrey S. Racine <racinej@mcmaster.ca>

References

Aitchison, J. and C.G.G. Aitken (1976), “Multivariate binary discrimination by the kernel method,” Biometrika, 63, 413-420.

Hall, P. and J.S. Racine and Q. Li (2004), “Cross-validation and the estimation of conditional probability densities,” Journal of the American Statistical Association, 99, 1015-1026.

Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.

Pagan, A. and A. Ullah (1999), Nonparametric Econometrics, Cambridge University Press.

Scott, D.W. (1992), Multivariate Density Estimation. Theory, Practice and Visualization, New York: Wiley.

Silverman, B.W. (1986), Density Estimation, London: Chapman and Hall.

Wang, M.C. and J. van Ryzin (1981), “A class of smooth estimators for discrete distributions,” Biometrika, 68, 301-309.

See Also

bw.nrd, bw.SJ, hist, npudens, npudist

Examples

# EXAMPLE 1 (INTERFACE=FORMULA): For this example, we compute the
# likelihood cross-validated bandwidths (default) using a second-order
# Gaussian kernel (default). Note - this may take a minute or two
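
The Usage Issues advice on mixed data types can be checked with a short base R sketch. The variables x_num and x_fac are made up for illustration and are not part of the package's own examples.

```r
# Hypothetical mixed-type data: one continuous and one categorical variable.
x_num <- c(1.5, 2.0, 3.25)
x_fac <- factor(c("low", "high", "low"))

# cbind() builds a matrix, which can hold only one type, so the factor is
# reduced to its integer codes and its labels are lost.
m <- cbind(x_num, x_fac)

# data.frame() keeps each column's own type.
d <- data.frame(x_num, x_fac)

is.numeric(m)      # TRUE: the whole matrix is numeric
sapply(d, class)   # x_num is "numeric", x_fac is "factor"
```

Because the bandwidth routines choose a kernel according to each variable's type, passing the data frame d rather than the matrix m keeps the categorical variable from being treated as continuous.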
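
The leave-one-out cost and the coarse-then-refine strategy described under Usage Issues can be sketched in base R for the simplest case. This is a univariate analogue under stated assumptions, not the package's implementation: loo_loglik is a hypothetical helper, the sample is simulated, and base R's optimize stands in for the multivariate optimizer, with its tol argument playing the role of the looser search tolerance.

```r
# Leave-one-out log-likelihood of a univariate Gaussian kernel density
# estimate with bandwidth h (a simplified stand-in, not np's own code).
loo_loglik <- function(h, x) {
  n <- length(x)
  k <- dnorm(outer(x, x, "-") / h) / h   # n x n matrix of kernel weights
  diag(k) <- 0                           # drop observation i from its own estimate
  f_loo <- rowSums(k) / (n - 1)          # leave-one-out density at each x[i]
  sum(log(f_loo))
}

set.seed(42)
x <- rnorm(200)   # hypothetical sample

# Stage 1: exploratory search with a loose tolerance.
coarse <- optimize(loo_loglik, interval = c(0.05, 2), x = x,
                   maximum = TRUE, tol = 0.01)

# Stage 2: restart near the coarse bandwidth with the default tolerance.
h0 <- coarse$maximum
fine <- optimize(loo_loglik, interval = c(h0 / 2, h0 * 2), x = x,
                 maximum = TRUE)
```

Each evaluation of loo_loglik already touches every pair of observations, which is why repeated evaluations inside an optimizer become expensive as the sample grows. With the package itself, the second stage corresponds to passing the first bandwidth object through bws=bw, as the text describes.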
