Applied nonparametric density and regression estimation with discrete data: plug-in bandwidth selection and non-geometric kernel functions
Bandwidth selection plays an important role in kernel density estimation. Least-squares cross-validation and plug-in methods are commonly used as bandwidth selectors in the continuous data setting. The former is a data-driven approach, whereas the latter requires a priori assumptions about the unknown distribution of the data. A benefit of the plug-in method is its relatively quick computation, and hence it is often used for preliminary analysis. However, we find that much less is known about the plug-in method in the discrete data setting, and this motivates us to propose a plug-in bandwidth selector for discrete data. A related issue is undersmoothing in kernel density estimation. Least-squares cross-validation is a popular bandwidth selector, but in many applied situations it tends to select a relatively small bandwidth, that is, it undersmooths. The literature suggests several methods to solve this problem, but most of them are modifications of extant error criteria for continuous variables. Here we discuss this problem in the discrete data setting and propose non-geometric discrete kernel functions as a possible solution. The same issue also occurs in kernel regression estimation. Our proposed bandwidth selector and kernel functions perform well on simulated and real data.
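To make the least-squares cross-validation baseline concrete, the following is a minimal sketch for a single unordered discrete variable. It is not the plug-in selector or the non-geometric kernels proposed here. As an assumption for illustration only, it uses the Aitchison-Aitken kernel, which puts weight 1 - lam on the observed category and lam / (c - 1) on each of the other c - 1 categories; the function names `aa_density`, `lscv` and `select_bandwidth` and the grid search are hypothetical choices.

```python
import numpy as np


def aa_density(counts, lam):
    """Kernel estimate of a pmf from category counts (Aitchison-Aitken kernel, assumed)."""
    counts = np.asarray(counts, dtype=float)
    n, c = counts.sum(), counts.size
    freq = counts / n
    # Each cell keeps (1 - lam) of its own frequency and receives
    # lam / (c - 1) of the frequency mass of every other cell.
    return (1.0 - lam) * freq + lam / (c - 1) * (1.0 - freq)


def lscv(counts, lam):
    """Least-squares cross-validation criterion for bandwidth lam."""
    counts = np.asarray(counts, dtype=float)
    n, c = counts.sum(), counts.size
    p_hat = aa_density(counts, lam)
    # Leave-one-out estimate evaluated at an observation lying in each cell.
    loo = ((counts - 1.0) * (1.0 - lam) + (n - counts) * lam / (c - 1)) / (n - 1.0)
    # Sum of squared estimates minus twice the average leave-one-out estimate.
    return np.sum(p_hat ** 2) - 2.0 / n * np.sum(counts * loo)


def select_bandwidth(counts, grid_size=201):
    """Grid search over the admissible range [0, (c - 1) / c] (hypothetical helper)."""
    c = len(counts)
    grid = np.linspace(0.0, (c - 1) / c, grid_size)
    scores = np.array([lscv(counts, lam) for lam in grid])
    return grid[np.argmin(scores)]
```

With lam = 0 the estimate reduces to the raw cell frequencies, and with lam = (c - 1) / c it becomes the uniform distribution, so a selected value near 0 corresponds to the undersmoothing behaviour discussed above.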