Using kernel density estimation to solve a regression problem, and I don't know where to find literature


I am researching a regression problem and have the following idea:

Given training data $T=\{(x_1,y_1),\ldots,(x_N,y_N)\}$ and a query point $x^*$, use a parameterized kernel function to obtain the distribution of $y^*$:

$$p(y^*|T,x^*) \propto \sum_{i=1}^N k_\theta(x_i-x^*,y_i-y^*)$$

and then optimize $\theta$ (perhaps parameterized by a neural network) to maximize this probability over a validation dataset.
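For concreteness, here is a minimal NumPy sketch of the idea, assuming a product Gaussian kernel for $k_\theta$ with hand-picked bandwidths (a stand-in for a learned kernel, not the proposed neural parameterization):

```python
import numpy as np

def conditional_density(x_star, y_grid, X, Y, hx=0.5, hy=0.5):
    """p(y*|x*) proportional to sum_i k(x_i - x*, y_i - y*), here with a
    product Gaussian kernel; hx and hy are illustrative bandwidths."""
    wx = np.exp(-0.5 * ((X[:, None] - x_star) / hx) ** 2)           # (N, 1)
    wy = np.exp(-0.5 * ((Y[:, None] - y_grid[None, :]) / hy) ** 2)  # (N, G)
    p = (wx * wy).sum(axis=0)                                       # (G,)
    return p / (p.sum() * (y_grid[1] - y_grid[0]))  # normalize on the grid

# Toy data: y = sin(x) + noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, 200)
Y = np.sin(X) + 0.2 * rng.normal(size=200)
y_grid = np.linspace(-2.0, 2.0, 401)
p = conditional_density(1.0, y_grid, X, Y)
print(y_grid[np.argmax(p)])  # mode of p(y*|x*=1), roughly sin(1)
```

Optimizing $\theta$ would then amount to maximizing the log of this conditional density over held-out $(x^*, y^*)$ pairs.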

Are there any papers with a similar idea? I have read some papers on Gaussian process regression and kernel density estimation, but their methods seem somewhat different from this idea (though I'm not very familiar with GPs, so I am not sure).

Any suggestions? (Perhaps some keywords I can search Google Scholar with?)

I think your idea is similar in spirit to kernel regression (also known as the Nadaraya–Watson model). Bishop gives a probabilistic treatment of the model in his textbook (§6.3.1). More precisely, he shows that you can use a density estimation model for the joint distribution of inputs $\mathbf{x}$ and targets $y$:
$$ p(\mathbf{x}, y) = \frac{1}{N} \sum_{i=1}^N f(\mathbf{x} - \mathbf{x}_i, y - y_i). $$
The conditional distribution then takes the form you indicated (since $p(y|\mathbf{x}) \propto p(\mathbf{x}, y)$), and the mean of the conditional distribution is
$$ \mathsf{E}[y|\mathbf{x}] = \sum_{i=1}^N k(\mathbf{x} - \mathbf{x}_i)\, y_i, $$
with the (normalized) kernel $k$ derived from the density function $f$.
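As a quick illustration, here is the Nadaraya–Watson estimator with an isotropic Gaussian kernel (bandwidth chosen by hand rather than learned; the Gaussian normalizers cancel in the ratio, so only the unnormalized weights are needed):

```python
import numpy as np

def nadaraya_watson(x_star, X, Y, h=0.3):
    """E[y|x*] as a kernel-weighted average of the training targets."""
    w = np.exp(-0.5 * ((X - x_star) / h) ** 2)  # unnormalized Gaussian weights
    return (w * Y).sum() / w.sum()

# Toy data: y = sin(x) + noise.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 2.0 * np.pi, 300)
Y = np.sin(X) + 0.1 * rng.normal(size=300)
print(nadaraya_watson(np.pi / 2, X, Y))  # roughly sin(pi/2) = 1
```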

A common choice of kernel is the zero-mean isotropic Gaussian, but, as you suggest, more recent work has explored learning parameterized kernels, a common parameterization being in terms of a Mahalanobis metric [1, 2].
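In the same spirit as the question, kernel parameters can be fit by maximizing the held-out conditional log-likelihood. This sketch uses a single shared Gaussian bandwidth as a one-parameter stand-in for a learned Mahalanobis metric, selected by grid search rather than gradient descent:

```python
import numpy as np

def cond_loglik(h, X_tr, Y_tr, X_va, Y_va):
    """Mean held-out log p(y|x) under a product Gaussian kernel with
    shared bandwidth h; the x-normalizers cancel in the ratio."""
    dx = (X_va[:, None] - X_tr[None, :]) / h  # (M, N)
    dy = (Y_va[:, None] - Y_tr[None, :]) / h  # (M, N)
    joint = np.exp(-0.5 * (dx**2 + dy**2)).sum(axis=1)  # ~ p(x, y)
    marg = np.exp(-0.5 * dx**2).sum(axis=1)             # ~ p(x)
    # p(y|x) = joint / (marg * Z_y) with Gaussian normalizer Z_y = h*sqrt(2*pi)
    return np.mean(np.log(joint / marg) - np.log(h * np.sqrt(2.0 * np.pi)))

# Toy data split into training and validation halves.
rng = np.random.default_rng(2)
X = rng.uniform(-3.0, 3.0, 400)
Y = np.sin(X) + 0.2 * rng.normal(size=400)
X_tr, Y_tr, X_va, Y_va = X[:300], Y[:300], X[300:], Y[300:]

grid = np.linspace(0.05, 1.0, 20)
best = grid[np.argmax([cond_loglik(h, X_tr, Y_tr, X_va, Y_va) for h in grid])]
print(best)  # a moderate bandwidth, reflecting the noise scale
```

Replacing the scalar `h` with a full metric matrix (or a neural network producing one) and the grid search with a gradient-based optimizer recovers the scheme proposed in the question.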

Another related line of work is locally weighted learning; see this survey and perhaps §6.3 of The Elements of Statistical Learning.