In Bishop's *Pattern Recognition and Machine Learning*, the K-nearest-neighbour (KNN) density model starts from the general density estimate:
$$p(x) = \frac{K}{NV}$$
where $K$ is the number of observed points falling in a region of volume $V$, out of a total of $N$ observations. The KNN model fixes $K$ and lets a sphere centred on a given point $x$ expand until it contains exactly $K$ datapoints; $V$ is then the volume of that sphere.
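To fix notation, here is a minimal 1-D sketch of the estimator as I understand it (`knn_density` is my own name; I take $V$ to be the length of the interval that just reaches the $K$th nearest point):

```python
import numpy as np

def knn_density(x, data, K):
    """KNN density estimate p(x) = K / (N * V), where V is the volume of the
    smallest sphere centered on x containing K datapoints (1-D here, so the
    'sphere' is an interval of length 2r)."""
    data = np.asarray(data, dtype=float)
    N = len(data)
    r = np.sort(np.abs(data - x))[K - 1]  # radius to the K-th nearest point
    V = 2.0 * r                           # 1-D 'sphere' volume
    return K / (N * V)

# One datapoint at 0 with K = 1: p(x) = 1 / (2|x|)
print(knn_density(2.0, [0.0], K=1))  # 0.25
```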
Exercise 2.61 in the text is where I'm stuck: show that the K-nearest-neighbour density model defines an improper distribution whose integral over all space is divergent.
I'm not sure how to set up the integral to show that the distribution is improper. I suspect there's a simple solution, but I'm having trouble getting traction on it. Any help would be greatly appreciated.
Edit: for $K=k$ the space is split into a $k$th-order Voronoi tessellation, which turns the full normalizing integral into a sum of integrals over the disjoint regions. Setting $K=1$ and $N=1$ (the simplest case), $V(x)$ is the volume of the $D$-ball of radius $\parallel x - p_c \parallel$, so $$ \int p(x)\,dx = \int \frac{dx}{V(x)} = \int \frac{\Gamma(D/2)\,D\,dx}{2\pi^{D/2}\parallel x - p_c \parallel^D} = \int_0^{\infty} \frac{D\,dr}{r}, $$ where the last step switches to spherical coordinates (the surface-area factor $2\pi^{D/2} r^{D-1}/\Gamma(D/2)$ cancels nearly everything) and $p_c$ is the closest point (our one observed point in my simple case).
I might have made a mistake above, but I read that radial integral as diverging at both ends: at $r \to 0$ because $p(x)$ blows up at the observed point (the sphere centred on $p_c$ with $p_c$ on its boundary has zero volume), and at $r \to \infty$ because the density only decays like $1/r$ along the radius. The singularity alone wouldn't be enough (an integrable singularity such as $1/\sqrt{r}$ would cause no trouble); it's the logarithmic divergence of $\int dr/r$ that makes the distribution improper for $K=1$. But look at $K=2$ in the case of $N=2$:
The sphere is now defined by the radius needed to enclose the farther point, so there's a boundary surface dividing the feature space in half, where every point on that surface is equidistant from the two datapoints. On each side of that surface, the integrand uses the volume determined by the distance to the far point on the other side, and by symmetry the two half-space integrals are equal. I have no idea how to even approximate that integral, but it at least doesn't have the singularity issue that comes up with $K=1$. No idea how to proceed from here.
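For what it's worth, a numerical experiment (my own toy setup: the two datapoints at $\pm 1$, so the second-neighbour radius from $x$ is $|x|+1$) suggests the $K=2$, $N=2$ integral still diverges — the truncated integral tracks $\ln(R+1)$ and keeps growing:

```python
import numpy as np

# Two datapoints at -1 and 1 with K = 2: the sphere around x must reach the
# farther point, so V = 2 * (|x| + 1) and p(x) = 2 / (2 * 2 * (|x| + 1)).
def p(x):
    return 1.0 / (2.0 * (np.abs(x) + 1.0))

masses = []
for R in (1e1, 1e3, 1e5):
    xs = np.linspace(-R, R, 400001)
    y = p(xs)
    masses.append(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(xs)))  # trapezoids
    print(f"R = {R:g}: integral over [-R, R] ≈ {masses[-1]:.3f}")
# The truncated integral tracks ln(R + 1), so it grows without bound.
```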
Suppose we have a single one-dimensional datapoint at $x=0$; i.e., $N=1$. Then the density at $x$ equals
$$ p(x) = \frac{K}{NV} = \frac{1}{2|x|}, $$ with $K=1$: the smallest interval centred on $x$ that contains the datapoint has length (volume) $V = 2|x|$.
It is easy to verify that $$ \int_{-\infty}^\infty p(x) \text{ d}x = \infty. $$
Hence, $p(x)$ is not a true distribution, which answers the question for $K=1$.
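As a sanity check, here is a short numerical sketch (the grid, the cutoff $\varepsilon$, and the factor 2 from taking $V=2|x|$ are my own choices; the constant in front does not affect divergence). The truncated integral grows like $\ln(R/\varepsilon)$:

```python
import numpy as np

# K = N = 1 with the datapoint at 0, using V = 2|x| for the 1-D interval
def p(x):
    return 1.0 / (2.0 * np.abs(x))

# Integrate over eps <= |x| <= R and compare with the closed form ln(R/eps)
eps, masses = 1e-3, []
for R in (1e1, 1e3, 1e6):
    xs = np.geomspace(eps, R, 200001)  # log-spaced grid on [eps, R]
    y = p(xs)
    # factor 2 covers both half-lines; inner sum is the trapezoid rule
    masses.append(2 * np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(xs)))
    print(f"R = {R:g}: mass ≈ {masses[-1]:.4f}, ln(R/eps) = {np.log(R/eps):.4f}")
```

The mass keeps increasing with $R$ (and would also blow up as $\varepsilon \to 0$), matching the divergent closed-form integral.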
For $K>1$, it is not necessary to compute the whole integral (i.e., over the entire domain), because the tails of the resulting density are already too heavy. Let $X_1,\ldots,X_N$ be our one-dimensional datapoints and, without loss of generality, assume $X_1 \leq X_2 \leq \ldots \leq X_N$. Then, for $x \leq X_1$, the $K$-th nearest neighbour of $x$ is $X_K$, and our "probability" equals
$$ p(x) = \frac{K}{2N(X_K-x)}, \quad x\leq X_1, $$
since the interval around $x$ that reaches $X_K$ has volume $2(X_K-x)$. Obviously, $p(x)$ takes different forms for other values of $x$, but it is always positive. We can compute the integral from $-\infty$ to $X_1$:
$$ \int_{-\infty}^{X_1} \frac{K}{2N(X_K-x)} \text{ d}x = \left[ -\frac{K}{2N} \ln (X_K-x) \right]_{-\infty}^{X_1} = \infty $$
Since $p(x)$ is positive everywhere, the total integral from $-\infty$ to $\infty$ is infinite too. The same argument works in higher dimensions; it is just a bit more cumbersome to write down.
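To illustrate the tail argument numerically (toy data and my own sketch; I use $V = 2(X_K-x)$ for the 1-D interval, which only changes the constant), the truncated left-tail mass keeps growing as the cutoff $T$ increases:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.sort(rng.normal(size=5))   # N = 5 sorted 1-D datapoints (toy data)
K, N = 3, len(X)

# Left tail: for x <= X[0] the K-th nearest neighbour of x is X[K-1], so
# with V = 2 * (X[K-1] - x) the density is p(x) = K / (2 N (X[K-1] - x))
masses = []
for T in (1e2, 1e4, 1e6):
    xs = np.linspace(-T, X[0], 500001)
    y = K / (2 * N * (X[K - 1] - xs))
    masses.append(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(xs)))  # trapezoids
    print(f"T = {T:g}: left-tail mass ≈ {masses[-1]:.3f}")
# masses grows like (K / 2N) * ln(T): the left tail alone has infinite mass
```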