In Bishop's *Pattern Recognition and Machine Learning*, the K-nearest-neighbour (KNN) density model starts from the general density estimate:
$$p(x) = \frac{K}{NV}$$
where $K$ is the number of observed points falling in a region of volume $V$, out of a total of $N$ observations. The KNN model fixes $K$ and lets a sphere centred on a given point $x$ expand until it contains exactly $K$ datapoints; $V$ is then the volume of that sphere.
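To fix notation, here is a minimal 1-D sketch of the estimator as I understand it (`knn_density` is my own name; I take $V$ to be the length of the interval that just reaches the $K$th nearest point):

```python
import numpy as np

def knn_density(x, data, K):
    """KNN density estimate p(x) = K / (N * V), where V is the volume of the
    smallest sphere centered on x containing K datapoints (1-D here, so the
    'sphere' is an interval of length 2r)."""
    data = np.asarray(data, dtype=float)
    N = len(data)
    r = np.sort(np.abs(data - x))[K - 1]  # radius to the K-th nearest point
    V = 2.0 * r                           # 1-D 'sphere' volume
    return K / (N * V)

# One datapoint at 0 with K = 1: p(x) = 1 / (2|x|)
print(knn_density(2.0, [0.0], K=1))  # 0.25
```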
Exercise 2.61 in the text is where I'm stuck: show that the K-nearest-neighbour density model defines an improper distribution whose integral over all space is divergent.
I'm not sure how to set up the integral to show that the distribution is improper. I suspect there's a simple solution, but I'm having trouble getting traction on it. Any help would be greatly appreciated.
Edit: for $K=k$ the space is split into a $k$th-order Voronoi tessellation, which turns the full normalizing integral into a sum of integrals over the disjoint regions. Setting $K=1$ and $N=1$ (the simplest case), $V(x)$ is the volume of the $D$-ball of radius $\parallel x - p_c \parallel$, so $$ \int p(x)\,dx = \int \frac{dx}{V(x)} = \int \frac{\Gamma(D/2)\,D\,dx}{2\pi^{D/2}\parallel x - p_c \parallel^D} = \int_0^{\infty} \frac{D\,dr}{r}, $$ where the last step switches to spherical coordinates (the surface-area factor $2\pi^{D/2} r^{D-1}/\Gamma(D/2)$ cancels nearly everything) and $p_c$ is the closest point (our one observed point in my simple case).
I might have made a mistake above, but I read that radial integral as diverging at both ends: at $r \to 0$ because $p(x)$ blows up at the observed point (the sphere centred on $p_c$ with $p_c$ on its boundary has zero volume), and at $r \to \infty$ because the density only decays like $1/r$ along the radius. The singularity alone wouldn't be enough (an integrable singularity such as $1/\sqrt{r}$ would cause no trouble); it's the logarithmic divergence of $\int dr/r$ that makes the distribution improper for $K=1$. But look at $K=2$ in the case of $N=2$:
The sphere is now defined by the radius needed to enclose the farther point, so there's a boundary surface dividing the feature space in half, where every point on that surface is equidistant from the two datapoints. On each side of that surface, the integrand uses the volume determined by the distance to the far point on the other side, and by symmetry the two half-space integrals are equal. I have no idea how to even approximate that integral, but it at least doesn't have the singularity issue that comes up with $K=1$. No idea how to proceed from here.
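For what it's worth, a numerical experiment (my own toy setup: the two datapoints at $\pm 1$, so the second-neighbour radius from $x$ is $|x|+1$) suggests the $K=2$, $N=2$ integral still diverges — the truncated integral tracks $\ln(R+1)$ and keeps growing:

```python
import numpy as np

# Two datapoints at -1 and 1 with K = 2: the sphere around x must reach the
# farther point, so V = 2 * (|x| + 1) and p(x) = 2 / (2 * 2 * (|x| + 1)).
def p(x):
    return 1.0 / (2.0 * (np.abs(x) + 1.0))

masses = []
for R in (1e1, 1e3, 1e5):
    xs = np.linspace(-R, R, 400001)
    y = p(xs)
    masses.append(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(xs)))  # trapezoids
    print(f"R = {R:g}: integral over [-R, R] ≈ {masses[-1]:.3f}")
# The truncated integral tracks ln(R + 1), so it grows without bound.
```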
Suppose we have a single one-dimensional datapoint at $x=0$; i.e., $N=1$. Then the density at $x$ equals
$$ p(x) = \frac{K}{NV} = \frac{1}{2|x|}, $$ with $K=1$: the smallest interval centred on $x$ that contains the datapoint has length (volume) $V = 2|x|$.
It is easy to verify that $$ \int_{-\infty}^\infty p(x) \text{ d}x = \infty. $$
Hence, $p(x)$ is not a true distribution, which answers the question for $K=1$.
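As a sanity check, here is a short numerical sketch (the grid, the cutoff $\varepsilon$, and the factor 2 from taking $V=2|x|$ are my own choices; the constant in front does not affect divergence). The truncated integral grows like $\ln(R/\varepsilon)$:

```python
import numpy as np

# K = N = 1 with the datapoint at 0, using V = 2|x| for the 1-D interval
def p(x):
    return 1.0 / (2.0 * np.abs(x))

# Integrate over eps <= |x| <= R and compare with the closed form ln(R/eps)
eps, masses = 1e-3, []
for R in (1e1, 1e3, 1e6):
    xs = np.geomspace(eps, R, 200001)  # log-spaced grid on [eps, R]
    y = p(xs)
    # factor 2 covers both half-lines; inner sum is the trapezoid rule
    masses.append(2 * np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(xs)))
    print(f"R = {R:g}: mass ≈ {masses[-1]:.4f}, ln(R/eps) = {np.log(R/eps):.4f}")
```

The mass keeps increasing with $R$ (and would also blow up as $\varepsilon \to 0$), matching the divergent closed-form integral.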
For $K>1$, it is not necessary to compute the whole integral (i.e., over the entire domain), because the tails of the resulting density are already too heavy. Let $X_1,\ldots,X_N$ be our one-dimensional datapoints and, without loss of generality, assume $X_1 \leq X_2 \leq \ldots \leq X_N$. Then, for $x \leq X_1$, the $K$-th nearest neighbour of $x$ is $X_K$, and our "probability" equals
$$ p(x) = \frac{K}{2N(X_K-x)}, \quad x\leq X_1, $$
since the interval around $x$ that reaches $X_K$ has volume $2(X_K-x)$. Obviously, $p(x)$ takes different forms for other values of $x$, but it is always positive. We can compute the integral from $-\infty$ to $X_1$:
$$ \int_{-\infty}^{X_1} \frac{K}{2N(X_K-x)} \text{ d}x = \left[ -\frac{K}{2N} \ln (X_K-x) \right]_{-\infty}^{X_1} = \infty $$
Since $p(x)$ is positive everywhere, the total integral from $-\infty$ to $\infty$ is infinite too. The same argument works in higher dimensions; it is just a bit more cumbersome to write down.
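To illustrate the tail argument numerically (toy data and my own sketch; I use $V = 2(X_K-x)$ for the 1-D interval, which only changes the constant), the truncated left-tail mass keeps growing as the cutoff $T$ increases:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.sort(rng.normal(size=5))   # N = 5 sorted 1-D datapoints (toy data)
K, N = 3, len(X)

# Left tail: for x <= X[0] the K-th nearest neighbour of x is X[K-1], so
# with V = 2 * (X[K-1] - x) the density is p(x) = K / (2 N (X[K-1] - x))
masses = []
for T in (1e2, 1e4, 1e6):
    xs = np.linspace(-T, X[0], 500001)
    y = K / (2 * N * (X[K - 1] - xs))
    masses.append(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(xs)))  # trapezoids
    print(f"T = {T:g}: left-tail mass ≈ {masses[-1]:.3f}")
# masses grows like (K / 2N) * ln(T): the left tail alone has infinite mass
```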