I am going through a book on statistical learning and ran into a problem concerning k nearest neighbor methods. The book says that using least squares to determine the optimal $k$ will lead to $k=1$. I tried proving this, but I am not sure how to differentiate the summation part, i.e. $\sum_{x_{i} \in N_{k}(x)} Y_{i}$, since $k$ is involved in the summation.
To help me out, could someone please show me how to differentiate the kNN formula itself with respect to $k$?
Thanks! PS: the formula for kNN is:
$\hat{Y}(x)=\frac{1}{k}\sum_{x_{i} \in N_{k}(x)} Y_{i}$
Here is a rough sketch of what I have in mind. Note that it is only one way of formulating the problem.
Firstly, note that a real machine learning problem is a discrete problem, in the sense that some data points with their corresponding labels (classes) are given to you, and you have to guess the label of a new data point according to the rule
$$\hat{Y}(X)=\frac{1}{k}\sum_{X_{i} \in N_{k}(X)} Y_{i}$$
So we are given labeled data points $T=\{(X_i,Y_i)\mid X_i \in \mathbb{R}^n,\ Y_i \in\{0,1,\dots,c-1\}\}$, where we have $c$ classes.
At this point, we would like to define a continuous function $f:\mathbb{R}^n \rightarrow\mathbb{R}$ such that its floor, $\lfloor f(X)\rfloor$, gives the correct class whenever a point $X_i$ from the set $T$ is given as input.
For any optimisation, you need to define a cost function; one possibility is the following.
$$Cost=\sum_{i}\Big(\hat{Y}(X_i)-\lfloor f(X_i)\rfloor\Big)^2=\sum_{i}\bigg(\frac{1}{k}\sum_{X_{j} \in N_{k}(X_i)} Y_{j}-\lfloor f(X_i)\rfloor\bigg)^2$$
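To see the claim numerically before passing to the continuum, here is a small Python sketch (my own illustration, not from the original post): evaluate a 1-D kNN regressor on the very points it was fit on, and the least-squares cost is minimised at $k=1$, where it is exactly zero.

```python
import random

random.seed(0)
X = [random.uniform(0.0, 10.0) for _ in range(30)]   # 1-D inputs
Y = [xi + random.gauss(0.0, 0.5) for xi in X]        # noisy labels

def knn_predict(x, k):
    """Average the labels of the k training points nearest to x."""
    nearest = sorted(range(len(X)), key=lambda i: abs(X[i] - x))[:k]
    return sum(Y[i] for i in nearest) / k

def training_cost(k):
    """Least-squares cost of kNN evaluated on its own training data."""
    return sum((knn_predict(x, k) - y) ** 2 for x, y in zip(X, Y))

costs = {k: training_cost(k) for k in range(1, 11)}
best_k = min(costs, key=costs.get)
# With k = 1, each training point is its own nearest neighbour, so the
# prediction reproduces every Y_i exactly and the cost is 0.
```

This is exactly why least squares on the training data cannot be used to select $k$: it always prefers the interpolating choice $k=1$.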
In order to use calculus to analyse the problem, we need to make further modifications.
$$Cost=\int_{X \in \mathbb{R}^n} \bigg(\frac{1}{V(S_r(X))}\int_{T \in S_r(X)}f(T)dT -f(X) \bigg )^2 dX$$
Here $S_r(X)$ is the open ball centred at $X$ with radius $r$, and $V(S_r(X))$ is its volume; $V(S_r(X))$ is supposed to be the continuous counterpart of $k$. The volume satisfies
$$V(S_r(X))=\alpha r^n,$$
where $\alpha$ is a constant depending only on the dimension $n$.
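For concreteness, $\alpha$ is the volume of the unit ball in $\mathbb{R}^n$ (a standard fact, not stated in the original):
$$\alpha=\frac{\pi^{n/2}}{\Gamma\!\left(\frac{n}{2}+1\right)},$$
so for example $\alpha=2$ for $n=1$, $\alpha=\pi$ for $n=2$, and $\alpha=\frac{4\pi}{3}$ for $n=3$.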
We may therefore rewrite the cost function as
$$Cost=\int_{X \in \mathbb{R}^n} \bigg( \frac{1}{\alpha r^n} \int_{T \in S_r(X)}f(T)dT -f(X) \bigg )^2 dX$$
The term $\int_{T \in S_r(X)}f(T)dT$ can be approximated by
$$\int_{T \in \mathbb{R}^n} \delta_{X,r}(T)\,f(T)\,dT$$
where $\delta_{X,r}$ is a Gaussian density with mean $X$ and standard deviation $r$, so a point $T$ is weighted according to its Euclidean distance $d(T,X)$ from $X$. As $r$ goes to $0$, the term $\int_{T \in S_r(X)}f(T)\,dT$ behaves like $V(S_r(X))\,f(X)=\alpha r^n f(X)$.
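As a sanity check on this limit, here is a small Python sketch (my own illustration) in one dimension, where $S_r(x)=(x-r,x+r)$ and $\alpha=2$: the normalised average $\frac{1}{2r}\int_{x-r}^{x+r}f(t)\,dt$ approaches $f(x)$ as $r \to 0$.

```python
import math

def local_average(f, x, r, m=2001):
    """Approximate (1 / V(S_r(x))) * integral of f over S_r(x) in one
    dimension, where S_r(x) = (x - r, x + r) and V = 2r, by averaging
    f over an even grid on the interval."""
    step = 2.0 * r / (m - 1)
    return sum(f(x - r + i * step) for i in range(m)) / m

x0 = 0.7
errors = [abs(local_average(math.cos, x0, r) - math.cos(x0))
          for r in (0.5, 0.1, 0.01)]
# The error shrinks with r: the integral over S_r(x) behaves like
# V(S_r(x)) * f(x), so the normalised average tends to f(x) as r -> 0.
```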
According to the last two equations, you should come to the conclusion that $r=0$ gives the minimum: as $r \to 0$ the inner average $\frac{1}{\alpha r^n}\int_{T \in S_r(X)}f(T)\,dT$ tends to $f(X)$, so the integrand of the cost vanishes. Note that $r=0$ is equivalent to $k=1$, which means that you want to consider only the point $X$ itself for evaluating the label $\hat{Y}(X)$.
You can also take the derivative of the cost function with respect to $r$ and set it equal to $0$, but that is a tedious process.
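For a concrete 1-D instance, the tedium can be avoided. Taking $f=\cos$, the local average over $S_r(x)$ works out to $\cos(x)\sin(r)/r$ (my own computation, not from the original post), so the pointwise cost is $\cos^2(x)\,g(r)$ with $g(r)=(\sin(r)/r-1)^2$. A finite-difference sketch in Python confirms that $g'(r)>0$ for $r>0$, so the cost is minimised as $r \to 0$:

```python
import math

def g(r):
    """Pointwise cost factor (sin(r)/r - 1)**2 for the 1-D example f = cos:
    the local average of cos over (x - r, x + r) is cos(x) * sin(r) / r,
    so the squared error at x is cos(x)**2 * g(r)."""
    return (math.sin(r) / r - 1.0) ** 2

def dg(r, h=1e-6):
    """Central finite-difference approximation of g'(r)."""
    return (g(r + h) - g(r - h)) / (2.0 * h)

# g'(r) > 0 on (0, 1]: the cost increases with r, so the minimum sits
# at r -> 0, i.e. at the continuous analogue of k = 1.
derivs = [dg(r) for r in (0.1, 0.5, 1.0)]
```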