I need to show the following inequality concerning the k-NN algorithm: $$ $$ Data $ S = \{X_1, X_2, ..., X_n\} $ is split in the $ \leq k$ parts:
$$ S_j = \{ X_i L i - (j-1) \lfloor \frac{n}{k} \rfloor + 1,\ ...\ , j\lfloor\frac{n}{k} \rfloor\}, 1 \leq j \leq k $$
So for example for $ S = \{X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8, X_9, X_{10}\}$ and $k=2$ we got $S_1 = \{X_1, X_2, X_3, X_4, X_5\}$ and $S_2 = \{X_6, X_7, X_8, X_9, X_{10}\}$
$$ $$ For the $X_{1, j}(X)$ as 1st nearest neighbour of $X$ in $S_j, j \leq k$ I need to show that:
$$ \frac{1}{k}\sum_{j=1}^{k}{||X_{j}(X) - X||} \leq \frac{1}{k}\sum_{j=1}^{k}{||X_{1, j}(X) - X||} $$
$X_{j}(X)$ stands for j-th next neighbour of X $$ $$
A far as I understand, this inequality states that the on average the distance between $X$ and it's j-th neighbour is bounded by the average of distances between $X$ and it's first neighbour in the j-th partition of the data set. Does it make any sense? If not, what is the correct intuition? I'd be extremely thankful for any hints on how to aproach this proof.
Your use of $j$ to index both the partitions and the nearest neighbors within each partition is confusing in terms of notation, but the key issue is that you've misstated the core inequality. It should read:
$\frac{1}{k}\sum_{j=1}^{k}{||X_{j}(X) - X||} \geq \frac{1}{k}\sum_{j=1}^{k}{||X_{1, j}(X) - X||}$
That's because, within any given partition, the distance to $X$'s closest neighbor, ${||X_{1, j}(X) - X||}$, must, by definition, be less than or equal to the distance between $X$ and all of its more distant neighbors (whose distances will be equal only if there are multiple points equidistant from $X$):
${||X_{j}(X) - X||} \geq {||X_{1, j}(X) - X||}$
Just as that definitional inequality establishes a lower bound on how close the points within some particular partition lie to $X$, so too does the average across all those minimal, nearest-neighbor distances establish a more generalized bound. For example, if you take the second closest point from each partition and average across each of those distances to $X$, you will necessarily get an average distance that is equal to or greater than the average across the nearest neighbor distances. (Again, those two averages would be equal only if there was a tie between the first and second nearest point within every individual partition). More generally, there is no way to draw a point from each partition such that the average of the selected points' distances to $X$ is less than the average of the distances between $X$ and its nearest neighbor within each partition. That's the point of the inequality and the sense in which the average of the nearest neighbor distances from each partition establishes a lower bound across all the partitions.