I'm working with a K-nearest neighbours classifier, using cross validation to determine k. What I'm stuck on is this: How does total sample size N influence the optimal value of k?
My thinking was that the density or sparsity of the data might somehow relate to how large or small a useful k could be. I gather that too small a k runs the risk of overfitting, while too large a k might over-generalise the classification decision boundaries?
I know of the rule of thumb $k = \sqrt{N}$, but I cannot find a good justification for it, and I don't think it means much in practice. Beyond the trivial constraint $k \le N-1$, I don't believe there is any universal relationship between $N$ and $k$ that would help you pick $k$; any useful rule would also need to involve $D$, the dimension of each data point, since k-NN (using Euclidean distance) suffers from the curse of dimensionality.
What you mention about the behaviour of small versus large $k$ holds in general. Your number of data points can be large and the data can still be sparse; it depends on your specific case. Cross-validation ($k$-fold CV, where that $k$ is the number of folds, not the $k$ in k-NN!) is the way to go in my opinion.
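A minimal sketch of that approach, assuming a toy two-blob dataset and hypothetical helper names (`knn_predict`, `cv_accuracy`), picking k by 5-fold cross-validation could look like:

```python
import math
import random

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training
    points under Euclidean distance. `train` is a list of
    (features, label) pairs."""
    neighbours = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = {}
    for _, label in neighbours:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

def cv_accuracy(data, k, folds=5, seed=0):
    """Mean k-NN accuracy over `folds`-fold cross-validation."""
    data = data[:]
    random.Random(seed).shuffle(data)
    fold_size = len(data) // folds
    accuracies = []
    for f in range(folds):
        # Hold out one fold for testing, train on the rest.
        test = data[f * fold_size:(f + 1) * fold_size]
        train = data[:f * fold_size] + data[(f + 1) * fold_size:]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / folds

# Toy 2-D dataset: two Gaussian blobs with different labels.
rng = random.Random(42)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(100)] \
     + [((rng.gauss(3, 1), rng.gauss(3, 1)), 1) for _ in range(100)]

# Try odd k values (odd avoids ties in binary voting) and keep the best.
best_k = max(range(1, 30, 2), key=lambda k: cv_accuracy(data, k))
print("best k:", best_k)
```

The same search is what `GridSearchCV` over `KNeighborsClassifier` does in scikit-learn; the point is simply that k is chosen by held-out accuracy, not by a formula in $N$.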