I'm working with a K-nearest neighbours classifier, using cross validation to determine k. What I'm stuck on is this: How does total sample size N influence the optimal value of k?
My thinking was that the density or sparsity of the data might somehow relate to how large or small a useful k could be. I gather that too small a k runs the risk of overfitting, while too large a k might over-generalise the classification decision boundaries?
I know of the rule of thumb $k = \sqrt{N}$, but I cannot find a good justification for it, and I don't think it means much in practice. Beyond the trivial constraint $k \le N-1$, I don't believe there is any universal relationship between $N$ and $k$ that would help you pick $k$; any useful rule would also need to involve $D$, the dimension of each data point, since k-NN (using Euclidean distance) suffers from the curse of dimensionality.
What you mention about the behaviour of small versus large $k$ holds in general. Your number of data points can be large and the data can still be sparse; it depends on your specific case. Cross-validation ($k$-fold CV, where that $k$ is the number of folds, not the $k$ in k-NN!) is the way to go in my opinion.
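A minimal sketch of that approach, assuming a toy two-blob dataset and hypothetical helper names (`knn_predict`, `cv_accuracy`), picking k by 5-fold cross-validation could look like:

```python
import math
import random

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training
    points under Euclidean distance. `train` is a list of
    (features, label) pairs."""
    neighbours = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = {}
    for _, label in neighbours:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

def cv_accuracy(data, k, folds=5, seed=0):
    """Mean k-NN accuracy over `folds`-fold cross-validation."""
    data = data[:]
    random.Random(seed).shuffle(data)
    fold_size = len(data) // folds
    accuracies = []
    for f in range(folds):
        # Hold out one fold for testing, train on the rest.
        test = data[f * fold_size:(f + 1) * fold_size]
        train = data[:f * fold_size] + data[(f + 1) * fold_size:]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / folds

# Toy 2-D dataset: two Gaussian blobs with different labels.
rng = random.Random(42)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(100)] \
     + [((rng.gauss(3, 1), rng.gauss(3, 1)), 1) for _ in range(100)]

# Try odd k values (odd avoids ties in binary voting) and keep the best.
best_k = max(range(1, 30, 2), key=lambda k: cv_accuracy(data, k))
print("best k:", best_k)
```

The same search is what `GridSearchCV` over `KNeighborsClassifier` does in scikit-learn; the point is simply that k is chosen by held-out accuracy, not by a formula in $N$.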