I'm reading the book "The Elements of Statistical Learning", which describes two scenarios for how the training data might be generated when we try to predict a class label:
Scenario 1: The training data in each class were generated from a bivariate Gaussian distribution with uncorrelated components, with a different mean for each class.
Scenario 2: The training data in each class came from a mixture of 10 low-variance Gaussian distributions, with individual means themselves distributed as Gaussian.
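To make sure I understand the setup, here is how I would simulate one class under Scenario 2 (a minimal sketch; the class centre, number of points, and the "low" variance value are my own illustrative choices, not values from the book):

```python
import numpy as np

rng = np.random.default_rng(0)

# Scenario 2 sketch for one class: first draw 10 component means, which are
# themselves Gaussian-distributed, then sample each training point from a
# low-variance Gaussian centred on a randomly chosen component mean.
n_components = 10
component_means = rng.multivariate_normal(
    mean=[1.0, 0.0],          # assumed class centre (my choice)
    cov=np.eye(2),
    size=n_components,
)

n_points = 100
chosen = rng.integers(0, n_components, size=n_points)  # one component per point
points = component_means[chosen] + rng.normal(
    scale=0.2, size=(n_points, 2)  # low-variance noise around the chosen mean
)

print(points.shape)  # (100, 2)
```

Is this the right picture, i.e. each class is effectively a clumpy, multi-modal cloud rather than a single ellipse?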
The book says that least squares (i.e., the linear decision boundary it produces) is more appropriate for Scenario 1, while k-nearest neighbors is more appropriate for Scenario 2, but I don't quite understand why.
Could anybody help explain the difference? Any help is greatly appreciated.