I am trying to understand the following claim: if classifier X has smaller training error than classifier Y, then classifier X will have smaller generalization (test) error than classifier Y. (The answer is False.)
My understanding is below:
Training error and test error can be totally different if the training distribution is not equal to the test distribution. Even under the same distribution, they can differ substantially, because $h$ is picked to minimize training error, not test error. Let me know if my understanding is correct?
$h$ is picked to minimize a loss function, part of which usually includes the training error. Some loss functions also include a regularization term to improve generalization performance.
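In symbols, a typical regularized objective looks like the following (my notation, not from the question):

$$\hat h = \arg\min_{h \in \mathcal{H}} \; \underbrace{\frac{1}{n}\sum_{i=1}^{n} \ell\big(h(x_i), y_i\big)}_{\text{training error term}} \; + \; \lambda\,\Omega(h),$$

where $\ell$ is the per-example loss, $\Omega(h)$ penalizes model complexity, and $\lambda \ge 0$ trades off fit against simplicity. Nothing in this objective directly measures error on unseen data; the regularizer is only a proxy for generalization.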
A small training error doesn't guarantee good generalization. The classifier could have been overfitted to the training data and have very poor generalization performance. For example, a classifier that simply memorized the training data without learning anything would predict randomly on unseen data. This would be a very bad classifier, and we would observe that during testing or validation.
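A minimal sketch of that memorizing classifier, using made-up toy data (random points with coin-flip labels, so there is no real signal to learn):

```python
import random

random.seed(0)

# Toy data: random feature points with random 0/1 labels -- pure noise.
train = [((random.random(), random.random()), random.randint(0, 1)) for _ in range(100)]
test = [((random.random(), random.random()), random.randint(0, 1)) for _ in range(100)]

# "Memorizing" classifier: look each point up in a table of seen examples;
# on a point it has never seen, it can only guess at random.
memory = {x: y for x, y in train}

def predict(x):
    return memory.get(x, random.randint(0, 1))

train_err = sum(predict(x) != y for x, y in train) / len(train)
test_err = sum(predict(x) != y for x, y in test) / len(test)

print(train_err)  # 0.0 -- zero training error by construction
print(test_err)   # around 0.5 -- chance level on unseen points
```

The training error is exactly zero, yet the test error sits near 50%: the smallest possible training error with the worst possible generalization for a binary problem.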