I have a question regarding k-fold cross validation. I understand the process in general, but I am not certain why we train on all data except Sj (i.e. all except one). My understanding is that a single subsample is kept as validation data for testing the model, and the remaining k − 1 subsamples are used as training data.
Why do we need to hold out one subsample as validation data? Maybe the real question is: why can't we use that subsample for training and still use it as validation data? Isn't it unchanged even though we use it for training?
Thanks for any clarification.
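The main principle is that a model's ability to generalize is tested on previously unseen data. It makes no sense to test the model on data that was used during training. Including the test fold in the training set makes a large difference, because during training the model adjusts its parameters to minimize the difference between its predictions and the ground truth on exactly those samples. The fold's data is indeed unchanged, but the model has now been fitted to it, so the error measured on it is optimistically low and no longer tells you how well the model performs on new data.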
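A minimal sketch of this effect, assuming a toy 1-D dataset with 20% label noise and a 1-nearest-neighbour classifier (the data generator and the helper names `make_data`, `predict_1nn` and `accuracy` are hypothetical, chosen only for illustration). A 1-NN model effectively memorizes its training points, so scoring it on those same points is perfect even though the noisy labels cannot be predicted from x.

```python
import random

def make_data(n, noise, rng):
    """Hypothetical toy data: x in [0, 1), label 1 if x > 0.5, with a fraction `noise` of labels flipped."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else 0
        if rng.random() < noise:
            y = 1 - y  # label noise that cannot be predicted from x
        data.append((x, y))
    return data

def predict_1nn(train, x):
    """1-nearest-neighbour: return the label of the closest training point."""
    _, label = min(train, key=lambda point: abs(point[0] - x))
    return label

def accuracy(train, test):
    """Fraction of test points whose 1-NN prediction matches the true label."""
    correct = sum(predict_1nn(train, x) == y for x, y in test)
    return correct / len(test)

rng = random.Random(0)
train = make_data(200, noise=0.2, rng=rng)
held_out = make_data(200, noise=0.2, rng=rng)

train_acc = accuracy(train, train)        # scored on data the model has seen
held_out_acc = accuracy(train, held_out)  # scored on unseen data
print(f"accuracy on training data: {train_acc:.2f}")
print(f"accuracy on held-out data: {held_out_acc:.2f}")
```

Each training point's nearest neighbour is itself, so the training accuracy is 1.00 regardless of the noise, while the held-out accuracy is noticeably lower. Only the held-out number estimates generalization.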
In k-fold cross validation the data set is split into k folds, and then k experiments are performed. Each fold in turn is used as the test set, while the other k − 1 folds form the training set.
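The procedure can be sketched in plain Python as follows. The helper names (`k_fold_splits`, `cross_validate`), the least-squares line model and the synthetic data are assumptions for illustration only; in practice a library utility such as scikit-learn's `KFold` does the splitting. The key property is that every sample lands in exactly one test fold and is never in the training set of the model that is scored on it.

```python
import random

def k_fold_splits(n, k, seed=0):
    """Yield (train_indices, test_indices) for k folds over n samples.

    Indices are shuffled once and cut into k nearly equal folds; each fold
    is the test set exactly once while the other k - 1 folds form the training set.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for j in range(k):
        test_idx = folds[j]
        train_idx = [i for m, fold in enumerate(folds) if m != j for i in fold]
        yield train_idx, test_idx

def cross_validate(xs, ys, k, fit, error):
    """Return the test error of each of the k experiments."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(error(model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
    return scores

def fit_line(xs, ys):
    """Hypothetical model: least-squares line y = a*x + b, returned as (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def mse(model, xs, ys):
    """Mean squared error of the line on the given samples."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic data: y = 2x + 1 plus Gaussian noise.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

fold_errors = cross_validate(xs, ys, k=5, fit=fit_line, error=mse)
print("per-fold test MSE:", [round(e, 2) for e in fold_errors])
print("mean CV estimate:", round(sum(fold_errors) / len(fold_errors), 2))
```

The mean of the k per-fold errors is the cross-validation estimate of performance on unseen data; each individual error comes from a model that never saw the fold it was scored on.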