How do we define prediction models that generalize well enough to be applied to an unseen dataset?

How do we define our prediction models so that they generalize well enough to be applied to an unseen dataset? And if there is an outlier in the data, should we keep it or remove it? Please justify the answer.

There is 1 best solution below

To build a predictive model that generalizes, the model should sit between underfitting and overfitting, which is often called the best fit. One way to get there is early stopping: stop training at the point where performance on held-out validation data stops improving, rather than training until the error on the training data is as low as possible. A resampling procedure such as k-fold cross-validation can be used to estimate the skill of a machine learning model on unseen data. Overfitting is one of the most common problems in applied machine learning, and regularization techniques such as $L_1$ and $L_2$ regularization help prevent it.
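
As a rough illustration of k-fold cross-validation combined with $L_2$ regularization, here is a minimal sketch in plain Python. It assumes a toy one-feature linear model without an intercept and synthetic data, neither of which comes from the question, and the helper names (k_fold_indices, fit_ridge_1d, cross_val_mse) are made up for this example. In practice a library such as scikit-learn would normally be used instead.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_ridge_1d(xs, ys, lam):
    """Closed-form L2-regularized least squares for y ~ w * x (no intercept).

    Minimizes sum((w*x - y)^2) + lam * w^2, which gives
    w = sum(x*y) / (sum(x*x) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def mse(w, xs, ys):
    """Mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cross_val_mse(xs, ys, lam, k=5, seed=0):
    """Average held-out MSE over k folds (assumes len(xs) >= k)."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        # Fit on every sample that is NOT in the held-out fold ...
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        w = fit_ridge_1d(train_x, train_y, lam)
        # ... and score only on the held-out fold, which the fit never saw.
        test_x = [xs[i] for i in held_out]
        test_y = [ys[i] for i in held_out]
        scores.append(mse(w, test_x, test_y))
    return sum(scores) / k


# Synthetic data (an assumption for illustration): y = 2x plus Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(-1, 1) for _ in range(50)]
ys = [2.0 * x + rng.gauss(0, 0.3) for x in xs]

# Compare a few regularization strengths by their cross-validated error.
for lam in (0.0, 0.1, 1.0, 10.0):
    print(f"lambda={lam}: CV MSE={cross_val_mse(xs, ys, lam):.4f}")
```

The regularization strength with the lowest cross-validated error is a reasonable choice, because each fold's score is computed on data the model was not fitted on. A very large strength shrinks the weight toward zero and underfits, which shows up as a higher held-out error.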