I am trying to fit a model with many input variables using linear regression. For technical reasons, my training data comes from a simulation that closely, but not perfectly, mirrors the final application, and there is no way to make the training data match the application exactly. Most of the time the regression works fine, but occasionally the application produces a situation that the training data does not cover, and the model predicts something absurd. My question is: how can I modify my linear regression model so that in these rare cases it tends to predict a value close to the edge of some reasonable range rather than something wildly outside it? I know that the output cannot fall outside a certain span (it is the vibrational frequency of a molecule, so it has a natural range).
Thank you! By the way, all the assumptions of linear regression are most likely satisfied by my data.
For context, the inputs are electrostatic quantities (electric fields, etc.) in the vicinity of a certain chemical group, and the output is a vibrational frequency. The electric fields are not really correlated with each other, and together they predict the frequency quite well most of the time. I know from chemical knowledge that the frequency cannot be smaller or larger than certain (approximate) values, so I need a predictor that respects those bounds.
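As a baseline for comparison, the simplest thing that respects the bounds is to fit ordinary least squares as usual and clamp the predictions to the allowed range. The sketch below assumes NumPy; the function names and the bounds `lo` and `hi` are hypothetical placeholders, not values from my actual problem.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_clipped(coef, X, lo, hi):
    """Linear prediction clamped to the assumed physical range [lo, hi]."""
    A = np.column_stack([np.ones(len(X)), X])
    return np.clip(A @ coef, lo, hi)
```

Clamping leaves in-range predictions untouched and maps anything outside the range exactly onto `lo` or `hi`, so it is a hard cutoff rather than a smooth approach to the edge.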
Would a generalized linear model with a logit link function work here?
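To make the idea concrete, here is a minimal sketch of roughly what I have in mind. Note that it is not a true GLM fit: it rescales the frequency to (0, 1) using assumed bounds `lo` and `hi`, applies a logit transform to the response, and runs ordinary least squares on the transformed values. The function names, the bounds, and the `eps` guard are all hypothetical choices for illustration.

```python
import numpy as np

def fit_logit_response(X, y, lo, hi, eps=1e-6):
    """OLS on the logit of y rescaled to (0, 1) by the assumed bounds lo, hi."""
    # eps keeps training values sitting exactly on a bound away from log(0)
    p = np.clip((y - lo) / (hi - lo), eps, 1.0 - eps)
    z = np.log(p / (1.0 - p))
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coef

def predict_logit_response(coef, X, lo, hi):
    """Map the linear predictor back through the logistic function into [lo, hi]."""
    A = np.column_stack([np.ones(len(X)), X])
    z = A @ coef
    # 0.5 * (1 + tanh(z / 2)) equals the logistic function and does not overflow
    return lo + (hi - lo) * 0.5 * (1.0 + np.tanh(z / 2.0))
```

With this construction, inputs far from the training data push the prediction smoothly toward `lo` or `hi` instead of beyond them. Predictions near the middle of the range stay close to what plain linear regression would give, because the logistic curve is nearly linear there.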