I am working with simulation software to generate training data. Specifically, I want to create a regression model predicting the weight of plastic parts from an injection molding process.
Besides accurate predictions, I am interested in quantifying a prediction interval, which I am computing as:
$$x_{new}\hat\beta\pm t_{\frac{\alpha}{2},n-p}\,s\sqrt{1+x_{new}(X^\top X)^{-1}x_{new}^\top}$$
Since I am obtaining the training data from simulation, I would like to specify the design matrix $X$ in such a way that the average prediction interval (over a relevant interval for future predictions $[L,U]$) is minimized. In the simple two-dimensional case, I can derive the following statements: since the prediction variance is
$$\widehat{\mathrm{Var}}(\hat y_{new})=MSE\left[ 1+\frac{1}{n}+\frac{(x_{new}-\bar X)^2}{\sum_{i} (x_i-\bar X)^2} \right],$$
it makes sense to center the data collection efforts on the interval where future predictions are to be made, to minimize $(x_{new}-\bar X)^2$, while the observations $x_i$ should spread as widely from the center as possible, to maximize the term $\sum_{i} (x_i-\bar X)^2$. Intuitively, I understand that a straight-line fit is 'fixed' more by distant points than by close points.
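To make this concrete, here is a small numerical check of the one-dimensional argument. The prediction region $[0, 10]$, $n = 10$, and the three candidate designs are arbitrary illustrative choices, not part of the original derivation; the function just averages the design-dependent factor $1+\tfrac{1}{n}+\tfrac{(x_{new}-\bar X)^2}{\sum_i (x_i-\bar X)^2}$ over a grid of prediction points:

```python
import numpy as np

def avg_pred_var_factor(x, x_new):
    """Average, over the prediction points x_new, of the factor
    1 + 1/n + (x_new - xbar)^2 / Sxx that multiplies MSE in the
    prediction variance of simple linear regression."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    sxx = ((x - xbar) ** 2).sum()
    return np.mean(1 + 1 / n + (x_new - xbar) ** 2 / sxx)

# Hypothetical future-prediction region [L, U] = [0, 10]
x_new = np.linspace(0, 10, 101)

n = 10
# Three designs with the same center xbar = 5 but different spread:
clustered = 5.0 + np.linspace(-0.5, 0.5, n)                 # bunched at the center
uniform   = np.linspace(0, 10, n)                           # spread evenly
endpoints = np.array([0.0] * (n // 2) + [10.0] * (n // 2))  # half at each endpoint

for name, design in [("clustered", clustered),
                     ("uniform", uniform),
                     ("endpoints", endpoints)]:
    print(f"{name:10s} avg variance factor: {avg_pred_var_factor(design, x_new):.4f}")
```

The endpoint design maximizes $\sum_i (x_i-\bar X)^2$ and therefore gives the smallest average factor, consistent with the intuition that distant points 'fix' the line best (assuming, of course, that the straight-line model is actually correct, since such a design cannot detect curvature).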
I would like to generalize this to the multivariate regression setting, to specify how to simulate data (given the objective of minimizing the uncertainty intervals). From this post, I understand that it makes sense to generate the data such that the columns of $X$ are not linearly dependent. Intuitively, this sounds like a multivariate analogue of the statement above. I am left with the question: what is a mathematically sound approach to specifying the design matrix so as to minimize the average prediction interval in the multivariate regression case?
You can always construct a "degenerate case" where the number of $\beta$s equals the number of observations (without perfect collinearity). In that case you will have $0$ residual variance, i.e., $\hat{Y} = Y$.
For a more realistic case, when $n \gg p$ and you cannot control the noise term $\varepsilon$, a design $\mathrm{X}$ with (pairwise) orthogonal columns will give you the most "stable" $(\mathrm{X}'\mathrm{X})^{-1}$ matrix.
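The effect of orthogonality can be checked numerically. In this sketch (the dimensions, the strength of the shared factor, and the simulated prediction points are all arbitrary assumptions), a correlated design is compared against an orthogonalized version with the same column norms, using the average of the design-dependent term $x_{new}(X^\top X)^{-1}x_{new}^\top$ from the prediction-variance formula:

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 50, 3

# Correlated design: every column shares a strong common factor
common = rng.standard_normal((n, 1))
X_corr = common + 0.2 * rng.standard_normal((n, p))

# Orthogonal design with the same column norms, obtained by
# orthonormalizing X_corr (QR) and rescaling the columns
Q, _ = np.linalg.qr(X_corr)
X_orth = Q * np.linalg.norm(X_corr, axis=0)

def avg_pred_var(X, x_new):
    """Average over rows of x_new of x (X'X)^{-1} x' -- the
    design-dependent part of the prediction variance."""
    XtX_inv = np.linalg.inv(X.T @ X)
    return np.mean(np.einsum('ij,jk,ik->i', x_new, XtX_inv, x_new))

# Hypothetical future-prediction points
x_new = rng.standard_normal((1000, p))

print("cond(X'X), correlated :", np.linalg.cond(X_corr.T @ X_corr))
print("cond(X'X), orthogonal :", np.linalg.cond(X_orth.T @ X_orth))
print("avg x(X'X)^-1 x', correlated :", avg_pred_var(X_corr, x_new))
print("avg x(X'X)^-1 x', orthogonal :", avg_pred_var(X_orth, x_new))
```

With orthogonal columns, $\mathrm{X}'\mathrm{X}$ is diagonal, so its condition number stays near one and no direction of $x_{new}$ is inflated by near-collinearity; the correlated design shows a much larger condition number and a larger average prediction-variance term.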