I have a model M that takes input X and outputs Y. I'm trying to predict how the mean of Y changes as the model's parameters P = (A, B) vary, given a distribution for X and some sample means of Y from runs of the model at particular parameter values. M is a complex model: I can only run it at certain parameter values and observe the result. Normally I would simply regress the sample means of Y onto the values of (A, B). However, the model also has a logical parameter L that can be true or false, and if L is true there is an additional continuous parameter C; C is hence undefined when L = false. In other words:
$Y = M(X; A, B)$ when $L = \mathrm{false}$
$Y = M(X; A, B, C)$ when $L = \mathrm{true}$
I would like to find the dependence of the mean of Y on A, B, C and L, and it is reasonable to assume that the value of L does not change the dependence of Y on A or B. Running separate regressions for the points with L = false and L = true seems sub-optimal, since the estimated dependence on A and B would then generally differ between the two subsets.
I thought about regressing onto L and L·C (coding L = false as L = 0, and defining L·C = 0 in those cases, so that the mean of Y correctly has no dependence on C when L = false). However, I don't think this will generally give the correct dependence on C when L = true, since the points with L·C = 0 will affect that estimate.
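Concretely, assuming C is stored as NaN wherever it is undefined (the array names here are just illustrative), the two regressor columns could be built like this:

```python
import numpy as np

# Hypothetical data: C is NaN where L is false (i.e. undefined)
L = np.array([True, False, True, False])
C = np.array([0.4, np.nan, -1.2, np.nan])

L_col = L.astype(float)          # false -> 0, true -> 1
LC_col = np.where(L, C, 0.0)     # define L*C as 0 where L is false
```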
Does anyone have a good idea for what to do?
Actually I take back my statement "I don't think [regressing onto L and L*C] will generally give the correct dependence on C when L=true". If the regression model is
$$ Y = \alpha A + \beta B + \gamma L + \delta LC, $$
then $\gamma$ is the shift in the mean of $Y$ produced by $L=\mathrm{true}$ at $C=0$, and the line of $Y$ plotted against $C$ when $L=\mathrm{true}$ will intercept the $Y$ axis at $\alpha A + \beta B + \gamma$ for given $A$ and $B$. So the estimate of $\delta$ from a standard ordinary least squares regression should be unbiased (provided the other assumptions made when doing a regression are satisfied, of course).
So I think regressing against $L$ and $LC$ is actually the correct way to do this. This seems to work in simple numerical tests I've done. Do please correct me if I'm wrong.
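For anyone wanting to reproduce this, here is a minimal sketch of the kind of numerical test I mean, using NumPy only. The true coefficient values are made up for illustration, and the per-point noise stands in for the sampling error of each estimated mean of $Y$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# True coefficients (arbitrary values for the test)
alpha, beta, gamma, delta = 1.5, -2.0, 0.7, 3.0

A = rng.normal(size=n)
B = rng.normal(size=n)
L = rng.integers(0, 2, size=n)    # 0 = false, 1 = true
C = rng.normal(size=n)            # only meaningful where L == 1
LC = L * C                        # defined as 0 where L == 0

# Simulated sample means of Y, with noise for the sampling error
y = alpha * A + beta * B + gamma * L + delta * LC \
    + rng.normal(scale=0.1, size=n)

# OLS on the design matrix [A, B, L, L*C]
X = np.column_stack([A, B, L, LC])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # close to [1.5, -2.0, 0.7, 3.0]
```

The fitted coefficients recover the true values, including $\delta$, which supports the point that the points with $LC = 0$ do not bias the estimated dependence on $C$ when $L=\mathrm{true}$.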