Derivative of optimal linear regression coefficient w.r.t. distribution shift

Assume I have the following model: $$ x = f(u)\\ y = g(u)\\ u \sim \mathcal{N}(m, 1) $$ I'm trying to compute the derivative of the optimal linear regression coefficient $w^*$ for the model $y = wx + \epsilon$ with respect to the mean $m$ of $u$, i.e. $$ \frac{\mathrm{d} w^*}{\mathrm{d} m} $$ but I'm having some trouble framing the problem in probabilistic notation.

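I would define the optimal coefficient as the maximizer of the expected log-likelihood, $$ w^* = \arg\max_w \mathbb{E}_{u \sim \mathcal{N}(m, 1)}\left[\log P(y \mid x; w)\right] $$ where, if we assume that the error term $\epsilon$ is normally distributed with fixed variance, this is equivalent to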
$$ \begin{align} w^* &= \arg\min_w \mathbb{E}_{u \sim \mathcal{N}(m, 1)}\left[(y - wx)^2\right]\\ &= \arg\min_w \mathbb{E}_{u \sim \mathcal{N}(m, 1)}\left[(g(u) - wf(u))^2\right] \end{align} $$

I am stuck here, as I do not know how to differentiate through an $\arg\min$ operation. I think I could handle the differentiation with respect to $m$ inside the expectation with something like the log-derivative trick, although I am not sure how to actually do that.
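
One possible route, sketched here under the assumptions that $\mathbb{E}[f(u)^2] > 0$ and that $f$ and $g$ grow slowly enough for derivatives and expectations to be exchanged: the objective is quadratic in $w$, so the $\arg\min$ has a closed form given by the first-order condition. The shorthand $A(m)$ and $B(m)$ is introduced only for this sketch, and all expectations are over $u \sim \mathcal{N}(m, 1)$.

$$ \begin{align} 0 &= \left.\frac{\partial}{\partial w}\mathbb{E}\left[(g(u) - wf(u))^2\right]\right|_{w = w^*} = -2\,\mathbb{E}\left[f(u)\left(g(u) - w^* f(u)\right)\right]\\ \Rightarrow\quad w^* &= \frac{\mathbb{E}\left[f(u)g(u)\right]}{\mathbb{E}\left[f(u)^2\right]} =: \frac{A(m)}{B(m)} \end{align} $$

The log-derivative trick then relies on $\frac{\partial}{\partial m}\log \mathcal{N}(u; m, 1) = u - m$, which gives, for any sufficiently regular $h$,

$$ \frac{\mathrm{d}}{\mathrm{d} m}\mathbb{E}\left[h(u)\right] = \mathbb{E}\left[h(u)\,(u - m)\right] $$

Applying this to $A$ and $B$ and using the quotient rule would yield

$$ \begin{align} A'(m) &= \mathbb{E}\left[f(u)g(u)\,(u - m)\right], \qquad B'(m) = \mathbb{E}\left[f(u)^2\,(u - m)\right]\\ \frac{\mathrm{d} w^*}{\mathrm{d} m} &= \frac{A'(m)B(m) - A(m)B'(m)}{B(m)^2} \end{align} $$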

Does this make at least some sense? Feel free to post a complete solution if you have one, otherwise I welcome any suggestion.
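
For anyone who wants to sanity-check a candidate derivation numerically, here is a minimal Python sketch. It assumes the least-squares minimizer can be written as $w^* = \mathbb{E}[f(u)g(u)] / \mathbb{E}[f(u)^2]$ and that the score-function identity $\frac{\mathrm{d}}{\mathrm{d} m}\mathbb{E}[h(u)] = \mathbb{E}[h(u)(u - m)]$ holds for $u \sim \mathcal{N}(m, 1)$. The helper names (`gaussian_expectation`, `w_star`, `dw_star_dm`) and the choices $f(u) = u$, $g(u) = u^2$ are hypothetical and only serve as an illustration; expectations are approximated with a plain trapezoid rule rather than sampling.

```python
import math


def gaussian_expectation(h, m, half_width=10.0, n=4001):
    """Approximate E[h(u)] for u ~ N(m, 1) with the trapezoid rule
    on the interval [m - half_width, m + half_width]."""
    step = 2.0 * half_width / (n - 1)
    norm = math.sqrt(2.0 * math.pi)
    total = 0.0
    for i in range(n):
        z = -half_width + i * step                # z = u - m
        pdf = math.exp(-0.5 * z * z) / norm       # standard normal density at z
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoid end-point weights
        total += weight * h(m + z) * pdf
    return total * step


def w_star(f, g, m):
    """Closed-form least-squares coefficient E[f g] / E[f^2] (assumed form)."""
    a = gaussian_expectation(lambda u: f(u) * g(u), m)
    b = gaussian_expectation(lambda u: f(u) ** 2, m)
    return a / b


def dw_star_dm(f, g, m):
    """Derivative of w* in m: quotient rule plus the score-function identity."""
    a = gaussian_expectation(lambda u: f(u) * g(u), m)
    b = gaussian_expectation(lambda u: f(u) ** 2, m)
    da = gaussian_expectation(lambda u: f(u) * g(u) * (u - m), m)
    db = gaussian_expectation(lambda u: f(u) ** 2 * (u - m), m)
    return (da * b - a * db) / b ** 2


# Hypothetical test functions, chosen only because the moments are easy to check.
f = lambda u: u
g = lambda u: u ** 2
m = 1.0

score_based = dw_star_dm(f, g, m)

# Independent check: central finite difference of w*(m).
eps = 1e-4
finite_diff = (w_star(f, g, m + eps) - w_star(f, g, m - eps)) / (2.0 * eps)

print(score_based, finite_diff)
```

For this particular choice the Gaussian moments give $w^*(m) = (m^3 + 3m)/(m^2 + 1)$, so the two printed values should both be close to $(m^4 + 3)/(m^2 + 1)^2 = 1$ at $m = 1$. Agreement between the score-based value and the finite difference only checks consistency under the stated assumptions; it does not replace a proof.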