Background: Regular gradient descent can be written as $x_{t + 1} = x_t - \eta g_t$, where $g_t = \nabla f(x_t)$ is the gradient at $x_t$ of the function $f$ we're trying to optimize.
Problem: If we have a (symmetric, positive definite) matrix $Q$, then I want to show that the "preconditioned" gradient descent $x_{t + 1} = x_t - \eta Q^{-1} g_t$ can equivalently be written
$$x_{t + 1} = \text{argmin}_x f(x_t) + g_t^T(x - x_t) + \frac{1}{2\eta}\lVert x - x_t \rVert_Q^2,$$
where $\lVert v \rVert_Q^2 = v^T Q v$.
My attempt:
I want to start with the arg min expression for $x_{t + 1}$ and show that this is equal to $x_t - \eta Q^{-1}g_t$. We have
\begin{align*} &\text{argmin}_x f(x_t) + g_t^T(x - x_t) + \frac{1}{2\eta}\Vert x - x_t \Vert_Q^2\\ &= \text{argmin}_x g_t \cdot x + \frac{1}{2\eta}(x - x_t)^T Q (x - x_t), \end{align*}
where I expanded the $Q$-norm and dropped the terms $f(x_t)$ and $-g_t^T x_t$, since they don't depend on $x$.
At this point I want to differentiate and set the result equal to zero. I know that the derivative of $g_t \cdot x$ with respect to $x$ is $g_t$, but how do I differentiate $\frac{1}{2\eta}(x - x_t)^T Q (x - x_t)$?
If its derivative is $\frac{1}{\eta}Q(x - x_t)$, then I can write
\begin{align*} g_t + \frac{1}{\eta}Q(x_{t + 1} - x_t) &= 0\\ \implies Q(x_{t + 1} - x_t) &= -\eta g_t\\ \implies x_{t + 1} &= x_t - \eta Q^{-1}g_t. \end{align*}
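As a numerical sanity check of the derivative claim (a sketch using NumPy; $Q$, $x_t$, and the evaluation point below are arbitrary made-up values), one can compare $\frac{1}{\eta}Q(x - x_t)$ against central finite differences of the quadratic term:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
eta = 0.1

# Build an arbitrary symmetric positive definite Q for testing
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)

x_t = rng.standard_normal(n)
x = rng.standard_normal(n)  # arbitrary point at which to check the gradient

def h(x):
    """The quadratic term (1/(2*eta)) * (x - x_t)^T Q (x - x_t)."""
    d = x - x_t
    return d @ Q @ d / (2 * eta)

# Claimed gradient: (1/eta) * Q (x - x_t)
grad_claimed = Q @ (x - x_t) / eta

# Central finite differences, one coordinate at a time
eps = 1e-6
grad_fd = np.array([
    (h(x + eps * e) - h(x - eps * e)) / (2 * eps)
    for e in np.eye(n)
])

print(np.allclose(grad_claimed, grad_fd, atol=1e-5))  # True
```

(Since $h$ is quadratic, central differences agree with the true gradient up to roundoff.) Of course this only checks the formula at one point, so I'd still like a proper derivation.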
Is this legitimate? I've been reading about matrix calculus and I'm not at all sure the derivative of the squared $Q$ norm is correct.
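For what it's worth, here is a numerical check of the overall equivalence (a sketch assuming NumPy and SciPy; the quadratic data are made up, and $g_t$ is just an arbitrary vector standing in for the gradient at $x_t$). It minimizes the surrogate objective with a generic optimizer and compares against the closed-form preconditioned step:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 5
eta = 0.05

# Arbitrary symmetric positive definite Q
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)

x_t = rng.standard_normal(n)
g_t = rng.standard_normal(n)  # stands in for the gradient at x_t

def phi(x):
    """Surrogate objective: g_t . x + (1/(2*eta)) (x - x_t)^T Q (x - x_t)."""
    d = x - x_t
    return g_t @ x + d @ Q @ d / (2 * eta)

# Numerical argmin of the surrogate objective
x_num = minimize(phi, x0=x_t, method="BFGS").x

# Closed-form preconditioned step x_t - eta * Q^{-1} g_t
x_closed = x_t - eta * np.linalg.solve(Q, g_t)

print(np.allclose(x_num, x_closed, atol=1e-4))  # True
```

The two agree to optimizer tolerance, which at least supports the equivalence numerically.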
Thanks a lot.