Derivative of a scalar with respect to a matrix


I need to compute $\frac{d}{dN_\theta} n_\theta^T K^2 n_\theta$, where $N_\theta$ is a matrix built from the outer product $n_\theta n_\theta^T$. $K^2$ is a PSD matrix that does not depend on the variables. Any ideas would be appreciated!



2
On BEST ANSWER

If your phrase "built from" actually means $\mathbf{N_\theta \equiv n_\theta\,n^T_\theta}$ then things are pretty simple.

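$\mathbf{n_\theta^T K^2 n_\theta} = \operatorname{tr}\left(\mathbf{K^2\,n_\theta\,n_\theta^T}\right)$ is equal to the Frobenius (scalar) product $\mathbf{K^2:N_\theta}$, because $\mathbf{K^2}$ is symmetric.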

The differential of the latter is $\mathbf{K^2:dN_\theta}$

So the derivative with respect to $\mathbf{N_\theta}$ is just $\mathbf{K^2}$.
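
As a quick numerical sanity check of this identity, here is a minimal NumPy sketch. The dimension, the random seed, and the names `K2`, `n`, `N`, and `phi` are my own illustrative choices. The finite-difference derivative treats every entry of $\mathbf{N_\theta}$ as an independent variable, which is the assumption this answer relies on.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, illustrative only
d = 4                            # assumed dimension

A = rng.standard_normal((d, d))
K2 = A @ A.T                     # a symmetric PSD matrix standing in for K^2
n = rng.standard_normal(d)       # stands in for n_theta
N = np.outer(n, n)               # N_theta = n_theta n_theta^T

# The quadratic form and the Frobenius product K^2 : N should agree.
quad = n @ K2 @ n
frob = np.sum(K2 * N)

def phi(M):
    """phi(M) = K^2 : M, viewed as a function of an unconstrained matrix M."""
    return np.sum(K2 * M)

# Entry-wise central differences of phi at N.
eps = 1e-6
grad = np.zeros((d, d))
for i in range(d):
    for j in range(d):
        E = np.zeros((d, d))
        E[i, j] = eps
        grad[i, j] = (phi(N + E) - phi(N - E)) / (2 * eps)

print(np.isclose(quad, frob))
print(np.allclose(grad, K2, atol=1e-6))
```

Since `phi` is linear in its argument, the finite-difference gradient should match `K2` up to floating-point rounding.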

0
On

Let me try to give a partial answer, at least, though I'm not sure that what you're asking for is actually well-defined.

OK, let's proceed formally by assuming that $n_\theta$ is a well-defined function of $N_\theta$, i.e., $n_\theta = g(N_\theta)$ for $g$ some vector-valued function of square matrices. On the other hand, let $f(v) = v^T K^2 v$. Hence, as a function of $N_\theta$, $$ n_\theta^T K^2 n_\theta = (f \circ g)(N_\theta), $$ so that by the chain rule, $$ \tfrac{d}{dN_\theta}(n_\theta^T K^2 n_\theta) = D(f \circ g)(N_\theta) = Df(g(N_\theta)) \circ Dg(N_\theta) = Df(n_\theta) \circ Dg(N_\theta). $$

Now, recall that $Df(v)$ is the unique linear transformation such that $$ f(v+h) = f(v) + Df(v)(h) + o(\|h\|). $$ However, $$ f(v+h) = (v+h)^T K^2 (v+h) = v^T K^2 v + h^T K^2 v + v^T K^2 h + h^T K^2 h = f(v) + \left(h^T K^2 v + v^T K^2 h\right) + o(\|h\|), $$ so that $Df(v)(h) = h^TK^2v + v^TK^2h$, and hence $$ \tfrac{d}{dN_\theta}(n_\theta^T K^2 n_\theta)(N_\theta)(H) = \left(Df(n_\theta) \circ Dg(N_\theta)\right)(H) = Dg(N_\theta)(H)^T K^2 n_\theta + n_\theta^T K^2 Dg(N_\theta)(H), $$ which I suppose you could write as $$ \tfrac{d}{dN_\theta}(n_\theta^T K^2 n_\theta) = \left(\tfrac{dn_\theta}{dN_\theta}\right)^TK^2 n_\theta + n_\theta^T K^2 \tfrac{dn_\theta}{dN_\theta}. $$
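
The formula for $Df(v)(h)$ can be checked against a finite difference. This is only a sketch: the dimension, the seed, and the names `K2`, `v`, `h`, and `f` are illustrative assumptions, and `K2` is generated as a symmetric PSD matrix so that the two terms of the formula coincide.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, illustrative only
d = 4                            # assumed dimension

A = rng.standard_normal((d, d))
K2 = A @ A.T                     # a symmetric PSD matrix standing in for K^2
v = rng.standard_normal(d)
h = rng.standard_normal(d)

def f(x):
    """f(x) = x^T K^2 x."""
    return x @ K2 @ x

# Central difference of f at v in the direction h.
t = 1e-5
fd = (f(v + t * h) - f(v - t * h)) / (2 * t)

# Df(v)(h) as derived above, and its simplified form for symmetric K^2.
formula = h @ K2 @ v + v @ K2 @ h
symmetric_form = 2 * (v @ K2 @ h)

print(np.isclose(fd, formula, rtol=1e-6))
print(np.isclose(formula, symmetric_form))
```

Because $K^2$ is symmetric, $h^T K^2 v = v^T K^2 h$, so the derivative can also be written as $Df(v)(h) = 2\,v^T K^2 h$.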


Now, let me explain why I'm not sure this can work out. You say that $N_\theta$ is constructed from $n_\theta n_\theta^T$, so let's take the simplest possible case, i.e., that $g$ is defined implicitly by the equation $$ N_\theta = n_\theta n_\theta^T = g(N_\theta) g(N_\theta)^T, $$ which is to say $$ F(S,g(S)) = 0, \quad F(S,v) = S - vv^T. $$ Then, by the multivariable chain rule $$ 0 = D\left(F(S,g(S))\right)(S) = D_1 F(S,g(S)) + D_2 F(S,g(S)) \circ Dg(S). $$ Now, on the one hand $$ F(S+H,v) = (S+H) - vv^T = F(S,v) + H, $$ so that $D_1F(S,v)(H) = H$, i.e., $D_1F(S,v) = \operatorname{Id}$, whilst on the other, $$ F(S,v+h) = S - (v+h)(v+h)^T = S - vv^T - hv^T - vh^T -hh^T\\ = F(S,v) + \left(-hv^T - vh^T\right) + o(\|h\|) $$ so that $D_2F(S,v)(h) = -hv^T - vh^T$. Thus, $$ 0 = D\left(F(S,g(S))\right)(S)(H) = D_1 F(S,g(S))(H) + \left(D_2 F(S,g(S)) \circ Dg(S)\right)(H)\\ = H - Dg(S)(H)g(S)^T - g(S)Dg(S)(H)^T, $$ so that by plugging in $S = N_\theta$ and $g(N_\theta) = n_\theta$, $$ Dg(N_\theta)(H) n_\theta^T + n_\theta Dg(N_\theta)(H)^T = H, $$ or equivalently, $$ \tfrac{d n_\theta}{d N_\theta} n_\theta^T + n_\theta \left(\tfrac{d n_\theta}{dN_\theta}\right)^T = \operatorname{Id}. $$ So, when the dust settles, $$ \tfrac{d}{dN_\theta}(n_\theta^T K^2 n_\theta)(N_\theta)(H) = X(H)^T K^2 n_\theta + n_\theta^T K^2 X(H), $$ where $X(H)$ solves the equation $$ X(H)n_\theta^T + n_\theta X(H)^T = H, $$ but I'm not sure that you can uniquely solve that equation.
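
One way to probe that last equation numerically is to restrict to perturbations of the form $H = h\,n_\theta^T + n_\theta h^T$, i.e. those obtained by moving $n_\theta$ in a direction $h$. That restriction is an assumption of this sketch, not something established above, and the names `K2`, `n`, `h`, `H`, and `X` are illustrative. For such $H$ the candidate $X(H) = h$ satisfies the equation, and the resulting directional derivative coincides with the Frobenius product $K^2 : H$.

```python
import numpy as np

rng = np.random.default_rng(1)   # fixed seed, illustrative only
d = 4                            # assumed dimension

A = rng.standard_normal((d, d))
K2 = A @ A.T                     # a symmetric PSD matrix standing in for K^2
n = rng.standard_normal(d)       # stands in for n_theta
h = rng.standard_normal(d)       # assumed perturbation direction for n_theta

# Perturbation of N = n n^T induced by n -> n + t h, to first order in t.
H = np.outer(h, n) + np.outer(n, h)

# Candidate solution of X n^T + n X^T = H.
X = h
lhs = np.outer(X, n) + np.outer(n, X)
residual = np.linalg.norm(lhs - H)

# Directional derivative from the chain-rule formula vs. the product K^2 : H.
directional = X @ K2 @ n + n @ K2 @ X
frobenius = np.sum(K2 * H)

# The left-hand side is a sum of two rank-one matrices, so its rank is at most 2.
rank_lhs = np.linalg.matrix_rank(lhs)

print(residual, np.isclose(directional, frobenius), rank_lhs)
```

The rank observation suggests why a general $H$ is a problem: $X n_\theta^T + n_\theta X^T$ has rank at most 2, so for $d > 2$ a full-rank $H$ such as the identity cannot be matched by any $X$.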