Given the following matrices and vectors, I am trying to derive the gradient of equation (1).
$t \in R ,\quad S \in R^{N \times N}, \quad y \in R^N, \quad Q = tS $ and $Q$ is invertible
$\frac{\partial y^TQ^{-1}y}{\partial t} \tag{1}$
Equations (2), (3) and (4) come from the matrix cookbook:
https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf
$\frac{\partial a^TX^{-1}b}{\partial X} = -X^{-T}ab^TX^{-T} \tag{2}$
$\partial (X^{-1}) = -X^{-1}(\partial X) X^{-1} \tag{3}$
$\frac{\partial a^TXa}{\partial X} = aa^T \tag{4}$
Given all of the above, I tried to apply the chain rule, but I see two different approaches to this. The first one, described in (5), uses identity (2), whereas the second one, described in (6), uses the identities (3) and (4).
$\frac{\partial y^TQ^{-1}y}{\partial t} = \frac{\partial y^TQ^{-1}y}{\partial Q} \frac{\partial Q}{\partial t} = -Q^{-T}yy^TQ^{-T}S \tag{5}$
$\frac{\partial y^TQ^{-1}y}{\partial t} = \frac{\partial y^TQ^{-1}y}{\partial Q^{-1}} \frac{\partial Q^{-1}}{\partial t} = yy^T(-Q^{-1}\frac{\partial Q}{\partial t}Q^{-1}) = -yy^TQ^{-1}SQ^{-1} \tag{6}$
Are these results the same? And if yes, why?
Furthermore, are they correct? Both results are in $R^{N \times N}$, whereas I expected the gradient (1) to be in $R$, since $y^TQ^{-1}y \in R$ and $t \in R$.
Thank you!
Let's write things out explicitly. You have a function $$ f: \Bbb R \setminus \{0\} \to \Bbb R : t \mapsto y^TQ^{-1}y\tag{1} $$ where $y \in \Bbb R^n$ is a fixed vector, $S$ is a fixed invertible $n \times n$ matrix, and $$ Q = tS $$ (we need $t \ne 0$ for $Q$ to be invertible). Now $$ Q^{-1} = \frac{1}{t} S^{-1} $$ so we can rewrite formula 1 as $$ f(t) = \frac{1}{t} y^TS^{-1}y\tag{2} $$ where $y^T S^{-1} y$ is a constant: a single real number. The derivative of $f$ is thus (without using the chain rule or anything, really!) $$ f'(t) = -\frac{1}{t^2} y^T S^{-1} y. $$
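As a sanity check, here is a small numerical sketch in plain Python. It compares a central finite difference of $f$ with $-\frac{1}{t^2} y^T S^{-1} y$, which is what differentiating $f(t) = \frac{1}{t} y^T S^{-1} y$ gives. The matrix `S`, the vector `y`, and the value of `t` are arbitrary values chosen for illustration, and the helper names (`solve`, `f`, `closed_form_derivative`, `central_difference`) are my own, not from any library.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def f(t, S, y):
    """f(t) = y^T Q^{-1} y with Q = t S."""
    Q = [[t * s for s in row] for row in S]
    x = solve(Q, y)  # x = Q^{-1} y
    return sum(yi * xi for yi, xi in zip(y, x))


def closed_form_derivative(t, S, y):
    """f'(t) = -(1/t^2) y^T S^{-1} y."""
    x = solve(S, y)  # x = S^{-1} y
    return -sum(yi * xi for yi, xi in zip(y, x)) / t**2


def central_difference(t, S, y, h=1e-6):
    """Numerical estimate of f'(t)."""
    return (f(t + h, S, y) - f(t - h, S, y)) / (2 * h)


# Illustrative data (assumed, not from the question); S is invertible
# and deliberately non-symmetric.
S = [[4.0, 1.0, 0.0],
     [2.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
y = [1.0, -2.0, 0.5]
t = 1.7

numeric = central_difference(t, S, y)
exact = closed_form_derivative(t, S, y)
print("finite difference:", numeric)
print("closed form:      ", exact)
```

The two printed values should agree to several decimal places. Both are single real numbers, as expected for a function from $\Bbb R$ to $\Bbb R$.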
As for why your book of formulas doesn't lead to this result...I'm not willing to help out there, because I don't think that applying formulas without understanding them is within the realm of "mathematics".
But @MichaelHoppe is completely correct in noting that these formulas are for derivatives with respect to a matrix (or vector) argument, which you do not have (you have a function from $\Bbb R$ to $\Bbb R$). Applying them willy-nilly is unlikely, therefore, to lead to a useful result.
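For readers who still want to see how the question's identities can be reconciled with the direct computation, here is a sketch (my own working, not taken from the cookbook). It uses the differential identity (3) from the question with $dQ = S\,dt$. The key point is that the gradient matrix from identity (2), $G = -Q^{-T} y y^T Q^{-T}$, has to be contracted with $\partial Q / \partial t = S$ through a trace, not multiplied by it as a matrix; that contraction is what produces a scalar.

$$
\begin{aligned}
df &= y^T\, d(Q^{-1})\, y = -y^T Q^{-1}\,(dQ)\,Q^{-1} y, \qquad dQ = S\,dt,\\
\frac{df}{dt} &= -y^T Q^{-1} S\, Q^{-1} y = \operatorname{tr}\!\left(G^{T} S\right),\\
&= -\frac{1}{t^2}\, y^T S^{-1} S\, S^{-1} y = -\frac{1}{t^2}\, y^T S^{-1} y .
\end{aligned}
$$

The last line substitutes $Q^{-1} = \frac{1}{t} S^{-1}$, and the result is a single real number, as it should be for a function from $\Bbb R$ to $\Bbb R$.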