How to find $\frac{\partial}{\partial \mathbf{Q}}\left(x_2^\intercal (\mathbf{I}_T\otimes \mathbf{Q})^{-1}x_2\right)$?


How to find $$\frac{\partial}{\partial \mathbf{Q}}\left(x_2^\intercal (\mathbf{I}_T\otimes \mathbf{Q})^{-1}x_2\right)$$?

Here $\mathbf{Q}$ is symmetric.

I'm thinking we could use some sort of chain rule, which gives $$ x_2 x_2^\intercal \frac{\partial}{\partial \mathbf{Q}}(\mathbf{I}_T\otimes \mathbf{Q}^{-1})$$

However, since this is the derivative of a scalar function with respect to a matrix, I would expect the result to have the same dimensions as $\mathbf{Q}$, and that is not what I get.


There are 2 best solutions below

Best answer

We know that $(I_{\textbf{T}} \otimes Q)^{-1} = I_{\textbf{T}} \otimes Q^{-1}$, so the expression becomes \begin{equation} x_2^T (I_{\textbf{T}} \otimes Q^{-1}) x_1 \end{equation} This is the general bilinear form in two vectors $x_1, x_2$; the quadratic form in the question is the special case $x_1 = x_2$.

The matrix $I_{\textbf{T}} \otimes Q^{-1}$ is block diagonal: \begin{equation} I_{\textbf{T}} \otimes Q^{-1} = \begin{bmatrix} Q^{-1} & 0 & 0 & \cdots & 0 \\ 0 & Q^{-1} & 0 & \cdots & 0 \\ 0 & 0 & Q^{-1} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & Q^{-1} \end{bmatrix} \end{equation}

Partition the vectors $x_1, x_2$ into $\textbf{T}$ subvectors that match the blocks, i.e. \begin{equation} x_{1,2} = \begin{bmatrix} x_{1,2}^{1}\\ x_{1,2}^{2}\\ \vdots \\ x_{1,2}^{\textbf{T}} \end{bmatrix} \end{equation} where each $x_k^{t}$ is a vector of length equal to the number of rows (or columns) of $Q$. Then \begin{equation} x_2^T (I_{\textbf{T}} \otimes Q^{-1}) x_1 = (x_2^{1})^T Q^{-1} x_1^1 + (x_2^{2})^T Q^{-1} x_1^2 + \ldots + (x_2^{\textbf{T}})^T Q^{-1} x_1^{\textbf{T}} \end{equation} or simply \begin{equation} x_2^T (I_{\textbf{T}} \otimes Q^{-1}) x_1 = \sum_{t=1}^{\textbf{T}} (x_2^{t})^T Q^{-1} x_1^t \tag{1} \end{equation}

We know that in general \begin{equation} \frac{\partial}{\partial X} a^T X^{-1} b = -X^{-T}ab^T X^{-T} \end{equation} When $X$ is symmetric, $X^{-T} = X^{-1}$ and this becomes \begin{equation} \frac{\partial}{\partial X} a^T X^{-1} b = -X^{-1}ab^T X^{-1} \tag{2} \end{equation}

Applying this to equation (1), and using linearity of the derivative to move it inside the sum, \begin{equation} \frac{\partial}{\partial Q} x_2^T (I_{\textbf{T}} \otimes Q^{-1}) x_1 = \frac{\partial}{\partial Q} \sum_{t=1}^{\textbf{T}} (x_2^{t})^T Q^{-1} x_1^t = \sum_{t=1}^{\textbf{T}} \frac{\partial}{\partial Q} (x_2^{t})^T Q^{-1} x_1^t \end{equation} Using equation (2) with $a = x_2^t$ and $b = x_1^t$, we get \begin{equation} \frac{\partial}{\partial Q} x_2^T (I_{\textbf{T}} \otimes Q^{-1}) x_1 = - \sum_{t=1}^{\textbf{T}} Q^{-1}x_2^{t}(x_1^t)^T Q^{-1} \end{equation} Each term is a product of matrices of the same size as $Q$, so the result has the same dimensions as $Q$, as expected.
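A quick numerical sanity check of the final formula can be sketched in NumPy. This is only an illustration under assumptions that are not part of the answer: the sizes `n` and `T`, the random test data, and the helper names `objective`, `gradient`, and `finite_difference` are all made up for the example. The finite difference perturbs each entry of $Q$ independently, which corresponds to equation (2) evaluated at a symmetric $Q$.

```python
import numpy as np

def objective(Q, x1, x2, T):
    """phi(Q) = x2^T (I_T kron Q)^{-1} x1, built explicitly with the Kronecker product."""
    K = np.kron(np.eye(T), Q)
    return x2 @ np.linalg.solve(K, x1)

def gradient(Q, x1, x2, T):
    """Closed form -sum_t Q^{-1} x2^t (x1^t)^T Q^{-1}, valid at a symmetric Q."""
    n = Q.shape[0]
    X1 = x1.reshape(T, n)   # row t holds the subvector x1^t
    X2 = x2.reshape(T, n)   # row t holds the subvector x2^t
    S = X2.T @ X1           # equals sum_t x2^t (x1^t)^T
    Qinv = np.linalg.inv(Q)
    return -Qinv @ S @ Qinv

def finite_difference(Q, x1, x2, T, eps=1e-6):
    """Central differences, perturbing one entry of Q at a time."""
    n = Q.shape[0]
    G = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            E = np.zeros((n, n))
            E[i, j] = eps
            G[i, j] = (objective(Q + E, x1, x2, T)
                       - objective(Q - E, x1, x2, T)) / (2 * eps)
    return G

# Hypothetical test data: a symmetric positive definite Q and random vectors.
rng = np.random.default_rng(0)
n, T = 3, 4
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)
x1 = rng.standard_normal(n * T)
x2 = rng.standard_normal(n * T)

G_closed = gradient(Q, x1, x2, T)
G_fd = finite_difference(Q, x1, x2, T)
max_err = np.max(np.abs(G_closed - G_fd))
print("max abs difference:", max_err)
```

If the formula is right, the printed difference should be at the level of finite-difference noise, and `G_closed` has the same shape as `Q`.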


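$\def\v{{\rm vec}}\def\M{{\rm Mat}}\def\d{{\rm diag}}\def\D{{\rm Diag}}\def\L{\left(}\def\R{\right)}\def\p#1#2{\frac{\partial #1}{\partial #2}}$For ease of typing, define the variables $$\eqalign{ I &= I_T \qquad x = x_1 \qquad y = x_2 \\ X &= \M(x) \implies x = \v(X) \\ Y &= \M(y) \implies y = \v(Y) \\ }$$ where $\M(\cdot)$ is the inverse of $\v(\cdot)$: for $Q$ of size $n\times n$, it reshapes a vector of length $nT$ into an $n\times T$ matrix whose columns are the length-$n$ subvectors.

Use a colon to denote the trace/Frobenius product, i.e. $$A:B = {\rm Tr}\L A^TB\R$$

Use this to rewrite the objective function, where $Q^{-1/2}$ denotes a symmetric square root of $Q^{-1}$ (real when $Q$ is positive definite) and the third line uses $\L I\otimes A\R\v(X) = \v(AX)$: $$\eqalign{ \phi &= y^T(I\otimes Q)^{-1}x \\ &= y^T\L I\otimes Q^{-1/2}\R \L I\otimes Q^{-1/2}\R x \\ &= \v\L Q^{-1/2}Y\R^T \v\L Q^{-1/2}X\R \\ &= Q^{-1/2}Y:Q^{-1/2}X \\ &= YX^T:Q^{-1} \\ }$$

Then calculate the differential, using $d\L Q^{-1}\R = -Q^{-1}\,dQ\,Q^{-1}$, and read off the gradient: $$\eqalign{ d\phi &= -YX^T:Q^{-1}dQ\,Q^{-1} \\ &= -Q^{-1}YX^TQ^{-1}:dQ \\ \p{\phi}{Q} &= -Q^{-1}YX^TQ^{-1} \\ }$$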
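The vec/Mat bookkeeping above can be checked numerically. The following NumPy sketch is an illustration only: the sizes, the random data, and the variable names are assumptions made for the example. It takes $\mathrm{vec}$ to be column stacking, so $\mathrm{Mat}$ is a column-major (`order="F"`) reshape.

```python
import numpy as np

# Hypothetical test data: symmetric positive definite Q, random x and y.
rng = np.random.default_rng(1)
n, T = 3, 5
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)
x = rng.standard_normal(n * T)
y = rng.standard_normal(n * T)

# Mat(.) as the inverse of column-stacking vec(.): column t is the t-th subvector.
X = x.reshape((n, T), order="F")
Y = y.reshape((n, T), order="F")

Qinv = np.linalg.inv(Q)
K = np.kron(np.eye(T), Qinv)  # (I kron Q)^{-1} = I kron Q^{-1}

# Identity (I kron A) vec(X) = vec(A X), here with A = Q^{-1}.
lhs = K @ x
rhs = (Qinv @ X).reshape(-1, order="F")

# phi computed two ways: Kronecker form and trace form YX^T : Q^{-1}.
phi_kron = y @ K @ x
phi_trace = np.trace((Y @ X.T).T @ Qinv)

# Gradient in matrix form, and the same thing as a sum over the T blocks.
grad_matrix = -Qinv @ Y @ X.T @ Qinv
grad_blocks = -sum(Qinv @ np.outer(Y[:, t], X[:, t]) @ Qinv for t in range(T))

print(np.allclose(lhs, rhs), np.isclose(phi_kron, phi_trace),
      np.allclose(grad_matrix, grad_blocks))
```

The last comparison shows that $YX^T = \sum_t y^t (x^t)^T$, so this gradient agrees with the blockwise sum obtained by partitioning the vectors.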