First derivative of an RVM-related matrix expression

Can somebody help me find the first derivative of the following function $\mathcal{L}$ with respect to the elements $\phi_{mn}$ of the matrix $\mathbf{\Phi}$?

\begin{equation} \mathcal{L} = -\frac{1}{2}\big[N\log 2\pi + \log\left|\mathbf{C}\right| + \mathbf{\hat{t}}^{T}\mathbf{C}^{-1}\mathbf{\hat{t}} \big], \end{equation}

where \begin{align} \mathbf{C} &= \mathbf{B} + \mathbf{\Phi} \mathbf{A}^{-1} \mathbf{\Phi}^{T}, \\ \mathbf{\hat{t}} &= \mathbf{\Phi}\mathbf{w} + \mathbf{B}^{-1}\mathbf{e}. \end{align}

Note that bold lowercase letters denote $N\times 1$ vectors and bold capital letters denote $N\times N$ matrices; $\mathbf{A}$ and $\mathbf{B}$ are diagonal.

Background information: $\mathcal{L}$ is a likelihood function and relates to the Relevance Vector Machine (RVM) as described in the following paper.

EDIT:

This is as far as I got on my own for the first derivative of the last term (the first two terms were already answered in my previous question). I first expanded it to get

\begin{align} \mathbf{\hat{t}}^{T}\mathbf{C}^{-1}\mathbf{\hat{t}} ~=~&\mathbf{w}^{T}\mathbf{\Phi}^{T}\mathbf{B}\mathbf{\Phi}\mathbf{w} \\ ~+~&\mathbf{e}^{T}\mathbf{\Phi}\mathbf{w} \\ ~+~&\mathbf{w}^{T}\mathbf{\Phi}^{T}\mathbf{e} \\ ~+~&\mathbf{e}^{T}\mathbf{B}^{-T}\mathbf{e} \\ ~-~&\mathbf{w}^{T}\mathbf{\Phi}\mathbf{B}\mathbf{\Phi}\mathbf{\Sigma}\mathbf{\Phi}^{T}\mathbf{\Phi}\mathbf{w} \\ ~-~&\mathbf{e}^{T}\mathbf{\Phi}\mathbf{\Sigma}\mathbf{\Phi}^{T}\mathbf{B}\mathbf{\Phi}\mathbf{w} \\ ~-~&\mathbf{w}^{T}\mathbf{\Phi}\mathbf{B}\mathbf{\Phi}\mathbf{\Sigma}\mathbf{\Phi}^{T}\mathbf{e} \\ ~-~&\mathbf{e}^{T}\mathbf{\Phi}\mathbf{\Sigma}\mathbf{\Phi}^{T}\mathbf{e}. \end{align}

Then I took the derivatives of terms 1, 2, 3, and 8 on the right-hand side using formulas from the Matrix Cookbook:

\begin{align} \frac{\partial \left[ \mathbf{w}^{T}\mathbf{\Phi}^{T}\mathbf{B}\mathbf{\Phi}\mathbf{w} \right]}{\partial \mathbf{\Phi}} &= 2 \mathbf{B}\mathbf{w}\mathbf{w}^{T} \\ \frac{\partial \left[ \mathbf{e}^{T}\mathbf{\Phi}\mathbf{w} \right]}{\partial \mathbf{\Phi}} &= \mathbf{e}\mathbf{w}^{T} \\ \frac{\partial \left[ \mathbf{w}^{T}\mathbf{\Phi}^{T}\mathbf{e} \right]}{\partial \mathbf{\Phi}} &= \mathbf{e}\mathbf{w}^{T} \\ \frac{\partial \left[ \mathbf{e}^{T}\mathbf{\Phi}\mathbf{\Sigma}\mathbf{\Phi}^{T}\mathbf{e} \right]}{\partial \mathbf{\Phi}} &= 2 \mathbf{\Sigma}^{T}\mathbf{e}\mathbf{e}^{T} \end{align}

The 4th term drops out because it does not depend on $\mathbf{\Phi}$, but I don't know how to find the derivatives of the remaining higher-order terms 5, 6, and 7. Any suggestions? Is what I have done so far correct?

BEST ANSWER

For ease of typing, I'm going to use $$\eqalign{ v &= \mathbf{\hat{t}}\cr P &= \mathbf{\Phi} \cr }$$ and drop the bold face on the remaining symbols.

The term you are asking about can be written using the Frobenius product $$\eqalign{ f &= v^TC^{-1}v \cr &= C^{-1}:vv^T \cr }$$

Its differential is $$\eqalign{ df &= C^{-1}:d(vv^T) + dC^{-1}:vv^T \cr &= C^{-1}:2\,{\rm sym}(dv\,v^T) - C^{-1}\,dC\,C^{-1}:vv^T \cr &= 2\,C^{-1}v:dv - C^{-1}vv^TC^{-1}:dC \cr }$$

Now we need expressions for $(dC, dv)$ in terms of $dP$, treating $w$, $e$, $A$, and $B$ as constants that do not depend on $P$. $$\eqalign{ v &= Pw +B^{-1}e \cr dv &= dP\,w \cr\cr C &= B + PA^{-1}P^T \cr dC &= 2\,{\rm sym}(dP\,A^{-1}P^T) \cr }$$

Substituting these into $df$ gives $$\eqalign{ df &= 2\,C^{-1}v:dP\,w - C^{-1}vv^TC^{-1}:2\,{\rm sym}(dP\,A^{-1}P^T) \cr &= 2\,C^{-1}vw^T:dP - 2\,C^{-1}vv^TC^{-1}PA^{-1}:dP \cr &= 2\,C^{-1}v(w^T - v^TC^{-1}PA^{-1}):dP \cr\cr \frac{\partial f}{\partial P} &= 2\,C^{-1}v(w^T - v^TC^{-1}PA^{-1}) \cr }$$

In these manipulations, I've made use of the fact that $A$, $C$ (and their inverses) are symmetric.
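To connect this back to the full function $\mathcal{L}$ and its elements $\phi_{mn}$, here is a sketch of how the pieces combine. It assumes, as above, that $w$, $e$, $A$, and $B$ do not depend on $P$, and it uses the standard identity $d\log|C| = C^{-1}:dC$ for the log-determinant term. $$\eqalign{ d\log|C| &= C^{-1}:2\,{\rm sym}(dP\,A^{-1}P^T) = 2\,C^{-1}PA^{-1}:dP \cr\cr \frac{\partial \mathcal{L}}{\partial P} &= -\frac{1}{2}\left(2\,C^{-1}PA^{-1} + \frac{\partial f}{\partial P}\right) \cr &= -C^{-1}PA^{-1} - C^{-1}v(w^T - v^TC^{-1}PA^{-1}) \cr }$$ The constant $N\log 2\pi$ contributes nothing, and the derivative with respect to a single element $\phi_{mn}$ is the $(m,n)$ entry of this matrix.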

I've also made extensive use of the algebraic properties of the Frobenius product, such as $$\eqalign{ A:BC &= AC^T:B \cr X:YZ &= Y^TX:Z \cr A:B &= B:A \cr A:B &= A^T:B^T \cr A:{\rm sym}(B) &= {\rm sym}(A):B \cr A:{\rm skew}(B) &= {\rm skew}(A):B \cr {\rm sym}(A):{\rm skew}(B) &= 0 \cr }$$ where $A$, $B$, $C$, $X$, $Y$, $Z$ stand for arbitrary matrices of compatible sizes, not the specific matrices of the question.

All of these can easily be verified from the definitions $$\eqalign{ A:B &= {\rm tr}(A^TB) \cr {\rm sym}(A) &= \frac{1}{2}(A+A^T) \cr {\rm skew}(A) &= \frac{1}{2}(A-A^T) \cr }$$
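As a sanity check, the closed-form gradient of $f$ derived above can be compared with central finite differences. The following is only a minimal sketch in plain Python: the helper names (`solve`, `quad_form`, `analytic_grad`, `numeric_grad`), the size `n = 4`, and the random test data are all illustrative assumptions, and $w$, $e$, $A$, $B$ are treated as constants with positive diagonal entries for $A$ and $B$.

```python
import random

def solve(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def quad_form(P, a, b, w, e):
    """Return f = v^T C^{-1} v and u = C^{-1} v, with A = diag(a), B = diag(b)."""
    n = len(P)
    # C = B + P A^{-1} P^T
    C = [[sum(P[i][k] * P[j][k] / a[k] for k in range(n)) + (b[i] if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # v = P w + B^{-1} e
    v = [sum(P[i][k] * w[k] for k in range(n)) + e[i] / b[i] for i in range(n)]
    u = solve(C, v)
    return sum(vi * ui for vi, ui in zip(v, u)), u

def analytic_grad(P, a, b, w, e):
    """Closed form 2 C^{-1} v (w^T - v^T C^{-1} P A^{-1})."""
    n = len(P)
    _, u = quad_form(P, a, b, w, e)
    # Row vector r^T = w^T - u^T P A^{-1}; C is symmetric, so v^T C^{-1} = u^T.
    r = [w[j] - sum(u[i] * P[i][j] for i in range(n)) / a[j] for j in range(n)]
    return [[2.0 * u[i] * r[j] for j in range(n)] for i in range(n)]

def numeric_grad(P, a, b, w, e, h=1e-6):
    """Central finite differences of f with respect to each entry of P."""
    n = len(P)
    G = [[0.0] * n for _ in range(n)]
    for m in range(n):
        for k in range(n):
            Pp = [row[:] for row in P]
            Pm = [row[:] for row in P]
            Pp[m][k] += h
            Pm[m][k] -= h
            G[m][k] = (quad_form(Pp, a, b, w, e)[0] - quad_form(Pm, a, b, w, e)[0]) / (2 * h)
    return G

random.seed(0)
n = 4  # illustrative size only
P = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
a = [random.uniform(0.5, 2.0) for _ in range(n)]  # diagonal of A (positive)
b = [random.uniform(0.5, 2.0) for _ in range(n)]  # diagonal of B (positive)
w = [random.uniform(-1, 1) for _ in range(n)]
e = [random.uniform(-1, 1) for _ in range(n)]

Ga = analytic_grad(P, a, b, w, e)
Gn = numeric_grad(P, a, b, w, e)
max_err = max(abs(Ga[i][j] - Gn[i][j]) for i in range(n) for j in range(n))
print("max |analytic - numeric| =", max_err)
```

If the two gradients agree up to finite-difference error, that supports the formula for this particular random instance; it is a spot check, not a proof.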