Derivative of intertwined matrix expression


What are the derivatives of $\mathbf{\hat{y}}(\mathbf{w})$ with respect to the elements $w_{i}$ of the vector $\mathbf{w}$, where

\begin{equation} \mathbf{\hat{y}}(\mathbf{w}) = \mathbf{\Omega}^{T}\mathbf{a} + \mathbf{1}b \end{equation}

and $\mathbf{a}$ and $b$ can be found as follows \begin{equation} \begin{bmatrix} b \\ \mathbf{a} \end{bmatrix} = \begin{bmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & \mathbf{\Omega}+diag(\frac{1}{\gamma\mathbf{w}}) \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ \mathbf{y} \end{bmatrix} \end{equation}

Bold letters indicate vectors, $\mathbf{1}$ is a vector of ones, $\Omega$ is a square matrix, and $diag(\cdot)$ forms a diagonal matrix, so $diag(\frac{1}{\gamma\mathbf{w}})$ has diagonal entries $(\gamma w_i)^{-1}$. I am a bit at a loss because of the intertwined structure: $\mathbf{a}$ and $b$ themselves depend on $\mathbf{w}$. Is this even possible?

Both equations are related to the LSSVM classifier as described in ftp://ftp.esat.kuleuven.be/sista/ida/reports./98-72.pdf
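Before doing any calculus, it can help to have the forward map $\mathbf{w}\mapsto\hat{\mathbf{y}}(\mathbf{w})$ in code, as a numerical baseline for whatever analytic derivative one derives. The NumPy sketch below assumes a small random positive semidefinite $\Omega$, labels $\pm 1$, positive weights and $\gamma=2$; the helpers `solve_block`, `y_hat` and `fd_jacobian` are hypothetical names. It solves the bordered system directly instead of forming the inverse, and approximates the Jacobian column by column with central differences.

```python
import numpy as np


def solve_block(w, Omega, y, gamma):
    """Solve [[0, 1^T], [1, Omega + diag(1/(gamma*w))]] [b; a] = [0; y]."""
    n = len(y)
    K = np.zeros((n + 1, n + 1))
    K[0, 1:] = 1.0
    K[1:, 0] = 1.0
    K[1:, 1:] = Omega + np.diag(1.0 / (gamma * w))
    sol = np.linalg.solve(K, np.concatenate(([0.0], y)))  # no explicit inverse
    return sol[0], sol[1:]  # b, a


def y_hat(w, Omega, y, gamma):
    """Forward map y_hat(w) = Omega^T a + 1 b."""
    b, a = solve_block(w, Omega, y, gamma)
    return Omega.T @ a + b


def fd_jacobian(f, w, eps=1e-6):
    """Central-difference Jacobian; column i approximates df/dw_i."""
    cols = []
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        cols.append((f(w + e) - f(w - e)) / (2 * eps))
    return np.column_stack(cols)


# Hypothetical test problem: PSD Omega, +/-1 labels, positive weights.
rng = np.random.default_rng(0)
n, gamma = 6, 2.0
X = rng.standard_normal((n, 3))
Omega = X @ X.T
y = rng.choice([-1.0, 1.0], size=n)
w = rng.uniform(0.5, 1.5, size=n)

b, a = solve_block(w, Omega, y, gamma)
J_fd = fd_jacobian(lambda v: y_hat(v, Omega, y, gamma), w)
print(J_fd.shape)  # (6, 6)
```

Any closed-form expression for $\partial\hat{\mathbf{y}}/\partial\mathbf{w}$ can then be compared against `J_fd`.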

Best answer

For convenience, let's define some matrices, where $\operatorname{Diag}(\cdot)$ turns a vector into a diagonal matrix, $$\eqalign{ G &= \gamma\,\operatorname{Diag}(w) \cr F &= G^{-1} \cr H &= \Omega+F \cr J &= H^{-1} \cr }$$ together with their differentials $$\eqalign{ dG &= \gamma\,\operatorname{Diag}(dw) \cr dF &= -F\,dG\,F \cr dH &= dF \cr dJ &= -J\,dH\,J \cr &= JF\,dG\,FJ\cr }$$ and a couple of vectors that will prove useful: $$\eqalign{ s^T &= \frac{(1^TJ1)\,y^T-(1^TJy)\,1^T}{(1^TJ1)^2} \cr\cr g &= \gamma\,\operatorname{diag}(FJ^T1s^TJ^TF) \cr\cr }$$
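The differentials $dF=-F\,dG\,F$ and $dJ=JF\,dG\,FJ$ can be sanity-checked with a directional finite difference. This sketch assumes a random PSD $\Omega$, $\gamma=2$ and an arbitrary perturbation direction `dw`; `mats` is a hypothetical helper that rebuilds $G,F,H,J$ for a given $w$.

```python
import numpy as np

# Hypothetical test problem: PSD Omega, positive weights.
rng = np.random.default_rng(1)
n, gamma = 5, 2.0
X = rng.standard_normal((n, 3))
Omega = X @ X.T
w = rng.uniform(0.5, 1.5, size=n)


def mats(w):
    """Return G, F, H, J as defined in the answer."""
    G = gamma * np.diag(w)
    F = np.linalg.inv(G)  # diagonal, equal to diag(1/(gamma*w))
    H = Omega + F
    J = np.linalg.inv(H)
    return G, F, H, J


G, F, H, J = mats(w)
dw = rng.standard_normal(n)  # arbitrary perturbation direction
dG = gamma * np.diag(dw)

# Analytic differentials from the answer
dF = -F @ dG @ F
dJ = J @ F @ dG @ F @ J

# Central differences along the direction dw
eps = 1e-6
_, Fp, _, Jp = mats(w + eps * dw)
_, Fm, _, Jm = mats(w - eps * dw)
dF_fd = (Fp - Fm) / (2 * eps)
dJ_fd = (Jp - Jm) / (2 * eps)

print(np.max(np.abs(dF - dF_fd)), np.max(np.abs(dJ - dJ_fd)))
```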

Multiply through your second equation to get rid of the inverse, then evaluate the partitions to obtain two separate equations (writing $\beta$ for $b$). The three equations we will work with are $$\hat{y} = \Omega^Ta+1\beta \tag{1}$$ $$1^Ta = 0 \tag{2}$$ $$y = Ha + 1\beta \tag{3}$$ Solve Eqn(3) for $a$, then multiply by $1^T$ so that Eqn(2) can be used to solve for $\beta$ and find its differential: $$\eqalign{ a &= J\,(y-1\beta) \cr 1^Ta &= 0 = 1^TJ\,(y-1\beta) \cr 1^TJ1\,\beta &= 1^TJy \cr\cr \beta &= \frac{1^TJy}{1^TJ1} = \frac{J:1y^T}{J:11^T} \cr\cr d\beta &= 1s^T:dJ \cr &= 1s^T:JF\,dG\,FJ \cr &= FJ^T1s^TJ^TF:dG \cr &= FJ^T1s^TJ^TF:\gamma\operatorname{Diag}(dw) \cr &= \gamma \operatorname{diag}(FJ^T1s^TJ^TF)^Tdw \cr &= g^Tdw }$$where the colon denotes the Frobenius product, $A:B=\operatorname{tr}(A^TB)$, and $\operatorname{diag}(\cdot)$ returns the diagonal of a matrix as a vector. The first line of $d\beta$ is the quotient rule applied to $\beta$, which gives $d\beta = 1^T\,dJ\,s = 1s^T:dJ$.
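A numerical check of this step: the closed form $\beta = 1^TJy/1^TJ1$ should agree with $b$ read off the bordered system, and $g$ should match a central-difference gradient of $\beta$. The test data and the helper names `J_of`, `beta_of` and `block_b` are assumptions for illustration; $s^T$ is stored as a 1-D array `s`.

```python
import numpy as np

# Hypothetical test problem: PSD Omega, +/-1 labels, positive weights.
rng = np.random.default_rng(2)
n, gamma = 6, 2.0
X = rng.standard_normal((n, 3))
Omega = X @ X.T
y = rng.choice([-1.0, 1.0], size=n)
w = rng.uniform(0.5, 1.5, size=n)
ones = np.ones(n)


def J_of(w):
    """J = (Omega + F)^{-1} with F = (gamma Diag(w))^{-1}."""
    return np.linalg.inv(Omega + np.diag(1.0 / (gamma * w)))


def beta_of(w):
    """Closed form beta = 1^T J y / 1^T J 1."""
    J = J_of(w)
    return (ones @ J @ y) / (ones @ J @ ones)


def block_b(w):
    """b read off directly from the bordered system in the question."""
    K = np.zeros((n + 1, n + 1))
    K[0, 1:] = 1.0
    K[1:, 0] = 1.0
    K[1:, 1:] = Omega + np.diag(1.0 / (gamma * w))
    return np.linalg.solve(K, np.concatenate(([0.0], y)))[0]


# Analytic gradient g = gamma diag(F J^T 1 s^T J^T F)
J = J_of(w)
F = np.diag(1.0 / (gamma * w))
c = ones @ J @ ones
s = (c * y - (ones @ J @ y) * ones) / c**2
g = gamma * np.diag(F @ J.T @ np.outer(ones, s) @ J.T @ F)

# Central-difference gradient of beta for comparison
eps = 1e-6
g_fd = np.array([(beta_of(w + eps * e) - beta_of(w - eps * e)) / (2 * eps)
                 for e in np.eye(n)])
print(np.max(np.abs(g - g_fd)))
```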

Now multiply Eqn(3) by $J$, solve for $a$, and find its differential: $$\eqalign{ a &= J\,(y-1\beta) \cr da &= dJ\,(y-1\beta) - J1\,d\beta \cr &= JF\,dG\,FJ(y-1\beta) - J1\,d\beta \cr &= JF\,dG\,Fa - J1\,d\beta \cr\cr da1^T &= J\,dG\,F^2a1^T - J11^T\,d\beta \cr \operatorname{diag}(da1^T) &= \operatorname{diag}(J\,dG\,F^2a1^T) - \operatorname{diag}(J11^T\,d\beta) \cr\cr da &= (1a^TF^2\odot J)\operatorname{diag}(dG) - J1\,d\beta \cr &= (1a^TF^2\odot J)\gamma\,dw - J1g^Tdw \cr }$$where $\odot$ denotes the Hadamard (element-wise) product. Because the diagonal matrices $F$ and $dG$ commute, $JF\,dG\,Fa$ was rewritten as $J\,dG\,F^2a$. Taking the diagonal then recovers the vectors, since $\operatorname{diag}(x1^T)=x$ for any vector $x$, and $J\,dG\,F^2a = (1a^TF^2\odot J)\operatorname{diag}(dG)$ because column $j$ of $J$ is scaled by $(F^2a)_j$.
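The Jacobian of $a$ implied by the last line, $\gamma(1a^TF^2\odot J) - J1g^T$, can be compared with central differences. This is a sketch on an assumed random test problem; `pieces` is a hypothetical helper returning $F$, $J$, $\beta$ and $a$ for a given $w$.

```python
import numpy as np

# Hypothetical test problem: PSD Omega, +/-1 labels, positive weights.
rng = np.random.default_rng(3)
n, gamma = 6, 2.0
X = rng.standard_normal((n, 3))
Omega = X @ X.T
y = rng.choice([-1.0, 1.0], size=n)
w = rng.uniform(0.5, 1.5, size=n)
ones = np.ones(n)


def pieces(w):
    """Return F, J, beta, a as defined in the answer."""
    F = np.diag(1.0 / (gamma * w))
    J = np.linalg.inv(Omega + F)
    beta = (ones @ J @ y) / (ones @ J @ ones)
    a = J @ (y - beta)  # a = J (y - 1 beta)
    return F, J, beta, a


F, J, beta, a = pieces(w)
c = ones @ J @ ones
s = (c * y - (ones @ J @ y) * ones) / c**2
g = gamma * np.diag(F @ J.T @ np.outer(ones, s) @ J.T @ F)

# da/dw = gamma * (1 a^T F^2 Hadamard J) - J 1 g^T
Da = gamma * (np.outer(ones, F @ F @ a) * J) - np.outer(J @ ones, g)

# Central-difference Jacobian of a for comparison
eps = 1e-6
Da_fd = np.column_stack([(pieces(w + eps * e)[3] - pieces(w - eps * e)[3]) / (2 * eps)
                         for e in np.eye(n)])
print(np.max(np.abs(Da - Da_fd)))
```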

Finally, let's take the differential of Eqn(1) $$\eqalign{ d\hat{y} &= \Omega^Tda+1d\beta \cr &= \gamma\Omega^T(1a^TF^2\odot J)\,dw - \Omega^TJ1g^Tdw + 1g^Tdw \cr\cr \frac{\partial \hat{y}}{\partial w} &= \gamma\Omega^T(1a^TF^2\odot J) - \Omega^TJ1g^T + 1g^T \cr }$$
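The final Jacobian can be verified end to end against finite differences of $\hat{y}=\Omega^Ta+1\beta$. As before, the random test problem and the helper `pieces` are assumptions made only for this check.

```python
import numpy as np

# Hypothetical test problem: PSD Omega, +/-1 labels, positive weights.
rng = np.random.default_rng(4)
n, gamma = 6, 2.0
X = rng.standard_normal((n, 3))
Omega = X @ X.T
y = rng.choice([-1.0, 1.0], size=n)
w = rng.uniform(0.5, 1.5, size=n)
ones = np.ones(n)


def pieces(w):
    """Return F, J, beta, a and y_hat = Omega^T a + 1 beta."""
    F = np.diag(1.0 / (gamma * w))
    J = np.linalg.inv(Omega + F)
    beta = (ones @ J @ y) / (ones @ J @ ones)
    a = J @ (y - beta)
    return F, J, beta, a, Omega.T @ a + beta


F, J, beta, a, yh = pieces(w)
c = ones @ J @ ones
s = (c * y - (ones @ J @ y) * ones) / c**2
g = gamma * np.diag(F @ J.T @ np.outer(ones, s) @ J.T @ F)

# dy_hat/dw = gamma Omega^T (1 a^T F^2 Hadamard J) - Omega^T J 1 g^T + 1 g^T
Dy = (gamma * Omega.T @ (np.outer(ones, F @ F @ a) * J)
      - Omega.T @ np.outer(J @ ones, g)
      + np.outer(ones, g))

# Central-difference Jacobian of y_hat for comparison
eps = 1e-6
Dy_fd = np.column_stack([(pieces(w + eps * e)[4] - pieces(w - eps * e)[4]) / (2 * eps)
                         for e in np.eye(n)])
print(np.max(np.abs(Dy - Dy_fd)))
```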


Update

A nicer way to state the result eliminates the Hadamard product in favor of $A=\operatorname{Diag}(a)$. Since $dG\,F^2a = \gamma\operatorname{Diag}(F^2a)\,dw = \gamma AF^2\,dw$, we have $$\eqalign{ JF\,dG\,Fa &= \gamma\,JAF^2dw \cr }$$ so that $$\eqalign{ da &= J(\gamma AF^2 - 1g^T)\,dw \cr }$$ and $$\eqalign{ \frac{\partial\hat{y}}{\partial w} &= \gamma\,\Omega^TJAF^2 + (I-\Omega^TJ)\,1g^T \cr }$$
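To confirm that the compact form with $A=\operatorname{Diag}(a)$ is the same matrix as the Hadamard form, and that both match finite differences, here is a short sketch on an assumed random test problem (`pieces` is again a hypothetical helper).

```python
import numpy as np

# Hypothetical test problem: PSD Omega, +/-1 labels, positive weights.
rng = np.random.default_rng(5)
n, gamma = 6, 2.0
X = rng.standard_normal((n, 3))
Omega = X @ X.T
y = rng.choice([-1.0, 1.0], size=n)
w = rng.uniform(0.5, 1.5, size=n)
ones = np.ones(n)


def pieces(w):
    """Return F, J, a and y_hat = Omega^T a + 1 beta."""
    F = np.diag(1.0 / (gamma * w))
    J = np.linalg.inv(Omega + F)
    beta = (ones @ J @ y) / (ones @ J @ ones)
    a = J @ (y - beta)
    return F, J, a, Omega.T @ a + beta


F, J, a, _ = pieces(w)
c = ones @ J @ ones
s = (c * y - (ones @ J @ y) * ones) / c**2
g = gamma * np.diag(F @ J.T @ np.outer(ones, s) @ J.T @ F)
A = np.diag(a)

# Compact form: gamma Omega^T J A F^2 + (I - Omega^T J) 1 g^T
Dy_compact = gamma * Omega.T @ J @ A @ F @ F + (np.eye(n) - Omega.T @ J) @ np.outer(ones, g)
# Hadamard form from the main derivation
Dy_hadamard = (gamma * Omega.T @ (np.outer(ones, F @ F @ a) * J)
               - Omega.T @ np.outer(J @ ones, g) + np.outer(ones, g))

eps = 1e-6
Dy_fd = np.column_stack([(pieces(w + eps * e)[3] - pieces(w - eps * e)[3]) / (2 * eps)
                         for e in np.eye(n)])
print(np.max(np.abs(Dy_compact - Dy_hadamard)), np.max(np.abs(Dy_compact - Dy_fd)))
```

Forming $JAF^2$ needs only a matrix product with diagonal factors, which is why the compact form is usually the more convenient one to implement.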