Matrix derivative of expression, mistake in proof

I cannot find the problem with the following proof. Can someone help?

Prove that for matrices $X$, $W$ and $T$: $$\frac\partial{\partial W}\frac12 \text{Tr}\left\{(XW-T)^T(XW-T)\right\} = X^TXW-X^TT $$

Proof:

$$\frac\partial{\partial W}\frac12 \text{Tr}\left\{(XW-T)^T(XW-T)\right\} =\frac\partial{\partial W}\frac12 \sum_{i}\left(\left(XW-T\right)^T(XW-T)\right)_{i,i} $$ $$= \frac\partial{\partial W}\frac12 \sum_{i,j}\left(\left(XW-T\right)^T_{i,j}(XW-T)_{j,i}\right)$$ $$=\frac\partial{\partial W}\frac12 \sum_{i,j} \left((XW-T)^2_{j,i}\right) $$ $$=\frac\partial{\partial W}\frac12 \sum_{i,j,k} \left((X_{j,k}W_{k,i}-T_{j,i})^2\right) $$

Now consider just the $(m,n)$ entry of this matrix:

$$ \left[\frac\partial{\partial W}\frac12 \sum_{i,j,k} \left((X_{j,k}W_{k,i}-T_{j,i})^2\right)\right]_{m,n} = \frac\partial{\partial W_{m,n}}\frac12 \sum_{i,j,k} \left((X_{j,k}W_{k,i}-T_{j,i})^2\right) \quad\quad\quad\quad \text{(3)}$$

Now we can put $k = m$ and $i = n$, as all other terms will have no dependence on $W_{m,n}$ (*). Therefore,

$$ \left[\frac\partial{\partial W}\frac12 \text{Tr}\left\{(XW-T)^T(XW-T)\right\}\right]_{m,n} = \frac\partial{\partial W_{m,n}}\frac12 \sum_{j} \left((X_{j,m}W_{m,n}-T_{j,n})^2\right) $$ $$= \sum_{j}(X_{j,m}W_{m,n}-T_{j,n})X_{j,m} $$ $$= \sum_{j} \left( X_{j,m}W_{m,n}X_{j,m} - T_{j,n}X_{j,m}\right) $$ $$= \sum_{j} \left( X_{j,m}W_{m,n}X_{j,m} \right) - \left(X^TT\right)_{m,n} $$

But the first term on the RHS of the last line above does not equal $(X^TXW)_{m,n}$.

Edit: the problem is with the application of (*). Everything after (*) is incorrect because it neglects the cross terms from the RHS of (3).
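
To make the missing cross terms concrete, here is a sketch of the same entry-wise derivative with the sum over $k$ kept inside the square, which is what $(XW-T)_{j,i}$ actually contains. It uses only the symbols already defined above:

$$ \frac{\partial}{\partial W_{m,n}}\frac12 \sum_{i,j}\Big(\sum_{k} X_{j,k}W_{k,i}-T_{j,i}\Big)^2 = \sum_{j}\Big(\sum_{k} X_{j,k}W_{k,n}-T_{j,n}\Big)X_{j,m} $$ $$= \sum_{j,k} X_{j,m}X_{j,k}W_{k,n} - \sum_{j} X_{j,m}T_{j,n} $$ $$= \left(X^TXW\right)_{m,n} - \left(X^TT\right)_{m,n} $$

Only $i = n$ survives the differentiation, but every $k$ still contributes through the factor $\sum_k X_{j,k}W_{k,n}$. The terms with $k \neq m$ are the cross terms that (*) discards.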

There are 1 best solutions below

$\def\p#1#2{\frac{\partial #1}{\partial #2}}$To reduce the clutter, define the matrix $Y = XW-T$ with components $$\eqalign{ Y_{ij} &= \sum_k X_{ik}W_{kj}-T_{ij} \\ }$$

Calculate the component-wise derivative of $Y$ with respect to $W$ $$\eqalign{ \p{Y_{ij}}{W_{mn}} &= \sum_k X_{ik}\,\delta_{km}\delta_{jn} \;=\; X_{im}\,\delta_{jn} \\ }$$

Use this result to calculate the component-wise derivative of the objective function $$\eqalign{ \phi &= \sum_i\sum_j \tfrac 12Y_{ij}\,Y_{ij} \\ \p{\phi}{W_{mn}} &= \sum_i\sum_j Y_{ij}\left(\p{Y_{ij}}{W_{mn}}\right) \\ &= \sum_i\sum_j Y_{ij}\left(X_{im}\,\delta_{jn}\right) \\ &= \sum_i Y_{in}X_{im} \\ &= \left(X^TY\right)_{mn} \\ }$$

Since $Y = XW-T$, this is $X^T(XW-T) = X^TXW-X^TT$, as required.

The error that you made was to naively $\color{blue}{\rm replace}$ the indices $(k,i)$ by $(m,n)$ inside the sum. Instead you must use Kronecker deltas as shown above, so that the sum over $k$ inside $Y_{ij}$ is kept.
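
As a quick sanity check, the result can be compared numerically against a finite-difference gradient. This is a minimal sketch assuming NumPy. The function names, matrix sizes, and random seed are arbitrary choices made for illustration and are not part of the question.

```python
import numpy as np

def objective(X, W, T):
    """0.5 * Tr((XW - T)^T (XW - T))."""
    Y = X @ W - T
    return 0.5 * np.trace(Y.T @ Y)

def analytic_gradient(X, W, T):
    """The claimed result X^T X W - X^T T, written as X^T (XW - T)."""
    return X.T @ (X @ W - T)

def naive_gradient(X, W, T):
    """The incorrect expression from the question:
    sum_j X_jm W_mn X_jm - (X^T T)_mn, which drops the cross terms."""
    return (X**2).sum(axis=0)[:, None] * W - X.T @ T

def numerical_gradient(X, W, T, eps=1e-6):
    """Central finite differences, one entry W_mn at a time."""
    grad = np.zeros_like(W)
    for m in range(W.shape[0]):
        for n in range(W.shape[1]):
            E = np.zeros_like(W)
            E[m, n] = eps
            grad[m, n] = (objective(X, W + E, T) - objective(X, W - E, T)) / (2 * eps)
    return grad

# Arbitrary test data (sizes and seed are assumptions for illustration).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
W = rng.standard_normal((3, 2))
T = rng.standard_normal((5, 2))

g_num = numerical_gradient(X, W, T)
print("matches X^T(XW - T):      ", np.allclose(g_num, analytic_gradient(X, W, T), atol=1e-5))
print("matches naive expression: ", np.allclose(g_num, naive_gradient(X, W, T), atol=1e-5))
```

The naive expression keeps only the diagonal of $X^TX$, so it should agree with the finite-difference gradient only when the columns of $X$ happen to be mutually orthogonal.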