Minimization of the sum of squares classification error


I want to minimize $$ E_D(\widetilde W) = \frac{1}{2} \operatorname{Tr}\{(\widetilde X \widetilde W - T)^T (\widetilde X \widetilde W - T)\} $$ with respect to $\widetilde W$.

Here $\widetilde W$ is a $(D \times K)$ matrix, $\widetilde X$ is an $(N \times D)$ matrix, and $T$ is an $(N \times K)$ matrix.

What I have done is $$ E_D(\widetilde W) = \frac{1}{2} \operatorname{Tr}\{(\widetilde W^T \widetilde X^T - T^T)(\widetilde X \widetilde W - T)\} $$ $$ = \frac{1}{2} \operatorname{Tr}\{\widetilde W^T \widetilde X^T \widetilde X \widetilde W - \widetilde W^T \widetilde X^T T - T^T \widetilde X \widetilde W + T^T T\} $$

At this step I would take the partial derivative with respect to $\widetilde W$, but I don't know how to differentiate through the trace.

Solving for $\widetilde W$, I am supposed to get

$$ \widetilde W = (\widetilde X^T \widetilde X)^{-1} \widetilde X^T T $$

Answer:
$$\operatorname {tr} (X^{\intercal}Y)=\operatorname {tr} (XY^{\intercal})=\operatorname {tr} (Y^{\intercal}X)=\operatorname {tr} (YX^{\intercal})=\sum _{i,j}X_{ij}Y_{ij}$$
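This identity is easy to confirm numerically; here is a quick NumPy sanity check (the shapes are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
Y = rng.standard_normal((4, 3))

# All four trace forms equal the elementwise sum of products.
lhs = np.trace(X.T @ Y)
assert np.isclose(lhs, np.trace(X @ Y.T))
assert np.isclose(lhs, np.trace(Y.T @ X))
assert np.isclose(lhs, np.trace(Y @ X.T))
assert np.isclose(lhs, np.sum(X * Y))
```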

$$\frac{\partial E_D(W)}{\partial W_{ij}}=0,\quad \forall i,j$$

$$\frac{\partial }{\partial W_{ij}}(\frac12 \operatorname {tr} [(XW-T)^{\intercal}(XW-T)])=0,\quad \forall i,j$$

$$A\triangleq XW-T$$
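Writing the error elementwise, $E_D = \frac12 \sum_{n,k} A_{nk}^2$ with $A_{nk} = \sum_m X_{nm}W_{mk} - T_{nk}$, so the chain rule gives

$$\frac{\partial A_{nk}}{\partial W_{ij}} = X_{ni}\,\delta_{kj} \quad\Longrightarrow\quad \frac{\partial E_D}{\partial W_{ij}} = \sum_{n,k} A_{nk}\,X_{ni}\,\delta_{kj} = \sum_{n} X_{ni}A_{nj} = (X^{\intercal}A)_{ij}$$

Setting every entry of the gradient to zero: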

$$X^{\intercal} A=0$$

$$X^{\intercal} (XW-T)=0$$

$$X^{\intercal} X W-X^{\intercal} T=0$$

$$X^{\intercal} X W=X^{\intercal} T$$

$$ W= ( X^{\intercal} X)^{-1} X^{\intercal} T$$
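As a final sanity check, the closed-form solution can be verified numerically: a short NumPy sketch (the problem sizes and variable names are my own) confirms that the gradient vanishes at $W^\ast$ and that perturbing $W^\ast$ only increases the error.

```python
import numpy as np

# Random problem with the stated shapes: X is (N x D), T is (N x K).
rng = np.random.default_rng(42)
N, D, K = 50, 4, 3
X = rng.standard_normal((N, D))
T = rng.standard_normal((N, K))

def error(W):
    """E_D(W) = 1/2 tr[(XW - T)^T (XW - T)]."""
    A = X @ W - T
    return 0.5 * np.trace(A.T @ A)

# Closed-form solution W* = (X^T X)^{-1} X^T T,
# computed via a linear solve rather than an explicit inverse.
W_star = np.linalg.solve(X.T @ X, X.T @ T)

# The gradient X^T (X W - T) vanishes at W* ...
grad = X.T @ (X @ W_star - T)
assert np.allclose(grad, 0)

# ... and random perturbations of W* only increase the error.
for _ in range(100):
    W_pert = W_star + 1e-3 * rng.standard_normal((D, K))
    assert error(W_pert) >= error(W_star)
```

Using `np.linalg.solve` instead of forming $(X^{\intercal}X)^{-1}$ explicitly is the standard numerically stable way to evaluate this expression.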