I am reading Bishop's PRML book and got a bit stuck on equation 4.15, in the part covering the application of the least squares method to classification:
$E_D(\tilde{\mathbf{W}}) = \frac{1}{2}\mathrm{Tr}\{(\tilde{\mathbf{X}}\tilde{\mathbf{W}}-\mathbf{T})^T(\tilde{\mathbf{X}}\tilde{\mathbf{W}}-\mathbf{T})\}$
What is the trace operator doing in the matrix form of the least squares error? Can someone provide a simple, intuitive example that illustrates the connection between the squared error and the trace operator?
Thank you very much in advance!
Vlad
This is simply the matrix form of the sum-of-squares error function. To see this, note that for any matrix $\mathbf{A}$, $\mathrm{Tr}\{\mathbf{A}^T\mathbf{A}\} = \sum_i \sum_j A_{ij}^2$, i.e. the sum of the squares of all its elements. With $\mathbf{A} = \tilde{\mathbf{X}}\tilde{\mathbf{W}} - \mathbf{T}$, the $(n,k)$ element of $\mathbf{A}$ is the error $\tilde{\mathbf{w}}_k^T\tilde{\mathbf{x}}_n - t_{nk}$ that the $k$-th output makes on the $n$-th training point, so the expression expands to $E_D(\tilde{\mathbf{W}}) = \frac{1}{2}\sum_n \sum_k (\tilde{\mathbf{w}}_k^T\tilde{\mathbf{x}}_n - t_{nk})^2$. The trace is nothing more than a compact way of summing the squared errors over both the data index $n$ and the target index $k$.
The factor of $\frac 1 2$ is there to make things simpler when the expression is differentiated.
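A quick numerical check makes the identity concrete. The sketch below (with arbitrary random matrices standing in for $\tilde{\mathbf{X}}$, $\tilde{\mathbf{W}}$, and $\mathbf{T}$; the sizes are made up for illustration) verifies that $\frac{1}{2}\mathrm{Tr}\{\mathbf{R}^T\mathbf{R}\}$ equals half the sum of the squared residuals:

```python
import numpy as np

# Hypothetical sizes: 5 samples, 3 augmented features, 2 target dimensions.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))   # X-tilde: augmented design matrix
W = rng.standard_normal((3, 2))   # W-tilde: weight matrix
T = rng.standard_normal((5, 2))   # T: target matrix

R = X @ W - T                     # residual matrix, element (n, k) is one error

# Matrix form from the question: (1/2) Tr{R^T R}
E_trace = 0.5 * np.trace(R.T @ R)

# Elementwise form: half the sum of squared residuals over n and k
E_sum = 0.5 * np.sum(R**2)

assert np.isclose(E_trace, E_sum)
```

Note that only the diagonal of $\mathbf{R}^T\mathbf{R}$ matters: its $(k,k)$ entry is the squared error of output $k$ summed over all samples, and the trace adds these up over $k$.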