Trace operator in Least Squares Classification


I am reading Bishop's PRML book and got a bit stuck on equation (4.15), in the section covering the application of the least-squares method to classification:

$E_D(\tilde{\mathbf{W}}) = \frac{1}{2}Tr\{(\tilde{\mathbf{X}}\tilde{\mathbf{W}}-\mathbf{T})^T(\tilde{\mathbf{X}}\tilde{\mathbf{W}}-\mathbf{T})\}$

What is the trace operator doing in the least-squares error expression? Can someone provide a simple, intuitive example that illustrates the connection between the squared error and the trace operator?

Thank you very much in advance!

Vlad

2 Answers

BEST ANSWER

This is simply a matrix form of the sum-of-squares error function. To see this, note that:

  1. The rows of $\tilde{\mathbf{X}} \tilde{\mathbf{W}}$ are your predictions (one row per sample);
  2. Thus, the entries of $\tilde{\mathbf{X}} \tilde{\mathbf{W}} - \mathbf{T}$ are the errors, where the row indicates which sample we're viewing, and the column which component of that sample;
  3. The $i$-th diagonal entry of $\mathbf{A}^T \mathbf{A}$ is the sum of the squared elements of the $i$-th column. So, the $i$-th diagonal entry of $(\tilde{\mathbf{X}} \tilde{\mathbf{W}} - \mathbf{T})^T (\tilde{\mathbf{X}} \tilde{\mathbf{W}} - \mathbf{T})$ is the sum of the squared errors made in the $i$-th component, taken over all samples;
  4. Finally, the trace operator sums up these diagonal entries, giving the total squared error over all samples and all components.

The factor of $\frac 1 2$ is there to make things simpler when the expression is differentiated.
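The four steps above can be checked numerically. Here is a small NumPy sketch (the shapes are arbitrary, chosen just for illustration) verifying that the trace form equals the plain sum of squared errors over every entry:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 5, 3, 4                   # samples, input dims (incl. bias), classes

X = rng.standard_normal((N, D))     # design matrix, rows are samples
W = rng.standard_normal((D, K))     # weight matrix
T = rng.standard_normal((N, K))     # target matrix, one-of-K rows in the book

E = X @ W - T                       # entry (n, k): error of sample n, component k

# Matrix form from eq. (4.15): (1/2) Tr{(XW - T)^T (XW - T)}
trace_form = 0.5 * np.trace(E.T @ E)

# Plain sum-of-squares over every entry of the error matrix
sum_form = 0.5 * np.sum(E ** 2)

print(np.isclose(trace_form, sum_form))   # True
```

The diagonal of `E.T @ E` holds the per-component sums of squared errors (step 3), and the trace adds them up (step 4).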


The trace here realizes the Frobenius inner product, and the function being minimized is half a squared Frobenius norm: $E_D(\tilde{\mathbf{W}}) = \frac{1}{2}\|\tilde{\mathbf{X}}\tilde{\mathbf{W}}-\mathbf{T}\|_F^2$.
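A quick numerical illustration of this identity (matrix sizes are arbitrary): $\mathrm{Tr}(\mathbf{A}^T\mathbf{B})$ equals the element-wise sum $\sum_{ij} A_{ij} B_{ij}$, i.e. the Frobenius inner product, and with $\mathbf{B} = \mathbf{A}$ it reduces to the squared Frobenius norm:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((4, 3))

# Frobenius inner product: <A, B>_F = Tr(A^T B) = sum of elementwise products
lhs = np.trace(A.T @ B)
rhs = np.sum(A * B)
print(np.isclose(lhs, rhs))                                    # True

# With B = A this is the squared Frobenius norm of A
print(np.isclose(np.trace(A.T @ A), np.linalg.norm(A, 'fro') ** 2))   # True
```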