I am reading Bishop's PRML book and got a bit stuck on equation 4.15, in the part covering the application of the least squares method to classification:
$E_D(\tilde{\mathbf{W}}) = \frac{1}{2}\mathrm{Tr}\{(\tilde{\mathbf{X}}\tilde{\mathbf{W}}-\mathbf{T})^T(\tilde{\mathbf{X}}\tilde{\mathbf{W}}-\mathbf{T})\}$
What is the trace operator doing in the matrix form of the least squares error? Can someone provide a simple, intuitive example that illustrates the connection between the squared error and the trace operator?
Thank you very much in advance!
Vlad
This is simply the matrix form of the sum-of-squares error function. To see this, note that for any matrix $\mathbf{A}$, $\mathrm{Tr}\{\mathbf{A}^T\mathbf{A}\} = \sum_i \sum_j A_{ij}^2$, i.e. the sum of the squares of all its elements. With $\mathbf{A} = \tilde{\mathbf{X}}\tilde{\mathbf{W}} - \mathbf{T}$, the $(n,k)$ element of $\mathbf{A}$ is the error $\tilde{\mathbf{w}}_k^T\tilde{\mathbf{x}}_n - t_{nk}$ that the $k$-th output makes on the $n$-th training point, so the expression expands to $E_D(\tilde{\mathbf{W}}) = \frac{1}{2}\sum_n \sum_k (\tilde{\mathbf{w}}_k^T\tilde{\mathbf{x}}_n - t_{nk})^2$. The trace is nothing more than a compact way of summing the squared errors over both the data index $n$ and the target index $k$.
The factor of $\frac 1 2$ is there to make things simpler when the expression is differentiated.
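A quick numerical check makes the identity concrete. The sketch below (with arbitrary random matrices standing in for $\tilde{\mathbf{X}}$, $\tilde{\mathbf{W}}$, and $\mathbf{T}$; the sizes are made up for illustration) verifies that $\frac{1}{2}\mathrm{Tr}\{\mathbf{R}^T\mathbf{R}\}$ equals half the sum of the squared residuals:

```python
import numpy as np

# Hypothetical sizes: 5 samples, 3 augmented features, 2 target dimensions.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))   # X-tilde: augmented design matrix
W = rng.standard_normal((3, 2))   # W-tilde: weight matrix
T = rng.standard_normal((5, 2))   # T: target matrix

R = X @ W - T                     # residual matrix, element (n, k) is one error

# Matrix form from the question: (1/2) Tr{R^T R}
E_trace = 0.5 * np.trace(R.T @ R)

# Elementwise form: half the sum of squared residuals over n and k
E_sum = 0.5 * np.sum(R**2)

assert np.isclose(E_trace, E_sum)
```

Note that only the diagonal of $\mathbf{R}^T\mathbf{R}$ matters: its $(k,k)$ entry is the squared error of output $k$ summed over all samples, and the trace adds these up over $k$.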