Minimize objective function for least-squares classification


Consider a training set $\{(\vec{x^{(n)}}, \vec{t^{(n)}}) \in \mathbb{R}^D \times \mathbb{R}^K : n = 1, \dots, N\}$ where:

  • $\vec{t^{(n)}}$ is a one-hot indicator of the class membership of $\vec{x^{(n)}}$, i.e. $\vec{t^{(n)}} = (0, 1, 0, \dots, 0)$ if $\vec{x^{(n)}}$ belongs to $C_2$

  • $X$ is an $N \times (D+1)$ matrix whose $n$th row is $(\vec{x^{(n)}})^T$ (prepended with a constant $1$ for the bias term, hence the $D+1$ columns)

  • $T$ is an $N \times K$ matrix whose $n$th row is $(\vec{t^{(n)}})^T$

The least-squares problem is to minimise the objective function:

$G(W) = \frac{1}{2}\sum_{n=1}^{N}\sum_{k=1}^{K}\left[y_k(\vec{x^{(n)}};\vec{w_k}) - t_k^{(n)}\right]^2$

where $y_k(\vec{x^{(n)}};\vec{w_k})$ is the linear discriminant function for class $C_k$.

I know that $G$ is minimised by $W = (X^TX)^{-1}X^TT$.

However, I don't understand why there is this $\frac{1}{2}$ coefficient in front of the double summation.
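
For concreteness, here is a minimal NumPy sketch of this setup; the synthetic data, the shapes, and the helper `G` below are illustrative assumptions, not anything canonical. It builds the augmented $X$ and one-hot $T$, computes $W = (X^TX)^{-1}X^TT$, and checks that the $\frac{1}{2}$ only rescales the objective:

```python
# Minimal sketch of the setup above; data, shapes, and the helper G are
# illustrative assumptions, not a canonical implementation.
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 100, 3, 4                        # samples, input dimension, classes

X = np.hstack([np.ones((N, 1)),            # leading 1s (bias column) -> N x (D+1)
               rng.normal(size=(N, D))])
T = np.eye(K)[rng.integers(0, K, size=N)]  # N x K one-hot targets t^(n)

def G(W, scale=0.5):
    """Objective: scale * sum of squared errors (scale=0.5 gives the 1/2)."""
    return scale * np.sum((X @ W - T) ** 2)

# Closed-form minimiser: W = (X^T X)^{-1} X^T T  (normal equations)
W = np.linalg.solve(X.T @ X, X.T @ T)      # shape (D+1) x K

# Same minimiser from a numerically safer least-squares routine
W_lstsq, *_ = np.linalg.lstsq(X, T, rcond=None)
print(np.allclose(W, W_lstsq))             # True

# The 1/2 only rescales the objective; it does not move the minimiser.
print(np.isclose(2 * G(W, scale=0.5), G(W, scale=1.0)))  # True
```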

Best answer:

It's purely a convenience factor. When you differentiate the objective, the exponent $2$ comes down and multiplies the double sum, and the $\frac{1}{2}$ cancels it, leaving a cleaner gradient. Equivalently, since you find the minimiser by setting the derivative to $0$, a constant positive factor in front of $G$ changes nothing: you could just as easily divide the $2$ out of that equation.
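
To make the cancellation explicit, write the objective in matrix form (assuming, as is standard for this setup, that $y_k(\vec{x^{(n)}};\vec{w_k}) = \vec{w_k}^T\vec{x^{(n)}}$ with the bias absorbed into the augmented $\vec{x^{(n)}}$, and that $W$ has $\vec{w_k}$ as its $k$th column):

$G(W) = \frac{1}{2}\left\|XW - T\right\|_F^2 \quad\Longrightarrow\quad \nabla_W G(W) = X^T(XW - T)$

Without the $\frac{1}{2}$ the gradient would be $2\,X^T(XW - T)$. Setting either expression to $0$ gives the same normal equations $X^TX\,W = X^TT$, and hence the same minimiser $W = (X^TX)^{-1}X^TT$; the $\frac{1}{2}$ only rescales the objective value.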