Consider a training set $\{(\vec{x}^{(n)}, \vec{t}^{(n)}) \in \mathbb{R}^D \times \mathbb{R}^K : n = 1, \dots, N \}$ where:
- $\vec{t}^{(n)}$ is a one-hot indicator of the class membership of $\vec{x}^{(n)}$, e.g. $\vec{t}^{(n)} = (0, 1, 0, \dots, 0)$ if $\vec{x}^{(n)}$ belongs to class $C_2$
- $X$ is an $N \times (D+1)$ matrix whose $n$th row is $(\vec{x}^{(n)})^T$ augmented with a leading $1$ for the bias term
- $T$ is an $N \times K$ matrix whose $n$th row is $(\vec{t}^{(n)})^T$
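For concreteness, here is a minimal NumPy sketch of how $X$ and $T$ might be assembled from raw inputs and integer class labels. The toy data, variable names, and the bias-column convention are my own assumptions for illustration, not part of the original setup:

```python
import numpy as np

# Toy data: N = 4 points in D = 2 dimensions, K = 3 classes (assumed for illustration).
x = np.array([[0.5, 1.2],
              [1.0, 0.3],
              [2.1, 1.8],
              [0.2, 2.5]])          # shape (N, D)
labels = np.array([0, 2, 1, 2])     # integer class index of each point

N, D = x.shape
K = 3

# X: N x (D+1) design matrix, row n is (1, x^(n)^T); the leading 1 absorbs the bias term.
X = np.hstack([np.ones((N, 1)), x])

# T: N x K matrix of one-hot targets, row n is t^(n)^T.
T = np.zeros((N, K))
T[np.arange(N), labels] = 1.0
```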
The least-squares problem is to minimise the objective function:
$$G(W) = \frac{1}{2}\sum_{n=1}^{N}\sum_{k=1}^{K}\left[y_k(\vec{x}^{(n)};\vec{w}_k) - t_k^{(n)}\right]^2$$
where $y_k(\vec{x}^{(n)};\vec{w}_k)$ is the $k$th linear discriminant function evaluated at $\vec{x}^{(n)}$.
I know that $G(W)$ is minimised by $W = (X^TX)^{-1}X^TT$.
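To make the matrix form concrete, here is a minimal NumPy sketch, continuing the toy $X$ and $T$ above and assuming the discriminants are linear in the augmented input and collected as the columns of $W$, so the stacked predictions are $XW$ (that rewriting is my assumption, not stated in the original):

```python
import numpy as np

def objective(W, X, T):
    """G(W) = 1/2 * sum over n, k of (y_k(x^(n)) - t_k^(n))^2, with predictions stacked as X @ W."""
    residual = X @ W - T              # N x K matrix of errors y_k(x^(n)) - t_k^(n)
    return 0.5 * np.sum(residual ** 2)

def closed_form(X, T):
    """Least-squares solution W = (X^T X)^{-1} X^T T, computed via a linear solve."""
    return np.linalg.solve(X.T @ X, X.T @ T)   # avoids forming the explicit inverse
```

Using `np.linalg.solve` (or `np.linalg.lstsq`) rather than explicitly inverting $X^TX$ is the usual numerically safer way to evaluate the closed-form expression.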
However, I don't understand why there is this $\frac{1}{2}$ coefficient in front of the double summation.
It's purely a notational convenience. When you differentiate the objective function, the exponent $2$ comes down from the square and cancels the $\frac{1}{2}$, leaving a cleaner gradient. It also has no effect on the solution: scaling the objective by any positive constant doesn't change where its minimum is, so when you set the derivative to $0$ you could just as well divide the constant out.
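To spell the cancellation out, write the double sum in matrix form as $G(W) = \frac{1}{2}\lVert XW - T\rVert_F^2$ (this rewriting assumes the $\vec{w}_k$ are stacked as the columns of $W$). Then

$$\nabla_W G = \frac{1}{2}\cdot 2\, X^T(XW - T) = X^T(XW - T),$$

and setting the gradient to zero gives the normal equations $X^TXW = X^TT$, i.e. $W = (X^TX)^{-1}X^TT$. Whether or not the $\frac{1}{2}$ is included, the stationarity condition, and hence the minimiser, is exactly the same.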