Sum squared error minimization


I will define the variables first. Let $t = d(x) + \delta$, where $x$ is an $N$-dimensional vector, $d(x)$ is a deterministic function, and $\delta$ is Gaussian noise with mean $0$ and variance $\sigma^2$. I have a linear function approximation with a parameter vector $w$: $y = \sum_{i=0}^{N} w_i x_i = w^T x$.

I want to find the optimal $w$ that minimizes the sum of squared errors, $\mathrm{SSE} = \sum_{n=1}^{N}(t^n - y^n)^2$, where $t^n$ is the target value for the $n$-th input and $y^n$ is the corresponding predicted output. In other words, we are trying to make $y^n$ as close as possible to $t^n$. From an intuitive standpoint, I understand that an optimal set of weights would make $y$ match $t$ as closely as possible, which would minimize the SSE.

However, I am not sure where to start when deriving this mathematically. My initial approach would be to take the derivative of $y$ with respect to $w$ and set it equal to $0$, but that would just give me $x = 0$.
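To make the objective concrete, here is a sketch in matrix form (my own notation: $X$ is the matrix whose $n$-th row is the input $x^n$, and $t$ is the vector of targets), showing that the quantity to differentiate with respect to $w$ is the whole error, not $y$ alone:

$$\mathrm{SSE}(w) = \sum_{n=1}^{N}\bigl(t^n - w^T x^n\bigr)^2 = \lVert t - Xw\rVert^2, \qquad \nabla_w\,\mathrm{SSE}(w) = -2\,X^T(t - Xw).$$

If I have set this up correctly, setting this gradient to zero would give $X^T X\, w = X^T t$ rather than $x = 0$.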