How to derive this solution to this minimization problem in vector form?


We want to minimize the sum of squared errors $$ \sum_{t=1}^n (y_t - \theta^T x_t - \theta_0)^2. $$ Letting $X$ be the matrix whose $t$-th row is $[x_t^T, 1]$, we can rewrite the above problem in vector form as $$ \sum_{t=1}^n (y_t - [x_t^T, 1][\theta^T, \theta_0]^T)^2 \\ = \lVert [y_1, \ldots, y_n]^T - X [\theta^T, \theta_0]^T\rVert ^2 \\ = y^Ty - 2[\theta^T, \theta_0]X^Ty + [\theta^T, \theta_0]X^TX[\theta^T, \theta_0]^T. $$ Given $y$ and $X$ in the above expression, the $\theta$'s that minimize it are given as $$ [\theta^T, \theta_0]^T = (X^TX)^{-1}X^Ty. $$ How is this derived? All I can tell is that the first and last terms in the vector form expression are always nonnegative. I am reading machine learning lecture notes. This is from page 3 (formula 14) of this lecture: http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-867-machine-learning-fall-2006/lecture-notes/lec5.pdf
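As a sanity check on the rewriting step, here is a small numerical sketch (with made-up data, not from the lecture) verifying that the original sum, the norm form, and the expanded quadratic form all agree once the design matrix $X$ is built with a trailing column of ones:

```python
import numpy as np

# Hypothetical small data set, just to illustrate the setup.
rng = np.random.default_rng(0)
n, d = 20, 3
x = rng.normal(size=(n, d))                       # rows are the x_t
y = x @ np.array([1.0, -2.0, 0.5]) + 3.0 + 0.1 * rng.normal(size=n)

# X has t-th row [x_t^T, 1]; the ones column absorbs the offset theta_0.
X = np.hstack([x, np.ones((n, 1))])

# An arbitrary stacked parameter vector [theta^T, theta_0]^T.
beta = np.array([1.0, -2.0, 0.5, 3.0])

sum_form = np.sum((y - X @ beta) ** 2)            # sum_t (y_t - theta^T x_t - theta_0)^2
norm_form = np.linalg.norm(y - X @ beta) ** 2     # ||y - X beta||^2
expanded = y @ y - 2 * beta @ X.T @ y + beta @ X.T @ X @ beta

print(np.allclose(sum_form, norm_form), np.allclose(sum_form, expanded))
```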

1 Answer

Best Answer

Let $\beta = [\theta^T,\theta_0]^T$. Then we want to find the value of $\beta$ that minimizes $y^Ty - 2\beta^T X^Ty + \beta^T X^T X\beta$. This is a differentiable convex function of $\beta$, so we differentiate with respect to $\beta$ and set the gradient equal to zero. Using the identities $\frac{\partial}{\partial\beta}(a^T\beta) = a$ and $\frac{\partial}{\partial\beta}(\beta^T A\beta) = 2A\beta$ for symmetric $A$ (here $A = X^TX$), this gives \begin{equation} 0 = \frac{\partial}{\partial\beta}\left(y^Ty\right) - 2\frac{\partial}{\partial\beta}\left(\beta^T X^Ty\right) + \frac{\partial}{\partial\beta}\left(\beta^T X^T X\beta\right) = -2X^Ty+2X^TX\beta. \end{equation} Dividing by $2$ and solving for $\beta$, assuming $X^TX$ is invertible, gives $\beta = (X^TX)^{-1}X^Ty$.
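The closed-form solution can be checked numerically. A minimal sketch (with invented data) that solves the normal equations $X^TX\beta = X^Ty$ directly and compares the result against numpy's built-in least-squares solver:

```python
import numpy as np

# Hypothetical data: n samples, d features, plus noise.
rng = np.random.default_rng(1)
n, d = 50, 2
x = rng.normal(size=(n, d))
y = x @ np.array([2.0, -1.0]) + 0.5 + 0.05 * rng.normal(size=n)

# Design matrix with a ones column for the offset theta_0.
X = np.hstack([x, np.ones((n, 1))])

# Closed-form solution: solve X^T X beta = X^T y
# (np.linalg.solve is preferred over forming the explicit inverse).
beta_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Reference: numpy's least-squares routine on the same problem.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_closed, beta_lstsq))
```

Note that the derivation assumes $X^TX$ is invertible, i.e. $X$ has full column rank; when it does not, `np.linalg.lstsq` still returns a minimum-norm solution while the explicit normal-equations solve fails.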