Understanding how linear regression equation is simplified


I am learning linear regression and came across the following:

We need to find weights $w$ that minimize the error function. So, $$w^*=\arg \min_w{E(w,\mathcal{D})}=\arg \min_w\sum_{i=1}^n(y_i-w^Tx_i)^2 $$ where $(y_i-w^Tx_i)^2$ is the squared error for sample $i$, $\mathcal{D}$ is the training data and $n$ is the number of samples.
Solve for $w$ by setting $\nabla_wE=\nabla_w\sum_{i=1}^n(y_i-w^Tx_i)^2=0$
$$\nabla_wE=-2\sum_iy_ix_i+2\color{blue}{\sum_i(w^Tx_i)x_i}=0$$
$$\nabla_wE=-2X^Ty+2\color{blue}{X^TXw}=0\Longrightarrow w^*=\color{red}{(X^TX)^{-1}X^Ty}$$

Q1. I am not able to understand why the two blue expressions are the same. (The first is in summation form, whereas the second treats the whole set of $x_i$'s as a matrix $X$.)

Q2. I also do not understand how the red closed-form expression is obtained.


$X$ is a matrix whose $i$th row is $x_i^\top$.

The sum $\sum_i (w^\top x_i) x_i$ is a linear combination of vectors $x_i$ with coefficients $(w^\top x_i)$; such a linear combination can be written as $X^\top v$ (note that the columns of $X^\top$ are $x_1, \ldots, x_n$) where $v$ is a vector whose $i$th entry is $w^\top x_i$. (If this is not clear to you, think about how in general the matrix multiplication $Av$, where $A$ is a matrix and $v$ is a vector, is a linear combination of the columns of $A$.)

Finally, the vector $v$ can be written as $Xw$; just check that the $i$th entry of $Xw$ is precisely $x_i^\top w$.
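The identity $\sum_i (w^\top x_i)\, x_i = X^\top X w$ can be checked numerically. A minimal sketch with NumPy, using randomly generated data for illustration:

```python
import numpy as np

# Small illustrative dataset: n = 4 samples, d = 3 features.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))   # row i of X is x_i^T
w = rng.standard_normal(3)

# Summation form: sum_i (w^T x_i) x_i, a linear combination of the x_i
summation = sum((w @ x) * x for x in X)

# Matrix form: X^T X w
matrix_form = X.T @ X @ w

print(np.allclose(summation, matrix_form))  # True
```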


The last equation can be rearranged to $X^\top y = X^\top X w$. Multiplying both sides on the left by $(X^\top X)^{-1}$ (which exists whenever $X$ has full column rank) yields the red equation.
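To see the red formula in action, here is a sketch that fits $w^*$ via the normal equations on synthetic data (the dataset and noise level are made up for the example) and cross-checks against NumPy's least-squares solver. Note that in practice one solves the linear system rather than forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 3
X = rng.standard_normal((n, d))
true_w = np.array([2.0, -1.0, 0.5])          # hypothetical ground-truth weights
y = X @ true_w + 0.01 * rng.standard_normal(n)  # targets with small noise

# Normal equations: solve X^T X w = X^T y (equivalent to w* = (X^T X)^{-1} X^T y)
w_star = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with NumPy's built-in least-squares solver
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w_star, w_lstsq))  # True
```

Both routes recover essentially the same weights; `np.linalg.solve` avoids the numerical instability of computing $(X^\top X)^{-1}$ directly.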