Show that the normal equations are identical to $\frac{\partial}{\partial\theta_j}QS(\theta)=0~\forall~j=1,\ldots,k$


Let the quadratic sum be given by $QS(\theta)=\sum_{i=1}^{n}(y_i-x^i\theta)^2$, with $y=(y_1,\ldots,y_n)^T, \theta=(\theta_1,\ldots,\theta_k)^T$ and $$ X=\begin{pmatrix}x_{11} & \ldots & x_{1k}\\ \vdots & \ddots & \vdots\\ x_{n1} & \ldots & x_{nk}\end{pmatrix}. $$ Show that the normal equations $X^TX\theta=X^Ty$ are identical with $$ \frac{\partial}{\partial\theta_j} QS(\theta)=0~\forall~j=1,\ldots,k. $$

Hi, good evening, here is my proof.

The normal equation is $\textbf{X}^T\textbf{X}\theta=\textbf{X}^Ty$. We have \begin{equation} \textbf{X}\theta=\begin{pmatrix}x_{11} & \ldots & x_{1k}\\x_{21} & \ldots & x_{2k}\\\vdots & \ddots & \vdots\\x_{n1} & \ldots & x_{nk}\end{pmatrix}\cdot\begin{pmatrix}\theta_1\\ \vdots\\ \theta_k\end{pmatrix}=\begin{pmatrix}\sum_{i=1}^{k}x_{1i}\theta_i\\\sum_{i=1}^{k}x_{2i}\theta_i\\ \vdots\\ \sum_{i=1}^{k}x_{ni}\theta_i\end{pmatrix} \end{equation} and from this it follows for the left-hand side (LHS) that \begin{equation} LHS=\textbf{X}^T\textbf{X}\theta=\begin{pmatrix}x_{11} & x_{21} & \ldots & x_{n1}\\x_{12} & x_{22} & \ldots & x_{n2}\\\vdots & \vdots & \ddots & \vdots\\ x_{1k} & x_{2k} & \ldots & x_{nk}\end{pmatrix}\cdot \begin{pmatrix}\sum_{i=1}^{k}x_{1i}\theta_i\\\sum_{i=1}^{k}x_{2i}\theta_i\\ \vdots\\ \sum_{i=1}^{k}x_{ni}\theta_i\end{pmatrix}=\begin{pmatrix}\sum_{i=1}^{n}\sum_{j=1}^{k}x_{i1}x_{ij}\theta_j\\\sum_{i=1}^{n}\sum_{j=1}^{k}x_{i2} x_{ij}\theta_j\\\vdots\\\sum_{i=1}^{n}\sum_{j=1}^{k}x_{ik}x_{ij}\theta_j\end{pmatrix}. \end{equation} For the right-hand side (RHS) we have \begin{equation} RHS=\textbf{X}^Ty=\begin{pmatrix}x_{11} & x_{21} & \ldots & x_{n1}\\x_{12} & x_{22} & \ldots & x_{n2}\\\vdots & \vdots & \ddots & \vdots\\ x_{1k} & x_{2k} & \ldots & x_{nk}\end{pmatrix}\cdot\begin{pmatrix}y_1\\\vdots\\y_n\end{pmatrix}=\begin{pmatrix}\sum_{i=1}^{n}x_{i1}y_i\\\sum_{i=1}^{n}x_{i2}y_i\\\vdots\\\sum_{i=1}^{n}x_{ik}y_i\end{pmatrix}. \end{equation} So the normal equations are given componentwise as \begin{equation} \sum_{i=1}^{n}\sum_{j=1}^{k}x_{is}x_{ij}\theta_j=\sum_{i=1}^{n}x_{is}y_i,\quad s=1,\ldots,k.~~(+) \end{equation}
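As a quick numerical sanity check (a sketch, not part of the proof), the componentwise sums in $(+)$ can be compared against the matrix products $X^TX\theta$ and $X^Ty$ on hypothetical random data:

```python
import numpy as np

# Hypothetical small example: n = 5 observations, k = 3 parameters.
rng = np.random.default_rng(0)
n, k = 5, 3
X = rng.standard_normal((n, k))
y = rng.standard_normal(n)
theta = rng.standard_normal(k)

# Left-hand side of the normal equations, written as the double sum
# in (+): sum_i sum_j x_{is} x_{ij} theta_j for each s = 1, ..., k.
lhs_sum = np.array([
    sum(X[i, s] * X[i, j] * theta[j] for i in range(n) for j in range(k))
    for s in range(k)
])

# Right-hand side of (+): sum_i x_{is} y_i for each s.
rhs_sum = np.array([sum(X[i, s] * y[i] for i in range(n)) for s in range(k)])

# The same quantities in matrix form agree with the explicit sums.
assert np.allclose(lhs_sum, X.T @ X @ theta)
assert np.allclose(rhs_sum, X.T @ y)
```

Note that the two sides are not equal to each other for an arbitrary $\theta$; the check only confirms that the explicit double sums and the matrix expressions are the same objects.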

We have \begin{equation} \frac{\partial}{\partial\theta_j}QS(\theta)=\frac{\partial}{\partial\theta_j}\left(\sum_{i=1}^{n}(y_i-x^i\theta)^2\right),\quad x^i\theta=\sum_{s=1}^{k}x_{is}\theta_s. \end{equation} With the chain rule for partial differentiation and the binomial expansion we get \begin{align} \frac{\partial}{\partial\theta_j}\left(\sum_{i=1}^{n}(y_i-x^i\theta)^2\right)&=\underbrace{\sum_{i=1}^{n}\frac{\partial}{\partial\theta_j}(y_i^2)}_{=0}-2\sum_{i=1}^{n}\frac{\partial}{\partial\theta_j}(y_i x^i\theta)+\sum_{i=1}^{n}\frac{\partial}{\partial\theta_j}((x^i\theta)^2)\\ &=-2\sum_{i=1}^{n}y_ix_{ij}+2\sum_{i=1}^{n}\sum_{s=1}^{k}x_{is}x_{ij}\theta_s. \end{align} Now assume that $\frac{\partial}{\partial\theta_j}QS(\theta)=0$ for all $j=1,\ldots,k$. Then \begin{equation} -2\sum_{i=1}^{n}y_ix_{ij}=-2\sum_{i=1}^{n}\sum_{s=1}^{k}x_{is}x_{ij}\theta_s, \end{equation} or, after dividing both sides by $-2$, \begin{equation} \sum_{i=1}^{n}y_ix_{ij}=\sum_{i=1}^{n}\sum_{s=1}^{k}x_{is}x_{ij}\theta_s~\forall 1\leqslant j\leqslant k. \end{equation} These equations are exactly the equations $(+)$. So it does not matter whether we require the normal equations to be fulfilled or require all partial derivatives of the quadratic sum to be $0$.
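The partial derivatives computed above can also be checked numerically against finite differences; a minimal sketch on hypothetical random data (the names `QS`, `grad_analytic`, `grad_numeric` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 3
X = rng.standard_normal((n, k))
y = rng.standard_normal(n)
theta = rng.standard_normal(k)

def QS(t):
    """Quadratic sum QS(theta) = sum_i (y_i - x^i theta)^2."""
    r = y - X @ t
    return float(r @ r)

# Analytic partial derivatives from the derivation:
# dQS/dtheta_j = -2 sum_i y_i x_{ij} + 2 sum_i sum_s x_{is} x_{ij} theta_s,
# i.e. the gradient vector is -2 X^T (y - X theta).
grad_analytic = -2 * X.T @ (y - X @ theta)

# Central finite differences as an independent check.
eps = 1e-6
grad_numeric = np.array([
    (QS(theta + eps * e) - QS(theta - eps * e)) / (2 * eps)
    for e in np.eye(k)
])

assert np.allclose(grad_analytic, grad_numeric, atol=1e-4)
```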


Is my proof okay? I know it involves a lot of calculation, but maybe somebody is motivated to tell me whether it is correct.

With kind regards

math12

Best answer:

I think it looks fine. Alternatively, you can simply use matrix algebra. Since $$ QS(\theta)=(\mathbf{y}-\mathbf{X}\theta)^\prime (\mathbf{y}-\mathbf{X}\theta), $$ it follows that $$ \frac{\partial}{\partial \theta_j}QS(\theta)=-\mathbf{x}_{\cdot j}^\prime(\mathbf{y}-\mathbf{X}\theta)-(\mathbf{y}-\mathbf{X}\theta)'\mathbf{x}_{\cdot j}=-\mathbf{x}_{\cdot j}'\mathbf{y}-\mathbf{y}'\mathbf{x}_{\cdot j}+\mathbf{x}'_{\cdot j}\mathbf{X}\theta+\theta'\mathbf{X}'\mathbf{x}_{\cdot j}, $$ where $\mathbf{x}_{\cdot j}$ is column $j$ of $\mathbf{X}$. Note that all terms are scalars, so each transposed term equals its non-transposed counterpart. Hence $$ \frac{\partial}{\partial \theta_j}QS(\theta) =-2\mathbf{x}_{\cdot j}'\mathbf{y}+2\mathbf{x}'_{\cdot j}\mathbf{X}\theta=0\\ \Rightarrow\mathbf{x}_{\cdot j}'\mathbf{y}-\mathbf{x}'_{\cdot j}\mathbf{X}\theta=0. $$ So then by stacking these partial derivatives (skipping the constant $2$): $$ \frac{\partial}{\partial \theta}QS(\theta)=\begin{pmatrix}\frac{\partial}{\partial \theta_1}QS(\theta) \\ \frac{\partial}{\partial \theta_2}QS(\theta) \\ \vdots \\ \frac{\partial}{\partial \theta_k}QS(\theta)\end{pmatrix}\Rightarrow\begin{pmatrix}\mathbf{x}_{\cdot 1}'\mathbf{y}-\mathbf{x}'_{\cdot 1}\mathbf{X}\theta \\ \mathbf{x}_{\cdot 2}'\mathbf{y}-\mathbf{x}'_{\cdot 2}\mathbf{X}\theta \\ \vdots \\ \mathbf{x}_{\cdot k}'\mathbf{y}-\mathbf{x}'_{\cdot k}\mathbf{X}\theta\end{pmatrix}=\boldsymbol{0}. $$ This is equal to $$ \begin{pmatrix}\mathbf{x}_{\cdot 1}'\mathbf{y} \\ \mathbf{x}_{\cdot 2}'\mathbf{y} \\ \vdots \\ \mathbf{x}_{\cdot k}'\mathbf{y}\end{pmatrix} = \begin{pmatrix} \mathbf{x}'_{\cdot 1}\mathbf{X}\theta \\ \mathbf{x}'_{\cdot 2}\mathbf{X}\theta \\ \vdots \\ \mathbf{x}'_{\cdot k}\mathbf{X}\theta\end{pmatrix},\qquad \begin{pmatrix}\mathbf{x}_{\cdot 1}'\\ \mathbf{x}_{\cdot 2}'\\ \vdots \\ \mathbf{x}_{\cdot k}'\end{pmatrix}\mathbf{y} = \begin{pmatrix} \mathbf{x}'_{\cdot 1} \\ \mathbf{x}'_{\cdot 2} \\ \vdots \\ \mathbf{x}'_{\cdot k}\end{pmatrix}\mathbf{X}\theta, $$ and since $\mathbf{X}=(\mathbf{x}_{\cdot 1}, \mathbf{x}_{\cdot 2}, \dots, \mathbf{x}_{\cdot k})$ it follows that this is $$ \mathbf{X}'\mathbf{y}=\mathbf{X}'\mathbf{X}\theta, $$ i.e. the normal equations.
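To illustrate the equivalence numerically (a sketch on hypothetical random data, not part of the derivation): solving the normal equations $\mathbf{X}'\mathbf{X}\theta=\mathbf{X}'\mathbf{y}$ directly should reproduce the minimizer of $QS(\theta)$, which NumPy's least-squares routine computes:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 8, 3
X = rng.standard_normal((n, k))  # random X is full rank almost surely
y = rng.standard_normal(n)

# Solve the normal equations X' X theta = X' y directly ...
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# ... and compare with NumPy's least-squares routine, which minimizes QS.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

assert np.allclose(theta_normal, theta_lstsq)

# At the solution the gradient vanishes: X'(y - X theta) = 0.
assert np.allclose(X.T @ (y - X @ theta_normal), np.zeros(k), atol=1e-10)
```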