Taking partial derivative of SSD, wrt the parameters $a, b$?


In the book titled *Analysis of Straight-Line Data* by Forman S. Acton, it is given on page 10:


The classical 'least squares' procedure is most commonly derived by forming an expression for the sum of the squared vertical deviations from a general line, then demanding that this expression be minimized with respect to the parameters of the line. If our points $(x_i, y_i)$ are to be fitted by a line
$$Y_i = a + bx_i, \qquad (1)$$
then the sum of squared deviations is given by
$$SSD = \sum_{i=1}^n (y_i - Y_i)^2 \equiv \sum_{i=1}^n (y_i - a - bx_i)^2. \qquad (2)$$

We are to choose $a$ and $b$ so that this expression is a minimum, which we accomplish by partially differentiating SSD with respect to $a$ and to $b$, and equating each derivative separately to zero, thereby obtaining the so-called normal equations of the system:
$$an + b\sum x = \sum y\\ a\sum x + b\sum x^2 = \sum xy. (3)$$

(Summations are all over the data points unless explicitly indicated otherwise; thus $\sum xy$ means $\sum_{i=1}^n x_i y_i$, etc.)

Equations (3) are a pair of simultaneous equations for the two fitted parameters $a, b$, so it seems as if there might be little more to say about the procedure, except to point out that once $a$ and $b$ have been found, the minimal value of SSD may be computed most expediently from the equation $$SSD = \sum y^2 - a\sum y - b\sum xy, \qquad (4)$$ which follows from (2) and (3) with some algebra.
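(The relationships above are easy to check numerically. The following sketch, with made-up data, solves the normal equations (3) for $a, b$ and confirms that the shortcut (4) matches the direct evaluation of (2).)

```python
import numpy as np

# Hypothetical sample data (any points will do).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Solve the normal equations (3):
#   a*n      + b*sum(x)   = sum(y)
#   a*sum(x) + b*sum(x^2) = sum(x*y)
A = np.array([[n, x.sum()], [x.sum(), (x**2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
a, b = np.linalg.solve(A, rhs)

# Minimal SSD two ways: directly from (2), and via the shortcut (4).
ssd_direct = ((y - a - b * x) ** 2).sum()
ssd_shortcut = (y**2).sum() - a * y.sum() - b * (x * y).sum()
print(a, b, ssd_direct, ssd_shortcut)  # the two SSD values agree
```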


Question: I am unable to understand how the partial derivatives of $SSD = \sum_{i=1}^n (y_i - Y_i)^2 \equiv \sum_{i=1}^n (y_i - a - bx_i)^2$, i.e. $\dfrac{\partial\, SSD}{\partial a}$ and $\dfrac{\partial\, SSD}{\partial b}$, are computed and set to zero to yield the equations in (3).

==================================================

Edit: As per the answer given by @ultralegend5385, the derivation of equation (3) is now clear. But the implied meaning of the two normal equations for SSD is still unclear.

Also, the derivation of equation (4) is still unclear. Perhaps the lack of clarity stems directly from my inability to grasp the crux of the two normal equations.


Edit #2: Here is an attempt to explain the normal equations in (3).
The first equation expresses the sum of the actual dependent values (i.e., the $y$ values) of the data points as the sum of two terms.
The first term is the product of the regression line's intercept $a$ and the number of data points $n$, while the second term is the product of the slope $b$ and the sum of the independent-variable values over all the data points.

Even stated this way, the actual meaning implied by the equation is not clear to me.

The second equation is even more complex; but I can only proceed with it once I understand the meaning of the first normal equation in (3).
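(One standard reading of the first normal equation — my observation, not from the book: dividing it through by $n$ shows that the residuals sum to zero, equivalently that the fitted line passes through the centroid $(\bar{x}, \bar{y})$ of the data:)

$$an + b\sum x = \sum y
\;\Longleftrightarrow\;
\sum_{i=1}^n \bigl(y_i - a - b x_i\bigr) = 0
\;\Longleftrightarrow\;
\bar{y} = a + b\bar{x}.$$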

BEST ANSWER

I shall assume familiarity with the chain rule and the power rule for derivatives ($(x^n)^\prime=nx^{n-1}$). Let me know if you need help with these.

We have $$\mathsf{SSD}=\sum_{i=1}^n(y_i-a-bx_i)^2$$ Let us differentiate one of the terms $(y_i-a-bx_i)^2$ w.r.t. $a$ and $b$. Note that for any function $f(x)$, the chain rule gives $$\frac{\mathrm d}{\mathrm dx}(f(x))^2=2f(x)f^\prime(x)$$ So, first taking $y_i-a-bx_i=f(a)$ and noting that $f^\prime(a)=-1$, we get $$\frac{\partial}{\partial a}(y_i-a-bx_i)^2=2(y_i-a-bx_i)(-1)$$ So, the derivative of the sum will just be $$\frac{\partial }{\partial a}\textsf{SSD}=\sum_{i=1}^n-2(y_i-a-bx_i)$$ This is equal to $0$ at the minimum, and we can cancel the $-2$ to get $$\sum_{i=1}^ny_i-\sum_{i=1}^na-\sum_{i=1}^nbx_i=0$$ Since $\sum_{i=1}^na=na$ and $\sum_{i=1}^nbx_i=b\sum x$, this is precisely the first equation in $(3)$.

Similarly, if we set $y_i-a-bx_i=f(b)$, so that $f^\prime(b)=-x_i$, we will get $$\frac{\partial }{\partial b}(y_i-a-bx_i)^2=2(y_i-a-bx_i)(-x_i)$$ So, summing over the terms, we get $$\frac{\partial }{\partial b}\mathsf{SSD}=\sum_{i=1}^n2(y_i-a-bx_i)(-x_i)$$ Try setting this to zero, and see if it matches the other equation in $(3)$.
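(If it helps to convince yourself, the two analytic derivatives above can be checked against central finite differences at an arbitrary point $(a, b)$; the data here are made up.)

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 1.9, 3.1, 3.9])

def ssd(a, b):
    # Sum of squared deviations, equation (2).
    return ((y - a - b * x) ** 2).sum()

a, b, h = 0.5, 0.8, 1e-6

# Analytic partial derivatives from the chain-rule computation above.
dssd_da = (-2 * (y - a - b * x)).sum()
dssd_db = (-2 * x * (y - a - b * x)).sum()

# Central finite differences for comparison.
num_da = (ssd(a + h, b) - ssd(a - h, b)) / (2 * h)
num_db = (ssd(a, b + h) - ssd(a, b - h)) / (2 * h)
print(dssd_da, num_da)  # these agree to numerical precision
print(dssd_db, num_db)
```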


EDIT

To derive the expression for $\mathsf{SSD}$, first I will write $(3)$ in a different way. I will use the standard notation $$\mathsf{SS}_{xx}=\sum_{i=1}^n(x_i-\bar{x})^2=\sum x_i^2-n\bar{x}^2$$ $$\mathsf{SS}_{yy}=\sum_{i=1}^n(y_i-\bar{y})^2=\sum y_i^2-n\bar{y}^2$$ $$\mathsf{SS}_{xy}=\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})=\sum x_iy_i-n\bar{x}\bar{y}$$ You can verify these easily. Now, from the first equation in $(3)$ (divide it through by $n$), I will write $a=\bar{y}-b\bar{x}$, and then the second equation in $(3)$ as $\newcommand{\SS}{\mathsf{SS}}$ $$an\bar{x}+b(\SS_{xx}+n\bar{x}^2)=\SS_{xy}+n\bar{x}\bar{y}$$ Substituting the value for $a$, we get that $$b=\frac{\SS_{xy}}{\SS_{xx}}$$ (again a fairly simple check). Then, with this, I can write $$\mathsf{SSD}=\sum(y_i-a-bx_i)^2=\sum\left((y_i-\bar{y})-b(x_i-\bar{x})\right)^2$$ Expanding and using the $\SS$ notation, the above expression reduces down to $$\mathsf{SSD}=\SS_{yy}+b^2\SS_{xx}-2b\SS_{xy}=\SS_{yy}-b\SS_{xy}$$ since $b^2\SS_{xx}=b\cdot b\SS_{xx}=b\SS_{xy}$. You can verify that this and $(4)$ are the same.
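(Again, a quick numeric sanity check of this derivation, with made-up data: compute $b=\SS_{xy}/\SS_{xx}$ and $a=\bar{y}-b\bar{x}$, then compare the direct $\mathsf{SSD}$ from $(2)$ with the reduced form $\SS_{yy}-b\,\SS_{xy}$.)

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])
xbar, ybar = x.mean(), y.mean()

# Sums of squares in the SS notation above.
ss_xx = ((x - xbar) ** 2).sum()
ss_yy = ((y - ybar) ** 2).sum()
ss_xy = ((x - xbar) * (y - ybar)).sum()

b = ss_xy / ss_xx      # slope, from substituting a into the second normal equation
a = ybar - b * xbar    # intercept, from the first normal equation

ssd_direct = ((y - a - b * x) ** 2).sum()
ssd_formula = ss_yy - b * ss_xy   # the reduced form derived above
print(ssd_direct, ssd_formula)    # the two values agree
```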

About a "meaning" of the equations in $(3)$, I have no clue, and I believe you cannot really explain them in a meaningful way. But what you can do, in general (this is called multiple linear regression), is posit a model that looks like $$y_i=\beta_0+\beta_1x_{i,1}+\beta_2x_{i,2}+\cdots+\beta_{p-1}x_{i,p-1}+\varepsilon_i$$ which can be written in matrix form as $Y=X\beta+\varepsilon$ ($Y$ is simply the vector of all the $y_i$s, $\beta$ the vector of the $\beta_j$s, and $X$ the matrix with first column all $1$s and the rest given by the $x_{i,j}$s). Then, the estimates of the $\beta_j$s (those that minimise $\mathsf{SSD}$) can be written in one go using the matrix equation $X\hat{\beta}=PY$, where $P$ is the projection matrix onto the column space of $X$. So, it has a nice linear-algebraic explanation, but I am not aware of your background, so I don't know how much of this you understand. Anyway, hope this helped. :)
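(A small sketch of the matrix view for the straight-line case $p=2$, with made-up data: solving the matrix normal equations $X^\top X\hat{\beta}=X^\top Y$ gives the same $a, b$ as the scalar equations $(3)$.)

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.0, 2.9, 4.2])

# Design matrix: a column of ones (for the intercept) plus the x values.
X = np.column_stack([np.ones_like(x), x])

# Matrix normal equations: X^T X beta = X^T y.
a, b = np.linalg.solve(X.T @ X, X.T @ y)

# Same answer from the scalar normal equations (3).
n = len(x)
A = np.array([[n, x.sum()], [x.sum(), (x**2).sum()]])
a2, b2 = np.linalg.solve(A, np.array([y.sum(), (x * y).sum()]))
print(a, b, a2, b2)  # the two parameter pairs coincide
```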