Least Squares: Derivation of Normal Equations with Chain Rule


I'm new to Stack Exchange, so please bear with me. I'm struggling with the least squares formula. Wikipedia does show ways to derive the "normal equations".

But I'd like to get the same result using the Chain Rule, which would go something like this (I don't know whether this is possible for matrices, since I haven't seen this approach anywhere yet):

$$(y - Xb)^2$$

and differentiating it with respect to $b$, which would be something like:

$$2(y - Xb) * (-X)$$

It seems, though, that a transpose sign is missing somewhere, since the formula should look as follows:

$$ \mathbf{-X}^{\rm T} \mathbf y+ (\mathbf X^{\rm T} \mathbf X ){\boldsymbol{b}}$$

Please help me to correct this or point out if I'm totally wrong.

There are 3 answers below.

Answer 1

I'm not sure your calculation is correct. Usually you have to minimize the following term:

$(y-xb)'(y-xb)$

$=(y'-b'x')(y-xb)$

Multiplying out:

$=y'y-y'xb-b'x'y+b'x'xb$

Note that $y'xb=b'x'y$, since each side is a scalar (a $1\times 1$ matrix) and one is the transpose of the other.

Thus you can write $=y'y-2b'x'y+b'x'xb$

Differentiate with respect to $b$ and set the derivative equal to $0$:

$-2x'y+2x'xb=0$

Now solve this equation for $b$.
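As a quick numerical sanity check of the steps above (using NumPy and illustrative random data, not anything from the question), solving the normal equations $x'xb = x'y$ should agree with a standard least-squares solver:

```python
import numpy as np

# Illustrative data: 50 observations, 3 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)

# Normal equations X'X b = X'y, obtained from -2x'y + 2x'xb = 0.
b_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's built-in least-squares solver.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(b_normal, b_lstsq))  # True
```

(In practice one prefers `lstsq` or a QR factorization, since forming $X^TX$ squares the condition number; the explicit solve here is only to mirror the derivation.)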

Answer 2

Instead of $m\!\times\!1$ and $n\!\times\!1$ vectors for $y$ and $b$, it might be easier to consider $m\!\times\!p$ and $n\!\times\!p$ matrices.

Then the quantity to minimize is $f = \|Y-X\!\cdot\!B\|^2_F = \|M\|^2_F$, where $M = Y-X\!\cdot\!B$.

Rewriting this to use the Frobenius product, $f = M:M$, makes taking the derivative easier. Here it is step by step. $$ \eqalign { df &= 2M:dM \cr &= 2M:d(Y-X\!\cdot\!B) \cr &= 2M:(-X\!\cdot\!dB) \cr &= 2(Y-X\!\cdot\!B):(-X\!\cdot\!dB) \cr &= 2(X\!\cdot\!B-Y):X\!\cdot\!dB \cr &= 2X^T\!\cdot(X\!\cdot\!B-Y):dB \cr }$$ So the derivative with respect to $B$ is $$ \eqalign { \frac {\partial f} {\partial B} &= 2X^T\!\cdot(X\!\cdot\!B-Y) \cr }$$ Equating it to zero yields the normal equations.

The step you are questioning is how $X$ gets transposed. In the last step of the derivation, we used $$ \eqalign { A:X\cdot\!B &= X^T\!\cdot\!A:B \cr }$$ to move $X$ to the other side of the Frobenius product. Since the Frobenius product is really just a trace operation, it's easy to prove that $$ \eqalign { A:X\cdot\!B &= \text{tr}(A\!\cdot(X\cdot\!B)^T) \cr &= \text{tr}(A\!\cdot\!B^T\!\cdot\!X^T) \cr &= \text{tr}(X^T\!\cdot\!A\!\cdot\!B^T) \cr &= X^T\!\cdot\!A:B \cr }$$
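The identity $A:X\cdot B = X^T\cdot A:B$ is easy to verify numerically. Here is a minimal sketch (NumPy, with arbitrary illustrative dimensions) using the fact that the Frobenius product is just the elementwise sum of products:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 4, 3, 2
A = rng.normal(size=(m, p))
X = rng.normal(size=(m, n))
B = rng.normal(size=(n, p))

def frob(P, Q):
    # Frobenius inner product: P:Q = tr(P Q^T) = sum of elementwise products.
    return np.sum(P * Q)

lhs = frob(A, X @ B)      # A : X.B
rhs = frob(X.T @ A, B)    # X^T.A : B
print(np.isclose(lhs, rhs))  # True
```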

Answer 3

To get the derivative of the formula $(y-{X}b)^T (y-Xb)$, we can use the Chain Rule as follows:

Let $f(b) = y-Xb$ and $g(u) = u^Tu$, where $u = f(b)$. Then,

$$ \eqalign{ \frac {\partial [(y-{X}b)^T (y-Xb)]} {\partial b} &= \frac {\partial g(u)} {\partial u} \times \frac {\partial f(b)} {\partial b} }$$

Note that since $g(u) = u^Tu = u^TI_nu$ is a quadratic form, its derivative can be written down directly:

$$ \eqalign { \frac {\partial g(u)} {\partial u} &= \frac {\partial u^Tu} {\partial u} = 2u^T \cr }$$

Here $u^T$ is a $1 \times n$ row vector, which keeps the dimensions compatible with those of $\partial f(b)/\partial b$.

$$ \eqalign { \frac {\partial f(b)} {\partial b} &= \frac {\partial (y-Xb)} {\partial b} \cr &= - \frac {\partial Xb} {\partial b} \cr &= -X ~~~(Jacobian ~matrix) }$$

So,

$$ \eqalign{ \frac {\partial [(y-{X}b)^T (y-Xb)]} {\partial b} &= \frac {\partial g(u)} {\partial u} \times \frac {\partial f(b)} {\partial b} \cr &= 2u^T \times (-X) \cr &= -2(y-Xb)^TX ~~~(1 \times n ~row ~vector) }$$

Finally, transpose the result to get the familiar column vector:

$$ \eqalign{ [-2(y-Xb)^TX]^T &= -2X^T(y-Xb) \cr &= -2X^Ty + 2X^TXb \cr } $$
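The chain-rule gradient $-2X^T(y-Xb)$ can be checked against central finite differences. A minimal sketch with illustrative random data (NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 4))
y = rng.normal(size=10)
b = rng.normal(size=4)

# Objective f(b) = (y - Xb)^T (y - Xb).
f = lambda b: (y - X @ b) @ (y - X @ b)

# Analytic gradient from the chain-rule derivation above.
grad = -2 * X.T @ (y - X @ b)

# Central finite differences, one coordinate of b at a time.
eps = 1e-6
fd = np.array([(f(b + eps * e) - f(b - eps * e)) / (2 * eps)
               for e in np.eye(4)])

print(np.allclose(grad, fd, atol=1e-4))  # True
```

Because $f$ is quadratic in $b$, the central difference is exact up to floating-point rounding, so the two gradients agree to high precision.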