In the book I am studying, the author shows that the sum of squared distances of the data points from the fitted line can be written in matrix form as $$ (t-X\beta)^T(t-X\beta) $$ where $X$ is a matrix with one observation per row, $t$ is the column vector of corresponding target values, and $\beta$ is the column vector of parameters we are estimating.
So far, so good. Then, to minimize this sum, we take the derivative with respect to $\beta$ and set it to zero.
We get $$ X^T(t-X\beta)=0 $$ and that is too big a jump for me. I know basic algebra, but not matrix calculus. Can you detail the steps from the first equation to the second?
I can go this far: $$ (t-X\beta)^T(t-X\beta) = (t^T-\beta^TX^T)(t-X\beta) = t^Tt-t^TX\beta-\beta^TX^Tt+\beta^TX^TX\beta $$
But I do not know how to take the derivative of the last line w.r.t. $\beta$.
Thanks.
Here I list a few basic rules of matrix calculus that apply to a large class of vector derivatives and can also be readily proved, as in coppper.hat's method. Assume $A$ is a constant matrix and $x$ is a vector; then the following hold:
$$\begin{align}\nabla_xAx&=A\\\nabla_xx^TA&=A^T\\\nabla_xx^TAx&=x^TA+(Ax)^T=x^T(A+A^T)\qquad\text{(product rule)}\end{align}$$
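These rules use the convention that the gradient of a scalar function is a row vector of partials. As a sanity check, the quadratic-form rule can be verified numerically with finite differences (a sketch using NumPy; the data and helper `num_grad` are my own, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n))   # constant matrix A
x = rng.normal(size=n)        # point at which we check the gradient

def num_grad(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f, as a row vector."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros(len(x)); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# Rule: grad_x (x^T A x) = x^T (A + A^T)
g_num = num_grad(lambda v: v @ A @ v, x)
g_ana = x @ (A + A.T)
assert np.allclose(g_num, g_ana, atol=1e-5)
```

The first two rules are Jacobians of linear maps and follow directly from component-wise differentiation; the assertion above checks the product-rule case.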
Then for your question, the deduction is straightforward: $$\begin{align}&\nabla_\beta(t^Tt-t^TX\beta-\beta^TX^Tt+\beta^TX^TX\beta)\\=&-t^TX-(X^Tt)^T+\beta^T(X^TX+(X^TX)^T)\\=&2\beta^TX^TX-2t^TX\\=&2(X\beta-t)^TX.\end{align}$$ Setting this to zero and transposing gives $X^T(X\beta-t)=0$, i.e. $X^T(t-X\beta)=0$.
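You can also confirm both the gradient formula and the final normal-equations identity numerically (a sketch; the random data here is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))   # one observation per row
t = rng.normal(size=20)        # target values
b = rng.normal(size=3)         # an arbitrary beta

sse = lambda b: (t - X @ b) @ (t - X @ b)   # (t - X b)^T (t - X b)

# Central-difference gradient at b matches 2 (X b - t)^T X
h = 1e-6
g_num = np.array([(sse(b + h * e) - sse(b - h * e)) / (2 * h)
                  for e in np.eye(3)])
g_ana = 2 * (X @ b - t) @ X
assert np.allclose(g_num, g_ana, atol=1e-4)

# At the least-squares solution, the gradient vanishes: X^T (t - X beta) = 0
beta_hat, *_ = np.linalg.lstsq(X, t, rcond=None)
assert np.allclose(X.T @ (t - X @ beta_hat), 0, atol=1e-7)
```

The second assertion is exactly the condition $X^T(t-X\beta)=0$ from the book, evaluated at the minimizer returned by `np.linalg.lstsq`.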