How to differentiate with respect to a vector in this matrix expression?

Question


539 Views Asked by Bumbble Comm At 18 Apr 2026 - 11:10

I have this function of $\beta$: $$f(\beta)=\left(\textbf{y}-\textbf{X}\beta\right)^T\left(\textbf{y}-\textbf{X}\beta\right)$$

Where:

$\textbf{y}$ is a $N \times 1$ column vector.
$\textbf{X}$ is a $N \times p$ matrix.
Therefore $\beta$ is a $p \times 1$ column vector.

I'm asked to differentiate $f(\beta)$ with respect to $\beta$, but I've never differentiated expressions involving matrices, and I find it a bit difficult.

I looked for help in some books I have and on the internet, and found these expressions:

$$\frac{\partial}{\partial \textbf{x}}\,\textbf{x}^T\textbf{y}=\frac{\partial}{\partial \textbf{x}}\,\textbf{y}^T\textbf{x}=\textbf{y}$$

$$\frac{\partial}{\partial \textbf{x}}\,\textbf{x}^TA\textbf{y}=\frac{\partial}{\partial \textbf{x}}\,\textbf{y}^TA^T\textbf{x}=A\textbf{y}$$

But I can't seem to apply them in my case. Any help is more than appreciated; I'm still at the learning stage with mathematics.

EDIT

I've tried to apply the chain rule, but to no avail: my result doesn't match the final solution given in the book I took this problem from: $$\dfrac{\partial f(\beta)}{\partial \beta}=-\textbf{X}^T\left(\textbf{y}-\textbf{X}\beta\right)-\left(\textbf{y}-\textbf{X}\beta\right)^T\textbf{X}$$


There are 3 best solutions below

Bumbble Comm On 06 May 2017 - 3:37

Expanding the expression gives us
$$f(\beta) = (y-X\beta)^T(y-X\beta) = y^Ty - \beta^TX^Ty - y^TX\beta + \beta^T X^T X \beta.$$
Therefore,
$$\frac{\partial f}{\partial \beta} = 0-X^Ty-y^TX + \frac{\partial }{\partial \beta} \beta^T A \beta,$$
where $A = X^TX$.

Note that
$$\beta^T A\beta = \sum_{k=1}^p \sum_{\ell=1}^p \beta_k\beta_\ell A_{k\ell},$$
and so, by the product rule,
\begin{align*}
\frac{\partial}{\partial \beta_i} \beta^T A \beta =& \sum_{k=1}^p \sum_{\ell=1}^p \frac{\partial}{\partial \beta_i}\left(\beta_k\beta_\ell A_{k\ell}\right)\\
=& \sum_{k=1}^p \sum_{\ell=1}^p\left(A_{k\ell}\beta_\ell \frac{\partial \beta_k}{\partial \beta_i}+A_{k\ell}\beta_k\frac{\partial \beta_\ell}{\partial \beta_i}\right)\\
=& \sum_{k=1}^p \sum_{\ell=1}^p\left(A_{k\ell}\beta_\ell \delta_{ki}+A_{k\ell}\beta_k\delta_{\ell i}\right)\\
=& \sum_{k=1}^p A_{ki}\beta_k + \sum_{\ell=1}^p A_{i\ell}\beta_\ell = (\beta^TA+A\beta)_i
\end{align*}

Thus,
$$\frac{\partial f}{\partial \beta} = \beta^T X^TX + X^TX\beta - X^Ty-y^TX,$$
which is equivalent to your book's solution. Here the row vectors $\beta^TX^TX$ and $y^TX$ hold the same entries as the column vectors $X^TX\beta$ and $X^Ty$ (recall that $X^TX$ is symmetric), so written consistently as a column vector the gradient is $2X^TX\beta - 2X^Ty = -2X^T(y-X\beta)$.

Also, the formulas you gave are incorrect, since the dimensions don't work out. You should have $$\frac{\partial }{\partial x} y^Tx = y^T$$ $$\frac{\partial}{\partial x} y^TA^Tx = y^TA^T$$
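If you want to convince yourself of the component formula for the quadratic term, here is a minimal numerical sketch. It uses plain Python lists to stay dependency-free, and the matrix `A`, the vector `beta`, and all helper names are made up for illustration. It compares $\sum_k A_{ki}\beta_k + \sum_\ell A_{i\ell}\beta_\ell$ against a central finite-difference estimate, using a non-symmetric `A` so that both sums matter.

```python
def quad_form(A, b):
    """Return b^T A b for a square matrix A (list of rows) and a vector b."""
    n = len(b)
    return sum(b[k] * A[k][l] * b[l] for k in range(n) for l in range(n))


def grad_quad_form(A, b):
    """Component formula: sum_k A[k][i]*b[k] + sum_l A[i][l]*b[l]."""
    n = len(b)
    return [
        sum(A[k][i] * b[k] for k in range(n)) + sum(A[i][l] * b[l] for l in range(n))
        for i in range(n)
    ]


def numeric_grad(func, b, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = []
    for i in range(len(b)):
        up = list(b)
        up[i] += eps
        down = list(b)
        down[i] -= eps
        grad.append((func(up) - func(down)) / (2 * eps))
    return grad


# Illustrative data only; A is deliberately not symmetric.
A = [[2.0, -1.0, 0.5], [3.0, 1.0, 4.0], [0.0, 2.0, -1.5]]
beta = [0.3, -1.2, 2.0]

analytic = grad_quad_form(A, beta)
numeric = numeric_grad(lambda b: quad_form(A, b), beta)
max_gap = max(abs(a - n) for a, n in zip(analytic, numeric))
print("largest difference between formula and finite differences:", max_gap)
```

The difference should be at the level of floating-point rounding. When $A = X^TX$ is symmetric, the two sums coincide and the gradient of the quadratic term is $2A\beta$.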

Bumbble Comm On 07 May 2017 - 3:23

Define a new vector
$$\eqalign{ w &= X\beta-y \cr}$$
Then write the function in terms of the inner/Frobenius product (denoted by a colon) and this new variable. In this new form, finding the differential and gradient is straightforward.
$$\eqalign{ f &= w:w \cr\cr df &= 2w:dw \cr &= 2w:X\,d\beta \cr &= 2X^Tw:d\beta \cr\cr \frac{\partial f}{\partial\beta} &= 2X^Tw \cr &= 2X^T(X\beta-y) \cr }$$
Here $dw = X\,d\beta$ because $y$ is constant. Don't be put off by the Frobenius product; it's merely a convenient infix notation for the trace
$$\eqalign{A:B={\rm tr}(A^TB)\cr}$$
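As a rough sanity check of the two steps used here, the sketch below (plain Python lists, with made-up `X`, `y`, `beta` and `dbeta`, and hypothetical helper names) verifies that $w : X\,d\beta = X^Tw : d\beta$ and that $2X^Tw$ agrees with a finite-difference gradient of $f$. For vectors, the Frobenius product is just the ordinary dot product.

```python
def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def dot(u, v):
    """Dot product; for vectors this equals the Frobenius product u:v."""
    return sum(a * b for a, b in zip(u, v))


def f(X, y, b):
    """f(b) = w:w with w = X b - y."""
    w = [p - q for p, q in zip(matvec(X, b), y)]
    return dot(w, w)


# Illustrative data only (N = 4, p = 2).
X = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0], [2.0, 2.0]]
y = [1.0, 0.0, 2.0, -1.0]
beta = [0.5, -1.0]
dbeta = [0.25, -0.5]

w = [p - q for p, q in zip(matvec(X, beta), y)]

# Step 1: moving X across the product, w : X dbeta == X^T w : dbeta.
lhs = dot(w, matvec(X, dbeta))
rhs = dot(matvec(transpose(X), w), dbeta)
adjoint_gap = abs(lhs - rhs)

# Step 2: the gradient 2 X^T w versus central finite differences.
grad = [2.0 * g for g in matvec(transpose(X), w)]
eps = 1e-6
numeric = []
for i in range(len(beta)):
    up = list(beta)
    up[i] += eps
    down = list(beta)
    down[i] -= eps
    numeric.append((f(X, y, up) - f(X, y, down)) / (2 * eps))
grad_gap = max(abs(a - n) for a, n in zip(grad, numeric))

print("adjoint identity gap:", adjoint_gap)
print("gradient gap:", grad_gap)
```

Both gaps should be tiny, which is what the identity $A:BC = B^TA:C$ and the final gradient formula predict.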

**Bumbble Comm** · Accepted Answer

Just play a Taylor series trick. Recall that Taylor's theorem tells us that
$$ f(x+\partial x) = f(x) + f^\prime(x)\partial x + o(\|\partial x\|). $$
Hence, we can see that
\begin{align*}
f(\beta+\partial \beta) =&(y-X(\beta+\partial \beta))^T(y-X(\beta+\partial \beta))\\
=&(y-X\beta-X\partial \beta)^T(y-X\beta-X\partial \beta)\\
=&\underbrace{(y-X\beta)^T(y-X\beta)}_{f(\beta)}\underbrace{-(X\partial \beta)^T(y-X\beta) - (y-X\beta)^T(X\partial \beta)}_{f^\prime(\beta)\partial \beta}+\underbrace{(X\partial \beta)^T(X\partial\beta)}_{o(\|\partial \beta\|)}
\end{align*}
Basically, we just expand $f(\beta + \partial \beta)$, regroup terms, and Taylor's theorem tells us which one the derivative is: the part that is linear in $\partial \beta$. This gives
\begin{align*}
f^\prime(\beta)\partial \beta =&-(X\partial \beta)^T(y-X\beta) - (y-X\beta)^T(X\partial \beta)\\
=&-\partial \beta^TX^T(y-X\beta) - (y-X\beta)^TX\partial \beta\\
=&-(X^T(y-X\beta))^T\partial \beta - (y-X\beta)^TX\partial \beta\\
=&-2(X^T(y-X\beta))^T\partial \beta
\end{align*}
where the last step uses $(y-X\beta)^TX = (X^T(y-X\beta))^T$.

From the Riesz representation theorem we get that $\langle \nabla f(\beta),\partial \beta\rangle = f^\prime(\beta)\partial \beta$, or equivalently that $\nabla f(\beta)^T\partial\beta = f^\prime(\beta)\partial \beta$. Matching terms, we get the result we want, which is $$ \nabla f(\beta) = -2X^T(y-X\beta). $$
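The expansion above can also be checked numerically. In the sketch below (plain Python lists; `X`, `y`, `beta`, the direction `h0` and the helper names are all made up for illustration), the remainder $f(\beta+h) - f(\beta) - \nabla f(\beta)^Th$ is compared with $(Xh)^T(Xh)$. Since $f$ is quadratic, the two should agree up to rounding, and halving $h$ should quarter the remainder, which is why that term is $o(\|h\|)$.

```python
def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    """Ordinary dot product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def f(X, y, b):
    """f(b) = (y - X b)^T (y - X b)."""
    r = [p - q for p, q in zip(y, matvec(X, b))]
    return dot(r, r)


def gradient(X, y, b):
    """The claimed gradient -2 X^T (y - X b)."""
    r = [p - q for p, q in zip(y, matvec(X, b))]
    Xt = [list(col) for col in zip(*X)]
    return [-2.0 * g for g in matvec(Xt, r)]


# Illustrative data only (N = 4, p = 2).
X = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0], [2.0, 2.0]]
y = [1.0, 0.0, 2.0, -1.0]
beta = [0.5, -1.0]
h0 = [1.0, -2.0]

grad = gradient(X, y, beta)
results = []
for t in (1.0, 0.5, 0.25):
    h = [t * c for c in h0]
    shifted = [b + c for b, c in zip(beta, h)]
    # What is left after removing the constant and linear terms.
    remainder = f(X, y, shifted) - f(X, y, beta) - dot(grad, h)
    # The quadratic term (X h)^T (X h) from the expansion.
    Xh = matvec(X, h)
    quadratic = dot(Xh, Xh)
    results.append((t, remainder, quadratic))
    print(t, remainder, quadratic)
```

Each printed row should show the remainder matching the quadratic term, with the value dropping by a factor of four each time the step is halved.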
