How do I do the following matrix differentiation?


385 Views Asked by Bumbble Comm At 28 Mar 2026 - 6:59

If $\mathbf{A}$ is symmetric and my function $f:\mathbb{R}^{d}\to \mathbb{R}^{d}$ is defined as $$ f(\mathbf{x}) = \mathbf{Ax}(\mathbf{x}^{T}\mathbf{Ax}), $$ what is the derivative of $f$ with respect to $\mathbf{x}$, i.e., $\nabla_{\mathbf{x}}f(\mathbf{x})$?

This is not a homework problem but rather something related to my research, and I converted the notation so that it is more legible. The function $f$ is itself the gradient of a scalar-valued function of a vector, so its Jacobian is the Hessian of that original function. I initially assumed this Hessian would be positive definite, but my code keeps throwing errors at me. Without proper education in matrix calculus, the answer I came up with is $$ \nabla_{\mathbf{x}}f(\mathbf{x}) = \mathbf{A}(\mathbf{x}^{T}\mathbf{Ax}) + 2\mathbf{Axx}^{T}\mathbf{A} $$ Is this correct?

Original Q&A

There are 4 best solutions below

Bumbble Comm On 03 Sep 2017 - 9:00

You are correct. If you are not sure about your result, here is the general method.

Expanding $f(\mathbf{x}+\mathbf{h})-f(\mathbf{x})$ gives:

$$f(\mathbf{x}+\mathbf{h})-f(\mathbf{x})=\mathbf{x}^TA\mathbf{x}A\mathbf{h}+2A\mathbf{x}\mathbf{x}^TA\mathbf{h}+\underset{=o(|h|)}{\underbrace{A\mathbf{x}\mathbf{h}^TA\mathbf{h}+2A\mathbf{h}\mathbf{x}^TA\mathbf{h}+A\mathbf{h}\mathbf{h}^TA\mathbf{h}}}.$$

I only expanded $f(\mathbf{x}+\mathbf{h})$ using linearity of operations and used the fact that $\mathbf{x}^TA\mathbf{h}=\mathbf{h}^TA\mathbf{x}$ since $A$ is symmetric.

Hence $\nabla_{\mathbf{x}}f(\mathbf{x})=\mathbf{x}^TA\mathbf{x}A+2A\mathbf{x}\mathbf{x}^TA$.
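The expansion above can be checked numerically. The following is a minimal sketch, assuming NumPy and a randomly generated symmetric matrix; the helper names `f`, `jacobian`, and `numerical_jacobian` are illustrative and do not come from the question. It compares the closed form with central finite differences.

```python
import numpy as np

def f(A, x):
    """f(x) = A x (x^T A x) for a 1-D vector x."""
    return (A @ x) * (x @ A @ x)

def jacobian(A, x):
    """Closed form (x^T A x) A + 2 A x x^T A, valid for symmetric A."""
    Ax = A @ x
    # For symmetric A, x^T A = (A x)^T, so A x x^T A = outer(Ax, Ax).
    return (x @ Ax) * A + 2.0 * np.outer(Ax, Ax)

def numerical_jacobian(A, x, eps=1e-6):
    """Central differences, one column per coordinate direction."""
    d = x.size
    J = np.zeros((d, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = eps
        J[:, j] = (f(A, x + e) - f(A, x - e)) / (2.0 * eps)
    return J

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2.0          # symmetric, not necessarily definite
x = rng.standard_normal(4)

# Largest entrywise gap between the closed form and the numerical estimate.
max_err = np.max(np.abs(jacobian(A, x) - numerical_jacobian(A, x)))
```

If `max_err` is small relative to the entries of the Jacobian, the closed form agrees with the finite-difference estimate for that particular `A` and `x`.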

Bumbble Comm On 03 Sep 2017 - 9:30

Another approach is to use the chain rule. This requires the additional assumption that $A$ is positive semidefinite, not merely symmetric, so that there exists $Q\in M_d(\Bbb R)$ such that $A=Q^TQ$ (for example, from a Cholesky factorization). Now let $z:=Qx$, then $$f(x)=Q^TQxx^TQ^TQx=Q^Tzz^Tz=:g(z)$$ So $$D_xf(x)=D_zg(z)D_x(Qx)$$ where $$D_zg(z)=Q^TD_z(zz^Tz)=Q^T((z^Tz)I+2zz^T)$$ Finally $$D_xf(x)=Q^T((z^Tz)I+2zz^T)Q=(z^Tz)Q^TQ+2Q^Tzz^TQ$$ Plugging in $z=Qx,A=Q^TQ$: $$D_xf(x)=(x^TAx)A+2Axx^TA$$
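As a rough check of the chain-rule route, here is a sketch assuming NumPy and a matrix constructed to be positive definite so that a Cholesky factor exists. The variable names are illustrative. Note that `np.linalg.cholesky` returns a lower-triangular `L` with `A = L @ L.T`, so `Q` is taken as `L.T`.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
M = rng.standard_normal((d, d))
A = M @ M.T + d * np.eye(d)   # symmetric positive definite by construction

L = np.linalg.cholesky(A)     # lower triangular, A = L @ L.T
Q = L.T                       # so that A = Q.T @ Q

x = rng.standard_normal(d)
z = Q @ x

# Chain rule: D_x f = Q^T ((z^T z) I + 2 z z^T) Q
J_chain = Q.T @ ((z @ z) * np.eye(d) + 2.0 * np.outer(z, z)) @ Q

# Direct formula: (x^T A x) A + 2 A x x^T A
Ax = A @ x
J_direct = (x @ Ax) * A + 2.0 * np.outer(Ax, Ax)

chain_err = np.max(np.abs(J_chain - J_direct))
```

The two matrices should agree up to floating-point rounding. For an indefinite symmetric $A$ the factorization step fails, which is why this route needs the extra assumption.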

Bumbble Comm On 02 Apr 2022 - 12:49

$ \def\a{\alpha}\def\p{\partial} \def\L{\left}\def\R{\right}\def\LR#1{\L(#1\R)} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} $For typing convenience, define the scalar variable $$\eqalign{ \a &= x^TAx \qiq d\a = 2x^TA\;dx \\ }$$ Use this to write the objective function, then calculate its differential and gradient $$\eqalign{ f &= \a Ax \\ df &= Ax\;d\a + \a A\;dx \\ &= Ax\LR{2x^TA\;dx} + \a A\;dx \\ &= \LR{2Axx^TA + \a A} dx \\ \grad{f}{x} &= {2Axx^TA + \a A} \\ }$$ A result which matches your own.

**user228113** · Accepted Answer

In the pen-and-paper sense, your Jacobian matrix is correct. I presume that the code gives you errors because it interprets the arrowed $*$ in $$A\stackrel\downarrow*(x^t*A*x)$$ as a product of matrices, rather than as the multiplication of a matrix by a scalar. Thus it checks the dimensions, sees that you are trying to multiply a $(d\times d)$ matrix by a $(1\times 1)$ matrix, and concludes that the dimensions do not match. We humans implicitly assume $$A(x^tAx)=A\cdot (x^t*A*x)$$

Here $*$ is the map which multiplies an $(n\times k)$ matrix by a $(k\times m)$ matrix to obtain an $(n\times m)$ matrix, and $\cdot$ is the map which assigns to an $(n\times m)$ matrix and a scalar number (id est, a $(1\times1)$ matrix) the $(n\times m)$ matrix with every entry scaled by that number. However, as justified and useful as it is, it remains an inconsistent use (actually, non-use) of the notation.

The machine may not make the same implicit assumption.

To see it more clearly, notice what happens with the calculation \begin{align}f(x+h)-f(x)&=A(x+h)((x+h)^tA(x+h))-Ax(x^tAx)=\\&=Ax(h^tAx)+Ax(x^tAh)+Ah(x^tAx)+o(\lvert h\rvert)=\\&= 2Axx^tAh+Ah(x^tAx)+o(\lvert h\rvert)\end{align}

Notice that there is no problem in writing $A*x*\alpha$, because $A:(d\times d)$, $x:(d\times 1)$ and $\alpha:(1\times1)$.

The calculation above shows that the differential $D_xf$ is actually the map $D_xf(h)= 2Axx^tAh+Ah(x^tAx)$. You could write it like that if your purpose is evaluating it.
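If the goal is only to evaluate the differential on a given direction $h$, the matrix never has to be formed. Below is a minimal sketch, assuming NumPy and 1-D arrays for $x$ and $h$; the function name `jvp` is illustrative. With 1-D arrays, `x @ Ah` is a plain scalar, so the matrix-by-scalar ambiguity does not arise.

```python
import numpy as np

def jvp(A, x, h):
    """D_x f(h) = 2 A x (x^T A h) + A h (x^T A x), for symmetric A."""
    Ax = A @ x
    Ah = A @ h
    # x @ Ah and x @ Ax are scalars because x, Ax, Ah are 1-D arrays.
    return 2.0 * Ax * (x @ Ah) + Ah * (x @ Ax)

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2.0           # symmetric test matrix
x = rng.standard_normal(5)
h = rng.standard_normal(5)

# Compare against the explicit matrix applied to h.
J = (x @ A @ x) * A + 2.0 * np.outer(A @ x, A @ x)
jvp_err = np.max(np.abs(jvp(A, x, h) - J @ h))
```

This costs two matrix-vector products per evaluation, instead of building a $d\times d$ matrix first.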

However, if you want a matrix $\nabla_xf$ such that $\nabla_xfh=D_xf(h)$, you can obtain it with the identity $h*\alpha=(\alpha\cdot I)*h$, where $I$ is the $d\times d$ identity matrix, so that $$\nabla_xf=2*A*x*x^t*A+A*((x^t*A*x)\cdot I)$$ How you produce the appropriate scalar multiple of the identity matrix might depend on the programming language, but there should be several options, once you know that the problem is there.

Another way might be writing $((x^t*A*x)*A)$, because some programming languages have only the scalar-by-matrix product overloaded into the symbol $*$, rather than the matrix-by-scalar product.
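The question does not say which language the code is written in, so the following is only an illustration of the dimension issue, assuming NumPy with $x$ stored as a $(d\times 1)$ column. There `x.T @ A @ x` is a $(1\times 1)$ array, and the matrix product `A @ quad` is rejected. Extracting the scalar, or multiplying by a scalar multiple of the identity, avoids the problem.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])      # symmetric
x = np.array([[1.0], [2.0]])    # column vector, shape (2, 1)

quad = x.T @ A @ x              # shape (1, 1), not a plain scalar

try:
    A @ quad                    # (2, 2) @ (1, 1): inner dimensions 2 and 1 differ
    matmul_failed = False
except ValueError:
    matmul_failed = True

alpha = quad.item()             # extract the scalar x^T A x

# Scalar-by-matrix product for the first term.
J = alpha * A + 2.0 * (A @ x) @ (x.T @ A)

# Equivalent form using a scalar multiple of the identity.
J_identity = 2.0 * (A @ x) @ (x.T @ A) + A @ (alpha * np.eye(2))
```

Here `matmul_failed` ends up `True`, while `J` and `J_identity` hold the same $2\times 2$ matrix.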
