Suppose we have a function $f: \mathbb{R}^{d-1} \to \mathbb{R}$ given by
$$ f(x_1, \ldots, x_{d-1}) = \operatorname{trace} \left( A^T \mathbf{1}_{n\times 1} [ x_1, x_2, \ldots, x_{d-1}] B\right)$$
where $A,B$ are $n\times d$ and $(d-1)\times d$ matrices respectively. I'm trying to compute the gradient of this function. Currently I am just trying to expand out the entire expression inside the trace and compute the gradient that way, but it's quite messy and I'm not able to push it through. Can anyone point out a better approach? Thank you.
If you know about the Fréchet derivative and its relation to the gradient of a function, this question becomes almost a triviality. For this question, the key result is the following:

> **Theorem.** Let $T: V \to W$ be a continuous linear map between normed vector spaces. Then $T$ is Fréchet differentiable at every $x \in V$, and $dT_x = T$.
What this theorem says is that a linear transformation is its own best linear approximation (i.e., it is its own derivative). Now, note that the function $f: \mathbb{R}^{d-1} \to \mathbb{R}$ you have defined is linear (because the trace is linear, and matrix multiplication is linear in each factor). So, for every $x = (x_1, \dots, x_{d-1}) \in \mathbb{R}^{d-1}$, by the theorem above, we have \begin{equation} df_x(\cdot) = f(\cdot) \end{equation} In general, the gradient of $f$ at $x$ is the matrix of $df_x$ relative to the standard basis. So, \begin{align} \nabla f(x) &= \text{matrix of $df_x$ wrt standard basis} \\ &= \text{matrix of $f$ wrt standard basis} \\ &= \begin{bmatrix} f(e_1) & \dots & f(e_{d-1}) \end{bmatrix} \end{align}
In other words, $\nabla f(x)$ is the $1 \times (d-1)$ matrix whose $i^{th}$ entry is $f(e_i)$; I'll leave it to you to compute what $f(e_i)$ is.
As you mentioned, expanding everything out in terms of components and taking partial derivatives is a nightmare in this case. This is why, if you're not already familiar with the Fréchet derivative, I highly recommend you learn more about it... it simplifies computations like this immensely. As a reference, I would highly recommend Loomis and Sternberg's book Advanced Calculus (section 3.6 in particular) to learn about this.
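If you want to sanity-check your answer once you've computed the $f(e_i)$'s, here is a small numerical sketch (with randomly chosen $A$, $B$, and arbitrary small $n$, $d$ picked just for illustration): it builds the gradient entry-by-entry as $f(e_i)$, per the theorem, and compares it against a central finite-difference approximation of $\nabla f$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 3  # arbitrary small sizes for the check
A = rng.standard_normal((n, d))        # A is n x d
B = rng.standard_normal((d - 1, d))    # B is (d-1) x d
ones = np.ones((n, 1))                 # the column vector 1_{n x 1}

def f(x):
    # x is a length-(d-1) vector; x[None, :] is the 1 x (d-1) row [x_1, ..., x_{d-1}]
    return np.trace(A.T @ ones @ x[None, :] @ B)

# Gradient via the theorem: the i-th entry of grad f is f(e_i)
I = np.eye(d - 1)
grad = np.array([f(I[i]) for i in range(d - 1)])

# Central finite-difference approximation of grad f at a random point
# (f is linear, so the gradient is the same at every point)
x0 = rng.standard_normal(d - 1)
h = 1e-6
fd = np.array([(f(x0 + h * I[i]) - f(x0 - h * I[i])) / (2 * h)
               for i in range(d - 1)])

print(np.allclose(grad, fd, atol=1e-5))
```

Since $f$ is linear, the finite-difference quotient should agree with $\big(f(e_1), \dots, f(e_{d-1})\big)$ up to floating-point error, regardless of the base point $x_0$.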