I'd like to know $\frac{\partial f(\mathbf{U})}{\partial \mathbf{U}}$, i.e., the 'by-matrix derivative' of the following scalar function $f(\mathbf{U})$ w.r.t. $\mathbf{U}$.
$$f(\mathbf{U}) = \vec{x}^T \mathbf{U} \mathbf{D} \mathbf{U}^T \vec{x}\;,$$
where $\vec{x} \in \mathbb{R}^n$ is a column vector, $\mathbf{U} \in \mathbb{R}^{n \times n}$ is an orthogonal matrix ($\mathbf{U}^T\mathbf{U} = \mathbf{I}_n$), and $\mathbf{D} \in \{0,1\}^{n \times n}$ is a diagonal matrix with $\mathbf{D} \neq \mathbf{I}_n$.
I found in The Matrix Cookbook, see eq. (82), the derivative $\frac{\partial g(\mathbf{U})}{\partial \mathbf{U}}$ of
$$g(\mathbf{U}) = \vec{x}^T \mathbf{U}^T \mathbf{D} \mathbf{U} \vec{x}\;.$$
Please note the difference in the placement of the transpose on $\mathbf{U}$ between $f(\mathbf{U})$ and $g(\mathbf{U})$.
From the earlier question "Derivative of inverse quadratic function of a matrix" I learned that $\frac{\partial f(\mathbf{U})}{\partial u_{ij}} = \vec{x}^T (\mathbf{U} \mathbf{D} \mathbf{J}^{ij} + \mathbf{J}^{ji} \mathbf{D} \mathbf{U}^T) \vec{x}$. Unfortunately, I can't figure out how to combine these entrywise derivatives into a closed-form matrix expression. The furthest I get is $\frac{\partial f(\mathbf{U})}{\partial u_{ij}} = \mathbf{D}\mathbf{U}^T\vec{x}\vec{x}^T \vert_{ij} + \vec{x}\vec{x}^T\mathbf{U}\mathbf{D}\vert_{ji}$.
Any help is appreciated!
A straightforward way is to expand $f(U+H) = x^T (U+H) D (U+H)^T x = f(U)+x^T HDU^Tx + x^TUDH^T x + f(H)$, and note that $|f(H)| \le K \|H\|^2$ for some constant $K$, so the remainder $f(H)$ is $o(\|H\|)$.
It follows that the derivative is given by $Df(U)(H) = x^T HDU^Tx + x^TUDH^T x$. Since $f$ is real valued and $D^T=D$, we can write $Df(U)(H) = 2x^TUDH^T x$.
We have ${ \partial f(U) \over \partial U_{ij} } = Df(U)(E_{ij}) = 2x^TUD E_{ji} x = 2x^TUD e_j e_i^T x$.
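As a quick numerical sanity check (my addition, not part of the original derivation): assembling the entries $2x^TUDe_je_i^Tx$ into a matrix gives $2xx^TUD$, which can be compared against finite differences of $f$. The dimension and the particular $D$ below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Random orthogonal U via QR, diagonal 0/1 matrix D with D != I, random x.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
D = np.diag([1.0, 0.0, 1.0, 0.0])
x = rng.standard_normal(n)

def f(U):
    """f(U) = x^T U D U^T x."""
    return x @ U @ D @ U.T @ x

# Entrywise derivatives assembled into a matrix: grad f(U) = 2 x x^T U D.
grad = 2.0 * np.outer(x, x) @ U @ D

# Central finite differences of each entry d f / d U_ij.
eps = 1e-6
fd = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = eps
        fd[i, j] = (f(U + E) - f(U - E)) / (2 * eps)

assert np.allclose(fd, grad, atol=1e-6)
```

Since $f$ is quadratic in the entries of $U$, the central difference is exact up to floating-point error, so the agreement is very tight.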
Comments:
This is the definition (or one of a few equivalents) of differentiability:
A function $f:V \to W$ where $V,W$ are Banach spaces is said to be (Fréchet) differentiable at $x$ iff there exists a continuous linear operator $A:V \to W$ such that for all $\epsilon>0$, there exists some $\delta >0$ such that if $\|h\| <\delta$, then $\|f(x+h)-f(x) - A(h) \| \le \epsilon \|h\|$. The operator $A$ is called the derivative of $f$ at $x$.
A few points:
(1) In our case, $V=\mathbb{R}^{n \times n}$, $W = \mathbb{R}$.
(2) The derivative operator is often denoted $Df(x)$. Note that $Df(x):V \to W$. So, given $h \in V$, we write $Df(x)(h) \in W$ to denote the operator applied to $h$ (perhaps think of $h$ as a perturbation).
(3) The idea of differentiability is to approximate the difference $f(x+h)-f(x)$ by a linear map in $h$. Some folks write $f(x+h) =f(x)+A(h) + o(\|h\|)$.
(4) The linear operator $A$ cannot always be expressed as a matrix multiplication. For example, take the trace $\operatorname{tr}: \mathbb{R}^{n \times n} \to \mathbb{R}$. This is a differentiable function, but you cannot write down a single matrix multiplication that represents the derivative (in fact, we have $D \operatorname{tr}(x)(h) = \operatorname{tr}(h)$). This is a confusing point for many folks as we typically (in the $\mathbb{R}^n \to \mathbb{R}^m$ case) write the derivative as a matrix multiplication. The derivative of the function $f$ above cannot be written as a simple matrix multiplication.
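To make point (4) concrete, here is a small numerical illustration (my addition, not part of the original text) that the derivative of the trace is the linear operator $H \mapsto \operatorname{tr}(H)$, the same at every base point $X$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
X = rng.standard_normal((n, n))
H = rng.standard_normal((n, n))

# Directional finite difference of tr at X in direction H.
eps = 1e-6
fd = (np.trace(X + eps * H) - np.trace(X - eps * H)) / (2 * eps)

# D tr(X)(H) = tr(H): the derivative operator does not depend on X.
assert np.isclose(fd, np.trace(H))
```

Because the trace is itself linear, the finite difference matches $\operatorname{tr}(H)$ exactly up to floating-point error, for any choice of $X$.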
To answer the questions in your comment below:
To compute the derivative of $f$, we compute $f(U+H)-f(U)$ and look for linear and higher-order terms. We have $f(U+H)-f(U) = x^T HDU^Tx + x^TUDH^T x + f(H)$; the term $H \mapsto x^T HDU^Tx + x^TUDH^T x$ is linear (and continuous) in $H$, and the term $f(H)$ can be bounded by $K\|H\|^2$. Hence, from the definition, $f$ is differentiable at $U$, and the derivative applied to the direction $H$ is given by $Df(U)(H) = x^T HDU^Tx + x^TUDH^T x$. The derivative is a function $Df(U): \mathbb{R}^{n\times n} \to \mathbb{R}$, but it cannot be written simply as the product of some fixed matrix and $H$.
The expression $Df(U)(H) = x^T HDU^Tx + x^TUDH^T x$ completely defines the derivative of $f$.
Now for a slight backtrack: While what I wrote above is correct, there is a sense in which you can write down a single object that represents the derivative.
From above, we can write (using properties of the trace operator) $Df(U)(H) = 2x^T HDU^Tx = \operatorname{tr} (2x^T HDU^Tx) = \operatorname{tr}( 2 DU^Txx^T H ) = \operatorname{tr}( (2 xx^T U D )^T H )$.
If one uses the Frobenius norm and the corresponding inner product, we see that we can write $Df(U)(H) = \langle 2 xx^T U D, H \rangle $, so we can write the gradient $\nabla f(U) = 2 xx^T U D$.
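As a numerical check of this last identity (again my addition; the dimension and the particular $D$ are arbitrary), the two-term expression for $Df(U)(H)$ agrees with the Frobenius inner product $\langle 2 xx^T U D, H \rangle$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
U, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthogonal U
D = np.diag([1.0, 1.0, 0.0, 0.0])                 # diagonal 0/1, D != I
x = rng.standard_normal(n)
H = rng.standard_normal((n, n))                   # arbitrary direction

# Derivative applied to direction H, in its two-term form.
Df_H = x @ H @ D @ U.T @ x + x @ U @ D @ H.T @ x

# Gradient via the trace / Frobenius inner product: <2 x x^T U D, H>.
G = 2.0 * np.outer(x, x) @ U @ D
assert np.isclose(Df_H, np.sum(G * H))
```

Here `np.sum(G * H)` is the Frobenius inner product $\operatorname{tr}(G^T H)$, which is why the gradient $\nabla f(U) = 2xx^TUD$ reproduces $Df(U)(H)$ for every direction $H$.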
However, you must realise that this is not just a simple matrix multiplication, and that the trace is intimately involved.