An easy matrix calculus problem


Let $A$ and $B$ be two $n\times n$ matrices, and let $x$ be an $n\times 1$ column vector. What is the derivative of $f=(x^TAx)Bx$ with respect to $x$?

I tried to calculate it as follows:

$\frac{\partial f}{\partial x}=\frac{\partial (x^TAx)}{\partial x}Bx+(x^TAx)\frac{\partial (Bx)}{\partial x}=[x^T(A+A^T)]Bx+(x^TAx)B$

That is, the result is a number plus a matrix, which cannot be added.

What's wrong with it?

Thanks for your help.


There are 3 best solutions below


Partition $B$ row-wise:
$$B = \begin{bmatrix} b_1^T \\ b_2^T \\ \vdots \\ b_n^T \end{bmatrix} \in \mathbb{R}^{n \times n}.$$

Denote $x^TAx$ by $g(x)$. Notice that $f$ is a vector-valued function that maps $\mathbb{R}^n$ into $\mathbb{R}^n$, so the correct size for the derivative of $f$ is an $n \times n$ matrix. Then $f(x)$ can be written as
$$f(x) = g(x)Bx = \begin{bmatrix} g(x)\, b_1^Tx \\ g(x)\, b_2^Tx \\ \vdots \\ g(x)\, b_n^Tx \end{bmatrix} := \begin{bmatrix} f_1(x) \\ f_2(x) \\ \vdots \\ f_n(x) \end{bmatrix}.$$

It then follows that
\begin{align*} \frac{\partial f(x)}{\partial x} = \begin{bmatrix} \frac{\partial f_1(x)}{\partial x^T} \\ \frac{\partial f_2(x)}{\partial x^T} \\ \vdots \\ \frac{\partial f_n(x)}{\partial x^T} \end{bmatrix} \in \mathbb{R}^{n \times n}, \tag{1} \end{align*}
where each row $\frac{\partial f_i(x)}{\partial x^T}$ is the transpose of the gradient (a column vector) of $f_i$. For each $i \in \{1, 2, \ldots, n\}$, the product rule gives
\begin{align*} \frac{\partial f_i(x)}{\partial x} &= \frac{\partial [g(x)\cdot b_i^T x]}{\partial x} \\ &= \frac{\partial (x^T A x)}{\partial x}\, b_i^Tx + (x^T A x)\, \frac{\partial (b_i^T x)}{\partial x} \\ &= (b_i^T x)(A + A^T) x + (x^T A x)\, b_i. \end{align*}

Taking the transpose of these expressions and substituting them back into $(1)$, we get
\begin{align*} \frac{\partial f(x)}{\partial x} &= \begin{bmatrix} (b_1^Tx)\, x^T(A + A^T) + (x^T A x)\, b_1^T \\ (b_2^Tx)\, x^T(A + A^T) + (x^T A x)\, b_2^T \\ \vdots \\ (b_n^Tx)\, x^T(A + A^T) + (x^T A x)\, b_n^T \end{bmatrix} \\ &= Bxx^T(A + A^T) + (x^TAx)B. \end{align*}
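A closed form like this is easy to sanity-check numerically. The snippet below is a minimal sketch, assuming NumPy is available; the helper names (`f`, `jacobian_closed_form`, `jacobian_finite_diff`), the dimension `n = 4`, and the random test data are illustrative choices, not part of the derivation. It compares $Bxx^T(A+A^T) + (x^TAx)B$ against a central-difference approximation of the Jacobian.

```python
import numpy as np

def f(x, A, B):
    """f(x) = (x^T A x) B x, a map from R^n to R^n."""
    return (x @ A @ x) * (B @ x)

def jacobian_closed_form(x, A, B):
    """Candidate Jacobian: B x x^T (A + A^T) + (x^T A x) B."""
    return np.outer(B @ x, x) @ (A + A.T) + (x @ A @ x) * B

def jacobian_finite_diff(x, A, B, eps=1e-6):
    """Central differences; column j approximates the partial derivative df/dx_j."""
    n = x.size
    J = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (f(x + e, A, B) - f(x - e, A, B)) / (2 * eps)
    return J

# Arbitrary random test data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
x = rng.standard_normal(n)

# Largest entrywise disagreement between the two Jacobians.
max_err = np.max(np.abs(jacobian_closed_form(x, A, B) - jacobian_finite_diff(x, A, B)))
print(max_err)
```

If the formula is right, the printed discrepancy should be at the level of finite-difference noise, many orders of magnitude below the size of the Jacobian entries.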


I think direct expansion is the most straightforward here. Look at $f(x+h)-f(x)$ and collect the terms that are linear in $h$.

Since $f(x+h) = \big((x+h)^T A (x+h)\big)\, B (x+h)$, collecting the terms that are linear in $h$ gives $Df(x)h = (x^T Ax)Bh + (x^T A h) Bx + (h^T A x) B x = \big((x^T Ax)B+Bx x^T A+Bx x^T A^T\big) h$.
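The claim that these are exactly the linear terms can be checked numerically: the remainder $f(x+th)-f(x)-Df(x)(th)$ should shrink like $t^2$. The following is a rough sketch assuming NumPy; the function names `Df` and `remainder`, the step sizes, and the random data are assumptions made for the illustration.

```python
import numpy as np

def f(x, A, B):
    """f(x) = (x^T A x) B x."""
    return (x @ A @ x) * (B @ x)

def Df(x, A, B, h):
    """Linear part from the expansion: (x^T A x) B h + (x^T A h) B x + (h^T A x) B x."""
    return (x @ A @ x) * (B @ h) + (x @ A @ h) * (B @ x) + (h @ A @ x) * (B @ x)

# Arbitrary random test data (hypothetical, for illustration only).
rng = np.random.default_rng(1)
n = 3
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
x, h = rng.standard_normal(n), rng.standard_normal(n)

def remainder(t):
    """Norm of f(x + t h) - f(x) - Df(x)(t h), i.e. what is left after the linear term."""
    return np.linalg.norm(f(x + t * h, A, B) - f(x, A, B) - Df(x, A, B, t * h))

# Shrinking the step by a factor of 10 should shrink the remainder by roughly 100.
ratio = remainder(1e-2) / remainder(1e-3)
print(ratio)
```

A ratio near 100 indicates a second-order remainder, which is what one expects when $Df(x)h$ captures all the terms linear in $h$.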

Aside: One needs to be a little careful with the product rule. If $f(x) = a(x) \cdot b(x)$, then $Df(x)(h) = Da(x)(h)\cdot b(x) + a(x) \cdot Db(x)(h)$. This is where the confusion in the question arises. When $a(x)$ and $b(x)$ are both scalars, we can write $Df(x)(h) = b(x) \cdot Da(x)(h) + a(x) \cdot Db(x)(h)$, and hence $Df(x) = b(x) \cdot Da(x) + a(x) \cdot Db(x)$. In general, however, one must be very careful with the '$h$' bookkeeping.
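To make the bookkeeping concrete for this question, here is the first product-rule term worked out with $a(x) = x^TAx$ and $b(x) = Bx$ (a worked step using only the symbols above):

$$Da(x)(h) = x^T(A+A^T)\,h, \qquad Da(x)(h)\cdot b(x) = \big(x^T(A+A^T)\,h\big)\,Bx = Bx\,x^T(A+A^T)\,h.$$

Because $Da(x)(h)$ is a scalar, it can be moved to the right of the vector $Bx$, which leaves $h$ as the last factor. The first term of the derivative is therefore the matrix $Bx\,x^T(A+A^T)$, not the scalar $x^T(A+A^T)Bx$ that appears in the question.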


$$\eqalign{ S &= \frac{1}{2}(A+A^T) \cr\cr \alpha &= x^TAx \,= x^TSx \cr d\alpha &= 2\,x^TS\,dx \cr\cr f &= \alpha\,Bx \cr df &= \alpha\,B\,dx + Bx\,d\alpha \cr &= \alpha\,B\,dx + Bx\,(2x^TSdx) \cr &= (\alpha\,B + 2\,Bxx^TS)\,dx \cr\cr \frac{\partial f}{\partial x} &= \alpha\,B + 2\,Bxx^TS \cr &= x^TAx\,B + Bxx^T(A+A^T) \cr }$$
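As a quick sanity check of the final formula, consider the scalar case $n=1$ with $A=a$ and $B=b$ (an illustrative special case, not part of the derivation above). Then $f(x) = abx^3$, and

$$\frac{df}{dx} = 3abx^2, \qquad x^TAx\,B + Bxx^T(A+A^T) = (ax^2)\,b + (bx)\,x\,(2a) = 3abx^2,$$

so the two expressions agree.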