Matrix Derivative Product Rule $\frac{d}{dx} \langle x,y\rangle Ax $


Let $A\in\mathbb{R}^{n\times n}$ and $x,y\in\mathbb{R}^{n}$. What is the derivative of the following expression?

$$ \frac{d}{dx} \langle x,y\rangle Ax $$

What I tried:

Applying the product rule doesn't work because the first term (see the next equation) is dimensionally inconsistent, or I am applying it in the wrong way. Let $f(x) = \langle x,y \rangle$ and $g(x) = Ax$; then

$$ \frac{d}{dx} f(x)g(x) = \frac{df(x)}{dx} g(x) + f(x)\frac{dg(x)}{dx} = xAx + \langle x,y \rangle A. $$

Here the first term in the last equation doesn't make sense dimensionally; any help or suggestion would be appreciated.


4 Answers

Best answer:

Avoiding coordinates, one may alternatively compute $d_pf(v)$, where $f(x)=\langle x,y\rangle Ax$. For that purpose take a differentiable path $c$, defined on an interval containing $0$, satisfying $c(0)=p$ and $\dot c(0)=v$, where $\dot c(t)$ is shorthand for $\frac{d}{dt}c(t)$. Now compute $$\begin{align} d_pf(v)&=\frac{d}{dt}\big|_{t=0}f(c(t))\\ &=\frac{d}{dt}\big|_{t=0}\bigl(\langle c(t),y\rangle Ac(t)\bigr)\\ &=\bigl(\langle \dot c(t),y\rangle A c(t)+\langle c(t),y\rangle A\dot c(t)\bigr)\big|_{t=0}\\ &=\langle v,y\rangle Ap+\langle p,y\rangle Av. \end{align}$$

Edit: If you insist on writing the differential without an argument (but should one?), proceed as follows: $$\begin{align} d_pf(v)&=\langle v,y\rangle Ap+\langle p,y\rangle Av\\ &=A\bigl(\langle v,y\rangle p+\langle p,y\rangle v \bigr). \end{align}$$ Observe that $$\langle v,y\rangle p= p\langle y,v\rangle=py^Tv, $$ hence $$\begin{align} d_pf(v)&=A\bigl(py^Tv+\langle p,y\rangle v \bigr)\\ &=A\bigl(py^T+\langle p,y\rangle I\bigr)v \end{align} $$ and finally $$d_pf=A\bigl(py^T+\langle p,y\rangle I \bigr).$$

But, to be honest, I like the expression $d_pf(v)$ more for symmetry reasons. And it works for any inner product.
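Not part of the original answer, but the closed form $d_pf(v)=\langle v,y\rangle Ap+\langle p,y\rangle Av$ is easy to sanity-check numerically: differentiate $f$ along the particular path $c(t)=p+tv$ (any differentiable path with $c(0)=p$, $\dot c(0)=v$ would do) and compare with the formula. The random test data below is an assumption for illustration.

```python
import numpy as np

# Numerical check of d_p f(v) = <v,y> A p + <p,y> A v for f(x) = <x,y> A x,
# using the straight-line path c(t) = p + t v.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
y = rng.standard_normal(n)
p = rng.standard_normal(n)
v = rng.standard_normal(n)

def f(x):
    return np.dot(x, y) * (A @ x)

# Central difference of t |-> f(p + t v) at t = 0; since f is a degree-2
# polynomial in x, this is exact up to floating-point rounding.
h = 1e-6
numeric = (f(p + h * v) - f(p - h * v)) / (2 * h)

# Closed form from the answer.
exact = np.dot(v, y) * (A @ p) + np.dot(p, y) * (A @ v)

print(np.allclose(numeric, exact, atol=1e-6))  # True
```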


In Einstein notation, with the standard basis for which $\langle x,\,y\rangle=x_ky_k$, you're differentiating a vector with $i$th component $x_ky_kA_{il}x_l$ with respect to a vector with $j$th component $x_j$, giving a matrix whose $ij$ entry is $$\delta_{jk}y_kA_{il}x_l+x_ky_kA_{il}\delta_{jl}=y_jA_{il}x_l+x_ky_kA_{ij}=(Axy^T+\langle x,\,y\rangle A)_{ij}.$$ In other words, the derivative is $A(xy^T+\langle x,\,y\rangle I_n)$, with $I_n$ the $n\times n$ identity matrix.
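Not in the original answer, but one can verify the full Jacobian $A(xy^T+\langle x,\,y\rangle I_n)$ column by column: column $j$ is the derivative of $f$ along the basis vector $e_j$. The random test data is an assumption for illustration.

```python
import numpy as np

# Check the Jacobian A (x y^T + <x,y> I_n) of f(x) = <x,y> A x
# against central finite differences, column by column.
rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
y = rng.standard_normal(n)

def f(x):
    return np.dot(x, y) * (A @ x)

J_exact = A @ (np.outer(x, y) + np.dot(x, y) * np.eye(n))

# Column j of the Jacobian is the directional derivative along e_j.
h = 1e-6
J_num = np.column_stack([
    (f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(n)
])

print(np.allclose(J_exact, J_num, atol=1e-6))  # True
```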


Let $$ f(x)=\langle x,y\rangle Ax. $$ Then \begin{eqnarray} \frac{d}{dx}f(x)(u)&=&\lim_{t\to0}\frac{f(x+tu)-f(x)}{t}\\ &=&\lim_{t\to0}\frac{\langle x+tu,y\rangle A(x+tu)-\langle x,y\rangle Ax}{t}\\ &=&\lim_{t\to0}\left(\langle u,y\rangle Ax+\langle x,y\rangle Au+t\langle u,y\rangle Au\right)\\ &=&\langle u,y\rangle Ax+\langle x,y\rangle Au. \end{eqnarray}
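Not in the original answer, but the limit computation above can be illustrated numerically: the difference quotient equals $\langle u,y\rangle Ax+\langle x,y\rangle Au+t\langle u,y\rangle Au$ exactly, so the residual after subtracting the limit is precisely $t\langle u,y\rangle Au$ and shrinks linearly in $t$. The random test data is an assumption for illustration.

```python
import numpy as np

# Watch the difference quotient (f(x + t u) - f(x)) / t converge to
# <u,y> A x + <x,y> A u as t -> 0.
rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
y = rng.standard_normal(n)
u = rng.standard_normal(n)

def f(x):
    return np.dot(x, y) * (A @ x)

limit = np.dot(u, y) * (A @ x) + np.dot(x, y) * (A @ u)

for t in (1e-1, 1e-2, 1e-3):
    quotient = (f(x + t * u) - f(x)) / t
    # Residual is exactly t * <u,y> A u, so it drops by 10x per step.
    residual = quotient - limit
    print(t, np.linalg.norm(residual))
```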


Define the vector-valued function $$\eqalign{ f &= (y^Tx)(Ax) \\ }$$ and calculate its differential and gradient $$\eqalign{ df &= (Ax)(y^Tdx) + (y^Tx)(A\,dx) \\ &= \Big(Axy^T + (y^Tx)A\Big)\,dx \\ &= A\Big(xy^T + (y^Tx)I\Big)\,dx \\ \frac{\partial f}{\partial x} &= A\Big(xy^T + (y^Tx)I\Big) \\ }$$ which is another way to derive JG's result while avoiding index notation.

Your error was to assume that the scalar product rule carries over unchanged to the gradient of a vector-valued function like $f$; it doesn't.

The problem is that a gradient operation changes the tensorial character of any term to which it is applied (i.e. it turns vectors into matrices), and matrix multiplication is inherently non-commutative. Index notation was developed to handle precisely such calculations, but with standard matrix notation you must proceed very cautiously.