Calculating $\frac{\mathrm{d}}{\mathrm{d}\boldsymbol{X}} \mathbb{u}^{T}\boldsymbol{X}\mathbb{u}$ with symmetric $\boldsymbol{X}$

We want to calculate $\frac{\mathrm{d}}{\mathrm{d}\boldsymbol{X}} \mathbb{u}^{T}\boldsymbol{X}\mathbb{u}$ with $\boldsymbol{X}$ being a symmetric matrix.

Let $\circ$ denote the Hadamard product for vectors. Componentwise, the $(i,j)$ entry of the derivative is \begin{align*} \left[\frac{\mathrm{d}}{\mathrm{d}\boldsymbol{X}} \mathbb{u}^{T}\boldsymbol{X}\mathbb{u}\right]_{i j} = &\frac{\partial}{\partial X_{i j}} \sum_{p, q} u_{p} X_{p q} u_{q}=\sum_{p, q} u_{p} \frac{\partial X_{p q}}{\partial X_{i j}} u_{q} \end{align*}

Now, let's analyze $\frac{\partial X_{p q}}{\partial X_{i j}}$. This becomes $1$ both for $p=i \land q=j$ and for $q=i \land p=j$, since $\boldsymbol{X}=\boldsymbol{X}^{\top} \iff X_{i j} = X_{j i}$. We therefore get two Kronecker delta terms, one for each case: $\delta_{p i}\delta_{q j}$ and $\delta_{q i}\delta_{p j}$ respectively.

For $i\neq j$ this yields: \begin{align*} &\sum_{p, q} u_{p}\frac{\partial X_{p q}}{\partial X_{i j}} u_{q} = \sum_{p, q} \left( u_{p}u_{q} \delta_{p i}\delta_{q j} + u_{p}u_{q} \delta_{q i}\delta_{p j} \right) = 2 u_{i} u_{j} = \left[ 2\mathbb{u}\mathbb{u}^{\top} \right]_{i j} \end{align*}

For the diagonal terms we just get the squared elements of $\mathbb{u}$, since for $i=j$ the two cases coincide and only one delta term remains: \begin{align*} \sum_{p, q} u_{p}\frac{\partial X_{p q}}{\partial X_{i i}} u_{q} = \sum_{p, q} u_{p}u_{q} \delta_{p i}\delta_{q i} = u_i u_i = \left[ \mathbb{u} \circ \mathbb{u} \right]_{i} \end{align*}

Combining the two cases, we get the final result by subtracting the squared elements from the diagonal of $2\mathbb{u}\mathbb{u}^\top$ once: \begin{align*} &\frac{\mathrm{d}}{\mathrm{d}\boldsymbol{X}} \mathbb{u}^{T}\boldsymbol{X}\mathbb{u} = 2\mathbb{u}\mathbb{u}^\top - \operatorname{diag}(\mathbb{u}\circ\mathbb{u}) \end{align*}
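The formula above can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes that "derivative with respect to $X_{ij}$" means perturbing $X_{ij}$ and $X_{ji}$ together so that $\boldsymbol{X}$ stays symmetric. The names `quad_form` and `symmetric_gradient_fd`, the size `n = 4`, and the random test data are all hypothetical choices made for illustration.

```python
import numpy as np


def quad_form(u, X):
    """Return the scalar u^T X u."""
    return u @ X @ u


def symmetric_gradient_fd(u, X, eps=1e-6):
    """Central finite differences of u^T X u, with X_ij and X_ji tied together."""
    n = X.shape[0]
    G = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            # Perturbation direction that keeps X symmetric.
            E = np.zeros((n, n))
            E[i, j] = 1.0
            E[j, i] = 1.0  # same entry as above when i == j
            d = (quad_form(u, X + eps * E) - quad_form(u, X - eps * E)) / (2 * eps)
            G[i, j] = d
            G[j, i] = d
    return G


rng = np.random.default_rng(0)
n = 4  # arbitrary size chosen for the check
u = rng.standard_normal(n)
A = rng.standard_normal((n, n))
X = (A + A.T) / 2  # random symmetric matrix

G_fd = symmetric_gradient_fd(u, X)
G_formula = 2 * np.outer(u, u) - np.diag(u * u)
max_abs_error = np.max(np.abs(G_fd - G_formula))
print("max abs difference:", max_abs_error)
```

Under that assumption the two matrices should agree up to floating-point error, which suggests the derivation is consistent with the tied-entry convention.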

I know this contradicts the known result $\frac{\partial \boldsymbol{a}^{\top} \boldsymbol{X} \boldsymbol{b}}{\partial \boldsymbol{X}}=\boldsymbol{a b}^{\top}$, so what did I miss?

There are 1 best solutions below

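That known result is for non-symmetric matrices, where every entry of $\boldsymbol{X}$ is an independent variable. With the way you defined the derivative for a symmetric matrix, where the derivatives are repeated in the lower and upper triangular parts of the matrix, the correct answer is $\frac{\partial \boldsymbol{a}^{\top} \boldsymbol{X} \boldsymbol{b}}{\partial \boldsymbol{X}}=\boldsymbol{a b}^{\top} + \boldsymbol{b a}^{\top} - \operatorname{diag}(\boldsymbol{a}\circ\boldsymbol{b})$. For $\boldsymbol{a}=\boldsymbol{b}=\mathbb{u}$ this reduces to your result $2\mathbb{u}\mathbb{u}^\top - \operatorname{diag}(\mathbb{u}\circ\mathbb{u})$, so there is no contradiction: the two formulas answer different questions.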
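As a small worked example (added for illustration; the entries $x_{11}, x_{12}, x_{22}$ and the $2\times 2$ size are choices made here), take $\boldsymbol{X} = \begin{pmatrix} x_{11} & x_{12} \\ x_{12} & x_{22} \end{pmatrix}$, so that there are only three independent variables: \begin{align*} \boldsymbol{a}^{\top} \boldsymbol{X} \boldsymbol{b} &= a_1 b_1 x_{11} + (a_1 b_2 + a_2 b_1) x_{12} + a_2 b_2 x_{22}, \\ \frac{\partial\, \boldsymbol{a}^{\top} \boldsymbol{X} \boldsymbol{b}}{\partial x_{11}} &= a_1 b_1, \qquad \frac{\partial\, \boldsymbol{a}^{\top} \boldsymbol{X} \boldsymbol{b}}{\partial x_{12}} = a_1 b_2 + a_2 b_1, \qquad \frac{\partial\, \boldsymbol{a}^{\top} \boldsymbol{X} \boldsymbol{b}}{\partial x_{22}} = a_2 b_2. \end{align*} Arranging these in a matrix, with the $x_{12}$ derivative placed in both off-diagonal positions, gives \begin{align*} \begin{pmatrix} a_1 b_1 & a_1 b_2 + a_2 b_1 \\ a_1 b_2 + a_2 b_1 & a_2 b_2 \end{pmatrix} = \boldsymbol{a b}^{\top} + \boldsymbol{b a}^{\top} - \operatorname{diag}(\boldsymbol{a}\circ\boldsymbol{b}), \end{align*} since the diagonal of $\boldsymbol{a b}^{\top} + \boldsymbol{b a}^{\top}$ is $2 a_i b_i$ and one copy of $a_i b_i$ has to be removed.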

By repeated derivatives, I mean that for $i\neq j$ the derivative $\frac{\partial \boldsymbol{a}^{\top} \boldsymbol{X} \boldsymbol{b}}{\partial X_{ij}} = \frac{\partial \boldsymbol{a}^{\top} \boldsymbol{X} \boldsymbol{b}}{\partial X_{ji}}$ appears twice in the result, at positions $(i,j)$ and $(j,i)$, while for $i=j$ the derivative $\frac{\partial \boldsymbol{a}^{\top} \boldsymbol{X} \boldsymbol{b}}{\partial X_{ii}}$ appears only once.
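The difference between the two conventions can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not a definitive implementation: because $\boldsymbol{a}^{\top} \boldsymbol{X} \boldsymbol{b}$ is linear in $\boldsymbol{X}$, its derivative along a direction $\boldsymbol{E}$ is simply $\boldsymbol{a}^{\top} \boldsymbol{E} \boldsymbol{b}$, so no finite differences are needed. The function names `gradient_independent` and `gradient_tied` and the example vectors are hypothetical.

```python
import numpy as np


def bilinear(a, X, b):
    """Return the scalar a^T X b."""
    return a @ X @ b


def gradient_independent(a, b):
    """All n*n entries of X are independent: direction E has a single 1 at (i, j)."""
    n = a.size
    G = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            E = np.zeros((n, n))
            E[i, j] = 1.0
            G[i, j] = bilinear(a, E, b)  # exact, since the map is linear in X
    return G


def gradient_tied(a, b):
    """X_ij and X_ji are the same variable: direction E has 1s at (i, j) and (j, i)."""
    n = a.size
    G = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            E = np.zeros((n, n))
            E[i, j] = 1.0
            E[j, i] = 1.0  # same entry as above when i == j
            G[i, j] = bilinear(a, E, b)
    return G


# Arbitrary example vectors chosen for illustration.
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

G_independent = gradient_independent(a, b)
G_tied = gradient_tied(a, b)

print(np.allclose(G_independent, np.outer(a, b)))
print(np.allclose(G_tied, np.outer(a, b) + np.outer(b, a) - np.diag(a * b)))
```

The first function corresponds to the known result $\boldsymbol{a b}^{\top}$, the second to the symmetric formula above. The only change between them is whether the perturbation direction touches one entry or both mirrored entries.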