The matrix identity $\nabla_A \text{tr}AB = B^T$ when A is symmetric


Suppose $A,B \in \mathbb{R}^{n \times n}$ and $f: \mathbb{R}^{n \times n} \to \mathbb{R}$. Define $\nabla_A f(A) \in \mathbb{R}^{n \times n}$ by $(\nabla_A f(A))_{ij} = \frac{\partial{f(A)}}{\partial{A_{ij}}}$. Consider the identity $\nabla_A \text{tr}AB = B^T$, according to this note. It can be derived from $$ \begin{align} f(A) = \text{tr}AB &=\sum_{i=1}^nA_{1i}B_{i1} +\sum_{i=1}^nA_{2i}B_{i2} + \cdots + \sum_{i=1}^nA_{ni}B_{in} \\ &= \sum_{j=1}^{n}\sum_{i=1}^{n} A_{ji}B_{ij}.\\ \end{align} $$
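As a quick numerical sanity check (a sketch using NumPy; the matrix size and finite-difference step `h` are arbitrary choices, not from the note), we can confirm the identity entrywise by perturbing each $A_{ij}$ independently:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
h = 1e-6

# Finite-difference gradient of f(A) = tr(AB) with respect to each entry A_ij.
grad = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        Ah = A.copy()
        Ah[i, j] += h
        grad[i, j] = (np.trace(Ah @ B) - np.trace(A @ B)) / h

# The gradient matches B^T entrywise.
assert np.allclose(grad, B.T, atol=1e-4)
```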

If we assume $A$ is a symmetric matrix, I'm confused about whether the following is right, because it contradicts $\nabla_A \text{tr}AB = B^T$:

$\nabla_A \text{tr}AB = B + B^T$, because $A_{ij} = A_{ji}$ implies $\frac{\partial{f(A)}}{\partial{A_{ij}}} = B_{ij} + B_{ji}$.
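The computation behind the claimed $B + B^T$ can also be seen numerically (a sketch, with arbitrary sizes and step `h`): if the symmetry $A_{ij}=A_{ji}$ is enforced during the perturbation, i.e. both entries move together, the directional change is $B_{ij}+B_{ji}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
B = rng.standard_normal((n, n))
S = rng.standard_normal((n, n))
A = (S + S.T) / 2          # a symmetric point
h = 1e-6

# Perturb A_ij AND A_ji together, i.e. stay within symmetric matrices:
i, j = 0, 1
Ah = A.copy()
Ah[i, j] += h
Ah[j, i] += h
d = (np.trace(Ah @ B) - np.trace(A @ B)) / h

# The coupled perturbation yields B_ij + B_ji, matching B + B^T off-diagonal.
assert np.isclose(d, B[i, j] + B[j, i], atol=1e-4)
```

This is exactly the tension the question raises: whether the two entries should be perturbed together or independently.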


There are 2 best solutions below

BEST ANSWER

The function $$ f : \mathbb{R}^{n\times n} \rightarrow \mathbb{R}, \qquad A \mapsto f(A) , \quad A \in \mathbb{R}^{n \times n} $$ can be regarded as a multivariable function of the $n^2$ coordinates $A_{ij}$. The partial derivative $\frac{\partial f(A)}{\partial A_{ij}}$ is defined as $$ \frac{\partial f(A)}{\partial A_{ij}} := \lim_{h \rightarrow 0}\frac{f(A_{11},\cdots, A_{ij}+h, \cdots ,A_{nn}) - f(A_{11},\cdots, A_{ij}, \cdots ,A_{nn})}{h}. $$ When you take the derivative of the function $f(A) = \sum A_{ij}B_{ji}$ (regarding the $A_{ij}$'s as independent variables) and evaluate the derivative at a point $A$ such that $A_{ij}=A_{ji}$, you get (for example, for a $2\times 2$ matrix),

$\frac{\partial f(A)}{\partial A_{12}} = $

\begin{align} &\lim_{h \rightarrow 0}\frac{(A_{11}B_{11}+ (A_{12}+h)B_{21} + A_{21}B_{12}+ A_{22}B_{22})-(A_{11}B_{11}+ A_{12}B_{21} + A_{21}B_{12}+ A_{22}B_{22})}{h} \\ &= \lim_{h \rightarrow 0}\frac{(A_{11}B_{11}+ (A_{12}+h)B_{21} + \color{red}{A_{12}}B_{12}+ A_{22}B_{22})-(A_{11}B_{11}+ A_{12}B_{21} + \color{red}{A_{12}}B_{12}+ A_{22}B_{22})}{h}\\ &=B_{21} \end{align}

whereas $\frac{\partial f(A)}{\partial A_{21}} = B_{12}$. We see that the symmetry condition $A_{ij}=A_{ji}$ does not affect the evaluation of the limit. Your earlier computation, $$ \frac{\partial f(A)}{\partial A_{ij}} = \frac{\partial}{\partial A_{ij}} \Big(\sum_{k,l=1}^{n} A_{kl}B_{lk}\Big) = \frac{\partial}{\partial A_{ij}}(\cdots + A_{ij}B_{ji} + \cdots + A_{ji}B_{ij} + \cdots) = \frac{\partial}{\partial A_{ij}} (\cdots + A_{ij}(B_{ij}+B_{ji}) + \cdots) = B_{ij}+B_{ji}, $$ goes wrong because you treated $A_{ji}$ as a varying variable via the substitution $A_{ji} = A_{ij}$.
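The $2\times 2$ limit above can be checked directly (a sketch; the symmetric point and step `h` are arbitrary choices): perturbing only $A_{12}$ while leaving $A_{21}$ fixed returns $B_{21}$, even though the base point is symmetric.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((2, 2))
s = rng.standard_normal(3)
A = np.array([[s[0], s[1]], [s[1], s[2]]])  # a symmetric point
h = 1e-6

# Perturb only A_12, leaving A_21 fixed (entries treated as independent):
Ah = A.copy()
Ah[0, 1] += h
d = (np.trace(Ah @ B) - np.trace(A @ B)) / h

# The limit is B_21, unaffected by the symmetry of the point A.
assert np.isclose(d, B[1, 0], atol=1e-4)
```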


You've subtly confused two different functions. To see what's going on, it is convenient to define an "instantiation" map (I will work with $2\times 2$ matrices for simplicity): $$A(a_{11},a_{12},a_{21},a_{22}) = \begin{pmatrix}a_{11} & a_{12} \\ a_{21} & a_{22}\end{pmatrix},$$ which simply takes the numerical entries of the matrix and outputs them as a structured matrix; it instantiates the matrix $A$. With respect to this map, the derivative is unambiguously $$\frac{\partial A}{\partial a_{ij}} = E_{ij},$$ where $E_{ij}$ is the matrix with zeros everywhere except the $ij$th entry, which is $1$. This is really what is meant by the derivative with respect to the $ij$th component. It is always taken with respect to this instantiation map, and essentially the sole purpose of the derivative is to pick out the $ij$th entry.
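The instantiation map and its derivative are easy to realize concretely (a sketch; the function name `A_of`, the evaluation point, and the step `h` are illustrative choices):

```python
import numpy as np

def A_of(a11, a12, a21, a22):
    # Instantiation map: four free entries in, structured 2x2 matrix out.
    return np.array([[a11, a12], [a21, a22]])

a = (0.5, -1.0, 2.0, 0.3)
h = 1e-6

# Finite-difference derivative of the map with respect to a_12 only.
dA_da12 = (A_of(a[0], a[1] + h, a[2], a[3]) - A_of(*a)) / h

# dA/da_12 is the basis matrix E_12: a 1 in the (1,2) entry, 0 elsewhere.
assert np.allclose(dA_da12, np.array([[0., 1.], [0., 0.]]), atol=1e-4)
```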

Now of course, real matrices can be more complicated. For example, we could have the matrix $$B = \begin{pmatrix}b & c \\ c & b\end{pmatrix}.$$ This matrix $B$ is really just our previous map $A$ evaluated at the values $(b,c,c,b)$.

Notice that the situation here is a bit different, though. For example, it makes no sense to differentiate $B$ with respect to its entry $b_{12}$. Why? Because technically, $b_{12}$ is not a free variable. What you have instead is the variable $c$, upon which $b_{12}$ (and $b_{21}$) depends. When you actually differentiate with respect to $c$, however, you need to use the chain rule to get $$\frac{\partial B}{\partial c} = \sum_{i,j}\frac{\partial A}{\partial a_{ij}}\frac{\partial a_{ij}}{\partial c} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$ Of course, what people commonly write for the above derivative, for lack of a better general notation, is just $\frac{\partial B}{\partial b_{ij}}$. But note that this is very different from how we used $\frac{\partial A}{\partial a_{ij}}$ previously. The latter treats each entry as a free independent variable, while the former has the hidden meaning of differentiating with respect to an underlying parameter upon which many different entries can depend.
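The chain-rule computation $\frac{\partial B}{\partial c} = E_{12} + E_{21}$ can be verified numerically too (a sketch; the function name `B_of`, the values of $b$ and $c$, and the step `h` are illustrative choices):

```python
import numpy as np

def B_of(b, c):
    # Instantiation map for the symmetric family B(b, c).
    return np.array([[b, c], [c, b]])

b, c = 0.7, -1.3
h = 1e-6

# Finite-difference derivative of the matrix-valued map with respect to c.
dB_dc = (B_of(b, c + h) - B_of(b, c)) / h

# Chain rule: dB/dc = E_12 * (da_12/dc) + E_21 * (da_21/dc) = E_12 + E_21.
assert np.allclose(dB_dc, np.array([[0., 1.], [1., 0.]]), atol=1e-4)
```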

So when you differentiated your symmetric matrix, you technically didn't differentiate with respect to the $ij$th entry, but rather the common variable upon which the $ij$th and $ji$th entry depends. This is different from the original intention of what is meant by $\frac{\partial{A}}{\partial a_{ij}}$, which treats the $ij$th entry as independent from all others.