Proof that there is only one function that satisfies the properties of the trace.


I've been reading Mathematics for Machine Learning, by Deisenroth, Aldo Faisal and Cheng Soon Ong, and it contains the following lines about the trace of square matrices that I need help with:

The trace satisfies the following properties:
$\cdot$ tr$(A+B)$ = tr$(A)$ + tr$(B)$
$\cdot$ tr$(\alpha A)$ = $\alpha$tr$(A)$
$\cdot$ tr$(I_n)$ = $n$
$\cdot$ tr$(AB)$ = tr$(BA)$

It can be shown that only one function satisfies these four properties together: the trace (Gohberg et al., 2012).

Now I can see how the four properties hold, based on the trace being the sum of the diagonal entries of a square matrix, but the trouble I'm having is with the last line, which states that no other function satisfies these properties. I'm unable to find a proof of this claim and was hoping someone could help me with it here. I'm also not quite clear on what work the authors are referring to as "Gohberg et al., 2012", so some info or links on that would be really helpful too.
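As a quick numeric sanity check (not a proof), the four properties can be verified with NumPy on arbitrary example matrices; the matrices and seed below are illustrative choices, not from the book:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
alpha = 2.5

# Additivity: tr(A + B) = tr(A) + tr(B)
assert np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))
# Homogeneity: tr(alpha * A) = alpha * tr(A)
assert np.isclose(np.trace(alpha * A), alpha * np.trace(A))
# Identity: tr(I_n) = n
assert np.isclose(np.trace(np.eye(n)), n)
# Cyclic property: tr(AB) = tr(BA)
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
```

Of course, this only confirms that the trace has these properties for particular matrices; the question of uniqueness is what the answers below address.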

Thank you!

Best answer:

Let $\Psi\colon\Bbb R^{n\times n}\longrightarrow\Bbb R$ be a map satisfying those four properties. The first two properties tell you that $\Psi$ is a linear map. The fourth property, together with linearity, then tells you that $\Psi(M)=0$ whenever $M$ can be written as a linear combination of matrices of the form $AB-BA$. It is not hard to prove that the vector space $V$ spanned by the matrices of that form is exactly the space of matrices whose trace is $0$. But $\Bbb R^{n\times n}=V\oplus\Bbb R\operatorname{Id}$. In other words, any matrix $M\in\Bbb R^{n\times n}$ can be written in one and only one way as a sum $M_0+\lambda_M\operatorname{Id}$, with $\operatorname{tr}(M_0)=0$ and $\lambda_M\in\Bbb R$; taking traces shows $\lambda_M=\operatorname{tr}(M)/n$. But then$$\Psi(M)=\Psi(M_0)+\lambda_M\Psi(\operatorname{Id})=n\lambda_M=\operatorname{tr}(M).$$
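The decomposition at the heart of this argument is easy to check numerically. The sketch below (with an arbitrary random matrix as the example) verifies that $M_0 = M - \lambda_M \operatorname{Id}$ with $\lambda_M = \operatorname{tr}(M)/n$ is trace-free, that a commutator $AB-BA$ has trace $0$, and that $n\lambda_M$ recovers $\operatorname{tr}(M)$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
M = rng.standard_normal((n, n))
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# Unique decomposition M = M0 + lambda_M * Id with tr(M0) = 0
lam = np.trace(M) / n
M0 = M - lam * np.eye(n)
assert np.isclose(np.trace(M0), 0.0)

# Commutators AB - BA are trace-free, so they lie in V
assert np.isclose(np.trace(A @ B - B @ A), 0.0)

# Any Psi with the four properties must return n * lambda_M = tr(M)
assert np.isclose(n * lam, np.trace(M))
```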

My guess is that the reference "Gohberg et al., 2012" means the textbook Traces and Determinants of Linear Operators, by Israel Gohberg, Seymour Goldberg, and Nahum Krupnik, published by Birkhäuser. In any case, take a look at the bibliography of Mathematics for Machine Learning.

Another answer:

Let $e_{i,j}$ be the $n$ by $n$ matrix with $1$ at the $(i,j)$ entry and $0$ elsewhere, and similarly let $e_i \in \mathbb{R}^n$ have $1$ in the $i$-th component and $0$ elsewhere. Note that $e_{i,j} = e_i e_j^T$, and so by the cyclic property $\operatorname{tr}(e_{i,j}) = \operatorname{tr}(e_j^T e_i) = \delta_{i,j}$ (that is, $1$ if $i=j$, otherwise $0$). Writing a general $n$ by $n$ matrix as a linear combination of these $e_{i,j}$ and using linearity, we see that any function satisfying the four properties must map a matrix to the sum of its diagonal entries.
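This basis argument can also be checked numerically. The sketch below builds each $e_{i,j}$ as the outer product $e_i e_j^T$, confirms its trace is $\delta_{i,j}$, and then checks on an example matrix that summing diagonal entries matches the trace:

```python
import numpy as np

n = 3
for i in range(n):
    for j in range(n):
        e_i = np.zeros(n)
        e_i[i] = 1.0
        e_j = np.zeros(n)
        e_j[j] = 1.0
        E_ij = np.outer(e_i, e_j)  # the matrix e_i e_j^T
        # tr(e_{i,j}) = delta_{i,j}
        assert np.isclose(np.trace(E_ij), 1.0 if i == j else 0.0)

# Expanding M in the basis {e_{i,j}} and using linearity, the function
# must return the sum of the diagonal entries of M.
M = np.arange(float(n * n)).reshape(n, n)
assert np.isclose(np.trace(M), M.diagonal().sum())
```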