I've been reading *Mathematics for Machine Learning* by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong, and it contains the following passage about the trace of square matrices that I need help with:
The trace satisfies the following properties:
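$\cdot$ $\operatorname{tr}(A+B)=\operatorname{tr}(A)+\operatorname{tr}(B)$
$\cdot$ $\operatorname{tr}(\alpha A)=\alpha\operatorname{tr}(A)$
$\cdot$ $\operatorname{tr}(I_n)=n$
$\cdot$ $\operatorname{tr}(AB)=\operatorname{tr}(BA)$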
It can be shown that only one function satisfies these four properties together: the trace (Gohberg et al., 2012).
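A quick numerical sanity check (not a proof) of both halves of the quoted claim, assuming NumPy is available; the seed, the size $n=3$, and the 50 random pairs are arbitrary choices. Part 2 uses the fact that properties 1 and 2 make $\Psi$ linear, so $\Psi(M)=\sum_{i,j}W_{ij}M_{ij}$ for some $n\times n$ matrix $W$. Property 4 then says $\Psi(XY-YX)=0$ and property 3 says $\Psi(I_n)=n$, which are linear equations in the entries of $W$; a least-squares solve shows they have the single solution $W=I_n$, i.e. $\Psi=\operatorname{tr}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

# Part 1: the trace has the four properties (checked on random matrices).
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
alpha = 2.5
assert np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))  # additivity
assert np.isclose(np.trace(alpha * A), alpha * np.trace(A))     # homogeneity
assert np.trace(np.eye(n)) == n                                  # tr(I_n) = n
assert np.isclose(np.trace(A @ B), np.trace(B @ A))             # tr(AB) = tr(BA)

# Part 2: the properties pin the function down.
# A linear Psi has the form Psi(M) = sum(W * M) for some n x n matrix W.
# Impose Psi(XY - YX) = 0 for many random pairs and Psi(I_n) = n, then solve for W.
rows, rhs = [], []
for _ in range(50):
    X = rng.standard_normal((n, n))
    Y = rng.standard_normal((n, n))
    rows.append((X @ Y - Y @ X).ravel())  # constraint <W, XY - YX> = 0
    rhs.append(0.0)
rows.append(np.eye(n).ravel())            # constraint <W, I_n> = n
rhs.append(float(n))

W_flat, _, rank, _ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
W = W_flat.reshape(n, n)

print(rank)                       # 9 = n*n, so the constraints determine W uniquely
print(np.allclose(W, np.eye(n)))  # True: Psi(M) = sum(I_n * M) = tr(M)
```

Because the constraint matrix has full rank $n^2$, there is no room for any other linear functional, which is exactly the uniqueness statement (for this particular $n$).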
Now, I can see why the four properties hold, since the trace is the sum of the diagonal entries of a square matrix. What I'm having trouble with is the last line, which states that no other function satisfies these properties. I'm unable to find a proof of this and was hoping someone could help me with it here. I'm also not clear on which work the authors are referring to as Gohberg et al. (2012), so some information or links on that would be really helpful too.
Thank you!
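Let $\Psi\colon\Bbb R^{n\times n}\longrightarrow\Bbb R$ be a map such that those four properties hold. The first two properties tell you that $\Psi$ is a linear map. The fourth property then tells you that $\Psi(M)=0$ whenever $M$ can be written as a linear combination of matrices of the form $AB-BA$. It is not hard to prove that the vector space $V$ spanned by the matrices of that form is the space of those matrices whose trace is $0$: every $AB-BA$ has trace $0$, and conversely, writing $E_{ij}$ for the matrix with a $1$ in position $(i,j)$ and $0$ elsewhere, we have $E_{ij}=E_{ii}E_{ij}-E_{ij}E_{ii}$ for $i\neq j$ and $E_{ii}-E_{jj}=E_{ij}E_{ji}-E_{ji}E_{ij}$, and these matrices span the trace-zero matrices. Moreover, $\Bbb R^{n\times n}=V\oplus\Bbb R\operatorname{Id}$. In other words, any matrix $M\in\Bbb R^{n\times n}$ can be written in one and only one way as a sum $M_0+\lambda_M\operatorname{Id}$, with $\operatorname{tr}(M_0)=0$ and $\lambda_M\in\Bbb R$ (namely $\lambda_M=\operatorname{tr}(M)/n$). But then, since $\Psi(M_0)=0$ and, by the third property, $\Psi(\operatorname{Id})=n$, $$\Psi(M)=\Psi(M_0)+\lambda_M\Psi(\operatorname{Id})=n\lambda_M=\operatorname{tr}(M).$$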
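The key step, that the commutators $AB-BA$ span exactly the trace-zero matrices, can be checked concretely. Below is a minimal sketch, assuming NumPy; the helper names `unit`, `comm`, and `as_commutator_sum` are hypothetical, chosen only for illustration, and indices are 0-based. Writing $E_{ij}$ for the matrix unit with a $1$ in position $(i,j)$, it splits a random $M$ into $M_0+\lambda_M\operatorname{Id}$ with $\lambda_M=\operatorname{tr}(M)/n$, then rebuilds $M_0$ from commutators using $E_{ij}=[E_{ii},E_{ij}]$ for $i\neq j$ and $E_{ii}-E_{\ell\ell}=[E_{i\ell},E_{\ell i}]$, where $\ell$ is the last index.

```python
import numpy as np

def unit(n: int, i: int, j: int) -> np.ndarray:
    """Matrix unit E_ij: a 1 in position (i, j) and 0 elsewhere."""
    E = np.zeros((n, n))
    E[i, j] = 1.0
    return E

def comm(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Commutator [X, Y] = XY - YX."""
    return X @ Y - Y @ X

def as_commutator_sum(M0: np.ndarray) -> np.ndarray:
    """Rebuild a trace-zero matrix M0 as a linear combination of commutators."""
    n = M0.shape[0]
    total = np.zeros((n, n))
    # Off-diagonal entries: E_ij = [E_ii, E_ij] for i != j.
    for i in range(n):
        for j in range(n):
            if i != j:
                total += M0[i, j] * comm(unit(n, i, i), unit(n, i, j))
    # Diagonal entries: because the diagonal sums to 0,
    # diag(M0) = sum_{i < last} m_ii (E_ii - E_last,last), and
    # E_ii - E_last,last = [E_i,last, E_last,i].
    last = n - 1
    for i in range(last):
        total += M0[i, i] * comm(unit(n, i, last), unit(n, last, i))
    return total

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
lam = np.trace(M) / n        # lambda_M
M0 = M - lam * np.eye(n)     # trace-zero part of M

print(np.isclose(np.trace(M0), 0.0))           # True
print(np.allclose(as_commutator_sum(M0), M0))  # True: M0 lies in the span of commutators
# Any Psi with the four properties kills M0 and sends Id to n, so Psi(M) = n * lam:
print(np.isclose(n * lam, np.trace(M)))        # True
```

Since every term in `as_commutator_sum` is a multiple of some $XY-YX$, linearity plus $\Psi(XY)=\Psi(YX)$ force $\Psi(M_0)=0$, leaving only $\lambda_M\Psi(\operatorname{Id})=n\lambda_M=\operatorname{tr}(M)$.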
As for the reference, my guess is that Gohberg et al. (2012) is the textbook *Traces and Determinants of Linear Operators* by Israel Gohberg, Seymour Goldberg, and Nahum Krupnik, published by Birkhäuser. In any case, check the bibliography of *Mathematics for Machine Learning* to confirm.