What is the motivation behind the standard inner product defined on matrices?

Question

320 Views Asked by Bumbble Comm At 28 Mar 2026 - 11:33

We define the standard inner product on matrices in $\mathbb R^{m\times n}$ by

$$ \langle A\mid B \,\rangle = \mbox{tr} \left(A\, B^{\mkern 2mu\mathrm t} \right)$$

What is the motivation behind defining it in such a way? Does it have any similarity with the inner product defined over $\mathbb R^n$?
What is the geometrical meaning of this inner product?

I am new to this topic and want to develop an intuition for why this definition fits naturally alongside the other standard inner products, so that I have less to memorise.

Can someone please help me a bit on this topic?

There are 3 best solutions below

Bumbble Comm On 19 Jan 2020 - 6:12

Your questions are a bit weird. All inner products have the same geometric meaning, namely, each inner product defines a measure of orthogonality. I don't know what else you are expecting.

In fact, on a finite-dimensional vector space over $\mathbb R$, there is essentially only one inner product, namely, the Euclidean inner product. Let $V$ be an $N$-dimensional inner product space equipped with an inner product $\langle\cdot,\cdot\rangle$. Then there exists an orthonormal basis $\{v_1,v_2,\ldots,v_N\}$ of $V$ with respect to this inner product. Let $\{e_1,e_2,\ldots,e_N\}$ denote the standard basis of $\mathbb R^N$ and let $(\cdot,\cdot)$ denote the Euclidean inner product $(x,y)=\sum_{i=1}^Nx_iy_i$. Then the linear map $L:V\to\mathbb R^N$ defined by $Lv_i=e_i$ is an isometry, with $\langle u,v\rangle=(Lu,Lv)$. So, the inner product $\langle\cdot,\cdot\rangle$ on $V$ is just the Euclidean inner product in disguise. Different inner products just use different isometries $L$.

E.g. if $V=\mathbb R^n$ and $P$ is a symmetric positive definite matrix, then $\langle x,y\rangle:=x^TPy$ defines an inner product on $V$, but that is actually equal to the Euclidean inner product $(Lx,Ly)$ where $L$ is the linear map $v\mapsto P^{1/2}v$.
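The identity in this example follows in one line, using only that $P^{1/2}$ is symmetric and squares to $P$:

$$ (P^{1/2}x,P^{1/2}y)=(P^{1/2}x)^T(P^{1/2}y)=x^T(P^{1/2})^TP^{1/2}y=x^TPy. $$

As an illustration with numbers chosen here (they are not from the question), take $P=\operatorname{diag}(4,9)$, so $P^{1/2}=\operatorname{diag}(2,3)$, with $x=(1,1)^T$ and $y=(1,2)^T$:

$$ x^TPy=4\cdot1\cdot1+9\cdot1\cdot2=22,\qquad (P^{1/2}x,P^{1/2}y)=\bigl((2,3)^T,(2,6)^T\bigr)=4+18=22. $$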

In your case, the isometry $L$ is just the vectorization operator, commonly denoted by "$\operatorname{vec}$", such that $L(A)=\operatorname{vec}(A)$ is the $mn\times 1$ vector obtained by stacking the columns of $A$ on top of one another. In other words, if you reshape $A$ and $B$ as two column vectors $\mathbf a,\mathbf b\in\mathbb R^{mn}$, then $\langle A,B\rangle$ is simply the Euclidean inner product $(\mathbf a,\mathbf b)$.
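A minimal sketch of this identification in plain Python, assuming matrices are stored as lists of rows. The helper names (`vec`, `trace_inner`, and so on) and the sample matrices are illustrative choices, not part of the question. It checks that $\operatorname{tr}(AB^T)$ agrees with the Euclidean inner product of the column-stacked vectors.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    Yt = transpose(Y)
    return [[sum(x * y for x, y in zip(row, col)) for col in Yt] for row in X]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def vec(M):
    """Stack the columns of M on top of one another (column-major order)."""
    return [entry for col in zip(*M) for entry in col]

def dot(a, b):
    """Euclidean inner product of two flat vectors."""
    return sum(x * y for x, y in zip(a, b))

def trace_inner(A, B):
    """The matrix inner product <A, B> = tr(A B^T)."""
    return trace(matmul(A, transpose(B)))

# Two sample 2x3 matrices (arbitrary illustrative values).
A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8, 9], [10, 11, 12]]

print(vec(A))                   # [1, 4, 2, 5, 3, 6]
print(trace_inner(A, B))        # 217
print(dot(vec(A), vec(B)))      # 217
```

Stacking rows instead of columns would give the same number, since both matrices are reordered in the same way.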

If you want, you may interpret $\langle A,B\rangle$ as some sort of combined measurement of orthogonality, but that is probably only muddying the waters:

Let $\{w_1,\ldots,w_n\}$ be any orthonormal basis of $\mathbb R^n$ and let $W$ be the orthogonal matrix whose columns are $w_1,\ldots,w_n$. Then $$ \langle A,B\rangle=\operatorname{tr}(AB^T)=\operatorname{tr}(B^TA)=\operatorname{tr}(W^TB^TAW)=\sum_k w_k^TB^TAw_k=\sum_k(Aw_k,Bw_k). $$ Therefore $\langle A,B\rangle$ measures the extent of orthogonality between $Aw_k$ and $Bw_k$ for each $k$ and produces a combined measurement.

Alternatively, when $A=ux^T$ and $B=vy^T$ are rank-one matrices, $$ \langle A,B\rangle=(x^Ty)(v^Tu)=(x,y)(u,v). $$ Therefore, $\langle A,B\rangle$ measures not only the extent of orthogonality between the vectors $u$ and $v$ in $\mathbb R^m$, but also the extent of orthogonality between the linear functionals $w\mapsto x^Tw$ and $w\mapsto y^Tw$ in $(\mathbb R^n)^\ast$, and these two measurements are multiplied together. In general, $A$ and $B$ can be written as sums of rank-one matrices, and the measurements between the rank-one components are summed up to produce the final result.
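Both interpretations above can be checked numerically. The sketch below uses exact rational arithmetic from Python's `fractions` module. The orthonormal basis, the matrices, and the helper names are all illustrative assumptions rather than anything fixed by the answer.

```python
from fractions import Fraction as F

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(x * y for x, y in zip(row, col)) for col in Yt] for row in X]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def inner(A, B):
    """<A, B> = tr(B^T A)."""
    return trace(matmul(transpose(B), A))

def apply(M, v):
    """Matrix-vector product M v."""
    return [dot(row, v) for row in M]

def outer(u, x):
    """Rank-one matrix u x^T."""
    return [[ui * xj for xj in x] for ui in u]

# An orthonormal basis of R^2 with rational entries (a 3-4-5 rotation).
w = [[F(3, 5), F(4, 5)], [F(-4, 5), F(3, 5)]]

# Two sample 3x2 matrices, so m = 3 and n = 2.
A = [[1, 2], [3, 4], [5, 6]]
B = [[0, 1], [1, 0], [2, -1]]

# First interpretation: <A, B> = sum_k (A w_k, B w_k).
basis_sum = sum(dot(apply(A, wk), apply(B, wk)) for wk in w)
print(inner(A, B), basis_sum)            # 9 9

# Second interpretation: for rank-one matrices, <u x^T, v y^T> = (x, y)(u, v).
u, x = [1, 2, 3], [1, -1]
v, y = [0, 1, 1], [2, 3]
rank_one = inner(outer(u, x), outer(v, y))
print(rank_one, dot(x, y) * dot(u, v))   # -5 -5
```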

Bumbble Comm On 19 Jan 2020 - 6:25

Note that $\operatorname{tr}AB^t=\sum_{ij}A_{ij}B_{ij}$ or, if we flatten the two indices into a single index $k$ running from $1$ to $mn$, $\sum_k A_kB_k$. This is the usual "dot product" on $\Bbb R^{mn}$, with its usual geometric interpretation. We just have to identify $\Bbb R^{m\times n}$ with $\Bbb R^{mn}$.
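A small worked example, with matrices chosen here purely for illustration, makes the entrywise formula concrete:

$$ A=\begin{pmatrix}1&2\\3&4\end{pmatrix},\quad B=\begin{pmatrix}5&6\\7&8\end{pmatrix},\quad AB^t=\begin{pmatrix}1\cdot5+2\cdot6&1\cdot7+2\cdot8\\3\cdot5+4\cdot6&3\cdot7+4\cdot8\end{pmatrix}=\begin{pmatrix}17&23\\39&53\end{pmatrix}, $$

$$ \operatorname{tr}AB^t=17+53=70=1\cdot5+2\cdot6+3\cdot7+4\cdot8=\sum_{ij}A_{ij}B_{ij}. $$

Only the diagonal entries of $AB^t$ survive the trace, and the $i$-th diagonal entry is exactly the dot product of the $i$-th row of $A$ with the $i$-th row of $B$.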

**Bumbble Comm** · Accepted Answer

Note that for two diagonal matrices $A=diag(\lambda_1,\ldots,\lambda_n)$ and $B=diag(\mu_1,\ldots,\mu_n)$, you get that $tr A^T B = \sum\lambda_i\mu_i = \left<v,w\right>$ where $v=(\lambda_1,\ldots,\lambda_n)$ and $w=(\mu_1,\ldots,\mu_n)$.

Choose an orthonormal basis $B$ and consider the set $V_B$ of all matrices which are diagonal with respect to this basis. It is easy to see that $V_B$ is a subspace (and even a subalgebra) which is isometric to $\mathbb{R}^n$ by identifying the vector $v$ with the matrix $diag(v)$ which has the elements of $v$ on its diagonal. The inner product you described (which is called the Hilbert-Schmidt inner product, by the way) is then identified with the usual inner product on $\mathbb{R}^n$.

This works for any choice of orthonormal basis. Recalling that two diagonalizable matrices commute if and only if there is a basis in which both are diagonal simultaneously, we can say that this inner product is a generalization to the non-commuting case. Or rather, that the inner product on $\mathbb{R}^n$ is the Hilbert-Schmidt product restricted to the commuting case.

This still leaves open the question: why is one of the matrices transposed? We could just define the inner product to be $tr AB$. Off the top of my head I can come up with three justifications for this choice:

1. The inner product should remain the same if we apply the same (orthogonal) basis transformation to both matrices. If $O$ is an orthogonal change-of-basis matrix from the basis $B$ to the basis $B'$, then we already know that in $\mathbb{R}^n$ it holds that $\left<Ov,Ow\right>=\left<v,w\right>$ for any two vectors $v,w$. In the matrix space this manifests in the fact that the map $M\mapsto OMO^t$ is an isometry from $V_B$ to $V_{B'}$. In general, we still want the property $\left<OMO^t,ONO^t\right>=\left<M,N\right>$ for any two matrices $M,N$. The Hilbert-Schmidt inner product achieves that.

2. In the complex case, the Hilbert-Schmidt inner product becomes $\left<A,B\right> = Tr B^* A$, where $*$ means the conjugate transpose (putting it on $B$ rather than on $A$ is a matter of convention; the real case is also usually defined as $\left<A,B\right> = Tr B^t A$). Note that on diagonal matrices this induces the usual inner product $\left<u,v\right> = \sum u_i\bar{v_i}$. Still, this doesn't explain a lot, because we could have just defined the inner product to be $Tr \bar{A}B$, i.e. just conjugate without transposing. The reason I went to the complex case is as follows: any inner product on $\mathbb{C}^n$ is of the form $\left<v,w\right> = w^* P v$ for some positive definite matrix $P$ (depending on the basis). While this is also true over $\mathbb{R}$ (with transposition rather than conjugate transposition), over $\mathbb{C}$ the matrix $P$ necessarily has a square root $Q$ satisfying $Q^2=P$. This allows us to define the product $\left<A,B\right>_P = \left<Q^*AQ,Q^*BQ\right>$, and it is not hard to prove that this product coincides with the product $\left<v,w\right> = w^* P v$ in the commuting case. This implies that $M\mapsto Q^*MQ$ allows us to extend the operation of "twisting" the inner product by $P$ to the space of matrices (that is to say, to nicely define an isometry between the inner product structure induced on the matrix space by $\left<v,w\right>=w^*v$ and that induced by $\left(v,w\right)=w^*Pv$). But this only works if the conjugation is there.

3. Yet another plausible generalization is to non-square matrices. The space of all $n\times m$ matrices for $n\ne m$ is a real or complex vector space, for which an inner product might also be desirable. In the real case, transposing one of the matrices and multiplying is the natural thing to do. In particular, non-square matrices can also be diagonalized (where the intention is still that the matrix vanishes outside the main diagonal, only now the diagonal does not go all the way to the opposing corner). The space of all diagonal matrices with respect to a given basis is still a subspace which can be identified with $\mathbb{R}^{\min\{m,n\}}$, and the Hilbert-Schmidt inner product still acts like the usual inner product in the commuting case.
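The invariance property from the first justification, $\left<OMO^t,ONO^t\right>=\left<M,N\right>$, can be checked on a small example. This is only a sketch: the orthogonal matrix `O` (a rotation with rational entries), the matrices `M` and `N`, and the helper names are assumptions made for illustration.

```python
from fractions import Fraction as F

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(x * y for x, y in zip(row, col)) for col in Yt] for row in X]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def inner(A, B):
    """Hilbert-Schmidt inner product <A, B> = tr(B^t A), real case."""
    return trace(matmul(transpose(B), A))

def conjugate_by(O, M):
    """Return O M O^t."""
    return matmul(matmul(O, M), transpose(O))

# A rotation matrix with exact rational entries (from the 3-4-5 triangle).
O = [[F(3, 5), F(-4, 5)], [F(4, 5), F(3, 5)]]

# Confirm that O is orthogonal: O O^t is the identity.
print(matmul(O, transpose(O)) == [[1, 0], [0, 1]])   # True

# Two arbitrary sample matrices.
M = [[1, 2], [3, 4]]
N = [[0, 1], [-1, 5]]

before = inner(M, N)
after = inner(conjugate_by(O, M), conjugate_by(O, N))
print(before, after)   # 19 19
```

Exact fractions are used so that the two values can be compared with `==` rather than up to rounding error.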

This leaves open the small detail of the difference between defining the product as $tr B^t A$ and as $tr AB^t$. I'll leave it to you to contemplate what difference this makes, if any (I suggest you consider the non-square case).
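As a starting point for that exercise, here is a small sketch of the non-square case in plain Python. The $2\times 3$ matrices and the helper names are illustrative assumptions. The two products have different shapes, yet their traces agree.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(x * y for x, y in zip(row, col)) for col in Yt] for row in X]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

# Two sample 2x3 matrices.
A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8, 9], [10, 11, 12]]

AB_t = matmul(A, transpose(B))   # 2x2 matrix
Bt_A = matmul(transpose(B), A)   # 3x3 matrix

print(len(AB_t), len(Bt_A))      # 2 3
print(trace(AB_t), trace(Bt_A))  # 217 217
```

Both traces equal $\sum_{ij}A_{ij}B_{ij}$, so the two definitions give the same inner product. Only the size of the intermediate matrix changes.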
