We define the standard inner product on matrices in $\mathbb R^{m\times n}$ by
$$ \langle A\mid B \,\rangle = \mbox{tr} \left(A\, B^{\mkern 2mu\mathrm t} \right)$$
What is the motivation behind defining it in such a way? Does it have any similarity with the inner product defined over $\mathbb R^n$?
What is the geometrical meaning of this inner product?
I am new to this topic and want to develop an intuition for why this definition fits with the other standard inner products, so that I have less to memorise.
Can someone please help me a bit on this topic?
Note that for two diagonal matrices $A=\operatorname{diag}(\lambda_1,\ldots,\lambda_n)$ and $B=\operatorname{diag}(\mu_1,\ldots,\mu_n)$, you get $\operatorname{tr} A^t B = \sum\lambda_i\mu_i = \left<v,w\right>$, where $v=(\lambda_1,\ldots,\lambda_n)$ and $w=(\mu_1,\ldots,\mu_n)$.
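The diagonal computation is a special case of an entrywise expansion. As a sketch, write $a_{ij}$ and $b_{ij}$ for the entries of $A,B\in\mathbb R^{m\times n}$ (this notation is introduced here only for illustration):

$$ \operatorname{tr}\left(A^t B\right) = \sum_{j=1}^{n} \left(A^t B\right)_{jj} = \sum_{j=1}^{n}\sum_{i=1}^{m} a_{ij}\, b_{ij} $$

So the trace pairing is the ordinary dot product of the two matrices read as vectors of length $mn$. For diagonal matrices only the terms with $i=j$ survive, which gives $\sum\lambda_i\mu_i$.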
Choose an orthonormal basis $B$ and consider the set $V_B$ of all matrices which are diagonal with respect to this basis. It is easy to see that $V_B$ is a subspace (and even a subalgebra) which is isometric to $\mathbb{R}^n$ by identifying the vector $v$ with the matrix $\operatorname{diag}(v)$, which has the elements of $v$ on its diagonal. The inner product you described (which is called the Hilbert-Schmidt inner product, by the way) is then identified with the usual inner product on $\mathbb{R}^n$.
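The identification of $v$ with $\operatorname{diag}(v)$ can be checked numerically. The following is a minimal sketch in plain Python, using the standard basis; the helper names (`hs_inner`, `diag`, `dot`) are made up for this illustration and do not come from any library.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def hs_inner(A, B):
    """Trace inner product <A|B> = tr(A B^t)."""
    return trace(matmul(A, transpose(B)))

def diag(v):
    """Square matrix with the entries of v on its diagonal."""
    n = len(v)
    return [[v[i] if i == j else 0 for j in range(n)] for i in range(n)]

def dot(v, w):
    """Usual inner product on R^n."""
    return sum(x * y for x, y in zip(v, w))

v = [1, 2, 3]
w = [4, -5, 6]
# Both numbers agree: the trace pairing on diagonal matrices is the dot product.
print(hs_inner(diag(v), diag(w)), dot(v, w))
```

The same `hs_inner` also accepts non-diagonal matrices, where it returns the sum of the entrywise products.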
This works for any choice of orthonormal basis. Recalling that two diagonalizable matrices commute if and only if there is a basis in which both are diagonal simultaneously, we can say that this inner product is a generalization to the non-commuting case. Or rather, that the inner product on $\mathbb{R}^n$ is the Hilbert-Schmidt product restricted to the commuting case.
This still leaves open the question: why is one of the matrices transposed? We could just define the inner product to be $\operatorname{tr} AB$. Off the top of my head I can come up with two justifications for this choice:
The inner product should remain the same if we apply the same (orthogonal) basis transformation to both matrices. If $O$ is an orthogonal change-of-basis matrix from the basis $B$ to the basis $B'$, then we already know that in $\mathbb{R}^n$ it holds that $\left<Ov,Ow\right>=\left<v,w\right>$ for any two vectors $v,w$. In the matrix space this manifests in the fact that the map $M\mapsto OMO^t$ is an isometry from $V_B$ to $V_{B'}$. In general, we still want the property $\left<OMO^t,ONO^t\right>=\left<M,N\right>$ for any two matrices $M,N$. The Hilbert-Schmidt inner product achieves that.
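The invariance $\left<OMO^t,ONO^t\right>=\left<M,N\right>$ can be tried out on a small example. This is a sketch only: it assumes $2\times 2$ matrices and takes $O$ to be a plane rotation, and the helper names (`hs_inner`, `rotation`, `conjugate`) are invented for the illustration.

```python
import math

def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def hs_inner(A, B):
    """Trace inner product <A, B> = tr(A B^t)."""
    return trace(matmul(A, transpose(B)))

def rotation(theta):
    """2x2 rotation matrix, which is orthogonal."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def conjugate(O, M):
    """Apply the change of basis M -> O M O^t."""
    return matmul(matmul(O, M), transpose(O))

O = rotation(0.7)                      # arbitrary angle chosen for the example
M = [[1.0, 2.0], [3.0, 4.0]]
N = [[0.0, -1.0], [5.0, 2.0]]

before = hs_inner(M, N)
after = hs_inner(conjugate(O, M), conjugate(O, N))
# Equal up to floating-point rounding.
print(math.isclose(before, after))
```

The comparison uses `math.isclose` because the rotated matrices carry rounding error from the sine and cosine.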
In the complex case, the Hilbert-Schmidt inner product becomes $\left<A,B\right> = \operatorname{tr} B^* A$, where $*$ means the conjugate transpose (putting the star on $B$ rather than on $A$ is a matter of convention; the real case is also usually defined as $\left<A,B\right> = \operatorname{tr} B^t A$). Note that this induces the usual inner product $\left<u,v\right> = \sum u_i\bar{v_i}$ on diagonal matrices. Still, this doesn't explain a lot, because we could have just defined the inner product to be $\operatorname{tr} \bar{B}A$, i.e. to just conjugate without transposing. The reason I went to the complex case is as follows: any inner product on $\mathbb{C}^n$ is of the form $\left<v,w\right> = w^* P v$ for some positive definite matrix $P$ (depending on the basis). While this is also true over $\mathbb{R}$ (with transposition rather than conjugation), in $\mathbb{C}$ the matrix $P$ necessarily has a square root $Q$ satisfying $Q^2=P$. This allows us to define the product $\left<A,B\right>_P = \left<Q^*AQ,Q^*BQ\right>$, and it is not hard to prove that this product coincides with the product $\left<v,w\right> = w^* P v$ in the commuting case. This implies that $M\mapsto Q^*MQ$ allows us to extend the operation of "twisting" the inner product by $P$ to the space of matrices (that is to say, to nicely define an isometry between the inner product structure induced on the matrix space by $\left<v,w\right>=w^*v$ and that induced by $\left(v,w\right)=w^*Pv$). But this only works if the conjugation is there.
This leaves open the small detail of the difference between defining the product as $\operatorname{tr} B^t A$ and as $\operatorname{tr} AB^t$. I'll leave it to you to contemplate what difference this makes, if any (I suggest you consider the non-square case).
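For anyone who wants to experiment with the non-square case before working it out on paper, here is a small sketch in plain Python. The two $2\times 3$ matrices are arbitrary examples and the helper names are made up for the illustration; the script only prints the shape and trace of each product.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def shape(M):
    """(rows, columns) of a list-of-rows matrix."""
    return (len(M), len(M[0]))

# Two arbitrary 2x3 matrices.
A = [[1, 2, 3], [4, 5, 6]]
B = [[1, 0, -1], [2, 1, 0]]

left = matmul(A, transpose(B))    # A B^t
right = matmul(transpose(B), A)   # B^t A

print(shape(left), trace(left))
print(shape(right), trace(right))
```

Changing the entries or the dimensions of `A` and `B` is a quick way to test a conjecture about how the two definitions relate.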