Reading through a book, it is mentioned that $||{A}||^2_F=\text{Energy}(A)=\text{tr}(AA^T)=\text{tr}(A^TA)$. I understand that for square matrices the squared Frobenius norm is the squared sum of all elements within the matrix, but I cannot intuitively see why for rectangular matrices it equals the trace of the matrix multiplied by its transpose (or the other way around). For instance, it would follow that $\text{tr}(CD^T) = \text{tr}(DC^T) = \displaystyle\sum_{i=1}^n\sum_{j=1}^dc_{ij}d_{ij}$ for some matrices $C, D$ of size $n \times d$. Maybe some sort of proof would help?
SOURCE: Linear Algebra and Optimization for Machine Learning: A Textbook (page 20)
For the Frobenius norm of a matrix $A_{m\times n}$, we regard $A$ as a vector of length $mn$ whose entries are the entries of $A$ written one after the other $($because $M_{m\times n}(\mathbb{R})\equiv \mathbb{R}^{mn})$; the Frobenius norm of $A$ is then the $2$-norm of this vector.
For example, if $$ A=\begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ \end{bmatrix} \quad\text{and}\quad v=\begin{bmatrix} x_{11} \\ x_{12} \\ x_{21} \\ x_{22} \\ \end{bmatrix}$$ then $||A||_F=||v||_2.$
Now $$||A||_F=||v||_2=\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}|a_{ij}|^2}=\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}(a_{ij})(a_{ij})}.$$
To see that this equals $\sqrt{\operatorname{tr}(AA^T)}$, note that the $i$-th diagonal entry of $AA^T$ is the inner product of the $i$-th row of $A$ with itself: $(AA^T)_{ii}=\sum_{j=1}^{n}a_{ij}a_{ij}$. Summing over $i$ gives $$\operatorname{tr}(AA^T)=\sum_{i=1}^{m}\sum_{j=1}^{n}(a_{ij})^2=||A||_F^2.$$
This works even when $A$ is rectangular: $AA^T$ is $m\times m$ and $A^TA$ is $n\times n$, but both traces collect the same sum of squared entries, just grouped by rows in one case and by columns in the other, so $\operatorname{tr}(AA^T)=\operatorname{tr}(A^TA)=||A||_F^2$.
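If it helps to see this concretely, here is a quick numerical sketch with NumPy (the matrices are arbitrary random examples, not from the book), checking both the identity $\operatorname{tr}(AA^T)=\operatorname{tr}(A^TA)=||A||_F^2$ for a rectangular $A$ and the general identity $\operatorname{tr}(CD^T)=\operatorname{tr}(DC^T)=\sum_{ij}c_{ij}d_{ij}$ from the question:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))   # deliberately non-square

frob_sq = np.sum(A**2)            # squared sum of all entries
tr_AAt = np.trace(A @ A.T)        # trace of the 3x3 product
tr_AtA = np.trace(A.T @ A)        # trace of the 5x5 product

# All three agree, even though AA^T and A^TA have different shapes.
assert np.isclose(frob_sq, tr_AAt)
assert np.isclose(frob_sq, tr_AtA)
assert np.isclose(frob_sq, np.linalg.norm(A, "fro") ** 2)

# General identity from the question, for two n x d matrices:
C = rng.standard_normal((4, 2))
D = rng.standard_normal((4, 2))
assert np.isclose(np.trace(C @ D.T), np.trace(D @ C.T))
assert np.isclose(np.trace(C @ D.T), np.sum(C * D))  # elementwise product summed
```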