Assume that $A, B \in \mathbb{R}^{p\times d}$ both have orthonormal columns. Then the vector of $d$ principal angles between their column spaces is given by $(\cos^{-1}\sigma_1,\cos^{-1}\sigma_2, \dots, \cos^{-1}\sigma_d)^T$, where $\sigma_1 \ge \dots \ge \sigma_d$.
The definition of principal angles below is copied from Wikipedia.
Let $V$ be an inner product space. Given two subspaces $\mathcal{U},\mathcal{W}$ with $\dim(\mathcal{U})=k\leq \dim(\mathcal{W}):=\ell$, there exists then a sequence of $k$ angles $ 0 \le \theta_1 \le \theta_2 \le \cdots \le \theta_k \le \pi/2$ called the principal angles, the first one defined as
$\theta_1:=\min \left\{ \left. \arccos \left( \frac{ |\langle u,w\rangle| }{\|u\| \|w\|}\right) \,\right|\, u\in \mathcal{U},~w\in \mathcal{W}\right\}=\angle(u_1,w_1),$
where $\langle \cdot , \cdot \rangle $ is the inner product and $\|\cdot\|$ the induced norm. The vectors $u_1$ and $w_1$ are the corresponding "principal vectors".
The other principal angles and vectors are then defined recursively via
$\theta_i:=\min \left\{ \left. \arccos \left( \frac{ |\langle u,w\rangle| }{\|u\| \|w\|}\right) \,\right|\, u\in \mathcal{U},~w\in \mathcal{W},~u\perp u_j,~w \perp w_j \quad \forall j\in \{1,\ldots,i-1\} \right\}.$
My question is: how can I prove that $\sigma_1 \ge \dots \ge \sigma_d$ are actually the singular values of $B^TA$?
I have (I think?) an alternate route to this result that is more intuitive to me.
We have the two subspaces $\mathcal{U}$ and $\mathcal{W}$, the column spaces of $A$ and $B$, both of dimension $d$. The principal angles construction tells us there are two orthonormal bases for these subspaces, $\{u_j\}_{j=1}^{d}$ and $\{w_j\}_{j=1}^d$, with the property that $\langle u_i, w_j \rangle = \delta_{ij} \cos(\theta_i)$. Let's stack these vectors into two matrices, $U\in\mathbb{R}^{p\times d}$ and $W\in\mathbb{R}^{p\times d}$, whose columns are the vectors $\{u_j\}_{j=1}^{d}$ and $\{w_j\}_{j=1}^d$.
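To make the property concrete, here is a small worked instance. The specific subspaces are my own illustrative choice, not something from the question. Take $p=3$, $d=2$, $\mathcal{U}=\operatorname{span}\{e_1,e_2\}$ and $\mathcal{W}=\operatorname{span}\{e_1,(e_2+e_3)/\sqrt{2}\}$. The two subspaces share $e_1$, so $\theta_1=0$ with $u_1=w_1=e_1$. The orthogonality constraints then force $u_2=e_2$ and $w_2=(e_2+e_3)/\sqrt{2}$ (up to sign), so $\theta_2=\pi/4$. Stacking these as columns gives

$$W^TU=\begin{pmatrix}\langle w_1,u_1\rangle & \langle w_1,u_2\rangle\\ \langle w_2,u_1\rangle & \langle w_2,u_2\rangle\end{pmatrix}=\begin{pmatrix}1 & 0\\ 0 & 1/\sqrt{2}\end{pmatrix}=\operatorname{diag}(\cos\theta_1,\cos\theta_2),$$

which is exactly the statement $\langle u_i, w_j \rangle = \delta_{ij} \cos(\theta_i)$ in matrix form.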
The question is then why are the singular values of $B^TA$ equal to $\{\cos(\theta_j)\}_{j=1}^d$?
Since the columns of $A$ and the columns of $U$ are both orthonormal bases of the same subspace (and likewise for $B$ and $W$), $A$ and $B$ can be expressed as orthogonal transforms of the previously defined $U$ and $W$. Let's call these orthogonal transforms $O_A\in\mathbb{R}^{d\times d}$ and $O_B\in\mathbb{R}^{d\times d}$, so $A = UO_A$ and $B=WO_B$.
Now $B^TA = O_B^TW^TUO_A = O^T_BCO_A$, where $C = W^TU$. Since $\langle u_i, w_j \rangle = \delta_{ij} \cos(\theta_i)$, $C$ is a diagonal matrix with diagonal elements $\{\cos(\theta_j)\}_{j=1}^d$, which are nonnegative and nonincreasing because $0 \le \theta_1 \le \dots \le \theta_d \le \pi/2$. So $O^T_BCO_A$ is an orthogonal matrix, times a diagonal matrix, times an orthogonal matrix; that is, it is a singular value decomposition of $B^TA$, and its singular values are the cosines of the principal angles, as required.
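As a sanity check on this argument, the following is a minimal sketch in plain Python (standard library only) for the special case $d=2$. It compares the singular values of $B^TA$, computed from a closed form for $2\times 2$ matrices, with the cosines obtained by brute-forcing the variational definition on a grid. The matrices `A` and `B`, the helper names, and the grid size are all my own illustrative assumptions; this is not a general-purpose implementation.

```python
import math

def bt_a(B, A):
    """Return B^T A for p x d matrices stored as lists of rows."""
    p, d = len(A), len(A[0])
    return [[sum(B[k][i] * A[k][j] for k in range(p)) for j in range(d)]
            for i in range(d)]

def singular_values_2x2(M):
    """Closed-form singular values of a 2x2 matrix, largest first."""
    (a, b), (c, d) = M
    t = a * a + b * b + c * c + d * d        # trace of M^T M
    det = a * d - b * c                      # det(M^T M) = det(M)^2
    disc = math.sqrt(max(t * t - 4 * det * det, 0.0))
    return [math.sqrt((t + disc) / 2), math.sqrt(max((t - disc) / 2, 0.0))]

def cosines_from_definition(M, n=360):
    """Brute-force the definition of principal angles for d = 2.

    With u = A x and w = B y for unit vectors x, y, we have
    <u, w> = y^T M x where M = B^T A. Minimising arccos|<u, w>| is the
    same as maximising |<u, w>|; angles in [0, pi) suffice because of
    the absolute value.
    """
    best, s_best, t_best = -1.0, 0.0, 0.0
    for i in range(n):
        s = math.pi * i / n
        x0, x1 = math.cos(s), math.sin(s)
        mx0 = M[0][0] * x0 + M[0][1] * x1
        mx1 = M[1][0] * x0 + M[1][1] * x1
        for j in range(n):
            t = math.pi * j / n
            val = abs(math.cos(t) * mx0 + math.sin(t) * mx1)
            if val > best:
                best, s_best, t_best = val, s, t
    # Second pair: the unit vectors orthogonal to the first pair
    # (unique up to sign when d = 2).
    x0, x1 = -math.sin(s_best), math.cos(s_best)
    y0, y1 = -math.sin(t_best), math.cos(t_best)
    second = abs(y0 * (M[0][0] * x0 + M[0][1] * x1)
                 + y1 * (M[1][0] * x0 + M[1][1] * x1))
    return [best, second]

# Illustrative data (an assumption): two planes in R^3 that share e_1,
# with the basis of the first plane deliberately rotated.
r = 1 / math.sqrt(2)
A = [[r, r], [r, -r], [0.0, 0.0]]    # columns (e1+e2)/sqrt2, (e1-e2)/sqrt2
B = [[1.0, 0.0], [0.0, r], [0.0, r]] # columns e1, (e2+e3)/sqrt2

M = bt_a(B, A)
sigma = singular_values_2x2(M)
cosines = cosines_from_definition(M)
angles = [math.acos(min(1.0, s)) for s in sigma]  # clip rounding above 1

print("singular values of B^T A:", sigma)
print("cosines from definition: ", cosines)
print("principal angles (rad):  ", angles)
```

For this choice of data both lists should come out close to $1$ and $1/\sqrt{2}$, i.e. principal angles of roughly $0$ and $\pi/4$, even though $B^TA$ itself is not diagonal here. The grid search only approximates the minimisation in the definition, so agreement is expected up to the grid resolution rather than to machine precision.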