When we do OLS of $y$ on $X$, with $X$ an $n \times p$ input matrix, the OLS estimate is $\hat\beta = (X^TX)^{-1}X^Ty$, and the ridge regression estimate is $\hat\beta^{\text{ridge}} = (X^TX+\lambda I)^{-1}X^Ty$. The singular value decomposition of $X$ is $X = UDV^T$, where $U$ and $V$ are orthogonal matrices and $D$ is a diagonal matrix. In equation 3.47 of The Elements of Statistical Learning the authors state $$ \begin{aligned} X\hat\beta^{\text{ridge}} &= X(X^TX+\lambda I)^{-1}X^Ty\\ &= UD(D^2+\lambda I)^{-1}DU^Ty \end{aligned} $$
which seems to suggest that $X^TX = D^2$. But substituting the SVD, we first get $$ \begin{aligned} X^TX &= VDU^TUDV^T\\ &= VD^2V^T \end{aligned} $$
Now, I know $V^TV = VV^T = I$ because $V$ is orthogonal. But there is a $D^2$ between them, and matrix multiplication is not commutative. So how do we get $VD^2V^T = D^2$?
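To confirm my confusion is not misplaced, here is a quick numerical check (a minimal sketch using numpy, with a random orthogonal $V$ obtained from a QR decomposition) showing that $VD^2V^T \neq D^2$ in general:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random orthogonal matrix V via QR decomposition
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))
D = np.diag([3.0, 2.0, 1.0])

# V D^2 V^T is NOT equal to D^2 for a generic orthogonal V
print(np.allclose(V @ D**2 @ V.T, D**2))  # False
```

So the identity in 3.47 cannot come from $VD^2V^T = D^2$ alone.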
The key is that you never need $VD^2V^T = D^2$ (which is false in general). Instead, use the fact that for invertible $n \times n$ matrices $A, B, C$, the inverse of the product is $(ABC)^{-1} = C^{-1}B^{-1}A^{-1}$.
$$\begin{align} X\hat\beta^\text{ridge} &= UDV^T(VDU^T UDV^T + \lambda I)^{-1}VDU^T y \\ &= UDV^T[V(D^2 + \lambda I) V^T]^{-1}VDU^T y \\ &= UDV^T V^{-T} (D^2 + \lambda I)^{-1} V^{-1} V DU^T y \\ &= UD(D^2 + \lambda I)^{-1} DU^T y \end{align}$$ In the second line, $\lambda I = V(\lambda I)V^T$, so the bracket factors as $V(D^2+\lambda I)V^T$; in the last step, orthogonality of $V$ gives $V^{-T} = V$ and $V^{-1} = V^T$, so $V^TV^{-T} = I$ and $V^{-1}V = I$, and the $V$ factors cancel.
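The identity is easy to verify numerically. A minimal sketch with numpy (random $X$, $y$, and an arbitrary $\lambda$; the thin SVD returns the singular values $d_i$, so $D(D^2+\lambda I)^{-1}D$ becomes $\operatorname{diag}\!\big(d_i^2/(d_i^2+\lambda)\big)$):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 20, 5, 0.7
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Ridge fitted values computed directly: X (X^T X + lam I)^{-1} X^T y
fit_direct = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Ridge fitted values via the SVD route: U D (D^2 + lam I)^{-1} D U^T y
U, d, Vt = np.linalg.svd(X, full_matrices=False)
fit_svd = U @ np.diag(d**2 / (d**2 + lam)) @ U.T @ y

print(np.allclose(fit_direct, fit_svd))  # True
```

This also makes the usual interpretation visible: each coordinate of $U^Ty$ is shrunk by the factor $d_i^2/(d_i^2+\lambda)$.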