I am currently reading The Elements of Statistical Learning. Section 3.5.2 (Partial Least Squares) gives the following algorithm:
$1.$ Standardize each $x_j$ to have mean zero and variance one. Set $\hat{y}^{(0)} = \bar{y}\textbf{1}$ and $x_j^{(0)} = x_j$, $j = 1, ... , p$
$2.$ For $m = 1, 2, ... , p$
$(a)$ $z_m = \displaystyle \sum_{j = 1}^p \hat{\phi}_{mj}x_j^{(m-1)}$, where $\hat{\phi}_{mj} = \langle x_j^{(m-1)}, y\rangle$
$(b)$ $\hat{\theta}_m = \dfrac{\langle z_m, y \rangle}{\langle z_m, z_m \rangle}$
$(c)$ $\hat{y}^{(m)} = \hat{y}^{(m-1)} + \hat{\theta}_m z_m$
$(d)$ Orthogonalize each $x_j^{(m-1)}$ with respect to $z_m$ : $x_j^{(m)} = x_j^{(m-1)} - \left[\dfrac{\langle z_m, x_j^{(m-1)} \rangle}{\langle z_m, z_m \rangle}\right]z_m$
$3$. Output the sequence of fitted vectors $\{\hat{y}^{(m)}\}_1^{p}$. Since the $\{z_l\}_1^m$ are linear in the original $x_j$, so is $\hat{y}^{(m)} = X\hat{\beta}^{\text{pls}}(m)$. These linear coefficients can be recovered from the sequence of PLS transformations.
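For reference, here is how I translated the steps above into NumPy (my own sketch, not from the book; it assumes the columns of $X$ are already standardized):

```python
import numpy as np

def pls(X, y, M=None):
    """Run the PLS iteration above and return the fitted vectors
    yhat^(0), ..., yhat^(M). Assumes the columns of X are standardized
    (mean zero, variance one)."""
    n, p = X.shape
    if M is None:
        M = p
    Xm = X.copy()                          # columns are x_j^{(m-1)}
    yhat = np.full(n, y.mean())            # yhat^{(0)} = ybar * 1
    fits = [yhat.copy()]
    for m in range(M):
        phi = Xm.T @ y                     # phi_mj = <x_j^{(m-1)}, y>
        z = Xm @ phi                       # z_m = sum_j phi_mj x_j^{(m-1)}
        if z @ z < 1e-12:                  # residual columns exhausted
            break
        theta = (z @ y) / (z @ z)          # theta_m
        yhat = yhat + theta * z            # step (c)
        # step (d): orthogonalize every column with respect to z_m
        Xm = Xm - np.outer(z, (z @ Xm) / (z @ z))
        fits.append(yhat.copy())
    return fits
```

As a sanity check on my understanding, with $m = p$ the final fitted vector should coincide with the ordinary least squares fit, which is what the book claims.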
Now, I am very confused because it does not go on to explain how $\hat{\beta}^{\text{pls}}(m)$ can actually be computed "from the sequence of PLS transformations". I have been reading the section for hours and I still don't see any hint of how to do it. Am I expected to find a left inverse of $X$ and multiply it by $\hat{y}^{(m)}$? That sounds very inefficient.
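To make the question concrete, the brute-force route I have in mind would look like this (NumPy sketch with made-up data, nothing from the book):

```python
import numpy as np

# Made-up standardized design matrix and a pretend coefficient vector.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # columns: mean zero, variance one

beta_true = np.array([1.0, -2.0, 0.5])     # stand-in for beta_pls(m)
yhat = 4.0 + X @ beta_true                 # stand-in fitted vector (intercept = ybar)

# Because the columns of X have mean zero, centering yhat removes the
# intercept, and a left inverse of X (here the pseudoinverse) returns
# the linear coefficients.
beta = np.linalg.pinv(X) @ (yhat - yhat.mean())
```

This does recover the coefficients, but it solves a least-squares problem, which seems to defeat the purpose of the iterative algorithm.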