While reading the brief discussion of the PCA approach in the book Deep Learning (Ian Goodfellow, Yoshua Bengio, and Aaron Courville), I could not understand the passage shown in the figure. Specifically, I am confused by the step where the authors compute the covariance matrix of the encoded input $\vec{z}$. Since $\vec{z}$ has been defined to be $W^T\vec{x}$, I don't understand how $Z^TZ$ can be $W^TX^TXW$. By my reasoning it should instead be $X^TWW^TX$. Can anyone tell me where I am wrong?
The discussion is on page 146 (Part I, Section 5) of the book (https://www.deeplearningbook.org/contents/ml.html).
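To make the shapes concrete, here is a small NumPy sketch under the book's design-matrix convention, where each *row* of $X$ is one example $\vec{x}^T$. Under that assumption, encoding every example with $\vec{z} = W^T\vec{x}$ stacks the row vectors $\vec{z}^T = \vec{x}^T W$ into $Z = XW$ (the dimensions $m$, $n$, $l$ below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, l = 5, 4, 2               # examples, input dim, code dim (arbitrary)
X = rng.standard_normal((m, n))  # rows are examples x^T
W = rng.standard_normal((n, l))  # columns are the encoding directions

Z = X @ W                        # row i of Z is z_i^T = x_i^T W

lhs = Z.T @ Z                    # (l x l) Gram matrix of the codes
rhs = W.T @ X.T @ X @ W          # the book's expression

print(np.allclose(lhs, rhs))     # True under this row convention
```

The alternative expression $X^TWW^TX$ would correspond instead to stacking the codes as *columns*, $Z = W^TX$, which is a different convention from the one the book uses for its design matrix.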
