In the derivation of PCA, two steps look as follows:
$\max E[ || U^T X_i ||_2^2 ] = \max_{||U_1||=1} E[ (U_1^T X_i)^2 ] = \max_{||U_1||=1} \frac{1}{n} \sum_{i=1}^n (U_1^T X_i)(X_i^T U_1)$
Here, $U$ is an orthogonal matrix obtained from the SVD, and $X_i$ is a column of the data matrix $X$.
I don't understand the argument for removing the norm in the first step, and I also can't wrap my head around the second step. Shouldn't it be $(U_1^T X_i)^2 = (U_1^TX_i)^T(U_1^TX_i)$?
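As a sanity check on the second step, here is a minimal sketch in plain Python. The vector `u` (standing in for $U_1$) and the list `xs` (standing in for the columns $X_i$) are made-up values chosen only for illustration, and `dot` is a hypothetical helper, not part of any derivation.

```python
# Illustrative values only: u and xs are assumptions, not data from the derivation.
def dot(a, b):
    """Inner product of two equal-length vectors given as lists."""
    return sum(p * q for p, q in zip(a, b))

u = [0.6, 0.8]                               # unit vector standing in for U_1
xs = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]   # stand-ins for the columns X_i

for x in xs:
    s = dot(u, x)                        # U_1^T X_i is 1x1, i.e. a scalar
    squared = s ** 2                     # (U_1^T X_i)^2
    product = dot(u, x) * dot(x, u)      # (U_1^T X_i)(X_i^T U_1)
    assert abs(squared - product) < 1e-12

# Sample average corresponding to (1/n) * sum_i (U_1^T X_i)(X_i^T U_1)
mean_sq = sum(dot(u, x) * dot(x, u) for x in xs) / len(xs)
```

The check passes for these values, which is consistent with $U_1^T X_i$ being a scalar, so that its transpose $X_i^T U_1$ is the same number.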
I suspect the reasoning is not complicated and that I am missing something obvious, so I would appreciate any help!