I am studying orthogonal columns and matrices right now and I have encountered the following theorem:
Theorem An $m \times n$ matrix $U$ has orthonormal columns if and only if $U^T U = I$.
Is it even possible to have a matrix $U$ of the size $m\neq n$ here?
I ask because $U^TU=I\iff U^T=U^{-1}$, which means $U$ must be $n\times n$.
Also, I tried using it on some examples.
For instance, let $\displaystyle y=(4,8,1), u_1=\left(\frac{2}{3},\frac{1}{3},\frac{2}{3}\right), u_2=\left(-\frac{2}{3},\frac{2}{3},\frac{1}{3}\right), W=\text{span}(u_1,u_2)$ and $U$ be the matrix formed by $u_1$ and $u_2$ as columns.
So $\{u_1,u_2\}$ is an orthonormal set (I checked it is an orthogonal set and $u_1\cdot u_1=u_2\cdot u_2 = 1$).
But I am not able to make this theorem work when computing $U^TU$ and $UU^T$: I get $U^TU=I$, but $UU^T\neq I$! I am definitely misunderstanding something here.
If $u_1\cdot u_1=1$, would $u_1$ be an orthonormal column?
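For what it's worth, here is a quick numerical check of my example (a NumPy sketch, with $U$ built from $u_1, u_2$ as columns):

```python
import numpy as np

# Columns u1, u2 from the example above
u1 = np.array([2/3, 1/3, 2/3])
u2 = np.array([-2/3, 2/3, 1/3])
U = np.column_stack([u1, u2])   # U is 3x2

print(np.round(U.T @ U, 10))    # the 2x2 identity
print(np.round(U @ U.T, 10))    # a 3x3 matrix that is NOT the identity
```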
Your mistake is assuming invertibility: if $U$ is $m\times n$ you are always allowed to take the transpose, but $U$ is not necessarily invertible. $U^T$ in this case is an $n\times m$ matrix, and the product $U^TU$ makes sense because an $n\times m$ matrix multiplied on the left of an $m\times n$ matrix yields an $n\times n$ matrix. Similarly, $UU^T$ also makes sense, but by the same logic it is an $m\times m$ matrix.
Above you wrote $U^TU = I \Leftrightarrow U^T = U^{-1}$, but this is not correct: $U^TU = I$ alone does not mean $U$ is invertible. Invertibility requires $m = n$. If the dimensions differ, $U$ is automatically not invertible because it cannot have full rank: at least one of its rows or columns is a linear combination of the others. Even when $U$ is square ($m=n$), you only get invertibility when all of the columns (equivalently, all of the rows) are linearly independent.
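To see the rank obstruction concretely, here is a small NumPy sketch reusing the $3\times 2$ matrix $U$ from the question: $\operatorname{rank}(UU^T)\le\operatorname{rank}(U)\le 2$, while $I_3$ has rank $3$, so $UU^T$ can never be the identity.

```python
import numpy as np

# The 3x2 matrix U from the question: more rows than columns
u1 = np.array([2/3, 1/3, 2/3])
u2 = np.array([-2/3, 2/3, 1/3])
U = np.column_stack([u1, u2])

# rank(U U^T) <= rank(U) <= 2, but the 3x3 identity has rank 3,
# so U U^T cannot equal I
print(np.linalg.matrix_rank(U @ U.T))  # 2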
To illustrate why $U^TU = I$ implies the columns of $U$ are orthonormal, suppose that $U = \left [ \begin{array}{ccc} u_1 & \ldots & u_n \\ \end{array} \right ]$ where $u_i$ is the $i$th column of $U$. Then the multiplication $U^TU = I$ is equivalent to
$$ \left [ \begin{array}{c} u_1^T \\ \vdots \\ u_n^T \\ \end{array} \right ] \left [ \begin{array}{ccc} u_1 & \ldots & u_n \\ \end{array} \right ] \;\; =\;\; \left [ \begin{array}{cccc} \langle u_1, u_1\rangle & \langle u_1, u_2 \rangle & \ldots & \langle u_1, u_n\rangle \\ \langle u_2, u_1 \rangle & \langle u_2, u_2\rangle & \ldots & \langle u_2, u_n \rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle u_n, u_1\rangle & \langle u_n, u_2\rangle & \ldots & \langle u_n, u_n\rangle\\ \end{array} \right ]. $$
The matrix above being equal to the identity is equivalent to saying that $\langle u_i, u_j\rangle = \delta_{ij}$ (the Kronecker delta), hence the columns are orthonormal.
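As a numerical sanity check of this Gram-matrix computation (a sketch; the QR factorization here is just one convenient way to produce a non-square matrix with orthonormal columns):

```python
import numpy as np

rng = np.random.default_rng(0)
# The reduced QR of a random 5x3 matrix gives Q (5x3) with orthonormal columns
Q, _ = np.linalg.qr(rng.standard_normal((5, 3)))

# The Gram matrix Q^T Q has entries <q_i, q_j>, which equal delta_ij
gram = Q.T @ Q
print(np.allclose(gram, np.eye(3)))  # True
```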