I'm learning about the hypothesis function used in linear regression.
$$h(\theta) = \theta_0X_0 + \theta_1X_1$$
where $\theta$ is a $1\times 2$ matrix and $X$ is an $n\times 2$ matrix (with the first column of $X$ all 1's so that it matches $\theta_0$).
I found an example online that instead of doing $\theta^TX$ does $X\theta^T$. Not only does that work but $\theta^T X$ can't be multiplied anyhow.
But the definition uses exactly $\theta^T X$, and matrix multiplication isn't commutative. So how can $X\theta^T$ be the correct answer when $\theta^T X$ is the definition?
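
To make the shape problem concrete, here is a minimal NumPy sketch of what I am seeing. The numbers in `theta` and `X` are made up purely for illustration, and I am assuming $n = 3$:

```python
import numpy as np

# Made-up values, only to illustrate the shapes described above.
theta = np.array([[1.0, 2.0]])      # shape (1, 2)
X = np.array([[1.0, 3.0],
              [1.0, 4.0],
              [1.0, 5.0]])          # shape (3, 2), first column all ones

# X theta^T: (3, 2) @ (2, 1) gives one prediction per row of X.
predictions = X @ theta.T
print(predictions.shape)            # (3, 1)

# theta^T X: (2, 1) @ (3, 2) has mismatched inner dimensions (1 vs 3).
try:
    theta.T @ X
    product_defined = True
except ValueError:
    product_defined = False
print(product_defined)              # False
```

So with these shapes only $X\theta^T$ is defined, which is the product the online example uses.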