Given a $p$-dimensional zero mean random (column) vector $X \in \mathbb{R}^{p}$, the auto-covariance is given by $$\Sigma = Cov(X) = \mathbb{E}[XX^T].$$ Now given $n$ observations of this column vector, it feels natural to stack each $p$-dimensional observation as new column vectors, i.e. create the data matrix $M \in \mathbb{R}^{p\times n}$, such that the unbiased sample covariance is $\hat{\Sigma} = \tfrac{1}{n-1}MM^T$.
However, in the literature, it is common to arrange variables as columns and observations as rows, i.e. $M \in \mathbb{R}^{n\times p}$, such that $\hat{\Sigma} = \tfrac{1}{n-1}M^TM$. The benefit of this is that it conforms well with the design matrix used in regression analysis. However, it seems slightly inconsistent with the random vector notation.
While I do understand it is a choice of application, what would you say is the ideal way of arranging the matrix $M$, observations as columns or as rows in order to keep things consistent?