The covariance matrix can be interpreted as a summarization of a whole dataset into a single matrix: a quadratic form that computes the variance of that dataset in any given direction. I realize one might answer my question with "you can compute the standard deviation as $\sqrt{x^T\Sigma x}$", but I am not looking for a way to compute the standard deviation per se; I am looking for a way to derive a summarization of the dataset (analogous to what the covariance matrix is for computing variance) that allows me to do this.
Let's say I don't know that covariance matrices exist, and I want to find a way to compute the standard deviation of a dataset $X = \left\{x_1, x_2, \dots, x_N \mid x_i \in \mathbb{R}^N\right\}$ in the direction given by a unit vector $d$. Assume the dataset $X$ has been de-meaned. This is how I would go about computing it:
$$\text{std}(d) = \frac{1}{N}\sum^N_{i=1} |x_i \cdot d|$$
Writing the analogous function for the variance and simplifying the expression yields the covariance matrix, which is a compressed representation of the whole dataset for the purpose of computing the variance in an arbitrary direction (and I'm comfortable with this derivation). How can I derive a similar compressed representation (not necessarily a matrix) of the whole dataset, but for computing the standard deviation? Does such a compressed representation even exist?
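To make the "compressed representation" concrete, here is a small numerical sketch of the derivation I am comfortable with: the quadratic form $d^T\Sigma d$ gives exactly the mean squared projection $\frac{1}{N}\sum_i (x_i \cdot d)^2$, with no further access to the data. The dataset and the direction below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 1000 points in R^2 (purely illustrative), then de-meaned
X = rng.standard_normal((1000, 2)) @ np.array([[2.0, 0.0],
                                               [0.5, 1.0]])
X -= X.mean(axis=0)
N = X.shape[0]

# The covariance matrix: a single 2x2 summary of the whole dataset
Sigma = (X.T @ X) / N

d = np.array([0.6, 0.8])            # arbitrary unit direction

direct = np.mean((X @ d) ** 2)      # (1/N) * sum_i (x_i . d)^2
summarized = d @ Sigma @ d          # d^T Sigma d, no data needed
print(np.isclose(direct, summarized))  # True: identical by construction
```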
Or does such a compressed representation exist only when we square the dot product (giving the covariance matrix in this case)? Why is squaring so special? Why can't we use other symmetric functions, such as $x^4$ or $\exp(|x|)$, to arrive at a summarization of the whole dataset for the purpose of computing a measure of spread (not necessarily the variance)?
You might be looking for a matrix $A$ which satisfies $A^TA = \Sigma$, I think. Notice how $$ \|Ax\|^2 = x^T A^TA x = x^T \Sigma x, $$ i.e. $\|Ax\| = \sqrt{x^T\Sigma x}$.
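One concrete way to get such an $A$ (a sketch, using numpy and an illustrative $\Sigma$): `np.linalg.cholesky` returns a lower-triangular $L$ with $LL^T = \Sigma$, so $A = L^T$ satisfies $A^TA = \Sigma$.

```python
import numpy as np

Sigma = np.array([[4.0, 1.0],       # an illustrative covariance matrix
                  [1.0, 2.0]])

# Cholesky gives lower-triangular L with L @ L.T == Sigma,
# so A = L.T satisfies A.T @ A == Sigma
A = np.linalg.cholesky(Sigma).T

d = np.array([0.6, 0.8])            # arbitrary unit direction

lhs = np.linalg.norm(A @ d)         # ||A d||
rhs = np.sqrt(d @ Sigma @ d)        # sqrt(d^T Sigma d)
print(np.isclose(lhs, rhs))         # True
```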
Covariance Reconstruction
Let $Y=(Y_1,\dots, Y_n)$ be a random vector of centered ($\mathbb{E}[Y_i]=0$) random variables with pairwise covariance zero and variance $1$, i.e. $$ \mathbb{E}[Y_i Y_j] = \delta_{ij} = \begin{cases} 0 & i\neq j\\ 1 & i=j. \end{cases} $$ Now consider $X=A^T Y$. $X$ is still centered, so $$ \text{Cov}(X_i, X_j) = \mathbb{E}[(A^T Y)_i (A^TY)_j] = \mathbb{E}[(A^TYY^T A)_{ij}] = (A^T \underbrace{\mathbb{E}[YY^T]}_{\mathbb{I}}A)_{ij} = (A^T A)_{ij} = \Sigma_{ij}. $$ So $X$ now has covariance $\Sigma$. At the same time, assuming $A$ is invertible, $(A^T)^{-1} X = Y$ is uncorrelated and standardized. This is the "whitening" that was mentioned in the comments.
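A quick empirical check of this "coloring"/whitening pair (a sketch with numpy; the $\Sigma$ and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

Sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])
A = np.linalg.cholesky(Sigma).T      # one choice of A with A.T @ A == Sigma

# Y: samples (as rows) of an uncorrelated, standardized random vector
Y = rng.standard_normal((100_000, 2))

# "Coloring": x = A^T y, i.e. row-wise X = Y @ A, gives covariance Sigma
X = Y @ A
print(np.cov(X.T))                   # approximately Sigma

# "Whitening": inverting the transform recovers unit covariance
Y_back = X @ np.linalg.inv(A)
print(np.cov(Y_back.T))              # approximately the identity
```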
Normal distribution
If you have a standard normal distributed random variable $Y\sim \mathcal{N}(0,1)$, then $X:=\sigma Y\sim\mathcal{N}(0,\sigma^2)$. Similarly for an iid standard normal vector $Y\sim \mathcal{N}(0,\mathbb{I})$ we have that $X= A^T Y \sim\mathcal{N}(0, \Sigma)$.
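The scalar case can be checked numerically in one line of sampling (a sketch; the value of $\sigma$ is arbitrary). The vector case $X = A^T Y$ is the coloring transform from the previous section.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 3.0

# Scaling a standard normal Y by sigma yields X ~ N(0, sigma^2)
Y = rng.standard_normal(200_000)
X = sigma * Y
print(X.mean(), X.std())  # mean near 0, standard deviation near 3
```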
A is not unique
There are different $A$ which satisfy the equation $A^T A = \Sigma$: if $A$ is one solution and $Q$ is any orthogonal matrix, then $(QA)^T(QA) = A^T Q^T Q A = A^T A = \Sigma$, so $QA$ is another.
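Two standard, distinct choices are the transpose of the Cholesky factor and the symmetric matrix square root (a sketch with numpy; $\Sigma$ is illustrative):

```python
import numpy as np

Sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])

# Factor 1: transpose of the (lower-triangular) Cholesky factor
A1 = np.linalg.cholesky(Sigma).T

# Factor 2: the symmetric square root, built from an eigendecomposition
w, V = np.linalg.eigh(Sigma)
A2 = V @ np.diag(np.sqrt(w)) @ V.T

print(np.allclose(A1.T @ A1, Sigma))  # True
print(np.allclose(A2.T @ A2, Sigma))  # True
print(np.allclose(A1, A2))            # False: two different valid factors
```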