Suppose we have a data set of $m$ vectors $x$, each of size $n \times 1$, and we know that the average of $xx^t$ over the $m$ data points is $<xx^t> = C$, where $C$ is an $n \times n$ matrix. What is the average of the scalar $x^tx$? I came up with two approaches and am not sure which one is more suitable, or which one is wrong:
(1) Using $tr(AB) = tr(BA)$, $<x^tx> = <tr(x^tx)> = tr(<xx^t>) = tr(C)$
(2) $<x^tx> \simeq \sqrt{<x^txx^tx>} = \sqrt{<x^tCx>} = \sqrt{<tr(x^tCx)>} = \sqrt{tr(<Cxx^t>)} = \sqrt{tr(C^2)}$
EDIT: (1) is correct after the modification from joriki's answer below. (2) is wrong for two reasons: a. the first $\simeq$ is not a good approximation, since the variance of $x^tx$ may be large; b. the first equals sign, which replaces $xx^t$ by $C$, is wrong because the factors $x^t$, $xx^t$ and $x$ are perfectly correlated through $x$, and therefore $<x^txx^tx> \neq <x^t<xx^t>x>$. This step underestimates the result.
Related question: Expected value of $x^t\Sigma x$ for multivariate normal distribution $N(0,\Sigma)$
Both lines are wrong but the first one happens to end up with the right result nevertheless.
The first one is wrong because you drop the average in the first step, whereas you can only drop it in the last step; in the last step you replace $xx^\top$ by $C$ even though $C$ is defined as the average of that quantity. If you take the average of all intermediate results, you get a correct derivation of the result.
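For reference, here is how the first derivation might look with the average kept in every intermediate step. This is a sketch in which $\langle\cdot\rangle$ denotes the average over the $m$ data points and $\operatorname{tr}$ the trace; it uses only that a scalar equals its own trace, the cyclic property of the trace, and the fact that the trace and the average are both linear and therefore commute:

$$
\langle x^\top x\rangle
= \langle \operatorname{tr}(x^\top x)\rangle
= \langle \operatorname{tr}(x x^\top)\rangle
= \operatorname{tr}\langle x x^\top\rangle
= \operatorname{tr}(C).
$$

The average is only removed in the very last step, where $\langle xx^\top\rangle$ is replaced by $C$ according to its definition.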
The second one is wrong because the expectation of $x^\top x$ isn't the square root of the expectation of its square. (In fact, the difference between the expectation of the square and the square of the expectation is the variance.) Also, again, you can't replace $xx^\top$ by $C$, because $C$ is the average of that quantity; and you can't just drop the average in the third step.
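To spell out the remark about the variance, with $\langle\cdot\rangle$ again denoting the average over the data points:

$$
\operatorname{Var}(x^\top x)
= \langle (x^\top x)^2\rangle - \langle x^\top x\rangle^2 \ge 0
\quad\Longrightarrow\quad
\sqrt{\langle (x^\top x)^2\rangle} \ge \langle x^\top x\rangle .
$$

Equality holds only when the variance vanishes, that is, when $x^\top x$ takes the same value for every data point.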
Overall I get the impression that you seem to be confusing taking the average with taking the trace; these are related in other contexts, but not here, where the average is being taken over the data points and not over the vector components.
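A small numerical check can make the distinction concrete. The following is only a sketch on made-up data: the sizes `m` and `n`, the component scales, and all variable names are assumptions chosen for illustration, and it uses only the Python standard library. It compares the average of $x^\top x$ with $\operatorname{tr}(C)$, and with the two square-root expressions from the second line.

```python
import math
import random

random.seed(0)
m, n = 500, 3                      # hypothetical number of data points and dimension
scales = [1.0, 2.0, 0.5]           # hypothetical per-component standard deviations

# Made-up data set: m vectors of length n.
data = [[random.gauss(0.0, s) for s in scales] for _ in range(m)]


def mean(values):
    """Average over the data points."""
    values = list(values)
    return sum(values) / len(values)


# C = <x x^T>: each entry is averaged over the data points.
C = [[mean(x[i] * x[j] for x in data) for j in range(n)] for i in range(n)]

# x^T x for every data point, and its average.
norms_sq = [sum(xi * xi for xi in x) for x in data]
avg_norm_sq = mean(norms_sq)

trace_C = sum(C[i][i] for i in range(n))
trace_C2 = sum(C[i][j] * C[j][i] for i in range(n) for j in range(n))  # tr(C^2)

mean_of_square = mean(v * v for v in norms_sq)                 # <(x^T x)^2>
variance = mean((v - avg_norm_sq) ** 2 for v in norms_sq)      # Var(x^T x)

print("<x^T x>            =", avg_norm_sq)
print("tr(C)              =", trace_C)
print("sqrt(<(x^T x)^2>)  =", math.sqrt(mean_of_square))
print("sqrt(tr(C^2))      =", math.sqrt(trace_C2))
print("variance of x^T x  =", variance)
```

The first two printed values should agree up to rounding for any data set, since the identity is exact. The square root of the mean square should come out no smaller than the average, and $\sqrt{\operatorname{tr}(C^2)}$ no larger than $\operatorname{tr}(C)$, because $C$ is positive semidefinite.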