To simplify my question, assume that I independently sample $M$ data points $\{x_i\}_{i=1}^M$ from a Gaussian distribution $N(\mu,\sigma^2)$. From $\{x_i\}_{i=1}^M$ I can compute the sample mean $\hat{\mu}$ and the sample standard deviation $\hat{\sigma}$. My question is: what is the relation between $\sigma$, $\hat{\sigma}$, and $M$? As $M$ increases, does $\hat{\sigma}$ converge to $\sigma$? Can the result be extended to more general cases, such as other distributions?
I ask this question because in machine learning/data mining we may have several different algorithms/methods (say 3 algorithms) for the same task. In experiments, we can run one algorithm once and get a precision of 90%; on another run the precision may be 92%. Each algorithm can be run independently $\beta$ times, the $\beta$ precision results averaged, and the variance of the $\beta$ results computed as well. Then we compare the average precision and variance of each algorithm, and the algorithm with higher average precision and smaller variance is the best. So how should I choose $\beta$? If $\beta=5$, which is very small, the variance may be poorly estimated. If $\beta=5000$, which is very big, running the algorithm that many times is computationally expensive.
When you say you compute $\hat{\mu}$ and $\hat{\sigma}$, I am assuming you are using the standard sample formulas, $\hat{\mu} = \frac{1}{M}\sum_{i=1}^M x_i$ and $\hat{\sigma}^2 = \frac{1}{M-1}\sum_{i=1}^M (x_i-\hat{\mu})^2$. These formulas give you estimates of the true mean and standard deviation, and they have the property of being consistent: one can show that they converge to the true mean and standard deviation as $M\rightarrow\infty$.
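You can see this convergence directly in a small simulation. A minimal sketch, assuming the sample-standard-deviation estimator above; the values $\mu = 0$ and $\sigma = 2$ are made up for illustration:

```python
import random
import statistics

# Hypothetical true parameters, chosen only for illustration.
mu, sigma = 0.0, 2.0
random.seed(0)

estimates = {}
for M in (10, 100, 1_000, 100_000):
    x = [random.gauss(mu, sigma) for _ in range(M)]
    # Sample standard deviation with the M-1 (Bessel) correction.
    estimates[M] = statistics.stdev(x)
    print(f"M = {M:>7}: sigma_hat = {estimates[M]:.4f}")
```

As $M$ grows, $\hat{\sigma}$ wanders less and less around $\sigma = 2$; the typical error shrinks roughly like $1/\sqrt{M}$.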
As you pointed out in the second paragraph, having a small $\beta$ is like having a small $M$, so we know very little about how good the estimates $\hat{\mu}$ and $\hat{\sigma}$ will be. The larger $\beta$ is, the better. Of course there is a tradeoff: as in any empirical work, sometimes you just have to live with the fact that you don't have a big enough sample. Sometimes, as in your case, it means you will need more computing power.
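One pragmatic way to pick $\beta$ is to keep adding runs until the confidence interval on the mean precision is as tight as you need, rather than fixing $\beta$ in advance. A sketch of that idea, assuming each run's precision behaves like an i.i.d. draw; the true precision 0.91, spread 0.02, and the `run_algorithm` stand-in are all hypothetical:

```python
import random
import statistics

random.seed(1)

def run_algorithm():
    # Hypothetical stand-in for one expensive training/evaluation run;
    # models the run-to-run precision as a noisy draw around 0.91.
    return random.gauss(0.91, 0.02)

target_halfwidth = 0.005      # stop once the ~95% CI on the mean is this tight
min_runs, max_runs = 5, 500   # never trust fewer than 5 runs; cap the cost

results = []
for beta in range(1, max_runs + 1):
    results.append(run_algorithm())
    if beta >= min_runs:
        # Standard error of the mean precision over beta runs.
        se = statistics.stdev(results) / beta ** 0.5
        if 1.96 * se < target_halfwidth:
            break

print(f"stopped after beta = {beta} runs, "
      f"mean precision = {statistics.mean(results):.4f} +/- {1.96 * se:.4f}")
```

Since the standard error shrinks like $1/\sqrt{\beta}$, halving the interval width costs four times the runs, which is why very large $\beta$ quickly stops paying off.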