If I want to apply the LLN to an estimator that itself uses another estimator, can I apply the LLN inside the summation and then simplify the outer summation by replacing the inner estimator with its expected value? If so, why? How can I describe these operations more rigorously?
This comes in particularly handy when showing the convergence of the variance estimator
$$ \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x}_n)^2, \qquad \text{where } \bar{x}_n = \frac{1}{n} \sum_{i=1}^{n} x_i. $$
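A minimal simulation sketch of the estimator in question (my own illustration, not from the post): the plug-in variance estimator can be decomposed as $\frac{1}{n}\sum x_i^2 - \bar{x}_n^2$; the LLN applies to each average separately, and the continuous mapping theorem handles the square, so the whole expression converges to $\operatorname{Var}(X)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Plug-in variance estimator (1/n) * sum((x_i - xbar_n)^2).
# It decomposes as (1/n) * sum(x_i^2) - xbar_n^2: the LLN applies to
# each average, and the continuous mapping theorem handles the square.
true_var = 4.0  # Var of N(1, 2^2)
for n in (10**2, 10**4, 10**6):
    x = rng.normal(loc=1.0, scale=2.0, size=n)
    xbar = x.mean()
    s2 = np.mean((x - xbar) ** 2)
    print(n, s2)  # s2 approaches true_var = 4.0 as n grows
```

The distribution N(1, 4) and the sample sizes are arbitrary choices for the sketch; any distribution with finite variance would behave the same way.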
I suppose it depends on the situation. Are you looking at something like this: probabilities of probabilities?
For example, a coin is tossed, and there is a 50 percent chance that it has a 40 percent chance of coming up heads, and a 50 percent chance that it has a 30 percent chance of coming up heads.
If you are using the strong law of large numbers, note that there are generalizations of it, such as Kolmogorov's generalized strong law of large numbers, which extends to independent but not identically distributed random variables, provided certain variance conditions hold. See http://mathworld.wolfram.com/StrongLawofLargeNumbers.html
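A sketch of the non-identically-distributed case (my own illustration): Kolmogorov's sufficient condition is that $\sum_n \operatorname{Var}(X_n)/n^2 < \infty$. The growth rate $\operatorname{Var}(X_n) = n^{0.4}$ below is an arbitrary choice that satisfies it (the series behaves like $\sum n^{-1.6}$), so the running mean still converges to zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10**6

# Independent but NOT identically distributed: X_i ~ N(0, i**0.4).
# Kolmogorov's condition sum(Var(X_i) / i^2) < inf holds, since the
# series behaves like sum(i**-1.6), so (1/n) * sum(X_i) -> 0 a.s.
sigma = np.arange(1, n + 1) ** 0.2          # sd of X_i, variance i**0.4
x = rng.normal(0.0, sigma)                  # scale broadcasts over i
running_mean = np.cumsum(x) / np.arange(1, n + 1)
print(running_mean[[10**2 - 1, 10**4 - 1, 10**6 - 1]])  # shrinks toward 0
```

If the variances grew too fast (say $\operatorname{Var}(X_n) = n^2$), the condition would fail and the running mean need not settle down.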
And perhaps you want to know how this differs, with regard to variance and rates of convergence, from a standard biased coin with a fixed chance of coming up heads of 0.35 (i.e. a binomial distribution with p = 0.35).
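A simulation sketch of that comparison, under one interpretation of the setup (my assumption, not stated in the post): the 0.4/0.3 chance is re-drawn independently for every toss, so the marginal probability of heads is 0.35 on each toss and the two coins are distributionally identical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10**6

# Hierarchical coin: the per-toss chance is re-drawn each toss,
# so marginally P(heads) = 0.5 * 0.4 + 0.5 * 0.3 = 0.35.
p_each_toss = rng.choice([0.4, 0.3], size=n)
mixture_heads = rng.random(n) < p_each_toss

# Plain coin with fixed p = 0.35.
fixed_heads = rng.random(n) < 0.35

print(mixture_heads.mean(), fixed_heads.mean())
# Both relative frequencies approach 0.35. If instead the bias were
# drawn ONCE for the whole sequence, the limiting relative frequency
# would be 0.4 or 0.3 (each with probability 1/2), not 0.35.
```

The caveat in the final comment is the interesting part: under the draw-once interpretation the two models are distinguishable from a single long sequence, whereas under the per-toss interpretation they are not.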
Moreover, you might ask whether one can take the expected value (0.35) in the former case and make claims like: almost surely (i.e. almost all the measure, with Pr ≈ 1, is concentrated on sequences) the limiting relative frequency of heads is 0.35. I have some results on this, using generating functions.
Formally, it appears that in the traditional Kolmogorov (measure-theoretic) interpretation, whilst the two situations are formally distinguished, the empirical consequences entailed by the formalism appear to be identical and indistinguishable. At least that is what I and one of my co-supervisors have concluded.
This is at least in relation to limiting relative frequencies within standard probability theory; as I said, I am not sure what occurs when one uses a Banach-space-valued random variable approach, or a non-standard hyperfinite formalism.
I suppose it would be interesting to consider the variances and rates of convergence using a non-standard approach.
Perhaps, with each progressive iteration of the LLN (i.e. nesting the statement "with probability 1, the limiting relative frequency of trials with a chance of a chance of a chance ... of heads equals 0.35", one "with probability 1" inside another), the rate at which the relative frequency converges to the expected value slows down or otherwise changes.
But presumably this would require an uncountable iteration of said "almost surely"s (Pr = 1), provided the appropriate variance and mean conditions are met at each stage, although I am not sure. I can only presume that in some sense the measure of convergent sequences is uncountably greater, but this may not be quite correct.
Perhaps something might change if there is an infinite (presumably uncountable, or maybe not) meta-distribution of probabilities.
Due to the limitation that the sequences themselves are only countably infinite, there may not be enough elements in the sequence to have one, or what is more important, infinitely many, elements for each appropriate probability value.