For the last while I've been attempting to truly understand what is being said by the Central Limit Theorem. I get the general idea, but there are still one or two details that are troubling. To make sure my thinking is correct, I'm going to explain my interpretation of the CLT. At the end I'll ask my questions.
The version of the CLT I'm working with comes from *Mathematical Statistics and Data Analysis*, 3rd ed., by Rice. Before stating the theorem, the following definition is given:
Let $X_{1}, X_{2}, \dots$ be a sequence of random variables with cumulative distribution functions $F_{1}, F_{2}, \dots$ and let $X$ be a random variable with distribution function $F$. We say that $X_{n}$ converges in distribution to $X$ if $$\lim_{n \to \infty} F_{n}(x) = F(x)$$ at every point at which $F$ is continuous.
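A standard example (my own addition, not from Rice) of why the continuity caveat matters: let $X_{n} = 1/n$ with probability one and $X = 0$. Then $$F_{n}(x) = \begin{cases} 0 & x < 1/n \\ 1 & x \geq 1/n \end{cases} \qquad F(x) = \begin{cases} 0 & x < 0 \\ 1 & x \geq 0 \end{cases}$$ so $F_{n}(x) \to F(x)$ for every $x \neq 0$, but $F_{n}(0) = 0$ for all $n$ while $F(0) = 1$. The one exceptional point, $x = 0$, is exactly where $F$ is discontinuous, so $X_{n}$ still converges in distribution to $X$.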
With that here is the form of the theorem I'm using:
Central Limit Thm:
Let $X_{1}, X_{2}, \dots$ be a sequence of independent random variables having mean $\mu$ and variance $\sigma^{2}$ and the common distribution function $F$ and moment generating function $M$ defined in a neighbourhood of zero. Let $$S_{n} = \frac{1}{n}\sum_{i = 1}^{n}X_{i}$$ Then $$\lim_{n \to \infty} P \bigg(\frac{S_{n} - \mu}{\sigma/ \sqrt{n}} \leq x \bigg) = \Phi(x), \quad -\infty < x < \infty$$ where $\Phi(x)$ is the standard normal distribution function.
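As a concrete instance (my own illustration, not part of Rice's statement): if the $X_{i}$ are i.i.d. Bernoulli($p$), then $\mu = p$ and $\sigma^{2} = p(1-p)$, and the theorem reads $$\lim_{n \to \infty} P \bigg( \frac{S_{n} - p}{\sqrt{p(1-p)/n}} \leq x \bigg) = \Phi(x)$$ which is the classical normal approximation to a proportion of coin flips.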
For my purposes the existence of the moment generating function is not important, because I'm just trying to understand the theorem, not prove things here. I should also mention I'm finishing up Spivak's *Calculus*, so some of the ideas in my explanation come from there.
My Thought Process of What's Happening:
We are given a sequence of random variables $X_{1}, X_{2}, \dots$. We are not sure whether this sequence converges or not; one thing we are interested in is whether their partial sums eventually converge to a limit. The partial sums were defined by:
$$S_{n} = \frac{1}{n}\sum_{i = 1}^{n}X_{i}$$
These partial sums, $S_{n}$ can be seen as a sequence in themselves and they are also random variables as they are a function of the $X_{i}$. The Law of Large Numbers shows that this sequence converges to $\mu$. That is:
$$\lim_{n \to \infty}S_{n} = \lim_{n \to \infty}\bigg(\frac{1}{n}\sum_{i = 1}^{n}X_{i}\bigg) = \mu$$
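The Law of Large Numbers statement above can be checked numerically. A minimal sketch, assuming $X_{i} \sim \mathrm{Uniform}(0,1)$ (my own choice of distribution, so $\mu = 0.5$):

```python
import random
import statistics

random.seed(0)

def sample_mean(n):
    """S_n = (1/n) * sum of n iid Uniform(0,1) draws, so mu = 0.5."""
    return statistics.fmean(random.random() for _ in range(n))

# As n grows, S_n settles near mu = 0.5 (Law of Large Numbers).
for n in (10, 1_000, 100_000):
    print(n, round(sample_mean(n), 4))
```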
From this we ask: how do the values of our sequence of random variables, $S_{n}$ (question 1), fluctuate around the value $\mu$?
To determine this we standardize our random variables, $S_{n}$:
$$Z_{n} = \frac{S_{n} - \mu}{\sigma/\sqrt{n}}$$
This again is a sequence of random variables, each a function of the random variable $S_{n}$. We know that $S_{n} \to \mu$ as $n \to \infty$. By calculus, since $Z_{n}$ is a continuous function of $S_{n}$, this means
$$ Z_{1}(S_{1}), Z_{2}(S_{2}),Z_{3}(S_{3}), \dots \to Z(\mu)\ \textbf{question 2}$$
The CLT is a statement about probability; in other words, it expresses the probability of a realization of a random variable occurring. This is where the cumulative distribution functions come into play. So given a sequence of distribution functions $F_{1}, F_{2}, \dots$ and a distribution function $F$, what I'm envisioning occurring, based on the convergence of the $Z_{n}$, is:
$$\begin{array}{ccccc} F_{1}(Z_{1}) & F_{1}(Z_{2}) & \dots & \to & F_{1}(Z) \\ F_{2}(Z_{1}) & F_{2}(Z_{2}) & \dots & \to & F_{2}(Z) \\ \vdots & \vdots & \dots & \to & \vdots \\ F(Z_{1}) & F(Z_{2}) & \dots & \to & F(Z) \end{array}$$
The reason is that the $F_{i}$ are continuous functions, and by calculus, if $Z_{n} \to Z$ then $F(Z_{n}) \to F(Z)$. The value that $F(Z)$ takes on at the end of the convergence is $\Phi(x)$. Thus:
$$\lim_{n \to \infty} P \bigg(\frac{S_{n} - \mu}{\sigma/ \sqrt{n}} \leq x \bigg) = \lim_{n \to \infty} F_{n}(Z) = \Phi(x)$$
My questions:
(Marked as question 1 in the write-up): I wrote $S_{n}$, but are we looking at the behaviour of the $X_{i}$ or the behaviour of $S_{n}$ around $\mu$?
(Marked as question 2 in the write-up): This has to do with $\lim_{n \to \infty}Z_{n}$. Since we know by the Law of Large Numbers that $\lim_{n \to \infty}S_{n} = \mu$, wouldn't that mean that $\lim_{n \to \infty}Z_{n} = 0$ all the time?
The CLT revolves around the behaviour of the random variables around $\mu$. I can "see" the idea that since my $X_{i}$ are random variables, the $S_{n}$ should also be random, but since we are taking things to the limit we end back up at the constant value $\mu$. What am I missing in my understanding of this process?
Did I explain the ideas relatively correctly?
EDIT: Based on the discussion I've been having with @apprentice, here is another way to frame my issues:
Here is a simple example of what the CLT is doing. Say we have a population: we take a sample from it, compute the sample's average, and plot it on a graph. We repeat this process many times, and by the end of it, according to the CLT, the distribution of those mean values will be (approximately) normal. Now I'm trying to take that idea and translate it into the technical jargon used to define the CLT rigorously.
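The sampling experiment described above can be sketched in code. This is my own toy setup (Uniform(0,1) data, so $\mu = 0.5$ and $\sigma^{2} = 1/12$), comparing the empirical CDF of $Z_{n}$ against $\Phi(x)$:

```python
import math
import random
import statistics

random.seed(0)
mu, sigma = 0.5, math.sqrt(1 / 12)   # Uniform(0,1) mean and sd
n, trials = 50, 20_000               # sample size and number of repetitions

# Repeat: draw a sample of size n, compute the standardized mean Z_n.
z = []
for _ in range(trials):
    s_n = statistics.fmean(random.random() for _ in range(n))
    z.append((s_n - mu) / (sigma / math.sqrt(n)))

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Empirical P(Z_n <= x) should be close to Phi(x) at every x.
for x in (-1.0, 0.0, 1.0):
    emp = sum(zi <= x for zi in z) / trials
    print(f"x = {x:+.1f}  empirical = {emp:.3f}  Phi(x) = {phi(x):.3f}")
```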
Regarding the first question: we are indeed looking at the behavior of the partial sum $S_n$, but this of course depends on the behavior of the random sequence $\{X_i\}_{i \geq 1}$ in hand.
For the second question, again you are correct. Strictly speaking, under the conditions that ensure the CLT holds (in fact under even weaker conditions), we have $Z_n \to 0$ almost surely (and thus in probability).
On the third question: yes, the partial-sum process $S_n$ is still random, but we're just saying that it converges to a fixed number almost surely as $n \to \infty$. I don't think you're missing anything in particular - it's just that the "averaging out" absorbs all the noisiness as $n \to \infty$. However, if we rescale the deviation $S_n - \mu$ by $\sqrt{n}$, which is "just the right amount", we get a nondegenerate normal random variable as a limit.
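A quick sketch of that shrinkage (my own, again assuming Uniform(0,1) data): the spread of $S_n$ decays like $\sigma/\sqrt{n}$, which is exactly what the $\sqrt{n}$ rescaling undoes.

```python
import math
import random
import statistics

random.seed(1)
mu, sigma = 0.5, math.sqrt(1 / 12)   # Uniform(0,1) parameters

def sd_of_mean(n, trials=5_000):
    """Monte Carlo estimate of the standard deviation of S_n."""
    means = [statistics.fmean(random.random() for _ in range(n))
             for _ in range(trials)]
    return statistics.stdev(means)

# Observed spread of S_n vs. the theoretical sigma / sqrt(n).
for n in (25, 100, 400):
    print(n, round(sd_of_mean(n), 4), round(sigma / math.sqrt(n), 4))
```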
Regarding the last question, I think you're on a good path to understanding the concepts! However, going through a nice measure-theory-based probability/statistics book should help solidify your understanding.
[EDIT] I misread the question; I thought $Z_i$ was the standardized version of $X_i$, not of the partial sum. Under bounded second moments of the original sequence (which we are already implicitly assuming by dividing by $\sigma$), we have $Z_n \to N(0,1)$ in distribution. Thus, $Z_n$ does not converge to any fixed number.
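A sketch of this last point (my own, with Uniform(0,1) data): the spread of $Z_n$ stays near 1 for every $n$, so it cannot be collapsing to the constant 0.

```python
import math
import random
import statistics

random.seed(2)
mu, sigma = 0.5, math.sqrt(1 / 12)   # Uniform(0,1) parameters

def z_sample(n, trials=5_000):
    """Draw `trials` realizations of Z_n = (S_n - mu) / (sigma / sqrt(n))."""
    zs = []
    for _ in range(trials):
        s_n = statistics.fmean(random.random() for _ in range(n))
        zs.append((s_n - mu) / (sigma / math.sqrt(n)))
    return zs

# Unlike S_n, whose spread shrinks to 0, Z_n keeps a spread of about 1.
for n in (10, 100, 1000):
    print(n, round(statistics.stdev(z_sample(n)), 3))
```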