Background
In Equation (2.45) of Introduction to Econometrics, Global Edition, by Stock and Watson, it says
$E(\overline{Y}) = \frac{1}{n}\sum_{i=1}^{n}E(Y_i) = \mu_Y$
where observations $Y_1, \cdots, Y_n$ are i.i.d., and $\mu_Y$ and $\sigma^2_Y$ denote the mean and variance of $Y_i$.
What I don't understand
If $E(\overline{Y})$ in the equation is supposed to be the "mean of means", why must the number of samples be the same as the size of each sample? I am referring to the fraction $\frac{1}{n}$ in front of the summation $\sum_{i=1}^{n}E(Y_i)$.
Isn't the "argument" (in programming terms) for $E$ supposed to be multiset or bag which represent a series of observation, and not individual observation? Inside the summation of the above equation, why is $E(Y_i)$, representing the "mean" of individual observation permissible? It makes no sense to me. In programming terms, it wouldn't even compile.
The issue is not so much about the use of the term "observations" versus "samples" or "random variables" or "realizations," but rather, a misconception about the meaning of the expectation operator $\operatorname{E}$.
Suppose I have a single random variable, say $X$, that follows some probability distribution. To make things very simple, let's suppose this distribution is $$\Pr[X = 2] = 0.3, \quad \Pr[X = 5] = 0.7.$$ If I ask you, "what is $\operatorname{E}[X]$," what would you do? Would you insist that you cannot calculate it because you have no sample?
Of course, $$\operatorname{E}[X] = 2 \Pr[X = 2] + 5 \Pr[X = 5] = (2)(0.3) + (5)(0.7) = 4.1.$$ This is not an empirical result; it does not depend on any observation of $X$ whatsoever.
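To make the point concrete in the questioner's own "programming terms", here is a short Python sketch that computes $\operatorname{E}[X]$ directly from the distribution above. Note that no sample is drawn anywhere; the expectation is a weighted sum over the distribution's support:

```python
# The two-point distribution from the example: Pr[X = 2] = 0.3, Pr[X = 5] = 0.7.
values = [2, 5]
probs = [0.3, 0.7]

# E[X] is a property of the distribution itself: a probability-weighted
# sum over the possible values. No observations are involved.
expectation = sum(v * p for v, p in zip(values, probs))
print(expectation)  # approximately 4.1
```

The "argument" of $\operatorname{E}$ here is the distribution, not a bag of observed data points.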
Now suppose I have two of these random variables, say $X_1$ and $X_2$, independent and identically distributed as $X$. What would you say about $\operatorname{E}[X_1 + X_2]$? We could either explicitly compute the probability distribution of the sum $X_1 + X_2$; i.e., $$\begin{align} \Pr[X_1 + X_2 = 4] &= \Pr[X_1 = 2]\Pr[X_2 = 2] = 0.09 \\ \Pr[X_1 + X_2 = 7] &= \Pr[X_1 = 2]\Pr[X_2 = 5] + \Pr[X_1 = 5]\Pr[X_2 = 2] = 0.42 \\ \Pr[X_1 + X_2 = 10] &= \Pr[X_1 = 5]\Pr[X_2 = 5] = 0.49 \\ \end{align}$$ hence $$\operatorname{E}[X_1 + X_2] = (4)(0.09) + (7)(0.42) + (10)(0.49) = 8.2,$$ or we can use linearity of expectation: $$\operatorname{E}[X_1 + X_2] = \operatorname{E}[X_1] + \operatorname{E}[X_2] = 4.1 + 4.1 = 8.2.$$ Thus the expectation of a sample mean of size $n = 2$ is $$\operatorname{E}\left[\frac{X_1 + X_2}{2}\right] = \frac{1}{2}\operatorname{E}[X_1 + X_2] = 4.1.$$
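The explicit computation above can also be carried out mechanically, by enumerating the independent pairs $(X_1, X_2)$ to build the distribution of the sum, then taking its expectation. A sketch, using the same two-point distribution:

```python
from itertools import product

# Pr[X = 2] = 0.3, Pr[X = 5] = 0.7, with X1 and X2 i.i.d. copies of X.
dist = {2: 0.3, 5: 0.7}

# Distribution of X1 + X2: enumerate all (value, prob) pairs and
# accumulate the product probabilities, since X1 and X2 are independent.
sum_dist = {}
for (x1, p1), (x2, p2) in product(dist.items(), repeat=2):
    sum_dist[x1 + x2] = sum_dist.get(x1 + x2, 0.0) + p1 * p2

# E[X1 + X2] from the derived distribution, and E[(X1 + X2)/2] by linearity.
e_sum = sum(s * p for s, p in sum_dist.items())
e_mean = e_sum / 2
```

Running this reproduces the numbers in the derivation: `sum_dist` is $\{4: 0.09,\ 7: 0.42,\ 10: 0.49\}$, `e_sum` is $8.2$, and `e_mean` is $4.1$.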
And in the general case of an arbitrary sample size, the same idea holds. The expectation operator is not calculating a sample mean. It's calculating a distributional property, a measure of central tendency of the underlying distribution. If I write $$\operatorname{E}[\bar X] = \operatorname{E}\left[\frac{X_1 + \cdots + X_n}{n}\right],$$ this represents a distributional property of the sampling distribution of the sample mean. That is to say, just like how $X_1 + X_2$ has its own probability distribution (that we can derive from $X$), the sample mean $\bar X$ has a probability distribution. $\operatorname{E}[\bar X]$ is the expected value of that distribution.
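One way to build intuition for the sampling distribution of $\bar X$ is simulation: draw many samples of size $n$, compute each sample's mean, and average those means. The average should sit near $\mu_Y = 4.1$ for every $n$, because $\operatorname{E}[\bar X] = \mu_Y$ regardless of sample size. A rough Monte Carlo sketch (the replication count is an arbitrary choice):

```python
import random

random.seed(0)
values, probs = [2, 5], [0.3, 0.7]

# For each sample size n, simulate many samples, take each sample's
# mean, and average those means. This approximates E[X-bar], the
# expected value of the sampling distribution of the sample mean.
averages = {}
for n in (2, 10, 100):
    reps = 20_000
    avg = sum(
        sum(random.choices(values, probs, k=n)) / n for _ in range(reps)
    ) / reps
    averages[n] = avg
    print(n, round(avg, 2))  # each close to 4.1
```

What changes with $n$ is the spread of the sampling distribution (its variance is $\sigma^2_Y/n$), not its center.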
An "observation" for which we have no numerical result remains a random variable for the purposes of calculating distributional properties, except where noted otherwise. It doesn't matter if I call $X_1$ an "observation" or a random variable if I want to compute $\operatorname{E}[X_1 + X_2 + \cdots + X_n]$. The context is clear: I want an expectation with respect to all of the variables $X_1, \ldots, X_n$. If I want to treat $X_1$ as a fixed but unknown value, I would write $$\operatorname{E}[X_1 + \cdots + X_n \mid X_1],$$ and this would be a function of $X_1$. But in no case should you be thinking about $\operatorname{E}$ as being some empirical mean of some hypothetical sample drawn from a probability distribution. $\operatorname{E}[X]$ is not a statistic and not a random variable.
It is strange, because this is the second recent instance I have seen of someone confusing estimators with distributional properties (i.e., parameters). This should not happen for students of probability and statistics; it is critically important to disambiguate the two before proceeding any further.