I'd like to know the best examples that are simple and easy to understand, but which also capture the essence and the spirit of statistical modeling. What are some simple but also fundamental and illuminating examples of statistical modeling?
Edit: Here is another way to phrase the question: If a student asked you what statistical modeling is, what examples would you tell them? You would want the examples to somehow capture the essence of the subject without being too complicated.
Edit 2: I'll attempt to provide an example myself. What fraction of the population is planning to vote for candidate A? We introduce a random variable $X$ that is the result of selecting a person at random from the population and checking whether or not the person is planning to vote for A. If the person is planning to vote for A then $X = 1$, otherwise $X = 0$. We make a modeling assumption that $X$ has a Bernoulli distribution with parameter $p$. This is a simple but concrete and fundamental example of a statistical model.
Suppose that we select $n$ people at random from the population (with replacement) and the random variable $X_i$ is $1$ if the $i$th person is planning to vote for A, and zero otherwise. Then $$ \hat p = \frac{X_1 + \cdots + X_n}{n} $$ estimates the value of $p$. When we estimate the parameter $p$ in this way we have performed statistical inference.
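The estimator $\hat p$ can be tried out on simulated data. The following is a minimal Python sketch, not part of the original example: the function name `estimate_p`, the sample size, the seed, and the "true" value $p = 0.4$ are all made up for illustration (in a real poll $p$ is unknown).

```python
import random

def estimate_p(true_p, n, seed=0):
    """Draw n Bernoulli(true_p) responses and return the sample proportion."""
    rng = random.Random(seed)
    # Each X_i is 1 with probability true_p (plans to vote for A), else 0.
    xs = [1 if rng.random() < true_p else 0 for _ in range(n)]
    return sum(xs) / n

# true_p = 0.4 is a hypothetical value chosen only for this illustration.
p_hat = estimate_p(true_p=0.4, n=10_000)
print(f"p_hat = {p_hat:.3f}")
```

With $n = 10{,}000$ the standard error of $\hat p$ is about $\sqrt{0.4 \cdot 0.6 / 10000} \approx 0.005$, so the printed estimate should land close to $0.4$.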
I think this little example contains the key ideas of statistical modeling and statistical inference. A student can think of this example and say, "Ah, now I know what statistical modeling and statistical inference are."
But please correct me if you have any disagreements, or if I've used any terms incorrectly, as I'm a bit of an outsider to the field of statistics.
I'd be interested in hearing other examples like this that are basic but fundamental and illuminating.
As the comments indicate, the question is vague. Perhaps the following example is in the general direction you have in mind. If not, your comments and the ensuing discussion may encourage other answers.
Suppose a communications satellite has three CPUs connected (in parallel) so that the satellite 'dies' only when all three CPUs are disabled. Also suppose that a reasonable model for the length of life of an individual CPU is $\mathsf{Exp}(rate = 1/3)$ so that its average time to failure (possibly a hit by a cosmic ray particle) is 3 years.
If $X_i,\; i = 1,2,3,$ are the independent lifetimes of the CPUs, then the lifetime of the satellite is $W = \max(X_i),$ where, for $w > 0,$ $$F_W(w) = P(X_1 \le w, X_2 \le w, X_3 \le w) = [P(X_i \le w)]^3 = (1 - e^{-(1/3)w})^3.$$
[Of course, this only accounts for death of CPUs from cosmic rays. A full reliability model would be more complicated taking into account solar storms, direct hits by space junk, and so on. But with suitable data those risks can be modeled similarly and combined.]
Thus the probability the satellite avoids death from cosmic rays for more than 5 years is $$1 - F_W(5) = 1 - (1 - e^{-5/3})^3 \approx 0.4663.$$
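The arithmetic can be checked numerically. This is a small Python sketch using only the standard library; the names `RATE`, `N_CPUS`, and `satellite_cdf` are illustrative choices, not part of the original answer.

```python
import math

RATE = 1 / 3   # failure rate per CPU (mean lifetime 3 years), as in the text
N_CPUS = 3     # CPUs connected in parallel

def satellite_cdf(w, rate=RATE, k=N_CPUS):
    """CDF of W = max of k independent Exp(rate) lifetimes."""
    if w <= 0:
        return 0.0
    return (1 - math.exp(-rate * w)) ** k

# Probability that the satellite survives more than 5 years.
survival_5 = 1 - satellite_cdf(5)
print(round(survival_5, 4))
```

The printed value should agree with the $0.4663$ obtained above.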
The wait for the first CPU to be destroyed by a cosmic ray is $V = \min(X_i),$ whose distribution can also be found by a CDF argument: $P(V \le v) = 1-P(V > v) = 1 - (e^{-(1/3)v})^3 = 1 - e^{-v},$ so that $V \sim \mathsf{Exp}(rate = 3(1/3) = 1).$ Thus $E(V) = 1.$
The average lifetime of the three CPUs (based on cosmic ray hits), $\bar X = (X_1 + X_2 + X_3)/3,$ can be shown by moment generating functions to have the distribution $\bar X \sim \mathsf{Gamma}(shape=3, rate=1),$ so that $E(\bar X) = 3.$
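The moment generating function step can be written out as follows, with $\bar X = (X_1 + X_2 + X_3)/3$ denoting the mean of the three CPU lifetimes. This is a sketch of the standard argument, using the independence of the $X_i$:

$$M_{X_i}(t) = \frac{1/3}{1/3 - t} = \frac{1}{1 - 3t}, \quad t < 1/3,$$

$$M_{\bar X}(t) = E\left[e^{t(X_1 + X_2 + X_3)/3}\right] = \left[M_{X_i}(t/3)\right]^3 = \left(\frac{1}{1 - t}\right)^3, \quad t < 1.$$

The last expression is the moment generating function of $\mathsf{Gamma}(shape = 3, rate = 1),$ whose mean is $3/1 = 3.$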
One can find $E(W)$ by getting the PDF of $W$ from its CDF and using the definition of the expected value. Alternatively, if the three CPUs have the same failure rate, then one can use the no-memory property of exponential distributions to argue that $E(W) = 1 + 3/2 + 3,$ where the average wait for the first failure is $1$, then the average additional wait for the second is $3/2,$ and the average additional wait for the third is $3;$ for a total of $E(W) = 5.5.$
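The direct computation can be sketched as follows. As a shortcut standing in for the PDF route, it uses the tail formula $E(W) = \int_0^\infty P(W > w)\,dw,$ which holds for any nonnegative random variable:

$$E(W) = \int_0^\infty \left[1 - (1 - e^{-w/3})^3\right] dw = \int_0^\infty \left(3e^{-w/3} - 3e^{-2w/3} + e^{-w}\right) dw = 9 - \frac{9}{2} + 1 = 5.5,$$

in agreement with the no-memory argument.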
Pedagogically, I think this example has three advantages: (a) It is sufficiently accurate to have been used in practice. (b) It illustrates methods of finding distributions of averages, maximums, and minimums, and the no-memory property of the exponential distribution. (c) In cases where slightly messier lifetime distributions are required, one can use simulation to get reliable approximate answers.
Below is a simple simulation in R of a million 3-CPU satellites with exponential components. (A million iterations is enough to give about two or three place accuracy.)
Histograms show approximate distributions of the max, min, and mean; the density functions of the minimum and the mean are shown in red.
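A simulation of this kind can be sketched in Python with only the standard library. This is an illustrative stand-in under stated assumptions, not the R code referred to above: it uses 200,000 satellites rather than a million to keep pure-Python run time modest, an arbitrary seed, and prints numerical summaries instead of drawing histograms. The function name `simulate_satellites` is hypothetical.

```python
import random
import statistics

def simulate_satellites(n_sats=200_000, rate=1 / 3, n_cpus=3, seed=2024):
    """Simulate n_sats satellites, each with n_cpus independent Exp(rate) CPUs.

    Returns three lists: the max, min, and mean CPU lifetime per satellite.
    """
    rng = random.Random(seed)
    maxs, mins, means = [], [], []
    for _ in range(n_sats):
        # random.expovariate takes the rate (1 / mean) as its argument.
        lifetimes = [rng.expovariate(rate) for _ in range(n_cpus)]
        maxs.append(max(lifetimes))             # W: satellite lifetime
        mins.append(min(lifetimes))             # V: wait for first CPU failure
        means.append(sum(lifetimes) / n_cpus)   # X-bar: mean CPU lifetime
    return maxs, mins, means

maxs, mins, means = simulate_satellites()

est_mean_max = statistics.fmean(maxs)    # compare with E(W) = 5.5
est_mean_min = statistics.fmean(mins)    # compare with E(V) = 1
est_mean_avg = statistics.fmean(means)   # compare with E(X-bar) = 3
frac_over_5 = sum(w > 5 for w in maxs) / len(maxs)  # compare with 0.4663

print(f"mean of max  ~ {est_mean_max:.2f}")
print(f"mean of min  ~ {est_mean_min:.2f}")
print(f"mean of mean ~ {est_mean_avg:.2f}")
print(f"P(W > 5)     ~ {frac_over_5:.3f}")
```

The simulated values should be close to the exact results $E(W) = 5.5,$ $E(V) = 1,$ $E(\bar X) = 3,$ and $P(W > 5) \approx 0.4663.$ To handle a messier lifetime distribution, only the line that draws `lifetimes` would need to change.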
Acknowledgment: Similar to examples and problems in Seuss (2010).