Let's say we have just one coin. Before observing any tosses, we start with a flat prior for the probability of heads, $p$: in other words, a uniform distribution, i.e. a Beta with parameters $1$ and $1$. If we manage to see a million coin tosses in advance with roughly equal numbers of heads and tails, the prior becomes a Beta that is highly peaked around $0.5$. Now if someone offers us the chance to see $10$ more tosses, we can safely pass, since our prior distribution already gives us an almost point estimate for $p$.
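As a quick sketch of this update (the conjugate Beta-Binomial rule: a Beta$(a,b)$ prior plus $h$ heads and $t$ tails gives a Beta$(a+h, b+t)$ posterior), assuming exactly half of a million tosses came up heads:

```python
import math

# A Beta(a, b) prior updated with h heads and t tails yields a Beta(a+h, b+t) posterior.
def beta_posterior(a, b, heads, tails):
    return a + heads, b + tails

# Flat prior Beta(1, 1), then a million tosses split evenly.
a, b = beta_posterior(1, 1, 500_000, 500_000)

# Mean and standard deviation of a Beta(a, b) distribution.
mean = a / (a + b)
std = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
print(mean, std)  # mean = 0.5, std ≈ 0.0005: essentially a point mass at 0.5
```

Ten more tosses would shift $a$ and $b$ by at most $10$ out of a million, which is why we can safely decline them.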
Now, let's say there is a machine that outputs only two kinds of coins: it returns either a coin with $p=0.1$ or a coin with $p=0.9$, each with probability $1/2$. We don't know any of this, and we are given the chance to do whatever we want with the machine for $24$ hours. So we make it generate a thousand coins and toss each of them a thousand times (again generating a million tosses in total). Across these million tosses, the numbers of heads and tails are roughly equal. The next day, the machine generates a coin and we're invited to play a game that involves tossing this coin. Someone offers to toss it $10$ times before the game. Now, these $10$ tosses will help us a lot, since they'll tell us which kind of coin the machine generated.
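A minimal simulation of this scenario (the `machine` function is just a stand-in for the coin dispenser described above) shows both effects: the million aggregated tosses land near $50\%$ heads, yet $10$ tosses of a fresh coin are highly informative, because the posterior odds of "$p=0.9$" versus "$p=0.1$" after $k$ heads in $10$ tosses equal the likelihood ratio $(0.9/0.1)^k (0.1/0.9)^{10-k} = 9^{2k-10}$:

```python
import random

random.seed(0)

# The machine: a coin with p = 0.1 or p = 0.9, each with probability 1/2.
def machine():
    return random.choice([0.1, 0.9])

# Day one: 1,000 coins, each tossed 1,000 times.
total_heads = 0
for _ in range(1000):
    p = machine()
    total_heads += sum(random.random() < p for _ in range(1000))
print(total_heads / 1_000_000)  # ≈ 0.5: the aggregate reveals nothing about coin types

# Day two: a fresh coin, 10 tosses. With equal prior probabilities, the
# posterior odds of p = 0.9 vs p = 0.1 given k heads are 9^(2k - 10).
p = machine()
k = sum(random.random() < p for _ in range(10))
odds = 9 ** (2 * k - 10)
print(k, odds)  # each extra head multiplies the odds by 81
```

Even a modest imbalance in those $10$ tosses makes one coin type overwhelmingly more likely than the other.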
It's clear that in the second scenario, the prior distribution we have isn't very helpful while in the first, it's incredibly helpful. So what is the prior distribution in the second case and how does it manage to remain unhelpful despite a million tosses?
I think we need to clearly spell out the entity to which the prior/posterior distributions apply. First you have a machine that produces coins with a distribution $D_m$, and next you sample a coin $C$ from it with a distribution $D_C = \text{Bernoulli}(p_C)$, where $p_C$ is the probability of heads for coin $C$ (the small $m$ here indicates the one and only machine we have, while $C = c_1, c_2, \ldots$ takes on multiple values, one for each coin sampled from it). Note that both $D_m$ and $D_C$ are possibly unknown, and we may want to impose priors on them that are updated as coins are tossed.
Extreme 1: Consider the extreme where a million coins are drawn from the machine and each is tossed once. For a moment assume we have an oracle which, given any coin $C$, returns its $p_C$ (probability of heads). With the oracle we would learn $D_m$ extremely well, since we have a million samples from it; without it, a single toss per coin tells us very little about any individual $p_C$.
Extreme 2: In the other extreme, a single coin $C=c$ is tossed a million times (your first scenario). Then we know everything about $c$, including $p_c$, but nothing about $D_m$, since we sampled from it only once.
Middle ground: In between these extremes lies the second scenario, where a thousand coins are each tossed a thousand times; this is closer to the optimum for learning $D_m$. You have drawn 1,000 samples (coins) from $D_m$, and for each you have a very good estimate of $p_C$.
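To sketch why this middle ground works (again using a hypothetical `machine` with $p \in \{0.1, 0.9\}$ as in the scenario): with 1,000 tosses per coin, each empirical frequency $\hat{p}$ pins down that coin's $p_C$ almost exactly, and the collection of $\hat{p}$ values then reveals $D_m$ itself.

```python
import random

random.seed(1)

# Hypothetical machine from the scenario: p = 0.1 or p = 0.9, equally likely.
def machine():
    return random.choice([0.1, 0.9])

# 1,000 coins, 1,000 tosses each: per-coin estimates of p_C.
p_hats = []
for _ in range(1000):
    p = machine()
    heads = sum(random.random() < p for _ in range(1000))
    p_hats.append(heads / 1000)

# The p_hat values cluster tightly around 0.1 and 0.9, exposing D_m.
low = sum(ph < 0.5 for ph in p_hats)
print(low / 1000, 1 - low / 1000)  # ≈ 0.5 and 0.5: two coin types, equally likely
```

Keeping tosses per coin at 1,000 (standard error $\approx 0.01$) while still sampling 1,000 coins is what lets us estimate both levels of the hierarchy at once.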
Back to your question:
This is correct, since we're able to learn $D_m$ well.
This is correct: since we can't learn $D_m$, the prior's information is all we have.
Side notes: