Imagine a betting game where we observe $N$ independent coin flips $x_1, \dots, x_N$ (where each $x_i \in \{H, T\}$) from the same coin, whose true weight is $\theta$. The task is to predict how many Heads we will get in the next $M$ flips of the same coin. The closer your guess is to the true number of Heads, the higher your payoff. For example, the reward could be $\frac{r}{|Guess_H - True_H| + 1}$, where $Guess_H$ is your guessed number of Heads in the next $M$ flips, $True_H$ is the actual number of Heads in those $M$ flips, and $r$ is some value with $r > 1$.
What is the optimal strategy? How do you show formally which of the following two strategies is better?
Strategy 1: estimate the coin weight $\hat{\theta}$ from the first $N$ flips. If $\hat{\theta} > 0.5$, predict all Heads for the next $M$ flips ($M$ Heads); if $\hat{\theta} < 0.5$, predict all Tails for the next $M$ flips ($0$ Heads).
Strategy 2: estimate the coin weight $\hat{\theta}$ from the first $N$ flips. Predict $\hat{\theta} M$ Heads (rounded to the nearest integer) for the next $M$ flips.
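As a sanity check of the setup, here is a minimal Monte Carlo sketch of the game under the reward $r/(|Guess_H - True_H| + 1)$. The function names, the seeding, and the rounding in strategy 2 are my own choices, not part of the question:

```python
import random

def play(theta, N, M, r, strategy, rng):
    """Play one round: estimate theta from N flips, guess, then score."""
    flips = sum(rng.random() < theta for _ in range(N))
    theta_hat = flips / N
    if strategy == 1:
        guess = M if theta_hat > 0.5 else 0      # all Heads or all Tails
    else:
        guess = round(theta_hat * M)             # fraction of flips
    true_h = sum(rng.random() < theta for _ in range(M))
    return r / (abs(guess - true_h) + 1)

def mean_reward(theta, N, M, r, strategy, trials=50_000, seed=0):
    """Average reward of a strategy over many simulated rounds."""
    rng = random.Random(seed)
    return sum(play(theta, N, M, r, strategy, rng) for _ in range(trials)) / trials
```

For $\theta = 0.75$, $N = 100$, $M = 10$, the simulated average reward of strategy 2 comes out clearly above that of strategy 1.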
Which strategy is better, and can this be shown formally? Here is my attempt.
Expected reward for strategy 1:

Write $\mathrm{Bin}(k; M, \theta) = \binom{M}{k}\theta^k(1-\theta)^{M-k}$ for the binomial pmf, and assume $\theta = 0.75$, $M = 10$. The full expected reward of a guess $g$ is $E(\text{reward} \mid g) = \sum_{k=0}^{M} \mathrm{Bin}(k; M, \theta)\,\frac{r}{|g - k| + 1}$; keeping only the exact-match term (the probability of guessing exactly right, times $r$):

$E(\text{reward} \mid \text{strategy 1}) \approx \mathrm{Bin}(10; 10, 0.75)\,r$

Expected reward for strategy 2 (which guesses $\mathrm{round}(0.75 \cdot 10) = 8$ Heads):

$E(\text{reward} \mid \text{strategy 2}) \approx \mathrm{Bin}(8; 10, 0.75)\,r$

This suggests that for this case strategy 2 is better, since:

$\mathrm{Bin}(8; 10, 0.75) \approx 0.282 > \mathrm{Bin}(10; 10, 0.75) \approx 0.056$
How can this be shown analytically and in general, without assuming particular values for $M$ and the true $\theta$?
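For any given $M$ and $\theta$, the comparison can also be made exact rather than approximate by computing the full expected reward $\sum_{k=0}^{M} \mathrm{Bin}(k; M, \theta)\, r/(|g - k| + 1)$ for every possible guess $g$. A short sketch (the function names are my own):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def expected_reward(guess, M, theta, r=1.0):
    """E[ r / (|guess - K| + 1) ] with K ~ Binomial(M, theta)."""
    return sum(binom_pmf(k, M, theta) * r / (abs(guess - k) + 1)
               for k in range(M + 1))

# For M = 10, theta = 0.75, the expected reward is maximized at the
# guess round(theta * M) = 8, not at the all-Heads guess of 10.
best = max(range(10 + 1), key=lambda g: expected_reward(g, 10, 0.75))
```

This brute-force check confirms the ranking for concrete parameter values, though it does not by itself give the general analytic argument asked for.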
If $N$ and $M$ are large, you expect the estimate from the first part to be a good measure of $\theta$; then you expect a fraction $\theta$ of the tosses to be Heads, so your prediction should be $\theta M$, i.e., strategy 2. If they are large enough that the normal approximation is good, you can use it: the all-Heads guess will then be several standard deviations away from the mean. If $N, M$ are smaller, you have a lot of algebra ahead of you.
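To quantify "several standard deviations away": under the normal approximation the count of Heads in $M$ flips has mean $\theta M$ and standard deviation $\sqrt{M\theta(1-\theta)}$, so the distance of the all-Heads guess from the mean can be measured in those units. A small sketch (the choice $M = 100$, $\theta = 0.75$ is illustrative, not from the question):

```python
from math import sqrt

def z_all_heads(M, theta):
    # Distance of the all-Heads guess (k = M) from the binomial mean
    # theta*M, in standard deviations sqrt(M*theta*(1-theta)).
    return (M - theta * M) / sqrt(M * theta * (1 - theta))
```

For $M = 100$, $\theta = 0.75$ this gives about $5.8$ standard deviations; the distance grows like $\sqrt{M}$, so for large $M$ the all-Heads guess is essentially never near the realized count.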