One generalization of the Bernoulli trials hierarchy in Example 4.4.6 is to allow the success probability to vary from trial to trial while keeping the trials independent. A standard model for this situation is
$X_{i}|P_{i}\sim Bernoulli(P_{i}),\quad i=1,2,\dots,n$
$P_{i}\sim beta(\alpha,\beta)$
A random variable of interest is $Y=\sum_{i=1}^{n}X_{i}$, the total number of successes.
We are asked to find the expectation and variance of $Y$.
For expectation, we have $\mathbb{E}(Y)=\mathbb{E}\sum_{i=1}^{n}X_{i}=\sum_{i=1}^{n}\mathbb{E} X_{i}$
Because each $X_{i}|P_{i}$ is Bernoulli($P_{i}$), we have $\mathbb{E}(X_{i})=\mathbb{E}(\mathbb{E}(X_{i}|P_{i}))$ where $\mathbb{E}(X_{i}|P_{i})=P_{i}$
Also, from the properties of the beta distribution, we have $\mathbb{E}(P_{i}^{k})=\frac{\Gamma(\alpha+k)\Gamma(\alpha+\beta)}{\Gamma(\alpha+\beta+k)\Gamma(\alpha)}$, yielding $\mathbb{E}(P_{i}^{2})=\frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)}$ and $\mathbb{E}(P_{i})=\frac{\alpha}{\alpha+\beta}$
Thus, $\mathbb{E}(Y)=n\mathbb{E}(P_{i})=\frac{n\alpha}{\alpha+\beta}$, which completes the first part of the problem.
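As a quick sanity check on this expectation, here is a small Monte Carlo sketch of the hierarchy (the parameter values $n=10$, $\alpha=2$, $\beta=3$ are hypothetical, chosen only for illustration, so that $\frac{n\alpha}{\alpha+\beta}=4$):

```python
import numpy as np

# Monte Carlo check of E(Y) = n*alpha/(alpha+beta); parameter values are
# hypothetical, chosen only for illustration.
rng = np.random.default_rng(0)
n, alpha, beta, reps = 10, 2.0, 3.0, 200_000

P = rng.beta(alpha, beta, size=(reps, n))  # independent P_i ~ beta(alpha, beta)
X = rng.random(size=(reps, n)) < P         # X_i | P_i ~ Bernoulli(P_i)
Y = X.sum(axis=1)                          # Y = sum of the X_i

print(Y.mean())                    # close to n*alpha/(alpha+beta) = 4.0
```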
However, to find the variance of $Y$, I get two different answers from two different approaches, and I can't figure out why this is the case.
Approach 1
Since the $X_{i}$ are independent and identically distributed, we have
$\mathrm{Var}(Y)=\mathrm{Var}(\sum_{i=1}^{n}X_{i})=\sum_{i=1}^{n}\mathrm{Var}(X_{i})=n\mathrm{Var}(X_{i})\quad$ where $\quad \mathrm{Var}(X_{i})=\mathbb{E}(\mathrm{Var}(X_{i}|P_{i}))+\mathrm{Var}(\mathbb{E}(X_{i}|P_{i}))$
We have, $\mathrm{Var}(X_{i}|P_{i})=P_{i}(1-P_{i})=P_{i}-P_{i}^{2}\quad$ and $\quad\mathbb{E}(X_{i}|P_{i})=P_{i}$
Therefore, $\mathrm{Var}(X_{i})=\mathbb{E}(P_{i}-P_{i}^{2})+\mathrm{Var}(P_{i})=\mathbb{E}(P_{i}-P_{i}^{2})+\mathbb{E}(P_{i}^{2})-(\mathbb{E}(P_{i}))^{2}=\mathbb{E}(P_{i})(1-\mathbb{E}(P_{i}))=\frac{\alpha\beta}{(\alpha+\beta)^2}$
Hence, $\quad\mathrm{Var}(Y)=\frac{n\alpha\beta}{(\alpha+\beta)^2}$ which is the answer.
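This value can also be verified by simulation; the sketch below uses hypothetical parameters $n=10$, $\alpha=2$, $\beta=3$, for which $\frac{n\alpha\beta}{(\alpha+\beta)^{2}}=2.4$:

```python
import numpy as np

# Monte Carlo check of Var(Y) = n*alpha*beta/(alpha+beta)^2 from Approach 1;
# parameter values are hypothetical.
rng = np.random.default_rng(1)
n, alpha, beta, reps = 10, 2.0, 3.0, 200_000

P = rng.beta(alpha, beta, size=(reps, n))  # a fresh P_i for every trial
X = rng.random(size=(reps, n)) < P         # X_i | P_i ~ Bernoulli(P_i)
Y = X.sum(axis=1)

print(Y.var())                                 # empirical variance of Y
print(n * alpha * beta / (alpha + beta) ** 2)  # theoretical value 2.4
```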
However, using a slightly different (somewhat long) route, I arrive at a different result.
Approach 2
$\mathrm{Var}(Y)=\mathbb{E}(Y^{2})-(\mathbb{E}(Y))^{2}\quad $ where $\mathbb{E}(Y)=n\mathbb{E}(P_{i})=\frac{n\alpha}{\alpha+\beta}$
To find $\mathbb{E}(Y^{2})$, we notice that
$Y^{2}=(X_{1}+X_{2}+\dots+X_{n})^{2}=\sum_{i=1}^{n}X_{i}^{2}+2\sum_{i<j}X_{i}X_{j}$
The second term on the RHS has exactly $\frac{n(n-1)}{2}$ terms, corresponding to unordered selections of two indices from $1$ to $n$, and because the $X_{i}$ are independent and identically distributed, the expectation of $Y^{2}$ reduces to (with $i\neq j$)
$\mathbb{E}(Y^{2}|P_{i})=\mathbb{E}(\sum_{i=1}^{n}X_{i}^{2}|P_{i})+n(n-1)\mathbb{E}(X_{i}X_{j}|P_{i})=\sum_{i=1}^{n}\mathbb{E}(X_{i}^{2}|P_{i})+n(n-1)\mathbb{E}(X_{i}|P_{i})\mathbb{E}(X_{j}|P_{i})= nP_{i}+n(n-1)P_{i}^{2}$
Therefore, $\mathbb{E}(Y^{2})=\mathbb{E}(\mathbb{E}(Y^{2}|P_{i}))=n\mathbb{E}(P_{i})+n(n-1)\mathbb{E}(P_{i}^{2})=n\mathbb{E}(P_{i})+n^{2}\mathbb{E}(P_{i}^{2})-n\mathbb{E}(P_{i}^{2})$
Hence, $\mathrm{Var}(Y)=\mathbb{E}(Y^{2})-(\mathbb{E}(Y))^{2}=n\mathbb{E}(P_{i})+n^{2}\mathbb{E}(P_{i}^{2})-n\mathbb{E}(P_{i}^{2})-(n\mathbb{E}(P_{i}))^{2}$, which does not simplify to the expression given in the answer, $\frac{n\alpha\beta}{(\alpha+\beta)^2}$.
It would be extremely helpful if someone could point out what went wrong in Approach 2, since this is what I attempted on my first try.
Thanks in advance
After some thought about why the two approaches give different answers, I found the flaw in the logic of Approach 2, which I explain below.
I hope this answer is helpful to anyone who stumbles across a similar situation while working through this exercise from Casella and Berger.
The experiment detailed in the question above can be thought of as an empirical Bayes model in which the success probability varies from trial to trial following a beta distribution.
The primary flaw in the reasoning of Approach 2 is that for $i\neq j$, when computing $\mathbb{E}(X_{i}X_{j})$ by conditioning, the conditioning must be on the two different probabilities $P_{i},P_{j}$ of the two independent trials $X_{i},X_{j}$, not on $P_{i}$ alone as was done in Approach 2.
Therefore, $\mathbb{E}(X_{i}X_{j})=\mathbb{E}(\mathbb{E}(X_{i}X_{j}|P_{i},P_{j}))$, where the inner expectation is with respect to the conditional distribution of $(X_{i},X_{j})|(P_{i},P_{j})$. Notice that this joint conditional distribution can be written as
$f((X_{i},X_{j})|(P_{i},P_{j}))=\frac{f(X_{i},X_{j},P_{i},P_{j})}{f(P_{i},P_{j})}=\frac{f(X_{i},P_{i})f(X_{j},P_{j})}{f(P_{i},P_{j})}=f(X_{i}|P_{i})f(X_{j}|P_{j})$, where both the second equality and the last (which uses $f(P_{i},P_{j})=f(P_{i})f(P_{j})$) follow from the independence of the pair $(X_{i},P_{i})$ from the pair $(X_{j},P_{j})$.
Now using the above expression to evaluate the inner expectation, we have $\mathbb{E}(X_{i}X_{j}|P_{i},P_{j})=\sum_{X_{j}}\sum_{X_{i}}X_{i}X_{j}f((X_{i},X_{j})|(P_{i},P_{j}))=\sum_{X_{j}}\sum_{X_{i}}X_{i}X_{j}f(X_{i}|P_{i})f(X_{j}|P_{j})=\left(\sum_{X_{j}}X_{j}f(X_{j}|P_{j})\right)\left(\sum_{X_{i}}X_{i}f(X_{i}|P_{i})\right)=P_{i}P_{j}$
Therefore, $\mathbb{E}(\mathbb{E}(X_{i}X_{j}|P_{i},P_{j}))=\mathbb{E}(P_{i}P_{j})=\mathbb{E}(P_{i})\mathbb{E}(P_{j})=(\mathbb{E}(P_{i}))^{2}$, since $P_{i},P_{j}$ are independent and identically distributed as $beta(\alpha,\beta)$. Using this result, Approaches 1 and 2 arrive at the same answer.
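The difference between the two conditionings can also be seen numerically. In the sketch below (hypothetical $\alpha=2$, $\beta=3$), the empirical $\mathbb{E}(X_{i}X_{j})$ matches the correct $(\mathbb{E}(P_{i}))^{2}=0.16$, not the $\mathbb{E}(P_{i}^{2})=0.2$ that the single-$P_{i}$ conditioning of Approach 2 implicitly produces:

```python
import numpy as np

# Numeric check (hypothetical alpha, beta): with an independent success
# probability per trial, E(X_i X_j) equals (E P_i)^2, not E(P_i^2).
rng = np.random.default_rng(2)
alpha, beta, reps = 2.0, 3.0, 500_000

Pi = rng.beta(alpha, beta, size=reps)  # P_i and P_j drawn independently
Pj = rng.beta(alpha, beta, size=reps)
Xi = rng.random(reps) < Pi             # X_i | P_i ~ Bernoulli(P_i)
Xj = rng.random(reps) < Pj             # X_j | P_j ~ Bernoulli(P_j)

print((Xi & Xj).mean())                                       # ~ (2/5)^2 = 0.16
print((alpha / (alpha + beta)) ** 2)                          # correct: 0.16
print(alpha * (alpha + 1)
      / ((alpha + beta) * (alpha + beta + 1)))                # flawed E(P_i^2): 0.2
```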
Aside 1:
The entire story could have been avoided by using $\mathbb{E}(X_{i}X_{j})=\mathbb{E}(X_{i})\mathbb{E}(X_{j})=\mathbb{E}(\mathbb{E}(X_{i}|P_{i}))\mathbb{E}(\mathbb{E}(X_{j}|P_{j}))=(\mathbb{E}(P_{i}))^{2}$, where $P_{i}\sim beta(\alpha,\beta)$ and the first equality follows from the independence of $X_{i}$ and $X_{j}$.
But since any correct approach should give the same answer, finding what went wrong with Approach 2 above was necessary for my own peace of mind.
Aside 2:
The logic employed in Approach 2 would work perfectly well for a beta-binomial model, in which the same success probability, sampled once from a beta distribution, is used for all $n$ trials; in other words, $P_{i}=P_{j}$ by design.
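The contrast between the two models shows up directly in the variance. In a short simulation sketch (hypothetical parameters $n=10$, $\alpha=2$, $\beta=3$ again), a shared $p$ inflates $\mathrm{Var}(Y)$ to the beta-binomial value $\frac{n\alpha\beta(\alpha+\beta+n)}{(\alpha+\beta)^{2}(\alpha+\beta+1)}=6$, while an independent $P_{i}$ per trial keeps it at $\frac{n\alpha\beta}{(\alpha+\beta)^{2}}=2.4$:

```python
import numpy as np

# Contrast (hypothetical parameters): ONE shared p per replication
# (beta-binomial) versus a fresh, independent P_i for every trial.
rng = np.random.default_rng(3)
n, a, b, reps = 10, 2.0, 3.0, 300_000

p = rng.beta(a, b, size=(reps, 1))  # shared p, broadcast across the n trials
Y_shared = (rng.random(size=(reps, n)) < p).sum(axis=1)

P = rng.beta(a, b, size=(reps, n))  # independent P_i per trial
Y_indep = (rng.random(size=(reps, n)) < P).sum(axis=1)

print(Y_shared.var())  # ~ n*a*b*(a+b+n)/((a+b)^2*(a+b+1)) = 6.0
print(Y_indep.var())   # ~ n*a*b/(a+b)^2 = 2.4
```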