I was reading this article on the SPRT (Sequential Probability Ratio Test), looking for a simple proof of its optimality. However, the optimality proof on page 3 is incorrect. If $K^*$ in the proof is not the stopping time of the SPRT, then $P(\Lambda_{K^*}\geq \gamma_1)$ need not equal $P_D$, which is what they use. If $K^*$ is the stopping time of the SPRT, then they have not proved that the SPRT is better (in terms of the expected number of samples until stopping) than an arbitrary test that has $P_D^{'} \geq P_D $ and $P_{FA}^{'} \leq P_{FA}$, where $P_D$ and $P_{FA}$ are the probability of correct detection and the probability of false alarm, respectively, fixed for the SPRT.
But I liked the idea. Compared with the proof given in Dudley's notes, this proof, if it can be cleaned up, would be a lot more approachable. I wish to correct the proof by strengthening the arguments. Readers who wish to help may read the article (carefully, as there are typos) to get an understanding of the argument.
The statement of optimality is as follows:
Let $P_D$ and $P_{FA}$ be given. Let $N$ be the number of samples till the SPRT reaches a verdict ($H_0$, the null hypothesis, is distribution $P_0$ and $H_1$ is $P_1$). Consider ANY sequential test that has $P_{FA}' \leq P_{FA}$ and $P_{D}' \geq P_{D}$. Suppose this requires $N'$ steps to reach a verdict. Then for $i=0,1$ $$\mathbb{E}_i[N] \leq \mathbb{E}_i[N']$$ where $\mathbb{E}_i[N]$ is the expected time taken for a verdict when samples are drawn with distribution $P_i$.
My attempt at the proof: Let $X_j$ be distributed iid according to $P_1$. The $P_0$ case can be handled similarly.
We have $\log \Lambda_k = \sum_{j=1}^k\log \frac{P_1(X_j)}{P_0(X_j)}$. As $N,N'$ are stopping times, by Wald's lemma, $$\mathbb{E}_1[\log \Lambda_{N'}] = \mathbb{E}_1[N']\mathbb{E}_1\left[\log \frac{P_1(X)}{P_0(X)}\right] = \mathbb{E}_1[N']D(P_1\|P_0)$$ where $D(P\|Q)$ is the K-L divergence between $P$ and $Q$.
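For reference, here is the form of Wald's identity I am relying on, as I understand it. The symbols $Y_j$ and $T$ are my own shorthand, not the article's, and the integrability conditions are assumptions I am adding: $T$ is a stopping time with respect to the filtration generated by the samples, $\mathbb{E}[T] < \infty$, and the $Y_j$ are iid with $\mathbb{E}|Y_1| < \infty$. Then
$$\mathbb{E}\left[\sum_{j=1}^{T} Y_j\right] = \mathbb{E}[T]\,\mathbb{E}[Y_1].$$
Here $Y_j = \log \frac{P_1(X_j)}{P_0(X_j)}$ and $T = N'$ under $P_1$, so this needs $D(P_1\|P_0) < \infty$ and $\mathbb{E}_1[N'] < \infty$. If $\mathbb{E}_1[N'] = \infty$, the claimed inequality $\mathbb{E}_1[N] \leq \mathbb{E}_1[N']$ holds trivially, so assuming finiteness should cost nothing.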
Now consider $\mathbb{E}_1[\log \Lambda_{N'}]$. The idea in the article was to condition on the mutually exclusive events $\{\Lambda_N \geq \gamma_1\}$ and $\{\Lambda_N \leq \gamma_0\}$, where $\gamma_1 = \frac{P_D}{P_{FA}}$ and $\gamma_0 = \frac{1-P_D}{1-P_{FA}}$ are the thresholds for the SPRT. The error they made, apparently, was not distinguishing between $N$ and $N'$ in their proof of optimality. So we have
\begin{equation} \mathbb{E}_1[\log (\Lambda_{N'})1_{\{\Lambda_N \geq \gamma_1 \}}] =\mathbb{E}_1[\log (\Lambda_{N'})|\Lambda_N \geq \gamma_1 ]P_1(\Lambda_N \geq \gamma_1)=\mathbb{E}_1[\log (\Lambda_{N'})|\Lambda_N \geq \gamma_1 ]P_D \end{equation}
\begin{eqnarray} \mathbb{E}_1[\log (\Lambda_{N'})|\Lambda_N \geq \gamma_1 ]&\geq& -\log\left[\mathbb{E}_1\left[\frac{1}{\Lambda_{N'}}1_{\{\Lambda_N \geq \gamma_1 \}}\right]\right]+\log P_D \end{eqnarray} where the inequality is by Jensen's inequality. Now in the article, they claim $\mathbb{E}_1\left[\frac{1}{\Lambda_{N'}}1_{\{\Lambda_N \geq \gamma_1 \}}\right]=P_0\left(\Lambda_N \geq \gamma_1 \right) = P_{FA}$. I don't think this is true unless $N'=N$. In any case, I got stuck here. I wish to prove somehow that $$\mathbb{E}_1\left[\frac{1}{\Lambda_{N'}}1_{\{\Lambda_N \geq \gamma_1 \}}\right] \leq P_{FA}.$$
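To make the sticking point explicit, here is a sketch of the two steps as I understand them. The shorthand $A = \{\Lambda_N \geq \gamma_1\}$, the generic stopping time $T$, the event $B$, and the filtration $\mathcal{F}_n = \sigma(X_1,\dots,X_n)$ are my own notation. I am also assuming that $P_0$ and $P_1$ are mutually absolutely continuous and that the stopping times are finite almost surely under both.

The Jensen step, using concavity of $\log$ and $P_1(A) = P_D$:
$$\mathbb{E}_1[\log \Lambda_{N'} \mid A] = -\mathbb{E}_1\left[\log \frac{1}{\Lambda_{N'}} \,\Big|\, A\right] \geq -\log \mathbb{E}_1\left[\frac{1}{\Lambda_{N'}} \,\Big|\, A\right] = -\log \frac{\mathbb{E}_1\left[\frac{1}{\Lambda_{N'}} 1_A\right]}{P_D}.$$

The change of measure, for a stopping time $T$ and an event $B$ such that $B \cap \{T = n\} \in \mathcal{F}_n$ for every $n$:
$$\mathbb{E}_1\left[\frac{1}{\Lambda_T} 1_B\right] = \sum_{n \geq 1} \mathbb{E}_1\left[\frac{1}{\Lambda_n} 1_{B \cap \{T = n\}}\right] = \sum_{n \geq 1} P_0(B \cap \{T = n\}) = P_0(B).$$

With $T = N$ and $B = A$ this gives $P_0(\Lambda_N \geq \gamma_1)$, which the article identifies with $P_{FA}$. With $T = N'$, however, $A \cap \{N' = n\}$ need not belong to $\mathcal{F}_n$, because on $\{N > n\}$ the event $A$ depends on samples taken after time $n$. So the middle equality is not justified, and this is exactly where I cannot replace $N'$ by $N$.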
I'd appreciate any ideas in this endeavor.
Edit: A verdict means choosing either hypothesis 0 or 1 and stopping. No additional samples are taken once a verdict has been reached.