In a multiple-choice exam you have an unlimited supply of questions, in which a correct answer earns 4 points and a wrong answer incurs a penalty of 1 point. If you randomly select one of the four options available for each question, how many questions should you attempt to be 99 percent sure of scoring above 20?
I tried to do this using the expected gain per question (0.25) and got 80, which is wrong, but I don't understand how to put a 99 percent confidence level on the expectation. Also, what distribution does the expectation follow (is it normal, by the central limit theorem)?
Let $X$ be the number of points you gain per answer (a random variable). Its distribution is
$$X \sim \begin{cases}4 &\text{ with probability } \frac{1}{4} \\ -1 &\text{ with probability } \frac{3}{4}\end{cases}$$
Let $X_1,X_2,\dots$ be iid variables from this distribution (which are the points gained for each question).
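As a quick check on this distribution, here is a minimal sketch that computes the mean and variance of a single question's score with exact fractions. The variable names are illustrative and not part of the original answer. The mean is the expected gain of 0.25 mentioned in the question.

```python
from fractions import Fraction

# Per-question score distribution from the text: +4 w.p. 1/4, -1 w.p. 3/4.
outcomes = {4: Fraction(1, 4), -1: Fraction(3, 4)}

# E[X] and Var(X), computed exactly.
mean = sum(x * p for x, p in outcomes.items())
variance = sum((x - mean) ** 2 * p for x, p in outcomes.items())

print(mean)      # expected gain per question
print(variance)  # variance of the gain per question
```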
We are interested in calculating $N \in \mathbb{N}$, such that
$$\mathbb{P}\left(\sum_{i=1}^N X_i \ge 20\right) \ge 0.99 \qquad (1)$$
It's useful to transform the $X_i$'s into Bernoulli-distributed random variables. Note that
$$\frac{X+1}{5} \sim \begin{cases}1 &\text{ with probability } \frac{1}{4} \\ 0 &\text{ with probability } \frac{3}{4}\end{cases} = \text{Ber}\left(\frac{1}{4}\right)$$
So we can transform $(1)$ into
\begin{align}\mathbb{P}\left(\sum_{i=1}^N (X_i+1) - N \ge 20\right) &\ge 0.99 \\ \mathbb{P}\left(\sum_{i=1}^N \frac{X_i+1}{5} - \frac{N}{5} \ge 4\right) &\ge 0.99 \\ \mathbb{P}\left(\sum_{i=1}^N \text{Ber}\left(\frac{1}{4}\right) - \frac{N}{5} \ge 4\right) &\ge 0.99 \\ \mathbb{P}\left(\sum_{i=1}^N \text{Ber}\left(\frac{1}{4}\right) \ge 4+\frac{N}{5}\right) &\ge 0.99 \\ \end{align}
The sum of iid Bernoulli random variables is a Binomial random variable:
\begin{align} \mathbb{P}\left(\text{Binom}\left(N,\frac{1}{4}\right) \ge 4+\frac{N}{5}\right) &\ge 0.99 \\ \end{align}
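The binomial tail above can also be evaluated exactly for any given $N$, without an approximation. The following is a minimal sketch: the function name `prob_score_at_least` and the sample values of $N$ are illustrative choices, not part of the original answer. It uses exact integer arithmetic, so large $N$ does not cause floating-point underflow.

```python
from math import comb

def prob_score_at_least(n: int, target: int = 20) -> float:
    """Exact P(total score >= target) after n random guesses (right w.p. 1/4)."""
    # Total score is 5*K - n with K ~ Binom(n, 1/4), so we need K >= (n + target) / 5.
    k_min = max(0, -(-(n + target) // 5))  # integer ceiling of (n + target) / 5
    # P(K = k) = C(n, k) * 3**(n - k) / 4**n; sum the numerators as exact integers.
    numerator = sum(comb(n, k) * 3 ** (n - k) for k in range(k_min, n + 1))
    return numerator / 4 ** n

# Evaluate the tail probability at a few sample sizes.
for n in (80, 200, 400, 600):
    print(n, prob_score_at_least(n))
```

Because $4+\frac{N}{5}$ is rounded up to an integer number of correct answers, this probability is not strictly increasing in $N$, so it is worth checking a few neighbouring values of any candidate $N$.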
Now we can use the Central Limit Theorem to approximate $\text{Binom}\left(N,\frac{1}{4}\right)$ by a normal distribution. For this, we need to subtract the mean $\left(\frac{N}{4}\right)$ and divide by the standard deviation $\left(\sqrt{\frac{1}{4}\cdot\frac{3}{4}N}=\sqrt{\frac{3}{16}N}\right)$:
\begin{align} \mathbb{P}\left(\text{Binom}\left(N,\frac{1}{4}\right)-\frac{N}{4} \ge 4+\frac{N}{5}-\frac{N}{4}\right) &\ge 0.99 \\ \mathbb{P}\left(\frac{\text{Binom}\left(N,\frac{1}{4}\right)-\frac{N}{4}}{\sqrt{\frac{3}{16}N}} \ge \frac{4-\frac{N}{20}}{\sqrt{\frac{3}{16}N}}\right) &\ge 0.99 \end{align}
Now, due to the Central Limit Theorem, if $N$ is large, the left side is very closely approximated by a standard normal $\mathcal{N}(0,1)$ variable. Since the value of $N$ you calculated from the expectation was $80$, this $N$ will likely be large enough for the approximation to be very close.
\begin{align} \mathbb{P}\left(\mathcal{N}(0,1) \ge \frac{4-\frac{N}{20}}{\sqrt{\frac{3}{16}N}}\right) &\ge 0.99 \end{align}
The probability that a random variable is $\ge$ a value is $1$ minus the probability that it is $<$ that value:
\begin{align} 1-\mathbb{P}\left(\mathcal{N}(0,1) < \frac{4-\frac{N}{20}}{\sqrt{\frac{3}{16}N}}\right) &\ge 0.99 \end{align}
Now the $\mathbb{P}(\dots)$ term is just the $\mathcal{N}(0,1)$ distribution's CDF, $\phi$, evaluated at $\frac{4-\frac{N}{20}}{\sqrt{\frac{3}{16}N}}$. The smallest admissible $N$ is the one where equality holds:
\begin{align} 1-\phi\left(\frac{4-\frac{N}{20}}{\sqrt{\frac{3}{16}N}}\right) &= 0.99 \\ 0.01 &= \phi\left(\frac{4-\frac{N}{20}}{\sqrt{\frac{3}{16}N}}\right) \\ \phi^{-1}(0.01) &= \frac{4-\frac{N}{20}}{\sqrt{\frac{3}{16}N}} \end{align}
Now we need to find the value of $\phi^{-1}(0.01)$, namely the $x$ where $\phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^x e^{-\frac{t^2}{2}} dt = 0.01$. Numerically (or from a standard normal table), $\phi(-2.32635) \approx 0.01$, so $\phi^{-1}(0.01) \approx -2.32635$.
\begin{align} -2.32635 &\approx \frac{4-\frac{N}{20}}{\sqrt{\frac{3}{16}N}} \\ -2.32635 \cdot \frac{\sqrt{3N}}{4} &\approx 4-\frac{N}{20} \\ -2.32635 \cdot 5\sqrt{3}\,\sqrt{N} &\approx 80-N \\ -20.1468\sqrt{N} &\approx 80-N \\ 405.89N &\approx (80-N)^2 \\ 405.89N &\approx 6400-160N+N^2 \\ 0 &\approx 6400-565.89N+N^2 \\ N_1 &\approx 11.55;\quad N_2 \approx 554.35 \end{align}
Squaring can introduce a spurious root, so we substitute $N_1$ and $N_2$ back into $-2.32635 \approx \frac{4-\frac{N}{20}}{\sqrt{\frac{3}{16}N}}$. We find that $N_1 \approx 11.55$ does not satisfy the equation (the right side evaluates to $+2.32635$), but $N_2 \approx 554.35$ does. Since the right side is decreasing in $N$, the inequality holds for every $N \ge N_2$.
So for the initial probability $(1)$ to be $\ge 0.99$ under this normal approximation, we need to answer at least $\boxed{N \ge 555}$ questions.
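The normal-approximation threshold can be re-derived numerically as a cross-check on the arithmetic. This is a minimal sketch under stated assumptions: the function name `normal_approx_min_n` is illustrative, the standardization divides by the binomial standard deviation $\sqrt{\frac{3}{16}N}$, and the quantile comes from Python's standard-library `statistics.NormalDist`. It solves the resulting quadratic in $\sqrt{N}$ directly, which avoids the spurious root introduced by squaring.

```python
from math import ceil, sqrt
from statistics import NormalDist

def normal_approx_min_n(target: float = 20, confidence: float = 0.99):
    """Return (z, n): the normal quantile and the real-valued threshold n."""
    # Lower-tail quantile of N(0, 1), e.g. the 1% point for 99% confidence.
    z = NormalDist().inv_cdf(1 - confidence)
    # Total score has mean n/4 and standard deviation (5*sqrt(3)/4)*sqrt(n).
    # Setting (target - n/4) / sd = z and writing s = sqrt(n) gives
    #   s**2 + b*s - 4*target = 0   with   b = 5*sqrt(3)*z  (negative).
    b = 5 * sqrt(3) * z
    s = (-b + sqrt(b * b + 16 * target)) / 2  # positive root
    return z, s * s

z, n = normal_approx_min_n()
print(z, n, ceil(n))
```

The last printed value is the smallest integer $N$ satisfying the approximate inequality. An exact binomial calculation may shift it slightly because the score takes only discrete values.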