Let's say that, in a video game, every time you go to the blacksmith there is a random (yet announced) chance that he succeeds in fixing your weapon.
In this example it goes like this:
70% chance of success, but the fix failed;
60% chance of success, but the fix failed;
50% chance of success, and the fix succeeded;
70% chance of success, and the fix succeeded.
I now want to figure out if the announced chance of succeeding is accurate by using a confidence interval.
With sd = 8.29 and mean = 62.5:
- 62.5 ± 3.291*(8.29/sqrt(4)) = [48.9 to 76.1]
With 3.291 being the z value for 99.9%, does it make sense to say that I'm 99.9% confident in a 50% actual success mean and that, therefore, the announced chance values are legit?
No.
Your confidence interval is computed assuming that the "chances" are independent samples from a normal distribution, which they obviously aren't since they are bounded to $[0,1]$.
If you ignore that, and choose to assume normality, then you should use the Student's t-distribution, since your variance is unknown. Then, the confidence interval you compute will be a confidence interval for the true population mean.
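As a rough illustration of that point, here is the same interval redone with the t-distribution. It assumes the sample standard deviation (denominator $n-1$) of the four announced chances, $s \approx 9.57$, instead of the 8.29 in the question, and the two-sided 99.9% critical value for 3 degrees of freedom, $t_{0.9995,\,3} \approx 12.924$:

$$62.5 \pm 12.924 \cdot \frac{9.57}{\sqrt{4}} \approx 62.5 \pm 61.9 = [0.6,\ 124.4]$$

With only four observations the interval is far wider than the z-based one, and it extends past $100\%$, which is another sign that the normality assumption is a poor fit here.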
However, if $50\%$ were in the $99.9\%$ confidence interval for the true population mean $\mu$, that would NOT mean that you are $99.9\%$ confident that $\mu=50\%$. It would mean that you are $99.9\%$ 'confident' that the true population mean is somewhere in that interval. Here, 'confident' just means that, if you took many independent samples and computed the $99.9\%$ confidence interval for each, then in the long run (in the limit) $99.9\%$ of those intervals would cover $\mu$.
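The long-run reading of 'confident' can be checked with a small simulation. This is only a sketch under simplifying assumptions that are not part of your problem: the samples are drawn from a normal population with a known standard deviation, and the values `mu=60.0` and `sigma=10.0` are made up for illustration.

```python
import random
from statistics import NormalDist, mean


def coverage(mu=60.0, sigma=10.0, n=4, level=0.999, reps=20000, seed=0):
    """Fraction of z-intervals (known sigma) that cover the true mean mu."""
    rng = random.Random(seed)
    # Two-sided critical value, about 3.291 for level=0.999.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        xbar = mean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps


observed_coverage = coverage()
print(observed_coverage)
```

The printed fraction should be close to 0.999. It is a statement about the procedure over many repetitions, not about any single interval.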
Also, your confidence interval has no connection to the actual successes and failures, which are a SECOND source of randomness. So even if the observed proportion of successes was not in the $99.9\%$ confidence interval, that does not mean that you can reject the hypothesis that the announced chance values are legit.
I'm not sure how you would test that hypothesis. I've never done a hypothesis test with two levels of randomness, such as your independent Bernoulli trials which each have a random probability of success.
If you only want to make sure that the blacksmith was not announcing overly confident chances, then you could do a one-sided test of the hypothesis that the chance was $70\%$ (constant, the maximum announced) in all trials. If you cannot reject that hypothesis, then you cannot conclude that the blacksmith was overconfident in the announced chances. If the example you gave is the actual example you care about, that would probably be sufficient to see that you cannot reject the null hypothesis that the announced chances are legit. However, for more complicated examples, rejecting the null of the constant-chance test would not mean that you can reject the null for the actual, varying chances. As I said, that's a more complicated test that I'm not sure how to do.
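For the four trials in the question, the constant-chance test can be sketched as a lower-tail binomial probability. The function name `binomial_lower_tail` is just an illustrative helper, and the inputs ($N=4$, constant chance $0.7$, $2$ observed successes) are read off the question.

```python
from math import comb


def binomial_lower_tail(n, p, s):
    """P(S <= s) for S ~ Binomial(n, p): the one-sided (lower-tail) p-value."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(s + 1))


# Null: every trial had a 70% chance of success; 2 successes were observed in 4 trials.
p_value = binomial_lower_tail(4, 0.7, 2)
print(round(p_value, 4))  # 0.3483
```

A p-value of about 0.35 is nowhere near $\alpha = 0.001$, so with this data you cannot reject the constant $70\%$ hypothesis.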
EDIT
Upon further reflection, I realized how to test the hypothesis that the announced chances are legit:
Assuming you have a fixed, finite number of trials, $N$, your sample space is finite, consisting of $2^N$ possible outcomes: every possible combination of successes and failures in $N$ trials. If you let $p_i$ represent the announced chance of success for trial $i$, and treat those as fixed (that is, condition on the announced values), then under the null hypothesis the probability of an outcome $x = (x_1, x_2, \ldots, x_N)$ is $$P(X=x) = \prod_{i=1}^N p_i^{x_i} (1-p_i)^{1-x_i}$$ Here, $x_i \in \{0, 1\}$, where $0$ represents failure and $1$ represents success.
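As a quick sanity check of that formula, here is a sketch that evaluates it for the four trials in the question and confirms that the $2^N$ outcome probabilities sum to one. The helper name `outcome_probability` is made up for this example.

```python
from itertools import product
from math import prod


def outcome_probability(chances, outcome):
    """P(X = x): product of p_i for each success and (1 - p_i) for each failure."""
    return prod(p if x == 1 else 1 - p for p, x in zip(chances, outcome))


chances = [0.7, 0.6, 0.5, 0.7]   # announced chances from the question
observed = (0, 0, 1, 1)          # fail, fail, success, success

p_observed = outcome_probability(chances, observed)
total = sum(outcome_probability(chances, x)
            for x in product((0, 1), repeat=len(chances)))

print(p_observed)  # 0.3 * 0.4 * 0.5 * 0.7, about 0.042
print(total)       # about 1.0, summed over all 2**4 = 16 outcomes
```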
Then, the number of successes will be a random variable, $S$, which will have a PMF with support on $\{0, 1, \ldots, N\}$, that can be computed from the distribution for $X$.
You can let $S$ be your test statistic, and the distribution computed using the announced chances be the distribution of your test statistic under the null hypothesis, which is that the actual chances of success were the announced chances of success. That distribution will allow you to compute a $99.9\%$ interval for $S$ under the null (strictly speaking an acceptance region rather than a confidence interval; it is exact, with no normal approximation), and you can reject the null hypothesis (with $\alpha=0.001$) if the observed value of $S$ falls outside that interval.
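Here is a minimal sketch of that test for the four trials in the question. The PMF of $S$ is built one trial at a time rather than by enumerating all $2^N$ outcomes, which gives the same distribution. The equal-tailed rule (at most $\alpha/2$ of probability trimmed from each tail) is my assumption about how to form the interval, and the function names are illustrative.

```python
def success_count_pmf(chances):
    """PMF of S = number of successes in independent trials with the given chances."""
    pmf = [1.0]  # before any trial, S = 0 with probability 1
    for p in chances:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)   # this trial fails: count stays at k
            nxt[k + 1] += mass * p     # this trial succeeds: count becomes k + 1
        pmf = nxt
    return pmf


def acceptance_region(pmf, alpha=0.001):
    """Equal-tailed region: trim at most alpha/2 of probability from each tail."""
    lo, hi = 0, len(pmf) - 1
    tail = 0.0
    while lo < hi and tail + pmf[lo] <= alpha / 2:
        tail += pmf[lo]
        lo += 1
    tail = 0.0
    while hi > lo and tail + pmf[hi] <= alpha / 2:
        tail += pmf[hi]
        hi -= 1
    return lo, hi


chances = [0.7, 0.6, 0.5, 0.7]   # announced chances from the question
observed_successes = 2

pmf = success_count_pmf(chances)
region = acceptance_region(pmf, alpha=0.001)
reject = not (region[0] <= observed_successes <= region[1])

print([round(q, 3) for q in pmf])  # [0.018, 0.129, 0.335, 0.371, 0.147]
print(region, reject)              # (0, 4) False
```

With only four trials, every possible value of $S$ has probability well above $0.0005$, so the region is all of $\{0, \ldots, 4\}$ and no outcome could lead to rejection at $\alpha = 0.001$. You would need many more trials for this test to have any power.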