Question. How does one mathematically analyze situations that involve chance and skill?
Let's take the coin flip as a simple example. Assume that it is possible to skillfully flip a coin to get the landing you want. Also assume zero cheating.
FIRST SCENARIO
The world's 5 most talented coin flippers gather to compete. The results:
Person 1
Coin Flips: 100
Success Rate: 100%
Person 2
Coin Flips: 10,000
Success Rate: 90%
Person 3
Coin Flips: 1,000,000
Success Rate: 80%
Person 4
Coin Flips: 100,000,000
Success Rate: 70%
Person 5
Coin Flips: 10,000,000,000
Success Rate: 60%
Each person claims he is the best coin flipper. How would you analyze the results?
SECOND SCENARIO
A man claims he is so skilled at coin flipping, he can always land heads.
He flips one coin. Sure enough, heads.
He flips again. Heads again.
He flips 100 times. All heads.
1000 times. Still all heads.
After 100,000,000,000,000 flips, every single one heads, he stops and says "I told you so."
When do we go from thinking "He's lucky!" to "He's good!"?

Suppose $p\in[0,1]$ is the probability that the coin lands heads up. Let $q\in[0,1]$ be the probability that a candidate successfully lands heads. Then, the skill ratio is given by $\chi_q:=\frac{q-p}{1-p}$. The candidate throws the coin $N$ times and successfully lands heads $M$ times out of $N$. We use the following estimator $\hat{q}:=\frac{M}{N}$ of $q$, assuming that $q$ is uniformly distributed on $[0,1]$. (The last assumption is the most dubious one. It may be better to assume that $q$ is uniformly distributed on $[p,1]$. However, I do not want to deal with difficult calculations.)
Assume that the measurement yields $\hat{q}=\hat{r}$ for some $\hat{r}\in[0,1]$. Therefore, we need to calculate $$\text{E}\Big(\chi_q\,\Big|\,\hat{q}=\hat{r}\Big)=\int_{0}^1\,\left(\frac{r-p}{1-p}\right)\,f(r,\hat{r})\,\text{d}r\,,$$ where $$f\left(r,\hat{r}\right):=\binom{N}{\hat{r}N}r^{\hat{r}N}(1-r)^{\left(1-\hat{r}\right)N}$$ for all $r\in[0,1]$ and $\hat{r}\in\left\{0,\frac{1}{N},\frac{2}{N},\ldots,\frac{N-1}{N},1\right\}$. This means $$\text{E}\Big(\chi_q\,\Big|\,\hat{q}=\hat{r}\Big)=\frac{1}{1-p}\,\binom{N}{\hat{r}N}\,\left(\frac{\big(\hat{r}N+1\big)!\,\big((1-\hat{r})N\big)!}{(N+2)!}-p\,\left(\frac{\left(\hat{r}N\right)!\,\big((1-\hat{r})N\big)!}{(N+1)!}\right)\right)\,.$$ That is, $$\mu:=\text{E}\Big(\chi_q\,\Big|\,\hat{q}=\hat{r}\Big)=\frac{\left(\hat{r}-p\right)N+1-2p}{(N+1)(N+2)(1-p)}\,.$$ For $p=\frac{1}{2}$, we get $$\mu=\frac{\left(2\hat{r}-1\right)N}{(N+1)(N+2)}\,.$$
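As a sanity check on the closed form for $\mu$, here is a minimal Python sketch that compares it against the integral evaluated exactly with the identity $\int_0^1 r^a(1-r)^b\,\text{d}r=\frac{a!\,b!}{(a+b+1)!}$. The function names are hypothetical and chosen only for illustration; $M=\hat{r}N$ is the number of heads.

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral(a, b):
    """Exact value of the integral of r**a * (1 - r)**b over [0, 1]."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def mu_from_integral(N, M, p):
    """mu computed directly from the integral of ((r - p)/(1 - p)) * f(r, M/N)."""
    weight = comb(N, M)  # binomial coefficient appearing in f
    first = beta_integral(M + 1, N - M)   # integral of r * r**M * (1-r)**(N-M)
    zeroth = beta_integral(M, N - M)      # integral of r**M * (1-r)**(N-M)
    return weight * (first - p * zeroth) / (1 - p)


def mu_closed_form(N, M, p):
    """The closed form ((r_hat - p) N + 1 - 2p) / ((N+1)(N+2)(1-p))."""
    r_hat = Fraction(M, N)
    return ((r_hat - p) * N + 1 - 2 * p) / ((N + 1) * (N + 2) * (1 - p))


if __name__ == "__main__":
    p = Fraction(1, 2)
    for N, M in [(1, 1), (10, 7), (100, 100)]:
        print(N, M, mu_from_integral(N, M, p), mu_closed_form(N, M, p))
```

Exact rational arithmetic is used so that the two expressions can be compared with `==` rather than up to rounding error.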
Now, $$\text{E}\Big(\chi_q^2\,\big|\,\hat{q}=\hat{r}\Big)=\,\int_0^1\,\left(\frac{r-p}{1-p}\right)^2\,f\left(r,\hat{r}\right)\,\text{d}r\,,$$ or $$\text{E}\Big(\chi_q^2\,\big|\,\hat{q}=\hat{r}\Big)=\frac{\left(\hat{r}-p\right)^2N^2+\left(3\hat{r}(1-2p)-2p+5p^2\right)N+\left(2-6p+6p^2\right)}{(N+1)(N+2)(N+3)(1-p)^2}\,.$$ Ergo, since the variance is $\text{E}\big(\chi_q^2\,\big|\,\hat{q}=\hat{r}\big)-\mu^2$, $$ \begin{align}\sigma:=\sqrt{\text{Var}\Big(\chi_q\,\big|\,\hat{q} =\hat{r}\Big)}=\frac{\sqrt{\tau}}{(N+1)(N+2)\sqrt{N+3}(1-p)}\,,\end{align}$$ where $$ \begin{align}\tau&:=(\hat{r}-p)^2N^4+\left(2\hat{r}^2+(3-10p)\hat{r}-2p+7p^2\right)N^3 \\&\phantom{abcdef}-\left(\hat{r}^2-(7-12p)\hat{r}-2+10p-16p^2\right)N^2+\left(5-12p+12p^2\right)N+1\,. \end{align}$$ If $p=\frac{1}{2}$, we have $$\sigma=\frac{\sqrt{(2\hat{r}-1)^2N^4+\left(8\hat{r}^2-8\hat{r}+3\right)N^3-4\left(\hat{r}^2-\hat{r}-1\right)N^2+8N+4}}{(N+1)(N+2)\sqrt{N+3}}\,.$$
If the estimated skill ratio is $\hat{\chi}=\frac{\hat{r}-p}{1-p}$, then the skillfulness can be defined by $$\hat{S}:=\frac{\hat{\chi}-\mu}{\sigma}\,.$$ Fix $p=\frac{1}{2}$.
(1) When $N=10^2$ and $\hat{r}=1$, then $\hat{S}\approx 10.2$.
(2) When $N=10^4$ and $\hat{r}=\frac{9}{10}$, then $\hat{S}\approx 100.02$.
(3) When $N=10^6$ and $\hat{r}=\frac{8}{10}$, then $\hat{S}\approx 1000$.
(4) When $N=10^9$ and $\hat{r}=\frac{7}{10}$, then $\hat{S}\approx 31623$.
(5) When $N=10^{10}$ and $\hat{r}=\frac{6}{10}$, then $\hat{S}\approx 100000$.
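The values above can be recomputed without expanding $\tau$ by hand. The following is a minimal sketch, assuming $M=\hat{r}N$ is an integer and using the same weight $f$ as in the integrals above; the function name `skill_score` is hypothetical. It uses the moments $\int_0^1 r^k f\,\text{d}r$ for $k=0,1,2$, which equal $\frac{1}{N+1}$, $\frac{M+1}{(N+1)(N+2)}$ and $\frac{(M+1)(M+2)}{(N+1)(N+2)(N+3)}$ respectively.

```python
from fractions import Fraction
from math import sqrt


def skill_score(N, M, p=Fraction(1, 2)):
    """S_hat = (chi_hat - mu) / sigma for M heads in N flips (illustrative helper)."""
    p = Fraction(p)
    # Integrals of r**k * f(r, M/N) over [0, 1] for k = 0, 1, 2.
    m0 = Fraction(1, N + 1)
    m1 = Fraction(M + 1, (N + 1) * (N + 2))
    m2 = Fraction((M + 1) * (M + 2), (N + 1) * (N + 2) * (N + 3))
    # First and second moments of chi = (r - p) / (1 - p) under the weight f.
    mu = (m1 - p * m0) / (1 - p)
    e2 = (m2 - 2 * p * m1 + p * p * m0) / (1 - p) ** 2
    chi_hat = (Fraction(M, N) - p) / (1 - p)
    # Exact rationals until the final square root, to avoid cancellation error.
    return float(chi_hat - mu) / sqrt(float(e2 - mu * mu))


if __name__ == "__main__":
    for N, M in [(10**2, 10**2), (10**4, 9 * 10**3), (10**6, 8 * 10**5),
                 (10**9, 7 * 10**8), (10**10, 6 * 10**9)]:
        print(N, M, skill_score(N, M))
```

Keeping everything as `Fraction` until the last step is a design choice: for $N=10^{10}$ the variance is many orders of magnitude smaller than the individual terms, and exact arithmetic sidesteps any loss of precision.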
It appears that $\hat{S}$ approaches $\sqrt{N}$ very quickly once $\hat{r}$ exceeds $p$. See the plot of $\hat{S}$ versus $\hat{r}$ for $p=\frac{1}{2}$ and $N=100$ below.
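A rough leading-order argument for the $\sqrt{N}$ behaviour, as a sketch only (it keeps just the highest power of $N$ in each expression and assumes $\hat{r}>p$ is held fixed as $N\to\infty$): from the formulas for $\mu$ and $\sigma$, $$\mu\approx\frac{(\hat{r}-p)N}{N^2(1-p)}=\frac{\hat{\chi}}{N}\,,\qquad \sigma\approx\frac{\sqrt{(\hat{r}-p)^2N^4}}{N^{5/2}(1-p)}=\frac{\hat{\chi}}{\sqrt{N}}\,,$$ so that $$\hat{S}=\frac{\hat{\chi}-\mu}{\sigma}\approx\frac{\hat{\chi}\left(1-\frac{1}{N}\right)}{\hat{\chi}/\sqrt{N}}\approx\sqrt{N}\,.$$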
As mentioned in the first paragraph, it would be better to assume that $q$ is uniformly distributed on $[p,1]$. Here are some calculations with the modified distribution of $q$ under $p=\frac{1}{2}$:
(1) $N=10^2$ and $\hat{r}=1$ yield $\hat{S}\approx 7.178$;
(2) $N=10^4$ and $\hat{r}=\frac{9}{10}$ yield $\hat{S}\approx 70.79$;
(3) $N=10^6$ and $\hat{r}=\frac{8}{10}$ yield $\hat{S}\approx 707.1$;
(4) $N=10^{9}$ and $\hat{r}=\frac{7}{10}$ yield $\hat{S}\approx 22361$;
(5) $N=10^{10}$ and $\hat{r}=\frac{6}{10}$ yield $\hat{S}\approx 70711$.
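For small $N$, the modified calculation can be sketched exactly in Python. This rests on an assumption that is not spelled out above: the integrals are taken over $[p,1]$ with the weight $f$ multiplied by the uniform density $\frac{1}{1-p}$. Under that reading the sketch gives a value close to the one in item (1); the function names are hypothetical. The binomial expansion of $(1-r)^{N-M}$ makes this approach practical only for moderate $N$.

```python
from fractions import Fraction
from math import comb, sqrt


def truncated_moment(a, b, p):
    """Exact integral of r**a * (1 - r)**b over [p, 1], expanding (1 - r)**b."""
    return sum(
        Fraction((-1) ** k * comb(b, k), a + k + 1) * (1 - p ** (a + k + 1))
        for k in range(b + 1)
    )


def skill_score_truncated(N, M, p=Fraction(1, 2)):
    """S_hat when q is taken uniform on [p, 1] (assumed reading of the text)."""
    p = Fraction(p)
    # Assumption: binomial coefficient times the uniform density 1/(1-p) on [p, 1].
    weight = comb(N, M) / (1 - p)
    i0, i1, i2 = (truncated_moment(M + j, N - M, p) for j in range(3))
    mu = weight * (i1 - p * i0) / (1 - p)
    e2 = weight * (i2 - 2 * p * i1 + p * p * i0) / (1 - p) ** 2
    chi_hat = (Fraction(M, N) - p) / (1 - p)
    return float(chi_hat - mu) / sqrt(float(e2 - mu * mu))


if __name__ == "__main__":
    # Case (1): N = 100 flips, all heads, p = 1/2.
    print(skill_score_truncated(100, 100))
```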
Note that $\hat{S}$ goes to $\sqrt{\frac{N}{2}}$ very quickly. See the plots of $\hat{S}$ against $\hat{r}$ for $p=\frac{1}{2}$ and $N=100$ (on top) as well as $N=1000$ (at the bottom) below.