In probability, we often start with some random variables whose distribution is known. For example, we let $X_i$ be i.i.d. with $$P(X_i=1)=60\%,$$ $$P(X_i=0)=40\%.$$
We then investigate the distribution of some function of them, e.g. the sample mean $X=\sum_{i=1}^{100} \frac{X_i}{100}$.
In real-life examples, I often find that what I need is the converse: we have made one, and only one, observation, and hope to infer the distribution of the $X_i$, assuming they are i.i.d. How do people manage this?
An attempt of mine is to let $$P(X_i=1)=x,$$ $$P(X_i=0)=1-x,$$ and to assign to each $x$ a "likelihood" of matching our observed data. For example, if we observed that $X = 0.6$, a reasonable theory should assign the largest likelihood to $x=60\%$. However, that doesn't mean $x$ couldn't be $59\%$. In fact, the only two cases we can safely discard are $x=0$ and $x=1$. How do probabilists and statisticians deal with this? What's the reasonable theory behind it?
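The idea in the attempt above can be made concrete. Assuming the observation $X=0.6$ arose from $60$ ones among $n=100$ trials, the likelihood of a candidate parameter $x$ is the binomial probability of that outcome; a minimal sketch:

```python
import math

def likelihood(x, successes=60, n=100):
    """Binomial likelihood of the parameter x given `successes` ones in n i.i.d. trials."""
    return math.comb(n, successes) * x**successes * (1 - x)**(n - successes)

# The likelihood peaks at x = successes/n = 0.6, is small but nonzero at
# nearby values such as x = 0.59, and vanishes only at x = 0 and x = 1.
for x in (0.0, 0.59, 0.60, 0.61, 1.0):
    print(x, likelihood(x))
```

This matches the intuition in the question: $x=0.6$ maximizes the likelihood (it is the maximum likelihood estimate), yet neighboring values remain plausible, and only $x=0$ and $x=1$ are ruled out entirely.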
As you said, $X=0.60$ is a point estimate that could be obtained via Maximum Likelihood, the Method of Moments, or any other criterion. It could be the end of the story if you wish. But you can also construct Confidence Intervals (CIs), i.e., a random interval $$ I(X) = [L(X),U(X)], $$ which satisfies $$ P(L(X)\leq x\leq U(X))=1-\alpha, $$ where $\alpha\in(0,1)$, e.g., $\alpha=0.1$. One standard method to obtain a CI is the asymptotic Wald method (see https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval). The point here is that you don't have just a point estimate, but a range of values of $x$ that is covered with high probability by your random interval $I(X)$. With that information, you can decide on the value of $x$ with high confidence. To clarify: once you get your sample, you construct the interval, so you are looking at one realization of $I(X)$.
The Wald interval is based on the following fact. Under some regularity conditions (see *Approximation Theorems of Mathematical Statistics* by Serfling), $$ \sqrt{I(\hat{\theta})}(\hat{\theta}-\theta)\rightarrow_d N(0,1), $$ where $\theta$ is the true parameter value, $\hat{\theta}$ is the maximum likelihood estimator, and $I(\theta)$ is the observed Fisher information, i.e., $$ I(\theta) = -\dfrac{d^2}{d\theta^2}\log L(\theta;(x_1,...,x_n)), $$ and $L(\theta;(x_1,...,x_n))$ is the likelihood function. Let $z$ be such that $P(N(0,1)\geq z)=\alpha/2$; then $$ P\left( \hat{\theta}-\dfrac{z}{\sqrt{I(\hat{\theta})}}\leq \theta\leq \hat{\theta}+\dfrac{z}{\sqrt{I(\hat{\theta})}} \right)=P\left(|\sqrt{I(\hat{\theta})}(\hat{\theta}-\theta)|\leq z\right)\approx 1-\alpha. $$ You just have to adapt this to the Bernoulli distribution. You will get (if my math is not mistaken) the random interval $$ P\left( \hat{p}-z\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\leq p\leq \hat{p}+z\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\right)\approx 1-\alpha. $$
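For concreteness, here is a small sketch of the Bernoulli Wald interval above, using only the standard library and assuming the asker's scenario of $60$ ones among $n=100$ trials with $\alpha=0.1$:

```python
import math
from statistics import NormalDist

def wald_interval(successes, n, alpha=0.10):
    """Wald (normal-approximation) confidence interval for a Bernoulli proportion p."""
    p_hat = successes / n
    # z solves P(N(0,1) >= z) = alpha/2, i.e. the (1 - alpha/2)-quantile
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

lo, hi = wald_interval(60, 100)  # roughly (0.519, 0.681)
print(lo, hi)
```

So after observing $X=0.6$, instead of a single point estimate you report an interval of values of $p$, here about $[0.52, 0.68]$ at the $90\%$ level; the interval shrinks as $n$ grows.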