Suppose I observe $\bar{x}$ occurrences in a relatively large time interval $t$, and I know that the probability distribution is a Poisson distribution: $P(x) = \frac{e^{-\mu}\mu^{x}}{x!},$ where $\mu$ is the true mean number of occurrences in a time interval $t$. (Also assume $\mu$ is large enough that the single observation $\bar{x}$ is a decent estimate.)
What would be the uncertainty in my estimate for the true mean $\mu$? I have seen sources before take the uncertainty to be $\sqrt{\bar{x}}$, because this corresponds to the theoretical standard deviation, so my estimate for the mean would be $\mu = \bar{x} \pm \sqrt{\bar{x}}$.
But isn't the uncertainty usually taken to be the variance? In that case my estimate for the mean would be $\mu = \bar{x} \pm \bar{x}$, which seems incorrect.
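As a sanity check on the $\sqrt{\bar{x}}$ rule, here is a small simulation sketch (with an assumed true mean $\mu = 400$, chosen only for illustration) showing that the spread of repeated Poisson counts is indeed close to $\sqrt{\mu}$, so a single count $\bar{x}$ carries a one-sigma uncertainty of about $\sqrt{\bar{x}}$:

```python
import math
import random
import statistics

random.seed(1)

def poisson1():
    # Knuth's method for Poisson(1): count uniform factors
    # until the running product drops below e^{-1}
    L = math.exp(-1.0)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def poisson(mu):
    # a sum of mu independent Poisson(1) draws is Poisson(mu), for integer mu
    return sum(poisson1() for _ in range(mu))

mu = 400  # assumed true mean count in the interval t
counts = [poisson(mu) for _ in range(2000)]

m = statistics.mean(counts)   # close to mu
s = statistics.stdev(counts)  # close to sqrt(mu) = 20
```

The empirical standard deviation comes out near $\sqrt{\mu} = 20$, which is what motivates quoting $\mu = \bar{x} \pm \sqrt{\bar{x}}$.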
To take the question one step further, suppose I have $N$ sample observations, each over a time interval $t$: $\bar{x}_1, \bar{x}_2, \dots, \bar{x}_N.$
To estimate $\mu$ I could take the weighted mean, assigning an uncertainty $\sigma_i$ to each observation: $$\hat{\mu} = \frac{\sum_{i=1}^{N} w_i \bar{x}_i}{\sum_{i=1}^{N} w_i}.$$ Taking $w_i = 1/\sigma_i^2$ is not uncommon, but with $\sigma_i^2 = \bar{x}_i$ this reduces to $$\hat{\mu} = \frac{\sum_{i=1}^{N} \bar{x}_i/\bar{x}_i}{\sum_{i=1}^{N} 1/\bar{x}_i} = \frac{N}{\sum_{i=1}^{N} 1/\bar{x}_i},$$ the harmonic mean of the counts, which seems like the wrong estimator here. So taking $w_i = 1/\sigma_i$ seems reasonable instead. But if that is the case, why is it more common to see $w_i = 1/\sigma_i^2$?
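For what it's worth, with the usual ratio-of-sums form of the weighted mean, inverse-variance weights $w_i = 1/\bar{x}_i$ reduce exactly to the harmonic mean of the counts, which stays close to the ordinary average whenever the counts are similar. A quick numerical check with hypothetical counts:

```python
# hypothetical repeated counts x_i, each from the same Poisson process
x = [480, 510, 495, 505, 520]
N = len(x)

arith = sum(x) / N  # unweighted sample mean
# weighted mean with w_i = 1/sigma_i^2 = 1/x_i (ratio of sums, not sum of ratios)
harm = sum(xi * (1 / xi) for xi in x) / sum(1 / xi for xi in x)
# this is exactly the harmonic mean N / sum(1/x_i), slightly below arith
```

By the AM–HM inequality the harmonic mean is strictly below the arithmetic mean unless all counts coincide, but for counts of this size the difference is well under one count.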
Some basics: If $N$ has a Poisson distribution with rate $\lambda$, then the probability mass function is $$P(N=n) = \frac{\lambda^n}{n!} e^{-\lambda}$$ If $N_t$ is a Poisson process then for each fixed $t>0$, $N_t$ has the PMF of a Poisson distribution with rate $\lambda t$, i.e. $$P(N_t=n) = \frac{(\lambda t)^n}{n!} e^{-\lambda t}.$$
In any case, if the mean rate is large (say $>1000$), then the standardized RV converges in distribution to a standard normal RV, i.e. $$Z_\lambda = \frac{N-\lambda}{\sqrt{\lambda}} \stackrel{d}{\to} Z \sim \mathcal{N}(0,1)$$ as $\lambda \to \infty$. What do we mean by this?
The expression $$ Z \sim \mathcal{N}(0,1)$$ means $$P(Z\leq z) = \Phi(z),$$ where $\Phi$ is the cumulative distribution function (CDF) of a standard normal RV, i.e. $$\Phi(z) = \int_{-\infty}^z \phi(u)\,du,$$ where the probability density function (PDF) $\phi$ is $$\phi(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}.$$
The expression $$Z_\lambda \stackrel{d}{\to} Z$$ means that the CDF of $Z_\lambda$ converges pointwise, at every continuity point $z$, to the CDF of $Z$, i.e. $$P(Z_\lambda \leq z) \to P(Z\leq z) = \Phi(z).$$ This means that, approximately, for large $\lambda$, $$P(N\leq n) \approx \Phi\left(\frac{n-\lambda}{\sqrt{\lambda}}\right).$$
Now, suppose we look at the process $N_t$ over a large time interval $[0,t]$, with $\lambda >0$ fixed. If $t$ is large and $\lambda$ is not ridiculously small, then $\lambda t$ will be large, and we have, approximately, $$P(N_t \leq n) \approx \Phi\left( \frac{n-\lambda t}{\sqrt{\lambda t}}\right).$$
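One can check the quality of this normal approximation numerically: the exact Poisson CDF can be summed in log space to avoid overflow. (The value $\lambda t = 400$ below is just an assumption for illustration.)

```python
import math
from statistics import NormalDist

lam_t = 400.0  # lambda * t, assumed large

def poisson_cdf(n, lam):
    # exact CDF: sum the PMF terms, each computed in log space to avoid overflow
    return sum(math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
               for k in range(n + 1))

n = 410
exact = poisson_cdf(n, lam_t)
approx = NormalDist().cdf((n - lam_t) / math.sqrt(lam_t))
# for lambda*t = 400 the two agree to about two decimal places
```

For counts this large the approximation error is on the order of $1/\sqrt{\lambda t}$, i.e. about a percent here.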
Some statistics: How is this useful for finding confidence intervals? Let's consider the process case, where we have observations $N_1(t),\dotsc, N_n(t)$ over some large time interval $[0,t]$. Take the sample mean $$\hat{N}_t = \frac{1}{n} (N_1(t)+\dotsb +N_n(t)).$$ This has mean $\lambda t$ and variance $$\operatorname{Var}(\hat{N}_t) = \frac{1}{n^2}\, n \operatorname{Var}(N_1(t)) = \frac{\lambda t}{n},$$ so the standard deviation is $\sqrt{\lambda t/n}$. It follows that $$\sqrt{n}\,\frac{\hat{N}_t-\lambda t}{\sqrt{\lambda t}} \stackrel{d}{\to} Z \sim \mathcal{N}(0,1).$$ This means that for large $t$ and large $n$ we have, approximately, $$P(\hat{N}_t \leq x ) \approx \Phi \left(\frac{x-\lambda t}{\sqrt{\lambda t/n}}\right).$$
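A quick simulation (with an assumed mean count $\lambda t = 30$ per interval and $n = 16$ observations per sample mean, both hypothetical) confirms that the variance of the sample mean is close to $\lambda t / n$:

```python
import math
import random
import statistics

random.seed(2)

def poisson(mu):
    # Knuth's method; numerically fine in double precision for moderate mu
    L = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

lam_t, n = 30.0, 16  # assumed mean count per interval, and sample size
reps = 3000
means = [statistics.mean(poisson(lam_t) for _ in range(n)) for _ in range(reps)]

v = statistics.variance(means)
# the empirical variance of the sample mean should be near lam_t / n = 1.875
```

The empirical variance lands near $30/16 = 1.875$, matching the $\operatorname{Var}(\hat{N}_t) = \lambda t / n$ computation above.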
So what? Well, if we remember that $|x|\leq a$ if and only if $-a \leq x \leq a$, and that distribution functions satisfy $P(a<X\leq b)=F(b)-F(a)$, where $F(x)=P(X\leq x)$, then we want the difference between the true mean $\lambda t$ and our estimate $\hat{N}_t$ to be small, less than some $\epsilon >0$: \begin{align*} P(|\hat{N}_t -\lambda t| \leq \epsilon) &= P(\hat{N}_t-\lambda t \leq \epsilon)-P(\hat{N}_t-\lambda t \leq -\epsilon)\\ & \approx \Phi \left(\frac{\epsilon}{\sqrt{\lambda t/n}}\right)-\Phi \left(\frac{-\epsilon}{\sqrt{\lambda t/n}}\right)\\ & = \Phi \left(\frac{\epsilon}{\sqrt{\lambda t/n}}\right)-\left(1-\Phi \left(\frac{\epsilon}{\sqrt{\lambda t/n}}\right)\right)\\ & = 2\Phi \left(\frac{\epsilon}{\sqrt{\lambda t/n}}\right)-1. \end{align*}
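The symmetry step $\Phi(-x) = 1 - \Phi(x)$ used in the middle of that derivation is easy to verify numerically:

```python
from statistics import NormalDist

Phi = NormalDist().cdf
c = 1.3  # any threshold works; this value is arbitrary
two_sided = Phi(c) - Phi(-c)  # P(|Z| <= c) for standard normal Z
folded = 2 * Phi(c) - 1       # same quantity, using Phi(-c) = 1 - Phi(c)
```

Both expressions give the same two-sided probability, which is why only one quantile is needed below.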
Pick your favorite number $\alpha \in (0,1)$ and require that the probability of a deviation of at most $\epsilon$ equal $1-\alpha$, i.e. $$P(|\hat{N}_t -\lambda t| \leq \epsilon) = 1-\alpha.$$ Use the approximate normal expression we just found above and solve for $\epsilon$ or $n$, depending on your viewpoint: $$1-\alpha = 2\Phi \left(\frac{\epsilon}{\sqrt{\lambda t/n}}\right)-1.$$ Add one, divide by $2$, and apply the inverse CDF (the quantile function), obtaining $$\frac{\epsilon}{\sqrt{\lambda t/n}} = \Phi^{-1}(1-\alpha/2)=:z_{\alpha/2},$$ where the RHS is just convenient notation. Thus, if we want an error of at most $\epsilon$, we can find how many samples $n$ we need from $$\sqrt{n} \geq \frac{\sqrt{\lambda t}}{\epsilon} z_{\alpha/2},$$ and if instead we are stuck with merely $n$ samples, the best we can do error-wise is $$\epsilon = \sqrt{\frac{\lambda t}{n}} z_{\alpha/2}.$$ Notice that for $\lambda t$ sufficiently large but fixed, the error goes to zero as $n\to \infty$.
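Both directions of this calculation are one-liners; the numbers below ($\lambda t = 400$, $\alpha = 0.05$, $\epsilon = 5$) are assumptions for illustration only:

```python
from math import ceil, sqrt
from statistics import NormalDist

alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96

lam_t = 400.0  # assumed mean count per interval (unknown in practice)
eps = 5.0      # desired half-width of the interval

# samples needed for error eps: sqrt(n) >= sqrt(lam_t) * z / eps
n_needed = ceil((sqrt(lam_t) * z / eps) ** 2)

# conversely, the best achievable error when stuck with n = 25 samples
eps_25 = sqrt(lam_t / 25) * z
```

With these numbers one needs $62$ samples to pin the mean down to $\pm 5$ at the $95\%$ level, while $25$ samples only get you to about $\pm 7.84$.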
The $(1-\alpha)100\%$ confidence interval for $\lambda t$ is now $$\left[\hat{N}_t - \sqrt{\frac{\lambda t}{n}} z_{\alpha/2}, \hat{N}_t + \sqrt{\frac{\lambda t}{n}} z_{\alpha/2}\right].$$
Caveat: Of course, this requires the standard deviation, which in the Poisson case is the square root of the mean, so the whole estimate is a little silly: we have an error estimate for the mean that depends on the mean itself! Certainly this is no good. We can rectify it by replacing the standard deviation with the sample standard deviation, and the normal distribution with Student's $t$-distribution with $n-1$ degrees of freedom.
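Here is a sketch of that plug-in interval, using hypothetical counts and the sample standard deviation in place of $\sqrt{\lambda t}$. One caveat on the caveat: the standard library has no $t$ quantile, so the normal quantile is used below; for $n$ this small one should really use $t_{0.975,\,4} \approx 2.776$ (e.g. from `scipy.stats.t.ppf`) instead of $z_{\alpha/2} \approx 1.96$.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

counts = [480, 510, 495, 505, 520]  # hypothetical observed counts N_i(t)
n = len(counts)

m = mean(counts)   # point estimate of lambda * t
s = stdev(counts)  # sample standard deviation, replacing sqrt(lambda * t)
z = NormalDist().inv_cdf(0.975)

half = z * s / sqrt(n)  # half-width of the approximate 95% interval
ci = (m - half, m + half)
# with Student's t (n - 1 = 4 degrees of freedom) the multiplier would be
# about 2.776 rather than 1.96, giving a wider, more honest interval
```

Note the interval is now computed entirely from the data, with no appearance of the unknown $\lambda t$.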
Hope this helps and is not too overwhelming. Please comment for corrections, clarifications, or suggestions.