I have a group of $n$ events. The successes don't all come in at once, and I want to predict the actual success rate $s$. The number of successes showing in the system at any given time can be $0$ or greater than $0$.
I have the prior probability of success ($p$), based on historical data, and the current success rate $x$ (where $x$ is smaller than $s$).
Is there a way that Bayes's theorem can be used to give the probability of success, given that we know $p$ as well as $x$? Can I use Laplacian smoothing to predict this probability if $x = 0$? Am I incorrect in assuming that Bayes's theorem can solve this? Is there another way to do so?
In this situation it is best to use a member of the beta family of distributions as the prior for the binomial success probability $\pi.$ First, a beta distribution has support $(0,1).$ Second, the beta prior is 'conjugate' to the binomial likelihood, which makes it easy to find the posterior distribution.
I choose $Beta(\alpha = 38, \beta = 1862)$ as the prior because that distribution has mean $\alpha/(\alpha + \beta) = 0.02,$ median $0.0198,$ mode $0.0195,$ standard deviation approximately $0.0032,$ and $$P(0.015 < \pi < 0.025) \approx 0.88.$$ These properties seem a reasonable match to the prior information you provided. The following program in R searches for the parameters $\alpha$ and $\beta,$ starting from the relationship $\alpha/(\alpha + \beta) = 0.02$ (that is, $\beta = 49\alpha$), to put enough probability in $(0.015, 0.025)$ to match your historical experience.
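As a sketch of that kind of search, here is an equivalent in Python with `scipy.stats` rather than R (the grid of $\alpha$ values and the helper name `coverage` are my choices for illustration):

```python
from scipy.stats import beta

def coverage(alpha):
    """Prior probability that pi falls in (0.015, 0.025),
    with beta_ = 49 * alpha forcing the mean alpha/(alpha + beta_) = 0.02."""
    beta_ = 49 * alpha
    return beta.cdf(0.025, alpha, beta_) - beta.cdf(0.015, alpha, beta_)

# Scan candidate alpha values and report the probability each prior
# puts in (0.015, 0.025); pick the one matching your prior beliefs.
for alpha in range(10, 71, 10):
    print(alpha, 49 * alpha, round(coverage(alpha), 3))
```

Larger $\alpha$ (with $\beta = 49\alpha$) concentrates the prior more tightly around $0.02$, so the coverage of $(0.015, 0.025)$ grows with $\alpha$.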
Below is a histogram of many simulated values of $\pi \sim Beta(38, 1862)$ with the prior density curve (blue) and the best-fitting normal density superimposed.
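A simulation along those lines can be sketched in Python/NumPy rather than R (the seed and number of draws are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2023)
draws = rng.beta(38, 1862, size=100_000)   # pi ~ Beta(38, 1862)

# Empirical summaries should sit close to the prior's
# mean (0.02) and standard deviation (about 0.0032).
print(draws.mean())
print(draws.std())
```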
If you observe your process through 100 trials and see $x$ successes, then the posterior distribution is obtained according to Bayes' Theorem by the relationship $$ \text{POSTERIOR} \propto \text{PRIOR} \times \text{LIKELIHOOD},$$ which gives $$ p(\pi|x) \propto p(\pi) \times p(x|\pi) \propto \pi^{38-1} (1 - \pi)^{1862-1} \times \pi^x (1 - \pi)^{100 - x} \propto \pi^{38+x-1} (1 - \pi)^{1862 + 100 - x - 1}.$$ Here we use proportionality symbols $\propto$ to indicate that normalizing constants can be ignored -- so that we show only the 'kernels' of the prior, likelihood, and posterior distributions. We recognize the result as the kernel of $Beta(38+x, 1862 + 100 - x),$ which has mean $\frac{38+x}{38 + 1862 + 100}.$
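The conjugate update is just arithmetic on the beta parameters; as a small Python sketch (the function name `update_beta` is mine):

```python
def update_beta(alpha, beta, successes, trials):
    """Posterior Beta parameters after observing `successes` in `trials`:
    add successes to alpha and failures to beta."""
    return alpha + successes, beta + (trials - successes)

a_post, b_post = update_beta(38, 1862, successes=3, trials=100)
print(a_post, b_post)   # Beta(41, 1959)
```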
Thus, if there were $x = 3$ successes in 100 trials, we could say that the posterior mean $E(\pi|x) = 0.0205$ and that a 95% posterior probability interval (R code below) for $\pi$ is $(0.0148,\, 0.0271).$ This information about $\pi,$ based on prior information and relatively little data, could be used to predict the success rate during additional trials of the process as you suggest.
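The posterior mean and 95% interval quoted above can be reproduced in Python with `scipy.stats` (R's `qbeta` corresponds to `beta.ppf` here):

```python
from scipy.stats import beta

a_post, b_post = 38 + 3, 1862 + 100 - 3   # posterior Beta(41, 1959)

print(a_post / (a_post + b_post))         # posterior mean 41/2000 = 0.0205
lo, hi = beta.ppf([0.025, 0.975], a_post, b_post)
print(lo, hi)                             # roughly (0.0148, 0.0271)
```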
Notes:

(1) On rates: You refer to 'rates' $x$ and $s,$ but it seems to me you are thinking of counts of successes.

(2) On selecting a prior distribution: With a prior that is 'conjugate' to (mathematically compatible with) the likelihood, it is easy to identify the kernel of the posterior distribution without tedious computation. If we used the (green) normal distribution as a prior, the problem would become messy. A prior such as $Unif(0.015, 0.025)$ would constrain the posterior to the same support.

(3) On the effect of the prior on the result: Very roughly speaking, your prior information contains about as much information as getting 38 successes in 1900 trials. Thus the prior distribution has much more to say about the posterior than my hypothetical additional 100 trials. Continual updating of the posterior is indicated as additional data become available: the posterior for one iteration becomes the prior for the next.

(4) If the beta family of distributions is unfamiliar to you, please look at the Wikipedia article on 'beta distribution'.
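The continual updating described in note (3) can be sketched in Python (the batch counts below are made-up data for illustration):

```python
# Sequential updating: the posterior from one batch of data
# becomes the prior for the next batch.
a, b = 38, 1862                           # initial prior Beta(38, 1862)
batches = [(3, 100), (1, 50), (4, 200)]   # (successes, trials) per batch

for s, n in batches:
    a, b = a + s, b + (n - s)

print(a, b)   # identical to a single update with the pooled totals
```

Because the conjugate update only adds counts, processing the batches one at a time gives exactly the same posterior as pooling all the data into one update.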