I could use some help solving a problem about a Dirichlet prior.
We have a multinomial distribution over an alphabet of 27 symbols parameterized by $\mathbf{\theta}=(\theta_1, ..., \theta_{27})$. We have collected a dataset $D = \{x[1],\ldots,x[2000]\}$ consisting of $2000$ symbols, among which "e" appears $260$ times. We use a Dirichlet prior over $\theta$ with parameters $\mathbf{\alpha}= (\alpha_1, ..., \alpha_{27})$, where each $\alpha_i = 10$. What is the predictive probability that letter "e" occurs with this prior? (i.e., what is $P(X[2001] = $ "$e$"$ \mid D)$?)
So my understanding is that we are to find $P(X_{2001} | X_1, X_2, \ldots, X_{2000})$. By the given information, and since this is a Bayesian probability question, it appears that we must marginalize over $\theta$ using the law of total probability. If we also observe the conditional independence of $X_1, X_2, \ldots, X_{2001}$ given $\theta$, we have
$$P(X_{2001} | X_1, \ldots, X_{2000}) = \int_{\theta} P(X_{2001} | X_1, \ldots, X_{2000}, \theta)\,P(\theta|X_1, \ldots, X_{2000})\, \mathrm d\theta = \int_{\theta} P(X_{2001} |\theta)\,P(\theta|X_1, \ldots, X_{2000})\, \mathrm d\theta$$
where the first equality is the law of total probability and the second uses the conditional independence.
Now, to solve the integral, I am left with some questions. What is $P(X_{2001}|\theta)$? Is this simply the corresponding component of $\theta$? Can this probability be derived from the multinomial distribution of $X$ given $\alpha$?
How do we find the posterior $P(\theta|X_1, ... X_{2000})$? I was thinking that since the Dirichlet (Beta) is a conjugate prior to the multinomial (binomial), that this posterior might be
$$P(\theta|X_1, \ldots, X_{2000}) = \frac{\Gamma\left(\alpha_5+ 260 + \sum_{i\neq 5} \left(\alpha_i + \frac{2000-260}{26}\right)\right)}{\Gamma(\alpha_5 + 260)\prod_{i\neq 5}\Gamma\left(\alpha_i + \frac{2000-260}{26}\right)} \theta_{5}^{10+260-1}\prod_{i\neq 5}\theta_{i}^{1000/13 - 1}$$
where the subscript $5$ denotes the index for "e" in the parameter vectors, the $\alpha_i$ are the prior values ($\alpha_i = 10$), and I have (perhaps wrongly) spread the $2000-260$ non-"e" observations evenly over the other $26$ symbols, so each of their exponents is $10 + \frac{2000-260}{26} - 1 = \frac{1000}{13} - 1$. Next, substituting the given information that "e" was observed $260$ of the $2000$ times gives me
$$P(\theta|X_1, \ldots, X_{2000}) = \frac{\Gamma(2270)}{\Gamma(270)\prod_{i\neq 5}\Gamma(1000/13)} \theta_{5}^{269}\prod_{i\neq 5}\theta_{i}^{1000/13-1}$$
since $\alpha_5+260=270$ and $270 + 26\cdot\frac{1000}{13} = 2270$.
I am very unsure of my work so far, and I am not sure how to finish solving this. The last expression is in terms of the vector $\theta$, and I would not know how to integrate such an expression.
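The best sanity check I can think of is numerical: assuming my guessed posterior above is right, I can sample from it with NumPy and average the "e" component (putting "e" at 0-based index 4 is an arbitrary choice of the snippet):

```python
import numpy as np

rng = np.random.default_rng(0)

# My guessed posterior: Dirichlet with parameter 10 + 260 = 270 for "e"
# (placed at 0-based index 4 here) and 10 + 1740/26 = 1000/13 for the rest.
params = np.full(27, 1000 / 13)
params[4] = 270.0

# Monte Carlo estimate of the posterior mean of theta_5, i.e. the
# predictive probability of "e" under this guessed posterior.
theta = rng.dirichlet(params, size=200_000)
print(theta[:, 4].mean())  # close to 270 / params.sum() = 270/2270
```

But I do not know whether this guessed posterior is actually the right one.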
I was wondering if it is possible to simplify this problem by modeling $\theta_5$ with a Beta prior and a binomial likelihood instead of the full Dirichlet. It appears that there is really only a binary event here: either "e" is observed or some other symbol is.
What happens is that you start from the Dirichlet distribution as a prior, i.e.
$$p(\theta)=\frac{\Gamma\left(\sum_{i=1}^{27} \alpha_i\right)}{\prod_{i=1}^{27}\Gamma(\alpha_i)} \prod_{i=1}^{27}\theta_i^{\alpha_i-1}$$
Then, you obtain some data, which have a certain probability given the parameter vector $\theta$:
$$P(X_1,\ldots,X_{2000}|\theta)={2000 \choose {N_1,\ldots,N_{27}}} \prod_{i=1}^{27}{\theta_i}^{N_i}$$
where $N_i=\sum_k I(X_k=i)$, i.e. $N_i$ counts how many times the $i$th symbol appears. From this you now compute the posterior as
$$P(\theta|X_1,\ldots,X_{2000})=\frac{P(X_1,\ldots,X_{2000}|\theta)p(\theta)}{\int P(X_1,\ldots,X_{2000}|\theta)p(\theta) d\theta}$$
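As an aside, if you did have the full counts $N_1,\ldots,N_{27}$, conjugacy makes this posterior another Dirichlet with parameters $\alpha_i+N_i$, and the predictive probability of "e" is its posterior mean. A minimal NumPy sketch (the even split of the non-"e" counts is made up, and "e" is placed at 0-based index 4; neither choice affects the "e" entry of the predictive):

```python
import numpy as np

alpha = np.full(27, 10.0)                # prior Dirichlet parameters
counts = np.full(27, (2000 - 260) / 26)  # made-up split of the non-"e" counts
counts[4] = 260                          # "e" at 0-based index 4

# Conjugacy: Dirichlet(alpha) prior + multinomial counts
# -> Dirichlet(alpha + counts) posterior.
posterior = alpha + counts

# Posterior predictive for the next symbol being "e" is the posterior mean:
# E[theta_e | data] = posterior_e / posterior.sum().
pred = posterior[4] / posterior.sum()
print(pred)  # 270/2270 ≈ 0.11894
```

Note that `pred` depends only on the "e" count and the total, not on how the other $1740$ observations are split.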
However, you only have partial information: the count of the letter "e". Let's say that "e" is the 5th symbol; then the probability of it appearing $260$ times, given $\theta$, is
$$P(N_5=260|\theta)={2000 \choose 260} {\theta_5}^{260}(1-\theta_5)^{2000-260}$$
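This binomial form is consistent with the Dirichlet prior: by the aggregation property, the marginal of $\theta_5$ under $\mathrm{Dirichlet}(\alpha_1,\ldots,\alpha_{27})$ is $\mathrm{Beta}(\alpha_5, \sum_{i\neq 5}\alpha_i) = \mathrm{Beta}(10, 260)$, which is conjugate to this binomial likelihood. A quick NumPy sketch of the resulting posterior mean:

```python
import numpy as np

rng = np.random.default_rng(0)

# Aggregation property of the Dirichlet: marginally
# theta_5 ~ Beta(alpha_5, sum of the other 26 alphas) = Beta(10, 260).
a, b = 10.0, 26 * 10.0

# Beta prior + binomial likelihood (260 "e"s in 2000 draws)
# -> Beta(a + 260, b + 1740) posterior.
a_post, b_post = a + 260, b + (2000 - 260)

closed_form = a_post / (a_post + b_post)  # posterior mean = 270/2270

# Monte Carlo sanity check on the posterior mean.
samples = rng.beta(a_post, b_post, size=200_000)
print(closed_form, samples.mean())
```

This is exactly the "e" versus "not e" simplification the asker suggests; the derivation below reaches the same place directly from the Dirichlet.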
and applying the Bayesian updating rule we should write
$$P(\theta|N_5=260)=\frac{P(N_5=260|\theta)p(\theta)}{\int P(N_5=260|\theta)p(\theta) d\theta}$$
Working out the integral
$$\int P(N_5=260|\theta)p(\theta)\, d\theta = {2000 \choose 260} \frac{\Gamma\left(\sum_{i=1}^{27} \alpha_i\right)}{\prod_{i=1}^{27}\Gamma(\alpha_i)} \int {\theta_5}^{260}(1-\theta_5)^{2000-260} \prod_{i=1}^{27}\theta_i^{\alpha_i-1} \prod_{i=1}^{27}d\theta_i$$
The factor in front is irrelevant, since it will cancel with the identical factor in the numerator of our final formula, so let's concentrate on
$$\int {\theta_5}^{260}(1-\theta_5)^{2000-260} \prod_{i=1}^{27}\theta_i^{\alpha_i-1} \prod_{i=1}^{27}d\theta_i = \int {\theta_5}^{\alpha_5+260-1}\Big(\sum_{i\neq 5}\theta_i\Big)^{2000-260} \prod_{i\neq 5}\theta_i^{\alpha_i-1} \prod_{i=1}^{27}d\theta_i$$
using the simplex constraint $1-\theta_5=\sum_{i\neq 5}\theta_i$.
We work out the $(\sum_{i\neq 5}\theta_i)^{2000-260}$ factor with the multinomial theorem (the sum runs over all nonnegative integers $k_i$, $i\neq 5$, with $\sum_{i\neq 5}k_i=2000-260$; the hat indicates that $k_5$ is omitted),
$$\Big(\sum_{i\neq 5}\theta_i\Big)^{2000-260}=\sum_{k_i}{{2000-260} \choose {k_1,\ldots,\hat{k}_5,\ldots,k_{27}}}\prod_{i\neq 5}\theta_i^{k_i}$$
which gives us for the integral
$$\int {\theta_5}^{\alpha_5+260-1}\Big(\sum_{i\neq 5}\theta_i\Big)^{2000-260} \prod_{i\neq 5}\theta_i^{\alpha_i-1} \prod_{i=1}^{27}d\theta_i = \sum_{k_i}{{2000-260} \choose {k_1,\ldots,\hat{k}_5,\ldots,k_{27}}}\frac{\Gamma(\alpha_5+260)\prod_{i\neq 5}\Gamma(\alpha_i+k_i)}{\Gamma\left(\sum_i \alpha_i + 2000\right)}$$
where each term is the normalization integral of a Dirichlet density (the parameters in every term sum to $\sum_i \alpha_i + 2000$).
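The remaining step can be sketched as follows: the predictive probability is a ratio of two integrals of this form, and the unwieldy sum over the $k_i$ cancels between them,

$$P(X_{2001}=e\mid N_5=260)=\mathbb{E}[\theta_5\mid N_5=260]=\frac{\int \theta_5\, P(N_5=260\mid\theta)\,p(\theta)\,d\theta}{\int P(N_5=260\mid\theta)\,p(\theta)\,d\theta}.$$

The numerator is the same sum with $\alpha_5+260$ replaced by $\alpha_5+261$, so every term picks up the identical factor $\frac{\alpha_5+260}{\sum_i \alpha_i+2000}$; this factor pulls out of the sum, and the remaining sums cancel, leaving

$$P(X_{2001}=e\mid N_5=260)=\frac{\alpha_5+260}{\sum_{i=1}^{27}\alpha_i+2000}=\frac{270}{2270}\approx 0.119,$$

which also confirms the asker's intuition that the problem reduces to a Beta-binomial model for "e" versus "not e".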