Bayesian Learning Example with a Uniform Prior


I'm trying to understand an example of online Bayesian learning from Duda and Hart's textbook Pattern Classification, section 3.5, example 1.

Let $D$ be a set of data points, in this case $\{4, 7, 2, 8\}$, and consider a conditional probability $p(x|\theta) = U(0, \theta)$. Denote by $D^i$ the set of the first $i$ points from $D$. A uniform distribution is also assumed for the prior: $p(\theta|D^0) = p(\theta)=U(0, 10)$. Then the update rule for online Bayesian learning is given by equation 54: $$ p(\theta|D^n) = \frac{p(x_n|\theta)\,p(\theta|D^{n-1})}{\int p(x_n|\theta)\,p(\theta|D^{n-1})\,d\theta}$$
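To see how this recursion plays out, here is a minimal numerical sketch of equation 54 on a grid of $\theta$ values. The data $\{4, 7, 2, 8\}$ and the $U(0,10)$ prior are from the example; the grid resolution is an arbitrary choice of mine, and the integral in the denominator is approximated by a Riemann sum.

```python
import numpy as np

theta = np.linspace(1e-6, 10.0, 10_000)      # grid over the prior's support (0, 10)
dtheta = theta[1] - theta[0]
posterior = np.full_like(theta, 1.0 / 10.0)  # p(theta | D^0) = U(0, 10)

for x in [4, 7, 2, 8]:
    # Likelihood p(x | theta) = U(0, theta): equals 1/theta when theta > x, else 0
    likelihood = np.where(theta > x, 1.0 / theta, 0.0)
    posterior = likelihood * posterior
    # Normalize: Riemann-sum approximation of the integral in equation 54
    posterior /= posterior.sum() * dtheta

# After all four points, the posterior is proportional to 1/theta^4 on (8, 10),
# so its mode sits at the sample maximum, theta = 8.
print(theta[np.argmax(posterior)])
```

Each pass through the loop multiplies in one likelihood factor and renormalizes, which is exactly the online update: the posterior after $n-1$ points serves as the prior for point $n$.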

Next the authors state that $$p(\theta|D^{1}) \propto p(x|\theta)p(\theta|D^0)$$

So far so good. However, they state that this probability is $\frac{1}{\theta}$ for $4 \leq \theta \leq 10$ and $0$ otherwise, and I'm not sure how this was computed.

From what I understand, initially $p(\theta) = U(0, 10)$, so for $\theta$ in the range $[0, 10]$ the prior density is $1/10$. Then, since $x=4$ is in the range $[0, 10]$, the likelihood $p(x|\theta)$ for $\theta \in [0, 10]$ is $\frac{1}{\theta}$. So why isn't $p(\theta|D^{1})$ given by $\frac{1}{\theta^2}$ for $\theta \in [0, 10]$, and $0$ otherwise?

Best answer:

I do not understand where your square would come from.

You have $p(\theta|D^{1}) \propto p(x_1|\theta)p(\theta|D^0)$.

On the one hand, $p(\theta|D^0)=\frac{1}{10}1_{\{0<\theta<10\}}$.

On the other hand, given $\theta\in(0,10)$, $x_1$ is uniformly distributed on $(0,\theta)$. Therefore $p(x_1\vert\theta)=\frac{1}{\theta}1_{\{0< x_1<\theta\}}$.

So $p(\theta|D^{1}) \propto \frac{1}{\theta}1_{\{0<x_1<\theta\}}1_{\{0<\theta<10\}}=\frac1\theta1_{\{0<x_1<\theta<10\}}$. Note that the prior's factor $\frac{1}{10}$ is constant in $\theta$, so it is absorbed into the proportionality constant and contributes no extra power of $\frac{1}{\theta}$. In particular, with $x_1 = 4$ you have $p(\theta|D^{1}) \propto \frac{1}{\theta}1_{\{4<\theta<10\}}$.
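As a quick sanity check on the normalizing constant (the integral in the denominator of equation 54): normalizing $\frac{1}{\theta}$ over $(4, 10)$ gives $p(\theta|D^1) = \frac{1/\theta}{\ln(10/4)}$. The midpoint-rule approximation below is my own sketch, not from the book.

```python
import math

# Midpoint-rule check that the normalizer of p(theta | D^1) is
# the integral of 1/theta over (4, 10), which equals ln(10/4).
n = 100_000
h = (10.0 - 4.0) / n
Z = h * sum(1.0 / (4.0 + (i + 0.5) * h) for i in range(n))

print(Z, math.log(10.0 / 4.0))  # the two agree
```

Only the shape $\frac{1}{\theta}$ matters for the proportionality statement; the constant $\ln(10/4)$ is what makes the posterior integrate to 1.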