I have been looking through the tutorials on Bayesian Methods in TensorFlow Probability, using the tutorial section here.
I was looking through Chapter 3, which is written in Python using TensorFlow Probability. In that chapter the author defines a Bayesian mixture model of two Gaussians via its joint log probability: the sum of the log probabilities from all of the prior and conditional distributions. (We sum log probabilities instead of multiplying the probabilities directly for reasons of numerical stability: floating-point numbers cannot represent the very small values that arise in the joint probability unless we work in log space.) This sum of log probabilities is an unnormalized log density; although the corresponding density might not integrate to one over all possible inputs, it is proportional to the true posterior density.
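As a quick illustration of why working in log space matters (a plain-Python sketch, not from the tutorial itself):

```python
import math

# Multiplying many small likelihoods underflows to exactly 0.0 in
# double precision: (1e-5)**100 = 1e-500 is far below the smallest
# representable double (~1e-308).
probs = [1e-5] * 100
product = 1.0
for p in probs:
    product *= p  # ends up as 0.0

# Summing the logs instead stays perfectly representable.
log_sum = sum(math.log(p) for p in probs)  # about -1151.3
```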
However, it seems like the log probability is defined incorrectly.
From the code here:
    def joint_log_prob(data_, sample_prob_1, sample_centers, sample_sds):
        """
        Joint log probability optimization function.

        Args:
          data_: tensor array representation of original data
          sample_prob_1: scalar representing the probability (out of 1.0) of
            assignment being 0
          sample_centers: 2d vector containing centers for both normal dists
            in the model
          sample_sds: 2d vector containing standard deviations for both normal
            dists in the model

        Returns:
          The joint log probability (an unnormalized log density).
        """
        ### Create a mixture of two scalar Gaussians:
        rv_prob = tfd.Uniform(name='rv_prob', low=0., high=1.)
        sample_prob_2 = 1. - sample_prob_1
        rv_assignments = tfd.Categorical(probs=tf.stack([sample_prob_1, sample_prob_2]))
        rv_sds = tfd.Uniform(name="rv_sds", low=[0., 0.], high=[100., 100.])
        rv_centers = tfd.Normal(name="rv_centers", loc=[120., 190.], scale=[10., 10.])
        rv_observations = tfd.MixtureSameFamily(
            mixture_distribution=rv_assignments,
            components_distribution=tfd.Normal(
                loc=sample_centers,  # One for each component.
                scale=sample_sds))   # And same here.
        return (
            rv_prob.log_prob(sample_prob_1)
            + rv_prob.log_prob(sample_prob_2)  # Why can we just sum them up?
            + tf.reduce_sum(rv_observations.log_prob(data_))      # Sum over samples.
            + tf.reduce_sum(rv_centers.log_prob(sample_centers))  # Sum over components.
            + tf.reduce_sum(rv_sds.log_prob(sample_sds))          # Sum over components.
        )
However, mathematically it is supposed to be the log of the following expression, as defined here: $$\pi(\boldsymbol{\alpha},\boldsymbol{\theta}| y) \propto \pi(\boldsymbol{\alpha}) \pi(\boldsymbol{\theta}) \sum_{k=1}^{K}\theta_{k}\pi_{k}(y|\alpha_{k})$$
We all know that the log of a sum is not equal to the sum of logs, so I cannot see how this is implemented in the last line (the return statement).
Can you please explain to me why this is or isn't correct?
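To make the confusion concrete, here is a quick plain-Python check of the two quantities (the values 0.3 and 0.5 are arbitrary), along with the log-sum-exp identity that stably recovers the log of a sum from individual logs:

```python
import math

a, b = 0.3, 0.5
log_of_sum = math.log(a + b)             # log(0.8)  ~ -0.223
sum_of_logs = math.log(a) + math.log(b)  # log(0.15) ~ -1.897, clearly different

# The stable way to compute log(a + b) from log(a) and log(b) is
# log-sum-exp: shift by the max before exponentiating to avoid underflow.
la, lb = math.log(a), math.log(b)
m = max(la, lb)
lse = m + math.log(math.exp(la - m) + math.exp(lb - m))
```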
The term rv_observations.log_prob(...) is the log of the mixture's density, which is the sum term you've written in mathematical notation. That part is OK (look at the log_prob definition for MixtureSameFamily to find the sum, implemented as a logsumexp over the mixture-weighted component distribution log-probs).

However, I think there is a bug in the code: it includes prior density on both of the categorical mixture parameters (sample_prob_1 and sample_prob_2), but these are not independent; one is fixed by normalization. The expression should read

    return (
        rv_prob.log_prob(sample_prob_1)
        + tf.reduce_sum(rv_observations.log_prob(data_))
        + tf.reduce_sum(rv_centers.log_prob(sample_centers))
        + tf.reduce_sum(rv_sds.log_prob(sample_sds))
    )

although I'd personally reorder this to put the observations last.
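To see that the mixture's log-density really is a logsumexp over log-weights plus component log-densities (which is what MixtureSameFamily.log_prob computes internally), here is a NumPy/SciPy sketch; the weights and the observation value are made up for illustration:

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

x = 150.0                       # a single observation (hypothetical)
w = np.array([0.4, 0.6])        # mixture weights (hypothetical)
mu = np.array([120.0, 190.0])   # component means, as in the priors above
sd = np.array([10.0, 10.0])     # component standard deviations

# Direct (numerically risky) computation: log of the weighted sum of densities.
direct = np.log(np.sum(w * norm.pdf(x, mu, sd)))

# Stable computation: logsumexp over log-weights + component log-densities.
stable = logsumexp(np.log(w) + norm.logpdf(x, mu, sd))
```

The two agree here because the densities are not tiny; for extreme observations the direct version underflows while the logsumexp version stays finite.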