What's the problem with log of sum, as opposed to sum of logs in probability theory?


While reading about variational inference and expectation maximization, I often see this assertion made without proof or explanation:

We have no problem maximizing a sum of logs, as in

$\max_\theta \sum_i \log p(x_i, z_i).$

But we do have trouble maximizing a log of a sum, as when $p(x_i, z_i)$ is marginalized to $p(x_i)$ via the law of total probability, which leaves a sum inside the log: $\max_\theta \sum_i \log \sum_z p(x_i, z)$.

Why is that?


There are 2 best solutions below


A sum of logs is easily rewritten as the log of a product, i.e.

$$\sum_{i=1}^n \log a_i = \log\left(\prod_{i=1}^n a_i\right).$$

There is no such conversion for the log of a sum.
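A quick numeric sanity check of the identity above (the values in `a` are hypothetical positive numbers standing in for likelihood terms):

```python
import math

# Hypothetical positive values standing in for likelihood terms.
a = [0.2, 0.5, 0.9, 0.3]

# Sum of logs versus log of the product: these agree up to
# floating-point rounding, by the identity above.
sum_of_logs = sum(math.log(x) for x in a)
log_of_product = math.log(math.prod(a))

assert math.isclose(sum_of_logs, log_of_product)
```

In practice this is why the sum-of-logs form is so convenient: each term $\log p(x_i, z_i)$ can be differentiated and maximized separately, whereas a sum trapped inside a log couples all the terms together.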


Logarithms are strictly increasing functions. Assuming the logarithm is defined, i.e. the sum is positive (as it should be if it is a sum of probabilities, densities, likelihoods, or similar), the maximum of the logarithm of a sum equals the logarithm of the maximum of the sum, and both are attained at the same value of the variables.

So $$\max_\theta \log \sum_z p(x_i, z) = \log \left(\max_\theta \sum_z p(x_i, z)\right),$$ and it is attained at the same $\theta$, so maximizing the log of the sum is exactly as easy as maximizing $\sum\limits_z p(x_i, z)$ itself.
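The monotonicity argument can be checked numerically. Here `f` is a hypothetical positive function of $\theta$ standing in for $\sum_z p(x_i, z)$; because $\log$ is strictly increasing, a grid search finds the same maximizer whether we maximize `f` or `log(f)`:

```python
import math

# A hypothetical positive, bimodal function standing in for
# sum_z p(x_i, z) as a function of theta.
def f(theta):
    return math.exp(-(theta - 1.5) ** 2) + 0.5 * math.exp(-(theta + 0.5) ** 2)

# Grid search over theta in [-3, 3].
thetas = [i / 100 for i in range(-300, 301)]

argmax_f = max(thetas, key=f)
argmax_log_f = max(thetas, key=lambda t: math.log(f(t)))

# log is strictly increasing, so it preserves the ordering of f's
# values and both searches return the same maximizer.
assert argmax_f == argmax_log_f
```

This is the second answer's point: taking a log outside the sum never changes *where* the maximum is. The difficulty raised in the question is different: with $\sum_i \log \sum_z p(x_i, z)$, the inner sum prevents the log from splitting the joint probability into separately maximizable terms.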