Background:
Suppose that I observe some data $\mathbf{y} = [y_{1}, \ldots, y_{N}]^{T}$ at specific time points $\mathbf{t} = [ t_{1}, \ldots, t_{N}]^{T}$. I am assuming that my data can be modeled as:
$$ y_{n} = \sum_{m=1}^{M} w_{m} \phi_{m}\left( t_{n} \right) + \epsilon_{n} \; \forall n$$
where $ \epsilon_{n} \sim \mathcal{N}\left(0, \beta^{-1} \right) $ are i.i.d.
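To make the setup concrete, here is a minimal NumPy sketch of this generative model. The polynomial basis $\phi_m(t) = t^{m-1}$, the sizes $N$ and $M$, and the "true" parameter values are assumptions for illustration only; nothing in the question fixes them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: M = 5 polynomial basis functions phi_m(t) = t**(m-1),
# N = 50 time points on [0, 1], true weights drawn once for the simulation.
N, M = 50, 5
beta_true = 25.0                          # noise precision -> noise std = 0.2
t = np.linspace(0.0, 1.0, N)
Phi = np.vander(t, M, increasing=True)    # Phi[:, m] = t**m for m = 0..M-1
w_true = rng.normal(size=M)               # a sample from the N(0, I) prior
y = Phi @ w_true + rng.normal(scale=beta_true**-0.5, size=N)
```

Here `Phi` is the $N \times M$ design matrix, so each observation is the row `Phi[n] @ w_true` plus Gaussian noise with precision `beta_true`.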
If we assume that $ \mathbf{w} \sim \mathcal{N}(0, \alpha^{-1} \mathbf{I}) $, where $\mathbf{w} = [w_{1}, \ldots, w_{M}]^{T}$, then we can find $\mathbf{w}$ by maximizing the posterior distribution:
$$ p(\mathbf{w} | \mathbf{y}, \mathbf{t}, \alpha, \beta) \propto p(\mathbf{y} | \mathbf{t}, \mathbf{w}, \beta) p(\mathbf{w} | \alpha) $$
Doing the math, this gives us:
$$ p(\mathbf{w} | \mathbf{y}, \mathbf{t}, \alpha, \beta) = \mathcal{N}\left( \beta ( \alpha \mathbf{I} + \beta \Phi^{T} \Phi)^{-1} \Phi^{T} \mathbf{y},\; (\alpha \mathbf{I} + \beta \Phi^{T} \Phi)^{-1} \right)$$ where $\Phi_{n,m} = \phi_{m}(t_{n})$ is the $N \times M$ design matrix.
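For reference, here is a small sketch of that posterior computation, with the mean $\mathbf{m} = \beta (\alpha \mathbf{I} + \beta \Phi^{T} \Phi)^{-1} \Phi^{T} \mathbf{y}$ and covariance $\mathbf{S} = (\alpha \mathbf{I} + \beta \Phi^{T} \Phi)^{-1}$ (the function name and example data are my own, for illustration):

```python
import numpy as np

def posterior(Phi, y, alpha, beta):
    """Posterior N(m, S) over the weights in the Gaussian linear model.

    S = (alpha*I + beta*Phi^T Phi)^{-1},  m = beta * S @ Phi^T @ y.
    """
    M = Phi.shape[1]
    S_inv = alpha * np.eye(M) + beta * Phi.T @ Phi
    S = np.linalg.inv(S_inv)
    m = beta * S @ Phi.T @ y
    return m, S

# Tiny usage example: straight-line basis phi_1(t) = 1, phi_2(t) = t.
Phi = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 1.0, 2.0])
m, S = posterior(Phi, y, alpha=1.0, beta=1.0)
```

The MAP estimate of $\mathbf{w}$ is just the posterior mean `m`, since the posterior is Gaussian.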
The Problem:
In ordinary Bayesian inference, we would just pick an $\alpha$ and assume that $\beta$ is known. However, using the 'evidence approximation', it is claimed that you can find both $\alpha$ and $\beta$ by maximizing the marginal likelihood:
$$ p(\mathbf{y} | \alpha, \beta) = \int p(\mathbf{y} | \mathbf{t}, \mathbf{w}, \beta) p(\mathbf{w} | \alpha) d\mathbf{w} $$
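For this Gaussian model the integral is tractable, and the maximization is usually done with the standard re-estimation equations (e.g. Bishop, PRML, sec. 3.5.2). A sketch of those fixed-point updates, with my own function name, initialization, and iteration count:

```python
import numpy as np

def evidence_maximize(Phi, y, n_iter=100):
    """Fixed-point updates for alpha, beta that maximize p(y | alpha, beta).

    Standard re-estimation equations:
      gamma  = sum_i lam_i / (alpha + lam_i), lam_i eigvals of beta*Phi^T Phi
      alpha  = gamma / (m^T m)
      1/beta = ||y - Phi m||^2 / (N - gamma)
    where m is the current posterior mean.
    """
    N, M = Phi.shape
    alpha, beta = 1.0, 1.0                      # arbitrary starting values
    eig0 = np.linalg.eigvalsh(Phi.T @ Phi)      # eigenvalues of Phi^T Phi
    for _ in range(n_iter):
        S = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T @ Phi)
        m = beta * S @ Phi.T @ y                # posterior mean
        lam = beta * eig0
        gamma = np.sum(lam / (alpha + lam))     # effective number of params
        alpha = gamma / (m @ m)
        beta = (N - gamma) / np.sum((y - Phi @ m) ** 2)
    return alpha, beta, m
```

Each update re-fits the posterior mean under the current $\alpha, \beta$ and then re-estimates $\alpha, \beta$ from that fit, which is exactly the apparent circularity the question is about.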
But my question is: isn't that circular logic? Aren't we trying to find the very quantities we assumed in the first place?
Assuming that we find $\alpha$ and $\beta$ this way, what will they do for us? Give us a better estimate of $\mathbf{w}$ via the posterior? But how can we know that estimate is actually better if we don't know what $\mathbf{w}$ is? Will the $\mathbf{w}$ found using these optimal values achieve a lower mean squared error than one found with arbitrary values?
Also, suppose we do find these parameter values: what do they then represent? The "actual" prior precision and noise precision? Does it even make sense to say there is an "actual" prior?
In other words, what is the point of evidence approximation? Is it not circular logic? Where, when and why is it used?