Comparing uniform priors


The background of the problem is this:

Assume that we have a parameter vector $\Theta$ which satisfies $\Theta^\prime\Theta=1$. If we let this vector have the uniform prior, the density of the prior is $$ \pi(\Theta)=\frac{\Gamma(q/2)}{(2\pi)^{q/2}} $$ where $q$ is the dimension of $\Theta$. Now, we have a set of possible models in the set $\mathcal{M}=\{2, \dots, k\}$, where each (integer) $i\in\mathcal{M}$ corresponds to the dimension of $\Theta_i$.

What I am supposed to answer is how this prior behaves when comparing dimension $q+1$ to dimension $q$. A hint I got was to look at what happens to the ratio of a priori probabilities when $q$ increases from, for instance, 2 to 10 (actually, he said when $k$ increases, but I assume he means $q$). Doing this yields: $$ \frac{\pi(\Theta_{10})}{\pi(\Theta_{2})}=\frac{\frac{\Gamma(5)}{(2\pi)^5}}{\frac{1}{2\pi}}=\frac{\Gamma(5)}{(2\pi)^4} $$

but what on earth does this tell me?
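For what it's worth, evaluating that ratio numerically (just a sanity check on the arithmetic above, nothing deeper):

```python
import math

# Ratio of the uniform priors for q = 10 versus q = 2, as computed above:
# Gamma(5) / (2*pi)^4, where Gamma(5) = 4! = 24
ratio = math.gamma(5) / (2 * math.pi) ** 4
print(ratio)  # roughly 0.0154
```

So the prior density at any point drops by a factor of about 65 in going from $q=2$ to $q=10$, which at least suggests the prior systematically favours lower dimensions.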

Also, I think I found something interesting here on page 3. More specifically, this:

This becomes more problematic in higher dimensions: the uniform prior in large dimension does not integrate anymore. [...] it tells that most of the probability mass lies at $+\infty$.

This sounds very interesting, but is not expanded on as far as I can tell. How do I show this? I'm guessing that is what I need to show (even though my uniformly distributed vector is restricted to length 1). Furthermore, the next sentence is very interesting too, as in another exercise I am to do the same thing but with standard normal priors:

If instead one considers a high-dimensional Gaussian distribution $X\sim N (0,1)$, most of the mass is concentrated in a (high dimensional) unit sphere centered at the origin.

So, can anyone shed some light on this issue? I'm a bit lost, and would appreciate help!


Answer 1 (accepted)

I've had trouble working out what the question is, but I think it might be getting at the great circle problem, or, as it is more generally known, the Borel-Kolmogorov paradox.

Basically, if you naively "cut through" a uniform density on an $n$-sphere to get an $(n-1)$-sphere and renormalise, you end up getting different results depending on your parametrisation. The problem stems from the $n$-sphere being a measure-zero subset of the $(n+1)$-sphere. The solution, as presented by Jaynes, is to make sure that you take a limit, and to be clear about how you take it.

Note:

Something related to measurability: if you gave units to your densities, would comparing them still make sense?

Answer 2

I was going to suggest this before, but thought the other interpretation more likely. It's a classic problem for physicists (I don't know why) - I know it as the "infinite orange", but other names are the "infinite onion" and the hyper-dimensional Christmas present.

The point it makes is that the infinite orange is made only of skin, which is somewhat unintuitive (where "only" means almost everywhere). In this case, it says that as you increase the number of dimensions, the chance of drawing a point that is close (by some measure) to the surface increases. It's easy to show that if you have an orange of radius $r$ and skin of thickness $\epsilon$, the edible inside has volume:

$$V_\text{inside} = \frac{\pi^{n/2}}{\Gamma(1+n/2)}(r-\epsilon)^n$$

and the volume which is skin is

$$V_\text{skin} = \frac{\pi^{n/2}}{\Gamma(1+n/2)}[r^n - (r-\epsilon)^n]$$

so the ratio of delicious, juicy inside to yucky skin is:

$$\frac{V_\text{skin}}{V_\text{inside}} = \frac{r^n - (r-\epsilon)^n}{(r-\epsilon)^n} = \left(\frac{r}{r-\epsilon}\right)^n - 1 = \frac{1}{f}$$

(with $f$ from the book). For any $\epsilon$ in $(0,r)$ the limit as $n \rightarrow \infty$ is $\infty$, i.e. it is all skin.
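To get a feel for how fast this blows up, here is a quick numerical sketch (the choices $r=1$ and $\epsilon=0.05$ are arbitrary illustrations, not from the answer above):

```python
import math

def skin_to_inside_ratio(n, r=1.0, eps=0.05):
    """V_skin / V_inside for an n-ball of radius r with skin thickness eps.

    The volume prefactor pi^(n/2) / Gamma(1 + n/2) cancels in the ratio,
    leaving (r / (r - eps))^n - 1.
    """
    return (r / (r - eps)) ** n - 1

for n in [1, 10, 100, 1000]:
    print(n, skin_to_inside_ratio(n))
```

Even with a skin only 5% of the radius, by $n=1000$ the skin outweighs the inside by a factor of more than $10^{20}$.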

The probabilistic way of thinking about this is simpler if we consider an $n$-cube (for which the limit above also holds). For uniform distributions, a point sampled from the unit $n$-cube can be thought of as $n$ independent samples from the unit interval, i.e.

$$[0,1]^n = [0,1]\times[0,1]\times ... [0,1] \;\;\;\text{$n$ times}$$

Then, lying within $\epsilon$ of the edge of the cube means that at least one of the $n$ samples from the unit interval is not in $[\epsilon, 1-\epsilon]$. Clearly, as $n$ goes to infinity, at least one sample will fall outside $[\epsilon, 1-\epsilon]$, almost surely.

More precisely, the chance of a single coordinate being away from the edge is $p_\epsilon = 1-2\epsilon$, so the probability of at least one of the $n$ coordinates lying near the edge is:

$$Pr(\text{on edge}) = 1-p_\epsilon^n$$

which is $1$ in the limit of $n \rightarrow \infty$.
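This is easy to verify by simulation; a minimal sketch (the values $\epsilon=0.05$ and 20,000 trials are my own arbitrary choices):

```python
import random

def prob_near_edge(n, eps=0.05, trials=20_000, seed=0):
    """Monte Carlo estimate of P(a uniform point in [0,1]^n lies within eps
    of the boundary), i.e. at least one coordinate falls outside [eps, 1-eps]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(not (eps <= rng.random() <= 1 - eps) for _ in range(n)):
            hits += 1
    return hits / trials

for n in [1, 10, 100]:
    exact = 1 - (1 - 2 * 0.05) ** n  # 1 - p_eps^n
    print(n, prob_near_edge(n), exact)
```

The Monte Carlo estimates track $1 - p_\epsilon^n$ closely, and by $n=100$ essentially every sampled point is "on the edge".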

Answer 3

The solution apparently goes something like this.

If you divide two of these uniform priors, you get a ratio that depends on $q$:

$$ \frac{\pi(\Theta_{q+1})}{\pi(\Theta_q)}=\frac{\frac{\Gamma((q+1)/2)}{(2\pi)^{\frac{q+1}{2}}}}{\frac{\Gamma(q/2)}{(2\pi)^{q/2}}}=\frac{\Gamma((q+1)/2)}{\sqrt{2\pi}\Gamma(q/2)} $$

This is, of course, fairly reasonable since we have the restriction $\Theta^\prime\Theta=1$; thus, the individual elements of the vector are not independent. This is basically what we're supposed to arrive at: the comparison depends not only on the difference between the dimensions, but also on their levels.
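The dependence on $q$ is easy to see numerically; a small sketch of the ratio above (the range $q = 2,\dots,10$ is just for illustration):

```python
import math

def prior_ratio(q):
    """pi(Theta_{q+1}) / pi(Theta_q) = Gamma((q+1)/2) / (sqrt(2*pi) * Gamma(q/2))."""
    return math.gamma((q + 1) / 2) / (math.sqrt(2 * math.pi) * math.gamma(q / 2))

for q in range(2, 11):
    print(q, prior_ratio(q))
```

The ratio is not constant in $q$ - it grows (roughly like $\sqrt{q/2}/\sqrt{2\pi}$, from Stirling's approximation), which is exactly the "level of the dimensions" effect.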

This can then be compared to the case where we do not have any restrictions on $\Theta^\prime\Theta$ and instead let each element in the vector be iid $N(0,1)$. In this case, we get

$$ \frac{\pi(\Theta_{q+1})}{\pi(\Theta_q)}=\frac{\prod_{i=1}^{q+1}p(\theta_i)}{\prod_{i=1}^{q}p(\theta_i)}=p(\theta_{q+1})=(2\pi)^{-1/2}e^{-\theta_{q+1}^2/2} $$ which clearly does not depend on the levels of the dimensions, but only on their difference.