I understand the computation of the Jeffreys prior, as well as its historical motivation. I also (somewhat) understand the theoretical appeal of a "prior-construction principle/method" that is invariant under monotone parameter transformations and non-informative.
However, I am having difficulty translating this into real applications. For example, for an unfair coin with unknown heads-probability $\theta$, why should I use the Jeffreys prior, which is a $\mathrm{Beta}(\frac{1}{2}, \frac{1}{2})$ distribution favouring the extremes of either heads or tails?
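For concreteness, here is a minimal sketch of where that $\mathrm{Beta}(\frac{1}{2}, \frac{1}{2})$ comes from: for a single Bernoulli($\theta$) observation the Fisher information is $I(\theta) = 1/(\theta(1-\theta))$, and the Jeffreys prior $\propto \sqrt{I(\theta)}$ normalises (with constant $1/\pi$, since $B(\frac{1}{2},\frac{1}{2}) = \pi$) to exactly that Beta density:

```python
import math

def fisher_info(theta: float) -> float:
    # Fisher information for one Bernoulli(theta) observation:
    # I(theta) = 1 / (theta * (1 - theta))
    return 1.0 / (theta * (1.0 - theta))

def jeffreys_pdf(theta: float) -> float:
    # Jeffreys prior is proportional to sqrt(I(theta)); the normalising
    # constant is 1/pi because B(1/2, 1/2) = pi, so this is exactly the
    # Beta(1/2, 1/2) density: theta^(-1/2) * (1-theta)^(-1/2) / pi.
    return math.sqrt(fisher_info(theta)) / math.pi

# The density is higher near the extremes than in the middle:
print(jeffreys_pdf(0.1))  # ≈ 1.061
print(jeffreys_pdf(0.5))  # ≈ 0.637
```

This makes the "favouring the extremes" shape visible: the density blows up as $\theta \to 0$ or $\theta \to 1$, yet still integrates to 1.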
It seems to me that, in order to satisfy this theoretical need for parameter-invariance, we end up saying that the "best" prior assumption is that the coin is heavily heads-biased or heavily tails-biased. Why is this the "best"?
My guess at an intuition: perhaps the answer lies in the Fisher information and the concept of "non-informativeness", in the sense that if our prior for $\theta$ is high at the extremes, then any incoming data will give us more information about $\theta$.
For instance, given a prior where $p(\theta)$ is high around $\theta = 0.1$, a single Heads will change the posterior drastically. But under a prior where $p(\theta)$ is high around $\theta = 0.5$, a Heads won't change the posterior as much. Since the latter prior extracts less "information" per data point, we would prefer a prior with $p(0.1) > p(0.5)$ (which the Jeffreys prior satisfies), so as to maximise the information we bring in from the data.
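A rough numerical check of this intuition, using conjugate Beta priors as hypothetical stand-ins (Beta(1, 9) for "mass near 0.1" and Beta(5, 5) for "mass near 0.5", chosen with equal pseudo-count weight $a + b = 10$ so the comparison is fair):

```python
from fractions import Fraction

def posterior_mean_shift(a: int, b: int) -> Fraction:
    # With a Beta(a, b) prior on theta, observing one Heads gives a
    # Beta(a + 1, b) posterior. Return how far the posterior mean
    # moves from the prior mean (exact arithmetic via Fraction).
    prior_mean = Fraction(a, a + b)
    post_mean = Fraction(a + 1, a + 1 + b)
    return post_mean - prior_mean

# Two priors with the same total pseudo-count a + b = 10:
shift_low = posterior_mean_shift(1, 9)   # prior mean 0.1
shift_mid = posterior_mean_shift(5, 5)   # prior mean 0.5
print(float(shift_low))  # ≈ 0.0818
print(float(shift_mid))  # ≈ 0.0455
```

The single Heads moves the posterior mean nearly twice as far under the prior concentrated near 0.1, consistent with the idea that a prior favouring the extremes lets each observation carry more information.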
Is the above a reasonable intuition? Does anyone have a better explanation?