Injectivity condition of parametrised statistical manifold


I am currently learning information geometry by following this note. It starts by defining the set of probability distribution functions $$ S=\{p_\xi=p(x,\xi)\mid \xi=(\xi_1,\ldots, \xi_n)\in O\subset \mathbb{R}^n\}, $$ where $p(x,\xi)$ is a probability distribution function on a space $\Omega$. It calls $O$ the parameter space, requires $O$ to be open, and requires the map $\xi\mapsto p_\xi$ to be injective.

However, I find the injectivity condition very strong for many nonlinear models. For example, consider the following softmax model: $$ S=\left\{p_\xi(i)=\frac{e^{\xi_i}}{\sum^n_{j=1} e^{\xi_j}},\ i=1,\ldots, n\ \middle|\ \xi=(\xi_1,\ldots, \xi_n)\in \mathbb{R}^n\right\}. $$ Even though $p_\xi(i)=\exp\left(\xi_i-\log\left(\sum^n_{j=1} e^{\xi_j}\right)\right)$ is of exponential form, the map $\xi\mapsto p_\xi$ is not injective. Indeed, $p_{\xi}=p_{\tilde{\xi}}$ whenever $\xi=\tilde{\xi}+C\boldsymbol{1}$ for some $C\in \mathbb{R}$, where $\boldsymbol{1}$ denotes the vector with all entries equal to $1$. Similarly, for many neural networks the parameterisation is non-injective. Nevertheless, the optimisation problem $\inf_{p\in S}J(p)$ can still be well-defined, in the sense that there exists a unique minimiser $p^\star\in S$ (which may correspond to many parameter values $\xi$).
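As a quick numerical illustration of this non-injectivity (a sketch using NumPy; the function name `softmax` is my own), shifting all coordinates of $\xi$ by the same constant $C$ leaves $p_\xi$ unchanged:

```python
import numpy as np

def softmax(xi):
    # p_xi(i) = exp(xi_i) / sum_j exp(xi_j);
    # subtracting the max is for numerical stability and does not change the result
    z = np.exp(xi - xi.max())
    return z / z.sum()

xi = np.array([0.3, -1.2, 2.0, 0.7])

# p_{xi} = p_{xi + C*1} for any constant C, so xi -> p_xi is not injective
for C in (1.0, -3.5, 10.0):
    assert np.allclose(softmax(xi), softmax(xi + C * np.ones_like(xi)))
```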

Question. Is this injectivity condition essential for applying geometric techniques to the set $S$? More precisely, can we still equip $S$ with the well-known Fisher information metric and thereby make $S$ a Riemannian manifold?

If injectivity is crucial, does this mean that information geometry is unsuitable for studying optimisation over softmax models, or over more general neural network models? I am not sure whether there is a well-known technique for addressing this difficulty.