Basis vectors in information geometry

A basis vector in the tangent space at a point of a smooth manifold is given by a differential operator such as $\partial_{i}=\cfrac{\partial }{\partial \xi^{i}}$. On the other hand, in a statistical manifold consisting of probability distributions $p(x; \pmb{\xi})$ the basis vectors can be considered to be $\partial_{i} \log{p}$ which are not differential operators.

Vector fields are defined to be linear maps that satisfy the product rule. How come $X^{i}\partial_{i}\log{p}$ is considered to be a vector field on a statistical manifold? How do we reconcile the defining properties of a vector field with the choice of $\partial_{i} \log{p}$ as basis vectors?

Best answer

I'm not at all an expert, but here's what I think I've understood from a little (not-so-light) reading; correct me if I've gotten anything wrong.

Let $(X,\mu)$ be a $\sigma$-finite measure space and (by Jensen's inequality) let $$ \log\operatorname{PDF}(X,\mu) = \left\{c \in L^1(X,\mu) \mid \text{$c(x) \in \mathbb{R}$ for $\mu$-a.e. $x \in X$},\; \int_X \exp c \, d\mu = 1 \right\}, $$ which is in bijection, in the obvious way, with the set of all $\mu$-a.e. positive PDFs on $(X,\mu)$. Then an $n$-dimensional statistical manifold is a subset $S$ of $\log\operatorname{PDF}(X,\mu)$ with the structure of an $n$-dimensional smooth manifold defined by a $C^\infty$ atlas $\{(\Xi_k,\ell_k)\}$, such that for every local chart $\ell_k : \Xi_k \to S$ (which you should think of as the pointwise logarithm $\ell_k = \log p_k$ of a map $p_k = \exp \ell_k$ from $\Xi_k \subset \mathbb{R}^n$ to the set of all $\mu$-a.e. positive PDFs on $(X,\mu)$),

  1. the map $\ell_k$ is $C^\infty$ as a map $\Xi_k \to L^1(X,\mu)$, and
  2. if $f : \Xi_k \to L^1(X,\mu)$ is a (pointwise) polynomial in $\ell_k$, $\exp \ell_k$, and any finite number of (higher) partial derivatives of $\ell_k$, then for every $m \in \mathbb{N}$ and all indices $i_1,\dotsc,i_m \in \{1,\dotsc,n\}$, $\tfrac{\partial^m}{\partial \xi^{i_1} \cdots \partial \xi^{i_m}}\int_X f(x,\cdot)\,d\mu(x) = \int_X \tfrac{\partial^m}{\partial \xi^{i_1} \cdots \partial \xi^{i_m}}f(x,\cdot)\,d\mu(x)$.

However, what seems to get assumed but swept under the rug is that one usually wants a strengthened version of condition 1., namely, that each $\ell_k$ actually defines a smooth embedding of the open subset $\Xi_k$ of $\mathbb{R}^n$ into the Banach space $L^1(X,\mu)$, viewed trivially as a Banach manifold modelled on $L^1(X,\mu)$. This, then, implies that the inclusion of $S$ into $L^1(X,\mu)$ is itself a smooth embedding, i.e., that $S$ really is an $n$-dimensional submanifold of $L^1(X,\mu)$. Thus, as far as I can tell, you should be able to rewrite this slightly strengthened definition of $n$-dimensional statistical manifold in the following way:

An $n$-dimensional statistical manifold is a smooth $n$-manifold $S$ together with a smooth embedding $\ell : S \to L^1(X,\mu)$ for some $\sigma$-finite measure space $(X,\mu)$, such that:

  1. for all $s \in S$, the measurable function $\exp \ell(s)$ defines a $\mu$-almost everywhere positive PDF on $(X,\mu)$, so that $\exp \ell(s)(x) > 0$ for $\mu$-almost every $x \in X$ and $\int_X \exp\ell(s) \,d\mu = 1$;
  2. if $f : S \to L^1(X,\mu)$ is a (pointwise) polynomial in $\ell$, $\exp\ell$, and some finite number of (iterated) directional derivatives of $\ell$, then for every $m \in \mathbb{N}$ and every choice of $m$ vector fields $a_1,\dotsc,a_m$ on $S$, we have $\int_X a_1 \cdots a_m(f)(x,\cdot)\,d\mu(x) = a_1 \cdots a_m\left(\int_X f(x,\cdot)\,d\mu(x)\right)$.
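
As a sanity check on condition 2, take $f = \exp\ell$ and a single vector field $a$: since $\int_X \exp\ell\,d\mu = 1$ is constant on $S$, the condition gives $\int_X a(\ell)\exp\ell\,d\mu = 0$, i.e. every directional derivative of $\ell$ has mean zero under the corresponding PDF. Below is a minimal numerical sketch of this, assuming (purely for illustration) the normal family on $\mathbb{R}$ with Lebesgue measure and coordinates $\xi = (\text{mean}, \text{standard deviation})$. The family, the integration grid, and all function names are my own choices and not part of the definition above.

```python
import numpy as np


def log_density(x, mean, std):
    """ell(x; xi) for the normal family (an illustrative assumption)."""
    return -np.log(std) - 0.5 * np.log(2.0 * np.pi) - (x - mean) ** 2 / (2.0 * std ** 2)


def partials(x, mean, std):
    """Partial derivatives of ell with respect to mean and std, as functions of x."""
    d_mean = (x - mean) / std ** 2
    d_std = -1.0 / std + (x - mean) ** 2 / std ** 3
    return d_mean, d_std


def integrate(values, dx):
    """Plain Riemann sum; adequate here because the integrands decay rapidly."""
    return float(np.sum(values) * dx)


def score_means(mean, std, half_width=12.0, num=200_001):
    """Return (int p, int d_mean(ell) p, int d_std(ell) p) on a truncated grid."""
    x = np.linspace(mean - half_width * std, mean + half_width * std, num)
    dx = x[1] - x[0]
    p = np.exp(log_density(x, mean, std))
    d_mean, d_std = partials(x, mean, std)
    return integrate(p, dx), integrate(d_mean * p, dx), integrate(d_std * p, dx)


total, mean_d_mean, mean_d_std = score_means(mean=0.3, std=1.7)
print(total, mean_d_mean, mean_d_std)  # expect roughly 1, 0, 0
```

The first number checks normalisation; the other two are the integrals of $\partial_i\ell \cdot \exp\ell$, which should vanish up to discretisation error.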

Now, given the trivial Banach manifold structure on $L^1(X,\mu)$, one can canonically identify $T_\phi L^1(X,\mu)$ with $L^1(X,\mu)$ for any $\phi \in L^1(X,\mu)$ (just as one identifies $T_v \mathbb{R}^n$ with $\mathbb{R}^n$ for all $v \in \mathbb{R}^n$), so that the derivative $d\ell : TS \to TL^1(X,\mu)$ can be identified with an $L^1(X,\mu)$-valued $1$-form on $S$. This allows one, for instance, to define the Fisher metric as $$ \forall a, b \in \Gamma(TS), \; \forall s \in S \quad g(a,b)_s := \int_X d\ell(a)(s) d\ell(b)(s) \exp\ell(s) \, d\mu $$ as soon as $\ell$ is sufficiently well-behaved for this integral to converge for all $a$, $b$, and $s$.

In terms of a local chart $\ell_k : \Xi_k \to S \subset L^1(X,\mu)$ as in the first main paragraph, for each $\xi \in \Xi_k$, this simply boils down to $$ d(\ell_k)_\xi : \mathbb{R}^n \cong T_\xi \Xi_k \to T_{\ell_k(\xi)} S \subset T_{\ell_k(\xi)} L^1(X,\mu) \cong L^1(X,\mu), \quad v \mapsto v^i \partial_i \ell_k(\cdot,\xi). $$ In particular, we see that the partial derivatives $\partial_i \ell_k(\cdot,\xi)$ do live in $T_{\ell_k(\xi)}S$ when the latter is viewed as a real linear subspace of $L^1(X,\mu)$. Moreover, since $\ell_k$ is an embedding, $d(\ell_k)_\xi$ is injective, so it identifies the coordinate basis vector $\partial_i$ of $T_\xi \Xi_k$ with the function $\partial_i \ell_k(\cdot,\xi)$. As far as I can tell, this is the sense in which the $\partial_i \log p$ serve as basis vectors: they are the images of the usual differential operators $\partial_i$ under this identification, not a replacement for them.
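
To make the map $v \mapsto v^i \partial_i \ell_k(\cdot,\xi)$ and the Fisher metric concrete, here is a rough numerical sketch, again assuming the normal family on $\mathbb{R}$ with coordinates $\xi = (\text{mean}, \text{standard deviation})$ purely as an illustration. The names `pushforward` and `fisher_metric` are hypothetical helpers of my own; the closed form $\operatorname{diag}(1/\text{std}^2,\, 2/\text{std}^2)$ used for comparison is the standard Fisher matrix of this family in these coordinates.

```python
import numpy as np


def density(x, mean, std):
    """Normal PDF exp(ell) (the family is an illustrative assumption)."""
    return np.exp(-(x - mean) ** 2 / (2.0 * std ** 2)) / (std * np.sqrt(2.0 * np.pi))


def partials(x, mean, std):
    """Stack of partial_i ell(., xi) sampled on the grid x; shape (2, len(x))."""
    d_mean = (x - mean) / std ** 2
    d_std = -1.0 / std + (x - mean) ** 2 / std ** 3
    return np.stack([d_mean, d_std])


def pushforward(v, x, mean, std):
    """d(ell)_xi(v) = v^i partial_i ell(., xi): a tangent vector v becomes a function of x."""
    return np.asarray(v, dtype=float) @ partials(x, mean, std)


def fisher_metric(a, b, mean, std, half_width=12.0, num=200_001):
    """g(a, b) = int d(ell)(a) d(ell)(b) exp(ell) dx, via a Riemann sum on a truncated grid."""
    x = np.linspace(mean - half_width * std, mean + half_width * std, num)
    dx = x[1] - x[0]
    integrand = pushforward(a, x, mean, std) * pushforward(b, x, mean, std) * density(x, mean, std)
    return float(np.sum(integrand) * dx)


mean, std = 0.3, 1.7
basis = ([1.0, 0.0], [0.0, 1.0])  # coordinate basis vectors of T_xi(Xi_k)
g = np.array([[fisher_metric(u, w, mean, std) for w in basis] for u in basis])
expected = np.diag([1.0 / std ** 2, 2.0 / std ** 2])
print(g)
print(expected)
```

Each coordinate basis vector is sent to a genuine function of $x$ (an element of $L^1$), and the metric is just a weighted inner product of those functions, which is how the two pictures of a tangent vector fit together in this example.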