Question about the $\alpha$-affine manifold of a statistical model


I am reading "Methods of Information Geometry" by Shun-ichi Amari and I am finding it difficult to understand the following example. First, let me state the definitions.

Let $S=\{p_{\xi}\mid\xi\in E\}$, where $\xi=[\xi^1,\dots,\xi^n]$ ranges over $E\subseteq\mathbb{R}^n$, be a subset of the set
$$P(X)=\left\{p:X\to\mathbb{R}\;\middle|\;p(x)>0,\ \int p(x)\,dx<\infty\right\},$$
where $X$ is a sample space. For each $\alpha\in\mathbb{R}$ define
\begin{equation} L^{(\alpha)}(u) = \begin{cases} \frac{2}{1-\alpha}u^{\frac{1-\alpha}{2}} & :\alpha\neq1\\ \log(u) & :\alpha=1\\ \end{cases} \end{equation}
and
$$l^{(\alpha)}(x;\xi)=L^{(\alpha)}(p(x;\xi)).$$

If, for a fixed $\alpha$ and some coordinate system $[\theta^i]$, $$\partial_{i}\partial_{j}l^{(\alpha)}(x;\theta)=0,$$ where $\partial_{i}=\frac{\partial}{\partial\theta^i}$, then we say that $S$ is $\alpha$-flat and we call such an $S$ an $\alpha$-affine manifold.
This is equivalent to the existence of functions $\{C,F_1,\dots,F_n\}$ on $X$ such that $$l^{(\alpha)}(x;\theta)=C(x)+\sum_{i=1}^{n}\theta^{i}F_{i}(x).$$

Example: Let $X$ be a finite set $\{x_1,\dots,x_n\}$, and let $F_{i}:X\to\mathbb{R}$ be the function defined by $F_{i}(x_j)=\delta_{ij}$ for $i,j=1,2,\dots,n$. Then for each $p\in P(X)$, using the independent parameters $\theta^1,\dots,\theta^n$, we have $L^{(\alpha)}(p(x))=\sum_{i=1}^{n}\theta^{i}F_{i}(x)$, where $\theta^{i}=L^{(\alpha)}(p(x_i))$. Therefore $P(X)$ is an $\alpha$-affine manifold for every $\alpha\in \mathbb{R}$.

Can someone explain the above example and how it is defined? I am confused. All I know is that if $X$ is finite and $E=\{[\xi^1,\dots,\xi^n] : \xi^i>0,\ \sum_{i=1}^{n}\xi^i<1\}$, then

\begin{equation} p(x_{i};\xi) = \begin{cases} \xi^i & :1\leq i\leq n\\ 1-\sum_{j=1}^{n}\xi^j & :i=0\\ \end{cases} \end{equation}

On BEST ANSWER

In the example, $\theta^i = L^{(\alpha)}(p(x_i))$ by definition. Writing $\zeta^i = p(x_i)$ for the original coordinates, this reads $L^{(\alpha)}(\zeta^i) = \theta^i$. If $\alpha = 1$, then $\theta^i = \log(\zeta^i)$ is the change of coordinate system from $\zeta$ to $\theta$. Note that the definition only requires the equations to hold after a possible change of coordinate system from $\zeta$ to $\theta$. So transformations (even non-linear ones) are perfectly fine as long as they give a one-to-one map between $\zeta$ and $\theta$.
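To make the change of coordinates concrete, here is a worked version of the example. It is a sketch that assumes, as in the question's definition of $P(X)$, that the total mass is not constrained to equal $1$, so the $n$ values $\zeta^i=p(x_i)$ can vary independently. For $\alpha\neq 1$ the map and its inverse are
$$\theta^i=\frac{2}{1-\alpha}\,(\zeta^i)^{\frac{1-\alpha}{2}},\qquad \zeta^i=\left(\frac{1-\alpha}{2}\,\theta^i\right)^{\frac{2}{1-\alpha}},$$
and for $\alpha=1$ they are $\theta^i=\log\zeta^i$ and $\zeta^i=e^{\theta^i}$. Both are one-to-one on $\zeta^i>0$. With $F_i(x_j)=\delta_{ij}$ we then get
$$l^{(\alpha)}(x_j;\theta)=L^{(\alpha)}(p(x_j))=\theta^j=\sum_{i=1}^{n}\theta^i\delta_{ij}=\sum_{i=1}^{n}\theta^iF_i(x_j),$$
so $C=0$ in the characterization from the question, and $\partial_i\partial_k\,l^{(\alpha)}(x_j;\theta)=0$ for all $i,k$.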

The authors are trying to convey that after transforming $\{\zeta^i: i \in [n]\}$ to $\{\theta^i: i \in [n]\}$ by some one-to-one functions, you should be able to write $L^{(\alpha)}(p(x;\zeta))$ as a linear (more precisely, affine) function of $\theta$. For example, $f(\zeta) =e^\zeta$ becomes linear after the transformation $\theta = e^{\zeta}$: since $\zeta=\log(\theta)$, we get $f(\log(\theta)) = \theta$, which is linear in $\theta$. The new function is $f(\log(\cdot))$, and it is linear.
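The same idea can be checked numerically. The following is a minimal sketch, not taken from the book: the function names `L_alpha`, `L_alpha_inverse`, `l_alpha`, `check_example` and the sample masses are hypothetical, and it assumes a finite $X$ with unnormalized positive masses $\zeta^i=p(x_i)$.

```python
import math


def L_alpha(u, alpha):
    """L^(alpha)(u) for u > 0, following the definition in the question."""
    if alpha == 1:
        return math.log(u)
    return 2.0 / (1.0 - alpha) * u ** ((1.0 - alpha) / 2.0)


def L_alpha_inverse(theta, alpha):
    """Inverse of L_alpha: recovers the mass zeta^i from theta^i."""
    if alpha == 1:
        return math.exp(theta)
    return ((1.0 - alpha) / 2.0 * theta) ** (2.0 / (1.0 - alpha))


def l_alpha(j, theta):
    """Evaluate sum_i theta^i F_i(x_j) with F_i(x_j) = delta_ij."""
    return sum(t * (1.0 if i == j else 0.0) for i, t in enumerate(theta))


def check_example(masses, alpha):
    """Check the affine form and the one-to-one change of coordinates."""
    theta = [L_alpha(u, alpha) for u in masses]
    # l^(alpha)(x_j; theta) should equal L^(alpha)(p(x_j)) at every point.
    affine_ok = all(
        math.isclose(l_alpha(j, theta), L_alpha(u, alpha))
        for j, u in enumerate(masses)
    )
    # Mapping theta back should recover the original masses.
    invertible_ok = all(
        math.isclose(L_alpha_inverse(t, alpha), u)
        for t, u in zip(theta, masses)
    )
    return affine_ok and invertible_ok


masses = [0.2, 0.5, 1.3]  # hypothetical values of p(x_1), p(x_2), p(x_3)
for alpha in (-1.0, 0.0, 1.0, 3.0):
    print(alpha, check_example(masses, alpha))
```

The affine check holds by construction, which is the point of the example: once $\theta^i$ is defined as $L^{(\alpha)}(p(x_i))$, the $\alpha$-representation is linear in $\theta$ for any $\alpha$, and the only real content is that the map from $\zeta$ to $\theta$ is invertible.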