First of all, sorry if the question is trivial or not clear. I tried to be as clear as possible, but I'm a beginner in information geometry, with a background in physics, self-studying the subject (mainly through Amari's book "Information Geometry and Its Applications").
Let $P = (p_1, \ldots, p_n), Q = (q_1, \ldots, q_n) \in \mathbb{R}^n$, with $\sum_{i=1}^n p_i = \sum_{i=1}^n q_i = 1$, be points in the $(n-1)$-simplex $\mathcal{N}$. Given a divergence $D(P,Q)$ which equips $\mathcal{N}$ with a Riemannian metric, information monotonicity can be defined as
$$ \tag{1} D({\bar P}, {\bar Q}) \leq D(P,Q), $$
where ${\bar P} = ({\bar p}_1, \ldots, {\bar p}_m), {\bar Q} = ({\bar q}_1, \ldots, {\bar q}_m) \in \mathbb{R}^m$, $m < n$, are points in the $(m-1)$-simplex $\mathcal{M}$. The coarse-graining map ${\bar p}_A = \sum_{p_i \in \Gamma_A} p_i$, with $A \in \{1,\ldots,m\}$ and $\Gamma_A \subset \{p_1,\ldots,p_n\}$ pairwise disjoint subsets ($\Gamma_A \cap \Gamma_B = \emptyset$ for $A \neq B$), defines ${\bar P}$ and ${\bar Q}$. This coarse-graining, together with the reverse map $p_i = \sum_{A=1}^m r_{iA}(P)\, {\bar p}_A$, with $\sum_{i=1}^n r_{iA}(P) = 1$, defines a Markov embedding. When $r_{iA}(P) = r_{iA}$ is independent of $P$, equality holds in $(1)$. If the embedding is done through the tensor product with another probability distribution $R$ (a special case of a Markov embedding), we can write
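As a small numerical sanity check of $(1)$ for the KLD (this sketch is mine, not from Amari's book; the function names and the particular partition are illustrative):

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence between two discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

def coarse_grain(p, partition):
    """Coarse-grain p by summing its probabilities over each block Gamma_A."""
    return np.array([p[list(block)].sum() for block in partition])

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(6))  # random point in the 5-simplex
q = rng.dirichlet(np.ones(6))

# Disjoint blocks Gamma_A covering {0,...,5}: a coarse-graining to m = 3 bins.
partition = [(0, 1), (2, 3, 4), (5,)]
p_bar = coarse_grain(p, partition)
q_bar = coarse_grain(q, partition)

# Information monotonicity (1): coarse-graining can only lose information.
assert kl(p_bar, q_bar) <= kl(p, q)
```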
$$ D({\bar P} \otimes R, {\bar Q} \otimes R) = D({\bar P}, {\bar Q}) + D(R, R) = D({\bar P}, {\bar Q}), $$
since $D(R,R) = 0$.
However, one of the main properties of information measures, such as the Kullback-Leibler divergence (KLD)
$$ D(P,Q) = \sum_i p_i \log \frac{p_i}{q_i}, $$
is the additivity property
$$ \tag{2} D({\bar P} \otimes R, {\bar Q} \otimes W) = D({\bar P}, {\bar Q}) + D(R, W). $$
For the KLD, this property holds for any independent $R$ and $W$ (i.e. $r_{iA}(P) = r_{iA}$). It does not hold for other divergences that obey information monotonicity $(1)$, such as the $\alpha$-divergences
$$ D_\alpha(P,Q) = \frac{4}{1 -\alpha^2} \left( 1 - \sum_i p_i^{\frac{1-\alpha}{2}} q_i^{\frac{1+\alpha}{2}} \right), \qquad \alpha \neq \pm 1. $$
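Both claims are easy to verify numerically (again, a sketch of my own; names are illustrative). For product distributions the Chernoff-type sum $\sum_i p_i^{(1-\alpha)/2} q_i^{(1+\alpha)/2}$ factorizes, so the $\alpha$-divergence is "multiplicative" in those sums rather than additive, while the logarithm in the KLD turns products into sums:

```python
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def alpha_div(p, q, alpha):
    """Amari alpha-divergence, alpha != +-1."""
    s = np.sum(p ** ((1 - alpha) / 2) * q ** ((1 + alpha) / 2))
    return 4.0 / (1 - alpha**2) * (1 - s)

def tensor(p, r):
    """Joint distribution of two independent marginals, flattened."""
    return np.outer(p, r).ravel()

rng = np.random.default_rng(1)
p, q = rng.dirichlet(np.ones(4)), rng.dirichlet(np.ones(4))
r, w = rng.dirichlet(np.ones(3)), rng.dirichlet(np.ones(3))

# KL additivity (2): exact for tensor products of independent distributions.
lhs_kl = kl(tensor(p, r), tensor(q, w))
assert np.isclose(lhs_kl, kl(p, q) + kl(r, w))

# The alpha-divergence violates (2) for generic p != q, r != w.
a = 0.5
lhs_a = alpha_div(tensor(p, r), tensor(q, w), a)
rhs_a = alpha_div(p, q, a) + alpha_div(r, w, a)
assert not np.isclose(lhs_a, rhs_a)
```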
An important question related to the title would be: Does the classical additivity property $(2)$ imply information monotonicity?
One possibility to understand the geometry of this classical additivity property is to use Generalized Pythagorean Theorem (GPT):
$$ D(P, Q) = D(P, R) + D(R, Q). $$
First, the GPT expresses an additivity property between the divergences $D(P,R)$ and $D(R,Q)$ when the primal geodesic connecting $P$ and $R$ is orthogonal at $R$ to the dual geodesic connecting $R$ and $Q$, which requires the manifold to be dually flat. Second, the KLD has been shown to be the only (decomposable) divergence that obeys information monotonicity and at the same time induces a dually flat structure.
Even though it is a similar property, it differs in that $P$, $Q$ and $R$ all lie in the same manifold, while in the classical additivity property $P = {\bar P} \otimes R$ lies in the manifold $\mathcal{N}$ and ${\bar P}$ lies in the submanifold $\mathcal{M}$.
I tried to use the Jacobian that maps unit vectors from $\mathcal{N}$ to $\mathcal{M}$, together with the GPT and the projection theorem, to show that the additivity property $(2)$ holds only on dually flat manifolds, but it did not work.
Is this the right way to proceed? Does the classical additivity property of information measures require the manifold to be dually flat? For instance, non-decomposable divergences such as the Rényi divergence also have the additivity property; does that imply that they define a dually flat manifold? If not, how can I understand the classical additivity property using information geometry?
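For concreteness, the additivity of the Rényi divergence that motivates the last question can also be checked numerically (my own sketch; the order $\beta$ and names are illustrative). The logarithm turns the factorized Chernoff sum of a product distribution into a sum, even though the Rényi divergence is not decomposable:

```python
import numpy as np

def renyi(p, q, beta):
    """Renyi divergence of order beta (beta > 0, beta != 1)."""
    return float(np.log(np.sum(p**beta * q**(1 - beta))) / (beta - 1))

def tensor(p, r):
    """Joint distribution of two independent marginals, flattened."""
    return np.outer(p, r).ravel()

rng = np.random.default_rng(2)
p, q = rng.dirichlet(np.ones(5)), rng.dirichlet(np.ones(5))
r, w = rng.dirichlet(np.ones(3)), rng.dirichlet(np.ones(3))

beta = 0.7
# Additivity over independent tensor products, as for the KLD in (2).
lhs = renyi(tensor(p, r), tensor(q, w), beta)
rhs = renyi(p, q, beta) + renyi(r, w, beta)
assert np.isclose(lhs, rhs)
```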