Derivative of general composed function with intermediate manifold
Let's assume we have a composite function $h(x) = f(g(x)),$ $$ h: X \rightarrow Z,\\ g: X \rightarrow Y,\\ f: Y \rightarrow Z, \\X \subset \mathbb{R}^n, Y \subset \mathbb{R}^m, Z \subset \mathbb{R}^k $$ Moreover, we know that $h$ and $g$ are differentiable on $X$, but $f$ is defined only on a manifold $Y \subset \mathbb{R}^m$ with empty interior: for each point $y_0 \in Y$ we have $U(y_0) \not\subset Y$, where $U(y_0)$ is any open ball in $\mathbb{R}^m$ containing $y_0$. For example, $Y = SO(3) \subset \mathbb{R}^{3 \times 3}$, the manifold of orthogonal matrices with determinant equal to 1.
The first question is: how can we extend the function $f$ to some open superset $\overline{Y} \supset Y$ (an open neighborhood of $Y$ in $\mathbb{R}^m$), i.e. find $\overline{f}: \overline{Y} \rightarrow Z$ with $\overline{f}|_{Y} = f$, such that $\overline{f}$ is differentiable at every point of $Y$ and the chain rule holds? $$J_h(x_0) = J_{f \circ g}(x_0) = J_{\overline{f}}(g(x_0))\, J_{g}(x_0)$$ (Still $g(x_0) \in Y$ for all $x_0 \in X$; the open extension $\overline{Y}$ is needed only so that the differentiability of $\overline{f}$ is well defined.)
The second question: does the result of the chain rule depend on the choice of extension (i.e. does any differentiable extension work)? Experimentally, the Jacobian $J_{\overline{f}}$ does depend on the extension, but the Jacobian of the composition $J_h$ apparently does not.
SO(3) logarithmic map
The concrete example of functions is: $$h(x) = \log(R \cdot \exp(x)),\\ f(y) = \log(y),\\ g(x) = R \cdot \exp(x),$$ where $\exp$ is the exponential map from the Lie algebra (axis-angle representation of a rotation) to the Lie group of special orthogonal matrices, i.e. $\exp: \mathbb{R}^3 \supset so(3) \rightarrow SO(3) \subset \mathbb{R}^{3 \times 3} \approx \mathbb{R}^9$,
and $\log$ is the logarithmic map (inverse of $\exp$ on a suitable domain), i.e. $\log: SO(3) \rightarrow so(3)$ (the hat and vee operators are implicit in $\exp$ and $\log$). So the problematic space $Y$ is the manifold $SO(3)$ in $\mathbb{R}^{3 \times 3}$, and we can extend the log map off the manifold in different ways.
Experimentally, $\frac{\partial \log}{\partial R}(R_0)$ depends on the implementation of $\log$: different implementations give different results. At the same time, we know the analytical expression for the inverse right Jacobian: $$ \frac{\partial \log(R_0 \boxplus \mathbf{x})}{\partial \mathbf{x}}(\mathbf{0}) = \frac{\partial \log(R_0 \exp(\mathbf{x}))}{\partial \mathbf{x}}(\mathbf{0}) = J^{-1}_{right}(R_0) $$ On the other hand, we can compute this Jacobian via the chain rule: $$ \frac{\partial \log}{\partial R}(R_0) \cdot \frac{\partial (R_0 \exp(\mathbf{x}))}{\partial \mathbf{x}}(\mathbf{0})$$ And it turns out that the product gives the same result, even though the factors $\frac{\partial \log}{\partial R}(R_0)$ are different. Is this a known theoretical result?
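The observation can be reproduced in a small self-contained experiment (a sketch in plain NumPy; `log_A` and `log_B` are two hypothetical off-manifold extensions of the log map that coincide on $SO(3)$, and all Jacobians are central finite differences, not the implementations from the question):

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def vee(W):
    """Inverse of hat: extract the 3-vector from a skew-symmetric matrix."""
    return np.array([W[2, 1], W[0, 2], W[1, 0]])

def so3_exp(w):
    """Rodrigues formula: axis-angle vector -> rotation matrix."""
    th = np.linalg.norm(w)
    if th < 1e-12:
        return np.eye(3) + hat(w)  # first-order approximation near identity
    W = hat(w)
    return np.eye(3) + np.sin(th) / th * W + (1 - np.cos(th)) / th**2 * W @ W

def log_A(R):
    """Extension A: angle from the trace only (arccos)."""
    th = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    return th / (2 * np.sin(th)) * vee(R - R.T)

def log_B(R):
    """Extension B: angle from atan2 of the skew part vs the trace.
    Equal to log_A on SO(3), but different for non-orthogonal R."""
    u = vee(R - R.T) / 2            # equals sin(theta) * axis on SO(3)
    s = np.linalg.norm(u)
    c = (np.trace(R) - 1) / 2
    return np.arctan2(s, c) * u / s

def num_jac(f, x0, h=1e-6):
    """Central-difference Jacobian of f at x0 (f: R^n -> R^k)."""
    x0 = np.asarray(x0, float)
    cols = []
    for i in range(x0.size):
        e = np.zeros_like(x0)
        e[i] = h
        cols.append((f(x0 + e) - f(x0 - e)) / (2 * h))
    return np.stack(cols, axis=1)

R0 = so3_exp(np.array([0.3, -0.2, 0.5]))  # test point away from theta = 0, pi

# Ambient Jacobians of the two extensions w.r.t. the 9 entries of R (3x9).
JA = num_jac(lambda r: log_A(r.reshape(3, 3)), R0.ravel())
JB = num_jac(lambda r: log_B(r.reshape(3, 3)), R0.ravel())
print(np.max(np.abs(JA - JB)))            # clearly nonzero: extensions differ

# Jacobian of g(x) = R0 exp(x) at x = 0, flattened (9x3).
Jg = num_jac(lambda x: (R0 @ so3_exp(x)).ravel(), np.zeros(3))

# Direct Jacobian of the composite h(x) = log(R0 exp(x)) at x = 0 (3x3).
Jh = num_jac(lambda x: log_A(R0 @ so3_exp(x)), np.zeros(3))

print(np.max(np.abs(JA @ Jg - JB @ Jg)))  # ~0: chain-rule products agree
print(np.max(np.abs(JA @ Jg - Jh)))       # ~0: and both equal J_h = J_right^{-1}
```

The two 3x9 Jacobians differ by a matrix whose rows are normal to $SO(3)$ at $R_0$; the columns of `Jg` span the tangent space, so that difference is annihilated in the product.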
Note: we know that differentiability on a manifold is usually defined with respect to local charts (as a linear map between tangent spaces, i.e. $J_h(0)$). Here, however, we want to extend the definition of the derivative of the logarithmic map $f$ by extending the function slightly off the manifold and taking the derivative in the numerical sense, as for multivariate functions $\mathbb{R}^m \rightarrow \mathbb{R}^k$.
A simpler example would be: $$ g: (-\pi, \pi] \to \mathbb{S}^1, \\ f: \mathbb{S}^1 \to (-\pi, \pi] $$ where the angle is mapped as $\alpha \mapsto (\cos\alpha, \sin\alpha)$, and a point on the unit circle is mapped back to the angle, for example by $\mathrm{atan2}(y, x) \mapsto \alpha$. Here the derivative of the composite is 1, since the composite is just the identity (or addition of a constant). And if we take the derivative of $\mathrm{atan2}(y, x)$ and apply the chain rule, we get the same result, 1.
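For the circle example the claim is easy to check numerically (a sketch; `f2` is a second, hypothetical extension that agrees with `atan2` on the unit circle but not off it, and the Jacobians are central finite differences):

```python
import numpy as np

def num_jac(f, x0, h=1e-6):
    """Central-difference Jacobian of f at x0 (f: R^n -> R^k)."""
    x0 = np.asarray(x0, float)
    cols = [(np.atleast_1d(f(x0 + h * e)) - np.atleast_1d(f(x0 - h * e))) / (2 * h)
            for e in np.eye(x0.size)]
    return np.stack(cols, axis=1)

g = lambda a: np.array([np.cos(a[0]), np.sin(a[0])])   # angle -> point on S^1
f1 = lambda p: np.arctan2(p[1], p[0])                  # one extension of S^1 -> angle
# A second extension: on the circle x^2 + y^2 = 1 the added term vanishes,
# so f2 = f1 there, but its ambient gradient is different.
f2 = lambda p: np.arctan2(p[1], p[0]) + 3.0 * (p[0]**2 + p[1]**2 - 1.0)

a0 = np.array([0.7])
p0 = g(a0)
J1 = num_jac(f1, p0)     # 1x2 ambient Jacobian of extension 1
J2 = num_jac(f2, p0)     # 1x2 ambient Jacobian of extension 2, J2 != J1
Jg = num_jac(g, a0)      # 2x1 tangent direction of the circle
print(J1 @ Jg, J2 @ Jg)  # both chain-rule products give ~1
```

Here $J_2 - J_1 = 6(\cos\alpha_0, \sin\alpha_0)$, a multiple of the outward normal of the circle, and $J_g = (-\sin\alpha_0, \cos\alpha_0)^T$ is tangent, so the difference is annihilated in the product. This is exactly the mechanism behind the extension independence.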
Edit: a draft of the proof
Proposition: Given the functions $$ h: X \rightarrow Z, g: X \rightarrow Y, f: Y \rightarrow Z, \\ X \subset \mathbb{R}^n, Y \subset \mathbb{R}^m, Z \subset \mathbb{R}^k \\ h(x) = f(g(x)), \quad \forall x \in X \subset \mathbb{R}^n $$ Assume that functions $g$ and $h$ are differentiable on $X$, i.e.: $$ \exists J_g(a) \in \mathbb{R}^{m \times n}: g(a+\epsilon) - g(a) = J_g(a) \epsilon + \eta(\epsilon) \cdot \epsilon, \quad \lim\limits_{||\epsilon|| \to 0}||\eta(\epsilon)|| = 0, \epsilon \in \mathbb{R}^n, \eta(\epsilon) \in \mathbb{R}^{m \times n} \\ \exists J_h(a) \in \mathbb{R}^{k \times n}: h(a+\epsilon) - h(a) = J_h(a) \epsilon + \mu(\epsilon) \cdot \epsilon, \quad \lim\limits_{||\epsilon|| \to 0}||\mu(\epsilon)|| = 0, \epsilon \in \mathbb{R}^n, \mu(\epsilon) \in \mathbb{R}^{k \times n} \\ \text{for all } a \in X \subset \mathbb{R}^n, \text{ and } \epsilon \text{ close to the origin in } \mathbb{R}^n$$
The second factor of the composition, $f: Y \rightarrow Z$, is defined only on the manifold $Y$ (e.g. a curve or a sphere). We can extend $f$ onto an open superset of $Y$, i.e. $\overline{Y} \supset Y$, $\overline{Y} \subset \mathbb{R}^m$, such that the extension $\overline{f}$ is differentiable at every point of $Y$. Let's assume we have two differentiable extensions with different Jacobians: $$ \exists \overline{f}_1: \overline{Y} \to Z, \; \overline{f}_1 |_{Y} = f, \\ \overline{f}_1(g(a) + \delta) - f(g(a)) = J_{f_1}(g(a)) \delta + \nu_1(\delta) \delta, \quad \lim\limits_{||\delta|| \to 0}||\nu_1(\delta)|| = 0, \\ \exists \overline{f}_2: \overline{Y} \to Z, \; \overline{f}_2 |_{Y} = f, \\ \overline{f}_2(g(a) + \delta) - f(g(a)) = J_{f_2}(g(a)) \delta + \nu_2(\delta) \delta, \quad \lim\limits_{||\delta|| \to 0}||\nu_2(\delta)|| = 0, \\ J_{f_1}(g(a)) \neq J_{f_2}(g(a)) $$ We want to prove that $$ J_{f_1}(g(a)) J_g(a) = J_{f_2}(g(a)) J_g(a) = J_h(a), $$ i.e. the result of the chain rule does not depend on the extension, as long as the extension is differentiable.
Proof: $$ h(a + \epsilon) - h(a) \stackrel{\text{diff. of } h}{=} J_h(a) \epsilon + \mu(\epsilon) \epsilon = \\ \stackrel{\text{def composite}}{=} f(g(a + \epsilon)) - f(g(a)) = \\ \stackrel{\text{diff. of }g}{=} f(g(a) + J_g(a) \epsilon + \eta(\epsilon) \epsilon) - f(g(a)) = (*) \\ (*) \stackrel{\text{first extension}}{=} \overline{f}_1 (g(a) + \delta_{\epsilon}) - f(g(a)) = \\ = J_{f_1} (g(a)) \delta_{\epsilon} + \nu_1(\delta_{\epsilon}) \delta_{\epsilon} = \\ \stackrel{\delta_{\epsilon} = J_g(a) \epsilon + \eta(\epsilon) \epsilon}{=} J_{f_1} (g(a)) J_g(a) \epsilon + [J_{f_1}(g(a)) \eta(\epsilon) + \nu_1(\delta_{\epsilon}) J_g(a) + \nu_1(\delta_{\epsilon}) \eta(\epsilon)] \epsilon = \\ \stackrel{[\ldots] = o_1(\epsilon) \to 0}{=} J_{f_1} (g(a)) J_g(a) \epsilon + o_1(\epsilon) \epsilon, \\ (*) \stackrel{\text{second extension}}{=} \overline{f}_2 (g(a) + \delta_{\epsilon}) - f(g(a)) = \\ = J_{f_2} (g(a)) \delta_{\epsilon} + \nu_2(\delta_{\epsilon}) \delta_{\epsilon} = \\ \stackrel{\delta_{\epsilon} = J_g(a) \epsilon + \eta(\epsilon) \epsilon}{=} J_{f_2} (g(a)) J_g(a) \epsilon + [J_{f_2}(g(a)) \eta(\epsilon) + \nu_2(\delta_{\epsilon}) J_g(a) + \nu_2(\delta_{\epsilon}) \eta(\epsilon)] \epsilon = \\ \stackrel{[\ldots] = o_2(\epsilon) \to 0}{=} J_{f_2} (g(a)) J_g(a) \epsilon + o_2(\epsilon) \epsilon. $$
So at the end we have: $$ J_h(a) \epsilon + \mu(\epsilon) \epsilon = J_{f_1} (g(a)) J_g(a) \epsilon + o_1(\epsilon) \epsilon = J_{f_2} (g(a)) J_g(a) \epsilon + o_2(\epsilon) \epsilon, \\\text{ where } \lim\limits_{||\epsilon|| \to 0}||o_1(\epsilon)|| = 0, \lim\limits_{||\epsilon|| \to 0}||o_2(\epsilon)|| = 0 $$ Since the equality holds for arbitrary $\epsilon$ in a neighborhood of the origin in $\mathbb{R}^n$, the linear coefficients must be equal, i.e. $$ J_{f_1} (g(a)) J_g(a) = J_{f_2} (g(a)) J_g(a) = J_h(a) $$ This completes the proof.
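The step "the linear coefficients are equal" can be spelled out with the standard ray argument (a sketch; $v$ is an arbitrary fixed unit vector):

```latex
% Take \epsilon = t v with a unit vector v \in \mathbb{R}^n and t > 0.
% The expansion of h(a + t v) - h(a) via the first extension reads
J_h(a)\, t v + \mu(t v)\, t v
    = J_{f_1}(g(a))\, J_g(a)\, t v + o_1(t v)\, t v .
% Divide by t and let t \to 0; the remainder terms \mu, o_1 vanish:
J_h(a)\, v = J_{f_1}(g(a))\, J_g(a)\, v \quad \text{for every unit } v,
% so the matrices coincide; the same argument with o_2 handles f_2:
J_h(a) = J_{f_1}(g(a))\, J_g(a) = J_{f_2}(g(a))\, J_g(a).
```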