
As the image above shows, $M_1$ is a distribution with parameter $\theta$, while $M_2$ corresponds to $\theta' = \theta - \delta g$, where $\delta$ is the learning rate and $g$ is the gradient; that is, $M_2$ is obtained from $M_1$ by one gradient-descent step. $M^*$ is the given true model, and $M_1$, $M_2$ are used to fit $M^*$. All of the distances between these models ($a$, $b$, $c$) are measured by KL divergence. What I want to know is whether there exists an $\hat{M}$ with parameter $\hat{\theta} = \theta - \alpha\delta g$ ($\alpha$ is a free variable) that lies on the line between $M_1$ and $M_2$, forms a right angle with the direction to $M^*$, and thereby achieves a smaller distance than $\mathrm{KL}(M_2 \,\|\, M^*)$.
Let $x$ be the distance from $M_2$ to $\hat{M}$.
\begin{align*} c^2-(a-x)^2&=b^2-x^2\\ c^2&=(a^2-2ax+x^2)+b^2-x^2\\ 2ax&=a^2+b^2-c^2\\ x&=\dfrac{a^2+b^2-c^2}{2a} \end{align*}
Let $d$ be the distance from $M^*$ to $\hat{M}$: $$d=\sqrt{b^2-x^2}=\sqrt{b^2-\bigg(\dfrac{a^2+b^2-c^2}{2a}\bigg)^2}$$
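The derivation above can be sketched numerically. Note this treats $a$, $b$, $c$ as Euclidean side lengths, which is only a heuristic here, since KL divergence is not a metric (it is asymmetric and does not satisfy the triangle inequality). The function name and the reading of $\alpha$ as the step multiplier $(a-x)/a$ are my assumptions, not part of the original question:

```python
import math

def foot_of_perpendicular(a, b, c):
    """Given side lengths a = |M1 M2|, b = |M2 M*|, c = |M1 M*|
    (heuristically treated as Euclidean distances), return (x, d, alpha):
      x     -- distance from M2 to the foot M_hat of the perpendicular
               dropped from M* onto the line through M1 and M2,
      d     -- distance from M* to M_hat,
      alpha -- implied step multiplier in theta_hat = theta - alpha*delta*g,
               assuming a full step (alpha = 1) lands on M2, so
               alpha = |M1 M_hat| / |M1 M2| = (a - x) / a.
    """
    x = (a**2 + b**2 - c**2) / (2 * a)
    d = math.sqrt(b**2 - x**2)   # requires b >= |x|, i.e. the foot is real
    alpha = (a - x) / a
    return x, d, alpha
```

For example, with a 3-4-5 triangle ($a=5$, $b=3$, $c=4$) the foot lies at $x = 1.8$, giving $d = 2.4 < b$, i.e. the orthogonal point is strictly closer to $M^*$ than $M_2$ is whenever $0 < x < a$.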