Given two fixed distributions $p_\theta, p_{\theta^*}$. In a small neighborhood of $\theta$, a perturbation $\epsilon$ defines the distribution $p_{\theta+\epsilon}$.
These three distributions satisfy the triangle inequality:
$D_{TV}(p_\theta||p_{\theta+\epsilon}) + D_{TV}(p_{\theta+\epsilon}||p_{\theta^*}) \ge D_{TV}(p_\theta||p_{\theta^*})$
I would like to solve the following optimization problem:
$\arg \max_\epsilon D_{TV}(p_\theta||p_{\theta+\epsilon}) + D_{TV}(p_{\theta+\epsilon}|| p_{\theta^*}) - D_{TV}(p_\theta || p_{\theta^*}) $
How could I approach this? Any information would be appreciated!
The total variation distance is not generally easy to differentiate with respect to distribution parameters, and your question does not give the expressions for $p_\theta$, $p_{\theta+\epsilon}$, and $p_{\theta^*}$. Note also that the triangle inequality you quote makes your objective non-negative, so you are maximizing the slack in that inequality. Moreover, since $D_{TV}$ is bounded by $1$, pushing $p_{\theta+\epsilon}$ far away from both $p_\theta$ and $p_{\theta^*}$ typically drives the first two terms toward $1$, so without a constraint on $\epsilon$ the objective approaches its upper bound $2 - D_{TV}(p_\theta || p_{\theta^*})$ rather than attaining a finite maximizer. For these reasons the problem is hard to solve analytically. You can instead approach it numerically, using gradient-based or stochastic optimization techniques (e.g., stochastic gradient descent or a genetic algorithm), depending on the specific functional forms of the distributions and their parameters. I would need more information about the specific distributions you are working with and any constraints on $\epsilon$ to help you further.
Update:
Let's assume $p_\theta$, $p_{\theta^*}$, and $p_{\theta+\epsilon}$ are all Gaussian distributions with means $\mu_\theta$, $\mu_{\theta^*}$, and $\mu_{\theta+\epsilon}$ and variances $\sigma^2_\theta$, $\sigma^2_{\theta^*}$, and $\sigma^2_{\theta+\epsilon}$, respectively. In this case the total variation distance has no simple closed-form expression in general (although for equal variances $\sigma$ it reduces to $2\Phi\!\left(\frac{|\mu_p - \mu_q|}{2\sigma}\right) - 1$, where $\Phi$ is the standard normal CDF).
The total variation distance between two Gaussian distributions $p$ and $q$ with means $\mu_p$, $\mu_q$ and variances $\sigma^2_p$, $\sigma^2_q$ is given by:
$D_{TV}(p || q) = \frac{1}{2} \int |p(x) - q(x)| \, dx$
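This integral is easy to approximate numerically. Here is a minimal sketch (function names are my own) that estimates it for two 1-D Gaussians with a midpoint Riemann sum over a range wide enough to cover essentially all of both densities:

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def tv_gaussian(mu_p, sigma_p, mu_q, sigma_q, n=20000):
    """Midpoint-rule approximation of (1/2) * integral |p(x) - q(x)| dx."""
    # Integrate over +/- 10 standard deviations around both means.
    lo = min(mu_p - 10 * sigma_p, mu_q - 10 * sigma_q)
    hi = max(mu_p + 10 * sigma_p, mu_q + 10 * sigma_q)
    dx = (hi - lo) / n
    total = sum(abs(gauss_pdf(lo + (i + 0.5) * dx, mu_p, sigma_p)
                    - gauss_pdf(lo + (i + 0.5) * dx, mu_q, sigma_q))
                for i in range(n))
    return 0.5 * total * dx
```

As a sanity check, `tv_gaussian(0, 1, 0, 1)` returns $0$, and for equal variances the result should match the closed form $2\Phi(|\mu_p - \mu_q| / (2\sigma)) - 1$.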
The derivative of the total variation distance between Gaussian distributions with respect to their means and variances does not have a simple closed-form expression.
In your case, since $\epsilon$ is a small perturbation and your update rule is gradient-based, it may be most practical to estimate the optimal $\epsilon$ numerically. You can estimate the gradients using finite differences or an automatic differentiation library, and then update the parameters iteratively.
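To make this concrete, here is a self-contained sketch of finite-difference gradient ascent on $\epsilon$, under the simplifying assumptions that all three distributions are 1-D Gaussians with a shared fixed variance and that $\epsilon$ perturbs only the mean; the function names, step size, and box constraint on $\epsilon$ are all illustrative choices, not part of your problem statement:

```python
import math

def _pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def _tv(mu_p, mu_q, sigma, n=2000):
    """Midpoint-rule approximation of the TV distance (shared variance)."""
    lo = min(mu_p, mu_q) - 10 * sigma
    hi = max(mu_p, mu_q) + 10 * sigma
    dx = (hi - lo) / n
    return 0.5 * dx * sum(
        abs(_pdf(lo + (i + 0.5) * dx, mu_p, sigma)
            - _pdf(lo + (i + 0.5) * dx, mu_q, sigma))
        for i in range(n))

def objective(eps, mu, mu_star, sigma):
    """Slack in the triangle inequality: the quantity to maximize over eps."""
    return (_tv(mu, mu + eps, sigma)
            + _tv(mu + eps, mu_star, sigma)
            - _tv(mu, mu_star, sigma))

def optimize_eps(mu, mu_star, sigma, eps0=0.5, lr=0.5, h=1e-2,
                 steps=50, bound=2.0):
    """Gradient ascent with central finite differences.

    eps is clipped to [-bound, bound]: without a constraint the
    objective keeps growing as |eps| -> infinity (both TV terms
    approach 1), so some bound on the perturbation is essential.
    """
    eps = eps0
    for _ in range(steps):
        g = (objective(eps + h, mu, mu_star, sigma)
             - objective(eps - h, mu, mu_star, sigma)) / (2.0 * h)
        eps = max(-bound, min(bound, eps + lr * g))
    return eps
```

For example, with $\mu_\theta = 0$, $\mu_{\theta^*} = 1$, $\sigma = 1$ and a starting point outside the interval between the two means, the iterate climbs to the boundary of the box constraint, which is consistent with the degeneracy noted above.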
Keep in mind that the optimization problem might not have a unique solution and might be sensitive to the initial value of $\epsilon$. However, by carefully selecting the step size and the optimization algorithm, you can still find a good approximation of the optimal $\epsilon$.