Hessian of a Probability Function with Measure Theory

I'm working on optimizing a probability, and I'd like to find the extrema and show that the Hessian there is symmetric positive definite (SPD). Further, I'm interested in doing this with a general probability measure. The difficulty for me comes from the fact that the domain of integration moves with the variables being differentiated.

Let $\epsilon > 0$ and $(\mathbb{R}_+, \mathcal{F}, \mu)$ be a probability space with finite mean. Let $\Theta = \{\theta_m\}_{m=1}^M$ be a set of angles, where $\theta_m \in (0,\pi)$, $\theta_1 \geq \ldots \geq \theta_M$, and $\sum_{m}{\theta_m} = \pi$. I want to show that the set $\Theta^* = \{\theta^*_m = \frac{\pi}{M}\}_{m=1}^M$ minimizes \begin{equation} P(A(\Theta)) = \frac{1}{\pi}\sum_{m}{\int^{\infty}_{a_m}{\left(\theta_m - 2\sin^{-1}\left(\frac{\epsilon}{\rho}\right)\right)d\mu(\rho)}}, \end{equation} where $a_m = \epsilon\csc\left(\frac{\theta_m}{2}\right)$ and $a_* = \epsilon\csc\left(\frac{\pi}{2M}\right).$
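To get a feel for the objective, here is a small numerical sketch of $P(A(\Theta))$. The measure $\mu = \mathrm{Exp}(1)$ and the constants $\epsilon = 0.1$, $M = 4$ are illustrative assumptions on my part, not part of the problem:

```python
import numpy as np
from scipy import integrate

EPS, M = 0.1, 4                 # illustrative choices, not from the problem
pdf = lambda r: np.exp(-r)      # assumed density g of mu: Exp(1) on R_+

def P(thetas):
    """P(A(Theta)) = (1/pi) * sum_m int_{a_m}^inf (theta_m - 2*arcsin(eps/rho)) dmu."""
    total = 0.0
    for th in thetas:
        a = EPS / np.sin(th / 2)    # a_m = eps * csc(theta_m / 2)
        f = lambda r: (th - 2 * np.arcsin(EPS / r)) * pdf(r)
        total += integrate.quad(f, a, np.inf)[0]
    return total / np.pi
```

Evaluating $P$ at the uniform angles and at a perturbed configuration (keeping the sum equal to $\pi$) is consistent with the uniform configuration being the minimizer.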

The probability is really just a symmetric probability measure over a sum of cones.

Let $f_m(\rho) = \theta_m - 2\sin^{-1}\left(\frac{\epsilon}{\rho}\right).$ Because of the bounds on $\theta_m$, each $f_m$ is bounded on $[a_m, \infty)$ and hence $\mu$-integrable. I understand the Leibniz integral rule, but I'm now trying to understand the Lebesgue Differentiation Theorem. Because of the constraint $\pi = \sum_{m}{\theta_m},$ we can treat $\theta_M = \pi - \sum^{M-1}_{m=1}{\theta_m}$ as a function of the other angles and calculate the gradient over $n = 1,\ldots,M-1$. The gradient is given by \begin{align} \frac{\partial}{\partial \theta_n}P(A) &= \frac{1}{\pi}\sum_{m}{\left[-f_m(a_m)g(a_m)\left(\frac{\partial a_m}{\partial \theta_n}\right) + \int^{\infty}_{a_m}{\frac{\partial}{\partial \theta_n}f_m(\rho)\,d\mu(\rho)}\right]}\\ &= \frac{1}{\pi}\left(\int^{\infty}_{a_n}{d\mu} - \int^{\infty}_{a_M}{d\mu}\right)\\ &= \frac{1}{\pi}\int^{a_M}_{a_n}{d\mu}, \end{align} where $g$ is the density function of $\mu.$ The first line comes from differentiating $\int{\chi_{\{\rho > a_m\}}(\rho)f_m(\rho)\,d\mu(\rho)}$ with respect to $\theta_n$, where $\chi$ is the characteristic function. The first simplification comes from $f_m(a_m) = 0$, i.e. the integrand vanishes at the vertex of the cone, together with $\frac{\partial}{\partial\theta_n}f_m = \delta_{mn} - \delta_{mM}$. If $\theta_m = \frac{\pi}{M}$ for all $m$, then $\nabla P = 0$ (sufficient, but not necessary, for a critical point). Thus, $\Theta^*$ is a critical point. The gradient makes intuitive sense: the $n$-th component is the $\mu$-mass between $a_n$ and $a_M$, i.e. it measures the imbalance between $\theta_n$ and the dependent angle $\theta_M$.
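The closed form $\frac{\partial P}{\partial\theta_n} = \frac{1}{\pi}\int_{a_n}^{a_M} d\mu$ can be checked against central finite differences of $P$ over the free angles $(\theta_1,\dots,\theta_{M-1})$. As before, $\mu = \mathrm{Exp}(1)$, $\epsilon = 0.1$, $M = 4$ are illustrative assumptions:

```python
import numpy as np
from scipy import integrate

EPS, M = 0.1, 4                       # illustrative choices
pdf = lambda r: np.exp(-r)            # assumed density g of mu: Exp(1)
a = lambda th: EPS / np.sin(th / 2)   # a_m = eps * csc(theta_m / 2)

def P(free):
    """P over the free angles; theta_M = pi - sum of the others."""
    thetas = list(free) + [np.pi - sum(free)]
    total = 0.0
    for th in thetas:
        f = lambda r, th=th: (th - 2 * np.arcsin(EPS / r)) * pdf(r)
        total += integrate.quad(f, a(th), np.inf, epsabs=1e-10, epsrel=1e-10)[0]
    return total / np.pi

def grad(free):
    """Closed form: dP/dtheta_n = (1/pi) * integral of g from a_n to a_M."""
    thetas = list(free) + [np.pi - sum(free)]
    aM = a(thetas[-1])
    return np.array([integrate.quad(pdf, a(t), aM)[0] for t in thetas[:-1]]) / np.pi
```

At the uniform angles every $a_n = a_M$, so `grad` vanishes identically, matching the critical-point claim.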

To prove it's a minimum, I want to show the Hessian $H_A$ is SPD at $\Theta^*$. Since $a'(\theta) = -\frac{\epsilon}{2}\csc\left(\frac{\theta}{2}\right)\cot\left(\frac{\theta}{2}\right) = -\frac{a(\theta)}{2}\cot\left(\frac{\theta}{2}\right)$ and $\frac{\partial\theta_M}{\partial\theta_m} = -1$, I calculate the Hessian as \begin{align} \pi\frac{\partial^2}{\partial\theta_m\partial\theta_n}P(A) &= g(a_M)\left(\frac{\partial a_M}{\partial \theta_m}\right) - g(a_n)\left(\frac{\partial a_n}{\partial \theta_m}\right)\\ &= \frac{1}{2}g(a_M)a_M\cot\left(\frac{\theta_M}{2}\right) + \frac{1}{2}\delta_{mn}g(a_n)a_n\cot\left(\frac{\theta_n}{2}\right)\\ &= \frac{1}{2}\left(\gamma_Mg(a_M) + \delta_{mn}\gamma_ng(a_n)\right), \end{align} where $\gamma_m = a_m\cot\left(\frac{\theta_m}{2}\right)$. Thus the Hessian is a constant matrix plus a diagonal matrix. Since $\theta_m\in (0,\pi)$, we have $\gamma_m > 0$, and since $g \geq 0$, $\gamma_m g(a_m) \geq 0$ for all $m$. Both terms are positive semidefinite, so the Hessian is positive semidefinite; it is positive definite precisely when $g(a_m) > 0$ for every $m$.
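The entries of this Hessian can likewise be checked against finite differences of the gradient formula. Note the factor of $\tfrac12$ coming from $a'(\theta) = -\frac{a(\theta)}{2}\cot\left(\frac{\theta}{2}\right)$. The density $\mathrm{Exp}(1)$ and the constants are again illustrative assumptions:

```python
import numpy as np
from scipy import integrate

EPS, M = 0.1, 4                            # illustrative choices
pdf = lambda r: np.exp(-r)                 # assumed density g of mu: Exp(1)
a = lambda th: EPS / np.sin(th / 2)        # a_m = eps * csc(theta_m / 2)
gamma = lambda th: a(th) / np.tan(th / 2)  # gamma_m = a_m * cot(theta_m / 2)

def grad(free):
    """Closed form: dP/dtheta_n = (1/pi) * integral of g from a_n to a_M."""
    thetas = list(free) + [np.pi - sum(free)]
    aM = a(thetas[-1])
    return np.array([integrate.quad(pdf, a(t), aM, epsabs=1e-12)[0]
                     for t in thetas[:-1]]) / np.pi

def hessian(free):
    """pi * H_{mn} = (gamma_M g(a_M) + delta_{mn} gamma_n g(a_n)) / 2."""
    thetas = list(free) + [np.pi - sum(free)]
    gM = gamma(thetas[-1]) * pdf(a(thetas[-1]))       # constant part
    D = np.diag([gamma(t) * pdf(a(t)) for t in thetas[:-1]])  # diagonal part
    return (np.full((M - 1, M - 1), gM) + D) / (2 * np.pi)
```

With $g > 0$ everywhere, as for $\mathrm{Exp}(1)$, the eigenvalues at the uniform angles come out strictly positive; if $g$ vanished at some $a_m$, the corresponding diagonal contribution would drop out, leaving only positive semidefiniteness.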

Here's where I'm confused, and I feel like I missed something. The second-order derivatives should be continuous almost everywhere. But how do I deal with the case where $g$ has compact support up to $\mathcal{B}_{a_*}$? Don't I need to be looking at this from the perspective of open sets around $a_*$, to see whether perturbations in a local neighborhood would increase $P$?