Find the copula to minimize KL-divergence

Consider the set of all N-dimensional copulas denoted by $\mathcal{C}^N$. Notice that each copula $C$ is a joint distribution on the state space $[0,1]^N$ with uniform marginal distributions. Now consider an arbitrary joint distribution P on $[0,1]^N$, whose marginals are not necessarily uniform.

Question: What is the solution to $$\min_{C\in \mathcal{C}^N}D_{KL}(C ~||~P),$$ where $D_{KL}(C ~||~P)$ is the KL-divergence/relative entropy from $P$ to $C$? See the Wikipedia article on the Kullback-Leibler divergence for the definition.

My intuition/conjecture is that the minimum is achieved when $C$ is exactly the copula of $P$, but I can neither prove nor disprove it.

Many thanks for any answer or suggestion!


Best answer

Long comment. Assuming $P$ has a positive density $p$ which is $N$-times continuously differentiable, a heuristic computation shows that the minimizer $C_*$ has a density $c_*$ satisfying

$$ \frac{\partial^N}{\partial x_1\cdots\partial x_N}\log\frac{c_*(x)}{p(x)}=0. \tag{*} $$

Solving this, we get a density of the form

$$ c_*(x_1,\ldots,x_N)=p(x_1,\ldots,x_N)\prod_{i=1}^{N}g_i(x_1,\ldots,\hat{x}_i,\ldots,x_N), $$

where each $g_i(x_1,\ldots,\hat{x}_i,\ldots,x_N)$ is a positive function of the $(N-1)$ variables $x_1, \ldots,x_{i-1},x_{i+1},\ldots,x_N$. (Here, $\hat{x}_i$ stands for omission of $x_i$ from the list.)
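For $N=2$, the step from $\text{(*)}$ to the product form can be spelled out as follows. The functions $A$ and $B$ are names introduced here for illustration only; they do not appear in the original computation.

$$ \frac{\partial^2}{\partial x_1\,\partial x_2}\log\frac{c_*(x_1,x_2)}{p(x_1,x_2)}=0 \quad\Longrightarrow\quad \log\frac{c_*(x_1,x_2)}{p(x_1,x_2)}=A(x_1)+B(x_2), $$

$$ \text{so that}\qquad c_*(x_1,x_2)=p(x_1,x_2)\,g_1(x_2)\,g_2(x_1), \qquad g_1=e^{B},\quad g_2=e^{A}. $$

For general $N$ the same reasoning gives $\log(c_*/p)$ as a sum of $N$ functions, the $i$th of which does not depend on $x_i$. The $g_i$ are then pinned down by the requirement that $c_*$ has uniform marginals.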

Example. When $N=2$ and $p(x,y)=x+y$, the above equation has a solution of the form

$$c_*(x,y)=\frac{2\alpha(x+y)}{(1+\alpha x)(1+\alpha y)}, $$

where $\alpha\approx2.51286$ solves $\alpha=2\log(1+\alpha)$. On the other hand, the copula of $p$ is given by the density

$$c_p(x,y)=\frac{2}{\sqrt{8x+1}}+\frac{2}{\sqrt{8y+1}}-\frac{4}{\sqrt{(8x+1)(8y+1)}}.$$
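Both densities can be checked by hand. For $c_p$, a sketch using the usual construction of a copula density from a joint density, where $f$ and $F$ (symbols introduced here) denote the common marginal density and CDF of $p$:

$$ f(x)=\int_0^1 (x+y)\,\mathrm{d}y=x+\tfrac12, \qquad F(x)=\tfrac12(x^2+x), \qquad F^{-1}(u)=\tfrac12\left(\sqrt{8u+1}-1\right), \qquad f(F^{-1}(u))=\tfrac12\sqrt{8u+1}, $$

$$ c_p(u,v)=\frac{p\left(F^{-1}(u),F^{-1}(v)\right)}{f(F^{-1}(u))\,f(F^{-1}(v))} =\frac{\tfrac12\left(\sqrt{8u+1}+\sqrt{8v+1}\right)-1}{\tfrac14\sqrt{(8u+1)(8v+1)}}, $$

which expands to the expression above after renaming $(u,v)$ to $(x,y)$. For $c_*$, writing $L=\log(1+\alpha)$, the marginal is

$$ \int_0^1 c_*(x,y)\,\mathrm{d}y=\frac{2}{1+\alpha x}\left(xL+1-\frac{L}{\alpha}\right), $$

and this equals $1$ for every $x$ exactly when $\alpha=2L$, which is the stated condition on $\alpha$.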

So $c_* \neq c_p$ in this example, and the conjecture fails in general. Numerical calculation gives

$$ D_{\text{KL}}(c_p\|p)\approx 0.102766 \qquad\text{and}\qquad D_{\text{KL}}(c_*\|p)\approx 0.101707. $$
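The numbers above can be reproduced with a short script. The following is a minimal sketch, assuming NumPy and a tensor-product Gauss-Legendre rule with 80 nodes per axis (the quadrature choice, the function names, and the fixed-point solver for $\alpha$ are illustrative assumptions, not part of the original answer). It also compares against the closed form $D_{\text{KL}}(c_*\|p)=\log(2\alpha)-\alpha+1$, which is derived here from the uniform marginals of $c_*$ and the relation $\alpha=2\log(1+\alpha)$ rather than taken from the answer.

```python
import numpy as np

def solve_alpha(tol=1e-14, max_iter=500):
    """Positive root of alpha = 2*log(1 + alpha) via fixed-point iteration."""
    alpha = 2.0  # starting guess; the map is a contraction near the positive root
    for _ in range(max_iter):
        new = 2.0 * np.log1p(alpha)
        if abs(new - alpha) < tol:
            return new
        alpha = new
    return alpha

def p(x, y):
    # Density of P from the example.
    return x + y

def c_star(x, y, alpha):
    # Candidate minimizer from the example.
    return 2.0 * alpha * (x + y) / ((1.0 + alpha * x) * (1.0 + alpha * y))

def c_p(x, y):
    # Copula density of p from the example.
    a, b = np.sqrt(8.0 * x + 1.0), np.sqrt(8.0 * y + 1.0)
    return 2.0 / a + 2.0 / b - 4.0 / (a * b)

# Gauss-Legendre nodes/weights mapped from [-1, 1] to [0, 1] (assumed quadrature).
t, wt = np.polynomial.legendre.leggauss(80)
x, w = 0.5 * (t + 1.0), 0.5 * wt
X, Y = np.meshgrid(x, x, indexing="ij")
W = np.outer(w, w)

def kl(c_vals, p_vals):
    """Approximate the integral of c*log(c/p) over the unit square."""
    return float(np.sum(W * c_vals * np.log(c_vals / p_vals)))

alpha = solve_alpha()
P, CS, CP = p(X, Y), c_star(X, Y, alpha), c_p(X, Y)

marg_star = CS @ w  # integral over y at each x node; should be close to 1
marg_p = CP @ w
kl_star = kl(CS, P)
kl_p = kl(CP, P)
kl_star_closed = float(np.log(2.0 * alpha) - alpha + 1.0)

print(f"alpha            = {alpha:.6f}")
print(f"KL(c_p || p)     = {kl_p:.6f}")
print(f"KL(c_* || p)     = {kl_star:.6f}")
print(f"closed form c_*  = {kl_star_closed:.6f}")
```

The marginal arrays `marg_star` and `marg_p` give a quick check that both densities really are copula densities before their divergences are compared.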


Addendum - Heuristic derivation of $\text{(*)}$. Assuming a minimizer $c_*$ with strictly positive density exists, a variational calculation shows that

$$ 0 = \frac{\mathrm{d}}{\mathrm{d}t}\biggr|_{t=0} D_{\text{KL}}(c_* + t \eta \| p) = \int_{[0,1]^N} \eta(x) \left(1 + \log\frac{c_*(x)}{p(x)} \right) \, \mathrm{d}x. $$

Here, $\eta$ is any appropriate perturbation term that renders $c_* + t \eta$ a valid copula for all sufficiently small $t$. One such choice can be constructed as follows:

Let $a \in (0, 1)^N$ and $h_1, \ldots, h_N$ be sufficiently small. Also, let $\hat{e}_i$ denote the $i$th standard basis vector in $\mathbb{R}^N$. Then let

$$\eta(x) = \sum_{\epsilon \in \{0,1\}^N} (-1)^{\epsilon_1 + \cdots + \epsilon_N} \tilde{\delta}_a \left( x - \sum_{i=1}^{N} \epsilon_i h_i \hat{e}_i \right), $$

where $\tilde{\delta}_a$ is an approximate Dirac delta about $a$ which is supported on a sufficiently small region around $a$, so that $\eta$ is supported on $[0, 1]^N$. Note that the "marginals" of $\eta$ are identically zero,

$$\int_{[0,1]^{N-1}} \eta(x) \, \prod_{j : j \neq i} \mathrm{d}x_j = 0, \qquad \forall i, $$

ensuring that $c_* + t\eta$ is indeed a valid copula when $t$ is small.
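The zero-marginal property can be checked on a grid. The following is a minimal sketch of a discrete analogue of $\eta$, assuming $N=3$, a $40^3$ grid, and a truncated Gaussian bump standing in for $\tilde{\delta}_a$; the function name `perturbation` and all parameter values are hypothetical choices made for illustration.

```python
import itertools
import numpy as np

def perturbation(shape, center, shifts, width=2.0):
    """Discrete analogue of eta: signed sum of shifted copies of a bump."""
    axes = tuple(range(len(shape)))
    grids = np.meshgrid(*[np.arange(n) for n in shape], indexing="ij")
    # Gaussian bump around `center`, truncated to a small box (compact support).
    dist2 = sum((g - c) ** 2 for g, c in zip(grids, center))
    bump = np.exp(-dist2 / (2.0 * width ** 2))
    inside = np.all([np.abs(g - c) <= 3 * width for g, c in zip(grids, center)], axis=0)
    bump = np.where(inside, bump, 0.0)
    bump /= bump.sum()  # unit mass, like an approximate delta
    eta = np.zeros(shape)
    for eps in itertools.product((0, 1), repeat=len(shape)):
        sign = (-1) ** sum(eps)
        shift = tuple(e * h for e, h in zip(eps, shifts))
        # np.roll by `shift` moves the bump from a to a + shift.
        eta += sign * np.roll(bump, shift, axis=axes)
    return eta

shape = (40, 40, 40)
eta = perturbation(shape, center=(15, 15, 15), shifts=(4, 6, 5))

# Marginal along axis i: sum over all other axes.
marginals = [
    eta.sum(axis=tuple(j for j in range(len(shape)) if j != i))
    for i in range(len(shape))
]
max_marginal = max(float(np.abs(m).max()) for m in marginals)
print("largest |marginal| entry:", max_marginal)
print("largest |eta| entry:     ", float(np.abs(eta).max()))
```

The cancellation happens because shifting the bump along any axis other than $i$ does not change its $i$th marginal, so the terms of the signed sum pair off.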

Example. When $N = 2$,

$$\eta(x) = \tilde{\delta}_a(x) - \tilde{\delta}_a(x - h_1\hat{e}_1) - \tilde{\delta}_a(x - h_2\hat{e}_2) + \tilde{\delta}_a(x - h_1\hat{e}_1 - h_2\hat{e}_2). $$

Now letting $\tilde{\delta}_a \to \delta_a$ and subsequently letting $h_i \to 0$, we obtain the equation $\text{(*)}$.
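This limiting step can be made explicit. In the sketch below, $\varphi = 1 + \log(c_*/p)$ and the forward-difference operator $\Delta^{(i)}_{h}\varphi(x) = \varphi(x + h\hat{e}_i) - \varphi(x)$ are notation introduced here for illustration. Replacing $\tilde{\delta}_a$ by $\delta_a$ in the stationarity condition gives

$$ 0 = \int_{[0,1]^N} \eta(x)\,\varphi(x)\,\mathrm{d}x = \sum_{\epsilon \in \{0,1\}^N} (-1)^{\epsilon_1 + \cdots + \epsilon_N}\, \varphi\left(a + \sum_{i=1}^{N} \epsilon_i h_i \hat{e}_i\right) = (-1)^N\, \Delta^{(1)}_{h_1} \cdots \Delta^{(N)}_{h_N} \varphi(a). $$

Dividing by $h_1 \cdots h_N$ and letting each $h_i \to 0$, the assumed smoothness of $p$ (and, heuristically, of $c_*$) turns the mixed difference into a mixed derivative:

$$ \frac{\partial^N \varphi}{\partial x_1 \cdots \partial x_N}(a) = 0. $$

The constant $1$ in $\varphi$ is killed by the derivative, and $a \in (0,1)^N$ is arbitrary, which is exactly $\text{(*)}$.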