Does it make sense to use the KL-divergence between the joint distributions of synthetic and real data as an evaluation metric?


The KL-divergence is defined as: $D_{KL}\big(p(x_1)\,\|\,q(x_1)\big)=\sum_{x_1} p(x_1)\, \log \left( \dfrac{p(x_1)}{q(x_1)} \right)$

I am considering the Kullback-Leibler (KL) divergence as a performance metric for data synthesis.

Several studies have used the KL divergence as a performance metric by computing it between the real and synthetic marginal distributions of a given variable.
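For context, this is roughly how I currently compute the per-variable (marginal) KL from samples. It is only a sketch under my own assumptions: I bin a continuous variable into a shared histogram and smooth empty bins with a small constant; the function name `marginal_kl` and the bin/smoothing choices are mine, not from any of the cited studies.

```python
import numpy as np
from scipy.stats import entropy

def marginal_kl(real, synth, bins=20, eps=1e-10):
    """Estimate KL(P_real(x1) || P_synth(x1)) from samples of one variable."""
    # Shared bin edges so both histograms are defined on the same support
    edges = np.histogram_bin_edges(np.concatenate([real, synth]), bins=bins)
    p, _ = np.histogram(real, bins=edges)
    q, _ = np.histogram(synth, bins=edges)
    # Smooth to avoid division by zero in bins the synthetic data misses
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return entropy(p, q)  # entropy(p, q) = sum p * log(p / q)

# Toy example: real vs. slightly shifted synthetic samples
rng = np.random.default_rng(0)
x_real = rng.normal(0.0, 1.0, size=5000)
x_synth = rng.normal(0.2, 1.1, size=5000)
print(marginal_kl(x_real, x_synth))
```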

However, the joint distribution of the variables in the synthetic and real data also matters for data synthesis, since matching each marginal separately does not guarantee that the dependencies between variables are preserved.

If the real and synthetic data have variables $x_1$ and $x_2$, does it make sense to measure the KL-divergence between $P_{synthetic}(x_1,x_2)$ and $P_{real}(x_1,x_2)$ to evaluate the similarity between the two datasets?
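Concretely, this is what I have in mind, again only a sketch under my own assumptions: I would bin the two variables on a shared 2-D grid and apply the discrete KL formula to the flattened joint histograms. The function name `joint_kl`, the binning, and the smoothing constant are all my own choices for illustration.

```python
import numpy as np
from scipy.stats import entropy

def joint_kl(real_xy, synth_xy, bins=20, eps=1e-10):
    """Estimate KL(P_real(x1, x2) || P_synth(x1, x2)) from (n, 2) sample arrays."""
    combined = np.vstack([real_xy, synth_xy])
    # Shared bin edges per dimension so both joint histograms align
    edges = [np.histogram_bin_edges(combined[:, j], bins=bins) for j in range(2)]
    p, _, _ = np.histogram2d(real_xy[:, 0], real_xy[:, 1], bins=edges)
    q, _, _ = np.histogram2d(synth_xy[:, 0], synth_xy[:, 1], bins=edges)
    # Flatten, smooth empty cells, and normalise before the discrete KL formula
    p = p.ravel() + eps
    q = q.ravel() + eps
    return entropy(p / p.sum(), q / q.sum())

# Toy example: same marginals but weaker correlation in the synthetic data
rng = np.random.default_rng(0)
real = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=5000)
synth = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=5000)
print(joint_kl(real, synth))
```

In this toy example the marginal KLs would be close to zero while the joint KL is not, which is exactly the kind of discrepancy I would like the metric to pick up.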

Thank you very much for your help!