Statistical loss function for categorical distributions


For training an autoencoder model whose outputs (and inputs) are the parameters of a categorical distribution $[q_1, q_2, \ldots, q_n]$, I need to define a proper loss function measuring the distance between the ground truth $P$ and the prediction $Q$.

Since the outputs are probabilistic, I want a loss function that satisfies, among other properties, that a given absolute difference between a ground-truth probability and the corresponding prediction is penalized more heavily as the ground-truth probability increases. For example, the component pair $[p,q] = [0.9, 0.8]$ should receive a higher penalty than $[p,q] = [0.8, 0.7]$.

My initial thought was to use the Kullback-Leibler divergence, which for $n$-dimensional categorical distributions $P$ and $Q$ is: $$KL(P||Q) = \sum_{i=1}^{n}p_i\log\big(\frac{p_i}{q_i}\big)$$ A problem with this divergence is that it is only finite when $q_i=0$ implies $p_i=0$ (with the usual convention $0\log 0 = 0$). This can be worked around by adding a small 'fuzz factor' $\epsilon>0$: $$KL_{\epsilon}(P||Q) = \sum_{i=1}^{n}(p_i+\epsilon)\log\big(\frac{p_i+\epsilon}{q_i+\epsilon}\big) $$
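As a minimal sketch of the smoothed divergence above (the function name `kl_eps` and the default $\epsilon = 10^{-7}$ are illustrative choices, not prescriptions):

```python
import numpy as np

def kl_eps(p, q, eps=1e-7):
    """Smoothed KL divergence KL_eps(P||Q) from the formula above.

    Adding eps to both p and q keeps every log argument strictly
    positive, so the sum stays finite even when some q_i = 0.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return np.sum(p * np.log(p / q))

# Finite even where the plain KL divergence would diverge (q_2 = 0):
print(kl_eps([0.5, 0.5], [1.0, 0.0]))

# Checking the desired penalty behaviour on two-category distributions:
# shifting the same 0.1 of mass is penalized more when p_1 is larger.
print(kl_eps([0.9, 0.1], [0.8, 0.2]))  # larger
print(kl_eps([0.8, 0.2], [0.7, 0.3]))  # smaller
```

Note that with the smoothing, $KL_{\epsilon}$ is only an approximation of a true divergence (the shifted vectors no longer sum exactly to one), but for small $\epsilon$ the distortion is negligible in practice.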

I wonder whether there are other good metrics that can serve as a loss function for this case, where the output is the parameter vector of a categorical distribution.