I am trying to understand the relation of likelihood to cross-entropy by reading the article on cross-entropy.
The problem is that I cannot understand the formula for the likelihood in the article. The likelihood is defined as follows:
$$\prod_i q_i^{Np_i}$$
where
$q_i$ is the estimated probability of outcome $i$, $p_i$ is the empirical probability of outcome $i$, and $N$ is the size of the training set.
I haven't seen the likelihood formulated like this before, combining estimated and empirical probabilities. Why does $p_i$ appear in the formula? What is the motivation behind this formulation?
This is because in a sample of size $N$, outcome $i$ appears about $Np_i$ times, and the model assigns probability $q_i$ to each of those occurrences. The total contribution of outcome $i$ to the likelihood is therefore $q_i^{Np_i}$, and multiplying over all outcomes gives $\prod_i q_i^{Np_i}$. Taking the logarithm and dividing by $-N$ then yields the cross-entropy $-\sum_i p_i \log q_i$, which is what connects the two notions.
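A small numerical sketch may make the grouping step concrete. The data, the model probabilities `q`, and all variable names below are made-up examples, not anything from the article; the point is only that multiplying $q$ sample by sample gives the same number as grouping repeated outcomes into powers $q_i^{Np_i}$:

```python
import math
from collections import Counter

# Hypothetical training set of size N = 10 with three outcomes.
data = ["a", "a", "a", "a", "a", "b", "b", "b", "c", "c"]
N = len(data)

# Empirical probabilities p_i, computed from the data.
counts = Counter(data)
p = {x: counts[x] / N for x in counts}

# Some model's estimated probabilities q_i (assumed values for illustration).
q = {"a": 0.4, "b": 0.35, "c": 0.25}

# Likelihood computed sample by sample: product over all N samples of q(x).
lik_per_sample = math.prod(q[x] for x in data)

# The same likelihood with repeated outcomes grouped: outcome i appears
# exactly N * p_i times, so its factors collapse into q_i ** (N * p_i).
lik_grouped = math.prod(q[i] ** (N * p[i]) for i in p)

assert math.isclose(lik_per_sample, lik_grouped)

# -log(likelihood) / N equals the cross-entropy H(p, q) = -sum_i p_i log q_i.
cross_entropy = -sum(p[i] * math.log(q[i]) for i in p)
assert math.isclose(-math.log(lik_per_sample) / N, cross_entropy)
```

The assertions confirm both identities: the grouped form is just a reorganization of the ordinary likelihood, and its negative average log is exactly the cross-entropy.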