A model $\mathcal{M}_1$ yields an estimate of a probability mass function, $\mathbb{P}_{1n}$, where the subscript $n$ denotes dependence on the sample size. This is a categorical distribution over $K$ possible outcomes, with probabilities denoted $p_k$, $k = 1, \dots, K$.
Then a new sample of size $n$ becomes available, which under $\mathcal{M}_1$ yields a new estimated probability mass function, $\mathbb{P}_{2n}$. As a specification test, I want to test
$$H_0: \mathbb{P}_{1n} = \mathbb{P}_{2n}$$
Intuitively, some measure of the discrepancy between the two distributions would be appropriate, e.g. the Kullback–Leibler divergence. Apparently this is referred to in the literature as a goodness-of-fit problem. I have checked Lehmann & Romano, TSH (Ch. 14), but most tests there seem to be discussed for continuous distributions, and my understanding is that the chapter does not really cover my problem (although it does refer to 'two-sample problems', which I believe is what I am describing here). I have also seen the problem referred to as testing 'homogeneity' of distributions.
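To make the discrepancy idea concrete, here is a minimal sketch in Python of the quantity I have in mind. It is only an illustration under assumptions of mine: the function names `empirical_pmf` and `kl_divergence` and the toy samples are hypothetical, and raw relative frequencies stand in for the estimates that $\mathcal{M}_1$ would actually produce.

```python
import math
from collections import Counter


def empirical_pmf(sample, categories):
    """Relative frequencies of each category (a stand-in for the model-based estimate)."""
    counts = Counter(sample)
    n = len(sample)
    return [counts[c] / n for c in categories]


def kl_divergence(p, q):
    """KL(p || q) for two pmfs over the same K outcomes, using natural logs.

    Terms with p_k = 0 contribute 0; if p_k > 0 while q_k = 0,
    the divergence is infinite.
    """
    total = 0.0
    for pk, qk in zip(p, q):
        if pk == 0:
            continue
        if qk == 0:
            return math.inf
        total += pk * math.log(pk / qk)
    return total


# Toy data (hypothetical): two samples of equal size n over K = 3 outcomes.
categories = ["a", "b", "c"]
sample_1 = ["a", "a", "b", "c", "a", "b"]
sample_2 = ["a", "b", "b", "c", "c", "b"]

p1 = empirical_pmf(sample_1, categories)
p2 = empirical_pmf(sample_2, categories)
d = kl_divergence(p1, p2)
print(d)
```

Note that this quantity is not symmetric in its arguments and becomes infinite when an outcome has positive probability under the first distribution but zero under the second. By itself it is only a descriptive measure; what I am missing is its sampling distribution under $H_0$, i.e. how to turn it (or something similar) into a test.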
I am looking for a discussion of this problem, information on which tests would be appropriate, and/or pointers to relevant literature, books, etc. that can help me better understand it.
Note: Related questions (e.g. this one on Cross Validated) did not help much. Ideally I would prefer a more formal discussion and references to the relevant literature.