Let $P,Q$ be two probability distributions. Then, one has the following dual characterization of their Kullback-Leibler divergence (relative entropy):
$$ D(P \,\|\,Q) = \sup_f ( \mathbb{E}_P[f(X)] - \log \mathbb{E}_Q[e^{f(X)}] ) \tag{1} $$ This characterization is sometimes referred to as Gibbs variational principle, or Donsker-Varadhan formula; however, I couldn't track where it was first proven (there is often a reference to a paper of and Donsker Varadhan from 1983, but I couldn't find, in there, where (1) is actually established).
Where was (1) established first, specifically? What to cite?