Let $\mathbb{E}[X]$ be the expected value of a continuous random variable $X$, and let $p_\theta$, $q_\phi$ be two density functions parameterized by $\theta$ and $\phi$, respectively. Note that $\theta$ is not necessarily equal to $\phi$.
KL-divergence is a non-symmetric, non-negative measure of how much $p_\theta$ and $q_\phi$ differ from each other. This quantity is particularly useful as a term of the Evidence Lower Bound (ELBO) in the loss function of autoencoders, in the context of machine learning (deep learning).
In this video the professor shows that:
$$ \text{KL}(p_\theta\ ||\ q_\phi) = \mathbb{E}_p \bigg[\log \frac{p_\theta(x)}{q_\phi(x)} \bigg] := \int_\mathbb{R} \log \frac{p_\theta(x)}{q_\phi(x)}\ p_\theta(x)\ dx $$
is hard to compute because "integral goes over $-\infty$ to $+\infty$". But I haven't fully understood why this is an issue.
He eventually approximates that "hard" integral with a numerical method based on the law of large numbers:
$$ \int_\mathbb{R} \log \frac{p_\theta(x)}{q_\phi(x)}\ p_\theta(x)\ dx \approx \frac{1}{N} \sum_{i=1}^N\ \log \frac{p_\theta(x_i)}{q_\phi(x_i)} $$
where the $x_i$ are samples drawn from $p_\theta$ and $N$, the total number of samples, is sufficiently large.
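To make the approximation concrete, here is a minimal sketch of that Monte Carlo estimate, assuming (purely for illustration; the video does not specify these) that $p_\theta$ and $q_\phi$ are one-dimensional Gaussians, for which the KL-divergence also has a known closed form we can compare against:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_normal_pdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

# Hypothetical choice of densities (assumed, not from the video):
mu_p, s_p = 0.0, 1.0   # p_theta = N(0, 1)
mu_q, s_q = 1.0, 2.0   # q_phi   = N(1, 4)

N = 200_000
x = rng.normal(mu_p, s_p, size=N)  # draw x_i ~ p_theta

# Monte Carlo estimate: (1/N) * sum_i log(p_theta(x_i) / q_phi(x_i))
kl_mc = np.mean(log_normal_pdf(x, mu_p, s_p) - log_normal_pdf(x, mu_q, s_q))

# Closed form of KL between two Gaussians, for comparison
kl_exact = np.log(s_q / s_p) + (s_p**2 + (mu_p - mu_q) ** 2) / (2 * s_q**2) - 0.5

print(kl_mc, kl_exact)  # the two values should agree closely for large N
```

The point is that the estimate only needs samples from $p_\theta$ and the ability to evaluate both log-densities pointwise, whereas the integral itself would need to be carried out over the whole real line.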
Is the problem related to the fact that it is not possible to obtain a closed form for that integral, because the "infinite" domain of integration is an issue for a computer?