Can I compute the KL divergence between, e.g., the uniform distribution on $\{(x,y) \in [0,1]^2 \mid y=0 \}$ (call it $q$) and the uniform distribution on $[0,1]^2$ (call it $p$)? It seems to me like I should be able to, since I only need to integrate over the set where $q$ is supported. Is there any justification for this (maybe via some generalized notion of the KL divergence)?
Here is my attempt to work out the math in this case:
$q = U(\{(x,y) \in [0,1]^2 \mid y = 0 \}) \implies q(x,y) = \delta(y)$
$p = U([0,1]^2) \implies p(x,y) = 1$
$KL(q \| p) = \int_0^1 \int_0^1 q(x,y) \log\frac{q(x,y)}{p(x,y)} \, dx \, dy$
At this point, a delta function appears inside the logarithm, which, according to this related question, is undefined.
FYI, the example I'm actually interested in is $$KL(q(\mathbf{z}) \| p(\mathbf{z}))$$ where $p(\mathbf{z}) = \mathcal{N}(0, I)$ and $\mathbf{z} = g\mathbf{v}$, where $\mathbf{v}$ is a constant vector and $g$ is drawn from some known distribution ($g \sim q_g(g)$).
Usually, the KL divergence for probability measures like this (which are mutually singular) is defined to be infinite. The KL divergence between two measures $\mu$ and $\nu$ is typically defined as
$$ D_{KL}(\mu\|\nu) = \begin{cases} \Bbb{E}_\mu\left[\ln\frac{d\mu}{d\nu}\right] & \mu\ll \nu \\ \infty & \text{otherwise} \end{cases} $$

This makes sense in your case: if you think about the "information gain" when going from a random quantity that can take any position in $[0,1]^2$ to a random quantity that can only live on the x-axis, you've gained "infinite" information, because you've eliminated all uncertainty in the $y$ variable.
If you don't want to think about measures, you can think about approximating your $q$ with a narrow box, i.e., let $q_\epsilon$ be uniform on $[0,1]\times [0,\epsilon]$ (so its density is $1/\epsilon$ on the box). Then $D_{KL}(q_\epsilon \| p) = \log(1/\epsilon)$, which diverges as $\epsilon\rightarrow 0$.
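To make the limit concrete, here is a small Python sketch of this box approximation (the function name `kl_box_vs_square` is mine, not standard). Since the log-density ratio is constant on the box, the KL integral has the exact closed form $\log(1/\epsilon)$, and evaluating it for shrinking $\epsilon$ shows the divergence:

```python
import math

def kl_box_vs_square(eps: float) -> float:
    """KL(q_eps || p), where q_eps is uniform on [0,1] x [0,eps]
    (density 1/eps on the box) and p is uniform on [0,1]^2 (density 1).

    The log-ratio log((1/eps)/1) is constant on the support of q_eps, so
    the integral is exact:
        KL = (box area) * (1/eps) * log(1/eps) = log(1/eps).
    """
    return math.log(1.0 / eps)

for eps in (1e-1, 1e-3, 1e-6, 1e-9):
    print(f"eps = {eps:.0e}  ->  KL = {kl_box_vs_square(eps):.3f}")
```

As `eps` shrinks toward 0, the KL grows without bound, which matches the convention of assigning $D_{KL} = \infty$ to mutually singular measures.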