A question about a Gaussian-perturbed probability distribution


In a paper I've been reading recently, at some point the authors define a new probability distribution over the original data $\mathcal{D} = \{\textbf{x}_1, \dots, \textbf{x}_N \}$ (where each $\textbf{x}_i \in \mathbb{R}^D$ for some $D$) by introducing a Gaussian perturbation.

In practice, given some (unknown) probability density $p_{data}(\textbf{x})$ and some scalar $\sigma > 1$, the new (perturbed) density is defined as

$$q_{\sigma}(\textbf{x}) = \int p_{data}(\textbf{t}) \mathcal{N}(\textbf{x} | \textbf{t} \, ,\sigma^2I) \, d\textbf{t} .$$

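For concreteness, the definition above can be sampled from directly: draw $\textbf{t} \sim p_{data}$, then draw $\textbf{x} \mid \textbf{t} \sim \mathcal{N}(\textbf{t}, \sigma^2 I)$. A minimal sketch in NumPy (the two-component Gaussian mixture standing in for the unknown $p_{data}$ is purely hypothetical, chosen only so the example runs):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the unknown p_data: a mixture of two
# Gaussians in R^2 (purely illustrative, not from the paper).
def sample_p_data(n):
    centers = np.array([[-2.0, 0.0], [2.0, 0.0]])
    idx = rng.integers(0, 2, size=n)
    return centers[idx] + 0.3 * rng.standard_normal((n, 2))

sigma = 1.5

# Sampling from q_sigma exactly as the integral defines it:
# first t ~ p_data, then x | t ~ N(t, sigma^2 I),
# i.e. x = t + sigma * eps with eps ~ N(0, I).
t = sample_p_data(10_000)
x = t + sigma * rng.standard_normal(t.shape)
```

Viewed this way, each sample $\textbf{x}$ from $q_\sigma$ is obtained by first drawing a clean point $\textbf{t}$ and then adding isotropic Gaussian noise to it; the integral over $\textbf{t}$ marginalizes over which clean point the noise was added to.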
Based on this notation, my understanding is that, if the initial data point is distributed according to the original density $p_{data}$, we now consider $\tilde{\textbf{x}} = \textbf{x} \odot \boldsymbol{\varepsilon}$ (element-wise multiplication), where $\boldsymbol{\varepsilon} \sim \mathcal{N}(\textbf{t}, \sigma^2I)$, so that the new density function accounts for the introduced noise. Nevertheless, it is not clear to me what the role of $\textbf{t}$ is in this context, or why we integrate over it. Additionally, I am not sure how to show that the newly defined distribution $q_\sigma(\textbf{x})$ integrates to $1$. Is that simply achieved by writing:

\begin{align} \int_{\textbf{x}} q_\sigma(\textbf{x}) \, d\textbf{x} &= \left( \int_{\textbf{x}} \mathcal{N}(\textbf{x} | \textbf{t} \, , \, \sigma^2 I) \, d\textbf{x} \right) \cdot \left( \int_{\textbf{t}} p_{data}(\textbf{t}) \, d\textbf{t} \right) \\ &= 1 \; \, ..? \end{align}
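As a sanity check on the normalization claim itself, $\int q_\sigma = 1$ can be verified numerically for a toy one-dimensional case. The sketch below uses a two-point discrete distribution as a stand-in for $p_{data}$ (a hypothetical choice; the integral over $\textbf{t}$ then becomes a weighted sum, making $q_\sigma$ a two-component Gaussian mixture):

```python
import numpy as np

sigma = 1.5
t_vals = np.array([-2.0, 2.0])   # support of the toy p_data (hypothetical)
p_vals = np.array([0.3, 0.7])    # p_data(t) weights, summing to 1

def gauss_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# q_sigma(x) = sum_t p_data(t) * N(x | t, sigma^2), evaluated on a fine grid.
xs = np.linspace(-30.0, 30.0, 200_001)
q = sum(p * gauss_pdf(xs, t, sigma) for p, t in zip(p_vals, t_vals))

# Riemann-sum approximation of the integral of q_sigma over the grid.
dx = xs[1] - xs[0]
total = float(q.sum() * dx)
print(total)  # ≈ 1.0
```

The grid is wide enough (many standard deviations past the mixture means) that the truncated tails are negligible, so the sum comes out at $1$ up to discretization error.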