Computing a probability under a multivariate Gaussian distribution


I am reading a proof from a paper:

Zhu, Sicheng, Xiao Zhang, and David Evans. "Learning adversarially robust representations via worst-case mutual information maximization." International Conference on Machine Learning. PMLR, 2020.

There is one step in their proof which I cannot understand. Let me restate the problem here. Let $\mu_{XY}$ denote the joint probability distribution over the space $X \subseteq R^d$ and $Y = \{-1, +1\}$; I am slightly abusing notation by using $X$ and $Y$ to denote both the spaces and the random variables. Samples $(x,y) \sim \mu_{XY}$ are generated according to $$ y \sim \text{Uniform}\{-1, +1\}, \qquad x \sim N(y\cdot \theta, \Sigma), $$ where $\theta \in R^d$ and $\Sigma \in R^{d\times d}$ are given parameters.

The step that I don't understand is the following: given parameters $w \in R^d$ and $\epsilon \in R_{+}$, the authors have $$ \Pr_{x \sim \mu_X} \Big(w^Tx - \epsilon \cdot \|w\|\ge0 \Big) = \frac{1}{2} \Pr_{x\sim N(\theta, \Sigma)} \Big(w^Tx \ge \epsilon \cdot \|w\|\Big) + \frac{1}{2} \Pr_{x\sim N(-\theta, \Sigma)} \Big(w^Tx \ge \epsilon \cdot \|w\|\Big) \\ = \frac{1}{2} - \frac{1}{2} \Pr_{Z \sim N(0,1)} \Big(\frac{-\epsilon \cdot \|w\|+w^T\theta}{\sqrt{w^T\Sigma w}} \le Z \le \frac{\epsilon \cdot \|w\|+w^T\theta}{\sqrt{w^T\Sigma w}}\Big) $$

I understand the first equality, since $x \sim N(y\cdot \theta, \Sigma)$ and $y \sim \text{Uniform}\{-1, +1\}$. Could someone kindly explain how the authors arrive at the second equality? While I have a basic understanding of probability theory, I would also appreciate pointers to any additional probability concepts I might need to review. Thank you in advance!

Accepted answer:

Consider the distribution of $w^Tx$, which is normal since $x \sim N(y\theta, \Sigma)$ is multivariate normal and any linear combination of jointly Gaussian coordinates is Gaussian. In particular, $w^Tx \sim N(w^T y\theta,\ w^T\Sigma w)$; if you are unsure here, just do the calculation explicitly, setting $x = (x_1, \dots, x_d)$ and $w = (w_1, \dots, w_d)$ and carefully writing down what $w^Tx = \sum_{i=1}^d w_ix_i$ is.
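This fact is easy to sanity-check numerically. The sketch below uses illustrative values for $w$, $\theta$, and $\Sigma$ (not taken from the paper) and compares the empirical mean and variance of $w^Tx$ against $w^T(y\theta)$ and $w^T\Sigma w$:

```python
# Monte Carlo check that w^T x ~ N(w^T(y*theta), w^T Sigma w).
# All parameter values here are hypothetical, chosen only for illustration.
import numpy as np

rng = np.random.default_rng(0)
d = 3
theta = np.array([1.0, -0.5, 2.0])
w = np.array([0.3, 1.2, -0.7])
A = rng.standard_normal((d, d))
Sigma = A @ A.T + np.eye(d)          # an arbitrary symmetric positive-definite matrix

y = 1                                # fix a label; then x ~ N(y*theta, Sigma)
x = rng.multivariate_normal(y * theta, Sigma, size=200_000)
proj = x @ w                         # samples of w^T x

# Empirical moments should match the claimed mean and variance.
print(proj.mean(), w @ (y * theta))
print(proj.var(), w @ Sigma @ w)
```

With a couple hundred thousand samples, the empirical moments agree with $w^Ty\theta$ and $w^T\Sigma w$ to within Monte Carlo error.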

Then, standardizing, $\frac{w^Tx - w^Ty\theta}{\sqrt{w^T \Sigma w}} \sim N(0, 1)$; call this standard normal variable $Z$. In particular, for $y = -1$, we have that $$P(w^Tx \geq \epsilon ||w||) = P\left(\frac{w^Tx + w^T\theta}{\sqrt{w^T \Sigma w}} \geq \frac{\epsilon||w|| + w^T\theta}{\sqrt{w^T \Sigma w}}\right) = P\left(Z \geq \frac{\epsilon||w|| + w^T\theta}{\sqrt{w^T \Sigma w}}\right).$$ Similarly, for $y=1$, we get that $$P(w^Tx \geq \epsilon ||w||) = P\left(\frac{w^Tx - w^T\theta}{\sqrt{w^T \Sigma w}} \geq \frac{\epsilon||w|| - w^T\theta}{\sqrt{w^T \Sigma w}}\right) = P\left(Z \geq \frac{\epsilon||w|| - w^T\theta}{\sqrt{w^T \Sigma w}}\right) \\ = P\left(Z \leq \frac{-\epsilon||w|| + w^T\theta}{\sqrt{w^T \Sigma w}}\right),$$ where the last equality uses the symmetry of $Z$ about $0$. So overall the expression in your question reduces to $$ \frac{1}{2} \left(P\left(Z \leq \frac{-\epsilon||w|| + w^T\theta}{\sqrt{w^T \Sigma w}}\right) + P\left(Z \geq \frac{\epsilon||w|| + w^T\theta}{\sqrt{w^T \Sigma w}}\right)\right) \\ = \frac{1}{2}P\left(Z \leq \frac{-\epsilon||w|| + w^T\theta}{\sqrt{w^T \Sigma w}} \text{ or } Z \geq \frac{\epsilon||w|| + w^T\theta}{\sqrt{w^T \Sigma w}}\right), $$ since the two events are disjoint (the left threshold is the smaller one because $\epsilon\|w\| \geq 0$). Taking the complement of this probability, i.e. writing it as $$\frac{1}{2}\left(1 - P\left(\frac{-\epsilon||w|| + w^T\theta}{\sqrt{w^T \Sigma w}} \leq Z \leq \frac{\epsilon||w|| + w^T\theta}{\sqrt{w^T \Sigma w}}\right)\right),$$ gives exactly the second line in your question.
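The whole identity can also be verified end to end by Monte Carlo. The sketch below (again with hypothetical parameter values, not from the paper) draws from the Gaussian mixture and compares the empirical probability against the closed form $\frac{1}{2} - \frac{1}{2}\big(\Phi(b) - \Phi(a)\big)$, where $\Phi$ is the standard normal CDF:

```python
# Monte Carlo check of the identity in the question.
# theta, Sigma, w, and eps are hypothetical illustration values.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
d = 3
theta = np.array([1.0, -0.5, 2.0])
w = np.array([0.3, 1.2, -0.7])
A = rng.standard_normal((d, d))
Sigma = A @ A.T + np.eye(d)
eps = 0.8

n = 400_000
y = rng.choice([-1, 1], size=n)                       # y ~ Uniform{-1, +1}
noise = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
x = noise + y[:, None] * theta                        # x ~ N(y*theta, Sigma)

# Left-hand side: empirical probability of w^T x - eps*||w|| >= 0.
lhs = np.mean(x @ w - eps * np.linalg.norm(w) >= 0)

# Right-hand side: 1/2 - 1/2 * P(a <= Z <= b) for standard normal Z.
sigma = np.sqrt(w @ Sigma @ w)
a = (-eps * np.linalg.norm(w) + w @ theta) / sigma
b = ( eps * np.linalg.norm(w) + w @ theta) / sigma
rhs = 0.5 - 0.5 * (norm.cdf(b) - norm.cdf(a))

print(lhs, rhs)
```

The two quantities agree up to the sampling error of the empirical frequency, which is what the standardization-and-symmetry argument above predicts.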