Minimizing the expectation over a set, with respect to the Gaussian measure


I have recently read a proof [1] where, at the last step, the authors use an inequality which basically amounts to a lower bound on $\int_\mathbb{R} \mathbf{1}_A(x)|x| \phi(x)dx$, where $\phi$ is the Gaussian pdf ($\phi(x)=\frac{e^{-x^2/2}}{\sqrt{2\pi}}$) and $A$ is a Borel set subject to $\int_\mathbb{R} \mathbf{1}_A(x)\phi(x)dx = p$. The bound they use is $p^2/2$; I was wondering if this was tight (up to a constant factor), or if one could improve this to get better than a square dependence on $p$: $$ \min_{A\in\mathcal{B}_\mathbb{R}} \int_\mathbb{R} \mathbf{1}_A(x)|x| \phi(x)dx\qquad\\ \text{s.t.} \int_\mathbb{R} \mathbf{1}_A(x)\phi(x)dx = p $$

Does anyone know if a result of this type is known? Any help or suggestion is welcome.

Clement.

[1] Kevin Matulef, Rocco A. Servedio, Ronitt Rubinfeld, "Testing halfspaces", Theorem 4, in Proc. 20th Annual Symposium on Discrete Algorithms (SODA 2009).
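For what it is worth, both the objective and the constraint have closed forms when $A$ is a symmetric interval $(-a,a)$. The sketch below (mine, not from [1]) evaluates them and checks the $p^2/2$ bound on such sets:

```python
# Numerical sketch (not from [1]): evaluate the objective and the constraint
# for symmetric intervals A = (-a, a) and check the p^2/2 bound on them.
import math

def gaussian_mass(a):
    """P[X in (-a, a)] = erf(a / sqrt(2)) for standard normal X."""
    return math.erf(a / math.sqrt(2))

def truncated_abs_mean(a):
    """int_{-a}^{a} |x| phi(x) dx = 2/sqrt(2*pi) * (1 - exp(-a^2/2))."""
    return 2 / math.sqrt(2 * math.pi) * (1 - math.exp(-a * a / 2))

for a in (0.1, 0.5, 1.0, 2.0, 5.0):
    p = gaussian_mass(a)
    assert truncated_abs_mean(a) >= p * p / 2  # the bound used in [1]
```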


Accepted answer:

One looks for the minimal $E[|X|;X\in A]$ with $P[X\in A]=p$, where $X$ is standard normal. The optimal set $A$ is some $A=(-a,a)$ (see Edit below), then $$ E[|X|;X\in A]=\frac2{\sqrt{2\pi}}(1-\mathrm e^{-a^2/2}),\qquad P[X\in A]=\frac2{\sqrt{2\pi}}\int_0^a\mathrm e^{-x^2/2}\mathrm dx. $$ When $a\to0$, $E[|X|;X\in A]\sim a^2/\sqrt{2\pi}$ and $P[X\in A]\sim2a/\sqrt{2\pi}$, thus, no better lower bound than $$ E[|X|;X\in A]\geqslant cP[X\in A]^2,\qquad c=\sqrt{2\pi}/4, $$ can hold for every $a$. Let us prove the bound above is indeed universal.
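The asymptotics above can be sanity-checked numerically. This sketch (my own) confirms that the ratio $E[|X|;X\in A]/P[X\in A]^2$ stays above $\sqrt{2\pi}/4$ and approaches it as $a\to0$, so the constant cannot be improved:

```python
# Sketch (mine): E[|X|; |X|<a] / P[|X|<a]^2 is bounded below by sqrt(2*pi)/4
# and tends to that constant as a -> 0.
import math

def ratio(a):
    E = 2 / math.sqrt(2 * math.pi) * (1 - math.exp(-a * a / 2))
    P = math.erf(a / math.sqrt(2))   # P[|X| < a]
    return E / (P * P)

c = math.sqrt(2 * math.pi) / 4
for a in (2.0, 1.0, 0.1, 0.01, 0.001):
    assert ratio(a) >= c
assert abs(ratio(1e-3) - c) < 1e-5   # the constant is attained in the limit
```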

Getting rid of irrelevant multiplicative constants, one sees that the claim holds if and only if $u(a)\geqslant0$ for every $a\geqslant0$, where $$ u(a)=1-\mathrm e^{-a^2/2}-\frac12\left(\int_0^a\mathrm e^{-x^2/2}\mathrm dx\right)^2. $$ Note that $u(0)=0$ and that, for every $a\geqslant0$, $$ u'(a)=\mathrm e^{-a^2/2}\left(a-\int_0^a\mathrm e^{-x^2/2}\mathrm dx\right)\geqslant0, $$ thus the claim holds. Finally, the optimal universal lower bound is $$ \int_\mathbb{R} \mathbf{1}_A(x)\phi(x)dx = p\implies\int_\mathbb{R} \mathbf{1}_A(x)|x| \phi(x)dx\geqslant\frac{\sqrt{2\pi}}4\,p^2. $$

Edit: Assume that $P[X\in A_+]=P[0\leqslant X\leqslant a_+]$ where $A_+=A\cap(0,+\infty)$ and define $A_0=A_+\cap(0,a_+)$, $A_1=(0,a_+)\setminus A_+$ and $A_2=A_+\setminus(0,a_+)$, thus $(0,a_+)=A_0\cup A_1$, $A=A_0\cup A_2$ and $P[X\in A_1]=P[X\in A_2]$ with $A_1\subseteq(0,a_+)$ and $A_2\subseteq(a_+,+\infty)$.

Thus, $E[|X|;X\in A_1]\leqslant a_+P[X\in A_1]$ and $E[|X|;X\in A_2]\geqslant a_+P[X\in A_2]$, that is, $E[|X|;X\in A_1]\leqslant E[|X|;X\in A_2]$. Adding $E[|X|;X\in A_0]$ to both sides, one gets $E[|X|;0\leqslant X\leqslant a_+]\leqslant E[|X|;X\in A_+]$.

Likewise, replacing $A_-=A\cap(-\infty,0)$ by $(-a_-,0)$ where $P[X\in A_-]=P[-a_-\leqslant X\leqslant 0]$ yields $E[|X|;X\in A_-]\geqslant E[|X|;-a_-\leqslant X\leqslant 0]$ hence $E[|X|;X\in A]\geqslant E[|X|;-a_-\leqslant X\leqslant a_+]$ where $P[X\in A]=P[-a_-\leqslant X\leqslant a_+]$.

It remains to show that $E[|X|;-a_-\leqslant X\leqslant a_+]\geqslant E[|X|;-a\leqslant X\leqslant a]$ where $a$ is uniquely defined by the condition $P[-a_-\leqslant X\leqslant a_+]=P[-a\leqslant X\leqslant a]$. This is done as above, comparing $|X|$ on the two parts of $(-a_-,a_+)\triangle(-a,a)$.

Finally, $E[|X|;X\in A]\geqslant E[|X|;|X|\leqslant a]$, where $P[X\in A]=P[|X|\leqslant a]$.
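The symmetrization conclusion can be illustrated numerically. This sketch (my own check, not part of the proof above) compares the symmetric interval $(-a,a)$ against shifted intervals $(t,b)$ of the same Gaussian measure $p$; the symmetric one should attain the smallest value of $E[|X|;X\in A]$:

```python
# Sketch (mine): among intervals of fixed Gaussian measure p, the symmetric
# interval (-a, a) minimizes E[|X|; X in A].
import math

PHI0 = 1 / math.sqrt(2 * math.pi)  # phi(0)

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def phi(x):
    """Standard normal pdf."""
    return PHI0 * math.exp(-x * x / 2)

def abs_moment(lo, hi):
    """E[|X|; lo < X < hi], using (d/dx)(-phi(x)) = x * phi(x)."""
    if lo >= 0:
        return phi(lo) - phi(hi)
    if hi <= 0:
        return phi(hi) - phi(lo)
    return 2 * PHI0 - phi(lo) - phi(hi)

def right_end(t, p):
    """Bisection for b > t with Phi(b) - Phi(t) = p."""
    lo, hi = t, t + 20.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if Phi(mid) - Phi(t) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p = 0.3
a = right_end(0.0, p / 2)          # (-a, a) has mass p by symmetry
best = abs_moment(-a, a)
for t in (-1.0, -0.5, -0.2, 0.0, 0.5):
    assert abs_moment(t, right_end(t, p)) >= best - 1e-9
```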

Second answer:

This is not a rigorous proof, but it gives some intuition. The factor $x$ grows linearly while the density decays exponentially; because of the absolute value $|x|$, let us restrict attention to $\mathbb{R}^+$ for the moment. First we need to decide whether, for an interval $[a_1, a_2]$, taking $a_1=0$ is a minimizer. Since $x$ is increasing and the density decreases at a faster rate, if a better interval with $a_1\neq 0$ existed, it would have to sit at one of the extremes: either $a_1<\epsilon$ for some small $\epsilon$, or $a_1=k$ with $[k,\infty)$ satisfying the area condition. If $a_1=0$ wins in both cases, then $a_1=0$ should be at least part of the solution, whether finitely or infinitely many intervals are allowed. According to some simulations that I ran, when a single interval is assumed, $a_1=0$ is the minimizer; I think it is not even difficult to show this algebraically.
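A sketch of mine mirroring the simulations just described: among one-sided intervals $[a_1,a_2]\subset\mathbb{R}^+$ of fixed Gaussian mass, it checks numerically that $a_1=0$ minimizes $E[|X|;X\in[a_1,a_2]]$:

```python
# Sketch (mine): among intervals [a1, a2] in R^+ of fixed Gaussian mass p,
# check that a1 = 0 minimizes E[|X|; X in [a1, a2]].
import math

def mass(a1, a2):
    """P[a1 <= X <= a2] for standard normal X."""
    return 0.5 * (math.erf(a2 / math.sqrt(2)) - math.erf(a1 / math.sqrt(2)))

def moment(a1, a2):
    """E[|X|; a1 <= X <= a2] for 0 <= a1 <= a2; equals phi(a1) - phi(a2)
    since (d/dx)(-phi(x)) = x * phi(x)."""
    c = 1 / math.sqrt(2 * math.pi)
    return c * (math.exp(-a1 * a1 / 2) - math.exp(-a2 * a2 / 2))

def upper_end(a1, p):
    """Bisection for a2 >= a1 with mass(a1, a2) = p (needs p < mass(a1, inf))."""
    lo, hi = a1, a1 + 20.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if mass(a1, mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p = 0.2  # small enough that [a1, oo) can still carry mass p for every a1 below
base = moment(0.0, upper_end(0.0, p))
for a1 in (0.01, 0.1, 0.3, 0.7):
    assert moment(a1, upper_end(a1, p)) >= base
```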

Below is the comparison of the lower bound $p^2/2$ (magenta) with the bound obtained by $[-a_1, a_1]$ for $a_1\in\mathbb{R}^+$ (blue).

[Figure: the two bounds plotted as functions of $p$.]

If the single-interval assumption is general enough, then the blue curve is the true minimum as a function of $p$, and the lower bound $p^2/2$ is not tight.
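The data behind the figure can be regenerated from the closed forms; this sketch (mine) confirms that the blue curve stays strictly above the magenta $p^2/2$ curve, and above the sharper constant $\sqrt{2\pi}/4$ from the accepted answer:

```python
# Sketch (mine): exact value on the symmetric interval (-a, a) (blue curve)
# versus the lower bound p^2/2 (magenta curve), parametrized by a.
import math

def curves(a):
    """Return (p, value on the symmetric interval, p^2/2) for A = (-a, a)."""
    p = math.erf(a / math.sqrt(2))
    value = 2 / math.sqrt(2 * math.pi) * (1 - math.exp(-a * a / 2))
    return p, value, p * p / 2

for k in range(1, 80):
    p, value, bound = curves(0.05 * k)
    assert value > bound                                # blue stays above magenta
    assert value >= math.sqrt(2 * math.pi) / 4 * p * p  # the tight constant
```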