Why does the Stein operator for normal approximation have the form $(\mathcal Af)(x)=f^\prime(x)-xf(x)$?

My Question: Why does the Stein operator $\mathcal A$ for normal approximation have the form $(\mathcal Af)(x)=f^\prime(x)-xf(x)$? How can one deduce this form of the operator?

Reason for my question: I am trying to understand Stein's method. As far as I understand, one can use this method to bound distances between random variables $W$ and $N$ of the form

$$\sup_{h\in\mathcal H} |E[h(W)]-E[h(N)]|$$

where $N$ is the random variable that approximates $W$ (I am interested in the case where $N$ has the standard normal distribution). First one sets $g(x)=h(x)-E[h(N)]$, so that

$$|E[h(W)]-E[h(N)]| = |E[g(W)]|$$

Instead of estimating $|E[h(W)]-E[h(N)]|$ one can equally estimate $|E[g(W)]|$, which involves $N$ only through the constant $E[h(N)]$ (this step is convincing to me).

To find estimates more easily, one then looks for a function $f$ that solves $(\mathcal A f)(x)=g(x)$ for a certain operator $\mathcal A$ (the Stein operator). For normal approximation, $(\mathcal Af)(x)=f^\prime(x)-xf(x)$ is used. Thus

$$|E[h(W)]-E[h(N)]| = |E[f^\prime(W)-Wf(W)]|$$
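For reference, the equation $f^\prime(x)-xf(x)=g(x)$ can be solved explicitly with the integrating factor $e^{-x^2/2}$. This is only a sketch, assuming $g$ is integrable against the normal density; since $\frac{d}{dx}\left(f(x)e^{-x^2/2}\right)=\left(f^\prime(x)-xf(x)\right)e^{-x^2/2}$, one solution is

$$f(x)=e^{x^2/2}\int_{-\infty}^x g(t)\,e^{-t^2/2}\,dt,$$

and differentiating this expression gives back $f^\prime(x)=xf(x)+g(x)$.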

I saw in the proof of the Berry-Esseen theorem that $|E[f^\prime(W)-Wf(W)]|$ can be estimated more easily than $|E[h(W)]-E[h(N)]|$.

What I do not understand is why $(\mathcal Af)(x)=f^\prime(x)-xf(x)$ was chosen in the first place. Is it just a lucky guess?! Which chain of thought leads to the choice $(\mathcal Af)(x)=f^\prime(x)-xf(x)$ for the normal approximation?

Answer:

It's not a lucky guess. It's at the heart of Stein's method: you need a characterizing equation for your distribution. Such an equation is not unique; there are in fact many, and depending on the situation one might use a different one.

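In the case of the normal distribution, this is actually simple integration by parts. Below, all expectations are with respect to the standard normal distribution with density $c e^{-x^2/2}$, where $c=1/\sqrt{2\pi}$. Let $f(x)$ be nice enough for the following to work (for instance, continuously differentiable with $f(x)e^{-x^2/2}\to 0$ as $|x|\to\infty$, so that the boundary term vanishes):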

$$E[f'(x)]=\int_{-\infty}^\infty f'(x)ce^{-x^2/2}dx=\left.f(x)ce^{-x^2/2}\right|_{-\infty}^\infty+\int_{-\infty}^\infty f(x)xce^{-x^2/2}dx=E[xf(x)].$$

Now define $(Af)(x)=f'(x)-xf(x)$ (this is the operator $\mathcal A$ from the question) and notice that $E[(Af)(x)]=0$ under the normal distribution.
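The identity $E[f'(x)]=E[xf(x)]$ can also be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes NumPy is available, uses Gauss-Hermite quadrature to approximate normal expectations, and picks $f=\sin$ purely as an illustrative test function. The names `normal_expectation` and `stein_normal` are made up for this example.

```python
import numpy as np

# Gauss-Hermite nodes and weights for the weight function exp(-x^2 / 2).
nodes, weights = np.polynomial.hermite_e.hermegauss(60)

def normal_expectation(g):
    """Approximate E[g(N)] for N ~ N(0, 1) by Gauss-Hermite quadrature."""
    # The weights sum to sqrt(2*pi), so dividing by it normalises the density.
    return np.sum(weights * g(nodes)) / np.sqrt(2.0 * np.pi)

# Illustrative test function f = sin, so f' = cos.
lhs = normal_expectation(np.cos)                    # E[f'(N)]
rhs = normal_expectation(lambda x: x * np.sin(x))   # E[N f(N)]
stein_normal = lhs - rhs                            # E[(Af)(N)]

print(lhs, rhs, stein_normal)
```

Both expectations should agree with $e^{-1/2}$ up to quadrature error, so their difference is numerically zero.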

Now forget about the normal distribution. Suppose that for an arbitrary probability measure $\mu$ we have $E_\mu[(Af)(x)]:=\int (Af)(x)\,d\mu(x)=0$ for a large class of functions, say all $f\in C^1$ with $f(x),f'(x)$ compactly supported. Then $\mu$ must be the standard normal distribution. In other words, if $\mu$ is not the normal distribution, then there is at least one function $f(x)$ for which $E_\mu[(Af)(x)]\neq 0$. This takes a bit of proof, but it isn't too hard and can be found in any introduction to Stein's method.
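To illustrate the converse direction on one example, here is a small sketch (again assuming NumPy) that evaluates $E_\mu[(Af)(x)]$ when $\mu$ is the uniform distribution on $[-\sqrt3,\sqrt3]$, which has mean 0 and variance 1 but is not normal. The test function $f=\sin$ is chosen for convenience and is not compactly supported; the names `uniform_expectation` and `stein_uniform` are hypothetical.

```python
import numpy as np

a = np.sqrt(3.0)  # Uniform(-a, a) has mean 0 and variance 1

# Gauss-Legendre nodes and weights on [-1, 1].
nodes, weights = np.polynomial.legendre.leggauss(40)

def uniform_expectation(g):
    """Approximate E[g(U)] for U ~ Uniform(-a, a) by Gauss-Legendre quadrature."""
    # Rescale nodes from [-1, 1] to [-a, a]; the weights sum to 2, hence the / 2.
    return np.sum(weights * g(a * nodes)) / 2.0

# (Af)(x) = f'(x) - x f(x) with f = sin.
stein_uniform = uniform_expectation(lambda x: np.cos(x) - x * np.sin(x))

print(stein_uniform)
```

Because $\cos x - x\sin x$ is the derivative of $x\cos x$, the exact value here is $\cos\sqrt3$, roughly $-0.16$, so this particular $f$ already detects that the uniform distribution is not normal.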

Notice that instead of starting with $E[f'(x)]$ we could have started with $E[f''(x)]$, or even $E[f'(x)x^2]$, etc. Integrating by parts in each case gives another identity satisfied by the normal distribution, so we can come up with an infinite family of operators $\mathcal{A}$ that characterize the normal distribution.
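As one worked instance (a sketch, assuming $f$ is twice continuously differentiable and $f'(x)e^{-x^2/2}\to 0$ as $|x|\to\infty$), starting from $E[f''(x)]$ and integrating by parts once gives

$$E[f''(x)]=\int_{-\infty}^\infty f''(x)ce^{-x^2/2}dx=\left.f'(x)ce^{-x^2/2}\right|_{-\infty}^\infty+\int_{-\infty}^\infty f'(x)xce^{-x^2/2}dx=E[xf'(x)],$$

so the second-order operator $(\mathcal A_2 f)(x)=f''(x)-xf'(x)$ also satisfies $E[(\mathcal A_2 f)(x)]=0$ under the normal distribution. The label $\mathcal A_2$ is introduced here only for illustration; this operator is the first-order one applied to $f'$ in place of $f$.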