Finding the distribution function of a random variable using CLT


Let $f_0$ and $f_1$ be two continuous probability density functions with means $\mu_0,\mu_1$ and variances $\sigma_0^2,\sigma_1^2$ on $\mathbb{R}$. Furthermore, let $l(y)=f_1(y)/f_0(y)$ be the likelihood ratio and $0<c_l<1$ and $c_u>1$ be two positive real numbers.

Each time, we draw $n$ i.i.d. samples $(y_1,y_2,\ldots,y_n)$ from the distribution $f_0$ and form the vector $v=[\ln l(y_1),\ln l(y_2),\ldots,\ln l(y_n)]$. Any element of $v$ larger than $\ln c_u$ is replaced by $\ln c_u$, and similarly any element smaller than $\ln c_l$ is replaced by $\ln c_l$, giving another vector $v^*$.

Example: For $v=[-3.1, 0.6, -7.5, 0, 4.8, 2.4, 0.1]$, $\ln c_l=-2$ and $\ln c_u=3$ we get $v^*=[-2, 0.6, -2, 0, 3, 2.4, 0.1]$. The values $-3.1$ and $-7.5$ are clipped to $-2$, and $4.8$ is clipped to $3$.
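The clipping step is exactly what `np.clip` does; this short sketch reproduces the example above:

```python
import numpy as np

# Clip the log-likelihood-ratio vector v to [ln c_l, ln c_u],
# reproducing the worked example from the question.
v = np.array([-3.1, 0.6, -7.5, 0.0, 4.8, 2.4, 0.1])
ln_cl, ln_cu = -2.0, 3.0

v_star = np.clip(v, ln_cl, ln_cu)
print(v_star)  # [-2.   0.6 -2.   0.   3.   2.4  0.1]
```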

Consider the sum $$s_n=\sum_{i=1}^n v^*_i.$$ To which distribution does the empirical distribution of $s_n$ converge as $n\rightarrow\infty$? (Presumably in terms of $c_l$, $c_u$ and the means and variances of $f_0$ and $f_1$.)
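For concreteness, here is $s_n$ computed for the worked example above (the numbers are just the example values, not anything special):

```python
import numpy as np

# s_n for the example vector, with ln c_l = -2 and ln c_u = 3.
v = np.array([-3.1, 0.6, -7.5, 0.0, 4.8, 2.4, 0.1])
v_star = np.clip(v, -2.0, 3.0)
s_n = v_star.sum()
print(s_n)  # 2.1 (up to float rounding)
```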

Added (14.04.2014):

To get the density of $s_n$, we first need the density of each $v^*_i$. Because of the clipping, the distribution of $v^*_i$ is supported on the interval $[\ln c_l, \ln c_u]$; outside this interval its density is zero. At exactly $\ln c_l$, the distribution of $v^*_i$ has a point mass $\int_{\{y:\ln l(y)<\ln c_l\}}f_0(y)\,\mbox{d}y$, and similarly at $\ln c_u$ there is a point mass of $\int_{\{y:\ln l(y)>\ln c_u\}}f_0(y)\,\mbox{d}y$. Between $\ln c_l$ and $\ln c_u$, the values are not changed, so I think the density of $\ln l$ under $f_0$ stays the same on that range (this should be checked).
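These two point masses can be checked by Monte Carlo for a concrete (hypothetical) choice of densities. Assuming $f_0 = N(0,1)$ and $f_1 = N(1,1)$, the log-likelihood ratio simplifies to $\ln l(y) = y - 1/2$, so the point masses have the closed forms $\Phi(\ln c_l + 1/2)$ and $1-\Phi(\ln c_u + 1/2)$:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical concrete pair: f0 = N(0,1), f1 = N(1,1), so ln l(y) = y - 1/2.
rng = np.random.default_rng(0)
ln_cl, ln_cu = -2.0, 3.0

y = rng.standard_normal(1_000_000)                  # i.i.d. draws from f0
z = norm.logpdf(y, loc=1) - norm.logpdf(y, loc=0)   # ln l(y_i)

# Monte Carlo estimates of the point masses of v* at the boundaries:
p_lower = np.mean(z <= ln_cl)   # estimates P_{f0}(ln l <= ln c_l)
p_upper = np.mean(z >= ln_cu)   # estimates P_{f0}(ln l >= ln c_u)

# Compare with the closed forms for this Gaussian pair:
print(p_lower, norm.cdf(ln_cl + 0.5))
print(p_upper, 1 - norm.cdf(ln_cu + 0.5))
```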

Does anyone have any idea?

Thank you very much!!!

1 Answer

(Not an answer; too long for a comment.) I doubt that you can express this in terms of the moments (mean, variance) of $f_0$ and $f_1$.

If we forget for one moment about the "clipping", we have the random variable

$$z=\log \frac{f_1(y)}{f_0(y)}$$ where $y$ follows the density $f_0(y)$. Then we know its mean: it is minus the Kullback-Leibler distance (or divergence, or relative entropy).

$$E(z)= - \int f_0(y) \log \frac{f_0(y)}{f_1(y)} dy=-D( f_0 || f_1)$$
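As a sanity check of this identity, the integral can be evaluated numerically and compared against the known closed form of $D(f_0 \| f_1)$ for Gaussians (the particular pair below is just an assumed example):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Assumed example pair: f0 = N(0, 1), f1 = N(1, 2^2).
mu0, s0, mu1, s1 = 0.0, 1.0, 1.0, 2.0

def integrand(y):
    # f0(y) * log(f0(y) / f1(y))
    return norm.pdf(y, mu0, s0) * (norm.logpdf(y, mu0, s0) - norm.logpdf(y, mu1, s1))

kl_numeric, _ = quad(integrand, -np.inf, np.inf)

# Closed form of D(f0 || f1) for two Gaussians:
kl_closed = np.log(s1 / s0) + (s0**2 + (mu0 - mu1)**2) / (2 * s1**2) - 0.5
print(kl_numeric, kl_closed)
```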

Then we can expect that $s_n/n \to -D( f_0 \| f_1)$ (but we would need to check the convergence conditions, and which mode of convergence). In any case, this is not expressible in terms of the means and variances of $f_0$ and $f_1$ alone.
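This law-of-large-numbers behaviour (without the clipping, as in the answer's simplification) is easy to see by simulation. Assuming $f_0 = N(0,1)$ and $f_1 = N(1,1)$, the divergence is $D(f_0 \| f_1) = 1/2$, so the average should settle near $-0.5$:

```python
import numpy as np
from scipy.stats import norm

# Monte Carlo check that the unclipped s_n / n approaches -D(f0 || f1).
# Assumed pair: f0 = N(0,1), f1 = N(1,1), for which D(f0 || f1) = 1/2.
rng = np.random.default_rng(1)
n = 1_000_000
y = rng.standard_normal(n)                   # i.i.d. draws from f0
z = norm.logpdf(y, 1) - norm.logpdf(y, 0)    # ln l(y_i)
print(z.mean())                              # should be close to -0.5
```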

If we can bound the second moment of $z$ (but again I doubt that it has a simpler expression than the integral itself), we can then apply the CLT to the centered and scaled sum $\big(s_n + nD(f_0\|f_1)\big)/\sqrt{n}$. Our problem is complicated by the clipping, of course, although the clipping also makes each term bounded, so the CLT certainly applies to the clipped sum, just with the mean and variance of $v^*_1$ rather than of $z$.
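The CLT for the clipped sum can be illustrated by simulation. Under the same assumed Gaussian pair as before ($f_0 = N(0,1)$, $f_1 = N(1,1)$, so $\ln l(y) = y - 1/2$), the standardized sums should look approximately standard normal; this is a sketch, not a proof:

```python
import numpy as np

# CLT sketch for the clipped sum: with bounded terms,
# (s_n - n*m) / (sd * sqrt(n)) -> N(0, 1), where m = E[v*_1], sd^2 = Var(v*_1).
# Assumed pair f0 = N(0,1), f1 = N(1,1): ln l(y) = y - 1/2.
rng = np.random.default_rng(2)
n, reps = 200, 5000
y = rng.standard_normal((reps, n))         # reps independent samples of size n from f0
v_star = np.clip(y - 0.5, -2.0, 3.0)       # clipped log-likelihood ratios
s = v_star.sum(axis=1)                     # reps realizations of s_n

m, sd = v_star.mean(), v_star.std()        # plug-in estimates of E[v*_1], sd(v*_1)
t = (s - n * m) / (sd * np.sqrt(n))        # standardized sums

# Empirical quantiles should be close to those of N(0,1), i.e. about -1.96, 0, 1.96:
print(np.quantile(t, [0.025, 0.5, 0.975]))
```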