Repeated convolution of a pdf of the product of two (independent?) random variables

220 Views Asked by At

Let $p\in \mathbb R$ and $k,n\in \mathbb N$ be fixed constants with $ k << n$.

Let $\bf A$ be a random vector of positive real numbers summing to $p$, generated as follows: choose $n$ real numbers independently and uniformly between $0$ and $1$ to be the coordinates of $\bf A$, then normalize so their sum is $p$.

Let $\bf x$ be a random "mask" vector of length $n$, generated as follows. We choose a random subset of $k$ entries of $\bf x$ to be nonzero. Each of the nonzero entries is chosen uniformly in the range $(1,100)$, independently of each other (and of $\bf A$).

Question: How can we determine the cumulative distribution function of $\sum_{i=1}^{n} {\bf A}_i{\bf x}_i$?

Edit: I did a bit of research and found out how to do steps 2 and 3 in the roadmap below ( I plan to do step 3 using Chernoff Bounds to approximate ). Thus, all that is left to do is find the PDF of $A_1$, but this is really difficult, because I can’t find the point of intersection in order to integrate. Also, even I found the limits of integration, I’m not quite sure how I would take the derivative of such a complex function. I am currently trying to find a solution, but I welcome any input!

1

There are 1 best solutions below

10
On

This is not a complete answer yet, but I want to get some thoughts out there.

  • You can just solve the problem for $p=1$. Since $p$ is just a scale factor, it is simple to determine the distribution for general $p$ from the $p=1$ case.

  • We can simplify things by not choosing the $k$ nonzero entries of $\bf x$ randomly, but instead always having them be the first $k$ coordinates (because the $A_i$ are identically distributed, so it does not matter which $k$ are used). Therefore, instead of an $n$-fold sum, we just need to work with the $k$-fold sum $$ \sum_{i=1}^k A_i U_i,\qquad U_i\sim \text{Unif}(1,100) $$ Above, the $x_i$ have been replaced with the simpler $U_i$ that are always nonzero.

  • Your idea of assuming that the collection $\{A_1,A_2,\dots,A_k\}$ is independent for calculation purposes is probably a good approximation, as long as $k\ll n$. Therefore, you just need to do three things:

    1. Determine the pdf of a single coordinate, $A_1$.

    2. Determine the pdf of $A_1U_1$, by convolving the previous pdf with the uniform pdf. Call this pdf $f_{A_1U_1}(x)$.

    3. Compute the $k$-fold convolution of $f_{A_1U_1}(x)$ with itself.

However, even with all of these simplifications, step 1 of determining the pdf of $A_1$ is still quite hard. In what follows, I will let $V_1,\dots,V_n$ be the unnormalized values of $A_i$, so that each $V_i\sim \text{Unif}(0,1)$, independently of each other.

  • Let $W=V_2+V_3+\dots+V_k$. First, compute the pdf of $W$. There are two ways to do this:

    • Compute the pdf exactly, using the the Irwin-Hall distribution. The resulting pdf is a piecewise function which is a certain polynomial on each piece.

    • If $k$ is large enough, you can just use the central limit theorem and approximate $W$ by a normal normal random variable whose mean is $(k-1)/2$ and whose variance is $(k-1)/12$.

  • Since $V_1$ and $W$ are independent, and we have the pdf of each, we easily get the joint pdf $f(v,w)$ of $(V_1,W)$. Since$$A_1=\frac{V_1}{V_1+W}, $$ you can first find the $P(A_1\le a)$ of $A_1$ by integrating $f(v,w)$ over the region defined by $$ 0\le v\le 1,\;\; 0\le w\le (k-1),\;\; \frac{v}{v+w}\le a $$ This function of $a$ must then be differentiated to get the pdf of $A$. This is the part that I am too scared to attempt, but perhaps with this road map you or someone else can figure it out.