What is $E(\min(X_0+\cdots+X_n,m))$ for large $n$ and $m$s?


I have multiple independent random variables $X_0,X_1,X_2,\ldots,X_n$. These variables are each either $0$ or $1$, with known probabilities $p_0,\ldots,p_n$ of being $1$. Let $X=X_0+\cdots+X_n$.

I know that the expected value $E(X)=E(X_0)+\cdots+E(X_n)$, but what is $E(\min(X,m))$ where $m$ is some integer?

My first approach is $$E(\min(X,m))=\sum_{S_i\in R} \left(\prod_{j=0}^n p_j^{s_{ij}} \cdot (1-p_j)^{1-s_{ij}}\right)\cdot \min\left(\sum_{j=0}^n s_{ij},\,m\right)$$

where $R$ is the set of all possible binary vectors $S_i$ of length $n$, and $s_{ij}$ denotes the $j$-th entry of $S_i$.

For example, with three variables, $p_0=0.1$, $p_1=0.6$, $p_2=0.8$, and $m=2$, $R$ would be $$R=\{(0,0,0),(0,0,1),(0,1,0),(0,1,1),(1,0,0),(1,0,1),(1,1,0),(1,1,1)\}$$

$$ E(\min(X,m))=\begin{matrix} 0.1 &\cdot& 0.6 &\cdot& 0.8 &\cdot&2&+\\ 0.1 &\cdot& 0.6 &\cdot&(1-0.8)&\cdot&2&+\\ 0.1 &\cdot&(1-0.6)&\cdot& 0.8 &\cdot&2&+\\ 0.1 &\cdot&(1-0.6)&\cdot&(1-0.8)&\cdot&1&+\\ (1-0.1)&\cdot& 0.6 &\cdot& 0.8 &\cdot&2&+\\ (1-0.1)&\cdot& 0.6 &\cdot&(1-0.8)&\cdot&1&+\\ (1-0.1)&\cdot&(1-0.6)&\cdot& 0.8 &\cdot&1&+\\ (1-0.1)&\cdot&(1-0.6)&\cdot&(1-0.8)&\cdot&0&\\ \end{matrix}=1.452 $$ The problem with this approach is that the time to compute it grows as $O(n\cdot2^n)$, which could probably be reduced to $O\!\left(n\cdot\binom{n}{m}\right)$ by aggregating the terms with $X>m$. Either way, a computer gets into trouble pretty quickly as $n$ grows.
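For small $n$, this brute-force sum is easy to write down directly. Here is a self-contained JavaScript sketch (the function name is mine) that reproduces the hand computation above:

```javascript
// Exact E(min(X, m)) by enumerating all 2^n outcome vectors.
// Cost is O(n * 2^n), so this is only feasible for small n.
function expectedMinBruteForce(ps, m) {
    var n = ps.length;
    var total = 0;
    for (var bits = 0; bits < (1 << n); bits++) {
        var prob = 1, sum = 0;
        for (var j = 0; j < n; j++) {
            if (bits & (1 << j)) {
                prob *= ps[j];   // X_j = 1
                sum += 1;
            } else {
                prob *= 1 - ps[j]; // X_j = 0
            }
        }
        total += prob * Math.min(sum, m);
    }
    return total;
}

console.log(expectedMinBruteForce([0.1, 0.6, 0.8], 2).toFixed(3)); // "1.452"
```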

Another approach is to estimate this expectation by Monte Carlo simulation. See the JavaScript code example below.

var ps=[0.1,0.6,0.8];
var m=2;

var numIterations=1e6; // Some large number. Larger => more precision and more compute time.
var cumSum=0;
for(var i=0;i<numIterations;i++){
    var sum = 0;
    for(var j=0; j<ps.length;j++){
        sum += Math.random()<ps[j];
    }
    if(sum>m){
        sum=m; // truncate the sum at m
    }
    cumSum += sum;
}
console.log(cumSum/numIterations);

Example output: 1.451833

The problem with this approach is that I'm using this result as part of the output of a loss function in a machine learning algorithm. For the loss function to be differentiable with respect to all the weights, it must also be differentiable with respect to all the $p_i$. A stochastic estimate is not, and it might also fail to sample outcomes whose $p_i$ are small.

Are there any neat mathematical tricks I could use to simplify the calculation? Approximations are also welcome, as long as $$\operatorname{AngleBetween}\left(\nabla E(\min(X,m)),\, \nabla \operatorname{SomeApproximatedFunction}(X,m)\right)$$ is somewhat small and $\operatorname{SomeApproximatedFunction}(X,m)$ is differentiable with respect to all the $p_i$.
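(One standard trick that seems to meet both requirements, sketched here as an assumption rather than an established answer: $X$ has a Poisson binomial distribution, and the probabilities $P(X=k)$ for $k<m$, with everything $\ge m$ lumped into one bucket, can be built up by a dynamic-programming recurrence in $O(n\cdot m)$ time. Then $E(\min(X,m))=\sum_{k=0}^{m-1}k\,P(X=k)+m\,P(X\ge m)$, which is a polynomial in the $p_i$ and hence exactly differentiable. The function name below is mine.)

```javascript
// Exact E(min(X, m)) in O(n * m) via the Poisson binomial recurrence.
// After processing j variables, q[k] = P(partial sum = k) for k < m,
// and q[m] = P(partial sum >= m); counts above m are lumped together
// because min(X, m) treats them identically.
function expectedMinDP(ps, m) {
    var q = new Array(m + 1).fill(0);
    q[0] = 1; // the empty sum is 0 with probability 1
    for (var j = 0; j < ps.length; j++) {
        var p = ps[j];
        q[m] += q[m - 1] * p; // reach m (or stay at >= m, regardless of X_j)
        for (var k = m - 1; k >= 1; k--) {
            // either X_j = 0 (stay at k) or X_j = 1 (came from k - 1)
            q[k] = q[k] * (1 - p) + q[k - 1] * p;
        }
        q[0] *= 1 - p;
    }
    var e = m * q[m];
    for (var k = 1; k < m; k++) {
        e += k * q[k];
    }
    return e;
}

console.log(expectedMinDP([0.1, 0.6, 0.8], 2).toFixed(3)); // "1.452"
```

On the example above this matches the hand computation exactly, and each `q[k]` is a polynomial in the $p_i$, so gradients can be propagated through it without sampling.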