Percentile of the sum of a multinomial distribution

75 Views Asked by At

A previous question on "Distribution of the sum of a multinomial distribution" wasn't quite what I need as they focus on finding specific solutions for small N, or the limiting case "N is large" (>30 ish). In my case, N can take between (say) 5 and 100, so I need some solution that handles (approximately) most cases as a variable N.

Essentially I need an efficient algorithm to estimate the 10th-percentile cost of walking across N tiles, each of which can take K random costs. The idea is that I have a path-finding algorithm that samples reasonable guesses for paths somehow, and continues until it finds a path with a total cost that is in the cheapest 10% of random paths.

So, in short, does anyone know a generic approximation formula to estimate Q1 in the following python (Python 3) code? Or just any tips on how to optimise the code?

from random import random 

# Question params: N, {(C_1, PDF_1), (C_2, PDF_2), ..., (C_K, PDF_K)}


# Suppose we run N iid random trials which can only return
# values in the set {C_1, C_2, ..., C_K}.

# Each C_i is a real number such as 1 or 46.285 or -3.92
# K is a fixed positive integer (e.g. K=6).
# We define the random variable X ("total cost") as   
#    X := Trial_1() + Trial_2() + ... + Trial_N()

# Clearly we have that N*Min(C_i) <= X <= N*Max(C_i)

# Suppose the probabilities for each cost C_i is PDF_i 
# Note: 0 < PDF_i < 1, and  SUM_i(PDF_i) == 1

# Clearly, the mean of total cost is <X> = N * SUM_i(PDF_i * C_i)

# The question is: **find the lower 10th percentile**
# In other finds, solve Q1 such that Prob(X <= Q1) == 0.1


# Parameters
N = 10  # number of independent trials. e.g. number of dice to throw
COSTS = [1,     12.7,   -5,       6,        15,    60] 
PDF   = [1,      1,      1,       1,        1,      1]  
# parameters in the PDF list must all be positive numbers


K = len(COSTS)  # number of categories
# e.g. K=6 if we're thinking of sides of a dice


# Sanity checks
assert K == len(PDF), "pdf must give exactly one weight per cost"
for p in PDF:
    assert p > 0, "probability-weights must be positive"



NumExperiments = 654321  # a large number


CDF = []  # cumulative probability distributions
for x in range(1, 1+K):
    s = sum(PDF[:x])
    CDF.append(s)


# This code just normalises the PDF and CDF
_weight = sum(PDF)
for x in range(K):
    CDF[x] /= _weight
    PDF[x] /= _weight


def Trial():
    r = random()
    for x in range(K):
        if (r <= CDF[x]):
            return COSTS[x]

def Exp():
    X = 0
    for _ in range(N):
        X += Trial()
    return X


def Estimate():
    experiments = []
    for _ in range(NumExperiments):
        experiments.append(Exp())
    experiments.sort()  # arrange into ascending order
    Q1 = experiments[int(0.1*NumExperiments)]  #lower 10th percentile
    print(Q1)

Estimate()