A previous question on "Distribution of the sum of a multinomial distribution" wasn't quite what I need as they focus on finding specific solutions for small N, or the limiting case "N is large" (>30 ish). In my case, N can take between (say) 5 and 100, so I need some solution that handles (approximately) most cases as a variable N.
Essentially I need an efficient algorithm to estimate the 10th-percentile cost of walking across N tiles, each of which can take K random costs. The idea is that I have a path-finding algorithm that samples reasonable guesses for paths somehow, and continues until it finds a path with a total cost that is in the cheapest 10% of random paths.
So, in short, does anyone know a generic approximation formula to estimate Q1 in the following python (Python 3) code? Or just any tips on how to optimise the code?
from random import random
# Question params: N, {(C_1, PDF_1), (C_2, PDF_2), ..., (C_K, PDF_K)}
# Suppose we run N iid random trials which can only return
# values in the set {C_1, C_2, ..., C_K}.
# Each C_i is a real number such as 1 or 46.285 or -3.92
# K is a fixed positive integer (e.g. K=6).
# We define the random variable X ("total cost") as
# X := Trial_1() + Trial_2() + ... + Trial_N()
# Clearly we have that N*Min(C_i) <= X <= N*Max(C_i)
# Suppose the probabilities for each cost C_i is PDF_i
# Note: 0 < PDF_i < 1, and SUM_i(PDF_i) == 1
# Clearly, the mean of total cost is <X> = N * SUM_i(PDF_i * C_i)
# The question is: **find the lower 10th percentile**
# In other finds, solve Q1 such that Prob(X <= Q1) == 0.1
# Parameters
N = 10 # number of independent trials. e.g. number of dice to throw
COSTS = [1, 12.7, -5, 6, 15, 60]
PDF = [1, 1, 1, 1, 1, 1]
# parameters in the PDF list must all be positive numbers
K = len(COSTS) # number of categories
# e.g. K=6 if we're thinking of sides of a dice
# Sanity checks
assert K == len(PDF), "pdf must give exactly one weight per cost"
for p in PDF:
assert p > 0, "probability-weights must be positive"
NumExperiments = 654321 # a large number
CDF = [] # cumulative probability distributions
for x in range(1, 1+K):
s = sum(PDF[:x])
CDF.append(s)
# This code just normalises the PDF and CDF
_weight = sum(PDF)
for x in range(K):
CDF[x] /= _weight
PDF[x] /= _weight
def Trial():
r = random()
for x in range(K):
if (r <= CDF[x]):
return COSTS[x]
def Exp():
X = 0
for _ in range(N):
X += Trial()
return X
def Estimate():
experiments = []
for _ in range(NumExperiments):
experiments.append(Exp())
experiments.sort() # arrange into ascending order
Q1 = experiments[int(0.1*NumExperiments)] #lower 10th percentile
print(Q1)
Estimate()