I am trying to find an upper bound for the following experiment:
- For each participant, we perform 100 trials.
- Before Each trial, I get to see the history of the current participant's choices and the rewards I gave for each trial.
- There exist two different alternatives the participant must choose from.
- If he chooses the alternative where I put the reward, he gets it. Otherwise, he doesn't see where I put the reward (If at all), meaning he only sees what he chose.
- This goes on for all 100 Trials.
I have constraints upon me, and I must give a total of 25 rewards for each alternative ( So a total of 50 rewards over 100 trials). I'm trying to bound the number of different "reward policies" that exist, now we must consider that each policy depends on the participant's previous choices. I wrote a python program that will probably finish running at the heat death of the universe, and I was wondering if there might be a smarter combinatoric way to solve this issue probably making use of some recurrence relation.
@lru_cache(maxsize=None, typed=False)
def num_policies(t, r1, r2, history):
# Base case: no more trials
if t == 0:
return 1 if r1 == r2 == 0 else 0
# Recursive case: calculate the number of policies depending on the participant's choice
count = 0
# If the participant chooses alternative 1
if r1 > 0:
count += num_policies(t - 1, r1 - 1, r2, history * 10 + 1) # if you give a reward
count += num_policies(t - 1, r1, r2, history * 10 + 1) # if you don't give a reward
# If the participant chooses alternative 2
if r2 > 0:
count += num_policies(t - 1, r1, r2 - 1, history * 10) # if you give a reward
count += num_policies(t - 1, r1, r2, history * 10) # if you don't give a reward
return count
print(timeit.timeit(lambda: num_policies(100, 25, 25, 0), number=1))
I would really appreciate ideas I can try out, or if I missed something I would be glad to know.
P.S I know this is code in MathExchange, this is not a mistake :)
Solution
As pointed below the correct answer is: 14359430228704479165844944263707579471282278514667959615488
But looking this number of on OEIS gave me some interesting insight, It appears in https://oeis.org/A051288, for n=100, and k=25. Described by the formula : $$T(n, k) = binomial(n, 2\cdot k)\cdot2^{n-2\cdot k}\cdot binomial(2\cdot k, k)$$
If anyone has some intuition for why this is the case I'd love to hear.
Here is the pseudocode of the function you defined, optimised using dynamic programming:
We use a 3D array
dp[][][]to store the results of each function call to avoid repeating computations. It is worth mentioning that the parameter $h$ is not used at all in the function definition and so has no impact on the results computed. Using this fact we have that the results depend only on $t,r_1,r_2$, which is why we use a 3D array for memoization.With the function above we compute the value
countto be $14359430228704479165844944263707579471282278514667959615488$ with input $t = 100$, $r_1 = 25$, $r_2 = 25$, $h = 0$.