Finding the Recurrence Relation from a python script

153 Views Asked by At

I am trying to find an upper bound for the following experiment:

  1. For each participant, we perform 100 trials.
  2. Before Each trial, I get to see the history of the current participant's choices and the rewards I gave for each trial.
  3. There exist two different alternatives the participant must choose from.
  4. If he chooses the alternative where I put the reward, he gets it. Otherwise, he doesn't see where I put the reward (If at all), meaning he only sees what he chose.
  5. This goes on for all 100 Trials.

I have constraints upon me, and I must give a total of 25 rewards for each alternative ( So a total of 50 rewards over 100 trials). I'm trying to bound the number of different "reward policies" that exist, now we must consider that each policy depends on the participant's previous choices. I wrote a python program that will probably finish running at the heat death of the universe, and I was wondering if there might be a smarter combinatoric way to solve this issue probably making use of some recurrence relation.

@lru_cache(maxsize=None, typed=False)
def num_policies(t, r1, r2, history):
    # Base case: no more trials
    if t == 0:
        return 1 if r1 == r2 == 0 else 0

    # Recursive case: calculate the number of policies depending on the participant's choice
    count = 0
    # If the participant chooses alternative 1
    if r1 > 0:
        count += num_policies(t - 1, r1 - 1, r2, history * 10 + 1)  # if you give a reward
    count += num_policies(t - 1, r1, r2, history * 10 + 1)  # if you don't give a reward

    # If the participant chooses alternative 2
    if r2 > 0:
        count += num_policies(t - 1, r1, r2 - 1, history * 10)  # if you give a reward
    count += num_policies(t - 1, r1, r2, history * 10)  # if you don't give a reward
    return count


print(timeit.timeit(lambda: num_policies(100, 25, 25, 0), number=1))

I would really appreciate ideas I can try out, or if I missed something I would be glad to know.

P.S I know this is code in MathExchange, this is not a mistake :)

Solution

As pointed below the correct answer is: 14359430228704479165844944263707579471282278514667959615488

But looking this number of on OEIS gave me some interesting insight, It appears in https://oeis.org/A051288, for n=100, and k=25. Described by the formula : $$T(n, k) = binomial(n, 2\cdot k)\cdot2^{n-2\cdot k}\cdot binomial(2\cdot k, k)$$

If anyone has some intuition for why this is the case I'd love to hear.

1

There are 1 best solutions below

0
On BEST ANSWER

Here is the pseudocode of the function you defined, optimised using dynamic programming:


def num_policies(t, r1, r2, h, dp[][][]):
    
    if t == 0:
        return 1 if r1 == r2 == 0 else 0
    
    if dp[t][r1][r2] != -1:
        return dp[t][r1][r2];
        
    count = 0;
    
    if r1 > 0:
        count += num_policies(t - 1, r1 - 1, r2, h * 10 + 1, dp);
    
    count += num_policies(t - 1, r1, r2, h * 10 + 1, dp);
    
    if r2 > 0:
        count += num_policies(t - 1, r1, r2 - 1, h * 10, dp);
    
    count += num_policies(t - 1, r1, r2, h * 10, dp);
    
    dp[t][r1][r2] = count;
    
    return count;

We use a 3D array dp[][][] to store the results of each function call to avoid repeating computations. It is worth mentioning that the parameter $h$ is not used at all in the function definition and so has no impact on the results computed. Using this fact we have that the results depend only on $t,r_1,r_2$, which is why we use a 3D array for memoization.

With the function above we compute the value count to be $14359430228704479165844944263707579471282278514667959615488$ with input $t = 100$, $r_1 = 25$, $r_2 = 25$, $h = 0$.