Bayes' theorem and card colors


This is an expansion/generalization of a previous question I asked here. Some of the simplifications I made in the original question turned out to be oversimplifications, so I'm trying again. The most general case involves millions of cards and hundreds of front/back colors, in case that matters.

I have 10 cards. The color of the front of the card is either pink, yellow, or green. The color of the back of the card is orange, blue, or red. Here’s what I know:

  • 5 of the cards have pink fronts, 3 have yellow fronts, and 2 have green fronts.
  • 5 of the cards have orange backs, 1 has a blue back, and 4 have red backs.
  • Cards with an orange back can have either a yellow or pink front, but not green.
  • Cards with a red back can have either a green or pink front, but not yellow.
  • Cards with a blue back can have any color front.

If you're handed a new card with a pink/yellow/green front, what are the odds that the back is orange/blue/red given that the new card follows the same probability distribution as the 10 old cards?

I've simulated the probabilities, and I'm reasonably confident the simulation is accurate, but I don't know how to get the closed-form solutions. The general idea is to create all possible decks that satisfy the constraints, and then count the number of occurrences of each card type in all decks.

Simulation code (python):

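from itertools import permutations

# Front and back colors of the 10 cards, matching the stated counts.
fronts = ["pink"] * 5 + ["yellow"] * 3 + ["green"] * 2
backs = ["orange"] * 5 + ["blue"] * 1 + ["red"] * 4

# Pair every ordering of the fronts with the fixed list of backs.
# A generator avoids holding all 10! = 3,628,800 candidate decks in memory.
decks = (list(zip(x, backs)) for x in permutations(fronts, len(backs)))
# Keep only decks that contain no forbidden (front, back) combination.
permitted_decks = [d for d in decks if ("green", "orange") not in d and ("yellow", "red") not in d]

# Collapse decks that differ only in the order of their cards.
unique_decks = []
for deck in permitted_decks:
    sorted_deck = sorted(deck)
    if sorted_deck not in unique_decks:
        unique_decks.append(sorted_deck)
print(f"{len(unique_decks)} unique decks")

# counts[front][back] = number of cards of that type across all counted decks.
counts = {f: {} for f in set(fronts)}

for deck in unique_decks:
    for front, back in deck:
        counts[front][back] = counts[front].get(back, 0) + 1

# Print front, back, count, total for that front, and the resulting fraction.
for front, back_counts in counts.items():
    back_total = sum(back_counts.values())
    for back_color, back_count in back_counts.items():
        for quantity in [front, back_color, back_count, back_total, round(back_count/back_total, 10)]:
            print(quantity, end="\t")
        print("")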

>>> 3 unique decks
>>> yellow  blue    1   9   0.1111111111    
>>> yellow  orange  8   9   0.8888888889    
>>> pink    orange  7   15  0.4666666667    
>>> pink    red     7   15  0.4666666667    
>>> pink    blue    1   15  0.0666666667    
>>> green   red     5   6   0.8333333333    
>>> green   blue    1   6   0.1666666667    

Any help would be appreciated. Thanks!

EDITED: If you run the simulation and do the counts for permitted_decks instead of unique_decks, by replacing for deck in unique_decks with for deck in permitted_decks, you get different numbers:

yellow  orange  604800  691200  0.875   (7/8)
yellow  blue    86400   691200  0.125   (1/8)
pink    orange  547200  1152000 0.475   (19/40)
pink    red     518400  1152000 0.45    (9/20)
pink    blue    86400   1152000 0.075   (3/40)
green   red     403200  460800  0.875   (7/8)
green   blue    57600   460800  0.125   (1/8)

So can I infer that even though there are three unique decks, the number of ways I can arrange the sides to create each unique deck is not the same for all decks, so the deck orientations are not equally likely?

Here are the counts:

86400 [('green', 'red'), ('green', 'red'), ('pink', 'orange'), ('pink', 'orange'), ('pink', 'orange'), ('pink', 'red'), ('pink', 'red'), ('yellow', 'blue'), ('yellow', 'orange'), ('yellow', 'orange')]
********************
86400 [('green', 'red'), ('green', 'red'), ('pink', 'blue'), ('pink', 'orange'), ('pink', 'orange'), ('pink', 'red'), ('pink', 'red'), ('yellow', 'orange'), ('yellow', 'orange'), ('yellow', 'orange')]
********************
57600 [('green', 'blue'), ('green', 'red'), ('pink', 'orange'), ('pink', 'orange'), ('pink', 'red'), ('pink', 'red'), ('pink', 'red'), ('yellow', 'orange'), ('yellow', 'orange'), ('yellow', 'orange')]
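The three counts above can also be reached without enumerating permutations. The sketch below lists every 3x3 table of card counts that matches the stated margins and forbidden pairs, then weights each table by the number of labeled arrangements that produce it, (product of front counts!) x (product of back counts!) / (product of cell counts!). The names `tables`, `weight`, and `p_back_given_front` are illustrative, and the brute-force search over cell values is only meant for a small example like this one, not for millions of cards.

```python
from fractions import Fraction
from itertools import product
from math import factorial

front_counts = {"pink": 5, "yellow": 3, "green": 2}
back_counts = {"orange": 5, "blue": 1, "red": 4}
forbidden = {("green", "orange"), ("yellow", "red")}

fronts = list(front_counts)
backs = list(back_counts)
cells = [(f, b) for f in fronts for b in backs]


def tables():
    """Yield every table n[(front, back)] that satisfies both sets of margins."""
    ranges = []
    for f, b in cells:
        # A forbidden cell can only hold 0 cards; others are capped by the margins.
        limit = 0 if (f, b) in forbidden else min(front_counts[f], back_counts[b])
        ranges.append(range(limit + 1))
    for combo in product(*ranges):
        n = dict(zip(cells, combo))
        rows_ok = all(sum(n[f, b] for b in backs) == front_counts[f] for f in fronts)
        cols_ok = all(sum(n[f, b] for f in fronts) == back_counts[b] for b in backs)
        if rows_ok and cols_ok:
            yield n


def weight(n):
    """Number of labeled front-to-back arrangements that produce table n."""
    numerator = 1
    for c in list(front_counts.values()) + list(back_counts.values()):
        numerator *= factorial(c)
    denominator = 1
    for c in n.values():
        denominator *= factorial(c)
    return numerator // denominator


all_tables = list(tables())
total_weight = sum(weight(n) for n in all_tables)


def p_back_given_front(front, back):
    """Weighted share of `front` cards whose back is `back`, as an exact fraction."""
    numerator = sum(weight(n) * n[front, back] for n in all_tables)
    return Fraction(numerator, total_weight * front_counts[front])


for f in fronts:
    for b in backs:
        if (f, b) not in forbidden:
            print(f, b, p_back_given_front(f, b))
```

Under this weighting the tables are not equally likely, which is consistent with the difference between the unique_decks and permitted_decks results.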

There are 1 best solutions below


I don't have the final answer, just estimates.

You have 10 000 000 cards. I will work with proportions, so the total is 100%. You know the proportions of colors for the front face and the back face. The combinations of colors are PO, PB, PR, YO, YB, YR, GO, GB, GR (Pink-Orange, Pink-Blue, ...).

PO + PB + PR = 50%, and so on for each front and back color: we have 6 equations.

At this step, we can invoke Bayes (and only at this step) to get the best estimates for all values, multiplying each front proportion by each back proportion:

PO = 50% * 50% = 25% ; PB = 50% * 10% = 5% ; PR = 20%

YO = 15% ; YB = 3% ; YR = 12%

GO = 10% ; GB = 2% ; GR = 8%
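The nine products above amount to multiplying each front proportion by each back proportion, that is, treating front and back colors as independent. A minimal sketch of that arithmetic, with illustrative names `front_share`, `back_share`, and `joint`:

```python
# Marginal proportions taken from the question (5/3/2 fronts, 5/1/4 backs out of 10).
front_share = {"P": 0.50, "Y": 0.30, "G": 0.20}
back_share = {"O": 0.50, "B": 0.10, "R": 0.40}

# Independence assumption: joint proportion = front proportion * back proportion.
joint = {
    f + b: pf * pb
    for f, pf in front_share.items()
    for b, pb in back_share.items()
}

for name, value in joint.items():
    print(f"{name} = {value:.0%}")
```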

The problem is that we know YR = 0 and GO = 0, so we have to re-evaluate the 7 other estimates.

We are now in a 'Linear Programming' problem. We have some constraints (for example PB + YB + GB = 10% and YO + YB = 30%), so we have 6 equations and 7 unknowns. There are more unknowns than equations, which is good news: we have an infinite number of solutions.

We have to choose the solution that minimizes some indicator. YOU have to build this indicator. We are not in statistics / probability / Bayes; we are in linear programming.
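As one possible choice of indicator (an assumption, not the method used for the estimates in this answer), iterative proportional fitting starts from the independent table with the forbidden cells set to zero and alternately rescales rows and columns until the margins match. Among all tables that satisfy the constraints, it converges to the one closest to the starting table in Kullback-Leibler divergence. The function name `ipf` and the fixed iteration count are illustrative, and the result is not guaranteed to agree with the exact 10-card enumeration in the question.

```python
front_share = {"P": 0.50, "Y": 0.30, "G": 0.20}
back_share = {"O": 0.50, "B": 0.10, "R": 0.40}
forbidden = {("Y", "R"), ("G", "O")}

# Start from the independent table, with forbidden cells forced to zero.
start = {
    (f, b): 0.0 if (f, b) in forbidden else pf * pb
    for f, pf in front_share.items()
    for b, pb in back_share.items()
}


def ipf(table, iterations=2000):
    """Alternately rescale rows and columns so both sets of margins are matched."""
    t = dict(table)
    for _ in range(iterations):
        # Row step: make each front color sum to its target proportion.
        for f, target in front_share.items():
            row_sum = sum(t[f, b] for b in back_share)
            for b in back_share:
                t[f, b] *= target / row_sum
        # Column step: make each back color sum to its target proportion.
        for b, target in back_share.items():
            col_sum = sum(t[f, b] for f in front_share)
            for f in front_share:
                t[f, b] *= target / col_sum
    return t


fitted = ipf(start)
for (f, b), value in fitted.items():
    print(f"{f}{b} = {value:.1%}")
```

Zero cells stay zero because every update is a multiplication, so the structural constraints YR = 0 and GO = 0 are preserved throughout.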

I don't have the right tools to do this, but my estimate gives:

PO = 23.9% ; PB = 3.5% ; PR = 22.6%

YO = 26.2% ; YB = 3.8% ; YR = 0%

GO = 0% ; GB = 2.7% ; GR = 17.3%