This is an expansion/generalization of a previous question I've asked here. Some of the simplifications I made in the original question oversimplified the problem, so I'm trying again. The most general case involves millions of cards and hundreds of front/back colors, if that's important for any reason.
I have 10 cards. The color of the front of the card is either pink, yellow, or green. The color of the back of the card is orange, blue, or red. Here’s what I know:
- 5 of the cards have pink fronts, 3 have yellow fronts, and 2 have green fronts.
- 5 of the cards have orange backs, 1 has a blue back, and 4 have red backs.
- Cards with an orange back can have either a yellow or pink front, but not green.
- Cards with a red back can have either a green or pink front, but not yellow.
- Cards with a blue back can have any color front.
If you're handed a new card with a pink/yellow/green front, what is the probability that the back is orange, blue, or red, given that the new card follows the same probability distribution as the 10 old cards?
I've simulated the probabilities, and I'm reasonably confident the simulation is accurate, but I don't know how to get the closed-form solutions. The general idea is to create all possible decks that satisfy the constraints, and then count the number of occurrences of each card type in all decks.
Simulation code (python):
from itertools import permutations

fronts = ["pink"] * 5 + ["yellow"] * 3 + ["green"] * 2
backs = ["orange"] * 5 + ["blue"] * 1 + ["red"] * 4
# Pair every ordering of the fronts with the fixed list of backs.
decks = [list(zip(x, backs)) for x in permutations(fronts, len(backs))]
# Keep only decks that contain no forbidden front/back combination.
permitted_decks = [d for d in decks if ("green", "orange") not in d and ("yellow", "red") not in d]
# Collapse decks that consist of the same cards in a different order.
unique_decks = []
for deck in permitted_decks:
    sorted_deck = sorted(deck)
    if sorted_deck not in unique_decks:
        unique_decks.append(sorted_deck)
print(f"{len(unique_decks)} unique decks")
# Count how often each back color appears with each front color.
counts = {f: {} for f in set(fronts)}
for deck in unique_decks:
    for front, back in deck:
        counts[front][back] = counts[front].get(back, 0) + 1
for front, back_counts in counts.items():
    back_total = sum(back_counts.values())
    for back_color, back_count in back_counts.items():
        for quantity in [front, back_color, back_count, back_total, round(back_count/back_total, 10)]:
            print(quantity, end="\t")
        print("")
>>> 3 unique decks
>>> yellow blue 1 9 0.1111111111
>>> yellow orange 8 9 0.8888888889
>>> pink orange 7 15 0.4666666667
>>> pink red 7 15 0.4666666667
>>> pink blue 1 15 0.0666666667
>>> green red 5 6 0.8333333333
>>> green blue 1 6 0.1666666667
Any help would be appreciated. Thanks!
EDITED:
If you run the simulation and do the counts over permitted_decks instead of unique_decks (that is, change the loop header "for deck in unique_decks" to "for deck in permitted_decks"), you get different numbers:
yellow orange 604800 691200 0.875 (7/8)
yellow blue 86400 691200 0.125 (1/8)
pink orange 547200 1152000 0.475 (19/40)
pink red 518400 1152000 0.45 (9/20)
pink blue 86400 1152000 0.075 (3/40)
green red 403200 460800 0.875 (7/8)
green blue 57600 460800 0.125 (1/8)
So can I infer that, even though there are three unique decks, the number of ways to arrange the sides differs from deck to deck, so the three unique decks are not equally likely?
Here is the number of permitted arrangements that produce each unique deck:
86400 [('green', 'red'), ('green', 'red'), ('pink', 'orange'), ('pink', 'orange'), ('pink', 'orange'), ('pink', 'red'), ('pink', 'red'), ('yellow', 'blue'), ('yellow', 'orange'), ('yellow', 'orange')]
********************
86400 [('green', 'red'), ('green', 'red'), ('pink', 'blue'), ('pink', 'orange'), ('pink', 'orange'), ('pink', 'red'), ('pink', 'red'), ('yellow', 'orange'), ('yellow', 'orange'), ('yellow', 'orange')]
********************
57600 [('green', 'blue'), ('green', 'red'), ('pink', 'orange'), ('pink', 'orange'), ('pink', 'red'), ('pink', 'red'), ('pink', 'red'), ('yellow', 'orange'), ('yellow', 'orange'), ('yellow', 'orange')]
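These counts can also be reproduced without enumerating any permutations. The sketch below assumes the same setup as the simulation: the backs stay in a fixed order and permutations() treats same-colored fronts as distinguishable. Under that assumption, a unique deck with n_fb cards of front f and back b arises (product of n_f!) * (product of n_b!) / (product of n_fb!) times. The names arrangements and weighted_conditionals are mine, for illustration only.

```python
from collections import Counter
from fractions import Fraction
from math import factorial, prod

# The three unique decks, written as counts of (front, back) pairs.
unique_decks = [
    Counter({("green", "red"): 2, ("pink", "orange"): 3, ("pink", "red"): 2,
             ("yellow", "blue"): 1, ("yellow", "orange"): 2}),
    Counter({("green", "red"): 2, ("pink", "blue"): 1, ("pink", "orange"): 2,
             ("pink", "red"): 2, ("yellow", "orange"): 3}),
    Counter({("green", "blue"): 1, ("green", "red"): 1, ("pink", "orange"): 2,
             ("pink", "red"): 3, ("yellow", "orange"): 3}),
]

def arrangements(deck):
    """Number of permutations of the fronts that produce this unique deck."""
    front_totals = Counter()
    back_totals = Counter()
    for (front, back), n in deck.items():
        front_totals[front] += n
        back_totals[back] += n
    numerator = (prod(factorial(n) for n in front_totals.values())
                 * prod(factorial(n) for n in back_totals.values()))
    denominator = prod(factorial(n) for n in deck.values())
    return numerator // denominator

def weighted_conditionals(decks):
    """P(back | front), with each unique deck weighted by its arrangement count."""
    pair_weight = Counter()
    front_weight = Counter()
    for deck in decks:
        w = arrangements(deck)
        for (front, back), n in deck.items():
            pair_weight[(front, back)] += w * n
            front_weight[front] += w * n
    return {pair: Fraction(weight, front_weight[pair[0]])
            for pair, weight in pair_weight.items()}

weights = [arrangements(d) for d in unique_decks]
probabilities = weighted_conditionals(unique_decks)
print(weights)
for (front, back), p in sorted(probabilities.items()):
    print(front, back, p)
```

The weights come out as 86400, 86400, and 57600, and the weighted fractions match the permitted_decks table (for example 19/40 for pink/orange and 7/8 for green/red).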
I don't have the final answer, just estimates.
Suppose you have 10 000 000 cards. I will work with proportions, so the total is 100%. You know the proportion of each color for the front face and for the back face. The combinations of colors are PO, PB, PR, YO, YB, YR, GO, GB, GR (= Pink-Orange, Pink-Blue, ...).
PO + PB + PR = 50%, and so on: we have 6 equations.
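At this step, we can invoke Bayes (and only at this step) to get the best estimates for all values. In effect this treats front and back as independent, so each value is the product of its front proportion and its back proportion:
PO = 50% * 50% = 25% ; PB = 50% * 10% = 5% ; PR = 50% * 40% = 20%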
YO = 15% ; YB = 3% ; YR = 12%
GO = 10% ; GB = 2% ; GR = 8%
The problem is that we know YR = 0 and GO = 0, so we have to re-evaluate the 7 other estimates.
We are now in a 'linear programming' problem. We have some constraints (for example PB + YB + GB = 10% and YO + YB = 30%), which gives 6 equations in 7 unknowns. Only 5 of the 6 equations are independent, because the front totals and the back totals both add up to 100%. With more unknowns than equations, the good news is that we have infinitely many solutions.
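For the 10-card version of the question, the same constraint system can be solved by brute force over whole-number tables. The sketch below is only an illustration (the function name feasible_tables and the dictionary layout are my own choices). It lists every table of non-negative integers that matches the front and back totals, with the two forbidden combinations left out.

```python
from itertools import product

fronts = {"pink": 5, "yellow": 3, "green": 2}
backs = {"orange": 5, "blue": 1, "red": 4}
forbidden = {("green", "orange"), ("yellow", "red")}

def feasible_tables(fronts, backs, forbidden):
    """Yield every integer table of (front, back) counts that meets all totals."""
    cells = [(f, b) for f in fronts for b in backs if (f, b) not in forbidden]
    # A cell can never exceed the smaller of its front total and back total.
    ranges = [range(min(fronts[f], backs[b]) + 1) for f, b in cells]
    for values in product(*ranges):
        table = dict(zip(cells, values))
        rows_ok = all(
            sum(n for (f, _), n in table.items() if f == front) == total
            for front, total in fronts.items()
        )
        cols_ok = all(
            sum(n for (_, b), n in table.items() if b == back) == total
            for back, total in backs.items()
        )
        if rows_ok and cols_ok:
            yield table

tables = list(feasible_tables(fronts, backs, forbidden))
print(len(tables), "feasible tables")
for table in tables:
    print({cell: n for cell, n in table.items() if n})
```

With the 10-card totals this finds three tables, one for each unique deck in the question's simulation. The search grows quickly with the number of colors, so it is meant only to make the constraint system concrete, not to scale to millions of cards.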
We have to choose the solution that minimizes some indicator. YOU have to build this indicator. We are not in statistics, probability, or Bayes; we are in linear programming.
I don't have the right tools to do this, but my estimate gives:
PO = 23.9% ; PB = 3.5% ; PR = 22.6%
YO = 26.2% ; YB = 3.8% ; YR = 0%
GO = 0% ; GB = 2.7% ; GR = 17.3%
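One possible indicator, and this is my assumption rather than something derived above, is to stay as close as possible (in the Kullback-Leibler sense) to the product table while keeping YR and GO at zero. Iterative proportional fitting does this by alternately rescaling rows and columns until both sets of totals match. A minimal sketch in plain Python follows; the function name ipf and the iteration count are illustrative choices.

```python
front_share = {"pink": 0.5, "yellow": 0.3, "green": 0.2}
back_share = {"orange": 0.5, "blue": 0.1, "red": 0.4}
forbidden = {("yellow", "red"), ("green", "orange")}

def ipf(front_share, back_share, forbidden, iterations=500):
    """Iterative proportional fitting with forbidden cells pinned at zero."""
    # Start from the product of the marginals, with forbidden cells set to 0.
    table = {
        (f, b): 0.0 if (f, b) in forbidden else pf * pb
        for f, pf in front_share.items()
        for b, pb in back_share.items()
    }
    for _ in range(iterations):
        # Rescale each row so the front proportions match.
        for f, target in front_share.items():
            row_sum = sum(table[(f, b)] for b in back_share)
            for b in back_share:
                table[(f, b)] *= target / row_sum
        # Rescale each column so the back proportions match.
        for b, target in back_share.items():
            col_sum = sum(table[(f, b)] for f in front_share)
            for f in front_share:
                table[(f, b)] *= target / col_sum
    return table

table = ipf(front_share, back_share, forbidden)
for f in front_share:
    print(f, {b: round(100 * table[(f, b)], 2) for b in back_share})
```

For these proportions the fitted table lands close to the estimate listed above (for example YB near 3.8% and GB near 2.7%). A different indicator would generally give a different table.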