Probability Algebra With a Wordle Example

Define $X$ as the proposition "you will win today's Wordle," $A$ as expressing the information returned by the coloring of your first guess (in a vacuum), $B$ as expressing the information returned by the coloring of your second guess (in a vacuum), etc. Thus before the game begins, your probability of victory is $\Pr(X)$, after your first guess it is $\Pr(X\ |\ A)$, after your second guess it is $\Pr(X\ |\ (A \cap B))$, after your third guess it is $\Pr(X\ |\ (A \cap B \cap C))$, etc.
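To make the conditioning concrete, here is a minimal sketch in Python. Everything in it is made-up toy data, not real Wordle mechanics: the six-word universe, the winning set $X$ (the secrets some fixed strategy would guess in time), and the clue events $A$ and $B$ (the secrets consistent with each coloring) are all hypothetical.

```python
from fractions import Fraction

# Hypothetical miniature Wordle: 6 equally likely secret words.
words = ["crane", "crate", "brake", "slate", "stale", "flame"]

# X: secrets a fixed (assumed) strategy would guess in time.
X = {"crane", "crate", "slate"}
# A: secrets consistent with the coloring of the first guess (assumed).
A = {"crane", "crate", "brake", "slate"}
# B: secrets consistent with the coloring of the second guess (assumed).
B = {"crate", "slate", "stale"}

def pr(event, given=None):
    """Pr(event | given) as a ratio of equally likely outcomes."""
    space = set(words) if given is None else given
    return Fraction(len(event & space), len(space))

print(pr(X))         # Pr(X)         = 1/2
print(pr(X, A))      # Pr(X | A)     = 3/4
print(pr(X, A & B))  # Pr(X | A ∩ B) = 1
```

Each clue shrinks the sample space, and the conditional probability is just the winning fraction of what remains.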

I do not claim that $\Pr(X\ |\ (A \cap B))$ can be numerically pinned down by knowing $\Pr(X)$, $\Pr(X\ |\ A)$, and $\Pr(X\ |\ B)$ alone. In addition to how much of a clue each coloring would provide in a vacuum (i.e., treating each guess as if it had been the first guess), there ought to be some additional degree(s) of freedom representing how the information returned by each clue interacts with the information returned by each other clue.

However, despite applying Bayes' Theorem and other basic probability manipulations to $\Pr(X\ |\ (A \cap B))$, and even trying this, I cannot get $\Pr(X\ |\ A)$ or $\Pr(X\ |\ B)$ to appear in the resulting expression at all, which seems to suggest that $\Pr(X\ |\ (A \cap B))$ does not depend on $\Pr(X\ |\ A)$ or $\Pr(X\ |\ B)$ in the ordinary algebraic sense. This strikes me as absurd. It seems obvious that a player should generally be excited to see a clue that is informative in a vacuum (i.e., one with many colored squares), even if in some unfortunate cases a less informative clue would have been more useful when combined with all previous clues.
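The suspicion that $\Pr(X\ |\ A)$ and $\Pr(X\ |\ B)$ alone cannot pin down $\Pr(X\ |\ (A \cap B))$ can be verified with a tiny counterexample. In the sketch below (my own construction, not anything from Wordle), $A$ and $B$ are independent fair coin flips, and two hypothetical "win" rules are compared: winning when the flips disagree versus when they agree. Both rules give the same $\Pr(X)$, $\Pr(X\ |\ A)$, and $\Pr(X\ |\ B)$, yet opposite values of $\Pr(X\ |\ (A \cap B))$.

```python
from fractions import Fraction
from itertools import product

def conditionals(wins):
    """For two independent fair indicator events A = {a=1}, B = {b=1},
    return (Pr(X), Pr(X|A), Pr(X|B), Pr(X|A∩B)) when X = {outcome in wins}."""
    space = list(product([0, 1], repeat=2))  # four equally likely atoms (a, b)
    def pr_given(cond):
        sub = [w for w in space if cond(w)]
        return Fraction(sum(1 for w in sub if w in wins), len(sub))
    return (pr_given(lambda w: True),
            pr_given(lambda w: w[0] == 1),
            pr_given(lambda w: w[1] == 1),
            pr_given(lambda w: w == (1, 1)))

# Rule 1: win exactly when the flips disagree (X = A xor B).
xor_wins = {(0, 1), (1, 0)}
# Rule 2: win exactly when the flips agree (X = A xnor B).
xnor_wins = {(0, 0), (1, 1)}

# First three entries agree (all 1/2); the last is 0 versus 1.
print(conditionals(xor_wins))
print(conditionals(xnor_wins))
```

So no formula in $\Pr(X)$, $\Pr(X\ |\ A)$, and $\Pr(X\ |\ B)$ alone can recover $\Pr(X\ |\ (A \cap B))$; the interaction between the clues is a genuine extra degree of freedom, which is consistent with the conditionals refusing to appear in the algebra.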

Is this intuition correct? If so, how can I manipulate $\Pr(X\ |\ (A \cap B))$ in such a way that its relation to $\Pr(X\ |\ A)$ and $\Pr(X\ |\ B)$ is made clear? I assume that information theory is probably a better framework for modeling this, but I don't know it yet, and so I am trying to stick to the tools of basic probability theory. However, a pointer to which topics in information theory are relevant would also be a useful aside, as I would like to learn more about this type of problem for modeling more complex contexts, such as the information returned by multiple indicators in technical analysis.


Best answer

I don't know what Wordle is. But there is the identity $P(X \mid A \cap B) = P_{A}(X \mid B)$, where $P_{A}$ is my made-up notation for the probability measure defined by $P_{A}(S) = P(S \mid A) = P(S \cap A)/P(A)$. The identity follows directly from the definition of conditional probability:
$$P_{A}(X \mid B) = \frac{P_{A}(X \cap B)}{P_{A}(B)} = \frac{P(X \cap B \cap A)/P(A)}{P(B \cap A)/P(A)} = \frac{P(X \cap A \cap B)}{P(A \cap B)} = P(X \mid A \cap B).$$
The identity says that conditioning on events one by one is the same as conditioning on all the events at once. This kind of "updating" of your distribution by conditioning falls under Bayesian inference.
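A quick numerical check of the identity on a made-up finite probability space (the weights and the events $X$, $A$, $B$ below are arbitrary, chosen only so that nothing cancels trivially). The right-hand side computes $P_{A}(X \mid B)$ as $P(X \cap B \mid A)/P(B \mid A)$, which is the same measure-$P_{A}$ quantity written in terms of $P$:

```python
from fractions import Fraction

# An arbitrary finite probability space given by unnormalized masses.
weights = {1: 1, 2: 2, 3: 3, 4: 1, 5: 2, 6: 1}

# Arbitrary events for the check.
X = {1, 2, 3}
A = {2, 3, 4, 5}
B = {3, 5, 6}

def pr(event, given=None):
    """P(event | given) by summing masses; given=None means the whole space."""
    space = set(weights) if given is None else given
    num = sum(weights[w] for w in event & space)
    den = sum(weights[w] for w in space)
    return Fraction(num, den)

# Left side: condition on A and B all at once.
lhs = pr(X, A & B)
# Right side: pass to the measure P_A, then condition on B:
# P_A(X | B) = P_A(X ∩ B) / P_A(B) = P(X ∩ B | A) / P(B | A).
rhs = pr(X & B, A) / pr(B, A)

assert lhs == rhs
print(lhs)  # 3/5
```

Conditioning sequentially and conditioning jointly land on the same number, as the identity promises.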