Why is this simpler analysis for the check associativity problem incorrect?

30 Views Asked by At

The check associativity problem is:

Input. A binary operator $*$ on a set $X$ of size $n$.

Output. "Yes" if $*$ is associative (i.e., $\forall{i, j, k \in X}, (i*j)*k = i*(j*k)$); "No" otherwise.

The randomized algorithm given below is due to [RS96].

  1. Uniformly at random, pick 3 sets $R, S, T$ from the powerset $2^X$.
  2. if $(R * S) * T \neq R * (S * T)$: output "No"
  3. else output "Yes" (with one-sided error).

If the algorithm outputs "No", it is always correct. We want to bound the probability that the algorithm makes a mistake if $X$ is non-associative.

If $X$ is non-associative then there must exist at least one triple $i', j', k'$ such that $(i'*j')*k' \neq i'*(j'*k')$.

We want to bound the probability of the event that $X$ is non-associative but we output "Yes". Formally, we want to prove the following claim:

Claim. If $*$ is not associative, then at least 1/8 of the possible triples $R$, $S$, $T$ are witnesses, i.e., $$\Pr[(R * S) * T = R * (S * T)] \leq \frac{7}{8}.$$

I am writing out each claim of the analysis so it is easier to identify my mistake:

  1. We make a mistake if we happen to pick triples $R$, $S$, and $T$ such that $i' \notin R$, $j' \notin S$, OR $k' \notin T$.

  2. If we pick a set u.a.r. from $2^X$ then the probability that any fixed element is in that set is 1/2. (Why? This is because each element is in exactly half the sets in $2^X$.)

  3. $\Pr[(R * S) * T = R * (S * T)] \leq \Pr[i' \notin R \; \lor \; j' \notin S \; \lor \; k' \notin T]$. (Inequality is because there could be other non-associative triples.)

  4. Then: $$ \begin{align} \Pr[i' \notin R \; \lor \; j' \notin S \; \lor \; k' \notin T] &= 1 - \Pr[i' \in R, j' \in S, k' \in T] \\ &= 1 - \Pr[i' \in R] \cdot \Pr[j' \in S] \cdot \Pr[k' \in T] \\ &= 1 - \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} \\ &= \frac{7}{8} \end{align} $$ where the second step is because $R$, $S$, and $T$ are selected independently.

The more standard analysis is given here. Can anyone please help me understand where I went wrong?

[RS96] S. Rajagopalan and L. Schulman, "Verifying identities." Proceedings of 37th IEEE Symposium on Foundations of Computer Science (FOCS), 1996, pp. 612–616.