Finding counterfeit coins

191 Views Asked by At

Suppose I have $N$ rare coins, of which $M \le N$ are counterfeits. I am blind. I ask an oracle who charges me a penny to tell me in yes/no answers whether there is a counterfeit in any group I show her. Is there a strategy to identify all $M$ coins with minimal cost, preferably something better than $O(N)$?

This sounds like a variant of the counterfeit coin problem but I cannot find a good solution.

EDIT: In the case of $M=1$, one obvious solution is on order of $\log_2(N)$ where you number each coin in order, in base $2$. Then test each group by digits wherever the coin has a 1 in that digit. Then the counterfeit should be identified by the oracle readout per digit, in base $2$.

2

There are 2 best solutions below

9
On

For $M=2$ you can do it in $2 \log_2 N$ by the same binary search as for $M=1$. Split the coins into two halves and ask about one half. If you get yes, ask about the other. If you get yes again, you have two binary searches for one coin each. If you get no, you are looking for two coins in $\frac N2$. You save one question if you get a no the first time while you have not separated the pair because you don't have to ask about the second half.

The information theoretic approach would be to try to keep the probability of each answer close to $\frac 12$. If you ask about a group of $k$ coins, the chance of a no answer is $\left( 1-\frac MN\right)^k$. $$\left( 1-\frac MN\right)^k=\frac 12\\ k \log \left( 1-\frac MN\right)=-\log 2\\ k=-\frac {\log 2}{\log \left(1-\frac MN\right)}$$ This does not account for the effects of the coins being discrete, but if you have $10\%$ counterfeits you should ask about groups of $6$ or $7$

My proposed strategy for large $N$ and at least moderate $M$, which I have not proven optimal, would be to compute $k$ from the above and ask about a group of $k$ coins. If you get no, discard the coins, update $N$ and continue. If you get yes, solve the $k$ coins and update both $M,N$ for the next group. Since the group will be small and you are not certain how many counterfeits are in it, it is probably optimal to just ask coin by coin. There is an opportunity to optimize the endgame where $N$ is not large.

0
On

Let $f(n, m)$ be the optimal worst-case cost, $0 \le m \le n$. You already got that $f(n, 1) \sim \log_2(n)$. Specifically, $f(n, 1) = \lceil \log_2(n) \rceil$. Also $f(n, 0) = f(n, n) = 0$ since no guesses will be needed to determine the fakes.

$f(n, n-1)$: Since there is only one real coin, the only way to determine the real one is to go through all coins (except possibly the last one) - no point in splitting piles up. Therefore $f(n, n-1) = n-1$.

$f(4, 2)$: You can either ask about a pile of size $1$ or $2$ (no point in asking about a pile of size $3$ since it is guaranteed to have a counterfeit). If you ask about a pile of $1$, then that reduces to $1+\max(f(3, 1), f(3, 2)) = 1+\max(2, 2) = 3$. If you ask about a group of size $2$ and you get a "no" answer, you are done. If you get a "yes" answer, then you will have to ask $2$ more questions, so $f(4, 2) = 3$.

$f(n, 2)$: Let's ask about $k$ coins, $1 \le k < n$. If "no" is the response, then it becomes $1+f(n-k, 2)$. For the case where "yes" is the response, I will define a helper function, $g(a, b)$, which represents the optimal worst-case cost for where there are $2$ counterfeits total, $a$ is the size of the group of coins that is guaranteed to have at least $1$ counterfeit, $b$ is the size of the group with an unknown number of counterfeits (either $1$ or $0$). Then $f(n, 2) = \min(\max(1+f(n-k, 2), 1+g(k, n-k)) \forall k, 1 \le k < n)$.

$g(a, b)$: $g(a, 0) = f(a, 2)$, $g(1, b) = f(b, 1)$. We have two options - either ask about $k_1 (1 \le k_1 < a)$ from group $a$ or $k_2 (1 \le k_2 \le b)$ from group $b$. If we ask about $k_1$ and get a "yes" response, that becomes $1+g(k_1, a+b-k_1)$. If we get a "no" response, that becomes $1+g(a-k_1, b)$. If we ask $k_2$ and get "yes", that becomes $1+f(a, 1)+f(k_2, 1)$. If we get "no", that becomes $1+g(a,b-k_2)$. Therefore $g(a, b) = \min(\max(1+g(k_1, a+b-k_1), 1+g(a-k_1, b)) \forall k_1, 1 \le k_1 < a, \max(1+f(a, 1)+f(k_2, 1), 1+g(a,b-k_2)) \forall k_2, 1 \le k_2 \le b)$

Using this, I managed to find A200311 in the OEIS. Note that the OEIS entry is shifted by two (i.e. A200311(n+2) = f(n, 2)) and the first entry is different. From that, I found that $$f(n, 2) = \left\lfloor 2\log_2(n-1) + 2\log_2 \left(\frac{7}{6} \right) \right\rfloor +_? 1$$

(I couldn't find the notation for "$x$ or $x+1$", so just made something up - hope it is clear)

What I noticed was that for the first group, you could take anything from $\approx n/4$ to $ \approx n/3$. For example, for $n = 400$, anything from $96 \approx \frac{400}{4}$ to $128 \approx \frac{400}{3}$ was okay for the size of the first guess. Anything else was suboptimal. In this range, the worst-case cost was the same whether the oracle answered "yes" or "no" to that query.

Edit for f(n, m):

I will define a helper function, $g(p_1;p_2;...;p_k, b, c)$, for the optimal worst-case cost where $p_i$ represents the number of piles of size $i$ guaranteed to have at least one coin, $b$ coins not examined, and $c$ total counterfeits in the configuration.

The first query is equivalent to $g$, $$f(n, m) = g(\lbrack \rbrack, n, m)$$

Let's take a look at $g$ now. The base cases are $g(p_1;p_2;...;p_k, b, 1\cdot p_1 + 2\cdot p_2 + ... + k\cdot p_k + b) = 0$ and $g(p_1;p_2;...;p_k, b, p_1+p_2+...+p_k) = \sum_{i=1}^{k} p_i \cdot f(i, 1)$.

For any other time, we have a number of options: from a pile of size $t$ with a known counterfeit, we can ask any query up to size $m_1, 1 \le m_1 < t$. Let's say we pick $m_1$ from $t$. Then with a "yes" response, that makes $p_t$ go down $1$, $p_{m_1}$ go up $1$, and $b$ go up $t-m_1$. With a "no" response, that makes $p_t$ go down $1$ and $p_{t-m_1}$ go up $1$.

From the "$b$" pile, we can ask a query up to size $m_2, 1 \le m_2 \le \min(b, b-c+\sum_{i=1}^{k} k\cdot p_k)$. With a "yes" response, $p_{m_2}$ goes up $1$ and $b$ goes down $m_2$. With a "no" response, $b$ goes down $m_2$.

Using a Python program, I got the correct optimal worst-case cost (code at bottom). What I noticed was that $$f(n, n-k) = n-1$$ for a fixed $k$ after $n$ is sufficiently high. For example for $k = 7$, $f(n, n-7) \not = n-1$ only for $ n \le 10$. More generally, numerical results strongly suggest that $f(n, n-k) = n-1$ if $$n \ge \left\lfloor \frac{3k+1}{2} \right\rfloor$$

Equivalently, $f(n, m) = n-1$ for $$m \ge \left\lceil \frac{n}{3} \right\rceil$$

The Python code is rather slow - it bruteforces every possible branch. I didn't implement alpha-beta pruning as I could have.

def f(n, m, dic1 = {}, dic2 = {}):
    T = (n, m)
    if T in dic1:
        return dic1[T]
    if m == 0:
        dic1[T] = 0
        return 0
    if m == 1:
        dic1[T] = math.ceil(math.log2(n))
        return dic1[T]
    if n == m:
        dic1[T] = 0
        return 0
    elif n == m+1:
        dic1[T] = m
        return m

    dic1[T] = g((), n, m, dic1, dic2)

    return dic1[T]

def g(P, b, c, dic1, dic2):
    #precondition: P[-1] != 0
    #would also speed up computation since more memoization possibility
    T = (P, b, c)
    #T = (tuple, int, int)

    numPiles = sum(P)
    totalCoins = sum((i+1)*P[i] for i in range(len(P)))+b

    if T in dic2:
        return dic2[T]
    if c == numPiles:
        #one coin in each pile
        dic2[T] = sum(P[i]*f(i+1, 1, dic1, dic2) for i in range(len(P)))
        return dic2[T]

    if c == totalCoins:
        #all the remaining coins are counterfeits
        dic2[T] = 0
        return 0

    worstCase = math.inf
    for index in range(len(P)):
        if P[index] != 0:
            #can only ask if there is a pile with that many coins
            t = index+1 #to adjust for 0-based indexing
            for m1 in range(1, min(t-1, totalCoins-c)+1):
                tmpP = P[:t-1]+(P[t-1]-1,)+P[t:]

                firstNewP = tmpP[:m1-1]+(tmpP[m1-1]+1,)+tmpP[m1:]
                secondNewP = tmpP[:t-m1-1]+(tmpP[t-m1-1]+1,)+tmpP[t-m1:]

                while len(firstNewP) > 0 and firstNewP[-1] == 0:
                    #to make sure that the last element is not zero
                    firstNewP = firstNewP[:-1]

                while len(secondNewP) > 0 and secondNewP[-1] == 0:
                    #to make sure that the last element is not zero
                    secondNewP = secondNewP[:-1]

                comp = 1+max(g(firstNewP, b+t-m1, c, dic1, dic2), g(secondNewP, b, c, dic1, dic2))
                if comp < worstCase:
                    worstCase = comp

    for m2 in range(1, min(b, totalCoins-c)+1):
        if len(P) < m2:
            firstNewP = P+(0,)*(m2-len(P))
        else:
            firstNewP = P
        firstNewP = firstNewP[:m2-1]+(firstNewP[m2-1]+1,)+firstNewP[m2:]
        comp = 1+max(g(firstNewP, b-m2, c, dic1, dic2), g(P, b-m2, c, dic1, dic2))
        if comp < worstCase:
            worstCase = comp

    dic2[T] = worstCase
    return worstCase