I was thinking of the following coin flipping scenario:
You have 2 coins, each with their own probability of being heads vs. tails (neither coin is necessarily fair, and the probabilities for each coin can be different, but both probabilities remain constant in time).
You do N independent trials of coin flips and record the results. However, you later discover that partway through (on the Kth trial, where K is the unknown to be solved for), the coins were switched. It is certain that after all N trials the coins have been switched exactly 1 time (coin 1 -> coin 2 occurs exactly once, and no other switches occur). Also, let's say that the probability of the switch occurring on a given trial is uniformly distributed, so the probability of the switch occurring on trial j is the same as on trial m.
The goal is to estimate on which trial the coins were switched.
This was my thinking:
p1 = probability coin 1 is heads, q1 = probability coin 1 is tails (q1 = 1 - p1); p2 = probability coin 2 is heads, q2 = probability coin 2 is tails (q2 = 1 - p2).
Therefore, you have 2 PMFs. They have different parameters but are both binomial distributions, each parameterized by a success probability and a number of trials. The PMF for coin 1 is: P(x1 = k1) = (n1 choose k1)(p1^k1)(q1^(n1-k1)), where n1 is the number of trials using coin 1, p1 is defined above, and k1 is the number of heads using coin 1. The PMF for coin 2 is analogous. However, we also know that n1 = K-1, since the number of trials using coin 1 is just the number of trials before the coins are switched.
So the problem reduces to finding the index of the split point at which the switch is most likely. A potential approach is brute force: iterate through each candidate index up to the last trial N, each time splitting the experiment into the group before and the group after that index. You could then estimate p1 = Nheads1/n1 and p2 = Nheads2/n2, where Nheads1 is the number of heads in the first group and n1 is the total number of flips using coin 1, and similarly for coin 2. You might then choose the split that creates the biggest difference between the two distributions.
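As a sketch of this brute-force idea (assuming the flips are recorded as a 0/1 list; the function name `split_estimates` and the absolute-difference metric are just placeholders for whichever metric turns out to be appropriate):

```python
def split_estimates(flips):
    """For each candidate split k (first k flips from coin 1, rest from coin 2),
    return (k, p1_hat, p2_hat, |p1_hat - p2_hat|)."""
    n = len(flips)
    out = []
    for k in range(1, n):  # both groups must be non-empty
        p1_hat = sum(flips[:k]) / k          # Nheads1 / n1
        p2_hat = sum(flips[k:]) / (n - k)    # Nheads2 / n2
        out.append((k, p1_hat, p2_hat, abs(p1_hat - p2_hat)))
    return out
```

Picking the split with the largest difference is then `max(split_estimates(flips), key=lambda t: t[3])`.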
But another consideration might be that as time goes on, since the switch probability is uniformly distributed, the CDF increases linearly in steps of 1/N. In other words, after trial M (M<=N), the probability the switch has already happened is M/N. So towards the end of the whole experiment, the switch has probably happened already, whereas at the beginning, you are still probably using coin 1.
I would appreciate a solution or any help on this. In particular: what metric should be used to measure the difference between the two distributions, and how should the switching probability's dependence on time (trial index) be incorporated? Also, please be as explicit as possible about how the uniform switching distribution enters your derivation, because ultimately I would like to generalize this to other switching distributions (e.g. Gaussian). Thanks!
You could use maximum likelihood estimation (MLE). Your observations can be encoded by a non-decreasing sequence $S_n$ of the number of successes after $n$ trials. For ease of notation, I'll assume the switch happens after $K$ trials (so coin 1 is used for trials $1,\dots,K$). Then your likelihood is:
$$L(S\mid p_1,p_2,K)=p_1^{S_K}(1-p_1)^{K-S_K}\,p_2^{S_N-S_K}(1-p_2)^{N-K-(S_N-S_K)}$$ You are looking for parameters $\hat{p_1},\hat{p_2},\hat{K}$ that maximize the above $L$. You correctly noted that $\hat{p_1}=\frac{S_K}{K}$ and $\hat{p_2}=\frac{S_N-S_K}{N-K}$, so you are left with maximizing $L(S\mid K)$, which can be done in a single run through the sequence.
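That single run can be sketched as follows (a minimal illustration, assuming the flips are a 0/1 list; it uses the plug-in estimates $\hat{p_1},\hat{p_2}$ and prefix sums $S_k$, and works in log space to avoid underflow):

```python
import math

def mle_switch(flips):
    """Return (K_hat, best log-likelihood), where coin 1 is used for trials 1..K."""
    n = len(flips)
    s = [0]
    for x in flips:
        s.append(s[-1] + x)            # s[k] = S_k, successes in first k trials

    def bern_ll(h, m):
        # max log-likelihood of m Bernoulli trials with h successes (0 if p_hat is 0 or 1)
        if h == 0 or h == m:
            return 0.0
        p = h / m
        return h * math.log(p) + (m - h) * math.log(1 - p)

    best_k, best_ll = None, -math.inf
    for k in range(1, n):              # both segments must be non-empty
        ll = bern_ll(s[k], k) + bern_ll(s[n] - s[k], n - k)
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k, best_ll
```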
If you have a non-uniform prior on $K$, you need to multiply the above $L$ by $P(K)$ and maximize that instead. Note that the uniform prior $P(K)=1/N$ is constant in $K$, which is exactly why it drops out of the maximization above; a Gaussian prior would instead reweight each candidate $K$ before taking the argmax. It's often convenient to work with $\log L$ instead of $L$.
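A sketch of that prior-weighted (MAP) variant, maximizing $\log L(S\mid K)+\log P(K)$; the `gaussian` prior below is purely illustrative (unnormalized, with an arbitrary center and spread), and an unnormalized log-prior is fine since constants don't change the argmax:

```python
import math

def map_switch(flips, log_prior):
    """Return the K maximizing log L(S|K) + log_prior(K)."""
    n = len(flips)
    s = [0]
    for x in flips:
        s.append(s[-1] + x)            # prefix sums S_k

    def bern_ll(h, m):
        if h == 0 or h == m:
            return 0.0
        p = h / m
        return h * math.log(p) + (m - h) * math.log(1 - p)

    scores = {k: bern_ll(s[k], k) + bern_ll(s[n] - s[k], n - k) + log_prior(k)
              for k in range(1, n)}
    return max(scores, key=scores.get)

# uniform prior: constant log P(K), so the result matches the plain MLE
uniform = lambda k: 0.0
# illustrative Gaussian-shaped prior centered at trial 25 with spread 5 (unnormalized)
gaussian = lambda k: -0.5 * ((k - 25) / 5) ** 2
```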