I have a repeated experiment with two outcomes, like a coin toss, but the probability of heads may change from toss to toss; the tosses themselves are independent. Typically, the probability stays the same over a stretch of tosses and then changes at some point.
What I want to do now is apply a hypothesis test to find out whether the results of, say, 1000 tosses align with the expected outcome. That is, the null hypothesis would be that the coin has the probabilities $p_{1} = 0.5, p_2 = 0.6, p_3 = 0.4, \ldots, p_{1000} = 0.1$ for heads, tested at a significance level of 1%; the alternative hypothesis is that the probabilities are different.
This is trivial, of course, if the probability always stays the same, but for the changing case I really need some help.
Here is an approach that I think will work, although you may need to adjust it depending on how your p's are distributed.
Divide the data into (say) 10 bins, based on the range of $p_i$: $[0, .1], (.1, .2], (.2, .3], \ldots, (.9, 1]$.
For each bin, sum the $p_i$s for the points in that bin. That value is the expected number of heads for the bin.
For each bin, sum the number of heads for the points in the bin. That value is the observed number of heads.
Perform a chi-square goodness of fit test, comparing the observed with the expected values.
Obviously you can tweak this procedure, varying the number of bins or the sizes of the bins.
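A minimal sketch of the procedure above, assuming the hypothesized probabilities and the 0/1 outcomes are available as NumPy arrays (the data here is simulated under the null just for illustration; using both heads and tails counts per bin and taking the degrees of freedom as the number of non-empty bins are my assumptions, not part of the original answer):

```python
# Binned chi-square goodness-of-fit test for tosses with
# varying hypothesized head probabilities p_i.
# Assumption: data simulated under the null for demonstration.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
n = 1000
p = rng.uniform(0.05, 0.95, size=n)    # null-hypothesis probabilities p_i
tosses = rng.random(n) < p             # simulated 0/1 outcomes (1 = heads)

# Step 1: assign each toss to a bin by its hypothesized p_i.
edges = np.linspace(0.0, 1.0, 11)      # 10 equal-width bins
bins = np.clip(np.digitize(p, edges, right=True) - 1, 0, 9)

stat, dof = 0.0, 0
for b in range(10):
    mask = bins == b
    if not mask.any():
        continue                       # skip empty bins
    exp_heads = p[mask].sum()          # Step 2: expected heads in bin
    obs_heads = tosses[mask].sum()     # Step 3: observed heads in bin
    exp_tails = mask.sum() - exp_heads
    obs_tails = mask.sum() - obs_heads
    stat += (obs_heads - exp_heads) ** 2 / exp_heads
    stat += (obs_tails - exp_tails) ** 2 / exp_tails
    dof += 1                           # one df per non-empty bin (assumed)

# Step 4: p-value from the chi-square survival function.
p_value = chi2.sf(stat, dof)
print(f"chi2 = {stat:.2f}, df = {dof}, p = {p_value:.3f}")
```

With real data you would replace the simulated `p` and `tosses` with your own arrays and reject the null at the 1% level if `p_value < 0.01`.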