Knowing a sample's expected value, what can we say about the distribution's probabilities?


Let's suppose we have a fair coin toss game with two outcomes, A and B, each with 50% probability. If I get outcome A, I earn 1; if I get outcome B, I lose 1. The expected value is then 0.

Now suppose I play some coin toss game (not necessarily that one). I get the same payoffs (+1 or -1), but I don't know what the probabilities of A and B are (I do know they sum to 1).

After playing the game N times I end up with some total result R. I calculate the average payoff per toss in my sample (its empirical "expected value"), E = R/N.

Now, knowing N and E, can I answer the following question: How likely is it that this game has an expected value of 0 and I just got lucky?

I tried to solve it using simulations. For a given sample size N, I run 100k simulations of N fair coin tosses and calculate the "expected value" of each as result / N. Then I count how often these simulations produce an expected value at least as large as the one I observed, and use that fraction as the answer.

Here's the code:

#include <random>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    constexpr int NSTARS = 100;
    constexpr int NSIMS = 100000;
    constexpr int LOSING_INT = 0;
    constexpr int SAMPLE_SIZE = 12;
    constexpr double SAMPLE_EXP = 0.5;

    std::default_random_engine generator;
    std::uniform_int_distribution<int> distribution(LOSING_INT, LOSING_INT + 1);

    int hit_count = 0;

    double p[20] = {};
    std::vector<double> buckets = {-1.1, -.9, -.8, -.7, -.6, -.5, -.4, -.3, -.2, -.1, 0.0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1.1};

    for (int k = 0; k < NSIMS; k++)
    {

        double sum = 0;

        for (int i = 0; i < SAMPLE_SIZE; ++i)
        {

            int number = distribution(generator);
            double add = number == LOSING_INT ? -1.0 : 1.0;

            sum += add;
        }

        double exp = sum / (double)SAMPLE_SIZE;

        if (exp >= SAMPLE_EXP)
            hit_count++;

        for (int i = 0; i < (int)buckets.size() - 1; i++)
        {
            if (exp >= buckets[i] && exp < buckets[i + 1])
            {
                ++p[i];
            }
        }
    }

    for (int i = 0; i < (int)buckets.size() - 1; i++)
    {
        p[i] /= NSIMS;
    }

    for (int i = 0; i < (int)buckets.size() - 1; ++i)
    {
        std::cout << "[" << buckets[i] << ", " << buckets[i + 1] << "): " << p[i] << " \t";
        std::cout << std::string(p[i] * NSTARS, '*') << std::endl;
    }

    std::cout << "For E >= " << SAMPLE_EXP << " p = " << (double)hit_count / (double)NSIMS << std::endl;

    return 0;
}

My questions:

  1. Does this question even make sense? If I have a sample of 3 losses and 9 wins, that adds up to 12 trials with a result of 9 - 3 = 6. The average value is 0.5, and the simulation results suggest there's only a 7% chance that the coin is fair. Can I really say, after just 12 trials, with 93% confidence that this particular game is not a fair coin toss? Or am I missing something here?
  2. If it's correct, then how do I derive this result without using simulations? I learned calculus, algebra, differential equations, etc. at university some 20 years ago, but somehow statistics and probability theory eluded me. What sources (books, online courses) would you recommend for learning this topic so that I can solve this kind of problem using analytical tools?
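
For reference on question 2, here is a minimal sketch of how the simulated number could be computed exactly, assuming the tosses are independent. With W wins out of N tosses, R = W - (N - W) = 2W - N, so E = 0.5 with N = 12 corresponds to W = 9. Under a fair coin, W follows a Binomial(N, 1/2) distribution, and the quantity the simulation estimates is the upper tail P(W >= 9). The function name `binomial_upper_tail` is a hypothetical helper, not something from the program above. Note that this tail is the probability of a result at least this extreme given a fair coin (a one-sided p-value); it is not the probability that the coin is fair.

```cpp
#include <cassert>
#include <cmath>

// Exact upper-tail probability P(X >= k) for X ~ Binomial(n, p).
// Hypothetical helper for illustration; assumes independent tosses and 0 < p < 1.
double binomial_upper_tail(int n, int k, double p)
{
    assert(n >= 0 && p > 0.0 && p < 1.0);

    if (k < 0)
        k = 0; // P(X >= k) is 1 for any k <= 0; summing from 0 gives that

    double tail = 0.0;
    for (int i = k; i <= n; ++i)
    {
        // log of the binomial coefficient C(n, i), via lgamma to avoid overflow for larger n
        double log_coeff = std::lgamma(n + 1.0) - std::lgamma(i + 1.0) - std::lgamma(n - i + 1.0);

        // add P(X = i) = C(n, i) * p^i * (1 - p)^(n - i)
        tail += std::exp(log_coeff + i * std::log(p) + (n - i) * std::log(1.0 - p));
    }
    return tail;
}
```

Calling `binomial_upper_tail(12, 9, 0.5)` sums C(12,9) + C(12,10) + C(12,11) + C(12,12) = 220 + 66 + 12 + 1 = 299 outcomes out of 2^12 = 4096, which is about 0.073 and agrees with the roughly 7% seen in the simulation.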