What is a specific observation of a probability parameter estimate? Programmer trying to understand statistics


I am trying to understand the following statement about a collection of independent and identically distributed Bernoulli random variables.

We have $\theta$ as the probability of success of the Bernoulli random variable. Generally we can think of a specific observation of the probability parameter estimate as a random variable, $\hat{\theta}$, that can be estimated as $\hat{\theta} = \bar{Y} = \frac{1}{n}\sum_{i=1}^n Y_i$, where the $Y_i$ are Bernoulli random variables. We can see that our parameter estimate (a random variable), being equal to the sample mean of the $Y_i$, approximately follows a normal distribution $\mathcal{N}(\theta,\theta(1-\theta)/n)$, and so we can use the z-statistic for confidence intervals for the Bernoulli parameter $\theta$.
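For concreteness, here is a minimal simulation of that statement (assuming NumPy; the true $\theta = 0.3$ and $n = 1000$ are illustrative choices, not from the quoted text). It draws $n$ Bernoulli samples, forms $\hat{\theta}$ as the sample mean, and builds a z-based 95% confidence interval:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.3   # true (normally unknown) success probability, chosen for illustration
n = 1000

y = rng.binomial(1, theta, size=n)   # n i.i.d. Bernoulli(theta) draws
theta_hat = y.mean()                 # point estimate: the sample mean

# 95% CI from the normal approximation N(theta, theta(1-theta)/n),
# with theta_hat plugged in for theta in the standard error
se = np.sqrt(theta_hat * (1 - theta_hat) / n)
ci = (theta_hat - 1.96 * se, theta_hat + 1.96 * se)
print(theta_hat, ci)
```

Running this repeatedly with fresh seeds, the interval covers the true $\theta$ about 95% of the time.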

I am from a programming background and new to statistics. I think I understand that in statistics, parameters refer to populations, unlike in programming where parameters are passed as inputs to a function.

I am having trouble processing several things about the above statement.

  1. What is meant by a "specific observation of the probability parameter estimate"? How does one observe an estimate? Would "observing" mean "calculating" here?

  2. What name do I use to describe the inputs to this normal distribution $\mathcal{N}(\theta,\theta(1-\theta)/n)$ ?

As a programmer I would call them parameters, but I gather that this is wrong in statistics.

  3. I understand that the variance of a binomial distribution is given by $np(1-p)$, and I know that with the CLT we divide the variance by $n$. However, shouldn't that mean that the input to the normal distribution would be just $\theta(1-\theta)$, since the $n$ would cancel out?

  4. Is the $p$ in my understanding of a binomial probability the same as $\theta$ here?

I am understanding that

  • $\hat{\theta}$ means "a point estimate of $\theta$".
  • $\bar{Y}$ means "the sample mean of the $Y_i$".

[Update]

From River's answer I see that in $\frac{1}{n}\sum_{i=1}^n Y_i$ the fraction is causing a scaling, whereas to me it looks like ordinary multiplication. Is there any special notation to differentiate scaling and/or translations from ordinary multiplication and addition operations?
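For what it's worth, in code the two coincide: the "scaling" is exactly an ordinary multiplication by $\frac{1}{n}$, and it reproduces the sample mean. A quick check (assuming NumPy; the sample values are made up):

```python
import numpy as np

y = np.array([1, 0, 1, 1])   # a made-up observed sequence of Bernoulli outcomes
n = len(y)

# "scaling by 1/n" is plain multiplication of the sum by the scalar 1/n
scaled = (1 / n) * y.sum()
print(scaled, np.mean(y))    # both are the sample mean
```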


BEST ANSWER
  1. What is meant by a "specific observation of the probability parameter estimate"? How does one observe an estimate? Would "observing" mean "calculating" here?

Let's take a specific example. Suppose a pollster wants to figure out the true percentage $\theta$ or $p$ of voters in their country who approve of Polly Politician. They pick out a series of random people, Voter 1, Voter 2, Voter 3, ..., Voter $n$ and ask them "Do you approve of Polly? (Y/N)". The responses they get are the random variables $Y_1, Y_2, Y_3, ..., Y_n$, where the value of $Y_i = 0$ if the voter does not approve, and $Y_i = 1$ if the voter does approve. If the sample is a) small enough compared to the entire population, and b) randomly enough selected from the population, then we may assume that $Y_i$ are approximately i.i.d. Bernoulli with $P(Y_i = 1) = \theta$.

A "specific observation" would be the actual observed sequence of responses (or corresponding $0$'s and $1$'s) gotten from Voters 1 through $n$ here when the pollster asked their opinion. We can loosely speak of any statistic based on/calculated from these observations, such as the sample mean $\hat{\theta}$ which we are using to estimate the true population parameter $\theta$, as also having been "observed" when the pollster performed their experiment.

  2. What name do I use to describe the inputs to this normal distribution $\mathcal{N}(\theta,\theta(1-\theta)/n)$?

You would still call $\theta$ and $\theta(1 - \theta)/n$ "parameters" of the distribution, statistically.
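Python's standard library makes the same naming choice: `statistics.NormalDist` is parameterized by `mu` and `sigma`. Note that `sigma` is the standard deviation, not the variance, so we pass $\sqrt{\theta(1-\theta)/n}$. The values $\theta = 0.3$, $n = 1000$ below are illustrative:

```python
from math import sqrt
from statistics import NormalDist

theta, n = 0.3, 1000   # illustrative values

# NormalDist takes mu (the mean) and sigma (the standard deviation)
dist = NormalDist(mu=theta, sigma=sqrt(theta * (1 - theta) / n))
print(dist.mean, dist.variance)   # mean = theta, variance = theta(1-theta)/n
```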

  3. I understand that the variance of a binomial distribution is given by $np(1-p)$, and I know that with the CLT we divide the variance by $n$. However, shouldn't that mean that the input to the normal distribution would be just $\theta(1-\theta)$, since the $n$ would cancel out?

$n\hat{\theta} = Y_1 + ... + Y_n$ is binomial with mean $n\theta$ and variance $n\theta(1-\theta)$, which means it's approximated by a normal $X_n$ with mean $n \theta$ and variance $n \theta(1-\theta)$:

$$n\hat{\theta} \approx X_n, \text{ where } X_n \sim \mathcal{N} (n\theta, n\theta(1-\theta)).$$

Dividing by $n$ to get the sample mean $\hat{\theta}$ gives us the approximation

$$\hat{\theta} \approx \frac{1}{n} X_n,$$

and $\frac{1}{n} X_n$ will still be normal, and its mean has been divided by $n$, but its variance has been divided by $n^2$ (reason: variance is average squared distance from the mean; if we divide all the distances by $n$, we divide the squared distances by $n^2$). So our normal approximation $\frac{1}{n} X_n$ to $\hat{\theta}$ has distribution $\frac{1}{n} X_n \sim \mathcal{N}(\theta,\theta(1-\theta)/n)$.
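A small Monte Carlo sketch makes the $n^2$ scaling visible (assuming NumPy; $\theta = 0.3$, $n = 100$, and the number of trials are illustrative): it runs many experiments of $n$ Bernoulli draws each and compares the empirical variance of $\hat{\theta}$ against $\theta(1-\theta)/n$:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, trials = 0.3, 100, 20000   # illustrative choices

# Each row is one experiment of n Bernoulli draws;
# each row mean is one realization of the estimator theta_hat
samples = rng.binomial(1, theta, size=(trials, n))
theta_hats = samples.mean(axis=1)

print(theta_hats.var())          # empirical variance of theta_hat
print(theta * (1 - theta) / n)   # theoretical theta(1-theta)/n = 0.0021
```

The two printed values agree closely, confirming that the variance shrinks by $n^2$ relative to the binomial sum's $n\theta(1-\theta)$, not by $n$.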

  4. Is the $p$ in my understanding of a binomial probability the same as $\theta$ here?

Yep, different notation for the same thing.

ANOTHER ANSWER
  1. I don't think "specific observation" is a very good term here. Usually an observation (of a random variable) means a concrete value of that variable obtained from an experiment.

  2. They are named parameters, too (more specifically, the mean and variance). Generally, in mathematics (including statistics), a parameter is something we use to pick out a concrete object from some set. For example, a linear function can be defined by two parameters: $f(x) = ax + b$. When you choose specific values, say $a = 2$ and $b = 3$, you get the concrete linear function $f(x) = 2x + 3$. Sometimes in such cases we say "the function $f$ is parameterized by $a$ and $b$".

  3. When we divide a variable by $n$, its variance is divided by $n^2$. The CLT says that if we divide our binomial variable by $\sqrt{n}$, we get approximately the normal distribution $\mathcal{N}(\sqrt{n}\,\theta, \theta(1-\theta))$. But we divide it by $n$, not by $\sqrt{n}$, so we divide our approximately normal variable by $\sqrt{n}$ once more and get approximately $\mathcal{N}\left(\theta, \frac{\theta(1-\theta)}{n}\right)$.

  4. Yes, it is.
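The parameterized linear function from point 2 can be sketched as a Python closure (the function name `make_linear` is just an illustrative choice):

```python
def make_linear(a, b):
    """Return the concrete linear function f(x) = a*x + b,
    parameterized by a and b."""
    def f(x):
        return a * x + b
    return f

f = make_linear(2, 3)   # choose the specific parameters a = 2, b = 3
print(f(1))             # f(x) = 2x + 3, so f(1) = 5
```

Here `a` and `b` play the same role as $\theta$ and $\theta(1-\theta)/n$ do for the normal distribution: fixing them selects one concrete member of the family.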