My book is introducing the need for Chebyshev's Inequality in the following way, and I'm confused at some of the constructs used here.
Suppose we have a biased coin, but we don’t know what the bias is. To estimate the bias, we toss the coin $n$ times and count how many Heads we observe. Then our estimate of the bias is given by $\hat{p} = \frac{1}{n}S_n$ , where $S_n$ is the number of Heads in the n tosses. Is this a good estimate? Let $p$ denote the true bias of the coin, which is unknown to us. Since $E(S_n) = np$, we see that the estimator $\hat{p}$ has the correct expected value: $E(\hat{p})=\frac{1}{n}E(S_n) = p$. This means when $n$ is sufficiently large, we can expect $\hat{p}$ to be very close to $p$; this is a manifestation of the Law of Large Numbers.
Starting from the beginning, this is my understanding:
$\hat{p} = \frac{1}{n}S_n$ comes from the fact that $n\hat{p} = S_n$
$E(S_n) = np$ because...
Let $X$ be the number of heads in $n$ coin tosses. Let $X_i$ be $1$ if the $i$-th toss is heads and $0$ otherwise. Then $E[X_i] = \Pr[X_i = 1] = p$. By linearity of expectation, $E(X) = np$.
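To convince myself of this, I ran a quick simulation (not from the book; the values $p = 0.3$ and $n = 100$ are arbitrary choices): averaging the head counts over many repeated experiments should land near $np$.

```python
import random

# Sketch: empirically check E(S_n) = n*p for a biased coin.
# p = 0.3 and n = 100 are made-up example values, not from the book.
random.seed(0)

p, n = 0.3, 100       # assumed true bias, tosses per experiment
trials = 10_000       # number of repeated experiments

total_heads = 0
for _ in range(trials):
    # S_n: number of heads in n tosses of a coin with bias p
    total_heads += sum(1 for _ in range(n) if random.random() < p)

avg_heads = total_heads / trials
print(avg_heads)  # should be close to n*p = 30
```

Each experiment's $S_n$ fluctuates around $np$, but the average over many experiments settles near $np = 30$, matching the linearity-of-expectation argument.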
The part I'm confused about is the following line:
...we see that the estimator $\hat{p}$ has the correct expected value: $E(\hat{p})=\frac{1}{n}E(S_n) = p$.
So they are definitely treating $\hat{p}$ as a random variable, and it apparently has a distribution. This sort of makes sense to me, since $\hat{p}$ is an estimate, so it can take on values $\hat{p} \in [0, 1]$. Apart from that, I'm very confused about the right-hand side of that statement. Any clarification would be much appreciated.
EDIT: After reading my post, I realized the same logic I applied to the initial definition of $\hat{p}$ could be of use. Does this make sense?
$nE(\hat{p}) = E(S_n)$ ? If so, where does $p$ come from?
Your basic concepts in statistics are not yet clear. Let me try to help you.
$p$ is a parameter, which is unknown to you. You want to estimate it, i.e. have some idea about $p$. You thus do the following experiment: you take a coin whose probability of landing Head is $p$ and you toss the coin $n$ times. Let $X_i=1$ if the $i$-th toss is Head, and $X_i=0$ if $i$-th toss is Tail.
You correctly noted that $E(X_i)=P(X_i=1)=p$.
Now, intuitively, you believe that the fraction of heads would give you a good estimate of $p$. So you take the estimator $\hat{p}=\dfrac{S_n}{n}$, where $S_n=X_1+X_2+\dots+X_n$.
Since $X_i$ are random, your $S_n$ is random, and thus your $\hat{p}$ is also random. You now want to calculate its expected value.
$E(\hat{p})=E\left(\dfrac{S_n}{n}\right)=\dfrac{1}{n}E(S_n)=\dfrac{1}{n}E\left(\sum_{i=1}^n X_i\right)=\dfrac{1}{n}\sum_{i=1}^n E(X_i)=\dfrac{1}{n}\cdot np=p$
So $\hat{p}$ is an unbiased estimator of $p$. Your $p$ is non-random, but your $\hat{p}$ is random. You observe $\hat{p}$ from the data, not $p$.
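You can see unbiasedness in a simulation sketch (the values $p = 0.3$ and $n = 50$ here are arbitrary, not from the book): each individual $\hat{p}$ bounces around, but averaging $\hat{p}$ over many repeated experiments recovers $p$.

```python
import random

# Sketch: p_hat = S_n / n is random, but unbiased, so its average over
# many repeated experiments should approach the true p.
# p = 0.3 and n = 50 are assumed example values.
random.seed(1)

p, n = 0.3, 50
experiments = 20_000

estimates = []
for _ in range(experiments):
    s_n = sum(1 for _ in range(n) if random.random() < p)  # heads in n tosses
    estimates.append(s_n / n)                              # one realization of p_hat

mean_estimate = sum(estimates) / len(estimates)
print(mean_estimate)  # close to p = 0.3, though individual estimates vary
```

Note that `estimates` contains many different values, which is exactly the point: $\hat{p}$ has a distribution, and $E(\hat{p}) = p$ is a statement about the center of that distribution, not about any single observed estimate.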