I know that the $F$ distribution is used to test $H_0:\sigma _1^2 = \sigma _2^2$.
However, later on in the book, I am also told that the $F$ distribution is used to test $H_0: \mu _1=\mu_2=\mu _3 = \cdots =\mu_n$ in linear regression.
How does that work? How can we have two null hypotheses about different parameters tested under the same distribution? I'm just very confused about this.
First of all, in linear regression the t-test is used to figure out whether an individual coefficient estimate $\hat{\beta_i}$ is statistically significant or not. However, a t-test cannot tell us whether a set of independent variables, taken as a whole, has a partial effect on the dependent variable. That is where the F-test comes in.
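To see the difference concretely, here is a minimal sketch with simulated data (everything below — the variable names, the data-generating process — is illustrative, not from the question). Two highly correlated regressors each look weak in their individual t-tests, yet the joint F-test on both slopes is decisive:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 60

# Simulated data: two nearly collinear regressors that jointly drive y
z = rng.normal(size=n)
x1 = z + 0.05 * rng.normal(size=n)
x2 = z + 0.05 * rng.normal(size=n)
y = 1.0 + x1 + x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])  # design matrix with intercept
k = X.shape[1]

# OLS fit by the normal equations
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta
sigma2 = resid @ resid / (n - k)
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

# Per-coefficient t-tests: each slope tends to look individually weak,
# because x1 and x2 carry nearly the same information
t_stats = beta / se
t_pvals = 2 * stats.t.sf(np.abs(t_stats), df=n - k)

# Joint F-test of H0: beta_1 = beta_2 = 0 (q = 2 restrictions)
r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
F = (r2 / 2) / ((1 - r2) / (n - k))
f_pval = stats.f.sf(F, 2, n - k)

print("t p-values for the slopes:", t_pvals[1:])
print("joint F p-value:", f_pval)
```

The point is not the exact numbers but the contrast: the F-test asks whether the slopes matter *together*, which is exactly the question the individual t-tests cannot answer when the regressors overlap.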
In linear regression, the null hypothesis for the F-test is usually $H_0:\beta_1=\beta_2=\beta_3=\cdots=\beta_k=0$. It can be interpreted as: $X_1, X_2,\ldots,X_k$ as a group have no effect on $Y$ (the dependent variable). The alternative hypothesis is then simply $H_1: H_0$ is not true, i.e. at least one of the $\beta_i$ is nonzero. Rejecting $H_0$ therefore means our set of regressors, taken as a whole, explains at least something about $Y$.
There are various formulas to calculate the F-statistic, but to keep it short we will only go through one:
$$F= \frac{(R_{ur}^2-R_{r}^2)/q}{(1-R_{ur}^2)/df_{ur}}$$

where $ur$ stands for the unrestricted model, $r$ for the restricted model, $q$ is the number of restrictions (the number of coefficients set to zero under $H_0$), and $df_{ur}$ is the degrees of freedom of the unrestricted model.

The simplest way to interpret the F-test is to ask whether the drop in $R^2$ from the unrestricted model to the restricted model is large enough to reject the null hypothesis.

The F-test is also often used when dealing with multicollinearity. A lot of the time we have independent variables that are correlated with each other, which makes the individual t-tests unreliable, so it is difficult to conclude whether the group of variables has a partial effect on $Y$ or not. In that case we use an F-test to check whether the group of (highly correlated) variables is jointly statistically significant.
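As a sanity check, that formula can be computed directly on simulated data. Everything below (the data-generating process, the choice of dropping two regressors) is an illustrative assumption, not part of the question:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200

# Simulated data: y depends on x1 and x2 but not on x3 or x4
x = rng.normal(size=(n, 4))
y = 1.0 + 2.0 * x[:, 0] + 1.5 * x[:, 1] + rng.normal(size=n)

def r_squared(y, X):
    """R^2 of an OLS fit of y on X (X must include an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

ones = np.ones((n, 1))
X_ur = np.hstack([ones, x])            # unrestricted: all four regressors
X_r = np.hstack([ones, x[:, :2]])      # restricted: beta_3 = beta_4 = 0

r2_ur = r_squared(y, X_ur)
r2_r = r_squared(y, X_r)

q = 2                  # number of restrictions
df_ur = n - 4 - 1      # degrees of freedom of the unrestricted model

# The formula from the answer above
F = ((r2_ur - r2_r) / q) / ((1.0 - r2_ur) / df_ur)
p_value = stats.f.sf(F, q, df_ur)

print("F =", F, " p =", p_value)
```

Because the restricted model is nested in the unrestricted one, $R_{r}^2 \le R_{ur}^2$ always holds, so the statistic is never negative; the p-value then comes from the $F(q,\, df_{ur})$ distribution.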