Theorem: Let $Y=X\beta+\varepsilon$ where $$Y\in\mathcal M_{n\times 1}(\mathbb R),$$ $$X\in \mathcal M_{n\times p}(\mathbb R),$$ $$\beta\in\mathcal M_{p\times 1}(\mathbb R ),$$ and $$\varepsilon\in\mathcal M_{n\times 1}(\mathbb R ).$$
We suppose that $X$ has full rank $p$ and that $$\mathbb E[\varepsilon]=0\quad\text{and}\quad \text{Var}(\varepsilon)=\sigma ^2I.$$ Then the least squares estimator (i.e. $\hat\beta=(X^TX)^{-1}X^TY$) is the best linear unbiased estimator of $\beta$; that is, for any linear unbiased estimator $\tilde\beta$ of $\beta$, it holds that $$\text{Var}(\tilde\beta)-\text{Var}(\hat\beta)\geq 0$$ (the difference is positive semidefinite).
Proof
Let $\tilde\beta$ be a linear unbiased estimator, i.e. $$\tilde\beta=AY\ \ \text{for some }A_{p\times n}\quad\text{and}\quad\mathbb E[\tilde\beta]=\beta\text{ for all }\beta\in\mathbb R ^p.$$
Questions:
1) Why do we require $\mathbb E[\tilde\beta]=\beta$ for all $\beta$? I don't really understand this point. To me $\beta$ is fixed, so saying $\mathbb E[\tilde\beta]=\beta$ for all $\beta$ doesn't really make sense.
2) Actually, what is the difference between the least squares estimator and the maximum likelihood estimator? They are both $\hat\beta=(X^TX)^{-1}X^TY$, so I don't really see, if they are the same, why we give them two different names.
1) The condition $\mathbb{E}[\tilde{\beta}]=\beta$ is just the condition "the estimator is unbiased" in mathematical form. The quantifier "for all $\beta$" says that the estimator must hit the right mean whatever the unknown true value of $\beta$ happens to be; for instance, the constant estimator $\tilde\beta\equiv 0$ satisfies $\mathbb E[\tilde\beta]=\beta$ only when $\beta=0$, so it is not unbiased. Let's say you are considering the least squares estimator, then $$ \begin{align} \mathbb{E}[\hat{\beta}] &= \mathbb{E}[(X^{\rm T}X)^{-1}X^{\rm T}Y]\\ &= \mathbb{E}[(X^{\rm T}X)^{-1}X^{\rm T}(X\beta+\epsilon)]\\ &= \beta + (X^{\rm T}X)^{-1}X^{\rm T}\mathbb{E}[\epsilon]\\ &= \beta, \end{align} $$ and thus the least squares estimator is unbiased. Note that the last step uses the assumption that the noise is zero mean. So not every estimator of the form $\tilde{\beta}=AY+D$ is unbiased.
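If it helps to see this numerically, here is a minimal Monte Carlo sketch (not from the theorem above; the design matrix, true $\beta$, noise level and sample sizes are arbitrary choices of mine). Averaging the least squares estimates over many independent noise draws recovers the true $\beta$, which is exactly what unbiasedness means:

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 50, 3
X = rng.normal(size=(n, p))          # fixed design matrix, full rank p (a.s.)
beta = np.array([1.0, -2.0, 0.5])    # the "unknown" true coefficient vector
sigma = 1.0

n_sims = 10_000
estimates = np.empty((n_sims, p))

for i in range(n_sims):
    eps = rng.normal(0.0, sigma, size=n)          # zero-mean noise, Var = sigma^2 I
    Y = X @ beta + eps
    # least squares estimator (X^T X)^{-1} X^T Y, computed via a linear solve
    estimates[i] = np.linalg.solve(X.T @ X, X.T @ Y)

print("true beta:        ", beta)
print("mean of estimates:", estimates.mean(axis=0))   # close to beta (unbiasedness)
```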
2) Maximum likelihood and least squares are equivalent under certain conditions, namely if you assume the noise $\epsilon$ is Gaussian with covariance $\sigma^2 I$ as above. Change that assumption and they will in general no longer coincide.
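To see the equivalence (and where it breaks), here is a small sketch, assuming NumPy and SciPy are available; the data-generating choices are mine and purely illustrative. Maximizing the Gaussian likelihood in $\beta$ is the same as minimizing the sum of squared residuals, so it reproduces the closed-form least squares solution, whereas the MLE under Laplace noise minimizes the sum of absolute residuals and in general gives a different estimate:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, p = 100, 2
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0])
Y = X @ beta_true + rng.laplace(scale=1.0, size=n)   # deliberately non-Gaussian noise

# Least squares: closed form (X^T X)^{-1} X^T Y
beta_ls = np.linalg.solve(X.T @ X, X.T @ Y)

# Gaussian MLE: maximizing the likelihood <=> minimizing the sum of squared residuals
beta_gauss_mle = minimize(lambda b: np.sum((Y - X @ b) ** 2), np.zeros(p)).x

# Laplace MLE: maximizing the likelihood <=> minimizing the sum of absolute residuals
beta_laplace_mle = minimize(lambda b: np.sum(np.abs(Y - X @ b)), np.zeros(p),
                            method="Nelder-Mead").x

print("least squares: ", beta_ls)
print("Gaussian MLE:  ", beta_gauss_mle)    # matches least squares up to optimizer tolerance
print("Laplace MLE:   ", beta_laplace_mle)  # generally different, because the likelihood differs
```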