I've started studying maximum likelihood estimation for regression. I am trying to understand the process step by step, and this is how I've structured the material I've read:
The population regression model is $Y=f(x)+ε$, where $f(x)$ depends on parameters $θ_1,θ_2,…,θ_k$;
Let's assume that the error term $ε$ is a vector of i.i.d. random variables with pdf $g(ε)$, zero mean $\big(E(ε)=E(Y-f(x))=E(y\mid x)-E(y\mid x)=0\big)$ and unknown variance $σ^2$ [for instance, $ε \sim N(0,σ^2)$];
The pdf of $Y$ (i.e. of $y\mid x$) is $g(Y)$, which has the same form as $g(ε)$ because $Y$ depends on $ε$ only through an additive shift. The only difference is the mean: $0$ for $g(ε)$ and $f(x)$ for $g(Y)$;
From the general population a sample consisting of $n$ pairs of $(y_i;x_i)$ is collected;
Using MLE, we want to find estimates $\hatθ_1,\hatθ_2,…,\hatθ_k$ of $θ_1,θ_2,…,θ_k$, and an estimate $\hatσ^2$ of $σ^2$.
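As a concrete numerical sketch of steps 1-5 (the linear choice $f(x)=θ_1+θ_2 x$, the parameter values, and $n$ below are my own hypothetical choices, not part of the setup above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population model: f(x) = theta1 + theta2 * x
theta1, theta2 = 2.0, -0.5   # true parameters (unknown in practice)
sigma2 = 0.25                # true error variance (unknown in practice)

n = 200
x = rng.uniform(0.0, 10.0, size=n)
eps = rng.normal(0.0, np.sqrt(sigma2), size=n)  # i.i.d. N(0, sigma^2) errors
y = theta1 + theta2 * x + eps                   # observed sample (y_i, x_i)
```

The sample $(y_i, x_i)$ is all the estimator gets to see; the true $θ$'s and $σ^2$ are only used here to generate the data.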
$$L(y_i;θ,σ^2 )=\prod\limits_{i=1}^{n} g(y_i) \tag{1}$$
Now, let's assume that $ε \sim N(0,σ^2)$, which means that $Y \sim N(f(x),σ^2)$. Thus, the pdf of $y_i$ should be:
$$g(y_i)=\frac{1}{σ\sqrt{2π}}\exp\left(-\frac{(y_i-f(x_i))^2}{2σ^2}\right)\tag{2}$$
Following that, we can rewrite the MLE function $(1)$:
$$L(y_i;θ,σ^2) = \prod\limits_{i=1}^{n} \frac{1}{σ\sqrt{2π}}\exp\left(-\frac{(y_i-f(x_i))^2}{2σ^2}\right) =$$ $$ = \left(\frac{1}{σ\sqrt{2π}}\right)^n \exp\left(-\frac{1}{2σ^2} \sum\limits_{i=1}^{n} (y_i-f(x_i))^2\right) \tag{3}$$
$$\ln L(y_i;θ,σ^2) = -\frac{n}{2}\ln σ^2 -\frac{n}{2}\ln(2π) - \frac{1}{2σ^2} \sum\limits_{i=1}^{n} (y_i-f(x_i))^2 \tag{4}$$
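Since step 6 also asks for $\hatσ^2$, I think the variance step can be made explicit: differentiating $(4)$ with respect to $σ^2$ and setting the derivative to zero gives

$$\frac{\partial \ln L}{\partial σ^2} = -\frac{n}{2σ^2} + \frac{1}{2σ^4}\sum\limits_{i=1}^{n}(y_i-f(x_i))^2 = 0 \quad\Rightarrow\quad \hatσ^2 = \frac{1}{n}\sum\limits_{i=1}^{n}\big(y_i-\hat f(x_i)\big)^2,$$

with the estimated $\hatθ$'s plugged into $f$.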
If $σ^2$ is fixed, $\ln L(y_i;θ,σ^2)$ can be maximised by minimising $\sum\limits_{i=1}^{n} (y_i-f(x_i))^2$. After finding $\hatθ_1,\hatθ_2,…,\hatθ_k$, we can get the estimate of $f(x)$, namely $\hat f(x_i)$, and find the estimate of $ε$: $\hatε_i=y_i-\hat f(x_i)$.
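As a numerical sanity check of the whole recipe (the linear $f(x)=θ_1+θ_2 x$, the parameter values, and the use of `scipy.optimize.minimize` below are my own illustrative choices): minimising the negative log-likelihood over $(θ_1, θ_2, σ^2)$ jointly should recover the same $\hatθ$'s as least squares, and the $\hatσ^2$ it returns should equal $\frac{1}{n}\sum \hatε_i^2$.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Simulate from a hypothetical linear model y = theta1 + theta2*x + eps
theta_true = np.array([2.0, -0.5])
sigma2_true = 0.25
n = 500
x = rng.uniform(0.0, 10.0, size=n)
y = theta_true[0] + theta_true[1] * x + rng.normal(0.0, np.sqrt(sigma2_true), size=n)

def neg_log_lik(params):
    """Negative Gaussian log-likelihood; params = (theta1, theta2, log sigma^2)."""
    t1, t2, log_s2 = params
    s2 = np.exp(log_s2)                  # parameterise via log to keep sigma^2 > 0
    resid = y - (t1 + t2 * x)
    return 0.5 * n * np.log(2 * np.pi * s2) + resid @ resid / (2 * s2)

res = minimize(neg_log_lik, x0=np.array([0.0, 0.0, 0.0]))
theta_hat = res.x[:2]
sigma2_hat = np.exp(res.x[2])

# Least-squares fit for comparison: the MLE theta_hat should match it
theta_ls = np.polyfit(x, y, 1)[::-1]     # polyfit returns highest degree first

# Direct variance estimate: SSE/n from the fitted residuals
resid_hat = y - (theta_hat[0] + theta_hat[1] * x)
sigma2_direct = resid_hat @ resid_hat / n
```

Agreement between `theta_hat` and `theta_ls` illustrates the point above: with Gaussian errors, maximising the likelihood over $θ$ is exactly least squares, regardless of the value of $σ^2$.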
I would appreciate it if you could check whether I made any mistakes. For some reason, I am particularly worried about using population parameters in bullets $1-3$ and then switching to sample quantities in bullets $4-5$.