I have a standard NLS problem as follows:
$\ y=f(x)+v, v\sim \mathcal N (0,\sigma^2\Sigma)$
Here, $\Sigma$ is SPD, and $\sigma$ is unknown. If we knew the variance, I think the solution is pretty straightforward (I may be wrong about this too): we write down the log-likelihood of the normally distributed noise $v$, and since $\ v=y-f(x)$, plugging in shows the objective is to minimize $\ (y-f(x))^T \Sigma^{-1} (y-f(x))$.
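To make that concrete, here is a minimal numerical sketch of minimizing that quadratic form, assuming for illustration a linear $f(x) = b_0 + b_1 x$ and a known diagonal $\Sigma$ (the model, weights, and parameter values are all made up):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])  # design matrix for f(x) = b0 + b1*x (illustrative choice)
w = rng.uniform(0.5, 2.0, n)          # known diagonal of Sigma
beta_true = np.array([1.0, 2.0])
sigma2 = 4.0                          # noise scale (unknown in the real problem)
y = X @ beta_true + rng.normal(0.0, np.sqrt(sigma2 * w))

# Minimizing (y - X b)' Sigma^{-1} (y - X b) over b gives the GLS normal
# equations: (X' Sigma^{-1} X) b = X' Sigma^{-1} y.
# Note that sigma^2 cancels from the argmin, so it is not needed here.
Sinv = np.diag(1.0 / w)
beta_hat = np.linalg.solve(X.T @ Sinv @ X, X.T @ Sinv @ y)
```

The key point the sketch illustrates is that the minimizer of the quadratic form does not depend on $\sigma^2$, since scaling the objective by a positive constant does not move its argmin.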
But I get confused when the $\sigma$ comes into play. I know that this is a maximum likelihood estimation problem, but I am not sure which derivative to take to get an MLE for $\sigma$ in this case.
Any pointers would be very helpful. Thanks!
Following the comments, i.e., assuming $\Sigma$ is diagonal with $\Sigma = \operatorname{diag}(w_1,\ldots,w_n)$, and denoting $\sigma^2 = \theta$:
$$ \mathcal{L}(\theta ; y_1,\ldots,y_n) = (2 \pi)^{-n/2}|\theta\Sigma|^{-1/2}\exp\Big\{ - \frac{1}{2}(y-\mu)'(\theta\Sigma)^{-1} ( y- \mu) \Big\}\, $$

or, equivalently,

$$ \mathcal{L}(\theta ; y_1,\ldots,y_n) = (2 \pi)^{-n/2}\Big(\theta^n\prod_{i=1}^n w_i\Big)^{-1/2}\exp\Big\{ - \frac{1}{2\theta}\sum_{i=1}^n\frac{1}{w_i}(y_i -\mu)^2 \Big\}. $$

Hence, the log-likelihood is

$$ \ell(\theta ; y_1,\ldots,y_n) = \text{const} - \frac{n}{2}\ln\theta- \frac{1}{2\theta}\sum_{i=1}^n \frac{1}{w_i} (y_i - \mu)^2 . $$

Differentiating w.r.t. $\theta$, setting the derivative to zero, and rearranging gives

$$ \hat{\sigma}^2=\hat{\theta} = \frac{1}{n}\sum_{i=1}^n\frac{1}{w_i} (y_i - \mu)^2\,. $$

You can easily verify (e.g., via the second derivative) that this is indeed the maximum likelihood estimator of $\sigma^2$. If you don't know $f(x)$, just plug in $\hat{f}(x)$ in place of $\mu$. In addition, note that fitting $f$ here is exactly a WLS problem.
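As a quick numerical sanity check of the closed form, here is a sketch with simulated data (the weights, sample size, true $\theta$, and constant $\mu$ are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
w = rng.uniform(0.5, 2.0, n)     # known diagonal of Sigma
mu = 3.0                         # stands in for f(x), assumed known here
theta_true = 4.0                 # true sigma^2
y = mu + rng.normal(0.0, np.sqrt(theta_true * w))

# MLE from the derivation: the average of the whitened squared residuals
theta_hat = np.mean((y - mu) ** 2 / w)
```

With $\mu$ known this estimator is unbiased, since $\mathbb{E}[(y_i-\mu)^2/w_i] = \theta$. If $\mu$ is replaced by a fitted $\hat f(x)$ with $p$ free parameters, dividing by $n-p$ instead of $n$ gives the usual bias-corrected variant.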