Loss Function Clarification


I'm learning about machine learning by writing proofs for the derivatives of common operations (matrix multiplication, softmax, etc.). I've started working on loss functions, but the information I've found online is contradictory.

I've written the following:

The MSE of an estimator is defined as follows. $$\operatorname{MSE}(\hat{\theta}) = E_\theta\big[(\hat{\theta} - \theta)^2\big]$$ $$\frac{\partial}{\partial \hat{\theta}}(\hat{\theta} - \theta)^2 = 2(\hat{\theta} - \theta)$$ $$\frac{\partial}{\partial \hat{p}_i}(\hat{p}_i - t_i)^2 = 2(\hat{p}_i - t_i)$$
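As a sanity check on the per-element derivative $2(\hat{p}_i - t_i)$, here is a minimal sketch (assuming NumPy, with made-up values for the prediction `p` and target `t`) that compares it against a central finite-difference approximation:

```python
import numpy as np

# Hypothetical scalar prediction p and target t.
p, t = 0.7, 1.0

# Analytic derivative of the squared error (p - t)^2 w.r.t. p.
analytic = 2 * (p - t)

# Central finite-difference approximation of the same derivative.
eps = 1e-6
numeric = ((p + eps - t) ** 2 - (p - eps - t) ** 2) / (2 * eps)

print(analytic, numeric)  # the two values should agree closely
```

The two printed values should match to several decimal places, which supports the derivation above.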

I want to calculate the gradient of the loss function w.r.t. $\hat{p}_i$. The sources I've found describing this process for backpropagation are vague and sometimes seem to contradict one another. What exactly are we calculating in the backward pass of the mean squared error?
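My current understanding is the following sketch (assuming NumPy and random example data): because the loss is the *mean* over $N$ squared errors, each element's gradient picks up a $1/N$ factor on top of the per-element derivative $2(\hat{p}_i - t_i)$. A finite-difference check on one component seems to confirm this:

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.normal(size=5)  # predictions (hypothetical values)
t = rng.normal(size=5)  # targets (hypothetical values)

n = p.size
loss = np.mean((p - t) ** 2)

# Backward pass: the mean over N squared errors contributes a 1/N
# factor to each element's gradient.
grad = 2 * (p - t) / n

# Finite-difference check of a single component i.
eps = 1e-6
i = 2
p_plus = p.copy();  p_plus[i] += eps
p_minus = p.copy(); p_minus[i] -= eps
numeric = (np.mean((p_plus - t) ** 2)
           - np.mean((p_minus - t) ** 2)) / (2 * eps)

print(grad[i], numeric)  # these should agree closely
```

Is this $1/N$ factor the source of the apparent contradictions between sources, i.e. some of them differentiate the sum rather than the mean?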

And how does this per-sample squared error relate to the MSE of an estimator as defined by Wikipedia?