I am getting confused reading online about Gradient Descent, Convex and Non Convex Loss functions.
Multiple resources I referred to mention that MSE is great because it's convex. But I don't see how, especially in the context of neural networks.
Let's say we have the following:
- $X$: Training dataset
- $Y$: Targets
- $\theta$: Set of parameters of the model (NN model with non-linearities)
Then:
$$ \text{MSE}(\theta) = \frac{1}{N}\left\|\text{Feedforward}_{\theta}(X) - Y\right\|^2 $$
Now I don't see how this MSE loss function is always convex: whether it is depends strongly on $\text{Feedforward}_{\theta}$, right?
You're right. The mean squared error is convex *as a function of the predictions*: for fixed targets $y$, the map $\hat{y} \mapsto (\hat{y} - y)^2$ is convex, so MSE is convex in the parameters of a linear model. But applied to a neural network, i.e. viewed as a function $\text{MSE}(\theta)$ of the network's parameters, it is certainly not convex unless the network is trivial (linear): the composition of a convex function with a non-convex $\text{Feedforward}_{\theta}$ is generally non-convex.
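A minimal numerical sketch of this (the one-hidden-unit network and the specific parameter points are my own illustrative choices, not from the question): with $f_\theta(x) = w_2 \tanh(w_1 x)$, the sign-flip symmetry $(w_1, w_2) \mapsto (-w_1, -w_2)$ leaves the prediction unchanged, and the midpoint of two such symmetric minima-like points violates the convexity inequality $\text{MSE}(\tfrac{a+b}{2}) \le \tfrac{1}{2}(\text{MSE}(a) + \text{MSE}(b))$:

```python
import numpy as np

def mse(w1, w2, x, y):
    # Tiny non-linear "network": one hidden tanh unit, one output weight.
    # MSE is convex in the prediction, but not in (w1, w2).
    pred = w2 * np.tanh(w1 * x)
    return np.mean((pred - y) ** 2)

x = np.array([1.0])
y = np.array([1.0])

# Two symmetric parameter points produce the *same* prediction ...
a = (2.0, 2.0)
b = (-2.0, -2.0)
loss_a = mse(*a, x, y)
loss_b = mse(*b, x, y)

# ... but their midpoint (0, 0) predicts 0, giving a *higher* loss.
loss_mid = mse(0.0, 0.0, x, y)

print(f"loss at midpoint: {loss_mid:.3f}")
print(f"chord average:    {(loss_a + loss_b) / 2:.3f}")
# Convexity would require loss_mid <= (loss_a + loss_b) / 2, which fails here.
```

Since $\tanh(0) = 0$, the midpoint loss is exactly $1$, while both endpoints have a loss below $1$, so the loss surface bulges above the chord between them, which is exactly what non-convexity means.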