I am getting confused reading online about Gradient Descent, Convex and Non Convex Loss functions.
Multiple resources I referred to mention that MSE is great because it's convex. But I don't see how, especially in the context of neural networks.
Let's say we have the following:
- $X$: Training dataset
- $Y$: Targets
- $\theta$: Set of parameters of the model (NN model with non-linearities)
Then:
$$ \text{MSE}(\theta) = \frac{1}{N}\left\|\text{Feedforward}_{\theta}(X) - Y\right\|^2 $$
Now I don't see how this MSE loss function is always convex: whether it is depends strongly on $\text{Feedforward}_{\theta}$, right?
You're right. The mean squared error is convex *as a function of the predictions*: for fixed targets $y$, the map $\hat{y} \mapsto (\hat{y} - y)^2$ is convex, so MSE is convex in the parameters of a linear model. But applied to a neural network, i.e. viewed as a function $\text{MSE}(\theta)$ of the network's parameters, it is certainly not convex unless the network is trivial (linear): the composition of a convex function with a non-convex $\text{Feedforward}_{\theta}$ is generally non-convex.
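A minimal numerical sketch of this (the one-hidden-unit network and the specific parameter points are my own illustrative choices, not from the question): with $f_\theta(x) = w_2 \tanh(w_1 x)$, the sign-flip symmetry $(w_1, w_2) \mapsto (-w_1, -w_2)$ leaves the prediction unchanged, and the midpoint of two such symmetric minima-like points violates the convexity inequality $\text{MSE}(\tfrac{a+b}{2}) \le \tfrac{1}{2}(\text{MSE}(a) + \text{MSE}(b))$:

```python
import numpy as np

def mse(w1, w2, x, y):
    # Tiny non-linear "network": one hidden tanh unit, one output weight.
    # MSE is convex in the prediction, but not in (w1, w2).
    pred = w2 * np.tanh(w1 * x)
    return np.mean((pred - y) ** 2)

x = np.array([1.0])
y = np.array([1.0])

# Two symmetric parameter points produce the *same* prediction ...
a = (2.0, 2.0)
b = (-2.0, -2.0)
loss_a = mse(*a, x, y)
loss_b = mse(*b, x, y)

# ... but their midpoint (0, 0) predicts 0, giving a *higher* loss.
loss_mid = mse(0.0, 0.0, x, y)

print(f"loss at midpoint: {loss_mid:.3f}")
print(f"chord average:    {(loss_a + loss_b) / 2:.3f}")
# Convexity would require loss_mid <= (loss_a + loss_b) / 2, which fails here.
```

Since $\tanh(0) = 0$, the midpoint loss is exactly $1$, while both endpoints have a loss below $1$, so the loss surface bulges above the chord between them, which is exactly what non-convexity means.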