It has been some time since I have thought about proofs. I was hoping someone could help me check my logic. Here is a question I am trying to solve:
Consider a simple two-layer network with a single hidden node and identity activation, $f(x) = w_2 w_1 x$, together with a training set $\{(x_i, y_i)\}_{i=1, \ldots, N}$. Take the mean squared loss and set up
$$L(w_1, w_2) = \dfrac{1}{N}\sum_{i=1}^N (y_i - f(x_i))^2$$
Prove that $L$ is non-convex by contradiction. I am given that $L(w_1, w_2) = L(-w_1, -w_2)$ for any $(w_1, w_2)$. Also, if a particular pair $(w_1^*, w_2^*)$ is a global minimizer of $L$, what can we say about $(0, 0)$?
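For concreteness, the loss and the given sign symmetry can be checked numerically; here is a quick sketch (the data values are arbitrary, chosen only to illustrate):

```python
# L(w1, w2) = (1/N) * sum_i (y_i - w2*w1*x_i)^2, and the symmetry
# L(w1, w2) = L(-w1, -w2) holds because L depends on the weights
# only through the product w1*w2. Data here are arbitrary.
xs = [1.0, 2.0, -1.0]
ys = [0.5, 1.5, 2.0]

def L(w1, w2):
    return sum((y - w2 * w1 * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Flipping the signs of both weights leaves the loss unchanged.
assert abs(L(0.3, -0.7) - L(-0.3, 0.7)) < 1e-12
```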
Attempt at a Solution
Suppose $L$ is a convex function and let $\textbf{w}^* = (w_1^*, w_2^*)$ be a global minimizer of the function. Then by definition, for all $0 \leq t \leq 1$ and for all points $\textbf{u}, \textbf{v}$ we have
$$L(t\textbf{u} + (1-t)\textbf{v}) \leq tL(\textbf{u}) + (1 - t)L(\textbf{v})$$
Geometrically, this condition requires that the straight line segment between any pair of points on the graph of $L$ lie on or above the graph. Also, $L(\textbf{w}^*) \leq L(\textbf{w})$ for all $\textbf{w}$ because $\textbf{w}^*$ is a global minimizer.
Applying convexity at the midpoint of $(w_1^*, w_2^*)$ and $(-w_1^*, -w_2^*)$ together with the given symmetry,
$$ L(0, 0) \leq \frac{1}{2}\left( L(w_1^*, w_2^*) + L(-w_1^*, -w_2^*) \right) = L(w_1^*, w_2^*), $$
which implies that the point $(0, 0)$ also attains the global minimum. This seems to me to contradict $L$ being convex. Also, $L$ being convex would mean $(w_1^*, w_2^*) = (0, 0)$, which would be a zero-weight neural network that could never learn anything.
Thank you in advance for any insights!
The major issue in your attempt is conflating the two parts of the question, i.e. disproving convexity and studying $L(0, 0)$. Apart from this, there are a few minor issues:
Part 1
For simplicity, the deduction below omits the factor $\dfrac{1}{N}$ in front of $L$; this rescaling does not affect convexity. Suppose $L$ is convex. Since $(0, 0)$ is the midpoint of $(w_1, w_2)$ and $(-w_1, -w_2)$, convexity together with the symmetry $L(w_1, w_2) = L(-w_1, -w_2)$ gives, for any $w_1$ and $w_2$,$$ L(w_1, w_2) = \frac{1}{2} (L(w_1, w_2) + L(-w_1, -w_2)) \geqslant L(0, 0), $$ i.e.$$ \sum_{k = 1}^n (y_k - w_1 w_2 x_k)^2 \geqslant \sum_{k = 1}^n y_k^2. $$
Case 1: $\sum\limits_{k = 1}^n x_k y_k \neq 0$. Choose $w_1 = \dfrac{\sum\limits_{k = 1}^n x_k y_k}{2\sum\limits_{k = 1}^n x_k^2}$ and $w_2 = 1$, then$$ w_1^2 w_2^2 \sum_{k = 1}^n x_k^2 < 2w_1 w_2 \sum_{k = 1}^n x_k y_k \implies \sum_{k = 1}^n (y_k - w_1 w_2 x_k)^2 < \sum_{k = 1}^n y_k^2, $$ a contradiction.
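Case 1 can be checked numerically. A small sketch with arbitrary data satisfying $\sum_k x_k y_k \neq 0$ (the $\frac{1}{N}$ factor is omitted, matching the deduction above):

```python
# Verify Case 1: with sum(x*y) != 0, the stated choice of (w1, w2)
# makes the loss strictly smaller than L(0, 0), contradicting the
# inequality L(w1, w2) >= L(0, 0) that convexity + symmetry would force.
xs = [1.0, 2.0, 3.0]
ys = [1.0, -0.5, 2.0]  # chosen so that sum(x*y) != 0

def L(w1, w2):
    return sum((y - w1 * w2 * x) ** 2 for x, y in zip(xs, ys))

sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)
assert sxy != 0
w1, w2 = sxy / (2 * sxx), 1.0
assert L(w1, w2) < L(0.0, 0.0)  # the required inequality fails
```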
Case 2: $\sum\limits_{k = 1}^n x_k y_k = 0$. In this case, $L(w_1, w_2) = w_1^2 w_2^2 \sum\limits_{k = 1}^n x_k^2 + \sum\limits_{k = 1}^n y_k^2$. For any $w_1, w_2 \neq 0$, since $\left( \frac{3}{2} w_1, \frac{1}{2} w_2 \right)$ is the midpoint of $(w_1, w_2)$ and $(2w_1, 0)$, convexity gives$$ L(w_1, w_2) + L(2w_1, 0) \geqslant 2L\left( \frac{3}{2} w_1, \frac{1}{2} w_2 \right), $$ i.e.$$ w_1^2 w_2^2 \sum_{k = 1}^n x_k^2 + 2\sum_{k = 1}^n y_k^2 \geqslant \frac{9}{8} w_1^2 w_2^2 \sum_{k = 1}^n x_k^2 + 2\sum_{k = 1}^n y_k^2, $$ which implies $\sum\limits_{k = 1}^n x_k^2 = 0$, contradicting the (implicit) assumption that the inputs are not all zero. Note this assumption is genuinely needed: if every $x_k = 0$, then $L$ is constant and hence trivially convex.
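Similarly, the failing midpoint inequality in Case 2 can be checked on a toy data set with $\sum_k x_k y_k = 0$:

```python
# Verify Case 2: data with sum(x*y) == 0. The midpoint convexity
# inequality 2*L(m) <= L(u) + L(v), with u = (w1, w2), v = (2*w1, 0)
# and m = (3*w1/2, w2/2) their midpoint, fails for w1, w2 != 0.
xs = [1.0, -1.0]
ys = [1.0, 1.0]  # sum(x*y) = 1 - 1 = 0

def L(w1, w2):
    return sum((y - w1 * w2 * x) ** 2 for x, y in zip(xs, ys))

assert sum(x * y for x, y in zip(xs, ys)) == 0
w1, w2 = 1.0, 1.0
u, v = (w1, w2), (2 * w1, 0.0)
m = ((u[0] + v[0]) / 2, (u[1] + v[1]) / 2)
# Convexity would require 2*L(m) <= L(u) + L(v); here it does not hold.
assert 2 * L(*m) > L(*u) + L(*v)
```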
Part 2
In fact, since$$ L(w_1, w_2) = (w_1 w_2)^2 \sum_{k = 1}^n x_k^2 - 2w_1 w_2 \sum_{k = 1}^n x_k y_k + \sum_{k = 1}^n y_k^2, $$ the loss depends on the weights only through the product $w_1 w_2$. If $\sum\limits_{k = 1}^n x_k y_k \neq 0$, then the global minimizers of $L$ lie on the hyperbola $w_1 w_2 = \dfrac{\sum\limits_{k = 1}^n x_k y_k}{\sum\limits_{k = 1}^n x_k^2}$; otherwise they lie on the union of the $w_1$- and $w_2$-axes. Therefore, if $w^*_1 = 0$ or $w^*_2 = 0$, then $(0, 0)$ is also a global minimizer; otherwise it is not one.
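To illustrate, this description of the minimizers can be verified numerically on arbitrary data with $\sum_k x_k y_k \neq 0$ (a sketch, not part of the proof):

```python
# Check Part 2 numerically: any (w1, w2) with w1*w2 = sxy/sxx attains
# the same loss value, and no point on a sample grid does better.
xs = [1.0, 2.0, 3.0]
ys = [1.0, -0.5, 2.0]

def L(w1, w2):
    return sum((y - w1 * w2 * x) ** 2 for x, y in zip(xs, ys))

sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)
p_star = sxy / sxx  # optimal value of the product w1*w2

# Two different points on the hyperbola w1*w2 = p_star give the same loss...
assert abs(L(p_star, 1.0) - L(2 * p_star, 0.5)) < 1e-12
# ...and no grid point beats it.
grid = [(a / 4, b / 4) for a in range(-8, 9) for b in range(-8, 9)]
assert all(L(a, b) >= L(p_star, 1.0) - 1e-12 for a, b in grid)
```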