How to prove an MSE loss function is convex for linear regression


$$\mathrm{MSE}(w)=\frac{1}{N}\|y-Xw\|^2.$$ From what I have understood, it suffices to prove the first-order condition $\mathrm{MSE}(z) \geq \mathrm{MSE}(w) + \nabla \mathrm{MSE}(w)^\top (z-w)$ for all $w,z$. I found the partial derivatives of $\mathrm{MSE}(w)$, but I'm stuck there. Can someone help me with this proof?
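As a quick numerical sanity check (not a proof), the first-order condition can be tested on random data, using the gradient $\nabla \mathrm{MSE}(w)=\frac{2}{N}X^\top(Xw-y)$; this is a minimal sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 50, 3
X = rng.normal(size=(N, d))  # arbitrary design matrix
y = rng.normal(size=N)       # arbitrary targets

def mse(w):
    r = y - X @ w
    return r @ r / N

def grad_mse(w):
    # gradient of (1/N)||y - Xw||^2 is (2/N) X^T (Xw - y)
    return (2.0 / N) * X.T @ (X @ w - y)

# first-order condition: MSE(z) >= MSE(w) + grad MSE(w) . (z - w) for all w, z
for _ in range(100):
    w = rng.normal(size=d)
    z = rng.normal(size=d)
    assert mse(z) >= mse(w) + grad_mse(w) @ (z - w) - 1e-9
```

If MSE were not convex, some random pair $(w, z)$ would violate the inequality; here it holds for every sample, consistent with convexity.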


BEST ANSWER

This is a consequence of two facts: the squared norm $x\mapsto\|x\|^2$ is convex (the norm itself is convex by the triangle inequality and homogeneity, and composing it with the increasing convex map $t\mapsto t^2$ preserves convexity), and your error function is the composition of the squared norm with an affine function. Such compositions are always convex, as you can check from the definition.

1) Convexity of $x\to\|x\|^2$:

I am going to choose the Euclidean norm. In this case $$ \|x\|^2=\sum_{k=1}^d x_k^2 $$ and then the second partial derivatives $\partial^2 f/\partial x_i\partial x_j$ are zero if $i\neq j$ and equal $2$ if $i=j$. In other words, the Hessian is $2I$, diagonal with $2$ on the diagonal, so it is positive definite and $f$ is convex.
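To illustrate, the Hessian of $\|x\|^2$ can be approximated by finite differences at any point and compared against $2I$; a minimal sketch assuming NumPy:

```python
import numpy as np

d = 4
f = lambda x: x @ x  # f(x) = ||x||^2 (Euclidean norm squared)

x0 = np.array([1.0, -2.0, 0.5, 3.0])  # arbitrary evaluation point
h = 1e-4
H = np.empty((d, d))
for i in range(d):
    for j in range(d):
        ei = np.zeros(d); ei[i] = h
        ej = np.zeros(d); ej[j] = h
        # central second-difference estimate of d^2 f / dx_i dx_j
        H[i, j] = (f(x0 + ei + ej) - f(x0 + ei) - f(x0 + ej) + f(x0)) / h**2
print(np.round(H, 3))  # approximately 2 * I, independent of x0
```

Because $f$ is quadratic, the finite-difference Hessian is exact up to rounding, and it is the same at every point $x_0$.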

2) If $D\subset \mathbb{R}^d$ is a convex set, $g:D\to\mathbb{R}^N$ given by $g(x)=Ax+b$ is an affine function, and $f:\mathbb{R}^N\to\mathbb{R}$ is convex, then $f\circ g$ is convex on $D$. (For the MSE, take $g(w)=y-Xw$ and $f(u)=\frac{1}{N}\|u\|^2$.) Proof:

Let $x_1,x_2\in D$ and $\alpha\in[0,1]$. Since $g$ is affine, $g(\alpha x_1+(1-\alpha)x_2)=\alpha g(x_1)+(1-\alpha)g(x_2)$, so $$ f\circ g(\alpha x_1 +(1-\alpha)x_2)=f(\alpha g(x_1)+(1-\alpha)g(x_2)) $$ $$ \le \alpha f(g(x_1)) + (1-\alpha)f(g(x_2)) $$ $$ =\alpha\, f\circ g(x_1)+(1-\alpha)\, f\circ g(x_2) $$ by the convexity of $f$.
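The final inequality, specialized to $\mathrm{MSE}(w)=\frac{1}{N}\|y-Xw\|^2$, can also be checked directly on random data; a minimal sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 40, 3
X = rng.normal(size=(N, d))  # arbitrary design matrix
y = rng.normal(size=N)       # arbitrary targets

mse = lambda w: np.sum((y - X @ w) ** 2) / N

# definition of convexity:
# MSE(a*w1 + (1-a)*w2) <= a*MSE(w1) + (1-a)*MSE(w2) for a in [0, 1]
for _ in range(200):
    w1, w2 = rng.normal(size=d), rng.normal(size=d)
    a = rng.uniform()
    lhs = mse(a * w1 + (1 - a) * w2)
    rhs = a * mse(w1) + (1 - a) * mse(w2)
    assert lhs <= rhs + 1e-9
```

Every random pair satisfies the inequality, matching the composition argument above.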