ERM for quadratic loss with respect to activation layer

So I have to find the hypothesis that the ERM algorithm returns for the following quadratic loss: $l=\left(\frac{1}{1+e^{-wx}}-y\right)^2=(\sigma(wx)-y)^2$

I have a separable set $\{(x_i,y_i)\}$ with $i\in\{1,\dots,n\}$. My assumption is that ERM will return the following hypothesis: $$ h(x_i)= \begin{cases} 1 & wx_i\geq 0\\ 0 & wx_i < 0\\ \end{cases} $$ My reasoning is that the quadratic loss is nonnegative, so its minimal value is $0$, and to get a loss of (nearly) $0$ the sigmoid $\sigma(wx_i)$ has to be close to $y_i$ for every sample.
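To check this reasoning numerically, here is a minimal sketch. It assumes a one-dimensional input (so $wx$ is an ordinary product), labels $y_i\in\{0,1\}$, and a small hypothetical separable sample. It computes the empirical quadratic risk $\frac{1}{n}\sum_i(\sigma(wx_i)-y_i)^2$ for increasingly large $w$. The helper names `sigmoid`, `empirical_risk` and `threshold` are illustrative only.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function sigma(z) = 1 / (1 + e^{-z}), computed without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def empirical_risk(w: float, data: list[tuple[float, int]]) -> float:
    """Average quadratic loss (sigma(w*x) - y)^2 over the sample."""
    return sum((sigmoid(w * x) - y) ** 2 for x, y in data) / len(data)

def threshold(w: float, x: float) -> int:
    """The step hypothesis from the question: 1 if w*x >= 0, else 0."""
    return 1 if w * x >= 0 else 0

# Hypothetical 1-D separable sample (assumption): y = 1 exactly when x > 0.
data = [(-2.0, 0), (-0.5, 0), (0.3, 1), (1.5, 1)]

for w in [1.0, 10.0, 100.0]:
    print(f"w = {w:6.1f}  empirical risk = {empirical_risk(w, data):.3e}")
```

On this sample the risk shrinks toward $0$ as $w$ grows while the sign of $wx_i$, and hence the threshold hypothesis, stays the same. No finite $w$ makes the risk exactly $0$, because $\sigma$ never reaches $0$ or $1$; the step function only appears in the limit.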

$\sigma(wx)$ is close to $1$ when $wx \gg 0$ and close to $0$ when $wx \ll 0$, and it is symmetric about $0$: $\sigma(0)=\tfrac12$ and $\sigma(-z)=1-\sigma(z)$, so $\sigma(z)-\tfrac12$ is an odd function. That is why I proposed that hypothesis.
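The symmetry claim can be checked directly. This small sketch (the helper name `sigmoid` is illustrative) compares $\sigma(-z)$ with $1-\sigma(z)$ for a few values of $z$ and shows how quickly $\sigma$ saturates toward $0$ and $1$.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function sigma(z) = 1 / (1 + e^{-z}); safe for the moderate z used here."""
    return 1.0 / (1.0 + math.exp(-z))

# sigma(-z) = 1 - sigma(z), i.e. sigma(z) - 0.5 is an odd function of z.
for z in [0.0, 0.5, 2.0, 6.0]:
    print(
        f"z = {z:4.1f}  sigma(z) = {sigmoid(z):.4f}  "
        f"sigma(-z) = {sigmoid(-z):.4f}  1 - sigma(z) = {1.0 - sigmoid(z):.4f}"
    )
```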

Is my reasoning fine?