Suppose I am building a regression model with one fully connected layer and a sigmoid output, minimizing mean squared error as the objective. I understand that this network has a convex error surface, since the functions involved (the affine transformation, the sigmoid, and the objective) are convex.
$$y' = \operatorname{sigmoid}(WX + b)$$

Minimize $$(y - y')^2$$

where $X$ is the input vector, $y$ is the actual output, $y'$ is the predicted output, and $W$ is the parameter matrix.
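For concreteness, here is a minimal NumPy sketch of this model; the dimensions, random inputs, and target value are made up purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical example: 3 input features, scalar output
rng = np.random.default_rng(0)
X = rng.normal(size=3)        # input vector
W = rng.normal(size=(1, 3))   # parameter matrix (1 x 3)
b = np.zeros(1)               # bias
y = np.array([0.7])           # actual output (made up)

y_pred = sigmoid(W @ X + b)        # y' = sigmoid(WX + b)
loss = np.sum((y - y_pred) ** 2)   # squared-error objective
```

The loss is then minimized over $W$ and $b$, e.g. by gradient descent.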
Since the surface is convex, we will be able to find a global optimum. However, if the data is not linearly separable, is the global optimum we find still the best possible solution? Could adding extra layers, with convex or non-convex non-linearities, yield a better solution?