Perhaps the mathematically simplest case of fitting is linear least squares, i.e. minimizing the $L_2$ norm ("sum of squares") of the residuals. It is so simple that we often learn it already in high school, or at the latest in a first linear algebra course at college.
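For concreteness, here is a minimal sketch of such a fit (the toy data and the use of `numpy.linalg.lstsq` are my own choices, just to fix ideas):

```python
import numpy as np

# Minimal L2 ("sum of squares") fit: solve min_x ||A x - b||_2
# for a small overdetermined system.  Toy data, purely illustrative.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Least-squares coefficients (intercept, slope) of the best-fit line.
x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(x)
```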
The last 10-15 years have seen a lot of research on fitting with respect to other norms, the $L_1$ norm ("sum of absolute values") being perhaps the most famous example.
Could we, using only the algorithms available for these two norms, achieve minimization of completely arbitrary cost functions?
One example would be the heavily discontinuous "non-negative" error:
$$C(x) = \infty \cdot H(-x) + C_p(x)\cdot H(x)$$
Here $H$ is the Heaviside step function: the cost is infinite (or, numerically, suitably "very large") whenever $x$ is negative, $C(0) = 0$, and $C_p(x)$ is some reasonably smooth increasing function for positive $x$.
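A small numerical sketch of what I mean (the stand-in penalty $C_p(x) = x^2$ and the projected-gradient loop are my own illustrative choices, not part of the question):

```python
# The "non-negative" cost above, with the Heaviside's infinite branch
# modeled as a very large finite value.
BIG = 1e12  # numerical stand-in for "infinity"

def C(x, Cp=lambda t: t**2):
    # Assumed smooth increasing penalty Cp for x >= 0; huge cost otherwise.
    return BIG if x < 0 else Cp(x)

# Minimizing C amounts to minimizing Cp subject to x >= 0, which a
# well-behaved method can handle, e.g. by projecting each gradient
# step back onto the feasible set.
def minimize_projected(Cp_grad, x0=1.0, step=0.1, iters=200):
    x = x0
    for _ in range(iters):
        x = max(0.0, x - step * Cp_grad(x))  # projection onto x >= 0
    return x

x_star = minimize_projected(lambda t: 2 * t)  # gradient of t**2
print(x_star)  # converges toward 0, the constrained minimum
```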
Such extremely non-linear behaviour must surely be difficult to achieve using only well-behaved methods, or might I be mistaken? I am interested both in research references and in counter-examples as answers.