I'm reading a book about deep learning, and in it the author states something like the following when explaining why we need an error function.
A neural net can be seen as a system of equations. We have a bunch of unknowns (the weights) and a set of functions (one for each training example). In our first examples we have been using a linear activation function, so we could solve the system of equations. However, when using a nonlinear activation function such as the sigmoid, we can no longer do this. Therefore we have to resort to some optimization method.
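To make sure I understand the contrast, here is a minimal sketch of what I think the author means, for a single neuron with two weights. The data, the learning rate, and the choice of cross-entropy as the error function are all my own assumptions for illustration, not the book's:

```python
import numpy as np

# Toy setup: a single neuron with two weights and two training examples.
# All numbers here are made up for illustration; they are not from the book.
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])   # one row per training example

# Linear activation: each example i gives the linear equation X[i] @ w = y[i],
# so with as many examples as weights we can solve for w directly.
y = np.array([5.0, 6.0])
w_exact = np.linalg.solve(X, y)
assert np.allclose(X @ w_exact, y)

# Sigmoid activation: each example gives the nonlinear equation
# sigmoid(X[i] @ w) = t[i]. There is no general closed-form solve, so we
# define an error function and minimize it by gradient descent instead.
# (I picked the cross-entropy error; the book may use a different one.)
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

t = np.array([0.2, 0.9])     # targets in (0, 1), matching a sigmoid output
w = np.zeros(2)
for _ in range(20000):
    pred = sigmoid(X @ w)
    grad = X.T @ (pred - t)  # gradient of the cross-entropy error w.r.t. w
    w -= 0.2 * grad          # small step downhill
# The residual sigmoid(X @ w) - t shrinks toward zero as we iterate.
```

So in the linear case one call to a linear solver recovers the weights exactly, while in the sigmoid case we only approach a solution iteratively, which I take to be the "optimization method" the author mentions.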
Now my question is: can systems of equations only be constructed from linear functions, or what exactly is the author referring to in this statement?