Why linear activation function fails the universal approximation theorem for neural network


I understand what the UAT is and how it holds for sigmoid and ReLU activation functions. I've seen enough articles and explanations that visually show how sigmoid/ReLU activation units can be combined to construct tower functions / step functions, which can approximate any curve.


Question: I still can't get my head around why linear activation functions cannot approximate arbitrary functions. I understand the theory: not every function is linear, so we need non-linearity in the network to model an arbitrary function. That makes sense to me, but I'm finding it difficult to understand graphically.

More specifically: suppose I have a function f(x). Can't I use a series of linear activation functions (whose outputs are straight lines) in such a way that these straight lines create a tower function in the graph (by tweaking the parameters), just like we do in the case of the sigmoid and ReLU functions?
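For concreteness, here is a minimal NumPy sketch (with arbitrary, made-up weights) of the algebraic obstruction: composing any number of linear layers collapses into a single affine map Wx + b, whose graph is one straight line/plane. So no choice of parameters can ever bend the output into a tower or step shape, the way sigmoid or ReLU units can.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical 2-layer network with identity (linear) activations:
#   f(x) = W2 @ (W1 @ x + b1) + b2
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)

def two_layer_linear(x):
    # "activation" is the identity, so each layer is purely affine
    return W2 @ (W1 @ x + b1) + b2

# The composition collapses to one affine map W @ x + b:
W = W2 @ W1
b = W2 @ b1 + b2

x = rng.standard_normal(3)
assert np.allclose(two_layer_linear(x), W @ x + b)
print("deep linear net == single affine layer")
```

Adding more linear layers only changes W and b; the function class stays "affine", which is why the tower-building trick needs a non-linearity somewhere.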