Step Activation function...why does it work?!


I enjoy watching a channel called "Coding Train" because he presents his work in a very informative way and with an energy I can only envy. He did a video on neural networks, and I'm kind of stumped about why it works.

https://www.youtube.com/watch?v=ntKn5TPHHAk In this video he builds a NN as simply as it can be done, using a step function as the activation function. However, I don't get why this works at all. The derivative of the step function is 0 everywhere (assuming you define it yourself at the point 0). So why does his "backpropagation" work? When you look at how he changes the weights, it isn't gradient descent at all, is it? The partial derivatives making up the gradient should all be 0, leaving you nothing with which to optimize the weights, right?
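For reference, here is my rough Python paraphrase of the update rule as I understood it from the video (the original is JavaScript, and the function names here are my own):

```python
def step(x):
    # Step activation: derivative is 0 everywhere except at 0,
    # where it is undefined (or defined by hand).
    return 1 if x >= 0 else -1

def train(weights, inputs, target, lr=0.1):
    # One training step, as I understood the video.
    guess = step(sum(w * i for w, i in zip(weights, inputs)))
    error = target - guess  # his "cost": label - prediction
    # The weight change uses the raw error directly --
    # no derivative of step() appears anywhere.
    return [w + lr * error * i for w, i in zip(weights, inputs)]
```

As you can see, nothing in this update ever differentiates the activation function, which is exactly what confuses me.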

Can you please help solve my confusion? :D

Bonus question: if I'm somehow wrong about the step function, why does his cost function work? With gradient descent you try to minimize the cost function. However, his cost function is C = label - prediction, which is linear, so it shouldn't have a minimum to descend to (or rather, its infimum is negative infinity). Yet his code still does what he wants it to. Why?

1 Answer

Best Answer

Yes, backpropagation with gradient descent (and its variants) requires the activation functions of all units in a neural network to be differentiable (or at least piecewise differentiable, e.g. ReLU). Hence the step function can't (and shouldn't) be used there.

However, the perceptron classifier (defined as using the step activation function) has its own learning algorithm, which is not the same as gradient descent on a neural network (see Wikipedia for more about the algorithm).
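To make the difference concrete, here is a minimal sketch of the perceptron learning rule in Python (the toy data and names are my own): the update `w += lr * (target - guess) * x` uses only the signed prediction error, never the derivative of the activation, and the perceptron convergence theorem guarantees it stops making mistakes on linearly separable data.

```python
def perceptron_train(data, epochs=20, lr=0.1):
    # Perceptron learning rule: w += lr * (target - guess) * x.
    # No gradients or derivatives; 'error' is only ever 0, +2, or -2.
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, target in data:
            guess = 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1
            error = target - guess
            w[0] += lr * error * x[0]
            w[1] += lr * error * x[1]
            b += lr * error
    return w, b

# Linearly separable toy data: +1 above the line y = x, -1 below it.
data = [((0.0, 1.0), 1), ((1.0, 2.0), 1), ((1.0, 0.0), -1), ((2.0, 1.0), -1)]
w, b = perceptron_train(data)
predictions = [1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1 for x, _ in data]
```

So the "cost" in the video isn't a function being descended at all; `label - prediction` is just the error signal the perceptron rule multiplies into the inputs, which is why it can "work" even though a gradient-descent reading of it makes no sense.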