The sigmoid function is $\sigma(x) = \frac{1}{1 + e^{-x}}$ and its derivative is $\sigma'(x) = \sigma(x)(1 - \sigma(x))$. In an implementation of a simple neural network I saw, the derivative of the sigmoid is computed as $\sigma'(x) = x(1 - x)$, which looks nothing like the actual derivative, yet it works even better than the real derivative (at least in the XOR example).
I can't find anything online explaining why this expression can be used in place of the actual derivative. Can anyone explain this to me, or point me to a paper I should read to understand it better?
Although the code names the argument of its derivative function $x$, if you look at where the function is actually called, the value passed in is the layer's *output*, not its input. Writing $y = \sigma(x)$ for that output, the expression $y(1 - y)$ is exactly $\sigma(x)(1 - \sigma(x)) = \sigma'(x)$, so the code is applying the correct derivative; only the variable name is misleading.
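A minimal sketch of the pattern (the function and variable names here are illustrative, not taken from the original code) showing that the two formulations agree numerically:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# The "derivative" is written in terms of the layer OUTPUT y = sigmoid(x),
# not the pre-activation x: sigma'(x) = y * (1 - y).
def sigmoid_deriv_from_output(y):
    return y * (1.0 - y)

x = np.array([-2.0, 0.0, 3.0])
y = sigmoid(x)                      # forward pass stores the output

# Analytic derivative evaluated from the input x...
d_true = sigmoid(x) * (1.0 - sigmoid(x))
# ...matches the shortcut applied to the stored output y.
d_shortcut = sigmoid_deriv_from_output(y)

print(np.allclose(d_true, d_shortcut))  # True
```

Reusing the stored output this way is a common efficiency trick in backpropagation code: the forward pass has already computed $\sigma(x)$, so the derivative costs only one multiply and one subtract, with no extra call to `exp`.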