I would like to implement a multi-layer perceptron with different activation functions (for now I am starting with tanh for the hidden layers and softmax for the output layer). I realized that a nice way to implement it, in order to apply the chain rule in the backpropagation algorithm while keeping some flexibility to chain future activation functions (sigmoid, ReLU, ...) between the affine transformations, could rely on implementing different modules (as PyTorch does? I don't know the framework very well, but it seems to proceed similarly). Something like the following:
import numpy as np

class CrossEntropyLoss:
    def Loss(self, o, y):
        return -np.dot(y, np.log(o))

    def Jacobian(self, o, y):
        """Return the gradient of the loss with respect to the prediction o."""
        delta = -y / o
        delta.shape = (1, -1)
        return delta
class SoftmaxModule:
    def forward(self, z):
        # Shift by max(z) for numerical stability; does not change the result.
        e = np.exp(z - np.max(z))
        return e / np.sum(e)

    def Jacobian(self, z):
        o = self.forward(z)
        SM = o.reshape((-1, 1))
        # Jacobian of the softmax: diag(o) - o o^T
        jac = np.diagflat(o) - np.dot(SM, SM.T)
        return jac

    def backward(self, delta, z):
        return delta.dot(self.Jacobian(z))
class LinearModule:
    """This module handles the affine transformation of the input; it
    stores the gradient with respect to its weights (the parameters to
    learn) AND the gradient with respect to its input, in order to
    PROPAGATE THE ERROR backwards."""

class TanhModule:
    pass
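For completeness, here is a minimal sketch of what the two remaining modules could look like under the same forward/backward convention as the classes above (the shapes, the caching of the input, and the initialization scheme are my own assumptions, not a definitive implementation):

```python
import numpy as np

class LinearModule:
    """Affine transformation z = W x + b.

    During backward it stores grad_W and grad_b (gradients w.r.t. the
    parameters to learn) and returns the gradient w.r.t. the input,
    so the error can be propagated to the previous module.
    """
    def __init__(self, n_in, n_out):
        # Small random initialization (an arbitrary choice here).
        self.W = np.random.randn(n_out, n_in) * 0.01
        self.b = np.zeros(n_out)

    def forward(self, x):
        self.x = x  # cache the input for the backward pass
        return self.W.dot(x) + self.b

    def backward(self, delta):
        # delta is a (1, n_out) row vector, as in CrossEntropyLoss.Jacobian.
        self.grad_W = delta.reshape(-1, 1) * self.x.reshape(1, -1)  # outer product
        self.grad_b = delta.ravel()
        return delta.dot(self.W)  # gradient w.r.t. the input, shape (1, n_in)

class TanhModule:
    def forward(self, z):
        self.out = np.tanh(z)  # cache the output for the backward pass
        return self.out

    def backward(self, delta):
        # d tanh(z)/dz = 1 - tanh(z)^2, applied elementwise
        return delta * (1.0 - self.out ** 2)
```

With this convention, backpropagation is just calling backward on each module in reverse order, feeding each one the delta returned by its successor.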
From a mathematical point of view the backpropagation is quite clear, but it can raise a problem when one of the output probabilities (marked as o in the code) is zero (division by zero in -y/o). So I thought of implementing a function backward(self,y,z) directly in the SoftmaxModule which assumes that the cost function is the cross-entropy, with $y$ being the true label (after one-hot encoding):
def backward(self, y, z):
    return self.forward(z) - y
but then I lose generality if I want to use a cost function other than cross-entropy...
What would you suggest as a design? I didn't find a similar implementation of a feed-forward NN, and I am not a programmer, so I lack some clarity on the best way to tackle the problem...