I have a question from studying machine learning.
In this diagram, the local gradient of the output node is defined by:
The local gradient of a hidden node is defined by:
Epsilon is defined like this:
By the chain rule, those two equations have the same structure, except for the minus sign. I know that there are many nodes between the hidden node and the output, but that doesn't explain why the minus sign is there.
Can anyone help me understand why they are different?
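In case the equation images above don't come through, these are the standard textbook forms I believe I'm looking at (assuming a sum-of-squares error $E = \tfrac{1}{2}\sum_k (t_k - o_k)^2$ and an activation function $f$; the notation in my diagram may differ slightly):

```latex
% Error over output units k, with target t_k and output o_k
E = \frac{1}{2} \sum_k (t_k - o_k)^2

% Local gradient at an output node k:
% differentiating E directly produces the (t_k - o_k) term,
% which is where a minus sign can appear if written as -(o_k - t_k)
\delta_k = (t_k - o_k)\, f'(\mathrm{net}_k)

% Local gradient at a hidden node j:
% no direct error term; it only accumulates the downstream deltas
\delta_j = f'(\mathrm{net}_j) \sum_k w_{kj}\, \delta_k
```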
