Why are the local gradients of an output node and a hidden node different?


I have a question from studying machine learning.

The author says that, in this diagram (the original image is broken; the equation is reconstructed in standard notation), the local gradient of the output node is defined by:

$$\delta_k = -\frac{\partial E}{\partial v_k} = \epsilon_k\,\varphi'(v_k)$$

and in the diagram below (image also broken; reconstructed), the local gradient of the hidden node is defined by:

$$\delta_j = \varphi'(v_j)\sum_k \delta_k\, w_{kj}$$

Epsilon, the error at output node $k$, is defined (image broken; reconstructed) as:

$$\epsilon_k = d_k - y_k$$

By the chain rule, those two equations have the same structure, except for the minus sign. I know that there are many nodes between a hidden node and the output, but that doesn't explain where the minus sign comes from.
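To make the sign bookkeeping concrete, here is a minimal NumPy sketch (the toy network, its sizes, and all variable names are my own assumptions, not the author's) that computes both local gradients as $\delta = -\partial E/\partial v$ and checks them against numerical gradients:

```python
import numpy as np

# Toy 3-2-1 network with sigmoid activations and squared-error loss
# E = 0.5 * sum((d - y)^2), with error eps = d - y.
rng = np.random.default_rng(0)

def phi(v):            # sigmoid activation
    return 1.0 / (1.0 + np.exp(-v))

def dphi(v):           # sigmoid derivative
    s = phi(v)
    return s * (1.0 - s)

x = rng.normal(size=3)         # input
W1 = rng.normal(size=(2, 3))   # input -> hidden weights
W2 = rng.normal(size=(1, 2))   # hidden -> output weights
d = np.array([1.0])            # target

# Forward pass
v1 = W1 @ x;  y1 = phi(v1)     # hidden pre-activation / activation
v2 = W2 @ y1; y2 = phi(v2)     # output pre-activation / activation
eps = d - y2                   # eps = d - y

# Local gradients, both defined as delta = -dE/dv
delta_out = eps * dphi(v2)                  # output: the two minuses cancel
delta_hid = dphi(v1) * (W2.T @ delta_out)   # hidden: minus is absorbed into delta_out

# Numerical check: -dE/dv at the output and hidden pre-activations
def E_of_v2(v2_):
    return 0.5 * np.sum((d - phi(v2_)) ** 2)

def E_of_v1(v1_):
    return 0.5 * np.sum((d - phi(W2 @ phi(v1_))) ** 2)

h = 1e-6
num_out = -(E_of_v2(v2 + h) - E_of_v2(v2 - h)) / (2 * h)
num_hid = np.array([
    -(E_of_v1(v1 + h * np.eye(2)[i]) - E_of_v1(v1 - h * np.eye(2)[i])) / (2 * h)
    for i in range(2)
])
print(delta_out, num_out)   # should agree
print(delta_hid, num_hid)   # should agree
```

For the output node, expanding by the chain rule gives $-\partial E/\partial \epsilon_k \cdot \partial \epsilon_k/\partial y_k \cdot \partial y_k/\partial v_k = -\epsilon_k \cdot (-1) \cdot \varphi'(v_k)$, so a minus from $\partial \epsilon_k/\partial y_k = -1$ appears in the expansion, which is the term I am asking about.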

Can anyone help me understand why they are different?