Expression for derivative of neural net output w.r.t edge weight


This is a question on the mechanics of backpropagation derivatives in NNs. I've seen plenty of analysis of the derivative of a NN output with respect to a given input, but I'm unsure how to compute the partial derivative with respect to the weight of a given edge. That is, suppose the output of the network is $\hat{pred} = \sigma[z[v_{output}]]$, where $v_{output}$ is a node with no outgoing edges, and $z[v]$ denotes the linear combination of the outputs of the nodes with outgoing edges to $v$, weighted by the corresponding edge weights. What is the derivative $\frac{\partial \hat{pred}}{\partial w(u,v)}$? I'm trying to define this in terms of basic parameters and $\delta$, which I believe we can define recursively. For simplicity I'm assuming the activation is sigmoid. Here is my attempt:

For a node $v$ whose value is calculated using the sigmoid activation, the output $o[v]$ is given by: $ o[v] = \frac{1}{1 + e^{-\sum_{(u,v) \in E} w(u,v)o[u]}} $
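For concreteness, here is a minimal sketch of that forward computation for a single node $v$ with two hypothetical parents `u1` and `u2` (the node names, outputs, and weights are made-up example values, not from the question):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parents of v with their outputs o[u] and edge weights w(u, v).
parent_outputs = {"u1": 0.3, "u2": 0.8}
weights = {("u1", "v"): 0.5, ("u2", "v"): -1.2}

# z[v] = sum over incoming edges (u, v) of w(u, v) * o[u]
z_v = sum(weights[(u, "v")] * o_u for u, o_u in parent_outputs.items())

# o[v] = sigmoid(z[v])
o_v = sigmoid(z_v)
```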

Using the chain rule, we get: $ \frac{\partial \hat{pred}}{\partial w(u,v)} = \frac{\partial \hat{pred}}{\partial o[v]} \cdot \frac{\partial o[v]}{\partial z[v]} \cdot \frac{\partial z[v]}{\partial w(u,v)} $ (is this right??)

where $z[v] = \sum_{(u,v) \in E} w(u,v)o[u]$.

The derivative of $o[v]$ with respect to $z[v]$ is the derivative of the sigmoid function: $ \frac{\partial o[v]}{\partial z[v]} = o[v] \cdot (1 - o[v]) $
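This identity is easy to sanity-check numerically against a central finite difference (the point $z = 0.7$ and step $h$ are arbitrary choices for the check):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z = 0.7   # arbitrary test point
h = 1e-6  # finite-difference step

# sigma'(z) = sigma(z) * (1 - sigma(z))
analytic = sigmoid(z) * (1 - sigmoid(z))

# central finite difference approximation of sigma'(z)
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
```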

And $\frac{\partial z[v]}{\partial w(u,v)}$ is just $o[u]$, the output of the parent node $u$.

Thus, $ \frac{\partial \hat{pred}}{\partial w(u,v)} = \frac{\partial \hat{pred}}{\partial o[v]} \cdot o[v] \cdot (1 - o[v]) \cdot o[u] $
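As a sketch of why this expression checks out, consider the smallest case where it can be tested end to end: a hypothetical chain $x \to v \to v_{output}$ with sigmoid at both $v$ and $v_{output}$ (all the numbers below are made-up example values). Then $\frac{\partial \hat{pred}}{\partial o[v]} = \hat{pred}(1-\hat{pred})\,w(v, v_{output})$, and the product formula can be compared against a finite difference:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical chain: input x -> node v -> output node, sigmoid activations.
x, w1, w2 = 0.4, 0.9, -0.7  # o[x] = x, w1 = w(x, v), w2 = w(v, output)

def forward(w1):
    o_v = sigmoid(w1 * x)           # o[v]
    return sigmoid(w2 * o_v), o_v   # pred, o[v]

pred, o_v = forward(w1)

# d pred / d o[v] = pred * (1 - pred) * w2, since the output node is also sigmoid
d_pred_d_ov = pred * (1 - pred) * w2

# Chain-rule expression: d pred / d w1 = (d pred / d o[v]) * o[v](1 - o[v]) * o[x]
analytic = d_pred_d_ov * o_v * (1 - o_v) * x

# Central finite difference in w1 for comparison
h = 1e-6
numeric = (forward(w1 + h)[0] - forward(w1 - h)[0]) / (2 * h)
```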

Is this on the right track? How can I further decompose the $\frac{\partial \hat{pred}}{\partial o[v]}$ term to express this in terms of a recursive definition of $\delta(v)$? Do I even need to? And, importantly, does this depend on what the activation function for $v_{output}$ is? I'm trying to fix the activation function only for $v$.