Applying the chain rule to a line integral over gradients to get the Integrated Gradients method


I'm trying to understand how the authors of the paper 'Axiomatic Attribution for Deep Networks' determined the formula for Integrated Gradients.

The path function $\gamma = (\gamma_1, ..., \gamma_n): [0, 1] \rightarrow \mathbb{R}^n$ specifies a smooth path from the baseline $x' \in \mathbb{R}^n$ to the input $x \in \mathbb{R}^n$, where $\gamma(0)=x'$ and $\gamma(1)=x$. The function $F : \mathbb{R}^n \rightarrow [0, 1]$ is represented by a deep neural network in this case. The authors integrate the gradients over the path $\gamma$ with $\alpha \in [0,1]$ using the following line integral:

$$\text{PathIntegratedGrads}_i^\gamma(x)::=\int_{\alpha=0}^1 \frac{\partial F(\gamma(\alpha))}{\partial \gamma_i(\alpha)} \frac{\partial \gamma_i(\alpha)}{\partial \alpha} d\alpha. \tag{1}$$

Now, the authors consider the straight line path:

$$\gamma(\alpha) = x' + \alpha \cdot (x - x') \text{ for } \alpha \in [0, 1]. \tag{2}$$

Plugging the straight line path into the above formula for $\text{PathIntegratedGrads}_i^\gamma(x)$ they get:

$$\text{IntegratedGrads}_i(x)::= (x_i - x'_i)\int_{\alpha=0}^1 \frac{\partial F(x' + \alpha \cdot (x - x'))}{\partial x_i} d\alpha. \tag{3}$$

Since

$$\frac{\partial \gamma_i(\alpha)}{\partial \alpha} = (x_i - x'_i), \tag{4}$$

it follows that: $$\frac{\partial F(x' + \alpha \cdot (x - x'))}{\partial (x_i' + \alpha \cdot (x_i - x_i'))} \overset{!}{=} \frac{\partial F(x' + \alpha \cdot (x - x'))}{\partial x_i}. \tag{5}$$

However, applying the chain rule, I got:

$$\frac{\partial F(x' + \alpha \cdot (x - x'))}{\partial (x_i' + \alpha \cdot (x_i - x_i'))} = \frac{\partial F(x' + \alpha \cdot (x - x'))}{\partial x_i} \frac{1}{\alpha}. \tag{6}$$

Shouldn't Integrated Gradients then instead be:

$$\text{IntegratedGrads}_i(x)::= (x_i - x'_i)\int_{\alpha=0}^1 \frac{\partial F(x' + \alpha \cdot (x - x'))}{ \partial (x_i' + \alpha \cdot (x_i - x_i'))} d\alpha. \tag{7}$$

This is also the way it is implemented in the authors' GitHub repository:

# Scale input and compute gradients.
scaled_inputs = [baseline + (float(i)/steps)*(inp-baseline) for i in range(0, steps+1)]
predictions, grads = predictions_and_gradients(scaled_inputs, target_label_index)  # shapes: <steps+1>, <steps+1, inp.shape>

avg_grads = np.average(grads[:-1], axis=0)
integrated_gradients = (inp-baseline)*avg_grads # shape: <inp.shape>
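As a sanity check of this Riemann-sum scheme (my own sketch, not from the repository), one can substitute a simple analytic function, say $F(x) = \sum_i x_i^2$ with gradient $2x$, for the network. The attributions produced by the same left-Riemann averaging should then sum to $F(x) - F(x')$ (the completeness property the paper proves), and for this $F$ that only works out if the gradient is evaluated at the scaled input, with no extra $\alpha$ factor:

```python
import numpy as np

def F(x):
    # Simple analytic stand-in for the network: F(x) = sum_i x_i^2.
    return np.sum(x ** 2)

def grad_F(x):
    # Analytic gradient of F, dF/dx_i = 2 * x_i, evaluated at the
    # point it is given (here: the scaled input).
    return 2.0 * x

def integrated_gradients(inp, baseline, steps=1000):
    # Same left-Riemann-sum scheme as the authors' reference code.
    scaled_inputs = [baseline + (float(i) / steps) * (inp - baseline)
                     for i in range(0, steps + 1)]
    grads = np.array([grad_F(s) for s in scaled_inputs])
    avg_grads = np.average(grads[:-1], axis=0)
    return (inp - baseline) * avg_grads

x = np.array([1.0, 2.0, -3.0])
x_prime = np.zeros(3)
ig = integrated_gradients(x, x_prime)

# Completeness: the attributions should sum to F(x) - F(x').
print(ig.sum(), F(x) - F(x_prime))
```

With `steps=1000` the attribution sum matches $F(x) - F(x')$ up to the Riemann-sum discretization error, and each $\text{IG}_i$ approaches $x_i^2$ for the zero baseline.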

Did I go wrong somewhere or am I missing something?

EDIT: I'm still having some trouble following this. Let's say we have the path integral and plug in the straight line path; then we get Integrated Gradients:

$$ \text{IntegratedGrads}_i(x)::=\int_{\alpha=0}^1 \frac{\partial F(x' + \alpha \cdot (x - x'))}{\partial (x'_i + \alpha \cdot (x_i - x'_i))} \underbrace{\frac{\partial (x'_i + \alpha \cdot (x_i - x'_i))}{\partial \alpha}}_{(x_i - x'_i)} d\alpha.\tag{8} $$

Then I have a $\partial (x'_i + \alpha \cdot (x_i - x'_i))$ in the denominator of the integrand, whereas the paper only has a $\partial x_i$. Therefore, equation (8) is different from equation (3), which is the Integrated Gradients formula from the paper. When I rewrite the integrand of Integrated Gradients from the paper with the chain rule I get the following:

$$ \frac{\partial F(x' + \alpha \cdot (x - x'))}{\partial x_i} = \frac{\partial F(x' + \alpha \cdot (x - x'))}{\partial (x'_i + \alpha \cdot (x_i - x'_i))} \underbrace{\frac{\partial (x'_i + \alpha \cdot (x_i - x'_i))}{\partial x_i}}_\alpha. \tag{9} $$

This explains the additional $\alpha$ in (6), and why the LHS and RHS of equation (5) are not equal. (I changed the equality sign in equation (5) to emphasize that.) What am I missing, such that I get the formula of Integrated Gradients as stated in the paper?

Best answer:

Regarding the interpretation of the math in the paper: I think you have a $\partial (x'_i + \alpha \cdot (x_i - x'_i))$ term where the paper has a $\partial x_i$ term in the denominator of the LHS of equation (5) ("it follows that").

(I also wonder if there is some confusion about notation: the derivative is taken with respect to a variable, $x_i$, and then evaluated at a particular point, $x'_i + \alpha \cdot (x_i - x'_i)$.)

Regarding the implementation in the GitHub repository, I am not seeing the extra $\alpha$ term there. (If it helps, the derivative returned by the ML library (e.g. TensorFlow) corresponds to $\partial F(x' + \alpha \cdot (x - x'))/\partial x_i$ and not to $\partial F(x' + \alpha \cdot (x - x'))/\partial (x'_i + \alpha \cdot (x_i - x'_i))$.)
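To make that concrete, here is a small finite-difference sketch (using a hypothetical quadratic $F$, not the authors' code): the derivative of $F(\gamma(\alpha))$ with respect to $\alpha$ matches the gradient of $F$ evaluated at the scaled input $\gamma(\alpha)$, dotted with $(x - x')$, with no $\alpha$ or $1/\alpha$ factor anywhere.

```python
import numpy as np

def F(x):
    # Hypothetical test function F(x) = sum_i x_i^2; its gradient is 2x.
    return np.sum(x ** 2)

x = np.array([1.0, 2.0, -3.0])
x_prime = np.array([0.5, -1.0, 0.0])

def gamma(alpha):
    # Straight-line path gamma(alpha) = x' + alpha * (x - x').
    return x_prime + alpha * (x - x_prime)

alpha, eps = 0.3, 1e-6

# Finite-difference derivative of F(gamma(alpha)) with respect to alpha.
lhs = (F(gamma(alpha + eps)) - F(gamma(alpha - eps))) / (2 * eps)

# Chain rule: gradient of F *evaluated at the scaled input* gamma(alpha),
# dotted with dgamma/dalpha = (x - x'). No extra alpha factor appears.
grad_at_scaled_input = 2.0 * gamma(alpha)  # what an ML library would return
rhs = grad_at_scaled_input @ (x - x_prime)

print(lhs, rhs)
```

The two numbers agree to finite-difference precision, which is exactly what equation (1) integrates over $\alpha$.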

(If you want to do the complete derivation, first use the fundamental theorem of calculus, then the partial-derivative chain rule as in Case 1 here: http://tutorial.math.lamar.edu/Classes/CalcIII/ChainRule.aspx, and the result should match up.)
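A sketch of that complete derivation (my own working under the paper's definitions, not quoted from it): apply the fundamental theorem of calculus to $g(\alpha) = F(\gamma(\alpha))$ and then the multivariable chain rule:

$$F(x) - F(x') = \int_{0}^{1} \frac{d F(\gamma(\alpha))}{d\alpha}\, d\alpha = \int_{0}^{1} \sum_{i=1}^{n} \frac{\partial F}{\partial x_i}\big(\gamma(\alpha)\big)\, \frac{d \gamma_i(\alpha)}{d\alpha}\, d\alpha = \sum_{i=1}^{n} (x_i - x'_i) \int_{0}^{1} \frac{\partial F}{\partial x_i}\big(x' + \alpha \cdot (x - x')\big)\, d\alpha.$$

The $i$-th summand is exactly $\text{IntegratedGrads}_i(x)$ from equation (3): the partial derivative is taken with respect to the $i$-th input variable and then evaluated at the scaled input, so no extra $\alpha$ factor appears, and summing the attributions recovers $F(x) - F(x')$ (the completeness axiom).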