Partial derivative of recursive exponential $f(x) = \sum^{K_2}_{k_2=1}c_{k_2} \exp(-z^{(2)}_{k2})$ with respect to the deepest parameter

227 Views Asked by At

I was trying to take the derivative of the following equation (which can be depicted nicely in a tree like structure, look at the end of question for diagram):

$$f(x) = f([x_1, ..., x_{N_p}])= \sum^{K_2}_{k_2=1}c_{k_2} \exp(-z^{(2)}_{k_2})$$

with respect to some parameters $\theta$ (which are not explicit above). As you can notice, the vector $x$ can be subdivided in parts: $x = [x_1, ..., x_{N_p}] \in \mathbb{R}^N$. Each $x_i \in \mathbb{R}^{D_p}$ and there are $N_p$ of them (hence, $N = N_p D_p$).

Let me explain how $\theta$ is present in the above equation. For that let me define: $$ a^{(3)}_{k_2} = \exp(-z^{(2)}_{k_2}) \ , \ k_2 \in \{ 1, ..., K_2 \} $$ $$ z^{(2)}_{k_2} = \| a^{(2)} - t_{k_2}^{(2)} \|^2 \ , \ \ k_2 \in \{ 1, ..., K_2 \}$$ where $a^{(2)} \in \mathbb{R}^{K_1}$ and $t^{(2)}_{k_2} \in \mathbb{R}^{K_1} $. $a^{(2)}$ can also be divided in parts as follow: $$ a^{(2)} = \left( \begin{array}{c} a^{(2)}_{1,1}\\ \vdots \\ a^{(2)}_{1,D_d}\\ \vdots \\ a^{(2)}_{n_p,1}\\ \vdots \\ a^{(2)}_{n_p,D_d}\\ \vdots \\ a^{(2)}_{N_p,1}\\ \vdots \\ a^{(2)}_{N_p,D_d}\\ \end{array} \right) $$ which leads to the following notation: $$ a^{(2)}_{n_p,k} = \exp(-z^{(1)}_{n_p,k}) \ , \ n_p \in \{1,...,N_p\} \ , \ k \in \{1,...,D_d\} $$ $$ z^{(1)}_{n_p, k} = \| a^{(1)}_{n_p} - t^{(1)}_{n_p, k} \|^2 \ , \ n_p \in \{1,...,N_p\} \ , \ k \in \{1,...,D_d\}$$ $$ a^{(1)}_{n_p} = x_{n_p} \ , \ n_p \in \{1,...,N_p\} $$

and $a^{(1)}_{n_p} = x_{n_p} \in \mathbb{R}^{D_p}$ and $ t^{(1)}_{n_p, k} \in \mathbb{R}^{D_p}$. The parameters are $\theta = \{ c, t_{k_2}^{(2)}, t^{(1)} \}$.

After that I tried taking partial derivatives with respect to the lowest level in the equation $t^{(1)}$. i.e. I did $\frac{\partial f(x) }{\partial t^{(1)}_{i,j}}$ and got:

$$ \frac{\partial f(x) }{\partial t^{(1)}_{i,j}} = \sum^{K_2}_{k_2 = 1} c_{k_2} (- \exp(-z^{(2)}_{k_2}))(2(a^{(2)}_{i,j} - [t^{(2)}_{k_2}]_{i,j}))(-\exp(-z^{(1)}_{i,j}))2(x_i - t^{(1)}_{i,j})(-1)$$

But wasn't sure if it was correct.

In fact, let me show you how I derived the above equation. Consider:

$$ \frac{\partial f(x) }{ \partial t^{(1)}_{i,j} } = \frac{\partial f(x) }{ \partial z^{(1)}_{i,j} } \frac{\partial z^{(1)}_{i,j}}{ \partial t^{(1)}_{i,j} } = \left( \frac{\partial f(x) }{ \partial a^{(2)}_{i,j} } \frac{\partial a^{(2)}_{i,j} }{ \partial z^{(1)}_{i,j} } \right) \frac{\partial z^{(1)}_{i,j}}{ \partial t^{(1)}_{i,j} } $$

holds since:

$$ \frac{\partial f(x) }{ \partial z^{(1)}_{i,j} } = \frac{\partial f(x) }{ \partial a^{(2)}_{i,j} } \frac{\partial a^{(2)}_{i,j} }{ \partial z^{(1)}_{i,j} }$$

Now, to evaluate the above, we require the chain rule for $\frac{\partial f(x) }{ \partial a^{(2)}_{i,j} } $, hence:

$$\frac{\partial f(x) }{ \partial a^{(2)}_{i,j} } = \sum^{K_2}_{k_2=1} \frac{\partial f(x) }{ \partial z^{(2)}_{k_2} } \frac{\partial z^{(2)}_{k_2} }{ \partial a^{(2)}_{i,j} } = \sum^{K_2}_{k_2=1} \frac{\partial f(x) }{ \partial a^{(3)}_{k_2} } \frac{\partial a^{(3)}_{k_2} }{ \partial z^{(2)}_{k_2} } \frac{\partial z^{(2)}_{k_2} }{ \partial a^{(2)}_{i,j} } $$

now consider each equation seperately one can see that:

$$ \frac{\partial f(x) }{ \partial a^{(3)}_{k_2} } = c_{k_2}$$ $$ \frac{\partial a^{(3)}_{k_2} }{ \partial z^{(2)}_{k_2} } = -\exp(-z^{(2)})$$ $$ \frac{\partial z^{(2)}_{k_2} }{ \partial a^{(2)}_{i,j} } = 2(a^{(2)}_{i,j} - [t_{k_2}^{(2)}]_{i,j})(1)$$

hence $\frac{\partial f(x) }{ \partial a^{(2)}_{i,j} }$ is:

$$ \frac{\partial f(x) }{ \partial a^{(2)}_{i,j} } = \sum^{K_2}_{k_2=1} c_{k_2} ( -\exp(z^{(2)})) 2(a^{(2)}_{i,j} - [t_{k_2}^{(2)}]_{i,j})(1) $$

and

$$ \frac{\partial a^{(2)}_{i,j} }{ \partial z^{(1)}_{i,j} } = -\exp(-z^{(1)}_{i,j}) $$

hence $ \frac{\partial f(x) }{ \partial z^{(1)}_{i,j} } $:

$$ \frac{\partial f(x) }{ \partial z^{(1)}_{i,j} } = \left( \sum^{K_2}_{k_2=1} c_{k_2} ( -\exp(z^{(2)})) 2(a^{(2)}_{i,j} - [t_{k_2}^{(2)}]_{i,j})(1) \right) (-\exp(-z^{(1)}_{i,j})) $$

$$\frac{\partial z^{(1)}_{i,j}}{ \partial t^{(1)}_{i,j} } = 2(x_i - t^{(1)}_{i,j})(-1)$$

which finally gives the equation that we wanted:

$$ \frac{\partial f(x) }{\partial t^{(1)}_{i,j}} = (2(x_i -t^{(1)}_{i,j})(-1)) \left( \sum^{K_2}_{k_2=1} c_{k_2} ( -\exp(z^{(2)})) 2(a^{(2)}_{i,j} - [t_{k_2}^{(2)}]_{i,j})(1) \right) (-\exp(-z^{(1)}_{i,j}))$$

I even implemented the above and tried to check it with computer simulations to see if the derivation was correct (and numerical derivatives too, i.e. really good approximations). I implemented this equation (twice) in MATLAB and couldn't get any of the different ways I implemented it to agree on the value of the derivative, so either my implementations are wrong or the equation is wrong. I actually made a stackoverflow question for that here if you want to take a look at it.


Actually, to make the equations and derivations more clear, I decided to draw a diagram/"network" of how the different variables down the recursion tree relate to each other (obviously, notice its only 4 layer recursion tree for this question):

enter image description here

Obviously, I've omitted some of the connections between the variables to not make the diagram to cluttered.