I am reading about contrast functions as a generalization of divergences in this paper. There is a particular equation on page 17 that I cannot follow, even though it looks basic at first glance. The math is self-contained on that page, so there is no need to go over the other pages :). I will try to give the relevant equations here.
The contrast between two probability densities is defined as $$\rho\left(\boldsymbol{\theta}_{1}, \boldsymbol{\theta}_{2}\right)=\rho\left(f_{\boldsymbol{\theta}_{1}}, f_{\boldsymbol{\theta}_{2}}\right). \qquad(1)$$
We use the symbols $\varepsilon_{i}$ and $\delta_{j}$ for the partial differentials $\partial / \partial \theta_{1}^{i}$ and $\partial / \partial \theta_{2}^{j}$ respectively.
By definition of a contrast function, $\rho\left(\boldsymbol{\theta}_{1}, \boldsymbol{\theta}_{2}\right) \geq 0$, with $\rho\left(\boldsymbol{\theta}_{1}, \boldsymbol{\theta}_{2}\right)= 0$ when $\boldsymbol{\theta}_1 = \boldsymbol{\theta}_2$. The diagonal is therefore a minimum in each argument separately, so we have $$ \begin{array}{l} \varepsilon_{i} \rho(\boldsymbol{\theta}, \boldsymbol{\theta})=0 \qquad (2)\\ \delta_{j} \rho(\boldsymbol{\theta}, \boldsymbol{\theta})=0 \qquad (3) \end{array} $$
Next, the author claims that by differentiating equations (2) and (3), we obtain $$ \begin{array}{l} \partial_{j} \varepsilon_{i} \rho(\boldsymbol{\theta}, \boldsymbol{\theta})=\varepsilon_{j} \varepsilon_{i} \rho(\boldsymbol{\theta}, \boldsymbol{\theta})+\varepsilon_{j} \delta_{i} \rho(\boldsymbol{\theta}, \boldsymbol{\theta})=0 \qquad (4)\\ \partial_{i} \delta_{j} \rho(\boldsymbol{\theta}, \boldsymbol{\theta})=\varepsilon_{i} \delta_{j} \rho(\boldsymbol{\theta}, \boldsymbol{\theta})+\delta_{i} \delta_{j} \rho(\boldsymbol{\theta}, \boldsymbol{\theta})=0 \qquad (5) \end{array} $$ where $\partial_{i}=\partial / \partial \theta^{i}$.
Can you please explain how equations (4) and (5) follow from equations (2) and (3)?
My attempt:
Using the total derivative formula, I get
$$ \begin{array}{l} \partial_{j} \varepsilon_{i} \rho(\boldsymbol{\theta}, \boldsymbol{\theta})=\varepsilon_{j} \varepsilon_{i} \rho(\boldsymbol{\theta}, \boldsymbol{\theta})+ \delta_{j} \varepsilon_{i} \rho(\boldsymbol{\theta}, \boldsymbol{\theta})=0 \qquad \end{array} $$
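This differs from equation (4): my second term is $\delta_{j} \varepsilon_{i} \rho(\boldsymbol{\theta}, \boldsymbol{\theta})$, whereas the paper has $\varepsilon_{j} \delta_{i} \rho(\boldsymbol{\theta}, \boldsymbol{\theta})$.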
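As a sanity check on my attempt, here is a minimal numerical sketch. It is not from the paper. I assume a specific contrast function, the KL divergence between two univariate Gaussians with $\boldsymbol{\theta} = (\mu, \log \sigma)$, and estimate the second partial derivatives on the diagonal by central differences. The names `rho`, `second_partial`, `chain_rule_form`, and `paper_form` are my own, and the test point `theta` is arbitrary.

```python
import math
from itertools import product


def rho(t1, t2):
    """Assumed example contrast: KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) ),
    with each theta = (mu, log sigma)."""
    mu1, s1 = t1
    mu2, s2 = t2
    return s2 - s1 + (math.exp(2 * s1) + (mu1 - mu2) ** 2) / (2 * math.exp(2 * s2)) - 0.5


def second_partial(theta, a, b, h=1e-3):
    """Central-difference estimate of the second partial of rho with respect to
    slots a and b, evaluated at theta1 = theta2 = theta.
    Slots 0..n-1 are the components of theta1 (the eps derivatives);
    slots n..2n-1 are the components of theta2 (the delta derivatives)."""
    n = len(theta)
    base = list(theta) + list(theta)
    total = 0.0
    for sa, sb in product((1, -1), repeat=2):
        x = base[:]
        x[a] += sa * h
        x[b] += sb * h
        total += sa * sb * rho(x[:n], x[n:])
    return total / (4 * h * h)


theta = (0.3, -0.2)  # arbitrary point on the diagonal
n = len(theta)

chain_rule_form = {}  # eps_j eps_i rho + delta_j eps_i rho  (my attempt)
paper_form = {}       # eps_j eps_i rho + eps_j delta_i rho  (equation (4) as printed)
for i, j in product(range(n), repeat=2):
    ee = second_partial(theta, j, i)
    chain_rule_form[(i, j)] = ee + second_partial(theta, n + j, i)
    paper_form[(i, j)] = ee + second_partial(theta, j, n + i)

for key in sorted(chain_rule_form):
    print(key, chain_rule_form[key], paper_form[key])
```

For this one example both sums come out numerically close to zero for every pair $(i, j)$, so the two forms do not contradict each other here. That is only a check on a single assumed contrast function, not a proof that the two expressions agree in general.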