Let $$f(y_1, \ldots, y_n) = \ln\left(\sum\limits_{i = 1}^n \exp(y_i^T b)\right),$$ where $y_i$ and $b$ are $N$-dimensional vectors.
I wish to compute $\nabla_{y_j} f(y_1, \ldots, y_n)$ for $j = 1, \ldots, n$ rigorously, using the chain rule.
So let $h(w) = \ln(w)$, $w = \sum\limits_{i = 1}^n \exp(p_i)$, and $p_i = y_i^Tb$.
Then $\nabla_{y_j} f(y_1, \ldots, y_n) = \nabla_{y_j} h(w) = \nabla_{y_j} h(w(p_i)) = \dfrac{dh}{dw} \dfrac{d w}{d p_i} \nabla_{y_j} p_i$,
where
$\dfrac{dh(w)}{dw} = \dfrac{1}{w} = \dfrac{1}{\sum\limits_{i = 1}^n \exp(y_i^T b)}$
$\dfrac{d w}{d p_i} = \exp(p_i)$
$ \nabla_{y_j} p_i = 0$
But then the entire derivative is $0$!
Where did I go wrong?
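Since $w$ depends on every $p_i$, the chain rule has to sum over all $i$, not pick out a single $p_i$. Moreover, $\nabla_{y_j}p_i=0$ only if $i\ne j$; for $i = j$ we have $\nabla_{y_j}p_j=b$.
So $$\nabla_{y_j}f=\sum_{i=1}^n\frac{1}{w}\exp(p_i)\nabla_{y_j}p_i=\frac{\exp(p_j)}{w}\,b=\frac{\exp(y_j^T b)}{\sum_{i=1}^n \exp(y_i^T b)}\,b.$$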
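One way to gain confidence in the formula above is to compare it with a central finite-difference approximation. The Python sketch below does that on random inputs. The function names `f`, `grad_f`, `numeric_grad_f`, the step size `h`, and the sizes `n = 4`, `N = 3` are illustrative choices, not part of the original question. It also subtracts $\max_i p_i$ before exponentiating (the usual log-sum-exp stability trick), which leaves the values of $f$ and $\exp(p_j)/w$ unchanged.

```python
import math
import random


def dot(u, v):
    """Inner product u^T v of two equal-length lists."""
    return sum(uk * vk for uk, vk in zip(u, v))


def f(ys, b):
    """f(y_1, ..., y_n) = ln(sum_i exp(y_i^T b)), computed with a max shift for stability."""
    p = [dot(y, b) for y in ys]  # p_i = y_i^T b
    m = max(p)
    return m + math.log(sum(math.exp(pi - m) for pi in p))


def grad_f(ys, b, j):
    """Analytic gradient from the answer: nabla_{y_j} f = exp(p_j) / w * b."""
    p = [dot(y, b) for y in ys]
    m = max(p)
    w_shifted = sum(math.exp(pi - m) for pi in p)  # w * exp(-m)
    weight = math.exp(p[j] - m) / w_shifted        # equals exp(p_j) / w
    return [weight * bk for bk in b]


def numeric_grad_f(ys, b, j, h=1e-6):
    """Central finite differences of f with respect to each component of y_j."""
    g = []
    for k in range(len(b)):
        plus = [list(y) for y in ys]
        minus = [list(y) for y in ys]
        plus[j][k] += h
        minus[j][k] -= h
        g.append((f(plus, b) - f(minus, b)) / (2 * h))
    return g


# Illustrative random problem: n vectors y_i and a vector b, all in R^N.
random.seed(0)
n, N = 4, 3
ys = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(n)]
b = [random.uniform(-1, 1) for _ in range(N)]

for j in range(n):
    analytic = grad_f(ys, b, j)
    numeric = numeric_grad_f(ys, b, j)
    err = max(abs(a - c) for a, c in zip(analytic, numeric))
    print(f"j={j}: max |analytic - numeric| = {err:.2e}")
```

The printed discrepancies should be tiny, on the order of the finite-difference error. A related sanity check: the scalar weights $\exp(p_j)/w$ sum to $1$ over $j$, so $\sum_j \nabla_{y_j} f = b$.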