This is a standard problem in convex optimization with a well-known solution, but I cannot seem to follow the procedure given in Boyd's Convex Optimization book, p. 643.
Suppose I am given $f(x) = \log \sum\limits_{i = 1}^{m} \exp(a_i^Tx + b_i)$, $f: \mathbb{R}^n \to \mathbb{R}$, and I need to find the gradient.
Write $f = g \circ h$, where $g(y) = \log y$ and $h(x) = \sum\limits_{i = 1}^{m} \exp(a_i^Tx + b_i)$. Then, by the chain rule:
$Df(x) = Dg(h(x))\,Dh(x) = \dfrac{1}{h(x)}\,Dh(x)$
$Df(x) = \dfrac{1}{\sum\limits_{i = 1}^{m} \exp(a_i^Tx + b_i)} \Big[?? \text{ what is } D\Big( \sum\limits_{i = 1}^{m} \exp(a_i^Tx + b_i)\Big) ??\Big]$
I would really appreciate it if someone could show me how to get a closed form for $D\Big( \sum\limits_{i = 1}^{m} \exp(a_i^Tx + b_i)\Big)$.
The final solution is $\nabla f(x) = \dfrac{1}{1^Tz}A^Tz$, where $z_i = \exp(a_i^Tx + b_i)$ and $A$ is the matrix with rows $a_i^T$.
Split it up: compute the partial derivatives of each term of the sum separately. By the chain rule we have $$\partial_{x_j}\exp(a_i^Tx+b_i)=\exp(a_i^Tx+b_i)\,\partial_{x_j}(a^T_ix+b_i)=a_{ij}\exp(a_i^Tx+b_i).$$ Hence $D\sum_{i=1}^m\exp(a_i^Tx+b_i)=\Big(\sum_{i=1}^m a_{ij}z_i\Big)_{j=1}^n=A^Tz$, and multiplying by the factor $\frac{1}{1^Tz}$ gives $\nabla f(x)=\frac{1}{1^Tz}A^Tz$.
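As a sanity check, here is a small NumPy sketch that compares the closed form $\nabla f(x) = \frac{1}{1^Tz}A^Tz$ against a central finite-difference gradient (the matrix $A$, vector $b$, and point $x$ below are arbitrary random data, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.normal(size=(m, n))   # rows of A are the a_i^T
b = rng.normal(size=m)
x = rng.normal(size=n)

def f(x):
    # f(x) = log sum_i exp(a_i^T x + b_i)
    return np.log(np.sum(np.exp(A @ x + b)))

def grad_f(x):
    z = np.exp(A @ x + b)       # z_i = exp(a_i^T x + b_i)
    return A.T @ z / np.sum(z)  # A^T z / (1^T z)

# central finite differences along each coordinate direction
eps = 1e-6
fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
               for e in np.eye(n)])

print(np.max(np.abs(grad_f(x) - fd)))  # tiny: finite-difference error only
```

The two gradients agree to within finite-difference error, which is a quick way to convince yourself the derivation above is right.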