There is an example relevant to this question on page 564, in Chapter 11 of B&V's Convex Optimization book. It presents the gradient and Hessian of the following log barrier function:
$$\tag{11.5} \phi(x)=-\sum_{i=1}^m \log(-f_i(x))$$
$$\nabla \phi(x)=\sum_{i=1}^m\frac{1}{-f_i(x)} \nabla f_i(x)$$
$$\nabla^2 \phi(x)=\sum_{i=1}^m\frac{1}{f_i(x)^2} \nabla f_i(x)\nabla f_i(x)^T+\sum_{i=1}^m\frac{1}{-f_i(x)} \nabla^2 f_i(x)$$
I could not figure out the first term of the Hessian. Applying the chain rule and the product rule with $U=\frac{1}{-f_i(x)}$ and $V=\nabla f_i(x)$, I get $dU=\frac{1}{f_i(x)^2}\nabla f_i(x)$. Hence the first term of the Hessian should be $dU\cdot V=\frac{1}{f_i(x)^2}\nabla f_i(x)\cdot \nabla f_i(x)$, but this does not satisfy the dimensionality requirement of a matrix product, since $\nabla f_i(x)$ is a column vector. I can see that the quoted form of the first term does meet the dimensionality requirement and is consistent in dimension with the second term. Can anybody give me some guidance on this kind of calculation, where the chain rule and the product rule must be applied simultaneously? I would appreciate any instruction.
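For what it's worth, the quoted formulas can at least be confirmed numerically. Below is a small sketch checking the gradient formula against central finite differences, using hypothetical affine constraints $f_i(x) = a_i^T x - b_i$ (so $\nabla^2 f_i = 0$ and the Hessian reduces to the outer-product term); the matrix `A`, vector `b`, and point `x0` are arbitrary choices, not from the book:

```python
import numpy as np

# Check (11.5)'s gradient for affine constraints f_i(x) = a_i^T x - b_i.
# A, b, x0 are arbitrary illustrative data, with x0 strictly feasible.
rng = np.random.default_rng(1)
m, n = 6, 3
A = rng.standard_normal((m, n))   # rows are a_i^T
b = rng.uniform(1.0, 2.0, m)
x0 = np.zeros(n)                  # f_i(x0) = -b_i < 0: strictly feasible

f = A @ x0 - b                    # values f_i(x0)
phi = lambda x: -np.sum(np.log(-(A @ x - b)))

# Quoted formulas: grad = sum_i a_i / (-f_i); Hess = sum_i a_i a_i^T / f_i^2
grad_analytic = A.T @ (1.0 / -f)
hess_analytic = A.T @ np.diag(1.0 / f**2) @ A   # outer-product term only

eps = 1e-6
grad_fd = np.array([(phi(x0 + eps * e) - phi(x0 - eps * e)) / (2 * eps)
                    for e in np.eye(n)])
print(np.max(np.abs(grad_analytic - grad_fd)))  # discrepancy should be tiny
```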
This is a very informal answer; it can be formalised, but that makes it even more tedious, and I think it is instructive to see where the terms come from using linear approximations.
The main issue is how we view derivatives, in particular the second derivative. Take a function $\psi:X \to Y$; then $\psi'(x) \in L(X,Y)$, that is, a linear map from $X$ to $Y$. We write $\psi'(x)(h)$ to indicate $\psi'(x)$ applied to $h$.
In a similar manner, $\psi''(x) \in L(X,L(X,Y))$, that is, a linear map from $X$ into the space of linear maps $L(X,Y)$. We could write $\psi''(x)(h)$ to indicate a map in $L(X,Y)$ and $\psi''(x)(h)(w)$ to indicate that map evaluated at $w$. Since we can identify the space $L(X,L(X,Y))$ with the space of bilinear forms $X \times X \to Y$, it is more usual to write $\psi''(x)(h,w)$, and the matrix representation of $(h,w) \mapsto \psi''(x)(h,w)$ is the Hessian.
In particular, to first order, we have $\psi'(x+h)(w) \approx \psi'(x)(w)+ \psi''(x)(h,w)$.
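This first-order expansion is easy to check numerically for a concrete scalar field, where $\psi'(x)(w) = \nabla\psi(x)^T w$ and $\psi''(x)(h,w) = h^T \nabla^2\psi(x)\, w$. The function $\psi$, point $x$, and vectors $h$, $w$ below are arbitrary illustrative choices:

```python
import numpy as np

# Toy smooth scalar field psi(x) = exp(-||x||^2), chosen only for illustration.
psi  = lambda x: np.exp(-x @ x)
grad = lambda x: -2.0 * x * psi(x)                            # psi'(x)(w) = grad(x) @ w
hess = lambda x: (4.0 * np.outer(x, x) - 2.0 * np.eye(len(x))) * psi(x)

x = np.array([0.3, -0.2, 0.5])
h = 1e-4 * np.array([1.0, 2.0, -1.0])   # small perturbation h
w = np.array([0.5, -1.0, 2.0])

lhs = grad(x + h) @ w                      # psi'(x+h)(w)
rhs = grad(x) @ w + h @ hess(x) @ w        # psi'(x)(w) + psi''(x)(h, w)
print(abs(lhs - rhs))                      # O(|h|^2): much smaller than |h|
```

The residual is second order in $|h|$, confirming that $\psi''(x)(h,\cdot)$ is exactly the first-order correction to $\psi'$.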
Now consider a single term of the form $\psi(x) = g(f(x))$. The chain rule gives $\psi'(x)(h) = g'(f(x))(f'(x)(h))$.
Now consider $\psi'(x+h^*)$ applied to $h$. To first order we have \begin{eqnarray} \psi'(x+h^*)(h) &=& g'(f(x+h^*))(f'(x+h^*)(h)) \\ &\approx& g'(f(x)+f'(x)(h^*))(f'(x+h^*)(h)) \\ &\approx& g'(f(x))(f'(x+h^*)(h))+g''(f(x))(f'(x)(h^*),f'(x+h^*)(h)) \\ &\approx& g'(f(x))(f'(x)(h)+f''(x)(h^*,h))+g''(f(x))(f'(x)(h^*),f'(x)(h)+f''(x)(h^*,h))\\ &=& g'(f(x))(f'(x)(h)) + g'(f(x))(f''(x)(h^*,h)) + g''(f(x))(f'(x)(h^*),f'(x)(h)) + g''(f(x))(f'(x)(h^*),f''(x)(h^*,h)) \end{eqnarray} If we retain only the terms that are first order in $h^*$ (the last term is second order in $h^*$, so we discard it), we get $\psi''(x)(h^*,h) = g'(f(x))(f''(x)(h^*,h)) + g''(f(x))(f'(x)(h^*),f'(x)(h))$.
Now let $g(t) = - \log (-t)$; since this is scalar valued, we can simply write $g'(t) = - {1 \over t}$ and $g''(t) = {1 \over t^2}$.
If we use the notation $f'(x)(h) = \nabla f(x)^T h$ and $f''(x)(h^*,h) = (h^*)^T \nabla^2 f(x) h$ we get \begin{eqnarray} \psi''(x)(h^*,h) &=& - {1 \over f(x)} (h^*)^T \nabla^2 f(x) h + {1 \over f(x)^2} \nabla f(x)^T h^* \nabla f(x)^T h \\ &=& (h^*)^T \left[ - {1 \over f(x)} \nabla^2 f(x) + {1 \over f(x)^2} \nabla f(x) \nabla f(x)^T \right] h \end{eqnarray} and so $\nabla^2 \psi(x) = - {1 \over f(x)} \nabla^2 f(x) + {1 \over f(x)^2} \nabla f(x) \nabla f(x)^T$. Summing this single-term Hessian over $\psi_i(x) = -\log(-f_i(x))$, $i=1,\dots,m$, recovers the Hessian of the log barrier quoted from B&V.
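As a final sanity check, this single-term formula can be compared against a finite-difference Hessian. This is a minimal sketch assuming a hypothetical convex quadratic $f$ and an arbitrary strictly feasible point `x0`, none of which come from the book:

```python
import numpy as np

# Verify nabla^2 psi = -nabla^2 f / f + (nabla f)(nabla f)^T / f^2
# for psi(x) = -log(-f(x)) with an illustrative quadratic f.
rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
P = M @ M.T                        # symmetric PSD, so f is convex
q = rng.standard_normal(n)
r = -10.0                          # large negative offset keeps f(x0) < 0

f  = lambda x: 0.5 * x @ P @ x + q @ x + r
gf = lambda x: P @ x + q           # nabla f
Hf = P                             # nabla^2 f (constant for a quadratic)

psi = lambda x: -np.log(-f(x))

x0 = 0.1 * rng.standard_normal(n)
assert f(x0) < 0                   # strictly feasible test point

# Analytic Hessian from the derivation above
H_analytic = -Hf / f(x0) + np.outer(gf(x0), gf(x0)) / f(x0) ** 2

# Central second differences of psi
eps = 1e-4
I = np.eye(n)
H_fd = np.array([[(psi(x0 + eps * (I[i] + I[j])) - psi(x0 + eps * (I[i] - I[j]))
                   - psi(x0 - eps * (I[i] - I[j])) + psi(x0 - eps * (I[i] + I[j])))
                  / (4 * eps ** 2)
                  for j in range(n)] for i in range(n)])

print(np.max(np.abs(H_analytic - H_fd)))  # discrepancy should be tiny
```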