I have a learning sample $D_n = \{(X_i, Y_i)\}_{i=1}^n$ where the $X_i$’s are $\mathbb{R}^d$-valued and the $Y_i$’s are $\{-1, 1\}$-valued.
$$ f(\theta) = \frac{1}{n} \sum_{i=1}^{n} \exp(-Y_i (\theta^T X_i)) $$
where $\theta \in [-B, +B]^d$.
I want to calculate the Hessian matrix: $\nabla^2 f(\theta)$.
Let's first compute $\frac{\partial^2}{\partial\theta_j^2}f(\theta)$.
First, differentiation passes inside the sum by linearity. The only contribution of $\theta_j$ to the exponent in the $i$-th term is $-Y_i \theta_j (X_i)_j$, where $(X_i)_j$ denotes the $j$th component of $X_i$; hence each differentiation with respect to $\theta_j$ multiplies that term by $-Y_i (X_i)_j$.
Hence $\frac{\partial^2}{\partial\theta_j^2}f(\theta) = \frac{1}{n}\sum_{i=1}^n Y_i^2 (X_i)_j^2 \exp(-Y_i(\theta^T X_i))$.
Similarly, $\frac{\partial^2}{\partial\theta_j \partial\theta_k}f(\theta) = \frac{1}{n}\sum_{i=1}^n Y_i^2 (X_i)_j (X_i)_k \exp(-Y_i(\theta^T X_i))$. Since $Y_i \in \{-1, 1\}$, we have $Y_i^2 = 1$, so in matrix form $\nabla^2 f(\theta) = \frac{1}{n}\sum_{i=1}^n \exp(-Y_i \theta^T X_i)\, X_i X_i^T$.
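As a sanity check, the closed-form Hessian can be compared against a central finite-difference approximation. Here is a sketch in NumPy; the function name `hessian` and the random data (`n`, `d`, `rng`) are illustrative assumptions, not part of the original problem:

```python
import numpy as np

def hessian(theta, X, Y):
    """Analytic Hessian: (1/n) * sum_i exp(-Y_i theta^T X_i) * X_i X_i^T.

    Y_i^2 = 1 for Y_i in {-1, +1}, so it drops out of the formula.
    """
    w = np.exp(-Y * (X @ theta))          # shape (n,): per-sample exponential weights
    return (X * w[:, None]).T @ X / len(Y)

# Illustrative random instance (assumed, not from the original post)
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
Y = rng.choice([-1, 1], size=n)
theta = rng.uniform(-1, 1, size=d)

H = hessian(theta, X, Y)

# Second-order central finite differences of f itself
f = lambda t: np.mean(np.exp(-Y * (X @ t)))
eps = 1e-5
H_num = np.zeros((d, d))
I = np.eye(d)
for j in range(d):
    for k in range(d):
        H_num[j, k] = (f(theta + eps * I[j] + eps * I[k])
                       - f(theta + eps * I[j] - eps * I[k])
                       - f(theta - eps * I[j] + eps * I[k])
                       + f(theta - eps * I[j] - eps * I[k])) / (4 * eps**2)

print(np.allclose(H, H_num, atol=1e-4))
```

The agreement between the analytic and numerical Hessians confirms the entrywise formulas above.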