I have a function $Q\!:\!\mathbb{R}^N\!\times\mathbb{R}^N\!\!\to\!\mathbb{R}$ which takes as arguments the vectors $\boldsymbol{\theta}=[\theta_1\ \theta_2\ \dots\ \theta_{N}]^T$ and $\boldsymbol{\phi}=[\phi_1\ \phi_2\ \dots\ \phi_{N}]^T$. I wish to minimise $Q$ using Newton's method. If I stack $\boldsymbol{\theta}$ on top of $\boldsymbol{\phi}$, the gradient of $Q$ is expressed as $\nabla Q=[\partial Q/\partial \theta_1\ \dots\ \partial Q/\partial \theta_{N}\ \ \partial Q/\partial \phi_1\ \dots\ \partial Q/\partial \phi_{N}]^T$. Can anyone verify that the following terms are what I would require in order to compute the Hessian matrix $\nabla^2 Q$?
$\frac{\partial^2 Q}{\partial\theta_m^2}$, $\frac{\partial^2 Q}{\partial\theta_m\partial\theta_n}$, $\frac{\partial^2 Q}{\partial\theta_m\partial\phi_m}\!=\!\frac{\partial^2 Q}{\partial\phi_m\partial\theta_m}$, $\frac{\partial^2 Q}{\partial\theta_m\partial\phi_n}\!=\!\frac{\partial^2 Q}{\partial\phi_n\partial\theta_m}$, $\frac{\partial^2 Q}{\partial\phi_m\partial\phi_n}$, and $\frac{\partial^2 Q}{\partial\phi_m^2}$ for all $m,n\!\in\![1,N]$ with $m\!\neq\! n$. (The mixed-partial equalities hold by symmetry of second derivatives, assuming $Q$ is twice continuously differentiable.)
Yep, looks good to me. One thing I would suggest when you're doing this calculation is not to split up the $m=n$ case. Then you only have to calculate $\frac{\partial^2 Q}{\partial\theta_m\partial\theta_n}$, $\frac{\partial^2 Q}{\partial\theta_m\partial\phi_n}$, and $\frac{\partial^2 Q}{\partial\phi_m\partial\phi_n}$ for all $m,n$. These correspond to the upper-left block, the off-diagonal (upper-right, and by symmetry lower-left) blocks, and the lower-right block of the Hessian, respectively.
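To make the block structure concrete, here's a small sketch (my own toy example, not from the question) that builds the full $2N \times 2N$ Hessian numerically with central finite differences and shows where each block lives:

```python
import numpy as np

N = 2
h = 1e-5  # finite-difference step

def Q(x):
    # x stacks theta (first N entries) on top of phi (last N entries).
    # Toy objective chosen so the second derivatives are easy to check:
    # d2Q/dtheta_m^2 = 2*phi_m, d2Q/dtheta_m dphi_m = 2*theta_m,
    # d2Q/dphi_m^2 = -cos(phi_m), all cross terms with m != n are zero.
    theta, phi = x[:N], x[N:]
    return np.sum(theta**2 * phi) + np.sum(np.cos(phi))

def hessian_fd(f, x):
    # Second-order central difference for each entry H[i, j].
    n = x.size
    I = np.eye(n)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = h * I[i], h * I[j]
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

x = np.array([1.0, 2.0, 0.5, -0.5])  # theta = [1, 2], phi = [0.5, -0.5]
H = hessian_fd(Q, x)
# H[:N, :N]  -> d2Q/dtheta dtheta (upper-left block)
# H[:N, N:]  -> d2Q/dtheta dphi   (upper-right block; H[N:, :N] is its transpose)
# H[N:, N:]  -> d2Q/dphi dphi     (lower-right block)
```

The off-diagonal blocks are transposes of each other, so in practice you only need to compute one of them.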
Another thing you could look at is using an automatic differentiation library like TensorFlow, PyTorch, or JAX. For most problems there's little to no performance penalty for doing so, and you avoid deriving the second partials by hand.
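For instance, with JAX the stacked gradient and the full Hessian come directly from `jax.grad` and `jax.hessian`, and the Newton step is a single linear solve (the objective below is a made-up example, chosen so the Hessian at the origin is the identity):

```python
import jax
import jax.numpy as jnp

N = 3

def Q(x):
    # Hypothetical smooth objective on the stacked vector x = [theta; phi].
    theta, phi = x[:N], x[N:]
    return jnp.sum(jnp.sin(theta) * jnp.cos(phi)) + 0.5 * jnp.sum(x**2)

grad_Q = jax.grad(Q)      # the stacked 2N-dimensional gradient from the question
hess_Q = jax.hessian(Q)   # the full (2N, 2N) Hessian, all blocks at once

x = jnp.zeros(2 * N)
H = hess_Q(x)
# One Newton step: solve H d = -grad_Q(x), then update x <- x + d.
d = jnp.linalg.solve(H, -grad_Q(x))
```

In a real Newton iteration you would repeat the solve-and-update until the gradient norm is small, and possibly add damping or a line search if $H$ is not positive definite far from the minimum.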