I have a function $f(\mathbf{u}, \Sigma)$ where $\mathbf{u}$ is a $p \times 1$ vector and $\Sigma$ is a $p \times p$ real symmetric matrix (positive semi-definite).
I somehow successfully computed the partial derivatives $\frac{\partial f}{\partial \mathbf{u}}$ and $\frac{\partial f}{\partial \Sigma}$.
In this case, how do I optimize the function $f$ using Newton-Raphson's method?
=====Details=======
$y = X\mathbf{u} + \frac{1}{2}\operatorname{diag}(X\Sigma X^{T})$
$e = exp(y)$
$f = \mathbf{1}^{T} e$
where $exp$ is component-wise and $\mathbf{1}$ is a vector with ones. $X$ is not symmetric.
$ \def\L{\left}\def\R{\right}\def\LR#1{\L(#1\R)} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\o{{\tt1}}\def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\Diag#1{\operatorname{Diag}\LR{#1}\,} \def\S{\Sigma} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} $Based on this post, your gradient expressions are $$\eqalign{ g &= \grad{f}{u} = X^Te \\ G &= \grad{f}{\S} = \frac{X^T\Diag{e}X}{2} \\ }$$ Now one can take an ADMM approach, but instead of the ($2^{nd}$order) Hessian-based Newton algorithm, use a ($1^{st}$order) gradient-based Barzilai-Borwein algorithm to solve the vector-valued subproblem, i.e. $$\eqalign{ du_{k} &= u_{k} - u_{k-1} \\ dg_{k} &= g_{k} - g_{k-1} \\ u_{k+1} &= u_{k} - \fracLR{du_{k}^T\,dg_{k}}{dg_{k}^T\,dg_{k}} g_{k} \\ }$$ where $k$ is the iteration counter.
And based on this post, one can extend this algorithm to the matrix-valued subproblem $$\eqalign{ d\S_{k} &= \S_{k} - \S_{k-1} \\ dG_{k} &= G_{k} - G_{k-1} \\ \S_{k+1} &= \S_{k} - \fracLR{d\S_{k}:dG_{k}}{dG_{k}:dG_{k}} G_{k} \\ }$$ where a colon denotes the Frobenius inner product, which can be interpreted as a convenient notation for the standard trace function $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \|A\|^2_F \\ }$$
${\bf NB\!:}\,$ The initial step in each case should be something conservative, such as $$\eqalign{ u_\o &= u_0 - \LR{\frac{0.05\;|f_0|}{g_0^T\,g_0}}g_0 \\ \S_\o &= \S_0 - \LR{\frac{0.05\;|f_0|}{G_0:G_0}}G_0 \\ }$$