How can I implement Newton-Raphson's method with a function of one vector and one matrix?

Question

How can I implement Newton-Raphson's method with a function of one vector and one matrix?

208 Views Asked by Bumbble Comm At 29 Mar 2026 - 3:21

I have a function $f(\mathbf{u}, \Sigma)$ where $\mathbf{u}$ is a $p \times 1$ vector and $\Sigma$ is a $p \times p$ real symmetric matrix (positive semi-definite).

I somehow successfully computed the partial derivatives $\frac{\partial f}{\partial \mathbf{u}}$ and $\frac{\partial f}{\partial \Sigma}$.

In this case, how do I optimize the function $f$ using Newton-Raphson's method?

=====Details=======

$y = X\mathbf{u} + \frac{1}{2}\operatorname{diag}(X\Sigma X^{T})$

$e = exp(y)$

$f = \mathbf{1}^{T} e$

where $exp$ is component-wise and $\mathbf{1}$ is a vector with ones. $X$ is not symmetric.

Original Q&A

There are 2 best solutions below

user91684 On 11 Feb 2016 - 5:43

I assume that $X\in M_{p,p}$, $diag(V)=[v_{1,1},\cdots,v_{p,p}]^T$ and $y$ is a vector.

Consider $g:(u,\Sigma)\in \mathbb{R}^P\times \mathbb{R}^{p^2}\rightarrow (\frac{\partial f}{\partial u}(u,\Sigma),\frac{\partial f}{\partial \Sigma}(u,\Sigma))\in \mathbb{R}^P\times \mathbb{R}^{p^2}$. To do that, we stack the matrix $\Sigma$ row by row. For example, when $p=2$, one has $(u,\Sigma)=(u_1,u_2,\Sigma_{11},\Sigma_{12},\Sigma_{21},\Sigma_{2,2})\in\mathbb{R}^6$ and $(\frac{\partial f}{\partial u},\frac{\partial f}{\partial \Sigma})=(\frac{\partial f}{\partial u_1},\frac{\partial f}{\partial u_2},\frac{\partial f}{\partial \Sigma_{1,1}},\frac{\partial f}{\partial \Sigma_{1,2}},\frac{\partial f}{\partial \Sigma_{2,1}},\frac{\partial f}{\partial \Sigma_{2,2}})\in\mathbb{R}^6$.

We want to solve the system of $p+p^2$ equations with $p+p^2$ unknowns $g=0$; clearly, we can use the Newton-Raphson's method:

$(u_{n+1},\Sigma_{n+1})=(u_n,\Sigma_n)-(Dg(u_n,\Sigma_n))^{-1}(g(u_n,\Sigma_n))$.

**Bumbble Comm** · Accepted Answer

$ \def\L{\left}\def\R{\right}\def\LR#1{\L(#1\R)} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\o{{\tt1}}\def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\Diag#1{\operatorname{Diag}\LR{#1}\,} \def\S{\Sigma} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} $Based on this post, your gradient expressions are $$\eqalign{ g &= \grad{f}{u} = X^Te \\ G &= \grad{f}{\S} = \frac{X^T\Diag{e}X}{2} \\ }$$ Now one can take an ADMM approach, but instead of the ($2^{nd}$order) Hessian-based Newton algorithm, use a ($1^{st}$order) gradient-based Barzilai-Borwein algorithm to solve the vector-valued subproblem, i.e. $$\eqalign{ du_{k} &= u_{k} - u_{k-1} \\ dg_{k} &= g_{k} - g_{k-1} \\ u_{k+1} &= u_{k} - \fracLR{du_{k}^T\,dg_{k}}{dg_{k}^T\,dg_{k}} g_{k} \\ }$$ where $k$ is the iteration counter.

And based on this post, one can extend this algorithm to the matrix-valued subproblem $$\eqalign{ d\S_{k} &= \S_{k} - \S_{k-1} \\ dG_{k} &= G_{k} - G_{k-1} \\ \S_{k+1} &= \S_{k} - \fracLR{d\S_{k}:dG_{k}}{dG_{k}:dG_{k}} G_{k} \\ }$$ where a colon denotes the Frobenius inner product, which can be interpreted as a convenient notation for the standard trace function $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \|A\|^2_F \\ }$$

${\bf NB\!:}\,$ The initial step in each case should be something conservative, such as $$\eqalign{ u_\o &= u_0 - \LR{\frac{0.05\;|f_0|}{g_0^T\,g_0}}g_0 \\ \S_\o &= \S_0 - \LR{\frac{0.05\;|f_0|}{G_0:G_0}}G_0 \\ }$$

How can I implement Newton-Raphson's method with a function of one vector and one matrix?

There are 2 best solutions below

Related Questions in MATRIX-CALCULUS

Related Questions in COMPUTATIONAL-MATHEMATICS

Related Questions in NEWTON-RAPHSON

Trending Questions

Popular # Hahtags

Popular Questions