Assume that $\alpha$ is a real scalar and $x$ is a real vector of order $n$. If $I_n$ is the identity matrix of order $n$, how do we compute the derivative of $(I_n+\alpha xx^T)^{\frac{1}{2}}$ with respect to $\alpha$ and to $x$, respectively?
Derivative of $(I_n+\alpha xx^T)^{\frac{1}{2}}$
94 Views. Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail). There are 2 answers below.
Let us first work out an expression for $\left(I_n+\alpha\mathbf{x}\mathbf{x}^{\top}\right)^{1/2}$. Denote $$ \mathbf{v}_1=\frac{\mathbf{x}}{\left\|\mathbf{x}\right\|}, $$ and let $$ \left\{\mathbf{v}_1,\mathbf{v}_2,\cdots,\mathbf{v}_n\right\} $$ be an orthonormal basis for $\mathbb{R}^n$. We will show that each $\mathbf{v}_j$ is an eigenvector of $I_n+\alpha\mathbf{x}\mathbf{x}^{\top}$.
In fact, since $\mathbf{x}^{\top}\mathbf{v}_1=\left\|\mathbf{x}\right\|$, $$ \left(I_n+\alpha\mathbf{x}\mathbf{x}^{\top}\right)\mathbf{v}_1=\mathbf{v}_1+\alpha\mathbf{x}\left\|\mathbf{x}\right\|=\mathbf{v}_1+\alpha\left\|\mathbf{x}\right\|^2\mathbf{v}_1=\left(1+\alpha\left\|\mathbf{x}\right\|^2\right)\mathbf{v}_1. $$ This means that $\mathbf{v}_1$ is an eigenvector of $I_n+\alpha\mathbf{x}\mathbf{x}^{\top}$ associated with the eigenvalue $1+\alpha\left\|\mathbf{x}\right\|^2$.
Further, for $j\ge 2$, since $\mathbf{x}^{\top}\mathbf{v}_j=0$, $$ \left(I_n+\alpha\mathbf{x}\mathbf{x}^{\top}\right)\mathbf{v}_j=\mathbf{v}_j+\mathbf{0}=\mathbf{v}_j. $$ This means that $\mathbf{v}_j$ is an eigenvector of $I_n+\alpha\mathbf{x}\mathbf{x}^{\top}$ associated with the eigenvalue $1$.
Thanks to these facts, $I_n+\alpha\mathbf{x}\mathbf{x}^{\top}$ can be diagonalized as follows. We have $$ \left(I_n+\alpha\mathbf{x}\mathbf{x}^{\top}\right)\left(\mathbf{v}_1,\mathbf{v}_2,\cdots,\mathbf{v}_n\right)=\left(\mathbf{v}_1,\mathbf{v}_2,\cdots,\mathbf{v}_n\right)\left( \begin{array}{cccc} 1+\alpha\left\|\mathbf{x}\right\|^2&&&\\ &1&&\\ &&\ddots&\\ &&&1 \end{array} \right), $$ or equivalently, $$ \left(I_n+\alpha\mathbf{x}\mathbf{x}^{\top}\right)V=V\Lambda\iff I_n+\alpha\mathbf{x}\mathbf{x}^{\top}=V\Lambda V^{-1} $$ for short, where $V=\left(\mathbf{v}_1,\mathbf{v}_2,\cdots,\mathbf{v}_n\right)$ (by the way, since $\mathbf{v}_j$'s are orthonormal, $V$ is an orthogonal matrix), and $\Lambda$ denotes the last diagonal matrix. Therefore, $$ \left(I_n+\alpha\mathbf{x}\mathbf{x}^{\top}\right)^{1/2}=V\Lambda^{1/2}V^{-1}=V\Lambda^{1/2}V^{\top}, $$ where $$ \Lambda^{1/2}=\left( \begin{array}{cccc} \sqrt{1+\alpha\left\|\mathbf{x}\right\|^2}&&&\\ &1&&\\ &&\ddots&\\ &&&1 \end{array} \right). $$
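As a quick numerical sanity check (my addition, not part of the answer): since $V\Lambda^{1/2}V^{\top}=I_n+\left(\sqrt{1+\alpha\left\|\mathbf{x}\right\|^2}-1\right)\mathbf{v}_1\mathbf{v}_1^{\top}$, a short NumPy sketch can confirm that this rank-one closed form squares back to $I_n+\alpha\mathbf{x}\mathbf{x}^{\top}$ (assuming $1+\alpha\left\|\mathbf{x}\right\|^2>0$; all variable names here are mine):

```python
import numpy as np

# Sketch: confirm V Λ^{1/2} Vᵀ, i.e. I + (β - 1) v₁v₁ᵀ with
# β = sqrt(1 + α‖x‖²), is a square root of I + α x xᵀ.
rng = np.random.default_rng(0)
n_dim, alpha = 5, 0.7          # assumes 1 + α‖x‖² > 0
x = rng.standard_normal(n_dim)

nrm2 = x @ x                    # ‖x‖²
beta = np.sqrt(1 + alpha * nrm2)
v1 = x / np.sqrt(nrm2)          # v₁ = x/‖x‖

A = np.eye(n_dim) + (beta - 1) * np.outer(v1, v1)   # = V Λ^{1/2} Vᵀ
M = np.eye(n_dim) + alpha * np.outer(x, x)

print(np.allclose(A @ A, M))    # A² recovers I + α x xᵀ
```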
Denote $A=\left(I_n+\alpha\mathbf{x}\mathbf{x}^{\top}\right)^{1/2}$, and the above result reads $$ A=V\Lambda^{1/2}V^{\top}. $$ Since $V$ is independent of $\alpha$, we have $$ {\rm d}_{\alpha}A={\rm d}_{\alpha}\left(V\Lambda^{1/2}V^{\top}\right)=V\left({\rm d}_{\alpha}\Lambda^{1/2}\right)V^{\top}. $$ Note that $$ {\rm d}_{\alpha}\Lambda^{1/2}={\rm d}_{\alpha}\left( \begin{array}{cccc} \sqrt{1+\alpha\left\|\mathbf{x}\right\|^2}&&&\\ &1&&\\ &&\ddots&\\ &&&1 \end{array} \right)=\left( \begin{array}{cccc} \frac{\left\|\mathbf{x}\right\|^2}{2\sqrt{1+\alpha\left\|\mathbf{x}\right\|^2}}&&&\\ &0&&\\ &&\ddots&\\ &&&0 \end{array} \right){\rm d}\alpha. $$ Thus \begin{align} {\rm d}_{\alpha}A&=V\left({\rm d}_{\alpha}\Lambda^{1/2}\right)V^{\top}\\ &=\left(\mathbf{v}_1,\mathbf{v}_2,\cdots,\mathbf{v}_n\right)\left( \begin{array}{cccc} \frac{\left\|\mathbf{x}\right\|^2}{2\sqrt{1+\alpha\left\|\mathbf{x}\right\|^2}}&&&\\ &0&&\\ &&\ddots&\\ &&&0 \end{array} \right)\left( \begin{array}{c} \mathbf{v}_1^{\top}\\ \mathbf{v}_2^{\top}\\ \vdots\\ \mathbf{v}_n^{\top} \end{array} \right){\rm d}\alpha\\ &=\frac{\left\|\mathbf{x}\right\|^2}{2\sqrt{1+\alpha\left\|\mathbf{x}\right\|^2}}\mathbf{v}_1\mathbf{v}_1^{\top}{\rm d}\alpha\\ &=\frac{1}{2\sqrt{1+\alpha\left\|\mathbf{x}\right\|^2}}\mathbf{x}\mathbf{x}^{\top}{\rm d}\alpha. \end{align} Therefore, $$ \frac{\partial A}{\partial\alpha}=\frac{1}{2\sqrt{1+\alpha\left\|\mathbf{x}\right\|^2}}\mathbf{x}\mathbf{x}^{\top}. $$
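If you want to double-check this formula for $\partial A/\partial\alpha$, a central finite difference should agree with it. A NumPy sketch (my addition; the helper name `sqrt_I_plus` is mine, and I again assume $1+\alpha\left\|\mathbf{x}\right\|^2>0$):

```python
import numpy as np

# Sketch: compare ∂A/∂α = x xᵀ / (2 sqrt(1 + α‖x‖²)) with a
# central finite difference of A(α) = (I + α x xᵀ)^{1/2}.
def sqrt_I_plus(alpha, x):
    """Matrix square root of I + α x xᵀ via the rank-one closed form."""
    nrm2 = x @ x
    beta = np.sqrt(1 + alpha * nrm2)     # assumes 1 + α‖x‖² > 0
    return np.eye(x.size) + (beta - 1) * np.outer(x, x) / nrm2

rng = np.random.default_rng(1)
x = rng.standard_normal(4)
alpha, h = 0.3, 1e-6

numeric = (sqrt_I_plus(alpha + h, x) - sqrt_I_plus(alpha - h, x)) / (2 * h)
analytic = np.outer(x, x) / (2 * np.sqrt(1 + alpha * (x @ x)))
print(np.allclose(numeric, analytic, atol=1e-6))
```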
Similarly, you may work out $$ \frac{\partial A}{\partial\mathbf{x}} $$ by computing $$ {\rm d}_{\mathbf{x}}A={\rm d}_{\mathbf{x}}\left(V\Lambda^{1/2}V^{\top}\right)=\left({\rm d}_{\mathbf{x}}V\right)\Lambda^{1/2}V^{\top}+V\left({\rm d}_{\mathbf{x}}\Lambda^{1/2}\right)V^{\top}+V\Lambda^{1/2}\left({\rm d}_{\mathbf{x}}V\right)^{\top}. $$ Since $\mathbf{x}$ is a vector, you may use the entry-wise form to make the calculation clearer.
Define the matrix $$\eqalign{ A &= \sqrt{I+\alpha xx^T} \cr A^2 &= I+\alpha xx^T \cr }$$ and define separate variables for the length and direction of the $x$ vector $$\eqalign{ \lambda &= \|x\| \cr n &= \lambda^{-1}x \cr }$$
We can create an ortho-projector $P$ with $n$ in its nullspace $$\eqalign{ P &= I - nn^T \cr P^2 &= P^T = P, \qquad Pn = 0 \cr }$$ Notice that $$\eqalign{ (P+\beta nn^T)^2 &= P + \beta^2 nn^T \cr &= I + (\beta^2-1) nn^T \cr }$$ By choosing $\,\,\beta=\sqrt{1+\alpha\lambda^2}\,\,\,$ we can equate this to $A^2$, which then yields
$$\eqalign{ A &= P+\beta nn^T \cr }$$ Now take the derivative wrt $\alpha$ $$\eqalign{ \frac{dA}{d\alpha} &= \frac{d\beta}{d\alpha}\, nn^T \cr &= \frac{\lambda^2}{2\beta}\,nn^T = \frac{1}{2\beta}xx^T \cr }$$ To find the gradient wrt $x$, start by finding differentials of the relevant quantities $$\eqalign{ d\lambda &= \lambda^{-1}x^Tdx = n^Tdx \cr d\beta &= \frac{\alpha\lambda\,d\lambda}{\beta} = \frac{\alpha\lambda}{\beta}\,n^Tdx \cr dn &= \lambda^{-1}P\,dx \cr dP &= -(dn\,n^T+n\,dn^T)\cr }$$ So the differential of $A$ is easy enough to find $$\eqalign{ dA &= nn^T d\beta + \beta\,d(nn^T) + dP \cr &= nn^T d\beta + (\beta-1)(dn\,n^T+n\,dn^T) \cr &= nn^T\,\frac{\alpha\lambda}{\beta}\,(n^T\,dx) + \frac{\beta-\!1}{\lambda}(P\,dx\,n^T+n\,dx^TP) \cr }$$ but casting this into the form of a gradient (i.e. a 3rd order tensor) is impossible in standard matrix notation. You must introduce special (isotropic) higher-order tensors, or resort to vectorization, or to index notation in order to proceed. Index notation is the best route.
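The differential above can itself be sanity-checked numerically: $dA$ evaluated at a small perturbation $dx$ should match $A(x+dx)-A(x)$ to first order. A NumPy sketch (my addition; `A_of` is a hypothetical helper implementing $A=P+\beta nn^T=I+(\beta-1)nn^T$):

```python
import numpy as np

# Sketch: check dA = nnᵀ (αλ/β)(nᵀ dx) + ((β-1)/λ)(P dx nᵀ + n dxᵀ P)
# against the actual change A(x + dx) - A(x) for a small random dx.
def A_of(x, alpha):
    lam = np.linalg.norm(x)
    n = x / lam
    return np.eye(x.size) + (np.sqrt(1 + alpha * lam**2) - 1) * np.outer(n, n)

rng = np.random.default_rng(3)
x = rng.standard_normal(5)
dx = 1e-6 * rng.standard_normal(5)   # small perturbation
alpha = 0.4

lam = np.linalg.norm(x)
n = x / lam
beta = np.sqrt(1 + alpha * lam**2)
P = np.eye(5) - np.outer(n, n)

dA = np.outer(n, n) * (alpha * lam / beta) * (n @ dx) \
   + (beta - 1) / lam * (np.outer(P @ dx, n) + np.outer(n, P @ dx))

print(np.allclose(dA, A_of(x + dx, alpha) - A_of(x, alpha), atol=1e-10))
```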
If you contract the differential with matrices from the standard basis $E_{ij}=e_ie_j^T$ you can find the gradient on a component-wise basis
$$\eqalign{ E_{ij}:dA = dA_{ij} &= n_in_j\,\frac{\alpha\lambda}{\beta}\,(n^T\,dx) + \frac{\beta-\!1}{\lambda}(n_jp_i^T\,dx + n_ip_j^Tdx) \cr \frac{\partial A_{ij}}{\partial x} &= n_in_j\,\frac{\alpha\lambda}{\beta}\,n + \frac{\beta-\!1}{\lambda}(n_jp_i + n_ip_j) \cr }$$ where $p_k=Pe_k$ is the $k^{th}$ column of $P$ and $n_k$ is the $k^{th}$ element of $n$.
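The component-wise gradient $\partial A_{ij}/\partial x$ can likewise be compared with finite differences. A NumPy sketch of mine (again not part of the answer; `A_of` is a hypothetical helper):

```python
import numpy as np

# Sketch: check ∂A_ij/∂x = n_i n_j (αλ/β) n + ((β-1)/λ)(n_j p_i + n_i p_j)
# for one (i, j) entry, where p_k = P e_k is the k-th column of P.
def A_of(x, alpha):
    lam = np.linalg.norm(x)
    n = x / lam
    return np.eye(x.size) + (np.sqrt(1 + alpha * lam**2) - 1) * np.outer(n, n)

rng = np.random.default_rng(2)
x = rng.standard_normal(4)
alpha, h = 0.5, 1e-6
lam = np.linalg.norm(x)
n = x / lam
beta = np.sqrt(1 + alpha * lam**2)
P = np.eye(4) - np.outer(n, n)

i, j = 1, 3
analytic = n[i] * n[j] * (alpha * lam / beta) * n \
         + (beta - 1) / lam * (n[j] * P[:, i] + n[i] * P[:, j])

# Central finite difference of A(x)_ij along each coordinate of x.
numeric = np.empty(4)
for k in range(4):
    e = np.zeros(4); e[k] = h
    numeric[k] = (A_of(x + e, alpha)[i, j] - A_of(x - e, alpha)[i, j]) / (2 * h)

print(np.allclose(numeric, analytic, atol=1e-6))
```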
However, if you just want to calculate how $A$ will change in response to a change in $x$, using the differential is sufficient.