Compute the gradient of $f(x)=\|\text{diag}(x)\|$ with the chain rule

343 Views Asked by At

Consider the function $f:\mathbb{R}^n\to\mathbb{R}$ given by $f(x)=\|\text{diag}(x)\|$, where $\text{diag}(x)\in\mathbb{R}^{n\times{n}}$ is the diagonal matrix with diagonal entries $x_1,x_2,\dots,x_n$, and $\|\cdot\|$ is the spectral norm (matrix 2-norm).

Since the spectral norm of a matrix is its largest singular value, and the singular values of a (square) diagonal matrix are the absolute values of the diagonal entries, we see that $f(x)=\|x\|_\infty$, where $\|\cdot\|_\infty$ is the (vector) sup-norm. In this form, it is easier to deduce the properties of $f$--in particular, it is differentiable at any point $x\in\mathbb{R}^n$ where the largest element of $x$ (in absolute value) is unique. At such a point, the gradient of $f$ is given by $$ \nabla{f(x)}=\text{sgn}(x_k)e_k $$ where $k$ is the index of the (unique) largest entry of $x$ (in absolute value), $e_k$ is the $k^\text{th}$ standard basis vector in $\mathbb{R}^n$, and $\text{sgn}(\cdot)$ is the sign function.

I want to deduce the above expression for the gradient using the chain rule applied to $f(x)=(g\circ{h})(x)$, where $g:\mathbb{R}^{n\times{n}}\to\mathbb{R}$ is given by $g(A)=\|A\|$, and $h:\mathbb{R}^n\to\mathbb{R}^{n\times{n}}$ is given by $h(x)=\text{diag}(x)$.

The "Jacobian" of $h$ is a three-dimensional object, where $$ \frac{\partial[h(x)]_{ij}}{\partial{x_k}}=\begin{cases}1,&i=j=k,\\0,&\text{else.}\end{cases} $$ The function $g$ is differentiable at any point $A$ where $A$ has a unique largest singular value, in which case the gradient(?) is given by $$ \nabla{g(A)}=uv^\text{T}, $$ where $u$ and $v$ are the left and right singular vectors (respectively) corresponding to the (unique) largest singular value of $A$.

So I essentially have a three-dimensional object and a 2-dimensional object, and I want to apply the chain rule to get the gradient, a 1-dimensional object (i.e. a vector). A straightforward application suggests "multiplying them together" (not sure that concept is even defined), which seems like it would produce a matrix. What simple thing am I missing here?

1

There are 1 best solutions below

3
On BEST ANSWER

A hint rather than a complete answer

It might help to think of $\Bbb R^{n \times n}$ as $\Bbb R^{n^2}$.

Now $h$ is a function from $\Bbb R^n$ to $\Bbb R^{n^2}$ and $g$ is a function from $\Bbb R^{n^2}$ to $\Bbb R$, so the composite goes from $\Bbb R^n$ to $\Bbb R$. Can you work out the $k$th partial derivative of this?

When you do, you may find out that the result you get is surprisingly closely related to matrix multiplication (of some sort) of the derivatives of $h$ and $g$...or you may not.

Just to get you started, $$\nabla g(A)_{ni + j} = u_i v_j,$$ where I'm working with indices that go from $0$ to $n-1$ here.

Can you now work out the $ni + j$th entry of the $k$th partial derivative of $g \circ h$?