How to compute the derivative of a composite function in vector form?


For example:

$f(x)=(xx^T)^{-\frac{1}{2}}x$, where $x \in \mathbb R_{+}^{1\times d}$ is a row vector.

I am hoping for an answer with a specific theoretical basis (a derivation of the formula and where it comes from).

(Revised)

Supplement:

For a matrix $A$, consider $\|A\|_{2,1}$, the norm used in the paper "Efficient and Robust Feature Selection via Joint $l_{2,1}$-Norms Minimization".

Following on from the problem above, how can the second derivative of this norm be computed?

Some related work: The norm $\|\cdot\|_{2,1}$ of a matrix $A=(a_1,\ldots, a_n)\in\mathbb{R}^{m\times n}$ is defined as

$$ \Vert A \Vert_{2,1} = \sum_{j=1}^n \Vert a_{j} \Vert_2 = \sum_{j=1}^n \left( \sum_{i=1}^m |a_{ij}|^2 \right)^{1/2} $$
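To make the definition concrete, here is a minimal pure-Python sketch that evaluates it directly. The function name `l21_norm` and the list-of-rows representation of $A$ are my own choices for illustration, not something taken from the paper.

```python
import math

def l21_norm(A):
    """Sum of the Euclidean norms of the columns of A.

    A is assumed to be a non-empty list of rows of equal length,
    matching the column-wise definition written above.
    """
    n_cols = len(A[0])
    return sum(
        math.sqrt(sum(row[j] ** 2 for row in A))  # ||a_j||_2 for column j
        for j in range(n_cols)
    )

# Columns are (3, 4) and (1, 0).
A = [[3.0, 1.0],
     [4.0, 0.0]]
print(l21_norm(A))
```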

Thank you all for your help.


There are 2 best solutions below

3
On BEST ANSWER

Given a column vector $x$, consider how its length $\lambda$ varies as $x$ is varied. Differentiating $\lambda^2 = x^Tx$ gives $2\lambda\,d\lambda = 2x^Tdx$, so

$$\eqalign{ \lambda^2 &= x^Tx,\quad \lambda\,d\lambda = x^Tdx \cr }$$

Now consider the unit vector $f$ and its variation with $x$.

$$\eqalign{ f &= \lambda^{-1}x \cr df &= \lambda^{-1}dx - x\lambda^{-2}\,d\lambda \cr &= \lambda^{-1}dx - x\lambda^{-3}\,(\lambda\,d\lambda) \cr &= \lambda^{-1}Idx - x\lambda^{-3}(x^Tdx) \cr &= \lambda^{-3}\Big(\lambda^2I -xx^T\Big)\,dx \cr \frac{\partial f}{\partial x} &= \lambda^{-3}\Big(\lambda^2I -xx^T\Big) \cr }$$
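One way to sanity-check the formula $\lambda^{-3}(\lambda^2I - xx^T)$ is to compare it with a central finite-difference approximation. The sketch below uses only the Python standard library; the function names, the step size `h`, and the test point $x = (3, 4)$ are assumptions chosen for illustration. Because the matrix is symmetric, the check does not depend on the layout convention.

```python
import math

def unit_vector(x):
    """f(x) = x / ||x|| for a vector given as a list of floats."""
    lam = math.sqrt(sum(v * v for v in x))
    return [v / lam for v in x]

def jacobian_closed_form(x):
    """Entries of lambda^{-3} (lambda^2 I - x x^T)."""
    lam2 = sum(v * v for v in x)
    lam3 = lam2 ** 1.5
    n = len(x)
    return [[((lam2 if i == j else 0.0) - x[i] * x[j]) / lam3
             for j in range(n)] for i in range(n)]

def jacobian_finite_diff(x, h=1e-6):
    """Central differences: entry (i, j) approximates d f_i / d x_j."""
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = unit_vector(xp), unit_vector(xm)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

x = [3.0, 4.0]
J_exact = jacobian_closed_form(x)
J_num = jacobian_finite_diff(x)
max_err = max(abs(J_exact[i][j] - J_num[i][j])
              for i in range(len(x)) for j in range(len(x)))
print(J_exact)
print(max_err)  # should be tiny if the closed form is right
```

A further check is that the Jacobian applied to $x$ itself gives zero, since rescaling $x$ does not change the unit vector $f$.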

0
On

Remember that differentiating with respect to a tuple of variables simply means taking the derivative with respect to each variable and then organizing the derivatives in a tuple. There are various ways to do this. Here I'll use numerator layout, so that if $\phi$ is a scalar-valued function and $x$ is a row, then we write the derivatives as a column.

$$\frac{\partial \phi}{\partial x} = \begin{bmatrix} \displaystyle\frac{\partial \phi}{\partial x_1}\\ \vdots\\ \displaystyle\frac{\partial \phi}{\partial x_n} \end{bmatrix}$$

i.e.

$$\left(\frac{\partial \phi}{\partial x}\right)^i = \frac{\partial \phi}{\partial x_i}$$ (where I've used a superindex on $\frac{\partial \phi}{\partial x}$ because it is a column).

In your case, $f$ is a row-valued function, and $x$ is a row, so the derivative $\frac{\partial f}{\partial x}$ will be a matrix

$$\frac{\partial f}{\partial x} = \begin{bmatrix} \displaystyle\frac{\partial f_{1}}{\partial x_1} & \dots & \displaystyle\frac{\partial f_{n}}{\partial x_1}\\ \vdots & \ddots & \vdots\\ \displaystyle\frac{\partial f_{1}}{\partial x_n} & \dots & \displaystyle\frac{\partial f_{n}}{\partial x_n} \end{bmatrix}$$

Note that in this case we have

$$\left(\frac{\partial f}{\partial x}\right)^{i}_{j} = \frac{\partial f_{j}}{\partial x_{i}}$$

because we want upper indices to indicate which row we are in and lower indices to indicate which column we are in.


Another thing we'll need is that if $x$ is a row, then $$(x^{T})^{i} = \sum_j x_j\delta^{ji}$$ where $\delta^{ij}$ is the Kronecker delta symbol $$\delta^{ij}=\delta_{ij}=\delta^{i}_{j} = \begin{cases}1 & i=j \\ 0 &i\neq j\end{cases}$$

(also note that $\delta^{i}_{j}$ are the components of the identity matrix).


Okay, then. So we have your function:

$$f(x) = \frac{x}{\sqrt{xx^{T}}} = (xx^{T})^{-1/2} x$$

We calculate its derivative using the Leibniz (product) rule first:

$$\frac{\partial f}{\partial x}(x) = \frac{\partial(xx^{T})^{-1/2}}{\partial x}x + (xx^{T})^{-1/2} \frac{\partial x}{\partial x}$$

Then, by the chain rule,

$$\frac{\partial f}{\partial x}(x) = \frac{-1}{2}(xx^{T})^{-3/2}\frac{\partial xx^{T}}{\partial x}x + (xx^{T})^{-1/2}\frac{\partial x}{\partial x}$$

and then, by the Leibniz rule again,

$$\frac{\partial f}{\partial x}(x) = \frac{-1}{2}(xx^{T})^{-3/2}\left(\frac{\partial x}{\partial x}x^{T}+x\frac{\partial x^{T}}{\partial x}\right)x + (xx^{T})^{-1/2}\frac{\partial x}{\partial x}$$

You can check that $\frac{\partial x}{\partial x}$ is the identity matrix, so we have

$$\frac{\partial f}{\partial x}(x) = \frac{-1}{2}(xx^{T})^{-3/2}\left(x^{T}+x\frac{\partial x^{T}}{\partial x}\right)x + (xx^{T})^{-1/2}I$$

All that remains is to calculate $x\frac{\partial x^{T}}{\partial x}$. For this I'll use component notation:

$$\left(x\frac{\partial x^{T}}{\partial x}\right)^{i} = x\frac{\partial x^{T}}{\partial x_i} = \sum_{j}x_j\frac{\partial \sum_k x_k\delta^{jk}}{\partial x_i} = \sum_{j,k}x_j\frac{\partial x_k}{\partial x_i}\delta^{jk} = \sum_{j,k}x_j\delta^{i}_{k}\delta^{jk} = \sum_{j}x_j\delta^{ji}$$

Hence $x\frac{\partial x^{T}}{\partial x} = x^{T}$, so the bracket equals $2x^{T}$, which cancels the factor $\frac{1}{2}$, and we finally get

$$\frac{\partial f}{\partial x}(x) = -(xx^{T})^{-3/2}x^{T}x + (xx^{T})^{-1/2}I = \frac{-1}{\sqrt{xx^{T}}^{3}}x^{T}x + \frac{1}{\sqrt{xx^{T}}}I$$
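As a quick worked check of the final formula, take the row vector $x = (3, 4)$ (a test point I picked only for illustration). Then $xx^{T} = 25$, so $(xx^{T})^{-1/2} = \frac{1}{5}$ and $(xx^{T})^{-3/2} = \frac{1}{125}$, and

$$\frac{\partial f}{\partial x}(x) = -\frac{1}{125}\begin{bmatrix} 9 & 12\\ 12 & 16 \end{bmatrix} + \frac{1}{5}\begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix} = \frac{1}{125}\begin{bmatrix} 16 & -12\\ -12 & 9 \end{bmatrix}$$

Multiplying by $x$ gives zero,

$$x\,\frac{\partial f}{\partial x}(x) = \frac{1}{125}\begin{bmatrix} 3\cdot 16 - 4\cdot 12 & -3\cdot 12 + 4\cdot 9 \end{bmatrix} = \begin{bmatrix} 0 & 0 \end{bmatrix}$$

which is what one expects, since rescaling $x$ does not change $f(x) = x/\sqrt{xx^{T}}$.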

Disclaimer: As mentioned at the beginning, there are different conventions for how derivatives with respect to a tuple should be organized. A different choice of convention, such as denominator layout, could lead to different results.