matrix calculus: derivative of $\frac{x^T r}{\sqrt{x^T Sx}}$ with respect to $x$


I'd like to compute the partial derivative of $\frac{x^T r}{\sqrt{x^T S x}}$ with respect to $x_i$, or in other words the gradient with respect to the vector $x$. Here $x$ is an $n$ by $1$ vector, $r$ is an $n$ by $1$ constant vector, and $S$ is an $n$ by $n$ constant matrix.

[EDIT] I used to use $\Sigma$ rather than $S$ in this problem but I realized it was an awful notation choice. To remove ambiguity I'm using $S$ now.

The denominator $\sqrt{x^T S x}$ is quite annoying to handle. I have gone through https://en.wikipedia.org/wiki/Matrix_calculus, but I haven't found a rule analogous to the quotient rule of regular calculus, $(\frac{u}{v})' = \frac{u'v - v'u}{v^2}$, so I am not sure whether a similar rule could get the square root in the denominator out of the way.

As suggested in the comments, here is my work on the component-wise partial derivative. Say we differentiate with respect to $x_1$; below, $'$ denotes the derivative with respect to $x_1$, not the matrix transpose. Then

\begin{equation} \label{eq1} \begin{split} \left(\frac{u}{v}\right)' & = \frac{u'v - v'u}{v^2} \\ & = \frac{ r_1 \sqrt{ x^T S x} - \left(\sqrt{ x^T S x}\right)' x^T r }{ x^T S x} \end{split} \end{equation}

Here the complicated part is $(\sqrt{ x^T S x})'$; my tedious computation shows it should be $\frac{(S x)_1}{\sqrt{x^T S x}}$ (assuming $S$ is symmetric), and things get hairy.
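As a sanity check on that tedious computation, one can compare $\frac{(Sx)_1}{\sqrt{x^T S x}}$ against a central finite difference on toy data (the matrix, vector, and step size below are illustrative; $S$ is taken symmetric positive definite so the square root is well defined):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
S = A @ A.T + n * np.eye(n)   # symmetric positive definite, so x^T S x > 0
x = rng.standard_normal(n)

def g(x):
    return np.sqrt(x @ S @ x)

# Claimed partial derivative with respect to x_1: (S x)_1 / sqrt(x^T S x)
analytic = (S @ x)[0] / g(x)

# Central finite difference in the e_1 direction
h = 1e-6
e1 = np.zeros(n)
e1[0] = h
numeric = (g(x + e1) - g(x - e1)) / (2 * h)

assert np.isclose(analytic, numeric, atol=1e-6)
```

For a non-symmetric $S$ the same check would instead match $\frac{((S+S^T)x)_1}{2\sqrt{x^T S x}}$.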



BEST ANSWER

$f(x) = (x^T r) (x^T S x)^{-1/2}$

Assuming $S$ is symmetric, so that $\nabla (x^T S x) = 2 S x$:

$\begin{equation} \begin{split} \nabla f &= (x^T S x)^{-1/2} \nabla (x^T r) + (x^T r) \nabla (x^T S x)^{-1/2} \\ &= (x^T S x)^{-1/2} r + (x^T r) \left(-\frac{1}{2}\right) (x^T S x)^{-3/2} (2 S x) \\ &= \dfrac{ (x^T S x) r - (x^T r) S x}{(x^T S x)^{3/2}} \\ \end{split} \end{equation}$
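A quick numerical check of this closed form against finite differences (the random test data below is illustrative; $S$ is again assumed symmetric positive definite):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
S = A @ A.T + n * np.eye(n)     # symmetric positive definite
r = rng.standard_normal(n)
x = rng.standard_normal(n)

def f(x):
    return (x @ r) / np.sqrt(x @ S @ x)

# Closed-form gradient: ((x^T S x) r - (x^T r) S x) / (x^T S x)^{3/2}
q = x @ S @ x
grad = (q * r - (x @ r) * (S @ x)) / q**1.5

# Central finite differences, one coordinate at a time
h = 1e-6
fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(n)])

assert np.allclose(grad, fd, atol=1e-5)
```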


$ \def\a{\alpha}\def\b{\beta}\def\p{\partial} \def\h{\frac 12} \def\L{\left}\def\R{\right}\def\LR#1{\L(#1\R)} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} $Use a colon to denote the Frobenius product, which is a convenient notation for the trace $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \big\|A\big\|^2_F \\ }$$ When $(A,B)$ are vectors, this is the usual dot product.

To avoid ambiguity with summations, rename the matrix $\;{\Sigma\to S}$.
Then define the scalar variables $$\eqalign{ \a &= r:x \quad&\implies\quad \c{d\a} = r:dx \\ \b &= \LR{S:xx^T}^{\h} \quad&\implies\quad \b^2 = S:xx^T \\ &\quad&\implies\quad 2\b\,d\b = S:\LR{dx\,x^T+x\,dx^T} = 2Sx:dx \\ &\quad&\implies\quad \c{d\b} = \b^{-1}Sx:dx \\ }$$ Use these scalars to rewrite the objective function.
Then calculate its differential and gradient with respect to $x$. $$\eqalign{ \phi &= \frac{\alpha}{\beta} \\ d\phi &= \LR{\frac{\beta\,\c{d\alpha}-\alpha\,\c{d\beta}}{\beta^2}} \\ &= \LR{\frac{\beta r-\alpha\beta^{-1}Sx}{\beta^2}}:dx \\ &= \LR{\frac{\beta^2r-\a Sx}{\beta^3}}:dx \\ \grad{\phi}{x} &= \LR{\frac{\beta^2r-\a Sx}{\beta^3}} \\ }$$
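The key step above, $d\b = \b^{-1}Sx:dx$, can be verified numerically as a directional derivative: for any direction $v$, the derivative of $\b$ along $v$ should equal $\b^{-1}(Sx)^T v$. A minimal sketch with illustrative data (symmetric positive definite $S$):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
S = A @ A.T + n * np.eye(n)    # symmetric positive definite
x = rng.standard_normal(n)
v = rng.standard_normal(n)     # arbitrary direction dx

def beta(x):
    return np.sqrt(x @ S @ x)

# Directional derivative of beta along v, by central difference
h = 1e-6
numeric = (beta(x + h * v) - beta(x - h * v)) / (2 * h)

# Claimed differential: d(beta) = beta^{-1} (S x) : dx
analytic = (S @ x) @ v / beta(x)

assert np.isclose(numeric, analytic, atol=1e-6)
```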