Relating a derivative with respect to a matrix to a total derivative (differential) of columns/rows


I have a function $f(\boldsymbol{X}): \mathbb{R}^{m\times n} \rightarrow \mathbb{R}$ (with $m=n$ where needed, so that $\det\boldsymbol{X}$ is defined):

$$f(\boldsymbol{X}) = \text{Tr}(S\boldsymbol{X}) - \log\det \boldsymbol{X}$$

If I now want to minimize that function with respect to $\boldsymbol{X}$, I can write its derivative with respect to the matrix $\boldsymbol{X}$:

$$\frac{df}{d\boldsymbol{X}} = S^T - X^{-T}$$

Setting it equal to $0$ we have:

$$0 = S^T - X^{-T} \Rightarrow X = S^{-1}$$

in some "appropriate" sense.
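As a numerical sanity check, here is a small NumPy sketch (the symmetric positive-definite $S$ and the size $n=4$ are arbitrary stand-ins, not from the original post) that the gradient $S^T - X^{-T}$ matches a finite-difference derivative and vanishes at $X = S^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
S = A @ A.T + n * np.eye(n)        # hypothetical SPD matrix standing in for S

def f(X):
    # f(X) = Tr(S X) - log det X
    return np.trace(S @ X) - np.linalg.slogdet(X)[1]

def grad(X):
    # df/dX = S^T - X^{-T}
    return S.T - np.linalg.inv(X).T

# candidate minimizer from the first-order condition
X_star = np.linalg.inv(S)
print(np.linalg.norm(grad(X_star)))        # ~ 0

# finite-difference check of one entry of the gradient at a generic point
X = np.eye(n) + 0.1 * rng.standard_normal((n, n))
h = 1e-6
E = np.zeros((n, n)); E[1, 2] = 1.0
fd = (f(X + h * E) - f(X - h * E)) / (2 * h)
print(np.isclose(fd, grad(X)[1, 2], atol=1e-4))   # True
```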

I can also think of that function as:

$$f(c_1,c_2,\ldots,c_n): \underbrace{\mathbb{R}^{m}\times\mathbb{R}^m\times\ldots\times \mathbb{R}^m}_{n \; \text{many times}} \rightarrow \mathbb{R}$$

That is, as a function of the columns.


Quick aside: If, say, $\boldsymbol{X}$ were triangular $(m=n)$, we could capture this as:

$$f(c_1,c_2,\ldots,c_n): \mathbb{R}^{1}\times\mathbb{R}^2\times\ldots\times \mathbb{R}^{n-1}\times \mathbb{R}^n \rightarrow \mathbb{R}$$


My Problem: I am having trouble finding the equivalent of the $\frac{df}{d\boldsymbol{X}}$ object in the new way of thinking of the problem in terms of a function of its columns (or rows).


My thoughts: In my mind, the derivative with respect to $\boldsymbol{X}$ would have to correspond to the total derivative/differential of the new function. But that would mean something like:

$$df = \frac{\partial f}{\partial c_1} dc_1 + \cdots + \frac{\partial f}{\partial c_n} d c_n$$

But here, it's not clear to me what that object means, let alone what the $d c_i$ objects mean. Also, I don't know how to actually take the total derivative/differential to end up with something I can easily "set equal to $0$" and solve.


Note: I think this question is related to my previous optimization question. I'll probably delete that one, as I feel this one is a clearer way of expressing my confusion.

Answer:

The question is how to calculate the gradient of $f$ with respect to the $k^{\text{th}}$ column of $X$:
$$c_k = Xe_k$$

That's easy: it's the $k^{\text{th}}$ column of $\frac{\partial f}{\partial X}$, which is given by
$$\frac{\partial f}{\partial c_k} = \frac{\partial f}{\partial X}\,e_k$$
where $e_k$ is the standard basis vector whose $k^{\text{th}}$ component is equal to $1$ and all other components are equal to $0$.
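This can be verified numerically: a NumPy sketch (again with an arbitrary stand-in SPD matrix $S$ and $n=4$) where perturbing only column $k$ of $X$ and differencing $f$ recovers the $k^{\text{th}}$ column of $\partial f/\partial X$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
S = A @ A.T + n * np.eye(n)        # stand-in SPD matrix

def f(X):
    return np.trace(S @ X) - np.linalg.slogdet(X)[1]

X = np.eye(n) + 0.1 * rng.standard_normal((n, n))
G = S.T - np.linalg.inv(X).T       # df/dX from the question

# perturb only column k: dX = v e_k^T for a random direction v
k, h = 2, 1e-6
e_k = np.zeros(n); e_k[k] = 1.0
v = rng.standard_normal(n)
dX = np.outer(v, e_k)

fd = (f(X + h * dX) - f(X - h * dX)) / (2 * h)

# should match the column gradient dotted with the direction: (G e_k) . v
print(np.isclose(fd, G[:, k] @ v, atol=1e-4))   # True
```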


The differential of the function can be expressed as the sum of the column gradients times the column differentials $$df = \sum_k \,\frac{\partial f}{\partial c_k}\cdot dc_k$$ as you observed.

You can go even further and ask about an expression in terms of the individual elements of $X$, which would be $$df = \sum_j\sum_k \,\frac{\partial f}{\partial X_{jk}}\,\,dX_{jk}$$ This can be written in a very compact form using the double-dot (aka Frobenius) product $$ df = \frac{\partial f}{\partial X}:dX $$ The differential $dX$ in this expression is completely arbitrary. It can consist of all zeros except for one column, or one row, or just the diagonal, or a single element.
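The Frobenius-product identity can also be checked numerically. A NumPy sketch (stand-in SPD $S$, $n=4$, both assumptions rather than anything from the post) with a $dX$ that is zero except for one row and one extra element, as described above:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
S = A @ A.T + n * np.eye(n)        # stand-in SPD matrix

def f(X):
    return np.trace(S @ X) - np.linalg.slogdet(X)[1]

X = np.eye(n) + 0.1 * rng.standard_normal((n, n))
G = S.T - np.linalg.inv(X).T       # df/dX

# an arbitrary dX: zeros except one row and one extra element
dX = np.zeros((n, n))
dX[1, :] = rng.standard_normal(n)
dX[3, 0] = 1.0

h = 1e-6
df = (f(X + h * dX) - f(X - h * dX)) / (2 * h)
frob = np.sum(G * dX)              # the double-dot product  G : dX
print(np.isclose(df, frob, atol=1e-4))   # True
```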