Calculate $\left|\frac{\text{d}}{\text{d}\mathbf{Y}}\left[A^{-1}\left(\mathbf{Y}-\boldsymbol{\mu}\right)\right]\right|$.

Let $\mathbf{Y} \in \mathbb{R}^n$ be a column vector, $\boldsymbol{\mu} \in \mathbb{R}^n$, and $A$ be an $n\times n$ matrix of constants not dependent on $\mathbf{Y}$.

Definition. $\boldsymbol{\Sigma} = AA^{T}$, and we assume $\boldsymbol{\Sigma}$ is positive definite (this follows immediately from $A$ being invertible, which is needed anyway for $A^{-1}$ to exist).

Calculate $\left|\dfrac{\text{d}}{\text{d}\mathbf{Y}}\left[A^{-1}\left(\mathbf{Y}-\boldsymbol{\mu}\right)\right]\right|$ in terms of $|\boldsymbol{\Sigma}|$.

Problem. I haven't actually been given a definition of $\dfrac{\text{d}}{\text{d}\mathbf{Y}}\left[A^{-1}\left(\mathbf{Y}-\boldsymbol{\mu}\right)\right]$ for this problem, other than that it is the "matrix of partial derivatives" (no further explanation beyond that). For those of you who are familiar with multivariate statistics, this is used in the derivation of the PDF of the multivariate normal distribution.

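Approach #1. Use a computational formula that I found in a different book: $$\dfrac{\text{d}}{\text{d}\mathbf{Y}}\left[A^{-1}\left(\mathbf{Y}-\boldsymbol{\mu}\right)\right] = A^{-1}$$ and then, using $A^{-1} = A^{T}\boldsymbol{\Sigma}^{-1}$ together with $|A^{T}| = |A| = |A^{-1}|^{-1}$: $$|A^{-1}| = |A^{T}||\boldsymbol{\Sigma}^{-1}| \Longleftrightarrow \left(|A^{-1}|\right)^2=|\boldsymbol{\Sigma}|^{-1} \implies \left|\det A^{-1}\right|=|\boldsymbol{\Sigma}|^{-1/2}\text{,}$$ where the last step takes the absolute value of the determinant (its sign is not determined by $\boldsymbol{\Sigma}$, and the change-of-variables formula only needs the absolute value). This is exactly what I want.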
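As a quick numerical sanity check of Approach #1, here is a minimal NumPy sketch. The matrix `A` below is a hypothetical example chosen for illustration (it is not from the problem); it happens to have a negative determinant, which shows why the absolute value matters in the last step.

```python
import numpy as np

# Hypothetical invertible matrix, chosen only for illustration (det A = -5).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, -1.0]])

Sigma = A @ A.T              # Sigma = A A^T, as in the definition above
A_inv = np.linalg.inv(A)

# Check the identity A^{-1} = A^T Sigma^{-1} used in Approach #1.
identity_holds = np.allclose(A_inv, A.T @ np.linalg.inv(Sigma))

# Compare |det A^{-1}| with det(Sigma)^{-1/2}.
abs_det_A_inv = abs(np.linalg.det(A_inv))
sigma_term = np.linalg.det(Sigma) ** -0.5

print(identity_holds, abs_det_A_inv, sigma_term)
```

Both numbers should agree up to floating-point error, even though $\det A^{-1}$ itself is negative here.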

Approach #2. Compute the matrix of partial derivatives elementwise, and take the determinant.

I'm not sure how to do it this way, but I think it's the way that my text wants me to approach the problem. My guess is that it should look something like this:

$$\dfrac{\text{d}}{\text{d}\mathbf{Y}}\left[A^{-1}\left(\mathbf{Y}-\boldsymbol{\mu}\right)\right] = \begin{bmatrix} \dfrac{\partial [a_1 (y_1 - \mu_1)]}{\partial y_1} & \cdots & \dfrac{\partial [a_1 (y_n - \mu_n)]}{\partial y_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial [a_n (y_1 - \mu_1)]}{\partial y_1} & \cdots & \dfrac{\partial [a_n (y_n - \mu_n)]}{\partial y_n} \end{bmatrix}$$ where $A^{-1}=[a_{ij}]$, $\boldsymbol{\mu}=[\mu_i]$ and $\mathbf{Y} = [y_i]$.

On second thought, this doesn't seem right because I think the elements of the resulting matrix should be sums.

How do I do this problem using approach #2?

There are 2 best solutions below

On BEST ANSWER

You almost have it.

(In case you don't know about it, I'll use the Einstein convention of summing over repeated indices here. That is, a sum like $\sum_ka^i_kb^k$ will be written simply as $a^i_kb^k$.)

Call the vector $A^{-1}({\bf Y} - \boldsymbol{\mu})\equiv {\bf F}({\bf Y})$. ${\bf F}$ is a function mapping vectors into vectors, i.e., ${\bf F}:\mathbb{R}^n\rightarrow \mathbb{R}^n$. Its gradient (the Jacobian, i.e. the "matrix of partial derivatives") is a matrix $\boldsymbol{\nabla}{\bf F}$ given by $$(\boldsymbol{\nabla}{\bf F})^{ij}\,\equiv\,\frac{\partial {\bf F}^i}{\partial {\bf Y}^j}\,=\,\frac{\partial [(A^{-1})^i_k\,(y^k-\mu^k)]}{\partial y^j}.$$

Using your notation, the first row is $$\frac{\partial[a^1_k(y^k-\mu^k)]}{\partial y^1},\cdots,\frac{\partial[a^1_k(y^k-\mu^k)]}{\partial y^n}$$ and the last row is $$\frac{\partial[a^n_k(y^k-\mu^k)]}{\partial y^1},\cdots,\frac{\partial[a^n_k(y^k-\mu^k)]}{\partial y^n}.$$

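Since $A^{-1}$ and $\boldsymbol{\mu}$ are constant and $\partial y^k/\partial y^j = \delta^k_j$, the sum over $k$ collapses to a single term: $(\boldsymbol{\nabla}{\bf F})^{ij}\,=\,(A^{-1})^i_k\,\delta^k_j\,=\,(A^{-1})^i_j$. That is, $\boldsymbol{\nabla}{\bf F} = A^{-1}$, which is the result you got above.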

Note: If you want to explicitly show the summation symbol, the gradient is $$(\boldsymbol{\nabla}{\bf F})^{ij}\,=\,\sum_k\frac{\partial [(A^{-1})^i_k\,(y^k-\mu^k)]}{\partial y^j}$$
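To make the sums concrete, here is the case $n = 2$ written out in full. This is only an illustrative special case; as in the question, $a_{ij}$ denotes the entries of $A^{-1}$ (not of $A$). The components of ${\bf F}$ are $$F_1 = a_{11}(y_1-\mu_1) + a_{12}(y_2-\mu_2), \qquad F_2 = a_{21}(y_1-\mu_1) + a_{22}(y_2-\mu_2),$$ so each entry of the matrix of partial derivatives is the derivative of a sum, but only one term of that sum depends on the variable being differentiated: $$\boldsymbol{\nabla}{\bf F} = \begin{bmatrix} \dfrac{\partial F_1}{\partial y_1} & \dfrac{\partial F_1}{\partial y_2} \\ \dfrac{\partial F_2}{\partial y_1} & \dfrac{\partial F_2}{\partial y_2} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = A^{-1}.$$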

On

It is sometimes worth going back to the original definition of the (Gâteaux) derivative. This avoids messy partials and the need to pass to a particular basis; the derivative is a linear map that is independent of the coordinate system you use. If $f(Y)$ is a vector-valued function of a vector variable $Y$, then for another vector $V$ (with the same dimensions as $Y$), the derivative $df/dY$ at $Y$, applied to the direction $V$, is $$ \frac{df(Y)}{dY}(V) = \lim_{h \rightarrow 0} \frac{f(Y + hV) - f(Y)}{h}. $$ In your case $f(Y) = A^{-1}(Y - \mu)$, so $f(Y + hV) - f(Y) = hA^{-1}V$ and the limit is $A^{-1}V$ for every $V$. The derivative is therefore the linear map $V \mapsto A^{-1}V$, i.e. the matrix $A^{-1}$, which is the result you wanted.
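If you want to see the limit definition in action, here is a small NumPy sketch that approximates the directional derivative with a finite difference and compares it with $A^{-1}V$. The matrix `A`, the vectors `mu`, `Y`, `V`, and the helper name `directional_derivative` are all hypothetical choices made for this illustration.

```python
import numpy as np

# Hypothetical data, chosen only for illustration.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, -1.0]])
mu = np.array([1.0, -2.0, 0.5])
A_inv = np.linalg.inv(A)

def f(Y):
    """f(Y) = A^{-1} (Y - mu)."""
    return A_inv @ (Y - mu)

def directional_derivative(func, Y, V, h=1e-6):
    """Finite-difference approximation of (func(Y + hV) - func(Y)) / h."""
    return (func(Y + h * V) - func(Y)) / h

Y = np.array([0.3, 1.2, -0.7])   # point at which we differentiate
V = np.array([1.0, 0.0, 2.0])    # direction

approx = directional_derivative(f, Y, V)
exact = A_inv @ V                # what the limit should give

print(np.allclose(approx, exact, atol=1e-6))
```

Because $f$ is affine, the difference quotient equals $A^{-1}V$ for every $h \neq 0$ (up to rounding), so the choice of `h` is not delicate here.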