Partial derivatives of $A^n x$ with respect to the entries of $A$

99 Views Asked by At

Let $A \in \mathbb{R}^{n \times n}$ and $x \in \mathbb{R}^{n \times 1}$. I am wondering what the partial derivative of each entry of $A^n x$ with respect to the entries of $A$ is. Is there a closed form expression for this?

$$\frac{\partial}{\partial A_{i,j}} A^nx$$

I wasn't able to find an answer.

2

There are 2 best solutions below

3
On BEST ANSWER

Solution

Letting $e_k$ be $k$-th standard basis vector, the derivative is $$ \boxed{ \frac{\partial}{\partial A_{ij}} \left[A^{n}x\right]=\left(\sum_{k=0}^{n-1}A^{k}e_i e_j^\intercal A^{n-k-1}\right)x } $$

Derivation

For brevity, define $E=e_{i}e_{j}^{\intercal}$ and $\partial_{ij}\equiv\partial / \partial A_{ij}$. Then, \begin{align*} \partial_{ij}\left[A^{n}\right] & =\partial_{ij}\left[AA^{n-1}\right]\\ & =\partial_{ij}\left[A\right]A^{n-1}+A\partial_{ij}\left[A^{n-1}\right]\\ & =EA^{n-1}+A\partial_{ij}\left[A^{n-1}\right]. \end{align*} Therefore, by induction, \begin{align*} \partial_{ij}\left[A^{n}\right] & =EA^{n-1}+A\partial_{ij}\left[A^{n-1}\right]\\ & =EA^{n-1}+AEA^{n-2}+A^{2}\partial_{ij}\left[A^{n-2}\right]\\ & =\cdots\\ & =EA^{n-1}+AEA^{n-2}+\cdots+A^{n-1}EA^{n-n}+A^{n}\partial_{ij}\left[A^{n-n}\right]\\ & =EA^{n-1}+AEA^{n-2}+\cdots+A^{n-1}E\\ & =\sum_{k=0}^{n-1}A^{k}EA^{n-k-1}. \end{align*}

Verification

Here is some code to numerically verify (with finite differences) 1000 random instances. Note that in the code, the matrix powers are not computed efficiently.

import numpy as np
from numpy.linalg import matrix_power as mpow
import pytest

n_params = 1000
params = zip(
    np.random.randint(1, high=10, size=n_params),
    np.random.randint(1, high=4, size=n_params),
    np.random.uniform(size=n_params),
    np.random.uniform(size=n_params),
)
params = list(params)


@pytest.mark.parametrize('m,n,i_frac,j_frac', params)
def test_formula(m, n, i_frac, j_frac, h=1e-6, rtol=1e-3):
    i = int(i_frac * m)
    j = int(j_frac * m)

    A = np.random.randn(m, m)
    x = np.random.randn(m)

    e_i = np.zeros(m)
    e_j = np.zeros(m)
    e_i[i] = 1.
    e_j[j] = 1.
    E = np.outer(e_i, e_j)

    # Compute derivative using formula.
    D = sum(mpow(A, i) @ E @ mpow(A, n - i - 1) for i in range(0, n))
    exact = D @ x

    # Compute derivative numerically.
    delta = mpow(A + h * E, n) @ x - mpow(A, n) @ x
    approx = delta / h

    np.testing.assert_allclose(exact, approx, rtol=rtol)
1
On

Consider the differential \begin{eqnarray*} d\mathbf{u} &=& (d\mathbf{A}^n) \mathbf{x} = \left[ \sum_{k=0}^{n-1} \mathbf{A}^{k} (d\mathbf{A}) \mathbf{A}^{n-1-k} \right] \mathbf{x} \end{eqnarray*} Since $\mathbf{A}= \sum_{i,j} A_{ij} \mathbf{E}_{ij}$, we simply write $ d\mathbf{A} = (dA_{ij}) \mathbf{E}_{ij} $ where $ \mathbf{E}_{ij} = \mathbf{e}_i \mathbf{e}_j^T$. Finally $$ d\mathbf{u} = dA_{ij}\cdot \left[ \sum_{k=0}^{n-1} \mathbf{A}^{k} \mathbf{E}_{ij} \mathbf{A}^{n-1-k} \right] \mathbf{x} $$ which gives the derivative $$ \frac{\partial \mathbf{u}}{\partial A_{ij}} = \left[ \sum_{k=0}^{n-1} \mathbf{A}^{k} \mathbf{E}_{ij} \mathbf{A}^{n-1-k} \right] \mathbf{x} $$