Let $f:\mathbb{R}^{n \times n} \to \mathbb{R}$ be defined as: $$ f(A)= x^T (A^2)^i y + v^T A^i w, $$ where $i \in \mathbb{N}$ and $x,y,v,w$ are some fixed column vectors. One can assume that $A$ is a symmetric matrix. I am interested in computing the gradient of $f$ with respect to $A$. The only rule that I know is: $$ \frac{\partial x^T (A^TA)y }{\partial A}=A(xy^T+yx^T). $$ Can anyone help me to find the derivative $\frac{\partial f(A)}{\partial A}$?
2026-04-04 00:34:33
Gradient of quadratic forms involving matrix powers
106 Views, asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)
There is 1 best solution below.
Warning: Lots of tedious algebra below, so there's definitely room for errors. I'll see if I can validate this result by other means.
To make the notation less confusing, I'll write the exponent as $N$ rather than $i$ so that I can use lower-case Latin letters such as $i$ for indices. I will also adopt the Einstein convention, i.e. doubled indices are to be summed over.
First, let's translate $f(A)$ into a sum over indices. Inserting dummy indices for each of the matrix multiplications yields \begin{align} v^T A^N w &=v_k (A^N)_{kl}w_l \\ &=v_k A_{kj_1}A_{j_1j_2}\cdots A_{j_{N-1}l}w_l,\\\\ x^T (A^2)^N y &=x_k \left((A^2)^N\right)_{kl}y_l\\ &=x_k(A^2)_{k j_1}(A^2)_{j_1j_2}\cdots(A^2)_{j_{N-1}l}y_l\\ &=x_kA_{ki_1}A_{i_1j_1}A_{j_1i_2}A_{i_2j_2}\cdots A_{j_{N-1}i_N}A_{i_N l}y_l. \end{align}
We can now differentiate with respect to a matrix element $A_{ab}$. We have $(\partial A_{ij}/\partial A_{ab})=\delta_{ia}\delta_{jb}$, so the linear term gives
\begin{align} \frac{\partial}{\partial A_{ab}}\left(v^T A^N w\right) &=v_k (\delta_{ka}\delta_{j_1b})A_{j_1j_2}\cdots A_{j_{N-1}l}w_l+\cdots+v_k A_{kj_1}A_{j_1j_2}\cdots (\delta_{j_{N-1}a}\delta_{lb})w_l \\ &=v_aA_{bj_1}\cdots A_{j_{N-1}l}w_l+\cdots+v_k A_{kj_1}A_{j_1j_2}\cdots A_{j_{N-2}a}w_b \\ &=(v^T)_a (A^{N-1} w)_b+(v^T A)_a(A^{N-2}w)_b+\cdots+(v^T A^{N-1})_a (w)_b\\ &=(v)_a (w^T A^{N-1})_b+(Av)_a(w^T A^{N-2})_b+\cdots+(A^{N-1}v)_a (w^T)_b, \end{align}
where in the last line I have both used $A^T=A$ and swapped column vectors with row vectors (and vice versa). Similarly, for the quadratic term (going directly to the result) we obtain
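As a sanity check on the linear-term gradient $\sum_{k=0}^{N-1}A^k v w^T A^{N-1-k}$, here is a small NumPy sketch comparing it against central finite differences. The sizes $n$, $N$ and the random symmetric $A$, $v$, $w$ are illustrative choices of mine, not from the original problem.

```python
import numpy as np

# Illustrative sizes and data; any n and N >= 1 should work.
rng = np.random.default_rng(0)
n, N = 4, 3
A = rng.standard_normal((n, n))
A = (A + A.T) / 2            # the question assumes A is symmetric
v, w = rng.standard_normal(n), rng.standard_normal(n)

mpow = np.linalg.matrix_power
f = lambda M: v @ mpow(M, N) @ w   # the linear term v^T A^N w

# Analytic gradient from the last line above: sum_k A^k v w^T A^{N-1-k}
grad = sum(mpow(A, k) @ np.outer(v, w) @ mpow(A, N - 1 - k) for k in range(N))

# Central finite differences, perturbing one entry A_{ab} at a time
eps, fd = 1e-6, np.zeros((n, n))
for a in range(n):
    for b in range(n):
        E = np.zeros((n, n))
        E[a, b] = eps
        fd[a, b] = (f(A + E) - f(A - E)) / (2 * eps)

print(np.max(np.abs(grad - fd)))   # should be tiny (finite-difference error only)
```

Note the finite-difference probe treats every entry $A_{ab}$ as independent, which matches the $\partial A_{ij}/\partial A_{ab}=\delta_{ia}\delta_{jb}$ convention used in the derivation.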
\begin{align} \frac{\partial}{\partial A_{ab}}\left(x^T (A^2)^N y\right) &=(x^T)_a (A^{2N-1}y)_b+(x^T A)_a (A^{2N-2}y)_b+\cdots +(x^T A^{2N-1})_a (y)_b\\ &=x_a (y^T A^{2N-1})_b+(Ax)_a (y^T A^{2N-2})_b+\cdots +(A^{2N-1}x)_a (y^T)_b. \end{align}
Since $\left(\frac{\partial}{\partial A} f(A)\right)_{ab}=\frac{\partial}{\partial A_{ab}} f(A)$, we can combine these two terms and place them in matrix form: $$\boxed{\frac{\partial}{\partial A} f(A)=\left(x y^T A^{2N-1}+A x y^T A^{2N-2}+\cdots+A^{2N-1} xy^T\right)\\\hspace{2cm}+ \left( vw^T A^{N-1}+Avw^T A^{N-2}+\cdots+A^{N-1}vw^T \right).}$$
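The boxed result can also be validated numerically. The sketch below checks the full gradient $\sum_{k=0}^{2N-1}A^k x y^T A^{2N-1-k}+\sum_{k=0}^{N-1}A^k v w^T A^{N-1-k}$ against finite differences of $f(A)=x^T(A^2)^N y + v^T A^N w$; the particular $n$, $N$, and random vectors are assumptions made just for the check.

```python
import numpy as np

# Random symmetric A and fixed vectors; data is illustrative only.
rng = np.random.default_rng(1)
n, N = 4, 2
A = rng.standard_normal((n, n))
A = (A + A.T) / 2
x, y, v, w = (rng.standard_normal(n) for _ in range(4))

mpow = np.linalg.matrix_power
f = lambda M: x @ mpow(M @ M, N) @ y + v @ mpow(M, N) @ w

# Boxed formula: both sums of "A^k (outer product) A^(power-1-k)" terms
grad = (sum(mpow(A, k) @ np.outer(x, y) @ mpow(A, 2 * N - 1 - k) for k in range(2 * N))
        + sum(mpow(A, k) @ np.outer(v, w) @ mpow(A, N - 1 - k) for k in range(N)))

# Entrywise central finite differences
eps, fd = 1e-6, np.zeros((n, n))
for a in range(n):
    for b in range(n):
        E = np.zeros((n, n))
        E[a, b] = eps
        fd[a, b] = (f(A + E) - f(A - E)) / (2 * eps)

print(np.max(np.abs(grad - fd)))   # should be tiny if the boxed formula is right
```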