Why is $(A^\top A \mathbf{x}, \mathbf{x}) = (A \mathbf{x}, A \mathbf{x})$?


Let $(\mathbf{x}, \mathbf{y})$ denote the inner product between $\mathbf{x}$ and $\mathbf{y}$, and let $A$ be a real matrix. Why is $(A^\top A \mathbf{x}, \mathbf{x}) = (A \mathbf{x}, A \mathbf{x})$?

Using the scalar product it's easy to see that $$ \begin{align} (A^\top A \mathbf{x})^\top \mathbf{x} &= \mathbf{x}^\top A^\top A \mathbf{x} \\ &= (A \mathbf{x})^\top A \mathbf{x} \end{align} $$ but using only the properties of the inner product I fail to see how one gets the result.
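Here is a quick numerical sanity check of the computation above. It assumes the standard dot product $(\mathbf{x},\mathbf{y}) = \mathbf{x}^\top\mathbf{y}$ and stores matrices as plain Python lists of rows. The helper names `matvec`, `transpose` and `dot` are illustrative. $A$ is deliberately non-square, to show that $A\mathbf{x}$ and $\mathbf{x}$ can live in different spaces.

```python
def matvec(A, x):
    """Return the matrix-vector product A x (A given as a list of rows)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def transpose(A):
    """Return the transpose of A."""
    return [list(col) for col in zip(*A)]

def dot(x, y):
    """Standard inner product x^T y."""
    return sum(a * b for a, b in zip(x, y))

# A is 3x2, so A x lives in R^3 while x lives in R^2.
A = [[1, 2],
     [0, -1],
     [3, 4]]
x = [2, -3]

lhs = dot(matvec(transpose(A), matvec(A, x)), x)   # (A^T A x, x)
rhs = dot(matvec(A, x), matvec(A, x))              # (A x, A x)
print(lhs, rhs)  # prints: 61 61
```

A single example proves nothing, but it confirms the bookkeeping: both sides reduce to $\|A\mathbf{x}\|^2$ for the dot product.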

**Edit:** To be clear, I'm asking how one arrives at $(A^\top A \mathbf{x}, \mathbf{x}) = (A \mathbf{x}, A \mathbf{x})$ independently of how the inner product is defined (i.e., using only the axioms of the inner product). An answer that relies on a particular definition of the inner product, such as $(\mathbf{x}, \mathbf{y}) := \mathbf{x}^\top\mathbf{y}$, does not answer the question.

**Edit 2:** The equality I'm asking about appears in the book *Linear Algebra Done Wrong* by Sergei Treil (p. 172). I'm interested only in real matrices here, although the book covers the complex case as well.


**Accepted answer**

To understand this without regard to a specific inner product, you must understand what the transpose is without regard to a specific matrix representation. More specifically, you need to understand what the adjoint of a linear operator is.

Given a linear operator $T:V\to V$ on an inner product space $V$ with inner product $\langle\cdot,\cdot\rangle$, the adjoint of $T$ is the operator $T^*:V\to V$ such that $$ \langle Tx,y\rangle=\langle x,T^*y\rangle $$ for all $x,y\in V$.

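The adjoint generalizes the transpose for real inner product spaces and the conjugate transpose for complex inner product spaces: with respect to the standard inner product on $\mathbb{R}^n$ (respectively $\mathbb{C}^n$), the matrix of $T^*$ is the transpose (respectively the conjugate transpose) of the matrix of $T$. One of the first results one proves about adjoints is the following: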

The adjoint of an adjoint operator is the original operator. That is, $$ (T^*)^*=T $$

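Using this, with the notation above, for $x,y\in V$ we have $$ \langle T^*Tx,y\rangle=\langle Tx,(T^*)^*y\rangle=\langle Tx,Ty\rangle, $$ where the first equality is the definition of the adjoint applied to the operator $T^*$ and the vector $Tx$. Taking $y=x$ and $T=A$ (so that $T^*=A^\top$ when the inner product is the standard one) gives $(A^\top A\mathbf{x},\mathbf{x})=(A\mathbf{x},A\mathbf{x})$.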
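The argument uses only the defining property of the adjoint, so it holds for any inner product as long as $T^*$ is the adjoint *with respect to that inner product*. The following minimal sketch checks this for a non-standard inner product. It assumes $\langle x,y\rangle = x^\top M y$ with a diagonal positive weight matrix $M=\operatorname{diag}(w)$ (a hypothetical choice). For such an $M$, the adjoint of a square $T$ is $M^{-1}T^\top M$, i.e. $(T^*)_{ij} = T_{ji}\,w_j/w_i$. The names `weights`, `inner`, `matvec` and `adjoint` are illustrative helpers, and `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical weighted inner product <x, y> = x^T M y with M = diag(weights).
weights = [Fraction(1), Fraction(2), Fraction(5)]

def inner(x, y):
    """Weighted inner product sum_i w_i x_i y_i (M is symmetric positive definite)."""
    return sum(w * a * b for w, a, b in zip(weights, x, y))

def matvec(T, x):
    """Return T x for T given as a list of rows."""
    return [sum(t * v for t, v in zip(row, x)) for row in T]

def adjoint(T):
    """Adjoint of T w.r.t. inner(): M^{-1} T^T M, entrywise T[j][i] * w_j / w_i."""
    n = len(T)
    return [[T[j][i] * weights[j] / weights[i] for j in range(n)] for i in range(n)]

T = [[1, 2, 0],
     [0, 1, -1],
     [3, 0, 1]]
T_star = adjoint(T)
x = [1, -2, 3]
y = [4, 0, -1]

# Defining property of the adjoint: <T x, y> = <x, T* y>.
assert inner(matvec(T, x), y) == inner(x, matvec(T_star, y))

# The identity from the answer, <T* T x, y> = <T x, T y>, and the special case y = x.
assert inner(matvec(T_star, matvec(T, x)), y) == inner(matvec(T, x), matvec(T, y))
assert inner(matvec(T_star, matvec(T, x)), x) == inner(matvec(T, x), matvec(T, x))
```

With all weights equal to 1 ($M = I$), `adjoint` reduces to the ordinary transpose, which recovers the matrix statement in the question.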

**Answer**

Your computation ends with $(Ax)^T Ax$. The inner product is $(x, y) = x^T y$: you transpose the first argument and multiply by the second. Substituting $Ax$ for both $x$ and $y$ gives $(Ax, Ax) = (Ax)^T Ax$, which is exactly that expression.

**Answer**

We have $$(A^TAx,x)=[A^TAx]^Tx=[A^T(Ax)]^Tx=[(Ax)^TA]x=(Ax)^T(Ax)=(Ax,Ax)$$

**Answer**

Well, I don't think what you're asking can be proven, because it is not true for every inner product on $\mathbf{R}^n$. Therefore, you have to use the specific inner product that is defined.

We have that $\langle x,y\rangle$ is an inner product on $\mathbf R^n$ if and only if $$\langle x,y\rangle = x^\intercal My$$ for some symmetric, positive definite matrix $M$. Hence we have $$\langle A^\intercal A x,x\rangle = (A^\intercal A x)^\intercal M x = x^\intercal A^\intercal A Mx = (Ax)^\intercal A Mx $$ whereas $$\langle Ax, Ax\rangle = (Ax)^\intercal M Ax.$$

So the two sides agree for every $x$ if $M$ and $A$ commute. This happens, e.g., when $M = I_n$ is the identity matrix, which gives the standard inner product on $\mathbf R^n$. For other choices of $M$ they differ in general.
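Here is a concrete instance of the failure. It assumes $M = \operatorname{diag}(1, 2)$, $A = \begin{pmatrix}0&1\\0&0\end{pmatrix}$ and $x = (0,1)^\intercal$. All values are illustrative choices, picked so that $AM \ne MA$. The helper names `inner_M`, `matvec` and `transpose` are hypothetical.

```python
# Illustrative counterexample: <x, y>_M = x^T M y with M = diag(1, 2).
M_diag = [1, 2]

def inner_M(x, y):
    """Inner product x^T M y for the diagonal SPD matrix M = diag(M_diag)."""
    return sum(m * a * b for m, a, b in zip(M_diag, x, y))

def matvec(A, x):
    """Return A x for A given as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def transpose(A):
    """Return the ordinary matrix transpose of A."""
    return [list(col) for col in zip(*A)]

A = [[0, 1],
     [0, 0]]
x = [0, 1]

Ax = matvec(A, x)                                 # [1, 0]
lhs = inner_M(matvec(transpose(A), Ax), x)        # <A^T A x, x>_M
rhs = inner_M(Ax, Ax)                             # <A x, A x>_M
print(lhs, rhs)  # prints: 2 1
```

Replacing `transpose(A)` with the $M$-adjoint $M^{-1}A^\intercal M$ would make the two sides agree. Here that adjoint is $\begin{pmatrix}0&0\\1/2&0\end{pmatrix}$, and both sides then equal 1.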

**Answer**

We can generalize your question a little by looking at $$\langle A^\top x,y \rangle = \langle x,Ay \rangle.$$

It does not make sense to ask your question about a "general inner product," since you are working with vectors and matrices, and thus your inner product is a very particular one, namely $\langle x,y \rangle = x^\top y$ for real vectors. [But there are other inner products for $\mathbb{R}^n$.]

As amd pointed out, if you are working with complex vectors, then an inner product (one of many) is $\langle x,y\rangle = x^* y$, where $x^*$ denotes the conjugate transpose. A similar argument then shows $\langle A^* x, y\rangle = \langle x,Ay \rangle$. So the mystery operator that takes the place of $A^\top$ in your question depends very much on what your inner product actually is.
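Here is a numerical check of the complex statement. It assumes $\langle x,y\rangle = x^* y = \sum_i \overline{x_i}\,y_i$ (conjugate-linear in the first argument) and uses Python's built-in complex numbers. The matrix and vectors are arbitrary illustrative values with integer parts, so the floating-point arithmetic is exact.

```python
def inner(x, y):
    """Complex inner product x^* y: conjugate the first argument."""
    return sum(a.conjugate() * b for a, b in zip(x, y))

def matvec(A, x):
    """Return A x for A given as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def conj_transpose(A):
    """Conjugate transpose A^* of A."""
    return [[a.conjugate() for a in col] for col in zip(*A)]

A = [[1 + 2j, 3j],
     [2 - 1j, 4]]
x = [1j, 2]
y = [1 - 1j, -3j]

lhs = inner(matvec(conj_transpose(A), x), y)   # <A^* x, y>
rhs = inner(x, matvec(A, y))                   # <x, A y>
print(lhs, lhs == rhs)  # prints: (3-42j) True
```

With the plain transpose in place of `conj_transpose`, the left-hand side comes out as $3-16i$ instead. This is why the conjugate transpose is the right replacement for $A^\top$ in the complex case.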

However, if you do want to talk about more general cases, then this concept is known as the adjoint of an operator. More generally (the following is informal), if you have an arbitrary inner product space $H$ and a linear operator $A:H\to H$, then the adjoint of $A$ is defined to be the linear operator $A^*$ that satisfies $\langle A^* x,y\rangle = \langle x,A y\rangle$ for all $x,y \in H$. In your example, $H$ is $\mathbb{R}^n$ with the usual dot product, linear operators are matrices in $\mathbb{R}^{n\times n}$, and the adjoint turns out to be the matrix transpose. Similarly, for $H=\mathbb{C}^n$, linear operators are matrices in $\mathbb{C}^{n \times n}$ and the adjoint turns out to be the matrix conjugate transpose.