Show $Du(\mathbf{x})=[Dv(Q^T\mathbf{x})]Q^T=[Dv(\mathbf{x}')]Q^T$ and $Hu(\mathbf{x})=Q[Hv(\mathbf{x}')]Q^T=Q[Hv(\mathbf{x}')]Q^{-1}$


Let $u:\mathbf{R}^2\to\mathbf{R}$ and assume that all of the second-order partial derivatives of $u$ are continuous on $\mathbf{R}^2$. For each $\mathbf{x}\in\mathbf{R}^2$, regard $\mathbf{x}$ as a column vector, and define $\mathbf{x}'\in\mathbf{R}^2$ so that $\mathbf{x}=Q\mathbf{x}'=x'_1\mathbf{q}_1+x'_2\mathbf{q}_2$, where $Q=[\mathbf{q}_1\ \mathbf{q}_2]$ and $\{\mathbf{q}_1,\mathbf{q}_2\}$ is orthonormal (so $Q$ is an orthogonal matrix and $Q^TQ=I$). Then define a function $v:\mathbf{R}^2\to\mathbf{R}$ by $v(\mathbf{x}')=v(Q^T\mathbf{x})=v(F(\mathbf{x}))=u(\mathbf{x})$, where $F(\mathbf{x})=Q^T\mathbf{x}$. Show $Du(\mathbf{x})=[Dv(Q^T\mathbf{x})]Q^T=[Dv(\mathbf{x}')]Q^T$ and $Hu(\mathbf{x})=Q[Hv(\mathbf{x}')]Q^T=Q[Hv(\mathbf{x}')]Q^{-1}$. (State explicitly the entries of each matrix.)


Let $u:\mathbf{R}^2\to\mathbf{R}$ and suppose that all of the second-order partial derivatives of $u$ are continuous on $\mathbf{R}^2$. For each $\mathbf{x}\in\mathbf{R}^2$, write $\mathbf{x}=Q\mathbf{x}'=x'_1\mathbf{q}_1 + x'_2\mathbf{q}_2$ where $\mathbf{x}'\in\mathbf{R}^2$. Suppose that $v:\mathbf{R}^2\to\mathbf{R}$ is defined by $v(\mathbf{x}')=v(Q^T\mathbf{x})=v(F(\mathbf{x}))=u(\mathbf{x})$ for all $\mathbf{x}\in\mathbf{R}^2$, where $F(\mathbf{x})=Q^T\mathbf{x}$. Applying the information above, we have the following: $Du(\mathbf{x})=\begin{bmatrix} \frac{\partial u}{\partial x_1}&\frac{\partial u}{\partial x_2} \end{bmatrix}$. As $u(\mathbf{x})=v(F(\mathbf{x}))=v(Q^T\mathbf{x})$, the chain rule gives \begin{align*}Du(\mathbf{x})=& Dv(F(\mathbf{x}))DF(\mathbf{x})=\begin{bmatrix} D_1v(F(\mathbf{x})) & D_2v(F(\mathbf{x})) \end{bmatrix}Q^T\\ =& Dv(Q^T\mathbf{x})D(Q^T\mathbf{x})=\begin{bmatrix} \frac{\partial v}{\partial x'_1} & \frac{\partial v}{\partial x'_2} \end{bmatrix}Q^T\end{align*} Since $\mathbf{x}=Q\mathbf{x}'$ and $Q^TQ=I$, we have $Q^T\mathbf{x}=\mathbf{x}'$, so this equals $[Dv(\mathbf{x}')]Q^T$. Therefore, we have $$Du(\mathbf{x})=[Dv(Q^T\mathbf{x})]Q^T=[Dv(\mathbf{x}')]Q^T$$

Now take the Hessian of $u$, which gives $Hu(\mathbf{x})=\begin{bmatrix} \frac{\partial^2u}{\partial x^2_1} & \frac{\partial^2u}{\partial x_1\partial x_2}\\ \frac{\partial^2u}{\partial x_2\partial x_1} & \frac{\partial^2u}{\partial x^2_2} \end{bmatrix}$. As $u(\mathbf{x})=v(Q^T\mathbf{x})=v(F(\mathbf{x}))$, applying the chain rule once more to each entry of $Du(\mathbf{x})=[Dv(F(\mathbf{x}))]Q^T$, we have \begin{align*} Hu(\mathbf{x})=& Q[Hv(F(\mathbf{x}))]Q^T\\ =& Q\begin{bmatrix} D_1D_1v(F(\mathbf{x})) & D_1D_2v(F(\mathbf{x}))\\ D_2D_1v(F(\mathbf{x})) & D_2D_2v(F(\mathbf{x})) \end{bmatrix}Q^T \end{align*} where $F(\mathbf{x})=Q^T\mathbf{x}$. Since $\mathbf{x}'=Q^T\mathbf{x}$, we have $Hv(F(\mathbf{x}))=Hv(\mathbf{x}')$. And, since $Q^T=Q^{-1}$, we can write $Hu(\mathbf{x})=Q[Hv(\mathbf{x}')]Q^T=Q[Hv(\mathbf{x}')]Q^{-1}$.


I don't think I did it right, because I am confused about how to apply the chain rule with matrices. Can someone show me how to apply it? Thanks.


There are 2 best solutions below

Best answer

Watch! I'm gonna nail this question.

Remember this: the derivative of a multivariable function is not a number, but rather a linear transformation, represented by a matrix.

If $u:\mathbf{R}^2\to\mathbf{R}$, then its derivative at $x \in \mathbf{R}^2$ is a $1\times 2$ matrix, that is, a linear map from $\mathbf{R}^2$ to $\mathbf{R}$.

So, what does it do? It takes a vector $e$ in $\mathbf{R}^2$, and returns a real number. This real number, by the very definition of the derivative of $u$ is the directional derivative of $u$ in the direction $e$.

So, fix $x \in \mathbf{R}^2$, and take a vector (the direction) $e \in \mathbf{R}^2$. As I said, finding $Du(\mathbf{x})(e)$ means taking the directional derivative, evaluated at $t=0$:

$$ Du(\mathbf{x})(e) = \frac{d}{dt}\Big|_{t=0} u(x+te) = \frac{d}{dt}\Big|_{t=0} v(Q^T(x+te)) =\frac{d}{dt}\Big|_{t=0} v(Q^Tx+tQ^Te) $$

But, isn't the RHS just the definition of directional derivative at $Q^Tx$ of $v$ in the direction $Q^Te$?! It is. So,

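$$ Du(\mathbf{x})(e) = Dv(Q^Tx)(Q^Te)$$

Applying the matrix $Dv(Q^Tx)$ to the vector $Q^Te$ is the same as applying the product $Dv(Q^Tx)Q^T$ to $e$. Since $e$ was arbitrary, this says that the matrices are equal: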

$$ Du(\mathbf{x}) = Dv(Q^Tx)Q^T $$
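The identity above can be sanity-checked numerically. The following is only a sketch under assumptions that are not in the question: `Q` is taken to be a rotation by an arbitrary angle, and `v` is an arbitrary smooth test function whose derivative `Dv` is written out by hand. It approximates the directional derivative $\frac{d}{dt}u(x+te)$ at $t=0$ with a central difference and compares it with $[Dv(Q^Tx)Q^T]e$.

```python
import numpy as np

# Hypothetical example data (not from the question): a rotation matrix Q
# and a smooth test function v of the primed coordinates.
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # orthogonal: Q.T @ Q = I

def v(xp):
    """Arbitrary smooth test function of x' = (x'_1, x'_2)."""
    return xp[0] ** 2 * xp[1] + np.sin(xp[1])

def Dv(xp):
    """Hand-computed derivative of v, as a 1x2 row matrix."""
    return np.array([[2 * xp[0] * xp[1], xp[0] ** 2 + np.cos(xp[1])]])

def u(x):
    """u(x) = v(Q^T x), as in the question."""
    return v(Q.T @ x)

def directional_derivative(f, x, e, h=1e-6):
    """Central-difference approximation of d/dt f(x + t e) at t = 0."""
    return (f(x + h * e) - f(x - h * e)) / (2 * h)

x = np.array([0.3, -1.2])   # base point
e = np.array([0.8, 0.5])    # direction

lhs = directional_derivative(u, x, e)   # Du(x)(e), numerically
rhs = (Dv(Q.T @ x) @ Q.T @ e).item()    # [Dv(Q^T x) Q^T] e, analytically
print(abs(lhs - rhs))
```

The printed difference should be at the level of finite-difference error. Dropping the factor `Q.T` on the right-hand side generally breaks the agreement, which is why the direction has to be transformed to $Q^Te$.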

Note that $Q^Tx$ is named $x'$, so this reads $Du(\mathbf{x}) = [Dv(\mathbf{x}')]Q^T$. This proves one of the identities you were seeking.
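For the Hessian identity, one possible route is to differentiate the first identity once more, entry by entry. This is a sketch, assuming the notation $Q=[q_{ij}]$ (so $\mathbf{q}_1=(q_{11},q_{21})^T$ and $\mathbf{q}_2=(q_{12},q_{22})^T$), $D_kv$ for the partial derivative of $v$ in its $k$-th argument, and the convention $(Hu)_{ij}=\frac{\partial^2 u}{\partial x_i\partial x_j}$; none of these labels appear in the original post. Since $(Q^T)_{kj}=q_{jk}$, the first identity says

$$ \frac{\partial u}{\partial x_j}(x) = \sum_{k=1}^{2} q_{jk}\, D_kv(Q^Tx). $$

The $l$-th component of $Q^Tx$ is $\sum_i q_{il}x_i$, whose partial derivative with respect to $x_i$ is $q_{il}$. Applying the chain rule to each $D_kv(Q^Tx)$ therefore gives

$$ \frac{\partial^2 u}{\partial x_i\partial x_j}(x) = \sum_{k=1}^{2}\sum_{l=1}^{2} q_{il}\, D_lD_kv(Q^Tx)\, q_{jk} = \big(Q\,[Hv(Q^Tx)]\,Q^T\big)_{ij}. $$

Written out with $x'=Q^Tx$, this is

$$ Hu(x) = \begin{bmatrix} q_{11} & q_{12}\\ q_{21} & q_{22} \end{bmatrix} \begin{bmatrix} D_1D_1v(x') & D_1D_2v(x')\\ D_2D_1v(x') & D_2D_2v(x') \end{bmatrix} \begin{bmatrix} q_{11} & q_{21}\\ q_{12} & q_{22} \end{bmatrix} = Q[Hv(x')]Q^T, $$

and $Q^T=Q^{-1}$ because $Q$ is orthogonal. The continuity of the second-order partials makes both Hessians symmetric, so the order of the indices does not matter.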


Comment 2: Even more precisely, the derivative at $x \in \mathbf{R}^2$ acts on the vector space of all directions emanating from $x$, but by representing them with vectors in $\mathbf{R}^2$ we are implicitly using the fact that this vector space is canonically isomorphic to the vector space of directions emanating from $0 \in \mathbf{R}^2$. If we instead take the derivative of a function $u:\mathbf{S}^2\to\mathbf{R}$, that is, a function defined on the surface of the sphere, then the space of directions at $x \in \mathbf{S}^2$ changes dramatically from one $x$ to another. (The tangent space is horizontal at the north pole, but shifts and rotates as we move on the sphere.) The derivative of such a function is defined exactly by the description I gave above: take a direction $e$ at $x$ and calculate $\frac{d}{dt}u$ as we move in that direction; the real number we get is $Du(x)(e)$. This defines a linear map on the directions at $x$ (a matrix, once a basis is chosen), called the derivative of $u$ at $x$ and denoted by $Du(x)$.

A lot of Riemannian geometry there!