Derivative of $f:\mathbb{R}^n\to\mathbb{R},\ f(\vec{x})=A\vec{x}\cdot\vec{x}=\vec{x}^T A\vec{x},\ A\in\mathbb{M}_{n\times n}$


I am trying to prove that if $A$ is an $n\times n$ matrix, then $f:\mathbb{R}^n\to\mathbb{R},\ f(\vec{x})=A\vec{x}\cdot\vec{x}=\vec{x}^T A\vec{x}$ is differentiable and $Df(\vec{a})\vec{h}=A\vec{a}\cdot\vec{h}+A\vec{h}\cdot\vec{a}.$

(NOTE: computation edited according to Snaw's comment below)

Now, a function $\vec{f}:\mathbb{R}^n\to\mathbb{R}^m$ is differentiable at $\vec{a}$ if there is a linear map $D\vec{f}(\vec{a})$ such that $$ \lim\limits_{\vec{h}\to\vec{0}}\frac{\vec{f}(\vec{a}+\vec{h})-\vec{f}(\vec{a})-D\vec{f}(\vec{a})\vec{h}}{||\vec{h}||}=\vec{0}, $$ so $f$ is differentiable at $\vec{a}\in\mathbb{R}^n$, with the derivative claimed above, if the following limit is $0$: $$\lim\limits_{\vec{h}\to\vec{0}}\frac{(\vec{a}+\vec{h})^TA(\vec{a}+\vec{h})-\vec{a}^TA\vec{a}-A\vec{a}\cdot\vec{h}-A\vec{h}\cdot\vec{a}}{||\vec{h}||}\\ =\lim\limits_{\vec{h}\to\vec{0}}\frac{\vec{a}^TA\vec{h}+\vec{h}^TA\vec{a}+\vec{h}^TA\vec{h}-\vec{h}^TA\vec{a}-\vec{a}^TA\vec{h}}{||\vec{h}||}=\lim\limits_{\vec{h}\to\vec{0}}\frac{\vec{h}^TA\vec{h}}{||\vec{h}||} $$

but at this point I am not sure if I can claim that, as $\vec{h}\to\vec{0}$, the limit is $0$, since it seems to me that I would encounter an indeterminate form of the type $0/0$.

For example, if $A\in\mathbb{M}_{2\times 2}$ we would have $$\lim\limits_{(h_1,h_2)\to (0,0)}\frac{h_1^2 A_{11}+(A_{12}+A_{21})h_1h_2+h_2^2 A_{22}}{\sqrt{h_1^2+h_2^2}}.$$

How could I justify this claim?

Thanks


There are 3 best solutions below

0
On BEST ANSWER

The limit $$\lim_{h\to 0} \frac{h^T A h}{||h||}$$ intuitively equals $0$ because the denominator is roughly linear in $h$ while the numerator is quadratic in $h$. For instance, for $A\in \Bbb M _{2\times 2}$ we have, as you've noted, $$\lim\limits_{(h_1,h_2)\to (0,0)}\frac{h_1^2 A_{11}+(A_{12}+A_{21})h_1h_2+h_2^2 A_{22}}{\sqrt{h_1^2+h_2^2}}$$ and while this is of the indeterminate form $\frac{0}{0}$, the numerator tends to $0$ faster because of the quadratic terms. So far this is only intuition and requires justification.

To prove it rigorously, use the squeeze theorem and the triangle inequality: $$0\leq\left|\frac{h_1^2 A_{11}+(A_{12}+A_{21})h_1h_2+h_2^2 A_{22}}{\sqrt{h_1^2+h_2^2}}\right|\leq \frac{h_1^2 | A_{11}|}{\sqrt{h_1^2+h_2^2}} + \frac{|h_1h_2| | A_{12} + A_{21}|}{\sqrt{h_1^2+h_2^2}} + \frac{h_2^2 | A_{22}|}{\sqrt{h_1^2+h_2^2}}$$ For the first term, when $h_1\neq 0$ we get $$0\leq \frac{h_1^2 | A_{11}|}{\sqrt{h_1^2+h_2^2}} \leq \frac{h_1^2 | A_{11}|}{\sqrt{h_1^2}} = \frac{h_1^2 | A_{11}|}{|h_1|} = |h_1|\cdot | A_{11}|\to 0$$ (and the term is simply $0$ when $h_1=0$). The same calculation shows that the other two terms tend to $0$ as well; for example, $\frac{|h_1h_2|}{\sqrt{h_1^2+h_2^2}}\leq\frac{|h_1h_2|}{|h_1|}=|h_2|\to 0$.

For the general case of $A\in\Bbb M _{n\times n}$ a similar calculation works: the denominator is $\sqrt{h_1^2+\ldots+h_n^2}$, which is greater than or equal to $\sqrt{h_i^2}=|h_i|$ for each $1\leq i\leq n$. By the triangle inequality, the numerator is bounded by a finite sum of terms of the form $|A_{ij}|\cdot|h_i|\cdot|h_j|$ for some $i,j$ (possibly $i=j$), and each such term divided by the denominator is at most $|A_{ij}|\cdot|h_j|$. Overall we are left with a finite sum of terms all tending to $0$, so the squeeze theorem gives the result.
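As a rough numerical sanity check of the termwise estimate (not a proof), the sketch below evaluates $|h^TAh|/\|h\|$ along a shrinking sequence of vectors and compares it with the bound $\sum_{i,j}|A_{ij}|\,|h_j|$. The matrix `A`, the vector `direction`, and the helper names `remainder_ratio` and `entrywise_bound` are illustrative choices, not part of the original argument.

```python
import numpy as np

def remainder_ratio(A, h):
    """Return |h^T A h| / ||h|| for a nonzero vector h."""
    return abs(h @ A @ h) / np.linalg.norm(h)

def entrywise_bound(A, h):
    """Termwise bound sum_{i,j} |A_ij| |h_j|, using ||h|| >= |h_i|."""
    return float(np.sum(np.abs(A) @ np.abs(h)))

# Hypothetical example data: any square matrix and nonzero direction would do.
A = np.array([[1.0, -2.0],
              [3.0, 0.5]])
direction = np.array([0.6, -0.8])  # a unit vector

for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * direction
    # Both columns shrink linearly in t, and the ratio stays below the bound.
    print(t, remainder_ratio(A, h), entrywise_bound(A, h))
```

Both printed quantities scale linearly with `t`, which matches the "quadratic over linear" intuition.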

0
On

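Let $\sigma$ be the largest singular value of $A$, that is, the square root of the largest eigenvalue of $A^TA$. (If $A$ is symmetric, $\sigma$ is the largest absolute value of an eigenvalue of $A$; for a general $A$ the eigenvalues of $A$ itself do not bound $\|A\vec h\|$.) Then, by the Cauchy-Schwarz inequality, $$|\vec h^TA\vec h|\leq \|\vec h\|\cdot\|A\vec h\| \leq \sigma\cdot\|\vec h\|^2.$$ Now insert this into $\lim\limits_{\vec{h}\to\vec{0}}\frac{\vec{h}^TA\vec{h}}{\|\vec{h}\|}$ and you should get the result you desire.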
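A small numerical sketch of a bound of this shape, assuming NumPy. It uses the largest singular value of $A$ (`np.linalg.norm(A, 2)`) as the constant, since that choice works for every square matrix; the nilpotent matrix `N` at the end illustrates why the eigenvalues of a non-symmetric matrix alone are not enough. The random test data and the helper name `quadratic_form_bound_holds` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))   # hypothetical non-symmetric example
sigma_max = np.linalg.norm(A, 2)  # spectral norm = largest singular value

def quadratic_form_bound_holds(A, h, const):
    """Check |h^T A h| <= const * ||h||^2 up to a small rounding tolerance."""
    return abs(h @ A @ h) <= const * np.linalg.norm(h) ** 2 + 1e-12

checks = [quadratic_form_bound_holds(A, rng.standard_normal(4), sigma_max)
          for _ in range(1000)]
print(all(checks))

# A nilpotent matrix: every eigenvalue is 0, yet h^T N h is not 0.
N = np.array([[0.0, 1.0],
              [0.0, 0.0]])
h = np.array([1.0, 1.0])
spectral_radius = max(abs(np.linalg.eigvals(N)))
sigma_N = np.linalg.norm(N, 2)
quad = abs(h @ N @ h)
print(spectral_radius, sigma_N, quad)
```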

0
On

Let $A\in M_{n\times n}(\mathbb{R})$ and

$$f:\mathbb{R}^n\to\mathbb{R}, \quad x\mapsto \langle Ax,x\rangle.$$

We wish to show that $f$ is differentiable, and that

$$Df(x)h=\langle Ax,h\rangle+\langle Ah,x\rangle.$$

Since I do not know what tools you have at your disposal, I'll provide two solutions.

Solution 1. For this solution we will assume we have the chain rule, as well as knowledge of the derivative of a linear map and the derivative of the inner product. As a reminder of the last one, recall that if

$$g:\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R},\quad (x_1,x_2)\mapsto \langle x_1,x_2\rangle,$$

then the total derivative of $g$ is given by

$$Dg(x_1,x_2)(h_1,h_2)=\langle x_1,h_2\rangle+\langle x_2,h_1\rangle.$$

Now define

$$z:\mathbb{R}^n\to\mathbb{R}^n\times\mathbb{R}^n,\quad x\mapsto (Ax,x).$$

Since $z$ is linear, its derivative at every point is $z$ itself, that is,

$$Dz(x)h=(Ah,h).$$

Now noting that $f=g\circ z$ we can use the chain rule and obtain that

\begin{align*} Df(x)h &= D(g\circ z)(x)h=Dg(z(x))(Dz(x)h) \\ &= Dg(Ax,x)(Ah,h) \\ &= \langle Ax,h\rangle +\langle x,Ah\rangle \\ &=\langle Ax,h\rangle + \langle Ah,x\rangle, \end{align*}

which is the desired result.
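The formula can be checked numerically against a central difference quotient. This is only a sketch under assumed random test data; `f`, `Df`, and the step size `t` are names chosen here for illustration. Because $f$ is quadratic, the central difference agrees with $Df(x)h$ up to rounding error. The sketch also uses the equivalent form $Df(x)h=\langle (A+A^T)x,h\rangle$, which follows from $\langle Ah,x\rangle=\langle A^Tx,h\rangle$.

```python
import numpy as np

def f(A, x):
    """Quadratic form f(x) = <Ax, x>."""
    return x @ A @ x

def Df(A, x, h):
    """Claimed derivative Df(x)h = <Ax, h> + <Ah, x>."""
    return (A @ x) @ h + (A @ h) @ x

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))  # hypothetical test matrix
x = rng.standard_normal(3)
h = rng.standard_normal(3)

t = 1e-3
central = (f(A, x + t * h) - f(A, x - t * h)) / (2 * t)
exact = Df(A, x, h)
grad = (A + A.T) @ x             # gradient form of the same derivative

print(exact, central, grad @ h)
```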

Solution 2. Assuming you don't have those tools to work with, we can verify the derivative directly from one of the equivalent definitions (personally I think the first method is easier because we don't need to already know the derivative). Let us use the definition you provided, that $Df(x)$ in this case is the unique linear mapping $\mathbb{R}^n\to\mathbb{R}$ such that

$$\lim_{h\to 0}\frac{\lvert f(x+h)-f(x)-Df(x)h\rvert}{\lVert h \rVert}=0.$$

This gives us that

\begin{align*} \frac{\lvert f(x+h)-f(x)-Df(x)h\rvert}{\lVert h \rVert} &= \frac{\lvert \langle A(x+h),x+h\rangle-\langle Ax,x\rangle-\langle Ax,h\rangle - \langle Ah,x\rangle\rvert}{\lVert h \rVert} \\ &=\frac{\lvert \langle Ax,x\rangle+\langle Ax,h\rangle+\langle Ah,x\rangle+\langle Ah,h\rangle-\langle Ax,x\rangle-\langle Ax,h\rangle - \langle Ah,x\rangle\rvert}{\lVert h \rVert} \\ &=\frac{\lvert\langle Ah,h\rangle\rvert}{\lVert h \rVert} \\ &\leq\frac{\lVert Ah \rVert \lVert h \rVert}{\lVert h \rVert} \\ &=\lVert Ah \rVert \\ &\leq \lVert A \rVert_{\text{Eucl}} \lVert h \rVert \end{align*}

Now since we have that

$$\lim_{h\to 0}\left(\lVert A \rVert_{\text{Eucl}} \lVert h \rVert \right)= 0,$$

it follows from the Squeeze theorem that

$$\lim_{h\to 0}\frac{\lvert f(x+h)-f(x)-Df(x)h\rvert}{\lVert h \rVert}=0.$$
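The chain of inequalities can also be observed numerically. The sketch below assumes that $\lVert A\rVert_{\text{Eucl}}$ denotes the Frobenius norm (the square root of the sum of squared entries); the estimate $\lVert Ah\rVert\leq\lVert A\rVert\lVert h\rVert$ holds for that norm. The matrix, base point, and direction are made-up example data.

```python
import numpy as np

def f(A, x):
    return x @ A @ x

def Df(A, x, h):
    return (A @ x) @ h + (A @ h) @ x

def remainder_ratio(A, x, h):
    """|f(x+h) - f(x) - Df(x)h| / ||h||, computed straight from the definition."""
    return abs(f(A, x + h) - f(A, x) - Df(A, x, h)) / np.linalg.norm(h)

# Hypothetical example data.
A = np.array([[2.0, -1.0, 0.0],
              [4.0, 0.5, 1.0],
              [0.0, 3.0, -2.0]])
x = np.array([1.0, -2.0, 0.5])
direction = np.array([1.0, 2.0, -2.0]) / 3.0  # unit vector

fro = np.linalg.norm(A, "fro")  # assumed meaning of ||A||_Eucl

for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * direction
    # The left column stays below the right one, and both tend to 0 with t.
    print(t, remainder_ratio(A, x, h), fro * np.linalg.norm(h))
```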

Hopefully at least one of the solutions helped you understand how to solve the problem!