Unfamiliar Gradient Notation referencing first order Taylor approximation of f


Below is my problem:

Gradient of some common functions.

Recall that the gradient of a differentiable function $f: \mathbb{R}^n \to \mathbb{R}$, at a point $x \in \mathbb{R}^n$, is defined as the vector $$ \nabla f(x) = \begin{bmatrix} \frac{\partial f}{\partial x_1}\\ \frac{\partial f}{\partial x_2}\\ \vdots\\ \frac{\partial f}{\partial x_n} \end{bmatrix} $$ where the partial derivatives are evaluated at the point $x$. The first order Taylor approximation of $f$, near $x$, is given by $$ \hat{f}_{\!\mathrm{tay}}(z) = f(x) + \nabla f(x)^\top (z-x) $$

This function is affine, i.e., a linear function plus a constant. For $z$ near $x$, the Taylor approximation $\hat{f}_{\!\mathrm{tay}}$ is very near $f$. Find the gradient of the following functions. Express the gradients using matrix notation.
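To see the "very near" claim concretely, here is a small numerical sanity check of the first-order Taylor approximation $\hat{f}_{\!\mathrm{tay}}(z) = f(x) + \nabla f(x)^\top (z-x)$. The test function $f(x) = x^\top A x$ with symmetric $A$ is my own illustrative choice (its gradient $2Ax$ is assumed here, which is part (c) below):

```python
# Sanity check: the first-order Taylor approximation of f near x.
# Illustrative choice (an assumption, not from the problem): f(x) = x^T A x
# with symmetric A, whose gradient is 2 A x.
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                       # make A symmetric

def f(v):
    return v @ A @ v

def grad_f(v):
    return 2 * A @ v                    # gradient of x^T A x for symmetric A

x = rng.standard_normal(n)
z = x + 1e-4 * rng.standard_normal(n)   # a point z near x

f_tay = f(x) + grad_f(x) @ (z - x)      # affine approximation evaluated at z
err = abs(f(z) - f_tay)
print(err)                              # second-order small: O(||z - x||^2)
```

Shrinking the perturbation by a factor of 10 shrinks the error by roughly a factor of 100, which is exactly the second-order behavior the approximation predicts.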

a) $f(x) = a^\top x + b$, where $a \in \mathbb{R}^n$, $b \in \mathbb{R}$.

b) $f(x) = x^\top A\,x$, for $A \in \mathbb{R}^{n \times n}$.

c) $f(x) = x^\top A\,x$, where $A = A^\top \in \mathbb{R}^{n \times n}$. (Yes, this is a special case of the previous one.)

My Issue

This is the starting framework shown for solving the above, but I am unfamiliar with the formula. It looks like a finite-difference equation, but I don't recognize it with both the $h$ and the $\Delta x$. I'm also confused by the $$ \frac{g^\top \Delta x}{\| \Delta x \|} $$ expression. My guess is that it is some reference to the Taylor approximation, but I'm struggling to connect the two. Any help would be great!

$$ \frac{g^\top \Delta x}{\|\Delta x\|} = \lim_{h \to 0} \frac{f(x+h \Delta x) - f(x)}{\|h \Delta x\|} $$
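For intuition, the identity can be checked numerically. Below I use the simplest case, an affine $f(x) = a^\top x + b$ whose gradient is $g = a$ (part (a)); the vectors $a$, $x$, $\Delta x$ and the scalar $b$ are arbitrary illustrative values. Note that $h$ is a small positive scalar, so $\|h\,\Delta x\| = h\,\|\Delta x\|$:

```python
# Numerical check of g^T dx / ||dx|| = lim (f(x + h*dx) - f(x)) / ||h*dx||
# for an affine f(x) = a^T x + b with gradient g = a (illustrative choice).
import numpy as np

rng = np.random.default_rng(1)
n = 3
a = rng.standard_normal(n)
b = 0.5

def f(v):
    return a @ v + b                 # affine function, gradient g = a

x = rng.standard_normal(n)
dx = rng.standard_normal(n)          # the fixed vector Delta x
h = 1e-6                             # small positive scalar step along dx

lhs = a @ dx / np.linalg.norm(dx)
rhs = (f(x + h * dx) - f(x)) / np.linalg.norm(h * dx)
print(lhs - rhs)                     # essentially zero for affine f
```

For an affine function the two sides agree for every $h > 0$, not just in the limit; for a general differentiable $f$ they agree only as $h \to 0$.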

Update

I think I've gathered that $\Delta x$ is a vector of dimension $n \times 1$, perhaps the change in $x$ in $\mathbb{R}^n$? But that still leaves me with the question: what does $h$ represent? Some scalar that scales the vector $\Delta x$?

Answer

I find the notation to be slightly confusing, but otherwise this is classic stuff in multivariable differential calculus.

Basically, $\Delta x$ is a fixed vector. It does not go to zero; it just specifies a direction in which you want to measure the variation of your function.

Now $h$ is a dummy scalar parameter that dictates how far you go in that direction to compute the variation of the function. Letting $h$ go to zero yields a quantity which is a kind of derivative, and which I'd call the directional derivative in the direction $\Delta x$.

You can do that for any direction. (Imagine you're on a mountainside: going west you climb at a 3% slope, going north at a 4% slope, east and south give the same slopes but downhill, northwest gives yet another slope, and so on.)

So each direction gives you a different directional derivative. As long as the function is differentiable, there is a magic vector, called the gradient, which lets you find any directional derivative by simply taking the dot product of the gradient with the direction vector $\Delta x$.
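This "one vector generates every directional derivative" fact is easy to verify numerically. I illustrate with $f(x) = x^\top A x$ from part (b); the gradient formula $(A + A^\top)x$ is a standard fact, stated here as an assumption:

```python
# Every directional derivative comes from one vector (the gradient)
# via a dot product. Illustrated with f(x) = x^T A x, whose gradient
# (A + A^T) x is assumed (this is part (b) of the problem).
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))      # a generic, non-symmetric A

def f(v):
    return v @ A @ v

def grad(v):
    return (A + A.T) @ v             # gradient of x^T A x

x = rng.standard_normal(n)
h = 1e-6
errs = []
for _ in range(3):                   # try a few random directions
    dx = rng.standard_normal(n)
    fd = (f(x + h * dx) - f(x)) / h  # finite-difference directional deriv.
    errs.append(abs(fd - grad(x) @ dx))
print(max(errs))                     # tiny for every direction tried
```

The same single vector `grad(x)` reproduces the finite-difference slope in every direction, which is exactly what differentiability buys you.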