Why is the gradient covariant? (intuitively)


At 7:47 in this video, the professor defines a function F(x,y) on a regular Cartesian grid, then defines the same function on a scaled Cartesian grid (where x' = 2x and y' = 2y), and then takes the gradient of the function defined on the new grid. For some reason, the gradient of the new function is four times the gradient on the original Cartesian grid, and I can't understand why that should be the case.

I tried replicating what he did for a single-variable function, but my results differed quite a bit. I took the function $f(2x) = x$ and then defined it on a new $x'$ coordinate system, which is squished.

$$ f(2x) = x$$

$$ 2x= x'$$

$$ f(x') = \frac{x'}{2}$$

$$ \frac{df}{dx'} = \frac{1}{2}$$

The derivative is actually half of the original one. That is, the derivative scales so that when I multiply it by the derivative of the inner function (the analogue of the basis here), everything comes out the same:

$$ \frac{df}{dx'} \frac{dx'}{dx} = 1$$

So, intuitively, I think the gradient in this new coordinate system should scale in such a way that the scaling up of the basis vectors is cancelled by the scaling down of the gradient. That is, the gradient should be invariant under coordinate scaling and dilation. However, the regular gradient definition we all know and love is not. Why exactly does this happen, other than as a purely algebraic result?

Later in the video, he says that we can fix the discrepancy by writing

$$ \nabla F = \frac{ \partial F}{\partial x} \frac{i}{|i|^2} + \frac{ \partial F}{\partial y} \frac{ j}{ |j|^2}$$

but I can't see a systematic approach for finding that 'fix'.


There are 2 best solutions below

4
On BEST ANSWER

You're doing multiple things differently than in the video.

First, you're not really defining the function $f$ as a function of $x.$ You have $f(2x) = x.$ What does that mean?

What is going on in the video is that there is a plane with a coordinate system, and there is a function that assigns some numeric value at each point in the plane. If we reduce this to one dimension, we should have a line with a coordinate system for the line and a function that assigns a value at each point.

Classically, however, a line exists independently of any coordinate system. Below is a representation of part of a line from Euclidean geometry.

enter image description here

(It's "part of" a line because Euclid says the line can be extended in either direction indefinitely.)

People did mathematics on lines and other geometric objects for thousands of years without using coordinates. Cartesian coordinates are so called because a man named René Descartes got this idea in the 1600s and it caught on.

But we also now have a modern definition of function that allows us to take arbitrary objects from some arbitrary set and assign arbitrary objects from some set as the function values of the objects in the first set. So we could take the points of the line shown above as the first set, and assign numbers as the function values of the points on the line:

enter image description here

If you suppose that all points to the left of the label $5$ have function value $5,$ all points to the right of the rightmost label $3$ have function value $3,$ and the points in between each two consecutive number labels have function values increasing or decreasing steadily between those numeric values, we have a function on the entire line: every point has a number assigned to it.

Here's a different function assigning numbers to points on the line:

enter image description here

If we imagine that the numeric values are increasing steadily at the same rate as we go from left to right anywhere along this line, then again we have a number assigned to every point on the line, therefore a function from the points to numbers.

Now if we also had a coordinate system on this line, and labeled some points with their coordinates as well as with their function values, we might have this:

enter image description here

Just to be clear, the function mentioned in the previous paragraph takes geometric points as its input, not numbers. In order to remind ourselves that the input to this function is geometric and not numeric, let's refer to it as a "geometric" function.

But since each point on the line has a unique numeric coordinate, the "geometric" function implies another function that takes coordinates (numbers) as input and gives numbers as output. In the example above we will be able to write an algebraic expression for this mapping from numbers to numbers, so let's call this second function an "algebraic" function.

Pure "geometric" functions tend to be very hard to describe. Functions like our "algebraic" function are often easily described by formulas. That's one reason why coordinates are so popular, and why the professor in the video always likes to work with coordinates.

The algebraic function that maps the coordinates shown in the figure above to the value assigned at each corresponding point maps the coordinates shown as follows: \begin{align} 0 &\mapsto 0, \\ 1 &\mapsto 0.5, \\ 2 &\mapsto 1. \end{align}

If the original geometric function is defined so that its values increase linearly along the line, then the general rule for the algebraic function of these coordinates is $$ x \mapsto \frac x2. $$

The function $f(x') = \frac{x'}{2}$ is exactly the same algebraic function, and so is $f(2x) = x.$ And if we say that the coordinates in the figure above are $x$ coordinates, these various representations of algebraic functions all are correct algebraic functions for $x$ coordinates.

But what the professor in the video wants to do is not to maintain the same algebraic function on whatever coordinates he chooses. He wants to maintain the same geometric function (from points to numbers) regardless of coordinates. Moreover, he changed his coordinates so that in the new system, $x',$ the coordinates $0$ and $1$ are twice as far apart as in the old system, $x.$ That change of coordinates is illustrated by crossing out the old $x$ coordinates and writing new $x'$ coordinates as follows:

enter image description here

Note that the point that had $x$ coordinate $1$ before now has $x'$ coordinate $0.5.$ That is, when $x=1$ then $x'=0.5,$ or $x' = \frac x2 \neq 2x.$ The coordinate transformation you tried in your question was exactly the opposite of what is shown in the video.

Note also that we now have the algebraic mappings of particular coordinates \begin{align} 0 &\mapsto 0, \\ 0.5 &\mapsto 0.5, \\ 1 &\mapsto 1, \end{align} and the general mapping is $$ x' \mapsto x'. $$

That is, the algebraic function is now $f'(x') = x'.$ This is different from the old algebraic function.

Note that the old algebraic function had derivative $$ \frac{df(x)}{dx} = \frac 12, $$ but the new algebraic function has derivative $$ \frac{df'(x')}{dx'} = 1. $$

So that's different by a factor of $2,$ that is, $$ \frac{df'(x')}{dx'} = 2 \frac{df(x)}{dx}.$$
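As a quick numerical check of that factor of $2,$ here is a minimal sketch. It assumes the geometric function is linear and identifies each point with a physical position `p` (a hypothetical parameter, chosen so that `x = p` and `x' = p / 2`); all function names are illustrative only.

```python
def geometric_value(p):
    """Value assigned to the point at physical position p (assumed linear)."""
    return p / 2.0


def f_old(x):
    """Algebraic function in the old coordinates, where x equals p."""
    return geometric_value(x)


def f_new(x_prime):
    """Algebraic function in the new coordinates, where x' = p / 2."""
    return geometric_value(2.0 * x_prime)


def derivative(func, at, h=1e-6):
    """Central-difference estimate of the derivative of func at `at`."""
    return (func(at + h) - func(at - h)) / (2.0 * h)


# Both coordinates below label the same geometric point (p = 1).
df_dx = derivative(f_old, 1.0)
df_dx_prime = derivative(f_new, 0.5)

print(df_dx, df_dx_prime)  # approximately 0.5 and 1.0
```

The geometric function is never changed here; only the coordinate label attached to each point changes, and that alone doubles the derivative.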

What you did in the question was to manipulate the same algebraic function $f$ while ignoring what this did to the geometric function. If you transform the coordinates as in the video but keep the same algebraic function, you get this result:

enter image description here

The derivative of the function values with respect to the coordinate values is the same as before, but most of the points have new values assigned to them by the function. The function is the same function of numbers but a different function of points.
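A small sketch of this situation, under the same illustrative assumption that a point is identified by a physical position `p` with `x = p` and `x' = p / 2` (names are hypothetical): the formula stays fixed while the values at the points change.

```python
def f(coord):
    """The unchanged algebraic function: coordinate -> coordinate / 2."""
    return coord / 2.0


positions = [0.0, 1.0, 2.0]  # physical positions of three points on the line

# Value each point receives when the formula is applied to its x coordinate.
values_using_x = {p: f(p) for p in positions}

# Value each point receives when the same formula is applied to x' = p / 2.
values_using_x_prime = {p: f(p / 2.0) for p in positions}

print(values_using_x)
print(values_using_x_prime)
```

Every point other than the origin ends up with a different value, so this is a different function of points even though the formula is identical.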


But there's another thing in the video we have not yet looked at. Unlike a single-variable derivative, the gradient is not a number; it's a vector. To get a vector, initially the professor took the derivative in each direction and multiplied it by the unit vector in that direction. On a line (instead of a plane) we have just one unit vector to deal with instead of two. In $x$ coordinates this looks like this:

enter image description here

with the unit vector $\vec \imath$ shown as an arrow. But in $x'$ coordinates the unit vector is

enter image description here

where the unit vector $\vec\imath'$ is shown as a longer arrow. Since the distance from $x'=0$ to $x'=1$ is twice the distance from $x=0$ to $x=1,$ we have $\vec\imath' = 2 \vec\imath.$ Therefore

$$ \frac{df'(x')}{dx'} \vec\imath' = \left(2\frac{df(x)}{dx}\right) (2 \vec\imath) = 4 \frac{df(x)}{dx} \vec\imath. $$
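The arithmetic above can be checked with a short sketch. It represents each one-dimensional vector by its signed length in the original units (an assumption made purely for illustration), and it also tries the rescaling by $1/|i|^2$ quoted in the question. The function names are hypothetical.

```python
def naive_gradient(derivative, basis):
    """Derivative times the coordinate 'unit' vector, as done initially."""
    return derivative * basis


def rescaled_gradient(derivative, basis):
    """Derivative times basis / |basis|^2, the rescaling quoted in the question."""
    return derivative * basis / basis**2


i_old = 1.0          # the vector i, measured in the original units
i_new = 2.0 * i_old  # i' = 2 i

df_dx = 0.5          # derivative in x coordinates
df_dx_prime = 1.0    # derivative in x' coordinates

# The naive recipe gives a vector four times as long in the new coordinates.
ratio = naive_gradient(df_dx_prime, i_new) / naive_gradient(df_dx, i_old)

# With the rescaling, both coordinate systems give the same vector.
fixed_old = rescaled_gradient(df_dx, i_old)
fixed_new = rescaled_gradient(df_dx_prime, i_new)

print(ratio, fixed_old, fixed_new)
```

In this one-dimensional example the derivative picks up a factor of $2$ and the basis vector picks up another factor of $2,$ so dividing by the squared length of the basis vector removes both.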

Now if we stick to single-variable calculus we never need to worry about directions in two or more dimensions, so we don't really need a "unit vector" to orient us. So it was perfectly natural for you to ignore the part about the unit vector when trying to do a "gradient" in one dimension; just keep in mind that this doesn't translate well to two or more dimensions.

6
On

For a single variable, note that if $f(x)=x/2$ then $df/dx=1/2 i_x$ (writing out the direction of the coordinate is useful here). After rescaling the coordinate by a factor of 2 we have $$f(y)=y/2,\quad y=2x$$ and $$df/dy=1/2 i_y=1i_x,$$ which is the derivative with respect to $y.$ Converting it to $x$ using the chain rule, we get $$df/dx=(df/dy)(dy/dx)=2\times 1i_x,$$ which is 4 times the original one, i.e. $4\times 1/2i_x$.

The multi-variable case is similar.

Caution: writing $f(2x)=x$ for an original function $f(x)=x/2$ and then arguing that with $x'=2x$ it becomes $f(x')=x'/2$ is a useless cycle. In other words, you are first stretching the coordinate by a factor of two and then shrinking it by half, i.e. coming back to the original coordinates.