Gradient of function that can be expressed as another function


Given unknown constants $a$ and $b$, let

$$f(x,y) := \frac{a}{2} x^2+ \frac{b}{2} y^2$$

meaning that $$\nabla f(x,y) = \begin{bmatrix} ax\\ by\end{bmatrix}$$

Then I have $h(k) = f((1-ka)x_0, (1-kb)y_0)$, where $x_0$ and $y_0$ are constants.

(This is line search for gradient descent, by the way: we had the formula $h(k) = f(z_i + kp_i)$, where $z_i$ is the starting point ($(x_0, y_0)$ in this case) and $p_i$ is the negative gradient at $z_i$, i.e. $-\nabla f(z_i)$.)

I was asked to set the gradient of $h$ to $0$ and solve for $k$. Would this make $\nabla h(k) = \nabla f((1-ka)x_0, (1-kb)y_0) = (a(1-ka)x_0, b(1-kb)y_0)$? In that case I would need $a(1-ka)x_0 = b(1-kb)y_0 = 0$, which seems impossible, since $a$ and $b$ might very well not be equal.

So my questions are: 1) did I misunderstand something or make a mistake in the derivation, and 2) is it actually possible to find a $k$ where the gradient of $h(k)$ is $0$?


2 Answers

Best answer:

Let's try to understand the algorithm, and that should also answer your query. We have a convex scalar function $f:\mathbb{R}^2 \to \mathbb{R}$, and we would like to find the point where it achieves its minimum value.

To start the game, we need to pick a starting point, arbitrarily labelled $(x_0,y_0)$. The beauty of a convex function is that you can pick any starting point and the method will eventually converge to the minimiser.

Now, we need a direction to proceed to the next point - how to pick it? The gradient gives us a direction which represents the fastest rate of change - hence for a small step in that direction we would change the value the most. Seems like a nice choice of direction. (I'm adding the negative sign here but even if you don't the value of $k$ would be negative)

$$\vec{p_0} = -\nabla f(x_0, y_0)$$

Finally - we need a distance to proceed in that direction. This is where $k$ comes into the picture. How far along should we travel? A good way of doing it is to keep moving till $f$ is minimised - right? That would translate to

$$k^* = \arg\min_k h(k)$$

where $h(k) = f\big((x_0,y_0) + k\vec{p_0}\big)$ (the minus sign is already built into $\vec{p_0}$, which is why your $h$ has the factors $(1-ka)$ and $(1-kb)$)

In your case that simply means finding $k^*$ such that

$$\left.\frac{dh}{dk}\right|_{k=k^*} = 0$$
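For the $f$ in the question, plugging the definition of $h$ in and differentiating with respect to $k$ gives, explicitly,

$$h(k) = \frac{a}{2}(1-ka)^2x_0^2 + \frac{b}{2}(1-kb)^2y_0^2, \qquad h'(k) = -a^2(1-ka)x_0^2 - b^2(1-kb)y_0^2,$$

which is an ordinary scalar derivative, not a gradient: $h$ is a function of the single variable $k$.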

Once you get $k^*$ you can find the next point and repeat the process till it converges
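As a sanity check, the whole iteration can be sketched in a few lines of Python. The constants $a$, $b$ and the starting point below are made-up values for illustration; the step length uses the closed form derived in the other answer.

```python
import numpy as np

# Made-up constants for illustration; a, b > 0 keeps f convex.
a, b = 2.0, 5.0

def f(x, y):
    return 0.5 * a * x**2 + 0.5 * b * y**2

def grad_f(x, y):
    return np.array([a * x, b * y])

z = np.array([3.0, -1.0])  # arbitrary starting point (x0, y0)

for _ in range(100):
    g = grad_f(*z)
    if np.linalg.norm(g) < 1e-12:
        break
    x, y = z
    # Exact line-search step length for this quadratic:
    # k* = (a^2 x^2 + b^2 y^2) / (a^3 x^2 + b^3 y^2)
    k = (a**2 * x**2 + b**2 * y**2) / (a**3 * x**2 + b**3 * y**2)
    z = z + k * (-g)  # move along the negative gradient

print(z)  # approaches the minimiser (0, 0)
```

Each pass of the loop is one round of the process described above: compute the direction, solve the one-dimensional problem for $k^*$, step, repeat.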


Setting $u(k)=(1-ka)x_0$ and $v(k)=(1-kb)y_0$: since $h$ is a scalar function of the single variable $k$, its "gradient" is just the ordinary derivative, which by the chain rule is$$\begin{align*}h'(k)&=\frac{df(u(k),v(k))}{dk}\\&=f_uu'(k)+f_vv'(k)\\&=-ax_0f_u-by_0f_v\\&=-a^2ux_0-b^2vy_0,\end{align*}$$using $f_u=au$, $f_v=bv$, $u'(k)=-ax_0$ and $v'(k)=-by_0$.

Substituting back $u(k)$, $v(k)$ and equating this to zero gives$$a^2(1-ka)x_0^2+b^2(1-kb)y_0^2=0 \implies k=\frac{a^2x_0^2+b^2y_0^2}{a^3x_0^2+b^3y_0^2}.$$
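A quick numerical check of this closed form, with made-up values for the unknown constants and the starting point:

```python
# Made-up values for the unknown constants and the starting point.
a, b, x0, y0 = 2.0, 5.0, 3.0, -1.0

def h(k):
    # h(k) = f((1 - k a) x0, (1 - k b) y0)
    return 0.5 * a * ((1 - k * a) * x0)**2 + 0.5 * b * ((1 - k * b) * y0)**2

# Closed-form step length derived above.
k_star = (a**2 * x0**2 + b**2 * y0**2) / (a**3 * x0**2 + b**3 * y0**2)

# Central-difference estimate of h'(k*): it should vanish at the minimiser.
eps = 1e-6
dh = (h(k_star + eps) - h(k_star - eps)) / (2 * eps)
print(k_star, dh)  # dh is ~0 up to floating-point noise
```

Since $h$ is a quadratic in $k$ with positive leading coefficient, this $k^*$ is its unique global minimiser.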