I can imagine something like the picture above, but that only works for a small problem, such as simple linear regression solved with gradient descent. How do I even begin to imagine it when there are 4, 5, 6, ... 10000 variables? What does that look like? Sorry, I'm a total beginner.
How can I imagine / visualize gradient descent with many variables?
Asked by anonymous. There are 2 best solutions below.
Personally, I find the image you have deceptive, since it represents a scalar field on $\mathbb{R}^2$ as a height in a larger space, $\mathbb{R}^3$. Instead, I would recommend imagining your function of two variables as something like the temperature or density at every point in 2D space. That way you're not adding an extra dimension.
Now when we think of the gradient in this setting, we can "feel" it. For instance, suppose there's a campfire in the middle of a room that we can walk around in. Effectively we're in the 2D scenario where our location is a coordinate $(x,y)$ and the temperature is something like $f(x,y)=e^{-x^2-y^2}$. In this example the fire is hottest at the origin, and the temperature drops the further you are from it.
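To make the campfire concrete, here is the gradient worked out for this particular $f$. It is just the chain rule applied to each coordinate:

$$
\nabla f(x,y) = \left(\frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y}\right) = \left(-2x\,e^{-x^2-y^2},\ -2y\,e^{-x^2-y^2}\right) = -2\,f(x,y)\,(x,y).
$$

Since $f > 0$ everywhere, the gradient at $(x,y)$ is a positive multiple of $(-x,-y)$, so it points straight back at the fire, and it vanishes only at the origin itself.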
We could close our eyes and just feel where on our body we feel the most warmth. The direction in which the warmth increases fastest is exactly the direction the gradient points. Gradient descent just steps the opposite way, toward the cold, because it is looking for a minimum. This analogy is pretty robust: if we were a fish moving around in 3D, we could still understand the gradient as swimming towards the warmest place we feel. In the same way we can "visualize" what's happening in higher dimensions, at least as well as we can visualize higher dimensions at all.
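As a rough sketch of the "fish" idea, the snippet below does the same thing in 5 dimensions. The function `warmth`, the helper `felt_gradient`, the starting point and the step size are all made up for illustration. The gradient is estimated by "feeling" a tiny step along each axis, and nothing in the code changes when the number of dimensions does.

```python
import math

def warmth(point):
    """Temperature in any number of dimensions: exp(-|p|^2), hottest at the origin."""
    return math.exp(-sum(c * c for c in point))

def felt_gradient(f, point, h=1e-6):
    """Estimate the gradient by probing a tiny step along each axis (central differences)."""
    grad = []
    for i in range(len(point)):
        ahead = list(point)
        ahead[i] += h
        behind = list(point)
        behind[i] -= h
        grad.append((f(ahead) - f(behind)) / (2 * h))
    return grad

# A "fish" in 5 dimensions; the starting point is arbitrary.
point = [0.6, -0.4, 0.3, 0.5, -0.2]
start_warmth = warmth(point)
step_size = 0.1  # assumed step size

for _ in range(200):
    grad = felt_gradient(warmth, point)
    # Move a little in the direction that feels warmer (gradient ascent).
    point = [c + step_size * g for c, g in zip(point, grad)]

print(start_warmth, warmth(point))
```

Flipping the `+` to a `-` in the update turns this into gradient descent, which would swim away from the heat instead.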
Furthermore, the blind aspect of it is important. All the gradient tells you is the local information you feel on your body. If you could open your eyes and see everywhere, it might turn out that you're close to a warm heat source while there's a much hotter one very far away. So the gradient only ever guides you to a local extremum.
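A small sketch of that situation, assuming a made-up room with a weak fire at the origin and a fire three times hotter at $(5, 0)$. The function `heat`, the starting point and the step size are illustrative choices, not anything from the question.

```python
import math

def heat(x, y):
    """Two hypothetical heat sources: a small fire at (0, 0) and a hotter one at (5, 0)."""
    near = math.exp(-(x * x + y * y))
    far = 3.0 * math.exp(-((x - 5.0) ** 2 + y * y))
    return near + far

def heat_gradient(x, y):
    """Analytic gradient of heat(x, y)."""
    near = math.exp(-(x * x + y * y))
    far = 3.0 * math.exp(-((x - 5.0) ** 2 + y * y))
    dx = -2.0 * x * near - 2.0 * (x - 5.0) * far
    dy = -2.0 * y * near - 2.0 * y * far
    return dx, dy

x, y = 0.5, 0.3   # start close to the small fire
step_size = 0.1   # assumed step size

for _ in range(500):
    dx, dy = heat_gradient(x, y)
    x, y = x + step_size * dx, y + step_size * dy  # follow the warmth

print((x, y), heat(x, y), heat(5.0, 0.0))
```

Starting near the small fire, the walk settles at the small fire. The hotter source at $(5, 0)$ contributes almost nothing to the gradient from that far away, so the walker never notices it.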

You don't actually have to visualize that for many variables.
It is true that visualizing a process can help you understand the idea better, but what you have to understand in this case is that gradient descent is an optimization algorithm for finding a local minimum of a differentiable function.
It doesn't really matter what the function looks like. All that matters is that, given a suitable step size, the algorithm gives you parameters that, when plugged into the function, produce a value close to a local minimum.
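To illustrate the point, here is a minimal sketch of gradient descent on a toy problem with 10,000 variables. The loss (a sum of squared distances to randomly chosen `targets`), the learning rate and the number of steps are assumptions made up for this example. The update rule is the same one you would write for two variables, and the only thing worth watching is a single number, the loss.

```python
import random

random.seed(0)   # fixed seed so the run is repeatable
n = 10_000       # number of variables (hypothetical problem size)
targets = [random.uniform(-1.0, 1.0) for _ in range(n)]  # made-up minimizer

def loss(w):
    """Toy loss: sum of squared distances to the made-up targets."""
    return sum((wi - ti) ** 2 for wi, ti in zip(w, targets))

def gradient(w):
    """Partial derivative of the loss with respect to each of the n variables."""
    return [2.0 * (wi - ti) for wi, ti in zip(w, targets)]

weights = [0.0] * n    # arbitrary starting point
learning_rate = 0.1    # assumed step size
history = [loss(weights)]

for step in range(60):
    grad = gradient(weights)
    # Step against the gradient, one coordinate at a time.
    weights = [wi - learning_rate * gi for wi, gi in zip(weights, grad)]
    history.append(loss(weights))

print(history[0], history[-1])  # the loss before and after
```

Plotting `history` against the step number gives a picture that works for any number of variables: a curve that goes down.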