Cost Function (complicated graphs)


I've been starting to learn machine learning, and I've gotten stuck on quite a complicated graph of a function of the parameters. How should I read this graph? It's obviously a plot of a function, but I can't understand how they computed $\theta_1$. Usually I would divide the $y$-axis value by the $x$-axis value, but here I really don't get the point. Thanks in advance.

BEST ANSWER

Note: It will probably be hard for anyone (except perhaps someone from the field of machine learning who recognizes the functions) to give a satisfactory answer to these questions, for a number of reasons:

  • We have no understanding of the underlying problem, e.g. we do not know what the function $J(\theta_0, \theta_1)$ is.
  • We do not know how much or how little you know of the problem, and it is not entirely clear what you have misunderstood (if anything).

I will do my best to give you an answer, as I think I know at least part of why you are confused. There will be some guessing on my part, and even if my answer is not satisfactory, it may be a good aid for someone who knows more about machine learning to give a better one.

It looks to me like you are dealing with a gradient-descent optimization algorithm of some sort. The plot you refer to is not the usual plot of $J(\theta_0,\theta_1)$. Since $J$ is a function of two variables, and, I assume, not vector valued, its graph is a surface. The circles you are seeing are the level curves of that function.
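To make the "function of two variables" point concrete: if (as I suspect, though your post does not say) $J$ is the usual squared-error cost of a linear hypothesis $h(x)=\theta_0+\theta_1 x$, then every choice of $(\theta_0,\theta_1)$ produces a single number, which is the height of the surface at that point. A small sketch, with toy data and a cost function of my own invention:

```python
import numpy as np

# Hypothetical example: J(theta0, theta1) as the mean squared-error cost
# of a linear hypothesis h(x) = theta0 + theta1 * x on some sample data.
# (This is my assumption about J, not something stated in the question.)
def J(theta0, theta1, x, y):
    predictions = theta0 + theta1 * x
    return np.mean((predictions - y) ** 2) / 2  # one scalar per (theta0, theta1)

# Toy data lying exactly on y = 1 + 2x, so the cost bottoms out at (1, 2).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

print(J(1.0, 2.0, x, y))                       # 0.0 at the true parameters
print(J(0.0, 0.0, x, y) > J(1.0, 2.0, x, y))   # True: cost grows away from the minimum
```

Plotting $J$ over a grid of $(\theta_0,\theta_1)$ values and drawing its contours would reproduce a picture like the one in your course material.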

For a function $f(x,y)$, the level curves are what you get if you set $f(x,y)=c$ and project the resulting curve down into the $xy$-plane for different values of $c$, i.e. each circle corresponds to a value of $c$. Think of the contour lines on a map, where the surface of a mountain represents the graph of a function of two variables. The goal in this optimization is to minimize $J$. In your case, the values of $\theta_0$ and $\theta_1$ that give the lowest cost are thus the ones in the smallest, centermost circle, i.e. at the bottom of the surface that is the graph of $J$.
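You can verify the level-curve idea numerically with a function of my own choosing, $f(x,y)=x^2+y^2$ (a stand-in, not necessarily your course's $J$): the level curve $f=c$ is a circle of radius $\sqrt{c}$, and $f$ takes the same value at every point on it.

```python
import math

# Level-curve sketch for f(x, y) = x^2 + y^2 (an illustrative example only).
def f(x, y):
    return x ** 2 + y ** 2

c = 4.0
r = math.sqrt(c)  # the level curve f = c is the circle of radius sqrt(c)

# Sample several points on that circle; f is constant on all of them.
for angle in [0.0, 0.7, 1.9, 3.1]:
    x, y = r * math.cos(angle), r * math.sin(angle)
    print(round(f(x, y), 10))  # prints 4.0 every time
```

Each nested circle in your plot is exactly such a curve for a different value of $c$, with smaller circles corresponding to lower cost.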

EDIT: $\theta_0$ and $\theta_1$ are parameters that are updated in each step of the process: you want to find the optimal values of the parameters, which, as I stated above, are the values that 'hit' the bottom of the graph of $J$. So each update of $\theta_0,\theta_1$ should bring you closer to the center of the smallest circle in that image. Exactly how they are updated depends on the chosen variant of gradient descent. Remember, from analysis, that the gradient points in the direction of steepest ascent; hence the name gradient descent, as we want to find the bottom by stepping the opposite way.

I recommend that you read up on linear regression and gradient descent.

I hope I did not confuse you too much, and that you at least find this helpful. Sometimes it is hard to formulate a concise question; hopefully you now have enough to reformulate yours if you feel that you need any more help.