I am having some problems trying to implement a gradient descent algorithm. As I said in the title, the initial values of these properties drastically change the resulting slope and intercept, and some values even produce NaN (not-a-number) results because the parameters diverge to negative or positive infinity.
For example, if I take the first element of the set and use the slope of the line from the origin through that point as the starting slope, and that point is not a good representative of the other points, I either get a very 'wrong' answer, or I have to decrease the learning rate and the halting-criterion values to get a better result, which then takes a great many iterations.
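A minimal sketch of the kind of setup described above, assuming a straight-line model fitted by batch gradient descent on mean squared error. The function names, the sample data, and the default learning rate and tolerance are hypothetical illustrations, not the exact code in question.

```python
import math

def init_from_first_point(xs, ys):
    """Hypothetical initialisation: slope of the line from the origin
    through the first data point, with intercept 0."""
    if xs[0] == 0:
        return 0.0, 0.0  # slope through the origin is undefined; fall back to zero
    return ys[0] / xs[0], 0.0

def gradient_descent(xs, ys, slope, intercept, lr=0.01, tol=1e-9, max_iter=100_000):
    """Fit y = slope * x + intercept by batch gradient descent on MSE.
    Returns (slope, intercept, number_of_steps_taken)."""
    n = len(xs)
    step = 0
    for step in range(1, max_iter + 1):
        residuals = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error
        grad_slope = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_intercept = 2.0 / n * sum(residuals)
        slope -= lr * grad_slope
        intercept -= lr * grad_intercept
        # Stop early if the parameters have blown up to inf or NaN
        if not (math.isfinite(slope) and math.isfinite(intercept)):
            break
        # Halting criterion: both gradient components are close to zero
        if abs(grad_slope) < tol and abs(grad_intercept) < tol:
            break
    return slope, intercept, step

# Made-up data lying roughly on y = 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

m0, b0 = init_from_first_point(xs, ys)
print(gradient_descent(xs, ys, m0, b0, lr=0.01))  # small learning rate
print(gradient_descent(xs, ys, m0, b0, lr=0.1))   # larger learning rate
```

On this made-up data the two calls differ only in the learning rate, so the sketch also makes it possible to compare how much of the behaviour comes from the starting point and how much from the step size.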
Is there an efficient way to find good initial values, or do I have to experiment with many initial points to see what works?