I recently read about the logistic regression model.
$$y=\frac{1}{1+e^{-(\beta_0+\beta_1x)}}$$
where $y$ is a categorical variable that takes the value 0 or 1.
What perplexes me is that different optimization tools can be used to estimate the parameters $\beta_0$ and $\beta_1$.
Here, a maximum likelihood-based estimation technique has been used, whereas here stochastic gradient descent has been used. We could also simply have used a least-squares-based technique to estimate the parameters.
Can you please tell me which is the best method for estimating the parameters of logistic regression, and why?
Maximum likelihood-based estimation is a principle: you maximize the log-likelihood function $\ell$ derived from the Bernoulli distribution. That turns estimation into an optimization problem, and you may use various algorithms to solve it: Newton's method, gradient descent (GD), stochastic gradient descent (SGD), and so on. As long as you can obtain a (local) maximum of $\ell$ within a reasonable time frame, the methods are essentially equivalent: all of them operate under the MLE principle.
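As a minimal sketch of this point, here is plain gradient ascent on the Bernoulli log-likelihood $\ell(\beta) = \sum_i [y_i \log p_i + (1-y_i)\log(1-p_i)]$, whose gradient is $X^\top(y - p)$. The synthetic data, true coefficients, step size, and iteration count are my own illustrative choices, not anything from the question:

```python
import numpy as np

# Synthetic data (hypothetical): true beta0 = 0.5, true beta1 = 2.0.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
p_true = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))
y = rng.binomial(1, p_true)

X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept
beta = np.zeros(2)

# Gradient ascent on the log-likelihood: the gradient of ell is X^T (y - p),
# so each step moves beta in that direction (averaged over observations).
lr = 0.1
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ beta))
    beta += lr * X.T @ (y - p) / len(y)
```

Swapping this loop for Newton's method or SGD changes only how fast you reach the maximizer of $\ell$, not what you are estimating.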
You may also change the principle. Say you would like to carry out estimation without assuming a specific distribution. Then iteratively reweighted least squares (IRLS) may be applied, which gives exactly the same solution as MLE computed via Newton's method.
From a decision-theoretic point of view, you may choose to minimize some loss function related to your research question. In particular, the MLE principle is equivalent to minimizing the empirical loss function $-\ell$, and IRLS is equivalent to minimizing a sum-of-squared-residuals loss.
But you can also do estimation without optimization. An example is estimating equations, where the idea is simply to solve a system of equations.
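For logistic regression, the score equations $X^\top(y - p(\beta)) = 0$ are one such system: you can hand them to a generic root finder and never write down a loss function at all. A sketch using `scipy.optimize.root` on hypothetical synthetic data (the data and starting point are my own choices):

```python
import numpy as np
from scipy.optimize import root

# Synthetic data (hypothetical): true beta0 = 0.5, true beta1 = 2.0.
rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 2.0 * x))))
X = np.column_stack([np.ones_like(x), x])

# Estimating (score) equations: X^T (y - p(beta)) = 0.
# Solved as a root-finding problem; nothing is being minimized explicitly.
def score(beta):
    p = 1 / (1 + np.exp(-X @ beta))
    return X.T @ (y - p)

sol = root(score, x0=np.zeros(2))
```

Because these particular equations happen to be the derivative of $\ell$, the root coincides with the MLE here; the estimating-equations view matters when the equations are chosen without reference to any likelihood.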
There is no universal "best" in statistics. In practice, you may stick with the usual choice, MLE, and resort to the alternatives only when MLE seriously fails.