I have been using the Nelder-Mead optimization method to find optimal weights for the cost function in simple linear and logistic regression problems. The appeal of this algorithm is that it doesn't require the user to supply the gradients of the cost function with respect to the weights (it feels like cheating).
So, are there any limitations or constraints on using derivative-free optimization methods like Nelder-Mead that do not require calculating gradients?
As a general principle, gradient-based methods tend to converge significantly faster than gradient-free methods on smooth functions. Also, while there are nice convergence guarantees for stochastic gradient descent on convex functions, the convergence guarantees for Nelder-Mead are quite limited. See this paper, for example, on finally obtaining convergence guarantees for a variant of the method in two dimensions: http://www-math.mit.edu/~poonen/papers/nm.pdf.
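To make the speed difference concrete, here is a rough sketch comparing Nelder-Mead against a gradient-based method (BFGS, which here approximates gradients numerically) on a logistic regression negative log-likelihood, using SciPy. The data and cost function are made up for illustration; on smooth problems like this, Nelder-Mead typically needs many more function evaluations to reach a comparable minimum:

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic logistic regression data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
p = 1.0 / (1.0 + np.exp(-X @ true_w))
y = (rng.random(200) < p).astype(float)

def nll(w):
    """Numerically stable negative log-likelihood for logistic regression."""
    z = X @ w
    return np.sum(np.logaddexp(0.0, z) - y * z)

w0 = np.zeros(3)
res_nm = minimize(nll, w0, method="Nelder-Mead")  # derivative-free
res_bfgs = minimize(nll, w0, method="BFGS")       # gradient-based

# Both should land near the same minimum; compare how many
# cost-function evaluations each one needed to get there.
print("Nelder-Mead: nfev =", res_nm.nfev, " final cost =", nll(res_nm.x))
print("BFGS:        nfev =", res_bfgs.nfev, " final cost =", nll(res_bfgs.x))
```

Note that BFGS here is estimating the gradient by finite differences, so each of its "iterations" still costs several function evaluations; supplying an analytic gradient via the `jac` argument would widen the gap further.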