I'm performing a numerical optimization of a function whose value and derivatives I can calculate. Let's say, for the sake of argument, that I'm optimizing the following function:
$$\underset{f}{\operatorname{argmax}} \sum_i \log\left(\binom{N_i}{k_i}\, \theta_i^{k_i}\, (1-\theta_i)^{N_i-k_i}\right)$$
Where: $$\theta_i = \frac{f}{f+(1-f)e^{\lambda_i}}$$
We can compute the gradient and Hessian of this objective with respect to $f$:
$$g = \sum_i \frac{k_i}{f} - \frac{N_i-k_i}{1-f} - \frac{N_i(1-e^{\lambda_i})}{f+(1-f)e^{\lambda_i}}$$
$$h = \sum_i -\frac{k_i}{f^2} - \frac{N_i - k_i}{(1-f)^2} + \frac{N_i(1-e^{\lambda_i})^2}{(f+(1-f)e^{\lambda_i})^2}$$
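With the gradient and Hessian in hand, plain Newton-Raphson is the obvious baseline. Here is a minimal sketch (the data $N_i$, $k_i$, $\lambda_i$ below are synthetic, and the clipping that keeps the iterate inside $(0,1)$ is my own safeguard, not part of the method):

```python
import numpy as np

# Synthetic data for illustration: N_i trials, k_i successes, offsets lambda_i
rng = np.random.default_rng(0)
N = rng.integers(10, 100, size=50)
lam = rng.normal(0.0, 0.5, size=50)
f_true = 0.3
theta = f_true / (f_true + (1 - f_true) * np.exp(lam))
k = rng.binomial(N, theta)

def grad(f):
    # g = sum_i k_i/f - (N_i-k_i)/(1-f) - N_i(1-e^{lam_i})/(f+(1-f)e^{lam_i})
    e = np.exp(lam)
    return np.sum(k / f - (N - k) / (1 - f) - N * (1 - e) / (f + (1 - f) * e))

def hess(f):
    # h = sum_i -k_i/f^2 - (N_i-k_i)/(1-f)^2 + N_i(1-e^{lam_i})^2/(f+(1-f)e^{lam_i})^2
    e = np.exp(lam)
    return np.sum(-k / f**2 - (N - k) / (1 - f)**2
                  + N * (1 - e)**2 / (f + (1 - f) * e)**2)

f = 0.5  # initial guess in (0, 1)
for _ in range(100):
    f_new = np.clip(f - grad(f) / hess(f), 1e-6, 1 - 1e-6)
    if abs(f_new - f) < 1e-10:
        f = f_new
        break
    f = f_new
```

On this toy problem the iterate ends up near the stationary point where the gradient vanishes; for a maximum you would also want to check that the Hessian at the solution is negative.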
Given that we can evaluate the function and its derivatives, what are some approaches for optimizing $f$?
Specifically, a coworker mentioned something he referred to as Generalized Newton-Raphson, whereby we match our derivatives to those of a function whose maximum we can compute in closed form. For instance, we would match the gradient and Hessian to those of a beta distribution's log-density, jump to the beta's mode, and repeat until convergence. Is this truly called Generalized Newton-Raphson, or does this approach go by a different name? Also, how exactly do you match the derivatives to those of a beta distribution?
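For concreteness, here is my tentative reading of the beta-matching step as a sketch (the closed-form solve for the matched beta parameters is my own derivation from equating first and second derivatives of the log-density, so it may not be exactly what my coworker meant; the data are the same synthetic $N_i$, $k_i$, $\lambda_i$ as above):

```python
import numpy as np

# Synthetic data for illustration (N_i, k_i, lambda_i are made up)
rng = np.random.default_rng(0)
N = rng.integers(10, 100, size=50)
lam = rng.normal(0.0, 0.5, size=50)
k = rng.binomial(N, 0.3 / (0.3 + 0.7 * np.exp(lam)))

def grad(f):
    e = np.exp(lam)
    return np.sum(k / f - (N - k) / (1 - f) - N * (1 - e) / (f + (1 - f) * e))

def hess(f):
    e = np.exp(lam)
    return np.sum(-k / f**2 - (N - k) / (1 - f)**2
                  + N * (1 - e)**2 / (f + (1 - f) * e)**2)

f = 0.5
for _ in range(100):
    g, h = grad(f), hess(f)
    # Match g and h to the log-density of a Beta(a+1, b+1), whose derivatives are
    #   d/df   [a*log(f) + b*log(1-f)] =  a/f   - b/(1-f)
    #   d^2/df^2                       = -a/f^2 - b/(1-f)^2
    # Setting these equal to g and h and solving the 2x2 linear system gives:
    a = f**2 * (g - (1 - f) * h)
    b = -(1 - f)**2 * (g + f * h)
    if a <= 0 or b <= 0:
        break  # matched beta has no interior mode; would need a fallback here
    f_new = a / (a + b)  # mode of Beta(a+1, b+1)
    if abs(f_new - f) < 1e-10:
        f = f_new
        break
    f = f_new
```

A sanity check on the fixed point: if $f = a/(a+b)$ then $a/f - b/(1-f) = 0$, so any fixed point of this iteration is a stationary point of the original objective, which is what made me think the scheme is at least self-consistent.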
Any answers regarding these questions or any other optimization suggestions would be greatly appreciated! Thank you!