Likelihood function analysis


I've done some calculations on a large amount of data and plotted the results in Excel.


How do I go about analysing this data to find a formula that approximately matches the graph?

A few random samples of data:

1, 1.861
0.95, 3.675
0.84, 4.487
0.83, 4.542
0.61, 5.389
0.50, 5.786
0.42, 6.076
0.34, 6.349
0.18, 7.102
0.08, 7.925
0.04, 8.511
0.01, 10.171

Is there a good tool for calculating it? I've tried searching, but with no luck.


What you're doing here is not how the term "logistic regression" is normally understood.

In logistic regression you have real numbers in one column and $0$s and $1$s in the other. The logistic function estimates the probability of getting a $1$, given the value of the number on the $x$-axis. You're fitting a curve $$ \operatorname{logit} p = \log\frac p {1-p} = ax+b\qquad\text{or, equivalently}\qquad p=\frac{1}{1+e^{-(ax+b)}}.\tag1 $$ Writing $(x_i, y_i)$ for the observations, with $y_i\in\{0,1\}$, you have a likelihood function $$ L(a,b) = \prod_i \begin{cases} p_i & \text{if }y_i=1, \\ 1-p_i & \text{if }y_i=0, \end{cases} $$ where $p_i$ depends on $x_i$ as in $(1)$ and the product is over all of the observations. So you have something like $x=$ the patient's income, and you get a $1$ or a $0$ depending on whether the patient survived the procedure or not, and so on.
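A minimal Python sketch of $\log L(a,b)$, which is what one maximizes in practice since the log turns the product into a sum. The names `softplus` and `log_likelihood`, and the encoding of the responses as a list `ys` of 0s and 1s, are illustrative assumptions rather than any library's API.

```python
import math


def softplus(u):
    """log(1 + e^u), written so that it does not overflow for large |u|."""
    return max(u, 0.0) + math.log1p(math.exp(-abs(u)))


def log_likelihood(a, b, xs, ys):
    """log L(a, b) for 0/1 responses ys observed at inputs xs,
    with p = 1 / (1 + exp(-(a*x + b))) as in (1)."""
    total = 0.0
    for x, y in zip(xs, ys):
        t = a * x + b
        # log p = -log(1 + e^(-t))  and  log(1 - p) = -log(1 + e^t)
        total += -softplus(-t) if y == 1 else -softplus(t)
    return total


# With a = b = 0 every p is 1/2, so each observation contributes log(1/2).
print(log_likelihood(0.0, 0.0, [1.0, 2.0, 3.0], [0, 1, 1]))  # 3*log(0.5) ≈ -2.0794
```

Computing $\log p$ and $\log(1-p)$ through `softplus` avoids taking the log of a number that has rounded to exactly $0$ when $ax+b$ is large.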

The estimates of $a$ and $b$ are the values that maximize $L(a,b)$; there is no closed-form solution, so they are found numerically. The classical algorithm is iteratively reweighted least squares, although I think other algorithms may be replacing it in practice.
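A bare-bones sketch of iteratively reweighted least squares for the two-parameter model $(1)$, in plain Python so it has no dependencies. Each pass fits a weighted straight line to a "working response" $z_i = t_i + (y_i-p_i)/w_i$, where $t_i = ax_i+b$ and $w_i=p_i(1-p_i)$; this is equivalent to a Newton step on $\log L$. The function name, starting point and stopping rule are my own choices, and the sketch assumes the $0$s and $1$s are not perfectly separated by $x$ (in that case the maximum likelihood estimate does not exist and the iteration diverges).

```python
import math


def irls_logistic(xs, ys, max_iter=50, tol=1e-12):
    """Fit logit p = a*x + b to 0/1 responses ys by iteratively reweighted
    least squares. Returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        s = sx = sxx = sz = sxz = 0.0
        for x, y in zip(xs, ys):
            t = a * x + b
            p = 1.0 / (1.0 + math.exp(-t))
            w = p * (1.0 - p)        # weight: variance of a Bernoulli(p) variable
            z = t + (y - p) / w      # working response for this pass
            s += w
            sx += w * x
            sxx += w * x * x
            sz += w * z
            sxz += w * x * z
        # Weighted least-squares line z ≈ a*x + b under the current weights
        a_new = (s * sxz - sx * sz) / (s * sxx - sx * sx)
        b_new = (sz - a_new * sx) / s
        done = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if done:
            break
    return a, b


# Toy 0/1 data where the classes overlap, so the estimate exists.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
a, b = irls_logistic(xs, ys)
print(a, b)
```

At the maximum, the score equations $\sum_i (y_i-p_i)=0$ and $\sum_i (y_i-p_i)x_i=0$ hold, which gives a simple way to check the result.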


The plot suggests a sigmoid function, and the logistic function seems to be a good candidate for a reasonable fit.

The most traditional formulation is $$y=\frac{L}{1+e^{-k(x-x_0)}}$$ and, in your case, $k$ would be negative. Since your data are obviously proportions, the value of $L$ is known: it should be $1$ or $100$ depending on whether you work with fractions or percents. Let us use fractions (just as in the data set you provide) and keep the model as $$y=\frac{1}{1+e^{-k(x-x_0)}}$$ This requires nonlinear regression, which itself needs reasonable estimates of the parameters to start the iterations. These can be obtained in a preliminary step: writing $$z=\log\Big(\frac 1y-1\Big)=-k(x-x_0)=\alpha +\beta x,$$ a first linear regression of the $z_i$'s on the $x_i$'s gives $\alpha$ and $\beta$, from which you can easily deduce the estimates $k=-\beta$ and $x_0=-\alpha/\beta$. For this first step you must discard all data points for which $y=1$ (or $y=0$), since $\log(1/y-1)$ is undefined there. All of this can be done in Excel. You are then ready for the nonlinear regression; various software packages available online can do it, and so can the Excel Solver.
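The preliminary linear step can be sketched in plain Python. It assumes the question's sample points are $(y, x)$ pairs, i.e. the first column is the fraction $y$ and the second is $x$; that reading is consistent with the fitted $x_0\approx 5.8$ quoted below, but it is an assumption. The function name `initial_estimates` is hypothetical; it runs an ordinary least-squares line fit of $z$ on $x$ and converts the result with $k=-\beta$, $x_0=-\alpha/\beta$.

```python
import math

# The question's sample points, read as (y, x): y is the fraction, x the value.
data = [(1, 1.861), (0.95, 3.675), (0.84, 4.487), (0.83, 4.542),
        (0.61, 5.389), (0.50, 5.786), (0.42, 6.076), (0.34, 6.349),
        (0.18, 7.102), (0.08, 7.925), (0.04, 8.511), (0.01, 10.171)]


def initial_estimates(xs, ys):
    """Regress z = log(1/y - 1) on x by ordinary least squares and
    return (alpha, beta, k, x0) with k = -beta and x0 = -alpha/beta."""
    # log(1/y - 1) is undefined at y = 1 and y = 0, so those points are dropped.
    pts = [(x, math.log(1.0 / y - 1.0)) for x, y in zip(xs, ys) if 0.0 < y < 1.0]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    mz = sum(z for _, z in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxz = sum((x - mx) * (z - mz) for x, z in pts)
    beta = sxz / sxx
    alpha = mz - beta * mx
    # -k(x - x0) = k*x0 - k*x, so beta = -k and alpha = k*x0
    return alpha, beta, -beta, -alpha / beta


ys = [y for y, _ in data]
xs = [x for _, x in data]
alpha, beta, k0, x00 = initial_estimates(xs, ys)
print(alpha, beta, k0, x00)
```

On the eleven usable points (the one with $y=1$ is dropped) this should give approximately $\alpha=-6.8245$ and $\beta=1.1591$, hence $k\approx-1.1591$ and $x_0\approx5.8878$.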

I suggest you have a look at http://www.real-statistics.com/logistic-regression/finding-logistic-regression-coefficients-using-excels-solver/

Using this method on the data set given in the post, the first step gives $\alpha= -6.8245$ and $\beta=1.1591$, that is to say $k=-1.1591$ and $x_0=5.8878$, and the final result is $$y=\frac{1}{1+e^{1.22051 (x-5.80793)}},$$ that is, $k=-1.22051$ and $x_0=5.80793$. The fit is very good (the coefficient of determination is $R^2=0.9998$). Notice how close the initial estimates are to the final values.
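For the nonlinear step, here is a sketch of an ordinary least-squares fit by the Gauss-Newton method with simple step halving, started from the linearized estimates. Gauss-Newton is my choice of algorithm here, not necessarily what Excel's Solver or the answer used; the function names are hypothetical, and the data are again read as $(y, x)$ pairs. It also computes the usual $R^2 = 1-\text{SSE}/\text{SST}$; software packages do not all define $R^2$ the same way for nonlinear models, so the value it prints may differ somewhat from the one quoted above.

```python
import math

# The question's sample points, read as (y, x).
data = [(1, 1.861), (0.95, 3.675), (0.84, 4.487), (0.83, 4.542),
        (0.61, 5.389), (0.50, 5.786), (0.42, 6.076), (0.34, 6.349),
        (0.18, 7.102), (0.08, 7.925), (0.04, 8.511), (0.01, 10.171)]
xs = [x for _, x in data]
ys = [y for y, _ in data]


def model(x, k, x0):
    """The logistic model y = 1 / (1 + exp(-k (x - x0)))."""
    return 1.0 / (1.0 + math.exp(-k * (x - x0)))


def sse(xs, ys, k, x0):
    """Sum of squared residuals of the model."""
    return sum((y - model(x, k, x0)) ** 2 for x, y in zip(xs, ys))


def gauss_newton(xs, ys, k, x0, max_iter=100, tol=1e-12):
    """Least-squares fit of (k, x0), starting from the given estimates."""
    cur = sse(xs, ys, k, x0)
    for _ in range(max_iter):
        # Accumulate J^T J (a11, a12, a22) and J^T r (g1, g2).
        a11 = a12 = a22 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            f = model(x, k, x0)
            d = f * (1.0 - f)               # derivative of the logistic curve
            jk, jx0 = d * (x - x0), -d * k  # df/dk and df/dx0
            r = y - f
            a11 += jk * jk
            a12 += jk * jx0
            a22 += jx0 * jx0
            g1 += jk * r
            g2 += jx0 * r
        det = a11 * a22 - a12 * a12
        dk = (a22 * g1 - a12 * g2) / det
        dx0 = (a11 * g2 - a12 * g1) / det
        # Halve the step until the sum of squares does not increase.
        step = 1.0
        while True:
            k_new, x0_new = k + step * dk, x0 + step * dx0
            new = sse(xs, ys, k_new, x0_new)
            if new <= cur or step < 1e-10:
                break
            step /= 2.0
        if new > cur:
            break                           # no improving step: stop here
        done = abs(k_new - k) + abs(x0_new - x0) < tol
        k, x0, cur = k_new, x0_new, new
        if done:
            break
    return k, x0


def r_squared(xs, ys, k, x0):
    """Usual R^2 = 1 - SSE/SST (other definitions exist for nonlinear fits)."""
    mean = sum(ys) / len(ys)
    sst = sum((y - mean) ** 2 for y in ys)
    return 1.0 - sse(xs, ys, k, x0) / sst


# Start from the linearized estimates quoted in the answer.
k_fit, x0_fit = gauss_newton(xs, ys, -1.1591, 5.8878)
print(k_fit, x0_fit, r_squared(xs, ys, k_fit, x0_fit))
```

The step halving guarantees that each accepted iteration does not increase the sum of squares, which makes plain Gauss-Newton more forgiving of a rough starting point.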