UPDATE: I read this article by Erik Erlandson, and it seems to be the thing I need. However, note that in my model the power of x is itself a parameter, so I do not see how to apply that approach to my problem.
I have this dataset, and I am using the Hill function y = (a * x^n) / (b + x^n) as the model, where a is the limit of the Hill curve, b is the value of x at which a/2 is reached (for n = 1), and n is the cooperativity, or steepness, of the curve.
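To make the roles of the parameters concrete, here is a small sketch of the model for a single positive x. The values of a, b and n below are made up for illustration and are not fitted to my dataset.

```python
def hill(x, a, b, n):
    """Hill function y = (a * x^n) / (b + x^n) for a scalar x > 0."""
    xn = x ** n
    return (a * xn) / (b + xn)


# Illustrative values only, not fitted to any real data
a, b = 10.0, 4.0

# For n = 1 the curve reaches a/2 at x = b
print(hill(b, a, b, 1.0))  # 5.0

# For a general n the half-maximum sits at x = b ** (1 / n)
n = 2.0
print(hill(b ** (1.0 / n), a, b, n))  # 5.0

# For large x the curve approaches the limit a
print(hill(1e6, a, b, n))
```

So b only equals the half-maximum position when n = 1; otherwise the half-maximum is at b ** (1 / n).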
Currently, I am storing all X,y values, computing the parameters from scipy.optimize.curve_fit, and plotting the curve. If new data points come along, I re-calculate the parameters with the old+new data.
Is there a way to update the parameters of the model without storing all of the previous old data points, once the initial parameters are obtained from the previous data points?
For example, I fit the curve to the first 1000 data points and obtain my parameters. Next, I discard some or all of the old data. Then, when I see the 1001st point, I simply update my parameters and plot the curve again, and so on for every new data point.
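One way to picture the "discard some of the old data" variant is a sliding window: keep only the most recent points and refit on those, starting each fit from the previous parameters. The sketch below is only an assumption about how that could look. The helper windowed_fit, the window size, the refit interval and the synthetic data are all hypothetical. It bounds memory but still stores a window of points, so it is not the fully storage-free update I am asking about.

```python
from collections import deque

import numpy as np
from scipy.optimize import curve_fit


def hill_model(X, a, b, n):
    X = np.asarray(X, dtype=float)
    return (a * X**n) / (b + X**n)


def windowed_fit(stream, window_size, p0, refit_every=50):
    """Refit on the most recent window_size points only, warm-starting each fit."""
    window = deque(maxlen=window_size)  # older points fall off automatically
    params = tuple(p0)
    for i, (x, y) in enumerate(stream, start=1):
        window.append((x, y))
        if len(window) == window_size and i % refit_every == 0:
            xs, ys = zip(*window)
            params, _ = curve_fit(hill_model, xs, ys, p0=params)
            yield tuple(params)


# Synthetic stream, for illustration only (not my real data)
rng = np.random.default_rng(0)
true_params = (10.0, 5.0, 2.0)
xs = rng.uniform(0.1, 10.0, size=500)
ys = hill_model(xs, *true_params) + rng.normal(0.0, 0.05, size=500)

fits = list(windowed_fit(zip(xs, ys), window_size=200, p0=(8.0, 4.0, 1.5)))
print(len(fits), fits[-1])
```

The drawback is that the fit forgets everything outside the window, so the parameters can drift if the recent points cover only part of the x range.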
EDIT
My existing code is as follows (not super elegant).
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit


def file_stream(file_name):
    """Yield one [X, y] pair of floats per tab-separated line."""
    with open(file_name, 'r') as in_file:
        for line in in_file:
            yield [float(v) for v in line.strip().split('\t')]


def hill_model(X, a, b, n):
    # Vectorized, so it accepts both lists and the arrays curve_fit passes in
    X = np.asarray(X, dtype=float)
    return (a * X**n) / (b + X**n)


def get_params(X_all, y_all, prev_par=None, fn=hill_model):
    if prev_par is None:
        a_init, b_init, n_init = y_all[-1], y_all[0], 1.0
    else:
        # Warm start from the previous fit
        a_init, b_init, n_init = prev_par
    opt_par, opt_cov = curve_fit(fn, X_all, y_all, p0=[a_init, b_init, n_init])
    a_final, b_final, n_final = opt_par
    return a_final, b_final, n_final


def main():
    file_name = 'data.tsv'
    file_streamer = file_stream(file_name)
    X_all, y_all = [], []

    # Get some initial data from the stream
    for _ in range(1000):
        X, y = next(file_streamer)
        X_all.append(X)
        y_all.append(y)
    plt.scatter(X_all, y_all)

    # Initialize params of model
    a, b, n = get_params(X_all, y_all)
    y_model = hill_model(X_all, a, b, n)
    plt.plot(X_all, y_model, 'r-')
    plt.show()

    # Rolling update
    seen_all = False  # Helps stop when all data is fit
    while True:
        for _ in range(1000):
            try:
                X, y = next(file_streamer)
                X_all.append(X)
                y_all.append(y)
            except StopIteration:
                seen_all = True
                break

        # Refit on ALL data seen so far, starting from the previous parameters
        a, b, n = get_params(X_all, y_all, prev_par=[a, b, n], fn=hill_model)
        y_model = hill_model(X_all, a, b, n)
        plt.scatter(X_all, y_all)
        plt.plot(X_all, y_model, 'r-')
        plt.show()

        # Nothing more to update, return
        if seen_all:
            return


if __name__ == '__main__':
    main()
The code currently reads in some X,y values and calculates the a, b and n parameters; when more X,y values arrive, it refits and updates a, b and n. As you can see, this requires storing all previous X,y values, which I do not want. I want to update the parameters using only each new X,y value and the previous a, b and n values.
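To illustrate the kind of update I have in mind, here is a minimal sketch of a single stochastic gradient step on the squared error of one new point. It is only an assumption about what such an update could look like, not something I know to be the right method. The function names, the learning rate lr, the starting parameters and the data point are all made up.

```python
import math


def hill(x, a, b, n):
    xn = x ** n
    return (a * xn) / (b + xn)


def hill_grad(x, a, b, n):
    """Partial derivatives of the Hill function w.r.t. a, b and n (needs x > 0)."""
    xn = x ** n
    denom = b + xn
    d_a = xn / denom
    d_b = -a * xn / denom ** 2
    d_n = a * b * xn * math.log(x) / denom ** 2
    return d_a, d_b, d_n


def sgd_update(params, x, y, lr=0.01):
    """One gradient step on 0.5 * (prediction - y)^2 for a single point."""
    a, b, n = params
    residual = hill(x, a, b, n) - y
    d_a, d_b, d_n = hill_grad(x, a, b, n)
    return (a - lr * residual * d_a,
            b - lr * residual * d_b,
            n - lr * residual * d_n)


params = (5.0, 2.0, 1.5)  # stand-in for the result of the initial curve_fit
x_new, y_new = 2.0, 4.0   # made-up new observation

before = (hill(x_new, *params) - y_new) ** 2
params = sgd_update(params, x_new, y_new)
after = (hill(x_new, *params) - y_new) ** 2
print(before, after)
```

This uses nothing but the previous a, b, n and the new point, but the result depends on the learning rate, and I have no reason to expect it to match the full curve_fit result on all the data.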