I have a dataset with me: http://pastebin.com/YZArky1j, which I am trying to polyfit. This is what I used to perform the polyfit:
x = [i[0] for i in data] # Get all x co-ordinates
y = [i[1] for i in data] # Get all y co-ordinates
coeff = polyfit(x, y, 5)
I am using the polyfit function from numpy: http://docs.scipy.org/doc/numpy/reference/generated/numpy.polyfit.html
The coefficients I get are: $$ -2.84904034523 \times 10^{15} \\ 2.46136213423 \times 10^{11} \\ 8.66716249115 \times 10^{6} \\ -5.3534845897 \times 10^{2} \\ -0.577304577155 \times 10^{-1} \\ 2.54464720615 \times 10^{-6} \\ $$
The plot of the polynomial with the plot of data looks like:

Here, red is the polynomial function and the blue is a plot of the data. So far, it is looking really good. This data is actually the prize pool of a gaming tournament v/s the day from when it started. So, when I try to project as to how it would look when the tournament actually starts (and there will be no more increase to the prize pool), I get a ridiculous number: $498,527,857.5$. This absolutely cannot be the case since the pool right now is around 6 million, there is no way that in two months it will reach 400 million+ in the remaining 2 months, especially as a look at the graph tells us that the rate of rate of growth is decreasing.
So, maybe I should try to fit it in the form $y = B + log(x)*A$. So, I tried to find the values of A and B:
log_x = [math.log(i) for i in x]
coeff_log = polyfit(log_x, y, 1)
And the coefficients I get are:
$$ 5.28101367 \times 10^{9} \\ -5.11847659 \times 10^{10} $$
And now, the projected prize pool looks something like: $24,987,469$. This also is highly suspect and I don't think is the correct projection.
What am I missing or what is it that I am doing wrong?
Since I need to post a picture, I will expand some of my comments and turn them into an answer.
About the part what's wrong in your approach, as mentioned in the comments, they are:
Once one move away from the range of data, the quality of a polyfit based extrapolation can deteriorate pretty quickly. Don't use it.
If you make a plot of $\log x$ vs $x$ over $[16200,16212]$, you will notice the graph is pretty straight there. Your result is no different from doing a linear fit.
It is understandable why you want to use $\log x$ for fitting the data. In fact, it is a good choice to capture the concavity of the provided data. However, to use a logarithm fit, one need to offset $x$ to the point where the data supposed to kick start. i.e one should try a fit of the form:
$$y = y(x) = A \log(x - \alpha) + B\quad\text{ where }\quad \alpha \sim 16200.$$
The next issues is how to fix/determine $\alpha$, $A$ and $B$. Instead of a full scale non-linear least square fit, one can
I'm not going to derive anything. I'll just write down some formula so that you can try the way I perform the fit.
(in unit date/time format, date 0 = 1970/01/01 mid-night)
For any two vectors $X_i$, $Y_i$ of size $N$, define $$\begin{cases} \langle X \rangle &= \frac{1}{N}\sum_{i=1}^{N} X_i\\ \text{Corr}( X, Y ) &= \langle XY \rangle - \langle X\rangle\langle Y\rangle\\ \text{Var}(X) &= \langle X^2 \rangle - \langle X \rangle^2 \end{cases} $$ The $A$ and $B$ which minimize the MSE (mean square error):
$$\text{MSE}(X,Y) = \langle ( Y - A X - B )^2 \rangle = \frac{1}{N}\sum_{i=1}^N (Y_i - A X_i - B)^2$$
is given by $\displaystyle\;\begin{cases} A_{opt} &= \frac{\text{Corr}(X,Y)}{\text{Var}(X)}\\ B_{opt} &= \langle Y \rangle - A \langle X \rangle \end{cases}$. The corresponding optimal MSE also has a simple formula
$$\text{MSE}_{opt}(X,Y) = \text{Var}(Y) - \frac{\text{Corr}(X,Y)^2}{\text{Var}(X)}$$
To fix $\alpha$ for this problem, one just need to find the $\alpha$ which minimize $\;\text{MSE}_{opt}(\log(x-a),y)$.
It turns out I tell you the wrong $\alpha$ in comment. The optimal $\alpha$ should be around $16198.\color{red}{5}9255$.
Following is a plot of the prize pool vs. date.
On 2014/07/19 (unix date 16268), the two fit above gives following estimates of the prize pool (in unit of millions): $$y_{opt}(16268) \sim 9.66951606144537 \quad\text{ and }\quad y_{est}(16268) = 8.540674609249399 $$