I'm a programmer, not a math expert or statistician by any means, but my organization wants a page in our admin console that displays a projection of how many registrations we can expect to see based on data we have from last year. Here's the math I'm currently working with:
The projection for total registrations in a given year (Ty) is calculated by the equation $$ Ty\; =\; \frac{N}{\frac{\left( \frac{\left( Lp\; +\; Ld \right)}{2} \right)}{Ly}} $$
where
- Ly is the total number of registrations last year
- N is the current number of registrations this year (so far)
- Lp is the number of registrations at the current percent of completion last year (say we're 15% through this year's registration cycle, so we'll look at where we were at 15% last year)
- Ld is the number of registrations at the current time last year (it's July 28, 2014, so look at where we were on July 28, 2013, which, in this case, will be different than Lp because we started registration on a different date last year so the percentages are different)
Thus, if
- Ly (total last year) is 616
- N (current registered) is 189
- Lp (registered at current percent last year) is 219
- Ld (registered at current time last year) is 44
then the projection for total registered attendees this year (Ty) is about 885 (886 if rounded up). $$ Ty\; =\; \frac{189}{\frac{\left( \frac{\left( 219\; +\; 44 \right)}{2} \right)}{616}} \approx 885.35 $$
This makes sense to me and is in line with the growth we would expect. My question is whether the math is sound, or what suggestions anyone has to improve it.
Thanks!
This is a heuristic, and as such it comes with advantages, disadvantages, and assumptions. Let's go over the idea, its assumptions, and its advantages and disadvantages, along with one possible (relatively easy) improvement.
Start simple
For the moment, let's forget that there are two ways of talking about how far through a cycle we are. Instead of both time and percentage completion, let's just think about percentage completion.
Then we might look at last year's data and try to make predictions based on that. The absolute simplest thing that we might try is to assume that the total number of registrations in a year depends solely and linearly on the number of registrations at a given percentage of completion. (Ok, the absolute simplest is to assume no change - but this is the second absolute simplest idea).
This leads us to the heuristic
$$\frac{Ty}{Np} = \frac{Ly}{Lp}.$$
(I write $Np$ to emphasize that the current number of registrants is tied to our current percentage of completion.) This says that at $p$ percent through the cycle, we expect a fixed fraction of the eventual total to have registered, and from last year's data we take that fraction to be $\frac{Lp}{Ly}$, so the total is $\frac{Ly}{Lp}$ times the current count. To get $Ty$, we multiply through by $Np$ and get
$$Ty = Np \cdot \frac{Ly}{Lp}.$$
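As a minimal sketch of this simplest version, here it is in Python with the numbers from the question plugged in. The function and parameter names (`project_simple`, `n_now`, `last_at_pct`, `last_total`) are my own choices for illustration, not anything from your code base.

```python
def project_simple(n_now, last_at_pct, last_total):
    """Project this year's total from percent-of-completion data alone.

    n_now       -- registrations so far this year (Np)
    last_at_pct -- registrations at the same percent of completion last year (Lp)
    last_total  -- total registrations last year (Ly)
    """
    if last_at_pct <= 0:
        raise ValueError("last_at_pct must be positive")
    return n_now * last_total / last_at_pct


# Numbers from the question: N = 189, Lp = 219, Ly = 616.
print(round(project_simple(189, 219, 616), 1))  # prints 531.6
```

With only the percentage baseline, the projection comes out below last year's total, because 189 is less than the 219 registrations you had at the same percentage last year.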
Incorporating both date and percentage
But we have a problem: there are actually two ways of talking about how far through the cycle we are. We also have time of year. The absolute simplest way to try to incorporate time is to assume that it is equally important (ok... the absolute simplest is to ignore it). So instead of dealing with $Lp$ alone, we'll deal with the standard average of $Lp$ and $Ld$. This leads to our current formula:
$$Ty = N \cdot \frac{Ly}{\frac{Lp + Ld}{2}}.$$
(Notice this is equivalent to your formula, but written in an unambiguous way which I find easier to interpret; I've also dropped $Np$ in favor of $N$, though we could write $Npd$ to emphasize that the $p$ and the $d$ really come from this $N$). This has the advantage of being intuitively similar to the case above and easy to calculate.
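To check that the flattened form really matches the nested fraction, here is a small Python sketch using the numbers from the question. The names (`project_average` and its parameters) are hypothetical and only there for illustration.

```python
import math


def project_average(n_now, last_at_pct, last_at_date, last_total):
    """Project this year's total using the plain average of Lp and Ld."""
    baseline = (last_at_pct + last_at_date) / 2
    if baseline <= 0:
        raise ValueError("the averaged baseline must be positive")
    return n_now * last_total / baseline


# Numbers from the question: N = 189, Lp = 219, Ld = 44, Ly = 616.
nested = 189 / (((219 + 44) / 2) / 616)   # the nested form from the question
flat = project_average(189, 219, 44, 616)  # the flattened form

assert math.isclose(nested, flat)  # same value, up to floating-point rounding
print(round(flat, 2))  # prints 885.35
```

The two forms agree, so the rewrite changes only how the formula reads, not what it computes.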
A possible improvement
But the two major assumptions we've made (that the date and percentage completion are equally important; and that the total number of registrants depends linearly on the number of registrants at the current date and percentage completion) are strong assumptions that are almost certainly false.
So you might try to see if they are good enough - perhaps they are, perhaps they aren't. That's the nature of heuristics.
One very natural way to try to improve this method is to toss the assumption that the date and the percentage completion are equally important, and instead weight their importance. So instead of comparing to the average
$$.5Lp + .5Ld,$$
you might compare to
$$w\cdot Lp + (1-w)Ld,$$
where $w$ is some weight between $0$ and $1$. If $w$ is larger, it means that you think the percentage completion is more important. If it's smaller, it means you think the date is more important. How do you find/choose $w$? One way is to guess and check. You look at your previous years' data (hopefully you have more than one year) and you try different $w$. For example, you might try to "predict" last year's total from the data of two years ago with different weights, and see which one works best. A little work in your favorite spreadsheet (or any number of more programmatic tools) makes this very easy, even though it might sound a bit daunting at first.
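If you would rather do the guess-and-check in code than in a spreadsheet, a rough Python sketch might look like the following. Everything here is an assumption for illustration: the function names, the choice of mean absolute error as the measure of "works best", and above all the snapshot numbers, which are made up and should be replaced with your real historical data. Only the 616 total comes from the question.

```python
def project_weighted(n_now, last_at_pct, last_at_date, last_total, w):
    """Project the total using the baseline w*Lp + (1 - w)*Ld."""
    if not 0 <= w <= 1:
        raise ValueError("w must be between 0 and 1")
    baseline = w * last_at_pct + (1 - w) * last_at_date
    if baseline <= 0:
        raise ValueError("the weighted baseline must be positive")
    return n_now * last_total / baseline


def best_weight(snapshots, actual_total, candidates):
    """Return the candidate w with the smallest mean absolute error.

    snapshots    -- list of (N, Lp, Ld, Ly) tuples taken at several points
                    in a past cycle whose final total is already known
    actual_total -- the known final total of that past cycle
    candidates   -- the weights to try
    """
    def mean_abs_error(w):
        errors = [
            abs(project_weighted(n, lp, ld, ly, w) - actual_total)
            for (n, lp, ld, ly) in snapshots
        ]
        return sum(errors) / len(errors)

    return min(candidates, key=mean_abs_error)


# MADE-UP snapshots: "predict" last year's total (616, from the question)
# using counts from two years ago. Replace these with real data.
snapshots = [
    (40, 60, 10, 550),
    (150, 200, 90, 550),
    (400, 420, 300, 550),
]
candidates = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
print(best_weight(snapshots, 616, candidates))
```

Setting `w = 0.5` reproduces the plain average, so the original formula is one of the candidates being compared. With only a year or two of history the chosen weight will be noisy, so treat it as a rough guide rather than a fitted parameter.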
In summary
Yes, there is some sense to the math. No, it's not perfect. And there is one potential improvement suggested above.