Probability, Linear Models, Expectation

70 Views Asked by At

I'm trying to find a way of predicting various models from one "perfect model", using EXCEL. i.e. If I assume that all models should behave like the original one, for which I have all the data, I would like to find out the missing data based on the "perfect model". So the problem is as follows: I'm selling the same TV model to 10 different clients and over the span of 1 year I record how many complaints each client made. So the data looks like this:
Client ID-#of Complaints
Client1- 3
Client2- 17
Client3- 0
Client4- 7
Client5- 6
Client6- 20
Client7- 1
Client8- 7
Client9-2
Client10- 5
Now If a have the data of 10 other clients for half a year, can I predict what the # of complaints will be by the end of the year? Client ID-#of Complaints
Client1-13
Client2- 7
Client3-10
Client4- 3
Client5- 4
Client6- 2
Client7- 9
Client8- 7
Client9-1
Client10- 2
Any advice on how I should go about this using excel would be greatly appreciated Thanks

1

There are 1 best solutions below

2
On BEST ANSWER

Since you are using complaint data from one set of customers, in a previous year, to predict the complaints from another set of customers in a different year, you are implicity making the following assumptions:

  1. The complaints from each customer are iid.
  2. Each customer has the same complaint distribution this year as last year.

Under these assumptions, you can do the following:

  1. Resample with replacement from your "perfect model" to create a new random sample of size 10. You can do this in excel using VLOOKUP(RANDBETWEEN(1,10),$A$2:$B$11,2), where the lookup table has column 1 as the client number, column 2 as the total complaints. you will enter this formula in 10 columns, one per client in the sample.

    1. Copy this formula for 1000 rows to generate 1000 iid samples of 10 client compliants from your "perfect model".

    2. Sum up the total number of complaints in each sample

    3. Copy the sum column and paste values into a separate column. This will be an estimate of your total complaint distribution for 10 clients.

    4. Your half year data shows 58 compliants. What you want is the conditional distribution of total annual compliants, given that the total will be at least 58. I.e., you want to estimate $P(C\leq c|C>57)$. You can do this by forming a cumulative distribution function from your resampled totals, truncating it at 58 and then re-scaling the CDF by subtracting the percentile value of 57 and dividing by 1 minus the percentile of 57. Now you will have a conditional CDF of possible annual compliants given you have at least 58. You can form prediction intervals from this by taking off various quantiles that encompass your desired level of coverage.

I've attached some screen shots of my analysis of your data to show you what can be done:

The first shows three "resample" distributions I got from your data. Note that using 1000 resamples results in very little variability in the estimated distributions. The second shows the conditional distribution, based on your half year sample. The third shows the spreadsheet I used so you can see it did not require anything but plain vanilla Excel. It took about 10 minutes to put together.

Resamples conditional dist excel