How do the Averages Work?

66 Views Asked by At

I am trying to figure out the average items sold per customer for the year 2019

I have multiple customers per day, who each have a random number of items. Sometimes a customer makes more than one purchase a day - about 6% of the time. In that case - all sales for that customer are considered to be 1 sale.

I have calculated this in 2 fashions - The 1st:

  1. Total # Items Sold ( 27,427,131 ) / Total # customers ( 6,556,133 ) = 4.18
  2. AVG(Items Sold Per Customer who make a purchase Per Day) = 4.21

The second way I did using an SQL function on an MS SQL Server The first way I did by getting the total # from the same SQL Database

With > 1 million customers - I can't replicate the 2nd method in Excel - and so I can't step it through and confirm the data.

I don't quite understand why the 2 avgs are different - but I suspected they would be and would like to use the # I think is more accurate - the 2nd one: 4.21 My boss has asked me to provide the formula used, and wants me to "prove" my answer. Not an unreasonable request, but I'm a little lost in explaining why I am getting 2 different averages. How do I explain this?

OR - and this is a real possibility - The numbers SHOULD be exactly the same and I am doing something wrong in one of my steps to calculate this.

1

There are 1 best solutions below

2
On BEST ANSWER

You're running into weighted averages on unequal sample sizes. The second formula is incorrect.

Suppose there's only the following two days:

  1. 2 items sold, 1 customer
  2. 2 items sold, 2 customers

It's obvious the first formula generates the correct result ($4/3$, 4 items/3 customers), the second is $\mathrm{AVG}(2,1)$ which is not $4/3$.

See Simpson's paradox for how bad it can get.