The new law requires companies to make summary statistics of salaries publicly available:
- Mean
- Standard deviation
- First quartile
- Median
- Third quartile
For $n$ people working at a company, the true wages form a list of $n$ elements that has exactly these summary statistics. Many different lists fit, but (assuming wages are whole numbers of currency units) the number of possible lists is finite!
Let's take this finite set of lists and sort the elements of each one. It is now possible to calculate the average of the $k$-th ($1 \le k \le n$) element across all lists. I think the list of these averages would be a very reasonable reconstruction of the possible wages (I called it an "average" discrete distribution in the title).
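For a tiny company the idea above can be tried by brute force. The following is only a sketch under several assumptions that are not in the question: wages are integers in a small known range, the quartiles use Python's "inclusive" method, and the standard deviation is the population one. The function names `summary`, `matching_lists` and `average_list` are made up for illustration.

```python
from itertools import combinations_with_replacement
import statistics


def summary(wages):
    """Return (mean, std, Q1, median, Q3) for a list of wages.

    Assumption: population standard deviation and "inclusive" quartiles;
    the law's actual definitions may differ.
    """
    q1, med, q3 = statistics.quantiles(wages, n=4, method="inclusive")
    return (statistics.fmean(wages), statistics.pstdev(wages), q1, med, q3)


def matching_lists(target, n, lo, hi, tol=1e-9):
    """Enumerate all sorted integer lists of length n with values in [lo, hi]
    whose summary statistics equal `target` (up to a float tolerance)."""
    matches = []
    # combinations_with_replacement over a sorted range yields sorted tuples,
    # so every multiset of wages is visited exactly once.
    for wages in combinations_with_replacement(range(lo, hi + 1), n):
        stats = summary(wages)
        if all(abs(a - b) <= tol for a, b in zip(stats, target)):
            matches.append(wages)
    return matches


def average_list(lists):
    """Average the k-th smallest element across all candidate lists."""
    return [statistics.fmean(column) for column in zip(*lists)]


# Hypothetical example: 7 employees, wages in arbitrary integer units.
true_wages = (2, 3, 4, 5, 7, 9, 12)
target = summary(true_wages)
candidates = matching_lists(target, n=7, lo=1, hi=12)
print(len(candidates), average_list(candidates))
```

Comparing `summary(average_list(candidates))` with `target` shows which of the five statistics survive the averaging. The enumeration grows combinatorially with $n$ and the wage range, so this is only feasible for toy sizes.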
How should I approach this problem? Could you suggest some references? Also, are there perhaps other, easier ways to reconstruct sensible values?
Edit: after more than a year, I'm still thinking about this problem.
I see two disadvantages with the proposed approach. First, I don't know how to calculate a representative distribution without first enumerating all possible distributions that fit the summary statistics, and, though finite under an integer assumption, there are likely prohibitively many of them. Second, even if every individual distribution satisfies the summary statistics, there is no guarantee that the "average" of these distributions as you describe it still satisfies all of them. The mean, and any quantile computed as a fixed weighted combination of the sorted values, is linear and so carries over to the average; the standard deviation is not, and will generally come out smaller than the published one.
However you go about constructing a distribution, the fact is that only five numbers are given, so any full distribution will have to fill in the huge information gap with some assumptions. The classical statistical way to do this is by fitting a parametric distribution to the data at hand, where the shape of the distribution fills in the gaps nicely. I think that approach would work well here, though it is maybe not as exciting and nonparametric as your idea. Given that this is an income distribution, I'd suggest looking at power law distributions or some other right-skewed, heavy-tailed distribution.
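As a rough illustration of the parametric route, here is a minimal sketch that fits a lognormal distribution to the published quartiles and then reads off $n$ evenly spaced quantiles as a reconstructed wage list. The lognormal is my choice of a right-skewed family for the example, not the only option; it uses only two of the five statistics (median and interquartile ratio), and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def fit_lognormal_from_quartiles(q1, median, q3):
    """Match a lognormal to the quartiles.

    For a lognormal, log-wages are Normal(mu, sigma), so
    median = exp(mu) and log(q3) - log(q1) = 2 * z75 * sigma.
    """
    z75 = NormalDist().inv_cdf(0.75)  # about 0.6745
    mu = math.log(median)
    sigma = (math.log(q3) - math.log(q1)) / (2 * z75)
    return mu, sigma


def reconstruct_wages(mu, sigma, n):
    """Return n wages at the quantile levels (k - 0.5) / n, k = 1..n."""
    std_normal = NormalDist()
    return [
        math.exp(mu + sigma * std_normal.inv_cdf((k - 0.5) / n))
        for k in range(1, n + 1)
    ]


def implied_mean(mu, sigma):
    """Mean of the fitted lognormal, to compare against the published mean."""
    return math.exp(mu + sigma ** 2 / 2)


# Made-up published figures, for illustration only.
mu, sigma = fit_lognormal_from_quartiles(q1=30_000, median=40_000, q3=55_000)
wages = reconstruct_wages(mu, sigma, n=101)
print(round(implied_mean(mu, sigma)))
```

Comparing `implied_mean` (and the implied standard deviation) with the published values gives a quick check of how well the chosen family fits; a large mismatch would suggest trying a heavier-tailed family or fitting all five statistics jointly.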
Depending on your application, there is another way of looking at the problem, called distributionally robust optimization (DRO). DRO builds on other areas of optimization theory, such as linear and robust optimization, and identifies decisions that work well no matter which distribution is the correct one, given the summary statistic information.
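A full DRO model is beyond a short example, but the flavour of "worst case over every distribution consistent with the statistics" can be seen in a classical closed-form bound. The sketch below uses Cantelli's (one-sided Chebyshev) inequality, which bounds the share of wages above a threshold using only the mean and standard deviation; it ignores the quartile information, and the function name is hypothetical.

```python
def worst_case_tail_probability(mean, std, threshold):
    """Upper bound on P(wage >= threshold) over all distributions
    with the given mean and standard deviation (Cantelli's inequality)."""
    if threshold <= mean:
        return 1.0  # only the trivial bound is available at or below the mean
    gap = threshold - mean
    return std ** 2 / (std ** 2 + gap ** 2)


# Made-up figures: mean 50,000 and standard deviation 10,000.
bound = worst_case_tail_probability(mean=50_000, std=10_000, threshold=70_000)
print(bound)
```

Whatever the true wage list is, the fraction of employees earning at least the threshold cannot exceed this bound, so a decision that is acceptable at the bound is acceptable for every distribution that matches the two moments.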