My goal is to obtain a reasonable approximation of the Gini index of a company (UBS). I need to estimate the salary distribution from publicly available data:
- Number of employees = 60205
- Total compensation paid = 15.182E9 CHF
- Min salary = 50000 CHF
- Median salary = 100000 CHF
- Max salary = 11430000 CHF
I know the problem is very underdetermined, but what is the best that can be obtained from this?
You can at least obtain upper and lower bounds.
The data you describe effectively impose constraints on the distribution. Let $y_i$ be the income of individual $i \in \{1,\dots,n\}$, with $n=60205$, and let $m = 30103$ be the index of the median employee. The constraints are:
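One way to write them out, assuming incomes are indexed in increasing order and that the reported minimum and maximum are each attained by an actual employee (the numbering is an assumption, chosen so that the ordering requirement is constraint 5):

$$
\begin{aligned}
&1.\quad y_1 = 50\,000 \\
&2.\quad y_m = 100\,000 \\
&3.\quad y_n = 11\,430\,000 \\
&4.\quad \sum_{i=1}^{n} y_i = 15.182 \times 10^{9} \\
&5.\quad y_i \le y_{i+1} \quad \text{for } i = 1, \dots, n-1
\end{aligned}
$$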
To obtain lower and upper bounds, you need to find the minimal and maximal values of the Gini index under constraints 1 to 4. This may need a formal proof, but intuitively I think one can see that the Gini index is minimized by making the lowest incomes as high as possible under the above constraints. In your case this would give
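A sketch of such a schedule, assuming incomes are indexed in increasing order, employee 1 sits at the reported minimum and employee $n$ at the reported maximum (the value $x$ is simply whatever is left of the total, spread evenly over the upper half):

$$
y_i =
\begin{cases}
50\,000 & i = 1 \\
100\,000 & 1 < i \le m \\
x & m < i < n \\
11\,430\,000 & i = n
\end{cases}
\qquad
x = \frac{15.182 \times 10^{9} - 50\,000 - (m-1)\cdot 100\,000 - 11\,430\,000}{n - m - 1} \approx 403\,984
$$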
(This wage schedule satisfies constraint 5 because the median is lower than the mean, so $y_i < y_j$ for $i \leq m$ and $m<j<n$.)
Conversely, the Gini index will be maximized when the lowest incomes are as low as can be (given the constraints). In your case this means setting
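A sketch of such a schedule, assuming incomes are indexed in increasing order and nobody earns more than the reported maximum. Pinning the bottom half at the minimum does not by itself determine the upper half; the version below additionally assumes that the upper half is polarised, with as many employees as the total allows at the maximum and the rest at the median:

$$
y_i =
\begin{cases}
50\,000 & i < m \\
100\,000 & m \le i \le n - 942 \\
5\,170\,000 & i = n - 941 \\
11\,430\,000 & i > n - 941
\end{cases}
$$

The counts come from the money left once the bottom half is at the minimum and everyone else is at the median: $15.182 \times 10^{9} - 30\,102 \cdot 50\,000 - 30\,103 \cdot 100\,000 = 10\,666\,600\,000 = 941 \cdot 11\,330\,000 + 5\,070\,000$, so 941 employees can be lifted from the median to the maximum and one more receives the remainder.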
If I made no computation mistakes, you can use these distributions to compute lower and upper bounds for the Gini coefficient of your distribution.
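To check the arithmetic, here is a minimal Python sketch that builds one "most equal" and one "most unequal" schedule and evaluates the Gini coefficient with the usual rank formula. The function names are illustrative, and the schedules rest on assumptions that go beyond the raw data: employee 1 is pinned at the minimum, the top earner at the maximum, and in the unequal case the upper half is split between the median and the maximum.

```python
# Figures from the question; all amounts in CHF.
N = 60205
TOTAL = 15_182_000_000
Y_MIN, Y_MED, Y_MAX = 50_000, 100_000, 11_430_000
M = (N + 1) // 2  # rank of the median employee (30103)


def gini(incomes):
    """Gini coefficient via the rank formula on sorted incomes."""
    y = sorted(incomes)
    n = len(y)
    weighted = sum((2 * rank - n - 1) * value
                   for rank, value in enumerate(y, start=1))
    return weighted / (n * sum(y))


def most_equal_schedule():
    """Assumed schedule: bottom half at the median, upper half equalised."""
    middle_count = N - M - 1  # employees strictly between median and top
    middle_pay = (TOTAL - Y_MIN - (M - 1) * Y_MED - Y_MAX) / middle_count
    return [Y_MIN] + [Y_MED] * (M - 1) + [middle_pay] * middle_count + [Y_MAX]


def most_unequal_schedule():
    """Assumed schedule: bottom half at the minimum, upper half polarised."""
    upper_count = N - M  # employees ranked above the median
    # Money left once the bottom is at the minimum and the rest at the median.
    excess = TOTAL - (M - 1) * Y_MIN - Y_MED - upper_count * Y_MED
    # How many employees can be lifted to the maximum, and what remains.
    at_max, remainder = divmod(excess, Y_MAX - Y_MED)
    at_median = upper_count - at_max - 1
    return ([Y_MIN] * (M - 1) + [Y_MED] + [Y_MED] * at_median
            + [Y_MED + remainder] + [Y_MAX] * at_max)


g_low = gini(most_equal_schedule())
g_high = gini(most_unequal_schedule())
print(f"Gini lower bound (sketch): {g_low:.3f}")
print(f"Gini upper bound (sketch): {g_high:.3f}")
```

Under these assumptions the two printed values should come out near 0.30 and 0.74. The gap between them shows how little the five published figures pin down on their own.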
I think anything better than upper and lower bounds would require you to make assumptions about the distribution beyond the data available to you.