I originally asked this question on Stack Overflow, but it may be more appropriate here:
Ranking based on unequal numbers of discrete observations.
We have a set of targets, each with a different fixed number of discrete binary observations (0 or 1).
We want to compare targets with each other by their overall propensity towards one kind of observation (0 or 1), in a way that is not biased by the total number of observations for each target.
The ranking score should lie on a scale between 0 and 1 and should carry an error that depends on the number of observations (the more observations, the greater the certainty about the propensity towards one of the binary outcomes).
For example:
On an imaginary map we have countries, which can have between 1 and 20 neighbours with discrete boundaries.
We want to rate each country based on the quality of its boundaries with each of its neighbours, so that we can come up with a comparison scheme to rank countries by the quality of their boundaries.
So for example:
Country A has 8 boundaries (8 neighbours), 4 of which are good boundaries, +/- some uncertainty.
Country B has 3 boundaries (3 neighbours), 2 of which are good boundaries, +/- some uncertainty.
So is Country B's rank > Country A's rank?
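To make the example concrete, here is a minimal Python sketch that computes the naive score (good boundaries divided by total boundaries) for the two countries, together with one possible way of attaching an uncertainty to it. The Wilson score interval used here is only an assumption about what "+/- some uncertainty" could mean, not a claim that it is the right method; the function name `wilson_interval` and the default `z=1.96` (roughly a 95% interval) are illustrative choices.

```python
import math


def wilson_interval(good, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    good  -- number of "good" observations (successes)
    total -- total number of observations
    z     -- normal quantile; 1.96 gives roughly a 95% interval (assumed)
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = good / total
    denom = 1 + z**2 / total
    centre = (p_hat + z**2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)) / denom
    )
    return centre - half_width, centre + half_width


# Counts taken from the example above: (good boundaries, total boundaries).
countries = {"A": (4, 8), "B": (2, 3)}

for name, (good, total) in countries.items():
    low, high = wilson_interval(good, total)
    print(f"Country {name}: raw = {good / total:.3f}, "
          f"interval = ({low:.3f}, {high:.3f})")
```

Under these assumptions the raw score of Country B (2/3) is higher than that of Country A (1/2), but B's interval is wider because it rests on only 3 observations, and the two intervals overlap heavily. That is exactly the difficulty: the raw proportion alone does not say how much the ranking can be trusted.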
The problem requires comparison on features (boundaries) that vary in number for each country.
Intuitively, a comparison scheme suggests some function that is proportional to the number of good boundaries and inversely proportional to the total number of boundaries each country has.
However, unmoderated, this simple proportion would follow a decaying sawtooth pattern whose frequency decreases with the number of boundaries, which impedes simple ranking between countries with disparate numbers of boundaries.
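One way to see the granularity problem is to list the values the simple proportion can take for a given number of boundaries. The sketch below is only an illustration; the helper name `achievable_scores` is made up for this example.

```python
from fractions import Fraction


def achievable_scores(total):
    """All values good/total can take for a country with `total` boundaries."""
    return [Fraction(good, total) for good in range(total + 1)]


for total in (1, 3, 8, 20):
    scores = achievable_scores(total)
    step = Fraction(1, total)  # gap between neighbouring achievable scores
    print(f"{total:2d} boundaries: {len(scores)} possible scores, step = {step}")

# A country with 3 boundaries can score exactly 2/3,
# but a country with 8 boundaries never can.
print(Fraction(2, 3) in achievable_scores(3))
print(Fraction(2, 3) in achievable_scores(8))
```

A country with few boundaries can only move in large jumps (steps of 1/3 for 3 boundaries), while one with many boundaries moves in small ones (steps of 1/20 for 20 boundaries), so the raw proportions of countries with different numbers of boundaries do not sit on a common grid.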
What would be a good approach to rating the countries based on discrete boundaries in such a way that they can be practically compared?
My intuition is that this problem, in some guise, is a known problem in some odd branch of discrete maths, maybe graph theory. However, I have been unable to find an algorithm or mathematical approach to rank countries based on discrete features.
Any thoughts appreciated