In a system where questions with quantifiable answers are grouped into chunks of arbitrary size (5-200 questions each), I'm calculating a Bayesian average for each chunk. Due to the fuzzy nature of these questions, the result is quite satisfying compared to the plain mean value.
However, when the number of answered questions (i.e. votes) approaches zero, the resulting value approaches the mean of the whole report. Obviously. Thus, if I answer zero questions I get the mean value. I want to avoid that!
Q: How can I modify the parameters (or the formula itself) to get a fairer result when too few questions have been answered? So far I have tried setting m to different values between 0 and v, but the impact is negligible. (Maybe there are alternative formulas that suit my needs better?)
Bayesian average = (v / (v + m)) * R + (m / (v + m)) * C, where
R = average for the answers in this category (mean)
v = number of answered points for this category
m = minimum number of answered points required
C = the mean answer across the whole report.
C is usually around 50-66%.
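To make the behaviour concrete, here is a minimal Python sketch of the formula above. The function name bayesian_average and the sample numbers (category mean 90, report mean 60, m = 10) are made up for illustration; only the formula itself comes from the question.

```python
def bayesian_average(R, v, m, C):
    """Blend the category mean R with the report-wide mean C.

    v is the number of answered points and m the minimum required,
    exactly as in the formula above.
    """
    if v + m == 0:
        raise ValueError("v + m must be positive")
    weight = v / (v + m)  # share of the result that comes from R
    return weight * R + (1 - weight) * C


# Hypothetical values: category mean 90, report mean 60, m = 10.
for v in (0, 1, 10, 100):
    print(v, round(bayesian_average(90.0, v, 10, 60.0), 2))
```

With v = 0 the weight on R is zero, so the result is exactly C. Note also that any m between 0 and v keeps the weight v / (v + m) between 0.5 and 1, so R always dominates, which may be why varying m in that range changes so little.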
To summarize, this is what I want to accomplish:
- The resulting value should be weighted in such a way that the number of answered questions has a significant impact.
- If too few questions are answered, the value should be low.
- If a large number of the questions have a relatively "high" answer, the value should be very high.
Do I need to introduce threshold values? If yes, how do I calculate decent thresholds?
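One variation that would meet the three goals above, sketched here purely as an assumption and not as an established fix, is to keep the same weighting but pull the result toward a deliberately low value instead of C. The name pessimistic_average and the prior parameter are hypothetical; a prior of 0 means "no answers, no credit".

```python
def pessimistic_average(R, v, m, prior=0.0):
    """Same weighting as the Bayesian average, but shrink toward `prior`.

    `prior` is a hypothetical stand-in for C: a low value that the
    result falls back to when few questions have been answered.
    """
    if v + m == 0:
        return prior  # nothing answered and no minimum: only the prior is left
    weight = v / (v + m)  # grows toward 1 as more questions are answered
    return weight * R + (1 - weight) * prior


# Hypothetical values: category mean 90, m = 10.
for v in (0, 5, 50, 200):
    print(v, round(pessimistic_average(90.0, v, 10), 2))
```

In this sketch m acts as the threshold: at v = m the result is halfway between the prior and R, so choosing m amounts to deciding how many answers should earn half of the category's score.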
Wait, why do you want to avoid getting the mean value if zero questions are answered?
That corresponds to the roughest possible estimate, which is what you should get if no information is given (zero questions are answered).
Think of conditional expectation: conditioning on zero questions answered is roughly equivalent to conditioning on the trivial sigma field (i.e. no information), hence should produce the crudest estimate possible: the mean/expectation.
I know this doesn't answer your question; I am just confused about what your reasoning is (and the formatting in comments is poor).