The Brier score is a measure of the accuracy of a set of probabilistic predictions, where each prediction is scored against an outcome indicator that takes the value either 0 or 1. There are standard versions for two possible outcomes and for more than two.
What is a simple generalisation that takes account of how three or more outcomes are relatively spaced?
For example, say the possible outcomes are "It rains a lot", "It rains a little", and "It does not rain", and say the actual outcome is that it rains a lot. Now the standard multi-outcome version of the Brier score is worsened by the same amount whether the estimated probabilities are (0.1, 0.7, 0.2) (call this A) or (0.1, 0.2, 0.7) (call this B): both assign 0.1 to the outcome that actually occurred, and the score takes no account of which of the outcomes that did not occur received which of the remaining probabilities.
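To make the arithmetic concrete, here is a minimal sketch (the function name and the outcome indexing are my own):

```python
def brier(probs, actual):
    """Standard multi-outcome Brier score; `actual` is the index of
    the outcome that occurred."""
    return sum(((1.0 if j == actual else 0.0) - p) ** 2
               for j, p in enumerate(probs))

# Outcomes: 0 = "rains a lot", 1 = "rains a little", 2 = "does not rain".
# The actual outcome is "rains a lot" (index 0).
score_A = brier([0.1, 0.7, 0.2], actual=0)  # 0.81 + 0.49 + 0.04 = 1.34
score_B = brier([0.1, 0.2, 0.7], actual=0)  # 0.81 + 0.04 + 0.49 = 1.34
```

Both predictions receive the same score of 1.34, even though A concentrated the remaining probability on the "closer" outcome.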
What I am seeking is a generalised score that considers the outcome "it rains a lot" to be closer to "it rains a little" than it is to "it does not rain". This score would be worsened less in case A than in case B. Whereas in both cases the estimator judged the prior probability of the actual outcome ("it rains a lot") to be 0.1, in A they assigned a relatively greater probability than in B to "it rains a little". We consider "it rains a little" to be closer than "it does not rain" to the actual outcome, and therefore we consider A to be a better assignment of prior probabilities than B.
The Brier score can be viewed as a squared distance in $n$-dimensional Euclidean space, where $n$ is the number of possible outcomes. Each outcome $j$ is associated with one of the canonical basis vectors $\vec e_j$, and the prediction is associated with the affine combination $\sum_jp_j\vec e_j$ of the basis vectors, where $p_j$ is the probability predicted for outcome $j$. The Brier score is the squared distance of this prediction vector from the basis vector associated with the actual outcome $k$:
$$ B=\left(\vec e_k-\sum_jp_j\vec e_j\right)^2=\sum_j\left(\delta_{jk}-p_j\right)^2\;. $$
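This geometric reading can be checked directly; the following sketch (helper names are my own) computes the score once from the vector form and once from the direct formula:

```python
def basis(n, j):
    """Canonical basis vector e_j in R^n."""
    return [1.0 if i == j else 0.0 for i in range(n)]

def combination(vectors, probs):
    """Component-wise sum_j p_j * v_j."""
    return [sum(p * v[i] for p, v in zip(probs, vectors))
            for i in range(len(vectors[0]))]

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def brier_geometric(probs, actual):
    """Squared distance from e_actual to the prediction vector."""
    n = len(probs)
    vectors = [basis(n, j) for j in range(n)]
    return sq_dist(vectors[actual], combination(vectors, probs))

# Agrees with the direct formula sum_j (delta_jk - p_j)^2:
direct = sum(((1.0 if j == 0 else 0.0) - p) ** 2
             for j, p in enumerate([0.1, 0.7, 0.2]))
geometric = brier_geometric([0.1, 0.7, 0.2], actual=0)
```

With the canonical basis, the prediction vector's $j$-th coordinate is simply $p_j$, so the two computations coincide.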
You can take into account similarities between the outcomes by using a different set of vectors instead of the canonical basis vectors. For instance, in your example, you could have $\vec v_1=\vec e_1$ for “It rains a lot” and $\vec v_3=\vec e_3$ for “It does not rain”, but instead of $\vec e_2$ use $\vec v_2=\frac13\left(\vec e_1+\vec e_2+\vec e_3\right)$ for “It rains a little”; the score would then be
$$ \left(\vec v_k-\sum_jp_j\vec v_j\right)^2\;. $$
In the extreme case where you use $\vec v_2=\frac12\left(\vec e_1+\vec e_3\right)$, you would no longer be regarding “It rains a little” as a separate outcome in its own right, but as a sort of mixture of the other two, so that a $50/50$ prediction for “It rains a lot” and “It does not rain” would get a perfect score if the actual outcome is “It rains a little”. By choosing the vectors, you can interpolate between this case and the standard Brier score.
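A sketch of both choices of $\vec v_2$ (function and variable names are my own), applied to the A and B predictions from the question:

```python
def generalised_score(vectors, probs, actual):
    """Squared distance from v_actual to sum_j p_j * v_j."""
    n = len(vectors[0])
    pred = [sum(p * v[i] for p, v in zip(probs, vectors))
            for i in range(n)]
    return sum((vectors[actual][i] - pred[i]) ** 2 for i in range(n))

e1, e3 = [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]

# "It rains a little" as a partial mixture of all three outcomes:
v = [e1, [1/3, 1/3, 1/3], e3]
score_A = generalised_score(v, [0.1, 0.7, 0.2], actual=0)
score_B = generalised_score(v, [0.1, 0.2, 0.7], actual=0)
# score_A < score_B: A is now rewarded for favouring the closer outcome.

# Extreme case: v_2 = (e1 + e3) / 2; a 50/50 prediction on the outer
# outcomes scores perfectly when "it rains a little" occurs.
v_extreme = [e1, [0.5, 0.0, 0.5], e3]
perfect = generalised_score(v_extreme, [0.5, 0.0, 0.5], actual=1)  # 0.0
```

Moving $\vec v_2$ along the segment between $\vec e_2$ and $\frac12(\vec e_1+\vec e_3)$ interpolates between the standard Brier score and the extreme case.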
This approach fulfils a criterion that one might want to require of such a score, a kind of independence-of-clones criterion: two outcomes that are in fact indistinguishable can be assigned the same vector, and the score is then exactly the score that would have been assigned if they hadn't been distinguished in the first place. Moreover, this case can be reached continuously through cases where the two outcomes are very similar but still distinguishable.
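The clone criterion can be checked numerically; in this sketch (names and probabilities are my own), "it rains a little" is split into two clones that share the vector $\vec e_2$, and the score matches the merged two-vector problem with their probabilities added:

```python
def score(vectors, probs, actual):
    """Squared distance from v_actual to sum_j p_j * v_j."""
    n = len(vectors[0])
    pred = [sum(p * v[i] for p, v in zip(probs, vectors))
            for i in range(n)]
    return sum((vectors[actual][i] - pred[i]) ** 2 for i in range(n))

e1, e2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]

# Two indistinguishable outcomes share e2; actual outcome is one of them.
cloned = score([e1, e2, e2], [0.1, 0.4, 0.5], actual=1)

# Merged problem: the clones' probabilities are added (0.4 + 0.5 = 0.9).
merged = score([e1, e2], [0.1, 0.9], actual=1)
```

The two scores agree because $p_2\vec v_2 + p_3\vec v_3 = (p_2+p_3)\vec v$ whenever $\vec v_2 = \vec v_3 = \vec v$.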