Let me begin by explaining the motivation behind this question (purely an academic exercise for fun):
I was looking at a website that "ranks" how difficult certain mountains are to climb: https://www.14ers.com/php14ers/usrpeaksrall.php
They say that the rankings are based on user input. I think they must have some sort of simple statistic ("average rank") that they are calculating.
I'm wondering if this could be improved. Here are my thoughts:
1) Not every user will have climbed every mountain, and so there is incomplete information within each entry.
2) There is user bias, so some individual rankings will be out of order relative to the true ranking.
For example, say user 1 climbed mountains $1$, $4$, and $8$, while user 2 climbed mountains $4$, $9$, and $10$.
Their respective rankings are:
User1 : $4>1>8$ (meaning mountain $4$ was harder than mountain $1$, etc.)
User2: $9>4>10$
The combined ordering is only partially determined: $9>4$, $4>1>8$, and $4>10$.
However, there is no information about the relative orderings of $10$ vs. $1$ and $8$.
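This pairwise information can be extracted mechanically. Here is a minimal sketch (the hardest-first list encoding and variable names are my own assumptions), which takes the transitive closure of the direct comparisons and then lists the pairs left undetermined:

```python
from itertools import combinations

# Hypothetical encoding: each user's list is ordered hardest-first.
user_rankings = [
    [4, 1, 8],   # user 1: 4 > 1 > 8
    [9, 4, 10],  # user 2: 9 > 4 > 10
]

# Every direct pairwise comparison implied by the rankings.
harder_than = set()
for ranking in user_rankings:
    for i, j in combinations(range(len(ranking)), 2):
        harder_than.add((ranking[i], ranking[j]))

# Transitive closure: if a > b and b > c, then a > c.
changed = True
while changed:
    changed = False
    for a, b in list(harder_than):
        for c, d in list(harder_than):
            if b == c and (a, d) not in harder_than:
                harder_than.add((a, d))
                changed = True

mountains = sorted({m for r in user_rankings for m in r})
undetermined = [
    (a, b) for a, b in combinations(mountains, 2)
    if (a, b) not in harder_than and (b, a) not in harder_than
]
print(undetermined)  # the pairs the data says nothing about
```

On this data the only undetermined pairs are $(1,10)$ and $(8,10)$, matching the statement above.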
Furthermore, if there were a disagreement between users (say one user ranks mountain $1 >$ mountain $5$ but another ranks mountain $5 >$ mountain $1$), then a simple rank-ordering won't work.
I can think of a way to do this non-parametrically. It would be an optimization problem relating the user rankings ($\sigma_j$) to the true ranking ($\pi$). The objective would look something like this: $$\underset{\pi}{\operatorname{arg\,min}}\ \prod_j \max\left(1,\ \sum_i \left|\sigma_j(i)-\pi'_j(i)\right|\right)$$ where $\pi'_j$ is the ordering $\pi$ restricted to only the mountains ranked by user $j$, and $\pi'_j(i)$ and $\sigma_j(i)$ are the ranks of the $i$'th mountain.
You could also modify this function to instead take into account the number of rearrangements that would be required in $\pi$ in order to accommodate the ranking of $\sigma_j$.
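For a handful of mountains this objective can be minimized by brute force over permutations. A minimal sketch, assuming the per-user reading of the objective (product over users, sum of rank disagreements over the mountains each user ranked):

```python
from itertools import permutations

user_rankings = [
    [4, 1, 8],   # user 1, hardest first
    [9, 4, 10],  # user 2, hardest first
]
mountains = sorted({m for r in user_rankings for m in r})

def objective(pi):
    """Product over users of max(1, sum of rank disagreements)."""
    total = 1
    for sigma in user_rankings:
        # pi restricted to the mountains this user ranked, re-ranked 1..k
        restricted = [m for m in pi if m in sigma]
        pi_rank = {m: r for r, m in enumerate(restricted, start=1)}
        sigma_rank = {m: r for r, m in enumerate(sigma, start=1)}
        disagreement = sum(abs(sigma_rank[m] - pi_rank[m]) for m in sigma)
        total *= max(1, disagreement)
    return total

# Brute force is only feasible for a handful of mountains (n! candidates).
best = min(permutations(mountains), key=objective)
```

Because the two users here are mutually consistent, any $\pi$ that respects both partial rankings attains the minimum value of $1$; `min` simply returns the first such permutation it encounters.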
However, a major shortcoming of this method is that you get no information on the relative confidence of the ranking...
Alternatively, you could set this up in a parametric framework using Bernoulli trials. My thought would be to give each mountain an underlying "difficulty" ($\theta$) that weights the probability of ranking it as harder than another mountain. In this scenario, each user is assumed to have the exact same tendency for bias. For example, for 3 users and 3 mountains the input would be:
$User_1: 2>1$
$User_2: 1>3$
$User_3: 3>2$
The likelihood of observing these rankings would be something like: $$p\left(user\_rankings \mid \Theta=\{\theta_1,\theta_2,\theta_3\}\right) = bernoulli(\theta_2,\theta_1)\, bernoulli(\theta_1,\theta_3)\, bernoulli(\theta_3,\theta_2)$$ where $bernoulli(\theta_i,\theta_j)$ is the probability of choosing mountain $i$ over mountain $j$. For example, you could define this probability as: $$bernoulli(\theta_i,\theta_j) = \frac{\theta_i/\theta_j}{\theta_i/\theta_j+1} = \frac{\theta_i}{\theta_i+\theta_j}$$ (so that $bernoulli(\theta_i,\theta_j) + bernoulli(\theta_j,\theta_i) = 1$).
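As an aside, the form $\theta_i/(\theta_i+\theta_j)$ is exactly the Bradley-Terry model for paired comparisons, whose maximum-likelihood fit has a simple fixed-point (minorize-maximize) iteration. Here is a sketch on the three comparisons above; note that they form a cycle, so the fit sensibly assigns equal difficulty to all three mountains:

```python
# (winner, loser) pairs from the three users' rankings above
comparisons = [(2, 1), (1, 3), (3, 2)]
items = sorted({m for pair in comparisons for m in pair})

# Minorize-maximize iteration for the Bradley-Terry MLE:
# theta_i <- wins_i / sum over i's comparisons of 1/(theta_winner + theta_loser)
theta = {m: 1.0 for m in items}
for _ in range(200):
    new = {}
    for i in items:
        wins = sum(1 for w, _ in comparisons if w == i)
        denom = sum(1.0 / (theta[w] + theta[l])
                    for w, l in comparisons if i in (w, l))
        new[i] = wins / denom
    total = sum(new.values())
    theta = {m: t / total for m, t in new.items()}  # fix the scale

# Cyclic data: every mountain ends up equally "difficult" (theta = 1/3 each).
```

The normalization step is needed because the likelihood only depends on ratios of the $\theta$'s, so the scale is otherwise unidentified.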
This method is nice because it can give a variance around different rankings, and it also allows prior information $p(\Theta)$ to be incorporated by multiplying it into the likelihood.
You could also get a relative confidence by doing a parametric bootstrap simulation given the fit rankings.
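A sketch of that parametric bootstrap, assuming the fitted equal strengths from the cyclic three-mountain example (the `fit_bt` helper is the same minorize-maximize iteration as before; all names are illustrative):

```python
import random
from collections import Counter

def fit_bt(comparisons, items, iters=100):
    """Bradley-Terry fit by the minorize-maximize iteration (normalized)."""
    theta = {m: 1.0 for m in items}
    for _ in range(iters):
        new = {}
        for i in items:
            wins = sum(1 for w, _ in comparisons if w == i)
            denom = sum(1.0 / (theta[w] + theta[l])
                        for w, l in comparisons if i in (w, l))
            new[i] = wins / max(denom, 1e-12)
        total = sum(new.values())
        theta = {m: t / total for m, t in new.items()}
    return theta

random.seed(0)
items = [1, 2, 3]
theta_hat = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}  # fitted strengths (equal, cycle data)
pairs = [(2, 1), (1, 3), (3, 2)]            # which pairs were compared

rank_of_1 = []
for _ in range(500):
    # Resimulate each comparison from the fitted model, then refit.
    sim = [(i, j) if random.random() < theta_hat[i] / (theta_hat[i] + theta_hat[j])
           else (j, i)
           for i, j in pairs]
    boot = fit_bt(sim, items)
    order = sorted(items, key=lambda m: -boot[m])
    rank_of_1.append(order.index(1) + 1)

# Distribution of mountain 1's rank across bootstrap replicates --
# a crude confidence statement about its position.
confidence = {r: c / len(rank_of_1) for r, c in Counter(rank_of_1).items()}
```

With equal fitted strengths, mountain 1's bootstrap rank is spread across all three positions, which is exactly the "low confidence" signal you would want here.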
Would either of these methods be appropriate?
Participants were asked to take six criteria into consideration in providing their personal rankings. Two (elevation gain and distance) are clearly objective, one (trail) is unclear, and three (climbing difficulty, terrain stability, and exposure) are clearly subjective. [Is 'trail' intended to be Yes/No, or is the quality of the trail (if present) an issue?]
Moreover, it is not clear whether respondents rank each criterion separately (so that the society can do some sort of weighting) or somehow to give one ranking that takes all six criteria into account. [I tried to take a look at the instructions, but they seem to be available only to society members, and I have not climbed any mountains lately.]
Therefore it seems best to take the rankings as largely reflecting personal subjective opinions. If this is true, I don't think it makes sense to assume that there is a 'true ranking'. Absent a convincing argument that true ranks exist, I cannot see how a scheme that assumes their existence will be generally persuasive.
My conclusion is that 'average rank' is a reasonably good method of summarizing collective opinion. In your fragmentary example (with rank $1$ = hardest), mountain $9$ has average rank $1$; mountain $4$ has $(1+2)/2 = 1.5$; mountain $1$ has $2$; and mountains $8$ and $10$ each have $3$.
Lowest average ranks correspond to the most difficult climbs, so we would rank the mountains as $9$, $4$, $1$, and then $8$/$10$ tied, in decreasing level of difficulty. This method should work well for large numbers of respondents, and you would need to make a strong argument for the superiority of another method over this one.
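For what it's worth, the average-rank summary on the fragmentary example takes only a few lines (rank $1$ = hardest is assumed, as above):

```python
from collections import defaultdict

user_rankings = [
    [4, 1, 8],   # user 1, hardest first (rank 1 = hardest)
    [9, 4, 10],  # user 2, hardest first
]

positions = defaultdict(list)
for ranking in user_rankings:
    for rank, mountain in enumerate(ranking, start=1):
        positions[mountain].append(rank)

average_rank = {m: sum(r) / len(r) for m, r in positions.items()}
ordered = sorted(average_rank, key=average_rank.get)
# ordered is [9, 4, 1, 8, 10], with 8 and 10 tied at average rank 3.0
```

One known caveat: a mountain climbed by only one respondent gets an average rank based on a single position, so in practice you might want to report the number of respondents per mountain alongside the average.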
In related matters: (1) Some newspapers give summary rankings of currently showing movies based on critic opinion and (separately) based on viewer opinion. These rankings seem to be averages of 'Likert'-style scales. There is considerable controversy whether data from such scales can legitimately be averaged, but this method for movies has been widely used for some years. Maybe you can find some statistical analyses discussing the 'validity' of such rankings.
(2) Worldwide and lately in some US local elections, voters are asked to rank their top three candidates in order of preference. The objective is to avoid runoff elections if no candidate gets a majority of first-ranked votes. [Roughly speaking, if a voter's top choice is clearly out of the running, their second choice is advanced to first in an effort to get a majority.] I have seen some statistical arguments about the desirability of this method. There are also discussions by politicians and political scientists, principally challenging whether voters can make rational rankings not knowing who the strongest candidates really are.
Reading about these widely used and discussed methods of ranking and voting may prompt you to find a method for ranking mountains that you like better than averaging ranks.
If you are really more interested in methods of ranking than in the specific mountain-climbing data, then you might want to google 'Kendall's tau' to find an explanation that matches your mathematical background. (Unfortunately, the relevant Wikipedia page seems to need refinement.)
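Kendall's tau is built from the number of discordant pairs between two rankings. A minimal sketch of that distance (full rankings over the same items are assumed):

```python
from itertools import combinations

def kendall_tau_distance(order_a, order_b):
    """Count pairs that the two rankings place in opposite order."""
    pos_a = {m: i for i, m in enumerate(order_a)}
    pos_b = {m: i for i, m in enumerate(order_b)}
    return sum(
        1 for x, y in combinations(order_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )

# Swapping one adjacent pair gives distance 1; a full reversal of
# n items gives n*(n-1)/2 discordant pairs.
```

The tau correlation coefficient itself rescales this count to lie in $[-1, 1]$, with $1$ for identical and $-1$ for fully reversed rankings.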