Suppose we have $m$ graders and $n$ students, and we want to grade a test so that $k$ graders are assigned to each test and all graders grade the same number of tests. (I realize $m$, $n$, $k$ have to satisfy certain conditions for this "perfect assignment" to be possible, but I'd rather skip that point and assume it holds.) Also, to make things interesting, let's assume the assignment of graders to tests is random (so it's not the case that a group of graders all share the same set of tests to grade).
Furthermore, let's assume that each grader $i$ has a mean bias $\mu_i$ and a variance $\sigma_i^2$ for that bias associated with their grading, and the bias they apply to each test they grade is sampled independently from a normal distribution with these parameters. And each test $j$ has a "true grade" $c_j$. So then if grader $i$ is assigned to grade test $j$, then the grade they assign will be $c_j + x_{ij}$ where $x_{ij}$ is the sampled bias from the normal distribution with parameters $\mu_i$ and $\sigma_i^2$.
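To make the setup concrete, here is a minimal simulation of this generative model (all sizes and parameter values are illustrative; note that the purely random per-test assignment below does not by itself equalize grader workloads):

```python
import numpy as np

rng = np.random.default_rng(0)

m, n, k = 6, 12, 3                       # graders, tests, graders per test
mu = rng.normal(0.0, 2.0, size=m)        # per-grader mean bias mu_i
sigma = rng.uniform(0.5, 3.0, size=m)    # per-grader bias std dev sigma_i
c = rng.uniform(60.0, 100.0, size=n)     # true grades c_j

# Random assignment: each test j gets k distinct graders.
assign = [rng.choice(m, size=k, replace=False) for _ in range(n)]

# Observed grade from grader i on test j: c_j + x_ij, x_ij ~ N(mu_i, sigma_i^2).
obs = {(i, j): c[j] + rng.normal(mu[i], sigma[i])
       for j in range(n) for i in assign[j]}
```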
If the $\mu_i$ and $\sigma_i^2$ are unknown, how do we find the maximum likelihood values for the true grade scores $c_j$? If using a prior for graders' parameters is required, I guess I'm ok with that. I would also like to know the MLE (or MAP, if we go Bayesian) values for the grader parameters $\mu_i$ and $\sigma_i^2$. The idea is that graders with lower estimated variance should be preferred over those with higher variance if we want future assigned grades to be as accurate as possible.
I've phrased this in terms of test grading for clarity, but it's actually an "active learning" problem in machine learning that our lab is very interested in, so insights on this problem could really help.
I'm looking at your first paragraph. Depending on the values of $m$, $n$, and $k$, you might make grader assignments according to a balanced incomplete block design or a partially balanced incomplete block design. Without some kind of balance it does not seem possible to disentangle so many unknown means and variances. There is a rich literature on such issues of balance. Even if not directly applicable to the situation you really care about, you might get some ideas on how to proceed.
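As a much weaker but easy starting point, here is a round-robin assignment that at least equalizes grader workloads (it is not a full BIBD, which would additionally balance how often each pair of graders co-grades a test):

```python
def round_robin_assign(m, n, k):
    """Assign k distinct graders to each of n tests by cycling through
    graders 0..m-1.  Grader workloads differ by at most one, and are
    exactly equal when m divides n*k.  Requires k <= m."""
    assert k <= m
    assignment = []
    g = 0
    for _ in range(n):
        assignment.append([(g + t) % m for t in range(k)])
        g = (g + k) % m
    return assignment
```

For example, `round_robin_assign(6, 12, 3)` gives every test 3 distinct graders and every grader exactly $12 \cdot 3 / 6 = 6$ tests.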
A Bayesian approach with a Gibbs sampler might be useful. But there are so many latent variables that I wonder whether conclusions might be guided by the priors (even noninformative ones) to a greater extent than you would prefer. Again, some sort of balance may be required here.
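To illustrate, here is a minimal Gibbs sampler sketch under one choice of conjugate priors (normal priors on $c_j$ and $\mu_i$, inverse-gamma on $\sigma_i^2$ — all assumptions, with illustrative hyperparameters). Centering the $\mu_i$ prior at zero also resolves the additive non-identifiability between $c_j$ and $\mu_i$ (adding a constant to every $\mu_i$ and subtracting it from every $c_j$ leaves the likelihood unchanged):

```python
import numpy as np

def gibbs(obs, m, n, iters=2000, burn=500, seed=0,
          s0=50.0, tau=5.0, a0=2.0, b0=2.0):
    """Gibbs sampler for g_ij = c_j + x_ij with x_ij ~ N(mu_i, sigma_i^2).
    Assumed priors: c_j ~ N(m0, s0^2) with m0 the grand mean of the data,
    mu_i ~ N(0, tau^2), sigma_i^2 ~ InvGamma(a0, b0).
    `obs` maps (grader i, test j) -> observed grade."""
    rng = np.random.default_rng(seed)
    m0 = np.mean(list(obs.values()))
    by_test = [[] for _ in range(n)]    # (i, grade) pairs per test
    by_grader = [[] for _ in range(m)]  # (j, grade) pairs per grader
    for (i, j), g in obs.items():
        by_test[j].append((i, g))
        by_grader[i].append((j, g))

    c = np.full(n, m0); mu = np.zeros(m); var = np.ones(m)
    c_sum = np.zeros(n); mu_sum = np.zeros(m); var_sum = np.zeros(m)
    for it in range(iters):
        for j in range(n):  # c_j | rest  is normal (conjugacy)
            prec = 1.0 / s0**2 + sum(1.0 / var[i] for i, _ in by_test[j])
            mean = (m0 / s0**2 +
                    sum((g - mu[i]) / var[i] for i, g in by_test[j])) / prec
            c[j] = rng.normal(mean, prec ** -0.5)
        for i in range(m):  # mu_i | rest is normal; sigma_i^2 | rest is inv-gamma
            ni = len(by_grader[i])
            prec = 1.0 / tau**2 + ni / var[i]
            mean = sum((g - c[j]) / var[i] for j, g in by_grader[i]) / prec
            mu[i] = rng.normal(mean, prec ** -0.5)
            ss = sum((g - c[j] - mu[i]) ** 2 for j, g in by_grader[i])
            var[i] = 1.0 / rng.gamma(a0 + ni / 2.0, 1.0 / (b0 + ss / 2.0))
        if it >= burn:
            c_sum += c; mu_sum += mu; var_sum += var
    keep = iters - burn
    return c_sum / keep, mu_sum / keep, var_sum / keep  # posterior means
```

The returned posterior means of $\sigma_i^2$ give exactly the grader ranking the question asks about. Whether the posteriors are dominated by the data or by these priors will depend on how many tests each grader sees, which is where the balance concerns above bite.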
In either case, it seems that a totally random assignment of tests to graders is not a good design choice.