Background
The well-known OpenAI paper Evaluating Large Language Models Trained on Code introduced Codex, a code LLM that GitHub uses to power its code assistant, GitHub Copilot. The paper also proposed a metric for evaluating the performance of new code LLMs:
$$ \mathrm{Pass}@k=\mathbb{E} _ \mathcal{D}\left[ 1 - \frac{\binom{n-c}{k} }{ \binom{n}{k}} \right] $$
where $n \geq c \geq 0$ and $n \geq k \geq 1$.
Here $n$ is the number of responses sampled per problem and $c$ is the number of them that are functionally correct. $\mathrm{Pass}@k$ is the probability that, when $k$ responses are drawn without replacement from the $n$ samples, at least one of them is among the $c$ correct ones; in other words, the chance that an oracle with $k$ tries picks a correct response.
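For concreteness, here is a small numeric sketch of the formula for a single problem (dropping the expectation over $\mathcal{D}$); the function name `pass_at_k` is my own:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k draws (without replacement)
    from n samples hits one of the c functionally correct ones."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any k draws
        # must include a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(10, 3, 1))  # 0.3 — for k = 1 this reduces to c / n
print(pass_at_k(10, 3, 5))  # 11/12 ≈ 0.9167
```

Note the `n - c < k` guard: it mirrors the convention in the paper's reference implementation that $\binom{n-c}{k} = 0$ when $k > n - c$.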
Question
I wonder how this metric behaves under different choices of $n$, $c$, and $k$. For example, is it monotonic in $n$, $c$, or $k$? If not, how can one attain the best value by choosing $n$, $c$, and $k$?
I know I could easily take the derivatives, but the signs of the expressions I got are not obvious.
So could anyone help me with the problem? Thank you in advance.
from sympy import binomial, diff, init_printing, symbols

init_printing()

n, c, k = symbols("n c k", positive=True)
pass_at_k = 1 - binomial(n - c, k) / binomial(n, k)

# Partial derivatives of Pass@k (for a single problem) with respect
# to each variable; diff is the idiomatic choice for scalar derivatives.
dn = diff(pass_at_k, n)
dc = diff(pass_at_k, c)
dk = diff(pass_at_k, k)
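Since the symbolic derivatives are hard to sign, a complementary (though of course not conclusive) sanity check is to scan a grid of integer $(n, c, k)$ values and test monotonicity numerically; this is only a sketch, and `pass_at_k` is my own helper:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Pass@k for one problem, with C(n-c, k) treated as 0 when k > n - c.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

eps = 1e-12
for n in range(2, 20):
    # Fix n and c, increase k: values should be non-decreasing.
    for c in range(0, n + 1):
        vals = [pass_at_k(n, c, k) for k in range(1, n + 1)]
        assert all(a <= b + eps for a, b in zip(vals, vals[1:]))
    # Fix n and k, increase c: values should be non-decreasing.
    for k in range(1, n + 1):
        vals = [pass_at_k(n, c, k) for c in range(0, n + 1)]
        assert all(a <= b + eps for a, b in zip(vals, vals[1:]))
print("On this grid, Pass@k is non-decreasing in k and in c")
```

On this grid the metric is non-decreasing in both $k$ and $c$; by contrast, increasing $n$ while holding $c$ fixed adds only incorrect samples, so one would expect the value to decrease, which a similar scan over $n$ can probe.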