What does Quinlan mean by "the confidence limits for the binomial distribution"?

Question

57 Views Asked by Bumbble Comm At 11 May 2026 - 9:29

My classmates and I are trying to figure out what J. Ross Quinlan means on page 41 of C4.5: Programs for Machine Learning. He says:

The probability of error cannot be determined easily, but has itself a (posterior) probability distribution that is usually summarized by a pair of confidence limits. For a given confidence level CF, the upper limit on this probability can be found from the confidence limits for the binomial distribution; this upper limit is here written $U_{CF}(E, N)$.

Quinlan gives a few examples, for instance:

$U_{25\%}(0, 6) = 0.206$

My class textbook, Data Mining: Concepts, Models, Methods, and Algorithms by Mehmed Kantardzic, gives slightly more detail, but not enough:

C4.5 follows the postpruning approach, but it uses a specific technique to estimate the predicted error rate. This method is called pessimistic pruning. For every node in a tree, the estimation of the upper confidence limit $U_{cf}$ is computed using the statistical tables for binomial distribution (given in most textbooks on statistics). Parameter $U_{cf}$ is a function of $|T_i|$ and $E$ for a given node. C4.5 uses the default confidence level of 25% and compares $U_{25\%}(|T_i|/E)$ for a given node $T_i$ with a weighted confidence of its leaves.

Kantardzic provides a few more examples of this function:

$U_{25\%}(6,0) = 0.206, U_{25\%}(9,0) = 0.143, U_{25\%}(1,0) = 0.750$

At least one other person on the Internet has the same question.

I have been unable to find these values in the binomial probability distribution ${n \choose r} p^r q^{n-r}$.

What does this notation mean, and how can I compute this function (ideally in R or Julia)?

**Bumbble Comm** · Accepted Answer

Finally answered my own question. Apparently this is the upper limit of the Clopper-Pearson confidence interval. You can use the Binomial Confidence Interval calculator at https://statpages.info/confint.html: enter $25$ for % Area in Upper Tail and $0$ for % Area in Lower Tail.
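For reference, the upper Clopper-Pearson limit can be written as the solution of a binomial tail equation. This is the standard textbook definition, matched to Quinlan's notation on my own reading (with $E$ errors in $N$ cases and $p$ the unknown error probability), not a formula quoted from the book:

$$\sum_{i=0}^{E} {N \choose i} p^i (1-p)^{N-i} = CF, \qquad U_{CF}(E, N) = p$$

When $E = 0$ the sum has a single term, $(1-p)^N = CF$, so $U_{CF}(0, N) = 1 - CF^{1/N}$. For example, $U_{25\%}(0, 6) = 1 - 0.25^{1/6} \approx 0.206$, which matches the value Quinlan gives.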

R

You can compute this in R using the GenBinomApps package.

```r
> install.packages('GenBinomApps')
Warning: package 'GenBinomApps' is in use and will not be installed
> library(GenBinomApps)
> clopper.pearson.ci(0, 6, alpha = .25, CI = "upper")
 Confidence.Interval Lower.limit Upper.limit alpha
               upper           0   0.2062995  0.25
> clopper.pearson.ci(0, 9, alpha = .25, CI = "upper")
 Confidence.Interval Lower.limit Upper.limit alpha
               upper           0    0.142756  0.25
> clopper.pearson.ci(0, 1, alpha = .25, CI = "upper")
 Confidence.Interval Lower.limit Upper.limit alpha
               upper           0        0.75  0.25
> clopper.pearson.ci(1, 16, alpha = .25, CI = "upper")
 Confidence.Interval Lower.limit Upper.limit alpha
               upper           0   0.1596107  0.25
```

Julia

Julia's HypothesisTests.jl package rejects a level this low: `confint` only accepts a coverage level in the range (0.5, 1). I have not found a way to compute $U_{25\%}(E, N)$ in Julia yet, but I have not tried very hard either.

```julia
julia> using StatsKit

julia> confint(BinomialTest(0, 6); level = .25, tail = :right)
ERROR: ArgumentError: coverage level 0.25 not in range (0.5, 1)
Stacktrace:
 [1] check_level
   @ C:\Users\wjhol\.julia\packages\HypothesisTests\V7PST\src\HypothesisTests.jl:96 [inlined]
 [2] confint(x::BinomialTest; level::Float64, tail::Symbol, method::Symbol)
   @ HypothesisTests C:\Users\wjhol\.julia\packages\HypothesisTests\V7PST\src\binomial.jl:104
 [3] top-level scope
   @ REPL[2]:1
```

Python

This Stack Overflow discussion covers some ways to get this CI in Python.
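As a minimal sketch, the same upper limit can be found with only the Python standard library by solving the binomial tail equation $P(X \le E) = CF$ for $p$ with bisection. The helper names `binom_cdf` and `upper_limit`, and the argument order (errors first, as in Quinlan), are my own choices for illustration and do not come from any library or from C4.5 itself.

```python
from math import comb


def binom_cdf(e: int, n: int, p: float) -> float:
    """Return P(X <= e) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(e + 1))


def upper_limit(e: int, n: int, cf: float = 0.25) -> float:
    """Upper Clopper-Pearson limit: the p at which P(X <= e) equals cf.

    Hypothetical helper, named here for illustration only.
    """
    if not 0 <= e <= n or not 0 < cf < 1:
        raise ValueError("need 0 <= e <= n and 0 < cf < 1")
    if e == n:
        return 1.0  # every case is an error, so the limit is 1
    lo, hi = 0.0, 1.0
    # binom_cdf decreases as p grows, so bisection finds the crossing point.
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(e, n, mid) > cf:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Same (E, N) pairs as in the R session above.
for e, n in [(0, 6), (0, 9), (0, 1), (1, 16)]:
    print(f"U_25%({e}, {n}) = {upper_limit(e, n):.4f}")
```

For these four inputs the results should agree with the `Upper.limit` column of the R output to the printed precision. Bisection is used here only to avoid a dependency; a Beta quantile function from a statistics library would be the more usual route.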
