Non-integer $n$ in sample size problem

389 Views Asked by Bumbble Comm At 11 May 2026 - 6:12

Setup

Consider a sample size determination problem solved by maximizing expected utility (as in Lindley 1997).

Let $\theta$ be the state and $x=(x_1,\dots,x_n)$ a sequence of $n$ iid samples, and let $n$ and $d$ be, respectively, the sample size and the terminal decision chosen by the decision maker.
Suppose the decision maker has prior $\pi$ about the state and let $p(x|\theta,n)$ be the likelihood.
Lastly assume the objective (utility) function takes the form $u(d,\theta)-cn$, where the cost of sampling is linear in $n$, with each additional observation costing $c$ utils, and $u(d,\theta)=-(d-\theta)^2$.

Then, the objective is $$ \max_n \left[\sum_x \max_d \left(\int u(d,\theta)p(x|\theta,n)\pi(\theta)\mathrm{d}\theta\right)-cn\right]. $$

Questions

Does the sample size $n$ have to be restricted to integer values? For example, if the prior is beta and the likelihood is binomial, could $n$ take non-integer values? What if both the prior and the likelihood are normal?

If $n$ has to be an integer, what is the problem with first finding $n$ by differentiation (i.e. treating it as a continuous quantity) and then taking the integer part of the solution?

Bumbble Comm On 08 Jun 2015 - 12:09 BEST ANSWER

Statistical computations that find a sample size or degrees of freedom, or that estimate integer parameters, do not have to result in integer values.

For sample sizes, it is customary to round up to the next higher integer.

For degrees of freedom, when using printed t and F tables, it is customary to round down to the next lower integer.

(Both conventions are intended to be in the direction of conservatism or 'playing it safe'; usually it doesn't make a practical difference.)

In your Bayesian example, the binomial likelihood requires an integer number of trials $n$, but there is no reason that the parameters of the beta prior or posterior should be integers.
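
For the normal case raised in the question, a short worked sketch may help. Assume (these specifics are not in the question) a prior $\theta\sim N(\mu_0,\tau^2)$ and observations $x_i\mid\theta\sim N(\theta,\sigma^2)$ with $\sigma^2$ known. Under $u(d,\theta)=-(d-\theta)^2$ the optimal $d$ is the posterior mean, so the expected terminal utility is minus the posterior variance, which here does not depend on $x$. The objective then reduces to $$ f(n) = -\left(\frac{1}{\tau^2}+\frac{n}{\sigma^2}\right)^{-1} - cn, $$ which is well defined for any real $n\ge 0$. Setting $f'(n)=0$ gives $$ n^* = \frac{\sigma}{\sqrt{c}} - \frac{\sigma^2}{\tau^2}, $$ which is generally not an integer. Because $f$ is concave in $n$, the best integer sample size is one of the two integers adjacent to $n^*$ (or $0$ if $n^*\le 0$), so comparing $f$ at those two integers is safer than simply taking the integer part.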

Even if a parameter of a distribution is usually defined as an integer, many statistical software packages permit use of noninteger inputs. Some even give noninteger outputs.

Using R, here is an example of supplying df = 8.35 to find a critical value of t, along with the values for the adjacent integer df. Such a value of df might arise, for example, in a Welch (separate-variances) t test.

 qt(.975, 8.35)
 ## 2.289287
 qt(.975, 8)
 ## 2.306004
 qt(.975, 9)
 ## 2.262157
