Randomised Response Technique

Question

516 Views Asked by Bumbble Comm At 17 Apr 2026 - 6:44

Context: To get answers to sensitive questions, we sometimes use a method called the randomized response technique. Suppose, for instance, that we want to determine what percentage of the students at a large university take ketamine. We construct 20 flash cards, write ‘I take ketamine at least once a week’ on 12 of the cards (where 12 is an arbitrary choice) and ‘I do not take ketamine at least once a week’ on the others. Then we let each student (in the sample interviewed) select one of the 20 cards at random and respond yes or no without divulging the question.

Establish a relationship between P(Y), the probability that a student will give a yes response, and P(K), the probability that a student randomly selected at the university takes ketamine at least once a week.

I received the following question as an undergraduate Statistics student, and I am confused about the whole idea of the "randomised response technique". The process describes each student choosing a card and saying yes or no, but how can we determine the percentage of students that actually take ketamine? If I say that the probability a student gives a yes response is 0.5, this would seem to imply that half the population in the university takes ketamine, which I suspect is incorrect. Could someone please explain how I can derive the probability more accurately?

There are 2 best solutions below

Bumbble Comm On 06 Mar 2021 - 9:32

A key point to understand here is that this randomized response survey technique does not come without a cost, which is increased variance in the estimate for the parameter of interest.

To see why, let $p$ be the true proportion of students who take ketamine at least once a week, and suppose there are $a$ cards of type $A$ and $b$ cards of type $B$, where $a \ne b$ are known parameters. We make the assumption that the card type drawn is independent of whether the student takes ketamine. So in a sample of $n$ students, let $X_i = 1$ if the $i^{\rm th}$ student takes ketamine at least once a week, and $0$ if they do not. Then $X_i \sim \operatorname{Bernoulli}(p)$. Now let $Y_i = 1$ if the answer to the card they drew is yes, and $0$ if the answer to the card they drew is no. Clearly $Y_i$ must also be Bernoulli, but what is its parameter? Well, there is a third Bernoulli variable involved, call this $C_i$, for the card that they drew, where $C_i = 1$ if they drew card type $A$ and $0$ if card type $B$. So $$C_i \sim \operatorname{Bernoulli}(a/(a+b)),$$ and $$Y_i = C_i X_i + (1-C_i) (1-X_i).$$ Therefore, $$\Pr[Y_i = 1] = \Pr[C_i = X_i = 1] + \Pr[C_i = X_i = 0] = \frac{a}{a+b} p + \frac{b}{a+b} (1-p) = \frac{ap + b(1-p)}{a+b}$$ is the Bernoulli parameter for $Y_i$.
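As a quick check of the formula for $\Pr[Y_i = 1]$, here is a minimal Python sketch that enumerates the four $(C_i, X_i)$ outcomes exactly. The function name `prob_yes` and the use of `Fraction` are illustrative choices, not part of the original question; the example values $a = 12$, $b = 8$ come from the 20-card setup.

```python
from fractions import Fraction

def prob_yes(p, a, b):
    """Pr[Y_i = 1] with a cards of type A and b cards of type B (hypothetical helper)."""
    p = Fraction(p)
    c = Fraction(a, a + b)  # Pr[C_i = 1]: chance of drawing a type-A card
    total = Fraction(0)
    # Enumerate the four (C_i, X_i) outcomes; independence lets us multiply probabilities.
    for card, pr_card in ((1, c), (0, 1 - c)):
        for x, pr_x in ((1, p), (0, 1 - p)):
            y = card * x + (1 - card) * (1 - x)  # Y_i = C_i X_i + (1 - C_i)(1 - X_i)
            total += y * pr_card * pr_x
    return total

# With 12 type-A cards, 8 type-B cards and p = 1/4:
print(prob_yes(Fraction(1, 4), 12, 8))  # 9/20, i.e. (12 * 1/4 + 8 * 3/4) / 20
```

Exact fractions are used so the result can be compared directly with $\frac{ap + b(1-p)}{a+b}$ without rounding error.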

In this sampling scheme, the data we have is only the $Y_i$: we do not have access to $X_i$ or $C_i$. How do we estimate $p$? Because $$\operatorname{E}[\bar Y] = \frac{ap + b(1-p)}{a+b},$$ where $\bar Y$ is the sample mean of the $Y_i$, an "intuitive" point estimate (which is the method of moments estimator) is $$\tilde p = \frac{a+b}{a-b} \bar y - \frac{b}{a-b}.$$ We can also show that this is the same as the maximum likelihood estimate $\hat p$, which is obtained by computing the critical point of the log-likelihood function $$\ell(p \mid a, b, \bar y) \propto \bar y \log \frac{ap + b(1-p)}{a+b} + (1 - \bar y) \log \frac{a(1-p) + bp}{a+b}.$$ Since $\hat p = \tilde p$ is a linear function of the sample mean, whose expectation is given above, $\hat p$ is unbiased. However, the variance of this estimator is what we need to consider: $$\begin{align} \operatorname{Var}[\hat p \mid a,b] &= \left(\frac{a+b}{a-b}\right)^2 \operatorname{Var}[\bar Y] \\ &= \left(\frac{a+b}{a-b}\right)^2 \frac{1}{n} \frac{ap + b(1-p)}{a+b} \frac{a(1-p) + bp}{a+b} \\ &= \frac{p(1-p)}{n} + \color{red}{\frac{ab}{n(a-b)^2}}.\end{align}$$ The last step uses the identity $(ap + b(1-p))(a(1-p) + bp) = (a-b)^2 p(1-p) + ab$. Since the variance of the sample proportion based on the direct information $X_1, X_2, \ldots, X_n$ is $p(1-p)/n$, we see that the variance of the randomized-response estimator is strictly larger (when $a, b > 0$) by a term that does not depend on the true parameter, but rather on the extent of the difference between $a$ and $b$ and on the sample size $n$.
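The estimator and its variance can be sketched in a few lines of Python. This is only an illustration under the assumptions above (truthful answers, card draw independent of $X_i$); the names `estimate_p` and `estimator_variance` and the sample size $n = 100$ are hypothetical.

```python
from fractions import Fraction

def estimate_p(y_bar, a, b):
    """Method-of-moments estimate of p from the observed fraction of 'yes' answers."""
    if a == b:
        raise ValueError("a == b: the responses carry no information about p")
    return (Fraction(a + b) * y_bar - b) / (a - b)

def estimator_variance(p, a, b, n):
    """Var[p_hat] = ((a+b)/(a-b))^2 * theta * (1 - theta) / n, with theta = Pr[Y_i = 1]."""
    if a == b:
        raise ValueError("a == b: the variance is unbounded")
    theta = (a * p + b * (1 - p)) / Fraction(a + b)
    scale = Fraction(a + b, a - b)
    return scale ** 2 * theta * (1 - theta) / n

p, a, b, n = Fraction(1, 4), 12, 8, 100
print(estimate_p(Fraction(9, 20), a, b))   # 1/4
print(estimator_variance(p, a, b, n))      # 99/1600
print(p * (1 - p) / n)                     # 3/1600, the direct-question variance
```

For these illustrative values the randomized design has 33 times the variance of asking the question directly, which matches the closed form $\frac{p(1-p)}{n} + \frac{ab}{n(a-b)^2} = \frac{3}{1600} + \frac{96}{1600}$.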

This illustrates several important principles:

- When $b = 0$, then $Y_i = X_i$ identically, the responses are not randomized at all, and we recover the usual binomial proportion sampling scheme; similarly, if $a = 0$, then $Y_i = 1 - X_i$ and there is no "additional randomness."
- When $a = b$, the randomized response scheme fails because the responses $Y_i$ are completely noninformative of $p$, as seen by the fact that $ap + b(1-p) = a$.
- The closer $a$ and $b$ are to each other (more specifically, the closer the ratio $a/(a+b)$ is to $1/2$), the larger the "excess" variance term for a fixed sample size, and this is the cost that is exacted by this survey methodology.
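To make these principles concrete, the following sketch tabulates the excess variance term $\frac{ab}{n(a-b)^2}$ for several ways of splitting 20 cards. The helper name `excess_variance` and the sample size $n = 100$ are assumptions chosen for illustration.

```python
from fractions import Fraction

def excess_variance(a, b, n):
    """Extra variance ab / (n (a - b)^2) incurred by randomizing the question."""
    if a == b:
        raise ValueError("a == b: p cannot be estimated from the responses")
    return Fraction(a * b, n * (a - b) ** 2)

# Splits of 20 cards, from no randomization (20/0) to nearly balanced (11/9).
for a in (20, 16, 12, 11):
    b = 20 - a
    print(a, b, excess_variance(a, b, 100))
# 20 0 0
# 16 4 1/225
# 12 8 3/50
# 11 9 99/400
```

The penalty grows quickly as the split approaches 10/10, which is the trade-off between protecting respondents and keeping the estimate precise.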

**Bumbble Comm** · Accepted Answer

Let $A$ be the event that the card drawn by the student interviewed (supposed to be picked uniformly at random among all students) says "I take ketamine at least once a week" and $B$ be the event that it says "I do not take ketamine at least once a week"; so that, here, $\Pr[A] = \frac{12}{20} = \frac{3}{5} $ and $\Pr[B] = \frac{2}{5}$.

By the law of total probability, $\Pr[Y]$ satisfies $$\begin{align} \Pr[Y] &= \Pr[Y\mid A]\cdot \Pr[A] + \Pr[Y\mid B]\cdot \Pr[B] = \Pr[K]\cdot \Pr[A] + (1-\Pr[K])\cdot \Pr[B]\\ &= \frac{3}{5}\Pr[K]+ \frac{2}{5}(1-\Pr[K])\\ &= \frac{1}{5}\Pr[K]+\frac{2}{5} \tag{1} \end{align}$$ assuming, of course, that the students answer truthfully (that is, say "Yes" iff the statement they read on their card is true) and that the card drawn is independent of whether the student takes ketamine.

Why would we do that? Well, our goal is to estimate $\Pr[K]$ without knowing for sure the response of any given student (as this would violate their privacy; it's sensitive information).

Here, we have added some randomness, so if a given student answers "Yes", we don't know whether it's because they're taking ketamine and got the first type of card, or because they don't take ketamine but got the second type of card. So we can't know for sure whether a given student takes ketamine: good!

But we can still estimate $\Pr[K]$! How? Because we can estimate $\Pr[Y]$: take sufficiently many students uniformly at random, get their answer, use that to estimate $\Pr[Y]$ (call the value of the estimate $p$). Now, compute $q = 5(p-\frac{2}{5})$: by (1), this $q$ is a suitable estimate for $\Pr[K]$. (Namely, if $|p-\Pr[Y]| \leq \varepsilon$, then $|q-\Pr[K]| \leq 5\varepsilon$.)

(In your example, if the probability that a student gives the answer "Yes" is $1/2$ (note that it must be between $2/5$ and $3/5$ because of (1)); then the probability that a student takes ketamine is $5(1/2-\frac{2}{5})=1/2$ as well. But if the probability of Yes was say 45/100, then the fraction of students taking ketamine would be $5(45/100-\frac{2}{5})=1/4$.)
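The whole procedure can be simulated to see the estimate recover $\Pr[K]$. This is a rough sketch, assuming truthful respondents and the 12-of-20 card design; the function names, the true rate of 0.25, the sample size and the seed are all made up for the illustration.

```python
import random

def simulate_survey(true_pk, n, cards_yes=12, cards_total=20, seed=0):
    """Return the observed fraction of 'Yes' answers among n simulated students."""
    rng = random.Random(seed)
    yes = 0
    for _ in range(n):
        takes = rng.random() < true_pk                     # does this student take ketamine?
        drew_first = rng.randrange(cards_total) < cards_yes  # card says "I take ketamine..."
        # Truthful answer: "Yes" iff the statement on the drawn card is true for them.
        answer_yes = takes if drew_first else not takes
        yes += answer_yes
    return yes / n

def estimate_pk(p_yes):
    """Invert (1): Pr[K] = 5 (Pr[Y] - 2/5). Valid only for the 12-of-20 design."""
    return 5 * (p_yes - 2 / 5)

p = simulate_survey(true_pk=0.25, n=100_000)
q = estimate_pk(p)
print(round(p, 3), round(q, 3))  # p should land near 0.45 and q near 0.25
```

Only the total count of "Yes" answers is used; the simulated interviewer never sees `takes` or `drew_first` for an individual student, which mirrors the privacy argument above.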

So we can estimate $\Pr[K]$ to very good accuracy without ever determining for sure if any given student takes ketamine: their privacy is preserved, and we got the statistical info we wanted.
