Find the covariance between a normally distributed cue and a Bernoulli-distributed response (generated by a threshold)


Very much appreciate any help!

A cue varies continuously in strength; the strength $X$ of a given cue is normally distributed with PDF $\phi(x-\mu)$, where $\phi$ is the standard normal PDF (that is, $X \sim N(\mu, 1)$).

An observer decides to respond whenever $X$ is stronger than some threshold $t$; that is, a response is made just in case $X>t$, and if $X \leqslant t$ no response is made.

Whether the observer responds or not is recorded by the random variable $Y$, which is 1 if they do respond, and 0 if they do not.

Note that $\mathrm{P}(X>t) = 1- \Phi(t-\mu)$, where $\Phi$ is the standard normal cumulative distribution function. Therefore, $Y$ is Bernoulli distributed with PMF $(1- \Phi(t-\mu))^{y}\,\Phi(t-\mu)^{1-y}$ for $y \in \{0,1\}$.
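As a quick sanity check, the empirical mean of $Y$ should match $1-\Phi(t-\mu)$. Here is a pure-Python sketch (not from the original post; the parameter values $\mu = -1$, $t = 0.5$ are arbitrary choices for illustration):

```python
import math
import random

def Phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

random.seed(0)
mu, t, n = -1.0, 0.5, 200_000

# Simulate the cue X ~ N(mu, 1) and record the response Y = 1{X > t}.
y_mean = sum(1 for _ in range(n) if random.gauss(mu, 1.0) > t) / n

# Y should be Bernoulli with success probability 1 - Phi(t - mu).
p = 1.0 - Phi(t - mu)
print(y_mean, p)  # the two values should agree to roughly two decimal places
```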

I want to find the covariance between cue strengths and responses, $\mathrm{Cov}(X,Y)$.

Some numerical simulation suggests that $\mathrm{Cov}(X,Y) = \phi(t-\mu)$, but I want to know how to derive that result, to show that it is correct, and also to generalise it (to the case of the signal detection model, which is a bit more complicated).

Edit.

The Mathematica code for the simulation is:

n = 100000;   (* number of simulated trials *)
\[Mu] = -1;   (* mean of the cue distribution *)
max = 4;      (* largest threshold *)
min = -4;     (* smallest threshold *)
inc = 40;     (* number of threshold steps *)

(* Draw n cue strengths X ~ N(\[Mu], 1). *)
x = RandomVariate[NormalDistribution[\[Mu], 1], n];

(* For each threshold t, the sample covariance of X with Y = Boole[X > t]. *)
ans = Table[{N[t], Covariance[x, Boole[Map[# > t &, x]]]}, {t, min, 
    max, (max - min)/inc}];

p1 = ListLinePlot[ans];
(* Overlay the conjectured closed form, i.e. the N(\[Mu], 1) density at t. *)
p2 = Plot[PDF[NormalDistribution[\[Mu], 1], t], {t, min, max}, 
   PlotStyle -> Red];

Show[{p2, p1}]
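For readers without Mathematica, the same experiment can be sketched in pure Python (my own port, not the poster's code; it sweeps a handful of thresholds rather than the full grid, and skips the plot):

```python
import math
import random

def phi(z):
    """Standard normal PDF."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def covariance(xs, ys):
    """Sample covariance with the usual n - 1 denominator."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

random.seed(1)
mu, n = -1.0, 100_000
xs = [random.gauss(mu, 1.0) for _ in range(n)]

# Sweep the threshold t and compare the sample covariance of X and
# Y = 1{X > t} against the conjectured closed form phi(t - mu).
max_err = 0.0
for t in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    ys = [1.0 if x > t else 0.0 for x in xs]
    max_err = max(max_err, abs(covariance(xs, ys) - phi(t - mu)))
print(max_err)  # typically well under 0.02 at this sample size
```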

2 Answers

Accepted answer

Thanks to angryavian and Misha Lavrov for their input. The solution is as follows and confirms the simulation.

Note,

$$\mathrm{Cov}(X,Y)=\mathrm{E}[XY]-\mathrm{E}[X]\mathrm{E}[Y]$$ $$\mathrm{E}[X] = \mu$$ $$\mathrm{E}[Y] = 1-\Phi(t-\mu)$$

Letting $I_A(x)$ be the indicator function of the set $A$, the joint (mixed) density is:

$$f_{XY}(x,y) = \phi(x-\mu)I_{(t,\infty)}(x)\,y + \phi(x-\mu)I_{(-\infty,t]}(x)\,(1-y), \quad y \in \{0,1\}$$

By definition:

$$\mathrm{E}[XY] = \int_{-\infty}^{\infty} \sum_{y\in \{0,1\}} \ x y \ f_{XY}(x,y)\ \mathrm{d}x $$

Since the integrand vanishes unless $y=1$ and $x>t$, this simplifies to:

$$\mathrm{E}[XY] = \int_{t}^{\infty} \ x \ \phi(x-\mu) \ \mathrm{d}x $$

Writing $x = (x-\mu) + \mu$ and using $\int_{t}^{\infty} (x-\mu)\,\phi(x-\mu) \ \mathrm{d}x = \big[-\phi(x-\mu)\big]_{t}^{\infty} = \phi(t-\mu)$ gives

$$\mathrm{E}[XY] = \phi(t-\mu)+\mu(1-\Phi(t-\mu))$$
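The closed form for this integral can be checked numerically. The following pure-Python sketch (my own, with arbitrary values of $\mu$ and $t$) compares a midpoint-rule approximation of $\int_t^\infty x\,\phi(x-\mu)\,\mathrm{d}x$ against $\phi(t-\mu)+\mu(1-\Phi(t-\mu))$:

```python
import math

def phi(z):
    """Standard normal PDF."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, t = -1.0, 0.5  # arbitrary illustrative values

# Midpoint-rule approximation of the integral from t out to t + 12;
# the integrand is negligible beyond that.
steps = 200_000
a, b = t, t + 12.0
h = (b - a) / steps
integral = sum((a + (k + 0.5) * h) * phi(a + (k + 0.5) * h - mu)
               for k in range(steps)) * h

closed_form = phi(t - mu) + mu * (1.0 - Phi(t - mu))
print(integral, closed_form)  # should agree to many decimal places
```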

Substituting back, we find that:

$$\mathrm{Cov}(X,Y) = \phi(t-\mu)$$

Note that the covariance between responses and cues is highest when $t=\mu$, so that the observer responds half the time and withholds a response half the time. The covariance falls off following the normal density as the threshold becomes more extreme in either direction, with responses occurring nearly always or almost never.

Second answer

Note that $\text{Cov}(X,Y) = E[XY]-E[X]E[Y]$.

You have $E[X]=\mu$ and $E[Y] = 1 - \Phi(t-\mu)$. For the remaining term (writing the computation for the standardized case $\mu = 0$; the general case follows by substituting $z = x - \mu$), $$E[XY] = E[XY\mathbf{1}_{X \le t} + XY \mathbf{1}_{X > t}] = E[X\mathbf{1}_{X>t}] = \int_t^\infty x \phi(x) \, dx \overset{(*)}{=} \phi(t),$$ where the first indicator term vanishes because $Y = 0$ on $\{X \le t\}$, and $\phi=\Phi'$ is the density of the standard normal distribution. The last equality ($*$) follows from $\phi'(x) = -x\phi(x)$, so that $-\phi$ is an antiderivative of $x\phi(x)$; equivalently, for $t \ge 0$ it can be shown by making the substitution $u=x^2$ in the integral.
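The substitution step can also be checked numerically. This pure-Python sketch (illustrative only, with an arbitrary $t \ge 0$ so that $u = x^2$ is monotone on the integration range) compares the original integral, the substituted integral, and $\phi(t)$:

```python
import math

def phi(z):
    """Standard normal PDF."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def midpoint(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (k + 0.5) * h) for k in range(steps)) * h

t = 0.8  # arbitrary; t >= 0 keeps u = x**2 monotone on [t, infinity)

# Original integral int_t^inf x phi(x) dx, truncated where it is negligible.
direct = midpoint(lambda x: x * phi(x), t, t + 12.0)

# After u = x**2: x dx = du / 2 and phi(x) = exp(-u/2) / sqrt(2 pi).
substituted = midpoint(
    lambda u: math.exp(-u / 2.0) / (2.0 * math.sqrt(2.0 * math.pi)),
    t * t, (t + 12.0) ** 2)

print(direct, substituted, phi(t))  # all three should agree closely
```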