Define:
$y= \theta + \varepsilon + a,$
where $a$ is a choice variable in a behavioral economic model, with equilibrium solution $a^e$, and $\theta$ and $\varepsilon$ are independently distributed random variables with distributions:
$\theta \sim \mathcal{N}(\bar{\theta},\sigma_{\theta}^2)$,
and
$\varepsilon \sim \mathcal{N}(0,\sigma_{\varepsilon}^2)$.
A paper I am reading states, at the bottom of page 173:
$E[E[\theta|y]]=\bar{\theta} + \phi E[\theta + \varepsilon + a - a^e - \bar{\theta}]$,
where
$\phi=\sigma_{\theta}^2/(\sigma_{\theta}^2+\sigma_{\varepsilon}^2)$.
It refers to this result as a "well known" signal extraction result.
Googling the latter, I found that for $y= a + b$, with $a$ and $b$ independent normal random variables with mean zero:
$E[a|y] = \frac{\sigma_{a}^2}{\sigma_{y}^2}y$.
This differs from the case above, where $\theta$ does not have mean zero and the definition of $y$ includes a constant.
Hence I am having difficulty deriving the result in the paper. Grateful if someone could explain the steps. Thanks!
I'm not surprised you're having difficulty deriving the result; I think it's rather badly explained in the paper. Here's how I would describe the derivation:
In the paper, $a^e$ is introduced not as an equilibrium solution, but as "the public's perception of $a$" (p. 172, above (4)). In (3) and (10), the expectation values $E(\theta|x)$ and $E(\theta|y)$, respectively, refer to calculations performed by the public, whereas the expectation operator $\mathsf E$ (you didn't reproduce the typographical distinction made in the paper) refers to calculations performed by the bureaucrat (who can use the correct value $a$). The public decides about the career prospects of the bureaucrat in accordance with its estimate of the bureaucrat's talent, and it can only do so based on its perception $a^e$ of the bureaucrat's effort $a$ and on the policy outcome $y$. The paper does not specify how the public models the difference between its perception $a^e$ and the actual value $a$, but it seems from (4) that the public simply calculates as if the actual effort were known to be $a^e$. If we transfer that to the present setting including the unobservable noise $\varepsilon$, the public will use the model
$$y=\theta^e+\varepsilon+a^e\;,$$
where $\theta^e$ is the public's idea of the talent $\theta$, which it models according to the given normal distribution. Since $y=\theta+\varepsilon+a$, we have $\theta^e=\theta+a-a^e$. Now the public will apply your result, which I'll rewrite for $c$, $d$ and $x$ since the version with $a$, $b$ and $y$ is rather confusing (with $a$ referring to $\theta$ rather than $a$ in the present case): For $x=c+d$, with $c$ and $d$ independent normally distributed random variables with zero mean (their variances may differ), $E[c|x]=(\sigma_c^2/\sigma_x^2)x$. Rewritten in terms of zero-mean variables, the public's model is
$$y-\bar\theta-a^e=(\theta^e-\bar\theta)+\varepsilon\;,$$
so we can apply the general result with $x=y-\bar\theta-a^e$, $c=\theta^e-\bar\theta$ and $d=\varepsilon$. Thus, the public's calculation of the expected talent of the bureaucrat given its perception $a^e$ of the effort and the observed policy outcome $y$ is
$$
\begin{aligned}
E[\theta^e|y] &= \bar\theta+E[c|x] \\
&= \bar\theta+(\sigma_c^2/\sigma_x^2)x \\
&= \bar\theta+(\sigma_\theta^2/\sigma_y^2)(y-\bar\theta-a^e) \\
&= \bar\theta+(\sigma_\theta^2/(\sigma_\theta^2+\sigma_\varepsilon^2))(\theta^e-\bar\theta+\varepsilon) \\
&= \bar\theta+(\sigma_\theta^2/(\sigma_\theta^2+\sigma_\varepsilon^2))(\theta+a-a^e-\bar\theta+\varepsilon)\;,
\end{aligned}
$$
which, after applying the bureaucrat's expectation operator $\mathsf E$ to both sides, agrees with the result in the paper.
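If you want to convince yourself numerically, here is a minimal simulation sketch. It is not taken from the paper. The function name `simulate_signal_extraction` and all parameter values are made up for illustration, and it assumes NumPy is available. It draws $\theta$ and $\varepsilon$, forms $y=\theta+\varepsilon+a$, checks that the slope of the best linear predictor of $\theta$ from $y$ is close to $\phi$, and checks that the public's estimate (computed with $a^e$ in place of $a$) averages to roughly $\bar\theta+\phi(a-a^e)$.

```python
import numpy as np


def simulate_signal_extraction(theta_bar=1.0, sigma_theta=2.0, sigma_eps=1.0,
                               a=0.5, a_e=0.3, n=200_000, seed=0):
    """Monte Carlo check of the signal extraction formula.

    All default values are arbitrary illustrative choices.
    """
    rng = np.random.default_rng(seed)

    # Draw talent and noise independently, then form the policy outcome.
    theta = rng.normal(theta_bar, sigma_theta, n)
    eps = rng.normal(0.0, sigma_eps, n)
    y = theta + eps + a

    # Theoretical weight on the signal.
    phi = sigma_theta**2 / (sigma_theta**2 + sigma_eps**2)

    # Empirical slope of the best linear predictor of theta given y:
    # Cov(theta, y) / Var(y), which should be close to phi.
    cov = np.cov(theta, y)
    empirical_slope = cov[0, 1] / cov[1, 1]

    # The public's estimate, computed as if the effort were a_e.
    public_estimate = theta_bar + phi * (y - theta_bar - a_e)

    return {
        "phi": phi,
        "empirical_slope": empirical_slope,
        "mean_public_estimate": public_estimate.mean(),
        # Expectation of the public's estimate under the true effort a.
        "predicted_mean": theta_bar + phi * (a - a_e),
    }


if __name__ == "__main__":
    for name, value in simulate_signal_extraction().items():
        print(f"{name}: {value:.4f}")
```

With $a=a^e$ the last two entries both reduce to $\bar\theta$, so on average the public gets the talent right exactly when its perception of the effort is correct.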