Proof of the following theorem on Exponential Families


Unfortunately, I can't find a proof of Theorem 3.4.2 on exponential families in Statistical Inference by Casella and Berger.

It says the following:

If $X$ is a random variable with pdf or pmf of the form defining an exponential family, with $\theta$ and $x$ both vectors, then

$$ E\left[\frac{\partial}{\partial \theta_j} w^T(\theta) T(x)\right] = - \frac{\partial}{\partial \theta_j} \log c(\theta) $$

Could someone possibly link me to a proof or explain the general process? My attempts proved to be unsuccessful...

Best answer

Here is a complete derivation for future reference, since I couldn't find other examples of this form. The closest was https://en.wikipedia.org/wiki/Exponential_family#Properties.

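Start with the exponential family form of the pdf:

$$f(x|\theta) = h(x)c(\theta)e^{w^T(\theta)T(x)}$$

Since it is a pdf, it integrates to 1 (for a pmf, replace the integrals with sums):

$$\int f(x|\theta)\,dx = \int h(x)c(\theta)e^{w^T(\theta)T(x)}\,dx = 1$$

Differentiate both sides with respect to $\theta_j$:

$$\frac{\partial}{\partial\theta_j}\int f(x|\theta)\,dx = \frac{\partial}{\partial\theta_j}\int h(x)c(\theta)e^{w^T(\theta)T(x)}\,dx = 0$$

Assuming the derivative can be moved inside the integral, apply the product rule. Writing $c'(\theta)$ as shorthand for $\frac{\partial}{\partial\theta_j}c(\theta)$:

$$\int h(x)c'(\theta)e^{w^T(\theta)T(x)}\,dx + \int h(x)c(\theta)e^{w^T(\theta)T(x)}\left[\frac{\partial}{\partial\theta_j}w^T(\theta)T(x)\right]dx = 0$$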

The second integral is exactly an expectation, because $E\left[\frac{\partial}{\partial\theta_j}w^T(\theta)T(x)\right] = \int \left[\frac{\partial}{\partial\theta_j}w^T(\theta)T(x)\right]h(x)c(\theta)e^{w^T(\theta)T(x)}\,dx$. So:

$$\int h(x)c'(\theta)e^{w^T(\theta)T(x)}\,dx + E\left[\frac{\partial}{\partial\theta_j}w^T(\theta)T(x)\right] = 0$$

By the chain rule, $\frac{\partial}{\partial\theta_j}\log c(\theta) = \frac{c'(\theta)}{c(\theta)}$, so $c'(\theta) = c(\theta)\frac{\partial}{\partial\theta_j}\log c(\theta)$.

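Substituting $c'(\theta) = c(\theta)\frac{\partial}{\partial\theta_j}\log c(\theta)$ into the first integral:

$$\int h(x)\left[\frac{\partial}{\partial\theta_j}\log c(\theta)\right]c(\theta)e^{w^T(\theta)T(x)}\,dx + E\left[\frac{\partial}{\partial\theta_j}w^T(\theta)T(x)\right] = 0$$

The factor $\frac{\partial}{\partial\theta_j}\log c(\theta)$ does not depend on $x$, so it can be pulled out of the integral:

$$\frac{\partial}{\partial\theta_j}\log c(\theta)\int h(x)c(\theta)e^{w^T(\theta)T(x)}\,dx + E\left[\frac{\partial}{\partial\theta_j}w^T(\theta)T(x)\right] = 0$$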

The remaining integral is the integral of the pdf over its whole support, which equals 1:

$$\frac{\partial}{\partial\theta_j}\log c(\theta) + E\left[\frac{\partial}{\partial\theta_j}w^T(\theta)T(x)\right] = 0$$

Rearranging gives the result:

$$E\left[\frac{\partial}{\partial\theta_j}w^T(\theta)T(x)\right] = -\frac{\partial}{\partial\theta_j}\log c(\theta)$$
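
As a sanity check, the identity can be verified numerically on a concrete family. The sketch below is not from the book. It assumes the binomial($n$, $p$) family with $n$ known, written as $h(x) = \binom{n}{x}$, $c(p) = (1-p)^n$, $w(p) = \log\frac{p}{1-p}$, $T(x) = x$. The function names (`theorem_sides`, `central_diff`, and so on) are made up for illustration, and the derivatives are approximated by central finite differences rather than computed exactly.

```python
from math import comb, log

def binomial_pmf(x, n, p):
    """Binomial pmf: C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def log_c(p, n):
    """log c(p) for the binomial family, where c(p) = (1 - p)**n."""
    return n * log(1 - p)

def w(p):
    """Natural-parameter function w(p) = log(p / (1 - p))."""
    return log(p / (1 - p))

def central_diff(f, p, eps=1e-6):
    """Central finite-difference approximation of f'(p)."""
    return (f(p + eps) - f(p - eps)) / (2 * eps)

def theorem_sides(n, p):
    """Return (LHS, RHS) of E[d/dp w(p) T(X)] = -d/dp log c(p), with T(x) = x."""
    dw = central_diff(w, p)
    # Expectation as a finite sum over the support {0, ..., n}
    lhs = sum(dw * x * binomial_pmf(x, n, p) for x in range(n + 1))
    rhs = -central_diff(lambda q: log_c(q, n), p)
    return lhs, rhs

lhs, rhs = theorem_sides(n=10, p=0.3)
print(lhs, rhs)  # the two values should agree up to finite-difference error
```

In this case the identity reads $\frac{E[X]}{p(1-p)} = \frac{n}{1-p}$, which recovers the familiar $E[X] = np$.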