I am currently studying the exponential family, where the derivative of the log normalizer is derived as shown below. I don't really understand how we get from step 2 to step 3. Also, why is the expectation in step 4 taken with respect to $p_\eta$?
$$ a(\eta) = \log{\int h(x)\exp\{\eta^Tt(x)\}dx} $$
$$ \frac{da(\eta)}{d\eta} = \frac{\int h(x)\exp\{\eta^Tt(x)\}t(x)dx}{\int h(x)\exp\{\eta^Tt(x)\}dx} $$
$$ \frac{da(\eta)}{d\eta} = \int h(x)\exp\{\eta^Tt(x)-a(\eta)\}t(x)dx $$
$$ \frac{da(\eta)}{d\eta} = E_{p_\eta}[t(X)] $$
In $\dfrac{\int h(x)\exp\{\eta^Tt(x)\}t(x)\,dx}{\int h(x)\exp\{\eta^Tt(x)\}\,dx}$ the denominator is $\int h(x)\exp\{\eta^Tt(x)\}\,dx = \exp\{a(\eta)\}$, by the definition of $a(\eta)$. It is constant with respect to $x$, so it can be moved inside the integral in the numerator:
$$\begin{aligned}
\dfrac{\int h(x)\exp\{\eta^Tt(x)\}t(x)\,dx}{\int h(x)\exp\{\eta^Tt(x)\}\,dx}
&=\dfrac{\int h(x)\exp\{\eta^Tt(x)\}t(x)\,dx}{\exp\{a(\eta)\}} \\
&=\exp\{-a(\eta)\}\int h(x)\exp\{\eta^Tt(x)\}t(x)\,dx \\
&=\int h(x)\exp\{\eta^Tt(x)-a(\eta)\}t(x)\,dx
\end{aligned}$$
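Meanwhile $h(x)\exp\{\eta^Tt(x)-a(\eta)\}$ is the density of the particular member of this exponential family with parameter $\eta$, written here as $p_\eta(x)$. The integral in step 3 is therefore the expectation of $t(X)$ when $X$ has density $p_\eta$, which is why the expectation in step 4 is taken with respect to $p_\eta$: $$\mathbb E_{p_\eta}[t(X)]= \int p_\eta(x)t(x)\,dx= \int h(x)\exp\{\eta^Tt(x)-a(\eta)\}t(x)\,dx$$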
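If it helps, the identity can be checked numerically on a concrete case. The sketch below is only an illustration and is not part of the derivation above. It assumes the exponential distribution on $x \ge 0$ written in this form, with $h(x) = 1$, $t(x) = x$ and $\eta < 0$, for which $a(\eta) = -\log(-\eta)$ and so $\frac{da(\eta)}{d\eta} = -\frac{1}{\eta}$. The function names, the truncation point `upper`, and the step sizes are arbitrary choices made for this example.

```python
import math

def integrate(f, lo, hi, n=200_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(f(lo + (i + 0.5) * step) for i in range(n))

def log_normalizer(eta, upper=60.0):
    # a(eta) = log of the integral of h(x) * exp(eta * t(x)),
    # assuming h(x) = 1 and t(x) = x on x >= 0.
    # The infinite range is truncated at `upper` (an assumption that is
    # harmless here because exp(eta * x) is negligible there for eta < 0).
    return math.log(integrate(lambda x: math.exp(eta * x), 0.0, upper))

def mean_of_t(eta, upper=60.0):
    # E_{p_eta}[t(X)] = integral of t(x) * h(x) * exp(eta * t(x) - a(eta))
    a = log_normalizer(eta, upper)
    return integrate(lambda x: x * math.exp(eta * x - a), 0.0, upper)

def derivative_of_a(eta, eps=1e-4):
    # Central finite difference approximating da/d(eta).
    return (log_normalizer(eta + eps) - log_normalizer(eta - eps)) / (2 * eps)

eta = -2.0
lhs = derivative_of_a(eta)   # numerical da/d(eta)
rhs = mean_of_t(eta)         # numerical E[t(X)] under p_eta
closed_form = -1.0 / eta     # known mean of this exponential distribution

print(lhs, rhs, closed_form)
```

The three printed values should agree up to small numerical error, which is what step 4 claims: differentiating the log normalizer gives the mean of the sufficient statistic under $p_\eta$.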