Two equivalent ways of applying the activation function in an RNN


I don't understand why the following two ways of applying the activation function in an RNN are equivalent:

First way: $$ h_t = W\sigma(h_{t-1}) + U x_t + b $$

Second way: $$ g_t = \sigma(Wg_{t-1} + U x_t + b) $$

I understand that $g_t$ and $h_t$ are not the same quantity, but I don't understand how the two are related or in what sense the two formulations are equivalent.

Could you please explain it to me or point me to a resource that does?
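One possible reading, sketched here under an assumption that is not stated in the question: the two states are linked by $g_t = \sigma(h_t)$, with matching initial states $g_0 = \sigma(h_0)$. Substituting $\sigma(h_{t-1}) = g_{t-1}$ into the first recurrence gives $h_t = W g_{t-1} + U x_t + b$, and applying $\sigma$ to both sides gives $g_t = \sigma(W g_{t-1} + U x_t + b)$, which is the second recurrence. In that sense $h_t$ would be the pre-activation and $g_t$ the post-activation of the same network. The check below is only a numerical sanity test of that reading; the choice of $\tanh$ for $\sigma$, the dimensions, and the helper names `sigma` and `affine` are all illustrative assumptions.

```python
import math
import random


def sigma(v):
    """Element-wise activation; tanh is an assumed choice for sigma."""
    return [math.tanh(z) for z in v]


def affine(W, U, b, state, x):
    """Return W @ state + U @ x + b using plain Python lists."""
    return [
        sum(w * s for w, s in zip(W_row, state))
        + sum(u * xi for u, xi in zip(U_row, x))
        + b_i
        for W_row, U_row, b_i in zip(W, U, b)
    ]


random.seed(0)
n_hidden, n_input, n_steps = 4, 3, 6  # arbitrary illustrative sizes


def rand_vec(n):
    return [random.gauss(0.0, 1.0) for _ in range(n)]


W = [rand_vec(n_hidden) for _ in range(n_hidden)]
U = [rand_vec(n_input) for _ in range(n_hidden)]
b = rand_vec(n_hidden)
xs = [rand_vec(n_input) for _ in range(n_steps)]

h = rand_vec(n_hidden)  # h_0: arbitrary initial pre-activation state
g = sigma(h)            # g_0 = sigma(h_0): the assumed link between the two

max_gap = 0.0
for x in xs:
    h = affine(W, U, b, sigma(h), x)   # first way:  h_t = W sigma(h_{t-1}) + U x_t + b
    g = sigma(affine(W, U, b, g, x))   # second way: g_t = sigma(W g_{t-1} + U x_t + b)
    # Compare g_t with sigma(h_t) at every step.
    gap = max(abs(gi - si) for gi, si in zip(g, sigma(h)))
    max_gap = max(max_gap, gap)

print("largest |g_t - sigma(h_t)| over all steps:", max_gap)
```

If the assumed link holds, the printed gap should be at the level of floating-point error for every step. Note that this only works when the initial states match; starting the second recurrence from an unrelated $g_0$ would generally give a different trajectory.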