I'm a student who recently started taking a course on cognitive robotics. The book I use is *Probabilistic Robotics* by Thrun, Burgard, and Fox.
In the EKF algorithm, the action model is linearized in the following way:
$$g(u_t, x_{t-1}) \approx g(u_t, \mu_{t-1}) + G_t\,(x_{t-1} - \mu_{t-1})$$
Here $g(u_t, x_{t-1})$ is the action model and $G_t$ is its Jacobian matrix with respect to the state $x_{t-1}$, evaluated at the mean $\mu_{t-1}$.
I don't see how this guarantees linearity, because $g$ could still be nonlinear in $u_t$. The authors don't explain why this is acceptable.
In other words, I expected a multivariate Taylor expansion yielding a function that is linear in both $u_t$ and $x_{t-1}$, so that we could apply the ordinary Kalman filter with $A$ and $B$ (the linear coefficients) set to the corresponding Jacobians.
In an Extended Kalman Filter, we always compute the various Jacobian matrices by differentiating with respect to the state (the unknown variables). This is because the EKF assumes the state is a Gaussian random variable (described by a mean and a covariance matrix). If we propagated it through a function that is nonlinear in the state, the new estimate would no longer be Gaussian. However, it is perfectly fine for the function, once linearized with respect to the state, to remain nonlinear in the action: the action $u_t$ is a known, deterministic input, not a random variable, so its nonlinearity never acts on the uncertainty.
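To make that concrete, here is a small numerical sketch in Python. The toy model $g(u, x) = \sin(u) + x^2$ is an assumption chosen purely for illustration; it is nonlinear in both $u$ and $x$, yet its linearization about the state mean $\mu$ is exactly affine in $x$ for any fixed $u$:

```python
import numpy as np

def g(u, x):
    """Toy motion model (assumed example): nonlinear in BOTH u and x."""
    return np.sin(u) + x**2

def g_lin(u, x, mu):
    """Linearization of g about the state mean mu; G = dg/dx evaluated at mu."""
    G = 2.0 * mu                      # Jacobian wrt the state x, at x = mu
    return g(u, mu) + G * (x - mu)

u, mu = 0.7, 1.5                      # fixed (known) control, state mean
x1, x2, a = 0.3, 2.1, 0.4

# Affine in x: convex combinations are preserved exactly, for any fixed u,
# even though g itself is nonlinear in u.
lhs = g_lin(u, a * x1 + (1 - a) * x2, mu)
rhs = a * g_lin(u, x1, mu) + (1 - a) * g_lin(u, x2, mu)
assert np.isclose(lhs, rhs)
```

Since $u$ is plugged in as a constant before the filter ever touches the uncertainty, the map applied to the Gaussian state is this affine function of $x$ alone.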
I typically use notation similar to the EKF Wikipedia article. In the prediction step, we predict the next state with the full nonlinear system model: $$ \hat{x}_k = f(x_{k-1}, u_{k-1}) $$ Note that we do not use the linear approximation to update the state itself.
Then we linearize $f$, taking $F_k = \partial f/\partial x$ evaluated at the current estimate, and use it to predict the uncertainty of the state: $$ \hat{P}_k = F_k P_{k-1} F_k^T + Q_k $$ What we are doing here is propagating the covariance of the state through a linear equation. That's why we need to linearize with respect to the state.
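The two steps above can be sketched in a few lines of Python. The unicycle motion model, the time step, and the noise values below are all assumptions made for the example; the point is only that the mean uses the nonlinear $f$ while the covariance uses the Jacobian with respect to the state:

```python
import numpy as np

DT = 0.1  # time step (assumed)

def f(x, u):
    """Nonlinear unicycle motion model (assumed): state x = [px, py, theta], control u = [v, w]."""
    px, py, th = x
    v, w = u
    return np.array([px + v * np.cos(th) * DT,
                     py + v * np.sin(th) * DT,
                     th + w * DT])

def F_jacobian(x, u):
    """Jacobian of f with respect to the STATE x only (not the control u)."""
    _, _, th = x
    v, _ = u
    return np.array([[1.0, 0.0, -v * np.sin(th) * DT],
                     [0.0, 1.0,  v * np.cos(th) * DT],
                     [0.0, 0.0,  1.0]])

# --- prediction step ---
x = np.array([0.0, 0.0, np.pi / 4])   # prior mean
P = np.diag([0.1, 0.1, 0.05])         # prior covariance
u = np.array([1.0, 0.2])              # known control input
Q = np.diag([0.01, 0.01, 0.005])      # process noise covariance (assumed)

x_pred = f(x, u)                      # mean: full nonlinear model
F = F_jacobian(x, u)
P_pred = F @ P @ F.T + Q              # covariance: linearized model
```

Notice that $u$ appears inside both `f` and `F_jacobian` in a nonlinear way; it is simply a known number at prediction time, so nothing about the Gaussian update breaks.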
I hope this helps.