A process $(X_{t})_{t \in T}$ is $(\mathcal{F}_{t})_{t \in T}$-adapted if for every $t$, $X_{t}$ is $\mathcal{F}_{t}$-measurable.
But since the variable $X_{t}$ is interpreted as the state of the process at time $t$ and $\mathcal{F}_{t}$ is interpreted as the information known at time $t$, why don't we require $\mathcal{F}_{s} \subset \sigma(X_{t})$ for $s < t$?
When you start learning martingale theory, the filtrations that you see are almost always just the natural filtration $\mathcal{F}_t = \sigma(X_s : s \leq t)$, and in this case it's clear that $X$ is adapted to $\mathcal{F}$.
Indeed, most filtrations are defined as a natural filtration for some process. However, we might then construct new processes from our original one without wanting to change the underlying filtration. A classic example would be the maximum process $$ M_t = \sup_{s\leq t} X_s, $$ which is adapted to $\mathcal{F}$, but for which $\mathcal{F}_t$ typically contains strictly more information than $\sigma(M_s : s \leq t)$.
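To make the loss of information concrete, here is a small sketch under an assumption of my own: take $X$ to be a simple random walk started at $0$ (the question does not fix a particular process). The helper names `running_max`, `walk_paths` and `atoms_of_max` are hypothetical, chosen only for this illustration. The code lists all paths of $X$ up to time $2$ and groups them by the trajectory of the running maximum $M$.

```python
from itertools import product


def running_max(path):
    """Return (M_0, ..., M_n), where M_t = max(X_0, ..., X_t)."""
    out, best = [], path[0]
    for x in path:
        best = max(best, x)
        out.append(best)
    return tuple(out)


def walk_paths(n):
    """All paths (X_0, ..., X_n) of a simple random walk started at 0 (assumed model)."""
    paths = []
    for steps in product((-1, 1), repeat=n):
        path = [0]
        for step in steps:
            path.append(path[-1] + step)
        paths.append(tuple(path))
    return paths


def atoms_of_max(n):
    """Group the paths of X by the trajectory of M that they produce."""
    atoms = {}
    for path in walk_paths(n):
        atoms.setdefault(running_max(path), []).append(path)
    return atoms


if __name__ == "__main__":
    for m_path, x_paths in sorted(atoms_of_max(2).items()):
        print(m_path, "<-", x_paths)
```

In this toy model the two paths $(0,-1,-2)$ and $(0,-1,0)$ share the same maximum trajectory $(0,0,0)$, so the event $\{X_2 = -2\}$ lies in $\mathcal{F}_2$ but cannot be decided from $M_0, M_1, M_2$ alone.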
Also, it is useful to make statements comparing processes like "the maximum of two submartingales is a submartingale." In order to even make sense of this, the two processes must be submartingales with respect to the same filtration even though they will most likely have different natural filtrations.
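As a sketch of why the common filtration matters in that statement, assume $X$ and $Y$ are both submartingales with respect to the same filtration $\mathcal{F}$ and let $s \leq t$. Monotonicity of conditional expectation gives $$ \mathbb{E}[X_t \vee Y_t \mid \mathcal{F}_s] \geq \mathbb{E}[X_t \mid \mathcal{F}_s] \geq X_s, $$ and the same argument with the roles of $X$ and $Y$ swapped gives $\mathbb{E}[X_t \vee Y_t \mid \mathcal{F}_s] \geq Y_s$. Together, $$ \mathbb{E}[X_t \vee Y_t \mid \mathcal{F}_s] \geq X_s \vee Y_s. $$ Adaptedness and integrability of $X \vee Y$ follow because the maximum of two $\mathcal{F}_t$-measurable variables is $\mathcal{F}_t$-measurable and $|X_t \vee Y_t| \leq |X_t| + |Y_t|$. Every step conditions on the same $\mathcal{F}_s$, which is exactly where the shared filtration is used.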