I'm trying to understand the method of characteristics for a general first-order PDE $$F(\nabla u(x),u(x),x)=0\;\;\;\text{for all }x\in\Omega,\tag1$$ where $F:\mathbb R^d\times\mathbb R\times\overline\Omega\to\mathbb R$ is sufficiently regular and $\Omega\subseteq\mathbb R^d$ is open.
I've tried to follow the motivation/derivation presented in Section 3.2 of Evans' Partial Differential Equations, but I find the argument (and the notation) hard to follow.
Here's what I understand (but I'm unsure whether it is the correct approach to obtain the "characteristic equations"): Assume $F\in C^1(\mathbb R^d\times\mathbb R\times\Omega)$ and $u\in C^2(\Omega)$. Let $$E(x):=(\nabla u(x),u(x),x)\;\;\;\text{for }x\in\Omega$$ and $G:=F\circ E$. Then$^1$ $G\in C^1(\Omega)$ and $$\nabla G(x)=\nabla^2u(x)\nabla_1F(E(x))+\partial_2F(E(x))\nabla u(x)+\nabla_3F(E(x))\tag2$$ for all $x\in\Omega$.
Now let $M:=\{G=0\}$ and $x_0\in M$. Assuming that $G$ is a submersion at $x_0$, the tangent space of $M$ at $x_0$ is given by $$T_{x_0}\:M:=\mathcal N({\rm D}G(x_0)).\tag3$$ Let $\gamma:I\to M$ be a $C^1$-curve on $M$ with $\gamma(0)=x_0$, where $I$ is some neighborhood of $0\in\mathbb R$.
To ease the following, let \begin{align}z&:=u\circ\gamma;\\p&:=\nabla u\circ\gamma;\\ E_\gamma&:=E\circ\gamma;\\ G_\gamma&:=G\circ\gamma=F\circ E_\gamma=F(p,z,\gamma).\end{align}
Note that \begin{align}z'&=\langle\gamma',p\rangle;\\ p'&=\nabla^2u(\gamma)\gamma'.\tag4\end{align} Since $\gamma(I)\subseteq M$, we've got $G_\gamma=0$ and hence \begin{equation}\begin{split}0=G_\gamma'=\langle\gamma',\nabla G(\gamma)\rangle&=\langle\gamma',\nabla^2u(\gamma)\nabla_1F(E_\gamma)+\partial_2F(E_\gamma)p+\nabla_3F(E_\gamma)\rangle\\&=\langle p',\nabla_1F(E_\gamma)\rangle+z'\partial_2F(E_\gamma)+\langle\gamma',\nabla_3F(E_\gamma)\rangle\end{split}\tag5\end{equation} by $(2)$ and $(4)$, using the symmetry of $\nabla^2u$.
But this is the point where I got stuck. Instead of $(5)$, Evans derives $$0=\nabla^2u(\gamma)\nabla_1F(E_\gamma)+\partial_2F(E_\gamma)p+\nabla_3F(E_\gamma),\tag{5'}$$ which can "formally" be obtained by inserting $x=\gamma$ into $(2)$ (thereby ignoring that $\gamma$ is a function and that the chain rule should be applied).
Then he assumes that $$\gamma'=\nabla_1F(E_\gamma),\tag6$$ from which we easily obtain the "characteristic equations" which Evans presents: \begin{align}p'&=-\partial_2F(E_\gamma)p-\nabla_3F(E_\gamma);\\ z'&=\langle p,\nabla_1F(E_\gamma)\rangle;\\\gamma'&=\nabla_1F(E_\gamma).\tag7\end{align}
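As a concrete instance (an illustrative choice, not taken from the derivation above), take $d=2$ and $F(p,z,x):=p_1p_2-z$, i.e. the PDE $\partial_1u\,\partial_2u=u$. Then $\nabla_1F(p,z,x)=(p_2,p_1)$, $\partial_2F(p,z,x)=-1$ and $\nabla_3F(p,z,x)=0$, so $(7)$ should reduce to \begin{align}p'&=p;\\ z'&=2p_1p_2;\\ \gamma'&=(p_2,p_1),\end{align} where $p_1,p_2$ denote the components of $p$.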
However, while it's clearly fine to assume $(6)$ and see what we can derive from it, I have no idea how $(5')$ is justified or why it is at least sensible to assume that it holds. And, most importantly, aren't we able to derive a sensible set of "characteristic equations" from $(5)$, which - in contrast to $(5')$ - is an equation that can be derived rigorously (as I have shown) under the stated regularity assumptions?
$^1$ $\nabla_iF$ denotes the gradient of the map $F$ in the $i$th argument and I write $\partial_2F$ for the derivative in the scalar second argument of $F$.
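The trivial, but crucial, thing I was missing is that the derivation is based on the assumption that $u$ is a solution of $(1)$. But that means that $G\equiv0$ on $\Omega$, i.e. $M=\Omega$, and hence $\nabla G=0$ on $\Omega$. So the right-hand side of $(2)$ vanishes at every $x\in\Omega$, in particular at every $x\in\gamma(I)$, which is precisely $(5')$; no chain rule along $\gamma$ is needed. Combined with $(4)$, we immediately obtain that if $\gamma$ satisfies the third equation in $(7)$, the first two equations in $(7)$ must hold as well.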
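To convince myself, here is a minimal SymPy sketch of this argument for one concrete case. The choices $F(p,z,x)=p_1p_2-z$ and $u(x)=(x_1+4x_2)^2/16$ are assumptions made purely for illustration (they are not part of the derivation above), and all variable names are my own. The script checks that $u$ solves $(1)$, that the right-hand side of $(2)$ vanishes identically, and that choosing $\gamma'=\nabla_1F$ then gives the first two equations in $(7)$ at every point.

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2", real=True)
p1, p2, z = sp.symbols("p1 p2 z", real=True)

# Illustrative assumptions: F(p, z, x) = p1*p2 - z and a known solution u.
F = p1 * p2 - z
u = (x1 + 4 * x2) ** 2 / 16

grad_u = sp.Matrix([sp.diff(u, x1), sp.diff(u, x2)])
hess_u = sp.hessian(u, [x1, x2])

# Evaluate everything at E(x) = (grad u(x), u(x), x).
at_E = {p1: grad_u[0], p2: grad_u[1], z: u}

# G = F o E; it should vanish identically because u solves (1).
G = sp.expand(F.subs(at_E))

grad1_F = sp.Matrix([sp.diff(F, p1), sp.diff(F, p2)]).subs(at_E)
d2_F = sp.diff(F, z).subs(at_E)
grad3_F = sp.Matrix([sp.diff(F, x1), sp.diff(F, x2)]).subs(at_E)

# Right-hand side of (2); it equals grad G, hence should be zero.
rhs_of_2 = (hess_u * grad1_F + d2_F * grad_u + grad3_F).applyfunc(sp.expand)

# With gamma' = grad1_F, (4) gives p' = hess_u * gamma' and z' = <gamma', p>.
p_dot_from_4 = hess_u * grad1_F
p_dot_from_7 = -d2_F * grad_u - grad3_F
p_dot_gap = (p_dot_from_4 - p_dot_from_7).applyfunc(sp.expand)
z_dot_from_4 = sp.expand((grad1_F.T * grad_u)[0])

print("G =", G)
print("rhs of (2) =", rhs_of_2.T)
print("p' gap =", p_dot_gap.T)
```

This only verifies one example pointwise; it is not a proof, but it shows that $(5')$ is an identity in $x$ once $u$ is a solution, independently of any curve.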