I am studying PDEs, in particular hyperbolic conservation laws, and we are using the method of characteristics to solve some of the problems.
Setting
Consider the fairly general PDE $$ U_t + \partial_x \big( a(x,t)U(x,t)\big) = 0\iff U_t + a\cdot U_x = - a_x\cdot U $$ where we assume $a:\mathbb{R}\times\mathbb{R}_{\geq 0}\to \mathbb{R}$ to be $C^1$, together with the initial condition $U(x,0) = F(x)$.
Method
To introduce a characteristic, we let $x$ and $t$ depend on a new variable $s$ and compute $$ \frac{d}{ds}U\big(x(s),t(s)\big) = \frac{dt}{ds}\cdot U_t + \frac{dx}{ds}\cdot U_x $$ Imposing $\frac{dt}{ds}=1$ and $\frac{dx}{ds}=a$ turns the right-hand side into $U_t + a\cdot U_x$, so the PDE lets us rewrite the above equation as $$ \frac{d}{ds}U\big(x(s),t(s)\big) = -(\partial_x a)\ U\big(x(s),t(s)\big) $$
The condition $\frac{dt}{ds}=1\implies t(s) = s + t_0 $ can be reduced to $t = s$ since we want $t(0) = 0$.
Cleaning up things we get two equations
\begin{gather}
\frac{d}{dt}U\big(x(t),t\big) = -(\partial_x a)\ U\big(x(t),t\big) \tag{1}\\
\frac{dx}{dt} = a \tag{2}
\end{gather}
Knowing this, if we are given $(x,t)$ in the domain, we just have to trace back along the characteristic through $(x,t)$ to find its starting point $x_0$, then integrate $(1)$ from the initial value $F(x_0)$ to get the value of our solution.
Question(s)
How do you interpret $(2)$?
Should we interpret $a$ as $a(x,t)$ or as $a(x(t),t)$? I'm not sure if it makes a lot of difference.
How do you get back along the characteristics to the initial value?
Assuming we interpret $a$ as $a(x(t),t)$ above, do we just work with $(2)$ as an ODE?
Let the solution to $(2)$ be a family $\varphi(t,c)$, where $c\in\mathbb{R}$ is fixed by the initial condition of $(2)$ (note that no initial condition has been imposed yet). Then does getting $x_0$ for a given pair $(x,t)$ mean finding $c$ such that $\varphi(t,c) = x$, meaning $x_0$ is the number such that $\varphi(t,x_0) = x $?
As mentioned in the question, applying the product rule to $U_t + \left(a U\right)_x = 0$ with $a = a(x,t)$ leads to the linear, variable-coefficient equation $$ U_t + a U_x = -a_x U . $$ The method of characteristics in parametric form reads $$ \textstyle \frac{\text d}{\text d t} x = a, \qquad \frac{\text d}{\text d t} U = -a_x U , \tag{*}$$ where implicitly $x = x(t)$, $U = U(x(t), t)$, and $a$, $a_x$ are evaluated at $(x(t), t)$, which answers your first question. Indeed, the total time derivative of $U(x(t), t)$ is $$ \textstyle \frac{\text d}{\text d t} U = \textstyle \partial_x U\, \frac{\text d}{\text d t} x + \partial_t U = a U_x + U_t = -a_x U . $$ Now, to answer your second question, observe that $\text{(*)}$ with $x(0) = x_0$ and $U(0) = F(x_0)$ is an initial-value problem for a system of ODEs (non-autonomous in general, since $a$ may depend on $t$). Under suitable assumptions, the Picard–Lindelöf theorem provides a unique solution $x(t)$, $U(t)$ on some interval, and this can be repeated for various initial positions $x_0$. Recovering $x_0$ from an arbitrary $(x,t)$ then amounts to inverting the map $x_0 \mapsto x(t)$ at fixed $t$, as you describe; this is not trivial in general.
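When no closed form is available, one way to recover $x_0$ numerically is to integrate the characteristic ODE backward in time from $(x,t)$ to $t=0$, accumulating $\int a_x$ along the way. The following is only a minimal sketch of that idea: the function names, the hand-written Runge-Kutta stepper and the test coefficient $a(x,t) = xt$ are assumptions made for illustration, not part of the discussion above.

```python
import math

def rk4_step(f, s, y, h):
    """One classical Runge-Kutta step for y' = f(s, y), with y a list of floats."""
    k1 = f(s, y)
    k2 = f(s + h / 2, [yi + h / 2 * k for yi, k in zip(y, k1)])
    k3 = f(s + h / 2, [yi + h / 2 * k for yi, k in zip(y, k2)])
    k4 = f(s + h, [yi + h * k for yi, k in zip(y, k3)])
    return [yi + h / 6 * (p + 2 * q + 2 * r + w)
            for yi, p, q, r, w in zip(y, k1, k2, k3, k4)]

def solve_along_characteristic(a, a_x, F, x, t, steps=2000):
    """Approximate (x0, U(x, t)) by tracing the characteristic through (x, t) back to time 0.

    The state is [position, I], where I' = a_x accumulates the integral of a_x
    along the curve. Assumes the characteristic exists on the whole of [0, t].
    """
    def rhs(s, y):
        return [a(y[0], s), a_x(y[0], s)]

    h = -t / steps          # negative step: march from time t down to 0
    y, s = [x, 0.0], t
    for _ in range(steps):
        y = rk4_step(rhs, s, y, h)
        s += h
    x0, I = y               # I = -(integral of a_x from 0 to t)
    # dU/dt = -a_x U  implies  U(t) = F(x0) * exp(-integral_0^t a_x) = F(x0) * exp(I)
    return x0, F(x0) * math.exp(I)

# Hypothetical test case (not from the discussion above): a(x, t) = x*t,
# for which x0 = x*exp(-t^2/2) and U = F(x0)*exp(-t^2/2) in closed form.
F = lambda x: math.exp(-x * x)
x0, U = solve_along_characteristic(lambda x, t: x * t, lambda x, t: t, F, 1.0, 1.0)
print(x0, U)
```

Integrating backward avoids a root-finding step for $x_0$, at the cost of one ODE solve per query point $(x,t)$. In practice a library integrator with error control would be a more robust choice than the fixed-step scheme used here.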
Example. For illustration, let's simplify things by considering the variable-coefficient transport equation where $a$ depends on space only, namely $a(x,t) = -x^2$. Integrating $\text{(*)}$, we find $x = \frac{x_0}{1 + x_0 t}$ and $U = F(x_0) \left(1 + x_0 t\right)^2$, i.e. after eliminating $x_0 = \frac{x}{1-xt}$, $$ U(x,t) = F\big(\tfrac{x}{1 - x t}\big)\, \left(1 - x t\right)^{-2} $$ for $xt<1$. We observe that not every $(x,t)$ can be linked to an initial position $x_0 = \frac{x}{1-xt}$: the expression is singular at $xt = 1$, and the formula only covers the region $xt<1$.
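As a sanity check, the closed form can be substituted into $U_t + (aU)_x = 0$ numerically. The sketch below estimates the residual with central differences; the Gaussian initial profile `F` and the step size `h` are arbitrary choices made for illustration only.

```python
import math

def F(x):
    # Hypothetical smooth initial profile, chosen only for illustration.
    return math.exp(-x * x)

def U(x, t):
    """Closed-form solution from the example, valid in the region x*t < 1."""
    if x * t >= 1:
        raise ValueError("x*t >= 1 lies outside the region covered by the formula")
    return F(x / (1 - x * t)) / (1 - x * t) ** 2

def pde_residual(x, t, h=1e-4):
    """Central-difference estimate of U_t + (a U)_x with a(x) = -x^2."""
    flux = lambda y: -y * y * U(y, t)
    U_t = (U(x, t + h) - U(x, t - h)) / (2 * h)
    flux_x = (flux(x + h) - flux(x - h)) / (2 * h)
    return U_t + flux_x

# The residual should be small (limited by the finite-difference error)
# wherever x*t < 1, and U(x, 0) should reproduce F(x).
print(pde_residual(0.3, 0.5), U(0.7, 0.0) - F(0.7))
```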
Remark. For this special case where $a(x,t) = -x^2$ depends on space only, introducing $Q = a U$ leads to $Q_t + a Q_x = 0$. Hence the method of characteristics reads $\frac{\text d x}{\text d t} = a$ and $\frac{\text d Q}{\text d t} = 0$, which might then be tackled in the usual way. In fact, we find $x = \frac{x_0}{1 + x_0 t}$ and $Q = -x_0^2 F(x_0)$, so that finally the same expression for $U$ is obtained. Again, the solution might not be defined for every $(x,t)$ as illustrated also in this post.
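For completeness, the two steps of the remark can be spelled out as follows, using only the assumption already made there that $a$ does not depend on $t$ (so $Q_t = a U_t$): $$ Q_t + a Q_x = a\, U_t + a \left(a U\right)_x = a \big( U_t + \left(a U\right)_x \big) = 0 . $$ Going back to $U = Q/a$, which requires $x \neq 0$, and using $x = \frac{x_0}{1 + x_0 t}$ gives $$ U = \frac{-x_0^2\, F(x_0)}{-x^2} = F(x_0) \left(1 + x_0 t\right)^2 , $$ in agreement with the example.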