I'm trying to understand the parametric derivative identity $$ \frac{dy(t)}{dt} = \frac{dy}{dx} \frac{dx}{dt} \tag{1}$$
I feel this is not rigorous because we are saying that $y(t)$ can be written as $y(x(t))$. What conditions are required on $y$ for this to be true?
For me, getting a better handle on derivatives only happened after looking into the more general notions related to differentiation, where the meaning of $d$ becomes clearer. Keywords to consider are "total derivative", "Fréchet derivative", and "differential of a function". I find that there is some overlap between all these and the naming is somewhat fluid, but in general what you get is a definition of the differential as a linear approximation of the function around a point, which is really just a generalization of the (one-dimensional) derivative on $\mathbb R$.
For a given function $f:X\to Y$ ($X$ and $Y$ can be arbitrary normed vector spaces at their most general, but the same works for $\mathbb R^n$ and $\mathbb R$), its differential at a point $x\in X$ is defined as the (unique when it exists) linear operator $L$ (think matrix multiplication for finite-dimensional vector spaces and multiplication by a constant for $\mathbb R$) such that the value of $f$ around $x$ can be approximated as $$ f(x + h) = f(x) + L(h) + \epsilon(h) $$ where the error $\epsilon(h)$ goes to zero faster than the norm of $h$. This translates to the limit condition $$ \lim_{\|h\|_X\to 0} \frac{\|\epsilon(h)\|_Y}{\|h\|_X}=0 $$ which turns into the regular definition of the derivative for functions on $\mathbb R$. When this linear operator $L$ exists, we say that $f$ is differentiable at $x$ and we write its value as $df(x)$. A key observation here is that $df(x)$ in general is not a number, but a function.
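To make the limit condition concrete, here is a minimal numerical sketch. The function $f(x, y) = x^2 + 3y$, the base point, and the direction of $h$ are my own choices for illustration, and `L` is the candidate differential built from the gradient. The point is only that the ratio $\|\epsilon(h)\| / \|h\|$ shrinks as $h$ shrinks.

```python
import math

def f(x, y):
    # Example map f: R^2 -> R (chosen for illustration, not from the question).
    return x * x + 3.0 * y

def L(x, y, h):
    # Candidate differential df(x, y): the linear map h -> grad f(x, y) . h
    return 2.0 * x * h[0] + 3.0 * h[1]

def error_ratio(x, y, h):
    # |epsilon(h)| / ||h||, where epsilon(h) = f(p + h) - f(p) - L(h)
    eps = f(x + h[0], y + h[1]) - f(x, y) - L(x, y, h)
    return abs(eps) / math.hypot(h[0], h[1])

point = (1.0, 2.0)
# Shrink h along the fixed direction (1, -1) and watch the ratio go to zero.
ratios = [error_ratio(point[0], point[1], (s, -s)) for s in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

For this particular $f$ the error is exactly $h_1^2$, so the ratio decays linearly in the size of $h$, which is what "goes to zero faster than the norm of $h$" requires.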
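The main property that demystifies $d$ (at least for me) is the chain rule for the differential, which holds whenever $g$ is differentiable at $x$ and $f$ is differentiable at $g(x)$: $$ d(f \circ g)(x) = df(g(x)) \circ dg(x) $$ Note that the right-hand side is a composition of two linear operators, which in finite dimensions is just a matrix product.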
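As a sanity check of the chain rule, here is a small sketch in which the differentials are written out as matrices. The maps $g(t) = (\cos t, \sin t)$ and $f(u, v) = u^2 v$ are assumptions picked for illustration. The differential $dg(t)$ is a $2 \times 1$ column, $df(u, v)$ is a $1 \times 2$ row, and their composition is the $1 \times 1$ product, which is compared against a central finite difference of $f \circ g$.

```python
import math

def g(t):
    # g: R -> R^2 (illustrative choice)
    return (math.cos(t), math.sin(t))

def dg(t):
    # Differential of g at t, stored as the entries of a 2x1 column matrix.
    return (-math.sin(t), math.cos(t))

def f(u, v):
    # f: R^2 -> R (illustrative choice)
    return u * u * v

def df(u, v):
    # Differential of f at (u, v), stored as the entries of a 1x2 row matrix.
    return (2.0 * u * v, u * u)

def d_composite(t):
    # Chain rule: d(f o g)(t) = df(g(t)) o dg(t), i.e. row times column.
    row = df(*g(t))
    col = dg(t)
    return row[0] * col[0] + row[1] * col[1]

def finite_difference(t, step=1e-6):
    # Central difference approximation of the derivative of f o g at t.
    return (f(*g(t + step)) - f(*g(t - step))) / (2.0 * step)

t0 = 0.7
chain = d_composite(t0)
numeric = finite_difference(t0)
print(chain, numeric)
```

The two printed values should agree up to finite-difference error.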
Now for the actual answer to your question, consider first a function $x$ of time. When differentiable, the value of the differential $dx(t)$ will be a linear function of $h$, which for $\mathbb R$ means it's of the form $k \cdot h$. We know the value of $k$ depends on $t$ also, so we'll denote it with $x'(t)$ (recognize here the derivative). So we have $dx(t)(h) = x'(t) \cdot h$.
So we have $dx$ already but we are missing $dt$. We can consider that $dt$ stands for the differential of the identity function on time, so $dt(t)(h) = 1 \cdot h$, which we can substitute into the previous relation to get $dx(t)(h) = x'(t) \cdot dt(t)(h)$. From here we can (again with some abuse of notation) obtain the derivative as $x'(t) = \frac{dx}{dt}(t)$. Using this we can rewrite the differential as: $$ dx(t)(h) = \frac{dx}{dt}(t) \cdot h $$
For the last step, we add $y$ as a function of time. We also know that $y$ only depends on time via $x$, so in fact there exists a different function $f$ of $x$ so that $y = f \circ x$. This function $f$ is usually implicit, which in my opinion can cause a lot of confusion. We apply the chain rule to $y$: $$ dy(t) = d(f \circ x)(t) = df(x(t)) \circ dx(t) \Rightarrow \frac{dy}{dt}(t) \cdot h = \frac{df}{dx}(x(t)) \cdot \frac{dx}{dt}(t) \cdot h $$ Next, we ignore the distinction between $f$ and $y$, omit the points where the derivatives are evaluated, and obtain $$ \frac{dy(t)}{dt} = \frac{dy}{dx} \cdot \frac{dx}{dt} $$ but for me a more appropriate simplification would be $$ \frac{dy}{dt} = \left(\frac{df}{dx} \circ x\right) \cdot \frac{dx}{dt} $$
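A quick worked example may help keep $f$ and $y$ apart. The specific functions are my own choice for illustration: take $x(t) = t^2$ and $f(x) = \sin x$, so that $y(t) = (f \circ x)(t) = \sin(t^2)$. Then $$ \frac{df}{dx}(x) = \cos x, \qquad \left(\frac{df}{dx} \circ x\right)(t) = \cos(t^2), \qquad \frac{dx}{dt}(t) = 2t $$ and therefore $$ \frac{dy}{dt}(t) = \cos(t^2) \cdot 2t $$ which matches differentiating $\sin(t^2)$ directly. Note that $y$ is a function of $t$ while $f$ is a function of $x$; writing $\frac{dy}{dx}$ silently replaces $y$ by $f$.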