Suppose we have a family of functions, indexed by a parameter $n$, that governs some process of interest, $$F:=\{f_n(x)\}$$
Suppose further that we have collected some data for this process, $$D:=(x_1,y_1), (x_2,y_2), \ldots$$ Then we have $$MSE(f_n,D) = \sum_k (f_n(x_k) - y_k)^2$$
We are now in a position to solve for the best-fit $n$, using differentiation, SGD, or any other optimization method.
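To make the setup concrete, here is a minimal sketch of the "differentiation" route. It assumes a hypothetical one-parameter family $f_n(x) = n x^2$ and made-up noiseless data; neither is part of the question, they are only there so that the minimizer has a closed form.

```python
# Hypothetical one-parameter family f_n(x) = n * x**2 (an assumption for illustration).
def f(n, x):
    return n * x ** 2


def mse(n, data):
    """Sum of squared residuals, as in the formula above (no 1/N factor)."""
    return sum((f(n, x) - y) ** 2 for x, y in data)


def best_fit_n(data):
    """Closed-form minimizer obtained by setting d(MSE)/dn = 0 for f_n(x) = n * x**2."""
    num = sum(x ** 2 * y for x, y in data)
    den = sum(x ** 4 for x, _ in data)
    return num / den


# Made-up sample data generated from n = 3 with no noise.
D = [(x, 3.0 * x ** 2) for x in (0.5, 1.0, 1.5, 2.0)]
n_hat = best_fit_n(D)
```

For a family where the derivative with respect to $n$ cannot be solved by hand, the same `mse` function could be handed to a numerical optimizer instead.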
However, what happens if we also collected derivative data for the same process, $$D':=(x'_1,y'_1),(x'_2,y'_2),\ldots$$ For example, we might have collected data on the movement of a baseball, which includes position measurements as well as velocity measurements. How do we formulate the MSE now to find the best fit? I would think something like $$MSE(f_n,D,D') = \sum_k (f_n(x_k) - y_k)^2 + \alpha\sum_k (f'_n(x'_k) - y'_k)^2 $$ But what should $\alpha$ be? Is this even correct? Should the velocity term have a higher-order exponent? Would it be better to use the velocity to get additional approximate values for the position and just leave the velocity data out of the MSE?
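As a way of experimenting with the proposed objective, here is a minimal sketch of the weighted sum of position and derivative residuals. It again assumes a hypothetical family $f_n(x) = n x^2$, so that $f'_n(x) = 2 n x$, and uses made-up data in which the positions are consistent with $n = 3$ and the velocities with $n = 5$. It does not settle what $\alpha$ should be; it only shows how $\alpha$ moves the fit between the two data sets. Note that the two sums carry different units (position squared versus velocity squared), so $\alpha$ also has to absorb that difference in scale.

```python
# Hypothetical family f_n(x) = n * x**2 with derivative f'_n(x) = 2 * n * x
# (both are assumptions for illustration, not taken from the question).
def f(n, x):
    return n * x ** 2


def df(n, x):
    return 2.0 * n * x


def combined_loss(n, data, ddata, alpha):
    """Position residuals plus alpha times derivative residuals."""
    pos = sum((f(n, x) - y) ** 2 for x, y in data)
    vel = sum((df(n, x) - dy) ** 2 for x, dy in ddata)
    return pos + alpha * vel


def best_fit_n(data, ddata, alpha):
    """Closed-form minimizer of combined_loss for this particular family.

    Setting d(loss)/dn = 0 gives
    n = (sum x^2 y + 2 alpha sum x' y') / (sum x^4 + 4 alpha sum x'^2).
    """
    num = sum(x ** 2 * y for x, y in data) + 2.0 * alpha * sum(x * dy for x, dy in ddata)
    den = sum(x ** 4 for x, _ in data) + 4.0 * alpha * sum(x ** 2 for x, _ in ddata)
    return num / den


xs = (0.5, 1.0, 1.5, 2.0)
D = [(x, 3.0 * x ** 2) for x in xs]        # made-up positions, consistent with n = 3
D_prime = [(x, 2.0 * 5.0 * x) for x in xs]  # made-up velocities, consistent with n = 5

n_pos_only = best_fit_n(D, D_prime, alpha=0.0)   # ignores the velocity data
n_mixed = best_fit_n(D, D_prime, alpha=1.0)      # compromise between both data sets
n_vel_heavy = best_fit_n(D, D_prime, alpha=1e9)  # dominated by the velocity data
```

With $\alpha = 0$ the velocity data is ignored entirely, and as $\alpha$ grows the fit is pulled toward whatever the velocity measurements alone would suggest, so the choice of $\alpha$ encodes how much each kind of measurement is trusted.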