I have data in the form $(t_i,x_i,y_i)$, i.e. position in 2D as a function of time.
I have non-linear equations which I want to fit to the data. They give me a position $(X,Y)$ as a function of time and some other parameters. Let's call these $X(t;\theta)$ and $Y(t;\theta)$, where $\theta$ is a stand-in for a number of parameters.
In a usual least squares fit of data with one dependent variable, say $(t_i,y_i)$, you minimise the sum of the squares of the residuals over your parameters $\theta$, where the residuals are given by $r_i=y_i-Y(t_i;\theta)$.
My question is about the generalisation to two dependent variables. My intuition is that it is the same idea: minimise the sum of the squares of the residuals, but with the residuals given by $r_i=\sqrt{[x_i-X(t_i;\theta)]^2+[y_i-Y(t_i;\theta)]^2}$. However I cannot find a reference to confirm this.
1) Is this the correct method or not?
2) Also, and I am not sure if the following question is well-posed, do I have to worry about any covariance between $x$ and $y$? Or maybe equivalently, I have $\partial X/\partial Y \ne 0$, so is this something that must be taken into account somehow?
If there are parameters common to $X$ and $Y$, your idea is correct: minimise $$\Phi(\theta)= \sum_{i=1}^{n}\Big([x_i-X(t_i;\theta)]^2+[y_i-Y(t_i;\theta)]^2\Big).$$ Note that squaring your residual $r_i$ gives exactly the terms of this sum, so your formulation with the square root is equivalent.
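As a concrete illustration, here is a minimal sketch of this joint fit using `scipy.optimize.least_squares`, which minimises the sum of squares of a stacked residual vector. The model functions (a projectile-motion example) and the parameter values are hypothetical, chosen only to make the sketch self-contained:

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical example model: projectile motion, theta = (v0x, v0y, g)
def model_X(t, theta):
    v0x, v0y, g = theta
    return v0x * t

def model_Y(t, theta):
    v0x, v0y, g = theta
    return v0y * t - 0.5 * g * t**2

def residuals(theta, t, x, y):
    # Stack the x- and y-residuals into one vector; least_squares
    # minimises the sum of squares of all entries, i.e. Phi(theta).
    return np.concatenate([x - model_X(t, theta),
                           y - model_Y(t, theta)])

# Synthetic data from known parameters, to check the fit recovers them
rng = np.random.default_rng(0)
theta_true = (3.0, 10.0, 9.81)
t = np.linspace(0.0, 2.0, 50)
x = model_X(t, theta_true) + rng.normal(0.0, 0.01, t.size)
y = model_Y(t, theta_true) + rng.normal(0.0, 0.01, t.size)

fit = least_squares(residuals, x0=[1.0, 1.0, 5.0], args=(t, x, y))
print(fit.x)  # close to theta_true
```

The key point is that fitting both coordinates jointly is just an ordinary least-squares problem with $2n$ residuals instead of $n$.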
But you must take care if the $x_i$ and $y_i$ are not of the same order of magnitude. If that is the case, it may be better to minimise the sum of squares of the relative errors $$\Phi(\theta)= \sum_{i=1}^{n}\left[\Big(\frac{x_i-X(t_i;\theta)}{x_i}\Big)^2+\Big(\frac{y_i-Y(t_i;\theta)}{y_i}\Big)^2\right]$$ or to normalise the data first.
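One caveat with relative errors: dividing by $x_i$ or $y_i$ fails when a data value is near zero. A common alternative (an assumption here, not part of the answer above) is to rescale each coordinate's residuals by a characteristic scale of that coordinate, e.g. its spread:

```python
import numpy as np

def scaled_residuals(rx, ry, x, y):
    # Divide each coordinate's residuals by that coordinate's spread,
    # so neither coordinate dominates Phi(theta) purely by magnitude.
    sx = np.std(x)  # characteristic scale of the x data
    sy = np.std(y)  # characteristic scale of the y data
    return np.concatenate([rx / sx, ry / sy])

# Example: 1% residuals on data differing by four orders of magnitude
x = np.array([1000.0, 2000.0, 3000.0])
y = np.array([0.1, 0.2, 0.3])
r = scaled_residuals(x * 0.01, y * 0.01, x, y)
# After scaling, the x- and y-contributions are comparable.
```

This is equivalent to the "normalize the data" option: it simply fixes the weights of the two coordinates in the objective.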
With regard to your second question, I am unable to answer: I am not a statistician and only have practical experience in data fitting.