A two-part question:
The output from a biomechanics program provides the $x$-$y$ coordinates of any joint in each frame of a video.
However, these coordinates are relative both to the image size and to the distance of the animal from the camera. That's not a problem if all your video is the same size and the distance to the animal is standardized, but it makes data analysis difficult if neither holds.
Is there a way of normalizing the $x$-$y$ coordinates to deal with the differences in image size (some are $640 \times 360$, some $1280 \times 720$, and others $1920 \times 1080$)? I had initially thought:
\begin{align} \text{xNorm} &= \frac{x}{\text{width of image}} & \text{yNorm} &= \frac{y}{\text{width of image}} \end{align}
But I wasn't sure whether that properly maintains the aspect ratio of the image (a quick sketch of the idea is below).
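For concreteness, here is a minimal sketch of that idea in Python/NumPy (the function name is just mine for illustration):

```python
import numpy as np

def normalize_by_width(xy, width):
    """Divide both x and y by the image width. Using a single scale
    factor for both axes preserves relative distances and angles
    (dividing y by the height instead would stretch one axis)."""
    return np.asarray(xy, dtype=float) / width

# The same relative joint position at two resolutions maps to the
# same normalized coordinates:
print(normalize_by_width([320, 180], 640))    # [0.5, 0.28125]
print(normalize_by_width([960, 540], 1920))   # [0.5, 0.28125]
```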
On the distance of the animal from the camera: once the image size is normalized and the $x$-$y$ coordinates reflect a normalized position, is there a way of engineering each $(x, y)$ position further to reflect the animal's distance from the camera without knowing the focal length? Obviously I don't want to create a new $(x, y)$ that destroys the actual physical size of the animal (e.g. small vs. large), but rather one that just reflects the distance from the camera. I was thinking of some type of ratio relative to the distance to $(0,0)$ (sketched below), but my domain knowledge of image geometry has let me down.
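For what it's worth, this is the (possibly misguided) ratio idea, again just a sketch:

```python
import numpy as np

def ratio_to_origin(xy_norm):
    """My rough 'ratio relative to (0, 0)' idea: the Euclidean
    distance of each normalized (x, y) from the image origin.
    I suspect this only captures position within the frame,
    not depth, which is exactly where I'm stuck."""
    return np.linalg.norm(np.asarray(xy_norm, dtype=float), axis=-1)

print(ratio_to_origin([0.5, 0.28125]))  # ~0.5737
```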