I am working on a project where I use image processing to detect objects with a Kinect camera and then move them to a desired location with the help of a robotic arm. In this project the sensor gives pixel coordinates (X, Y, Z), but I am not sure where the origin of the camera is, so I am defining my own reference frame using four points in the captured image. I believe this will make it easier to move the object to the desired location with a manipulator.

Suppose I have another point Q(x, y, z) in pixel coordinates. How do I find the coordinates of Q with respect to the coordinate system that I have defined using the four points? I know it is related to vectors. I did some reading and came across a lot of articles on translation, rotation and scaling, but I am not sure how to approach the problem. Any help will be appreciated.
This is my idea:
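Step $1$: build the matrix of your frame from the four points $P_0, P_1, P_2, P_3$, taking $P_0$ as the origin: $$M=\left[\frac{\vec{P_0P_1}}{|\vec{P_0P_1}|}\quad\frac{\vec{P_0P_2}}{|\vec{P_0P_2}|}\quad\frac{\vec{P_0P_3}}{|\vec{P_0P_3}|}\right]$$ Here the three vectors are normalised so that each column of $M$ has unit length. Normalising alone does not guarantee $\det M=1$; that holds only when the three axes are mutually perpendicular and right-handed. What the method needs is $\det M\neq 0$, i.e. the four points must not be coplanar. $M$ is then the matrix that maps coordinates in your frame to directions in the inertial (camera) coordinate frame.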
Step $2$: get the coordinates of the point with respect to your frame (call them $Q$). Let $Q_0$ denote the coordinates of the same point in the inertial coordinate frame; then $$Q_0=MQ+P_0$$ where $Q_0$ is what the sensor gives you in your project. Solving for $Q$, $$Q=M^{-1}(Q_0-P_0),$$ which is what you want.
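The two steps can be sketched in Python with NumPy. This is only an illustration under assumptions: the function names `frame_matrix`, `to_frame` and `from_frame` are made up here, the points are assumed to be given as 3-element sequences in the sensor's coordinates, and the tolerance used to detect coplanar points is an arbitrary choice.

```python
import numpy as np

def frame_matrix(p0, p1, p2, p3, tol=1e-9):
    """Build M whose columns are the unit vectors P0P1, P0P2, P0P3 (step 1)."""
    p0 = np.asarray(p0, dtype=float)
    columns = []
    for p in (p1, p2, p3):
        axis = np.asarray(p, dtype=float) - p0
        length = np.linalg.norm(axis)
        if length < tol:
            raise ValueError("a frame point coincides with the origin P0")
        columns.append(axis / length)
    M = np.column_stack(columns)
    # M must be invertible, i.e. the four points must not be coplanar.
    if abs(np.linalg.det(M)) < tol:
        raise ValueError("the four frame points are (nearly) coplanar")
    return M

def to_frame(q0, p0, M):
    """Step 2: Q = M^{-1} (Q0 - P0), computed with a linear solve."""
    diff = np.asarray(q0, dtype=float) - np.asarray(p0, dtype=float)
    return np.linalg.solve(M, diff)

def from_frame(q, p0, M):
    """Inverse direction: Q0 = M Q + P0."""
    return M @ np.asarray(q, dtype=float) + np.asarray(p0, dtype=float)

# Hypothetical frame whose axes are not perpendicular.
P0, P1, P2, P3 = (0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 0, 2)
M = frame_matrix(P0, P1, P2, P3)
Q = to_frame((2, 1, 3), P0, M)
print(Q)
```

Using `np.linalg.solve` instead of forming $M^{-1}$ explicitly gives the same result and is usually better behaved numerically. Because the columns of $M$ are normalised, the components of $Q$ are measured in the same units as $Q_0$.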
You can consult this book for more theoretical details: "A Mathematical Introduction to Robotic Manipulation" by Richard M. Murray, Zexiang Li and S. Shankar Sastry.