I have a large empirical dataset which may be modelled via the following 3D surface formula:
A*[X]+B*[Y]+C*[Z]+D*[X]*[Z] = 1
Where X & Z are the independent variables (inputs), and Y is the dependent variable (output). [X], [Y] and [Z] represent one-dimensional arrays of time-dependent historical data, each with approximately 20,000 data points. A, B, C, D are coefficients.
I have successfully fitted a surface by selecting 4 representative points and solving the system of equations to evaluate the coefficients A,B,C,D.
Point 1: A*X1+B*Y1+C*Z1+D*X1*Z1 = 1
Point 2: A*X2+B*Y2+C*Z2+D*X2*Z2 = 1
Point 3: A*X3+B*Y3+C*Z3+D*X3*Z3 = 1
Point 4: A*X4+B*Y4+C*Z4+D*X4*Z4 = 1
Solve simultaneously to evaluate A,B,C,D.
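For reference, the four-point solve can be sketched in Python with NumPy as below. The sample points and the "true" coefficients are made up for illustration (they are not taken from my dataset), and the names `y_from_surface`, `M` and `coeffs` are just placeholders.

```python
import numpy as np

# Hypothetical coefficients A, B, C, D, used only to generate consistent sample points.
true_coeffs = np.array([0.5, 0.25, -0.2, 0.1])

def y_from_surface(x, z, coeffs):
    """Solve A*x + B*y + C*z + D*x*z = 1 for y."""
    a, b, c, d = coeffs
    return (1.0 - a * x - c * z - d * x * z) / b

# Four made-up "representative" points (x_i, z_i) and their y_i on the surface.
xs = np.array([1.0, 2.0, 3.0, 4.0])
zs = np.array([1.0, 3.0, 2.0, 5.0])
ys = y_from_surface(xs, zs, true_coeffs)

# Each row is [X_i, Y_i, Z_i, X_i*Z_i]; the right-hand side is 1 for every point.
M = np.column_stack([xs, ys, zs, xs * zs])
coeffs = np.linalg.solve(M, np.ones(4))  # A, B, C, D
```

With exact data this recovers the coefficients, but with noisy measurements the result depends entirely on which four points are picked.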
However, this solution fits only a tiny subset of the very large dataset (4 of 20,000 data points).
My question is: what would be a smarter and more effective method for evaluating the coefficients A, B, C & D, so that the surface matches the historical data as well as possible? I want to minimize the sum of squared errors in the predicted Y.
The purpose is to generate an empirical model from the dataset that accurately predicts parameter Y as a function of X and Z.
Thank you in advance for any feedback or suggestions on how to move forward.
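Solve the surface equation for $Y$ (this requires $B\neq 0$): $$AX+BY+CZ+DXZ = 1$$ $$Y=-\frac{D}{B}XZ-\frac{A}{B}X-\frac{C}{B}Z+\frac{1}{B}$$ $$Y=\alpha\:XZ+\beta\:X+\gamma\:Z+\delta$$ This is linear in $\alpha$ , $\beta$ , $\gamma$ and $\delta$ , so an ordinary linear regression of $Y$ on $XZ$ , $X$ , $Z$ and a constant term, using all the data, minimizes the sum of squared errors in $Y$.
Result, from $\delta=\frac{1}{B}$ : $$A=-\frac{\beta}{\delta}\quad;\quad B=\frac{1}{\delta}\quad;\quad C=-\frac{\gamma}{\delta}\quad;\quad D=-\frac{\alpha}{\delta}$$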
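A minimal sketch of this regression in Python with NumPy, in case it helps. Solving $AX+BY+CZ+DXZ=1$ for $Y$ gives $\alpha=-D/B$, $\beta=-A/B$, $\gamma=-C/B$, $\delta=1/B$, and that is the back-substitution used here. The function name `fit_surface` and the synthetic data (random inputs plus small Gaussian noise) are assumptions for illustration only; replace them with the real arrays.

```python
import numpy as np

def fit_surface(x, y, z):
    """Least-squares fit of Y = alpha*X*Z + beta*X + gamma*Z + delta.

    Returns the coefficients (A, B, C, D) of A*X + B*Y + C*Z + D*X*Z = 1.
    Assumes the fitted intercept delta is not (close to) zero.
    """
    # Design matrix columns: X*Z, X, Z, and a constant term.
    design = np.column_stack([x * z, x, z, np.ones_like(x)])
    (alpha, beta, gamma, delta), *_ = np.linalg.lstsq(design, y, rcond=None)
    # Back-substitute using delta = 1/B.
    return np.array([-beta / delta, 1.0 / delta, -gamma / delta, -alpha / delta])

# Synthetic stand-in for the ~20,000 historical data points (hypothetical values).
rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(0.0, 10.0, n)
z = rng.uniform(0.0, 10.0, n)
A, B, C, D = 0.5, 0.25, -0.2, 0.1
y = (1.0 - A * x - C * z - D * x * z) / B + rng.normal(0.0, 0.01, n)

fitted = fit_surface(x, y, z)  # approximately [A, B, C, D]
```

Since the fit minimizes the squared error in $Y$ directly, it matches the stated goal of predicting $Y$ from $X$ and $Z$. If the fitted $\delta$ comes out near zero, the division becomes unstable and the normalization to a right-hand side of 1 is probably not appropriate for that data.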