I'm looking for a way to distinguish between the various techniques for handling missing data.
Can someone help clarify or organize these categories into sub-sections, or indicate the similarities and differences among the following:
* Imputation or partial imputation
* Partial deletion (listwise/casewise or pairwise deletion)
* Full analysis (EM algorithm?)
* Neural network algorithms
* Interpolation
* Curve fitting
* Approximation
* Least squares
* Newton's divided differences
* Lagrange polynomial
* Piecewise linear interpolation?
* Cubic splines
* Quadratic interpolation, etc.
That is, if we have some experimental data with a missing point, how can that point best be approximated or "interpolated"?
Say, for example, that we measure distance (x in m) vs displacement (y in mm), but the value at x4 is accidentally not measured in the following data:
x1=5, y1=1.2; x2=5.5, y2=1.6; x3=6, y3=1.3; x4=6.5, y4=???; x5=7, y5=1.8
I can handle a few of these.
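(A) Curve construction can be done by either (A1) interpolation, where the curve passes through every given point, or (A2) approximation, where the curve only passes near the points.
(B) In either case, you have a choice about what type of curve you're going to use. The common choices are (B1) polynomials and (B2) piecewise polynomials (splines).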
Since every polynomial is a piecewise polynomial, (B1) is really a special case of (B2).
(C) You can choose the degree of the polynomial (or piecewise polynomial) you use. Some common choices are (C1) linear, (C2) quadratic, and (C3) cubic.
(D) You can choose what kind of algorithm you use to construct the curve. One example is (D1) least squares.
So, if you choose (A2), (B2), (C3), (D1), you get a cubic spline that approximates your data, constructed by a least-squares algorithm.
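To make the least-squares idea concrete, here is a minimal sketch in plain Python. Note that it swaps in a simpler curve than the cubic spline just mentioned: with only four known points, it fits a single straight line (an approximating polynomial of degree 1) by least squares. The function name `fit_line` and the variable names are just illustrative, and the data are the four measured points from the question.

```python
# Known measurements from the question (the point at x = 6.5 is missing).
xs = [5.0, 5.5, 6.0, 7.0]
ys = [1.2, 1.6, 1.3, 1.8]

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution of the normal equations for a straight line.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(xs, ys)
y_line = slope * 6.5 + intercept  # estimate at the missing point
print(round(y_line, 3))  # about 1.629
```

The fitted line does not pass through any of the measured points exactly, which is the difference between approximation and interpolation.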
If you choose (A1), (B2), (C1) you get a piecewise linear curve that passes through all the given points.
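As a rough illustration of that piecewise linear case, the sketch below joins neighbouring data points with straight segments and reads off the value at x = 6.5. It uses plain Python; the helper name `interp_linear` is made up for this example, and the data are the four measured points from the question.

```python
# Known measurements from the question (the point at x = 6.5 is missing).
xs = [5.0, 5.5, 6.0, 7.0]
ys = [1.2, 1.6, 1.3, 1.8]

def interp_linear(x, xs, ys):
    """Evaluate the piecewise linear interpolant at x (xs must be increasing)."""
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1:
            # Fraction of the way from x0 to x1.
            t = (x - x0) / (x1 - x0)
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the data range")

y_piecewise = interp_linear(6.5, xs, ys)
print(round(y_piecewise, 3))  # 1.55
```

Because x = 6.5 sits halfway between x = 6 and x = 7, the estimate is simply the average of the two neighbouring values, 1.3 and 1.8.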
This is certainly not a perfect classification scheme. For example, the choices in (D) are constrained by the choices you make in (A), (B), (C). But, perfect or not, it might be helpful.
In your particular case, you only have four data points, and there is no obvious trend in the data. So I would say that the missing value could be almost anything between 1.3 and 1.8. In fact, one could even make a case for it being less than 1.3. I'd suggest that you just enter your data into Excel and play with the "Trendline" function until you get a result that looks right for your problem. There is no magic correct answer.
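To show how strongly the answer depends on the choices made, here is a small sketch that interpolates the four measured points with a single cubic polynomial, evaluated via the Lagrange formula. It is only an illustration in plain Python; the function name `lagrange_eval` is invented for this example.

```python
# Known measurements from the question (the point at x = 6.5 is missing).
xs = [5.0, 5.5, 6.0, 7.0]
ys = [1.2, 1.6, 1.3, 1.8]

def lagrange_eval(x, xs, ys):
    """Evaluate the polynomial through all (xs, ys) points at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        # Basis polynomial: equals 1 at xi and 0 at every other node.
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += weight * yi
    return total

y_cubic = lagrange_eval(6.5, xs, ys)
print(round(y_cubic, 3))  # 1.1
```

The cubic passes through every measured point, yet its value at x = 6.5 is about 1.1, which is below both neighbouring measurements. That is one way of making the case for a value less than 1.3.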