Analyzing data in R

I need to analyse if certain physical characteristics of the urine might be related to the formation of calcium oxalate crystals.

The dataset is available here:

ftp://ftp.uni-bayreuth.de/pub/math/statlib/datasets/Andrews/T44.1

The data contain 2 missing values, in the first and the 55th observations.

Should I simply drop the incomplete observations and proceed with the analysis, or is there a procedure or statistical method for estimating the missing values?

I'm thinking of using logistic regression to analyse the data. Are there any other suggestions for analysing how the urine characteristics relate to crystal formation?
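For illustration, a minimal sketch of the complete-case logistic regression described above. It assumes that the `urine` data frame shipped with the `boot` package holds the same data as the file linked above, with a 0/1 crystal indicator `r` and predictors `gravity`, `ph`, `osmo`, `cond`, `urea` and `calc`. If the FTP file is read directly instead, the column names would need adjusting.

```r
# Assumption: boot::urine corresponds to the T44.1 data set
# (r = 1 means crystals present; the other columns are urine measurements).
data(urine, package = "boot")

# Count missing values per column and list the incomplete rows.
colSums(is.na(urine))
which(!complete.cases(urine))

# Complete-case analysis: drop every row that contains an NA.
urine_cc <- na.omit(urine)

# Logistic regression of crystal presence on all physical characteristics.
fit_cc <- glm(r ~ gravity + ph + osmo + cond + urea + calc,
              data = urine_cc, family = binomial)
summary(fit_cc)
```

Note that `glm()` drops incomplete rows by default anyway (`na.action = na.omit`), so the explicit `na.omit()` call mainly makes that choice visible.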

I'm using R for this.

There are 2 best solutions below

If you try to fill in the blanks, no matter which method you use, you will be making up data. It may be the "best" way from one mathematical viewpoint or another, but from a real-life point of view it is absolutely pointless and likely against the scientific spirit.

If the $2$ missing observations represent a negligible proportion of the total number of observations, then you can just ignore (discard) them. More generally, there is a whole theory of dealing with missing data. Methods that fill in a single value are basically flawed, because you are making up data ("information"). Hence a basic approach is multiple imputation, i.e., filling in the missing cells multiple times and estimating the parameters of interest each time. This way you don't discard the partially observed records, and you account for the fact that you "guessed" the missing values by increasing the variance of the estimators.
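As a rough sketch of what multiple imputation could look like in R, the following uses the third-party `mice` package. The choice of `mice`, the number of imputations (`m = 5`), the seed and the use of `boot::urine` as a stand-in for the linked file are all assumptions made for illustration, not part of the question.

```r
# Assumptions: the mice package is installed (install.packages("mice"))
# and boot::urine corresponds to the data set in the question.
library(mice)
data(urine, package = "boot")

# Create m = 5 completed copies of the data; the seed makes the run reproducible.
imp <- mice(urine, m = 5, seed = 123, printFlag = FALSE)

# Fit the same logistic regression on each completed data set.
fits <- with(imp, glm(r ~ gravity + ph + osmo + cond + urea + calc,
                      family = binomial))

# Combine the m sets of estimates with Rubin's rules; the pooled standard
# errors include the between-imputation variability.
pooled <- pool(fits)
summary(pooled)
```

With only 2 incomplete rows, the pooled estimates will probably be close to the complete-case ones; the approach matters more when a larger share of rows is incomplete.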