Should I remove known faulty data points from the dataset before importing it into a statistical analysis program, or after?


I have a dataset of 21017 rows and ~12 columns in Excel.

At row 10013 there is one (1) measurement that I know must be faulty (it physically cannot happen).

I want to import this dataset into RStudio and perform analysis on it.

Do I remove the faulty data point by deleting the row in Excel, or do I remove/ignore it using R functionality?

These measurements are time-related, so deleting the row will create a missing point/gap in the dataset (but I don't think that is relevant?).

What is the correct thing to do in such a case?

BEST ANSWER

As a general rule, the earlier you can scrub your data, the better. Why? Because you then propagate less data from one stage to the next, putting less strain on your infrastructure. In this case, with a single row to delete, it probably won't make much difference when you do it. But imagine having 5 million images and being able to filter out half of them before sending them on for processing: you'd reduce the load on your network and other hardware, and the whole pipeline would be faster.
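That said, if you do choose to remove the point in R rather than in Excel, here is a minimal sketch. The file name `data.csv` and column name `measurement` are hypothetical placeholders; substitute your own export and column names, and check that the assumed "impossible" condition matches your situation.

```r
# Minimal sketch of removing a known-faulty row in R instead of in Excel.
# "data.csv" and the column "measurement" are hypothetical names.
df <- read.csv("data.csv")

# Option 1: drop the row by its index (row 10013 in the question).
df_clean <- df[-10013, ]

# Option 2 (often safer): drop by the impossible value itself, so the
# cleaning rule is documented in the script. Here we assume, for
# illustration, that negative values are physically impossible.
df_clean <- df[df$measurement >= 0, ]

# Option 3: keep the row but set the value to NA, which preserves the
# time grid -- useful since the data are time-related, as in the question.
df$measurement[10013] <- NA
```

A side benefit of doing this in R is that the raw Excel file stays untouched and the cleaning step is written down, so the analysis remains reproducible.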