Best Way to Skip Rows in a Dataset

I would like to be pointed to resources on good ways to thin out (subsample) the rows of a dataset/dataframe. Specifically, I am looking for the best way to read a large data set into an application that cannot handle the full file size, so that rows must be skipped, while losing as few key features of the data as possible. Some apps simply keep every $n^\text{th}$ row. While this approach is easy and inexpensive to implement, it may prove costly for proper analysis, since features that fall between the kept rows can be lost. Presumably the best way to discard/keep rows depends on the data set itself. The only other thing I can think of that may be useful is dimension reduction, perhaps applied to the transpose of the data set. Thanks in advance.
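For concreteness, here is a minimal sketch of the every-$n^\text{th}$-row approach, next to one common data-independent alternative: a uniform random sample drawn in a single pass with reservoir sampling (Algorithm R). The function names `every_nth` and `reservoir_sample` are hypothetical, chosen only for illustration, and the rows are assumed to arrive as any Python iterable (for example, lines of a file). Neither function adapts to the content of the data, so this is only a baseline to compare against.

```python
import itertools
import random


def every_nth(rows, n):
    """Systematic thinning: keep rows 0, n, 2n, ... of any iterable."""
    return list(itertools.islice(rows, 0, None, n))


def reservoir_sample(rows, k, seed=None):
    """Uniform random sample of up to k rows in one pass (Algorithm R).

    Only k rows are held in memory at a time, so the full data set
    never has to be loaded.
    """
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


# Illustrative stand-in for a large data set: 1000 "rows".
rows = range(1000)
thinned = every_nth(rows, 100)
sampled = reservoir_sample(rows, 10, seed=0)
```

Both functions accept an open file object or a `csv.reader` in place of `range(1000)`, since they only iterate over their input once. Systematic thinning preserves row order and spacing but can miss or alias periodic structure, while the random sample avoids that at the cost of irregular spacing.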