Population Data
I have a large dataset of funding for all the training schools in London. It's a complete dataset, i.e. it represents the population I am studying. I only want to know about the training schools in London and am not inferring anything about a larger population.
Some facts about the population data:
- The funding distribution is positively skewed - significant outliers make the mean larger than the median
- The population size is large i.e. several thousand schools.
- There are missing values - about 2% of the population data
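For illustration, facts like these can be checked with a few lines of Python. This is only a minimal sketch: the `funding` list below is made-up data (not the real dataset), and `None` is assumed to mark a missing value.

```python
from statistics import mean, median

# Hypothetical funding figures, invented for illustration only.
# None is assumed to mark a school with missing funding data.
funding = [120, 150, 90, 200, None, 175, 2500, 130, 160, 110]

# Keep only the schools with an observed funding value.
observed = [x for x in funding if x is not None]

# Share of the population with no funding value recorded.
missing_share = 1 - len(observed) / len(funding)

pop_mean = mean(observed)
pop_median = median(observed)

print(f"missing share: {missing_share:.0%}")
print(f"mean = {pop_mean:.1f}, median = {pop_median:.1f}")
# A mean well above the median is consistent with positive skew.
print("mean > median:", pop_mean > pop_median)
```

In this toy data a single very large value pulls the mean far above the median, which is the pattern described in the first bullet.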
My Analyses
I am comparing the median funding between:
- rural and urban schools
- large and small schools
- expensive and cheap schools
- etc.
My question
Is inferential statistics meaningless here? Is it enough to use descriptive statistics?
I am not inferring anything about the population, because I have the population data. When I say the median large school receives more funding than the median small school, that is a description of the population, not an inference. I have then added a warning to say that my conclusions apply to 98% of the population i.e. I am not making any claims about the 2% of the schools for which I am missing data.
I have not added in anything about p-values or statistical significance. Am I wrong!?
Inferential statistics is not meaningless; however, it is not needed in this situation, since you already have the population data. There is no point in making an inference when you are not estimating or testing anything about a larger population. You are simply stating facts. It is important to note any strong skewness (which you did) and to describe how median funding differs between the groups you compare, such as large and small schools.
Basically, there is no need for a test of any sort, since it is population data. You can instead report the difference between the medians directly, as a description of the population.
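As a rough sketch of what that purely descriptive comparison might look like, the snippet below computes the median funding per group, the difference between them, and the share of schools the statement covers. The `schools` records and the `group_median` helper are hypothetical, made up for illustration; `None` is assumed to mark missing funding.

```python
from statistics import median

# Hypothetical records of (school size, funding); not real data.
# None is assumed to mark a school with missing funding.
schools = [
    ("large", 300), ("large", 450), ("large", None), ("large", 5000),
    ("small", 100), ("small", 180), ("small", 150), ("small", 900),
]

def group_median(records, group):
    """Median funding over the schools in `group` that have a funding value."""
    values = [f for g, f in records if g == group and f is not None]
    return median(values)

median_large = group_median(schools, "large")
median_small = group_median(schools, "small")

# The difference is a plain fact about the observed schools, not an estimate.
difference = median_large - median_small

# Share of schools with observed funding, i.e. what the statement covers.
coverage = sum(f is not None for _, f in schools) / len(schools)

print(f"median (large) = {median_large}, median (small) = {median_small}")
print(f"difference = {difference}, coverage = {coverage:.1%}")
```

No p-value appears anywhere: the output is a description of the schools with data, reported together with the coverage caveat you already include.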
Hope this helps