I'm new to statistics and reading an intro book. I just finished a chapter on correlation coefficients, and I'm going over COVID data. I think I grasp the relevant formula, but I suspect I'm making some mistakes in interpretation and possibly application.
Using Our World in Data's (OWID) COVID dataset, I used Excel to calculate correlation coefficients (CCs). For total COVID cases per capita in the USA vs. total COVID deaths per capita, I'm getting a CC of 0.97. That makes sense: people aren't dying of COVID without having acquired COVID first.
But I'm also getting some weird results: negative CCs for total COVID cases per capita vs. hospitalizations per capita (-0.389) and vs. ICU admissions (-0.606).
Now, OWID's hospital and ICU data are current admissions, which rise and fall, while total cases per capita just keeps going up. I'd expect only a loose correlation, since total cases goes back to 2020 but what really matters for hospital/ICU numbers is the current state of affairs. Still, why would either be negatively correlated, and why would the CC be so far from zero? I'd expect the values to be positive, since if the total is increasing rapidly, the number of hospitalizations should also be rising.
I suspect I'm misinterpreting the information.
I don't think you actually have a problem. Something those new to statistics very often overlook when considering correlation coefficients is the effect of degrees of freedom (for a correlation, the number of pairs minus 2). A CC of 0.97 is not very significant if you only have 4 or 5 pairs in your sample, whereas one of 0.64 is massively significant if you have more than 100 pairs (and still significant at the 0.05 level if you have only 20 pairs).
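To get a feel for this without a book of tables, here is a rough sketch in Python (standard library only) of the usual t-test for a correlation coefficient. It assumes the standard test of "true correlation is zero" with n - 2 degrees of freedom, which in turn assumes independent, roughly normal pairs. The function names and the simple Simpson's-rule integration are my own choices for illustration, not anything from Excel or OWID.

```python
import math

def t_statistic(r, n):
    """t statistic for H0: true correlation is zero (needs n >= 3, |r| < 1)."""
    df = n - 2
    return r * math.sqrt(df / (1.0 - r * r))

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(r, n, steps=2000):
    """Approximate two-sided p-value for correlation r from n pairs.

    Integrates the t density from 0 to |t| with Simpson's rule (steps must
    be even). Very small p-values are only approximate and are clamped at 0.
    """
    df = n - 2
    t = abs(t_statistic(r, n))
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3
    return max(0.0, 1.0 - 2.0 * central_area)

for r, n in [(0.97, 4), (0.64, 20), (0.64, 100)]:
    print(f"r={r}, n={n}: p is about {two_sided_p(r, n):.4g}")
```

With 4 pairs, r = 0.97 gives a p-value of about 0.03; with 20 pairs, r = 0.64 lands somewhere between 0.002 and 0.005; with 100 pairs it is vanishingly small. Keep in mind that daily time-series values are not independent of each other, so p-values like these will look better than they really are for the COVID data.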
I suggest you try to find a book of statistical tables to give you a feel for degrees of freedom. You should also take a good look at the theory behind the statistical tests you want to use: they all have conditions and catches.
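One such catch, sketched below with made-up numbers (not OWID data): a running total can only go up, so if the series you pair it with peaks early and then declines for most of the period, the coefficient comes out negative even though the two are obviously related. The `pearson_r` helper and the toy `current` series are purely illustrative, and using one series as both the "wave" and the source of the running total is a simplification.

```python
import math
from itertools import accumulate

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical "current" series: a quick rise followed by a long decline.
current = [1, 5, 10, 8, 6, 4, 3, 2, 1, 1]

# Running total of the same series: it never decreases.
cumulative = list(accumulate(current))

r_wave = pearson_r(cumulative, current)
print(f"r between running total and current level: {r_wave:.3f}")
```

The sign here depends on where the peaks fall in the window you analyse, which is one reason comparing a cumulative series against a current-level series is hard to interpret. Comparing new cases against current admissions would be closer to like with like.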