I have book sales data from 2 bookstores over 100 days. For the first 90 days, no books were sold. After that, the following numbers of books were sold:
Day# - BookStore1 - BookStore2
Day 1 - 0 - 0
Day 2 - 0 - 0
....
Day 90 - 0 - 0
Day 91 - 1 - 10
Day 92 - 2 - 9
Day 93 - 3 - 8
Day 94 - 4 - 7
Day 95 - 5 - 6
Day 96 - 6 - 5
Day 97 - 7 - 4
Day 98 - 8 - 3
Day 99 - 9 - 2
Day 100 - 10 - 1
Question: Is there a correlation between book sales at bookstore 1 and bookstore 2?
The correlation for the last 10 days is exactly -1. However, if I take all 100 days, the Pearson correlation is 0.486413 and the Spearman correlation is 0.9351082.
My question now is: should the zeros be included in the correlation (all 100 days), or should I only compare the days on which books were sold (the last 10 days)?
By arbitrarily limiting the data sample you can achieve (almost) whatever correlation you want. This means that your data selection should match the problem you are trying to solve.
The data you present is clearly synthetic, so I can make no inferences about the underlying reality.
E.g., if the zeros in the first 90 days are there because the stores were effectively closed, then only the data from the last 10 days should be used, and the correlation is -1.
If, however, the stores are open all the time, but it just so happens that they are ignored 90% of the time, and every 10 days someone walks by and buys 11 books (split "at random" between the two stores), then the first 90 days are relevant and the correlation is ~0.5 or whatever your statistics package computes.
In the absence of any further information, I think the latter answer is correct.
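To make the effect of the data selection concrete, here is a minimal sketch in plain Python that computes the Pearson correlation for both choices of sample. It assumes the data are exactly as typed in the table above (90 days of zeros, then sales of 1..10 and 10..1); the helper `pearson` and the variable names are illustrative, not taken from any particular statistics package.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Assumption: data exactly as typed in the question.
store1 = [0] * 90 + list(range(1, 11))      # 0 x 90, then 1, 2, ..., 10
store2 = [0] * 90 + list(range(10, 0, -1))  # 0 x 90, then 10, 9, ..., 1

# "Stores were closed" reading: keep only the days with sales.
r_last10 = pearson(store1[-10:], store2[-10:])

# "Stores were open but ignored" reading: keep all 100 days.
r_all = pearson(store1, store2)

print(f"last 10 days: {r_last10:.4f}")  # -1.0000
print(f"all 100 days: {r_all:.4f}")     # 0.5349
```

The shared zeros pull both series toward a common baseline, which is what turns a perfectly negative relationship into a moderately positive one. The exact full-sample figure depends on the data actually used, so it may not match a value computed from a slightly different table.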