How to check if these two variables are correlated


I have two variables: A and B

I have a set of N "blocks": 1, 2, 3, ..., N

For each block I know if A appears or not in that block, for example:

A appears in block 1, A doesn't appear in block 2, ... , A appears in block N.

I have the same data for B, so:

B doesn't appear in block 1, B doesn't appear in block 2, ... , B appears in block N.

How can I check whether there's some degree of correlation between the distributions of A and B, to see if they're independent or not? Ideally I'd like a value that goes from 0 (no correlation) to 1 (they're surely correlated).
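To make the kind of value I'm after concrete, here is a minimal sketch of one candidate, assuming the presence/absence data is stored as two equal-length lists of 0s and 1s. It counts the four joint outcomes in a 2x2 contingency table and computes the phi coefficient, which ranges from -1 to 1, so its absolute value falls in the 0 to 1 range. The function names `contingency` and `phi`, and the sample vectors, are made up for illustration.

```python
import math

def contingency(a, b):
    """Count the four joint outcomes over paired 0/1 presence vectors."""
    if len(a) != len(b):
        raise ValueError("a and b must cover the same blocks")
    n11 = sum(1 for x, y in zip(a, b) if x and y)          # both appear
    n10 = sum(1 for x, y in zip(a, b) if x and not y)      # only A appears
    n01 = sum(1 for x, y in zip(a, b) if not x and y)      # only B appears
    n00 = sum(1 for x, y in zip(a, b) if not x and not y)  # neither appears
    return n11, n10, n01, n00

def phi(a, b):
    """Phi coefficient of two binary vectors, in [-1, 1]."""
    n11, n10, n01, n00 = contingency(a, b)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # One variable is constant, so phi is undefined.
        # Returning 0.0 here is an arbitrary choice for this sketch.
        return 0.0
    return (n11 * n00 - n10 * n01) / denom

# Hypothetical data for six blocks (1 = appears, 0 = doesn't appear).
a = [1, 0, 1, 1, 0, 0]
b = [1, 0, 1, 0, 0, 1]
score = abs(phi(a, b))
```

Note that this only measures the strength of the association. Whether it is statistically significant would need a separate test, for example a chi-square test on the same table (for a 2x2 table the chi-square statistic equals the number of blocks times phi squared).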

Edit: A and B are each a list of tokens drawn from a larger token list N (so the "blocks" are the entries of that larger list). A (or B) is present in block 1, 2, ..., N if it contains the token at that position of N. So for example A = [the, dog, barks, the, cat, is, brown], B = [the, dog, is, brown, and, jumps], N = [the, dog, is, brown, barks, cat, and, jumps, cow, eagle, etc...]. I want to see if there's a correlation between A and B in terms of the tokens they share, the tokens they're both missing, and the tokens only one of them has (A has a token but B doesn't, and vice versa).

The position of the token in the list doesn't matter.
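To make the setup concrete, here is a rough sketch of how I turn the token lists into per-block presence data. It assumes N contains only the ten tokens listed in the example (the real list is longer), and the helper name `presence` is just something I made up. Converting each list to a set first means duplicates and token order inside A or B are ignored.

```python
A = ["the", "dog", "barks", "the", "cat", "is", "brown"]
B = ["the", "dog", "is", "brown", "and", "jumps"]
# Assumption: only the tokens shown in the example; the real N is longer.
N = ["the", "dog", "is", "brown", "barks", "cat", "and", "jumps", "cow", "eagle"]

def presence(tokens, vocabulary):
    """Return one 0/1 flag per vocabulary entry: 1 if the token occurs in `tokens`."""
    token_set = set(tokens)  # drops duplicates and ignores position
    return [1 if tok in token_set else 0 for tok in vocabulary]

a = presence(A, N)  # [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
b = presence(B, N)  # [1, 1, 1, 1, 0, 0, 1, 1, 0, 0]

# Tokens split by the four cases I care about.
both    = [t for t, x, y in zip(N, a, b) if x and y]
only_a  = [t for t, x, y in zip(N, a, b) if x and not y]
only_b  = [t for t, x, y in zip(N, a, b) if not x and y]
neither = [t for t, x, y in zip(N, a, b) if not x and not y]
```

With this example, `both` holds the four shared tokens, `only_a` and `only_b` hold two tokens each, and `neither` holds the two tokens that both lists are missing.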