How to compute co-occurrence probability from a co-occurrence matrix?


I have a co-occurrence matrix in this format:

         act good great hate love movi
  act    [0   1    1    0     0    1]
  good   [1   0    1    0     0    1]
  great  [1   1    0    0     0    1]
  hate   [0   0    0    0     0    1]
  love   [0   0    0    0     0    1]
  movi   [1   1    1    1     1    0]

For the small dataset:

docs = ['I loved the movie',
        'I hated the movie',
        'a great movie. good acting']
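For reference, the matrix above can be reproduced from these documents with a short sketch. It assumes preprocessing that is not shown here (stop words removed and the remaining words stemmed, e.g. "movie" to "movi"), so the `tokenized_docs` list and all variable names are illustrative only. Two words count as co-occurring when they appear in the same document.

```python
from itertools import combinations

# Assumed output of preprocessing (stop-word removal + stemming); hypothetical.
tokenized_docs = [
    ["love", "movi"],
    ["hate", "movi"],
    ["great", "movi", "good", "act"],
]

# Alphabetical vocabulary: act, good, great, hate, love, movi
vocab = sorted({tok for doc in tokenized_docs for tok in doc})
index = {word: i for i, word in enumerate(vocab)}

# Symmetric matrix: cooc[i][j] = number of documents containing both words.
cooc = [[0] * len(vocab) for _ in vocab]
for doc in tokenized_docs:
    for a, b in combinations(sorted(set(doc)), 2):
        cooc[index[a]][index[b]] += 1
        cooc[index[b]][index[a]] += 1

for word, row in zip(vocab, cooc):
    print(f"{word:<6}{row}")
```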

An algorithm I am implementing from a scientific article asks for:

... probability that word Wj co-occurs in class Ci ...

So the question is: given this co-occurrence matrix, how do I calculate the probability that a word Wj co-occurs in some class Ci (let's say the matrix above is for the class Ci)?

My line of thought: if I sum every row, I get the number of co-occurrences for each word Wj:

 [3],
 [3],
 [3],
 [1],
 [1],
 [5]

Summing this column vector gives 16. If I then divide each row sum by 16, I get a new column vector:

 [0.1875]
 [0.1875]
 [0.1875]
 [0.0625]
 [0.0625]
 [0.3125]

Is this the correct line of thought? Does this last vector represent the probability of co-occurrence? For more context, I am implementing a version of a Naive Bayes spam classifier. The scientific article is here. The relevant information is on pages 3 and 4. Thank you very much.