Estimating the likelihood of independence of two discrete variables using the co-occurrence count matrix.

I have some data about users from different regions visiting different directories of some website. Aggregating that data I get the co-occurrence frequency matrix (for regions and directories). Now I want to distinguish two situations:

  1. Users visit directories independently of their region.
  2. There is a bijective map between regions and directories, and a user from a given region visits only the corresponding directory (unless they make a mistake).

The likelihood of the first hypothesis seems easy to estimate, but I have a problem estimating the likelihood of the second one. Put differently, I want to measure the degree of diagonality of a co-occurrence matrix in which some rows and columns are missing (all zero), the remaining rows and columns have been scrambled, and noise has been added.

How would you estimate the likelihood of the second hypothesis if we assume that there is some constant error rate (the probability of a visitor from some region going to a wrong directory)?
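
To make the two hypotheses concrete, here is a minimal sketch of how their log-likelihoods might be written down. It is not a definitive method. It assumes a square count matrix (rows are regions, columns are directories), a known error rate `eps`, and that a mistaken visit is spread uniformly over the other directories; none of these assumptions come from the data. The function names are made up for illustration, and the search over bijections is brute force, so it only suits small matrices.

```python
import math
from itertools import permutations


def loglik_independent(counts):
    """Log-likelihood of the directories given the regions under hypothesis 1.

    Every region is assumed to share one directory distribution, which is
    estimated from the column totals.
    """
    total = sum(map(sum, counts))
    col_sums = [sum(col) for col in zip(*counts)]
    return sum(c * math.log(c / total) for c in col_sums if c > 0)


def loglik_bijection(counts, eps):
    """Log-likelihood of the directories given the regions under hypothesis 2.

    Assumed noise model: a visit goes to the mapped directory with
    probability 1 - eps and to each other directory with eps / (k - 1).
    Returns the likelihood maximized over all bijections, plus the bijection.
    """
    k = len(counts)
    total = sum(map(sum, counts))
    # Brute force: pick the bijection with the most "correct" visits.
    best_hits, best_perm = max(
        (sum(counts[i][p[i]] for i in range(k)), p)
        for p in permutations(range(k))
    )
    misses = total - best_hits
    ll = best_hits * math.log(1 - eps) + misses * math.log(eps / (k - 1))
    return ll, best_perm


# Made-up example: a scrambled, noisy "diagonal" matrix.
counts = [
    [1, 9, 0],
    [8, 1, 1],
    [0, 2, 8],
]
ll_ind = loglik_independent(counts)
ll_bij, perm = loglik_bijection(counts, eps=0.1)
print(perm, ll_bij - ll_ind)
```

The difference `ll_bij - ll_ind` is a log-likelihood ratio. Because the second value is maximized over all bijections, it is a profile likelihood and tends to favor hypothesis 2, so it would need some correction before being read as evidence.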