Interpretation of an undirected adjacency matrix

I am new to graph theory and graph neural networks. Suppose I have an incidence matrix $\mathbf{B}$ such as

visitor item1 item2 item3 item4
A 1 0 0 1
B 1 1 0 0
C 0 1 0 0
D 0 0 1 1
E 1 0 1 1

Here 1 means the visitor purchased the item and 0 means no purchase. Suppose I build an undirected adjacency matrix $\mathbf{A}$ for the items as $\mathbf{B}^T\mathbf{B}$ with the diagonal set to 0. Then $\mathbf{A}$ is

item1 item2 item3 item4
$\textbf{item1}$ 0 1 1 2
$\textbf{item2}$ 1 0 0 0
$\textbf{item3}$ 1 0 0 2
$\textbf{item4}$ 2 0 2 0

$\textbf{Question1:}$ Can I say that if a visitor purchased $\textbf{item3}$, a recommender system should recommend $\textbf{item4}$ and $\textbf{item1}$ to that visitor? Here $\textbf{item4}$ would be the first recommendation because its weight (count) is 2, and $\textbf{item1}$ the second because its weight is 1.

$\textbf{Question2:}$ Should I normalize the item matrix $\mathbf{A}$ by row as follows, or

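import numpy as np

# Visitor-by-item purchase matrix (rows: visitors A..E, columns: item1..item4)
B = np.array([
    [1,0,0,1],
    [1,1,0,0],
    [0,1,0,0],
    [0,0,1,1],
    [1,0,1,1]
])
A = B.T.dot(B)          # item-by-item co-purchase counts
np.fill_diagonal(A, 0)  # drop each item's co-occurrence with itself
# Row-normalize so that every row sums to 1
print(A / A.sum(axis=1, keepdims=True))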
[[0.         0.25       0.25       0.5       ]
 [1.         0.         0.         0.        ]
 [0.33333333 0.         0.         0.66666667]
 [0.5        0.         0.5        0.        ]]

symmetrically as $\tilde{\mathbf{A}} = \mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}$, where $\mathbf{D}$ is a degree matrix? In that case $\tilde{\mathbf{A}}$ is

import numpy as np
from scipy.linalg import sqrtm, inv

# Visitor-by-item purchase matrix (rows: visitors A..E, columns: item1..item4)
B = np.array([
    [1,0,0,1],
    [1,1,0,0],
    [0,1,0,0],
    [0,0,1,1],
    [1,0,1,1]
])
A = B.T.dot(B)
# Note: the "degree" used here is the diagonal of B^T B (purchases per item),
# not the row sums of the zero-diagonal adjacency matrix.
dm = np.diag(np.diagonal(A)).astype(float) # degree matrix
D = inv(sqrtm(dm)) # D^(-1/2)
out = D.dot(A).dot(D)
np.fill_diagonal(out, 0)
print(out)
[[0.         0.40824829 0.40824829 0.66666667]
 [0.40824829 0.         0.         0.        ]
 [0.40824829 0.         0.         0.81649658]
 [0.66666667 0.         0.81649658 0.        ]]

many thanks

Best answer

What you can definitely say from the numbers $1, 0, 0, 2$ in the item3 row of matrix $\mathbf A$ is that, of the five visitors you've collected data on, $1$ purchased item3 together with item1 and $2$ purchased item3 together with item4.
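
As a sanity check, the item3 row can be recomputed directly from the purchase data. This is a minimal sketch that reuses the matrix `B` from the question; the names `items`, `direct`, and `ranking` are only illustrative.

```python
import numpy as np

items = ["item1", "item2", "item3", "item4"]

# Purchase matrix from the question (rows: visitors A..E)
B = np.array([
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 1],
])

# Co-purchase counts with the diagonal removed
A = B.T @ B
np.fill_diagonal(A, 0)

# Direct count: restrict to visitors who bought item3, then sum their purchases
bought_item3 = B[:, 2] == 1
direct = B[bought_item3].sum(axis=0)
direct[2] = 0  # ignore item3 itself

# Rank the other items by co-purchase count, dropping items never bought with item3
row = A[2]
ranking = [items[j] for j in np.argsort(-row, kind="stable") if row[j] > 0]

print(row)      # [1 0 0 2]
print(direct)   # [1 0 0 2]
print(ranking)  # ['item4', 'item1']
```

The two counts agree, which is all the matrix itself asserts; whether that ordering is a good recommendation is a separate question.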

Interpreting that any further is more about common sense than it is about mathematics, and also depends a lot on the application: what do you plan to use this conclusion for?

For example, if the plan is "if a customer purchases item3, I will use this data to recommend item4 as well" then using matrix $\mathbf A$ is a mistake for the following reason. Imagine a data set in which item1 is batteries, and item2, item3, item4 are various battery-powered items: an emergency radio, a flashlight, and a calculator. Almost everyone who buys one of these items buys batteries for it as well. Then:

  • It would be really useful to tell people buying the flashlight that lots of people like them bought the emergency radio (and not a lot bought the calculator).
  • However, matrix $\mathbf A$ will begin by recommending batteries, because everyone who bought the flashlight bought batteries as well. This is still a useful reminder, but much less useful.
  • Now imagine that the main thing your store sells is chocolate candy, and electronics are only a side venture. All of the customers buying the flashlight have bought candy multiple times, and so matrix $\mathbf A$ will recommend candy to them on those grounds alone. This is now completely useless!
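
To make the battery example concrete, here is a small sketch with made-up purchase data. The eight customers and their baskets are hypothetical, chosen only so that everyone buys batteries. It shows the raw co-purchase counts ranking batteries first for a flashlight buyer.

```python
import numpy as np

items = ["batteries", "radio", "flashlight", "calculator"]

# Hypothetical baskets (illustration only): every customer buys batteries
B = np.array([
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
])

# Raw co-purchase counts, as in the question
A = B.T @ B
np.fill_diagonal(A, 0)

# Rank the other items for someone who bought the flashlight (column index 2)
row = A[2]
ranking = [items[j] for j in np.argsort(-row, kind="stable") if row[j] > 0]

print(row)      # [4 2 0 1]
print(ranking)  # ['batteries', 'radio', 'calculator']
```

Batteries win simply because they are in every basket, not because they say anything specific about flashlight buyers.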

In such a scenario, you might want to normalize the columns of $\mathbf B$ first (say, so that each column adds up to $1$). Then $\mathbf B^{\mathsf T}\mathbf B$ will make more specific recommendations: if item1 often gets purchased with item2 and item3, but item2 gets purchased a lot for other reasons as well, while item3 almost always is only bought with item1, then item3 will be a better recommendation.
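
A minimal sketch of that column normalization, on made-up data for the battery example above (the baskets are hypothetical, not real data). Each column of `B` is divided by its sum before forming $\mathbf B^{\mathsf T}\mathbf B$, which down-weights items that almost everyone buys; `B_norm` and `S` are illustrative names.

```python
import numpy as np

items = ["batteries", "radio", "flashlight", "calculator"]

# Hypothetical baskets (illustration only): every customer buys batteries
B = np.array([
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
])

# Normalize each column so that it sums to 1
B_norm = B / B.sum(axis=0, keepdims=True)

# Normalized co-purchase scores, diagonal removed
S = B_norm.T @ B_norm
np.fill_diagonal(S, 0.0)

# Rank the other items for someone who bought the flashlight (column index 2)
row = S[2]
ranking = [items[j] for j in np.argsort(-row, kind="stable") if row[j] > 0]

print(np.round(row, 4))
print(ranking)  # ['radio', 'batteries', 'calculator']
```

On this toy data the radio now outranks batteries for a flashlight buyer, because batteries are spread across all eight baskets while the radio is concentrated among flashlight buyers.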

On the other hand, if your goal is prediction, and not recommendation, then your original solution is the right choice. It is accurate that more people who buy flashlights will buy batteries than emergency radios, and it is accurate that they will all buy candy as well (in the example above).
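
For the prediction reading, one hedged way to turn the counts into numbers is the observed fraction of buyers of one item who also bought another. This is only a sketch, under the assumption that the five visitors in the question are representative; `buyers` and `cond` are illustrative names.

```python
import numpy as np

items = ["item1", "item2", "item3", "item4"]

# Purchase matrix from the question (rows: visitors A..E)
B = np.array([
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 1],
])

cooc = B.T @ B              # the diagonal holds the number of buyers of each item
buyers = np.diagonal(cooc)  # [3 2 2 3]

# cond[i, j]: fraction of the buyers of item i who also bought item j
cond = cooc / buyers[:, None]
np.fill_diagonal(cond, 0.0)

# Observed fractions for visitors who bought item3
for name, frac in zip(items, cond[2]):
    print(f"{name}: {frac:.2f}")
```

In this sample, half of the item3 buyers also bought item1 and all of them bought item4. With only five visitors these fractions are very rough estimates.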