Probabilistic Considerations on the Calculation of Shannon's Entropy in Network Traffic


I have a network dump in PCAP format (dump.pcap) and I am trying to compute the entropy of the Source and Destination IPs.

I am using the following Python code:

import numpy as np
import collections

sample_ips = [
    "131.084.001.031",
    "131.084.001.031",
    "131.284.001.031",
    "131.284.001.031",
    "131.284.001.000",
]

# Count how often each distinct IP address occurs.
C = collections.Counter(sample_ips)
counts = np.array(list(C.values()), dtype=float)
# Empirical probability of each distinct address.
prob = counts / counts.sum()
# Shannon entropy in bits (base-2 logarithm).
shannon_entropy = -(prob * np.log2(prob)).sum()
print(shannon_entropy)
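
To make the setup concrete, here is a minimal sketch of how the same calculation could be applied separately to source and destination addresses over fixed-size windows of packets. It assumes the packets have already been parsed from the capture into (source, destination) tuples; the function names, the window size, and the sample packets are hypothetical and only illustrate the idea.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Empirical Shannon entropy (in bits) of a sequence of hashable items."""
    counts = Counter(values)
    total = sum(counts.values())
    # p * log2(1/p) summed over the distinct values.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def windowed_entropy(packets, window_size):
    """Return a list of (src_entropy, dst_entropy), one per full window.

    `packets` is a list of (src_ip, dst_ip) tuples; a trailing partial
    window is dropped.
    """
    result = []
    for start in range(0, len(packets) - window_size + 1, window_size):
        window = packets[start:start + window_size]
        src = [p[0] for p in window]
        dst = [p[1] for p in window]
        result.append((shannon_entropy(src), shannon_entropy(dst)))
    return result


# Hypothetical packets (source IP, destination IP), not taken from the captures.
packets = [
    ("10.0.0.1", "192.168.1.5"),
    ("10.0.0.2", "192.168.1.5"),
    ("10.0.0.3", "192.168.1.5"),
    ("10.0.0.4", "192.168.1.5"),
]
print(windowed_entropy(packets, window_size=4))
```

Computing one value per window, rather than one value per capture, gives a series of entropy measurements for each capture that can then be compared.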

I have two network traffic captures from a lab experiment: one free of attacks (semAtaque.pcap) and another with DDoS attacks (Attacks.pcap).

The IP address of the server that was attacked is 192.168.1.5.

When calculating the entropy this way, some questions arise:

1. I am assuming a discrete probability distribution and equally likely outcomes. Is that correct? Is it reasonable?

2. How can I validate the experiment? I am thinking of a hypothesis test with the following null hypothesis: "The entropy value allows you to detect the attack." Is that OK?

What would be a good hypothesis test for this case (the sample space is about 40)?
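
To illustrate the kind of test I have in mind, here is a minimal sketch of a two-sided permutation test on the difference in mean entropy between the two captures. It is only one possible choice, not necessarily the right one. It assumes the null hypothesis is stated in the usual "no difference" form (the entropy values from both captures come from the same distribution), and that each capture has already been reduced to a list of per-window entropy values. The function name and the numbers below are hypothetical, not measured data.

```python
import random
from statistics import mean


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference of means of a and b.

    Returns the observed difference mean(a) - mean(b) and an estimated p-value.
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled values to the two groups.
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        # Small tolerance guards against floating-point rounding in the means.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical per-window entropy values (bits), not measured data.
normal_entropy = [3.1, 3.0, 3.2, 2.9, 3.1, 3.0]
attack_entropy = [1.2, 1.0, 1.4, 1.1, 1.3, 0.9]

observed, p_value = permutation_test(normal_entropy, attack_entropy)
print(f"difference in mean entropy: {observed:.3f}, p-value: {p_value:.4f}")
```

A permutation test makes no assumption about the shape of the entropy distribution, which seems attractive with only about 40 values, but it does assume the windows are exchangeable under the null hypothesis.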