Assume we have the following sequence of states:
A-B-C-D
A-B-D-E
A-C-E
A-B-C-E
A-B
B-E
A-D-E
A-C
where the input states are A and B, the output states are D, E, B and C, and the transitions between states are A-B, B-C, C-D, B-D, etc.
So I can count how often each state occurs and derive its probability, for example:
input state
A occurs 7 times out of 8, thus the probability of input state A is: (7*100)/8 = 87.5%
transition state
A->B occurs 4 times, so its probability is (4*100)/8 = 50%.
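The counting above can be sketched in Python as follows. This is only a minimal sketch: it assumes each sequence is stored as a hyphen-separated string, and the names `sequences`, `start_counts` and `transition_counts` are made up for illustration.

```python
from collections import Counter

# The eight example sequences from above, split into lists of states.
sequences = [s.split("-") for s in [
    "A-B-C-D", "A-B-D-E", "A-C-E", "A-B-C-E",
    "A-B", "B-E", "A-D-E", "A-C",
]]

# Input state = first state of each sequence.
start_counts = Counter(seq[0] for seq in sequences)

# Transitions = every pair of adjacent states within a sequence.
transition_counts = Counter(
    pair for seq in sequences for pair in zip(seq, seq[1:])
)

n = len(sequences)
p_start_a = 100 * start_counts["A"] / n   # 87.5
a_to_b = transition_counts[("A", "B")]    # 4

print(p_start_a, a_to_b)
```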
However, I am not sure how to handle repeated states correctly, for example:
A-B-C-C-C-C-C-D
A-B-D-E
A-C-E
A-B-C-C-C-C-E
A-B-C-C
B-E
A-D-E
A-C
In this case the transition C->C occurs 8 times, which gives a probability of (8*100)/8 = 100%? IMHO that does not make sense, so I'm obviously doing something wrong.
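To see where those numbers come from, C->C can be counted in two ways: as total occurrences, and as the number of sequences that contain it at least once. A minimal sketch, with hypothetical variable names:

```python
# The eight sequences with repeated states, split into lists of states.
repeated = [s.split("-") for s in [
    "A-B-C-C-C-C-C-D", "A-B-D-E", "A-C-E", "A-B-C-C-C-C-E",
    "A-B-C-C", "B-E", "A-D-E", "A-C",
]]

# Adjacent state pairs, kept separately for each sequence.
pairs_per_seq = [list(zip(seq, seq[1:])) for seq in repeated]

# Total occurrences of C->C across all sequences.
occurrences = sum(pairs.count(("C", "C")) for pairs in pairs_per_seq)

# Number of sequences that contain C->C at least once.
containing = sum(("C", "C") in pairs for pairs in pairs_per_seq)

print(occurrences, containing, len(repeated))  # 8 3 8
```

So the 8 in the numerator counts occurrences while the 8 in the denominator counts sequences, which is presumably why the ratio comes out at 100%.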
UPDATE
I'm trying to implement the ideas from this paper, section III in particular. I've written a script that parses a large number of pcap files containing TLS sessions; its output is a list of TLS session states, a kind of graph of states.
Now, the above mentioned paper says:
The transition probability between states is derived from frequencies observed in the sequences [...]
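One common reading of that sentence (an assumption on my part, the paper may define it differently) is the usual first-order Markov chain estimate: the probability of X->Y is the number of X->Y transitions divided by the number of all transitions leaving X. A minimal sketch in plain Python, where `transition_probabilities` is a hypothetical helper and not a library function:

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next | current) by normalising counts per source state."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1

    probs = {}
    for current, nxt_counts in counts.items():
        total = sum(nxt_counts.values())  # all transitions leaving `current`
        probs[current] = {nxt: c / total for nxt, c in nxt_counts.items()}
    return probs


# The example with repeated states, as lists of states.
sessions = [s.split("-") for s in [
    "A-B-C-C-C-C-C-D", "A-B-D-E", "A-C-E", "A-B-C-C-C-C-E",
    "A-B-C-C", "B-E", "A-D-E", "A-C",
]]

probs = transition_probabilities(sessions)
print(round(probs["C"]["C"], 3))  # 0.727, i.e. 8 out of 11 transitions leaving C
```

Under that assumption the outgoing probabilities of each state sum to 1. A sequence that ends in a state contributes nothing to that state's row; adding an explicit end marker to every sequence would be one way to model that.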
So the transition probability is what I need to compute. If there is already a Python library that does this for me, that would be ideal :)
I would appreciate it if someone could shed some light on this. Thanks!