Is there a version of the typical set defined for non-i.i.d. sequences?


For some context, I recently came across the definition of the typical set in information theory, and I had a feeling that it could be a useful tool for analyzing large language models (LLMs). The issue is that the typical set is typically (no pun intended) defined for i.i.d. sequences, where $P(X_1, X_2, \dots, X_n) = P(X_1)P(X_2)\cdots P(X_n)$. If we treat $(X_1, X_2, \dots, X_n)$ as the sequence of tokens generated by an LLM, this i.i.d. assumption doesn't hold, since each token depends on the tokens before it. Hence, I am wondering whether there is a counterpart of this definition for non-i.i.d. sequences, or whether it even makes sense to have such a definition in the first place.
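
For concreteness, the definition I have in mind is the standard (weakly) typical set for an i.i.d. source. The symbols $A_\epsilon^{(n)}$, $\epsilon > 0$, and the entropy $H(X)$ are my own notation, following the usual textbook convention:

$$A_\epsilon^{(n)} = \left\{ (x_1, \dots, x_n) : \left| -\frac{1}{n} \log P(x_1, \dots, x_n) - H(X) \right| \le \epsilon \right\}$$

For an autoregressive model, the joint probability instead factorizes by the chain rule:

$$P(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_1, \dots, x_{i-1})$$

So it is not obvious to me what should replace $H(X)$ in the definition above. An entropy rate, if one exists for such a process, seems like a natural candidate, but that is only a guess on my part.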