I'm trying to understand entropy and KL divergence. They make sense in a simple case such as a coin flip, but I struggle to wrap my head around them in more complicated cases where the information content is not a whole number.

I am trying to picture it as a binary tree, where $$ \log_2\left(\frac{1}{p(x)}\right) $$ is the depth of the leaf for outcome $x$. This would give the number of moves we have to take to reach that leaf from the root. However, if we have something like $$ p(x_1) = \frac{7}{8}, \quad p(x_2) = \frac{1}{8}, $$ I struggle to visualize the meaning, except from a purely functional point of view. How can I interpret the fact that observing $x_1$ gives us $$ \log_2\left(\frac87\right) \approx 0.193 $$ "bits" of information? Is there a way to visualize this, preferably in the style of binary tree codings?
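The numbers in the question can be reproduced with a short Python sketch (the helper name `surprisal_bits` is hypothetical, chosen for illustration). It also shows one property that may help with intuition: the surprisals of independent events add up, so seeing $x_1$ five times in a row has probability $(7/8)^5 \approx 0.513$ and carries about $5 \times 0.193 \approx 0.963$ bits, close to the 1 bit of a fair coin flip.

```python
import math


def surprisal_bits(p: float) -> float:
    """Self-information log2(1/p) of an event with probability p, in bits."""
    return math.log2(1 / p)


# Distribution from the question
p_x1, p_x2 = 7 / 8, 1 / 8

print(f"x1: {surprisal_bits(p_x1):.3f} bits")  # -> x1: 0.193 bits (common event, small surprisal)
print(f"x2: {surprisal_bits(p_x2):.3f} bits")  # -> x2: 3.000 bits (a leaf at depth 3)

# Independent events multiply probabilities, so their surprisals add:
# five x1's in a row carry 5 * 0.193 bits.
run = 5
p_run = p_x1 ** run
print(f"x1 {run} times: p = {p_run:.3f}, {surprisal_bits(p_run):.3f} bits")
# -> x1 5 times: p = 0.513, 0.963 bits
```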
2026-03-27 02:34:38
Intuitive interpretation of entropy
Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail), 111 views
Answer
Perhaps focusing on the definition of entropy as an expected value may help you.
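Remember that a continuous random variable (RV) $X$ with probability density $p(x)$ has an expected value given by $$ \langle X\rangle = \int{x\, p(x)\, dx}, $$ and, more generally, that any function $f(X)$ of it has expected value $$ \langle f(X)\rangle = \int{f(x)\, p(x)\, dx}. $$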
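By analogy, and referring to the definition of entropy (here I'm using the continuous case), one has $$ H = \langle -\log(p(x)) \rangle = -\int{p(x)\log(p(x))\,dx}. $$ For a discrete distribution such as the one in the question, the integral is replaced by a sum: $H = -\sum_i p(x_i)\log(p(x_i))$.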
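To make the "entropy is an expected value" reading concrete, here is a minimal Python sketch using the discrete distribution from the question (an assumption for this example; the answer itself uses the continuous form). It computes $H$ exactly as the probability-weighted sum of $\log_2(1/p(x))$, and also estimates it as the plain sample mean of $\log_2(1/p(X))$ over many random draws of $X$. The function names are hypothetical.

```python
import math
import random

# Discrete distribution from the question (assumed for this sketch)
probs = {"x1": 7 / 8, "x2": 1 / 8}


def entropy_bits(p: dict) -> float:
    """Exact H = sum_x p(x) * log2(1/p(x)), i.e. the expectation of the surprisal."""
    return sum(px * math.log2(1 / px) for px in p.values() if px > 0)


def sampled_entropy_bits(p: dict, n: int, seed: int = 0) -> float:
    """Estimate H as the sample mean of log2(1/p(X)) over n draws X ~ p."""
    rng = random.Random(seed)
    keys = list(p)
    draws = rng.choices(keys, weights=[p[k] for k in keys], k=n)
    return sum(math.log2(1 / p[x]) for x in draws) / n


print(f"exact H   = {entropy_bits(probs):.4f} bits")  # -> exact H   = 0.5436 bits
print(f"sampled H = {sampled_entropy_bits(probs, 100_000):.4f} bits")
```

By the law of large numbers the sampled value settles near the exact one as `n` grows; the exact value is $\frac78 \log_2\frac87 + \frac18 \log_2 8 \approx 0.5436$ bits.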
Now, what is the meaning of this unusual random variable, $-\log(p(x))$? First, note that since $-\log(p) = \log(1/p)$, we can write $H$ as $$ H = \left\langle \log\left(\frac{1}{p(x)}\right)\right\rangle $$
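Look at the expression above and think about the magnitude of $p$, and hence of $\log(1/p(x))$, in two extreme cases:
- a common event, with $p(x)$ close to $1$: then $\log(1/p(x))$ is close to $0$;
- a rare event, with $p(x)$ close to $0$: then $\log(1/p(x))$ is large. For example, with base-2 logarithms an event of probability $1/1024$ gives $\log_2 1024 = 10$ bits.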
Now think about the amount of information these two extreme cases carry: which one is more informative? The common event, which is ordinary; or the rare event, which by its very nature tells us that something unusual has happened?
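Conceptually, then, $-\log(p(x))$ can be seen as the amount of information carried by the event $x$. $H$ is therefore the average amount of information carried by the system: the integral (a sum, in the discrete case) averages this quantity over all events, weighting each by its probability $p(x)$. In the example from the question, the common outcome $x_1$ carries only $\log_2(8/7) \approx 0.193$ bits and the rare outcome $x_2$ carries $\log_2 8 = 3$ bits, so on average $H \approx \frac78(0.193) + \frac18(3) \approx 0.544$ bits.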
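One standard way to connect fractional bits back to the binary-tree picture in the question (this is the block-coding idea from source coding, not something stated in the answer above): a single outcome cannot sit at depth 0.193 in a tree, but if $n$ independent outcomes are encoded together as one block, the average leaf depth per outcome can get close to $H$. The Python sketch below builds a Huffman code for all $2^n$ blocks of the question's distribution and reports the average codeword length (average leaf depth) per outcome. The helper names are hypothetical. By the usual Huffman bound this value lies in $[H, H + 1/n)$.

```python
import heapq
import itertools
import math


def huffman_code_lengths(probs):
    """Codeword length (= leaf depth in the binary code tree) of each symbol
    in a Huffman code built from the given probabilities."""
    counter = itertools.count()  # tie-breaker so lists are never compared
    heap = [(p, next(counter), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    depth = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf in them one level deeper
        for i in s1 + s2:
            depth[i] += 1
        heapq.heappush(heap, (p1 + p2, next(counter), s1 + s2))
    return depth


def bits_per_symbol(p, n):
    """Average Huffman codeword length per outcome when n i.i.d. outcomes
    with P(x1) = p, P(x2) = 1 - p are encoded together as one block."""
    # A block with k copies of x2 has probability p^(n-k) (1-p)^k;
    # there are comb(n, k) such blocks.
    blocks = [p ** (n - k) * (1 - p) ** k
              for k in range(n + 1)
              for _ in range(math.comb(n, k))]
    lengths = huffman_code_lengths(blocks)
    return sum(q * l for q, l in zip(blocks, lengths)) / n


H = -(7 / 8) * math.log2(7 / 8) - (1 / 8) * math.log2(1 / 8)
for n in (1, 2, 4, 8):
    print(f"n = {n}: {bits_per_symbol(7 / 8, n):.4f} bits/outcome (H = {H:.4f})")
```

With $n = 1$ every outcome needs a whole bit; as $n$ grows, the very likely runs of $x_1$ get short codewords, and the cost per outcome drops toward $H \approx 0.544$ bits. In that averaged sense each $x_1$ "costs" a fraction of a bit.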