I have the following proof for the chain rule for entropy of random variables: We write:
\begin{eqnarray*} H(X_1,X_2,...,X_n)&=&-\sum\limits_{x_1,x_2,...,x_n}p(x_1,x_2,...,x_n)logp(x_1,x_2,...,x_n) \\&=&-\sum\limits_{x_1,x_2,...,x_n}p(x_1,x_2,...,x_n)log\prod\limits_{i=1}^{n}p(x_i|x_{i-1},...,x_1) \\&=&-\sum\limits_{x_1,x_2,...,x_n}\sum\limits_{i=1}^n p(x_1,x_2,...,x_n)logp(x_i|x_{i-1},...,x_1) \\&=&-\sum\limits_{x_1,x_2,...,x_i}\sum\limits_{i=1}^n p(x_1,x_2,...,x_n)logp(x_i|x_{i-1},...,x_1) \\&=&\sum\limits_{i=1}^n H(X_i|X_{i-1},...,X_1) \end{eqnarray*}
Basically, I have 2 questions: first how we get from line 1 to line 2. Is it some kind of Markov property? And second, why the summation lower boundary changes in line 4?
The first line is just conditioning: \begin{align} &p(x_1,x_2) = p(x_1)p(x_2|x_1)\\ &p(x_1,x_2, x_3) = p(x_1)p(x_2|x_1)p(x_3|x_1,x_2) \end{align} and in general: $$ p(x_1, ..., x_n) = p(x_1)\prod_{i=2}^np(x_i|x_{i-1}, ..., x_1) = \prod_{i=1}^n p(x_i|x_{i-1}, ...) $$
I agree the third-to-fourth line seems unclear (in fact, it is incorrect as they have a typo, they should have gotten rid of all variables $x_{i+1}, .., x_n$). I would go from the third to fourth line in the following steps: \begin{align} &\sum_{x_1, ..., x_n} \sum_{i=1}^np(x_1, ..., x_n)\log[p(x_i|x_{i-1},...)]\\ &=\sum_{i=1}^n\sum_{x_1, ..., x_i}\sum_{x_{i+1}, ... ,x_n} p(x_1,...,x_n)\log[p(x_i|x_{i-1}, ...)]\\ &=\sum_{i=1}^n\sum_{x_1, ..., x_i}\log[p(x_i|x_{i-1},...)]\sum_{x_{i+1},...,x_n}p(x_1,...,x_n)\\ &=\sum_{i=1}^n\underbrace{\sum_{x_1,...,x_i}\log[p(x_i|x_{i-1},...)]p(x_1, ..., x_i)}_{-H(X_i|X_{i-1}, ...)} \end{align} This uses the idea that we can sum out a subset of the variables: $$ \sum_{x_{i+1}, ..., x_n} p(x_1, ..., x_n) = p(x_1, ..., x_i) $$