On p. 193 of Cover and Thomas's *Elements of Information Theory*, a discrete memoryless channel (DMC) is defined by the triplet $(\mathcal{X}, p(y \mid x), \mathcal{Y})$, and its $n$th extension is defined as the channel $\left(\mathcal{X}^n, p\left(y^n \mid x^n\right), \mathcal{Y}^n\right)$, where $$ p\left(y_k \mid x^k, y^{k-1}\right)=p\left(y_k \mid x_k\right), \quad k=1,2, \ldots, n. $$
It is further defined that the channel is used without feedback if $p\left(x_k \mid x^{k-1}, y^{k-1}\right) =p\left(x_k \mid x^{k-1}\right)$.
Then, the author claims that for a DMC without feedback, the $p\left(y^n \mid x^n\right)$ distribution is reduced into the following:
$$p\left(y^n \mid x^n\right)=\prod_{i=1}^n p\left(y_i \mid x_i\right)$$
I don't get this. My attempt so far: by the chain rule, which holds for any distribution, $$p\left(y^n \mid x^n\right) = p\left(y_1 \mid x^n\right)p\left(y_2 \mid x^n, y^1\right)\cdots p\left(y_n \mid x^n, y^{n-1}\right).$$ If we could show that each factor satisfies
$$p\left(y_k \mid x^n, y^{k-1}\right) = p\left(y_k \mid x^k, y^{k-1}\right),$$
then the right-hand side would reduce to $p\left(y_k \mid x_k\right)$ by the definition of the DMC. So I tried to derive this equality from the no-feedback condition, and have not succeeded. Am I missing something obvious?
Using the chain rule and then the memoryless condition with $k = n$ (note that for $k = n$ we condition on all of $x^n = x^k$, so the definition applies directly), we have $$ p(y^n|x^n) = p(y_n|x^n, y^{n-1})\, p(y^{n-1}|x^n) = p(y_n|x_n)\, p(y^{n-1}|x^n).$$
Now, $$p(y^{n-1}|x^n) = \frac{p(y^{n-1},x_n|x^{n-1})}{p(x_n|x^{n-1})} \\= \frac{p(y^{n-1}|x^{n-1})\, p(x_n|x^{n-1},y^{n-1})}{p(x_n|x^{n-1})} = p(y^{n-1}|x^{n-1}),$$ where the first step is the definition of conditional probability, the numerator is then expanded by the chain rule, and the no-feedback condition gives the final equality. This yields $$ p(y^n|x^n) = p(y_n|x_n)\, p(y^{n-1}|x^{n-1}).$$ Iterating this identity leads to the decomposition in the question.
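One way to convince yourself numerically: the sketch below (a hypothetical binary channel and input rules, not taken from the book) builds the joint $p(x^n, y^n)$ from the general chain rule $\prod_k p(x_k \mid x^{k-1}, y^{k-1})\, p(y_k \mid x_k)$ and then checks whether $p(y^n \mid x^n)$ equals $\prod_i p(y_i \mid x_i)$, both with and without feedback.

```python
import itertools

# Hypothetical binary channel matrix p(y|x), chosen for illustration.
P = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
n = 3
SEQS = list(itertools.product([0, 1], repeat=n))

def joint(input_rule):
    """Build p(x^n, y^n) = prod_k p(x_k|x^{k-1},y^{k-1}) p(y_k|x_k),
    with the input law supplied by input_rule."""
    probs = {}
    for xs in SEQS:
        for ys in SEQS:
            p = 1.0
            for k in range(n):
                p *= input_rule(xs[:k], ys[:k], xs[k]) * P[xs[k]][ys[k]]
            probs[(xs, ys)] = p
    return probs

def max_gap(probs):
    """Max |p(y^n|x^n) - prod_i p(y_i|x_i)| over all sequence pairs."""
    worst = 0.0
    for xs in SEQS:
        p_xn = sum(probs[(xs, ys)] for ys in SEQS)  # marginal p(x^n)
        for ys in SEQS:
            prod = 1.0
            for xk, yk in zip(xs, ys):
                prod *= P[xk][yk]
            worst = max(worst, abs(probs[(xs, ys)] / p_xn - prod))
    return worst

# No feedback: p(x_k | x^{k-1}, y^{k-1}) = p(x_k | x^{k-1}), a Markov input.
def no_feedback(x_past, y_past, xk):
    if not x_past:
        return 0.5
    return 0.7 if xk == x_past[-1] else 0.3

# With feedback: the next input tends to copy the previous output.
def with_feedback(x_past, y_past, xk):
    if not y_past:
        return 0.5
    return 0.9 if xk == y_past[-1] else 0.1

gap_no_fb = max_gap(joint(no_feedback))
gap_fb = max_gap(joint(with_feedback))
print(gap_no_fb)  # ~0: the factorization holds without feedback
print(gap_fb)     # clearly positive: with feedback it fails
```

Note that the input distribution here has memory ($x_k$ depends on $x_{k-1}$), yet the conditional $p(y^n \mid x^n)$ still factorizes; it is dependence of the inputs on past *outputs* that breaks the product form.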