In the standard and simplest turbo decoding scheme, each of the two decoders (say $A$) is fed at iteration $n$ with some "a priori" information for each bit value (as a log-likelihood $L_A^{(n)}$) plus some channel information $L_{Ac}$ (the measured channel output together with an assumed channel statistical model), and combines that input with the code structure to compute an "a posteriori" bit information $L_{A|Y}^{(n)}$.
We call "extrinsic information" the information difference: $L_{Ae}^{(n)}=L_{A|Y}^{(n)}-L_{A}^{(n)}$
In the next iteration, the other decoder ($B$) will do the same. The point is that we want the new "a priori" $L_B^{(n+1)}$ to include the information obtained by the other decoder in the previous iteration. At first sight, it might seem plausible to take $L_B^{(n+1)} = L_{A|Y}^{(n)}$ (the a priori information of one decoder is the a posteriori information of the other, with bit interleaving implied), but this is not so. The correct thing to do is to feed $B$ with the "extrinsic information": $L_B^{(n+1)} = L_{Ae}^{(n)}$.
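As a toy illustration of this LLR bookkeeping (the `decoder_posterior` stand-in below is purely illustrative — a real SISO decoder would run BCJR/MAP over the code trellis, and interleaving is omitted), one iteration of the exchange might be sketched as:

```python
import numpy as np

def decoder_posterior(prior_llr, channel_llr):
    # Toy stand-in for a SISO decoder: a real decoder would exploit the
    # code's trellis structure; here we just add the inputs, which is
    # enough to show how the LLRs are combined and subtracted.
    return prior_llr + channel_llr          # L_{A|Y}

rng = np.random.default_rng(0)
channel_A = rng.normal(size=4)              # L_{Ac}: channel LLRs seen by A
channel_B = rng.normal(size=4)              # channel LLRs seen by B

# Correct scheme: pass the extrinsic information.
prior_A = np.zeros(4)                       # L_A^{(0)}: no a priori at start
for n in range(5):
    post_A = decoder_posterior(prior_A, channel_A)   # L_{A|Y}^{(n)}
    ext_A = post_A - prior_A                         # L_{Ae}^{(n)}
    prior_B = ext_A                                  # B's new a priori
    post_B = decoder_posterior(prior_B, channel_B)
    ext_B = post_B - prior_B
    prior_A = ext_B                                  # back to A
# The LLRs stay bounded: each decoder only ever receives the other's
# new information, never its own back again.

# Wrong scheme: pass the full posterior as the new a priori.
prior_A_bad = np.zeros(4)
for n in range(5):
    post_A = decoder_posterior(prior_A_bad, channel_A)
    prior_B_bad = post_A                    # wrong: includes B's own info
    post_B = decoder_posterior(prior_B_bad, channel_B)
    prior_A_bad = post_B
# The same channel evidence is counted once per iteration, so the LLR
# magnitudes grow linearly without any new observations.
```

With this additive toy decoder the double counting is visible directly: after 5 iterations the "posterior" of the wrong scheme is exactly 5 times the total channel evidence.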
To gain some insight into why the recipe "the a priori information of one iteration is the a posteriori information of the previous" is wrong, we can imagine the following general (and absurd) Bayesian learning scheme. We want to estimate some input $X$ given some output $Y$ and some a priori $p(X)$. Then, of course, we compute the a posteriori $p(X \mid Y)=p(Y \mid X)\, p(X)/p(Y)$. Done. Now, suppose someone tells us: "Given that $p(X \mid Y)$ reflects our best current knowledge about $X$, why not repeat the above computation using $p(X \mid Y)$ as our new a priori, and iterate? By using an improved prior, we'll obtain an improved posterior..." We would answer: that would make sense only if we had a totally new set of observations (indeed, that's Bayesian learning), but it would be totally wrong to apply it to the same observations! We want good priors, of course; but we want them to be truly prior, otherwise the Bayes formula is invalid and the posterior will be nonsense.
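This double counting can be made concrete with a small conjugate coin-flip example (a hypothetical illustration, not part of the turbo scheme): if we reuse the posterior as the prior on the *same* 10 flips, the distribution keeps narrowing as if each pass had brought fresh data.

```python
from math import sqrt

def beta_update(a, b, heads, tails):
    # Conjugate Bayes update for a Beta(a, b) prior on a coin's bias,
    # given `heads` and `tails` observed flips.
    return a + heads, b + tails

def beta_std(a, b):
    # Standard deviation of a Beta(a, b) distribution.
    mean = a / (a + b)
    return sqrt(mean * (1 - mean) / (a + b + 1))

heads, tails = 7, 3          # ONE fixed data set: 10 flips total
a, b = 1.0, 1.0              # flat prior

# The absurd scheme: feed the posterior back as the prior and
# re-apply Bayes to the very same observations, 10 times over.
for _ in range(10):
    a, b = beta_update(a, b, heads, tails)

# The legitimate posterior after one update is Beta(8, 4); after 10
# reuses we hold Beta(71, 31), whose spread has shrunk as if we had
# observed 100 independent flips instead of 10.
```

The mean barely moves, but the spurious confidence grows without bound, which is exactly the pathology the extrinsic-information subtraction prevents.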
In general, there's a basic and important principle (the "turbo principle", also relevant for LDPC codes) that, informally, says something like this: in this kind of "belief network", in which nodes (here the two decoders) iteratively compute some global function by locally passing messages among neighbouring nodes, node $A$ should pass to $B$ all the information that $A$ has obtained, excluding the information that came (directly or indirectly) from $B$ itself.
In the turbo code case: it would be wrong to pass the a posteriori $L_{A|Y}^{(n)}=L_{Ae}^{(n)} +L_{A}^{(n)}$ to decoder $B$ and tell it "use this as your new a priori", because that includes data that came from $B$ itself (the previous a priori $L_{A}^{(n)}$). The correct thing to do is to pass the difference, $L_{Ae}^{(n)}$ ("all the information I have, excluding what you already passed to me before").
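In the log-likelihood domain this exclusion rule takes a particularly simple form (a minimal sketch; the node and neighbour names are illustrative, and a node with only one neighbour, as in the two-decoder case, reduces to the posterior-minus-prior subtraction above):

```python
def outgoing_messages(local_evidence, incoming):
    # A node's total belief (in the log domain) is its own evidence plus
    # everything its neighbours sent. The message it sends to each
    # neighbour is that total MINUS what that neighbour itself sent,
    # so no neighbour ever gets its own information echoed back.
    total = local_evidence + sum(incoming.values())
    return {nbr: total - msg for nbr, msg in incoming.items()}

msgs = outgoing_messages(2.0, {"B": 0.5, "C": -1.0})
# message to B omits B's own 0.5; message to C omits C's own -1.0
```

This "subtract the recipient's contribution" bookkeeping is the same arithmetic that turns the a posteriori $L_{A|Y}^{(n)}$ into the extrinsic $L_{Ae}^{(n)}$ before it is handed to the other decoder.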