I think I'm confused about something very basic with the definition of expected value. Consider the question: "When rolling a die repeatedly, which takes fewer rolls on average to get: consecutive $4,5,6$ or consecutive $6,6,6$?"
I understand the following intuitive argument why $4,5,6$ should be obtained faster: if we fail after getting $4,5$, our failure could still be a $4$ that starts a new chain, whereas if we fail after getting $6,6$, our failure cannot be a $6$ and we need all three values again.
More formally, if $X$ is the random variable that measures the number of tries until first getting $4,5,6$, and $Y$ is the same for $6,6,6$, I think it's true that for any given $N$, $P(X=N) > P(Y=N)$. For $X=N$ to be true, we need throws $N-2,N-1,N$ to be $4,5,6$ and then also that no $4,5,6$ occurs before that, which disqualifies a number of sequences. But in the parallel case $Y=N$ more sequences are disqualified, because for instance throw $N-3$ cannot be $6$ at all - the winning triple helps disqualify more sequences.
So far this seems logical: if $P(X=N) > P(Y=N)$, that just says that for any given $N$ it's more likely that $4,5,6$ first occurs at $N$ than that $6,6,6$ does. But the formal definition of expected value is $E_X=\sum_N N \cdot P(X=N)$, and if $P(X=N)>P(Y=N)$ for each $N$, doesn't it follow automatically that $E_X>E_Y$, i.e. that $4,5,6$ takes longer on average, contradicting the intuition above? What am I missing?
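I think your assumption is in fact incorrect. If $P(X = N) \ge P(Y = N)$ for every $N$, then $P(X = N) = P(Y = N)$ for every $N$, since both sets of probabilities have to sum to $1$. So a strict inequality for every $N$ is impossible; for example, $P(X = 3) = P(Y = 3) = 1/216$. To compare the expected values, try to formulate the problem as a Markov chain.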
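Here is a rough sketch of the Markov chain approach, as an illustration only. The state is the number of most recent rolls that match the start of the target pattern, and the expected hitting time comes from solving the usual linear equations exactly. The function names `build_transitions`, `expected_rolls` and `first_hit_pmf` are made up for this example, and a fair six-sided die is assumed.

```python
from fractions import Fraction


def build_transitions(pattern, faces=6):
    """trans[k][face-1] = new state after rolling `face` in state k.

    State k means the last k rolls equal the first k entries of `pattern`
    (and no longer prefix is matched). State len(pattern) means "done".
    """
    m = len(pattern)
    trans = []
    for k in range(m):
        row = []
        for face in range(1, faces + 1):
            seq = pattern[:k] + (face,)
            # Longest suffix of seq that is also a prefix of pattern.
            j = len(seq)
            while j > 0 and seq[len(seq) - j:] != pattern[:j]:
                j -= 1
            row.append(j)
        trans.append(row)
    return trans


def expected_rolls(pattern, faces=6):
    """Expected number of rolls until `pattern` first appears (exact)."""
    m = len(pattern)
    trans = build_transitions(pattern, faces)
    p = Fraction(1, faces)
    # Augmented matrix for e_k = 1 + sum_face p * e_next, with e_m = 0.
    A = [[Fraction(0)] * (m + 1) for _ in range(m)]
    for k in range(m):
        A[k][k] += 1
        A[k][m] = Fraction(1)
        for nxt in trans[k]:
            if nxt < m:
                A[k][nxt] -= p
    # Gauss-Jordan elimination over exact fractions.
    for col in range(m):
        pivot = next(r for r in range(col, m) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [x / pv for x in A[col]]
        for r in range(m):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return A[0][m]  # expected rolls starting from state 0


def first_hit_pmf(pattern, n_max, faces=6):
    """Return {n: P(pattern first completes on roll n)} for n = 1..n_max."""
    m = len(pattern)
    trans = build_transitions(pattern, faces)
    p = Fraction(1, faces)
    dist = [Fraction(0)] * m  # probability of each non-final state
    dist[0] = Fraction(1)
    pmf = {}
    for n in range(1, n_max + 1):
        new = [Fraction(0)] * m
        hit = Fraction(0)
        for k in range(m):
            for nxt in trans[k]:
                if nxt == m:
                    hit += dist[k] * p
                else:
                    new[nxt] += dist[k] * p
        pmf[n] = hit
        dist = new
    return pmf


if __name__ == "__main__":
    print(expected_rolls((4, 5, 6)))  # 216
    print(expected_rolls((6, 6, 6)))  # 258
    px, py = first_hit_pmf((4, 5, 6), 5), first_hit_pmf((6, 6, 6), 5)
    for n in range(3, 6):
        print(n, px[n], py[n])
```

Under these assumptions the expected number of rolls is $216$ for $4,5,6$ and $258$ for $6,6,6$. The two distributions agree at $N=3$ (both $1/216$), $4,5,6$ is more likely at $N=4$ ($1/216$ versus $5/1296$), and $6,6,6$ becomes the more likely one for sufficiently large $N$, as it must since both distributions sum to $1$.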