I'm trying to solve puzzle. Can someone please check my solution? Thank you!
I have four cards with values $1,2,3,4$ faced down. Each turn, I can either flip a card or do nothing. If I flip a card, I get the face value. If I don't, I get $2.5$. I have four turns. What's the optimal strategy (assuming you're risk neutral) and what is its expected value?
This is my solution. Let $V(a,b,c,d)$ denote the value at each state. The arguments of $V$ can only take zero or one, indicating whether a card is unflipped. For example, $V(1,0,0,0)$ is the value where the card with value $1$ is unflipped and $V(0,0,0,1)$ is the value where the card with value $4$ is unflipped.
I then work backwards. During my last turn, if the card that's unflipped is $1$ or $2$, I won't flip it. If it's $3$ or $4$, I will flip it. Then $V(1,0,0,0) = 9 + 2.5 = 11.5$, $V(0,1,0,0) = 10.5$, and $V(0,0,1,0) = V(0,0,0,1) = 10$.
It follows
$$V(1,1,0,0) = \max\left\{ \frac{1}{2} \left(V(1,0,0,0) + V(0,1,0,0)\right), 12\right\}$$
where the first argument in the max operator is the value of flipping and the second argument the value of not flipping.
I can back out $V(1,1,1,1)$ using this logic, until I realize I haven't proved that it's never optimal to stay input one round and then flip cards the next round.
Can I get a hint on this?
Let $U$ denote the set of cards (e.g., $U=\{1,2,3,4\}$). Let $X$ denote the subset of cards that have been flipped. Let $v(t,X)$ be the value of playing the game with $t$ turns remaining. Let $r$ be the reward for not flipping a card (e.g., $r=2.5$). Trivially, $v(0,X)=0$ for any $X$. Moreover, if $t>0$, $$ v(t,X)=\max\left\{ \frac{1}{\left|U\setminus X\right|}\left(\sum_{x\in U\setminus X}v(t-1,X\cup\{x\})+x\right),v(t-1,X)+r\right\} . $$
Remark. The above is understood w.r.t. the conventions $\sum_{x \in \emptyset} f(x) = 0$ and $0 / 0 = 0$.
This, with some simplifications, is implemented in the code below.