Two robots are operating in a factory to bring metal bars to two different production halls. The metal bars are dispensed in one place, only one bar can be picked up at a time, and each robot can only carry one bar at a time. Once a metal bar is picked up, a new one will appear at the dispenser with probability 0.5 every time step (every action taken corresponds to one time step). Each robot has 3 action choices. It can either try to pick up a metal bar, deliver it to the production hall, or wait. If it tries to pick up a metal bar, it will succeed with probability 0.5 (due to imprecisions in its programming) if there is a metal bar available and fail if there is none available. If it tries to deliver a metal bar to the production hall, it will succeed with probability 1 if it is holding a metal bar and fail otherwise. If it decides to wait it will stay in place. If both robots try to pick up a metal bar at the same time, they will both fail. Each robot receives a payoff of 4 if it successfully delivers a metal bar to the production hall and incurs a cost of 1 if it tries to pick up a metal bar or if it tries to deliver one to the production hall (reflecting the energy it uses up). The wait action does not incur a cost.
I am looking to draw this scenario as a state machine but I am not able to come up with anything. Any help would be appreciated.
So what you'll be looking at is a Markov decision process (MDP) with eight states ($S_n$). A Markov decision process looks like so: states are connected by actions, and each action leads to a next state with some probability.
Each robot ($R_n$) has two possible states: it either has a bar ($S_1$) or it doesn't ($S_0$). The dispenser (denoted by $M$, since $D$ is used for delivery) has two possible states as well: it has either created a bar ($S_1$) or not ($S_0$). Each robot can perform three actions: pick up a bar ($B$), deliver a bar ($D$), or wait ($W$). The dispenser has just one action, making a new bar ($N$).
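I sketched out these decision processes individually. For a robot, a successful $B$ takes it from $S_0$ to $S_1$, a $D$ takes it from $S_1$ back to $S_0$, and $W$ leaves it where it is. For the dispenser, $N$ takes it from $S_0$ to $S_1$.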
Unfortunately, it's not that simple; in fact, it's a lot less simple. The complete process has eight states in total and nine actions per state. Each state can be written as a triple $(x,y,z)$, where $x$ is $1$ if $R_1$ has a bar and $0$ if it doesn't, $y$ is $1$ if $R_2$ has a bar and $0$ if it doesn't, and $z$ is $1$ if $M$ has made a new bar and $0$ if it hasn't. The states are:
$$(0,0,0),\ (0,0,1),\ (0,1,0),\ (0,1,1),\ (1,0,0),\ (1,0,1),\ (1,1,0),\ (1,1,1)$$
The possible actions are likewise combinations: pairs of the actions the two robots perform, in the format $(A(R_1), A(R_2))$, where $A(R_n)$ is the action performed by robot $R_n$:
$$(B,B),\ (B,D),\ (B,W),\ (D,B),\ (D,D),\ (D,W),\ (W,B),\ (W,D),\ (W,W)$$
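If it helps to check the counts before drawing, here is a minimal Python sketch that enumerates the states and joint actions and computes the immediate payoff of each robot. The names `STATES`, `ACTIONS`, `robot_reward` and `rewards` are my own, and I'm assuming the payoff of 4 and the cost of 1 simply add up, so a successful delivery nets 3.

```python
from itertools import product

ROBOT_ACTIONS = ("B", "D", "W")  # pick up bar, deliver bar, wait

# States (x, y, z): R1 has a bar, R2 has a bar, dispenser has a bar.
STATES = list(product((0, 1), repeat=3))
# Joint actions (A(R1), A(R2)).
ACTIONS = list(product(ROBOT_ACTIONS, repeat=2))


def robot_reward(has_bar, action):
    """Immediate payoff for one robot (assumption: payoff and cost add up)."""
    if action == "W":
        return 0          # waiting is free
    reward = -1           # every pick-up or delivery attempt costs 1
    if action == "D" and has_bar:
        reward += 4       # delivery always succeeds when holding a bar
    return reward


def rewards(state, joint_action):
    """Payoffs (R1, R2) for a joint action taken in a given state."""
    x, y, _ = state
    a1, a2 = joint_action
    return robot_reward(x, a1), robot_reward(y, a2)


print(len(STATES), len(ACTIONS))         # 8 9
print(rewards((1, 0, 1), ("D", "B")))    # (3, -1)
```

The payoff depends only on the current state and the chosen actions, so it can be written directly on each action node of the drawing.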
So the entire process is pretty large ($8 \times 9 = 72$ state-action pairs to draw), but not all that complex. It's simply a matter of drawing it all out.
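To get the edge labels for the drawing, something like the following sketch could generate the transition probabilities for every state and joint action. The problem statement leaves a few things open, so this rests on assumptions of mine: a robot that already holds a bar cannot pick up another one, a bar that is picked up is not replaced within the same time step, and an empty dispenser gets a new bar with probability 0.5 each step. The function name `transitions` is hypothetical too.

```python
from collections import defaultdict
from itertools import product


def transitions(state, joint_action):
    """Return {next_state: probability} for one time step.

    Assumptions (not fixed by the problem statement): a robot holding a
    bar cannot pick up another, and a picked-up bar is not replaced in
    the same step.
    """
    x, y, z = state
    a1, a2 = joint_action

    # Delivery succeeds with probability 1 when the robot holds a bar.
    nx = 0 if (a1 == "D" and x) else x
    ny = 0 if (a2 == "D" and y) else y

    # At most one robot can have a real chance of picking up the bar:
    # if both try at the same time, both fail.
    picker = None
    if z == 1 and not (a1 == "B" and a2 == "B"):
        if a1 == "B" and x == 0:
            picker = 1
        elif a2 == "B" and y == 0:
            picker = 2

    out = defaultdict(float)
    if picker is not None:
        # The pick-up succeeds with probability 0.5 and empties the dispenser.
        success = (1, ny, 0) if picker == 1 else (nx, 1, 0)
        out[success] += 0.5
        out[(nx, ny, 1)] += 0.5
    elif z == 1:
        out[(nx, ny, 1)] += 1.0      # the bar stays at the dispenser
    else:
        out[(nx, ny, 1)] += 0.5      # a new bar appears
        out[(nx, ny, 0)] += 0.5      # the dispenser stays empty
    return dict(out)


print(transitions((0, 0, 1), ("B", "W")))  # {(1, 0, 0): 0.5, (0, 0, 1): 0.5}
print(transitions((0, 0, 1), ("B", "B")))  # {(0, 0, 1): 1.0}

# Sanity check: the probabilities out of every state-action pair sum to 1.
for s in product((0, 1), repeat=3):
    for a in product("BDW", repeat=2):
        assert abs(sum(transitions(s, a).values()) - 1.0) < 1e-12
```

Each entry of the returned dictionary corresponds to one arrow in the diagram, from the action node to the next state, labelled with its probability. If you read the rules differently, only the marked assumptions need to change.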