
I am trying to understand the solution, because I think I got it completely wrong. I wrote we could take the initial DFA and replace the normal transitions with epsilon transition except for all transitions to a state that is a multiple of 3, but I don't think that's what we did here.
It doesn't really make sense to ask for the state diagram. To draw a single specific diagram is impossible, because the specific diagram will depend on what $A$ looks like. But the construction here is completely general, and will work for any $A$. So you have to approach this more abstractly. (I did try drawing the state diagram for a simple example, and it came out correct, but I decided it would be more helpful to leave it out.)
There are several ideas working at once. The main idea is that machine $B$ can simulate what $A$ would have done given some string, and then if $B$'s simulated copy of $A$ accepts, $B$ can accept too.
The situation is a little bit complicated by the fact that $B$ doesn't know what input to give to its simulation of $A$. For example, should $B$ accept the string
cfi? Yes, if there is any string of the form $??\mathtt{c}??\mathtt{f}??\mathtt{i}$ or $??\mathtt{c}??\mathtt{f}??\mathtt{i}?$ or $??\mathtt{c}??\mathtt{f}??\mathtt{i}??$ that would be accepted by $A$. So $B$ needs to simulate $A$'s behavior on every possible string of that form, to find out if $A$ would accept any of them. Then if $A$ accepts any one of those, $B$ can acceptcfi, and otherwise not. Fortunately, $B$ is non-deterministic, and can run many computations at once, one for each possible string that $A$ might accept.Another way to think of this, that is sometimes easier, is to imagine that $B$ can make lucky guesses: if there is any set of guesses that would cause it to accept, it will make those guesses, and accept; it will reject the input only if there is no possible lucky guess that would result in it accepting.
Actually $B$ doesn't have to guess the entire input to $A$; it only has to guess two-thirds of it. The other one-third comes from its own real input. It will guess two characters, and feed those to its simulation of $A$, then read one character from its own real input and feed that to the simulation; then guess two more characters, read one more, and so on. When it reaches the end of the real input, it can guess between zero and two more characters, feed them to the simulation, and then see if the simulation is in an accepting state. If simulation of $A$ accepts the input that was fed to it by $B$, then $B$ should accept the real input, and otherwise not.
To keep track of whether it is feeding guesses or real characters to the simulation of $A$, the machine $B$ will have, as part of its finite control, a counter $c$, initially set to 1. When the counter is 1 or 2, $B$ should guess a character to give to the simulation, when the counter is 3 it should read a real input and give that to the simulation. Each time, after feeding the simulation, $B$ will increment the counter, from 1 to 2, or from 2 to 3, or, if the counter is already at 3, $B$ should reset it to 1. and start over. By using the counter $B$ can keep track of where it is in its guess-guess-read cycle.
So each state of $B$'s finite control has two parts. One part will record the state of the simulated $A$, and one part will record the value stored in the counter $c$. Each state of $B$ will look like $\def\state#1#2{\langle{#1},{#2}\rangle}\state qc$ where $q$ is the state of its simulation of $A$, and $c$ is the value of its counter. This is why the states of $B$ are elements of $Q\times \{1,2,3\}$. The $Q$ part is for storing a state of $A$, because $B$ is simulating what $A$ is doing, and the $\{1,2,3\}$ part is the set of possible values of the counter. Each state of $B$ has both of these at once, because at all times $B$ must remember both the state of its simulation of $A$ and the number it has stored in its counter $c$. (We actually need four possible values for the counter, but we'll come back to that later.)
Let's see now how $B$ can do its job of feeding two guesses and then one real input to its simulation of $A$. When $c=1$ or $c=2$, it should guess a character and adjust the simulation, without reading any real input, and then increment the counter. “Without reading any real input” means it makes an $\epsilon$-transition. It should guess a character, say $\def\c#1{{\mathtt{#1}}}\c x$, and make the appropriate transition in the simulated copy of $A$. So if $A$ had the transition $\delta(p, \c x) = q$, $B$ should have a transition $\rho(\state p1, \epsilon) = \state q2$. This transition corresponds to $B$ guessing a letter $\c x$, feeding it to $A$, thus causing the simulation of $A$ to make a transition from state $p$ to state $q$, and incrementing the counter from 1 to 2. When $B$ changes its state from $\state p1$ to $\state q2$, it is simulating that $A$ changed from state $p$ to state $q$, and also incrementing its counter.
This transition is an $\epsilon$-transition, because $B$ does all this without actually reading any real input. Similarly $B$ should also have a transition $\rho(\state p2, \epsilon) = \state q3$, which is the same, except this time the guess is being made when the counter is 2, and $B$ increments it to 3. These transitions are described in lines 1 and 2 of the definition you gave for the transition function $\rho$.
Now let's see how $B$ handles its real input. $B$ only reads a real character when its counter $c$ is equal to 3; that is, in states of the form $\state p3$. Say $B$ reads the character $\c x$. Then it should simulate what $A$ would do if fed $\c x$, and it should also reset its counter to 1. If $A$ goes from state $p$ to state $q$ on character $\c x$, then $B$ should adjust its simulation accordingly when it reads $\c x$ So whenever $A$ has a transition with $\delta(p,\c x)= q$, machine $B$ will have a transition $\rho(\state p3, \c x) = \state q1$. Here $B$ had changed the state of the simulated $A$ from $p$ to $q$, and reset the counter to 1. This is line 3 of the definition you have of the transition function $\rho$.
Where should $B$ start? It should initialize its simulation of $A$ without whatever $A$'s starting state is; that's $s_0$. And it should initialize its counter $c$ to 1. So $B$ starts in state $\state{s_0}1$. (The exam solution forgot to mention this, and should be penalized ten percent.)
When should $B$ accept? Exactly when its simulated copy of $A$ accepts! That is, $B$ should accept in all states $\state pc$ where $p\in F$. Or put another way, $B$ accepts on any state that is in the set $F \times \{1,2,3\}$.
So we have, if $A = \langle Q, \Sigma, \delta, s_0, F\rangle$ then $$B=\langle Q \times\{1,2,3\}, \Sigma, \rho, \state{s_0}1, F\times\{1,2,3\}\rangle,$$ where $\rho$ is as described above.
Now there is an important piece missing here, which is that the exam question allows the counter $c$ to take values $1,2,3,4$, not just $1,2,3$. Why does it do this? Here I'm on shaky ground, because you didn't actually show us what the question was!
I suspect that what happened is this: Suppose $L(A)$ contains the empty string. Should $L(B)$ necessarily contain the empty string also? After all, that's what you get when you take every third character from the empty string. I think your exam question probably specified that $\operatorname{third}(L(A))$ only includes every-third-character strings of nonempty strings from $L(A)$, or something like that.
If this is right, then the way this is handled by the exam solution is to add an extra set of copies of $\state p1$, called $\state p4$. The $\state p4$ states are basically the same as the $\state p1$ states, and indeed, $B$ can get from state $\state p4$ to state $\state p1$ without changing the simulated state of $A$ (which is $p$) and without reading any input. So whenever $B$ is in any state $\state p4$, it is also in state $\state p1$, and this $\state p4$ is almost the same as $\state p1$. The difference is that $B$ can't get into the $c=4$ states first giving some input to the simulated $A$. Since the exam solution makes the $\state p4$ states accepting but not the $\state p1$ states, $B$ can only accept if its simulation of $A$ accepts and if it gave the simulated $A$ at least one character of input.
This was long, but I hope that some of it is helpful. In the future, when asking about an answer to an exam or homework problem, please also include the question.