Here's what I have:
The size of the set $V^8$ is $2^8=256$. If the code can correct 2 errors, the minimum distance of the code must be 5. Then the size of an open ball of radius 5 around the code word (0,0,0,0,0,0,0,0), which we can call B(0,5), should be $8\choose0$ + $8\choose1$ + $8\choose2$ + $8\choose3$ + $8\choose4$ = 163. I believe that the open balls around every code word must be this large. But, 256 / 163 = 1.57, which I believe tells me that we can only fit 1 such open ball into $V^8$, so that there can only be 1 code word. This is obviously wrong, so I assume that there must be something wrong with my counting. However, I still can't come up with anything better. I need to prove this using a counting argument. Can anyone help to correct me?
The counting of points on Hamming balls proves that the maximum is $\le6$, which rules out the possibility of larger linear codes as pointed out by leonbloy. But linearity is not needed. Four is the maximum for non-linear codes as well. The argument is a bit longer but not unduly complicated.
Assume contrariwise that a code with minimum Hamming distance $5$, length $8$ and with at least five words would exist. If $C$ is such a code, then so is $C+x=\{w+x\in V^8\mid w\in C\}$. This is because translation by $x$ is an isometry of the Hamming space. Given that we can w.l.o.g. assume that the all zeros word $0\in C$. This prohibits all vectors of weight between one and four as words of $C$. If there were a vector $y$ of weight seven or eight in $C$, then that vector is at Hamming distance at most four from any vector of weight $>4$. This would mean that we could not add a single vector to the set $\{0^8,y\}$ without violating the minimum distance bound.
Thus we can conclude that in addition to $0^8$ the code $C$ contains only words of weights five or six. It cannot contain two distinct words of weight six for those would be at Hamming distance four from each other.
Two cases remain: A) $C=\{0,z,u_1,u_2,u_3\}$ where $z$ is of weight six and the remaining three have weight five, B) $C=\{0,u_1,u_2,u_3,u_4\}$ with all $u:i$:s of weight five.
The word $z$ of weight six is at distance five from a word $u$ of weight five if and only if $u$ has ones at both those positions, where $z$ has zeros. This applies to all $u_1,u_2,u_3$, so they all must share those two ones. But there are only six positions outside that pair, so at least some of these weight five words share a third common $1$. This means that the distance between those two words would be only four. This contradiction rules out case A.
Case B can be reduced to case A as follows. If we assume that a code fitting case B would exist, then $u_1$ and $u_2$ must share exactly two common $1$s. This means that the code $C+u_1$ will contain the all zero word $u_1+u_1$ and a weight six word $u_1+u_2$. In other words the code $C+u_1$ would fit case A. We just saw that such a code does not exist.