A common expression for calculating the entropy of a series of bits appears to be:
$$-\sum_{i} P\left(x_i\right) \log_b\left(P\left(x_i\right)\right)$$
This seems to fail (or my intuition of entropy is simply incorrect) in cases where the bits are highly correlated or patterned: for example, 0000000011111111 maximizes the entropy calculated by this expression, and so does 0101010101010101.
In both of these cases, the data is highly compressible, but the entropy is high according to the given expression. Is there some definition of entropy that takes this into account? Or am I looking for something else entirely?
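To make the point concrete, here is a small sketch of what that expression gives when the probabilities are estimated from bit frequencies (the helper name and the frequency-estimation choice are mine, assuming base-2 logarithms):

```python
from math import log2
from collections import Counter

def bitwise_entropy(bits: str) -> float:
    """Evaluate -sum_i P(x_i) * log2(P(x_i)), with P(x_i) estimated
    from the frequency of each symbol in the string."""
    n = len(bits)
    return -sum((c / n) * log2(c / n) for c in Counter(bits).values())

# Both highly patterned strings have equal counts of 0s and 1s,
# so the expression returns the maximum of 1 bit per symbol:
print(bitwise_entropy("0000000011111111"))  # 1.0
print(bitwise_entropy("0101010101010101"))  # 1.0
```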
There seem to be some misunderstandings in your question. First, when you speak of "the case where the bits are highly correlated", are you thinking of a source that always (or with high probability) produces such patterns, or are you thinking of particular realizations? In the second case, you must understand that the traditional definition of entropy does not apply to realizations (single events) but to a source that follows a given probability distribution (hence it does not make sense to speak of the "entropy of a message", only of the "entropy of the source").
If, instead, you are thinking of a source that produces correlated bits, then the definition still applies, but you must consider the probabilities of the full messages. That is, you should compute the entropy of $X^n=(x_1,x_2,\cdots,x_n)$ as $$H(X^n)=-\sum p(X^n) \log p(X^n)$$ where the sum is over all possible messages. Then, if you want, you can compute the "entropy rate" (entropy per symbol) as $H_r=H(X^n)/n$.
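As an illustration, here is a minimal sketch (the two-message source below is a made-up example) computing $H(X^n)$ and the entropy rate for a source that only ever emits the two patterned 16-bit strings from the question, each with probability 1/2:

```python
from math import log2

def entropy(dist):
    """H = -sum p * log2(p) over the message probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Hypothetical source: emits one of two highly patterned
# 16-bit messages, each with probability 0.5.
source = {"0000000011111111": 0.5, "0101010101010101": 0.5}

H = entropy(source)  # 1 bit for the whole 16-bit message
H_r = H / 16         # entropy per symbol: 0.0625 bits
print(H, H_r)
```

Even though each string looks "balanced" bit by bit, the source as a whole carries only 1 bit of uncertainty, so the entropy rate is far below 1 bit/symbol, matching the intuition that the data is highly compressible.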
Only when the bits are independent (and identically distributed) would we get $H_r = H(x_1)$. This is all explained in any textbook, as well as in Shannon's original paper.
A common model to simplify the computation of the entropy rate is to assume a Markov chain. For a stationary first-order Markov chain (the present depends only on the immediate past, via a fixed transition matrix of probabilities), the entropy rate (and hence the joint entropy) can be computed from the transition matrix. Assuming $H_r = H(x_1)$ would be equivalent to assuming a zero-order Markov model, which can sometimes be a good approximation, but sometimes (when the symbols are highly correlated) is not.
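A minimal sketch of this for a binary first-order chain (the 0.9 "stickiness" value is an arbitrary choice for illustration): the entropy rate is the stationary-weighted average of the per-row entropies of the transition matrix, $H_r = -\sum_i \pi_i \sum_j P_{ij} \log_2 P_{ij}$.

```python
from math import log2

def entropy_rate(P, pi):
    """Entropy rate of a stationary Markov chain:
    H_r = -sum_i pi_i * sum_j P[i][j] * log2(P[i][j])."""
    return -sum(pi[i] * sum(p * log2(p) for p in row if p > 0)
                for i, row in enumerate(P))

# "Sticky" binary chain: 90% chance of repeating the previous bit,
# so long runs like 00000000 and 11111111 are typical output.
P = [[0.9, 0.1],
     [0.1, 0.9]]
pi = [0.5, 0.5]  # stationary distribution (chain is symmetric)

print(entropy_rate(P, pi))  # ~0.469 bits/symbol, well below 1
```

The zero-order model would report 1 bit/symbol for this source (each bit is 0 or 1 with probability 1/2 marginally), while the first-order model correctly reflects the correlation.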