(Somewhat inspired by this website, particularly Section III. Also, I might be using a different definition of entropy than usual; what I am using is closest to the physics definition (the one I encountered first) of the amount of disorder in a system.)
Consider the following two "random" 256-bit strings:
1110001011010010101000001111001100001100011111000111011011101000000000000001111110010110010100011101010010111110000010010101001001101100111110011000000110111111000111101111000011010100001001100010010010011000000011101110000000110001101100000110111001100011
...created via RNG, and:
1001001001010010001001010010110100101100100101110001001010110100101001001010010010000110001000011010000010101010001001100110100010001010101010101100010010010011100100010010111010001011010101001011101000101010001101010100100101010010101010010010010101100101
...created by "randomly" hammering away on the 1 and 0 keys on the keyboard. To the layperson, the second string "looks" more random than the first, partly because the first has longer runs, including one of length 14. However, the string
1010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010
...would be considered relatively non-random (even more so than the first string): although it contains only runs of length 1, grouping the bits into pairs yields 128 consecutive copies of the pair "10", which is effectively a single run of length 128 at the pair level. My question is: is there a formal measure of bitwise entropy that takes these factors into account?
My first thought towards a potential answer to this question would be something along these lines:
$$N-\sum_{\ell=1}^{20 \text{ (or so)}} \frac{\sum_{k=1}^{\text{(# of runs)}} (\text{length of $k$th run})^2}{\ell}$$
where, for each group length $\ell$, the string is re-read as consecutive $\ell$-bit units and the runs are counted over those units, and $N$ is the maximum value of the rest of the expression (so that the minimum value of the whole expression is zero). This takes into account the fact that longer runs are worse, as well as the reduced weight given to runs at longer group lengths.
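A minimal sketch of this proposed score, under the assumption stated above that for each group length $\ell$ the string is re-read as consecutive $\ell$-bit blocks (trailing partial blocks dropped) and runs are counted over those blocks; the names `run_lengths` and `run_score` are my own:

```python
def run_lengths(seq):
    """Lengths of the maximal runs of equal consecutive elements."""
    runs, count = [], 1
    for a, b in zip(seq, seq[1:]):
        if a == b:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs

def run_score(x: str, max_group: int = 20) -> float:
    """Sum over group lengths l of (sum of squared run lengths) / l,
    where for each l the string is re-read as l-bit blocks."""
    total = 0.0
    for l in range(1, max_group + 1):
        # Split into consecutive l-bit units, dropping any trailing partial unit.
        groups = [x[i:i + l] for i in range(0, len(x) - len(x) % l, l)]
        total += sum(r * r for r in run_lengths(groups)) / l
    return total
```

The final measure would then be $N - \texttt{run\_score}(x)$ for a suitable normalizing constant $N$.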
You're using the term entropy in reference to some given finite string, whereas entropy is a function defined on probability distributions. One way of reconciling this is to suppose that the relevant probability distributions are the actual ("empirical") distributions of substrings of various lengths present in the given string. The entropies of these distributions then offer a measure of the "disorder", "mixed-up-ness", "diversity", etc., in the string. (This is unlike the usual focus on source entropy, as here the "disorder" of direct interest is supposed to be in the string itself.)
The collection of all length-$k$ substrings that occur in a given string $x$ forms a multiset whose multiplicities can be used to determine a corresponding probability distribution. For each $k$, the corresponding distribution then has entropy $H^{(k)}$, called a block entropy because it concerns a distribution on subwords (which are blocks of contiguous bits). These block entropies $\big(H^{(k)}, k\in 0..|x|\big)$ provide a kind of "entropy profile" for the given string, which can serve to quantify the kind of intuitions you're expressing.
More precisely, here's the general principle: given a finite multiset $M$ whose distinct elements have multiplicities $m_1, \dots, m_r$ (summing to $|M|$), the entropy of $M$ is the Shannon entropy of the corresponding empirical distribution, $$H(M) = -\sum_{j=1}^{r} \frac{m_j}{|M|} \log_2 \frac{m_j}{|M|}.$$
Now, given the string $x = x_1 .. x_n$, define the length-$k$ blocks $$B_i^{(k)} = x_i .. x_{i+k-1}, \ \ i\in 1..{(n-k+1)}.$$ For each block length $k$, these blocks form a multiset $$M^{(k)} =\big \{B_i^{(k)}: i\in 1..{(n-k+1)}\big\},$$ with corresponding block entropy $H^{(k)}$ given by the above formula. The "block entropy profile" for the given string $x$ is then $\big(H^{(k)}, k\in 0..|x|\big)$, where $|x|$ denotes the length of $x$.
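The block entropy profile defined above can be computed with a short sketch (function names are mine; indices are 0-based as is usual in code, whereas the text uses 1-based indexing):

```python
from collections import Counter
from math import log2

def block_entropy(x: str, k: int) -> float:
    """Shannon entropy (in bits) of the empirical distribution of the
    length-k blocks B_i = x[i:i+k], i = 0..n-k, of the string x."""
    if k == 0:
        return 0.0  # a single empty block: zero entropy
    blocks = [x[i:i + k] for i in range(len(x) - k + 1)]
    counts = Counter(blocks)          # multiplicities of the multiset M^(k)
    total = len(blocks)               # |M^(k)| = n - k + 1
    return -sum((c / total) * log2(c / total) for c in counts.values())

def entropy_profile(x: str, kmax: int):
    """The block entropy profile (H^(0), ..., H^(kmax)) of x."""
    return [block_entropy(x, k) for k in range(kmax + 1)]
```

For example, the patterned string `"10" * 128` has $H^{(1)} = 1$ exactly (equal counts of 0s and 1s) but very low entropy at every larger block length, which is precisely the phenomenon described in the question.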
Here's a picture of the block entropy profiles (up to block length $16$) for the three bitstrings you posted ($1$ is from your RNG, $2$ is from keyboard input, $4$ is the patterned string) plus three others: $3$ is from biased coin tossing with $p=0.8$, while $5$ and $6$ have various other patterns; all have length 256:
The dotted line is the ideal maximum-entropy profile (which is in fact attained by De Bruijn sequences): $H^{(k)} = \min\big(k,\log_2(|x|-k+1)\big)$.
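This bound combines two constraints: there are only $2^k$ possible length-$k$ blocks, and only $n-k+1$ blocks actually occur in a length-$n$ string. A one-liner computing it:

```python
from math import log2

def ideal_profile(n: int, k: int) -> float:
    """Maximum possible block entropy for block length k in a string of
    length n: at most k bits (2^k possible blocks) and at most
    log2(n - k + 1) bits (only n - k + 1 blocks occur)."""
    return min(k, log2(n - k + 1))
```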
For reference, here are the first 100 bits of each string:
Finally, here is a picture of the profiles over the whole range of block lengths, showing how all of the profiles eventually merge with the ideal maximum-entropy profile:
You must consider for yourself whether these profiles appropriately discriminate among the various strings.
(This approach is quite general. For an example of "entropy profiling" applied to 2D images, see this posting, where it's $k\times k$ subimages that form the multisets.)
EDIT: For comparison, here are the average profile entropies and the results of compressing each string using the zlib compressor (higher entropy should correspond to lower compressibility):
\begin{array}{|c|c|c|}
\hline
\text{string label} & \frac{1}{n+1}\sum_{k=0}^n H^{(k)} & \text{uncompressed length} \rightarrow \text{compressed length}\\ \hline
1 & 6.67 & 256 \rightarrow 78 \\
2 & 6.50 & 256 \rightarrow 66 \\
3 & 6.18 & 256 \rightarrow 64 \\
4 & 4.37 & 256 \rightarrow 24 \\
5 & 3.78 & 256 \rightarrow 19 \\
6 & 0.97 & 256 \rightarrow 13 \\
\hline
\end{array}
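The comparison can be reproduced in spirit with a short sketch; `rng_like` here is a seeded stand-in for an irregular string, not the exact strings from the post, so the byte counts will differ from the table:

```python
import random
import zlib

def compressed_length(bits: str) -> int:
    """Length in bytes of the zlib-compressed ASCII bitstring."""
    return len(zlib.compress(bits.encode("ascii")))

patterned = "10" * 128  # the patterned string (label 4) from the question

# A seeded pseudo-random 256-bit string standing in for the RNG string.
random.seed(0)
rng_like = "".join(random.choice("01") for _ in range(256))
```

As expected, the highly patterned string compresses far better (to far fewer bytes) than the irregular one, mirroring the ordering of the average profile entropies.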