Suppose I have an arbitrarily long random binary string in which each digit is 0 with probability 1/4 and 1 with probability 3/4, independently. What is the best way to compress it into another binary string, i.e. what algorithm minimizes the expected length of the compressed output?
Additionally, what is the information-theoretic bound, and is it achievable?
So you have a Bernoulli$(1/4)$ source (which we presume to be i.i.d.), whose entropy $H = -\frac{1}{4} \log_2 \frac{1}{4} - \frac{3}{4} \log_2 \frac{3}{4} \approx 0.811$ bits per symbol is a lower bound on the expected number of code bits per source symbol for any lossless compression scheme.
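As a quick numerical check of that entropy value (variable names here are mine, not from the question):

```python
import math

# Entropy of a Bernoulli(1/4) source: P(0) = 1/4, P(1) = 3/4
p = 0.25
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
print(round(H, 4))  # about 0.8113 bits per symbol
```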
Now, since the source statistics are known, you can use arithmetic coding: it maps a source sequence to a subinterval of $[0,1)$ whose width equals the sequence's probability, and then emits just enough bits to identify that interval. The expected code length for a block of $n$ symbols is within 2 bits of $nH$, for any block length $n$, so the per-symbol rate approaches $H$ as $n$ grows.
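A minimal sketch of the interval-narrowing idea, assuming the Bernoulli$(1/4)$ source above. This uses floating point, so it only works for short blocks; practical arithmetic coders use integer arithmetic with renormalization. All names (`P0`, `encode`, `decode`) are mine:

```python
import math

P0 = 0.25  # probability of symbol 0, per the question

def encode(bits):
    """Narrow [0, 1) to the subinterval for this bit sequence, then emit
    the shortest binary fraction lying inside it."""
    lo, hi = 0.0, 1.0
    for b in bits:
        mid = lo + (hi - lo) * P0  # symbol 0 takes the lower quarter
        if b == 0:
            hi = mid
        else:
            lo = mid
    # n bits suffice so that some multiple of 2**-n falls in [lo, hi)
    n = math.ceil(-math.log2(hi - lo)) + 1
    code = math.ceil(lo * 2**n)
    return format(code, f'0{n}b')

def decode(code, n_symbols):
    """Invert encode() by retracing the same interval subdivisions."""
    x = int(code, 2) / 2**len(code)
    lo, hi = 0.0, 1.0
    out = []
    for _ in range(n_symbols):
        mid = lo + (hi - lo) * P0
        if x < mid:
            out.append(0)
            hi = mid
        else:
            out.append(1)
            lo = mid
    return out
```

The final interval's width equals the block's probability $p$, so the code uses at most $\lceil -\log_2 p \rceil + 1 \le -\log_2 p + 2$ bits, which is the "within 2 bits" claim.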
The Lempel-Ziv algorithms (universal source codes, meaning they need no prior knowledge of the source statistics; the basic variants are sliding-window and tree-structured) are known to approach the entropy rate of any stationary ergodic source: as the number of symbols you're coding goes to infinity, the compression rate converges to the entropy rate.
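To illustrate the tree-structured variant, here is a sketch of the LZ78 parsing step, which splits the input into phrases, each a previously seen phrase plus one new symbol; a full coder would then encode the (index, symbol) pairs. The function name is mine:

```python
def lz78_parse(s):
    """Return the LZ78 phrase list for string s: each phrase is
    (dictionary index of the longest previously seen prefix, new symbol)."""
    dictionary = {'': 0}   # phrase -> index; index 0 is the empty phrase
    phrases = []
    w = ''                 # current match, always a phrase in the dictionary
    for c in s:
        if w + c in dictionary:
            w += c         # keep extending the match
        else:
            phrases.append((dictionary[w], c))
            dictionary[w + c] = len(dictionary)
            w = ''
    if w:                  # input ended mid-phrase
        phrases.append((dictionary[w], ''))
    return phrases
```

For example, `lz78_parse('1011010100010')` parses the string into the phrases 1, 0, 11, 01, 010, 00, 10. The source's statistics are never used; the dictionary adapts to whatever patterns appear, which is why the scheme is universal.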
For more details (and proofs), see Chapter 13 of Cover and Thomas, *Elements of Information Theory*, 2nd ed.