In his 1928 paper "Transmission of Information", Ralph Hartley outlines a quantification of information as $H = \log(s^n)$, where:

$H$ = information
$s$ = number of primary symbols
$n$ = number of selections made
At the beginning of the paper, he gives the intuitive suggestion that the total number of possible sequences given $s$ primary symbols and $n$ selections, $s^n$, may be used as a measure of information. However, he then argues that this measure is not suitable, and that information should instead be proportional to $n$, with a constant of proportionality that depends on $s$. I cannot convince myself of the validity of this argument, and I suspect the reason may be a practical one related to some aspect of engineering.
His justification is on page 5 of this document; I'm hoping someone could provide a more accessible explanation. http://www.uni-leipzig.de/~biophy09/Biophysik-Vorlesung_2009-2010_DATA/QUELLEN/LIT/A/B/3/Hartley_1928_transmission_of_information.pdf
Thank you.
Storing or transmitting $n$ symbols $x_1,\ldots,x_n$ will clearly take half the memory/time of storing $2n$ symbols $x_1,\ldots,x_n,x_{n+1},\ldots,x_{2n}$. So information should grow linearly with $n$, while the original measure $s^n$ would grow exponentially.
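A quick numeric sanity check of that linearity (a Python sketch; the concrete values $s = 16$, $n = 5$ are my own choice for illustration): doubling $n$ exactly doubles the encoded length, while the sequence count $s^n$ gets squared.

```python
import math

s = 16  # number of primary symbols
n = 5   # number of selections made

# binary digits needed to distinguish one of s symbols
bits_per_symbol = math.ceil(math.log2(s))

# Encoded length grows linearly with n ...
print(bits_per_symbol * n)       # 20 bits for n symbols
print(bits_per_symbol * 2 * n)   # 40 bits for 2n symbols: exactly double

# ... while the number of possible sequences grows exponentially:
# going from n to 2n squares it rather than doubling it
assert s ** (2 * n) == (s ** n) ** 2
print(s ** n)  # 1048576 possible sequences
```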
As for the proportionality constant: if we number the symbols from $1$ to $s$, the number of digits we require per symbol is $\lceil \log_b s \rceil$, where $b$ is the base of the representation ($b = 2$ for binary, etc.). We simply omit the ceiling for ease of manipulation, which gives $H = n \log_b s = \log_b(s^n)$.
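To see how dropping the ceiling recovers Hartley's formula (a Python sketch; the values $s = 10$, $n = 4$, $b = 2$ are arbitrary illustrative choices):

```python
import math

s, n, b = 10, 4, 2  # 10 symbols, 4 selections, binary digits

digits_needed = n * math.ceil(math.log(s, b))  # digits actually required
hartley = n * math.log(s, b)                   # ceiling omitted: Hartley's measure

# n * log_b(s) is exactly log_b(s^n), the log of the sequence count
assert math.isclose(hartley, math.log(s ** n, b))

print(digits_needed)      # 16 digits actually used
print(round(hartley, 3))  # 13.288, the idealized measure
```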