The computer uses base $256$ instead of base $10$. Two adjacent bytes can represent numbers between $0$ and $255 \times 256 + 255 = 65535 = 2^{16}−1$, inclusive.
I understand how a byte is $8$ bits and $0$ is the starting point, so two bytes would have a maximum value of $2^{16}-1$. However, I do not understand how they came up with the $255 \times 256 + 255$ representation. I see it works, but I am not sure how they derived that.
Actually computers use base 2 because the smallest unit (i.e.: the bit) can take 2 different values: either 0 and 1.
Think of base 10 as a group of symbols which can be either [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]. It goes the same way with computer and binary base except there are only two symbols [0, 1].
One byte is made of 8 bits. This represents $2^8$ possible combinations of 0s and 1s. Now, you are considering what we call a "word": an element made of two bytes, so 16 bits. This gives a total number of combination of $2^{16}$.
We know that powers time, so
$$\begin{equation}\begin{aligned} 2^{16} - 1&= 2^8 2^8 - 1 \\ &= (2^8 - 1) 2^8 + 2^8 \\ &= 255 \times 256 + 255 \end{aligned}\end{equation} $$
Which matches your statement \o/
The maximal value you can represent depends on the starting point. indeed, as you are starting from 0 and you have 65536 different values, then the maximal value you can represent is $65536-1 = 65535$
EDIT
Note regarding the starting point: let's consider a signed binary numbers (i.e.: a binary number which the most significant bit indicates the sign of the represented value) expressed over 8 bits.
We still have $2^8$ different values but we range from $-2^7$ to $2^7 - 1$. That is, the maximum value you can express on positive side is 127 and the maximum value you can express on the negative side is -128.