These are three questions that are more or less related.
The Little Picard Theorem states that if the image of an entire function $f$ omits more than one point, then $f$ is necessarily constant. The proof proceeds by contradiction, assuming that a non-constant $f$ omits two points (wlog $0$ and $1$). Now I guess the observation is that since $$f: \mathbb{C} \rightarrow \mathbb{C} \setminus \{0,1\}$$ and also $$\lambda: \mathbb{H} \rightarrow \mathbb{C}\setminus \{0,1\}$$ where $\lambda$ is the modular lambda function, we (basically) can construct a map $$g=\lambda^{-1} \circ f: \mathbb{C} \rightarrow \mathbb{H}.$$ From there, it is plain that $h=e^{ig}$ is bounded and therefore constant, by Liouville.
Now obviously I left out the intricacies. For example, do I have to restrict $\lambda$ to a fundamental region, since only there is $\lambda$ bijective?
What is the problem with using the function $j$ instead? After all, the two are related, right?
Now in essence the proof shows that an entire function that omits two points is constant, but how does this imply that an entire function that omits any number of points greater than one is constant? What if $3, 4, 5, \dots$ points are missed? What if an entire closed subset of $\mathbb{C}$ is omitted?
You are assuming that $f$ maps into $\mathbb{C}\setminus\{0,1\}$. This does not mean that $f$ maps onto $\mathbb{C}\setminus\{0,1\}$. Indeed, $f$ could be missing many other values as well; all the proof uses is that at least two known values are missing.
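To spell out the "wlog $0$ and $1$" step, here is a short sketch; the names $a$, $b$ and $\tilde f$ are introduced only for illustration. Suppose $f$ omits two distinct values $a \neq b$ (and possibly many more). Then $$\tilde f(z) = \frac{f(z) - a}{b - a}$$ is entire, and $\tilde f(z) = 0$ or $\tilde f(z) = 1$ would force $f(z) = a$ or $f(z) = b$, so $$\tilde f: \mathbb{C} \rightarrow \mathbb{C} \setminus \{0,1\}.$$ If $\tilde f$ is constant, so is $f = a + (b-a)\tilde f$. Omitting $3, 4, 5, \dots$ points, or a whole closed set containing at least two points, is therefore only a stronger hypothesis: pick any two of the omitted values and apply the two-point case.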
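However, $\lambda$ is onto $\mathbb{C}\setminus\{0,1\}$ and, most importantly, locally invertible. It is not 1-to-1 on all of $\mathbb{H}$, but it is a holomorphic covering map, and since $\mathbb{C}$ is simply connected, $f$ lifts to an entire function $g$ with $\lambda \circ g = f$; there is no need to restrict $\lambda$ to a fundamental region. Any holomorphic covering map from $\mathbb{H}$ onto $\mathbb{C}\setminus\{0,1\}$ would work in the proof of Picard's theorem.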
The heart of Picard's theorem is the fact that $\lambda$ (and other holomorphic maps like it) exists. Note that $\lambda$ is holomorphic on $\mathbb{H}$, not entire. Because such a map exists, every entire function that omits two values factors through $\mathbb{H}$, and that is what forces it to be constant.
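As a sketch of that last step, assume $g: \mathbb{C} \rightarrow \mathbb{H}$ is an entire function with $\lambda \circ g = f$ (the existence of such a lift is taken for granted here). Since $\operatorname{Im} g(z) > 0$ for every $z$, $$\left| e^{i g(z)} \right| = e^{\operatorname{Re}(i g(z))} = e^{-\operatorname{Im} g(z)} < 1,$$ so $e^{ig}$ is a bounded entire function and hence constant by Liouville. The values of $g$ then lie in the discrete set $g(0) + 2\pi\mathbb{Z}$, and because $g$ is continuous on the connected set $\mathbb{C}$, $g$ itself is constant. Finally, $$f = \lambda \circ g$$ is constant as well.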