A vector space $V$ over a skew field $D$ is an abelian group on which $D$ acts as automorphism from the left and such that $(a+b)v = av + bv$ for each $a,b \in D$ and $v \in V$. Suppose $V$ is two-dimensional. Define the group $$ GL(2, V) = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \mid ad - bc \ne 0\right\}. $$ Then $GL(2, V)$ acts on the one-dimensional subspaces $D\cdot v$ for some $v \ne 0$ and the kernel of this action is $$ \left\{ \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix} \mid \lambda \in D \setminus \{0\} \right\}. $$ In general this subgroup does not has to be the center of $GL(2,V)$ as $D$ is not commutative, like we have in the case of fields. The quotient of the action of $GL(2,V)$ by its kernel is called the projective linear group $PGL(2,V)$.
Now select for each line $Dv$ with $v = (v_1, v_2)$ and $v_2 \ne 0$ the vector $(v_1 v_2^{-1},1)$ as representative and identify it with the point $v_1 v_2^{-1}$. If $v_2 = 0$ we identify the line $Dv = D(1,0)$ by the new point $\infty$.
Now the map from $GL(2,V)$ to the mapping $$ x \mapsto (ax + b)(cx + d)^{-1} $$ is a homomorphism with kernel the same as the kernel of the above action. Hence we can identify $PGL(2, V)$ as an abstract group with the set of those mapping. And by setting $\frac{a\cdot \infty + b}{c\cdot \infty + d} = a/c$ and division by $0$ gives $\infty$ and division by $\infty$ gives $0$ in the other cases, the action of $PGL(2, V)$ under this represention of the points $D \cup \{\infty\}$ is the same as the action on the chosen representatives from the lines, i.e., we can identify both actions.
But as I see it, all of the above reasoning would apply if we identify $GL(2,V)$ with the mappings $$ x \mapsto (cx + d)^{-1} (ax + b). $$
But this should not work out, for if we determine the three-point-stabilizer of $0,1$ and $\infty$, we find that we must have $a = d \ne 0$, which under the first identification gives the mappings $x \mapsto axa^{-1}$, i.e., conjugation in the multiplicative group of $D$, but for the second identifcation it would give $x \mapsto a^{-1}a x = x$, just the identity, which gives a trivial three-point-stabilizer.
But where exactly is the second identification different from the first? Surely my last argument gives that they are different, but what make me wonder is that both constructions seem to work out perfectly similar, but that should not be the case, for then both actions would indeed be equivalent? So what am I missing?
The linear transformation $\begin{pmatrix}a&b\\c&d\end{pmatrix}$ acts in $V^2$ as
$$(v_1,v_2) \mapsto (a v_1+b v_2, c v_1+d v_2).$$
If $c v_1+d v_2$ is nonzero, we can postmultiply both entries by by $(c v_1+d v_2)^{-1}$ giving a vector in the same line
$$((a v_1+b v_2)(c v_1+d v_2)^{-1}, 1).$$
Moreover, if $v_2$ is nonzero, setting $v= v_1 v_2^{-1}$ as in your question we have
$$((a v_1+b v_2)(c v_1+d v_2)^{-1}, 1) = ((a v_1+b v_2)(v_2^{-1} v_2)(c v_1+d v_2)^{-1}, 1) =$$ $$=((a v_1+b v_2)(v_2^{-1}) ((c v_1+d v_2)(v_2^{-1}))^{-1}, 1) =$$ $$= ((a v_1 v_2^{-1}+b) (c v_1 v_2^{-1}+d)^{-1} ,1) = ((av + b) (cv + d)^{-1} ,1),$$
where we have used associativity and the fact that $y^{-1}x^{-1} = (xy)^{-1}$. This induces a mapping $v \mapsto (av + b) (cv + d)^{-1}$ between representatives (in the cases where $c v_1+d v_2$ or $v_2$ are zero, we recover the rules for dealing with $\infty$ that you mention in your post). Hence an action of $PGL(2,V)$ as linear fractional transformations on the projective line is uniquely induced from the usual action of $GL(2,V)$ on $V^2$. From this point of view the form of these transformations becomes a bit clearer.
As for your proposed construction $v \mapsto (cv + d)^{-1} (av + b)$, note that due to commutativity issues the mappings don't compose as they should. For example in the quaternions,
$$\begin{pmatrix}j&0\\0&i\end{pmatrix} \begin{pmatrix}i&0\\0&j\end{pmatrix} = \begin{pmatrix}-k&0\\0&k\end{pmatrix}$$
but
$$i^{-1} j (j^{-1} i x) = x \neq -x = k^{-1}(-k x).$$
So one doesn't even get a well-defined homomorphism (respecting composition) from $GL(2,V)$ to these functions.
However, if instead one considers a right vector space, fractional linear transformations would take the form $(xc+d)^{-1}(xa+b)$, similar to what you proposed.