Question:
Let $M,P,Q$ be semigroups and $\sigma:M\rightarrow P$, $\rho:M \rightarrow Q$ be morphisms with $\sigma$ surjective. Then $\ker \sigma \subseteq \ker \rho$ if and only if $\rho$ factors through $\sigma$ (that is, there exists a morphism $\phi : P \rightarrow Q$ with $\rho = \phi\sigma$).
My attempt:
$(\Rightarrow)$ Assume that $\ker \sigma \subseteq \ker \rho$. Take $x,y \in \ker \sigma$, then $$\sigma(x) = \sigma(y) \Rightarrow \rho(x) = \rho(y).$$ Since $\sigma$ is surjective, $$ x = \sigma^{-1}(\sigma(y)) \Rightarrow \rho\sigma^{-1}(\sigma(y)) = \rho(y) \Rightarrow \rho\sigma^{-1}\sigma = \rho.$$ Take $\phi = \rho\sigma^{-1}$ as required.
$(\Leftarrow)$ Let $\phi:P \rightarrow Q$ such that $\rho = \phi \sigma$. Take $x,y \in \ker \sigma$. Then, $$ \phi(\sigma(x)) = \phi (\sigma(y)) \Rightarrow \rho(x) = \rho(y).$$ Thus, $x,y \in \ker \rho$.
Please comment on my proof and point out anything that is incomplete. Thank you.
You can't consider $\sigma^{-1}$, because in general $\sigma$ has no inverse: it's just surjective, not an isomorphism. Your idea is good, but not expressed in an acceptable way.
The kernel of $\sigma$ is $\{(x,y)\in M\times M:\sigma(x)=\sigma(y)\}$.
The easy direction is proving that the existence of $\phi$ with $\rho=\phi\sigma$ implies $\ker\sigma\subseteq\ker\rho$. Indeed, if $(x,y)\in\ker\sigma$, then $$ \rho(x)=\phi\sigma(x)=\phi\sigma(y)=\rho(y) $$ so $(x,y)\in\ker\rho$.
Suppose now $\ker\sigma\subseteq\ker\rho$. We want to define $\phi\colon P\to Q$; there are not many choices: if $p\in P$, let $x\in M$ be such that $\sigma(x)=p$ (such an $x$ exists because $\sigma$ is surjective). Since we want $\phi\sigma=\rho$, we must have $$ \phi(p)=\phi\sigma(x)=\rho(x) $$ but we need to check that this doesn't depend on the choice of $x$. If also $p=\sigma(y)$, then $\sigma(x)=\sigma(y)$, so $(x,y)\in\ker\sigma$. By assumption, $(x,y)\in\ker\rho$ and therefore $\rho(x)=\rho(y)$.
Thus the definition of $\phi$ is well posed, and $\phi\sigma=\rho$ holds by construction. It remains to see that $\phi$ is a semigroup homomorphism. But this is clear: if $p_1=\sigma(x_1)$ and $p_2=\sigma(x_2)$, then $p_1p_2=\sigma(x_1)\sigma(x_2)=\sigma(x_1x_2)$ and, by definition, $$ \phi(p_1p_2)=\rho(x_1x_2)=\rho(x_1)\rho(x_2)=\phi(p_1)\phi(p_2). $$
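If it helps to see the construction on a concrete finite case, here is a small sketch. The choice of semigroups is my own and only illustrative: $M=\mathbb{Z}_{12}$, $P=\mathbb{Z}_6$, $Q=\mathbb{Z}_3$ under multiplication, with $\sigma$ and $\rho$ given by reduction mod 6 and mod 3. The helper names `kernel` and `factor_through` are hypothetical. The function builds $\phi$ by setting $\phi(\sigma(x))=\rho(x)$ and fails exactly when two preimages of the same $p$ disagree under $\rho$.

```python
from itertools import product

# Illustrative choice (not from the question): multiplicative semigroups
# M = Z_12, P = Z_6, Q = Z_3, with sigma and rho the reductions mod 6 and mod 3.
M = range(12)
P = range(6)

def mul_P(p, q):
    return (p * q) % 6

def mul_Q(a, b):
    return (a * b) % 3

def sigma(x):
    return x % 6  # surjective morphism M -> P

def rho(x):
    return x % 3  # morphism M -> Q

def kernel(f, domain):
    """Kernel as a relation: all pairs (x, y) with f(x) == f(y)."""
    return {(x, y) for x, y in product(domain, repeat=2) if f(x) == f(y)}

def factor_through(sigma, rho, domain):
    """Build phi with phi(sigma(x)) = rho(x), checking well-definedness."""
    phi = {}
    for x in domain:
        p = sigma(x)
        if p in phi and phi[p] != rho(x):
            # Two preimages of p have different images under rho,
            # so ker(sigma) is not contained in ker(rho).
            raise ValueError("phi is not well defined")
        phi[p] = rho(x)
    return phi

ker_sigma = kernel(sigma, M)
ker_rho = kernel(rho, M)
phi = factor_through(sigma, rho, M)

# phi is defined on all of P because sigma is surjective.
is_morphism = all(
    phi[mul_P(p, q)] == mul_Q(phi[p], phi[q]) for p, q in product(P, repeat=2)
)
```

Swapping the roles, `factor_through(rho, sigma, M)` raises, since $\rho(0)=\rho(3)$ but $\sigma(0)\neq\sigma(3)$, so $\ker\rho\not\subseteq\ker\sigma$.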
Note. If $M$ and $P$ are groups, the kernel of $\sigma$ is usually defined as $K=\{x\in M:\sigma(x)=1\}$. Now $$ \{(x,y)\in M\times M:\sigma(x)=\sigma(y)\}= \{(x,y)\in M\times M:\sigma(xy^{-1})=1\}= \{(x,y)\in M\times M:xy^{-1}\in K\}. $$ Thus the two definitions of kernel agree, in the sense that the relation determines the subgroup and conversely. In the case of groups it's easier to use (normal) subgroups instead of relations as the kernel; this is generally not possible even for monoids.
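To illustrate the last remark, here is a small sketch with a monoid of my own choosing (it is not part of the original argument): $M=\{1,a,0\}$ with $a^2=0$ and $0$ absorbing. The identity map and the hypothetical map `collapse` (sending $a$ and $0$ to $0$) are both monoid morphisms with the same preimage of the identity, namely $\{1\}$, yet their kernel relations differ.

```python
from itertools import product

# Illustrative monoid: elements "1" (identity), "a", "0", with a*a = 0
# and 0 absorbing. Any product of two non-identity elements is "0".
M = ["1", "a", "0"]

def mul(x, y):
    if x == "1":
        return y
    if y == "1":
        return x
    return "0"

def identity_map(x):
    return x

def collapse(x):
    """Map onto the submonoid {1, 0}: keeps 1, sends a and 0 to 0."""
    return "1" if x == "1" else "0"

def is_morphism(f):
    return f("1") == "1" and all(
        f(mul(x, y)) == mul(f(x), f(y)) for x, y in product(M, repeat=2)
    )

def identity_preimage(f):
    """The subset {x : f(x) = 1}, the group-style kernel."""
    return {x for x in M if f(x) == "1"}

def kernel(f):
    """The kernel as a relation {(x, y) : f(x) = f(y)}."""
    return {(x, y) for x, y in product(M, repeat=2) if f(x) == f(y)}

same_subset = identity_preimage(identity_map) == identity_preimage(collapse)
same_relation = kernel(identity_map) == kernel(collapse)
```

Here `same_subset` is true while `same_relation` is false, because $(a,0)$ lies in the kernel of `collapse` but not in the kernel of the identity map. So the subset $\{x:\sigma(x)=1\}$ does not determine the kernel relation in this monoid.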