I want to check whether the messages received by an account come from men more often than from women.
I took a sample of the messages and, for each one, tried to work out the perceived gender of the sender.
So I have a total of x messages from men, y messages from women and z from other senders (nonbinary people, accounts representing organisations, etc).
I can't remember or work out what statistical test I should be using to test my belief that men are more likely to send messages than women - tests I keep finding don't take into account that the third group z makes up some of the population so it's not an either/or.
If someone could point me in the right direction it'd be really helpful.
Thanks!
With hypothesis testing, you need to start with a null hypothesis and an alternative hypothesis, and then take a sample of the random variable. Working under the assumption that the null hypothesis is true, you then determine the probability of obtaining a sample which deviates from the hypothesised mean at least as much as your sample does. If that probability is under a certain pre-agreed threshold, then you reject the null hypothesis in favour of the alternative; otherwise you fail to reject it. Common values for the threshold probability include 5%, 1%, 0.27% (the $3\sigma$ criterion), and 0.00006% (the $5\sigma$ criterion). Since you don't have a model for the probability distribution of your random variable, nor a current knowledge of the distribution's mean, you can't determine whether a given sample you make passes a probability threshold or not, and so you can't use hypothesis testing.
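As a rough check of where the $3\sigma$ and $5\sigma$ thresholds come from, here is a minimal sketch that assumes a normally distributed test statistic and a two-sided test (both are assumptions for illustration, not something given in the question). The function name `two_sided_tail` is made up for this example.

```python
import math


def two_sided_tail(k: float) -> float:
    """Probability that a standard normal variable lies more than
    k standard deviations from its mean, in either direction."""
    # For Z ~ N(0, 1): P(|Z| > k) = erfc(k / sqrt(2)).
    return math.erfc(k / math.sqrt(2))


for k in (3, 5):
    p = two_sided_tail(k)
    # Print as a percentage so it can be compared with the thresholds quoted above.
    print(f"{k} sigma: {100 * p:.5f}%")
```

The percentages printed should be close to the 0.27% and 0.00006% figures quoted for the two criteria.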
Answering your question can be done by simply obtaining a sample and computing the sample mean. You can use a random variable $X$ given by
$$ X(\omega) = \begin{cases} 1, &\omega \text{ is a message from a man,} \\ -1, &\omega \text{ is a message from a woman,} \\ 0, &\text{otherwise.} \end{cases} $$
Then you want to know what the mean (or expected value) $E(X)$ is, which we can estimate with a sample mean $\overline{x}$. This is just the familiar arithmetic mean of the samples taken, i.e.
$$\overline{x}:= \frac{x_1 + x_2 + \cdots + x_n}{n},$$
where your $n$ samples are denoted $x_i$ (not to be confused with the count $x$ in your question). If $\overline{x} > 0$, then your sample contains more messages from men than from women.
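Since each message from a man contributes $1$, each from a woman $-1$, and everything else $0$, the sample mean reduces to $(x - y)/(x + y + z)$ in terms of the counts from the question. A minimal sketch of that calculation follows; the function name `sample_mean` and the counts used are hypothetical, purely for illustration.

```python
def sample_mean(men: int, women: int, other: int) -> float:
    """Sample mean of X, where X = 1 for a message from a man,
    -1 for a message from a woman, and 0 otherwise."""
    n = men + women + other
    if n == 0:
        raise ValueError("the sample must contain at least one message")
    # The sum of the x_i is (+1 * men) + (-1 * women) + (0 * other).
    return (men - women) / n


# Hypothetical counts, not data from the question.
x, y, z = 60, 30, 10
mean = sample_mean(x, y, z)
print(f"sample mean = {mean:.2f}")
print("more messages from men" if mean > 0 else "not more messages from men")
```

Note that the third group still matters here: it enlarges $n$ and so pulls the sample mean towards zero, without changing its sign.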