From reading this resource, where the author writes:
A linear model does not output probabilities, but it treats the classes as numbers (0 and 1) and fits the best hyperplane (for a single feature, it is a line) that minimizes the distances between the points and the hyperplane.
That's fine, but why, then, does a sigmoid get the privilege of outputting probabilities? To my mind they're both continuous functions on $\mathbb{R}$. Is it just because a sigmoid maps $\mathbb{R} \to [0,1]$? Does having values restricted to $[0,1]$ allow one to say a function outputs probabilities?
Basically, yes. But speaking more strictly, the advantage of the logistic model is that it models the log-odds as a linear function of the covariates. In that sense logistic regression is not unique: there are many functions whose image is $[0,1]$ (the probit and complementary log-log links, for example), but the logistic link lets you model the odds and the odds ratio elegantly.

Formally, you assume that your observations $Y_1,\dots,Y_n$ are generated by Bernoulli random variables with success probability $p$, where $p$ depends on a set of covariates $X$, i.e., $p = p(X)$. A common model for $p(X)$ is logistic regression, $$ P(Y=1 \mid X=x) = p(x) = \frac{1}{1+\exp\{-\beta^Tx\}}, $$ and hence $$ \ln\left( \frac{p(x)}{1-p(x)} \right) = \beta^Tx. $$
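To see the two defining properties numerically, here is a minimal sketch in NumPy (the coefficients `beta` and covariates `x` are arbitrary illustrative values, not from any dataset): the sigmoid output always lands in $(0,1)$, and taking the log-odds of that output recovers the linear predictor $\beta^Tx$ exactly.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients and covariates, purely for illustration.
beta = np.array([0.5, -1.2])
x = np.array([2.0, 1.0])

linear_predictor = beta @ x          # beta^T x = -0.2 here
p = sigmoid(linear_predictor)        # P(Y=1 | X=x) under the logistic model
log_odds = np.log(p / (1.0 - p))     # inverts the sigmoid

print(p)                             # a value strictly between 0 and 1
print(log_odds)                      # recovers beta^T x, i.e. -0.2
```

Any other function with image $(0,1)$ would satisfy the first check, but only the logistic link makes the second identity hold with a *linear* right-hand side, which is what makes the fitted coefficients directly interpretable as changes in log-odds.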