I'm trying to grasp the intuition behind the definition of variance. It seems plausible that we want to measure how much a random variable deviates from its expected value. But why use the square exactly?
From what I can see, we are interested in an assignment of the form $X\mapsto E(f(|E(X)-X|))$ for some strictly monotonic $f$ with $f(0)=0$ and $f(1)=1$. Are there any further properties of the variance from which, if used as axioms, we can derive $f(x)=x^2$?
For example, would additivity w.r.t. independent random variables, i.e. $$E(f(|E(X+Y)-X-Y|))=E(f(|E(X)-X|))+E(f(|E(Y)-Y|))$$ for $X,Y$ independent, suffice as such an axiom?
Yes, additivity for independent random variables does suffice.
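As a quick numerical sanity check before the proof (a sketch with arbitrarily chosen mean-zero distributions, not part of the argument): additivity over independent variables holds for $f(t)=t^2$ but already fails for $f(t)=|t|$.

```python
from itertools import product

# Two independent, mean-zero discrete random variables,
# given as (value, probability) pairs; chosen arbitrarily for illustration.
X = [(-1, 0.5), (1, 0.5)]
Y = [(-2, 0.25), (0, 0.5), (2, 0.25)]

def expect(f, dist):
    """E[f(Z)] for a discrete distribution."""
    return sum(p * f(v) for v, p in dist)

def expect_sum(f, dx, dy):
    """E[f(X+Y)] via the joint distribution (independence: probabilities multiply)."""
    return sum(px * py * f(x + y) for (x, px), (y, py) in product(dx, dy))

square = lambda t: t * t
absval = abs

# f(t) = t^2: additivity holds exactly (this is Var(X+Y) = Var X + Var Y)
print(expect_sum(square, X, Y), expect(square, X) + expect(square, Y))  # 3.0 3.0

# f(t) = |t|: additivity fails
print(expect_sum(absval, X, Y), expect(absval, X) + expect(absval, Y))  # 1.5 2.0
```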
To simplify matters a bit, we may assume $E[X] = 0$ and $E[Y]=0$.
Let's also assume $X$ and $Y$ are bounded, to avoid questions of existence of expected values. Also, since you only ever use $f$ on absolute values of random variables, we may define $f$ to be an even function on $\mathbb R$. I'll also assume $f$ is continuous. Now you want an even function $f$ such that $E[f(X+Y)] = E[f(X)] + E[f(Y)]$ for bounded independent random variables such that $E[X] = E[Y]=0$. By linearity of expectation, this is equivalent to $E[f(X+Y) - f(X) - f(Y)] = 0$.
In particular, for constants $s$ and $t$, consider independent $X$ and $Y$ such that $P(X=s)=P(X=-s)=1/2$ and $P(Y=t)=P(Y=-t)=1/2$. Then $E[f(X)] = (f(s) + f(-s))/2 = f(s)$, $E[f(Y)] = f(t)$ similarly, and $E[f(X+Y)] = (f(s+t) + f(s-t))/2$. Thus our equation becomes
$$ \dfrac{f(s+t) + f(s-t)}{2} - f(s) - f(t) = 0 $$
Note that for $s=t=0$ we get $f(0) = 0$. Now taking $s = k t$ for integers $k$, we can show by induction that $$ f(k t) = k^2 f(t) $$ and thus for a rational $a/b$ (taking $t = 1/b$, and using $f(1) = f\bigl(b\cdot\tfrac1b\bigr) = b^2 f\bigl(\tfrac1b\bigr)$), $$f\left(\frac{a}{b}\right) = a^2 f\left(\frac1b\right) = \frac{a^2}{b^2} f(1)$$ By continuity, we extend this to reals: $f(x) = x^2 f(1)$. If you assume the normalization $f(1) = 1$, you have $f(x) = x^2$.
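For completeness, here is the induction step behind $f(kt) = k^2 f(t)$, written out using only the functional equation above (nothing new is assumed):

```latex
% Base cases: f(0 \cdot t) = f(0) = 0 and f(1 \cdot t) = f(t).
% Setting s = kt in the functional equation and rearranging gives
%   f((k+1)t) = 2 f(kt) + 2 f(t) - f((k-1)t).
% Assuming the claim for k-1 and k:
\begin{align*}
f((k+1)t) &= 2k^2 f(t) + 2 f(t) - (k-1)^2 f(t) \\
          &= \bigl(2k^2 + 2 - k^2 + 2k - 1\bigr) f(t) \\
          &= (k+1)^2 f(t).
\end{align*}
```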
I'm pretty sure that, as with the Cauchy functional equation, the assumption of continuity may be replaced by measurability (and we certainly need $f$ to be measurable, else $E[f(X)]$ would be undefined for, say, uniform random variables).