In machine learning especially, one often reads the phrase "to marginalise out" something, and while I understand that this means to integrate over a variable, I cannot quite grasp the larger significance.
For example, $z$ is unobserved, so it needs to be "marginalised out" to compute the likelihood of the parameters, i.e. $p_{\theta}(y) = \int p_{\theta}(y,z)\,dz$, where $y$ is an observed variable and $\theta$ parametrises the joint distribution.
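To make my question concrete, here is a minimal sketch of what I mean, assuming $z$ is a discrete latent class (so the integral becomes a sum) and using made-up mixture weights, means and standard deviations as stand-ins for $\theta$:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical two-component Gaussian mixture; these numbers stand in for theta.
weights = np.array([0.3, 0.7])   # p(z)
means   = np.array([-2.0, 1.0])  # mean of p(y | z)
stds    = np.array([1.0, 0.5])   # std  of p(y | z)

def likelihood(y):
    # p_theta(y) = sum_z p(z) * p(y | z): the discrete analogue of
    # integrating the joint p_theta(y, z) over z ("marginalising out" z).
    return np.sum(weights * norm.pdf(y, loc=means, scale=stds))

print(likelihood(0.5))
```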
I suppose I am simply wondering why we need to marginalise something out at all, and why we call it marginalisation. Are we trying to reduce the problem's dependence to as few variables as possible?
Because you are computing a marginal distribution, that is, a marginal density. Say the density of $(X,Z)$ is $f(x,z)$; then integrating over $z$ ("marginalizing out $z$") gives $$ \int f(x,z)\; dz = f(x) $$ where $f(x)$ is the marginal density of $X$.
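For intuition, here is a small numerical sketch (using a bivariate normal purely as an arbitrary example) showing that integrating the joint density over $z$ at a fixed $x$ recovers the marginal density of $X$ at that point:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm
from scipy.integrate import quad

# Joint density f(x, z): a bivariate normal with some correlation
# (the values are arbitrary, chosen only for illustration).
rho = 0.6
joint = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

x = 0.8
# "Marginalize out z": integrate f(x, z) over z for a fixed x.
marginal_at_x, _ = quad(lambda z: joint.pdf([x, z]), -np.inf, np.inf)

# In this bivariate normal the marginal of X is a standard normal,
# so the numerical integral should agree with norm.pdf(x).
print(marginal_at_x, norm.pdf(x))
```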
The terminology comes from tables. If $N(x,y)$ is a two-way table of counts, say, then you find the marginal tables (the ones printed in "the margin", in the everyday sense of the word) by summing over one of the indices, $x$ or $y$.
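The same operation on a toy count table (the numbers are made up) looks like this, where the sums are exactly the figures you would print in the margins:

```python
import numpy as np

# Hypothetical 2x3 table of counts N(x, y): rows index x, columns index y.
N = np.array([[10,  5, 15],
              [20, 25,  5]])

# Summing over y (the columns) gives the row margins: counts for x alone.
print(N.sum(axis=1))   # [30 50]

# Summing over x (the rows) gives the column margins: counts for y alone.
print(N.sum(axis=0))   # [30 30 20]
```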