What's the intuition behind defining the sampling weight as the inverse of the probability that the unit is selected? Here's an example of such a definition.
I would tend to think that if a unit has a higher chance of being selected, it should have a higher sampling weight. However, the above definition seems to be exactly the opposite. I've searched other references on surveys, and it seems it's the usual definition.
Edit: Assume we have 4 units, with the probabilities of being chosen: $p_1 = 5/10$, $p_2 = 2/10$, $p_3 = 2/10$, and $p_4 = 1/10$.
The weights are $w_1 = 2$, $w_2 = 5$, $w_3 = 5$, and $w_4 = 10$. If we add these up, we get 22 instead of the original 4. Why would unit 1 represent only 2 units, whereas unit 4 represents 10?
The intuition comes from the way the estimator is constructed. We have a population $U$, from which we are drawing a sample $s$ which is a random subset of $U$. For each unit $i \in U$ there's an associated value $y_i$, and we are interested in the total population value $Y = \sum_{i \in U} y_i$. To do that, we're going to construct a weighted linear sum from our sample, $\hat{Y} = \sum_{i \in s} w_i y_i$, where the $w_i$ is some kind of weighting.
Ideally, we want $\hat{Y}$ to "be similar to" $Y$ in some measure, so let's require that they are equal in expectation, i.e. $\mathbb{E}(\hat{Y}) = Y$, where the expectation is taken over every possible sample we could have drawn. We could compute this directly as $\mathbb{E}(\hat{Y}) = \sum_{s \in S} P(s)\,\hat{Y}(s)$, where $S$ is the set of all possible samples, but that's a pretty nasty calculation. Instead, I'm going to introduce a random variable $\delta_i$ for each unit in the population, defined as:
$$\delta_i = \begin{cases} 1 & i \in s \\ 0 & i \notin s \end{cases}$$
In other words, $\delta_i$ is an indicator that tells you whether unit $i$ is in the sample $s$ or not. The nice thing is that $\mathbb{E}(\delta_i) = p_i$, i.e. the expected value of $\delta_i$ is exactly the probability that unit $i$ is in the sample, regardless of what other units may or may not be in the sample.
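To make the "expectation over all possible samples" concrete, here's a small sketch in Python. The design is hypothetical (not from the question): simple random sampling of $n = 2$ units from a population of $N = 4$, so all $\binom{4}{2} = 6$ samples are equally likely, and every unit's inclusion probability works out to $n/N = 1/2$.

```python
from itertools import combinations
from fractions import Fraction

# Hypothetical design: simple random sampling of n = 2 units from a
# population of N = 4, so all C(4,2) = 6 samples are equally likely.
U = [0, 1, 2, 3]
samples = list(combinations(U, 2))    # every possible sample s
P = Fraction(1, len(samples))         # P(s) = 1/6 for each sample

# E(delta_i) = sum over samples s of P(s) * 1[i in s]
E_delta = {i: sum(P for s in samples if i in s) for i in U}
print(E_delta)  # every unit: 1/2, which is exactly n/N
```

Enumerating the samples shows directly that $\mathbb{E}(\delta_i)$ is the fraction of possible samples containing unit $i$, weighted by each sample's probability.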
Now, using this variable, I'm going to rewrite the estimator as $\hat{Y} = \sum_{i \in U} w_i y_i \delta_i$. Instead of summing over just the sample, I'm now summing over the whole population, but that's fine because every unit that isn't in the sample has its value multiplied by zero. When we look at the expectation, we can use the linearity of the expected value to simplify things:
$$\begin{eqnarray} \mathbb{E}(\hat{Y}) & = & \mathbb{E} \left( \sum_{i \in U} w_i y_i \delta_i \right) \\ & = & \sum_{i \in U} \mathbb{E}\left( w_i y_i \delta_i \right) \\ & = & \sum_{i \in U} w_i y_i \mathbb{E}(\delta_i) \\ & = & \sum_{i \in U} w_i y_i p_i \end{eqnarray}$$
The second line works because the expected value of a sum is the sum of the expectations, and the third works because we're looking at the expectation taken over all possible samples, and we're assuming that $w_i$ and $y_i$ are fixed values that don't depend on the specific sample (i.e. we're choosing to weight unit $i$ the same whenever it's selected).
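The identity $\mathbb{E}(\hat{Y}) = \sum_{i \in U} w_i y_i p_i$ can be checked by brute force for a small design. This sketch again uses a hypothetical simple random sample of $n = 2$ from $N = 4$ (so $p_i = 1/2$), with made-up values $y_i$ and deliberately arbitrary fixed weights $w_i$, to show the identity holds before we choose any particular weighting.

```python
from itertools import combinations
from fractions import Fraction

# Hypothetical design: SRS of n = 2 from N = 4, so p_i = 1/2 for all i.
# y values and weights are made up; the weights are deliberately arbitrary.
y = [Fraction(3), Fraction(1), Fraction(4), Fraction(1)]
w = [Fraction(7), Fraction(2), Fraction(5), Fraction(3)]
U = range(4)
samples = list(combinations(U, 2))
P = Fraction(1, len(samples))          # each sample equally likely

# Expectation taken sample by sample: sum_s P(s) * Y_hat(s)
lhs = sum(P * sum(w[i] * y[i] for i in s) for s in samples)

# The simplified form from the derivation: sum_i w_i * y_i * p_i
p = Fraction(1, 2)                     # inclusion probability n/N
rhs = sum(w[i] * y[i] * p for i in U)

print(lhs, rhs)  # the two agree exactly
```

Because the arithmetic uses exact fractions, the two sides match exactly, not just approximately.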
So requiring that our estimator be equal in expectation to the population total means that $\sum_{i \in U} y_i = \sum_{i \in U} w_i p_i y_i$, and since we're just going for an intuitive solution here, it's pretty clear that we achieve this by forcing $w_i p_i = 1$ for every unit, or in other words: the unit's weight should be the inverse of its selection probability.
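We can also see the conclusion numerically. This Monte Carlo sketch uses the selection probabilities from the question's example and a design where each sample consists of a single unit drawn with those probabilities; the $y$ values are made up for illustration. Averaging $\hat{Y} = w_i y_i$ over many simulated samples lands close to the true total $Y$.

```python
import random

# Selection probabilities from the question's example; one unit is drawn
# per sample. The y values are hypothetical, chosen just for illustration.
p = [0.5, 0.2, 0.2, 0.1]
y = [3.0, 1.0, 4.0, 1.0]
w = [1.0 / pi for pi in p]     # weights: inverse selection probability

Y = sum(y)                     # true population total we want to recover

random.seed(0)                 # reproducible draws
n_sims = 200_000
draws = random.choices(range(4), weights=p, k=n_sims)

# Each sample's estimate is w_i * y_i for the drawn unit; average them.
mean_estimate = sum(w[i] * y[i] for i in draws) / n_sims

print(f"true total Y = {Y}, mean of Y_hat over {n_sims} samples = {mean_estimate:.2f}")
```

Any single sample's estimate can be far from $Y$ (drawing unit 2 here gives $5 \times 4 = 20$), but the average over samples converges to the total, which is exactly the unbiasedness property the weights were designed for.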
Another way to understand it: each unit should, on average, contribute exactly its own value to the estimate, so every time it does show up in the sample it needs to contribute enough to make up for all the times it doesn't. In the question's example, unit 4 with $p_4 = 1/10$ appears in only one sample out of ten, so on those occasions it must carry ten times its value, while unit 1 with $p_1 = 1/2$ appears half the time and only needs to carry twice its value.