A webpage has users, where each user has a number of projects uniquely assigned to him or her. I want a random sample of users, which I obtain by randomly sampling projects and then adding the user connected to each sampled project to the sample. I am not interested in users without any projects. This approach will, however, create a biased sample of users, since users with more projects are more likely to be selected. Can I obtain an unbiased sample by selecting users from the biased sample into the unbiased sample with probability $$p_i = \frac{\sum_j c_j }{ c_i \sum_k (\frac{1}{c_k} \sum_j c_j )}$$ where $c_i$ is the number of projects of user $i$? Here, $\sum_j c_j$ is the total number of projects of the users in the biased sample. The formula is meant to produce a probability that cancels out the original bias, at least for those users chosen in the biased sample.
2026-03-29 19:07:44.1774811264
Biased sample from biased sample
93 Views Asked by Bumbble Comm https://math.techqa.club/user/bumbble-comm/detail At
There is 1 best solution below
Yes, you can do this with importance sampling if you just want to correct your estimate of a population statistic (e.g., mean, standard deviation, etc.). However, if you want an actual random sample, then here's one possible approach:
Let $c_i$ be the number of projects assigned to person $i$ and $N$ be the total number of people with at least one project. Now, if you are uniformly sampling projects, then the probability that a single draw pulls person $i$ into your sample is:
$$p_i = \frac{c_i}{M}$$
where $M$ is the total number of projects. When you say you want a random sample over people, you want a sampling scheme such that the new probability that person $i$ is drawn, $\hat{p}_i$, is given by:
$$\hat{p}_i = \frac{1}{N}$$
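As a small worked illustration of the gap between these two distributions, the sketch below computes both exactly for a made-up set of project counts. The function name `inclusion_probabilities` and the example counts are assumptions for illustration only.

```python
from fractions import Fraction


def inclusion_probabilities(counts):
    """Return (biased, target) per-draw probabilities for each person.

    counts[i] is c_i, the number of projects of person i (all >= 1).
    biased[i] = c_i / M  is the chance of hitting person i when one
                         project is drawn uniformly at random.
    target[i] = 1 / N    is the chance we would like instead.
    """
    M = sum(counts)   # total number of projects
    N = len(counts)   # number of people with at least one project
    biased = [Fraction(c, M) for c in counts]
    target = [Fraction(1, N)] * N
    return biased, target


# Hypothetical example: four people owning 1, 2, 5 and 12 projects.
biased, target = inclusion_probabilities([1, 2, 5, 12])
for i, (b, t) in enumerate(zip(biased, target)):
    print(f"person {i}: biased {b}, target {t}")
```

In this example the person with 12 projects is drawn with probability 3/5 per draw, against a target of 1/4.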
Now, you don't actually have direct control over the probability of picking a person, but you do have control over the probability of picking a project. Unfortunately, it is not generally true that you can find a distribution over projects that will achieve the above goal (think of that rare person who has worked on all the projects!).
However, we can devise an iterative sampling technique that will get us there. The strategy will be to formulate a sequence of optimization problems, where the solution to each will be in the form of a probability distribution over projects and conditional distributions over people. Each iteration involves only a single sample, where we select a project then select a person who worked on that project. Thus, we will increase our sample size by one each iteration.
Let $a_{ij}$ be the probability of selecting person $i$ given that project $j$ was selected, so $a_{ij}=0$ if person $i$ did not work on project $j$ and $a_{ij} \in [0,1]$ otherwise. Additionally, let $q_j$ be the probability of selecting project $j$. Under this formulation, the probability that we choose person $i$ is given by:
$$P(\text{choose person } i)=\sum_{j=1}^M q_ja_{ij}$$
The $a_{ij}$ define an $N\times M$ matrix, $\mathbf{A}$, and the $q_j$ define an $M$-dimensional vector $\mathbf{q}$. What we want is to find $\mathbf{A},\mathbf{q}$ such that:
$$\mathbf{Aq}=\frac{1}{N}\mathbf{1}$$
With constraints:
$$\sum_{i=1}^N a_{ij} =1, \;\; j \in 1...M$$ $$\sum_{j=1}^M q_j = 1$$ $$a_{ij},q_j\geq 0$$
The first constraint makes each column of $\mathbf{A}$ a conditional distribution over people given project $j$, so that the selection probabilities $\sum_j q_j a_{ij}$ sum to one over $i$. In general, this will be an underdetermined problem. We can attack it using quadratic programming. We will define a quadratic objective function that represents the sum of squared differences between the probability of selecting person $i$ under some feasible assignment of $\mathbf{A},\mathbf{q}$ and the target probability of $1/N$, and minimize it:
$$\arg \min_{a_{ij},q_j}\sum_{i=1}^N \left(\langle \mathbf{a}_{i\cdot},\mathbf{q}\rangle-\frac{1}{N}\right)^2$$
$$\mathrm{s.t.}$$
$$\sum_{i=1}^N a_{ij} =1, \;\; j \in 1...M$$
$$\sum_{j=1}^M q_j = 1$$
$$a_{ij},q_j\geq 0$$
Where I've used angle-brackets $\langle a,b \rangle$ to denote the dot product of vectors $a,b$.
Except in special circumstances, there will be multiple optima to choose from, but any of them will do.
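In the special case described in the question, where every project belongs to exactly one person, the target $\mathbf{Aq}=\frac{1}{N}\mathbf{1}$ can be met exactly without running a solver: take $a_{ij}=1$ for the owner of project $j$ (0 otherwise) and $q_j = 1/(N c_i)$ for each project $j$ owned by person $i$. The sketch below checks this with exact fractions. It is an illustration under that single-owner assumption, not a general solver, and the names `owner`, `project_weights` and `person_probabilities` are made up for the example.

```python
from collections import Counter
from fractions import Fraction


def project_weights(owner):
    """Project distribution q for the single-owner case.

    owner[j] is the person who owns project j (assumption: exactly one
    owner per project). Each project of person i gets q_j = 1 / (N * c_i).
    """
    counts = Counter(owner)   # c_i for every person with a project
    N = len(counts)
    return [Fraction(1, N * counts[i]) for i in owner]


def person_probabilities(owner, q):
    """Compute (A q)_i = sum_j q_j * a_ij with a_ij = 1 iff i owns j."""
    probs = {}
    for j, i in enumerate(owner):
        probs[i] = probs.get(i, Fraction(0)) + q[j]
    return probs


# Hypothetical example: person 0 owns 1 project, person 1 owns 2, person 2 owns 3.
owner = [0, 1, 1, 2, 2, 2]
q = project_weights(owner)
probs = person_probabilities(owner, q)
objective = sum((p - Fraction(1, len(probs))) ** 2 for p in probs.values())
print(q, probs, objective)
```

Here the squared-error objective evaluates to exactly zero. When projects have several contributors, no such closed form is assumed and a numerical optimizer would be needed.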
Using the above machinery, we can perform the following steps:
This works because we've designed the sampling sequence to be exchangeable. At each step, every remaining person has the same probability of being selected (a probability that increases as the pool shrinks), so for a sample size of $K$, every possible ordering of $K$ people from the available set of $N$ people is equally likely. Each possible sample therefore has the same probability, and you have a random sample.
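One way to read the iterative procedure can be sketched as follows. This is only an interpretation under stated assumptions: each project has a single owner (as in the question), a person is removed from the pool once selected (suggested by the "increasing" probability above), and the per-iteration optimization is replaced by its closed-form solution for that case, namely a weight proportional to $1/c_i$ on each project of a remaining person $i$. The function name `sample_users_via_projects` is hypothetical.

```python
import random
from collections import Counter


def sample_users_via_projects(owner, k, seed=None):
    """Draw k distinct people by sampling projects, one per iteration.

    owner[j] is the single owner of project j (assumption).
    """
    counts = Counter(owner)     # c_i for every person
    remaining = set(counts)     # people not yet in the sample
    if k > len(remaining):
        raise ValueError("k exceeds the number of people with projects")
    rng = random.Random(seed)
    sample = []
    for _ in range(k):
        # Only projects of people still in the pool are eligible.
        projects = [j for j, i in enumerate(owner) if i in remaining]
        # Weight 1 / c_i per project: each remaining person then carries
        # total weight 1, i.e. is chosen with probability 1 / len(remaining).
        weights = [1.0 / counts[owner[j]] for j in projects]
        j = rng.choices(projects, weights=weights, k=1)[0]
        sample.append(owner[j])
        remaining.discard(owner[j])
    return sample


# Hypothetical example: person 2 owns six of the nine projects.
owner = [0, 1, 1, 2, 2, 2, 2, 2, 2]
print(sample_users_via_projects(owner, k=2, seed=1))
```

With several contributors per project, the weights would instead come from solving the optimization problem above at each iteration.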
Sooo...tedious but doable (at least in principle).
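For the simpler case mentioned at the start, where only a population statistic needs correcting, a rough sketch of the importance-weighting idea is below. It assumes projects are drawn uniformly with replacement and each project has a single owner, so each draw hits person $i$ with probability $c_i/M$; weighting each draw by $1/c_i$ (self-normalized) then targets the per-person mean. The function name and the data are hypothetical.

```python
import random


def weighted_mean_from_project_sample(values, counts, n_draws, seed=0):
    """Estimate the per-person mean of `values` from project-level draws.

    values[i] : quantity of interest for person i (made-up data)
    counts[i] : number of projects c_i of person i (all >= 1)
    """
    rng = random.Random(seed)
    people = range(len(counts))
    # A uniform project draw selects person i with probability c_i / M.
    drawn = rng.choices(people, weights=counts, k=n_draws)
    # Self-normalized importance weights proportional to 1 / c_i.
    weights = [1.0 / counts[i] for i in drawn]
    total = sum(w * values[i] for w, i in zip(weights, drawn))
    return total / sum(weights)


values = [10, 20, 30, 40]   # hypothetical per-person values, true mean 25
counts = [1, 2, 5, 12]      # hypothetical project counts
print(weighted_mean_from_project_sample(values, counts, n_draws=50000))
```

An unweighted average of the same draws would instead drift toward 34 in this example, the project-weighted mean.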