This problem has stumped me for a while, so I'd love to get some fresh eyes on it.
Problem
Suppose an online retail provider served a large number of users ($N$) in a given time window. Each user was exposed to a different set of products and/or services, and the retailer has tracked these exposures. The retailer wants to increase the collective conversion rate, from seeing to purchasing, for a certain subset of inventory items. Assuming the retailer has a predetermined weight $w_i$ for each item $i$, how would you estimate the effect of various treatments on these user-level conversion rates?
The metric they are targeting is: $$ C = \sum_i w_i \frac{\textrm{# users who purchased item } i}{\textrm{# users who saw item } i}$$
Calculating the expected value is straightforward given empirical observations, but I have had a lot more trouble estimating variances.
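To make the point estimate concrete, here is a minimal sketch of how the metric above could be computed. It assumes a data layout that is not stated in the question: two boolean user-by-item arrays, `seen` and `bought` (both names are hypothetical), plus a vector of item weights.

```python
import numpy as np

def weighted_conversion(seen, bought, weights):
    """C = sum_i w_i * (# users who purchased item i) / (# users who saw item i)."""
    seen = np.asarray(seen, dtype=bool)
    # Assumption: a purchase only counts if the user also saw the item.
    bought = np.asarray(bought, dtype=bool) & seen
    n = seen.sum(axis=0)    # n_i: number of users who saw item i
    x = bought.sum(axis=0)  # X_i: number of users who purchased item i
    # Assumption: items that nobody saw contribute a rate of 0.
    rates = np.divide(x, n, out=np.zeros(len(n)), where=n > 0)
    return float(np.dot(weights, rates))
```

For example, with four users and two items, `weighted_conversion([[1, 1], [1, 0], [1, 1], [0, 1]], [[1, 0], [0, 0], [1, 1], [0, 0]], [0.5, 0.5])` combines per-item rates of 2/3 and 1/3.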
Solution 1: Permutation Test
The most straightforward way to measure and evaluate the significance of treatment effects on this metric is a permutation test. Once $N$ (the number of users) gets large enough, however, this becomes impractical.
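For reference, the permutation test I have in mind looks roughly like the sketch below. It assumes treatment is assigned per user (so labels are shuffled across users), uses the same hypothetical `seen` / `bought` boolean user-by-item arrays, and compares two groups via a two-sided test on the difference in $C$. The function names and the `n_perm` default are illustrative only.

```python
import numpy as np

def metric(seen, bought, weights):
    """Weighted sum of per-item conversion rates for one group of users."""
    n = seen.sum(axis=0)
    x = (bought & seen).sum(axis=0)
    rates = np.divide(x, n, out=np.zeros(len(n)), where=n > 0)
    return float(np.dot(weights, rates))

def permutation_test(seen, bought, weights, treated, n_perm=2000, seed=0):
    """Two-sided permutation test on C(treated) - C(control)."""
    rng = np.random.default_rng(seed)
    seen = np.asarray(seen, dtype=bool)
    bought = np.asarray(bought, dtype=bool)
    treated = np.asarray(treated, dtype=bool)

    def diff(labels):
        return (metric(seen[labels], bought[labels], weights)
                - metric(seen[~labels], bought[~labels], weights))

    observed = diff(treated)
    # Each permutation reshuffles the user-level labels and recomputes the metric.
    null = np.array([diff(rng.permutation(treated)) for _ in range(n_perm)])
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (1 + np.sum(np.abs(null) >= abs(observed))) / (1 + n_perm)
    return observed, p_value
```

Every permutation touches all users and all items, which is where the cost comes from as $N$ grows.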
Solution 2: Analytical Approximation?
There are several approaches that one can take here. You can try to find an analytical approximation to the permutation test, or you can try to estimate the metric and its variance for each treatment group, and then compare between treatments using standard tests (z-test, t-test, etc.). I found some papers on the former approach, but the mathematics there was very complicated, so I wanted to see whether the latter approach bore fruit before returning to it. I made some progress, but couldn't get the results to agree with the permutation test, and so got bogged down. In what follows I write down my best solution, and hope you will be able to help improve it!
$$ \begin{align} C &= \sum_i w_i \frac{\textrm{# users who purchased item } i}{\textrm{# users who saw item } i}\\ &= \sum_i w_i X_i / n_i \end{align} $$ where $X_i \sim B(n_i, p_i)$, $n_i$ is the number of users who saw item $i$, and $p_i$ is the probability that a user who saw item $i$ purchases it, estimated by its empirical value. Note that we have explicitly fixed the denominator $n_i$ at its observed value, and do not treat it as a random variable.
From this it naturally follows that $ \mathbb{E}[C] = \sum_i w_i p_i $. Variance is trickier.
$$ \textrm{Var}(C) = \sum_{i,j} \frac{w_i w_j}{n_i n_j} \textrm{Cov} (X_i, X_j) $$
This is not straightforward to compute because $X_i$ and $X_j$ are defined over different sets of users (not every user saw the same items). Perhaps this means that we should abandon this particular derivation and try elsewhere. In particular, perhaps we should relax the simplifying assumption that $n_i$ is a fixed, empirically observed quantity, but then the mathematics becomes much more difficult.
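To make the difficulty concrete, the sum can be split into diagonal and off-diagonal terms. The sketch below makes an assumption beyond what is stated above, namely that different users behave independently, and introduces some hypothetical notation: $S_i$ is the set of users who saw item $i$, and $B_{ui}$ is the indicator that user $u$ purchased item $i$, so that $X_i = \sum_{u \in S_i} B_{ui}$.

$$ \begin{align} \textrm{Var}(C) &= \sum_i \frac{w_i^2}{n_i^2} \textrm{Var}(X_i) + \sum_{i \neq j} \frac{w_i w_j}{n_i n_j} \textrm{Cov}(X_i, X_j) \\ &= \sum_i \frac{w_i^2 \, p_i (1 - p_i)}{n_i} + \sum_{i \neq j} \frac{w_i w_j}{n_i n_j} \sum_{u \in S_i \cap S_j} \textrm{Cov}(B_{ui}, B_{uj}) \end{align} $$

The first term uses $\textrm{Var}(X_i) = n_i p_i (1 - p_i)$ for the binomial. In the second term, independence across users removes every cross-user covariance, so only users who saw both items contribute. Whether that independence assumption is reasonable here is something I am not sure about.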
One attempt to push through this was to restrict the covariance calculations to the common domain, and then weight the covariance by $n_{ij}^2/(n_i n_j)$ (where $n_{ij}$ is the number of users who saw both items $i$ and $j$). This seems to give a somewhat sensible measure of variance, but it isn't very well motivated, and it does not offer a straightforward way to compute the variance of the mean, which would be necessary for standard z/t-tests.
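The ingredients of this heuristic can be sketched as follows. This is only one reading of it: the function name and the `seen` / `bought` boolean user-by-item arrays are hypothetical, the covariance is taken per user over the shared users (population form), and the code deliberately stops short of assembling a final variance, since how to scale and combine these pieces is exactly what is unclear.

```python
import numpy as np

def overlap_weighted_cov(seen, bought):
    """Per-pair covariance on shared users, plus the weight n_ij^2 / (n_i n_j)."""
    seen = np.asarray(seen, dtype=bool)
    bought = np.asarray(bought, dtype=bool) & seen  # purchases require exposure
    s = seen.astype(float)
    b = bought.astype(float)

    n_ij = s.T @ s         # users who saw both i and j; the diagonal holds n_i
    n_i = np.diag(n_ij)
    denom = np.outer(n_i, n_i)

    with np.errstate(divide="ignore", invalid="ignore"):
        # Among users who saw both items: E[B_i B_j] - E[B_i] E[B_j].
        cov = (b.T @ b) / n_ij - ((b.T @ s) / n_ij) * ((s.T @ b) / n_ij)
        weight = n_ij**2 / denom

    # Assumption: pairs with no shared users get zero covariance and zero weight.
    cov = np.where(n_ij > 0, cov, 0.0)
    weight = np.where(denom > 0, weight, 0.0)
    return cov, weight, n_ij
```

On the diagonal the weight is 1 and the covariance reduces to the empirical $p_i (1 - p_i)$, which is at least a useful sanity check.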
Any ideas or pointers?