Conditional logit model: the weighted mean of coefficient ratios across subgroups does NOT equal the coefficient ratio for the whole sample. Why?

I am well aware that when one splits the sample into subgroups (e.g. sex, country, whatever), and then estimates any logistic regression model, the coefficients are not comparable between (otherwise identical) models for different subgroups of the sample. This is the unobserved heterogeneity problem that is very well discussed in e.g. Mood (2010).

However, the ratios of the beta coefficients, to my understanding, should not be affected by this phenomenon. For example, if I divide a data set into two halves (fh and sh) and thereby obtain three logistic regression models with three sets of betas, it should hold that: $$ \beta_{1_{total}} / \beta_{2_{total}} = (\beta_{1_{fh}} / \beta_{2_{fh}} + \beta_{1_{sh}} / \beta_{2_{sh}}) / 2 $$

Shouldn't it?
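
A minimal sketch of how this identity could be checked on simulated data, assuming a plain binary logit fitted with `glm` rather than the conditional logit used further down. The sample size, the true coefficients (0.5 and -1.0) and all variable names are made up for illustration:

```r
# Simulated data with known coefficients; all values here are illustrative assumptions
set.seed(1)
n  <- 2000
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.5 * x1 - 1.0 * x2))
d  <- data.frame(y, x1, x2)

# Ratio of the two estimated slopes for a given data set
ratio <- function(dat) {
  b <- coef(glm(y ~ x1 + x2, family = binomial, data = dat))
  unname(b["x1"] / b["x2"])
}

# Split the rows into a first and a second half
half    <- seq_len(n / 2)
r_total <- ratio(d)
r_fh    <- ratio(d[half, ])
r_sh    <- ratio(d[-half, ])

# Left-hand and right-hand side of the proposed identity
c(total = r_total, mean_of_halves = (r_fh + r_sh) / 2)
```

Printing the two numbers side by side shows how closely the pooled ratio and the mean of the half-sample ratios agree for one simulated draw; changing the seed gives a feel for how much that comparison varies.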

I am applying this to a discrete choice model, which I have estimated using a conditional logit model. The ratios of the log odds ratios (i.e. the betas) can be used to measure, for example, willingness-to-pay:

$$ -\beta_{any\ other\ attribute} / \beta_{price} $$

...i.e. how much money (price) one is willing to give up for one additional unit of any other attribute. In addition, Train (2009, p. 41) discusses the issue of unobserved heterogeneity in his book and explicitly states that it should not affect willingness-to-pay:

"Willingness to pay, values of time, and other measures of marginal rates of substitution are not affected by the scale parameter. Only the interpretation of the magnitudes of all coefficients is affected."
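
The scale invariance described in the quote can be illustrated numerically. The sketch below uses made-up coefficient values (not estimates from any data set) and a hypothetical common scale factor `sigma`; the helper `wtp` is likewise only an illustration of the ratio above, not a function from any package:

```r
# Hypothetical coefficients, for illustration only (not estimated from any data)
beta <- c(Reg_B = -0.5, Most = 0.7, Price = -4.0)

# Willingness-to-pay: minus each attribute coefficient over the price coefficient
wtp <- function(b) -b[names(b) != "Price"] / b[["Price"]]

# Rescale every coefficient by the same hypothetical scale factor
sigma        <- 2.5
wtp_original <- wtp(beta)
wtp_rescaled <- wtp(beta / sigma)

# The common factor cancels in the ratio, so both vectors agree
all.equal(wtp_original, wtp_rescaled)
```

This only shows that rescaling all coefficients of one model by a common factor leaves the ratios unchanged; it does not by itself say how ratios from separately estimated subsamples should combine.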

Given this, I have tried to estimate a conditional logit model using the clogit function from the survival R package (Therneau, 2020) and the syn.res1 data set from the support.CEs R package (Aizaki, 2020). Subsequently, I have estimated willingness-to-pay with the mwtp function from the same support.CEs package. The R code (a modified version of an example from the support.CEs manual) is as follows, and technically it seems to work as it should:

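library(survival)    # clogit(), strata()
library(support.CEs) # rotation.design(), make.design.matrix(), make.dataset(), gofm(), mwtp(), syn.res1
library(stats)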
if(getRversion() >= "3.6.0") RNGkind(sample.kind = "Rounding")
# Case 1
# Choice experiments using the function rotation.design.
# See "Details" for the data set syn.res1.
des1 <- rotation.design(
  attribute.names = list(
    Region = c("Reg_A", "Reg_B", "Reg_C"),
    Eco = c("Conv.", "More", "Most"),
    Price = c("1", "1.1", "1.2")),
  nalternatives = 2,
  nblocks = 1,
  row.renames = FALSE,
  randomize = TRUE,
  seed = 987)
des1
questionnaire(choice.experiment.design = des1)
desmat1 <- make.design.matrix(
  choice.experiment.design = des1,
  optout = TRUE,
  categorical.attributes = c("Region", "Eco"),
  continuous.attributes = c("Price"),
  unlabeled = TRUE)
data(syn.res1)
dataset1 <- make.dataset(
  respondent.dataset = syn.res1,
  choice.indicators =
    c("q1", "q2", "q3", "q4", "q5", "q6", "q7", "q8", "q9"),
  design.matrix = desmat1)
clogout1 <- clogit(RES ~ ASC + Reg_B + Reg_C + More + Most +
                     More:F + Most:F + Price + strata(STR), data = dataset1)
clogout1
gofm(clogout1)
mwtptable <- mwtp(
  output = clogout1,
  monetary.variables = c("Price"),
  nonmonetary.variables =
    c("Reg_B", "Reg_C", "More", "Most", "More:F", "Most:F"),
  seed = 987)

dataset1firsthalf <- dataset1[1:(nrow(dataset1) / 2),]
dataset1secondhalf <- dataset1[(nrow(dataset1) / 2 + 1):nrow(dataset1),]

clogout1fh <- clogit(RES ~ ASC + Reg_B + Reg_C + More + Most +
                     More:F + Most:F + Price + strata(STR), data = dataset1firsthalf)
clogout1fh
gofm(clogout1fh)
mwtptablefh <- mwtp(
  output = clogout1fh,
  monetary.variables = c("Price"),
  nonmonetary.variables =
    c("Reg_B", "Reg_C", "More", "Most", "More:F", "Most:F"),
  seed = 987)

clogout1sh <- clogit(RES ~ ASC + Reg_B + Reg_C + More + Most +
                       More:F + Most:F + Price + strata(STR), data = dataset1secondhalf)
clogout1sh
gofm(clogout1sh)
mwtptablesh <- mwtp(
  output = clogout1sh,
  monetary.variables = c("Price"),
  nonmonetary.variables =
    c("Reg_B", "Reg_C", "More", "Most", "More:F", "Most:F"),
  seed = 987)

However, the two resulting vectors are not equal:

> colMeans(mwtptable[["mwtps"]])
       Reg_B        Reg_C         More         Most       More:F       Most:F 
-0.126990720 -0.075244675  0.127596931  0.170096365 -0.007301657  0.005807953

> 0.5*(colMeans(mwtptablefh[["mwtps"]])+colMeans(mwtptablesh[["mwtps"]]))
       Reg_B        Reg_C         More         Most       More:F       Most:F 
-0.130048200 -0.079639400  0.133030022  0.184700100 -0.008986277 -0.003676807

Some of the willingness-to-pay values are not very different, but some are. Have I misunderstood something about the coefficients (or their ratios) of logistic regression models? Or could this be a software- or algorithm-related problem, even though the mathematics should hold? What's going on?

Sources:
Aizaki, H. (2012). Basic Functions for Supporting an Implementation of Choice Experiments in R. Journal of Statistical Software, 50(C2), 1-24. URL https://doi.org/10.18637/jss.v050.c02.

Mood, C. (2010). Logistic regression: Why we cannot do what we think we can do, and what we can do about it. European Sociological Review, 26(1), 67-82.

Therneau, T. (2020). A Package for Survival Analysis in R. R package version 3.2-7. URL https://CRAN.R-project.org/package=survival.

Train, K. (2009). Discrete Choice Methods with Simulation. Cambridge University Press. (Available on the author's website: https://eml.berkeley.edu/books/choice2.html)