Steps to create a multivariate regression model for yes/no outcome


A colleague recently presented a piece of work in which they built a model for the probability of a customer responding to a letter. They explained that it was a multivariate regression model, and that it gave them coefficient values they could plug into an equation to generate the probability of response, along the lines of:

Probability = (var1 * coef1) + (var2 *coef2) + ...
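For what it's worth, a plain weighted sum like this is not limited to the range 0 to 1, so for a yes/no outcome the sum is usually passed through a logistic function first. That is the form used by logistic regression, though whether my colleague's model was a logistic regression is an assumption on my part. A minimal sketch of that calculation, where `predict_probability`, the coefficients, and the customer values are all hypothetical names and numbers used only for illustration:

```python
import math

def predict_probability(values, coefficients, intercept=0.0):
    """Turn a weighted sum of variables into a probability between 0 and 1."""
    # Linear predictor: intercept + var1*coef1 + var2*coef2 + ...
    log_odds = intercept + sum(v * c for v, c in zip(values, coefficients))
    # The logistic (sigmoid) function maps any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients and customer values, for illustration only
print(predict_probability([1.0, 0.0], [0.8, -0.5], intercept=-1.5))
```

In this form the coefficients act on the log-odds of a response rather than on the probability directly.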

I have a similar set of data that I'd like to create a model for. I have approximately 5k "yes" outcomes and 20k "no" outcomes, and maybe 3k individual data points for each customer.

I've tried googling, but the articles I've come across so far are either too broad to serve as a starting point (the multivariate statistics wiki page), or don't expand on the tool used to create the model (this tutorial, for instance).

My current (limited) understanding is that I would need to do the following steps:

  1. Gather/clean/prepare all data
  2. Pick out all of the normally distributed independent variables
  3. Determine how much, if at all, these affect the dependent variable
  4. Take the top 5 or 6 variables and use these to create the model
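To make steps 3 and 4 concrete, here is a rough sketch of what I imagine the fitting looks like, assuming logistic regression is the right technique for a yes/no outcome (as far as I can tell it does not require the independent variables to be normally distributed, so step 2 may be unnecessary). It is pure Python on synthetic data, only to show the mechanics. The function names, learning rate, epoch count, and the data itself are made up for illustration, and a real tool would use a more robust fitting routine than this simple gradient descent.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(rows, outcomes, learning_rate=0.5, epochs=500):
    """Fit an intercept and coefficients by batch gradient descent on log-loss."""
    n = len(rows)
    n_features = len(rows[0])
    intercept = 0.0
    coefs = [0.0] * n_features
    for _ in range(epochs):
        grad_b = 0.0
        grad_w = [0.0] * n_features
        for x, y in zip(rows, outcomes):
            p = sigmoid(intercept + sum(w * v for w, v in zip(coefs, x)))
            err = p - y  # gradient of log-loss w.r.t. the linear predictor
            grad_b += err
            for j in range(n_features):
                grad_w[j] += err * x[j]
        intercept -= learning_rate * grad_b / n
        coefs = [w - learning_rate * g / n for w, g in zip(coefs, grad_w)]
    return intercept, coefs

# Synthetic toy data (NOT real customer data): x1 drives the response,
# x2 has no effect, and roughly 20% of outcomes are "yes".
random.seed(0)
rows, outcomes = [], []
for _ in range(500):
    x1 = random.gauss(0, 1)
    x2 = random.gauss(0, 1)
    true_p = sigmoid(-1.4 + 1.5 * x1)
    rows.append([x1, x2])
    outcomes.append(1 if random.random() < true_p else 0)

intercept, coefs = fit_logistic(rows, outcomes)
print("intercept:", intercept)
print("coefficients:", coefs)
```

On this toy data the fitted coefficient for x1 should come out clearly larger in magnitude than the one for x2. That comparison is only meaningful because both variables are on the same scale, and it is the kind of signal I assume step 3 is meant to pick up.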

I think one of the areas I'm struggling with is that all of the examples online seem to be for a continuous outcome (e.g. price), rather than a yes/no outcome.

Can someone please help by expanding upon these steps, ideally with an example/tutorial/link to the relevant topic?

I have the following tools available to help with the analysis:

  - Excel (high level of familiarity: comfortable with VBA, advanced functions etc.)
  - SAS (Enterprise Guide & Enterprise Miner, entry-level familiarity: query builder, some SQL etc.)
  - A handful of statistics modules from a BSc in Biology (chi-square, ANOVA etc.)

I don't have access to Python/R on the same network as the data.