I have the following dataset with discrete, boolean, and categorical variables:
'data.frame': 133778 observations of 23 variables:
$ id : num 10 100 1000 1000157 1000183 ...
$ age : int 30 3 48 52 32 32 52 28 40 36 ...
$ bikeAvailability : chr "FOR_SOME" "NO" "FOR_SOME" "FOR_ALL" ...
$ employed : chr "true" "false" "true" "true" ...
$ hasLicense : chr "no" "no" "yes" "yes" ...
$ ptHasGA : chr "true" "true" "false" "false" ...
$ sex : chr "f" "f" "m" "f" ...
I also have a subset of this dataset (~30'000 observations) containing the people who chose to travel by car instead of another mode of transport. I want to analyze whether any of the variables have a significant effect on the choice to take the car.
I have read this, which says linear regression is the wrong approach for binary outcomes (in my case, taking the car or not).
What would be the appropriate approach or test for this type of analysis? Thank you very much in advance!
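This is usually done with logistic regression, or with probit, its counterpart that uses the normal distribution's CDF in place of the logistic function. You can look up the details yourself, but the main idea is that you estimate how the regressors affect the probability that the outcome is 1 (here, taking the car). Note that the model has to be fitted on the full dataset, with the outcome coded as 1 for the car users and 0 for everyone else; the car subset alone has no variation in the outcome to explain.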
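As a rough illustration, here is a minimal sketch in R using `glm()` from base R. The data are simulated, not yours: the column names mimic the ones in the question, while the outcome column `tookCar` and all the values are made up for the example. On the real data you would first need to build such a 0/1 column on the full data frame, for instance by flagging the `id`s that appear in your car subset (I am assuming that is how the subset is identified).

```r
# Simulated stand-in data; column names mimic the question, values are made up.
set.seed(1)
n <- 2000
d <- data.frame(
  age        = sample(18:80, n, replace = TRUE),
  sex        = sample(c("f", "m"), n, replace = TRUE),
  hasLicense = sample(c("no", "yes"), n, replace = TRUE, prob = c(0.3, 0.7)),
  employed   = sample(c("true", "false"), n, replace = TRUE),
  stringsAsFactors = TRUE  # character columns become factors
)

# Hypothetical binary outcome: 1 = took the car, 0 = any other mode.
# It is simulated here so that holding a licence raises the odds of driving.
eta <- -2 + 0.01 * d$age + 2 * (d$hasLicense == "yes")
d$tookCar <- rbinom(n, size = 1, prob = plogis(eta))

# Logistic regression: binomial family with the logit link.
fit_logit <- glm(tookCar ~ age + sex + hasLicense + employed,
                 data = d, family = binomial(link = "logit"))

# Probit: same model, but with the normal CDF as the link.
fit_probit <- glm(tookCar ~ age + sex + hasLicense + employed,
                  data = d, family = binomial(link = "probit"))

# Coefficient table with Wald z-tests (estimates are on the log-odds scale).
print(summary(fit_logit)$coefficients)

# Exponentiated logit coefficients can be read as odds ratios.
odds_ratios <- exp(coef(fit_logit))
print(odds_ratios)
```

The p-values in the coefficient table test each regressor given the others in the model. For a categorical variable with more than two levels, such as `bikeAvailability`, a likelihood-ratio test of the whole variable (for example `anova(fit_small, fit_full, test = "Chisq")` on two nested fits) is often more informative than the per-level z-tests.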