Trouble analysing data set in R

Question

44 Views Asked by Bumbble Comm At 31 Mar 2026 - 12:40

Hi, I am having some trouble with the following:

I have a data set containing an outcome, satisfaction, and four predictors: three continuous (age, weight, height) and one factor (graduated high school, yes or no).

In R, I have loaded the data set and set $X1$ for age, $X2$ for weight, $X3$ for the factor and $X4$ for height.

I want to know whether there is evidence that graduating high school has an effect on satisfaction.

But here is the thing: I know that I cannot simply look at lm(y~x3), because I need to consider all the other possibilities. So how do I take all of these into account? How many models must I check? What is the general approach to this?

I can run lm on different models, for example the full model, or the model excluding only x3. Do I just need to look at how the $R^{2}$ values change?

Also, would I need to consider any and all possible interactions? Any advice or general guidelines for this?

There is 1 best solution below

**Bumbble Comm** · Accepted Answer

This problem can be decomposed into several pieces:

1. Form a hypothesis about which independent variables strongly affect satisfaction. Can you confidently include all four? If yes, fit the full model with all four X's. If you are not confident, use a likelihood ratio test to compare models with different sets of variables.
2. Is the outcome approximately linearly related to the predictors? If yes (or if you have no further information and want to keep things simple), fit a basic linear model. In R you can use "glm" (with its default Gaussian family this gives the same fit as "lm"): $$Y=\beta_0 + \beta_1 X_1 +\beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \epsilon$$ Here $X_3$ enters as a 0/1 indicator for the factor.
3. Assess goodness of fit: check $R^2$, the t-test for each coefficient, and the F-test for the whole model.
4. Build confidence intervals for the coefficients and interpret your results.
5. If you are not satisfied with the model or its forecasting power, there are two directions you can try:
   - add interactions between the independent variables;
   - try a nonlinear model.
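
The fit-and-test steps above could be sketched in R roughly as follows. The data here are simulated stand-ins (the names `dat`, `full`, `reduced` and all numeric values are assumptions made for illustration, not the asker's data); the point is the comparison of the full model against the model without `X3`, which is one common way to test whether the factor matters after adjusting for the other predictors.

```r
# Simulated stand-in for the real data set (hypothetical values only)
set.seed(1)
n  <- 200
X1 <- rnorm(n, mean = 40, sd = 10)                       # age
X2 <- rnorm(n, mean = 70, sd = 12)                       # weight
X3 <- factor(sample(c("no", "yes"), n, replace = TRUE))  # graduated high school
X4 <- rnorm(n, mean = 170, sd = 8)                       # height
Y  <- 5 + 0.02 * X1 + 0.5 * (X3 == "yes") + rnorm(n)     # satisfaction
dat <- data.frame(Y, X1, X2, X3, X4)

# Full model, and the nested model that drops only the factor X3
full    <- lm(Y ~ X1 + X2 + X3 + X4, data = dat)
reduced <- lm(Y ~ X1 + X2 + X4, data = dat)

print(summary(full))   # R^2, per-coefficient t-tests, overall F-test
print(confint(full))   # 95% confidence intervals for the coefficients

# Partial F-test: does adding X3 improve the fit, given X1, X2, X4?
cmp <- anova(reduced, full)
print(cmp)
```

Because `X3` has only two levels, this partial F-test agrees with the t-test on the `X3yes` coefficient in `summary(full)`; for a factor with more levels, the `anova()` comparison tests all of its coefficients at once.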

In general, you can try all possible models if you want, but keep in mind that $R^2$ is not the only thing you should look at. A reasonable, interpretable model is worth more than a garbage model with a high $R^2$.
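
As a small illustration of why $R^2$ alone can mislead, the sketch below adds one interaction term to a main-effects model. It again uses simulated data (the names `dat`, `main_fit`, `inter_fit` and the choice of the `X1:X3` interaction are assumptions for illustration). Plain $R^2$ never decreases when a term is added to a nested least-squares model, so a formal test, adjusted $R^2$, or AIC is a more useful guide to whether the extra term is worth keeping.

```r
# Simulated stand-in data (hypothetical values only)
set.seed(2)
n  <- 200
X1 <- rnorm(n, mean = 40, sd = 10)                       # age
X2 <- rnorm(n, mean = 70, sd = 12)                       # weight
X3 <- factor(sample(c("no", "yes"), n, replace = TRUE))  # graduated high school
X4 <- rnorm(n, mean = 170, sd = 8)                       # height
Y  <- 5 + 0.02 * X1 + 0.5 * (X3 == "yes") + rnorm(n)     # satisfaction
dat <- data.frame(Y, X1, X2, X3, X4)

# Main-effects model versus a model that also lets the age slope differ by X3
main_fit  <- lm(Y ~ X1 + X2 + X3 + X4, data = dat)
inter_fit <- lm(Y ~ X1 * X3 + X2 + X4, data = dat)   # adds the X1:X3 term

# Plain R^2 cannot go down when a term is added ...
r2 <- c(main = summary(main_fit)$r.squared,
        inter = summary(inter_fit)$r.squared)
print(r2)

# ... so compare with tools that account for the extra parameter
print(anova(main_fit, inter_fit))                    # F-test for the interaction
print(c(main = summary(main_fit)$adj.r.squared,
        inter = summary(inter_fit)$adj.r.squared))   # adjusted R^2
aic_tab <- AIC(main_fit, inter_fit)                  # lower AIC is preferred
print(aic_tab)
```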
