Statistics linear modelling in R


Suppose I have a data set of the form:

Test subject / Sex / No. of mistakes made in the morning / No. of mistakes made in the afternoon

A / M / 2 / 5

B / F / 1 / 4

C / M / 3 / 5

D / F / 1 / 5

Suppose I want to model the number of mistakes as a linear function of two categorical variables: time of day (morning/afternoon) and sex. How can I do this using the lm() function in R? The only thing I can think of is to reshape the data frame so that the new columns are: No. of mistakes / Morning or afternoon / Sex / Subject

The new data frame will then have 8 rows. However, this layout does not group together the morning and afternoon results that came from the same subject. Since the test subject is not part of the linear model, we are losing some information (increasing the residual variance, etc.).
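To make the reshaping concrete, here is a minimal sketch of what I have in mind, using the four subjects above. The column names (subject, sex, morning, afternoon, time, mistakes) are my own choices for illustration, and this is only one way to build the long table in base R.

```r
# Wide data as given above (column names are illustrative assumptions)
wide <- data.frame(
  subject   = c("A", "B", "C", "D"),
  sex       = c("M", "F", "M", "F"),
  morning   = c(2, 1, 3, 1),
  afternoon = c(5, 4, 5, 5)
)

# Stack the two count columns: 8 rows, one per subject and time of day
long <- data.frame(
  subject  = factor(rep(wide$subject, times = 2)),
  sex      = factor(rep(wide$sex, times = 2)),
  time     = factor(rep(c("morning", "afternoon"), each = nrow(wide)),
                    levels = c("morning", "afternoon")),
  mistakes = c(wide$morning, wide$afternoon)
)

# Additive model in time of day and sex; subject is ignored here
fit <- lm(mistakes ~ time + sex, data = long)
summary(fit)
```

With "morning" set as the first factor level, the coefficient named timeafternoon is the estimated afternoon-minus-morning difference.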

Could someone please explain a) how to use the lm() function in R here, and b) if we choose not to account for the fact that the morning and afternoon mistakes come in pairs (one pair per test subject) when fitting the linear model, how is this a fair way to model the data? My guess is that, because the data come from a balanced, completely randomised design, this effect is randomised and so does not add any bias. Could anyone verify or correct this, perhaps with a more formal argument?
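For part b), a small sketch of the comparison I am asking about, under the same illustrative column names as before. It fits the model without subject and a model with subject as a blocking factor, then compares the time-of-day estimate. Sex is left out of the second model because it is constant within each subject and is therefore absorbed by the subject term; that choice is my assumption, not an established recipe.

```r
# Long-format data rebuilt from the table above (names are illustrative)
long <- data.frame(
  subject  = factor(rep(c("A", "B", "C", "D"), times = 2)),
  sex      = factor(rep(c("M", "F", "M", "F"), times = 2)),
  time     = factor(rep(c("morning", "afternoon"), each = 4),
                    levels = c("morning", "afternoon")),
  mistakes = c(2, 1, 3, 1, 5, 4, 5, 5)
)

# Model that ignores the pairing within subjects
fit_pooled <- lm(mistakes ~ time + sex, data = long)

# Model that treats subject as a block (sex is aliased with subject)
fit_block <- lm(mistakes ~ time + subject, data = long)

# Time-of-day estimates from the two fits
est_pooled <- unname(coef(fit_pooled)["timeafternoon"])
est_block  <- unname(coef(fit_block)["timeafternoon"])
c(pooled = est_pooled, blocked = est_block)

# Standard errors and residual degrees of freedom can differ
summary(fit_pooled)$coefficients["timeafternoon", ]
summary(fit_block)$coefficients["timeafternoon", ]
```

Because every subject contributes exactly one morning and one afternoon value, the two fits give the same point estimate for the time-of-day effect; what changes between them is the residual variance and the residual degrees of freedom, and hence the standard error.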

Thanks.