How can I perform linear regression for sports?

I was told by a professor that in order to do regression, I need to have "fixed treatments," as in all of the $X_i$'s should have fixed levels. How can I do such a thing for something as volatile as sports? For example, suppose we tried modeling whether turnovers influence wins in an NFL season. How do I "assign" treatments of turnovers to a given team so that I may perform the analysis?

Your question is not sufficiently specific. That may be what your prof is trying to tell you. What is your objective? To predict final league standing from early-season information? To model how various factors affect win/loss ratio? Etc.?

In a simple linear regression you have the model $Y_i = \beta_0 + \beta_1x_i + e_i,$ where you have data pairs $(Y_i, x_i),$ the $e_i$ are normally distributed errors, and you seek to estimate coefficients $\beta_0$ and $\beta_1$ from the data.
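To make the estimation step concrete, here is a minimal sketch of the usual least-squares estimates for $\beta_0$ and $\beta_1$. The function name `fit_simple_linear_regression` and the simulated data (true intercept 2.0, true slope 0.5, error standard deviation 0.3) are made up for illustration and have nothing to do with any real sports or airline data.

```python
import numpy as np


def fit_simple_linear_regression(x, y):
    """Return least-squares estimates (b0, b1) for the model y = b0 + b1*x + e."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: forces the fitted line through the point (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Simulated data (hypothetical): known coefficients plus normal errors.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 0.5 * x + rng.normal(0, 0.3, size=200)

b0_hat, b1_hat = fit_simple_linear_regression(x, y)
print(f"estimated intercept: {b0_hat:.3f}, estimated slope: {b1_hat:.3f}")
```

With a couple of hundred points and modest noise, the estimates should land close to the values used to simulate the data.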

You are assuming a linear relationship between the $x_i$ and the $Y_i$ and you are hoping to approximate the correct line by using data. What then?

As a very simple example, maybe you have data on flight times (gate departure to landing) $Y_i$ and distances $x_i$ for major US airline nonstop flights on wide-body jets.

A prediction problem might be that you know the distance of your flight and want to estimate the flight time (presumably, you have lost the link to the online schedule). For a 1000-mile flight you might find that the flight time is about $2.5 \pm 0.25$ hours.
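A statement of the form "estimate $\pm$ margin" can be produced roughly as in the sketch below. Everything in it is an assumption for illustration: the data are simulated with a made-up intercept, slope, and error standard deviation, the helper name `predict_with_interval` is hypothetical, and the multiplier `z=2.0` is a rough stand-in for the exact $t$ quantile of an approximately 95% prediction interval.

```python
import numpy as np


def predict_with_interval(x, y, x0, z=2.0):
    """Predict y at x0 from a simple linear fit; return (prediction, half-width).

    The half-width is z * s * sqrt(1 + 1/n + (x0 - x_bar)^2 / Sxx), an
    approximate prediction interval (z = 2.0 approximates the t quantile).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_bar = x.mean()
    sxx = np.sum((x - x_bar) ** 2)
    b1 = np.sum((x - x_bar) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x_bar
    # Residual standard deviation, with n - 2 degrees of freedom.
    residuals = y - (b0 + b1 * x)
    s = np.sqrt(np.sum(residuals ** 2) / (n - 2))
    y0 = b0 + b1 * x0
    half_width = z * s * np.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y0, half_width


# Simulated flights (hypothetical numbers, not real schedule data).
rng = np.random.default_rng(2)
dist = rng.uniform(300, 2500, size=400)                            # miles
hours = 0.65 + 0.0019 * dist + rng.normal(0, 0.125, size=400)      # hours

y0, half = predict_with_interval(dist, hours, 1000.0)
print(f"predicted time for 1000 miles: {y0:.2f} +/- {half:.2f} hours")
```

The margin is driven mostly by the scatter of individual flights around the line, so collecting more data narrows it only slightly.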

A modeling problem might be that you want to understand how fast commercial planes fly and how much of the flight time is actually spent taxiing from the gate to the runway and waiting in the queue for takeoff. You might find that $\beta_0 \approx 0.65$ hours (about 40 minutes) of taxi/queue time, and that $\beta_1 \approx 0.0019$ hours/mile, for a cruising speed of about 525 mph (not so fast that a sudden dive would likely reach supersonic speed and tear the wings off the plane).
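As a quick check of the arithmetic, using only the illustrative coefficients quoted above:

$$\frac{1}{\beta_1} = \frac{1}{0.0019\ \text{hours/mile}} \approx 526\ \text{mph}, \qquad \hat{Y}(1000) = \beta_0 + \beta_1 \cdot 1000 \approx 0.65 + 1.9 = 2.55\ \text{hours}.$$

The second figure agrees with the roughly 2.5 hours quoted for a 1000-mile flight.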

Looking into the matter more carefully, you might find that eastbound flights go faster (because of jet stream tailwinds) and westbound flights slower. Or that congestion at some times of day influences taxi time or air speed. So you may decide to add additional predictor variables ($x$'s) to your model.
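Adding a predictor amounts to adding a column to the design matrix. The sketch below assumes a 0/1 indicator for eastbound flights and simulates data with a hypothetical tailwind effect of $-0.30$ hours. That effect size, the noise level, and the variable names are all invented for illustration. The fit uses `numpy.linalg.lstsq`.

```python
import numpy as np

# Simulated flights (hypothetical): distance in miles plus a direction flag.
rng = np.random.default_rng(1)
n = 500
distance = rng.uniform(300, 2500, size=n)
eastbound = rng.integers(0, 2, size=n)  # 1 = eastbound, 0 = westbound

# Made-up "true" model: intercept, per-mile slope, and an eastbound shift.
flight_time = (
    0.65 + 0.0019 * distance - 0.30 * eastbound + rng.normal(0, 0.1, size=n)
)

# Design matrix: a column of ones for the intercept, then one column per predictor.
X = np.column_stack([np.ones(n), distance, eastbound])
coef, *_ = np.linalg.lstsq(X, flight_time, rcond=None)
b0_hat, b1_hat, b2_hat = coef

print(f"intercept: {b0_hat:.3f} hours")
print(f"per-mile slope: {b1_hat:.5f} hours/mile")
print(f"eastbound shift: {b2_hat:.3f} hours")
```

The coefficient on the indicator estimates how much shorter (or longer) an eastbound flight is than a westbound one of the same distance.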

Once you have some idea what measurable quantity $Y$ you want to predict or understand, and what measurable quantities (roughly linearly related to $Y$) might be reasonable predictor variables, then you are ready to do business.