Using regression tables to understand the data from two variables?

The following is from a study that investigated whether babies take longer to learn to crawl in cold months when they are often bundled in clothes that restrict their movement, than in warmer months. The study sought an association between babies’ first crawling age and the average temperature during the month they first try to crawl (about 8 months after birth). The data were collected from parents of 414 young children reflecting the child’s age (in weeks) when they first crawled (CRAWL). Meteorological data were also collected reflecting the average temperature (in degrees Fahrenheit) in a child’s town when he or she was 8 months old (TEMP). Now the following summarizes the distributions on the 2 variables:

[Image not available: summary statistics and regression output for CRAWL and TEMP]

I am trying to better understand what regression tables are and what they actually tell us about the data.

Because of this I have three questions:

One, by how much does a one-degree change in temperature speed up or slow down the age at which a baby starts crawling?

Two, use the regression line to predict the age at crawling when temperatures are 35 degrees on average. What about 70 degrees? And are these realistic numbers (recalling that CRAWL is measured in weeks)?

Three, what is the significance of the slope and intercept coefficients? Also, what role do the t-test results and confidence intervals play in all this?

Thank you

There is 1 best solution below

It is irresponsible to draw conclusions only from the information you have provided. With a caveat here, and more caveats along the way, I will do my best to make sense of your three questions.

I assume from what you say that the model is $Y_i = \beta_0 + \beta_1 x_i + e_i,$ where $Y_i$ are 'Crawl', $x_i$ are 'Temp', and independent $e_i \sim \mathsf{Norm}(0, \sigma),$ for $i = 1, \dots, 414.$

Does a plot of Crawl vs. Temp look reasonably linear? There must be a significant linear component of association between the two variables because the correlation is $r = -\sqrt{.487} \approx -.7.$ According to the t test of $H_0: \beta_1 = 0$ against the two-sided alternative, $H_0$ is strongly rejected with a tiny P-value. This indicates that the population correlation between $Y$ and $x$ is not $0.$
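As a rough check on how strong that rejection is, the slope's t statistic in simple linear regression can be recovered from $r$ and $n$ alone via $t = r\sqrt{n-2}/\sqrt{1-r^2}.$ The sketch below assumes that $.487$ is the $R^2$ from the printout and that $n = 414$; the function name is made up for illustration.

```python
import math

def slope_t_statistic(r_squared: float, n: int, negative_slope: bool = True) -> float:
    """t statistic for H0: beta_1 = 0 in simple linear regression,
    computed from R^2 and the sample size n (n - 2 degrees of freedom)."""
    r = math.sqrt(r_squared)
    if negative_slope:
        r = -r  # the sign of r matches the sign of the fitted slope
    return r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)

# Assumed inputs: R^2 = 0.487 and n = 414, as quoted in the text above.
t = slope_t_statistic(0.487, 414)
print(round(t, 1))  # roughly -19.8
```

A statistic of that size on 412 degrees of freedom is far out in the tail, which is why the P-value is tiny.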

But that means almost nothing if the residuals from the model show a pattern of steady change in absolute value with $x_i$ or if residuals are not roughly normally distributed. Also, conclusions might not be sound if there are outliers among the residuals, or single data points that too highly influence the values of $\hat \beta_0$ and $\hat \beta_1.$
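The checks described above can be sketched in a few lines. Because the actual 414 observations are not available here, the data below are SIMULATED purely for illustration (the generating intercept, slope, and error SD are assumptions, not results from the study); with real data you would replace `x` and `y` and also look at a residual plot and a normal probability plot.

```python
import math
import random
import statistics

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def pearson(u, v):
    """Sample Pearson correlation between two equal-length sequences."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den

# SIMULATED data, for illustration only (not the study's data).
random.seed(1)
x = [random.uniform(30, 70) for _ in range(414)]
y = [35.7 - 0.0756 * xi + random.gauss(0, 1.5) for xi in x]

b0, b1 = fit_line(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Check 1: does the size of the residuals drift with x (non-constant variance)?
spread_trend = pearson(x, [abs(e) for e in resid])

# Check 2: flag residuals more than 3 residual SDs from zero as possible outliers.
s = math.sqrt(sum(e ** 2 for e in resid) / (len(x) - 2))
outliers = [i for i, e in enumerate(resid) if abs(e) / s > 3]

print(f"trend in |residual| vs x: {spread_trend:.3f}")
print(f"possible outliers: {len(outliers)}")
```

A correlation between $|e_i|$ and $x_i$ that is clearly away from $0$ would suggest the "steady change in absolute value" mentioned above; this is only a crude numerical stand-in for looking at the plots.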

Provided that the model is as stated and all is well with assumptions and diagnostics, it makes sense to try to answer your questions.

(1) The estimated slope is $\hat \beta_1 = -.0756,$ so each increase of 1 Fahrenheit degree in temperature decreases Crawl by about 0.0756 weeks (about half a day).

(2) The estimated value $\hat Y$ (of Crawl for given Temp) is $\hat Y = 35.7 - 0.0756x.$ So you can roughly predict time to crawl by plugging various temperatures in as $x.$ That works as long as $x$ is not too far from the mean observed temperature of 50 degrees. [There are formulas for 'prediction intervals' that give some guidance on how rough such a prediction is.]
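To make the plug-in step concrete, here is a minimal sketch. It assumes the intercept 35.7 and the slope $-0.0756$ quoted as the estimates in this answer (the printout itself is not reproduced here, so treat both as assumptions); the function name is hypothetical.

```python
def predict_crawl(temp_f: float, intercept: float = 35.7, slope: float = -0.0756) -> float:
    """Point prediction of crawling age (weeks) at average temperature temp_f (deg F).
    Default coefficients are the estimates quoted in the text, not re-fitted values."""
    return intercept + slope * temp_f

# Predictions at the two temperatures asked about in the question.
for temp in (35, 70):
    print(f"{temp} F -> {predict_crawl(temp):.1f} weeks")

# Effect of a one-degree increase, expressed in days instead of weeks.
days_per_degree = -0.0756 * 7
print(f"{days_per_degree:.2f} days per degree F")
```

With these coefficients the predictions come out to about 33.1 weeks at 35 degrees and about 30.4 weeks at 70 degrees, that is, roughly seven to eight months, which is in line with the crawling age mentioned in the question.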

Without seeing the data (only the mean and SD of $x$), or how well the data points fit the regression line, I'd guess that temperature values in the range 30 to 70 degrees have some chance of giving useful predictions. Even if the model passes all tests, I'd still feel better about predictions if there are some actual data points in the vicinity of the estimated $Y$ for each given $x.$

Especially in this example, one has to be wary of predicting Crawl for extreme values of Temp. All humans, and I suppose babies in particular, are very sensitive to temperature extremes. I don't suppose you have any data on babies suffering from frostbite or heat stroke, so it does not make sense to make predictions for temperatures beyond the observed range.

(3) The t test for intercept (constant) strongly suggests that the constant (estimated as 35.7) is significantly different from $0.$ The t test for slope (temperature) strongly suggests that the slope (estimated as -0.0756) is significantly different from $0.$ [There are formulas to give confidence intervals for $\beta_0$ and $\beta_1,$ which are useful provided that assumptions of the linear model are met. The printout shows such CIs.]
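For reference, the usual form of those confidence intervals under the normal-error model is shown below, where $\mathrm{SE}(\hat\beta_j)$ is the standard error listed next to each coefficient in the printout and $t^*_{n-2}$ is the appropriate quantile of Student's t distribution with $n - 2 = 412$ degrees of freedom (about $1.97$ for a 95% interval):

$$\hat\beta_j \pm t^*_{n-2}\,\mathrm{SE}(\hat\beta_j), \qquad j = 0, 1.$$

A 95% interval that excludes $0$ corresponds to rejecting $H_0: \beta_j = 0$ at the 5% level in the two-sided t test, so the tests and the intervals are two views of the same information.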