I have a data set that consists of values of an independent variable $(X)$ and a dependent variable $(Y)$, where each $(X, Y)$ pair occurs with a certain frequency $(F)$.
I know that I have to compute $x^2$ and $xy$, but how do I factor in the frequency?
I am fitting the regression line $Y = a + bX$ by the least squares method.
For clarity, this is the data set that I am working with.
| Frequency ($F$) | Independent Variable ($X$) | Dependent Variable ($Y$) |
|---:|---:|---:|
| 3 | 4 | 60 |
| 4 | 4 | 65 |
| 2 | 5 | 65 |
| 4 | 5 | 70 |
| 3 | 6 | 75 |
| 2 | 6 | 80 |
| 4 | 7 | 85 |
| 3 | 8 | 90 |
2026-04-02 07:37:00
How do I calculate the regression line for a data set with repeated values indicated as frequencies?
Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)
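This dataset is presented in an unusual format that can cause confusion in several ways. The key point is that each row stands for $F$ identical observations of the pair $(X, Y)$, so every sum in the least-squares formulas must count that row $F$ times; in all there are $n = \sum F = 25$ observations. The comments show how to do the basic computations that lead to estimates of the slope and intercept of the regression line.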
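Written out with frequency-weighted sums, those basic computations look like this. It is a worked sketch using the table's numbers, with $a$ and $b$ as in $Y = a + bX$; every row's contribution is multiplied by its frequency $F$.

$$
\begin{aligned}
n &= \sum F = 25, \qquad \sum Fx = 140, \qquad \sum Fy = 1845, \qquad \sum Fx^2 = 830, \qquad \sum Fxy = 10660,\\
S_{xx} &= \sum Fx^2 - \frac{\left(\sum Fx\right)^2}{n} = 830 - \frac{140^2}{25} = 46,\\
S_{xy} &= \sum Fxy - \frac{\left(\sum Fx\right)\left(\sum Fy\right)}{n} = 10660 - \frac{140 \cdot 1845}{25} = 328,\\
b &= \frac{S_{xy}}{S_{xx}} = \frac{328}{46} \approx 7.13,\\
a &= \bar y - b\,\bar x = \frac{1845}{25} - \frac{328}{46}\cdot\frac{140}{25} = 73.8 - 7.1304 \times 5.6 \approx 33.87.
\end{aligned}
$$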
Obtain true data vectors. More systematically, here is how such a dataset might be handled in R statistical software. First, we need to get the 'true' data vectors for $x$ and $y$, in which each pair is listed as many times as its frequency indicates.
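The R code for this step is not reproduced here, so the following is only a Python stand-in sketching the same idea, not the answer's code. The name `expand` is a hypothetical helper: it repeats each $(X, Y)$ pair $F$ times.

```python
# Frequency table from the question: (F, X, Y) per row.
table = [
    (3, 4, 60), (4, 4, 65), (2, 5, 65), (4, 5, 70),
    (3, 6, 75), (2, 6, 80), (4, 7, 85), (3, 8, 90),
]


def expand(rows):
    """Repeat each (X, Y) pair F times to recover the raw data vectors."""
    x, y = [], []
    for f, xi, yi in rows:
        x.extend([xi] * f)
        y.extend([yi] * f)
    return x, y


x, y = expand(table)
print(len(x))  # 25
print(x[:7])   # [4, 4, 4, 4, 4, 4, 4]
print(y[:7])   # [60, 60, 60, 65, 65, 65, 65]
```

The first seven observations all have $x = 4$: three from the first row and four from the second.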
Find the regression line. Then we do the linear regression. (You might want to use parts of the output, slightly abbreviated here, to check your own computations.)
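If R is not at hand, a minimal Python sketch like the one below can serve as that check. It fits $Y = a + bX$ by ordinary least squares in two ways: on the expanded 25 observations, and directly on the 8 rows with each row weighted by $F$. The helper names `least_squares` and `weighted_least_squares` are hypothetical, chosen for this illustration.

```python
# Frequency table from the question: (F, X, Y) per row.
table = [
    (3, 4, 60), (4, 4, 65), (2, 5, 65), (4, 5, 70),
    (3, 6, 75), (2, 6, 80), (4, 7, 85), (3, 8, 90),
]


def least_squares(x, y):
    """Intercept a and slope b of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


def weighted_least_squares(rows):
    """Same fit computed from (F, X, Y) rows, counting each row F times."""
    n = sum(f for f, _, _ in rows)
    x_bar = sum(f * xi for f, xi, _ in rows) / n
    y_bar = sum(f * yi for f, _, yi in rows) / n
    s_xx = sum(f * (xi - x_bar) ** 2 for f, xi, _ in rows)
    s_xy = sum(f * (xi - x_bar) * (yi - y_bar) for f, xi, yi in rows)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


# Expand the table into the 25 raw observations, then fit both ways.
x = [xi for f, xi, _ in table for _ in range(f)]
y = [yi for f, _, yi in table for _ in range(f)]
a, b = least_squares(x, y)
a_w, b_w = weighted_least_squares(table)
print(f"Y-hat = {a:.2f} + {b:.2f} X")      # Y-hat = 33.87 + 7.13 X
print(f"Y-hat = {a_w:.2f} + {b_w:.2f} X")  # Y-hat = 33.87 + 7.13 X
```

Both routes give the same line, which is the answer to "how do I factor in the frequency": weight each row's terms by $F$ and use $n = \sum F$.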
We see that the regression line is $\hat Y = 33.87 + 7.13x,$ where $\hat \beta_0 = 33.87$ and $\hat \beta_1 = 7.13$ (the $a$ and $b$ of the question) are both significantly different from $0.$ Moreover, about 95% of the variability in $y$ is 'explained' by regression on $x$ ($R^2 \approx 0.95$).
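The significance and "about 95%" statements can be reproduced from the same sums. This Python sketch (not R's `summary` output) computes $R^2$ and the usual $t$ statistics for the intercept and slope with $n - 2 = 23$ degrees of freedom; p-values are left out to avoid depending on a statistics library. Both $t$ values are far above the two-sided 5% critical value of roughly 2.07 for 23 degrees of freedom.

```python
import math

# Frequency table from the question: (F, X, Y) per row, expanded to raw data.
table = [
    (3, 4, 60), (4, 4, 65), (2, 5, 65), (4, 5, 70),
    (3, 6, 75), (2, 6, 80), (4, 7, 85), (3, 8, 90),
]
x = [xi for f, xi, _ in table for _ in range(f)]
y = [yi for f, _, yi in table for _ in range(f)]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = s_xy / s_xx
a = y_bar - b * x_bar

# Coefficient of determination: share of the variability in y explained by x.
r_squared = s_xy ** 2 / (s_xx * s_yy)

# Residual variance with n - 2 degrees of freedom, then standard errors.
s2 = (s_yy - b * s_xy) / (n - 2)
se_b = math.sqrt(s2 / s_xx)
se_a = math.sqrt(s2 * (1 / n + x_bar ** 2 / s_xx))

print(f"R^2 = {r_squared:.3f}")  # R^2 = 0.949
print(f"t for intercept = {a / se_a:.1f}, t for slope = {b / se_b:.1f}")
# t for intercept = 17.1, t for slope = 20.7
```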
Make an informative scatterplot. Another potential difficulty lies in plotting data with repeated values. The plot on the left shows fewer dots than data points because some of the points are 'overplotted' (dots fall on top of one another), so one might think there are only 8 data points.
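A quick way to see how much overplotting to expect is to count how many observations share each plotting position. In this Python sketch the multiplicity of each $(x, y)$ position turns out to be exactly the frequency $F$ from the table.

```python
from collections import Counter

# Frequency table from the question: (F, X, Y) per row.
table = [
    (3, 4, 60), (4, 4, 65), (2, 5, 65), (4, 5, 70),
    (3, 6, 75), (2, 6, 80), (4, 7, 85), (3, 8, 90),
]

# One (x, y) tuple per observation, repeated F times.
points = [(xi, yi) for f, xi, yi in table for _ in range(f)]
counts = Counter(points)
print(len(points), "observations at", len(counts), "distinct positions")
# 25 observations at 8 distinct positions
```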
One cure for this is to 'jitter' the data, that is, to add small random errors that are just big enough to separate the points but not big enough to give a misleading impression of the dataset. The regression line is shown in both plots below. In the jittered plot one can see that there are more than 8 data points (25, in fact).
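Jittering can be sketched in a few lines of Python. The noise sizes here (uniform within $\pm 0.15$ on $x$ and $\pm 1.5$ on $y$) are assumptions chosen to stay well below the spacing of the data (1 unit in $X$, 5 units in $Y$); they are not the values used for the plots in the answer. The hypothetical `jitter` helper only changes what is drawn: the regression line should still be fitted to the original, unjittered values.

```python
import random

# Frequency table from the question: (F, X, Y) per row, expanded to raw data.
table = [
    (3, 4, 60), (4, 4, 65), (2, 5, 65), (4, 5, 70),
    (3, 6, 75), (2, 6, 80), (4, 7, 85), (3, 8, 90),
]
x = [xi for f, xi, _ in table for _ in range(f)]
y = [yi for f, _, yi in table for _ in range(f)]


def jitter(values, amount, rng):
    """Add independent uniform noise in [-amount, amount] to each value."""
    return [v + rng.uniform(-amount, amount) for v in values]


rng = random.Random(1)  # fixed seed so the sketch is reproducible
# Assumed jitter sizes: well under the spacing of the X values (1) and Y values (5).
xj = jitter(x, 0.15, rng)
yj = jitter(y, 1.5, rng)

# Distinct plotting positions before and after jittering.
print(len(set(zip(x, y))), len(set(zip(xj, yj))))  # 8 25
```

The pairs `xj`, `yj` can then be handed to any scatterplot routine (for example matplotlib's `scatter`), with the line drawn from the coefficients fitted to `x`, `y`.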
Note: If you are interested in learning more about using R for regression, you can find several tutorials by searching the Internet for 'simple linear regression in R'. Many other statistical software packages provide similar output.