For a random person selected from a population, let $X$ be their number of Instagram followers and $Y$ their number of Facebook followers. We randomly sample the follower counts of $n$ people, $(X_1, Y_1), \ldots, (X_n, Y_n)$. Let $D = X - Y$ and $D_i = X_i - Y_i$ for $i = 1, \ldots, n$. Suppose we can accurately model $D$ as a normal random variable whose mean $\mu_D$ and variance $\sigma^2_D$ are unknown.
(a) We take $n = 7$ samples:
$X_i: 118, 136, 125, 121, 111, 142, 114$
$Y_i: 81, 91, 63, 78, 59, 93, 83$
Determine a $95\%$ confidence interval for $\mu_D$.
(b) Recent research suggests that $\mu_D = 40$. We claim that the average difference between Instagram and Facebook followers has increased, and we take $n = 17$ random samples. Use the test statistic $\frac{\bar{D} - 40}{\frac{S_D}{\sqrt{17}}}$ to form a test at the $0.05$ significance level.
Here is my attempt:
(a) I did a few calculations and found the sample statistics as:
$\bar{x} = 123.86, s_x = 11.42, \bar{y} = 78.29, s_y = 13.00$
Using Welch's approximation (the Welch-Satterthwaite formula), the degrees of freedom are
$$\Delta = \frac{(\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y})^2}{\frac{1}{n_x - 1}(\frac{s_x^2}{n_x})^2 + \frac{1}{n_y - 1}(\frac{s_y^2}{n_y})^2}$$
Plugging in the values gives $\Delta \approx 11.8$, which we round down to $\Delta = 11$.
The value of $t_{\alpha/2}$ with $\alpha = 0.05$ and $11$ degrees of freedom, from the tables, is $2.201$.
The interval is therefore
$$((\bar{x} - \bar{y}) - t_{\alpha / 2}\cdot \sqrt{s_x^2/n_x + s_y^2/n_y}, (\bar{x} - \bar{y}) + t_{\alpha / 2}\cdot \sqrt{s_x^2/n_x + s_y^2/n_y})$$
$$= (31.2, 60.0)$$
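As a quick check on the degrees-of-freedom formula used in this first method, here is a minimal Python sketch. It uses only the standard library; the helper name `welch_df` is my own and not from any package.

```python
import math
import statistics

# Data from part (a)
x = [118, 136, 125, 121, 111, 142, 114]
y = [81, 91, 63, 78, 59, 93, 83]

def welch_df(a, b):
    """Welch-Satterthwaite approximate degrees of freedom (hypothetical helper)."""
    va = statistics.variance(a) / len(a)  # s_x^2 / n_x
    vb = statistics.variance(b) / len(b)  # s_y^2 / n_y
    return (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))

df = welch_df(x, y)
# Standard error of (x-bar minus y-bar) when the samples are treated as independent
se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))

print(f"df = {df:.2f} (rounded down: {math.floor(df)}), SE = {se:.3f}")
```

The formula gives a non-integer value of about $11.8$. Rounding down to $11$ is the conservative choice when reading a $t$ table.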
Another way:
$\bar{d} = 45.57, s_D = 10.13$.
$t_{\alpha/2}$ with $\alpha = 0.05$ and $6$ degrees of freedom is $2.447$.
The interval is:
$(\bar{d} - t_{\alpha/2} \cdot \frac{s_D}{\sqrt{n}}, \bar{d} + t_{\alpha/2} \cdot \frac{s_D}{\sqrt{n}})$
$= (36.20, 54.94)$
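The arithmetic for this second method can be reproduced with a short Python sketch. It uses only the standard library, so the critical value $2.447$ is hard-coded from the table rather than computed.

```python
import math
import statistics

# Data from part (a)
x = [118, 136, 125, 121, 111, 142, 114]
y = [81, 91, 63, 78, 59, 93, 83]

# Work with the paired differences d_i = x_i - y_i
d = [xi - yi for xi, yi in zip(x, y)]
n = len(d)

d_bar = statistics.mean(d)   # sample mean of the differences
s_d = statistics.stdev(d)    # sample standard deviation (n - 1 in the denominator)

t_crit = 2.447               # t_{0.025} with n - 1 = 6 degrees of freedom, from the table
half_width = t_crit * s_d / math.sqrt(n)
ci = (d_bar - half_width, d_bar + half_width)

print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.2f}, CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```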
Both methods give similar intervals, but I am not sure which one is correct.
(b) $H_0: \mu_D = 40$ and $H_1: \mu_D > 40$.
$t_{\alpha}$ with $\alpha = 0.05$ and $16$ degrees of freedom is about $1.75$.
The critical region is
$$\frac{\bar{d} - 40}{\frac{s_d}{\sqrt{17}}} \geq 1.75$$
We reject the null hypothesis if the test statistic is greater than or equal to $1.75$.
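The decision rule can be written as a small Python function. This is only a sketch: the function name `one_sided_t_test` is my own, the critical value is the table value $1.746$ (about $1.75$), and since the problem gives no data for $n = 17$, the numbers in the example call are hypothetical and chosen purely to exercise the function.

```python
import math

def one_sided_t_test(d_bar, s_d, n=17, mu0=40.0, t_crit=1.746):
    """Upper-tailed one-sample t test of H0: mu_D = mu0 against H1: mu_D > mu0.

    t_crit is t_{0.05} with n - 1 = 16 degrees of freedom, taken from a table.
    Returns the test statistic and whether H0 is rejected.
    """
    t_stat = (d_bar - mu0) / (s_d / math.sqrt(n))
    return t_stat, t_stat >= t_crit

# Hypothetical summary statistics (NOT from the problem), for illustration only
t_stat, reject = one_sided_t_test(d_bar=45.0, s_d=10.0)
print(f"t = {t_stat:.3f}, reject H0: {reject}")
```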
Is what I have done correct? For (a), which method is supposed to be the right one?
(b) looks good to me. For (a), the second method is correct. We want to treat each pair as a single observation, i.e. $d_i = x_i - y_i$, and then construct the confidence interval treating $D$ as a single normal random variable with unknown mean and unknown variance. This is called a paired analysis. The first method would be appropriate for comparing the means of two independent populations.
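To see why the two approaches give different standard errors on the data from (a), one can check numerically that $s_D^2 = s_x^2 + s_y^2 - 2 s_{xy}$, where $s_{xy}$ is the sample covariance. The following is a minimal sketch using only the Python standard library, with variable names of my own choosing.

```python
import statistics

# Data from part (a)
x = [118, 136, 125, 121, 111, 142, 114]
y = [81, 91, 63, 78, 59, 93, 83]
d = [xi - yi for xi, yi in zip(x, y)]
n = len(x)

x_bar, y_bar = statistics.mean(x), statistics.mean(y)

# Sample covariance of the pairs (n - 1 in the denominator)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

var_x = statistics.variance(x)
var_y = statistics.variance(y)
var_d = statistics.variance(d)

paired_se = (var_d / n) ** 0.5              # standard error used by the paired method
unpaired_se = ((var_x + var_y) / n) ** 0.5  # standard error used by the two-sample method

print(f"s_xy = {s_xy:.2f}")
print(f"paired SE = {paired_se:.3f}, unpaired SE = {unpaired_se:.3f}")
```

Here the covariance is positive, so the two-sample formula, which ignores it, overstates the variability of $\bar{d}$.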