quant interview: (mathematical modelling) linear regression and statistical significance


I am preparing for a quantitative finance interview and I am struggling with this exercise:

Consider two data series, $X = (x_1, x_2, \dots, x_n)$ and $Y = (y_1, y_2, \dots, y_n)$, both with mean zero. We use linear regression (ordinary least squares) to regress $Y$ against $X$ without fitting an intercept, as in $Y = aX + \epsilon$, where $\epsilon$ denotes a series of error terms.

Suppose that $\rho_{XY} = 0.01$. Is the resulting value of $a$ statistically significantly different from 0 at the 95% level if:

i. $n = 10^2$
ii. $n = 10^3$
iii. $n = 10^4$

I already know the relation between $a$ and $\rho_{XY}$ is given by $$a = \rho_{XY}\,\frac{\sigma_Y}{\sigma_X}$$

But I am struggling with the confidence level part.

Any help would be appreciated. Thank you!
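The slope–correlation relation can be verified numerically. Here is a minimal sketch (my own, using numpy; all variable names are mine) that simulates two mean-zero series and compares the no-intercept OLS slope with $\rho_{XY}\,\sigma_Y/\sigma_X$:

```python
import numpy as np

# Sketch: check the slope-correlation relation for a no-intercept
# regression of mean-zero series.
rng = np.random.default_rng(0)
n = 100_000
x = rng.standard_normal(n)
y = 0.5 * x + rng.standard_normal(n)  # construct two correlated series
x -= x.mean()
y -= y.mean()                         # enforce mean zero

a_ols = np.sum(x * y) / np.sum(x * x)      # OLS slope with no intercept
rho = np.corrcoef(x, y)[0, 1]
a_from_rho = rho * y.std() / x.std()       # a = rho * sigma_Y / sigma_X

print(a_ols, a_from_rho)  # the two agree up to floating-point error
```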

There are 2 answers below.

BEST ANSWER

You can use the $F$-test to assess statistical significance. Given your hypothesis, you have $$ F_n = \frac{\rho^2}{1-\rho^2}\,(n-2) $$

Hence you obtain $F_{100} \approx 0.0098$, $F_{1000} \approx 0.0998$ and $F_{10^4} \approx 1.00$. For significance at the $\alpha = 0.05$ level you need $F > f(1, n-2, 1-\alpha)$, where $f(1, n-2, p)$ is the $p$-quantile of an $F$ distribution with parameters $1$ and $n-2$. You can find these values in a table or with an online F-calculator. For the three values of $n$ this critical value is approximately 3.9, hence you cannot reject the null hypothesis, i.e. the value $\rho = 0.01$ is not significant at the given confidence level.
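These $F$ statistics and critical values can be checked with a few lines of scipy (a sketch of my own, not part of the original answer):

```python
from scipy.stats import f

# Sketch: F statistic F_n = rho^2 / (1 - rho^2) * (n - 2), compared with
# the 95th percentile of an F(1, n-2) distribution.
rho = 0.01
for n in (10**2, 10**3, 10**4):
    F = rho**2 / (1 - rho**2) * (n - 2)
    crit = f.ppf(0.95, 1, n - 2)   # ~3.84-3.94 for these n
    print(f"n={n}: F={F:.4f}, critical={crit:.2f}, reject H0: {F > crit}")
```

In all three cases $F$ falls far below the critical value, so the null hypothesis stands.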

---

This is a classic case of hypothesis testing. Here, our null hypothesis is that there is no linear relationship between $X$ and $Y$ in the simple linear regression model $Y = \beta X + \epsilon$:

$$ H_0: \beta = 0$$ $$ H_A: \beta \neq 0$$

Since the question asks whether the slope differs from $0$ in either direction, this can be answered with a two-sided t-test:

$$ t = \frac{\hat{\beta}-0}{\text{SE}(\hat{\beta})} $$

where I use the hat notation to indicate estimated parameters.

The variance of $\hat{\beta}$ is given as

$$\text{Var}(\hat{\beta})=\frac{\sigma^2}{\sum(x_i-\bar{x})^2} $$

where $\sigma^2$ is the variance of the error term $\epsilon$ and $\bar{x}$ is the mean of $X$. We can estimate $\sigma^2$ by the sample variance $s^2$

$$ s^2 = \frac{1}{n-1}\sum(y_i-\hat{\beta}x_i)^2 = \frac{\text{RSS}}{n-1} $$

For our case of simple linear regression, we have $\text{RSS} = (1-\rho^2)\sum(y_i-\bar{y})^2$. Putting all the pieces together, we obtain for the t-value:

$$ t = \frac{\hat{\beta}\sqrt{\sum(x_i-\bar{x})^2}}{\sqrt{\sum(y_i-\bar{y})^2}\sqrt{1-\rho^2}}\sqrt{n-1} = \frac{\rho}{\sqrt{1-\rho^2}}\sqrt{n-1} $$

This follows a t-distribution with $n-1$ degrees of freedom, which is very well approximated by a normal distribution for the given values of $n$. The resulting values of $t\approx0.1$ (i), $t\approx0.32$ (ii) and $t\approx1$ (iii) are not statistically significant at the 95% level. However, $n=10^5$ would be.
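These t-values can be reproduced with a short script (my own sketch, using scipy). It compares each $t$ against the two-sided 95% critical value, since the question asks whether $a$ differs from zero in either direction:

```python
from scipy.stats import t as t_dist

# Sketch: t statistic t = rho / sqrt(1 - rho^2) * sqrt(n - 1), compared
# with the two-sided 95% critical value of a t(n-1) distribution.
rho = 0.01
for n in (10**2, 10**3, 10**4, 10**5):
    t_val = rho / (1 - rho**2) ** 0.5 * (n - 1) ** 0.5
    crit = t_dist.ppf(0.975, n - 1)   # ~1.96 for large n
    print(f"n={n}: t={t_val:.2f}, critical={crit:.2f}, significant: {t_val > crit}")
```

Only at $n = 10^5$, where $t \approx 3.16$, does the statistic clear the critical value.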