Coefficient of Determination and Standard Error of the Model


Background: standard concepts and terminology used in linear regression and analysis of variance.

Suppose one has data points $(X_i,Y_i),\, i = 1,\ldots,n.$

The average $Y$ value for a given $X$ value (which need not be one of the $n$ observed $X$ values), as estimated by least squares, is denoted $\widehat Y.$ The $i$th fitted value $\widehat Y_i$ is the value of $\widehat Y$ when $X$ is the $i$th observed $X$ value $X_i.$

The total corrected sum of squares is $\sum_{i=1}^n (Y_i-\overline Y)^2$ where $\overline Y = \frac 1 n \sum_{i=1}^n Y_i.$

The residual sum of squares, also called the unexplained sum of squares, is $\sum_{i=1}^n (Y_i - \widehat Y_i)^2.$

The explained sum of squares is $\sum_{i=1}^n (\widehat Y_i - \overline Y)^2.$

Because the line is fitted by least squares (with an intercept), the total (corrected) sum of squares is the sum of the explained and the unexplained sums of squares. "Corrected" means that $\overline Y$ has been subtracted from all of the $Y$ values before squaring.

The coefficient of determination $R^2$ is the proportion of the total (corrected) sum of squares that is explained, i.e. the explained sum of squares divided by the total. (It is called $R^2$ because, when one is simply fitting a straight line, it is the square of the correlation between $X$ and $Y.$ When fitting a plane or a polynomial, etc., it is no longer the square of the correlation between $Y$ and any single predictor, but it is still conventional to call it $R^2;$ it equals the square of the correlation between the observed values $Y_i$ and the fitted values $\widehat Y_i.$)

The $F$ statistic for testing the null hypothesis that the sample was taken from a population in which the slope of the line is $0$ is $$ F = \frac{\text{explained sum of squares}/1}{\text{unexplained sum of squares}/(n-2)}. $$ (The divisors $1$ and $n-2$ are "degrees of freedom"; if one were fitting a more complicated model, a number other than $1$ would appear in the numerator. The unexplained sum of squares has $n-2$ degrees of freedom because the vector $(Y_i - \widehat Y_i : i = 1,\ldots,n)$ satisfies two linear constraints: the sum of its entries is $0,$ and the sum of the products of its entries with the respective $X_i$ is $0.$) One rejects that null hypothesis if $F$ is improbably large.

That is the textbook background; below is the question as the original poster wrote it.


I need some help.

$\hat{Y} = 5+2X$

$F$ statistic $= 25$

$n=102$

$\sum_{i=1}^{n} (Y_i-\bar{Y})^2 = 10$

I got these outputs, but I need to find the coefficient of determination and the standard error of the model using them.

I'm assuming here that "$F$" is intended to mean the $F$ statistic for testing the null hypothesis that the population slope is $0.$ Then you have: $$ 25 = F= \frac{\text{explained sum of squares}/1}{\text{unexplained sum of squares}/(102-2)}. $$ And $$ R^2 = \frac{\text{explained sum of squares}}{\text{total (corrected) sum of squares}} = \frac{\text{explained sum of squares}}{10}. $$ Write $e$ and $u$ for the explained and unexplained sums of squares, and use the fact that they add up to the total (corrected) sum of squares, $10$: $$ 25 = \frac e {u/100}. \qquad R^2 = \frac e {10}. \qquad e+u = 10. $$ So \begin{align} & 25u = 100 e \\[6pt] & 10R^2= e \\[6pt] & e+u=10 \end{align} You have three linear equations in three unknown quantities, and what you want is $R^2,$ the coefficient of determination.
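
One way to carry out the elimination, under the same reading of $F$: the first equation gives $u$ in terms of $e$, and substituting into the third gives $e$ and then $R^2$: \begin{align} & u = \frac{100e}{25} = 4e \\[6pt] & e + 4e = 10 \quad\Longrightarrow\quad e = 2, \quad u = 8 \\[6pt] & R^2 = \frac{e}{10} = \frac{2}{10} = 0.2 \end{align} More generally, for a straight-line fit, $F = \dfrac{e}{u/(n-2)}$ gives $u = \dfrac{(n-2)e}{F}$, so $$ R^2 = \frac{e}{e+u} = \frac{e}{e + (n-2)e/F} = \frac{F}{F + n - 2} = \frac{25}{25 + 100} = 0.2. $$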

(You get $R^2 = 0.2$.)
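As a numeric check, and to cover the second quantity the question asks for, here is a minimal Python sketch. It assumes that "standard error of the model" means the residual standard error $s = \sqrt{u/(n-2)}$, the square root of the unexplained sum of squares divided by its degrees of freedom; the question does not define the term, so treat that reading as an assumption. The fitted coefficients $5$ and $2$ are not needed for either quantity.

```python
from math import sqrt

# Values given in the question.
f_stat = 25.0
n = 102
total_ss = 10.0

df_resid = n - 2

# R^2 = F / (F + n - 2), from F = e / (u / (n - 2)) and e + u = total.
r_squared = f_stat / (f_stat + df_resid)
explained_ss = r_squared * total_ss
unexplained_ss = total_ss - explained_ss

# Assumption: "standard error of the model" = sqrt(SSE / (n - 2)).
std_error = sqrt(unexplained_ss / df_resid)

print(r_squared)  # 0.2
print(std_error)  # about 0.2828, i.e. sqrt(0.08)
```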