My name is Peter and I'm currently writing my dissertation. I am analysing Morningstar's fund rating system (they rate funds with 1 to 5 stars, depending on how good they are).
What we are trying to investigate is how much each star contributes to expected future returns.
This is how I thought I could write the regression model:
y = b_0 + b_1STAR(1) + b_2STAR(2) + b_3STAR(3) + b_4STAR(4) + b_5STAR(5) + e_t
where y is the expected future return; the beta parameters are dummy variables, i.e. they take the value 0 or 1; and STAR(i) represents how much each extra star predicts in future returns.
What we have data on is: the expected future returns for the funds, the number of stars each fund has, and the value of each fund.
How do I find the values for the STAR(i) variables? What confuses me is that usually when I've done regression, I've been interested in finding the values of the beta parameters.
Kind Regards,
Peter
When you say you want to know "how much each star contributes to expected future returns," it seems to me you have decided to regard the number of stars as a meaningful numerical variable. To me that would imply the model $$Y_i = \beta_0 + \beta_1 x_i + e_i,$$ where $x_i$ is the number of stars and $e_i \stackrel{iid}{\sim} \mathsf{Norm}(0, \sigma).$
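To make the first model concrete, here is a minimal sketch of fitting $Y_i = \beta_0 + \beta_1 x_i + e_i$ by ordinary least squares in plain Python. The fund data and the helper name `fit_simple_regression` are invented for illustration; they are not your data. The estimated slope is then the "extra return per star" under the assumption that stars behave numerically.

```python
# Hypothetical toy data: (number of stars, future return in percent) per fund.
# These numbers are made up purely to illustrate the calculation.
funds = [
    (1, 2.0), (1, 3.0),
    (2, 3.5), (2, 4.5),
    (3, 5.0), (3, 6.0),
    (4, 6.5), (4, 7.5),
    (5, 8.0), (5, 9.0),
]


def fit_simple_regression(pairs):
    """Closed-form least squares for Y = b0 + b1 * x with one predictor."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    # Sums of cross-products and squares around the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    s_xx = sum((x - x_bar) ** 2 for x, _ in pairs)
    b1 = s_xy / s_xx          # slope: change in return per extra star
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


b0, b1 = fit_simple_regression(funds)
print(f"intercept = {b0:.3f}, slope per extra star = {b1:.3f}")
```

Note that this model forces the step from 1 to 2 stars to be worth exactly as much as the step from 4 to 5 stars, which is the main thing you give up by treating stars as numerical.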
If you want to treat the number of stars as a categorical variable, then you might feel more comfortable with an ANOVA design $$Y_{ij} = \mu + \alpha_i + e_{ij},$$ where $i = 1,2,3,4,5,$ the $\alpha_i$ are the effects for each number of stars, and $e_{ij} \stackrel{iid}{\sim}\mathsf{Norm}(0,\sigma).$ Also, $j = 1, \dots, n_i,$ where $n_i$ is the number of funds with $i$ stars. Then, if the main ANOVA F-test finds differences among the effects $\alpha_i,$ you could investigate relationships among the number of stars with linear and quadratic contrasts (to the extent you feel the categories are roughly numerical). You might also do a Welch-style ANOVA if it seems the variabilities are different for each $i$ (which would not surprise me).
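As a rough sketch of the categorical approach, the following plain-Python code computes the estimated effects and the classical one-way ANOVA F statistic by hand. The data and the function name `one_way_anova` are hypothetical, and the effects are estimated as group mean minus grand mean, which assumes the usual sum-to-zero constraint in a balanced design. It does not compute a p-value or the Welch correction; in practice you would use a statistics package for those.

```python
# Hypothetical toy data: future returns (percent) grouped by star rating.
# Invented for illustration; three funds per rating (a balanced design).
returns_by_stars = {
    1: [2.0, 3.0, 2.5],
    2: [3.5, 4.5, 4.0],
    3: [5.0, 6.0, 5.5],
    4: [6.0, 7.0, 6.5],
    5: [8.0, 9.0, 8.5],
}


def one_way_anova(groups):
    """Return estimated effects, the F statistic and its degrees of freedom."""
    all_values = [v for vals in groups.values() for v in vals]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = {g: sum(vals) / len(vals) for g, vals in groups.items()}

    # Between-group and within-group sums of squares.
    ss_between = sum(
        len(vals) * (group_means[g] - grand_mean) ** 2
        for g, vals in groups.items()
    )
    ss_within = sum(
        (v - group_means[g]) ** 2
        for g, vals in groups.items()
        for v in vals
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)

    # Estimated alpha_i: how far each star group sits from the grand mean.
    effects = {g: m - grand_mean for g, m in group_means.items()}
    return effects, f_stat, df_between, df_within


effects, f_stat, df_between, df_within = one_way_anova(returns_by_stars)
for stars, alpha in effects.items():
    print(f"{stars} stars: estimated effect = {alpha:+.2f}")
print(f"F = {f_stat:.2f} on ({df_between}, {df_within}) degrees of freedom")
```

A large F relative to the $F(4, N-5)$ reference distribution is what would justify looking at contrasts between specific star levels afterwards.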
It seems to me your regression model is an ANOVA in disguise, and that may account for your difficulty interpreting the results. In any case, I hope I have given you some alternative models that may help you clarify the connection between your data and the appropriate model for analysis.
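One way to see the "ANOVA in disguise" point is to code the stars as 0/1 dummy variables and look at what least squares gives back. With an intercept plus all five dummies the columns are linearly dependent (the dummies sum to the intercept column), so a common convention, assumed here, is to drop one level as the baseline. The sketch below uses invented data, takes 1 star as the baseline, and checks that "baseline mean plus differences in group means" satisfies the least-squares normal equations $X^\top (y - X\hat\beta) = 0$.

```python
# Hypothetical toy data: future returns (percent) grouped by star rating.
# Invented for illustration; group sizes are deliberately unequal.
returns_by_stars = {
    1: [2.0, 3.0],
    2: [3.5, 4.5, 4.0],
    3: [5.0, 6.0, 5.5, 5.5],
    4: [6.0, 7.0],
    5: [8.0, 9.0, 8.5],
}
non_baseline = (2, 3, 4, 5)  # the 1-star dummy is dropped (baseline level)

# Candidate coefficients: intercept = baseline mean,
# each dummy coefficient = that group's mean minus the baseline mean.
means = {s: sum(v) / len(v) for s, v in returns_by_stars.items()}
beta = [means[1]] + [means[s] - means[1] for s in non_baseline]

# Design matrix rows: [1, D2, D3, D4, D5], where Dk = 1 if the fund has k stars.
rows, y = [], []
for stars, vals in returns_by_stars.items():
    for v in vals:
        rows.append([1.0] + [1.0 if stars == k else 0.0 for k in non_baseline])
        y.append(v)

# Residuals under the candidate coefficients.
residuals = [
    yi - sum(x * b for x, b in zip(row, beta))
    for row, yi in zip(rows, y)
]

# Normal equations: each design column must be orthogonal to the residuals.
normal_eq = [
    sum(row[j] * r for row, r in zip(rows, residuals))
    for j in range(len(beta))
]

print("coefficients:", [round(b, 3) for b in beta])
print("X'(y - Xb):  ", [round(v, 12) for v in normal_eq])
```

Because the design matrix has full column rank once one dummy is dropped, these are the unique least-squares estimates. So the quantities being estimated are the coefficients on the dummies, and they are simply differences between group mean returns, which is exactly what the ANOVA compares.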