Someone working at a distribution center needs to figure out how to predict future units sold for a given season. The season could be Christmas, Easter, Memorial Day, etc. For the dependent variable (Y), they have last year's units sold, and the independent variables (x1, x2, ...) could be anything.
Any help will be greatly appreciated.
Thanks!
Let's say you are trying to predict Christmas sales for year $n+1.$ For this dependent (predicted) variable $Y,$ you would need past data on Christmas sales for years $j = 1, \dots, n$.
Then you can consider what other, possibly relevant, data you could obtain for several independent (predictor) variables $x_1, x_2, \dots, x_k.$ Two of them might be sales earlier in each year for Labor Day and Black Friday. Another might be the amount of money the company has spent on advertising earlier in the year. Still other variables might be economic variables (stock index, interest rate, new housing starts, etc.) or opinion variables (percentage who are optimistic about their personal finances, percentage who think the local government is on the 'right track', etc.) or any other variables that might have to do with people's ability and mood to do business with you. For each predictor variable, you would need data in years $1, \dots, n.$
You would have a regression model:
$$Y_j = \beta_0 + \beta_1 x_{1j} + \cdots + \beta_k x_{kj} + e_j,$$ where $e_j \stackrel{iid}{\sim}\mathsf{Norm}(0, \sigma).$
Next, you can run a regression program to estimate the $\beta_i$'s and $\sigma^2.$ Using those estimates, you can get a prediction $\hat Y_{n+1}$ for this year's Christmas sales. There are methods of making 'prediction intervals' for such predictions. There are also ways to assess whether your data are capable of making useful predictions.
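To make the estimation and prediction steps concrete, here is a minimal sketch in plain Python. It assumes ordinary least squares as the fitting method and uses two hypothetical predictors (Black Friday units and advertising spend); all numbers are made up for illustration, and the function names `solve`, `fit_ols`, and `predict` are my own, not from any library.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_ols(X, y):
    """Estimate beta_0..beta_k and sigma^2 from rows X (one tuple per year) and responses y."""
    rows = [[1.0] + list(r) for r in X]  # leading 1 gives the intercept beta_0
    p = len(rows[0])
    XtX = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    Xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = solve(XtX, Xty)  # normal equations: (X'X) beta = X'y
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (len(y) - p)  # unbiased estimate of sigma^2
    return beta, sigma2, XtX


def predict(beta, sigma2, XtX, x_new):
    """Return the point prediction and its standard error for a new year's predictors."""
    x0 = [1.0] + list(x_new)
    y_hat = sum(b * v for b, v in zip(beta, x0))
    v = solve(XtX, x0)  # v = (X'X)^{-1} x0
    leverage = sum(a * b for a, b in zip(x0, v))
    se_pred = (sigma2 * (1.0 + leverage)) ** 0.5
    return y_hat, se_pred


# Hypothetical data for years 1..6 (thousands of units; advertising in arbitrary units).
black_friday = [10.1, 11.0, 10.6, 12.3, 12.9, 13.4]
advertising = [2.0, 2.4, 2.1, 2.9, 3.1, 3.0]
christmas = [20.5, 22.4, 21.3, 25.2, 26.0, 26.9]

X = list(zip(black_friday, advertising))
beta, sigma2, XtX = fit_ols(X, christmas)

# Predictor values for year n+1 (also hypothetical).
y_hat, se_pred = predict(beta, sigma2, XtX, (14.0, 3.3))
print("estimated coefficients:", beta)
print("estimated sigma^2:", sigma2)
print("predicted Christmas sales:", y_hat, "with prediction s.e.", se_pred)
```

A 95% prediction interval would then be $\hat Y_{n+1} \pm t^* \cdot \text{s.e.}$, where $t^*$ is the 0.975 quantile of Student's t distribution with $n - k - 1$ degrees of freedom (here $6 - 2 - 1 = 3$). In practice a statistical package would handle all of this, including the t quantile.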
Finally, methods to decide which predictor variables are useful and not just contributing noise are discussed briefly on another page. This is a useful refinement of a regression model. For example, you might start out with eight predictor variables, and find that three of them are actually useful.
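As one illustration of such a refinement, the sketch below compares every subset of predictors by adjusted $R^2$ and keeps the best-scoring one. This is only one of several possible selection criteria and is an assumption on my part, not necessarily the method the linked discussion uses. The data and the names `adjusted_r2` and `best_subset` are hypothetical.

```python
from itertools import combinations


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def adjusted_r2(X_cols, y):
    """Fit least squares with an intercept on the given columns; return adjusted R^2."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in X_cols] for i in range(n)]
    p = len(rows[0])
    XtX = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    Xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    y_bar = sum(y) / n
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    # Penalize extra predictors by dividing each sum of squares by its degrees of freedom.
    return 1.0 - (ss_res / (n - p)) / (ss_tot / (n - 1))


def best_subset(predictors, y):
    """Return (score, names) for the predictor subset with the highest adjusted R^2."""
    names = list(predictors)
    best = (float("-inf"), ())
    for size in range(1, len(names) + 1):
        if len(y) - (size + 1) <= 0:
            continue  # not enough years of data to fit this many coefficients
        for subset in combinations(names, size):
            score = adjusted_r2([predictors[name] for name in subset], y)
            if score > best[0]:
                best = (score, subset)
    return best


# Hypothetical data for 8 past years.
predictors = {
    "black_friday": [10.1, 11.0, 10.6, 12.3, 12.9, 13.4, 14.8, 15.1],
    "advertising": [2.0, 2.4, 2.1, 2.9, 3.1, 3.0, 3.6, 3.9],
    "interest_rate": [3.5, 3.1, 4.0, 2.8, 3.3, 4.2, 2.9, 3.6],
}
christmas = [20.5, 22.4, 21.3, 25.2, 26.0, 26.9, 29.8, 30.7]

score, chosen = best_subset(predictors, christmas)
print("best adjusted R^2:", score)
print("predictors kept:", chosen)
```

Exhaustive search is only practical for a handful of predictors (eight predictors already give 255 subsets), and with few years of data any selection procedure can easily pick up noise, so the result is better treated as a suggestion than a verdict.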