Say I am doing a study with 3 different types of fruit, and I want to fit a regression that predicts the amount sold from the fruit type. I know that I could make 2 dummy variables: orange (V1:1,V2:0), pear (1,1), apple (0,1), for example. Is it also acceptable to code a single variable as orange (V1:1), pear (2) and apple (3)? I have a large number of categories in my dataset, and I don't want to have to make a large number of dummy variables: I would rather just number them. Again, this is only going to be used for prediction, not to test the significance of the effect of fruit type on the amount sold.
2026-03-28 00:47:47.1774658867
Dummy recoding for more than two categorical variables
169 Views Asked by Bumbble Comm https://math.techqa.club/user/bumbble-comm/detail At
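The two codings you propose are not equivalent in an OLS regression. You should stick with the version that uses dummy variables. A generic OLS model is of the form $$y_i=\alpha+\beta_1 \text{orange}_i+\beta_2 \text{apple}_i+X_i\gamma+\epsilon_i.$$ So you need 2 dummies, $\text{orange}$ and $\text{apple}$. When both are zero the observation is a pear, so pear is the baseline category: the constant $\alpha$ is the expected value for pears when the other covariates $X_i$ are zero (with no other covariates, it is simply the mean for pears), and each $\beta$ is the difference between that fruit and pears. In general, you need $n-1$ dummy variables for $n$ categories.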
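As a rough illustration of the dummy-variable model above, here is a minimal sketch in plain Python. The sales figures are made up for the example, there are no extra covariates $X_i$, and `solve` and `ols` are hypothetical helper names, not functions from any library. It fits the model through the normal equations and shows that the intercept comes out as the pear mean and the two slopes as differences from pears.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


# Made-up data: two observations per fruit.
fruit = ["apple", "apple", "orange", "orange", "pear", "pear"]
sold = [10.0, 12.0, 30.0, 32.0, 14.0, 16.0]

# Columns: constant, orange dummy, apple dummy. Pear is the baseline.
X = [[1.0, float(f == "orange"), float(f == "apple")] for f in fruit]
alpha, beta_orange, beta_apple = ols(X, sold)

print(alpha, beta_orange, beta_apple)
```

With these numbers the pear mean is 15, the orange mean is 31 and the apple mean is 11, so the fit should give roughly $\alpha = 15$, $\beta_1 = 16$ and $\beta_2 = -4$ up to floating-point error.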
If, instead, you estimate something like $$y_i=\alpha+\beta \,\text{fruit}_i+X_i\gamma+\epsilon_i,$$ with $$\text{fruit}=\begin{cases} 1 & \text{ if apple} \\ 2 & \text{ if orange} \\ 3 & \text{ if pear} \\ \end{cases}$$ then you implicitly assume that the effect of changing from apple to orange is the same as the effect of switching from orange to pear, and the arbitrary numbering gives you no reason to believe that. Categorical variables should not be represented as continuous variables like this. Also, because your goal is prediction: the forecast based on the latter regression will usually be worse, because you fit a single slope where the dummy version fits a separate coefficient per category, which forces the category effects onto a straight line.
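To make the cost of the integer coding concrete, here is a small sketch in plain Python with made-up sales figures and no extra covariates. It compares the in-sample squared error of the single-slope model with that of the dummy model, whose fitted values are just the per-fruit means. The variable names are illustrative only.

```python
# Made-up data: two observations per fruit.
fruit = ["apple", "apple", "orange", "orange", "pear", "pear"]
sold = [10.0, 12.0, 30.0, 32.0, 14.0, 16.0]

# Integer coding from the case distinction above: apple=1, orange=2, pear=3.
code = {"apple": 1.0, "orange": 2.0, "pear": 3.0}
x = [code[f] for f in fruit]

# Simple linear regression sold = a + b * x via the closed-form formulas.
n = len(x)
x_bar = sum(x) / n
y_bar = sum(sold) / n
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, sold)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar
sse_integer = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, sold))

# Dummy coding with an intercept reproduces the per-fruit means as fitted values.
means = {
    f: sum(y for g, y in zip(fruit, sold) if g == f) / fruit.count(f)
    for f in set(fruit)
}
sse_dummy = sum((yi - means[f]) ** 2 for f, yi in zip(fruit, sold))

print(sse_integer, sse_dummy)
```

In this toy data oranges sell far more than either apples or pears, so a straight line through the codes 1, 2, 3 cannot capture the pattern: the squared error is 438 for the integer coding against 6 for the dummy coding. On real data the gap depends on how close the category means happen to lie to a line in the chosen numbering.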
If you are concerned that the number of variables becomes too large: some software packages, such as Stata, let you "batch-generate" dummy variables, so including all of them doesn't take long.
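The same batch generation takes only a few lines in a general-purpose language. The sketch below uses plain Python; `make_dummies` is a hypothetical helper written for this example, not a function from Stata or any library, and the `is_` column prefix is an arbitrary choice. It builds one 0/1 column per category except a chosen baseline.

```python
def make_dummies(values, baseline=None):
    """Return one 0/1 column per category, skipping the baseline category."""
    levels = sorted(set(values))
    if baseline is None:
        baseline = levels[0]  # default: first category in sorted order
    return {
        f"is_{level}": [1 if v == level else 0 for v in values]
        for level in levels
        if level != baseline
    }


fruit = ["apple", "orange", "pear", "orange", "apple"]
dummies = make_dummies(fruit, baseline="pear")

for name, column in dummies.items():
    print(name, column)
```

Here the three fruits produce two columns, `is_apple` and `is_orange`, matching the $n-1$ rule, and the number of columns grows automatically with the number of categories in the data.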