In linear regression it is important that the fitted curve does not overfit the training examples. In other words, the curve should generalise from the training data so that it can predict new values. Sometimes it is necessary to normalise the input data so that all features of each training example lie in the same range of values (e.g. [-1, 1]). Doing so lets a cost-minimisation algorithm such as Gradient Descent converge to the minimum in fewer iterations. I can easily see this effect when running Gradient Descent, but here is my question: why is it not necessary to normalise the input data when using an analytical (non-iterative) approach such as the Normal Equation? When I test new examples with the fitted hypothesis, I obtain the same results from Gradient Descent on normalised data and from the Normal Equation on unnormalised data. Why?
2026-03-29 19:26:07.1774812367
Normalization in Linear Regression
302 Views Asked by Bumbble Comm https://math.techqa.club/user/bumbble-comm/detail At
There is 1 best solution below
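The normal equation gives the exact least-squares solution, which gradient descent only approximates iteratively. Normalising the features changes the coefficients but not the fitted predictions (as long as the model includes an intercept term), which is why you get the same results.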
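A short derivation may make this concrete. The notation is an assumption, not taken from the question: $X$ is the design matrix with a leading column of ones, $y$ the vector of targets, $\theta$ the parameter vector, $m$ the number of training examples, and $J$ the usual squared-error cost. Setting the gradient of the cost to zero gives the normal equation:

$$J(\theta) = \frac{1}{2m}\,\|X\theta - y\|^2, \qquad \nabla J(\theta) = \frac{1}{m}\,X'(X\theta - y) = 0 \;\Longrightarrow\; X'X\,\theta = X'y \;\Longrightarrow\; \theta = (X'X)^{-1}X'y.$$

Shifting and scaling the features amounts to replacing $X$ by $\tilde X = XA$ for some invertible matrix $A$ (the shift is absorbed through the column of ones). Assuming $X'X$ is invertible,

$$\tilde\theta = (A'X'XA)^{-1}A'X'y = A^{-1}(X'X)^{-1}X'y = A^{-1}\theta, \qquad \tilde X\tilde\theta = XA\,A^{-1}\theta = X\theta.$$

So the coefficients change, but the predictions do not, and the same holds for a new example once it is normalised with the same $A$.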
However, I think that when the features are highly correlated, that is, when the matrix $X'X$ is ill-conditioned, you may run into numerical issues with the inversion, and these can be made less severe by normalising the features.
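Both points can be checked numerically. The following is only a minimal sketch on made-up synthetic data (the feature ranges, noise level, learning rate `alpha`, and iteration count are all assumptions chosen for illustration). It fits the normal equation on the raw features and gradient descent on the normalised features, then compares the predictions and the condition number of $X'X$ in each case.

```python
import numpy as np

# Hypothetical synthetic data: two features on very different scales.
rng = np.random.default_rng(0)
m = 200
x1 = rng.uniform(0, 1, m)
x2 = rng.uniform(0, 1000, m)
y = 3.0 + 2.0 * x1 - 0.01 * x2 + rng.normal(0, 0.1, m)

# Design matrix with a leading column of ones (intercept term).
X_raw = np.column_stack([np.ones(m), x1, x2])

# Normalise each feature to zero mean and unit standard deviation.
mu = X_raw[:, 1:].mean(axis=0)
sigma = X_raw[:, 1:].std(axis=0)
X_norm = np.column_stack([np.ones(m), (X_raw[:, 1:] - mu) / sigma])

# Normal equation on the raw features: solve X'X theta = X'y.
theta_ne = np.linalg.solve(X_raw.T @ X_raw, X_raw.T @ y)

# Batch gradient descent on the normalised features.
theta_gd = np.zeros(X_norm.shape[1])
alpha = 0.1  # assumed learning rate
for _ in range(5000):
    grad = X_norm.T @ (X_norm @ theta_gd - y) / m
    theta_gd -= alpha * grad

# The coefficients differ, but the fitted predictions should agree.
pred_ne = X_raw @ theta_ne
pred_gd = X_norm @ theta_gd
max_gap = np.max(np.abs(pred_ne - pred_gd))

# Condition number of X'X before and after normalisation.
cond_raw = np.linalg.cond(X_raw.T @ X_raw)
cond_norm = np.linalg.cond(X_norm.T @ X_norm)

print("largest prediction gap:", max_gap)
print("cond(X'X) raw:", cond_raw)
print("cond(X'X) normalised:", cond_norm)
```

In this sketch the conditioning problem comes from the difference in feature scales, which is exactly what normalisation removes. Rescaling cannot undo strong correlation between features, so in that case the improvement may be smaller.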