Let's say I have a set of vectors and I want to find the minimum distance between one of them and the rest. EDIT: This is a Machine Learning task in which I have missing values in my data, and I'm looking for the vectors most "similar" to the ones with missing values in order to fill them in with the values from the most similar vector.
My immediate thought is to compute the Euclidean distance in $n$ dimensions between that one vector (with missing value(s)) and each of the others, and take the minimum of those results. However, someone pointed out to me that I would have to "normalize" my vectors so that my calculations wouldn't be biased by the different units and scales of the independent variables (the components of the vectors in this case); in other words, so that an independent variable with a much wider range of values wouldn't weigh much more than another in the distance calculation.
He recommended standardizing each component: compute the mean and standard deviation of each component across all vectors, then subtract that mean from the corresponding component of each vector and divide by that standard deviation, so that each component across all vectors is shifted by $-mean$ and scaled by $\frac{1}{SD}$.
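For concreteness, here is a minimal sketch of what I understand his suggestion to be (the data and variable names are made up for illustration), using NumPy:

```python
import numpy as np

# Toy data: rows are vectors, columns are components on very different scales.
X = np.array([
    [1.0, 2000.0],
    [1.2, 2100.0],
    [5.0,  300.0],
])
query = np.array([1.05, 2010.0])  # the vector whose missing values I'd fill

# Standardize each column: subtract its mean, divide by its SD.
mu = X.mean(axis=0)
sd = X.std(axis=0)
Xs = (X - mu) / sd
qs = (query - mu) / sd

# Euclidean distances in the standardized space; the argmin is the
# "most similar" vector under this metric.
dists = np.linalg.norm(Xs - qs, axis=1)
nearest = int(np.argmin(dists))
```

After this transformation every column has mean 0 and SD 1, so no single component dominates the distance purely because of its units.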
My question is: Is standardizing really important in this case, and does that justification make sense? If it does, why, and what is the best way of doing it (and why)? And what exactly is the effect of standardizing in this way? Does every component end up with the same range?