Question
What is the rationale for, and what are the actual steps behind, the normalization that converts |WX| > 0 into |WX| = 1 when deriving the optimal W for the largest margin (as in this YouTube video: https://www.youtube.com/watch?v=eHsErlPJWUU)?
Background
I am trying to understand the SVM mechanism and the mathematics behind it (Lagrange duality). However, the lecture videos, slides, and web pages I have found all introduce WX + b = 1 or WX + b = -1 out of the blue. I am not clear why this step is required. In the video, the professor mentioned that he could scale up and down without loss of generality and obtain |WX| = 1 instead of |WX| > 0.
Assistance required
- Why does it need to be converted into |WX| = 1? Is it not possible to solve the problem with |WX| > 0?
- How is the conversion done, and with what steps?
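
To make the "scale up and down" remark concrete, here is a minimal sketch of how I currently read it. It assumes that |WX| in the video is shorthand for |WX + b|, and that the scaling means dividing both W and b by the smallest value of y(WX + b) over the training points. The toy data and the helper names (score, geometric_margin, normalize) are my own inventions for illustration, not something taken from the video.

```python
import math

# Hypothetical toy data: 2-D points with labels +1 / -1, linearly separable.
points = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]

def score(w, b, x):
    # The raw value WX + b for a single point x.
    return w[0] * x[0] + w[1] * x[1] + b

def functional_margin(w, b, data):
    # Smallest y * (WX + b); positive when (w, b) separates the data.
    return min(y * score(w, b, x) for x, y in data)

def geometric_margin(w, b, data):
    # Actual distance from the hyperplane to the closest point.
    return functional_margin(w, b, data) / math.hypot(w[0], w[1])

def normalize(w, b, data):
    # Divide W and b by the smallest functional margin so that the
    # closest point satisfies |WX + b| = 1.
    m = functional_margin(w, b, data)
    return (w[0] / m, w[1] / m), b / m

w, b = (3.0, 3.0), -6.0              # one separating hyperplane, chosen by hand
w_n, b_n = normalize(w, b, points)   # same hyperplane, rescaled

print(functional_margin(w, b, points), functional_margin(w_n, b_n, points))
print(geometric_margin(w, b, points), geometric_margin(w_n, b_n, points))
print(1.0 / math.hypot(w_n[0], w_n[1]))
```

If this reading is right, the rescaling leaves the hyperplane and its distance to the closest point unchanged, and after rescaling that distance equals 1 / ||W||, which would explain why the problem is then stated as minimizing ||W||. I would still like confirmation that this is what the professor means.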


