Looking at the Kaggle Titanic dataset, there exist two related predictor columns:
- Parch describes the number of that passenger's parents and children that have boarded
- SibSp describes the number of that passenger's siblings and spouses that have boarded
I've seen multiple examples in the shared code where people create a new predictor column named Family that is the sum of Parch and SibSp.
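For concreteness, here is a minimal sketch of that feature construction (a few synthetic rows standing in for the actual Titanic CSV), which also shows that the three columns together are perfectly collinear:

```python
import numpy as np
import pandas as pd

# Synthetic rows standing in for the Kaggle Titanic data.
df = pd.DataFrame({
    "SibSp": [1, 0, 3, 0],
    "Parch": [0, 2, 1, 0],
})

# The new predictor is just the sum of the two existing columns.
df["Family"] = df["SibSp"] + df["Parch"]

# Family is an exact linear combination of SibSp and Parch, so the
# design matrix [SibSp, Parch, Family] is rank-deficient: rank 2, not 3.
rank = np.linalg.matrix_rank(df[["SibSp", "Parch", "Family"]].to_numpy())
print(rank)  # 2 -> perfect multicollinearity
```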
Now, by construction, Family = Parch + SibSp, so Family is perfectly predicted by the other two columns. That makes the three columns multicollinear (multicollinearity wiki), which I understand to be problematic.
So why create the Family column? If it really is multicollinear, I don't see why so many people (who seem to understand ML better than I do) are doing this.
EDIT: I've also discovered that people sometimes combine variables they expect to be multicollinear into a single new variable. E.g., in the problem described, if SibSp is suspected to be a predictor of Parch, it would make sense to create Family, though the video [timestamp 14:05] offers no explanation of why.
Multicollinearity is mainly an issue when fitting linear regression models. However, in machine learning applications it is rare to fit just a vanilla OLS model. Moreover, adding a regularization term is very common, and even for an OLS model this addresses many of the problems of multicollinearity, since it yields a problem that has a unique solution and a controlled condition number. The issues of parameter identifiability and significance estimates mentioned in the link are essentially irrelevant to machine learning, where the ultimate objective is predictive accuracy.
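A minimal numerical sketch of this point, on synthetic data with an arbitrary ridge penalty alpha = 1.0 (the variable names and target are illustrative, not from the original post): with perfectly collinear columns the OLS normal equations are singular, but the regularized system has a unique, well-conditioned solution.

```python
import numpy as np

rng = np.random.default_rng(0)
sibsp = rng.integers(0, 4, size=100)
parch = rng.integers(0, 4, size=100)
family = sibsp + parch                      # exact linear combination
X = np.column_stack([sibsp, parch, family]).astype(float)
y = 0.5 * family + rng.normal(size=100)     # toy target

# Plain OLS: X'X is singular (rank 2 in 3 dimensions), so the
# normal equations have no unique solution.
XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))           # 2

# Ridge: X'X + alpha*I is full rank, so the solution is unique
# and the condition number is finite and controlled by alpha.
alpha = 1.0
w = np.linalg.solve(XtX + alpha * np.eye(3), X.T @ y)
print(np.linalg.cond(XtX + alpha * np.eye(3)))
```

The individual coefficients in `w` are still not interpretable (the data cannot distinguish credit between the collinear columns), but the predictions `X @ w` are well defined, which is what matters for the Kaggle objective.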