Bishop's book [1] describes a least-squares approach to classification with a linear model
$$y_k(x)=w_k^Tx + w_{k0}$$
and a sum-of-squares error function.
Then it mentions an interesting fact:
> An interesting property of least-squares solutions with multiple target variables is that if every target vector in the training set satisfies some linear constraint $a^T t_n + b = 0$ for some constants $a$ and $b$, then the model prediction for any value of $x$ will satisfy the same constraint, so that $a^Ty(x) + b = 0$.
I can't understand how a linear constraint on the targets can influence the predictions in this way. Is there an intuitive way to see this, or a nice proof?
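To convince myself the claim holds at all, I wrote a small numerical check (a sketch, not from the book: the variable names, dimensions, and the projection trick for generating constrained targets are my own). It fits the multi-output least-squares model with a bias term and verifies that predictions at new inputs satisfy the same affine constraint as the training targets:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N inputs in R^d, K-dimensional targets constrained by a^T t_n + b = 0.
N, d, K = 50, 3, 4
X = rng.normal(size=(N, d))
a = rng.normal(size=K)
b = 2.0

# Draw random targets, then project each onto the affine set {t : a^T t + b = 0}.
T = rng.normal(size=(N, K))
T = T - ((T @ a + b) / (a @ a))[:, None] * a  # now a^T t_n + b = 0 for every n

# Least-squares fit with bias: y(x) = W^T x_tilde, where x_tilde = [1, x].
Xt = np.hstack([np.ones((N, 1)), X])
W, *_ = np.linalg.lstsq(Xt, T, rcond=None)

# Predictions at fresh inputs should satisfy the same constraint.
Xnew = rng.normal(size=(10, d))
Y = np.hstack([np.ones((10, 1)), Xnew]) @ W
print(np.max(np.abs(Y @ a + b)))  # tiny, i.e. zero up to floating-point error
```

The residual is at floating-point noise level, so the property really does hold exactly, which is what makes me think there should be a clean algebraic argument behind it.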
Thanks!
[1] C. M. Bishop, *Pattern Recognition and Machine Learning*, Springer, 2006.