I'm having some trouble understanding the intuition behind the theory of affine scaling(algorithm used in linear programming)
$D_k$ is a diagonal matrix with $x_k$ on the diagonal the vector of dual variables: $w^k = (AD^2_kA^T)^{-1}AD^2_kc$
we then compute the vector of reduced costs $r^k = c-A^Tw^k$
then the theory states (this part I don't quite understand) that if $r^k > 0$ and $1^TD_kr^k < \epsilon$
that the current solution $x_k$ is optimal. why is that?
I am looking at this: https://en.wikipedia.org/wiki/Affine_scaling
thank you for any help