I have found James Murphy's "Benders, Nested Benders and Stochastic Programming: An Intuitive Introduction" (http://www.optimization-online.org/DB_FILE/2013/12/4157.pdf) to be quite helpful in deciphering exactly how and why optimality cuts perform the magic that they (seem to) do. The intuition he provides is that the optimality cuts identify the constraint(s) from the subproblem that must be included in the master problem, expressed using only the non-complicating variables $x$ and the approximation function $\alpha(x)$, so that the master reconstructs enough of the $\alpha$ function for the bounds to converge (i.e., the solution found in the master matches what is found in the subproblem).
My question centers on Murphy's section on termination criteria (where the original problem is a minimization). Murphy says: "we can test this by seeing whether the value of our approximation $\theta$ is already equal to (or greater than) the value of the currently active constraint on it, i.e. the optimality cut."
Question: when or how would we ever get a master problem value strictly greater than the subproblem value? Is he referring to numerical issues causing this? It makes no sense to me that we would be iterating along and suddenly find a solution like this. Is this what would happen if we found the optimal solution and then added an optimality cut for it as well?
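To make the termination test I am asking about concrete, here is a minimal 1-D sketch of the Benders loop as I understand it. Everything here is my own toy construction, not from the paper: `Q` is a stand-in subproblem value function, the "master" is solved by brute force on a grid, and `theta` plays the role of Murphy's approximation. The test in question is the `theta_k >= Q(x_k)` check.

```python
# Toy Benders loop on: minimize x + Q(x) over x in [0, 4],
# with Q(x) = |x - 2| as a hypothetical subproblem value function.
# The master keeps a cutting-plane model theta(x) = max of optimality cuts.

def Q(x):
    """Subproblem value at x (toy stand-in)."""
    return abs(x - 2.0)

def Q_subgrad(x):
    """A subgradient of Q at x."""
    return -1.0 if x < 2.0 else 1.0

cuts = []            # list of (intercept, slope): theta >= a + g*x
THETA_LB = -100.0    # crude initial lower bound on theta before any cuts

def solve_master():
    # The 1-D master min_x x + theta(x) is solved by brute force on a
    # grid, which is enough for this illustration (a real implementation
    # would use an LP solver).
    best = None
    for i in range(4001):
        x = 4.0 * i / 4000
        theta = max((a + g * x for (a, g) in cuts), default=THETA_LB)
        val = x + theta
        if best is None or val < best[0]:
            best = (val, x, theta)
    return best

lower, upper = float("-inf"), float("inf")
for it in range(20):
    master_val, x_k, theta_k = solve_master()
    lower = master_val                    # master value is a lower bound
    upper = min(upper, x_k + Q(x_k))      # evaluating Q gives an upper bound
    # The termination test: the approximation theta already equals (or
    # exceeds) the true subproblem value at x_k, so no new cut can
    # improve the model there.
    if theta_k >= Q(x_k) - 1e-9:
        break
    g = Q_subgrad(x_k)
    # Optimality cut: theta >= Q(x_k) + g*(x - x_k)
    cuts.append((Q(x_k) - g * x_k, g))

print(round(lower, 6), round(upper, 6))  # bounds coincide at termination
```

In this toy run the test fires with $\theta_k$ exactly equal to $Q(x_k)$, which is what I would expect; my question is under what circumstances the strict-inequality branch ("or greater than") would ever be the one that triggers.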