The theory behind linear Support Vector Machines with tolerance for misclassifications (soft-margin SVMs) states that we are trying to minimise the following function in the primal weight space:
$$\min\limits_{w,b,\xi} J_{P}(w,\xi) = \frac{1}{2}w^{T}w + c\sum\limits_{k=1}^{N}\xi_{k}$$
Subject to the following constraints defined for all data points $k$ where $\xi_{k}$ are slack variables, $x_{k}$ are input points, $(w^{T}x_{k}+b)$ are outputs of trained classifier for weights $w$ and $b$ and $y_{k}$ are classes (either $-1$ or $+1$): $$\forall_{k\in1...N} \ \ \xi_{k} \geq 0$$ $$\forall_{k\in1...N} \ \ y_{k}(w^{T}x_{k}+b) \geq 1 - \xi_{k}$$
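To make the objective and constraints concrete, here is a small NumPy sketch (the function names are mine, not from any library):

```python
import numpy as np

def primal_objective(w, xi, C):
    """J_P(w, xi) = 0.5 * w^T w + C * sum_k xi_k."""
    return 0.5 * w @ w + C * np.sum(xi)

def constraints_satisfied(w, b, xi, X, y):
    """Check xi_k >= 0 and y_k (w^T x_k + b) >= 1 - xi_k for every point."""
    margins = y * (X @ w + b)
    return bool(np.all(xi >= 0) and np.all(margins >= 1 - xi))

# A tiny separable example: both margins equal 2, so zero slack is feasible.
w = np.array([1.0, 0.0])
b = 0.0
X = np.array([[2.0, 0.0], [-2.0, 0.0]])
y = np.array([1.0, -1.0])
xi = np.zeros(2)

print(primal_objective(w, xi, C=1.0))          # 0.5
print(constraints_satisfied(w, b, xi, X, y))   # True
```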
These sets of inequality constraints are - as far as I understand - mandatory to satisfy. The slack variables $\xi_{k}$ provide wiggle room for data points that would not satisfy the regular $y_{k}(w^{T}x_{k}+b) \geq 1$ condition.
However, if some data point $l$ is misclassified - so the slack variable is greater than $1$, $\xi_{l} > 1$ - then it seems to me that this condition cannot be satisfied:
$$y_{l}(w^{T}x_{l}+b) \geq 1 - \xi_{l}$$
Take the following example values (the signs of $y_{l}$ and $(w^{T}x_{l}+b)$ disagree because we have a misclassification): $$\xi_{l} = 1.5$$ $$y_{l} = +1$$ $$(w^{T}x_{l}+b) = -1$$ then
$$1\cdot(-1) \ngeq 1 - 1.5$$ $$-1 \ngeq -0.5$$
So, as far as I understand, this makes the problem infeasible. Yet if I feed such data to an SVM classifier using the SMO quadratic-programming algorithm, it outputs a solution with the offending point on the "wrong" side of the decision boundary.
My question is: How is this possible? Aren't those conditions mandatory to satisfy? If not, what regulates how many of them won't be satisfied, since slack variables $\xi$ can't?
Your mistake is that you did not choose $\xi_l$ to be large enough for your example. Note that it is not bounded from above, so it will grow as large as necessary to make the constraint valid. In your example above, you could have chosen $\xi_l = 2$. This would satisfy the constraint.
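In fact, the smallest feasible slack for a point is $\xi_l = \max(0,\ 1 - y_l(w^{T}x_l+b))$. Plugging in the numbers from the question (a quick sanity check, nothing library-specific):

```python
# The point from the question: label y_l = +1, classifier output f(x_l) = -1.
y_l, f_l = 1.0, -1.0

# Smallest slack that makes  y_l * f_l >= 1 - xi_l  feasible.
xi_l = max(0.0, 1.0 - y_l * f_l)

print(xi_l)                      # 2.0
print(y_l * f_l >= 1 - xi_l)     # True: the constraint holds with xi_l = 2
```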
You also assume that $(w^{T}x_{l}+b) = -1$ in the case of a misclassification. This is not true. The constraint is violated whenever $y_l(w^{T}x_{l}+b) \lt 1$ (a margin violation), and the point is actually misclassified only when $y_l(w^{T}x_{l}+b) \lt 0$. In neither case does $(w^{T}x_{l}+b)$ have to equal $-1$; for instance, with $y_l = -1$ the value $(w^{T}x_{l}+b)$ could be positive and the point would still be misclassified.
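To make that distinction concrete: a margin violation means $y_l(w^{T}x_l+b) \lt 1$, while a misclassification means the sign of the output disagrees with the label. A small sketch with made-up numbers (the helper name is mine):

```python
def margin_status(y, f):
    """Return (margin_violated, misclassified) for label y and output f."""
    return y * f < 1, y * f < 0

# With y_l = +1: an output of 0.4 violates the margin but is still
# classified correctly; an output of -0.3 is a genuine misclassification.
print(margin_status(1.0, 0.4))    # (True, False)
print(margin_status(1.0, -0.3))   # (True, True)
print(margin_status(1.0, 1.7))    # (False, False): safely outside the margin
```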