I'll quote Bishop's Pattern Recognition and Machine Learning (2006) in the context of the perceptron, an algorithm that, given two finite subsets of "patterns" $\mathcal{C}_1,\mathcal{C}_2\subseteq \mathbb{R}^D$, finds (when possible) a hyperplane $\{\mathbf{x} : y(\mathbf{x}) = 0\}$, with $y(\mathbf{x}):=\mathbf{w}^T\mathbf{x}+w_0$, that separates $\mathcal{C}_1$ and $\mathcal{C}_2$:
[...] The perceptron criterion [error function to be minimized] is therefore given by \begin{equation}\tag{4.54} E_P(\mathbf{w}) = -\sum_{n\in\mathcal{M}}\mathbf{w}^T\phi_n t_n \end{equation} where $\mathcal{M}$ denotes the set of misclassified patterns. [...] We now apply the stochastic [or sequential] gradient descent algorithm to this error function. The change in weight [$\mathbf{w}$] is then given by \begin{equation}\tag{4.55} \mathbf{w}^{(\tau +1)}=\mathbf{w}^{(\tau)} - \eta\nabla E_P(\mathbf{w})=\mathbf{w}^{(\tau)} +\eta\phi_n t_n \end{equation} where $\eta$ is the learning rate parameter and $\tau$ is an integer that indexes the steps of the algorithm. [...] Note that, as the weight vector evolves during training, the set of patterns that are misclassified will change.
How does $\nabla E_P(\mathbf{w}) = -\phi_n t_n$?
As Bishop mentions, $\mathcal{M}$ depends on the choice of $\mathbf{w}$, so the gradient doesn't commute with $\sum_{n\in \mathcal{M}}$. Furthermore, what does $n$ mean in this context? In the sum that defines $E_P(\mathbf{w})$, $n$ is a bound summation index, so I don't see how it can be used outside of the sum.
Equation 4.55 is very misleading. The paragraph after your excerpt does a better job of explaining the author's intent.
In stochastic gradient descent, instead of using the gradient of the entire error function, you choose one of the data points $x_n$ and evaluate the gradient of only the individual summand of the error function corresponding to that data point. For your $E_P$ function, the gradient of the $n$-th summand is $$\begin{cases}0 & n \notin \mathcal{M} \\ -\phi_n t_n & n \in \mathcal{M}\end{cases}$$
So the update is essentially $\mathbf{w}^{(\tau + 1)} \gets \mathbf{w}^{(\tau)} + \eta \phi_n t_n \mathbf{1}_{n \in \mathcal{M}}$ (note the plus sign: the step subtracts $\eta$ times the gradient $-\phi_n t_n$), which matches the plain English description above.
Equation 4.55 is misleading in two ways: (1) it suggests that the step uses the full gradient $\nabla E_P$, when the method actually uses the stochastic gradient of a single summand, and (2) it only applies in the case $n \in \mathcal{M}$; when $n \notin \mathcal{M}$, the weights are left unchanged.
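To make the loop structure concrete, here is a minimal sketch of the stochastic perceptron update in Python with numpy. The function name and arguments are mine, not Bishop's; I assume targets $t_n \in \{-1, +1\}$ and feature vectors that already include a constant component absorbing the bias $w_0$.

```python
import numpy as np

def perceptron_sgd(phi, t, eta=1.0, max_epochs=100):
    """Stochastic-gradient perceptron training (in the spirit of Bishop eq. 4.55).

    phi : (N, D) array, rows are feature vectors phi_n (include a 1 for the bias)
    t   : (N,)   array of targets in {-1, +1}
    """
    w = np.zeros(phi.shape[1])
    for _ in range(max_epochs):
        any_misclassified = False
        for n in range(len(t)):
            # Pattern n is misclassified iff t_n * w^T phi_n <= 0,
            # i.e. n is in the current set M (which depends on w).
            if t[n] * (phi[n] @ w) <= 0:
                w = w + eta * phi[n] * t[n]  # step along -grad of the n-th summand
                any_misclassified = True
            # else: gradient of the n-th summand is 0, so w is unchanged
        if not any_misclassified:
            break  # all patterns classified correctly; M is empty
    return w
```

Note that the membership test `t[n] * (phi[n] @ w) <= 0` is re-evaluated against the current `w` on every visit, which is exactly the point of the question: $\mathcal{M}$ changes as the weight vector evolves.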