According to Casella and Berger (2002), the definition of a p-value (Def. 8.3.26, §8.3.4, p. 397) is:
A p-value $p(X)$ is a test statistic satisfying $0 \le p(x) \le 1$ for every sample point $x$. Small values of $p(X)$ give evidence that $H_1$ is true. A p-value is valid if, for every $\theta \in \Theta_0$ and every $0 \le \alpha \le 1$, $P_{\theta}(p(X) \le \alpha) \le \alpha$.
However, other books such as Rohatgi (2001) define it as:
The probability of observing under $H_0$ a sample outcome at least as extreme as the one observed is called the P-value. The smaller the P-value, the more extreme the outcome and the stronger the evidence against $H_0$.
I feel this definition is similar in spirit to the one by Schervish (2012):
p-value. In general, the p-value is the smallest level $\alpha_0$ such that we would reject the null hypothesis at level $\alpha_0$ with the observed data.
How are these definitions equivalent?
The difference between these definitions is that, while the first states a property that a p-value must satisfy, the latter two define the p-value in terms of a collection of hypothesis tests. These definitions are in fact closely related, as I show below.
Note that the latter two definitions refer to varying significance levels, so to make them operational you must consider a collection of hypothesis tests. Formally, a hypothesis test $\phi: \mathcal{X} \rightarrow \{0,1\}$ is a function on the sample space that takes the value $1$ if $H_0$ is rejected and the value $0$ otherwise. Let $(\phi_{\alpha})_{\alpha \in (0,1)}$ be a collection of hypothesis tests such that each $\phi_{\alpha}$ has size $\alpha$ and the collection is monotone, that is, if $\alpha_1 \leq \alpha_2$, then $\phi_{\alpha_1}(x) \leq \phi_{\alpha_2}(x)$ for every $x$. The latter two definitions then say that the p-value is the function $p: \mathcal{X} \rightarrow (0,1)$ given by $p(x)=\inf \{\alpha: \phi_{\alpha}(x)=1\}$. Observe that, for every $\theta \in \Theta_0$,
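To make the infimum definition concrete, here is a small sketch (my own illustration, not from any of the books cited). It assumes the simplest possible setting: a single observation $X \sim N(\mu, 1)$, testing $H_0: \mu = 0$ against $H_1: \mu > 0$, with the standard size-$\alpha$ test that rejects when $x$ exceeds the $(1-\alpha)$ normal quantile. Scanning a fine grid of levels for the smallest $\alpha$ at which $\phi_\alpha$ rejects recovers the familiar closed-form p-value $1 - \Phi(x)$ up to grid resolution:

```python
import math

def Phi(z):
    """Standard normal CDF via the error function (stdlib only)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_quantile(q, lo=-10.0, hi=10.0, tol=1e-10):
    """Standard normal quantile, computed by bisection on Phi."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if Phi(mid) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def phi(alpha, x):
    """Size-alpha test of H0: mu = 0 vs H1: mu > 0 for one X ~ N(mu, 1):
    reject (return 1) iff x exceeds the (1 - alpha) normal quantile."""
    return 1 if x > z_quantile(1 - alpha) else 0

x = 1.7                                    # observed sample point
alphas = [a / 10000 for a in range(1, 10000)]
# p(x) = inf{alpha : phi_alpha(x) = 1}, approximated on a grid;
# by monotonicity the first rejecting alpha approximates the infimum
p_inf = next(a for a in alphas if phi(a, x) == 1)
p_closed = 1 - Phi(x)                      # the familiar one-sided p-value
print(p_inf, p_closed)                     # both approximately 0.0446
```

The grid step ($10^{-4}$) controls how closely the scan approximates the infimum; in this family the rejection decision is monotone in $\alpha$, so the first rejecting level is the right answer.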
\begin{align*} \mathbb{P}_\theta(p(X) \leq \alpha^*) &= \mathbb{P}_\theta(\inf \{\alpha: \phi_{\alpha}(X)=1\} \leq \alpha^*) \\ &= \mathbb{P}_\theta(\phi_{\alpha^*}(X)=1) & \text{monotonicity} \\ &\leq \alpha^* & \phi_{\alpha^*} \text{ has size } \alpha^* \end{align*}
This shows that a p-value defined as in Rohatgi and Schervish satisfies the property required by Casella and Berger.
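A quick simulation can sanity-check the validity property $\mathbb{P}_\theta(p(X) \leq \alpha) \leq \alpha$. This sketch again assumes the one-sided normal example (my choice, not from the references): under $H_0$ the p-value $p(X) = 1 - \Phi(X)$ is exactly Uniform$(0,1)$, so the inequality holds with equality, and the empirical frequencies should track $\alpha$:

```python
import math
import random

def Phi(z):
    """Standard normal CDF via the error function (stdlib only)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

random.seed(0)
# Draw X under H0 (mu = 0) and compute the one-sided p-value 1 - Phi(X);
# since Phi(X) is Uniform(0,1) under H0, so is the p-value
pvals = [1 - Phi(random.gauss(0, 1)) for _ in range(200_000)]
for alpha in (0.01, 0.05, 0.10):
    freq = sum(p <= alpha for p in pvals) / len(pvals)
    print(f"alpha={alpha:.2f}  P(p(X) <= alpha) ~= {freq:.4f}")
```

For a composite or discrete $H_0$ the inequality can be strict (the p-value is then super-uniform), which is exactly what Casella and Berger's "valid" allows.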
Next, suppose $p: \mathcal{X} \rightarrow (0,1)$ is a function such that, for every $\theta \in \Theta_0$, $P_{\theta}(p(X) \leq \alpha^*) \leq \alpha^*$. In this case, you can define a collection of hypothesis tests by $\phi_{\alpha}(x)=\mathbb{I}(p(x) \leq \alpha)$. It follows from the assumed property that each $\phi_{\alpha}$ has level $\alpha$ (its size is at most $\alpha$). It also follows from the construction that, if $\alpha_1 \leq \alpha_2$, then $\phi_{\alpha_1}(x) \leq \phi_{\alpha_2}(x)$ for every $x$. Finally, note that $p(x) = \inf\{\alpha: \phi_{\alpha}(x)=1\}$. That is, you can construct a collection of hypothesis tests based on Casella and Berger's definition, and if you apply Rohatgi's or Schervish's definition to this collection, you recover $p(x)$.
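The reverse construction can be sketched the same way (again my own illustration, using the one-sided normal p-value as the assumed valid $p$): build $\phi_\alpha(x) = \mathbb{I}(p(x) \leq \alpha)$, check that the decisions are monotone in $\alpha$, and verify that the infimum over rejecting levels recovers $p(x)$:

```python
import math

def Phi(z):
    """Standard normal CDF via the error function (stdlib only)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def p_value(x):
    """A valid p-value to start from (one-sided normal, as an example)."""
    return 1 - Phi(x)

def phi(alpha, x):
    """The constructed test: reject (return 1) iff p(x) <= alpha."""
    return 1 if p_value(x) <= alpha else 0

x = 1.7
alphas = [a / 10000 for a in range(1, 10000)]
decisions = [phi(a, x) for a in alphas]
# monotonicity holds by construction: decisions are nondecreasing in alpha
assert decisions == sorted(decisions)
# applying the Rohatgi/Schervish definition to this collection recovers p(x)
p_rec = next(a for a in alphas if phi(a, x) == 1)
print(p_rec, p_value(x))          # agree up to the grid resolution
```

Here the recovery is essentially tautological, $\inf\{\alpha : p(x) \leq \alpha\} = p(x)$, which is the point: the two styles of definition pick out the same object.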