At some point in a lesson in Calculus of Variations, my teacher had a function defined as:
$$v(x)= \begin{cases} u(x) & x\notin B(x_0,3\delta) \\ \max\{R(2\eta-1),u\} & x\in B(x_0,3\delta) \end{cases},$$
where $u$ was the $W^{1,p}$ solution to some variational problem (with fast growth of the Lagrangian, IIRC), $x_0$ was a point in the (bounded, IIRC) domain of integration $\Omega$, $\delta$ was picked so that the ball in the definition of $v$ was contained in $\Omega$, $R$ was basically the sup of $u$ on that ball, and $\eta$ was a smooth transition from 1 on $B(x_0,\delta)$ to 0 outside $B(x_0,2\delta)$ (maybe we only needed $\mathcal{C}^1$, but I don't think this is relevant to the question). He justified $v$ being weakly differentiable by saying, roughly, that where it is defined as a max it equals $[R(2\eta-1)-u]^+$, where $[\cdot]^+$ is the positive part. Now I saw immediately that this was not true: e.g., if we consider $\max\{x,-x\}$, we get $|x|$, whereas $[x-(-x)]^+=[2x]^+=2x^+$. So I tried working out a proof myself. I noticed that we can write:
$$\max\{f,g\}=[f-g]^++[g-f]^++g\cdot I_{\{f\geq g\}}+f\cdot I_{\{f<g\}},$$
so I concentrated on $g\cdot I_{\{f\geq g\}}$, trying to prove it belonged to $W^{1,p}$. I tried a smoothing approach. I noticed the indicator could be written as $\theta\circ(f-g)$, where $\theta=I_{[0,+\infty)}$ (closed half-line), so I first smoothed out $\theta$ with:
$$\theta_\epsilon(x)= \begin{cases} \theta(x) & x\notin[-\epsilon,0] \\ p_\epsilon(x) & x\in[-\epsilon,0] \end{cases},$$
where I determined that the choice:
$$p_\epsilon(x)=-\frac{2}{\epsilon^3}x^3-\frac{3}{\epsilon^2}x^2+1$$
makes $\theta_\epsilon$ a $\mathcal{C}^1$ function (one checks $p_\epsilon(-\epsilon)=0$, $p_\epsilon(0)=1$ and $p_\epsilon'(-\epsilon)=p_\epsilon'(0)=0$). It is easy to show (dominated convergence) that $\theta_\epsilon\circ w\to\theta\circ w$ in $L^p$ as $\epsilon\to0$, where I now write $w:=f-g$ to avoid clashing with the $v$ above. Trouble is, what happens when I put $g$ back in? I didn't notice the problem until after a long session of gradient work, in which I: found smooth $w_n$ on $\Omega$ converging to $w$ in $W^{1,p}$; assumed they converged pointwise a.e. (modulo choosing a subsequence); showed $\theta_\epsilon\circ w_{n_k}\to\theta_\epsilon\circ w$ in $L^p$ via dominated convergence; computed the gradients with the chain rule $(q\circ s)'=(q'\circ s)\cdot s'$, valid here since $\theta_\epsilon$ is $\mathcal{C}^1$ with bounded derivative; proved those gradients converged in $L^p$ to the analogous expression with $w$; and concluded by the closure lemma that $(\theta_\epsilon\circ w)'=(\theta_\epsilon'\circ w)\cdot w'$ (this works for any partial $\partial_k$, but I was too lazy to write $\partial_k$ out :) ), hence $\theta_\epsilon\circ w\in W^{1,p}$. Only then did I ask myself: what about $g\cdot(\theta_\epsilon\circ w)$?

So, is it true that the weak gradient of $\max\{f,g\}$, that is of $f\cdot I_{\{f\geq g\}}+g\cdot I_{\{f<g\}}$, is the most natural guess one would try, i.e. $\nabla f\cdot I_{\{f\geq g\}}+\nabla g\cdot I_{\{f<g\}}$? And how do I prove it?
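As a sanity check (not a proof), the guessed formula does reproduce the toy example above: with $f(x)=x$ and $g(x)=-x$ on an interval,
$$\nabla f\cdot I_{\{f\geq g\}}+\nabla g\cdot I_{\{f<g\}}=I_{\{x\geq 0\}}-I_{\{x<0\}}=\operatorname{sgn}(x)\quad\text{a.e.},$$
which is exactly the weak derivative of $\max\{x,-x\}=|x|$.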
Note
Naturally, once we have that formula for the weak gradient, the max is automatically Sobolev, since $f,g$ are, and $\int|\nabla(\max)|^p$ can be estimated via convexity in terms of $\int|\nabla f|^p$ and $\int|\nabla g|^p$.
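For instance, since the sets $\{f\geq g\}$ and $\{f<g\}$ are disjoint, the proposed formula gives the pointwise bound
$$|\nabla\max\{f,g\}|^p=|\nabla f|^p\, I_{\{f\geq g\}}+|\nabla g|^p\, I_{\{f<g\}}\leq|\nabla f|^p+|\nabla g|^p\quad\text{a.e.},$$
which integrates to $\int|\nabla\max\{f,g\}|^p\leq\int|\nabla f|^p+\int|\nabla g|^p$.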
If $f, g$ are in $W^{1,p}$, it is in general not true that $g\, I_{\{f\ge g\}}$ is in $W^{1,p}$. Think of it this way: if $p > n$, the Sobolev imbedding theorem implies that every function in $W^{1,p}$ has a continuous representative. But one can find $f, g$ such that $g\, I_{\{f\ge g\}}$ has no continuous representative and is thus not in $W^{1,p}$ (for example, $f(x) = x$ and $g(x) = 1$ on $\Omega = (0,2)$).
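Concretely, with that choice (and any $p>n=1$),
$$g\, I_{\{f\geq g\}}=I_{\{x\geq 1\}}\quad\text{on }(0,2),$$
which has a jump at $x=1$, hence admits no continuous representative and so cannot belong to $W^{1,p}(0,2)$.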
Instead, using
$$\max \{ f, g\} = \frac 12(f+g+ |f-g|),$$
it suffices to show that $|f-g|$ is weakly differentiable. But this follows from Lemma 7.5 and Lemma 7.6 in Gilbarg and Trudinger, Elliptic Partial Differential Equations of Second Order (if you do not have access to the book, I can provide a proof here).
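For completeness, a sketch of how the weak gradient formula asked for above would then follow. Besides the identity, one uses the standard facts that $\nabla|w|=\nabla w$ a.e. on $\{w>0\}$, $\nabla|w|=-\nabla w$ a.e. on $\{w<0\}$, $\nabla|w|=0$ a.e. on $\{w=0\}$, and that $\nabla w=0$ (i.e. $\nabla f=\nabla g$) a.e. on the level set $\{w=0\}$ (these are also in that chapter of Gilbarg and Trudinger, if I recall the numbering correctly). Writing $w:=f-g$,
$$\nabla\max\{f,g\}=\tfrac12\left(\nabla f+\nabla g+\nabla|w|\right)=
\begin{cases}
\nabla f & \text{a.e. on }\{f>g\},\\
\tfrac12(\nabla f+\nabla g)=\nabla f & \text{a.e. on }\{f=g\},\\
\nabla g & \text{a.e. on }\{f<g\},
\end{cases}$$
which is precisely $\nabla f\cdot I_{\{f\geq g\}}+\nabla g\cdot I_{\{f<g\}}$ a.e.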