I know that in general this is true because the smaller model is nested within the larger model, so the larger model must have SSE at least as low as the smaller one, but I'm having a hard time proving this mathematically.
I know that $$R^{2} = \frac{\mathbf{{\hat{\boldsymbol\beta '} X ' y}} - n \bar{\mathbf{y}}^{2}}{\mathbf{y'y} - n\bar{\mathbf{y}}^{2}}$$
So my idea was to show that $\mathbf{{\hat{\boldsymbol\beta '} X ' y}}$ is non-decreasing in $p$ (assuming $\mathbf{X}$ is $n\times p$), since the rest of the equation is constant with respect to $\mathbf{X}$. I've gotten as far as $$\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{p} \hat{\beta}_j \, x_{ij} \, y_{i},$$ which is where I'm stuck.
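For what it's worth, the claim is easy to check numerically. A small numpy sketch (simulated data, my own variable names) that computes $R^2$ from the formula above for a model with and without an extra column:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

def fit_term(X, y):
    """Return beta_hat' X' y for the OLS fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return beta @ X.T @ y

ones = np.ones(n)                            # intercept column (the formula assumes one)
X_small = np.column_stack([ones, x1])
X_large = np.column_stack([ones, x1, x2])    # same model plus one extra regressor

t_small = fit_term(X_small, y)
t_large = fit_term(X_large, y)

ybar = y.mean()
r2_small = (t_small - n * ybar**2) / (y @ y - n * ybar**2)
r2_large = (t_large - n * ybar**2) / (y @ y - n * ybar**2)
```

Running this, `t_large` is at least `t_small`, so `r2_large` is at least `r2_small`, matching the intuition about nested models.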
I think it's best to use matrix notation to answer this. Suppose the original OLS fit is $$y=X\beta+u \tag{1}$$ where $X$ is an $n\times p$ matrix of regressors, $\beta$ is the $p\times 1$ vector of coefficients obtained by OLS and $u$ is the $n\times 1$ vector of residuals. Now we add a new regressor $X_0$ (an $n\times 1$ column) and perform a new OLS estimate $$y=X_0\hat{\beta}_0+X\hat{\beta}+v \tag{2}.$$ Since $\beta$, $\hat{\beta}_0$ and $\hat{\beta}$ are all OLS estimates, the residuals are orthogonal to their regressors, so $u^TX=v^TX_0=v^TX=0$.
Combine $(1)$ and $(2)$ to get $$X\beta+u=X_0\hat{\beta}_0+X\hat{\beta}+v \tag{3}.$$ If we multiply both sides of $(3)$ by $u^T$ and use $u^TX=0$, we have $$u^Tu=u^TX_0\hat{\beta}_0+u^Tv.$$ Similarly, multiplying by $v^T$ and using $v^TX=v^TX_0=0$ gives $$v^Tu=v^Tv.$$ Since $u^Tv$ is a scalar, $u^Tv=v^Tu=v^Tv$, and substituting into the first equation tells us that $$u^TX_0\hat{\beta}_0=u^Tu-v^Tv \tag{4}.$$
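The orthogonality conditions and identity $(4)$ can be checked numerically; here's a quick numpy sketch on simulated data (names mirror the derivation, data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 3
X = rng.normal(size=(n, p))
X0 = rng.normal(size=(n, 1))                 # the added regressor
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]  # small-model OLS coefficients
u = y - X @ beta                             # small-model residuals

Xa = np.hstack([X0, X])                      # augmented design [X0, X]
coef = np.linalg.lstsq(Xa, y, rcond=None)[0]
beta0_hat = coef[0]
v = y - Xa @ coef                            # augmented-model residuals

lhs = (u @ X0[:, 0]) * beta0_hat             # u' X0 beta0_hat
rhs = u @ u - v @ v                          # u'u - v'v, identity (4)
```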
Finally,
$$\begin{align*} v^Tv&=(y-X_0\hat{\beta}_0-X\hat{\beta})^T(y-X_0\hat{\beta}_0-X\hat{\beta}) \\ &=(X\beta+u-X_0\hat{\beta}_0-X\hat{\beta})^T(X\beta+u-X_0\hat{\beta}_0-X\hat{\beta})\qquad\text{from $(1)$} \\ &=(X(\beta-\hat{\beta})-X_0\hat{\beta}_0+u)^T(X(\beta-\hat{\beta})-X_0\hat{\beta}_0+u) \\ &=(X(\beta-\hat{\beta})-X_0\hat{\beta}_0)^T(X(\beta-\hat{\beta})-X_0\hat{\beta}_0)+2u^T(X(\beta-\hat{\beta})-X_0\hat{\beta}_0)+u^Tu \\ &=(X(\beta-\hat{\beta})-X_0\hat{\beta}_0)^T(X(\beta-\hat{\beta})-X_0\hat{\beta}_0)-2u^TX_0\hat{\beta}_0+u^Tu \\ &=(X(\beta-\hat{\beta})-X_0\hat{\beta}_0)^T(X(\beta-\hat{\beta})-X_0\hat{\beta}_0)-2(u^Tu-v^Tv)+u^Tu \qquad\text{using $(4)$}\\ &=(X(\beta-\hat{\beta})-X_0\hat{\beta}_0)^T(X(\beta-\hat{\beta})-X_0\hat{\beta}_0)-u^Tu+2v^Tv \\ u^Tu&=(X(\beta-\hat{\beta})-X_0\hat{\beta}_0)^T(X(\beta-\hat{\beta})-X_0\hat{\beta}_0)+v^Tv \\ &\geq v^Tv. \end{align*}$$
This says that the new SSE, $v^Tv$, is no greater than the original SSE, $u^Tu$. Since the total sum of squares $\mathbf{y'y}-n\bar{\mathbf{y}}^2$ is unchanged by adding a regressor, it follows that $R^2$ must increase (weakly).
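The final identity $u^Tu=\|X(\beta-\hat{\beta})-X_0\hat{\beta}_0\|^2+v^Tv$ can also be verified on simulated data; a minimal numpy sketch (all names and data here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 60, 2
X = rng.normal(size=(n, p))
X0 = rng.normal(size=(n, 1))
y = X @ np.array([0.7, -1.2]) + rng.normal(size=n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
u = y - X @ beta                              # original residuals

Xa = np.hstack([X0, X])
coef = np.linalg.lstsq(Xa, y, rcond=None)[0]
beta0_hat, beta_hat = coef[0], coef[1:]
v = y - Xa @ coef                             # new residuals

# w is the vector whose squared norm appears in the final identity
w = X @ (beta - beta_hat) - X0[:, 0] * beta0_hat
sse_gap = u @ u - (w @ w + v @ v)             # should be ~0 up to rounding
```

Since $w^Tw\geq 0$, the gap between the two SSEs is exactly the squared norm term, which is why $v^Tv\leq u^Tu$ always holds.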