I'm reading theorem 7.27 of Gilbarg and Trudinger's "Elliptic Partial Differential Equations of Second Order"
It states: let $u\in W^{k,p}_0(\Omega)$; then for any $\epsilon>0$ and any multi-index $\beta$ with $0<|\beta|<k$, \begin{equation} \|D^{\beta}u\|_{L^p(\Omega)}\leq\epsilon\|u\|_{W^{k,p}(\Omega)}+C(k)\epsilon^{\frac{|\beta|}{|\beta|-k}}\|u\|_{L^p(\Omega)}. \end{equation}
In the book, the authors prove this inequality only for $|\beta|=1,\ k=2$, and say the general case can be proved by induction. I'm stuck finding a suitable induction argument. Can anyone tell me how to prove the general case by induction? Any help would be appreciated!
I was not able to produce an inductive proof, so instead I propose to simplify the problem using an arbitrage-related trick, and then consider a direct approach. Since it could be useful to you on other occasions, I think it's worth outlining.
What Gilbarg-Trudinger actually prove. The proof given there actually yields something a little stronger: $\newcommand{\eps}{\varepsilon}$ \begin{equation} \label{eqB} \tag{B} \|D u\|_{L^p(\Omega)} \leq \eps \| D^2 u\|_{L^p(\Omega)} + C \eps^{-1} \| u \|_{L^p(\Omega)} \quad \text{for } \eps > 0, \end{equation} so it makes sense to aim for the stronger formulation \begin{equation} \label{eqC} \tag{C} \|D^s u\|_{L^p(\Omega)} \leq \eps \| D^k u\|_{L^p(\Omega)} + C(k) \eps^{-\frac{s}{k-s}} \| u \|_{L^p(\Omega)} \quad \text{for } \eps > 0, \ 0 < s < k, \end{equation} and this is what we'll do. (Here $D^s u$ stands for an arbitrary derivative $D^\beta u$ of order $|\beta| = s$.)
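Before going further, a quick numerical sanity check of \eqref{eqB} (my own illustration, not from the book). For $p = 2$, integration by parts and Cauchy-Schwarz give $\|u'\|^2 = -\int u\, u'' \le \|u\|\,\|u''\|$ for $u \in W^{2,2}_0$, so \eqref{eqB} holds with $C = 1$; the snippet below verifies the $\eps$-form for the test function $u(x) = x^2(\pi-x)^2 \in W^{2,2}_0(0,\pi)$ (my choice of example).

```python
import numpy as np

# Numerical check of (B) for p = 2 on Omega = (0, pi), using the test
# function u(x) = x^2 (pi - x)^2, which lies in W^{2,2}_0(0, pi).
# For p = 2, integration by parts + Cauchy-Schwarz give
# ||u'||^2 <= ||u|| ||u''||, so (B) holds with C = 1.
x = np.linspace(0.0, np.pi, 10001)
dx = x[1] - x[0]

u = x**2 * (np.pi - x)**2
du = 2 * x * (np.pi - x) * (np.pi - 2 * x)          # u'
d2u = 12 * x**2 - 12 * np.pi * x + 2 * np.pi**2     # u''

def l2norm(f):
    # L^2 norm via a Riemann sum -- plenty accurate for a sanity check.
    return np.sqrt(np.sum(f**2) * dx)

# Multiplicative form: ||u'||^2 <= ||u|| ||u''||.
assert l2norm(du)**2 <= l2norm(u) * l2norm(d2u) * (1 + 1e-9)
# Additive (eps) form of (B) with C = 1, for a range of eps.
for eps in (0.01, 0.1, 1.0, 10.0):
    assert l2norm(du) <= eps * l2norm(d2u) + l2norm(u) / eps + 1e-9
print("(B) verified numerically with C = 1")
```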
Multiplicative formulation. Let's denote $\|D^s u\|_{L^p(\Omega)}$ by $a_s$, so that the base case \eqref{eqB} tells us $a_s \le \eps a_{s+1} + C \eps^{-1} a_{s-1}$ for $0 < s < k$ (and the claim \eqref{eqC} can be rephrased similarly). The freedom of choosing $\eps$ in all of these inequalities is useful, but somewhat misleading: there is always some best choice of $\eps$. In fact, choosing $\eps = \sqrt{a_{s-1}/a_{s+1}}$ leads to $a_s \le C a_{s-1}^{1/2} a_{s+1}^{1/2}$ (possibly with a different constant $C$). In the other direction, one could use the inequality $2xy \le x^2+y^2$ with the Peter-Paul trick (replace $x$ with $\eps x$ and $y$ with $y/\eps$, then rename $\eps^2$ to $\eps$) to recover $a_s \le \eps a_{s+1} + C\eps^{-1} a_{s-1}$. Thus, \begin{equation} \tag{B} a_s \le C a_{s-1}^{1/2} a_{s+1}^{1/2} \quad \text{for } 0 < s < k \end{equation} is equivalent to \eqref{eqB}. Similarly, the claim \eqref{eqC}, i.e. $a_s \le \eps a_{k} + C(k) \eps^{-\frac{s}{k-s}} a_{0}$, is equivalent to \begin{equation} \tag{C} a_s \le C(k) a_{0}^{(k-s)/k} a_{k}^{s/k} \quad \text{for } 0 < s < k \end{equation} by optimizing over $\eps$ in one direction and using the Young-Peter-Paul inequality in the other (it's even mentioned on Wikipedia).
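For concreteness, here is the equivalence for \eqref{eqC} spelled out (a standard computation, included for completeness). Given $a_s \le \eps a_k + C(k)\, \eps^{-\frac{s}{k-s}} a_0$ for all $\eps > 0$, the choice $\eps = (a_0/a_k)^{(k-s)/k}$ (assuming $a_k \neq 0$) makes the two terms on the right equal, \begin{equation*} \eps\, a_k = \eps^{-\frac{s}{k-s}} a_0 = a_0^{(k-s)/k} a_k^{s/k}, \end{equation*} so $a_s \le (1 + C(k))\, a_0^{(k-s)/k} a_k^{s/k}$. Conversely, the weighted AM-GM inequality $x^{\theta} y^{1-\theta} \le \theta x + (1-\theta) y$ with $\theta = s/k$, $x = \eps\, a_k$, $y = \eps^{-\frac{s}{k-s}} a_0$ gives \begin{equation*} a_0^{(k-s)/k} a_k^{s/k} = (\eps\, a_k)^{s/k} \bigl(\eps^{-\frac{s}{k-s}} a_0\bigr)^{(k-s)/k} \le \eps\, a_k + \eps^{-\frac{s}{k-s}} a_0. \end{equation*}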
Proof. I think it's much easier without all the $\eps$'s floating around, and there are probably many ways to do it. Here's one.
It's maybe easier to think about an arithmetic mean rather than a geometric one, so let's introduce $b_s := \log a_s$ (assuming each $a_s > 0$). The base case and the claim now translate to \begin{align} b_s & \le \frac 12 b_{s-1} + \frac 12 b_{s+1} + D \tag{B} \\ b_s & \le \frac{k-s}{k} b_0 + \frac{s}{k} b_k + D(k) \tag{C}. \end{align} If it weren't for the constant $D$, it would be clear that \eqref{eqB} implies \eqref{eqC}, as the sequence would have to be convex (we'll return to this point). So let's introduce $c_s := b_s + D s^2$ and check that it satisfies $c_s \le \frac 12 c_{s-1} + \frac 12 c_{s+1}$ without any additional constant (this choice can be found by taking an ansatz $c_s := b_s + E s^2$ and looking for the right $E$). It now follows that $c_s \le \frac{k-s}{k} c_0 + \frac{s}{k} c_k$, which implies the final claim \eqref{eqC}. To see this, one can assume without loss of generality that $c_0 = c_k = 0$ (by adding a linear function) and see what happens if the maximum of the $c_s$ is strictly positive (spoiler: the maximum would have to be attained at neighboring indices as well, and so on).
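For completeness, the computation behind the choice $c_s := b_s + D s^2$: since $\frac{(s-1)^2 + (s+1)^2}{2} = s^2 + 1$, \begin{equation*} c_s - \tfrac12\bigl(c_{s-1} + c_{s+1}\bigr) = \Bigl(b_s - \tfrac12 b_{s-1} - \tfrac12 b_{s+1}\Bigr) - D \le D - D = 0, \end{equation*} where the last inequality is exactly the log form of the base case.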
Side comments. Let me mention where these tricks come from. First, I already mentioned arbitrage. Second, if one has a function $f$ with $f''(x) \ge -D$, then the function $g(x) := f(x) + \frac 12 D x^2$ has $g''(x) \ge 0$, so it's convex, which can be useful; a similar thing happened with $b$ and $c$. Third, there's a theorem called the maximum principle, which in one variable reads as follows: if $g''(x) \ge 0$, then $g$ achieves its maximum on the boundary of the domain. Its proof, when translated to the discrete case, gives the final argument used for $c$.
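As a small numerical illustration of that last point (my own toy example, not part of the proof): a midpoint-convex sequence always lies below the chord through its endpoints, which is exactly the fact used for $c$ above.

```python
import numpy as np

# Toy illustration of the discrete maximum principle: a midpoint-convex
# sequence (c_s <= (c_{s-1} + c_{s+1}) / 2) lies below the chord through
# its endpoints, i.e. c_s <= ((k-s)/k) c_0 + (s/k) c_k.
rng = np.random.default_rng(0)
k = 10

# Build a midpoint-convex sequence by cumulatively summing nondecreasing
# increments, so the second differences are nonnegative by construction.
increments = np.sort(rng.normal(size=k))
c = np.concatenate(([0.0], np.cumsum(increments)))   # c_0, ..., c_k

s = np.arange(k + 1)
chord = (1 - s / k) * c[0] + (s / k) * c[k]

assert np.all(np.diff(c, 2) >= -1e-12)   # midpoint convexity
assert np.all(c <= chord + 1e-12)        # chord bound (maximum principle)
print("midpoint-convex sequence stays below its endpoint chord")
```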