How do I learn to stop worrying and love the substitution $y'' = y' (dy'/dy)$


The following is a solution of the differential equation $y'' = y$ with initial values $y(0) = 3$, $y'(0) = 1$. Considering $y$ to be a function of $x$ and omitting some standard details:

Let $z = y'$. Then $y'' = \frac{dz}{dx} = \frac{dz}{dy} \frac{dy}{dx} = z \frac{dz}{dy}$. Substituting, $z\frac{dz}{dy} = y$ is separable with initial value $y = 3$, $z = 1$, so has the unique solution $z = \sqrt{y^2 - 8}$. Now the equation $y' = \sqrt{y^2 - 8}$ with initial value $y(0) = 3$ has solution $y = 2e^x + e^{-x}$, which is also a solution to the original problem.
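A quick numerical sanity check of the claimed solution (Python, standard library only; the helper names are mine):

```python
import math

def y(x):
    # Candidate solution y = 2e^x + e^{-x}
    return 2 * math.exp(x) + math.exp(-x)

def yp(x):
    # Its derivative y' = 2e^x - e^{-x}
    return 2 * math.exp(x) - math.exp(-x)

# Initial conditions y(0) = 3, y'(0) = 1
assert abs(y(0) - 3) < 1e-12 and abs(yp(0) - 1) < 1e-12

# Note: y' > 0 only for x > -ln(2)/2, so the positive branch of the
# square root is valid only locally, as the question points out.
for x in [-0.3, 0.0, 0.5, 1.0, 2.0]:
    # y'' = y: for this y, y'' equals y exactly (both are 2e^x + e^{-x})
    ypp = 2 * math.exp(x) + math.exp(-x)
    assert abs(ypp - y(x)) < 1e-9
    # The intermediate first-order equation y' = sqrt(y^2 - 8)
    assert abs(yp(x) - math.sqrt(y(x) ** 2 - 8)) < 1e-9
```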

While this does produce the correct answer, the step using $\frac{dz}{dy}$ is unjustified. One way to fill this gap is to note that since $y'(0) \ne 0$, the inverse function theorem says that locally $x$, and therefore $y'$, is in fact a (smooth) function of $y$. This is somewhat implicitly used in picking the positive branch of the square root $\sqrt{y^2 - 8}$.

But I do not know how to justify the definition of $z$ as a function of $y$, and therefore of $\frac{dz}{dy}$, if you change the initial values to something like $y(0) = 2$, $y'(0) = 0$. Here $y$ is not even locally an invertible function of $x$ (the actual solution has a minimum at $x = 0$), and in any case such invertibility is not known a priori. If you just blindly work through the integration without worrying you still get the solution $y(x) = e^x + e^{-x}$, no matter which branch of the square root $\pm \sqrt{y^2 - 4}$ you pick (and in fact $y'$ changes sign at $x = 0$, so neither branch is valid even locally).
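For what it's worth, a small check (Python; names mine) that $y = e^x + e^{-x}$ satisfies $y'^2 = y^2 - 4$ everywhere while $y'$ changes sign at $0$, so neither fixed branch of $\pm\sqrt{y^2 - 4}$ holds on a neighbourhood of $x = 0$:

```python
import math

def y(x):
    return math.exp(x) + math.exp(-x)   # = 2 cosh x

def yp(x):
    return math.exp(x) - math.exp(-x)   # = 2 sinh x

# Initial conditions y(0) = 2, y'(0) = 0
assert abs(y(0) - 2) < 1e-12 and abs(yp(0)) < 1e-12

for x in [-1.0, -0.1, 0.1, 1.0]:
    # The relation y'^2 = y^2 - 4 holds everywhere...
    assert abs(yp(x) ** 2 - (y(x) ** 2 - 4)) < 1e-9

# ...but y' changes sign at 0, so neither y' = +sqrt(y^2 - 4)
# nor y' = -sqrt(y^2 - 4) is valid on a neighbourhood of x = 0.
assert yp(-0.1) < 0 < yp(0.1)
```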

Is there a way to make this work rigorously? Or are there similar equations and initial values where this substitution misses a solution? I'm not so worried about it introducing extraneous solutions, although I do not have an example of that either.

I originally learnt this solution in high school from a physics-y book that did not justify the differentiability, as is usual in high school level physics-y books. Actually in that book the original solution was for the simple harmonic motion equation $y'' = -\omega^2 y$ but, as I noticed later, the same substitution seems to help with some other second order ODEs too.

There are 4 answers below.

BEST ANSWER

Your manipulation ultimately ends up being the ODE $yy' = y' y''$ in disguise, which is trivially satisfied by any solution to the original ODE. Separating variables on the equivalent (for nonzero $y$) $y' = \frac{y' y''}{y}$ then results in solving $y' = (y' y'') \frac{1}{y}$, where $y'y''$ is the term you'd integrate over $x$ and $\frac{1}{y}$ the one to integrate over $y$. So using the "less problematic" version of separation of variables you just integrate $y y' = y' y''$ over $x$: $$ \textbf{LHS:} \quad \int y y' \, dx = \int y \, dy = \frac{y^2}{2} \\ \textbf{RHS:} \quad \int y' y'' \, dx = \int y' \, dy' = \frac{y'^2}{2} + C. $$ Together we thus get the new ODE $y^2 = y'^2 + C$ (absorbing the factor of $2$ into $C$), which we can easily solve; you can find that $C = 8$ by plugging in the initial conditions.
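A small sketch (Python; the helper `solution` is mine) checking that $y^2 - y'^2$ is indeed constant along the general solution $y = A e^x + B e^{-x}$, where algebraically it equals $4AB$:

```python
import math

def solution(A, B):
    """General solution y = A e^x + B e^{-x} of y'' = y, with its derivative."""
    y  = lambda x: A * math.exp(x) + B * math.exp(-x)
    yp = lambda x: A * math.exp(x) - B * math.exp(-x)
    return y, yp

# y^2 - y'^2 is constant along every solution; expanding the squares,
# the e^{2x} and e^{-2x} terms cancel and the constant is 4AB.
for A, B in [(2, 1), (1, 1), (3, -2)]:
    y, yp = solution(A, B)
    for x in [-2.0, 0.0, 1.5]:
        assert abs((y(x) ** 2 - yp(x) ** 2) - 4 * A * B) < 1e-8

# The initial values y(0) = 3, y'(0) = 1 give C = 9 - 1 = 8 (i.e. A = 2, B = 1).
y, yp = solution(2, 1)
assert abs(y(0) ** 2 - yp(0) ** 2 - 8) < 1e-12
```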

So basically you can avoid the whole substitution by first working things through with it, and then just eliminating it in a way that avoids all potentially undefined or problematic quantities.

ANSWER

If you are worried about the fact that $y'(0) = 0$, just remember that the solution is differentiable for all real $x$, so the relevant limits exist and converge. If you work through the computation using basic calculus or the limit definition of the derivative, you will get the same result.

ANSWER

A lot of intro differential equations texts just do nonsense or make large assumptions in order to get the answer. It's oftentimes much easier to do something non-rigorous to get the solutions and then use standard uniqueness/existence theorems (i.e. checking for singularities) on them to prove they're the full set of solutions, rather than maintain rigor throughout the initial calculation, keeping track of every time you assume a derivative is nonzero and where you've used various higher level theorems to justify it.

ANSWER

One way to think about this is to view the introduction of $z=y'$ as switching from considering a $1$-dimensional initial value problem (of second order) to a $2$-dimensional initial value problem (of first order). In other words, I can consider a vector-valued first order problem, where $\mathbf x(t) = (x_1(t),x_2(t))$ satisfies $$ \mathbf x'(t) = \left(\begin{array}{c} x_1'(t)\\ x_2'(t) \end{array}\right) = \left(\begin{array}{cc} 0 & 1\\ 1 & 0\end{array}\right) \left(\begin{array}{c} x_1(t)\\ x_2(t) \end{array}\right) $$ Since $x_1'(t)=x_2$ and $x_2'(t)=x_1$, the first component of any solution of this vector-valued ODE will satisfy $x_1''(t)=x_1$. Conversely any solution $y(t)$ to the original ODE will yield a solution of the vector-valued ODE, namely $\mathbf x(t) =(y(t),y'(t))^T$.
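A minimal sketch of solving the vector-valued problem numerically (plain Python with a classical RK4 step; all names are mine), checked against the exact solution $y = 2e^t + e^{-t}$:

```python
import math

def rk4_system(f, x0, t_end, n):
    """Classical fourth-order Runge-Kutta for the autonomous system x' = f(x),
    integrated from t = 0 to t_end in n equal steps."""
    h = t_end / n
    x = list(x0)
    for _ in range(n):
        k1 = f(x)
        k2 = f([x[i] + 0.5 * h * k1[i] for i in range(2)])
        k3 = f([x[i] + 0.5 * h * k2[i] for i in range(2)])
        k4 = f([x[i] + h * k3[i] for i in range(2)])
        x = [x[i] + h * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) / 6
             for i in range(2)]
    return x

# x' = A x with A = [[0, 1], [1, 0]]: x1' = x2, x2' = x1.
F = lambda x: [x[1], x[0]]

# Initial condition (y(0), y'(0)) = (3, 1); exact solution y = 2e^t + e^{-t}.
x1, x2 = rk4_system(F, [3.0, 1.0], 1.0, 1000)
assert abs(x1 - (2 * math.exp(1) + math.exp(-1))) < 1e-6
assert abs(x2 - (2 * math.exp(1) - math.exp(-1))) < 1e-6
```

Note that the initial condition $y'(0)=0$ would be handled by exactly the same code, with no special cases.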

Viewing things through the first-order vector-valued ODE, you don't have to take any special consideration into account when $y'(0)=0$; it is just like any other vector-valued initial condition.

By transposing the problem to a vector-valued setting, the initial value problem becomes one of the form $\mathbf x'(t) = F(\mathbf x(t))$, where $F\colon \mathbb R^2\to \mathbb R^2$. In the example above, $F$ is the linear map $F(\mathbf x) = (x_2,x_1)^T$. You can apply the chain rule etc. to $F(\mathbf x(t))$ without having to think of one coordinate as a function of the other (which, as you point out, you can do using the implicit function theorem most, but not all, of the time along a solution curve; moving to the vector-valued problem removes the need to).

The separation of variables technique in the vector-valued problem is then just noting that as $x_1'(t)=x_2$ and $x_2'(t)=x_1$, we have $x_1x_1' = x_2x_2'$, or $\frac{d}{dt}(x_1^2-x_2^{2})=0$, and hence $x_1^2=C+x_2^2$.

Indeed, you can go a little further and see what underlies the separation of variables technique here: the function $g(\mathbf x)=x_1^2-x_2^2$ has the property that $Dg(\mathbf x) = (2x_1,-2x_2)$, so that $Dg(\mathbf x)(F(\mathbf x)) =0$, that is, $x_2\partial_1 g + x_1\partial_2 g =0$. Now rather than thinking of $Dg$ acting on $F$, you can associate to $F$ the vector field $\vartheta =\vartheta_F = x_2\partial_1 + x_1 \partial_2$; then $\vartheta(g)=0$, so the solution curves $\mathbf x(t)$ will lie on the level sets of $g$, as $\frac{d}{dt}g(\mathbf x(t)) = \vartheta(g)(\mathbf x(t))=0$. In general if $F= (f_1,f_2)^T$ then $\vartheta_F = f_1\partial_1+f_2\partial_2$, and solutions to $\mathbf x'(t) = F(\mathbf x)$ will lie on the level sets of functions $g$ that satisfy $\vartheta_F(g)=0$. If you prefer to think in terms of gradients, then $\vartheta_F(g) = \langle F, \nabla g\rangle$, where $\langle \mathbf x,\mathbf y\rangle = \sum_{i=1}^2 x_iy_i$ is the standard dot product.

Finally, the vector-valued formulation also lets you completely solve constant coefficient ODEs, or at least reduce the problem to the classification of matrices up to conjugation. Since generically a matrix is diagonalizable, one can, generically at least, replace an $n$-th order constant coefficient ODE with $n$ independent $1$st order problems, which are then trivial to solve. In the case of $y''=y$ above, if I let $A = \left(\begin{array}{cc} 0 & 1 \\ 1 & 0 \end{array}\right)$, so that the differential equation becomes $\mathbf x'(t) = A\mathbf x(t)$, I can solve this with the matrix exponential, so the solution formally looks like the $y'=ay$ case: $$ \mathbf x(t) = \exp(tA)\mathbf x(0) \quad \text{ where } \exp(tA) = \sum_{n=0}^\infty \frac{t^nA^n}{n!}. $$
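A sketch (Python, standard library only; names mine) of the truncated exponential series for this particular $A$, checked against the closed form $\exp(tA) = \begin{pmatrix} \cosh t & \sinh t \\ \sinh t & \cosh t\end{pmatrix}$:

```python
import math

def mat_mul(A, B):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_exp(A, terms=30):
    """exp(A) by truncating the power series; term is updated as term*A/n,
    so after n steps it holds A^n/n!. Adequate for small matrices."""
    result = [[1.0, 0.0], [0.0, 1.0]]   # identity = zeroth term
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = mat_mul(term, A)
        term = [[term[i][j] / n for j in range(2)] for i in range(2)]
        result = [[result[i][j] + term[i][j] for j in range(2)]
                  for i in range(2)]
    return result

# For A = [[0,1],[1,0]], exp(tA) = [[cosh t, sinh t], [sinh t, cosh t]].
t = 1.0
E = mat_exp([[0.0, t], [t, 0.0]])
assert abs(E[0][0] - math.cosh(t)) < 1e-12
assert abs(E[0][1] - math.sinh(t)) < 1e-12

# x(t) = exp(tA) x(0) with x(0) = (3, 1) reproduces y(1) = 2e + 1/e.
x0 = [3.0, 1.0]
y1 = E[0][0] * x0[0] + E[0][1] * x0[1]
assert abs(y1 - (2 * math.exp(1) + math.exp(-1))) < 1e-10
```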

Viewing $\text{Mat}_n(\mathbb R)$ as an inner product space via $A\cdot B = \sum_{i,j} a_{ij}b_{ij} = \text{tr}(A^tB)$, we have $\|A\|:=(\text{tr}(A^tA))^{1/2}$ and $\|AB\|\leq \|A\|\,\|B\|$, so the series $\exp(tA)$ is absolutely convergent for all $A\in \text{Mat}_n(\mathbb R)$ and $t \in \mathbb R$. Similarly, it is easy to see that $\frac{d}{dt}\exp(tA) = A\exp(tA)$, so that $\exp(tA)\mathbf x(0)$ does indeed solve the differential equation.

Moreover, since $A$ is in this case symmetric and so diagonalizable, there is a basis $\{v_1,v_2\}$ of $\mathbb R^2$ consisting of eigenvectors of $A$ -- explicitly in this case we can take the basis $\{v_1,v_2\}$ where $v_1 = (e_1+e_2)/\sqrt{2}$ and $v_2 = (e_1-e_2)/\sqrt{2}$ which have eigenvalues $1$ and $-1$ respectively, so that $A = PDP^{-1}$ where $$ P= (\mathbf v_1 |\mathbf v_2) = \frac{1}{\sqrt{2}}\left(\begin{array}{cc} 1 & 1\\ 1 & -1 \end{array}\right), \quad \text{and } D= \left(\begin{array}{cc} 1 & 0\\ 0 & -1 \end{array}\right) $$ If $\mathbf z = P^{-1}\mathbf x$, then as $P$ is orthogonal, $P^{-1}=P^T$, which is equal to $P$, so that $$ \mathbf z(t) = \left(\begin{array}{c} z_1(t) \\ z_2(t) \end{array}\right) = \frac{1}{\sqrt{2}}\left(\begin{array}{cc} 1 & 1\\ 1 & -1 \end{array}\right)\mathbf x = \frac{1}{\sqrt{2}}\left(\begin{array}{c} x_1(t)+x_2(t) \\ x_1(t)-x_2(t) \end{array}\right) $$ then $\mathbf z'(t) = D\mathbf z(t)$, that is, $z_1'(t)=z_1(t)$ and $z_2'(t) = -z_2(t)$. Thus the change of coordinates "decouples" the vector-valued system into two 1-dimensional first order equations. It is not helpful to think of whether $z_2$ can/should be viewed as a function of $z_1$ here, and again the vector-valued problem doesn't care, but some textbooks will say incomprehensible things about replacing $y, y'$ with "independent functions" $u=y+y'$ and $v=y-y'$.
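A small check (Python; names mine) that the decoupled coordinates built from the solution $y = 2e^t + e^{-t}$ really do satisfy $z_1' = z_1$ and $z_2' = -z_2$, using a central finite difference for the derivatives:

```python
import math

s = math.sqrt(2)

# For y = 2e^t + e^{-t}, y' = 2e^t - e^{-t}, so:
#   z1 = (y + y')/sqrt(2) = 4e^t / sqrt(2)
#   z2 = (y - y')/sqrt(2) = 2e^{-t} / sqrt(2)
z1 = lambda t: (4 * math.exp(t)) / s
z2 = lambda t: (2 * math.exp(-t)) / s

# z1' = z1 and z2' = -z2, checked with a central finite difference.
h = 1e-6
for t in [-1.0, 0.0, 0.7]:
    d1 = (z1(t + h) - z1(t - h)) / (2 * h)
    d2 = (z2(t + h) - z2(t - h)) / (2 * h)
    assert abs(d1 - z1(t)) < 1e-6
    assert abs(d2 + z2(t)) < 1e-6
```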