In this question, I want to investigate the detailed derivation of the Strum-Liouville self-adjoint operator $$Lu = - \frac{1}{w(x)} \left\{ \frac{d}{dx} \left[p(x) \frac{d u}{dx} \right] + q(x) u\right\}$$ and some problems I find along the way.
1. Setting up the playground
Having in mind applications in physics, we consider a Hilbert space of complex-valued, square integrable functions defined over the interval $[a,b] \in \mathbb{R}$. The inner product is sesquilinear with the real positive weight function $w(x) > 0$, $$\langle u, v\rangle = \int_a^b \bar{u}(x)v(x) w(x) dx.$$ In other words, the Hilbert space is just $L^2([a,b], w(x)dx)$.
Now we seek the most general Linear Differential Self-Adjoint Operators (LDSAO) acting on this space.
2. Zeroth order LDSAO
The most general zeroth order linear differential operator $D_0$ is just multiplication by a complex function $c_0(x)$, $$D_0 = c_0(x).$$ Under what conditions is this operator self-adjoint on our Hilbert space? The adjoint operator $D_0^\dagger$, defined as $$\langle u, D_0 v\rangle = \langle D_0^\dagger u, v \rangle,$$ can immediately seen to be $$D_0^\dagger = \bar{c}_0(x).$$ Therefore, in order to have a self-adjoint operator $D_0 = D_0^\dagger$, the function $c_0(x)$ must be real, and not complex. Without losing generality, we can write $$c_0(x) = \bar{c}_0(x) = \frac{r_0(x)}{w(x)},$$ where $r_0(x)$ is some real function which can further be taken to be positive. Therefore, this is the most general zeroth order LDSAO $$\boxed{D_0 = \frac{r_0(x)}{w(x)}}$$
In the following, we will assume that $c_i(x)/r_i(x)$ are complex/real functions without explicitly mentioning it and suppress the $x$ dependence everywhere.
3. First order LDSAO
The most general first order linear differential operator is $$D_1 = c_1 \frac{d}{dx} + c_0. $$ Its adjoint operator reads $$D_1^\dagger = - \bar{c}_1 \frac{d}{d x} - \frac{(\bar{c}_1 w)'}{w} + \bar{c}_0.$$ Additionally, the boundary condition $$\left. c_1 w \bar{u} v\right|_a^b = 0$$ has to be imposed. In order that $D_1$ be self-adjoint, we have to have $c_1 = - \bar{c}_1$ and $c_0 = \bar{c}_0 - (\bar{c}_1 w)'/w$. There conditions enable us to write, $$c_1 = \pm i \frac{r_1^2}{w}$$ and $$c_0 = \frac{r_0 \pm i r_1 r_1'}{w}$$ for some $r_0$ and $r_1$. Then, we can write the most general first order LDSAO $$\boxed{D_1 = \frac{1}{w} \left[\pm i r_1 \frac{d}{dx}\left(r_1 \ \cdot \ \right) + r_0 \right]}$$ The boundary conditions can be written in terms of new functions as $$\boxed{\left. r_1^2 \bar{u} v\right|_a^b = 0}$$
4. Second order LDSAO
Following the analogous procedure as before, we find that for the most general second order linear differential operator $$D_2 = c_2 \frac{d^2}{d x^2} + c_1 \frac{d}{dx} + c_0$$ the adjoint is given by $$D_2^\dagger = \bar{c}_2 \frac{d^2}{d x^2} + \left[2\frac{(\bar{c}_2 w)'}{w} - \bar{c}_1 \right] \frac{d}{dx} + \frac{(\bar{c}_2 w)'' - (\bar{c}_1 w)'}{w} + \bar{c}_0.$$ Self-adjointness, $D_2 = D_2^\dagger$, imposes the following conditions: $c_2 = r_2/w$, $c_1 = (r_2' \pm i r_1^2)/w$ and $c_0 = (r_0 \pm i r_1 r_1')/w$ for some real $r_{0,1,2}$. Finally, for the second order LDSAO we obtain $$\boxed{D_2 = \frac{1}{w} \left[\frac{d}{dx} \left(r_2 \frac{d}{dx} \right) \pm i r_1 \frac{d}{dx}\left(r_1 \ \cdot \ \right) + r_0 \right]}$$ The appropriate boundary conditions turn out to be $$\boxed{\left. \left[r_2 (\bar{u}v' - \bar{u}'v) \pm i r_1^2 \bar{u}v \right]\right|_a^b=0}$$
5. The problem
It is stated that the most general second order LDSAO is the Sturm-Liouville operator form the beginning of the post. It would correspond to the choice $r_2 = - p, r_1 = 0, r_0 = -q$ in my derivation. I am puzzled why $r_1$ has to vanish? How can the Sturm-Liouville operator be considered the most general one when one can have $r_1 \neq 0$? Note that $r_1$ also enters into boundary conditions so it has severe consequences. I would appreciate some deeper insight.
Sturm-Liouville equations were originally assumed to be real, including the coefficients, the functions in question, and the endpoint conditions. It became convenient to deal with the equations in the complex plane, but only through a complex eigenvalue parameter, even though that was not strictly necessary in order to look at the eigenfunction expansions and solutions of the equation. Because of this requirement of having real coefficients, real endpoint conditions, real eigenvalue parameter, and real solutions, the first order terms could be absorbed into the equation by a change of independent and/or dependent variable. There are no real first order differential operators that are selfadjoint because $\frac{d}{dx}$ is anti-symmetric. So you have to be able to absorb those terms into a second order operator $Mf=(pf')'$, which may require changing the inner product weight for the problem and/or changing the independent variable.
If you are going to use a complex inner product, and to allow complex coefficients, then there are symmetric first order operators. (By the way, $ir(rf)'$ is not selfadjoint if you allow complex $r$. I believe you need $i\overline{r}(rf)'$, don't you?) Allowing complex coefficients makes endpoint conditions potentially non-real, which is also not classical.