I have learnt about the characteristic equations of differential equations (D.E.) only informally and recently observed that the way they are defined seems to depend not only on the equation, but on the solution / eigenfunctions.
e.g. compare the following:
First, a 2nd order constant coefficient D.E.
$$ay'' + by' + cy = 0,$$
with characteristic polynomial $p(D) = aD^2 + bD + c$, which we can find by taking the ansatz $y = e^{\lambda x}$ and substituting into the D.E.
Second, a 2nd order Cauchy-Euler D.E.
$$x^2 \, y'' + \alpha \, x \, y' + \beta\, y = 0,$$
with characteristic polynomial $f(D) = D^2 +(\alpha-1)D + \beta$, which we can find by taking the ansatz $y = x^r$ and substituting into the D.E.
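To make the two ansätze concrete, here is a small numerical sanity check using central finite differences; the specific equations, roots, and evaluation point $x_0$ are my own illustrative choices, not taken from anywhere in particular.

```python
import math

h, x0 = 1e-4, 0.7

# Constant-coefficient: y'' - 3y' + 2y = 0 has p(D) = D^2 - 3D + 2,
# roots 1 and 2, so y = e^(2x) should be a solution.
def f(x):
    return math.exp(2 * x)

yp  = (f(x0 + h) - f(x0 - h)) / (2 * h)            # central difference y'
ypp = (f(x0 + h) - 2 * f(x0) + f(x0 - h)) / h**2   # central difference y''
res_cc = ypp - 3 * yp + 2 * f(x0)

# Cauchy-Euler: x^2 y'' + 2x y' - 6y = 0 has f(D) = D^2 + D - 6
# (alpha = 2, beta = -6), roots 2 and -3, so y = x^2 should be a solution.
def g(x):
    return x ** 2

yp  = (g(x0 + h) - g(x0 - h)) / (2 * h)
ypp = (g(x0 + h) - 2 * g(x0) + g(x0 - h)) / h**2
res_ce = x0**2 * ypp + 2 * x0 * yp - 6 * g(x0)

print(res_cc, res_ce)   # both residuals near 0
```

In each case the root of the characteristic polynomial produces a function whose residual in the D.E. vanishes (up to finite-difference error), even though the two polynomials come from different ansätze.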
Clearly the characteristic polynomial in each case would be different if we started with a different ansatz, which leads me to suspect that my informal understanding is missing an important point in how these polynomials are defined. Can anyone clarify the situation, please?
I suspect this is an issue with definitions, but am curious to know if the bigger picture might lend greater insight into D.E.
§1.1. MATRICES
$\ \ \ \ \ $Associated with every square matrix $A$ is a number, written $|A|$ or $\det(A)$, called the determinant of $A$; in the 2 x 2 case it is given by:
$$\det\begin{pmatrix} a&b\\ c&d \end{pmatrix}=ad-bc$$
$\ \ \ \ \ $The trace of a square matrix $A$ is the sum of the elements on the main diagonal; it is denoted $\text{tr}(A)$:
$$\text{tr}\begin{pmatrix} a&b\\ c&d \end{pmatrix}=a+d$$
Remark. $\ \ $Theoretically, the determinant should not be confused with the matrix itself. The determinant is a number, the matrix is the square array. But, everyone puts vertical lines on either side of the matrix to indicate its determinant, and then uses phrases like "the first row of the determinant," meaning the first row of the corresponding matrix.
$\ \ \ \ \ $An important formula which everyone uses and no one can prove is \begin{align}\det(AB)=\det A \cdot \det B\tag{1}\end{align}
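As a quick numerical sanity check of these definitions and of formula (1), here is a sketch using plain Python lists for 2 x 2 matrices (the helper names `det2`, `tr2`, `matmul2` are my own):

```python
def det2(A):
    """det of [[a, b], [c, d]] is ad - bc."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def tr2(A):
    """trace of [[a, b], [c, d]] is a + d."""
    return A[0][0] + A[1][1]

def matmul2(A, B):
    """Product of two 2 x 2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(det2(A), tr2(A))                         # -2 5
print(det2(matmul2(A, B)), det2(A) * det2(B))  # 4 4 -- formula (1) checks out
```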
§1.2. HOMOGENEOUS 2X2 SYSTEMS
$\ \ \ \ \ $Matrices and determinants were originally invented to handle, in an efficient way, the solution of a system of simultaneous linear equations. This is still one of their most important uses. We give a brief account of what you need to know for now. We will restrict ourselves to square 2 x 2 homogeneous systems; they have two equations and two variables (or "unknowns", as they are frequently called). Our notation will be:
$$A=(a_{ij})\text{ a square 2x2 matrix of constants,}$$
$$\textbf x = (x_{1},x_{2})^{T}, \text{ a column vector of unknowns};$$ then the square system
$$\begin{align*} a_{11}x_{1}+a_{12}x_{2}&=0\\ a_{21}x_{1}+a_{22}x_{2}&=0 \end{align*}$$
can be abbreviated by the matrix equation
$$\begin{align} A\textbf{x}=\textbf{0.}\tag{2} \end{align} $$
$\ \ \ \ \ $This always has the solution $\textbf{x} = \textbf{0}$, which we call the trivial solution. The question is: when does it have a nontrivial solution?
Theorem. Let $A$ be a square matrix. The equation \begin{align}A\textbf{x}&=\textbf{0}\text{ has a nontrivial solution}&\iff& \det A=0 \ \ \text{(i.e., $A$ is singular)}\tag{3}\end{align}
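To illustrate the theorem, here is a small check with a singular matrix of my own choosing: its determinant is zero, and a nontrivial solution of $A\textbf{x}=\textbf{0}$ can be exhibited directly.

```python
def det2(A):
    """det of [[a, b], [c, d]] is ad - bc."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

A = [[1, 2], [2, 4]]   # singular: det A = 1*4 - 2*2 = 0
x = [2, -1]            # a candidate nontrivial solution of Ax = 0
Ax = [A[0][0] * x[0] + A[0][1] * x[1],
      A[1][0] * x[0] + A[1][1] * x[1]]
print(det2(A), Ax)     # 0 [0, 0]
```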
§1.3. LINEAR INDEPENDENCE OF VECTORS
$\ \ \ \ \ $Conceptually, linear independence of vectors means each one provides something new to the mix. For two vectors this just means they are not zero and are not multiples of each other.
Examples.
$\textbf{a}=(1,2)$ and $\textbf{b}=(3,4)$ are linearly independent.
$\textbf{a}=(1,2)$ and $\textbf{b}=(2,4)$ are linearly dependent because $\textbf{b}$ is a multiple of $\textbf{a}$. Notice that if we take linear combinations, then $\textbf{b}$ doesn't add anything to the set of vectors we can get from $\textbf{a}$ alone.
§1.3.1 DETERMINANTAL CRITERION FOR LINEAR INDEPENDENCE
Let $\textbf{a}=(a_{1},a_{2})$ and $\textbf{b}=(b_{1},b_{2})$ be 2-vectors, and $A$ the square matrix having these vectors for its rows (or columns). Then \begin{align}\textbf{a},\textbf{b}&\qquad\text{ are linearly independent}&\iff&\qquad \det A\not=0\tag{4}\end{align}
Examples.
1.$\ $ $\det\begin{pmatrix} 1&2\\ 3&4 \end{pmatrix}=4-6=-2\not=0.$ Therefore, $(1,2)$ and $(3,4)$ are linearly independent.
2.$\ $ $\det\begin{pmatrix} 1&2\\ 2&4 \end{pmatrix}=4-4=0.$ Therefore, $(1,2)$ and $(2,4)$ are linearly dependent.
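The criterion (4) can be packaged as a one-line test; here it is applied to the two examples above (the function name `independent` is my own):

```python
def independent(a, b):
    """True iff the 2-vectors a, b are linearly independent (det test)."""
    return a[0] * b[1] - a[1] * b[0] != 0

print(independent([1, 2], [3, 4]))   # True
print(independent([1, 2], [2, 4]))   # False
```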
Remark. $\ \ $The theorem on square homogeneous systems (3) follows from this criterion. We will prove neither.
$\ \ \ \ \ $Two linearly independent 2-vectors $\textbf{v}_{1}$ and $\textbf{v}_{2}$ form a basis for the plane: every 2-vector $\textbf{w}$ can be written as a linear combination of $\textbf{v}_{1}$ and $\textbf{v}_{2}$. That is, there are scalars $c_{1}$ and $c_{2}$ such that
$$c_{1}\textbf{v}_{1}+c_{2}\textbf{v}_{2}=\textbf{w}$$
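One way to find the scalars $c_1, c_2$ is Cramer's rule for the 2 x 2 system (not covered above; the helper name and the example vectors are my own, and $\textbf{v}_1, \textbf{v}_2$ are assumed linearly independent so the determinant is nonzero):

```python
def solve_coeffs(v1, v2, w):
    """Solve c1*v1 + c2*v2 = w for 2-vectors, by Cramer's rule."""
    det = v1[0] * v2[1] - v2[0] * v1[1]   # det of matrix with columns v1, v2
    c1 = (w[0] * v2[1] - v2[0] * w[1]) / det
    c2 = (v1[0] * w[1] - w[0] * v1[1]) / det
    return c1, c2

c1, c2 = solve_coeffs([1, 2], [3, 4], [5, 6])
print(c1, c2)   # -1.0 2.0, since -1*(1,2) + 2*(3,4) = (5,6)
```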
Remark. $\ \ $ All of the notions and theorems mentioned generalize to higher $n$ (and a larger collection of vectors).
§2 GENERAL CASE: EIGENVALUES & EIGENVECTORS
$\ \ \ \ \ $Now let's consider the linear 2x2 system:
$$\begin{align*} \dot{x}&=ax+by\\ \dot{y}&=cx+dy, \end{align*}$$ where the $a,b,c,d$ are constants.
$\ \ \ \ \ $We want to learn to write the system efficiently in matrix form. So, throughout the derivation, we will give the expanded matrix form of our manipulations on the left, and the abridged form on the right. For example, our system is:
$$\begin{align} \begin{pmatrix} \dot{x}\\ \dot{y} \end{pmatrix}= \begin{pmatrix} a&b\\ c&d \end{pmatrix} \begin{pmatrix} x\\ y \end{pmatrix}&\iff&\dot{\textbf{x}}=A\textbf{x}\end{align}\tag{1}$$
$\ \ \ \ \ $We look for solutions to our system having the form
$$\begin{align} \begin{pmatrix} x\\ y \end{pmatrix}= e^{\lambda t}\begin{pmatrix} a_{1}\\ a_{2} \end{pmatrix}= \begin{pmatrix} e^{\lambda t}a_{1}\\ e^{\lambda t}a_{2} \end{pmatrix}&\iff&\textbf{x}=e^{\lambda t}\textbf{a}\end{align},$$
where $a_{1},a_{2}$ and $\lambda$ are unknown constants. We substitute this into the system (1) to determine these unknown constants. Since $D(ae^{\lambda t})=\lambda ae^{\lambda t},$ we arrive at
$$\begin{align} \lambda e^{\lambda t}\begin{pmatrix} a_{1}\\ a_{2} \end{pmatrix}= e^{\lambda t}\begin{pmatrix} a&b\\ c&d \end{pmatrix} \begin{pmatrix} a_{1}\\ a_{2} \end{pmatrix}&\iff&\lambda e^{\lambda t}\textbf{a}=Ae^{\lambda t}\textbf{a}\end{align}.$$
We can cancel the factor $e^{\lambda t}$ from both sides, getting
$$\begin{align} \lambda \begin{pmatrix} a_{1}\\ a_{2} \end{pmatrix}= \begin{pmatrix} a&b\\ c&d \end{pmatrix} \begin{pmatrix} a_{1}\\ a_{2} \end{pmatrix}&\iff&\lambda \textbf{a}=A\textbf{a}\end{align}.$$
As it stands, we cannot combine the two sides by subtraction, since the scalar $\lambda$ cannot be subtracted from the square matrix on the right. The trick is to replace the scalar $\lambda$ by the diagonal matrix $\lambda I$. This gives
$$\begin{align} \begin{pmatrix} \lambda&0\\ 0&\lambda \end{pmatrix} \begin{pmatrix} a_{1}\\ a_{2} \end{pmatrix}=\begin{pmatrix} a&b\\ c&d \end{pmatrix} \begin{pmatrix} a_{1}\\ a_{2} \end{pmatrix}&\iff&\lambda I\textbf{a}=A\textbf{a}. \end{align}$$
Subtracting the left side from the right one and using the distributive law for matrix addition and multiplication, we get a 2 x 2 homogeneous linear system of equations:
$$\begin{align} \begin{pmatrix} a-\lambda&b\\ c&d-\lambda \end{pmatrix} \begin{pmatrix} a_{1}\\ a_{2} \end{pmatrix}= \begin{pmatrix} 0\\ 0 \end{pmatrix}&\iff&(A-\lambda I)\textbf{a}=\textbf{0}. \end{align}$$
Written out without using matrices, the equations are
$$\begin{align} (a-\lambda)a_{1}+ba_{2}&=0\\ ca_{1}+(d-\lambda)a_{2}&=0\tag{2} \end{align}$$
According to the theorem on square homogeneous systems this system has a non-zero solution for the $\textbf{a}$’s if and only if the determinant of the coefficients is zero, i.e.,
$$\begin{align} \begin{vmatrix} a-\lambda&b\\ c&d-\lambda \end{vmatrix} =0 &\iff&|A-\lambda I|=0. \end{align}$$
Evaluating the determinant, we get a quadratic equation in $\lambda$: $$\lambda^2-(a+d)\lambda+(ad-bc)=0.$$
Definition. $\ \ $This is called the characteristic equation of the matrix
$$A=\begin{align} \begin{pmatrix} a&b\\ c&d \end{pmatrix} \end{align}$$
and the corresponding polynomial is often denoted $p_{A}(\lambda).$ Its roots $\lambda_{1}$ and $\lambda_{2}$ are called the eigenvalues, characteristic values, or proper values of the matrix $A$.
Remark.$\ \ $ In calculating the characteristic equation notice that $$\begin{align} ad-bc=\det{A}\qquad&\qquad a+d=\text{tr }A. \end{align}$$
Using this, the characteristic equation for a 2 x 2 matrix A can be written as $$\lambda^2-(\text{tr }A)\ \lambda+\det{A}=0.$$
In this form, the characteristic equation of $A$ can be written down by inspection; you don’t need the intermediate step of writing down $|A-\lambda I|=0$.
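The trace-determinant form lends itself to direct computation; here is a sketch via the quadratic formula, assuming real, distinct eigenvalues (positive discriminant), applied to a sample matrix of my own:

```python
import math

def eigenvalues_2x2(A):
    """Roots of lambda^2 - (tr A)*lambda + det A = 0 by the quadratic
    formula; this sketch assumes the discriminant is positive
    (real, distinct eigenvalues)."""
    tr  = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = math.sqrt(tr * tr - 4 * det)
    return ((tr + root) / 2, (tr - root) / 2)

print(eigenvalues_2x2([[4, -3], [2, -1]]))   # tr = 3, det = 2 -> (2.0, 1.0)
```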
Remark$\ \ $ Abridged vs. expanded notation
In the manipulations above, the matrix notation on the right is compact to write, which makes the derivation look simpler. On the other hand, its chief disadvantage for beginners is that it is very compressed. Practice writing the sequence of matrix equations so you get some skill in using the notation. Until you acquire some confidence, keep referring to the written out form on the left, so you are sure you understand what the abridged form is actually saying.
$\ \ \ \ \ $There are now various cases to consider, according to whether the eigenvalues of the matrix $A$ are:
two distinct real numbers,
a single repeated real number,
a pair of conjugate complex numbers.
§2.1 REAL DISTINCT EIGENVALUES
$\ \ \ \ \ $To complete our work, we have to find the solutions to the system (2) corresponding to the eigenvalues $\lambda_{1}$ and $\lambda_{2}$. Formally, the systems become
$$\begin{align} (a-\lambda_{1})a_{1}+ba_{2}&=0&(a-\lambda_{2})a_{1}+ba_{2} &=0\\ ca_{1}+(d-\lambda_{1})a_{2}&=0&ca_{1}+(d-\lambda_{2})a_{2}&=0\tag{3}\end{align}$$ The solutions to these two systems are column vectors, for which we will typically use $\textbf{v}$.
Definition.$\ \ $ The respective solutions $\textbf{a} = \textbf{v}_1$ and $\textbf{a} = \textbf{v}_2$ to the systems (3) are called eigenvectors (or characteristic vectors) corresponding to the eigenvalues $\lambda_{1}$ and $\lambda_{2}$.
Remarks.
If the work has been done correctly, in each of the two systems in (3), the two equations will be dependent, i.e., one will be a constant multiple of the other. Why? The two values of $\lambda$ have been selected so that in each case the coefficient determinant $|A - \lambda I|$ will be zero, which means the equations will be dependent.
The solution $\textbf{v}$ is determined only up to an arbitrary non-zero constant factor: if $\textbf{v}$ is an eigenvector for $\lambda$, then so is $c\textbf{v}$, for any real constant $c$; because of this, the line through $\textbf{v}$ is sometimes called an eigenline. A convenient way of finding the eigenvector $\textbf{v}$ is to assign the value $1$ to one of the $a_{i}$, then use the equation to solve for the corresponding value of the other $a_{i}$. (First try $a_{1} = 1$; if that does not work, then $a_{2} = 1$ will.)
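Here is the "$a_1 = 1$" trick in action, for a sample matrix and eigenvalue of my own choosing:

```python
A = [[4, -3], [2, -1]]   # sample matrix; lambda = 2 is one of its eigenvalues
lam = 2

# First equation of (A - lam*I)a = 0 is (a - lam)*a1 + b*a2 = 0; set a1 = 1:
a1 = 1
a2 = -(A[0][0] - lam) * a1 / A[0][1]   # valid since b = A[0][1] != 0 here
v = [a1, a2]

# The second equation, c*a1 + (d - lam)*a2 = 0, then holds automatically:
check = A[1][0] * v[0] + (A[1][1] - lam) * v[1]
print(v, check)   # eigenvector [1, 2/3]; residual ~ 0
```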
$\ \ \ \ \ $Once the eigenvalues and their corresponding eigenvectors have been found, we have two independent solutions to the system (1). They are
$$\begin{align} \textbf{x}_{1}(t)&=e^{\lambda_{1}t}\textbf{v}_{1},& \textbf{x}_{2}(t)&=e^{\lambda_{2}t}\textbf{v}_{2},&\text{ where}\qquad &\textbf{x}_{i}=\begin{pmatrix}x_{i}\\y_{i}\end{pmatrix}. \end{align}$$
Definition. $\ \ $In science and engineering applications, these are usually called the normal modes.
$\ \ \ \ \ $Using the superposition principle, the general solution to the system (1) is $$\begin{alignat}{3}\textbf{x}&=c_{1}\textbf{x}_{1}+c_{2}\textbf{x}_{2}&=c_{1}e^{\lambda_{1}t}\textbf{v}_{1}+c_{2}e^{\lambda_{2}t}\textbf{v}_{2}\end{alignat}.$$
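As a sanity check on the general solution, the following sketch verifies $\dot{\textbf{x}} = A\textbf{x}$ at a sample time for a sample matrix (the matrix, its eigenpairs, and the constants $c_1, c_2, t$ are my own choices; differentiating each normal mode just multiplies it by its $\lambda$):

```python
import math

A = [[4, -3], [2, -1]]             # eigenvalues 2 and 1
modes = [(2.0, [3.0, 2.0], 1.5),   # (lambda_1, v1, c1)
         (1.0, [1.0, 1.0], -0.5)]  # (lambda_2, v2, c2)

t = 0.3
x    = [sum(c * math.exp(lam * t) * v[i] for lam, v, c in modes)
        for i in range(2)]
xdot = [sum(c * lam * math.exp(lam * t) * v[i] for lam, v, c in modes)
        for i in range(2)]
Ax   = [A[i][0] * x[0] + A[i][1] * x[1] for i in range(2)]
print([xdot[i] - Ax[i] for i in range(2)])   # both entries ~ 0
```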
Remarks.
The normal modes often have physical interpretations; this means that it is sometimes possible to find them just by inspection of the physical problem.
In the compact notation, the definitions and derivations are valid for square systems of any size. Thus, for instance, you know how to solve a 3x3 system, if its eigenvalues turn out to be real and distinct.