To make this simple, let's say we have $\vec{x}: \mathbb{R} \rightarrow \mathbb{R}^2$ such that $$\frac{d}{dt}\vec{x}(t) = \begin{pmatrix} x_1'(t) \\ x_2'(t) \end{pmatrix} = A \vec{x}(t)$$ for some constant matrix $A$. We know that the general solution has the form $$\vec{x}(t) = e^{A t} \begin{pmatrix}x_1(0) \\ x_2(0) \end{pmatrix},$$ where $e^{At}$ denotes the matrix exponential. Suppose we are given $\vec{x}(0)$ and also have the observations $$\hat{x}(t_1), \ldots, \hat{x}(t_k) $$ for some known collection $\{t_i\}_{i=1}^k.$ For now, we do not worry about whether the observations are noisy; we just assume they are perfect.
The goal is to recover the matrix $A$, rather than $\vec{x}(t)$. All we know is that $A$ is constant.
This is stumping me because it seems so simple; however, standard matrix differential equation techniques (e.g. here) usually assume that $A$ is given.
One numerical option is to let $\vec{x}(t, A)$ denote the solution generated by a candidate matrix $A$, define the error function $$ \operatorname{err}(A) = \sum_{i=1}^k \| \vec{x}(t_i,A) - \hat{x}(t_i)\|^2 $$ for the given $\{t_i\}_{i=1}^k$, and compute $$ \min_{A \in \mathbb{R}^{2 \times 2}} \operatorname{err}(A) $$ in $\texttt{Matlab}$ (or some other program). However, I keep getting incorrect/unstable results, which is worrisome since $A$ is only $2 \times 2$.
In general, I am not even sure what it means to minimize $\operatorname{err}(A)$ with respect to a matrix, so I also do not understand what $\texttt{Matlab}$ is doing from a theoretical standpoint in this computation. How can one recover the matrix $A$ theoretically or via some brute-force algorithm -- and if the latter is used, why does that algorithm converge to or output the correct $A$?
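To make the attempt concrete, here is a minimal sketch of the minimization in Python rather than Matlab (the matrix `A_true`, the initial condition, and the sample times are illustrative stand-ins for my actual data). The candidate matrix is flattened to a vector in $\mathbb{R}^4$ so the optimizer has an ordinary vector argument:

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

# Illustrative ground truth, used only to generate "observations".
A_true = np.array([[0.0, 1.0],
                   [-2.0, -0.5]])
x0 = np.array([1.0, 0.0])
ts = np.linspace(0.1, 2.0, 8)
obs = [expm(A_true * t) @ x0 for t in ts]

def err(a_flat):
    # Reshape the optimizer's vector in R^4 back into a 2x2 matrix,
    # then sum squared deviations from the observations.
    A = a_flat.reshape(2, 2)
    return sum(np.sum((expm(A * t) @ x0 - y) ** 2)
               for t, y in zip(ts, obs))

res = minimize(err, np.zeros(4), method="BFGS")
A_hat = res.x.reshape(2, 2)
print(A_hat)
```

Whether `A_hat` lands near `A_true` depends heavily on the initial guess, which illustrates the instability I keep running into.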
Your fears are well-founded. What you seek is impossible. The following three initial value problems all have the same solution, yet the matrices are fundamentally different. Let $A_i$ be given by $$A_1 = \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix}, \quad A_2 = \begin{bmatrix} \lambda & 1\\ 0 & \lambda \end{bmatrix}, \quad A_3 = \begin{bmatrix} \lambda & 0 \\ 0 & \mu \end{bmatrix}, \quad (\lambda \not = \mu)$$ and consider the initial value problems $$x_i'(t) = A_ix_i(t), \quad x_i(0) = x_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}.$$ Since $x_0$ is an eigenvector of each $A_i$ with eigenvalue $\lambda$, the solution in each case is $$x_i(t) = e^{\lambda t} x_0.$$ In short, even when the entire trajectory is exactly available, you cannot discern the matrix that defined the differential equation.
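A quick numerical check (a Python sketch with the illustrative values $\lambda = -1$, $\mu = 3$) confirms that all three matrices propagate this particular $x_0$ identically:

```python
import numpy as np
from scipy.linalg import expm

lam, mu = -1.0, 3.0   # illustrative values with lam != mu
A1 = np.array([[lam, 0.0], [0.0, lam]])
A2 = np.array([[lam, 1.0], [0.0, lam]])
A3 = np.array([[lam, 0.0], [0.0, mu]])
x0 = np.array([1.0, 0.0])

for t in [0.0, 0.5, 1.0, 2.0]:
    sols = [expm(A * t) @ x0 for A in (A1, A2, A3)]
    # All three trajectories coincide with e^{lam t} x0.
    assert np.allclose(sols[0], np.exp(lam * t) * x0)
    assert np.allclose(sols[0], sols[1]) and np.allclose(sols[0], sols[2])
```

Any least-squares fit over these samples therefore has (at least) three global minimizers, which is one explanation for unstable optimizer output.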
However, an interesting possibility emerges. Identifying the matrix $A$ is equivalent to knowing $Av_j$ where $\{v_j\}_{j=1}^n$ span the underlying vector space, say, $\mathbb{R}^n$. Suppose $t \mapsto x_j(t)$ solves the initial value problem $$x_j'(t) = Ax_j(t), \quad x_j(0) = v_j;$$ then in particular $$x_j'(0) = Ax_j(0) = Av_j.$$ You can now approximate $Av_j$ as accurately as your sampling will allow, using, say, a finite difference approximation at $t = 0$, i.e., $$Av_j = x_j'(0) \approx \frac{x_j(h)-x_j(0)}{h} = \frac{x_j(h)-v_j}{h}.$$
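As a sketch of this idea in Python (with a hypothetical $A$ to recover, the standard basis $v_j = e_j$, and an arbitrary small step $h$): start one trajectory from each basis vector, sample it at $t = h$, and assemble the forward difference quotients into the columns of an estimate of $A$:

```python
import numpy as np
from scipy.linalg import expm

A_true = np.array([[0.0, 1.0],
                   [-2.0, -0.5]])   # hypothetical matrix to recover
h = 1e-5                            # small sampling step
n = 2

A_est = np.zeros((n, n))
for j in range(n):
    v = np.zeros(n)
    v[j] = 1.0                       # v_j = e_j, the j-th basis vector
    x_h = expm(A_true * h) @ v       # trajectory started at v_j, sampled at t = h
    A_est[:, j] = (x_h - v) / h      # forward difference approximates A v_j

print(np.max(np.abs(A_est - A_true)))
```

The entrywise error of the forward difference is $O(h)$, so shrinking $h$ (up to floating-point cancellation) tightens the recovery; a central difference would improve this to $O(h^2)$.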