Background
I have the following homogeneous ODE system as an Initial Value Problem: $$ y'=A\cdot y\quad\wedge\quad y(0)=y_0 $$ where $y\in\mathbb{R}^{N\times 1}$ is the unknown vector and $A\in\mathbb{R}^{N\times N}$ is a known, constant, nonsingular, diagonalizable coefficient matrix with only $N$ independent entries.
For instance, if $N=3$ it could be: $$ A=\begin{pmatrix} -x_1 & 0 & 0\\ x_1 & -x_2 & 0\\ 0 & x_2 & -x_3 \end{pmatrix} $$ with $x_1,x_2,x_3$ known. It is possible to express a closed-form solution of this system as: $$ y(t)=\sum_{i=1}^NK_i e^{\lambda_it}\cdot u_i $$ where $\lambda_i$ is the $i$-th eigenvalue of $A$ and $u_i$ is the corresponding eigenvector. The $K_i$ are constants of integration chosen so that the initial condition is satisfied.
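For concreteness, here is a minimal sketch of this closed form in Python (the values of $x_1,x_2,x_3$ and $y_0$ below are purely illustrative assumptions), cross-checked against the matrix exponential $y(t)=e^{At}y_0$:

```python
import numpy as np
from scipy.linalg import expm

# Illustrative (assumed) rate values and initial condition
x1, x2, x3 = 1.0, 0.5, 0.2
y0 = np.array([1.0, 0.0, 0.0])

A = np.array([[-x1, 0.0, 0.0],
              [ x1, -x2, 0.0],
              [0.0,  x2, -x3]])

lam, U = np.linalg.eig(A)          # columns of U are the eigenvectors u_i
K = np.linalg.solve(U, y0)         # constants of integration: U @ K = y(0) = y0

def y_closed_form(t):
    """y(t) = sum_i K_i * exp(lambda_i * t) * u_i."""
    return U @ (K * np.exp(lam * t))

t = 2.0
print(y_closed_form(t))            # closed-form solution
print(expm(A * t) @ y0)            # sanity check against the matrix exponential
```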
Question
Suppose we have the above ODE system, where the matrix $A$ has a known structure but the values of its entries $x_1,\ldots,x_N$ are unknown.
Suppose also that we have values $t_k$ and $y(t_k)=y_k$ for $k$ "experiments", with $k\gg N$, including the initial-condition state at $k=0$, i.e. $t_0=0$.
Is there a way to calculate the values of these entries by means of a regression? If so, what kind of regression? We also assume that the measurement noise on the $y_k$ is distributed as $\mathcal{N}(0,\sigma^2)$.
Above all, I think the answer to the first question is "yes", since the number of unknown parameters $N$ is far smaller than the number of experimental data points $k$.
Edit 1 - The Brute Force Approach
The eigenvectors of $A$ are defined only up to a scaling: for any constant $K\neq 0$, if $u_i$ is an eigenvector of $A$ with eigenvalue $\lambda_i$, then $w_i=Ku_i$ is also an eigenvector with eigenvalue $\lambda_i$, since: $$ Au_i=\lambda_iu_i\quad\to\quad AKu_i=\lambda_iKu_i\quad\to\quad Aw_i=\lambda_iw_i $$ The constants of integration $K_i$ can therefore be absorbed into the eigenvectors, and the general solution can be rewritten as a model: $$ \hat{y}(t,\lambda,W)=\sum_{i=1}^N e^{\lambda_it}\cdot w_i $$ or, by component: $$ \hat{y}_j(t) = \sum_{i=1}^N e^{\lambda_it}\cdot w_{ji} $$ This yields $N$ equations with a total of $N+N^2$ parameters ($N$ eigenvalues and the $N^2$ components of the eigenvector matrix $W=\{w_{ji}\}$, whose $i$-th column is $w_i$), and thus at least $k\gg N(N+1)$ experiments are required.
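As a quick sketch, the model above can be evaluated directly (the names `lam` and `W` and their values are assumptions for illustration; the $i$-th column of `W` plays the role of $w_i$):

```python
import numpy as np

def y_hat(t, lam, W):
    """Model y_hat(t) = sum_i exp(lambda_i * t) * w_i,
    with lam[i] the i-th eigenvalue and w_i the i-th column of W."""
    return W @ np.exp(lam * t)

# Purely illustrative (assumed) parameter values for N = 3
lam = np.array([-1.0, -0.5, -0.2])
W = np.array([[ 0.7,  0.1,  0.0],
              [-0.7,  0.6,  0.1],
              [ 0.0, -0.7,  0.9]])
print(y_hat(1.5, lam, W))   # model prediction at t = 1.5
```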
The regression is expressed as a minimization problem: $$ \min_{\lambda, W}\sum_{m=1}^{k} \left\lVert y_{m} - \hat{y} (t_m,\lambda,W)\right\rVert_2^2 $$ and once $\lambda$ and $W$ are known, then: $$ A=W\,\mathrm{diag}(\lambda)\,W^{-1} $$ This process is, however, tedious and boring, and it does not exploit the structure and properties of $A$. What improvements can be made?
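A minimal sketch of this brute-force fit with `scipy.optimize.least_squares`, on synthetic data generated from an assumed `A_true` (everything beyond the imports is an illustrative assumption, and since the problem is non-convex the fit may only reach a local minimum depending on the initial guess):

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import least_squares

N = 3

# Synthetic data for illustration (assumed): generated from a known A_true
A_true = np.array([[-1.0,  0.0,  0.0],
                   [ 1.0, -0.5,  0.0],
                   [ 0.0,  0.5, -0.2]])
y0 = np.array([1.0, 0.0, 0.0])
ts = np.linspace(0.0, 10.0, 200)                      # k time samples
ys = np.array([expm(A_true * t) @ y0 for t in ts])    # k x N observations

def unpack(p):
    """Split the parameter vector into eigenvalues and eigenvector matrix."""
    return p[:N], p[N:].reshape(N, N)

def residuals(p):
    lam, W = unpack(p)
    # Model: y_hat(t_m) = sum_i exp(lam_i * t_m) * w_i, stacked over all samples
    Y_hat = np.exp(np.outer(ts, lam)) @ W.T           # k x N
    return (ys - Y_hat).ravel()

# Crude initial guess (assumption); a poor guess may lead to a local minimum
p0 = np.concatenate([-np.ones(N), np.eye(N).ravel()])
sol = least_squares(residuals, p0)

lam_fit, W_fit = unpack(sol.x)
A_fit = W_fit @ np.diag(lam_fit) @ np.linalg.inv(W_fit)
print(np.round(A_fit, 3))
```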
One possible improvement is to avoid the nonlinear fit and work directly with finite-difference derivatives. At a generic time $t$, the model and the experimental (finite-difference) values are: $$ \hat{\dot{y}}_t=\hat{A}y_t\quad\wedge\quad \dot{y}_t=\frac{y_{t+1}-y_t}{\Delta t} $$ With error vector $e_t=\dot{y}_t-\hat{\dot{y}}_t$, the squared error (omitting $t$ for clarity) is: $$ \varepsilon\left(A\right)=e^Te=\left(\dot{y}-Ay\right)^T(\dot{y}-Ay) $$ The modified error $\varepsilon'$ is obtained by dropping all terms that do not involve $A$, so that $\arg\min_A \varepsilon = \arg\min_A \varepsilon'$: $$ \varepsilon'\left(A\right)=-{\dot{y}}^TAy-y^TA^T\dot{y}+y^TA^TAy $$ Since $\left(Ay\right)^T=y^TA^T$, i.e. $Ay=\left(y^TA^T\right)^T$, and moreover ${\dot{y}}^TAy=\left(Ay\right)^T\dot{y}$, this becomes: $$ \varepsilon^\prime\left(A\right)=-{\dot{y}}^TAy-\left(Ay\right)^T\dot{y}+\left(Ay\right)^TAy=-2\left(Ay\right)^T\dot{y}+\left(Ay\right)^TAy $$ So the objective function to be minimized is: $$ \varepsilon^\prime\left(A\right)=\sum_{i=1}^{N}{\left(Ay\right)_i\left(\left(Ay\right)_i-2{\dot{y}}_i\right)}=\sum_{i=1}^{N}\left(Ay\right)_i^2-2\sum_{i=1}^{N}{\left(Ay\right)_i{\dot{y}}_i} $$ Taking $\left(Ay\right)_i$ as the new variable, or rather the vector $\hat{B}=\hat{\dot{y}}=\hat{A}y$: $$ \varepsilon^\prime\left(\hat B\right)={\hat{B}}^T\hat{B}-2{\hat{B}}^T\dot{y} $$ The minimum condition is verified componentwise: $$ \frac{\partial\varepsilon^\prime\left(\hat B\right)}{\partial{\hat{B}}_i}=2{\hat{B}}_i-2{\dot{y}}_i=0\ \ \ \rightarrow\ \ \ \ \frac{\partial^2\varepsilon^\prime\left(\hat B\right)}{\partial{\hat{B}}_i^2}=2>0\ \ \ \rightarrow\ \ \ \text{min}\ \ \forall\ i $$ So at the minimum we have $\left(Ay\right)_i={\dot{y}}_i$ for every component, which is exactly the original linear system.

The problem is thus reduced to a linear regression, where $A_i^T$ is the transposed $i$-th row of $A$: $$ {\dot{y}}_i=\sum_{j=1}^{N}{A_{ij}y_j}=\left(\begin{matrix}y_1&\cdots&y_N\\\end{matrix}\right)\begin{pmatrix}A_{i1} \\\vdots\\A_{iN}\end{pmatrix}=Y_tA_i^T $$ Stacking this expression over all $K$ time sample points gives: $$ \begin{pmatrix}\dot{y}_{i,t_1} \\\vdots\\\dot{y}_{i,t_K}\end{pmatrix}=\begin{pmatrix} y_{1,t_1} & \ldots & y_{N,t_1}\\ \vdots & \ddots & \vdots\\ y_{1,t_K} & \ldots & y_{N,t_K}\\ \end{pmatrix}\begin{pmatrix}A_{i1} \\\vdots\\A_{iN}\end{pmatrix} $$ with ${\dot{Y}}_i\in\mathbb{R}^{K\times1}$ and $Y\in\mathbb{R}^{K\times N}$. We finally obtain: $$ \dot{Y}_i=YA_i^T\ \ \ \rightarrow\ \ \ Y^T\dot{Y}_i=Y^TYA_i^T\ \ \ \rightarrow\ \ \ A_i^T=\left(Y^TY\right)^{-1}Y^T\dot{Y}_i $$ Repeating this procedure for all $i$ yields the rate coefficients row by row, directly from the data. The matrix $A$ is then assembled by stacking the rows: $$ A=\mathop{\mathrm{cat}}\left(A_i^T\right) $$ This method depends on the choice of $K$ and can lead to numerical instabilities if $K\gg N$ or $K\simeq N\gg 0$.
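A minimal sketch of this linear-regression approach on synthetic, uniformly sampled data (the `A_true`, `dt`, and forward-difference scheme below are illustrative assumptions; `np.linalg.lstsq` is used instead of forming $(Y^TY)^{-1}$ explicitly):

```python
import numpy as np
from scipy.linalg import expm

# Synthetic data for illustration (assumed): uniformly sampled trajectory of a known A_true
A_true = np.array([[-1.0,  0.0,  0.0],
                   [ 1.0, -0.5,  0.0],
                   [ 0.0,  0.5, -0.2]])
y0 = np.array([1.0, 0.0, 0.0])
dt = 0.05
ts = np.arange(0.0, 10.0, dt)
Y_full = np.array([expm(A_true * t) @ y0 for t in ts])   # K x N samples

# Forward finite differences: y_dot_t ~ (y_{t+1} - y_t) / dt
Y = Y_full[:-1, :]                                       # K x N regressor matrix
Y_dot = (Y_full[1:, :] - Y_full[:-1, :]) / dt            # K x N derivative estimates

# Row-by-row linear least squares: Y_dot[:, i] = Y @ A_i^T
N = A_true.shape[0]
A_est = np.zeros((N, N))
for i in range(N):
    # Equivalent to A_i^T = (Y^T Y)^{-1} Y^T Y_dot_i, but numerically safer
    A_est[i, :], *_ = np.linalg.lstsq(Y, Y_dot[:, i], rcond=None)

print(np.round(A_est, 3))
```

All rows can equivalently be recovered in a single call, since `np.linalg.lstsq(Y, Y_dot)[0]` returns a matrix whose columns are the $A_i^T$, i.e. its transpose is the estimate of $A$.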