In regression, linear models are of the form: $$y_i = \pmb z_i^T \pmb\beta + \epsilon_i$$
Or we can write this in a more general form with vectors and a design matrix:
$$\pmb y = \pmb Z \pmb \beta + \pmb \epsilon$$
For example, take $\pmb y = \begin{pmatrix}y_1\\y_2\\y_3\\\end{pmatrix}$, $\pmb\beta = \begin{pmatrix}a\\b\\c\\\end{pmatrix}$, $\pmb \epsilon= \begin{pmatrix}\epsilon_1\\\epsilon_2\\\epsilon_3\\\end{pmatrix}$ and $\pmb Z = \begin{pmatrix}1&x_1&x_1^2\\1&x_2&x_2^2\\1&x_3&x_3^2\\\end{pmatrix}$
Then the linear model looks like this:
$$\begin{pmatrix}y_1\\y_2\\y_3\\\end{pmatrix}=\begin{pmatrix}1&x_1&x_1^2\\1&x_2&x_2^2\\1&x_3&x_3^2\\\end{pmatrix}\begin{pmatrix}a\\b\\c\\\end{pmatrix} + \begin{pmatrix}\epsilon_1\\\epsilon_2\\\epsilon_3\\\end{pmatrix}$$
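To make this concrete, here is a small numeric sketch (with made-up values for $x$, the coefficients $(a,b,c)$, and the errors) showing that each row of the matrix product $\pmb Z \pmb\beta$ is exactly $a + b x_i + c x_i^2$:

```python
import numpy as np

# Hypothetical example values for x, the coefficients (a, b, c), and the errors.
x = np.array([1.0, 2.0, 3.0])
beta = np.array([0.5, -1.0, 2.0])          # (a, b, c)
eps = np.array([0.1, -0.2, 0.05])

# Design matrix Z: row i is (1, x_i, x_i^2), one row per observation.
Z = np.column_stack([np.ones_like(x), x, x**2])

# The linear model y = Z beta + eps; the product computes every row at once.
y = Z @ beta + eps

# Row i of Z @ beta equals a + b*x_i + c*x_i^2, written out by hand:
manual = beta[0] + beta[1] * x + beta[2] * x**2
print(np.allclose(y, manual + eps))  # → True
```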
Focusing on $\begin{pmatrix}1&x_1&x_1^2\\1&x_2&x_2^2\\1&x_3&x_3^2\\\end{pmatrix}\begin{pmatrix}a\\b\\c\\\end{pmatrix}$ in particular, how can we interpret what these matrices and vectors are doing? For instance, under the visual interpretation of matrix-vector multiplication, $\pmb Z$ transforms the parameter vector $\pmb \beta$. However, I'm not sure why this interpretation of a linear model would be helpful. Would it be better to interpret these matrices and vectors from a more "data structure" perspective, i.e., as arrays for conveniently storing values?
The idea of linear regression is to predict a value of $y$ based on a vector of observed explanatory variables $\pmb z=(z_1,z_2,\ldots)$. To fit this regression, i.e., to determine the coefficient $\beta_j$ for each $z_j$, you have multiple observations; these are the rows of $\pmb Z$ (and the corresponding entries of $\pmb y$).
Once you have fitted the regression, i.e., obtained a vector $\pmb\beta$, you can predict the value of a new observation $y_i$ based only on its vector of explanatory variables $\pmb z_i$. This prediction is $$\hat{y}_i=\beta_1 z_{i1}+\beta_2 z_{i2}+\ldots+\beta_n z_{in}.$$
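As a sketch of the fit-then-predict workflow (using simulated data and NumPy's least-squares solver; the true coefficients $2, 3, -0.5$ below are made up for illustration):

```python
import numpy as np

# Simulated data: a noisy quadratic y = 2 + 3x - 0.5x^2 + noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 20)
Z = np.column_stack([np.ones_like(x), x, x**2])   # one row per observation
y = 2 + 3 * x - 0.5 * x**2 + rng.normal(scale=0.1, size=x.size)

# Fit: least-squares estimate of beta from the m observations.
beta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)

# Predict a new observation from its row of explanatory variables.
x_new = 2.5
z_new = np.array([1.0, x_new, x_new**2])
y_hat_new = z_new @ beta_hat
print(beta_hat, y_hat_new)
```

The estimated `beta_hat` should land close to the true coefficients, and the prediction is just the dot product of the new explanatory-variable row with the fitted coefficients, exactly as in the formula above.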
This is exactly what your matrix multiplication above does, except that it replaces the predicted value $\hat{y}_i$ with the actually observed value $y_i$ and adds an error term $\epsilon_i=y_i-\hat{y}_i$ to account for the difference between the observed and the predicted outcome.
The matrix representation does this for every observation, i.e., every row, so that for $m$ observations you have $m$ equations of this kind. This is the main difference between the single equation $$y_i = \pmb z_i^T \pmb\beta + \epsilon_i$$ and the matrix representation $$\pmb y = \pmb Z \pmb \beta + \pmb \epsilon.$$ They are the same if you add "$\forall i$" to the single equation, yielding the same system of linear equations. This is how the single equation is intended, but the custom is to leave the "$\forall i$" implicit (or people are just too lazy).
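The "$\forall i$" point can be checked directly: the single matrix product and a loop over the $m$ single-row equations give the same numbers. A minimal sketch, with made-up values:

```python
import numpy as np

# Hypothetical fitted coefficients and observed responses.
Z = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0],
              [1.0, 3.0, 9.0]])
beta_hat = np.array([0.5, -1.0, 2.0])
y = np.array([1.4, 6.6, 15.4])

# Matrix form: all m predictions in one product, one equation per row.
y_hat = Z @ beta_hat
eps_hat = y - y_hat            # residuals epsilon_i = y_i - y_hat_i

# Equivalent "forall i" loop over the m single equations:
for i in range(Z.shape[0]):
    assert np.isclose(y_hat[i], Z[i] @ beta_hat)
print(y_hat, eps_hat)
```

The loop and the matrix product agree row by row; the matrix form is simply the compact way of writing all $m$ equations at once.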