Goal: Consider a control problem of the form: $$ \dot{x}(t) = M(x(t)) u(t) $$ where $x\in\mathbb{R}^{n}$, $u\in\mathbb{R}^m$ and $m<n$. Moreover, all components of the matrix $M(x)$ are non-zero for all $x$. The goal is to (ideally) make $x(t)$ converge to a desired value $x_r$, or at least get as close as possible to it.
My thoughts: The issue is that we have fewer control inputs than states ($m$ inputs, $n$ states), so evidently it may not be possible to make $x\to x_r$ for arbitrary $x_r$. However, if one focuses on a particular component $x_i$ one gets: $$ \dot{x}_i = \sum_{j=1}^m M_{ij}(x)u_j $$ So if one wanted only to make $x_i\to [x_r]_i$ (regardless of the other components), one can choose $u_j = v_j/M_{ij}(x)$, and then any combination of virtual inputs $v_j$ satisfying $\sum_{j=1}^m v_j = -k(x_i-[x_r]_i)$ with $k>0$ would suffice. Of course, it is not possible to do this for all components $x_i$ at the same time.
However, I wonder if it is possible to compute a control $u$ such that $\|x^*-x_r\|$ is minimal where $x^*$ is the equilibrium reached through the control $u$.
Through simulations, using $u = -kM(x)^+ (x-x_r)$ with $k>0$ I obtain good results, where $M(x)^+$ is the Moore-Penrose pseudo-inverse of $M(x)$. However, I haven't been able to prove anything about this particular controller.
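For concreteness, here is a minimal simulation sketch of that controller. The matrix $M(x)$ below (3 states, 2 inputs, all entries non-zero) is an arbitrary made-up example, not the matrix from my actual problem; the integration is plain forward Euler.

```python
import numpy as np

def M(x):
    # Hypothetical state-dependent input matrix with all entries non-zero.
    return np.array([[1.0, 2.0 + np.sin(x[0])],
                     [3.0 + np.cos(x[1]), 1.0],
                     [1.0, 1.0 + 0.5 * np.tanh(x[2])]])

def simulate(x0, x_r, k=1.0, dt=1e-3, steps=20000):
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        # Proposed controller: u = -k M(x)^+ (x - x_r).
        u = -k * np.linalg.pinv(M(x)) @ (x - x_r)
        x = x + dt * M(x) @ u  # forward-Euler step of dx/dt = M(x) u
    return x

x_r = np.array([1.0, -1.0, 0.5])
x_star = simulate([0.0, 0.0, 0.0], x_r)
print("final state:", x_star)
print("residual   :", np.linalg.norm(x_star - x_r))
```

Note that along these dynamics $\tfrac{d}{dt}\|x-x_r\|^2 = -2k\,(x-x_r)^T M(x)M(x)^+(x-x_r) \le 0$, since $M M^+$ is an orthogonal projection, so the distance to $x_r$ is non-increasing; that matches the "good results" I see numerically.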
I guess you can find examples where this is not possible at all, but it would also be useful to discuss under which conditions it is possible, so that I can look into the particular matrices $M(x)$ I'm interested in and check whether they satisfy some useful property. For example, is it useful to assume that the columns of $M(x)$ are linearly independent?
First, let us ask a simpler question: does there exist a $w$ such that $Mw = v$? When $m<n$, in general there does not. The next question is which $w$ minimizes the distance $\|v - Mw\|^{2}$. The answer is $w = \left(M^{T}M\right)^{-1}M^{T}v$ (assuming $M$ has full column rank), and notice that the Moore-Penrose pseudo-inverse is then $M^{+} = \left(M^{T}M\right)^{-1}M^{T}$, so $w = M^{+}v$.
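This least-squares identity is easy to check numerically. Below, $M$ is a random tall matrix (full column rank almost surely), and we verify both that the normal-equations solution agrees with $M^{+}v$ and that perturbing $w$ only increases the residual:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 2))  # m < n, full column rank almost surely
v = rng.standard_normal(5)

# w = (M^T M)^{-1} M^T v, the least-squares minimizer of ||v - M w||^2.
w_normal = np.linalg.solve(M.T @ M, M.T @ v)
w_pinv = np.linalg.pinv(M) @ v   # same vector via the Moore-Penrose inverse

print("normal equations:", w_normal)
print("pseudo-inverse  :", w_pinv)

# Any other w gives a residual at least as large as the minimizer's.
base = np.linalg.norm(v - M @ w_pinv)
for _ in range(100):
    w_other = w_pinv + 0.1 * rng.standard_normal(2)
    assert np.linalg.norm(v - M @ w_other) >= base
```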
The controller you propose, $u=-kM(x)^{+}(x-x_r) = k\left( M(x)^{+}x_r - M(x)^{+}x\right)$, gives you the dynamics $\dot{x} = kM(x)(\tilde{x}_{r} - \tilde{x})$, where $\tilde{x}_{r} = \arg\min_{w}\|x_r-M(x)w\|^{2}$ and $\tilde{x} = \arg\min_{w}\|x-M(x)w\|^{2}$ are the coefficients for which $M(x)w$ comes closest to representing $x_r$ and $x$, respectively. So if your dynamics reaches a point where $\dot{x} = 0$, then either $\tilde{x} = \tilde{x}_{r}$ or $\tilde{x} - \tilde{x}_{r}$ lies in the null space of $M(x)$. If you assume the columns of $M(x)$ are linearly independent, then $\tilde{x} = \tilde{x}_{r}$, so $x - x_r$ is orthogonal to the range of $M(x)$ and $x$ is as close to $x_r$ as the directions available through $M(x)$ allow (linearly independent columns give you that the null space of $M(x)$ is trivial and that $\arg\min_{w}\|x-M(x)w\|^{2}$ is unique).
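A small numerical check of this equilibrium condition, using a random full-column-rank $M$ (a sketch, not your specific $M(x)$): we build an $x$ whose error $x - x_r$ is orthogonal to the range of $M$, and confirm that $M^{+}(x-x_r)=0$, i.e. $\tilde{x} = \tilde{x}_{r}$.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 2))   # m < n, full column rank almost surely
x_r = rng.standard_normal(4)

# Construct an "equilibrium" x = x_r + z with z orthogonal to range(M):
# the last columns of a complete QR factorization span that complement.
Q, _ = np.linalg.qr(M, mode='complete')
z = Q[:, 2:] @ rng.standard_normal(2)
x = x_r + z

Mp = np.linalg.pinv(M)
print("M^+ (x - x_r) =", Mp @ (x - x_r))  # zero => dx/dt = 0
print("tilde_x - tilde_x_r =", Mp @ x - Mp @ x_r)
```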
However, this is only the closest for that particular matrix $\textbf{M(x)}$. You need some assumption about how $M(x)$ changes as a function of $x$ to make a stronger statement. Without additional assumptions you can't say that you have reached a global or even a local optimum, because a small perturbation could result in another matrix $M(x')$ that admits a smaller $\|x_r-M(x')w\|^{2}$. I'm not exactly sure what kind of meaningful assumptions can be made without having a specific problem in mind.