The Original Question (informal):
I have a black-box linear system $f:\mathbb{R}^N\rightarrow\mathbb{R}^M~(0\ll M\ll N)$. It is guaranteed that there exists a full-rank matrix $\mathbf{A}\in\mathbb{R}^{M\times N}$ satisfying $\forall \mathbf{x}\in\mathbb{R}^N,f(\mathbf{x})\equiv\mathbf{Ax}$. Here I call $\mathbf{A}$ the transformation matrix of $f$.
Now given a vector $\mathbf{y}\in\mathbb{R}^M$, I want to calculate $\mathbf{z}=\mathbf{A}^+\mathbf{y}\in\mathbb{R}^N$, where $\mathbf{A}^+\in\mathbb{R}^{N\times M}$ is the Moore–Penrose inverse of $\mathbf{A}$ (Reference 1); since $\mathbf{A}$ has full row rank, it satisfies $\mathbf{AA}^+=\mathbf{I}_M$.
Here is some supplementary information:
There is a way to recover the explicit form of $\mathbf{A}$ from a given fixed $f$ (Reference 2). However, since $M$ and $N$ are both very large (about $10^6$ or even $10^9$), I cannot afford to extract $\mathbf{A}$, explicitly compute $\mathbf{A}^+$, and then calculate $\mathbf{z}=\mathbf{A}^+\mathbf{y}$: the computational cost of this direct route is too high for me. I want an indirect way of obtaining $\mathbf{z}=\mathbf{A}^+\mathbf{y}$.
The inner structure of the linear $f$ is very complicated; in effect, $f$ is a black box. But in my system, I can conveniently evaluate $f(\mathbf{r})$ for any given $\mathbf{r}\in\mathbb{R}^N$. In other words, the forward pass of $f$ is fast and efficient.
I do not necessarily need $\mathbf{A}$, $\mathbf{A}^+$, or any operators built from them. I only want $\mathbf{z}=\mathbf{A}^+\mathbf{y}$, which is unique once $\mathbf{y}$ and $f$ are given and fixed.
There is no prior knowledge about $f$, $\mathbf{A}$, or $\mathbf{y}$. In other words, their entries may as well be random numbers, like noise.
So, how can I get $\mathbf{z}=\mathbf{A}^+\mathbf{y}$ efficiently?
Some of My Efforts:
I tried to search for an $\mathbf{x}$ satisfying $\mathbf{Ax=y}$. Concretely, I use PyTorch (a Python deep-learning framework) to optimize a randomly initialized $\mathbf{x}$ with the loss function $\mathcal{L}=\lVert f(\mathbf{x})-\mathbf{y} \rVert _2^2$, where $f$ is a neural network in my own implementation. When $\mathcal{L}$ hits $0$, I stop the optimization and take the result as my estimate.
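A minimal sketch of this attempt (assuming $f$ is a PyTorch module implementing the linear map, and that `N` and `y` are already defined; the optimizer settings are illustrative):

```python
import torch

x = torch.randn(N, requires_grad=True)   # randomly initialized x
opt = torch.optim.Adam([x], lr=1e-2)

for step in range(10_000):
    opt.zero_grad()
    loss = (f(x) - y).pow(2).sum()       # L = ||f(x) - y||_2^2
    loss.backward()
    opt.step()
    if loss.item() < 1e-12:              # stop when L is numerically zero
        break

x_hat = x.detach()                       # some solution of Ax = y, not necessarily A^+ y
```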
However, since $0\ll M\ll N$, there may exist $\mathbf{x}_1\ne \mathbf{x}_2$ with $\mathbf{Ax}_1=\mathbf{Ax}_2=\mathbf{y}$. Therefore, this method is not guaranteed to find the particular solution $\mathbf{z}=\mathbf{A}^+\mathbf{y}$ that I want (among all solutions of $\mathbf{Ax}=\mathbf{y}$, $\mathbf{z}$ is the one with minimum Euclidean norm).
Does there exist an efficient way to achieve this?
There are two possible statements (only one of which is true):
The answer is "Yes". There exists a way to efficiently calculate $\mathbf{z}=\mathbf{A}^+\mathbf{y}$ from given fixed $f$ and $\mathbf{y}$, without accessing the explicit forms of $\mathbf{A}$ or $\mathbf{A}^+$. In other words, there are properties of $\mathbf{z}=\mathbf{A}^+\mathbf{y}$ that can be exploited, but I have not yet found them.
The answer is "No". To get $\mathbf{z}=\mathbf{A}^+\mathbf{y}$, I must obtain the explicit form of $\mathbf{A}^+$ and then calculate $\mathbf{A}^+\mathbf{y}$. There is no efficient indirect way.
After a long struggle, I still have no idea about this problem, and I am now inclined to believe the second statement.
Any solutions, suggestions, or discussion of this problem would be much appreciated.
I am still searching, trying and thinking ...
The Wikipedia article on Moore-Penrose inverses includes the following formula, valid if $\mathbf A$ has linearly independent rows: $$\mathbf A^+ = \mathbf A^T (\mathbf A \mathbf A^T)^{-1}$$ where I've replaced their $\mathbf A^*$ with $\mathbf A^T$ because your matrix entries are real-valued.
You said we can assume $\mathbf A$ has full rank, so this formula is valid in your case. (There's also a separate formula that instead assumes $\mathbf A$ has l.i. columns, which would obviously be false since you have $M \ll N$.)
You can use this formula to compute $\mathbf z = \mathbf A^+ \mathbf y$ without ever building $\mathbf A^+$. In particular: (1) solve the $M \times M$ linear system $(\mathbf A \mathbf A^T)\,\mathbf w = \mathbf y$ for $\mathbf w$; (2) compute $\mathbf z = \mathbf A^T \mathbf w$.
Step (2) is easy if your black box allows you some way to efficiently compute $\mathbf A^T \mathbf v$ for an input vector $\mathbf v \in \mathbb{R}^M$.
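If $f$ is implemented in an autodiff framework such as PyTorch (as in your question), this is already available: for a linear map, the vector-Jacobian product $\nabla_{\mathbf x}\langle f(\mathbf x), \mathbf v\rangle$ equals $\mathbf A^T \mathbf v$ at any point $\mathbf x$. A sketch (the helper name `At_matvec` is mine, and shapes are illustrative):

```python
import torch

def At_matvec(f, v, N):
    """Compute A^T v via reverse-mode autodiff.

    Works because f is exactly linear, so its Jacobian is A everywhere
    and the vector-Jacobian product at any x (here x = 0) is A^T v."""
    x = torch.zeros(N, requires_grad=True)
    (At_v,) = torch.autograd.grad(f(x), x, grad_outputs=v)
    return At_v
```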
Step (1) looks hard, but it is actually a well-studied problem; see the Wikipedia article on "matrix-free methods", which is exactly about finding approximate solutions to linear systems when the matrix is only accessible as a black box. These methods mostly come down to multiplying by the system's matrix many times. In your case, to multiply by $\mathbf A \mathbf A^T$ you should not build the full matrix; instead, take a vector $\mathbf u_0$, compute $\mathbf u_1 = \mathbf A^T \mathbf u_0$, then compute $\mathbf u_2 = \mathbf A \mathbf u_1$. The step $\mathbf u_1 \mapsto \mathbf u_2$ is just an application of your black-box function $f$. The step $\mathbf u_0 \mapsto \mathbf u_1$ again relies on having a relatively efficient method for applying $\mathbf v \mapsto \mathbf A^T \mathbf v$.
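Since $\mathbf A$ has full row rank, $\mathbf A \mathbf A^T$ is symmetric positive definite, so conjugate gradients is a natural matrix-free solver for step (1). A minimal sketch (the helper name `cg_solve` is mine):

```python
import torch

def cg_solve(matvec, y, iters=500, tol=1e-10):
    """Conjugate gradients for the SPD system (A A^T) w = y.

    Touches the matrix only through matvec(v) = A A^T v; the matrix
    itself is never formed."""
    w = torch.zeros_like(y)
    r = y.clone()                 # residual y - (A A^T) w, with w = 0
    p = r.clone()
    rs = r.dot(r)
    for _ in range(iters):
        Ap = matvec(p)
        alpha = rs / p.dot(Ap)
        w = w + alpha * p
        r = r - alpha * Ap
        rs_new = r.dot(r)
        if rs_new.sqrt() < tol:   # converged: residual is small enough
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return w
```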
In summary, you should look at the details of how you're computing $f$ and see if you can set up a similar black-box way to efficiently compute $\mathbf A^T \mathbf v$ for input vectors $\mathbf v \in \mathbb{R}^M$. If you can, then the above strategy lets you multiply by $\mathbf A^+$ far more efficiently than fully building and storing $\mathbf A^+$.
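Putting the pieces together, still under the assumption that $f$ is PyTorch-differentiable and using the hypothetical helpers sketched above:

```python
# z = A^+ y = A^T (A A^T)^{-1} y, computed entirely matrix-free.
AAt = lambda v: f(At_matvec(f, v, N))   # v -> A A^T v: one VJP + one forward pass
w = cg_solve(AAt, y)                    # step (1): solve (A A^T) w = y
z = At_matvec(f, w, N)                  # step (2): z = A^T w
```

Each conjugate-gradient iteration then costs one forward pass of $f$ plus one backward pass, and nothing of size $M \times N$ is ever stored.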