Given a vector $z \in \mathbb{R}^n$ and $k < n$, finding the best $k$-sparse approximation to $z$ in terms of the Euclidean distance means solving $$\min_{\{x \in \mathbb{R}^n : ||x||_0 \le k\}} ||z - x||_2.$$ This is easy to do: take $x$ to agree with $z$ on the $k$ components that are largest in absolute value, and set every other component to zero.
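For concreteness, here is a minimal NumPy sketch of this projection (the function name `best_k_sparse` is mine):

```python
import numpy as np

def best_k_sparse(z, k):
    """Euclidean projection of z onto {x : ||x||_0 <= k}:
    keep the k largest-magnitude entries of z, zero out the rest."""
    x = np.zeros_like(z)
    idx = np.argsort(np.abs(z))[-k:]  # indices of the k largest |z_i|
    x[idx] = z[idx]
    return x

z = np.array([0.3, -2.0, 0.1, 1.5, -0.2])
print(best_k_sparse(z, 2))  # keeps -2.0 and 1.5, zeros everywhere else
```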
I am now wondering what happens if we slightly modify the question to allow for other metrics on $\mathbb{R}^n$. For example, what happens if we instead try to solve $$\min_{\{x \in \mathbb{R}^n : ||x||_0 \le k\}} (x-z)^T A (x-z)$$ for a symmetric positive definite matrix $A$? I guess this is much harder, but are there good algorithms for it?
Since $A$ is symmetric positive definite, we can write $A = S^2$, where $S$ is a (symmetric positive definite) square root of $A$. Then $(x-z)^\top A (x-z) = \Vert S(x-z) \Vert^2$, so, introducing $y := Sz$, your problem can be rewritten as $$\min_{\{x \in \mathbb{R}^n : ||x||_0 \le k\}} \Vert Sx - y\Vert^2.$$
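Here is a quick NumPy check of this change of variables, computing the symmetric square root from an eigendecomposition (a sketch; the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # symmetric positive definite

# Square root via A = V diag(w) V^T, so S = V diag(sqrt(w)) V^T
w, V = np.linalg.eigh(A)
S = V @ np.diag(np.sqrt(w)) @ V.T

z = rng.standard_normal(n)
x = rng.standard_normal(n)
y = S @ z

lhs = (x - z) @ A @ (x - z)          # (x - z)^T A (x - z)
rhs = np.linalg.norm(S @ x - y)**2   # ||Sx - y||^2
print(np.isclose(lhs, rhs))          # True
```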
This is quite a well-known problem, which I think is known as sparse regression; it is related to compressed sensing, a research area which was popular around the 2000s. There is a huge literature on this. I will give a couple of pointers regarding your question about how to solve it:
One approach is to attack this nonconvex problem directly. You can try, for instance, a projected-gradient method (known in this setting as iterative hard thresholding): $$x^{t+1} = P_k(x^t - \lambda S^\top(S x^t - y)),$$ where $\lambda < 2 / \Vert A \Vert$ and $P_k$ is the projection onto $\{x \in \mathbb{R}^n : ||x||_0 \le k\}$, which is quite easy to compute (as you wrote yourself, it sets the $n-k$ smallest coefficients in absolute value to zero). The sequence $(x^t)_{t \in \mathbb{N}}$ is guaranteed to converge, because the problem is semi-algebraic (see the arguments in Example 5.4 of *Convergence of descent methods for semi-algebraic and tame problems* by Attouch, Bolte and Svaiter). But the limit is not guaranteed to be a solution of the problem, because the problem is nonconvex.
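A minimal NumPy sketch of this iteration, using the fixed step $\lambda = 1/\Vert A \Vert$ (which satisfies the bound above); the function name and iteration count are mine:

```python
import numpy as np

def iht(S, y, k, n_iter=500):
    """Projected gradient / iterative hard thresholding for
    min ||Sx - y||^2  s.t.  ||x||_0 <= k.
    A sketch, not a certified solver: the limit point need not be
    a global minimizer, since the problem is nonconvex."""
    n = S.shape[1]
    step = 1.0 / np.linalg.norm(S.T @ S, 2)  # 1/||A||, below the 2/||A|| bound
    x = np.zeros(n)
    for _ in range(n_iter):
        g = S.T @ (S @ x - y)                # gradient of (1/2)||Sx - y||^2
        x = x - step * g
        # projection P_k: zero out the n-k smallest entries in absolute value
        small = np.argsort(np.abs(x))[: n - k]
        x[small] = 0.0
    return x
```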
Another approach, widely used in statistics and signal processing, is to consider a convex surrogate of the problem, known as the LASSO: $$\min_{\{x \in \mathbb{R}^n : ||x||_1 \le \tau\}} \Vert Sx - y\Vert^2.$$ Replacing $\Vert \cdot \Vert_0$ with $\Vert \cdot \Vert_1$ makes the problem convex; note that the radius $\tau$ is now a tuning parameter to be adjusted, and no longer coincides with the target sparsity $k$.
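In practice one often solves the penalized form $\min_x \tfrac{1}{2}\Vert Sx - y\Vert^2 + \mu \Vert x \Vert_1$, which is equivalent to the constrained form for a suitable $\mu$. Here is a minimal NumPy sketch of the classical proximal-gradient method (ISTA) for it; the function name and the choice of $\mu$ are mine:

```python
import numpy as np

def lasso_ista(S, y, mu, n_iter=1000):
    """Proximal gradient (ISTA) for the penalized LASSO
    min_x (1/2)||Sx - y||^2 + mu * ||x||_1."""
    n = S.shape[1]
    step = 1.0 / np.linalg.norm(S.T @ S, 2)  # 1/L, L = Lipschitz const. of the gradient
    x = np.zeros(n)
    for _ in range(n_iter):
        g = S.T @ (S @ x - y)                # gradient of the smooth part
        v = x - step * g
        # soft-thresholding: proximal operator of (mu*step)*||.||_1
        x = np.sign(v) * np.maximum(np.abs(v) - mu * step, 0.0)
    return x
```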
One last note, in case you start to investigate the topic further with these keywords. The problem you consider is equivalent to $$\min_{\{x \in \mathbb{R}^n : \Vert Sx - y\Vert^2 \le \varepsilon\}} \Vert x \Vert_0$$ and to $$\min_{x \in \mathbb{R}^n} \mu \Vert x \Vert_0 + \Vert Sx - y\Vert^2$$ for suitable choices of $\varepsilon$ and $\mu$ depending on $k$. Depending on what you read, one form or the other will be discussed.