I'm sure I'm missing something here but it's honestly giving me a headache.
First of all, by the Singular Value Decomposition theorem I can write any $m \times n$ matrix $A$ as: $$A = \sum_{i=1}^k\sigma_i u_i v_i^T = U\Sigma V^T$$ where $k$ is the rank of $A$, and $\Sigma$ is a diagonal matrix whose entries are the singular values of $A$ in descending order.
Now, I can take $\tilde{k}<k$ to get the representation: $$A_{\tilde{k}}= \sum_{i=1}^{\tilde{k}}\sigma_i u_i v_i^T $$ which of course has fewer terms (I'm cutting summands off the first equation). This is called the best rank-$\tilde{k}$ approximation (the Eckart–Young theorem).
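As a sanity check, the truncation above can be computed directly with NumPy; the matrix `A` and the cutoff `k_tilde` below are made-up illustrative values, not anything from a specific application:

```python
import numpy as np

# Build a rank-k_tilde approximation of a random matrix from its SVD.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))

# With full_matrices=False, s holds the singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k_tilde = 3
A_k = U[:, :k_tilde] @ np.diag(s[:k_tilde]) @ Vt[:k_tilde, :]

# Eckart-Young: the Frobenius error of the truncation equals the
# root-sum-of-squares of the discarded singular values.
err = np.linalg.norm(A - A_k, "fro")
opt = np.sqrt(np.sum(s[k_tilde:] ** 2))
print(np.isclose(err, opt))  # True
```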
All in all, I end up approximating $A$ by another matrix built from fewer terms.
Now let's put this in practice: I have a $784\times 784$ matrix. Suppose its rank is $784$ as well. Then of course the matrix has $784 \cdot 784 = 614{,}656$ elements. Now let's make a rank-$700$ approximation. I've seen that the SVD approximation I just explained reduces the number of parameters to $\tilde{k}(m+n)$, which would give $700 \cdot (784+784) = 1{,}097{,}600$ parameters.
How can I end up with more parameters if the $A_{\tilde{k}}$ approximation literally cuts terms off the representation of $A$? I get that for very small values of $\tilde{k}$ it will have fewer parameters, but how can it have more parameters with $\tilde{k}=700$, when it's cutting $84$ terms from the representation? It should be impossible, right?
You have an approximation that uses a different kind of "encoding". This encoding is very inefficient if $\tilde{k}$ is large but can be very efficient if $\tilde{k}$ is small. Yes, you leave out some summands by discarding some singular values, but with the standard encoding you are storing the result of the sum, not the summands themselves.
It's like saving a single $6$ instead of $1+2+3$. But if you use those summands much more often, it can be efficient to save only the summands. Concretely, the truncated SVD stores $\tilde{k}(m+n)$ numbers while the plain matrix stores $mn$, so the SVD encoding only pays off when $\tilde{k} < \frac{mn}{m+n}$; for your $784\times 784$ matrix that means $\tilde{k} < 392$.
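The break-even point is easy to check directly. A quick sketch using the numbers from the question (and its formula $\tilde{k}(m+n)$, which ignores the $\tilde{k}$ singular values themselves):

```python
# Storage comparison: a 784x784 full-rank matrix stored entry by entry
# versus its truncated-SVD encoding.
m = n = 784

def dense_params(m, n):
    return m * n  # store the summed-up matrix directly

def svd_params(k, m, n):
    return k * (m + n)  # store u_1..u_k (m numbers each) and v_1..v_k (n each)

print(dense_params(m, n))      # 614656
print(svd_params(700, m, n))   # 1097600 -> more than the dense matrix
print(svd_params(100, m, n))   # 156800  -> much less
# Break-even: k*(m+n) < m*n exactly when k < m*n/(m+n)
print(m * n / (m + n))         # 392.0
```

So a rank-$700$ truncation removes terms from the sum, but the encoding it forces you into is more expensive than just storing the $614{,}656$ entries of the result.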