Let's say we have N customers that supply a stream of requests, but each customer i supplies different number of requests per minute - $R_i$. All requests are identical in terms of the amount of processing they require.
On the other hand, we have K servers that process those requests. The servers are not identical. Each server has different number of CPU cores and clock frequencies. To make it simple, each server j has processing capacity $C_j$. After certain point, the more requests the server gets, the slower they will be processed.
We can't dynamically assign each new individual request to a different server, but can periodically re-assign customers to servers, and that assignment is a challenge.
I am trying to find possible algorithms for distributing customers between servers to ensure best overall throughput in these conditions, if it's even possible, or at least correct terminology to translate this into mathematical problem.
This is a variant on the assignment problem (https://en.wikipedia.org/wiki/Assignment_problem) called job shop scheduling (https://en.wikipedia.org/wiki/Job_shop_scheduling). It's a very annoying and complicated to model problem, that can take many different forms depending on the exact metric you want to minimize. Ie do you want to minimize total processing time, minimize the maximum delay, minimize total delay, etc.
There's a bunch of things you haven't specified so it's not quite ready for modeling yet. And to be honest, it is probably not a task some random at math.SE should do for you. But there's a good amount of literature describing this problem. The general approach is to split up the time horizon into $K$ parts and create binary variables $x_{ijk}$ that are equal to one if machine $i$ will process customer $j$ at time $k$. In this case it sounds as if you want to have a time interval be equal to one minute, and impose the constraint $$ \forall i,k: \sum_j x_{ijk} \leq C_j,$$
i.e. processor $j$ can only process $C_j$ customers each minute. Note that this is an NP-complete problem. I think the best way to solve it is by linear relaxation utilizing some advanced decomposition techniques (Dantzig-Wolfe), since the resulting linear program will be quite huge.
There is lots of literature, for example http://link.springer.com/chapter/10.1007%2F0-387-25486-2_10 but note that this is a rather sticky problem which requires some acquaintance with convex analysis, linear programming duality and some other topics.
Good luck!