Consider a bijective function $f$ from $\{0, 1\}^n$ to $\{0, 1\}^n$.
Now, sample a random $k$ to $1$ function $g$ (that is, from the set of all possible $k$ to $1$ functions from $\{0, 1\}^n$ to $\{0, 1\}^n$, pick one uniformly at random) and compose it with $f$. Repeat this procedure for $d$ rounds (that is, each time, sample a random $k$ to $1$ function independently and compose it with whatever function we constructed till the previous round.)
I am trying to lower bound the expected image size of the resultant function, where the expectation is taken over the randomness in choosing the functions.
My intuition is that the image size will be sufficiently high. I think one way to show this is to calculate the collision probability (the probability that two randomly picked elements in the image of the resultant function are the same) and show that it is sufficiently small.
Here is a rough analysis.
This problem can be restated as follows: Let $N:=2^n$, and suppose that $k$ divides $N$. Suppose that we start with an urn with $N$ red balls. Then, take an empty urn, and sample $k$ balls from the first urn without replacement. If any of these balls are red, we place a red ball in the second urn; otherwise, place a black ball in the second urn. Now sample $k$ more balls without replacement from the first urn, and do the same thing. Repeat this until we have taken $N/k$ samples of $k$ balls from the first urn and it becomes empty. Then add $N(1-1/k)$ black balls to the second urn and discard the first urn. Call all this one large step, and take $d$ large steps in all. The number of red balls left at the end behaves in the same way as the desired image size.
Since, according to the comments, $n$ is large (so $N$ is large) we can approximate sampling without replacement by sampling with replacement. If we do this, we get $d+1$ random variables, $X_0$, $\dots$, $X_d$, where $X_0=N$ and $X_{i+1}$ is the sum of $N/k$ independent random variables, each of which is 1 with probability $1-(1-X_i/N)^k$ and 0 otherwise; that is, $X_{i+1}$ is binomially distributed with parameters $N/k$ and $1-(1-X_i/N)^k$.
For large $N$ we can approximate this further by assuming that each $X_i$ will be close to its mean. In this case the final answer is $f(d, k) N/k$, where $f(d, k)$ is the result of iterating the function $g_k(z):=1-(1-z/k)^k$ $d$ times, starting at $k$. Numerically, $f(d, k)$ appears to behave as $\Theta(1/d)$; we have $1\le d f(d, k)\le 4$ for all $1\le d\le 1000$ and $2\le k\le 1000$.