Most fractals can be seen as the attractor of a given set of affine transformations $\{T_1,\cdots,T_N \}$. There are different ways to generate a fractal from this information; the two most common methods are the Deterministic IFS algorithm and the Random IFS algorithm.
The Random IFS Algorithm is a generalization of the Chaos game and essentially works as follows:
Determine the affine transformations $T_1,\cdots,T_N$ that characterize the fractal, and given an initial point $P_0$, iteratively compute $$ P_n= T_i(P_{n-1}), $$ where $i$ is randomly and uniformly chosen among the set $\{1,\cdots,N \}$.
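The iteration above can be sketched in a few lines of Python. The three maps used here are the classic Sierpinski-triangle IFS, chosen purely as an illustration; the source does not fix a particular fractal.

```python
import random

# Example IFS: the three affine maps of the Sierpinski triangle.
# Each map halves the point's coordinates and shifts toward one
# of the triangle's corners.
T = [
    lambda p: (0.5 * p[0],        0.5 * p[1]),
    lambda p: (0.5 * p[0] + 0.5,  0.5 * p[1]),
    lambda p: (0.5 * p[0] + 0.25, 0.5 * p[1] + 0.5),
]

def random_ifs(n_points, p0=(0.0, 0.0), seed=0):
    """Random IFS algorithm: iterate P_n = T_i(P_{n-1}) with i uniform."""
    rng = random.Random(seed)
    p = p0
    points = []
    for _ in range(n_points):
        p = rng.choice(T)(p)   # pick T_i uniformly at random
        points.append(p)
    return points

pts = random_ifs(10_000)
```

Plotting `pts` (e.g. with a scatter plot) reveals the attractor; the first few iterates are usually discarded in practice, since they may lie off the fractal.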
This is quite a slow process, but it can be improved by adjusting the probabilities $p(T_i)$ of choosing transformation $T_i$ at each iteration. More specifically, the following choice of probabilities is known to speed up convergence considerably: $$ p(T_i) = \frac{\det(M_{T_i})}{\sum_{j=1}^N\det(M_{T_j})} \tag{1} $$
Here, $M_{T_i}$ denotes the matrix of transformation $T_i$, and $\det$ denotes the determinant. Roughly speaking, you want the probability $p(T_i)$ to be the fraction of the fractal $F$ occupied by $T_i(F)$.
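As a concrete instance of formula $(1)$, here is a small sketch. The three matrices are made up for illustration (two half-scale copies and one smaller copy), and absolute values are used so that an orientation-reversing map does not get a negative weight:

```python
import numpy as np

# Linear parts M_{T_i} of three hypothetical affine maps:
# two half-scale copies and one 0.3-scale copy.
M = [
    np.array([[0.5, 0.0], [0.0, 0.5]]),
    np.array([[0.5, 0.0], [0.0, 0.5]]),
    np.array([[0.3, 0.0], [0.0, 0.3]]),
]

dets = [abs(np.linalg.det(m)) for m in M]   # |det| = area contraction factor
total = sum(dets)                           # here 0.25 + 0.25 + 0.09 = 0.59
probs = [d / total for d in dets]           # formula (1)
```

The smaller copy receives probability $0.09/0.59 \approx 0.15$, so it gets proportionally fewer points, matching the smaller area it occupies.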
My question: Why is the ratio $(1)$ equal to such a fraction? Why does $\det(M_{T_i})$ represent the area contraction factor of affine transformation $T_i$?
The fractal is the union of $N$ distorted (and smaller) copies of itself. The random choice determines in which of the smaller copies the next point will be plotted. By choosing the probabilities proportional to the determinant of $M_{T_i}$, they become proportional to the area occupied by the corresponding copy, so the points are distributed among the copies according to area. The uniform random choice, on the other hand, places the same number of points in each smaller copy, which results in a high point density in a small copy vs. a low density in a large copy.
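A sketch of this weighted chaos game, using three hypothetical maps (two large copies and one small one) with weights proportional to $|\det M_{T_i}|$:

```python
import random

# Each entry: (affine map, |det| of its linear part).
# The maps are made-up examples: two half-scale copies and one
# 0.3-scale copy of the unit square.
T = [
    (lambda p: (0.5 * p[0],       0.5 * p[1]),       0.25),  # |det| = 0.5 * 0.5
    (lambda p: (0.5 * p[0] + 0.5, 0.5 * p[1]),       0.25),
    (lambda p: (0.3 * p[0] + 0.2, 0.3 * p[1] + 0.7), 0.09),  # |det| = 0.3 * 0.3
]
maps    = [t for t, _ in T]
weights = [w for _, w in T]

def weighted_ifs(n_points, seed=0):
    """Chaos game with T_i chosen proportionally to its area factor."""
    rng = random.Random(seed)
    p = (0.0, 0.0)
    points = []
    for _ in range(n_points):
        p = rng.choices(maps, weights=weights)[0](p)  # area-weighted choice
        points.append(p)
    return points

pts = weighted_ifs(5_000)
```

With these weights the small copy is visited about $0.09/0.59 \approx 15\%$ of the time rather than a third of the time, which evens out the point density.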
Note that due to overlapping among the copies, the formula $(1)$ may not be optimal after all - it is only a good heuristic. To see this, just add another transform $T_{N+1}=T_N$. Then $T_N$ will essentially be executed twice as often, even though the limit set is the same.
Also note that one may be interested in an invariant distribution instead of an invariant compact set - in that case, playing around with the probabilities in the game greatly changes the outcome!
There remains the question: why is $\det T$ (or preferably $|\det T|$) the volume scaling factor? Instead of considering the general case, note that this assertion is correct in a lot of basic cases: the identity, a shear, a reflection, a rotation (all with determinant $\pm1$), and also a scaling (where the determinant is $c^n$ for scaling by a factor $c$). All other linear transforms are a finite product of such transforms, and since both the determinant and the volume scaling factor are multiplicative under composition, the property holds for them as well.
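One can also check the claim numerically. The image of the unit square under a linear map $M$ is a parallelogram whose area should equal $|\det M|$; the sketch below estimates that area by Monte Carlo, using a shear-like matrix chosen arbitrarily for the example:

```python
import numpy as np

# An arbitrary example matrix; |det M| = 1*1 - 0.7*0.3 = 0.79.
M = np.array([[1.0, 0.7],
              [0.3, 1.0]])
Minv = np.linalg.inv(M)

# Bounding box of the image parallelogram: map the square's corners.
corners = M @ np.array([[0, 1, 0, 1],
                        [0, 0, 1, 1]], dtype=float)
lo, hi = corners.min(axis=1), corners.max(axis=1)

# Sample points in the box; a point q lies in M([0,1]^2) exactly
# when its preimage Minv @ q lies in the unit square.
rng = np.random.default_rng(42)
n = 200_000
samples = rng.uniform(lo, hi, size=(n, 2)).T
pre = Minv @ samples
inside = np.all((pre >= 0) & (pre <= 1), axis=0)

box_area = np.prod(hi - lo)
estimate = inside.mean() * box_area   # Monte Carlo area of the image
```

The estimate lands close to $0.79 = |\det M|$, illustrating that the determinant is indeed the area scaling factor.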