I know the definition of minimal sufficient statistics is:
A sufficient statistic T(X) is called a minimal sufficient statistic if, for any other sufficient statistic T'(X), T(X) is a function of T'(X).
I want to know how to interpret this definition from a functional point of view. $T$ is "smaller" than $T'$ if $T$ is a function of $T'$. What does that mean?

The notion behind sufficiency is data compression.
A sufficient statistic is a statistic which, in some sense, contains all the useful information in the sample $X$ (for estimating the true parameter, of course).
The definition of a sufficient statistic is the following: the conditional distribution of the sample $X$ given $T(X)$ does not depend on $\theta$, which is saying, in more human language, that $T(X)$ is a summary of $X$. Immediately, we see that each sufficient statistic defines an equivalence relation on the sample space: $X \sim Y$ iff $T(X) = T(Y)$. Again, we're compressing data, so a set of samples which all yield the same information about the true parameter is compressed to a single equivalence class. For example, suppose we do $n$ iid Bernoulli($\theta$) trials; any set of samples in which each sample has the same sum may be regarded as a single sample for the purpose of estimating $\theta$.
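To see the Bernoulli example concretely, here is a small numeric check (my own sketch, not part of the original post): for $n$ iid Bernoulli($\theta$) trials with $T(X) = \sum_i X_i$, the conditional probability of any particular sequence $x$ given its sum $t$ works out to $1/\binom{n}{t}$, which is free of $\theta$.

```python
from itertools import product
from math import comb

# For each theta, verify that P(X = x | T = t) = P(X = x) / P(T = t)
# equals 1 / C(n, t) for every binary sequence x -- no theta anywhere.
n = 4
for theta in (0.2, 0.5, 0.9):
    for x in product([0, 1], repeat=n):
        t = sum(x)
        p_x = theta**t * (1 - theta)**(n - t)               # P(X = x)
        p_t = comb(n, t) * theta**t * (1 - theta)**(n - t)  # P(T = t)
        assert abs(p_x / p_t - 1 / comb(n, t)) < 1e-12

print("conditional distribution of X given T is free of theta")
```

The point of the loop over several values of $\theta$ is exactly the definition: the conditional distribution is the same regardless of which $\theta$ generated the data.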
We want maximal compression, which is a sensible way to understand minimal sufficiency. All that the definition of a minimal sufficient statistic $T$ says is: if $T'(X) = T'(Y)$, then $T(X) = T(Y)$. To define $T$ as a composition $g \circ T'$ for some function $g$ on the image of $T'$, we may have to define $g$ in a funny way, but the point is that it always exists: set $g(t') = T(x)$ for any $x$ with $T'(x) = t'$, and the implication above guarantees this is well-defined. So we can think of the equivalence classes induced by a "smaller" sufficient statistic as grouping and merging the equivalence classes induced by a "larger" one.
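Here is a tiny demonstration of that grouping-and-merging picture (my own illustration, with names of my choosing): for $n = 3$ Bernoulli trials, take $T'$ to be the whole sample (trivially sufficient, so its equivalence classes are singletons) and $T$ the sum. We build $g$ explicitly so that $T = g \circ T'$, and then list how the classes of $T$ merge those of $T'$.

```python
from itertools import product
from collections import defaultdict

n = 3
samples = list(product([0, 1], repeat=n))

def T_prime(x):   # 'larger' sufficient stat: the full sample, no compression
    return x

def T(x):         # 'smaller' (minimal) sufficient stat: the sum
    return sum(x)

# Define g on the image of T': g(t') = T(x) for any x with T'(x) = t'.
# Well-defined because T'(X) = T'(Y) implies T(X) = T(Y).
g = {T_prime(x): T(x) for x in samples}
assert all(T(x) == g[T_prime(x)] for x in samples)  # T = g o T'

# Each class of T is a union of (singleton) classes of T'.
classes = defaultdict(list)
for x in samples:
    classes[T(x)].append(x)
for t, members in sorted(classes.items()):
    print(t, members)
```

Running this prints four classes, of sizes 1, 3, 3, 1 (the sequences with sums 0, 1, 2, 3): the minimal sufficient statistic has merged the eight singleton classes of $T'$ into just four.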