Guillemin and Pollack's Differential Topology Page 164:
$U \subset \mathbb{R}^k$ and $V \subset \mathbb{R}^l$ be open subsets. Let $f: V \to U$ to smooth. Use $x_1, \dots, x_k$ for the standard coordinate functions on $\mathbb{R}^k$ and $y_1, \dots, y_l$ on $\mathbb{R}^l$. Write $f = (f_1, \dots, f_k)$, each $f_i$ being a smooth function on $V$. The derivative $df_y$ at point $y \in V$ is represented by the matrix $$\frac{\partial f_i}{\partial y_j}(y),$$ and its transpose map $df_y^*$ is represented by the transpose matrix. Consequently,
$$f^*dx_i = \sum_{j=1}^l \frac{\partial f_i}{\partial y_j} dy_j = df_i.$$
My solution:
For a smooth function $f$, $df$ is linear. And $df^*$ is the adjoint of the map $df_*$.
Consider a tangent vector $Y \in T_yU$ such that $$Y = \sum_{j = 1}^l Y^j\frac{\partial}{\partial y^j}.$$
Then we have $$(f^*dx_i)(Y) = (f^*dx_i)\sum_{j = 1}^l Y^j\frac{\partial}{\partial y^j}.$$
$f^*$ is linear, $dx_i$ is linear, and the composition of linear function is linear. Hence $f^*dx_i$ is linear. So $$(f^*dx_i)\left(Y^1\frac{\partial}{\partial y^1} + \cdots + Y^l\frac{\partial}{\partial y^l}\right) = (f^*dx_i)\left(Y^1\frac{\partial}{\partial y^1}\right) + \cdots + (f^*dx_i)\left(Y^l\frac{\partial}{\partial y^l}\right).$$
By Commutativity of $Y^j$.: $$ Y^1(f^*dx_i)\left(\frac{\partial}{\partial y^1}\right) + \cdots + Y^l(f^*dx_i)\left(\frac{\partial}{\partial y^l}\right) = \sum_{j=1}^l Y^j (f^*dx_i)\left(\frac{\partial}{\partial y^j}\right).$$
Use the definition $$f^*\omega = \omega \circ f_*.$$
We have $$\sum_{j=1}^l Y^j (f^*dx_i)(\frac{\partial}{\partial y^j}) = \sum_{j=1}^l Y^j dx_i(f_*(\frac{\partial}{\partial y^j})).$$
Because $f$ maps from $V \subset \mathbb{R}^l$ to $U \subset \mathbb{R}^k$, it can be written as $$f = (f_1, f_2, . . . f_k),$$ with each $f_i$ being a function of the $y_j \in V \subset \mathbb{R}^l$. Consider and $g:U \to R$ sufficiently differentiable, then the vector field $f_*(\frac{\partial}{\partial y^j})$ on $U$ may be applied to $g$:
$$f_*(\frac{\partial}{\partial y^j})[g(x_1, x_2, . . . x_k)] = \frac{\partial}{\partial y^j}(g(f_1(y_1, . . . y_l), f_2(y_1, . . . y_l), . . . f_k(y_1, y_2, . . . , y_l))),$$ according to the definition $$(F_*X)(f) = X(f \circ F).$$
Hence, $$\frac{\partial}{\partial y^j}(g(f_1(y_1, . . . y_l), f_2(y_1, . . . y_l), . . . f_k(y_1, y_2, . . . , y_l))) = \frac{\partial}{\partial y^j}(g(x_1, \dots, x_k)).$$
Following Chain rule, $$\frac{\partial}{\partial y^j}(g(x_1, \dots, x_k)) = \frac{\partial g}{\partial x^1} \frac{\partial x^1}{\partial y^j} + \cdots + \frac{\partial g}{\partial x^k} \frac{\partial x^k}{\partial y^j} = \sum_{n = 1}^k \frac{\partial g}{\partial x_n} \frac{\partial f_n}{\partial y_j}.$$
That is $$f_*\left(\frac{\partial}{\partial y^j}\right)[g(x_1, x_2, . . . x_k)] = \sum_{n = 1}^k \frac{\partial g}{\partial x_n} \frac{\partial f_n}{\partial y_j}.$$
By commutativity of first-order derivative, $$\sum_{n = 1}^k \frac{\partial g}{\partial x_n} \frac{\partial f_n}{\partial y_j} = \sum_{n = 1}^k \frac{\partial f_n}{\partial y_j} \frac{\partial g}{\partial x_n} .$$
Thus we see that the vector field $f_*\left(\frac{\partial}{\partial y^j}\right)$ satisfies $$f_*\left(\frac{\partial}{\partial y^j}\right) = \sum_{n = 1}^k \frac{\partial f_n}{\partial y_j}\frac{\partial}{\partial x_n}.$$
So $$\sum_{j = 1}^lY^jdx_i(f_*(\frac{\partial}{\partial y^j})) = \sum_{j = 1}^lY^jdx_i\left(\sum_{n = 1}^k \frac{\partial f_n}{\partial y_j}\frac{\partial}{\partial x_n}\right).$$
As before, we use the fact that $dx_i$ is linear, and first-order derivative is commutative, $$\sum_{j = 1}^lY^jdx_i\left(\sum_{n = 1}^k \frac{\partial f_n}{\partial y_j}\frac{\partial}{\partial x_n}\right) = \sum_{j = 1}^lY^j\left(\sum_{n = 1}^k dx_i \frac{\partial}{\partial x_n}\frac{\partial f_n}{\partial y_j}\right).$$
Using $dx_i(\frac{\partial}{\partial x_n}) = \delta_{in}$, $$\sum_{j = 1}^lY^j\left(\sum_{n = 1}^k dx_i \frac{\partial}{\partial x_n}\frac{\partial f_n}{\partial y_j}\right) = \sum_{j = 1}^lY^j\left(\frac{\partial f_i}{\partial y_j}\right) = \sum_{j = 1}^l (dy_j Y)\left(\frac{\partial f_i}{\partial y_j}\right).$$
So, $$\sum_{j = 1}^l (dy_j Y)\left(\frac{\partial f_i}{\partial y_j}\right) =\sum_{j = 1}^l\left(\frac{\partial f_i}{\partial y_j}\right) (dy_j Y).$$
According to Show that $d\phi = \sum \frac{\partial \phi}{\partial x_i}dx_i.$ We have $$df = \sum \frac{\partial f}{\partial y_i}dy_i.$$
So $$\sum_{j = 1}^l\left(\frac{\partial f_i}{\partial y_j}\right) (dy_j Y) =df_i (Y).$$
Since this holds for any $Y \in T_yV$, we have shown that $$f^*dx_i = \sum_{j = 1}^l \frac{\partial f_i}{\partial y_j}dy_j = df_i$$
I'd like to point out, before trying to answer this question, that your definitions confuse me a little. For instance, what is $I(y)$? And in your expression for $\omega$,
$\omega = \sum_{1 \le i_1 < . . . i_k \le n}I(y)dx_i$,
why is there apparently a multi-index of some sort on the $\Sigma$ symbol which doesn't seem (to me at least) to occur in the summand $I(y)dx_i$? Well, perhaps this stuff is explained in Guillemin and Pollack, which I haven't looked at in quite awhile, fine book though it be. Not trying to be critical here, merely seeking clarification.
Having said these things, let's try to prove that
$f^*dx_i = \sum_{j = 1}^l\frac{\partial f_i}{\partial y_j}dy_j = df_i$.
I'm going to try to do this the way I learned it, mostly from general principles; as I indicated above, I don't have a copy of Guillemin and Pollack in front of me, so if I seem like I'm winging it, bear with me . . .
The first thing you need to know is that $f^*$ is the adjoint of the map $f_*$, in the sense usually used in linear algebra: if $T:V \to W$ is a linear map between vector spaces $V$ and $W$, then for any $\sigma \in W^*$ we define the linear functional $T^*\sigma \in V^*$ by the formula $T^*\sigma(v) = \sigma(Tv)$ for vectors $v \in V$. Thus it is easily seen that $T^*:W^* \to V^*$ is also a linear map. This idea of course applies pointwise with respect to $U$ and $V$, i.e. fiberwise with respect to the tangent and cotangent spaces of $U$ and $V$. Now let's look at $f^*dx_i$. For any tangent vector $Y \in T_yU$, we have
$(f^*dx_i)(Y) = dx_i(f_*(Y))$,
and with
$Y = \sum_{j = 1}^l Y^j\frac{\partial}{\partial y^j}$
we obtain
$(f^*dx_i)(Y) = (f^*dx_i)(\sum_{j = 1}^l Y^j\frac{\partial}{\partial y^j})$,
and by linearity of everything this yields
$(f^*dx_i)(Y) = \sum_{j = 1}^lY^j (f^*dx_i)(\frac{\partial}{\partial y^j}) = \sum_{j = 1}^lY^jdx_i(f_*(\frac{\partial}{\partial y^j}))$.
We scrutinize $f_*(\frac{\partial}{\partial y^j})$. With $f = (f_1, f_2, . . . f_k)$, with each $f_p$, $1 \le p \le k$ being a function of the $y_q$, $1 \le q \le l$, and $g:U \to R$ and sufficiently differentiable, the vector field $f_*(\frac{\partial}{\partial y^j})$ on $U$ may be applied to $g$:
$f_*(\frac{\partial}{\partial y^j})[g(x_1, x_2, . . . x_k)] = \frac{\partial}{\partial y^j}(g(f_1(y_1, . . . y_l), f_2(y_1, . . . y_l), . . . f_k(y_1, y_2, . . . , y_l)))$
$= \sum_{n = 1}^k \frac{\partial g}{\partial x_n} \frac{\partial f_n}{\partial y_j}$,
this last equality following from the fact that $x_p = f_p(y_1, y_2, . . . , y_l)$ and the chain rule. Thus we see that the vector field $f_*(\frac{\partial}{\partial y^j})$ satisfies
$f_*(\frac{\partial}{\partial y^j}) = \sum_{n = 1}^k \frac{\partial f_n}{\partial y_j}\frac{\partial}{\partial x_n} $,
and if this is inserted into our previous expression for $(f^*dx_i)(Y)$,
$(f^*dx_i)(Y) = \sum_{j = 1}^lY^jdx_i(f_*(\frac{\partial}{\partial y^j}))$,
it follows that
$(f^*dx_i)(Y) = \sum_{j = 1}^l \sum_{n = 1}^k Y^j dx_i(\frac{\partial f_n}{\partial y_j}\frac{\partial}{\partial x_n})$,
and using $dx_i(\frac{\partial}{\partial x_n}) = \delta_{in}$,
$(f^*dx_i)(Y) = \sum_{j = 1}^l Y^j \frac{\partial f_i}{\partial y_j} = \sum_{j = 1}^l \frac{\partial f_i}{\partial y_j}dy_j(Y) = df_i(Y)$,
since we have $Y = \sum_{j = 1}^l Y^l \frac{\partial}{\partial y_l}$ and $dy_j(Y) = Y^j$. Since this holds for any $Y \in T_yV$, we have shown that
$f^*dx_i = \sum_{j = 1}^l \frac{\partial f_i}{\partial y_j}dy_j = df_i$,
as per request. QED.
Whew! Too many indices and subscripts to keep track of!
Now, as I recall, establishing this formula on $1-$forms allows it to be extended in the usual manner to all of $\Lambda(T*U)$, i.e., all form (fields) by careful use of the definitions of the $\wedge$ product and a lot of maneuvering of matrices and indices.
Gotta run, my night job beckons.