Thanks to Godel's lemma it is possible to 'code' a great deal of finite combinatorics and number theory within mathematics. How exactly do we do this? In particular, how do I make statements about functions and unordered (finite) sets in the language of $P\!A$?
For finite sets, I have seen a convention that we code them as sequences in increasing order. For example, I code the set $\{x_0,\ldots,x_{k-1}\}$ by $z$ where $z$ codes the sequence $(x_0,\ldots,x_{k-1})$, if $x_0<\cdots<x_{k-1}$; that is, $\beta(z,i)=x_i$ for all $i<k$, where $\beta$ is Godel's beta function.
How is this then implemented into formulae, do I need a formula $\psi(t)\leftrightarrow \text{'$t$ codes a finite set'}$? For example, if I wanted to say that every finite set from $N$ has a least element: $$\forall n\big(\psi(n)\rightarrow \forall i\le n((n)_0\le(n)_i)\big),$$ where $(n)_i$ abbreviates $\beta(n,i)$.
As for functions, I code them with sets of ordered pairs (sequences of length two). $z$ codes a function from $X$ to $Y$ (where $X$ and $Y$ are finite sets of naturals) if $z$ is a finite set such that each element $x_i$ is an ordered pair where $(x_i)_0$ is in $X$ and $(x_i)_1$ is in $Y$. With my notation (mimicking my above attempt), this gets messy: $$(\text{$z$ codes a function $f\colon X\to Y$})\leftrightarrow\psi(z)\wedge\forall i<\mathrm{len}(z)\big[ \\\mathrm{len}((z)_i)=2 \\\wedge\exists j<\mathrm{len}(X)\big(((z)_i)_0=(X)_j\big) \\\wedge\exists j<\mathrm{len}(Y)\big(((z)_i)_1=(Y)_j\big) \\\wedge\big(\forall j,k<\mathrm{len}(z)(((z)_j)_0=((z)_k)_0\rightarrow((z)_j)_1=((z)_k)_1)\big)\big],$$ where $\mathrm{len}(z)$ is the length of the sequence coded by $z$.
Am I along the right track? Any sources would be appreciated, I cannot find any examples of formalizing statements in $P\!A$!
Yes, that's basically right. The details of Goedel coding (rather than Goedel's lemma - I'm not sure what lemma you refer to, but it's really the definitions here that are powerful) are dealt with in any good exposition of Goedel's theorem (e.g. Peter Smith's book), but they tend to use only what is needed - e.g. if they don't need to (say) use natural numbers to code finite Abelian groups, they won't show how this is done.
But you've hit on the right track. Basically:
We know how to code finite sets.
OK, great! Every other finite object we need in finite combinatorics/algebra/etc. is a finite set with structure, and these can always be viewed as a set of sets satisfying certain properties (you've only mentioned talking about sets, but we can also talk about sets of sets using the same idea, and sets of sets of sets, and so on). So no new ideas are needed here.
For example, let's look at Ramsey's theorem for pairs:
Here's how we code this up.
First, the set of edges of the complete graph on $k$ vertices is the set of two-element subsets of $\{1, 2, ..., k\}$.
And a two-coloring of a graph can be represented as a subset of the edges (you're colored red if you're in the subset, and blue otherwise).
So now we know how to say "For every two-coloring of the complete graph on $k$ vertices": this is just "For all sets of two-element sets of numbers $\le k$."
Given a coloring $c$, a homogeneous set of size $n$ is a subset $S$ of $\{1, 2, ..., k\}$ such that $\vert S\vert=n$ and for each $a, b, c, d\in S$ with $a\not=b$ and $c\not=d$, we have $\{a,b\}\in S\iff \{c, d\}\in S$.
There's nothing tricky here, so you know how to say "there is a homogeneous subset of size $n$."
And now we're done! Putting together all the pieces we get an expression in the language of arithmetic which codes the statement "For each $n$, there is some $k$ such that every two-coloring of the complete graph on $k$ vertices has a homogeneous set of size $n$."
Note that this was all entirely routine: the point is that once you know how to code finite sets, coding other finite objects presents no difficulty.
Note that in this example, we had a number of choices. E.g. we represented a coloring as a single set, while it would arguably be more natural (but also more tedious) to represent it as a partition of the vertex set. Different choices lead to different codings, but that's fine: we only ever need some coding which "behaves well" (that is, such that PA proves the important facts about it - e.g. in the case of Goedel's theorem, we need the fact that if PA proves $\varphi$, then PA proves the arithmetized statement "PA proves $\varphi$", usually written "$\Box_{PA}\varphi$").