Been stumped on this past paper question for a while, it's in the context of clustering and the next part is using single linkage bottom-up hierarchical clustering to form a dendrogram using your distance function.
My question is what exactly is a distance function in this context? I know about euclidean and Manhattan but how would you use it for this?

There is not going to be a formula for this distance (like for Manhattan, Euclidean, etc). The values must be based on the given table. At a minimum, we'd like to have:
Let $n(p,q)$ be the number of times that $p$ and $q$ are bought together. My first attempt would be: $$d(p,q)=\begin{cases} 0\quad &\text{ if }p=q \\ 1/n(p,q)\quad &\text{ if } p\ne q\end{cases}$$
You can then do some computations to check whether those desirable properties hold (symmetry is ok, the triangle inequality is unlikely to hold).
More sophisticated distances are:
$$d_1(p,q)=\begin{cases} 0\quad &\text{ if }p=q \\ \sum_{r}n(p,r)/n(p,q)\quad &\text{ if } p\ne q\end{cases}$$ which can be interpreted as the reciprocal of conditional probability of buying $q$, having bought $p$. Unfortunately, this is not symmetric.
$$d_2(p,q)=\begin{cases} 0\quad &\text{ if }p=q \\ \sum_{r}(n(p,r)+n(q,r))/n(p,q)\quad &\text{ if } p\ne q\end{cases}$$ which is a symmetric version of $d_1$, but not as transparent.
To get the triangle inequality from anything like this would be unusual. But in applications this is not necessarily a problem.