On the wikipedia page for the discrete Fourier transform, the first sentence says:
In mathematics, the discrete Fourier transform (DFT) converts a finite list of equally spaced samples of a function into the list of coefficients of a finite combination of complex sinusoids, ordered by their frequencies, that has those same sample values.
But as I learned it, the Fourier matrix $\mathcal{F}$ takes as its inputs vectors of coefficients $(c_0, c_1, \dotsc, c_{n-1})$ and produces a vector $(f(0), f(2\pi/n), f(2\cdot 2\pi/n), \dotsc, f((n-1)\cdot 2\pi/n))$, where $$f(x) = c_0 + c_1 e^{ix} + c_2 e^{2ix} + \dotsb + c_{n-1}e^{(n-1)ix}.$$
So the above description seems to be describing what the inverse Fourier matrix does. Why is this?
It's little bit more than just convention and comes down to how you identify the characters of the underlying group. Of course once you make a choice, everything works out similarly.
In the case of DFT, the group is the finite cyclic group $\mathbb{Z}_n$. When talking about characters, i.e. group homomorphims from $\mathbb{Z}_n$ to the unit circle, you can go around the circle either counter-clockwise or clockwise. If the former, $+i$ appears in your formula. If the latter, $-i$.
Same thing applies to the Fourier transform on $\mathbb{R}$. Do you wind the line around the circle counter-clockwise or clockwise, i.e. $i$ or $-i$? Do you wind it with period $1$ or $2 \pi$?...etc.