The standard definitions of the discrete Fourier transform (DFT) and its inverse are as follows:
$$ a_k = \sigma_x \sum_{x=0}^{n-1} a_x \exp\left(-i 2 \pi \frac{x k}{n}\right) \qquad a_x = \sigma_k \sum_{k=0}^{n-1} a_k \exp\left(i 2 \pi \frac{x k}{n}\right) $$
where the $\sigma$'s are normalization factors, constrained by $\sigma_x \sigma_k = 1/n$. For both the transform and its inverse, the indices run from $0$ to $n-1$. Appealing to the Nyquist–Shannon sampling theorem, we say that the high frequencies alias, and so we interpret everything above the Nyquist frequency as a negative frequency. The summation index in the exponent thus effectively runs from $0$ to $\lfloor (n-1)/2 \rfloor$ and then from $-\lfloor n/2 \rfloor$ to $-1$. These values correspond directly to the underlying time and frequency grids: the first index of the coordinate grid is the origin, the first half of the grid is positive, and the second half is negative. There is enough symmetry in the two equations that this rationale applies both ways; whether it is the forward or backward transform, these equations expect the same ordering of their inputs and produce the same ordering in their outputs.
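As a quick illustration of this ordering (using NumPy's standard helper, not anything from the definitions above), fftfreq returns the frequency grid in exactly the zero-first, positive-then-negative order just described:

```python
import numpy as np

# NumPy's fftfreq helper returns the frequency grid in exactly the
# order described above: zero first, then the positive frequencies
# up to (just below) Nyquist, then the negative frequencies.
n = 8
freqs = np.fft.fftfreq(n, d=1)  # d is the sample spacing dt
print(freqs)
# -> [ 0.     0.125  0.25   0.375 -0.5   -0.375 -0.25  -0.125]
```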
Why is this the standard definition and implementation? Is there some reason this ordering was chosen instead of one whose inputs and outputs are arranged on a monotonic grid (negative time/frequency to positive)? Does it have something to do with the algorithms for the fast Fourier transform (FFT)?
It is easy to come up with equivalent definitions that order the arrays in a more intuitive monotonic order (just like the continuous Fourier transform), that do not require appealing to aliasing to explain the results, and that do not require jumping through fftshift/ifftshift hoops to display things in a human-readable manner.
For explicit examples, one could run the summation indices from $-\lfloor n/2 \rfloor$ to $\lfloor (n-1)/2 \rfloor$:
$$ a_k = \sigma_x \sum_{x=-\lfloor n/2 \rfloor}^{\lfloor (n-1)/2 \rfloor} a_x \exp\left(-i 2 \pi \frac{x k}{n}\right) \qquad a_x = \sigma_k \sum_{k=-\lfloor n/2 \rfloor}^{\lfloor (n-1)/2 \rfloor} a_k \exp\left(i 2 \pi \frac{x k}{n}\right) $$
or subtract $\lfloor n/2 \rfloor$ from the value used in the exponent:
$$ a_k = \sigma_x \sum_{x=0}^{n-1} a_x \exp\left(-i 2 \pi \frac{\left(x - \lfloor n/2 \rfloor\right) k}{n}\right) \qquad a_x = \sigma_k \sum_{k=0}^{n-1} a_k \exp\left(i 2 \pi \frac{x \left(k - \lfloor n/2 \rfloor\right)}{n}\right) $$
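For what it's worth, the first of these alternative definitions is numerically identical to the standard FFT bookended with shifts. A quick check (my own sketch, with $\sigma_x = 1$, forward transform only):

```python
import numpy as np

n = 8
rng = np.random.default_rng(0)
a_x = rng.standard_normal(n) + 1j*rng.standard_normal(n)

# centered summation index, -floor(n/2) .. floor((n-1)/2)
idx = np.arange(-(n//2), n - n//2)

# direct evaluation of the centered definition (sigma_x = 1)
a_k_direct = np.array(
    [np.sum(a_x*np.exp(-2j*np.pi*idx*k/n)) for k in idx])

# the same result via the standard fft bookended with shifts
a_k_fft = np.fft.fftshift(np.fft.fft(np.fft.ifftshift(a_x)))

print(np.allclose(a_k_direct, a_k_fft))  # True
```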
Why are definitions such as these not standard?
FFTs are widely used, and the ordering of the inputs and outputs is one of the things that most commonly trips people up. It would be nice to know that there is a good reason for dragging everyone through that learning process. From a user's standpoint, the complexity seems higher as well: as it stands, anyone who wants to keep their arrays gridded monotonically has to bookend every call to fft or ifft with fftshift and ifftshift, i.e. fftshift(fft(ifftshift(...))) or fftshift(ifft(ifftshift(...))).
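In practice this usually means writing small wrapper functions to hide the shifts. A sketch of what that looks like (the names fftc/ifftc are my own, just to show the pattern):

```python
import numpy as np

def fftc(a):
    # forward transform of a monotonically ordered array,
    # returning a monotonically ordered spectrum
    return np.fft.fftshift(np.fft.fft(np.fft.ifftshift(a)))

def ifftc(a):
    # inverse transform, same monotonic convention
    return np.fft.fftshift(np.fft.ifft(np.fft.ifftshift(a)))

# round trip recovers the original array
a = np.random.default_rng(1).standard_normal(16)
print(np.allclose(ifftc(fftc(a)), a))  # True
```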
I would also like to re-emphasize that this is not just about the interpretation of the frequency domain. Basic mathematical calculations do not work unless you define the correct range and order for both the time and frequency grids.
This is easily seen in an example that calculates derivatives using properties of the Fourier transform, which requires knowing the correct time grid while in the frequency domain and the correct frequency grid while in the time domain. The first part of the example naively assumes that the ordering and values of the coordinates are given directly by the summation indices. As seen in figure 1, the results of the Fourier derivative (of a Gaussian function) clearly do not match the derivative calculated through finite differences. If, however, the time and frequency grids are defined as in the exponent's effective summation index, and the arrays are arranged in the positive-then-negative format (using fftshift/ifftshift), then, as seen in figure 2, the Fourier derivative calculation is in full agreement with the finite-difference method.
(In Python)
import matplotlib.pyplot as plt
import numpy as np
from numpy.fft import fft, ifft, fftshift, ifftshift
pi = np.pi
#--- Setup
n = 2**6
dt = 1
dv = 1/(n*dt)
ndv = n*dv
#--- Plotting
plt.figure("Naive", figsize=plt.figaspect(1/2))
plt.clf()
plt.suptitle("Naive Implementation")
ax0 = plt.subplot2grid([1,2], [0,0])
ax1 = plt.subplot2grid([1,2], [0,1])
#--- Naive Time and Frequency Grids
v_grid = dv*np.arange(n)
t_grid = dt*np.arange(n)
#--- Time Domain Derivative
a_t = np.exp(-0.5*((t_grid-t_grid.mean())/(5*dt))**2)
a_v = fft(a_t*dt)
da_dt = ifft(1j*2*pi*v_grid * a_v*ndv)
ax0.set_title("Time Domain Derivative")
ax0.plot(t_grid, da_dt.real, label="Fourier")
ax0.plot(t_grid, np.gradient(a_t, t_grid), '.', label="Finite Difference")
ax0.legend()
#--- Frequency Domain Derivative
a_v = np.exp(-0.5*((v_grid-v_grid.mean())/(5*dv))**2)
a_t = ifft(a_v*ndv)
da_dv = fft(-1j*2*pi*t_grid * a_t*dt)
ax1.set_title("Frequency Domain Derivative")
ax1.plot(v_grid, da_dv.real, label="Fourier")
ax1.plot(v_grid, np.gradient(a_v, v_grid), '.', label="Finite Difference")
ax1.legend()
#--- Plotting
plt.figure("Shifted", figsize=plt.figaspect(1/2))
plt.clf()
plt.suptitle("Shifted Implementation")
ax0 = plt.subplot2grid([1,2], [0,0])
ax1 = plt.subplot2grid([1,2], [0,1])
#--- Correct Time and Frequency Grids
v_grid = dv*np.arange(-n//2, n - n//2)
t_grid = dt*np.arange(-n//2, n - n//2)
#--- Time Domain Derivative
a_t = np.exp(-0.5*((t_grid-t_grid.mean())/(5*dt))**2)
a_t_shift = ifftshift(a_t)
a_v_shift = fft(a_t_shift*dt)
a_v = fftshift(a_v_shift)
da_dt = fftshift(ifft(ifftshift(1j*2*pi*v_grid * a_v*ndv)))
ax0.set_title("Time Domain Derivative")
ax0.plot(t_grid, da_dt.real, label="Fourier")
ax0.plot(t_grid, np.gradient(a_t, t_grid), '.', label="Finite Difference")
ax0.legend()
#--- Frequency Domain Derivative
a_v = np.exp(-0.5*((v_grid-v_grid.mean())/(5*dv))**2)
a_v_shift = ifftshift(a_v)
a_t_shift = ifft(a_v_shift*ndv)
a_t = fftshift(a_t_shift)
da_dv = fftshift(fft(ifftshift(-1j*2*pi*t_grid * a_t*dt)))
ax1.set_title("Frequency Domain Derivative")
ax1.plot(v_grid, da_dv.real, label="Fourier")
ax1.plot(v_grid, np.gradient(a_v, v_grid), '.', label="Finite Difference")
ax1.legend()
plt.show()