I have been reading about the mathematics behind Perlin noise, a gradient noise function often used in computer graphics, from Ken Perlin's presentation and Matt Zucker's FAQ.
I understand that each grid point, $X$, has a pseudo-random gradient associated with it, $g(X)$ - just a vector of unit length that appears random. When finding the noise value at a point $P$, for each grid point surrounding it, $Q$, the dot product $g(Q) \cdot (P-Q)$ is found. Then these dot products are interpolated to find the noise value at point $P$.
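Written out in two dimensions, this is how I understand it. The symbols $Q_{ij}$, $u$, $v$, $n_{ij}$ and $s$ are my own notation, $P = (x, y)$, and I am assuming the cubic ease curve $3t^2 - 2t^3$ as the interpolation weight (other ease curves would work the same way):

$$
\begin{aligned}
Q_{ij} &= (\lfloor x \rfloor + i,\ \lfloor y \rfloor + j), \qquad i, j \in \{0, 1\}, \\
(u, v) &= P - Q_{00}, \qquad n_{ij} = g(Q_{ij}) \cdot (P - Q_{ij}), \\
s(t) &= 3t^2 - 2t^3, \\
\operatorname{noise}(P) &= \bigl(1 - s(v)\bigr)\Bigl[\bigl(1 - s(u)\bigr)\, n_{00} + s(u)\, n_{10}\Bigr] + s(v)\Bigl[\bigl(1 - s(u)\bigr)\, n_{01} + s(u)\, n_{11}\Bigr].
\end{aligned}
$$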
The thing I don't understand, however, is why we use gradients. There is another type of noise function called value noise in which each grid point has a scalar value rather than a gradient. I've seen articles that say gradient noise produces higher quality noise but they don't explain why. I can't seem to visualise how this dot product makes the noise any better quality. What does "quality" even mean here? Why did Ken Perlin decide to use gradients?
This is just my guess.
In short: gradient noise generally leads to more visually appealing textures because it cuts low frequencies and emphasizes frequencies at and above that of the grid spacing.
Let's compare a naive value noise procedure with a naive gradient one, for a grayscale image.
Value noise: we paint the points in the grid with random values (white noise) and fill the surrounding pixels by linear interpolation. This will look ugly because (among other things) some of the random grid points will happen to have similar values, and then there will be large spots with nearly uniform color (low frequency). [*] Specifically, the pixel values in the neighborhood of a grid point will all be similar, so any high frequencies depend on neighboring grid points having distinct values, and even then the highest frequency we get is at most (with luck) of the order of that of the grid spacing.
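A minimal sketch of that neighborhood claim, assuming NumPy, a small periodic lattice, and plain bilinear interpolation. The function name `value_noise_2d`, the lattice size, and the circle radius are illustrative choices, not part of any particular implementation.

```python
import numpy as np

def value_noise_2d(lattice, xs, ys):
    """Bilinearly interpolate random lattice values at the points (xs, ys)."""
    ny, nx = lattice.shape
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    tx, ty = xs - x0, ys - y0              # fractional position inside the cell
    x1, y1 = (x0 + 1) % nx, (y0 + 1) % ny  # wrap around: periodic lattice (assumption)
    x0, y0 = x0 % nx, y0 % ny
    top = (1 - tx) * lattice[y0, x0] + tx * lattice[y0, x1]
    bottom = (1 - tx) * lattice[y1, x0] + tx * lattice[y1, x1]
    return (1 - ty) * top + ty * bottom

rng = np.random.default_rng(0)
lattice = rng.uniform(0.0, 1.0, size=(8, 8))  # white-noise values on the grid

# Sample on a small circle (radius 0.05 cells) around the grid point (3, 3).
theta = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
ring = value_noise_2d(lattice, 3 + 0.05 * np.cos(theta), 3 + 0.05 * np.sin(theta))
centre = lattice[3, 3]

print("value at grid point:", centre)
print("largest deviation on the circle:", np.abs(ring - centre).max())
```

Every sample on the circle is a convex combination that puts most of its weight on the central lattice value, so the deviation stays small whatever the neighbors are. That is the locally flat patch described above.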
Gradient noise: we draw a random (uniform, white noise) gradient at each grid point, and compute the values by interpolating the dot products of each gradient with the offset vector from its grid point to the pixel. Consider again what happens in the neighborhood of a grid point, specifically on a small circle around it, disregarding the effect of the other, more distant grid points. The computed image value (a dot product) is zero at the grid point itself and, as we go around the circle, swings smoothly from the dark side of the range to the light side and back. So we can expect the image never to have uniform spots, i.e., we will have practically no frequencies below that of the grid spacing.
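The frequency claim can be checked numerically. Below is a rough one-dimensional sketch, assuming NumPy, a periodic lattice, and the same cubic ease curve for both kinds of noise so that only the use of gradients differs. In 1D the "gradient" is just a random slope. The helper names and the cutoff of 0.25 cycles per cell are arbitrary illustrative choices.

```python
import numpy as np

def fade(t):
    # Cubic ease curve 3t^2 - 2t^3 (an assumption; any smooth blend would do).
    return t * t * (3.0 - 2.0 * t)

def value_noise_1d(values, x):
    """Blend the random values stored at the two surrounding lattice points."""
    i = np.floor(x).astype(int)
    t = x - i
    n = len(values)
    a, b = values[i % n], values[(i + 1) % n]
    return a + fade(t) * (b - a)

def gradient_noise_1d(slopes, x):
    """Blend slope * offset contributions from the two surrounding lattice points."""
    i = np.floor(x).astype(int)
    t = x - i
    n = len(slopes)
    left = slopes[i % n] * t                  # g(Q) * (P - Q), left lattice point
    right = slopes[(i + 1) % n] * (t - 1.0)   # same for the right lattice point
    return left + fade(t) * (right - left)

def low_band_fraction(signal, cells, cutoff):
    """Fraction of non-DC spectral power below `cutoff` cycles per lattice cell."""
    power = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
    freqs = np.arange(len(power)) / cells     # rfft bin k = k / cells cycles per cell
    return power[freqs < cutoff].sum() / power.sum()

rng = np.random.default_rng(0)
cells, samples_per_cell = 512, 32
x = np.arange(cells * samples_per_cell) / samples_per_cell
lattice = rng.uniform(-1.0, 1.0, size=cells)  # reused as values and as slopes

value_frac = low_band_fraction(value_noise_1d(lattice, x), cells, 0.25)
grad_frac = low_band_fraction(gradient_noise_1d(lattice, x), cells, 0.25)
print(f"power below 0.25 cycles/cell: value {value_frac:.2f}, gradient {grad_frac:.2f}")
```

The reason for the difference is visible in the code: each lattice point contributes an even, one-signed bump to value noise, whereas its contribution to gradient noise is odd (slope times offset) and integrates to zero, which removes the power near zero frequency.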
[*] A similar problem arises in halftoning/dithering: binary white noise is visually unpleasant because of its low-frequency component; a nicer dithering algorithm, such as Floyd-Steinberg, instead produces high-frequency noise ("blue noise").