I understand that the KL divergence between two discrete probability distributions $p$ and $q$ is defined as
$$D(p||q) = \sum_i p_i\log\frac{p_i}{q_i}$$
This quantity is not symmetric, does not satisfy the triangle inequality, and is therefore not a metric. However, the Wikipedia article has a section on the connection between the KL divergence and the Fisher information. I am not familiar with the Fisher information and do not fully follow what the Wikipedia article says, but it seems to imply that if $p$ and $q$ belong to a family parameterized by some $\theta$ and the difference between their parameter values is sufficiently small, then the KL divergence does behave like a metric?
Can someone elucidate this idea? In general, can one say that for $p\approx q$ (the role of $\theta$ is not clear to me) the KL divergence is a "distance", and if so, is there an intuitive way to see this?
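For a concrete check of the asymmetry (my own numbers, using the natural log): with $p = (0.5, 0.5)$ and $q = (0.9, 0.1)$,
$$D(p||q) = 0.5\log\frac{0.5}{0.9} + 0.5\log\frac{0.5}{0.1} \approx 0.511, \qquad D(q||p) = 0.9\log\frac{0.9}{0.5} + 0.1\log\frac{0.1}{0.5} \approx 0.368,$$
so indeed $D(p||q) \neq D(q||p)$ in general.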
Learning from Wikipedia is a terrible idea (it is not peer reviewed and anyone can edit it); why don't you read a book on this, for example "Information Geometry" by Shun-ichi Amari?
I don't know why you call $p, q$ probability distributions. Distributions are functions, but what you have are clearly vectors in the simplex. These are the values of the probability distribution, i.e. $\Pr[X = i] = p_i \in [0,1]$, $p = (p_i)_{i=1}^n$, where $X$ is some underlying random variable, which is said to have a categorical distribution. The domain of the KL divergence here is not a function space; it is the simplex.
The Hessian of the KL divergence $D(p_\theta || p_{\theta'})$ with respect to $\theta'$, evaluated at $\theta' = \theta$, is the so-called Fisher information matrix. That's the connection.
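A sketch of what that buys you, assuming $p$ and $q$ come from a smooth family $p_\theta$ and writing $q = p_{\theta + \Delta\theta}$: since $D(p_\theta || p_\theta) = 0$ and the divergence is minimized there, the constant and linear terms of the Taylor expansion vanish, leaving
$$D(p_\theta || p_{\theta+\Delta\theta}) = \frac{1}{2}\,\Delta\theta^\top F(\theta)\,\Delta\theta + O(\|\Delta\theta\|^3), \qquad F_{jk}(\theta) = \mathbb{E}_{p_\theta}\!\left[\frac{\partial \log p_\theta(X)}{\partial \theta_j}\,\frac{\partial \log p_\theta(X)}{\partial \theta_k}\right].$$
So for nearby distributions the KL divergence looks like a squared length in the Riemannian metric defined by $F(\theta)$ (the Fisher information metric). That is the sense in which it "behaves like a metric" locally: it approximates the square of an infinitesimal distance, but it is not a metric on the whole simplex.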
The KL divergence is never a metric. "Metric" has a specific and rigorous definition in mathematics. Some people call it a distance, but they are using the word in a colloquial way. It is an example of a class of divergences called Bregman divergences. No Bregman divergence is itself a metric; the closest you get is the squared Euclidean distance, whose square root is the ordinary Euclidean metric.
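To make the Bregman claim concrete (a standard computation, spelled out here for reference): take the generator $\phi(p) = \sum_i p_i\log p_i$ (negative entropy). Its Bregman divergence is
$$D_\phi(p, q) = \phi(p) - \phi(q) - \langle\nabla\phi(q),\, p - q\rangle = \sum_i p_i\log\frac{p_i}{q_i} + \sum_i (q_i - p_i),$$
and the last sum vanishes on the simplex (both vectors sum to 1), recovering exactly the KL divergence from the question.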