Suppose I have some map $F: \mathbb{R}^{m} \rightarrow \mathbb{R}^{n}$. I'm struggling to get an intuitive understanding of the meaning of the total derivative $dF_p: \mathbb{R}^{m} \rightarrow \mathbb{R}^{n}$ at a point $p \in \mathbb{R}^{m}$. This will obviously be represented by some $n \times m$ matrix. Each entry in this matrix will be of the form $\frac{\partial F_i}{\partial x_j}$, and thus will be a function of $(x_1,\dots,x_m)$.
As I understand, we input the values of $p \in \mathbb{R^{m}}$ into this matrix, thus giving an $n \times m$ matrix of real numbers, and then this defines the map $dF_p: \mathbb{R^{m}} \rightarrow \mathbb{R^{n}}$.
However, the Regular Value Theorem requires this map to be surjective. This implies that we must be able to apply $dF_p$ to any point $q \in \mathbb{R}^{m}$ (otherwise, how could we "hit" all of the values in $\mathbb{R}^{n}$?). But this map is the differential at $p$, so how does it make sense to apply it anywhere other than at $p$?
You don't apply the matrix to $p$. Informally, you apply it to a vector pointing away from $p$. The idea is this: for each point $p\in\mathbb R^m$ at which $F$ is differentiable, you can find a linear map $\mathrm dF_p:\mathbb R^m\to\mathbb R^n$ such that $$F(p+v)\approx F(p)+\mathrm dF_p(v).$$ Here, $\approx$ means that the error $F(p+v)-F(p)-\mathrm dF_p(v)$ goes to $0$ faster than $\|v\|$ as $v$ goes to $0$, but that's not important for my answer. So if you want to approximate $F(p+v)$, you only need to apply the linear map to $v$. You don't apply it to points in the domain, but to vectors pointing between points.

This may not be obvious when the domains are vector spaces, but it's a bit more natural in the setting of affine spaces: an affine space is a set of points $A$ together with a vector space $V$ whose elements are the vectors pointing between those points. If we have a differentiable map between the point sets of two affine spaces, its differential at a point is a linear map between the vector spaces associated with those affine spaces. So the differential of a function at a point has a genuinely different domain and codomain than the function itself. It's just that for vector spaces, the set of points and the set of vectors pointing between the points are the same, so the difference is less apparent.
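To make this concrete, here is a minimal numerical sketch. The map $F(x_1,x_2,x_3)=(x_1^2+x_2^2+x_3^2,\;x_3)$ and the point $p=(1,2,2)$ are my own illustrative choices, not part of the question, and the helper names (`jacobian`, `dF_p`, `preimage`) are hypothetical. It freezes the Jacobian at $p$ to get a fixed $2\times 3$ matrix, checks the approximation for a small $v$, and then shows that $\mathrm dF_p$ hits an arbitrary target in $\mathbb R^2$ by feeding it a vector that is nowhere near "small".

```python
import math

# Illustrative map F: R^3 -> R^2 (an assumption for this sketch).
def F(x):
    x1, x2, x3 = x
    return (x1**2 + x2**2 + x3**2, x3)

# Matrix of partial derivatives; each entry is a function of x.
def jacobian(x):
    x1, x2, x3 = x
    return [[2 * x1, 2 * x2, 2 * x3],
            [0.0, 0.0, 1.0]]

p = (1.0, 2.0, 2.0)
J = jacobian(p)  # substituting p gives a fixed 2x3 matrix of real numbers

# dF_p is the linear map v |-> J v. Its input is a vector v, not the point p.
def dF_p(v):
    return tuple(sum(a * b for a, b in zip(row, v)) for row in J)

# For a small displacement v, F(p + v) is close to F(p) + dF_p(v).
v = (1e-3, -2e-3, 5e-4)
exact = F(tuple(pi + vi for pi, vi in zip(p, v)))
linear = tuple(fi + di for fi, di in zip(F(p), dF_p(v)))
error = math.dist(exact, linear)  # equals |v|^2 here, much smaller than |v|

# Surjectivity: for this particular J, any target w = (a, b) is hit by the
# vector ((a - 4b)/2, 0, b). This formula is specific to the example.
def preimage(w):
    a, b = w
    return ((a - 4.0 * b) / 2.0, 0.0, b)

w = (7.0, -3.0)
hit = dF_p(preimage(w))

print(J)
print(error)
print(preimage(w), hit)
```

Note that `preimage(w)` is a large vector. That is fine: surjectivity is a statement about the linear map on all of $\mathbb R^3$ (all vectors based at $p$), whereas the approximation $F(p+v)\approx F(p)+\mathrm dF_p(v)$ is only good when $v$ is small.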