$F$ is a function from $V$ to $V$, where $V$ is an $n$-dimensional vector space and $p \in V$.
In the article Jacobian determinant it says:
"If the Jacobian determinant at $p$ is positive, then $F$ preserves orientation near $p$."
This statement sounds intuitive and natural, but how can I prove it? I looked in a few books, but I always found the statement without proof.
First, you have to choose an orientation. In 3d, you might think of this as the choice of the right-hand rule for cross products. Indeed, this is the notion that, given an orientation for a surface and an orientation for the 3d volume, there is a unique normal vector that, when considered in combination with the surface, yields an oriented volume that is oriented the same way as the ambient space.
Still, the choice of orientation is independent of how axes are arranged. You could have a right-handed set of axes and still choose a left-handed orientation. Here, we're not concerned with the choice of orientation so much as what you can do once you've chosen one.
In exterior algebra, the orientation of the vector space can be expressed as a wedge product, as a so-called $n$-vector. If $b_1, b_2, \ldots, b_n$ form a basis, then the wedge product
$$B = b_1 \wedge b_2 \wedge \ldots \wedge b_n$$
is an $n$-vector that may reflect a choice of orientation. When there is an inner product, we can normalize $B$ and get the result that there are only two choices for orientation: $+\hat B$ or $-\hat B$. Magnitudes are irrelevant here. If that bothers you, think of a line: a line might be oriented one direction or the other. Choosing a larger vector as the direction of that line is no different from choosing a smaller vector that points in the same (not opposite) direction. Orientation of larger spaces works the same way.
The action of a linear operator $F: V \to V$ can be extended to act upon $n$-vectors like $B$. This action follows from tensor algebra, but basically, it's this:
$$F(B) \equiv F(b_1) \wedge F(b_2) \wedge \ldots \wedge F(b_n) = (\det F) B$$
The result that this is the determinant follows from the dimensionality of $n$-vectors: they form a one-dimensional vector space, like the scalars, in the sense that every member is a scalar multiple of any nonzero member. This should make sense again if you think about it in 3d: all volumes are just scalar multiples of other volumes, but with the added notion that a "negative" volume is one that is oriented the other way.
Hence, the action of $F$ on $B$ is constrained to be some scalar multiplication, and we can call that characteristic scalar the determinant. It's insightful to prove that this is the same as other common definitions of the determinant, but I would regard this one as more fundamental than others.
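For a concrete check in two dimensions (the coefficients $a_{ij}$ are my own notation, introduced only for illustration), suppose $F(b_1) = a_{11} b_1 + a_{21} b_2$ and $F(b_2) = a_{12} b_1 + a_{22} b_2$. Using $b_i \wedge b_i = 0$ and $b_2 \wedge b_1 = -b_1 \wedge b_2$:
$$F(b_1) \wedge F(b_2) = (a_{11} b_1 + a_{21} b_2) \wedge (a_{12} b_1 + a_{22} b_2) = a_{11} a_{22} \, b_1 \wedge b_2 + a_{21} a_{12} \, b_2 \wedge b_1$$
$$= (a_{11} a_{22} - a_{12} a_{21}) \, b_1 \wedge b_2$$
The scalar in front is the usual $2 \times 2$ determinant of the matrix of $F$ in this basis, which is at least consistent with the claim that the two definitions agree.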
So now, what do we have? We said that $B$ might reflect a choice of orientation on $V$. If $\det F$ is positive, then $F(B)$ has the same sign as $B$. Orientation is indifferent to the "magnitude" of $B$ (again, magnitude can only be defined in the presence of an inner product here), so we can look at $V$ as the domain of the transformation, oriented as $+B$, and as the codomain of the transformation, also oriented as $+B$.
I might question whether this is a proof or merely a long-winded definition of orientation. Nevertheless, looking at $F(B)$ as the orientation of the codomain, and realizing that that orientation might not necessarily be the same as the orientation of the domain, even when both the domain and codomain are the same vector space, seems like a good way to think about things.
Okay, so your question is more about using the Jacobian (and its determinant) for a potentially nonlinear map? All right. At any point $p$, there is a tangent space $T_p V$. The tangent space admits its own basis, which may vary from point to point. For instance, consider a curvilinear coordinate system like spherical coordinates in 3d: the basis vectors $e_r, e_\theta, e_\phi$ vary at different points, but at every point, they span a local, flat vector space (the tangent space). While for a flat vector space the tangent space can be identified with the base vector space, it can be helpful to keep the concepts distinct, particularly when dealing with curved manifolds.
Now, while the potentially nonlinear map $F$ may map points to points in some arbitrary (but hopefully, differentiable) manner, the Jacobian map does something different: at every point, it is simply a linear map from one tangent space to another! Here, since $F: V \to V$, the Jacobian at a point $p$, call it $\underline F_p$, is just $\underline F_p: T_p V \to T_{F(p)} V$, and both of these tangent spaces can be identified with $V$ itself, so their orientations can be compared.
And once you have that, you can apply the same logic as I used for linear maps above. Instead of orientation for all of $V$, however, we're talking only about the orientation of the tangent space at a given point. But if $F$ is continuously differentiable, the Jacobian determinant depends continuously on the point, so if it is positive at $p$, it stays positive on some neighborhood of $p$, and this notion of orientation is still good throughout that neighborhood.
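If it helps to see this numerically, here is a minimal sketch in plain Python. The map `F` below is a hypothetical example I picked for illustration (it is not from the question), and the helper names are my own. It estimates the Jacobian determinant by central differences and compares its sign with what happens to a small counterclockwise triangle based at the point.

```python
def F(x, y):
    """Hypothetical nonlinear map R^2 -> R^2, chosen only for illustration.
    Its exact Jacobian determinant is 1 - 4*x*y."""
    return (x + y * y, y + x * x)


def jacobian_det(f, x, y, h=1e-6):
    """Central-difference estimate of the Jacobian determinant of f at (x, y)."""
    fx_plus, fx_minus = f(x + h, y), f(x - h, y)
    fy_plus, fy_minus = f(x, y + h), f(x, y - h)
    d1_dx = (fx_plus[0] - fx_minus[0]) / (2 * h)  # dF1/dx
    d2_dx = (fx_plus[1] - fx_minus[1]) / (2 * h)  # dF2/dx
    d1_dy = (fy_plus[0] - fy_minus[0]) / (2 * h)  # dF1/dy
    d2_dy = (fy_plus[1] - fy_minus[1]) / (2 * h)  # dF2/dy
    return d1_dx * d2_dy - d1_dy * d2_dx


def signed_area(p0, p1, p2):
    """Twice the signed area of a triangle; positive when counterclockwise."""
    return ((p1[0] - p0[0]) * (p2[1] - p0[1])
            - (p1[1] - p0[1]) * (p2[0] - p0[0]))


def preserves_orientation(f, x, y, eps=1e-3):
    """Map a small counterclockwise triangle at (x, y) and compare signs."""
    triangle = [(x, y), (x + eps, y), (x, y + eps)]
    image = [f(*vertex) for vertex in triangle]
    return signed_area(*triangle) * signed_area(*image) > 0


if __name__ == "__main__":
    for point in [(0.0, 0.0), (0.1, 0.1), (1.0, 1.0)]:
        print(point, jacobian_det(F, *point), preserves_orientation(F, *point))
```

For this particular map the determinant is positive at the origin and at nearby points, and the small triangle keeps its counterclockwise sense there. At $(1, 1)$ the determinant is negative and the triangle flips. This is only a sanity check under the assumption of a smooth map, not a proof.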