I am a senior engineering student and I have been studying calculus for a while. When I reached vector calculus, I felt that this part is inconsistent, and several questions came up. The most important one is: where does this system come from? I know it developed out of the vector analysis systems of Grassmann and Hamilton, but I have not seen a direct proof of some parts, such as:
Why does subtracting the partial derivatives from each other in the 2-D curl give the amount of rotation?
Why does combining the partial derivatives in the $i$, $j$, $k$ directions into the gradient give the direction of fastest increase of a function, and where does this come from?
Why, in the 3-D curl, do I have to take the 2-D curl in the normal direction?
If there are any books I can read to understand the full picture, please recommend them. This is also my first question here, so I am sorry if I picked the wrong tag.


In what follows I'll assume that you're comfortable with $\mathbb{R}^n$, the Euclidean space of $n$ dimensions. The questions you have arise naturally when studying multivariable calculus from some books. Since your question is a little general, I'll just talk a little about some of these things and then recommend books.
Well, the fact that the gradient gives the direction of greatest rate of change is easy to prove. If $f : \mathbb{R}^n \to \mathbb{R}$ is a smooth function and $v \in \mathbb{R}^n$ is a unit vector, then the directional derivative of $f$ at $p$ along $v$ is $\nabla f(p) \cdot v$, and we have
$$\nabla f (p) \cdot v = \left \|\nabla f(p) \right \| \cos(\theta)$$
where $\theta$ is the angle between $\nabla f(p)$ and $v$. Since $|\cos(\theta)|\leq 1$, the maximum of the quantity above is attained at $\theta = 0$. In other words, the directional derivative is largest in the direction of the gradient (assuming $\nabla f(p) \neq 0$).
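If you want to see this numerically, here is a minimal Python sketch. The function `f`, the point `p`, and the grid of 3600 directions are my own choices for illustration only; nothing depends on them. It estimates the directional derivative along many unit vectors and checks that the largest value occurs (up to the grid spacing) in the direction of the gradient, with value close to $\|\nabla f(p)\|$.

```python
import math

def f(x, y):
    # Hypothetical sample function, chosen only for illustration.
    return x**2 + 3.0 * x * y

def grad_f(x, y):
    # Analytic partial derivatives of the sample function f.
    return (2.0 * x + 3.0 * y, 3.0 * x)

def directional_derivative(p, v, h=1e-6):
    # Central-difference estimate of the derivative of f at p along the unit vector v.
    (x, y), (vx, vy) = p, v
    return (f(x + h * vx, y + h * vy) - f(x - h * vx, y - h * vy)) / (2.0 * h)

p = (1.0, 2.0)
gx, gy = grad_f(*p)
grad_norm = math.hypot(gx, gy)   # length of the gradient at p
grad_angle = math.atan2(gy, gx)  # direction of the gradient at p, as an angle

# Sample unit vectors around the circle and keep the direction
# with the largest estimated directional derivative.
n = 3600
angles = [2.0 * math.pi * k / n for k in range(n)]
best_angle = max(
    angles,
    key=lambda t: directional_derivative(p, (math.cos(t), math.sin(t))),
)
best_value = directional_derivative(p, (math.cos(best_angle), math.sin(best_angle)))

print("best direction (radians):", best_angle, "gradient direction:", grad_angle)
print("largest directional derivative:", best_value, "gradient norm:", grad_norm)
```

The two angles should agree to within the grid spacing, and the two values should be nearly equal. This is only a sanity check on one example, not a proof; the proof is the cosine argument above.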
But this is just one of the points that confuse you. Looking at your question, I believe you are also asking yourself: why is this important vector called the gradient made up of the partial derivatives of the function? This is a simple consequence of the proper definition of differentiability. A function $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at a point $p$ if it has a linear approximation at that point. In other words, $f$ is differentiable at $p$ if there's a linear function $Df(p) : \mathbb{R}^n \to \mathbb{R}^m$ such that the amount by which $f$ changes when you move from $p$ along a vector $v$ is approximated by the value of $Df(p)$ on $v$, with an error that is small compared to $\|v\|$. With this definition you can prove that the gradient must be what it is.
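Here is a rough sketch of that proof for the case $m = 1$. The notation $e_i$ for the standard basis vectors and $v_i$ for the components of $v$ is mine, and I'm stating the definition in its usual limit form. Differentiability at $p$ means

$$\lim_{h \to 0} \frac{\left| f(p+h) - f(p) - Df(p)h \right|}{\left\| h \right\|} = 0$$

Taking $h = t e_i$ with $t \to 0$ and using the linearity of $Df(p)$, this forces

$$Df(p)e_i = \lim_{t \to 0} \frac{f(p + t e_i) - f(p)}{t} = \frac{\partial f}{\partial x_i}(p)$$

Then, writing $v = \sum_{i} v_i e_i$ and using linearity again,

$$Df(p)v = \sum_{i=1}^{n} v_i \, Df(p)e_i = \sum_{i=1}^{n} v_i \frac{\partial f}{\partial x_i}(p) = \nabla f(p) \cdot v$$

So the linear approximation is completely determined by the partial derivatives, and the vector that represents it through the dot product is exactly the gradient.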
The interpretations of divergence and curl can be understood by applying some theorems from the integral calculus of several variables. However, to really get the most out of this, it's probably better to work with objects called differential forms. Some might say: "my god, this is too sophisticated to talk about in this question"; however, it's the really good way to understand the intricacies of those things. It's not just a coincidence that the fundamental theorems (the gradient theorem, Stokes' theorem and Gauss' theorem) look so closely related.
Probably the best thing for you is to start with Apostol's Calculus, Volume 2. This book will show why the definition of the derivative as a linear transformation is a good definition, it will show you why the gradient is what it is, it will prove all the important theorems, and it will develop curl and divergence in a systematic way. Once you feel comfortable with this kind of approach, you can attack Spivak's Calculus on Manifolds. It is very rigorous indeed, but its approach reveals the intricacies that exist. Also, there's another book called "Vector Calculus" by Jerrold E. Marsden and Anthony J. Tromba. This one mixes intuition and rigor; in other words, it tries to make things really make sense to the reader and also tries to get things proved.
The main point is: vector calculus is not inconsistent. Using books that tend to forget about rigor can be "good" in the sense of being easier; however, they'll hide the true nature of things. There are reasons for all of the things you ask about, and they are shown in books like Apostol's and Spivak's.
Of course, I have only spoken in general terms about your question. If you want something more detailed about one point, and if it fits in this question, tell me in a comment and I'll add an edit. If not, ask again more specifically here on Math.SE.
I hope this helps you somehow. Good luck in your studies!
EDIT: When I talked about getting an interpretation of curl and div using theorems from integral calculus, I meant using both the corresponding fundamental theorem and the mean value theorem for integrals. I'll talk about it a little loosely, just to explain what my point is. We'll tackle divergence in $\mathbb{R}^3$. Take a vector field $F : \mathbb{R}^3 \to \mathbb{R}^3$ and some $3$-dimensional region $\mathcal{B}$. Gauss' theorem tells you that:
$$\iiint_{\mathcal{B}}{\operatorname{div}(F)dV} = \iint_{\partial \mathcal{B}}{F\cdot n dA}$$
where $\partial \mathcal{B}$ is the boundary of $\mathcal{B}$ and $n$ is the outward unit normal to $\partial \mathcal{B}$. Now, if $\operatorname{div}(F)$ is continuous and $\mathcal{B}$ is a reasonable (compact, connected) region, the mean value theorem for integrals states that there's a point $(x_0, y_0, z_0) \in \mathcal{B}$ such that the LHS can be written as:
$$\iiint_{\mathcal{B}}{\operatorname{div}(F)dV} = \operatorname{div}(F)(x_0, y_0, z_0) V(\mathcal{B})$$
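Now fix some arbitrary point $(x_0, y_0, z_0)$ and take a collection of regions $\mathcal{B}$ that enclose the point and get smaller and smaller, shrinking down to it. For each region the mean value theorem gives a point of $\mathcal{B}$ at which the equality above holds, and as the regions shrink those points approach $(x_0, y_0, z_0)$. So, by the continuity of $\operatorname{div}(F)$, the equality holds at $(x_0, y_0, z_0)$ itself in the limit when the volume of the regions goes to zero. Combining this with Gauss' theorem, in the end we can see that: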
$$\operatorname{div}(F)(x_0, y_0, z_0) = \lim_{V(\mathcal{B}) \to 0} \frac{1}{V(\mathcal{B})}\iint_{\partial \mathcal{B}} F \cdot n dA$$
This equation is the one you can find on Wikipedia's page about divergence. Notice what it says: the divergence is the limit of the flux of the field across the boundary of a region, per unit volume, as the region shrinks to the point $(x_0, y_0, z_0)$. So the divergence can be thought of as a measure of "local" flux.
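You can also check this limit numerically. The following Python sketch is only an illustration under my own assumptions: the field `F`, the point `p`, the choice of cubes as the shrinking regions, and the midpoint rule used for the surface integrals are all picked for convenience. It computes the outward flux through cubes of side $h$ centered at the point, divides by the volume $h^3$, and compares with the divergence computed from the partial derivatives.

```python
import math

def F(x, y, z):
    # Hypothetical sample vector field, chosen only for illustration.
    return (x * x, y * z, math.sin(z))

def div_F(x, y, z):
    # Analytic divergence: d(x^2)/dx + d(y z)/dy + d(sin z)/dz.
    return 2.0 * x + z + math.cos(z)

def flux_through_cube(center, h, n=20):
    # Outward flux of F through the surface of a cube of side h centered at
    # `center`, using the midpoint rule with n*n cells on each of the six faces.
    cx, cy, cz = center
    half = h / 2.0
    cell = h / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            a = -half + (i + 0.5) * cell
            b = -half + (j + 0.5) * cell
            # Pair of faces with outward normals +e_x and -e_x.
            total += F(cx + half, cy + a, cz + b)[0] - F(cx - half, cy + a, cz + b)[0]
            # Pair of faces with outward normals +e_y and -e_y.
            total += F(cx + a, cy + half, cz + b)[1] - F(cx + a, cy - half, cz + b)[1]
            # Pair of faces with outward normals +e_z and -e_z.
            total += F(cx + a, cy + b, cz + half)[2] - F(cx + a, cy + b, cz - half)[2]
    return total * cell * cell

p = (1.0, 2.0, 0.5)
exact = div_F(*p)

# Flux per unit volume for cubes of decreasing side length.
ratios = {h: flux_through_cube(p, h) / h**3 for h in (1.0, 0.1, 0.01)}
for h, ratio in ratios.items():
    print("side:", h, "flux / volume:", ratio, "divergence:", exact)
```

As the side of the cube gets smaller, the flux per unit volume should get closer to the value of `div_F` at the point, which is what the limit formula above predicts.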