Let $X$ be a topological space and $\mathcal{F}$ a presheaf on $X$ with values in a category $\mathsf{C}$.
To be more specific, $\mathcal{F}$ is a functor from $(\mathsf{Open}_X)^{\mathrm{op}}$ to $\mathsf{C}$, where the objects of $\mathsf{Open}_X$ are the open sets of $X$, and $\mathrm{Hom}_{\mathsf{Open}_X}(U,V)$ is a singleton if $U \subseteq V$ and empty otherwise.
In this general setting, I want to describe the sheaf property. I have seen two descriptions:
Description 1: For any open set $U \subseteq X$ and any open cover $\{U_i\}_{i \in I}$ of $U$, the following is an equalizer diagram: $$F(U) \rightarrow \prod_{i} F(U_i) {{{} \atop \longrightarrow}\atop{\longrightarrow \atop {}}} \prod_{i, j} F(U_i \cap U_j)$$
This is adapted from Wikipedia and Section 2.2 of the note.
Description 2: The following equalizer diagram is exact: $$0 \rightarrow F(U) \rightarrow \prod_{i} F(U_i) {{{} \atop \longrightarrow}\atop{\longrightarrow \atop {}}} \prod_{i, j} F(U_i \cap U_j)$$
This is adapted from Görtz and Wedhorn's *Algebraic Geometry I: Schemes*.
I have figured out what the two arrows mean from the helpful post "Definition of sheaf using equalizer".
My question 1: In Description 1, it seems that this ONLY means that the two paths from $F(U)$ to $\prod_{i, j} F(U_i \cap U_j)$ (via the "upper arrow" and the "lower arrow") coincide, and that both are equal to the direct restriction $F(U) \rightarrow \prod_{i, j} F(U_i \cap U_j)$. **I cannot see why this is the sheaf property.**
My question 2: Description 2 claims that this "equalizer diagram" is exact, but I have not found any reference that actually defines what exactness means for such a diagram.
My motivation: My original goal is to understand when a given presheaf is actually a sheaf. For example, the kernel of a morphism of sheaves is automatically a sheaf, while the cokernel and the image need not be (we still need to sheafify them). I'm trying to find a reasonable explanation of such phenomena, and it seems that working from a categorical perspective might help. (So to some extent I cannot accept Description 2, since the EXACTNESS of such diagrams is quite vague to me and I don't know how to work with it categorically.) However, from my naive point of view, Description 1 (merely an "equalizer diagram") may not be equivalent to the sheaf property.
My question 3: How can the sheaf property really be described categorically, and how does such a description shed light on my motivation? Any references are welcome!
Sorry for such a long (and maybe vague) post. Thank you all for commenting and answering.

Description 1 is indeed a formulation of the sheaf axiom, but you have interpreted it incorrectly. Saying that the sequence is an equalizer does not just mean that it commutes; it means that if $X$ is any object of $\mathsf{C}$ (not the topological space from the question) and $X \longrightarrow \prod F(U_i)$ is any map whose composites with the two arrows to $\prod F(U_i \cap U_j)$ agree, then this map factors uniquely through the equalizer $F(U) \longrightarrow \prod F(U_i)$. That is, given such a map from $X$, we have the following commutative diagram:
$$ \require{AMScd} \begin{CD} F(U) @>>> \prod F(U_i)\\ @AAA @AidAA\\ X @>>> \prod F(U_i) \end{CD} $$
where in fact the map $X \longrightarrow F(U)$ is the unique map making this square commute. I wrote this diagram as a square instead of a triangle for two reasons. First, the AMScd package doesn't allow diagonal arrows. Second, this is a blessing in disguise here, as it illustrates that the equalizer of $\prod F(U_i) {{{} \atop \longrightarrow}\atop{\longrightarrow \atop {}}} \prod F(U_i \cap U_j)$ is not just the object $F(U)$, but the map $F(U) \longrightarrow \prod F(U_i)$. The diagram above then says that the map $F(U) \longrightarrow \prod F(U_i)$ is terminal among all maps $X \longrightarrow \prod F(U_i)$ whose two composites to $\prod F(U_i \cap U_j)$ agree. I won't specify the category in which it is literally terminal, but I hope this gets the idea across.
For a more concrete example, take two maps of sets $f_0, f_1: A \longrightarrow B$. Their equalizer is the set $\{a \in A : f_0(a) = f_1(a)\}$ together with its inclusion map into $A$. For instance, if $f_0 = f_1$, then this is just $A$ itself. Indeed, in this case every map $X \longrightarrow A$ commutes with $A {{{f_1} \atop \longrightarrow}\atop{\longrightarrow \atop {f_0}}} B$, but among all of these it is the identity $A \longrightarrow A$ (up to unique isomorphism) that is the equalizer, since every such map factors uniquely through it. This terminal/unique factoring condition is an essential part of the definition.
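To make the unique factorization concrete in $\mathsf{Set}$, here is a small Python sketch for finite sets. The function names `equalizer` and `factor_through` and the example maps are hypothetical, chosen only for illustration; the inclusion $E \hookrightarrow A$ is left implicit because $E$ is literally a subset of $A$.

```python
from typing import Callable, Hashable, Iterable


def equalizer(A: Iterable[Hashable], f0: Callable, f1: Callable) -> set:
    """Equalizer of two functions f0, f1: A -> B between finite sets.

    Returns E = {a in A : f0(a) == f1(a)}; the equalizer *map* is the
    inclusion E -> A, which is implicit here because E is a subset of A.
    """
    return {a for a in A if f0(a) == f1(a)}


def factor_through(E: set, g: Callable, T: Iterable[Hashable]) -> dict:
    """Universal property: factor g: T -> A through the inclusion E -> A.

    If f0∘g == f1∘g, every g(t) lies in E, and the unique h: T -> E with
    incl∘h == g is h(t) = g(t). If some g(t) lies outside E, no
    factorization exists, and we raise ValueError.
    """
    h = {}
    for t in T:
        if g(t) not in E:
            raise ValueError(f"g({t!r}) = {g(t)!r} does not equalize f0 and f1")
        h[t] = g(t)
    return h


# Hypothetical example: A = {0,...,5}, B = {0,1,2}, f0(a) = a mod 3, f1(a) = 2a mod 3.
A = range(6)
E = equalizer(A, lambda a: a % 3, lambda a: (2 * a) % 3)
print(sorted(E))  # [0, 3]

# With f0 == f1 the equalizer is all of A (the identity inclusion).
print(equalizer(A, lambda a: a % 3, lambda a: a % 3) == set(A))  # True
```

Note that `factor_through` has no choice in how to define `h`: because the equalizer map is injective, the factorization is forced, which is exactly the uniqueness part of the definition.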
Now, let's see why this is the sheaf axiom. This won't be a wholly rigorous proof, but it does give the key idea. Take $\mathsf{C} = \mathsf{Set}$, and take a map $\{*\} \longrightarrow \prod F(U_i)$ whose two composites $\{*\} \longrightarrow \prod F(U_i) {{{} \atop \longrightarrow}\atop{\longrightarrow \atop {}}} \prod F(U_i \cap U_j)$ agree. A map from a singleton is just a choice of element, so call the image of $*$ under this map $(f_i)_{i \in I}$, where each $f_i \in F(U_i)$. Then the assumption says that $(f_i)_{i \in I}$ maps to the same element under both maps $\prod F(U_i) {{{} \atop \longrightarrow}\atop{\longrightarrow \atop {}}} \prod F(U_i \cap U_j)$. In other words, $(f_i|_{U_i \cap U_j})_{i, j} = (f_j|_{U_i \cap U_j})_{i, j}$. This is exactly the compatibility hypothesis in the gluing axiom! Now, since we assumed that we had an equalizer diagram, we have the following commutative diagram:
$$ \require{AMScd} \begin{CD} F(U) @>>> \prod F(U_i)\\ @AAA @AidAA\\ \{*\} @>>> \prod F(U_i) \end{CD} $$
so let $f$ be the image of $*$ under $\{*\} \longrightarrow F(U)$. Commutativity says precisely that $f \mapsto (f_i)_{i \in I}$ under $F(U) \longrightarrow \prod F(U_i)$; in other words, $f|_{U_i} = f_i$ for all $i$. Thus, the equalizer property allowed us to take a compatible family of sections $f_i \in F(U_i)$ and glue them to a section $f \in F(U)$. This is not a complete proof. First of all, I didn't show uniqueness of the glued section $f$, but this comes from uniqueness of the induced map in the equalizer. I also only showed that equalizer $\implies$ sheaf (with the usual gluing definition). The converse isn't too bad once you have the main idea.
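The argument above can be checked mechanically on a tiny finite space. The Python sketch below assumes that sections are encoded as Python values and restriction as a function `res(s, U, V)` (all names here are hypothetical), and tests whether $F(U) \to \prod F(U_i)$ is a bijection onto the compatible families, i.e. whether the diagram is an equalizer in $\mathsf{Set}$. The presheaf of all $\{0,1\}$-valued functions passes. The constant presheaf, which assigns $\{0,1\}$ to every nonempty open set, fails on a disconnected open set: sections on $\{a\}$ and $\{b\}$ are always compatible (their overlap is empty), but two different values cannot come from one constant section.

```python
from itertools import product


# Presheaf 1 (a sheaf): all {0,1}-valued functions, sections as frozensets of (point, value).
def fun_sections(U):
    pts = sorted(U)
    return {frozenset(zip(pts, vals)) for vals in product((0, 1), repeat=len(pts))}


def fun_res(s, U, V):
    # Restrict the function s on U to the smaller open set V.
    return frozenset((p, v) for p, v in s if p in V)


# Presheaf 2 (not a sheaf): the constant presheaf, {0,1} on every nonempty open set.
def const_sections(U):
    return {0, 1} if U else {None}  # F(empty set) is a one-point set


def const_res(s, U, V):
    return s if V else None


def compatible_families(sections, res, cover):
    """The equalizer of prod F(U_i) => prod F(U_i ∩ U_j), as a set of tuples."""
    return {
        fam
        for fam in product(*(sections(Ui) for Ui in cover))
        if all(
            res(fam[i], Ui, Ui & Uj) == res(fam[j], Uj, Ui & Uj)
            for i, Ui in enumerate(cover)
            for j, Uj in enumerate(cover)
        )
    }


def is_equalizer(sections, res, U, cover):
    """True iff F(U) -> prod F(U_i) is a bijection onto the compatible families,
    i.e. gluings exist (surjective) and are unique (injective)."""
    image = [tuple(res(s, U, Ui) for Ui in cover) for s in sections(U)]
    injective = len(set(image)) == len(image)
    return injective and set(image) == compatible_families(sections, res, cover)


# Hypothetical space: X = {a, b} with the discrete topology, covered by {a} and {b}.
U = frozenset("ab")
cover = [frozenset("a"), frozenset("b")]
print(is_equalizer(fun_sections, fun_res, U, cover))      # True
print(is_equalizer(const_sections, const_res, U, cover))  # False
```

For the constant presheaf there are four compatible families but only two sections over $U$, so it is existence of gluing (surjectivity) that fails, not uniqueness.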
Now for your second description: you made an error in transcribing it. There should not be two maps $\prod F(U_i) {{{} \atop \longrightarrow}\atop{\longrightarrow \atop {}}} \prod F(U_i \cap U_j)$. Instead, you want one arrow that is the difference of these two maps, namely $(s_i)_{i} \mapsto (s_i|_{U_i \cap U_j} - s_j|_{U_i \cap U_j})_{i, j}$. Of course, this requires a sensible notion of difference of maps, so it works when $F$ is a sheaf of abelian groups (or, more generally, takes values in an abelian category). This way we can make sense of the $0$ at the start and of what exactness means. We say that a diagram $A \xrightarrow{f} B \xrightarrow{g} C$ of abelian groups is exact (at $B$) if $\operatorname{im}(f) = \ker(g)$, and that a longer sequence is exact if each segment of three consecutive terms is exact at its middle term. Now, let me return to equalizers for a moment. Recall that I said that the equalizer of $A {{{f_0} \atop \longrightarrow}\atop{\longrightarrow \atop {f_1}}} B$ is $\{a \in A : f_0(a) = f_1(a)\}$ along with the inclusion into $A$. If these are abelian groups and $f_0, f_1$ are group homomorphisms, then this is precisely the kernel of $f_0 - f_1$. In other words, equalizers can be computed as kernels when you're working with abelian groups, so exactness of the sequence $$ 0 \longrightarrow F(U) \longrightarrow \prod F(U_i) \longrightarrow \prod F(U_i \cap U_j) $$ turns out to be the same as the equalizer diagram you drew: exactness at $F(U)$ says that the restriction map $F(U) \longrightarrow \prod F(U_i)$ is injective (uniqueness of gluing), and exactness at $\prod F(U_i)$ says that every compatible family comes from a section over $U$ (existence of gluing).
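As a sanity check of this last claim, the following Python sketch takes the sheaf of $\mathbb{Z}/2$-valued functions on the three-point discrete space $\{a, b, c\}$ (the coefficients, the points, and the cover are assumptions made for illustration) and verifies that the kernel of the single difference arrow coincides with the equalizer of the two restriction maps, and that $0 \to F(U) \to \prod F(U_i) \to \prod F(U_i \cap U_j)$ is exact.

```python
from itertools import product

P = 2  # coefficients in Z/2, an assumption that keeps every group finite


def sections(V):
    """All Z/2-valued functions on the finite set V (a sheaf of abelian groups)."""
    pts = sorted(V)
    return [frozenset(zip(pts, vals)) for vals in product(range(P), repeat=len(pts))]


def res(s, V):
    # Restriction of the function s to the subset V.
    return frozenset((p, v) for p, v in s if p in V)


def d(family, cover):
    """The single arrow prod F(U_i) -> prod F(U_i ∩ U_j): (s_i) |-> (s_i|_ij - s_j|_ij)."""
    out = []
    for i, Ui in enumerate(cover):
        for j, Uj in enumerate(cover):
            si, sj = dict(res(family[i], Ui & Uj)), dict(res(family[j], Ui & Uj))
            out.append(frozenset((p, (si[p] - sj[p]) % P) for p in Ui & Uj))
    return tuple(out)


def zero(cover):
    # The zero element of prod F(U_i ∩ U_j), in the same order as d().
    return tuple(frozenset((p, 0) for p in Ui & Uj) for Ui in cover for Uj in cover)


# Hypothetical example: U = {a, b, c} covered by U_1 = {a, b} and U_2 = {b, c}.
U = frozenset("abc")
cover = [frozenset("ab"), frozenset("bc")]
families = list(product(*(sections(Ui) for Ui in cover)))

kernel = {fam for fam in families if d(fam, cover) == zero(cover)}
equal = {fam for fam in families
         if all(res(fam[i], Ui & Uj) == res(fam[j], Ui & Uj)
                for i, Ui in enumerate(cover) for j, Uj in enumerate(cover))}
image = [tuple(res(s, Ui) for Ui in cover) for s in sections(U)]

print(kernel == equal)                # True: equalizer = kernel of the difference
print(len(set(image)) == len(image))  # True: exact at F(U) (restriction is injective)
print(set(image) == kernel)           # True: exact at prod F(U_i)
```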
This last question is harder to answer, and my expertise here is not so great. In the case of sheaves of abelian groups, the answer seems to be that kernels play nicely and cokernels do not because sheafification is left adjoint to the forgetful (inclusion) functor from sheaves to presheaves. So the forgetful functor is a right adjoint: it preserves limits and in particular is left exact (preserves kernels), but it is generally not right exact (it need not preserve cokernels). This is probably hard to parse, and I'm afraid I don't know this well enough to give a more elementary answer, but at the very least here are two references that give some more information (though they are still very abstract):
When the the presheaf of image of morphism of sheaves is a sheaf?
Why does sheafification functor being left adjoint imply that the presheaf kernel is a sheaf kernel
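For a concrete instance of cokernels misbehaving, here is a toy example (not taken from the references above), written as a Python sketch with $\mathbb{Z}/2$ coefficients. The space is the four-point "pseudocircle" $\{a,b,c,d\}$ with open sets $\emptyset, \{a\}, \{b\}, \{a,b\}, \{a,b,c\}, \{a,b,d\}$ and the whole space. $F$ is the constant sheaf $\mathbb{Z}/2$, $G$ assigns to an open $W$ all $\mathbb{Z}/2$-valued functions on $W \cap \{a,b\}$, and $F \to G$ is restriction to the open points. Since $\{a,b,c\}$, $\{a,b,d\}$ and the whole space are connected while $\{a,b\}$ is discrete, the presheaf cokernel has two sections on each of $U = \{a,b,c\}$ and $V = \{a,b,d\}$, only one on $U \cap V$, and two on the whole space, so the four compatible pairs cannot all glue. The encoding and all function names are hypothetical.

```python
from itertools import product

P = 2  # Z/2 coefficients (an assumption), so every group below is finite
X = frozenset("abcd")
# The four-point "pseudocircle": a, b are open points; c, d are closed points.
OPENS = [frozenset(s) for s in ["", "a", "b", "ab", "abc", "abd", "abcd"]]
OPEN_PTS = frozenset("ab")


def min_nbhd(x):
    """Smallest open set containing x."""
    nb = X
    for W in OPENS:
        if x in W:
            nb &= W
    return nb


def functions(S):
    """All Z/2-valued functions on the finite set S, as frozensets of (point, value)."""
    pts = sorted(S)
    return {frozenset(zip(pts, vals)) for vals in product(range(P), repeat=len(pts))}


def restrict(s, V):
    return frozenset((p, v) for p, v in s if p in V)


def add(s, t):
    """Pointwise sum mod P of two functions with the same domain."""
    ds, dt = dict(s), dict(t)
    return frozenset((p, (ds[p] + dt[p]) % P) for p in ds)


def F(W):
    """Constant sheaf Z/2: functions on W that are constant on each minimal neighbourhood."""
    return {f for f in functions(W)
            if all(len({dict(f)[y] for y in min_nbhd(x)}) == 1 for x in W)}


def G(W):
    """Sheaf of Z/2-valued functions on the open points lying in W."""
    return functions(W & OPEN_PTS)


def image_of_F(W):
    """Image of the injective map F(W) -> G(W) given by restriction to the open points."""
    return {restrict(f, OPEN_PTS) for f in F(W)}


def coker(W):
    """Presheaf cokernel C(W) = G(W) / image, as a set of cosets."""
    im = image_of_F(W)
    return {frozenset(add(g, h) for h in im) for g in G(W)}


def coker_restrict(coset, V):
    """Restrict a coset via any representative (well defined for a presheaf map)."""
    g = restrict(next(iter(coset)), V)
    return frozenset(add(g, h) for h in image_of_F(V))


def glues(W, cover):
    """Is C(W) -> prod C(U_i) a bijection onto the compatible families?"""
    families = {
        fam for fam in product(*(coker(Ui) for Ui in cover))
        if all(coker_restrict(fam[i], Ui & Uj) == coker_restrict(fam[j], Ui & Uj)
               for i, Ui in enumerate(cover) for j, Uj in enumerate(cover))
    }
    image = [tuple(coker_restrict(c, Ui) for Ui in cover) for c in coker(W)]
    return len(set(image)) == len(image) and set(image) == families


U, V = frozenset("abc"), frozenset("abd")
print(len(coker(U & V)), len(coker(U)), len(coker(V)), len(coker(X)))  # 1 2 2 2
print(glues(X, [U, V]))  # False: 4 compatible families, only 2 sections
```

Only existence of gluing fails here; the map out of the cokernel's sections over the whole space is still injective, so sheafifying this presheaf has to add new global sections.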