If a graph $G = (V, E)$ on $n$ vertices has no $p$-clique, $p \geq 2$, then $$|E| \leq \Big(1- \frac{1}{p-1}\Big) \frac{n^2}{2} \;\;\; \;\;\; \;\;\;(1)$$ We get a proof from the book "Proofs from THE BOOK" as given below
Now I would like to understand the motivation of the function that was being maximized -
$$f(w)=\sum_{v_i v_j \in E} w_i w_j$$ The function is used to write $\frac{|E|}{n^2}$, but for uniform distribution it is more intuitive to write, $\frac{|E|\times 2}{n}$,
since each edge has two vertices, with each node having $1/n$ weight, thus each edge would have weight $1/n+1/n=2/n$, it implies $\frac{|E|\times 2}{n}$. to
follow this line of thinking we have to change the function for the above part of the proof, I don't
know whether it yields the same result, but I wonder why it is not used? why $f(w)=\sum_{v_i v_j \in E} w_i w_j$ is used instead of $$f(w)=\sum_{v_i v_j \in E} w_i + w_j$$? What is the motivation ?

The reason $f(w) = \sum_{v_i v_j \in E} w_i + w_j$ won't work is that this function doesn't actually "notice" the adjacency relation in the graph. It only notices vertex degrees.
Here's why: if vertex $v_i$ has degree $\deg(v_i)$, then $w_i$ will appear in $\deg(v_i)$ different terms of this sum, and we can just collect them together and rewrite $f(w)$ as $\sum_{v_i \in V} w_i \cdot \deg(v_i)$. This is only affected by the degree sequence of $G$: the only parts of the definition that depend on $G$ are $\deg(v_1), \deg(v_2), \dots, \deg(v_n)$.
There are many examples of non-isomorphic graphs with the same degree sequence, and therefore the same $f(w)$, so this function isn't very good at distinguishing different graphs.
We can still get some bound on the number of edges using $f(w)$, it just won't be a useful one. Here's how. If $w_1 + \dots + w_n =1$, $f(w)$ is a weighted average of the degrees $\deg(v_1), \dots, \deg(v_n)$. Therefore it is maximized by finding a maximum degree vertex $v_k$, and setting $w_k = 1$ (and the other components to $0$). So we have $f(w) \le \Delta(G)$, where $\Delta(G)$ is the maximum degree, and by comparing this to the value of $f$ at $w = (\frac1n, \dots, \frac1n)$, we can deduce the bound $|E(G)| \le \frac n2 \cdot \Delta(G)$.
As for the function $f$ actually used in the proof, we can interpret it in the following way: if two random vertices are chosen independently according to the probability distribution $w$, what is the probability that they are adjacent?
When $w$ is the uniform distribution on $V(G)$, this measures the edge density of $G$: it is $\frac{|E(G)|}{n^2}$. When $w$ is the uniform distribution on a $k$-vertex set $S$, it measures the edge density of the induced subgraph $G[S]$: it is $\frac{|E(G[S])|}{k^2}$. So there's a sense in which choosing $w$ to maximize $f$ is a relaxation of finding the densest subgraph of $G$.
(We allow non-uniform distributions $w$ as well, which don't correspond to any specific subgraph density - but it turns out that allowing them doesn't change the maximum.)
The content of Turán's theorem is essentially that by this measure of density (there are other measures of edge density, such as $\frac{|E(G)|}{\binom n2}$, they are just harder to work with), $K_{p-1}$ is the densest $K_{p}$-free graph, so any $K_p$-free graph $G$ has at most the edge density of $K_{p-1}$. This gives us a bound on $|E(G)|$.