I have been studying the proof given by Dirichlet for infinitely many primes in an arithmetic progression and I cannot help but wonder this :
Why did Dirichlet need to first show $$\sum_{p\leq x, \hspace{2mm} p \hspace{1mm} \equiv \hspace{1mm} h (\text{mod} \hspace{1mm} k)} \frac{\log (p)}{p} = \frac{1}{\phi(k)}\log(x) +O(1)$$ in order to prove his point? I understand how this implies the infinitude primes in an arithmetic progression but why use $\frac{\log (p)}{p}$ in the first place?
Isn't there a better way? What motivated Dirichlet to do this?
This is an excellent question, and it is worth understanding. The specific choice of weight function $\frac{\log p}{p}$ takes a bit of effort to explain, so I'll try to be as clear and organized as I can.
Let $\Lambda(n)$ denote the von Mangoldt function: $$ \Lambda(n) = \begin{cases} \log p & \text{if $n = p^k$}, \\ 0 & \text{otherwise}. \end{cases} $$ In general, when working with sums over prime numbers, say something of the form $$ \sum_{p\leq x} f(p), $$ it is generally easier to work with the weighted sums $$ \sum_{n\leq x} \Lambda(n) f(n). $$ The contribution of prime powers is generally small, so really the only terms contributing to the sum are $n=p$, and we get the sum over primes with a weight of $\log p$ at each prime. Since one can usually remove the weight $\log p$ via partial summation, there is really no technical reason to avoid using it.
The reasons one works with sums involving $\Lambda$ is that the Dirichlet series associated to $f(n)\Lambda(n)$ can often be written in terms of familiar $L$-functions, say $\zeta(s)$, the Riemann zeta function, or $L(s,\chi)$, the Dicihlet $L$-function associated to a Dirichlet character $\chi$.
Okay, so from what I've said so far, it seems clear that one should consider the sums $$ \sum_{\substack{n\leq x\\ n \equiv h(mod \hspace{1mm} k)}} \Lambda(n). $$ This begs the question, why include the extra weight $\frac{1}{n}$ (which is the same as $\frac{1}{p}$ because of the presence of $\Lambda$)? This is a more delicate question. In general, the logarithmic sums $$ \sum_{n\leq x} \frac{f(n)}{n} $$ are easier to handle than the summatory functions $$ \sum_{n\leq x} f(n). $$ The opening paragraphs of this article by Terry Tao go into detail about this. To quote him directly,
A further discussion of this phenomenon can be found in these notes, linked by Tao in his article.
To illustrate this, it should be noted that the estimate
$$ \sum_{\substack{n\leq x\\ n \equiv h(mod \hspace{1mm} k)}} \Lambda(n) \sim \frac{x}{\phi(k)} $$ is the classic Siegel–Walfisz theorem, and is more difficult to prove than Dirichlet's theorem.
TL;DR - The specific weight $\frac{\log p}{p}$ is used because one simply needs to know less about the complex-analytic behavior of the relevant Dirichlet series (in this case the Dirichlet $L$-functions $L(s,\chi)$). Dirichlet only set out to prove an infinitude of primes in coprime residue classes; his theorem does not give an asymptotic count of such primes. The asymptotic count of such primes is precisely the content of the Siegel–Walfisz theorem, and one needs a deeper understanding of the Dirichlet $L$-functions $L(s,\chi)$ in order to prove it.