Let $V$ be an inner product space over $\mathbb R$. Given $u, v\in V$ with $u\neq 0$, there is a unique vector $w\in V$ such that:
I) $w$ is parallel to $u$;
II) $w-v$ is perpendicular to $u$.
In fact, if we let
$$w=\frac{\langle u, v\rangle}{\langle u, u\rangle}u$$
then $w$ is parallel to $u$ by construction, and
$$\langle u, w-v\rangle=\langle u, w\rangle-\langle u, v\rangle =\frac{\langle u, v\rangle}{\langle u, u\rangle} \langle u, u\rangle-\langle u, v\rangle=\langle u, v\rangle-\langle u, v\rangle=0.$$
As to the uniqueness, if $w^\prime$ has the same properties as $w$, then
$$w^\prime = \alpha u$$
for some scalar $\alpha\in\mathbb R$. Then
$$0=\langle u, w^\prime-v\rangle=\langle u, w^\prime\rangle-\langle u, v\rangle=\alpha \langle u, u\rangle-\langle u, v\rangle,$$
which implies
$$\alpha=\frac{\langle u, v\rangle}{\langle u, u\rangle},$$ so that
$$w^\prime=\frac{\langle u, v\rangle}{\langle u, u\rangle}u=w.$$
The vector
$$\textrm{pr}_{u}(v)=\frac{\langle u, v\rangle}{\langle u, u\rangle}u$$
is called the orthogonal projection of $v$ onto $u$.
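For concreteness, here is a minimal numerical sketch of the construction (assuming numpy; the vectors are made up for illustration). It computes $\textrm{pr}_u(v)$ and checks properties I and II:

```python
import numpy as np

def pr(u, v):
    """Orthogonal projection of v onto the line spanned by u (u != 0)."""
    return (np.dot(u, v) / np.dot(u, u)) * u

# Made-up vectors for illustration
u = np.array([2.0, 1.0])
v = np.array([1.0, 3.0])
w = pr(u, v)

print(w)                 # a scalar multiple of u, so property I holds
print(np.dot(u, w - v))  # ~0.0: w - v is perpendicular to u, property II
```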
What is the relevance of $\textrm{pr}_u(v)$ after all?
I know, for instance, that if $W\subset V$ is a subspace and $\{w_1, \ldots, w_m\}$ is an orthogonal basis for $W$, then the vector
$$\textrm{pr}_W(v)=\frac{\langle w_1, v\rangle}{\langle w_1, w_1\rangle}w_1+\ldots + \frac{\langle w_m, v\rangle}{\langle w_m, w_m\rangle}w_m$$
is the vector in $W$ which is closest to $v$.
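As a sanity check, the subspace formula can be tested numerically too (a sketch assuming numpy; the plane $W$ and the vector $v$ are made up, and the basis is orthogonal, as the formula requires):

```python
import numpy as np

def pr_W(basis, v):
    """Projection of v onto W, given an orthogonal basis of W."""
    return sum((np.dot(w, v) / np.dot(w, w)) * w for w in basis)

# An orthogonal basis of a plane W in R^3, and an arbitrary v (both made up)
w1 = np.array([1.0, 1.0, 0.0])
w2 = np.array([1.0, -1.0, 0.0])
v  = np.array([3.0, 4.0, 5.0])

p = pr_W([w1, w2], v)
print(p)                                     # [3. 4. 0.], the closest point to v in W
print(np.dot(v - p, w1), np.dot(v - p, w2))  # both ~0: the error is orthogonal to W
```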
But again, and so what? Can anyone point out some further implications of the existence of orthogonal projections? Up to this point, I see it as a tool, but it is not clear what it is good for.
Finding the closest approximation to something is a very important problem to be able to solve. For example, in statistics I might wish to model a data set using a linear model, say linear regression. It is not obvious how to find a "line of best fit", but if I can precisely compute the vector in the span of my modelling parameters which is closest to my data vector, I can go home happy. I could also wish to find an optimal function in a function space, given some restrictions, provided those restrictions form a vector subspace.
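To make the regression remark concrete, here is a small sketch (assuming numpy, with made-up data points): the fitted values are exactly the orthogonal projection of the data vector $y$ onto the column space of the design matrix, so the residual is orthogonal to that subspace.

```python
import numpy as np

# Made-up data points; fit y ≈ a*x + b by projecting y onto span{x, 1}
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 1.9, 3.2, 3.9])

A = np.column_stack([x, np.ones_like(x)])    # columns span the model subspace
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
a, b = coef                                  # slope and intercept of the best fit

y_hat = A @ coef            # the orthogonal projection of y onto col(A)
print(a, b)
print(A.T @ (y - y_hat))    # ~0: the residual is orthogonal to the subspace
```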
In maths generally, and in mechanics specifically, it is useful to know how much of a quantity "lies" in a given direction. I may wish to split the force acting on a body into three orthogonal components, based on a local coordinate system, to solve a mechanics problem: finding the magnitude of the friction in one direction, or the tension in a cord attached to the body in another direction, and so on.
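A small sketch of that decomposition (assuming numpy; the force and the local frame below are invented for illustration):

```python
import numpy as np

# An invented force F and a local orthonormal frame (t, n, b)
F = np.array([10.0, -4.0, 3.0])
t = np.array([0.6, 0.8, 0.0])    # e.g. direction along an inclined surface
n = np.array([-0.8, 0.6, 0.0])   # e.g. the surface normal
b = np.array([0.0, 0.0, 1.0])    # e.g. direction of an attached cord

# Since the frame is orthonormal, <e, e> = 1 and pr_e(F) = <F, e> e
comps = [np.dot(F, e) for e in (t, n, b)]
print(comps)                                         # signed magnitude of F along each direction
print(sum(c * e for c, e in zip(comps, (t, n, b))))  # the three projections reassemble F
```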
Orthogonal bases are nice in linear algebra. Being able to express a vector in terms of one is useful in computations, and knowing it is possible is useful in proofs: I have seen proof authors say "w.l.o.g. express $v$ in terms of an orthogonal basis", and this simplifies the proof. To reiterate what I believe is the main point: given a subspace of interest, I will often want to solve optimisation problems, and these are elegantly (and computationally efficiently) solved with orthogonal projection. Orthogonal projectors, as operators, are also nice since they are self-adjoint (and idempotent).
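Those two operator properties are easy to see numerically (a sketch assuming numpy; $P = uu^{\mathsf T}/\langle u,u\rangle$ is the matrix of $\textrm{pr}_u$ in $\mathbb R^n$, and $u$ is made up):

```python
import numpy as np

# In R^n the matrix of pr_u is P = u u^T / <u, u>
u = np.array([2.0, 1.0, 2.0])
P = np.outer(u, u) / np.dot(u, u)

print(np.allclose(P, P.T))    # True: P is self-adjoint
print(np.allclose(P @ P, P))  # True: projecting twice changes nothing (idempotent)
```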