One of the biggest challenges for me in understanding probability is making sense of the concept of outcomes and events. To put it plainly, it just doesn't feel like mathematics anymore when we talk about Head or Tail.
We are used to dealing with sets in all other areas of mathematics, e.g. the set of integers, the real line, a vector, a function, a set of sets, ... Then you are hit in the face with "events", which are sets of "outcomes" of an "experiment".
I cannot grasp why I feel more comfortable when people say that "$1$ is an element of the integers" than when people say "head is an element in the space of outcomes which contains head and tail".
Perhaps the latter doesn't contain any numbers? You cannot put it into a computer? Perhaps you can call "1" a number whereas head is ... a linguistic variable, a word, a string, a binary number, an "outcome" ... what is head?
Perhaps the mapping is ill defined in my mind. You have "1", you define a function "+" and you take another number "1", it cranks out "2". Whereas you have "head" and you define a random variable which so happens to assign the number "1/2" to head and that is somehow a function.
Does anyone share my concerns when learning about probability? How can I get myself to see that concepts such as "events", "outcomes", "sample space", "random variable" are not so far-fetched compared to other branches of mathematics?
The field of probability can be made mathematically rigorous. Introductory textbooks on probability tend to use terminology like 'sample space', 'outcome', 'event', and outcomes are given names like 'Heads' or 'HTT' or 'King of Spades', all in an attempt to keep things informal and intuitive. These texts often refrain from defining such concepts precisely, which leads to dissatisfaction among people like you who are mathematically oriented.
In fact all these informal terms can be couched in the familiar language of sets and functions, and you'll see this in more advanced courses. The 'sample space' is a set, typically called $\Omega$. An 'outcome' is a member of that set, typically denoted $\omega$. An 'event' is a subset of the sample space. A random variable is a function $X$ mapping $\Omega$ to the real numbers. A probability $P(\cdot)$ is a function mapping an event (i.e., a set) into the interval $[0,1]$. There are axioms that specify the properties we expect to be satisfied by this mapping. (To be rigorous you have to impose additional assumptions: on what sets are eligible to receive a probability, and what functions are allowed to be called random variables, although these conditions are unnecessary when the sample space $\Omega$ is finite. These concepts are more fully explored in a course on Real Analysis, specifically Measure Theory, which furnishes the mathematical foundation for probability theory.)
Using the language of set theory, an expression like $P(X\ge0)$ is understood as $P(\{\omega: X(\omega) \ge 0\})$, i.e., the event $\{X\ge0\}$ is the set of points in $\Omega$ where the function $X$ has value at least zero, and $P(X\ge0)$ is the value that the mapping $P(\cdot)$ assigns to this set. Eventually, though, it becomes tedious to constantly fill in the $\omega$ which is hiding in the background, and we get comfortable omitting it. (Indeed, it is possible to reason probabilistically without consciously thinking about that implicit $\omega$ underneath every event.)
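To see that the hidden $\omega$ is nothing mysterious, here is a minimal Python sketch of the unpacking $P(X\ge0) = P(\{\omega: X(\omega)\ge0\})$. The setting is my own illustrative assumption, not something from your question: one roll of a fair die, with a made-up random variable $X(\omega) = \omega - 3$. The names `omega_space`, `X` and `P` are likewise just chosen for the sketch.

```python
from fractions import Fraction

# Hypothetical setting: one roll of a fair six-sided die.
omega_space = {1, 2, 3, 4, 5, 6}   # the sample space is an ordinary set


def X(omega):
    """A random variable: an ordinary function from outcomes to numbers."""
    return omega - 3               # illustrative choice; takes negative values too


def P(event):
    """Probability of an event (a subset of omega_space), all outcomes equally likely."""
    assert event <= omega_space, "an event must be a subset of the sample space"
    return Fraction(len(event), len(omega_space))


# The event {X >= 0}, written out in full as {omega : X(omega) >= 0}.
event_X_nonneg = {omega for omega in omega_space if X(omega) >= 0}

# P(X >= 0) is just P applied to that set.
p_X_nonneg = P(event_X_nonneg)
print(sorted(event_X_nonneg), p_X_nonneg)
```

Nothing here is random in the everyday sense: `X` is a deterministic function, the event is a set built by a comprehension, and `P` is a function from sets to numbers.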
Another example: If your experiment consists of tossing a coin twice, the sample space $\Omega$ has four elements, which we label $HH$, $HT$, $TH$, $TT$. We could just as well have used the less evocative names $\omega_1$, $\omega_2$, $\omega_3$, $\omega_4$ for these elements. The event "exactly one head was seen" is the event $\{HT, TH\}$ consisting of two elements (or, if you like, $\{\omega_2, \omega_3\}$). If the random variable $X$ is defined as "the number of heads seen", then the event $\{X=1\}$ is also this set, since it's the set of points $\omega$ where $X(\omega)=1$. Assuming the coin is fair, we define a mapping $P(\cdot)$ that maps each singleton set $\{\omega_i\}$ to the value $1/4$, and use the axioms of probability to deduce the value of $P(A)$ for all other sets $A$. And so on.
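Since you asked whether this can be "put into a computer", the two-toss example can be sketched in a few lines of Python. This is only one possible encoding: representing outcomes as strings, and the names `sample_space`, `num_heads`, `weight` and `prob`, are my own choices for illustration, and the construction only covers a finite sample space.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses: the set {"HH", "HT", "TH", "TT"}.
sample_space = {"".join(tosses) for tosses in product("HT", repeat=2)}


def num_heads(omega):
    """The random variable X: maps an outcome to the number of heads in it."""
    return omega.count("H")


# Fair coin: each singleton {omega} is assigned probability 1/4.
weight = {omega: Fraction(1, 4) for omega in sample_space}


def prob(event):
    """P(A) for any subset A of the sample space, obtained by additivity."""
    assert event <= sample_space, "an event must be a subset of the sample space"
    return sum((weight[omega] for omega in event), Fraction(0))


# The event "exactly one head was seen", listed explicitly ...
exactly_one_head = {"HT", "TH"}
# ... and the same event written as {X = 1} = {omega : X(omega) = 1}.
X_equals_1 = {omega for omega in sample_space if num_heads(omega) == 1}

print(X_equals_1 == exactly_one_head)   # the two descriptions give the same set
print(prob(X_equals_1))                 # probability of that event
```

The labels "HH", "HT" and so on play no mathematical role; replacing them with 1, 2, 3, 4 (and adjusting `num_heads` accordingly) would give the same probabilities.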