I am taking a course on Markov chains which is intended to be accessible without a rigorous background in measure-theoretic probability, but I'm not grasping even basic definitions and it is hindering my progress. Specifically, I want to precisely understand the definition of a discrete random variable. The following is an excerpt from *Markov Chains* by Norris. Below I will detail my confusion.
Let $I$ be a countable set. Each $i\in I$ is called a state and $I$ is called the state space. We say that $\lambda=(\lambda_i:i\in I)$ is a measure on $I$ if $0\leq \lambda_i <\infty$ for all $i\in I$. If in addition the total mass $\sum_{i\in I} \lambda_i$ equals $1$, then we call $\lambda$ a distribution. We work throughout with a probability space ($\Omega, \mathcal{F}, \mathbb{P})$. Recall that a random variable $X$ with values in $I$ is a function $X:\Omega\to I$. Suppose we set $$\lambda_i = \mathbb{P}(X=i)=\mathbb{P}(\{\omega:X(\omega)=i\}).$$
My goal is to write down an example of a discrete probability space and a random variable. Let's say that $I=\{a,b,c\}$ with $\lambda_i=1/3$ for each $i\in I$. So, the probability of choosing $a$ from this set is $1/3$. How could you express this in the language of some probability space $(\Omega, \mathcal{F},\mathbb{P})$?
What would $\Omega$ be? According to Wikipedia, it is the state space. I would have thought that $I$ is the state space, and yet the book is distinguishing between $\Omega$ and $I$. My understanding is that $\mathcal{F}$ is a sigma algebra on $\Omega$. I can read the definition of a sigma algebra, but I don't have good intuition for the role it is playing here. Somehow, a probability measure on $\Omega$ together with a function $X:\Omega\to I$ can capture the notion that choosing $a$ has probability $1/3$.
My hope is that this simple example, perhaps accompanied by some comments on how to interpret this formalism, will be enough to get me on my way.
If you were only dealing with one random variable, you could take $\Omega = I$, but typically there are many of them, and $\Omega$ has to be rich enough to handle all of them. For example, if you were dealing with five random variables $X_1, \ldots, X_5$, each taking values in the state space $I = \{a,b,c\}$, then you might take $\Omega = I^5$, with each $X_i$ the coordinate map $(x_1, \ldots, x_5) \mapsto x_i$.
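To make this concrete, here is a sketch (my own names and setup, not from Norris) of both cases: the one-variable space $\Omega = I$ with the uniform measure, where $X$ is the identity, and the richer space $\Omega = I^5$ with the product measure, where $X_1$ is the first coordinate map. In each case $\lambda_i = \mathbb{P}(X = i)$ is computed literally as $\mathbb{P}(\{\omega : X(\omega) = i\})$, a sum of point masses.

```python
from itertools import product

# One random variable: take Omega = I = {a, b, c} with the uniform measure.
I = ["a", "b", "c"]
Omega = list(I)
P = {omega: 1/3 for omega in Omega}   # P({omega}) = 1/3 for each point
X = lambda omega: omega               # X is just the identity map on Omega

# lambda_i = P(X = i) = P({omega : X(omega) = i}), a sum of point masses
lam = {i: sum(P[w] for w in Omega if X(w) == i) for i in I}
print(lam)  # each lambda_i comes out to 1/3

# Five variables: the richer space Omega = I^5 with the product measure,
# where X1 is the coordinate map (x1, ..., x5) -> x1.
Omega5 = list(product(I, repeat=5))
P5 = {omega: (1/3) ** 5 for omega in Omega5}  # uniform on 3^5 points
X1 = lambda omega: omega[0]
p_X1_a = sum(P5[w] for w in Omega5 if X1(w) == "a")
print(p_X1_a)  # still 1/3: the extra structure of Omega doesn't change it
```

The point of the second half is that the same $X_1$ with the same distribution can live on a much larger $\Omega$; the sample space is just a backdrop rich enough to carry all the variables at once.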
The $\sigma$-algebra $\mathcal F$ tells you what subsets of $\Omega$ you can assign probabilities to. At this stage, it can be all the subsets of $\Omega$. When you deal with continuous random variables or infinite collections of random variables, this will not be possible because of the existence of non-measurable sets.
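For a finite $\Omega$ this is easy to see explicitly: $\mathcal F$ can be the full power set, and $\mathbb{P}$ assigns each event (each subset of $\Omega$) the sum of its point masses. A minimal sketch, with names of my own choosing:

```python
from itertools import chain, combinations

Omega = ["a", "b", "c"]
point_mass = {"a": 1/3, "b": 1/3, "c": 1/3}

# F = the power set of Omega: every subset is an event we may assign a
# probability to (fine for finite Omega; not possible in general).
F = [frozenset(s) for s in chain.from_iterable(
    combinations(Omega, r) for r in range(len(Omega) + 1))]

def P(event):
    """P(A) = sum of point masses over A (additivity on a finite space)."""
    return sum(point_mass[w] for w in event)

print(len(F))                    # 2^3 = 8 events, from {} up to Omega itself
print(P(frozenset({"a", "b"})))  # P({a, b}) = 1/3 + 1/3 = 2/3
```

With continuous random variables, an analogue of `P` cannot be extended consistently to *all* subsets, which is exactly why $\mathcal F$ must then be a proper sub-collection of the power set.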