Probability theory in machine learning


I have read that a probability model of a random phenomenon, given for example by a probability density function (PDF) or a cumulative distribution function (CDF), can be used to develop a machine learning algorithm for that phenomenon. However, I could not find how this is done. See, for example, the following paragraph from this web page.

A Probability Density Function is a tool used by machine learning algorithms and neural networks that are trained to calculate probabilities from continuous random variables. For example, a neural network that is looking at financial markets and attempting to guide investors may calculate the probability of the stock market rising 5-10%. To do so, it could use a Probability Density Function in order to calculate the total probability that the continuous random variable range will occur.

Consider the following example PDF.

$$ f_X(x)= \begin{cases} \frac{k}{x} & 2<x\le6 \\ 0 & \text{otherwise.} \end{cases} $$
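To make the PDF concrete, here is a minimal sketch in Python of the use case described in the quoted paragraph: normalizing the density (since $\int_2^6 \frac{k}{x}\,dx = k\ln 3 = 1$, we get $k = 1/\ln 3$) and computing the probability that the variable falls in a range. The interval $(3, 5]$ is just an arbitrary example.

```python
import math

# Normalizing constant: the integral of k/x over (2, 6] is k*ln(3),
# and it must equal 1, so k = 1/ln(3).
k = 1 / math.log(3)

def pdf(x):
    """f_X(x) = k/x on (2, 6], and 0 otherwise."""
    return k / x if 2 < x <= 6 else 0.0

def cdf(x):
    """F_X(x) = k*ln(x/2) for x in (2, 6]."""
    if x <= 2:
        return 0.0
    if x > 6:
        return 1.0
    return k * math.log(x / 2)

# Probability that X falls in the range (3, 5]:
prob = cdf(5) - cdf(3)
```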

Can someone elaborate on how this PDF can be used to develop pseudocode for a machine learning algorithm? I would also be interested in a short article or quick tutorial that explains, with some examples, how a PDF is used in the development of a machine learning algorithm.

If my question is not very clear or there are gaps, please help me improve it.


Best Answer

First of all, starting from a known distribution is not that meaningful in terms of machine learning: we don't need machine learning, or any other estimation method, if we already know that the samples follow a given PDF.

What we are interested in is estimating the distribution that the sample follows. An easy example is parametric estimation. Suppose that the sample is drawn from a distribution that obeys a known model, say $Bernoulli(p)$. Then estimating the probability $p$ is enough to estimate the entire distribution.

Notice that the probability mass function of $Bernoulli(p)$ is $P(x)=p^x(1-p)^{1-x}$ for $x\in \{0,1\}$. Given an i.i.d. sample $\chi=\{x^t\}_{t=1}^{N}$ where $x^t\in \{0,1\}$, we can estimate the parameter $p$ by maximizing the log-likelihood with respect to $p$: $$L(p|\chi)=\log\prod_{t=1}^{N} p^{x^t}(1-p)^{1-x^t}=\sum_{t=1}^{N} x^t\log p+\Big(N-\sum_{t=1}^{N} x^t\Big) \log (1-p),$$ and solving $\frac{dL}{dp}=0$ gives the estimate $$\hat p=\frac{\sum_{t=1}^{N} x^t}{N},$$ with which we have estimated the distribution of the sample $\chi$.
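The derivation above translates directly into a few lines of code. A minimal sketch, where the sample is simulated from a hypothetical true parameter $p = 0.3$ purely for illustration:

```python
import random

def estimate_bernoulli_p(sample):
    """Maximum-likelihood estimate p_hat = (sum of x^t) / N for i.i.d. 0/1 data."""
    return sum(sample) / len(sample)

# Simulate a sample of N i.i.d. draws from Bernoulli(p); in a real
# application these would be observed data and p would be unknown.
random.seed(0)
true_p = 0.3
N = 10_000
sample = [1 if random.random() < true_p else 0 for _ in range(N)]

p_hat = estimate_bernoulli_p(sample)  # should be close to true_p
```

The estimated distribution $Bernoulli(\hat p)$ can then be used for downstream tasks, e.g. computing the probability of future outcomes.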