How to read and understand the underlying mathematical concepts of Data Programming


I was going through the official paper Data Programming: Creating Large Training Sets, Quickly (https://arxiv.org/pdf/1605.07723.pdf) and came across these two equations:

$$ \mu_{\alpha,\beta}(\Lambda, Y) = \frac{1}{2}\prod_{i=1}^{m}\left(\beta_i\alpha_i\mathbf{1}_{\{\Lambda_i = Y\}} + \beta_i(1-\alpha_i)\mathbf{1}_{\{\Lambda_i = -Y\}} + (1-\beta_i)\mathbf{1}_{\{\Lambda_i = 0\}}\right) \tag{1} $$

$$ (\hat{\alpha}, \hat{\beta}) = \arg\max_{\alpha,\beta} \sum_{x \in S} \log \sum_{y' \in \{-1,1\}} \mu_{\alpha,\beta}(\lambda(x), y') \tag{2} $$

$\Lambda = (\Lambda_1,\dots,\Lambda_m)$ are $m$ labelling functions which label our training data.

Here are my doubts regarding equation 1:

What does $\mathbf{1}_{\{\Lambda_i=Y\}}$ in equation (1) mean? And what is the value of $Y$: is it just $\{-1,1\}$, or the entire set of true labels in the dataset? And the final value of equation (1) will be just a number, right?

How would I read equations (1) and (2) mathematically?

Also, equation (2) is compared with the logistic-regression loss function. But when we minimise a loss function, we also have the true labels attached to it, and I am not able to understand what the true labels are in the case of equation (2) given in the above image.

How do we put $\lambda(x)$ from equation (2) into equation (1)? Here $x$ would be an unlabelled training data point. What does the inner summation in equation (2) mean, given that $y'$ is already taken care of in equation (1)? Can someone elaborate more on equation (2) in terms of a data frame with $x$ features?

Update based on the answer: I have written equation (2) based on equation (1). Please let me know if it is correct.

[image of my working for equation (2)]

BEST ANSWER

I'm not totally familiar with that paper, but I think I can give you some insight. So, $\Lambda = (\Lambda_1,\dots,\Lambda_m)$ are some random variables. Each is assumed to be independent and can take values in $\{-1,0,1\}$, or equivalently in $\{-Y,0,Y\}$. Hence, each variable has a distribution function which will look something like this: $$ p_i(\Lambda_i) = \left\{ \begin{array}{ll} \beta_i\alpha_i & \text{if $\Lambda_i = Y$} \\ \beta_i(1-\alpha_i) & \text{if $\Lambda_i = -Y$}\\ (1-\beta_i) & \text{if $\Lambda_i = 0$}\\ \end{array}\right. $$ with parameters $\alpha_i,\beta_i$. Note that $\beta_i\alpha_i + \beta_i(1-\alpha_i) + (1-\beta_i) = 1$, so it is a true probability distribution.

Now, note that $p_i(\Lambda_i)$ can be written compactly using the indicator function $$ \mathbf{1}_{\{p(x)\}} = \left\{ \begin{array}{ll} 1 & \text{if $p(x)$ is true} \\ 0 & \text{if $p(x)$ is false}\\ \end{array}\right. $$ for some predicate $p(x)$ (which might state something like "$x=4$"). Using this function, we can write: $$ p_i(\Lambda_i) = \beta_i\alpha_i\mathbf{1}_{\{\Lambda_i=Y\}} + \beta_i(1-\alpha_i)\mathbf{1}_{\{\Lambda_i=-Y\}} + (1-\beta_i)\mathbf{1}_{\{\Lambda_i=0\}} $$

Now we can look at the joint probability distribution of the whole $\Lambda=(\Lambda_1,\dots,\Lambda_m)$. Since all the $\Lambda_i$ are independent, the PDF of $\Lambda$ is the product of all the $p_i(\Lambda_i)$.
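As a quick sketch of this (the function names are mine, not the paper's), the per-function distribution $p_i$ and the product over the independent $\Lambda_i$ could be written as:

```python
import numpy as np

def p_i(lam_i, y, alpha_i, beta_i):
    """Distribution of a single labelling-function output lam_i in {-1, 0, 1},
    conditioned on the true class y in {-1, 1}."""
    if lam_i == y:
        return beta_i * alpha_i          # the function labels, and agrees with y
    elif lam_i == -y:
        return beta_i * (1 - alpha_i)    # the function labels, but disagrees with y
    else:                                # lam_i == 0: the function abstains
        return 1 - beta_i

def pdf_lambda(lam, y, alpha, beta):
    """PDF of Lambda = (Lambda_1, ..., Lambda_m) given y: since the Lambda_i
    are independent, it is the product of the per-function probabilities."""
    return np.prod([p_i(l, y, a, b) for l, a, b in zip(lam, alpha, beta)])
```

You can check numerically that each $p_i$ sums to 1 over $\Lambda_i \in \{-1,0,1\}$, matching the algebraic identity above.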

Then, we want to take a look at the joint probability distribution of both $\Lambda$ and $Y$. Note that $Y$ can only take values in $\{-1,1\}$ with equal probability, so $1/2$ for each. When we combine these, the joint probability density of $(\Lambda,Y)$ is $\mu_{\alpha,\beta}(\Lambda,Y)$, as in equation (1).

Finally, by obtaining the maximum of the log-likelihood as in equation (2) (i.e., finding the "best" $(\alpha,\beta)$), you obtain a probability distribution from which you can sample data that "look like" the data in the training dataset, with generated predicted classes $Y$.
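A minimal sketch of equations (1) and (2) as code (again, the names are mine, and this is just the objective, not the optimiser):

```python
import math

def mu(lam, y, alpha, beta):
    """Equation (1): joint probability of (Lambda = lam, Y = y); the factor
    1/2 is the uniform prior on Y in {-1, 1}."""
    p = 0.5
    for l, a, b in zip(lam, alpha, beta):
        if l == y:
            p *= b * a
        elif l == -y:
            p *= b * (1 - a)
        else:                  # l == 0: the labelling function abstained
            p *= 1 - b
    return p

def log_likelihood(lambdas, alpha, beta):
    """The objective of equation (2): for each point's label vector lambda(x),
    the unknown Y is marginalised out by summing mu over y' in {-1, 1}."""
    return sum(math.log(mu(lam, 1, alpha, beta) + mu(lam, -1, alpha, beta))
               for lam in lambdas)
```

Maximising `log_likelihood` over $(\alpha,\beta)$, e.g. with gradient ascent, gives the $(\hat\alpha,\hat\beta)$ of equation (2).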

So, answering your questions:

What does $\mathbf{1}_{\{\Lambda_i=Y\}}$ in equation (1) mean? It's the indicator function, used to write the PDF of $\Lambda$ more compactly.

What is the value of $Y$: is it just $\{-1,1\}$ or the entire set of true labels in the dataset? It's the unknown predicted class. At this point it is neither fixed nor given in a dataset. It is a random variable; when sampling from $\mu_{\alpha,\beta}$ you obtain a concrete value. This is precisely what a generative algorithm is: you "learn" a probability distribution from data, and then sample from it.
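For concreteness, sampling a $(\Lambda, Y)$ pair from $\mu_{\alpha,\beta}$ could look like this (a sketch with my own names, following the per-function distribution defined above):

```python
import random

def sample_mu(alpha, beta):
    """Draw one (Lambda, Y) pair from mu_{alpha,beta}: Y is uniform on
    {-1, 1}, then each Lambda_i is drawn from its conditional distribution."""
    y = random.choice([-1, 1])
    lam = []
    for a, b in zip(alpha, beta):
        r = random.random()
        if r < b * a:       # probability beta_i * alpha_i: correct label
            lam.append(y)
        elif r < b:         # probability beta_i * (1 - alpha_i): wrong label
            lam.append(-y)
        else:               # probability 1 - beta_i: abstain
            lam.append(0)
    return lam, y
```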

The final value of equation (1) will be just a number, right? It's the joint probability function of $(\Lambda,Y)$, where $\Lambda$ and $Y$ are not fixed but random variables.

But I am not able to understand what the true labels are in the case of equation (2)? The data you are learning from is contained in the dataset $S$; there are no true labels. In (2) you replace the random variables $\Lambda$ with the concrete values $\lambda(x)$ obtained from all points $x\in S$.
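As a toy illustration (these labelling functions and data points are made up, not from the paper), obtaining $\lambda(x)$ just means running the $m$ labelling functions on every $x \in S$:

```python
# Three made-up labelling functions over short text snippets:
def lf_positive_word(x):
    return 1 if "good" in x else 0      # vote +1, otherwise abstain

def lf_negative_word(x):
    return -1 if "bad" in x else 0      # vote -1, otherwise abstain

def lf_exclamation(x):
    return 1 if x.endswith("!") else 0

lfs = [lf_positive_word, lf_negative_word, lf_exclamation]
S = ["good movie!", "bad plot", "nothing special"]

# lambda(x) for each x in S -- these vectors are the concrete values
# substituted for Lambda in equation (2):
label_matrix = [[lf(x) for lf in lfs] for x in S]
print(label_matrix)   # [[1, 0, 1], [0, -1, 0], [0, 0, 0]]
```

Each row of `label_matrix` is one $\lambda(x)$, and the parameters are then fit to these observed vectors alone; no true label for any $x$ is ever used.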

Hope this helps!