Please be lenient with me: I have forgotten most of this material, so I will probably make incorrect assumptions, word the problem incorrectly, etc.
Context
I'm trying to calculate the number of daily users an application has. Currently the application sends a ping to the servers if at least 24 hours have passed since the last ping.
Note that the application cannot send a ping while it is not running. Examples:
- The application is opened for the first time: a ping is sent immediately
- The application stays open for 24 hours and 1 second: a ping is sent at the start and again at the 24-hour mark
- The application is opened, closed, and opened again after 8 hours: a ping is sent only the first time
- The application is opened, closed, and opened again after 30 hours: a ping is sent the first time and again at hour 30
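The rule behind these examples can be sketched in a few lines of Python. This is only a rough model, assuming sessions are given as sorted, non-overlapping `(open_hour, close_hour)` pairs; the names `pings_for_sessions` and `PING_INTERVAL_H` are made up for illustration and are not taken from the application.

```python
from typing import List, Optional, Tuple

PING_INTERVAL_H = 24.0  # assumed minimum gap between two pings, in hours


def pings_for_sessions(sessions: List[Tuple[float, float]]) -> List[float]:
    """Return the ping times (in hours) for one user.

    `sessions` is a sorted list of non-overlapping (open_hour, close_hour)
    pairs. A ping fires on open if no ping was ever sent or if at least
    24 h have passed since the last one, and again every 24 h for as long
    as the application stays open.
    """
    pings: List[float] = []
    last: Optional[float] = None
    for start, end in sessions:
        if last is None or start - last >= PING_INTERVAL_H:
            t = start  # a ping is due as soon as the application opens
        else:
            t = last + PING_INTERVAL_H  # next ping is due later in the session
        while t <= end:
            pings.append(t)
            last = t
            t = last + PING_INTERVAL_H
    return pings
```

Feeding it the four examples above, e.g. `pings_for_sessions([(0.0, 1.0), (30.0, 31.0)])`, gives the ping times described in the list.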
The pings are summed on a daily basis and presented as the number of daily users the application has. I will refer to this as the daily signal.
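To make the gap between the daily signal and the true daily users concrete, here is a small sketch with made-up data. It assumes days are consecutive 24-hour buckets counted from hour 0; the function names and the two example users are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

HOURS_PER_DAY = 24


def daily_pings(ping_times_by_user: Dict[str, List[float]]) -> Counter:
    """Daily signal: number of pings received in each 24 h bucket."""
    counts: Counter = Counter()
    for times in ping_times_by_user.values():
        for t in times:
            counts[int(t // HOURS_PER_DAY)] += 1
    return counts


def daily_active(sessions_by_user: Dict[str, List[Tuple[float, float]]]) -> Counter:
    """Desired output: number of distinct users with a session in each bucket."""
    counts: Counter = Counter()
    for sessions in sessions_by_user.values():
        days = set()
        for start, end in sessions:
            first_day = int(start // HOURS_PER_DAY)
            last_day = int(end // HOURS_PER_DAY)
            days.update(range(first_day, last_day + 1))
        for d in days:
            counts[d] += 1
    return counts


# Hypothetical data: user "a" opens at hour 23 (ping) and again at hour 30
# (only 7 h later, so no ping); user "b" pings on both days.
sessions = {"a": [(23.0, 23.5), (30.0, 31.0)], "b": [(2.0, 3.0), (26.0, 27.0)]}
pings = {"a": [23.0], "b": [2.0, 26.0]}

signal = daily_pings(pings)
active = daily_active(sessions)
error = {day: signal[day] - active[day] for day in sorted(active)}
```

In this toy case both users are active on both days, but day 1 only receives one ping, so the daily signal undercounts that day by one user.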
Questions
My objective is to understand:
How the original signal (users using the application) is transformed by the different design decisions to arrive at the daily users number.
Whether and where the Nyquist–Shannon sampling theorem applies.
- I have failed to understand what happens when a signal has a higher frequency than the sampling rate it is being measured with, and what this means when reconstructing the signal. From what I understand, a lot of aliasing is introduced, however I don't see how it can't be dealt with in this scenario.
How to calculate the error of the final output versus the desired output (users that pinged in a day versus users active in a day).
So, if I am understanding correctly,
I don't see how the sampling theorem is relevant here. As you pointed out, the process you are trying to reconstruct is discontinuous and therefore has infinite bandwidth, whereas the theorem only covers band-limited signals.
I think it is impossible to give a definite answer for the number of users with gaps between measurements. You can only hope to provide a probabilistic answer, i.e., something of the form "given the measurements, the number of users within 24h was 10 with probability p(10), 11 with probability p(11), and so on."
The only approach I can think of is to consider a stochastic model for the arrival and departure of users in the system, which will provide a probabilistic characterization of the number of users entering the system in any given interval, without any measurements. Given the measurements, you would then consider the conditional probabilities. For example, if you sampled, say, 8 users at some instant during the interval of interest, the probability of the number of users being less than 8 is zero, although it may well be non-zero when no measurements are available.
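As a rough illustration of that conditioning step, the sketch below assumes (purely for the sake of the example) that the number of users in the interval is Poisson distributed with a known mean, and that a measurement tells us there were at least `observed_min` users. Both function names are made up.

```python
import math


def poisson_pmf(n: int, mean: float) -> float:
    """P(N = n) for a Poisson distribution, computed in log space for stability."""
    return math.exp(-mean + n * math.log(mean) - math.lgamma(n + 1))


def conditional_pmf(n: int, mean: float, observed_min: int) -> float:
    """P(N = n | N >= observed_min) under the assumed Poisson prior."""
    if n < observed_min:
        return 0.0  # ruled out by the measurement
    tail = 1.0 - sum(poisson_pmf(k, mean) for k in range(observed_min))
    return poisson_pmf(n, mean) / tail
```

With a prior mean of 10 users, `conditional_pmf(7, 10.0, 8)` is zero, while `poisson_pmf(7, 10.0)` is not, which is exactly the difference between the conditional and the unconditional answer.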
You could start with the simplest mathematical model for the arrival times in a queue, the Poisson point process. This, however, does not model how long the users stay in the system (i.e., when they depart). You should look into queueing theory for such models.
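If it helps, a homogeneous Poisson point process is easy to simulate, because the gaps between consecutive arrivals are independent exponential random variables. The sketch below assumes a constant arrival rate; `simulate_arrivals` and its parameters are hypothetical names, and departures are deliberately left out.

```python
import random
from typing import List


def simulate_arrivals(rate_per_hour: float, horizon_hours: float, seed: int = 0) -> List[float]:
    """Arrival times (in hours) of a homogeneous Poisson process on [0, horizon)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    arrivals: List[float] = []
    t = 0.0
    while True:
        # Inter-arrival gaps are exponential with mean 1 / rate_per_hour.
        t += rng.expovariate(rate_per_hour)
        if t >= horizon_hours:
            return arrivals
        arrivals.append(t)
```

For instance, `len(simulate_arrivals(2.0, 24.0))` is one draw of the number of arrivals in a day when users arrive at an average of two per hour; repeating it with different seeds gives an empirical distribution to compare against your measurements.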
P.S.: Apologies if I came off as offensive in my comment, that was not my intention.