Query description into mathematical notation


I need to formalize two query descriptions in mathematical notation. The descriptions come from an external source, for anyone interested in the broader context.

Query 1: Frequent routes

The goal of the query is to find the top 10 most frequent routes during the last 30 minutes. A route is represented by a starting grid cell and an ending grid cell. All routes completed within the last 30 minutes are considered for the query. The output query results must be updated whenever any of the 10 most frequent routes changes.
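To make the query concrete, here is a minimal Python sketch of one way to read it. It assumes each completed trip is a tuple `(end_time, start_cell, end_cell)` with times in minutes; that layout and the name `top_routes` are illustrative assumptions, not part of the task description.

```python
from collections import Counter

def top_routes(trips, now, window=30, k=10):
    """Return the k most frequent (start_cell, end_cell) routes among
    trips that ended within the last `window` minutes before `now`."""
    counts = Counter(
        (start, end)
        for end_time, start, end in trips
        if now - window < end_time <= now
    )
    # most_common returns (route, count) pairs sorted by descending count
    return counts.most_common(k)

# Hypothetical sample data: (end_time, start_cell, end_cell)
trips = [
    (5, "A", "B"),   # too old, outside the 30-minute window
    (40, "A", "B"),
    (45, "A", "B"),
    (50, "C", "D"),
    (55, "A", "B"),
    (58, "C", "D"),
    (59, "E", "F"),
]
result = top_routes(trips, now=60)
```

Re-running the function whenever a trip completes or leaves the window would give the "update when the top 10 changes" behaviour, although a real implementation would maintain the counts incrementally.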

I have managed to come up with this formal description:

$$\max\nolimits_{R_i \in R} R_i$$

Where the following applies:

$R$ = set of the frequencies of all routes, $R_i$ = frequency of the $i$-th route

But how would I specify the top 10, not only the maximum?

Query 2: Profitable areas

The goal of this query is to identify areas that are currently most profitable for taxi drivers. The profitability of an area is determined by dividing the area profit by the number of empty taxis in that area within the last 15 minutes. The profit that originates from an area is computed by calculating the median fare + tip for trips that started in the area and ended within the last 15 minutes. The number of empty taxis in an area is the sum of taxis that had a drop-off location in that area less than 30 minutes ago and had no following pickup yet.
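As a concrete reading of this description, here is a minimal Python sketch. It assumes trips are tuples `(start_cell, end_time, fare, tip)` and that `last_dropoffs` maps each taxi with no following pickup to its `(dropoff_cell, dropoff_time)`, with times in minutes. These layouts, the function name, and returning `None` when the ratio is undefined are all assumptions made for illustration.

```python
from statistics import median

def profitability(cell, trips, last_dropoffs, now):
    """Median fare + tip of trips that started in `cell` and ended in the
    last 15 minutes, divided by the number of empty taxis in `cell`."""
    profits = [
        fare + tip
        for start_cell, end_time, fare, tip in trips
        if start_cell == cell and now - 15 < end_time <= now
    ]
    # Empty taxis: dropped off in `cell` less than 30 minutes ago,
    # with no pickup since (only such taxis appear in last_dropoffs).
    empty = sum(
        1
        for drop_cell, drop_time in last_dropoffs.values()
        if drop_cell == cell and now - 30 < drop_time <= now
    )
    if not profits or empty == 0:
        return None  # assumption: treat the ratio as undefined
    return median(profits) / empty

# Hypothetical sample data
trips = [
    ("X", 50, 10.0, 2.0),
    ("X", 55, 20.0, 4.0),
    ("X", 58, 6.0, 0.0),
    ("X", 30, 100.0, 0.0),  # ended more than 15 minutes ago, ignored
    ("Y", 59, 8.0, 1.0),
]
last_dropoffs = {
    "t1": ("X", 40),
    "t2": ("X", 55),
    "t3": ("X", 10),  # dropped off more than 30 minutes ago, ignored
    "t4": ("Y", 50),
}
p_x = profitability("X", trips, last_dropoffs, now=60)
p_y = profitability("Y", trips, last_dropoffs, now=60)
```

Ranking all cells by this value and keeping the 10 largest would then give the most profitable areas.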

I have managed to come up with this formal description:

$$profitability\_of\_cell_X = \frac{median\_profit\_in\_cell\_id_X}{empty\_taxis\_in\_cell\_id_X} = \frac{\mu_{T_i \in Trip(X)} (T_{i,fare} + T_{i,mta})}{\sum\nolimits_{Ta \in X} 1}$$

Where the following applies:

$X$ = Cell X = Area X, $\mu$ = median value

Again, how would I specify that we are interested in the top 10 most profitable areas?

Time constraints

I do not know how to add time constraints into formal mathematical notation. Is this even possible?


There is 1 best solution below


Consider the following. Find the best (most efficient and/or clearest) way to construct the set of the 10 largest values from the set of frequencies and give it a name, for example $F_{10}$. The concepts of max and min may be essential, as may order relations between elements expressed through their indices.

For example, you can define the set of frequencies in some way; how exactly depends on your problem and context.

Then you define a subset of it and require, for example, that the maximum of the remaining elements (the set difference) is less than or equal to the minimum of your subset.

A similar idea is something like this:

$$F=\{f_i\}\\ F_{10}\subset F: (|F_{10}|=10) \land (\max(F \setminus F_{10}) \le \min F_{10})$$

Because $F$ changes over time (it is really a family of sets), you can add an index to denote the moment you are referring to, if you need it, and treat it as a sequence, for example $F_t$.
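One possible way to write such a time index, as a sketch only: assume each trip $T$ has a completion time $T_{end}$ and a route $route(T)$, and let $t$ be the current time in minutes. These symbols are assumptions introduced here and do not appear in the question. The frequency of a route $r$ over the last 30 minutes could then be written as

$$R_t(r) = \left|\{\, T : route(T) = r \ \land\ t - 30 < T_{end} \le t \,\}\right|, \qquad F_t = \{R_t(r)\}$$

and the top 10 condition is then stated for each $t$ with $F_t$ in place of $F$. The 15 and 30 minute windows of query 2 can be expressed in the same way, by adding the corresponding inequality on the time to the set being aggregated.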

For your query 1, you only need to replace $F$ with $R$ and $F_{10}$ with whatever symbol you want to use for the set of top 10 frequencies.
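The defining condition can also be checked mechanically. The following Python sketch is an illustration under stated assumptions: frequencies are kept in a list (a multiset) so that equal frequencies of different routes are not merged, `k` is at least 1, and the names `top_k` and `satisfies_condition` are made up for this example.

```python
import heapq
from collections import Counter

def top_k(freqs, k=10):
    """Return the k largest values of `freqs`, largest first (ties kept)."""
    return heapq.nlargest(k, freqs)

def satisfies_condition(freqs, subset, k=10):
    """Check |subset| = k and max(freqs minus subset) <= min(subset),
    treating both arguments as multisets. Assumes k >= 1."""
    # The candidate must have k elements and be contained in freqs.
    if len(subset) != k or Counter(subset) - Counter(freqs):
        return False
    remaining = list((Counter(freqs) - Counter(subset)).elements())
    return not remaining or max(remaining) <= min(subset)

# Hypothetical frequencies of 12 routes
freqs = [7, 3, 9, 1, 4, 9, 2, 8, 6, 5, 10, 0]
best3 = top_k(freqs, 3)
```

Here `best3` satisfies the condition with `k=3`, while a candidate such as `[10, 9, 7]` does not, because a 9 remains outside it. When frequencies tie at the boundary, several subsets satisfy the condition, so the definition characterises a valid top 10 rather than a unique one.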