Optimizing Linear Programming using a Deep Neural Network


Are you afraid of maths? I guess you say "NO". Well, maths always scares me, and this time is no exception :(

I was going through this research paper. In this paper, I don't understand how they solve the following optimization problem P2:

[image of optimization problem P2 from the paper]

And how do they generate the dataset for training the deep neural network from the solution vector of problem P2?

Best Answer

For the sake of exposition, we have the following linear program: \begin{align} & \min_{t_s} \sum_{s \in S_n} t_s \sum_{k\in\mathcal{K}_n^s} P_{nk} \\ & \;\text{s.t. } \sum_{s\in S_n}t_s \log_2(1+\text{SINR}^s_k) \geq \sum_{f\in\mathcal{F}_{nk}} L_f \quad\forall\, k\in\mathcal{K}_n\\ & \phantom{\;\text{s.t. }} \sum_{s\in S_n} t_s \leq T \end{align} where $n$ is the cell, $T$ is the time limit, $\mathcal{K}_n$ is the set of users, $S_n$ is the set of all possible clusters for cell $n$ (i.e. $|S_n|=2^{|\mathcal{K}_n|}-1$), $s\in S_n$ is a cluster index, $\mathcal{K}_n^s\subseteq \mathcal{K}_n$ is the user set for cluster $s$, $P_{nk}$ is the transmit power from small base station $n$ to user $k$, $\text{SINR}^s_k$ is the signal-to-interference-plus-noise ratio for cluster $s$ and user $k\in\mathcal{K}_n^s$, $L_f$ is the size of file $f$, $\mathcal{F}_{nk}$ is the set of files of user $k$, and $t_s$ is the time for which cluster $s$ is used during content delivery.

Ok, so basically we are summing over all possible clusters, in order to minimize the sum of transmit powers associated with each cluster, weighted by its time duration. We have a total time budget, and we assign the time to spend on each cluster. But note that the number of clusters is LARGE (exponential in the number of users in the cell). Thus the goal is to use a function approximator (here, a DNN) to reduce the cost of solving the linear program.
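To make the LP concrete, here is a toy instance solved with SciPy's `linprog` (the paper uses Matlab's LP solver instead; all numbers, the random SINR model, and the variable names below are invented purely for illustration):

```python
import numpy as np
from itertools import combinations
from scipy.optimize import linprog

# Toy instance: 3 users in cell n (all values hypothetical).
users = [0, 1, 2]
P = np.array([1.0, 1.5, 2.0])        # transmit power P_nk per user k
demand = np.array([2.0, 3.0, 1.5])   # sum_{f in F_nk} L_f per user k
T = 10.0                             # total time budget

# All non-empty clusters: |S_n| = 2^|K_n| - 1 = 7
clusters = [c for r in range(1, len(users) + 1)
            for c in combinations(users, r)]

# Made-up SINR_k^s for each user k in cluster s (larger clusters -> lower SINR).
rng = np.random.default_rng(0)
sinr = {(s, k): (10.0 / len(cl)) * rng.uniform(0.5, 1.5)
        for s, cl in enumerate(clusters) for k in cl}

# Objective: minimize sum_s t_s * sum_{k in K_n^s} P_nk
c_obj = np.array([sum(P[k] for k in cl) for cl in clusters])

# Rate constraints: sum_s t_s log2(1 + SINR_k^s) >= demand_k (negated for <= form).
A_rate = np.zeros((len(users), len(clusters)))
for s, cl in enumerate(clusters):
    for k in cl:
        A_rate[k, s] = np.log2(1.0 + sinr[(s, k)])
A_ub = np.vstack([-A_rate, np.ones(len(clusters))])  # last row: sum_s t_s <= T
b_ub = np.concatenate([-demand, [T]])

res = linprog(c_obj, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
print(res.fun, res.x.round(3))  # optimal cost and cluster durations t_s
```

Note that most entries of `res.x` come out zero at the optimum: only a few clusters are actually used, which is exactly the structure the DNN is later asked to predict.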

How do they actually do this? Since we are solving for the vector of times $t_s$, one might expect to simply learn a mapping $g: \mathcal{P}\rightarrow \mathbb{R}_+^{|S_n|}$, i.e. given a set of parameters $p\in\mathcal{P}$ (i.e. values for $L_f$, $P_{nk}$, etc.), the network $g$ would output the cluster times $\mathcal{T}_S=(t_{s_1},\ldots,t_{s_{|S_n|}})$. However, they do not do this. Quoting from page 3:

Ideally, we expect to train the DNN as a black-box algorithm that essentially works as a mapping from a problem instance to its corresponding optimal solution. However, doing so is impractical as the combinatorial nature of P2 poses high complexity in function approximation. This results in challenges to guarantee the optimality of the estimated solution.

In other words, the deep network is unable to directly generate solutions to the problem (at least in their case, for the networks they tried). The high output dimensionality in particular seems to pose a problem. So instead, they write on page 3:

Though it is hard to directly use the DNN to precisely generate exact solutions for P2, it helps to identify the pattern of user channel conditions in relationship of optimal clustering strategies. The basic idea is to use DNN to predict the users that are most promising in terms of clustering in the optimal solution, according to the channel conditions and demands.

So instead of trying to predict the solution, they attempt to predict which users to ignore when computing the solution with an LP solver. In other words, the DNN outputs a binary vector $\mu$ of size $|\mathcal{K}_n|$, where $\mu_k$ decides whether user $k$ participates in the clustering. Quote (page 4):

Each element $\mu_k$ indicates that whether or not user $k$ participates in clustering scheduling.

So the deep network tells us which users can be ignored in the solution of the linear program. The parameter $\rho$ controls how many users will be ignored. This greatly reduces the cost of solving the linear program.
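As a rough sketch of this pruning step (the exact role of $\rho$ in the paper may differ; here I assume it acts as a threshold on the DNN's per-user scores, and all numbers are invented):

```python
from itertools import combinations

# Hypothetical DNN output: mu[k] ~ how promising user k is for the optimal clustering.
mu = [0.91, 0.07, 0.64, 0.12, 0.88]   # 5 users in the cell
rho = 0.5                              # assumed threshold controlling who is kept

kept = [k for k, m in enumerate(mu) if m >= rho]   # users the DNN keeps

# Cluster sets before and after pruning: 2^5 - 1 = 31 vs 2^3 - 1 = 7.
clusters_full = [c for r in range(1, len(mu) + 1)
                 for c in combinations(range(len(mu)), r)]
clusters_kept = [c for r in range(1, len(kept) + 1)
                 for c in combinations(kept, r)]

print(len(clusters_full), len(clusters_kept))  # 31 7
```

Since the number of LP variables is $2^{|\mathcal{K}_n|}-1$, dropping even a few users shrinks the linear program exponentially, which is where the speedup comes from.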


How do they solve the optimization problem P2?

They test two methods:

  1. Solving the linear program via a solver in Matlab. Quote (page 4):

    In terms of the optimization algorithm, we optimally solve P2 by applying the LP solver in Matlab, where the simplex algorithm or the interior-point algorithm is adopted.

    You can read more about the simplex and interior-point algorithms elsewhere; Matlab's linear programming solver is linprog.

  2. Using the DNN to compute $\mu$, which gives a list of users to remove, then solve the simplified LP as above.


How do they generate the dataset for training the deep neural network from the solution vector of the P2 problem?

Since they have an LP solver, they simply have to run it on instances of the problem to generate the training data. From page 4:

The training data is generated by solving the optimization problem for $|\mathcal{K}_n|$ users in each cell $n$.

and page 3:

During the training phase, the DNN is trained by data sets consisting of the parameters of P2 as the input, and the users participating in clustering in the optimal solution obtained by the solver as the desired output.

Recall that the deep network is supposed to predict which users will participate in the clustering, i.e. in the solution of P2. Since we have an optimal LP solver, given a bunch of problem instances, we can simply compute the optimal solution and then check which users participate in it.

In other words, given $m$ "problem sets", i.e. linear program parameter sets, $X=(x_1,\ldots,x_m)$, we can use the LP solver to generate $m$ binary vectors recording which users participated in the clustering, $Y=(y_1,\ldots,y_m)$. Note that each $x_i$ is a list of parameter values, while $y_i$ is the binary $\mu$ vector for the problem described by $x_i$. The training set is then simply $(X,Y)$, where the neural network is a map $g:X\rightarrow Y$.
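A minimal sketch of that labeling step (the names `sample_instance` and `solve_p2` are hypothetical stand-ins for the paper's instance generator and Matlab LP solver; only the label extraction is shown concretely):

```python
import numpy as np

def users_in_solution(t, clusters, n_users, tol=1e-9):
    """mu_k = 1 iff user k belongs to at least one cluster with t_s > 0 at the optimum."""
    mu = np.zeros(n_users, dtype=int)
    for t_s, cl in zip(t, clusters):
        if t_s > tol:
            for k in cl:
                mu[k] = 1
    return mu

# e.g. clusters (0,), (1,), (0,2) with optimal times 0.5, 0.0, 1.2:
print(users_in_solution([0.5, 0.0, 1.2], [(0,), (1,), (0, 2)], 3))  # [1 0 1]

# Dataset loop sketch (hypothetical helpers):
# X, Y = [], []
# for _ in range(m):
#     params = sample_instance()          # draw LP parameters (SINRs, powers, demands)
#     t, clusters = solve_p2(params)      # optimal t_s from the LP solver
#     X.append(params)
#     Y.append(users_in_solution(t, clusters, n_users))
```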

Hopefully that helps.