I am wondering whether the following question is an application of information theory:
Let's say we have a factory that ships boxes of stuff out. If a competitor stands outside my factory, observes the stream of boxes coming out, and notes their sizes, he can get valuable information about my manufacturing process. So in order to hoodwink him, I manually expand the boxes to sizes different from the original ones. To make sure I am doing a good job of masking the original size distribution, I must maximize the relative entropy between the output distribution q and the input distribution p, given that I cannot simply produce a uniform output distribution. Is this premise correct? Do you find any flaw in the question?
The constraint, of course, is that we must use the minimum amount of box material. In other words, the sum of box sizes over a unit period of time must not exceed a threshold B,
i.e. $\frac{\sum_{j=1}^{i}{BS_j}}{T} \le B$
Other constraints are:
1] Both p and q must be probability distributions, i.e. each must sum to 1
2] $BS_i \le MS$, where MS is the maximum size of a box.
--
If we take $BS_i$ as the size of the $i^{th}$ box, then applying a Lagrange multiplier $\lambda$ we have the objective:
$ \Lambda(BS_i,\lambda,p,q) = D(q||p) + \lambda \left(\frac{\sum_{j=1}^{i}{BS_j}}{T} - B\right) $ $ = \sum_{S=1}^{n}{q(S) \log \frac{q(S)}{p(S)}} + \lambda \left(\frac{\sum_{j=1}^{i}{BS_j}}{T} - B\right)$
$ = \sum_{S=1}^{n}{q(S) \log q(S)} -\sum_{S=1}^{n}{q(S) \log p(S)} + \lambda \left(\frac{\sum_{j=1}^{i}{BS_j}}{T} - B\right)$
To proceed: let's say we are at the $i^{th}$ iteration. We know the actual box sizes that we have already sent out for iterations $1 \dots i-1$, as well as the actual size of the present box; i.e., $BS_1 \dots BS_i$ are known, so $p(BS)$ can be calculated and is hence a known value. We also know the output box sizes up to the $(i-1)^{th}$ iteration.
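As a concrete aside, the empirical input PMF $p$ from the box sizes observed so far can be computed in a few lines (a minimal sketch; the sample sizes below are made-up illustrative values):

```python
from collections import Counter

def empirical_pmf(observed_sizes):
    # Empirical input distribution p estimated from the box sizes
    # shipped so far (BS_1 .. BS_i in the notation above).
    counts = Counter(observed_sizes)
    n = len(observed_sizes)
    return {s: c / n for s, c in counts.items()}

# Illustrative observed sizes, not from the post
p = empirical_pmf([2, 3, 3, 1, 2, 3])
```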
So, we have,
$ \Lambda(BS_i,\lambda,q) = \sum_{S=1}^{n}{q(S) \log q(S)} -\sum_{S=1}^{n}{k \, q(S) } + \lambda \left(\frac{\sum_{j=1}^{i}{BS_j}}{T} - B\right) $
How do we now solve the optimization problem in order to get the output box size for this $i^{th}$ iteration?
We have, $\frac{\partial\Lambda}{\partial BS_i} = 0$, $\frac{\partial\Lambda}{\partial q(S)} = 0$, and $\frac{\partial\Lambda}{\partial \lambda} = 0$
The partial derivative with respect to $\lambda$ just recovers the constraint. Since $q(S)$ depends on $BS_i$, we get a factor of $\frac{\partial q(S)}{\partial BS_i}$ in the other two equations.
How do I proceed from here? Where is my thinking going wrong?
Since your set of box sizes is most likely finite, the entropy-maximizing distribution over this set is uniform. Let's call this PMF over your set of box sizes $\mathcal{X}$, $t(x)$.
You say, however, that because of practical constraints there is a cost constraint associated with your factory process. Let's assume there is a set of costs $c = [c_{1},c_{2},\dots ,c_{N}]$ corresponding to each box size in your set $\mathcal{X}$.
We are now ready to formulate the optimization problem. \begin{equation} \begin{array}{rrclcl} \displaystyle \min_{p} & {\text{D}(p||t)} \\ \textrm{s.t.} & c^{T}p\leq S\\ & 1^{T}p = 1 \end{array} \end{equation}
The cost function is the KL divergence between the desired distribution and the uniform distribution. The first constraint is the cost inequality constraint, and the second is the probability-simplex constraint (the entries must sum to 1).
The problem is obviously convex (convex function with linear inequality and equality constraints) and can be solved exactly with tools like CVX and the solvers available there.
However the solution can be found also in a simpler fashion by writing the Lagrangian and KKT conditions. \begin{equation} L(p,\mu,\lambda) = \text{D}(p||t) + \lambda(c^{T}p - S) + \mu(1^{T}p - 1) \end{equation}
Spelling out the KKT conditions one gets the following conditions: \begin{equation} \begin{array}{rrclcl} p_{i} = 0 & \text{if} & t_{i}=0 \\ \log(p_{i}) = \log(t_{i}) - 1 - \mu - \lambda c_{i}& \text{if} & t_{i}>0\\ \end{array} \end{equation}
From these conditions we get that the optimal distribution $p$ is given by: \begin{equation} p_{i}= \frac{t_{i}e^{-\lambda c_{i}}}{\sum_{j}t_{j}e^{-\lambda c_{j}}} \end{equation}
where the optimal value for $\lambda$ is obtained from : \begin{equation} \frac{\sum_{i}c_{i}t_{i}e^{-\lambda c_{i}}}{\sum_{i}t_{i}e^{-\lambda c_{i}}} = S \end{equation}
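Numerically, the left-hand side (the expected cost under the tilted distribution) is monotonically decreasing in $\lambda$, so the optimal $\lambda$ can be found by simple bisection. A minimal sketch, assuming the constraint is active (i.e. $\min_i c_i < S \le \sum_i c_i t_i$); the sizes, costs, and budget are made-up illustrative values, and `tilted_dist` / `solve_lambda` are hypothetical helper names:

```python
import math

def tilted_dist(t, c, lam):
    # Exponentially tilted distribution: p_i proportional to t_i * exp(-lam * c_i)
    w = [ti * math.exp(-lam * ci) for ti, ci in zip(t, c)]
    z = sum(w)
    return [wi / z for wi in w]

def expected_cost(t, c, lam):
    # Left-hand side of the lambda equation: E_p[c] under the tilted p
    p = tilted_dist(t, c, lam)
    return sum(pi * ci for pi, ci in zip(p, c))

def solve_lambda(t, c, S, lo=0.0, hi=50.0, iters=100):
    # expected_cost is decreasing in lambda; bisect for E_p[c] = S.
    for _ in range(iters):
        mid = (lo + hi) / 2
        if expected_cost(t, c, mid) > S:
            lo = mid  # cost still too high, need more tilting
        else:
            hi = mid
    return (lo + hi) / 2

# Example: four box sizes, uniform t, cost = size, budget S = 2.0
sizes = [1, 2, 3, 4]
t = [0.25] * 4
lam = solve_lambda(t, sizes, 2.0)
p = tilted_dist(t, sizes, lam)
```

Note that if the budget $S$ already exceeds the expected cost under $t$, the inequality constraint is inactive and $\lambda = 0$ (i.e. $p = t$) is optimal.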
I am not sure this is exactly what you meant to solve, but this is what I understood you wanted and how I would approach the problem as I understood it.
Now, in order to determine which box to output each time you would only have to sample from the optimal output distribution $p$.
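For instance, with an illustrative optimal distribution $p$ (the numbers below are made up, not derived from the post), sampling each output box independently from $p$ means the long-run stream of sizes reveals only $p$:

```python
import random

# Hypothetical optimal distribution from the KKT solution;
# sizes and probabilities are illustrative, not from the post.
sizes = [1, 2, 3, 4]
p = [0.45, 0.27, 0.17, 0.11]  # sums to 1

rng = random.Random(0)

def next_output_box():
    # Each shipment independently draws a size from p, so an observer
    # of the stream learns only p, not the true input distribution.
    return rng.choices(sizes, weights=p, k=1)[0]

draws = [next_output_box() for _ in range(10000)]
avg_cost = sum(draws) / len(draws)  # should hover near E_p[size] = 1.94
```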