I'm working on a dynamic programming problem and actually, I'm not quite sure whether it is dynamic programming since moving average $M$ is based on previous $M$. No need to consider the efficiency. The problem requires selling a product over $T$ time periods and maximizing the total actual sale amount. The total number of products is $N$ and I plan to sell some products over different periods $n_0, n_1, \cdots, n_{T-1}$ and $\sum_{i=0}^{T-1} n_i = N$.
In conclusion, this question wants to find the most optimal schedule for $n_0, n_1, \cdots, n_{T-1}$ such that $\sum_{i=0}^{T-1} n_i = N$, which maximizes the $\sum_{i=0}^{T-1} S_i$.
And the actual sale amount $S_i$ are based on current moving average $M$ and current $n_i$.
Assume that $\alpha = 0.001$ and $\pi = 0.5$
- Initialize $M = 0$. Then for $i = 0, 1, \ldots, T-1$
- Compute new $M_i = \lceil 0.5*(M_{i-1} + n_i)\rceil$
- At time $i$ we sell $S_i \ = \ \lceil(1 - \alpha M_i^\pi)n_i \rceil$ products
Continue this process until the last step. For example, assume we already know $n_i$ for all periods, the trading will be below
M = 0
T = 4
N = 10000
alpha = 1e-3
pi = 0.5
S = np.zeros(T,dtype='i')
n = np.array([5000,1000,2000,2000])
print(n)
total = 0
for i in range(T):
M = math.ceil(0.5*(M + n[i]))
S[i] = math.ceil((1 - alpha*M**pi)*n[i])
total += S[i]
print('at time %d, M = %d and we sell %d products' %(i,M,S[i]))
print('total sold =', total)
My idea is to keep track of the state based on time period, products left, and moving average as index and store the actual sale in a high dimension matrix. I think the upper bound for moving average is just $[0,N]$ I'm still confusing how to program it. Could someone provide ideas about dynamic programming? Thank you very much. The below is some of my crude codes but the output is a little strange.
def DPtry(N,T,alpha,pi,S):
schedule = np.zeros(T)
M = 0
for n in range(0,N+1):
for m in range(0,n+1):
S[T-1,n,m] = math.ceil((1 - alpha*m**pi)*n)
for k in range(1,T):
t = T - k - 1
print("t = ",t)
for n in range(0,N+1):
for m in range(0,n+1):
best = -1
for plan in range(0,n+1):
salenow = math.ceil((1 - alpha*m**pi)*plan)
M = math.ceil(0.5*(m + plan))
salelater = S[t+1,n-plan,M]
candidate = salenow + salelater
if candidate > best:
best = candidate
S[t,n,m] = best
print(S[0,N,0])
N = 100
T = 5
pi = .5
alpha = 1e-3
S = np.zeros((T,N+1,N+1))
DPtry(N,T,alpha,pi,S)
```