Hello math.stackexchange community!
I have a simple question that seems to have a nontrivial answer.
Given a discrete one-dimensional signal $w(x)$ defined on a finite range, and the boxcar (rectangular) function $r(x)$
$$r(x)=\begin{cases} 1 & \mbox{if }0\leq x \leq 1; \\ 0 & \mbox{elsewhere} \end{cases}$$
I would like to find the coefficients $a_i,\ b_i,\ c_i $ of the sum
$$w' = \sum_{i=0}^{N}\ { a_i \cdot r\left(\frac{x}{b_i} - c_i\right)}$$
("sum of $N$ rectangles in any range and of any height") such that $\sum_i\ \left| w_i - w_i'\right|$ is minimized (for a given $N$).
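To make the objective concrete, here is a minimal sketch in Python/NumPy; the function names (`boxcar`, `reconstruct`, `l1_error`) are my own for illustration, not part of any existing code:

```python
import numpy as np

def boxcar(x):
    """r(x) = 1 on [0, 1], 0 elsewhere."""
    return ((x >= 0) & (x <= 1)).astype(float)

def reconstruct(x, a, b, c):
    """w'(x) = sum_i a_i * r(x / b_i - c_i)."""
    return sum(a_i * boxcar(x / b_i - c_i) for a_i, b_i, c_i in zip(a, b, c))

def l1_error(w, w_prime):
    """The objective to minimize: sum_i |w_i - w'_i|."""
    return np.abs(w - w_prime).sum()
```

For example, a single rectangle with $a_0 = 2$, $b_0 = 1$, $c_0 = 0$ reproduces a signal that is 2 on $[0, 1]$ and 0 elsewhere exactly.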
This problem seems related to:
- Discrete wavelet transform
- $l_1$ regularized solution of an overdetermined linear system
- Maximum subarray problem
However, to my understanding it does not fit any of these cases:
- $r(x)$ is not a wavelet basis,
- the problem cannot be solved (practically) as a linear system, because the (finite) set of possible $a_i,\ b_i,\ c_i$ values is too large to enumerate explicitly (the length of $w$ is on the order of $10^4$),
- since the heights $a_i$ are not fixed in advance, it does not fit the maximum subarray problem directly.
Right now I have an approximate solution (iteratively solving the problem via a maximum subarray formulation, brute-force exploring a subset of possible $a_i$ values); however, the idea of "decomposing a signal into a sum of rectangles" seems general enough that someone has probably already addressed it in the past.
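For reference, the iterative scheme described above could be sketched roughly as follows; `greedy_rectangles` and its details are my guess at the approach under the stated assumptions (greedy steps, brute-forced height set), not the actual code:

```python
import numpy as np

def max_subarray(r):
    """Kadane's algorithm: return (best_sum, start, end) of the
    contiguous subarray of r with maximal sum (end exclusive)."""
    best_sum, best_start, best_end = r[0], 0, 1
    cur_sum, cur_start = 0.0, 0
    for i, v in enumerate(r):
        if cur_sum <= 0:
            cur_sum, cur_start = v, i
        else:
            cur_sum += v
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, i + 1
    return best_sum, best_start, best_end

def greedy_rectangles(w, n_rects, a_values):
    """Greedily add one rectangle at a time: for each candidate
    height a, the interval that most reduces sum |residual| is a
    maximum-subarray problem on the per-sample error reduction."""
    residual = np.asarray(w, dtype=float).copy()
    rects = []
    for _ in range(n_rects):
        best = None
        for a in a_values:
            if a == 0:
                continue
            # Covering a sample with a rectangle of height a changes
            # its error contribution from |r| to |r - a|.
            gain = np.abs(residual) - np.abs(residual - a)
            g, s, e = max_subarray(gain)
            if best is None or g > best[0]:
                best = (g, a, s, e)
        g, a, s, e = best
        if g <= 0:  # no rectangle improves the approximation
            break
        residual[s:e] -= a
        rects.append((a, s, e))
    return rects, residual
```

Each step costs $O(|a\_values| \cdot \mathrm{len}(w))$, so the coarseness of the brute-forced height set directly trades accuracy for speed.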
Do any of you have a suggestion on how to tackle this problem?
Has it already been solved in the past by a method I am not aware of?
Thank you very much for your answers.
I experimented with your code a bit, and found so many things that I figured I should just post another answer:
As stated in a comment, you made a mistake when you ported my code. (BTW, I meant "convergence" instead of "conversion," of course).
- The `explore_a` function doesn't do what it says on the tin (so you managed to break your own code as well, I guess :-)). Since it directly uses the value of the maximum subarray as the score, only the two instances of `a_value` closest to zero are ever taken. You need to calculate the effect on $|w-w'|$ instead.
- It doesn't quite match your description either: the true effect of `a_value` is that it turns positive samples below `a_value` into negative samples (and negates the array if it is negative). However, this is actually a nice solution for the problem that I tried to describe in my first answer. So I decided to fix the code; now it seems to work really well.
- I also think that zero should always be tried as an `a_value`, both with and without negating the signal (which is equivalent to finding the maximum and minimum subarray, similarly to my first algorithm).
- This can be combined both with length restriction and trimming, as described in my first answer. Length restriction turns out to produce worse results, and it is also rather expensive. Its only positive effect is that it degrades to the "maximum absolute value" algorithm once the maximum length becomes 1, so it reaches convergence more quickly. Trimming tends to help a bit, but (as in every greedy algorithm) a better result in one step can lead to worse results in the following steps.
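The "effect on $|w-w'|$" scoring can be made concrete with a small helper (a hypothetical `gain_per_sample`, assuming a residual-based greedy setup; not the actual code from this thread):

```python
import numpy as np

def gain_per_sample(residual, a_value):
    """Per-sample reduction of |w - w'| when a rectangle of height
    a_value covers that sample. Running a maximum-subarray search on
    this array then scores every candidate interval correctly:
    samples that the rectangle would overshoot come out negative."""
    residual = np.asarray(residual, dtype=float)
    return np.abs(residual) - np.abs(residual - a_value)
```

Note how, for positive `a_value`, samples below `a_value / 2` get a negative score, which is the "turning small positive samples negative" behavior described above.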
Here is an updated graph (of a different signal, since unfortunately you did not use fixed random seeds, and with different order and colors because they are chosen automatically):
`explore_a` is hidden behind `explore_a_opt` (i.e., with trimming), and `explore_a_len` (with length restriction) is hidden behind `explore_a_len_opt`.

Looking at this graph, a few more things come to mind:
- The fixed `explore_a` seems to produce better results than `brute_force` (which is actually a misleading name, because it is really still a greedy algorithm, just with brute force in each step). I guess this means that `brute_force` doesn't quite do what it is supposed to either.
- The test signal isn't particularly well-suited for approximation using rectangles. It is essentially white noise with two rectangles added to it. Once the algorithms have "found" these rectangles, they can't do much better than `abs_max` any more.
- It probably makes most sense to compare the algorithms at a number of steps that is significantly lower than the signal length. (Otherwise, what's the point?)
So here is a graph belonging to a signal of length 500 with more low-frequency content, showing the first 100 steps:
Finally, since all algorithms are greedy, I think you should compare them to something like simulated annealing as well (as I mentioned in the comments).
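A generic simulated-annealing loop of the kind suggested here might look like the sketch below; this is only an illustration of the comparison baseline, where `energy` would be $\sum_i |w_i - w_i'|$ and `neighbor` would perturb one rectangle's $a_i$, $b_i$, or $c_i$:

```python
import math
import random

def simulated_annealing(energy, state, neighbor,
                        steps=10_000, t0=1.0, t1=1e-3):
    """Minimize energy(state) by accepting worse neighbors with
    probability exp(-delta / t) under a geometric cooling schedule,
    keeping track of the best state seen."""
    e = energy(state)
    best_state, best_e = state, e
    for k in range(steps):
        t = t0 * (t1 / t0) ** (k / steps)  # geometric cooling
        cand = neighbor(state)
        ce = energy(cand)
        if ce < e or random.random() < math.exp((e - ce) / t):
            state, e = cand, ce
            if e < best_e:
                best_state, best_e = state, e
    return best_state, best_e
```

Unlike the greedy algorithms above, this can escape a locally good rectangle placement that blocks a better global decomposition, at the cost of many more objective evaluations.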