What is the bound of the runtime for the following algorithm?

25 Views Asked by At

Suppose we have a list of N predictors, and we are trying to select a subset of these predictors that give the top M best fit models for a regression. I intend to use BFS to get figure out the desired subsets, but the runtime will go crazy as N increases. The worst case is that I have to try out every single subset of these predictors, which gives a runtime of $O(2^N)$.

Instead, I attempt to use a greedy approach to reduce the runtime:

First we start with an empty set, and we add single predictor to the set and compute its accuracy. In other word, now we have N models (and we have N subsets), each using only 1 predictor. Then, we will only keep the best M of these subsets.

Then, we go on to the second layer. Currently we have M subsets, each of which has only 1 predictor. For each of these subsets, pick one at a time, and predictors (not in the current subset) to this set, and compute its accuracy. Now, for each subset, it should have N - 1 branches, each of which is a subset with 2 predictors. Again, keep only the top M branches out of these N - 1 branches. Also, it is important to note that since we are iterating the M single-element subsets in a sequential way (not in parallel), if the generated 2-element subset is already computed before, we will abandon it immediately so that it won't be counted in the runtime.

Go on to the third layer, and fourth, and so on... Strategy is the same, but be sure not to over-count the subsets that I have already computed in other branches.. If at the current layer we have less than M branches to extend (there are less than M predictors NOT in the current subset), then keep all of them as long as they give better performance than the model at the current layer.

Stop when we have tried all subsets, or when we can no longer find any model in the extended layer with better performance than the model at the current layer.

I want to know total number of subsets I have to compute. It doesn't have to be an exact number, but it has to be a range that is as compact as possible.

Thanks!