How to develop an algorithm to prioritize set members based on various criteria


I searched here and on Google without success. It has been a while since I used math in any capacity, and my lack of practice is surpassed only by my inability to articulate the concepts. That's my long-winded way of saying I tried, but I don't know whether I typed the right terms here or into Google to find an answer.

The problem is that I have a dataset of 100,000 text strings, all of them domain names. There are several metrics to measure for each one:

  • name length
  • does it contain a dash
  • does it contain numbers
  • does it consist only of recognized words
  • is it a trademark
  • Google statistics
  • Alexa statistics

Some of this might seem complicated to automate or obtain, but it's really not that bad, except for the HMM I need to build for recognizing words. From my reading, I feel that is totally doable. But let's say I gather all the metrics: I'm absolutely lost as to how to use them to prune the list, or even where to begin teaching myself. Even a pointer on where to start would be of great assistance.
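For the purely string-based metrics in the list, a minimal sketch might look like the following. The function name `basic_metrics` and the choice to score only the label before the first dot are assumptions made for illustration; the word, trademark, and traffic metrics need external data and are not covered here.

```python
def basic_metrics(domain: str) -> dict:
    """Compute simple string metrics for a domain such as 'my-shop24.com'."""
    # Assumption: only the part before the first dot is scored,
    # so the TLD ('.com', '.net', ...) does not affect the length.
    name = domain.lower().split(".", 1)[0]
    return {
        "length": len(name),                            # name length
        "has_dash": "-" in name,                        # contains a dash?
        "has_digit": any(ch.isdigit() for ch in name),  # contains numbers?
    }


example = basic_metrics("my-shop24.com")
```

Each record then becomes a small dictionary of metric values that a later scoring step can consume.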

This problem came about when my brother and I did an initial pruning of a 90k-member set and were left with 2,900 domains in our top bucket and 11,000 in our runner-up bucket. We were hoping for something more manageable. I'd like to get it down to 30 in each.

(I apologize if my tags are off. Not sure where to go.)

Best answer

Not long after I asked this question, during a break from coding, I did a search and something caught my eye: "Using Weighted Criteria to Make Decisions".

A summarized version is as follows:

  • Choose which features to use as criteria in the model.
  • After collecting the values of those criteria, convert them to a common numerical scale so the model compares apples to apples.
  • Assign a weight to each criterion.
  • The result for each record is the sum of the products of each weight and its corresponding criterion score.
  • Run the model and adjust the weights until the results are in line with expectations. Hence it is an iterative process.
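The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: min-max scaling is one possible way to put criteria on a common scale (the article may use another), and the names `min_max`, `rank`, the sample records, and the weights are all hypothetical. Negative weights penalize a criterion.

```python
def min_max(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every record has the same value, so the criterion cannot discriminate.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank(records, weights, top_n):
    """Return the top_n (name, score) pairs by weighted sum of scaled criteria."""
    names = list(records)
    scaled = {}
    for crit in weights:
        # Booleans convert to 0.0 / 1.0, so yes/no criteria work too.
        column = min_max([float(records[n][crit]) for n in names])
        scaled[crit] = dict(zip(names, column))
    scores = {n: sum(weights[c] * scaled[c][n] for c in weights) for n in names}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Hypothetical sample data and weights, for illustration only.
records = {
    "shop": {"length": 4, "has_dash": False, "has_digit": False},
    "my-shop24": {"length": 9, "has_dash": True, "has_digit": True},
    "bookstore": {"length": 9, "has_dash": False, "has_digit": False},
}
weights = {"length": -0.5, "has_dash": -0.3, "has_digit": -0.2}

top = rank(records, weights, top_n=2)
```

Adjusting the values in `weights` and rerunning `rank` is the iterative part: keep tuning until the top of the list matches what you would pick by hand on a sample.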

The article recommends reading up on decision analysis, value-focused thinking, and multiple-objective decision. Philipph recommended multicriteria approximation algorithms, which was helpful.

If I find anything of further interest I'll update.