I have a list of products, and I want to find the best product for a particular user by asking them questions about the criteria they are looking for.
For a simple example, let's say the products are t-shirts, and each t-shirt has 3 properties:
- color: red, blue, or green
- size: small, medium, or large
- material: cotton, polyester, silk
I want to filter the products by asking the user questions such that they reach the product(s) that suit their preferences in the fewest questions possible.
What I am thinking is:
- calculate the entropy on the distribution of each category
- ask the user their preference for the category with the lowest entropy
- filter the products based on their answer
- repeat for the remaining categories and products
- stop once the number of products to select from is below some threshold
1. calculate the entropy on the distribution of each category
This will be done by taking, for each value in the category, its proportion among the products that remain, and then calculating the entropy of that list of proportions.
Let's take color for example:
- There are 10 t-shirts remaining
- 1 is red
- 6 are blue
- 3 are green
- entropy([0.1, 0.6, 0.3]) ≈ 0.90 (in nats, i.e. using the natural logarithm)
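To make the calculation concrete, here is a minimal sketch of the entropy computation for the color example. It assumes Shannon entropy with the natural logarithm (so the result is in nats); the function name `entropy` and the list `colors` are only illustrative.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in nats) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    # Sum of -p * ln(p) over the proportion p of each distinct value.
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# The 10 remaining t-shirts from the example: 1 red, 6 blue, 3 green.
colors = ["red"] * 1 + ["blue"] * 6 + ["green"] * 3
print(round(entropy(colors), 3))  # 0.898
```

With a base-2 logarithm the same distribution would give a different number (in bits), so the base should be kept the same across all categories being compared.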
2. ask the user their preference for the category with the lowest entropy
Let's say color had the lowest entropy among the remaining categories. I ask the user which color t-shirt they would like, and they say green.
3. filter the products based on their answer
We remove all t-shirts that aren't green.
4. repeat for the remaining categories and products
We repeat the process on the remaining green t-shirts with the size and material categories.
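Putting the four steps together, the loop I have in mind could be sketched roughly as follows. This is only an illustration under some assumptions: products are plain dictionaries, `ask` is a hypothetical callback that returns the user's chosen value, and `threshold`, `narrow_down`, `shirts`, and `preferences` are made-up names for this example.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in nats) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def narrow_down(products, categories, ask, threshold):
    """Ask about one category at a time until fewer than `threshold` products remain."""
    remaining = list(products)
    open_categories = list(categories)
    while len(remaining) >= threshold and open_categories:
        # Step 1: entropy of each unasked category over the remaining products.
        scores = {c: entropy(p[c] for p in remaining) for c in open_categories}
        # Step 2: pick the category with the lowest entropy, as described above.
        category = min(scores, key=scores.get)
        options = sorted({p[category] for p in remaining})
        answer = ask(category, options)
        # Step 3: keep only the products matching the answer.
        remaining = [p for p in remaining if p[category] == answer]
        # Step 4: repeat with the categories that have not been asked yet.
        open_categories.remove(category)
    return remaining


shirts = [
    {"color": "red", "size": "small", "material": "cotton"},
    {"color": "blue", "size": "small", "material": "cotton"},
    {"color": "blue", "size": "medium", "material": "silk"},
    {"color": "green", "size": "large", "material": "cotton"},
    {"color": "green", "size": "small", "material": "polyester"},
]

# Scripted answers standing in for a real user.
preferences = {"color": "green", "size": "small", "material": "polyester"}
result = narrow_down(
    shirts,
    ["color", "size", "material"],
    ask=lambda category, options: preferences[category],
    threshold=2,
)
print(result)
```

One thing this sketch makes visible: a category in which all remaining products share a single value has entropy 0, so the lowest-entropy rule would select it even though the answer removes no products.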
Obviously, this example of t-shirts is heavily simplified, and in reality my use case is a lot more complex with thousands of products and tens of categories to filter on.
Is this a valid approach for narrowing the search down in as few questions as possible?