I'm having trouble with this probability problem:
Let’s say we have training data on 1000 pieces of fruit and among them
500 are bananas, 300 are oranges and 200 are other fruits. We consider
3 features of each fruit, whether it’s long or not, sweet or not and yellow
or not, as displayed in the table below.
Fruit    Long   Sweet   Yellow
Banana   400    350     450
Orange   0      150     300
Other    100    150     50
Now given an additional fruit with the features Long, Sweet and Yellow,
what is your prediction: is the fruit a banana, an orange or some other fruit? Why?
The only way I can come up with an answer is to assume that, within each kind of fruit, being long, being sweet, and being yellow are completely independent, which doesn't really make sense in a real-world example.
Am I correct in my assumption that this is the only way to solve the problem? Or am I missing something?
You're right.
Consider for example that it is consistent with the given data that none of the "other" fruits is long, sweet and yellow all at once. In that case, since no orange is long either, a long sweet yellow fruit would certainly be a banana.
On the other hand, it is also consistent with the given data that all of the 50 yellow others are also long and sweet, but only 200 of the bananas have all three properties. In that case a long sweet yellow fruit could be either a banana or an "other".
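To check that both scenarios really are consistent with the table, here is a small sketch that computes, for each kind of fruit, the smallest and largest number of fruits that could have all three properties given only the per-feature counts. The function name `overlap_bounds` and the dictionary layout are my own choices for illustration, not part of the exercise.

```python
# Counts taken from the table in the question.
totals = {"Banana": 500, "Orange": 300, "Other": 200}
counts = {
    "Banana": {"long": 400, "sweet": 350, "yellow": 450},
    "Orange": {"long": 0, "sweet": 150, "yellow": 300},
    "Other": {"long": 100, "sweet": 150, "yellow": 50},
}


def overlap_bounds(total, feature_counts):
    """Smallest and largest possible number of items having ALL features,
    given only how many items have each feature separately."""
    k = len(feature_counts)
    # Pigeonhole: the features cannot avoid each other more than this.
    lowest = max(0, sum(feature_counts) - (k - 1) * total)
    # The overlap can never exceed the rarest feature.
    highest = min(feature_counts)
    return lowest, highest


bounds = {
    fruit: overlap_bounds(totals[fruit], list(counts[fruit].values()))
    for fruit in totals
}
for fruit, (lowest, highest) in bounds.items():
    print(f"{fruit}: between {lowest} and {highest} are long, sweet and yellow")

# Least favourable case for "banana": fewest bananas, most others.
worst_case_banana = bounds["Banana"][0] / (bounds["Banana"][0] + bounds["Other"][1])
print(f"banana share in the least favourable case: {worst_case_banana:.2f}")
```

So at least 200 bananas must have all three properties, at most 50 others can, and no orange can. Even in the least favourable case consistent with the table, bananas make up 200 of the 250 long sweet yellow fruits.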
So you're right that you need to make some additional assumptions -- but note well that the exercise doesn't ask you to be sure what the probabilities objectively are, just for "what is your prediction?" Assuming independence seems to be as principled as anything else you could do with this background.
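For what it's worth, here is a minimal sketch of the calculation under that independence assumption (features independent within each kind of fruit, which is what a naive Bayes classifier assumes). The function name `naive_bayes_posterior` and the data layout are my own, chosen just for illustration.

```python
# Counts taken from the table in the question.
totals = {"Banana": 500, "Orange": 300, "Other": 200}
counts = {
    "Banana": {"long": 400, "sweet": 350, "yellow": 450},
    "Orange": {"long": 0, "sweet": 150, "yellow": 300},
    "Other": {"long": 100, "sweet": 150, "yellow": 50},
}


def naive_bayes_posterior(features):
    """Posterior over fruit kinds, ASSUMING the features are
    independent within each kind of fruit."""
    n = sum(totals.values())
    scores = {}
    for fruit, total in totals.items():
        score = total / n  # prior P(fruit)
        for feature in features:
            # P(feature | fruit), estimated from the table
            score *= counts[fruit][feature] / total
        scores[fruit] = score
    norm = sum(scores.values())
    return {fruit: score / norm for fruit, score in scores.items()}


posterior = naive_bayes_posterior(["long", "sweet", "yellow"])
prediction = max(posterior, key=posterior.get)
for fruit, p in posterior.items():
    print(f"P({fruit} | long, sweet, yellow) = {p:.4f}")
print("prediction:", prediction)
```

The unnormalised scores are 0.5 × 0.8 × 0.7 × 0.9 = 0.252 for banana, 0 for orange (no orange is long), and 0.2 × 0.5 × 0.75 × 0.25 = 0.01875 for other, so under this assumption the prediction is banana with probability of about 0.93.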
(Part of the exercise is surely to be explicit about which additional assumptions you're making, so good work so far!)