Matching Metric - How to Normalize for different Amounts in Equation

36 Views Asked by Bumbble Comm At 09 Apr 2026 - 5:33

I'm a non-mathematician.

I'm trying to determine algorithmically whether tweets are on topic with news articles. Part of this involves taking each word from every tweet, checking whether it appears in the news articles, and counting the matches. If the count is high, the tweets are probably on the same topic; if it's low, they probably aren't.

However, the raw match count needs to be normalized: 10 tweets against 200 words of news articles is not directly comparable to 3 tweets against 400 words. To make the comparison easier, assume each tweet averages 10 words. How would one come up with a match score that reflects how much easier or harder it is to get matches, given the number of words on each side?
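To make the question concrete, here is a minimal sketch of two common ways such a count could be normalized. It is only an illustration, not a settled answer: the function names (`tokenize`, `match_rate`, `jaccard`) and the simple regex tokenizer are assumptions of mine. `match_rate` divides the matches by the total number of tweet words, which removes the effect of how many tweets there are. `jaccard` compares the distinct words on both sides, so a longer article (which offers more words to match against) also enlarges the denominator.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def match_rate(tweets, article_text):
    """Fraction of all tweet words that also occur in the article text."""
    article_vocab = set(tokenize(article_text))
    tweet_words = [w for t in tweets for w in tokenize(t)]
    if not tweet_words:
        return 0.0
    matches = sum(1 for w in tweet_words if w in article_vocab)
    return matches / len(tweet_words)


def jaccard(tweets, article_text):
    """Shared distinct words divided by all distinct words on either side."""
    tweet_vocab = {w for t in tweets for w in tokenize(t)}
    article_vocab = set(tokenize(article_text))
    union = tweet_vocab | article_vocab
    if not union:
        return 0.0
    return len(tweet_vocab & article_vocab) / len(union)


if __name__ == "__main__":
    tweets = ["rain floods city", "city council vote"]
    article = "The city council met after the floods"
    print(match_rate(tweets, article))
    print(jaccard(tweets, article))
```

Both scores fall between 0 and 1, so sets of different sizes can be compared on the same scale. Neither accounts for very common words such as "the", which would probably need to be filtered out first.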
