Understanding an equation and how to implement it


A common method for linking language with psychological variables involves counting words belonging to manually created categories of language. One counts how often words in a given category are used by an individual, expressed as the percentage of the participant's words that come from the given category:

$$p(\text{category} \mid \text{subject}) = \frac{\sum_{w \in \text{category}} \text{freq}(w, \text{subject})}{\sum_{w \in \text{vocab}(\text{subject})} \text{freq}(w, \text{subject})}$$

where freq(w, subject) is the number of times the participant mentions word w, and vocab(subject) is the set of all words mentioned by the subject.
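A minimal Python sketch of the equation for a single text: the numerator counts how many of the subject's words fall in the category, and the denominator counts all of the subject's words. The function names `tokenize` and `category_proportion` are hypothetical, and the tokenization (lowercasing, words as runs of letters and apostrophes) is an assumption; real category dictionaries may need a different tokenizer.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: case-insensitive matching; a "word" is a run of letters or apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def category_proportion(text, category_words):
    """Share of the subject's words that belong to the category."""
    freq = Counter(tokenize(text))  # frequency of each word the subject used
    total = sum(freq.values())      # denominator: all words in the text
    if total == 0:
        return 0.0                  # assumption: an empty text scores 0
    category = {w.lower() for w in category_words}
    # Numerator: Counter returns 0 for category words the subject never used.
    in_category = sum(freq[w] for w in category)
    return in_category / total


print(category_proportion("I love football and cricket", {"cricket", "volleyball", "football"}))
```

Here two of the five words are in the category, so the result is 0.4. Multiply by 100 if you want a percentage.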

I currently have 5 categories with some words in each. I also have 100 texts (900 words in total). Using the equation above, I am trying to find how many words from each category were used across the 100 texts.
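One way to apply the equation to a whole corpus is to pool the word counts of all texts, then report, for each category, the raw count of category words and their share of all words. This is a sketch under assumptions: the question does not list the actual categories or texts, so the data below is hypothetical, category words are assumed to be lowercase, and a word listed in two categories is counted in both.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: case-insensitive, words are runs of letters or apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def category_counts(texts, categories):
    """Return {category: (count, proportion of all corpus words)}.

    texts: list of strings; categories: dict of name -> set of lowercase words.
    """
    freq = Counter()
    for text in texts:
        freq.update(tokenize(text))  # pool word frequencies over every text
    total = sum(freq.values())
    result = {}
    for name, words in categories.items():
        count = sum(freq[w] for w in words)
        result[name] = (count, count / total if total else 0.0)
    return result


# Hypothetical data standing in for the 5 categories and 100 texts.
categories = {"sports": {"cricket", "football"}, "food": {"pizza", "bread"}}
texts = ["Cricket and football", "Pizza then football", "bread"]
print(category_counts(texts, categories))
```

The corpus has 7 words, so "sports" gets 3 words (3/7) and "food" gets 2 words (2/7). If you instead need a value per text, call the single-text version of the equation on each text separately.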

Answer

The equation above gives the probability that a particular subject's text belongs to a particular category; concretely, it is the fraction of the words in the text that come from that category.

The probability is calculated by dividing (the sum of the frequencies with which each word in the category appears in the subject text) by (the sum of the frequencies of every word in the subject text), i.e. the total number of words in the text.

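For example, suppose the category "Sports" contains the words cricket, volleyball, and football,

and the text is "Cricket football game football".

Then the probability that the text belongs to "Sports" is calculated as follows (matching case-insensitively, so "Cricket" counts as "cricket"):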

p(Sports | text) = (freq(cricket) + freq(volleyball) + freq(football)) / (freq(cricket) + freq(football) + freq(game))

= (1 + 0 + 2) / (1 + 2 + 1)

= 3/4
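The worked example can be checked exactly in Python with `fractions.Fraction`, which keeps the result as the ratio 3/4 instead of a float. The text is lowercased before counting, which is what makes "Cricket" match the category word "cricket" in the arithmetic above.

```python
from collections import Counter
from fractions import Fraction

sports = {"cricket", "volleyball", "football"}
words = "Cricket football game football".lower().split()
freq = Counter(words)                     # counts: football=2, cricket=1, game=1
numerator = sum(freq[w] for w in sports)  # 1 + 0 + 2 (volleyball is absent, so 0)
denominator = sum(freq.values())          # 1 + 2 + 1
p_sports = Fraction(numerator, denominator)
print(p_sports)  # prints 3/4
```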

Calculate this probability for every category and every subject; the category with the highest probability for a subject is assigned as that subject's category.
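A minimal sketch of that last step: score a text against every category and pick the highest score. The `classify` name, the second category "Games", and the tokenizer are hypothetical choices for illustration; `categories` is assumed to be a non-empty dict of lowercase word sets.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: case-insensitive, words are runs of letters or apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def classify(text, categories):
    """Return (best_category, scores), where scores[name] is the category's share of words.

    Ties go to the category listed first, because max() keeps the first maximum.
    """
    freq = Counter(tokenize(text))
    total = sum(freq.values())
    scores = {
        name: (sum(freq[w] for w in words) / total if total else 0.0)
        for name, words in categories.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


categories = {
    "Sports": {"cricket", "volleyball", "football"},
    "Games": {"game", "chess"},  # hypothetical second category
}
best, scores = classify("Cricket football game football", categories)
print(best, scores)
```

For the example text, "Sports" scores 0.75 and "Games" scores 0.25, so the text is labelled "Sports". If a text contains no category words at all, every score is 0 and `max` simply returns the first category, so you may want to treat an all-zero result as "no category" instead.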