Math notations for categorical variables

I would like to formalise some operations I am doing, but it is unclear to me how I should deal with categorical variables.

Imagine a dataset with 5 distinct couples (ID). Each couple was observed 3 times (time), giving 15 rows.

Each partner has responded to two questions: p and act. p is a dummy variable taking values in {0, 1}, while act is a categorical variable with 4 levels {a,b,c,d}. The suffix _m refers to the male partner and _w to the female partner.

      ID  time   p_m   p_w  act_m  act_w
 1     A     1     1     1      c      b
 2     A     2     1     1      b      c
 3     A     3     1     1      c      d
 4     B     1     1     1      b      b
 5     B     2     0     1      a      a
 6     B     3     1     1      b      b
 7     C     1     1     1      b      b
 8     C     2     1     1      c      c
 9     C     3     1     1      c      b
10     D     1     1     1      c      b
11     D     2     1     0      b      a
12     D     3     1     1      c      b
13     E     1     1     1      d      d
14     E     2     1     1      b      c
15     E     3     1     1      c      c

First, I am interested in formalising the matches on p. Because p is a dummy variable, it seems that I can simply write:

$joint_{jt} = (p_{jt}^{m} \times p_{jt}^{w})$

where $t$ denotes time and $p_{jt}^{m}$ denotes the response of partner $m$ in couple $j$ at time $t$ (and likewise for partner $w$).

      ID  time   p_m   p_w  act_m  act_w joint_j
 1     A     1     1     1      c      b       1
 2     A     2     1     1      b      c       1
 3     A     3     1     1      c      d       1
 4     B     1     1     1      b      b       1
 5     B     2     0     1      a      a       0
 6     B     3     1     1      b      b       1
 7     C     1     1     1      b      b       1
 8     C     2     1     1      c      c       1
 9     C     3     1     1      c      b       1
10     D     1     1     1      c      b       1
11     D     2     1     0      b      a       0
12     D     3     1     1      c      b       1
13     E     1     1     1      d      d       1
14     E     2     1     1      b      c       1
15     E     3     1     1      c      c       1
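
The product rule above can be checked with a short sketch. It uses plain Python on a handful of rows copied from the table; the names `rows` and `joint` are illustrative assumptions, not part of the original analysis.

```python
# Rows (ID, time, p_m, p_w) copied from the table above; names are illustrative.
rows = [
    ("A", 1, 1, 1),
    ("B", 2, 0, 1),
    ("D", 2, 1, 0),
    ("E", 3, 1, 1),
]

# joint_jt = p_m * p_w: the product of two 0/1 dummies is 1 only when both are 1.
joint = {(couple, t): p_m * p_w for couple, t, p_m, p_w in rows}

for key, value in joint.items():
    print(key, value)
```

Because both inputs are 0/1, the product behaves like a logical AND while remaining a purely arithmetic expression.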

However, it is unclear to me whether I can use the $\times$ operator for categorical variables. Basically, I want an indicator that equals 1 when $joint_{jt} = 1$ and $act_m == act_w$, and 0 otherwise.

My question is: how do you formalise $act_m == act_w$ when $act$ is categorical? I have been told that I cannot use logical operators in my papers (econ, sociology field) and that I should use arithmetic operators instead. So, how do you express TRUE/FALSE mathematically?

Now I have

$jointact_{jt} = (joint_{jt} \times act_{jt}^{m} \times act_{jt}^{w})$

But this seems wrong to me.

Could I, for instance, first define an indicator $act_{jt}$ that is 1 when $act_{jt}^{m} = act_{jt}^{w}$ is true:

$$ act_{jt} = \left\{\begin{array}{ll} 1 & \text{if } act_{jt}^{m} = act_{jt}^{w}\\ 0 & \text{otherwise} \end{array}\right. $$

and then combine $act_{jt}$ with $joint_{jt}$, like this?

$jointact_{jt} = (joint_{jt} \times act_{jt})$

      ID  time   p_m   p_w  act_m  act_w joint_j jointact_j
 1     A     1     1     1      c      b       1          0
 2     A     2     1     1      b      c       1          0
 3     A     3     1     1      c      d       1          0
 4     B     1     1     1      b      b       1          1
 5     B     2     0     1      a      a       0          0
 6     B     3     1     1      b      b       1          1
 7     C     1     1     1      b      b       1          1
 8     C     2     1     1      c      c       1          1
 9     C     3     1     1      c      b       1          0
10     D     1     1     1      c      b       1          0
11     D     2     1     0      b      a       0          0
12     D     3     1     1      c      b       1          0
13     E     1     1     1      d      d       1          1
14     E     2     1     1      b      c       1          0
15     E     3     1     1      c      c       1          1
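
The two-step construction can be sketched in a few lines of Python. This is only an illustration on rows copied from the table; the function name `jointact` and the list `rows` are assumptions made for the example.

```python
# Rows (ID, time, p_m, p_w, act_m, act_w) copied from the table above.
rows = [
    ("A", 1, 1, 1, "c", "b"),
    ("B", 1, 1, 1, "b", "b"),
    ("B", 2, 0, 1, "a", "a"),
    ("D", 2, 1, 0, "b", "a"),
    ("E", 3, 1, 1, "c", "c"),
]

def jointact(p_m, p_w, act_m, act_w):
    """Return joint * act, where act is the 0/1 indicator of act_m == act_w."""
    joint = p_m * p_w                   # 1 only if both dummies are 1
    act = 1 if act_m == act_w else 0    # piecewise definition of act_jt
    return joint * act

results = [jointact(*row[2:]) for row in rows]
print(results)  # [0, 1, 0, 0, 1]
```

The row for couple B at time 2 shows why both factors matter: the activities match, but the result is still 0 because `p_m` is 0.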

There are 2 best solutions below

Best answer

Not being able to use logical operators does not make sense. They are hard to avoid (your suggestion for $act_{jt}$ uses them to define cases) and their use makes the paper easier to read.

Fortunately, there is a notationally convenient alternative using orthogonality. Let $\langle \cdot, \cdot \rangle$ be an inner product and treat the four levels $\{a,b,c,d\}$ as an orthonormal basis (for example, the standard basis vectors of $\mathbb{R}^4$). Then, by definition, $\langle a, a \rangle = \langle b, b \rangle = \langle c, c \rangle = \langle d,d \rangle = 1$, while inner products of distinct levels, such as $\langle a,b \rangle$, are $0$. Using $joint_{jt} = p_{jt}^m \cdot p_{jt}^w$, you can write: $$jointact_{jt} = joint_{jt} \langle act_{jt}^m, act_{jt}^w \rangle$$
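
As a rough illustration of the inner-product idea, the sketch below represents each level by a standard basis vector (a one-hot encoding) and uses the ordinary dot product. The names `LEVELS`, `one_hot`, `inner` and `jointact` are hypothetical helpers introduced for this example, and the choice of $\mathbb{R}^4$ with the dot product is one assumed realisation of the orthonormal basis.

```python
LEVELS = ["a", "b", "c", "d"]  # assumed ordering of the four activity levels

def one_hot(level):
    """Map a level to its standard basis vector, e.g. 'b' -> [0, 1, 0, 0]."""
    vec = [0] * len(LEVELS)
    vec[LEVELS.index(level)] = 1
    return vec

def inner(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def jointact(p_m, p_w, act_m, act_w):
    """joint * <act_m, act_w>, using only multiplication and addition."""
    return p_m * p_w * inner(one_hot(act_m), one_hot(act_w))

print(jointact(1, 1, "c", "b"))  # 0: activities differ
print(jointact(1, 1, "b", "b"))  # 1: both p equal 1 and activities match
print(jointact(0, 1, "a", "a"))  # 0: activities match but p_m is 0
```

Once the levels are encoded as vectors, the equality test never appears explicitly: it is absorbed into the sum of products.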

Second answer

Given two integer-valued categorical variables $p_m$ and $p_w$, there are very many ways to express equality via a 1 or 0 using only "arithmetic" operations. E.g.

$$1-\dfrac{\left||p_m-p_w|+1\right|}{2}+\dfrac{\left||p_m-p_w|-1\right|}{2}$$

or

$$\lim_{n\to\infty} \exp(-n|p_m-p_w|)$$

Try using one of these in your paper, then ask the editor to explain to you again why you're not allowed to use logical operators.
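
For what it is worth, both expressions can be checked numerically. The sketch below assumes the levels are coded as the integers 1 to 4 (a coding not given in the question) and approximates the limit with a large but finite `n`; the function names are illustrative.

```python
import math

def eq_abs(p_m, p_w):
    """First expression: 1 - |d + 1| / 2 + |d - 1| / 2 with d = |p_m - p_w|."""
    d = abs(p_m - p_w)
    return 1 - abs(d + 1) / 2 + abs(d - 1) / 2

def eq_exp(p_m, p_w, n=50):
    """Second expression with a finite n standing in for the limit."""
    return math.exp(-n * abs(p_m - p_w))

codes = range(1, 5)  # assumed integer coding: a=1, b=2, c=3, d=4
for x in codes:
    for y in codes:
        expected = 1 if x == y else 0
        assert eq_abs(x, y) == expected
        assert abs(eq_exp(x, y) - expected) < 1e-12

print("both expressions act as equality indicators on the codes 1..4")
```

The first expression is exact for integer codes, while the second only approaches 0 for unequal codes as `n` grows.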