Notions of consistency / heterogeneity in sets of vector values


The problem

Let us consider a row vector u of size $n\in\mathbb{N}$, containing only binary values (0,1): $$u=(u_1 \cdots u_n), n\in\mathbb{N}$$ $$\forall i \in \{1\ldots n\}, u_i \in\{0,1\}$$

I would like to define notions of consistency and heterogeneity (for lack of better terminology) in order to quantify how the vector's values are distributed. Concretely, I would like to mathematically quantify the difference between this (ideal) vector $$u=(0,0,0,0,1,1,1)$$ where I can find a rank k such that $$\forall i \in \{2\ldots k\}, \ \ u_i=u_{i-1}$$ and $$\forall i \in \{k+2\ldots n\}, \ \ u_i=u_{i-1}$$ and this vector $$v=(0,1,0,1,0,1,0)$$ where I can't.
In practice, my vectors (which represent experimental data) will most likely look like this: $$ u=(0,0,0,1,0,0,0,1,1,1,1,0,1,1) $$ and I will have to determine empirically up to what point I can consider a vector to contain "homogeneous" values that can be "grouped" into two sets. In this case, the first 7 components (0,0,0,1,0,0,0) would be one set and the next 7 components (1,1,1,1,0,1,1) the other.
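
One possible way to make this grouping concrete is sketched below. It is only an illustration under an assumption that is not part of the definitions in this post: each candidate split point k is scored by how many entries would have to be flipped to obtain a clean "all one value, then all the other value" vector, and the split with the fewest flips is kept. The function name `best_two_group_split` is hypothetical.

```python
def best_two_group_split(u):
    """Return (errors, k, first_value) for the best two-group split of u.

    The first k entries are modelled as all `first_value` and the remaining
    entries as all `1 - first_value`; `errors` counts the entries that
    disagree with that model. This scoring rule is an assumption made for
    illustration, not something defined in the post.
    """
    n = len(u)
    best = None
    for k in range(1, n):  # k = size of the first group, 1 <= k <= n - 1
        ones_left = sum(u[:k])
        ones_right = sum(u[k:])
        # Pattern 0...0 1...1: errors are 1s on the left plus 0s on the right.
        err01 = ones_left + (n - k - ones_right)
        # Pattern 1...1 0...0: errors are 0s on the left plus 1s on the right.
        err10 = (k - ones_left) + ones_right
        for err, first in ((err01, 0), (err10, 1)):
            if best is None or err < best[0]:
                best = (err, k, first)
    return best


u = (0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1)
print(best_two_group_split(u))  # (2, 7, 0)
```

For the example vector this recovers the split after the 7th component, with two entries out of place. An ideal vector such as (0,0,0,0,1,1,1) gives zero errors.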

Definitions

Heterogeneity

I would define the heterogeneity $\mathscr{H}$ of the m-s+1 consecutive values (ranging from $u_s$ to $u_m$) of a vector u as follows: $$\mathscr{H}(u,s,m)=\sum_{k=s}^{m-1}{|u_{k+1}-u_{k}|}$$ $$s\in\{1\ldots n\}, \ m\in\{1\ldots n\}, \ n=\operatorname{size}(u), \ s<m$$ Example: $$u=(0,0,1,1,1,0,0)$$ $$\mathscr{H}(u,3,5)=0$$
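
As a minimal sketch, the sum above can be computed directly. The function name `heterogeneity` is illustrative, and indices are kept 1-based to match the notation. Allowing s = m (an empty sum, equal to 0) is an assumption that goes slightly beyond the stated condition s < m.

```python
def heterogeneity(u, s, m):
    """Return H(u, s, m), the number of value changes between u_s and u_m.

    s and m are 1-based, as in the definition. s == m is accepted here and
    gives the empty sum 0 (an assumption; the definition asks for s < m).
    """
    if not 1 <= s <= m <= len(u):
        raise ValueError("need 1 <= s <= m <= len(u)")
    # 1-based u_k is u[k - 1], so |u_{k+1} - u_k| is abs(u[k] - u[k - 1]).
    return sum(abs(u[k] - u[k - 1]) for k in range(s, m))


u = (0, 0, 1, 1, 1, 0, 0)
print(heterogeneity(u, 3, 5))  # 0
print(heterogeneity(u, 1, 7))  # 2
```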

Perfectly consistent vector

A given vector u of size n is said to be perfectly consistent if and only if $$\mathscr{H}(u,1,n)=0$$ Example: $$u=(0,0,0,0,0,0,0)$$ and $$v=(1,1,1,1,1,1,1)$$ are perfectly consistent vectors of size 7.

Perfectly homogeneous vector

A given vector u of size n is said to be perfectly homogeneous if and only if $$\exists i, i\in\{2...n\}, \mathscr{H}(u,1,i-1)=\mathscr{H}(u,i,n)=0$$ The vector u is then said to be homogeneous of rank i.
For instance: $$u=(0,0,0,1,1,1,1)$$ is homogeneous of rank 4.
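
A small sketch for checking this definition by brute force: it lists every rank i for which both sub-sums vanish. The helper names are illustrative, and the empty sums that occur for i = 2 and i = n are taken to be 0, which is an assumption since the definition of $\mathscr{H}$ asks for s < m.

```python
def heterogeneity(u, s, m):
    # H(u, s, m) with 1-based s, m; an empty range (s == m) gives 0.
    return sum(abs(u[k] - u[k - 1]) for k in range(s, m))


def homogeneous_ranks(u):
    """Return every 1-based rank i in {2..n} with H(u,1,i-1) = H(u,i,n) = 0."""
    n = len(u)
    return [
        i
        for i in range(2, n + 1)
        if heterogeneity(u, 1, i - 1) == 0 and heterogeneity(u, i, n) == 0
    ]


print(homogeneous_ranks((0, 0, 0, 1, 1, 1, 1)))  # [4]
print(homogeneous_ranks((0, 1, 0, 1, 0, 1, 0)))  # []
print(homogeneous_ranks((0, 0, 0)))              # [2, 3]
```

The last call shows that, with the definition exactly as written, a constant vector also qualifies (for every rank i), so an extra condition such as $u_{i-1}\neq u_i$ would be needed to exclude it.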

I most likely need to refine this definition, but you get the idea.

Perfectly heterogeneous vector

A given vector u of size n is said to be perfectly heterogeneous if and only if $$\mathscr{H}(u,1,n)=\max_{v,\ \operatorname{size}(v)=n}\mathscr{H}(v,1,n) $$ that is, its heterogeneity is maximal among all vectors of size n containing only binary values (0,1). Example: $$u=(0,1,0,1,0,1,0)$$ is a perfectly heterogeneous vector of size 7. So is $$v=(1,0,1,0,1,0,1)$$
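
Since every term of the sum is 0 or 1 for binary entries, the maximum is n-1, reached only by the two alternating vectors. The sketch below checks this by enumerating all binary vectors of size 7; the function names are illustrative.

```python
from itertools import product


def total_heterogeneity(u):
    # H(u, 1, n): number of positions where consecutive entries differ.
    return sum(abs(b - a) for a, b in zip(u, u[1:]))


def is_perfectly_heterogeneous(u):
    # For binary entries, the maximum of H(v, 1, n) over all v is n - 1.
    return total_heterogeneity(u) == len(u) - 1


n = 7
vectors = list(product((0, 1), repeat=n))
best = max(total_heterogeneity(v) for v in vectors)
print(best)  # 6
print([v for v in vectors if total_heterogeneity(v) == best])
# [(0, 1, 0, 1, 0, 1, 0), (1, 0, 1, 0, 1, 0, 1)]
```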

My question

Assuming what I wrote above makes sense, I feel I am reinventing the wheel. This surely exists and has been done already somewhere, but my lack of mathematical knowledge prevents me from knowing even what to google or where to look. Could anyone point me toward useful resources that would help me achieve what I am trying to do here? Is there a better way to name those concepts, instead of "heterogeneity" and "consistency"? Is what I am trying to do a particular case of a broader field, whose results and theorems I could use under particular conditions?

Misc. considerations

  • This is my first time writing maths in English and on MathOverflow. Any help on improving the quality of this post will be greatly appreciated.
  • I have very basic maths knowledge from undergrad. I need this for modeling correlations in a psychology experiment. The value of the rank i of a homogeneous vector does not matter. What matters is that, as much as possible, my vectors contain two clear "groups" of 0 and 1.
  • I am not sure about the tags for this question. If you have any recommendation, I'd be grateful as well.

There is 1 best solution below

I'm not sure I fully understand your question, but it seems to become much simpler if you transform the vector by just looking at the places where it changes, so for example (0,0,1,1,1) -> (0,1,0,0) and (0,1,0,1,0) -> (1,1,1,1). Then homogeneous and heterogeneous correspond to the number of 1's in the transformed vector.
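
A minimal sketch of that transformation (the function name `changes` is illustrative): each output entry is 1 where two consecutive input entries differ. Under the definitions in the question, the number of 1's in the result equals $\mathscr{H}(u,1,n)$.

```python
def changes(u):
    """Mark the positions where consecutive entries of u differ."""
    return tuple(int(a != b) for a, b in zip(u, u[1:]))


print(changes((0, 0, 1, 1, 1)))  # (0, 1, 0, 0)
print(changes((0, 1, 0, 1, 0)))  # (1, 1, 1, 1)

# Number of changes in the noisy example from the question.
noisy = (0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1)
print(sum(changes(noisy)))  # 5
```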