Help understanding the concept of full rank exponential families


I am studying exponential families and there are some concepts I do not quite understand. Here are two definitions, one for the rank of an exponential family and one for a full-rank exponential family:

Definition 1:

Let $\mathscr{P}=\{P_\eta:\eta\in H\}$ be an $s$-dimensional minimal exponential family. If $H$ contains an open $s$-dimensional rectangle, then $\mathscr{P}$ is called full-rank. Otherwise, $\mathscr{P}$ is called curved.
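(For intuition, here is a standard example of the curved case, my own addition rather than something from the note: consider $N(\theta,\theta^2)$ with $\theta>0$. The natural parameter of the normal family is

$$\eta=\left(\frac{\mu}{\sigma^2},\,-\frac{1}{2\sigma^2}\right)=\left(\frac{1}{\theta},\,-\frac{1}{2\theta^2}\right),$$

so $H=\{(\eta_1,\,-\eta_1^2/2):\eta_1>0\}$ is a one-dimensional curve in $\mathbb{R}^2$ and contains no open $2$-dimensional rectangle; by Definition 1 this family is curved, not full-rank.)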

Definition 2:

An exponential family is of rank $k$ if and only if the generating statistic $T$ is $k$-dimensional and $\{1, T_1(X), \ldots, T_k(X)\}$ are linearly independent with positive probability. Formally, $P_\eta\left(\sum_{j=1}^{k}a_jT_j(X)=a_{k+1}\right)<1$ unless all $a_j$ are zero.
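(A standard example of how the condition can fail, again my own addition: write the multinomial$(n;p_1,\ldots,p_k)$ family with $T_j(X)=N_j$, the $j$-th cell count. Taking $a_1=\cdots=a_k=1$ and $a_{k+1}=n$ gives

$$P_\eta\left(\sum_{j=1}^{k}N_j=n\right)=1,$$

so $\{1,N_1,\ldots,N_k\}$ are not linearly independent with positive probability, and the family is of rank $k-1$ rather than $k$; dropping one of the counts restores the condition.)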

Here, Definition 1 is taken from this note, and Definition 2 is from Bickel & Doksum's Mathematical Statistics.

It is Definition 2 that confuses me. When I read its first sentence, I translate it as follows: there exists a set $A$ with $P_\eta(A)>0$ such that, if $\sum_{j=1}^{k}a_jT_j(x)=a_{k+1}$ for all $x\in A$, then $a_1=a_2=\cdots=a_{k+1}=0$. But then how is this equivalent to the second sentence of Definition 2? In other words, how should I correctly understand the sentence "Formally, $P_\eta\left(\sum_{j=1}^{k}a_jT_j(X)=a_{k+1}\right)<1$ unless all $a_j$ are zero"?

Answer:
I think the second sentence means,

$$ \forall(a_1,\ldots,a_{k+1})\neq0,\, P_\eta(\sum a_jT_j(X) = a_{k+1}) \lt 1 $$ or, restated in implication form, $$ (a_1,\ldots,a_{k+1})\neq0 \Rightarrow P_\eta(\sum a_jT_j(X) = a_{k+1}) \lt 1 $$

And it is equivalent to the following forms.

$$ \begin{aligned} &(a_1,\ldots,a_{k+1})\neq0 \Rightarrow P_\eta(\sum a_jT_j(X) \neq a_{k+1}) \gt 0 & (1)\\ &P_\eta(\sum a_jT_j(X) \neq a_{k+1}) = 0 \Rightarrow (a_1,\ldots,a_{k+1}) = 0 & (2)\\ &P_\eta(\sum a_jT_j(X) = a_{k+1}) = 1 \Rightarrow (a_1,\ldots,a_{k+1}) = 0 & (3) \end{aligned} $$

(2) is the contrapositive of (1), and (3) is just (2) rewritten via the complementary event, since $P_\eta(\sum a_jT_j(X)\neq a_{k+1})=0$ is the same as $P_\eta(\sum a_jT_j(X)=a_{k+1})=1$.
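Criterion (3) can also be sanity-checked numerically: if some affine relation $\sum a_jT_j(X)=a_{k+1}$ holds almost surely, then sampled columns for $\{1, T_1(X), \ldots, T_k(X)\}$ are linearly dependent, and a matrix-rank computation detects it. A small sketch of my own, using an arbitrary Poisson sample and made-up statistics $T_1(X)=X$, $T_2(X)=2X+3$ (none of this is from the question or the books cited):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.poisson(3.0, size=n)  # arbitrary sample, just for illustration

# Dependent case: T2 = 2*T1 + 3, so 2*T1(X) - T2(X) = -3 holds almost
# surely, and the columns [1, T1(X), T2(X)] are linearly dependent.
dep = np.column_stack([np.ones(n), x, 2 * x + 3])
print(np.linalg.matrix_rank(dep))  # 2, i.e. rank deficient

# Independent case: no affine relation among {1, X, X**2} holds almost
# surely (the sample takes at least three distinct values), so the
# columns have full rank.
ind = np.column_stack([np.ones(n), x, x ** 2])
print(np.linalg.matrix_rank(ind))  # 3, i.e. full rank
```

This only checks dependence on the realized sample, of course; the definition is about the relation holding with $P_\eta$-probability one.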

I think this notion is equivalent to $P$-affine independence; check this link, p. 15.

Also, I think the reason "positive probability" appears in Definition 2's notion of linear independence of statistics is that (3) is exactly analogous to the definition of linear independence of vectors, except that the affine relation is ruled out in an almost-sure sense; the negation of "holds almost surely" is precisely "fails with positive probability".

Also, I think the "minimal" part of Definition 1 can be derived from Definition 2 once an independence assumption on the parameters is added. Again, check this link, p. 15, Theorem 2.1.9 (i). (Theorem 2.1.9 (i) refers readers to Witting (1985), Thm. 1.153, p. 145 for the proof, but that book, Mathematische Statistik I: Parametrische Verfahren bei festem Stichprobenumfang, is written in German. I would also like to understand the full proof, but I could not find it in English.)