Whenever I've encountered the multinoulli distribution before, I've understood it. However, the book I'm currently reading has some notation that is new to me. Here is the context in which I've found it:
The multinoulli, or categorical, distribution is a distribution over a single discrete variable with $k$ different states, where $k$ is finite. The multinoulli distribution is parameterized by a vector $\mathbf p \in [0,1]^{k-1}$, where $p_i$ gives the probability of the $i$-th state. The final, $k$-th state's probability is given by $1 - \mathbf1^T\mathbf p$. Note that we must constrain $\mathbf1^T\mathbf p\le 1$.
What I haven't seen before is the notation used in the definition of the vector $\mathbf p$ and the vector $\mathbf1$.
In the definition of $\mathbf p$, what does a range raised to a power mean? Does it simply mean that each element of the vector has a value in $[0, 1]$? If so, is this notation standard?
Regarding the vector $\mathbf{1}$: is it just a vector of all $1$s? And if so, is this notation common?
While I think the above are the correct interpretations, I want to make sure I'm prepared for the next time I encounter them.
As mentioned as a requirement for the notation tag, this is from the book Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Specifically, it is from section 9 of the probability theory review chapter.
Yes, $[a,b]^m$ does mean an $m$-dimensional space each of whose coordinates can range from $a$ to $b$. More generally, any set of values raised to an $m$-th power denotes the $m$-fold Cartesian product: an $m$-dimensional space each of whose coordinates can assume any value in that set.
Yes, $\mathbf 1$ is used to denote a vector of all $1$s, and note that $\mathbf 1^T \mathbf p$ works out to the sum of the entries of $\mathbf p$. So that last probability is merely one minus the sum of the other $k-1$ probabilities.
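To make both points concrete, here is a small sketch in Python/NumPy (the particular values of $k$ and $\mathbf p$ are just made up for illustration):

```python
import numpy as np

# Hypothetical multinoulli over k = 4 states.
# p holds the first k-1 = 3 state probabilities; each entry lies
# in [0, 1], so p is a point in the space [0, 1]^3.
p = np.array([0.2, 0.3, 0.4])

ones = np.ones_like(p)           # the vector "1" (all ones)
assert np.dot(ones, p) <= 1      # the constraint 1^T p <= 1

# 1^T p is just the sum of p's entries, so the k-th state's
# probability is one minus that sum.
p_k = 1 - np.dot(ones, p)
full = np.append(p, p_k)

print(full)        # the complete probability vector over all 4 states
print(full.sum())  # sums to 1 (up to floating-point rounding)
```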