Show that the random vector follows a multinomial distribution and find its parameters.

I am trying to solve the following problem; my answers and approach are written below each part. I am having a hard time with parts b) and c), especially c).

Consider random variables $(X, Y)$ with $X\in \mathbb{R}$ and $Y \in \{1, \dots, K\}$. Consider the model $P(Y = j|X = x) \propto e^{\alpha_j + \beta_jx}$, where $(\alpha_1, \beta_1, \dots, \alpha_K, \beta_K)$ are the parameters of the model and are known.

a) Write down the full expression for $P(Y = j|X)$ not just the proportional version given above.

My answer - $P(Y = j|X) = \frac{e^{\alpha_j + \beta_jx}}{\sum_{i = 1}^{K} e^{\alpha_i + \beta_ix}}$

b) Given $X = x$, we generate $Y |X = x$ from $P(Y = j|X = x)$. Let $Z = (Z_1,\cdots ,Z_K)^T$ such that $Z_j = I(Y = j)$. Show that $Z|X = x$ follows a multinomial distribution. What are the underlying parameters of this multinomial distribution?

I can tell intuitively that $Z \mid X = x \sim M_K\bigg(1,\frac{e^{\alpha_1 + \beta_1x}}{\sum_{i = 1}^{K} e^{\alpha_i + \beta_ix}}, \dots, \frac{e^{\alpha_K + \beta_Kx}}{\sum_{i = 1}^{K} e^{\alpha_i + \beta_ix}} \bigg)$. I'm not sure if this is correct, and even if it is, I'm not sure how to show it formally.

c) Suppose that $X_1,\cdots ,X_n$ are IID from a PDF $q(x)$. Then we generate $Y_i|X_i$ from $P(Y_i = j|X_i)$, using the model described above. This leads to a set of random variables $Y_1,\cdots ,Y_n$. Then we define $W = (W_1,\cdots ,W_K)^T$ such that $$W_j = \sum_{i=1}^{n}I(Y_i = j)$$

Show that the vector $W$ follows a multinomial distribution and find its parameters.

This is the part that I'm having the most trouble with. I can't understand what the probabilities will be for this part. Can anyone please explain what the parameters will be?

Best answer

Let $[K]=\{1,\dots,K\}$. Let $X$ be any real-valued random variable, and let the joint distribution of $(X,Y)\in\mathbb R\times[K]$ be given by the Markov kernel $P(Y=j|X=x)=\frac{1}{z_x}\exp(\alpha_j+\beta_jx)$. Let $p_x=(P(Y=j|X=x))_{j\in[K]}$ be the point probabilities.

a) Yes, we have $z_x=\sum_{j=1}^K\exp(\alpha_j+\beta_jx)$.
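
As a quick numerical sanity check of the normalisation, here is a minimal sketch in Python. The function name `class_probabilities` and the parameter values are hypothetical, chosen only for illustration. Subtracting the largest score before exponentiating is a standard trick to avoid overflow; it cancels in the ratio and does not change $p_x$.

```python
import math

def class_probabilities(x, alphas, betas):
    """Return p_x = (P(Y=j | X=x))_j for P(Y=j | X=x) proportional to exp(alpha_j + beta_j * x)."""
    scores = [a + b * x for a, b in zip(alphas, betas)]
    shift = max(scores)  # cancels in the ratio, only prevents overflow
    weights = [math.exp(s - shift) for s in scores]
    z = sum(weights)     # plays the role of z_x (up to the common factor exp(shift))
    return [w / z for w in weights]

# Hypothetical parameters for K = 3 (not taken from the question).
alphas = [0.0, 0.5, -1.0]
betas = [1.0, -0.5, 0.2]
p = class_probabilities(0.3, alphas, betas)
print(p, sum(p))
```

The printed probabilities are positive and sum to one up to floating-point error, which is all that part a) asserts.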

b) Notice that $Y$ given $X=x$ follows a categorical distribution with parameters $k=K$ and $p_x$. The support of $Z$ is $\{e_j:j\in[K]\}$, where $e_j\in\{0,1\}^K$ is the $j$-th standard basis vector, given by $e_j(i)=1$ if and only if $i=j$. This is because $Y$ takes exactly one value, so we have $Z=e_j$ if and only if $Y=j$. This also shows that $P(Z=e_j|X=x)=P(Y=j|X=x)=p_x(j)$, so $Z$ given $X=x$ follows a multinomial distribution with parameters $n=1$ and $p_x$. Here, I define the multinomial distribution as the distribution of the absolute frequencies of $n$ IID random variables with point probabilities $p_x$. If you define it by its point probabilities instead, then notice that $\binom{1}{e_j}\prod_ip_x(i)^{e_j(i)}=p_x(j)$, where $\binom{1}{e_j}=1$ is a multinomial coefficient.
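
The identity for the point probabilities can be checked numerically. The sketch below uses a hypothetical probability vector `p_x` and hypothetical helper names (`multinomial_pmf`, `one_hot`); note that Python indexes categories from 0 rather than 1.

```python
import math

def multinomial_pmf(counts, probs):
    """Multinomial point probability: n! / (c_1! ... c_K!) * prod_j probs[j] ** counts[j]."""
    n = sum(counts)
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)  # stays an integer at every step
    value = float(coef)
    for c, p in zip(counts, probs):
        value *= p ** c
    return value

def one_hot(j, K):
    """The basis vector e_j in {0, 1}^K (0-based index j)."""
    return [1 if i == j else 0 for i in range(K)]

p_x = [0.2, 0.5, 0.3]  # hypothetical point probabilities for K = 3
pmf_values = [multinomial_pmf(one_hot(j, 3), p_x) for j in range(3)]
print(pmf_values)
```

With $n=1$ the multinomial coefficient is $1$ and every factor except the $j$-th equals $1$, so the computed values reproduce `p_x`.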

c) We have a slight issue here. The intended assumption is that $Y_i$ given $X=x$, for $x=(x_1,\dots,x_n)\in\mathbb R^n$, has point probabilities $p_{x_i}$, meaning that we condition not only on $X_i$ but on the entire vector $X=(X_1,\dots,X_n)$. The additional information is that the random variables $Y_1,\dots,Y_n$ are conditionally independent given $X$. Since we have explicitly been given the pdf $q(x)$, I guess we're supposed to verify manually that, for $y\in[K]^n$, $P(Y=y)=\int P(Y=y|X=x)\prod_iq(x_i)\mathrm dx=\int \prod_i(p_{x_i}(y_i)q(x_i))\mathrm d x=\prod_i\int p_{x_i}(y_i)q(x_i)\mathrm{d}x_i$.

So, with $p^*=(\int p_x(j)q(x)\mathrm d x)_{j\in[K]}$ we have $\sum_jp^*(j)=\int\sum_jp_x(j)q(x)\mathrm dx=\int q(x)\mathrm dx=1$, and by the above $Y_1,\dots,Y_n$ are IID with point probabilities $p^*$. This means that each $Y_i$ follows a categorical distribution with parameters $K$ and $p^*$, and, as in part b), by my definition the absolute frequencies $W$ follow a multinomial distribution with parameters $n$ and $p^*$.

But we can also compute this by hand. We have $W=w$ if and only if, for each $j\in[K]$, there exist exactly $w(j)$ indices $i$ with $Y_i=j$. There are $\binom{n}{w}$ ways to assign the indices to the categories, where $\binom{n}{w}=\binom{n}{w(1),\dots,w(K)}$ is a multinomial coefficient. No matter how we select the indices, the joint probability that we get each outcome $j$ exactly $w(j)$ times at the selected indices is $\prod_{j=1}^Kp^*(j)^{w(j)}$. Putting this together gives $P(W=w)=\binom{n}{w}\prod_{j=1}^Kp^*(j)^{w(j)}$ for all $w$ with non-negative integer entries and $\sum_jw(j)=n$.
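
To make the parameter $p^*$ concrete, here is a small simulation sketch. It assumes, purely for illustration, that $q$ is the standard normal density and uses hypothetical values for $\alpha_j,\beta_j$; neither is specified in the question. It approximates $p^*(j)=\int p_x(j)q(x)\mathrm dx$ with the trapezoidal rule and compares it with the empirical frequencies $W_j/n$ from simulated pairs $(X_i,Y_i)$.

```python
import math
import random

def class_probabilities(x, alphas, betas):
    """p_x for the model P(Y=j | X=x) proportional to exp(alpha_j + beta_j * x)."""
    scores = [a + b * x for a, b in zip(alphas, betas)]
    shift = max(scores)
    weights = [math.exp(s - shift) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]

def standard_normal_pdf(x):
    # Assumed choice of q(x); the question leaves q unspecified.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def marginal_probabilities(alphas, betas, lo=-8.0, hi=8.0, steps=4000):
    """Trapezoidal approximation of p*(j) = integral of p_x(j) q(x) dx over [lo, hi]."""
    K = len(alphas)
    h = (hi - lo) / steps
    totals = [0.0] * K
    for s in range(steps + 1):
        x = lo + s * h
        weight = 0.5 if s in (0, steps) else 1.0  # endpoints get half weight
        q = standard_normal_pdf(x)
        p_x = class_probabilities(x, alphas, betas)
        for j in range(K):
            totals[j] += weight * p_x[j] * q * h
    return totals

def simulate_counts(n, alphas, betas, rng):
    """Draw X_i ~ q, then Y_i | X_i ~ p_{X_i}, and return W (0-based categories)."""
    K = len(alphas)
    counts = [0] * K
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p_x = class_probabilities(x, alphas, betas)
        y = rng.choices(range(K), weights=p_x)[0]
        counts[y] += 1
    return counts

alphas = [0.0, 0.5, -1.0]  # hypothetical parameters for K = 3
betas = [1.0, -0.5, 0.2]
p_star = marginal_probabilities(alphas, betas)
rng = random.Random(0)     # fixed seed for reproducibility
n = 20000
W = simulate_counts(n, alphas, betas, rng)
frequencies = [w / n for w in W]
print(p_star)
print(frequencies)
```

If the argument above is right, the two printed vectors should agree up to sampling noise of order $1/\sqrt n$, since $W$ is multinomial with parameters $n$ and $p^*$.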

Finally, notice that we never needed the actual form of $p_x$; the argument works for any choice of conditional point probabilities (it doesn't have to be $\frac{1}{z_x}e^{\alpha_j+\beta_jx}$).