I recently read the paper *Nonparametric Variational Auto-encoders for Hierarchical Representation Learning*, and I am confused about how the nCRP prior is ever conditioned on the input data. All the nCRP ever sees is the latent output $z_{mn}$ of the VAE encoder. Is this because, during training, the nCRP ultimately outputs $z_{mn}$ after receiving the conditioned VAE encoder output?
Note: by "conditioned" I mean that during training the $q(z|x_i)$ term yields a specific instance of $z$ for a specific $x_i$.
I understand now. In that particular architecture, the latent variables $z_{mn}$ are initially the outputs of the encoder network $q_\phi(z|x)$, which is an approximation of $p(z|x)$; the $D_{KL}$ term in the objective is the forcing function that fits $q_\phi(z|x)$ to $p(z|x)$.
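To make that first stage concrete, here is a minimal numpy sketch of a Gaussian encoder and the closed-form $D_{KL}$ term that pulls $q_\phi(z|x)$ toward a standard-normal prior during ELBO maximization. The names (`encode`, `kl_to_standard_normal`) and the linear encoder are illustrative assumptions, not the paper's architecture; the point is only that each $x_i$ yields a sample $z_i$, which is what the nCRP then sees.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W_mu, W_logvar):
    # Toy linear "encoder" producing the mean and log-variance of q_phi(z|x).
    mu = x @ W_mu
    logvar = x @ W_logvar
    return mu, logvar

def kl_to_standard_normal(mu, logvar):
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over latent dims.
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

x = rng.normal(size=(4, 8))            # a batch of 4 inputs x_i
W_mu = 0.1 * rng.normal(size=(8, 2))
W_logvar = 0.1 * rng.normal(size=(8, 2))

mu, logvar = encode(x, W_mu, W_logvar)
# Reparameterized sample z_i ~ q_phi(z|x_i): this per-x_i sample is the
# "conditioned" quantity that gets handed to the nCRP stage.
z = mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)
print(z.shape)
```

Each row of `z` is tied to one specific `x_i`, which is exactly the sense of "conditioning" discussed above.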
The samples $z_{mn}$ are then treated as the observations of the nCRP prior, and variational inference is used to train the nCRP's parameters to maximize the likelihood of $z_{mn}$. So the conditioning isn't done with $x_i$ directly; it is done via the encoder: the encoder transforms each $x_i$ into a sample $z_i$, and a path through the tree is chosen based on $z_i$ in order to sample the new latent variables from the nCRP prior.
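The second stage, $z_i \rightarrow$ path, can be sketched as follows. This is not the paper's variational nCRP update; it is a hand-simplified stand-in where each tree path is a fixed unit-variance Gaussian component and we make a hard (max-responsibility) assignment, just to show how the path choice is conditioned on $z_i$ rather than on $x_i$.

```python
import numpy as np

def assign_paths(z, path_means):
    # Squared distance of each z_i to each path's Gaussian mean; with unit
    # variance, argmin distance == argmax likelihood, i.e. a hard path choice.
    sq_dists = ((z[:, None, :] - path_means[None, :, :]) ** 2).sum(axis=-1)
    return np.argmin(sq_dists, axis=1)

# Two hypothetical tree paths, summarized by their leaf means.
path_means = np.array([[-2.0, 0.0], [2.0, 0.0]])

# Encoder samples z_i (one per input x_i) act as the nCRP's "observations".
z = np.array([[-1.9, 0.1], [2.2, -0.1], [-2.5, 0.0]])
print(assign_paths(z, path_means))   # -> [0 1 0]
```

In the actual model the path assignment is a posterior inferred jointly with the tree's parameters, but the flow is the same: the nCRP never touches $x_i$, only the encoder's $z_i$.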