I am studying machine learning through Prof. Ng's Stanford lectures. In lecture 3 (and the accompanying lecture notes) he models the output variable $Y$ and the input variable $X$, with parameters $\theta$, as
$$Y = \theta^T X + \epsilon \tag{1}$$
where $\epsilon$ is a Gaussian with mean $0$ and variance $\sigma^2$. He says that this implies that the probability distribution of $Y$ conditional on $X$ is
$$Y \mid X;\, \theta \sim \mathcal{N}(\theta^T x,\, \sigma^2)$$
I cannot come up with a rigorous proof of this. I understand the intuition: once we condition on $X = x$, we can treat $x$ as a constant in equation $(1)$, so $Y$ is the sum of a Gaussian and a constant, which is again Gaussian (with the mean shifted by that constant).
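The farthest I can get toward formalizing this is a CDF computation (assuming $\epsilon$ is independent of $X$, so that conditioning on $X = x$ does not change the distribution of $\epsilon$):

$$P(Y \le y \mid X = x) = P(\theta^T x + \epsilon \le y) = P\!\left(\epsilon \le y - \theta^T x\right) = \Phi\!\left(\frac{y - \theta^T x}{\sigma}\right),$$

where $\Phi$ is the standard normal CDF; the right-hand side is exactly the CDF of a $\mathcal{N}(\theta^T x, \sigma^2)$ random variable. But I am not sure whether this argument is fully rigorous, or whether the independence of $\epsilon$ and $X$ needs to be assumed explicitly.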
Is there a more formal proof of the statement?
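For what it's worth, here is a quick Monte Carlo sanity check I wrote (my own, not from the notes): fix an input $x$, sample many draws of $\epsilon$, and confirm that the resulting $Y$ values have mean $\theta^T x$ and standard deviation $\sigma$. The particular values of `theta`, `x`, and `sigma` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

theta = np.array([2.0, -1.0, 0.5])  # arbitrary parameter vector
x = np.array([1.0, 3.0, 2.0])       # a fixed input, i.e. conditioning on X = x
sigma = 0.7                          # noise standard deviation

# Y | X = x is theta^T x plus Gaussian noise, per equation (1)
eps = rng.normal(0.0, sigma, size=100_000)
y = theta @ x + eps

# Empirical mean should be close to theta^T x, empirical std close to sigma
print(np.mean(y))  # ≈ theta @ x
print(np.std(y))   # ≈ sigma
```

The simulation agrees with the claimed conditional distribution, but of course it is only evidence, not a proof.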