Conditional Likelihood estimation


I'm reading the book "Bayesian Reasoning and Machine Learning" and have come across a question that is giving me trouble. Unfortunately I have neither solutions nor anyone I know who reads the book, so I hope someone here can help.

"Consider a situation in which we partition observable variables into disjoint sets x and y and that we want to find the parameters that maximize the conditional likelihood:

$$ CL(\theta) = \frac{1}{N} \sum_{n=1}^{N}p(y^n | x^n, \theta)$$

for a set of training data $\{(x^n,y^n),\ n \in \{1,\dots,N\}\}$. All data is assumed generated from the same distribution $p(x,y|\theta_0)=p(y|x,\theta_0)p(x|\theta_0)$, for some unknown parameter $\theta_0$. In the limit of a large amount of i.i.d. training data, does $CL(\theta)$ have an optimum at $\theta_0$?"

First of all, shouldn't the given likelihood function include a log? Since the data are i.i.d., the likelihood is the product of the probabilities of the individual data points, so it should be a product, and I should only get a summation after taking the log of CL:

$$ CL(\theta) = \frac{1}{N} \prod_{n=1}^{N}p(y^n | x^n, \theta)$$
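To spell out my reasoning: since $\log$ is monotone, maximizing this product is equivalent to maximizing the averaged sum of logs, which matches the book's formula only if a $\log$ is inserted:

$$ \arg\max_{\theta} \prod_{n=1}^{N} p(y^n | x^n, \theta) = \arg\max_{\theta} \frac{1}{N} \sum_{n=1}^{N} \log p(y^n | x^n, \theta)$$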

Second, and most importantly, I'm having trouble isolating $\theta$ when I'm not given an explicit function: if I take the partial derivative w.r.t. $\theta$, how can I actually solve for $\theta$ without an explicit pdf? The stated factorization $p(x,y|\theta_0)=p(y|x,\theta_0)p(x|\theta_0)$ doesn't help me either. Can someone give me some clues on how to approach this question?
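To get some intuition, I tried a numerical sketch with a concrete toy model of my own choosing (it is not from the book): $x \sim \mathcal{N}(0,1)$ and $y \mid x \sim \mathcal{N}(\theta_0 x, 1)$. I evaluate the average conditional log-likelihood on a grid of $\theta$ values and check where it peaks for large $N$:

```python
import numpy as np

# Toy model (my assumption, not the book's): x ~ N(0,1), y | x ~ N(theta0 * x, 1).
rng = np.random.default_rng(0)
theta0 = 1.5
N = 100_000  # "large amount of i.i.d. training data"

x = rng.normal(size=N)
y = theta0 * x + rng.normal(size=N)

def avg_cond_loglik(theta, x, y):
    # (1/N) * sum_n log p(y^n | x^n, theta) for the Gaussian model above
    resid = y - theta * x
    return np.mean(-0.5 * np.log(2 * np.pi) - 0.5 * resid**2)

# Score a grid of candidate parameters and find the maximizer.
grid = np.linspace(0.5, 2.5, 201)
scores = np.array([avg_cond_loglik(t, x, y) for t in grid])
theta_hat = grid[np.argmax(scores)]
print(theta_hat)
```

In this toy case the maximizer lands very close to $\theta_0 = 1.5$, which at least suggests what answer the exercise is after, even though it doesn't resolve how to argue it for a generic unspecified pdf.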