Pattern Recognition and Machine Learning Exercise 3.6


This is Problem 3.6 from Chapter 3 of Bishop's *Pattern Recognition and Machine Learning*. Note that $\pmb{W}$ is an $M \times K$ parameter matrix, $\pmb{T}$ is an $N \times K$ matrix of target variables, $\pmb{\Phi}$ is the $N \times M$ design matrix, and $\pmb{\Sigma}$ is the $K \times K$ covariance matrix. So far, I've shown that from the log-likelihood function (dropping additive constants) $$ \ln L(\pmb{W}, \pmb{\Sigma}) = -\frac{N}{2} \ln | \pmb{\Sigma} | - \frac{1}{2} \sum_{n=1}^{N} \big( \pmb{t}_n - \pmb{W}^{\top} \phi(\pmb{x}_n) \big)^{\top} \pmb{\Sigma}^{-1} \big( \pmb{t}_n - \pmb{W}^{\top} \phi(\pmb{x}_n) \big) $$ we can differentiate with respect to $\pmb{W}$ and set the result equal to $\pmb{0}$ to get $$ - \sum_{n=1}^{N} \pmb{\Sigma}^{-1} \big( \pmb{t}_n - \pmb{W}^{\top} \phi(\pmb{x}_n) \big) \phi(\pmb{x}_n)^{\top} = \pmb{0}.$$
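As a sanity check on this gradient (my own numerical verification, not part of the exercise), here is a finite-difference comparison in NumPy with randomly generated $\pmb{\Phi}$, $\pmb{T}$, and a symmetric positive-definite $\pmb{\Sigma}$; all names and sizes below are my own choices:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K = 20, 3, 2                      # samples, basis functions, targets
Phi = rng.normal(size=(N, M))           # design matrix (N x M)
T = rng.normal(size=(N, K))             # targets (N x K)
A = rng.normal(size=(K, K))
Sigma = A @ A.T + K * np.eye(K)         # a valid SPD covariance (K x K)
Sigma_inv = np.linalg.inv(Sigma)
W = rng.normal(size=(M, K))             # arbitrary test point for W

def log_lik(W):
    # ln L = -N/2 ln|Sigma| - 1/2 sum_n r_n^T Sigma^{-1} r_n,
    # where the rows of R are the residuals r_n^T = (t_n - W^T phi(x_n))^T
    R = T - Phi @ W
    _, logdet = np.linalg.slogdet(Sigma)
    return -N / 2 * logdet - 0.5 * np.sum((R @ Sigma_inv) * R)

# Analytic gradient w.r.t. W: sum_n phi(x_n) (t_n - W^T phi(x_n))^T Sigma^{-1},
# i.e. the (M x K) transpose of the expression set to zero above
grad = Phi.T @ (T - Phi @ W) @ Sigma_inv

# Central finite differences, entry by entry
eps = 1e-6
num = np.zeros_like(W)
for i in range(M):
    for j in range(K):
        E = np.zeros_like(W)
        E[i, j] = eps
        num[i, j] = (log_lik(W + E) - log_lik(W - E)) / (2 * eps)

print(np.allclose(grad, num, atol=1e-5))  # True
```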

However, the next step is where I'm stuck. I need to show that the maximum-likelihood solution for $\pmb{W}$ has the property that each column is given by $$ \pmb{w}_{ML} = (\pmb{\Phi}^{\top} \pmb{\Phi})^{-1} \pmb{\Phi}^{\top} \pmb{t}, $$ where $\pmb{t}$ is the corresponding column of $\pmb{T}$.

So far, my idea is to carry out the summation in matrix form to obtain $$ \pmb{\Sigma}^{-1} (\pmb{T} - \pmb{\Phi}\pmb{W})^{\top} \pmb{\Phi} = \pmb{0}$$

and manipulate the matrix products to get the form $$ \pmb{W}_{ML} = (\pmb{\Phi}^{\top} \pmb{\Phi})^{-1} \pmb{\Phi}^{\top} \pmb{T}. $$
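To convince myself that this target expression is at least consistent with the stationarity condition, I ran a quick numerical sketch (synthetic random $\pmb{\Phi}$ and $\pmb{T}$; variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 50, 4, 3                  # samples, basis functions, targets

Phi = rng.normal(size=(N, M))       # design matrix (N x M)
T = rng.normal(size=(N, K))         # target matrix (N x K)

# Candidate closed form: W_ML = (Phi^T Phi)^{-1} Phi^T T  (M x K)
W_ml = np.linalg.solve(Phi.T @ Phi, Phi.T @ T)

# Stationarity condition with Sigma^{-1} dropped (it is invertible,
# so it cannot be responsible for the product vanishing):
# Phi^T (T - Phi W) = 0
residual = Phi.T @ (T - Phi @ W_ml)
print(np.allclose(residual, 0))     # True

# Each column of W_ml matches an independent least-squares fit
# of the corresponding column of T, as the exercise claims
w0 = np.linalg.lstsq(Phi, T[:, 0], rcond=None)[0]
print(np.allclose(w0, W_ml[:, 0]))  # True
```

So numerically the closed form does satisfy the zero-gradient condition, and the columns decouple; what I'm missing is the algebraic derivation.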

However, I haven't been able to obtain the desired form by manipulating these matrix products, so I'm not sure how to complete the argument. I would greatly appreciate any guidance on how to proceed from here, and on whether my approach is correct.