My English is not very good, so I apologize in advance for any awkward phrasing.
I have a question about "Pattern Recognition and Machine Learning" (Christopher M. Bishop), specifically about formula (3.13) in Section 3.1.1.
I don't understand why we get (3.13) $$ \nabla_{\mathbf{w}} \ln p(\mathbf{t} | \mathbf{w}, \beta) = \beta \sum_{n=1}^{N}\{t_n - \mathbf{w}^\mathrm{T}\phi(\mathbf{x}_n) \} {\phi(\mathbf{x}_n)}^\mathrm{T} $$ instead of $$ \nabla_{\mathbf{w}} \ln p(\mathbf{t} | \mathbf{w}, \beta) = \beta \sum_{n=1}^{N}\{t_n - \mathbf{w}^\mathrm{T}\phi(\mathbf{x}_n) \} {\phi(\mathbf{x}_n)}. $$
The only difference is on the right-hand side: the first has ${\phi(\mathbf{x}_n)}^\mathrm{T}$ where the second has ${\phi(\mathbf{x}_n)}$. Could anyone explain this to me? Many thanks!
Your second form is the correct one. Since $\nabla_{\mathbf{w}} (\mathbf{w}^\mathrm{T}\phi(\mathbf{x}_n)) = \phi(\mathbf{x}_n)$, the gradient of a scalar with respect to a column vector $\mathbf{w}$ should itself be a column vector, so the sum should involve $\phi(\mathbf{x}_n)$, not its transpose. The transpose in (3.13) is a known erratum; see the PRML errata collected by Yousuke Takada: https://yousuketakada.github.io/prml_errata/prml_errata.pdf
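As a quick sanity check, you can compare the column-vector gradient $\beta \sum_n (t_n - \mathbf{w}^\mathrm{T}\phi(\mathbf{x}_n))\,\phi(\mathbf{x}_n)$ against a finite-difference approximation of the log-likelihood. This is just a sketch with synthetic random data (the design matrix `Phi`, targets `t`, and precision `beta` below are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 20, 5                      # N data points, M basis functions
beta = 2.0                        # noise precision (assumed known)
Phi = rng.normal(size=(N, M))     # design matrix: row n is phi(x_n)^T
t = rng.normal(size=N)            # target values
w = rng.normal(size=M)            # weight vector

def log_likelihood(w):
    # ln p(t | w, beta), dropping the constant terms that do not depend on w
    err = t - Phi @ w
    return -0.5 * beta * np.sum(err ** 2)

# Analytic gradient: beta * sum_n (t_n - w^T phi(x_n)) phi(x_n)
# Phi.T @ (...) produces a column vector of shape (M,), as expected.
grad_analytic = beta * Phi.T @ (t - Phi @ w)

# Central finite differences along each coordinate direction
eps = 1e-6
grad_numeric = np.array([
    (log_likelihood(w + eps * e) - log_likelihood(w - eps * e)) / (2 * eps)
    for e in np.eye(M)
])

print(np.allclose(grad_analytic, grad_numeric, atol=1e-5))  # True
```

The agreement confirms that the gradient is the column vector $\phi(\mathbf{x}_n)$ version; with the transposed version the shapes would not even line up as a gradient of a scalar with respect to $\mathbf{w}$.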