I'm working through some very basic matrix calculus identities, using equation 2 on pg. 9 of Andrew Ng's CS 229 Lecture 1 notes (http://cs229.stanford.edu/notes/cs229-notes1.pdf): $$\nabla_{A^{T}}f(A) = (\nabla_{A}f(A))^{T}$$ where $A$ is a matrix (or vector).
My question is: if $\vec{y}$ is a function of $\vec{x}$, can you confirm and show me that the following is true: $$\frac {\partial \vec{y}}{\partial \vec{x}^{T}} \stackrel{?}{=} (\frac {\partial \vec{y}}{\partial \vec{x}})^{T}$$ where I've used Ng equation 2 with $f(A) = \vec{y}(\vec{x}), A=\vec{x}$.
Similarly, can you show me whether the following is true: $$\frac {\partial \vec{y}^{T}}{\partial \vec{x}} \stackrel{?}{=} \left(\frac {\partial \vec{y}^{T}}{\partial \vec{x}^{T}}\right)^{T} \stackrel{?}{=} \left(\frac {\partial \vec{y}}{\partial \vec{x}}\right)^{T}$$ where the first equality uses Ng equation 2 with $f(A) = \vec{y}^{T}(\vec{x}), A=\vec{x}^{T}$, and the second equality is one I've written down only because it seems to make sense.
If all of these statements are true, they imply the following rule: if exactly one of the numerator and denominator contains a transpose (numerator XOR denominator), the derivative matrix is transposed compared to $\frac {\partial \vec{y}}{\partial \vec{x}}$; if both or neither contain a transpose (numerator XNOR denominator), the derivative matrix is NOT transposed compared to $\frac {\partial \vec{y}}{\partial \vec{x}}$.
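To make the conjecture concrete, here is what it would predict for a linear map $\vec{y} = M\vec{x}$ in numerator layout. The constant matrix $M \in \mathbb{R}^{m \times n}$ and the dimensions $m, n$ are names I'm introducing purely for illustration. Only the first line is something I'm sure of; the remaining three are what the rule would give if it holds, which is exactly what I'm asking about:
$$\begin{aligned}
\frac{\partial \vec{y}}{\partial \vec{x}} &= M && (m \times n,\ \text{the usual Jacobian}) \\
\frac{\partial \vec{y}}{\partial \vec{x}^{T}} &\stackrel{?}{=} M^{T} && (\text{transpose in denominator only}) \\
\frac{\partial \vec{y}^{T}}{\partial \vec{x}} &\stackrel{?}{=} M^{T} && (\text{transpose in numerator only}) \\
\frac{\partial \vec{y}^{T}}{\partial \vec{x}^{T}} &\stackrel{?}{=} M && (\text{transpose in both})
\end{aligned}$$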
P.S.: I prefer the numerator layout over the denominator layout because its properties are more intuitive to me (e.g. the chain rule follows the familiar left-to-right form, and the Jacobian has the same form as in multivariable calculus classes). This is what I mean by "numerator layout": https://en.wikipedia.org/wiki/Matrix_calculus#Numerator-layout_notation
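For concreteness, this is the component form I have in mind when I say numerator layout, assuming $\vec{y} \in \mathbb{R}^{m}$ and $\vec{x} \in \mathbb{R}^{n}$ (the dimension labels $m$ and $n$ are my own):
$$\frac{\partial \vec{y}}{\partial \vec{x}} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}, \qquad \left(\frac{\partial \vec{y}}{\partial \vec{x}}\right)_{ij} = \frac{\partial y_i}{\partial x_j}$$
so the rows are indexed by the components of $\vec{y}$, the columns by the components of $\vec{x}$, and the result is an $m \times n$ matrix.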
Please use the numerator layout convention, because even if you think it's arbitrary or prefer different notation, this will make it an order of magnitude easier for me to understand and not worry about transposes.
Thanks.