I am trying to solve the normal equation, and I found this:

I understand that this is a scalar-by-vector derivative.
But I cannot understand the answer: why does the sigma sign disappear, and why is $\theta$ transposed? (I think the sigma sign should remain.)
I am new to matrix calculus. I have read the Wikipedia article on matrix calculus for a while (the tables of identities seem helpful), but I still cannot really understand this.
Also, how do I determine whether to use numerator or denominator layout? Can it be freely chosen? (For example, is it numerator layout because $w$ is a column vector?)
First of all, you seek to minimize $E(x)$; since multiplying by the positive constant $2$ does not change the minimizer, you can drop the $\frac{1}{2}$ and just minimize
$R(\textbf{w})=\sum_{i=1}^N(y(x_i,\textbf{w})-t_i)^2$
For simplicity, write $Y=y(x,w)$ and $e=Y-T$, where $T=(t_1,...,t_N)\in\mathbb{R}^N$. Since the model is linear in $w$, we have $Y=\phi w$, where $\phi$ is the design matrix (this is used below). Obviously, $R$ is the squared vector norm:
$R(w)=\sum_{i=1}^N(e_i)^2=||e(w)||^2$
Let us now define the Jacobian of a given function $f:\mathbb{R}^N\rightarrow\mathbb{R}^M$ as \begin{equation} \tag 1 \frac{\partial{f}}{\partial{w}}=\begin{pmatrix}\frac{\partial{f}}{\partial{w_1}} & \cdots & \frac{\partial{f}}{\partial{w_N}}\end{pmatrix} \end{equation} Note that sometimes (as in the answer you supplied in your question) the above equation defines the gradient instead of the Jacobian. For this question's sake, it really doesn't matter: whichever convention is chosen, the gradient is always defined to be the transpose of the Jacobian (or vice versa).
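To make definition (1) concrete, here is a minimal numerical sketch (the linear test function and the helper `jacobian` are my own illustration, not part of the question): for a linear map $f(w)=Aw$, the Jacobian in the layout of equation (1) is exactly $A$.

```python
import numpy as np

# f : R^2 -> R^3, a simple linear map f(w) = A w, whose Jacobian is A itself.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
f = lambda w: A @ w

def jacobian(f, w, h=1e-6):
    """Columns are the partial derivatives df/dw_i, as in equation (1)."""
    cols = [(f(w + h * e) - f(w - h * e)) / (2 * h)
            for e in np.eye(len(w))]
    return np.stack(cols, axis=-1)

w = np.array([0.5, -1.0])
print(np.allclose(jacobian(f, w), A))  # True
# For a scalar-valued f this Jacobian is a row vector;
# the gradient is its transpose (a column vector).
```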
Now, with that in mind, we will have \begin{equation} \tag 2 \frac{\partial{R}}{\partial{w}}=2e^T\frac{\partial{e}}{\partial{w}} \end{equation}
(I will come back to the reason for this later.) We seek a minimum, so we set this equal to zero. Thus:
\begin{equation} \tag 3 e^T\frac{\partial{e}}{\partial{w}}=(Y-T)^T\frac{\partial{(Y-T)}}{\partial{w}}=(\phi w-T)^T\phi=0 \end{equation} since $Y=\phi w$ and $T$ does not depend on $w$. Transposing this, we have \begin{equation} \phi^T(\phi w-T)=0 \end{equation}
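You can verify numerically that solving $\phi^T(\phi w-T)=0$ really does give the least-squares minimizer. A small sketch (the random data here is hypothetical, just to exercise the formula):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: N = 50 data points, 3 basis functions.
phi = rng.standard_normal((50, 3))   # design matrix (phi in the derivation)
T = rng.standard_normal(50)          # target vector

# Solve the normal equation  phi^T (phi w - T) = 0,
# i.e.  (phi^T phi) w = phi^T T.
w = np.linalg.solve(phi.T @ phi, phi.T @ T)

# Compare with the least-squares solution computed directly.
w_lstsq, *_ = np.linalg.lstsq(phi, T, rcond=None)
print(np.allclose(w, w_lstsq))  # True

# The residual e = phi w - T is orthogonal to the columns of phi.
print(np.allclose(phi.T @ (phi @ w - T), 0))  # True
```

The second check is exactly the transposed equation above: at the minimum, the residual lies in the null space of $\phi^T$.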
This gives you the desired answer. Note that in the answer you supplied, you see the transposes of equations 2 and 3; that is because that answer uses equation 1 as the definition of the gradient instead of the Jacobian.
Now, let's see why equation 2 holds. Let's prove a more general case! Let $m$ and $n$ be vector functions from $\mathbb{R}^M$ to $\mathbb{R}^N$. That is, $m=(m_1(X),...,m_N(X))^T$ and $n=(n_1(X),...,n_N(X))^T$ with $X\in\mathbb{R}^M$. We will prove that \begin{equation} \frac{\partial{m^Tn}}{\partial{X}}=m^T\frac{\partial{n}}{\partial{X}}+n^T\frac{\partial{m}}{\partial{X}} \end{equation}
Note that
\begin{equation} \frac{\partial{m^Tn}}{\partial{X_i}}=\sum_{j=1}^N\frac{\partial{(m_jn_j)}}{\partial{X_i}}=\sum_{j=1}^N\left(\frac{\partial{m_j}}{\partial{X_i}}n_j+\frac{\partial{n_j}}{\partial{X_i}}m_j\right) \end{equation}
So, we have (definition of Jacobian, i.e. equation 1)
\begin{equation} \tag 4 \frac{\partial{m^Tn}}{\partial{X}}=\begin{pmatrix} \sum_{j=1}^N\frac{\partial{m_j}}{\partial{X_1}}n_j+\frac{\partial{n_j}}{\partial{X_1}}m_j & \cdots & \sum_{j=1}^N\frac{\partial{m_j}}{\partial{X_M}}n_j+\frac{\partial{n_j}}{\partial{X_M}}m_j \end{pmatrix}=\begin{pmatrix} \sum_{j=1}^N\frac{\partial{m_j}}{\partial{X_1}}n_j & \cdots & \sum_{j=1}^N\frac{\partial{m_j}}{\partial{X_M}}n_j \end{pmatrix}+\begin{pmatrix} \sum_{j=1}^N\frac{\partial{n_j}}{\partial{X_1}}m_j & \cdots & \sum_{j=1}^N\frac{\partial{n_j}}{\partial{X_M}}m_j \end{pmatrix}=\begin{pmatrix} n^T\frac{\partial{m}}{\partial{X_1}} & \cdots & n^T\frac{\partial{m}}{\partial{X_M}} \end{pmatrix}+\begin{pmatrix} m^T\frac{\partial{n}}{\partial{X_1}} & \cdots & m^T\frac{\partial{n}}{\partial{X_M}} \end{pmatrix}=n^T\frac{\partial{m}}{\partial{X}}+m^T\frac{\partial{n}}{\partial{X}} \end{equation}
So equation 2 is really nothing more than a special case of equation 4: take $m=n=e$, and the two terms coincide, giving $2e^T\frac{\partial{e}}{\partial{w}}$. (And equation 3 is the reason $\phi$ was transposed in the answer you supplied.) That's it! I hope that helps!
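If you want to convince yourself of equation 4 without redoing the algebra, you can check it with finite differences. A sketch, using arbitrary smooth test functions of my own choosing for $m$ and $n$:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 4, 3

# Arbitrary smooth test functions m, n : R^M -> R^N (purely illustrative).
A = rng.standard_normal((N, M))
B = rng.standard_normal((N, M))
m = lambda X: np.sin(A @ X)
n = lambda X: np.cos(B @ X)

def jacobian(f, X, h=1e-6):
    """Numerical Jacobian via central differences; rows = outputs, cols = inputs."""
    cols = []
    for i in range(len(X)):
        d = np.zeros_like(X); d[i] = h
        cols.append((f(X + d) - f(X - d)) / (2 * h))
    return np.stack(cols, axis=-1)

X = rng.standard_normal(M)

# Left-hand side: Jacobian (a row vector) of the scalar m(X)^T n(X).
lhs = jacobian(lambda X: np.array([m(X) @ n(X)]), X)[0]

# Right-hand side of equation 4:  m^T dn/dX + n^T dm/dX.
rhs = m(X) @ jacobian(n, X) + n(X) @ jacobian(m, X)

print(np.allclose(lhs, rhs, atol=1e-6))  # True
```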