I was studying neural networks and ran into a problem. Normally we write the weights as $w_{ij}^{(l)}$, where $i$ is the index of the receiving node in layer $l+1$ and $j$ is the index of the sending node in layer $l$.
Thus, for a NN with $3$ inputs and $4$ nodes in the first hidden layer, the weight matrix is
$W^{(1)} = \begin{bmatrix} w_{11}^{(1)} & w_{12}^{(1)} & w_{13}^{(1)} \\ w_{21}^{(1)} & w_{22}^{(1)} & w_{23}^{(1)} \\ w_{31}^{(1)} & w_{32}^{(1)} & w_{33}^{(1)} \\ w_{41}^{(1)} & w_{42}^{(1)} & w_{43}^{(1)} \end{bmatrix}$
Now this is multiplied by the input.
HERE IS THE PROBLEM
Normally the input is represented with the features in the columns and the samples in the rows; for instance,
$X = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \\ x_{41} & x_{42} & x_{43} \\ x_{51} & x_{52} & x_{53} \end{bmatrix}$
would tell us that we have $3$ features and $5$ samples.
Normally we always see the formula $WX + \mathbf{b}$; however, this would mean
$\begin{bmatrix} w_{11}^{(1)} & w_{12}^{(1)} & w_{13}^{(1)} \\ w_{21}^{(1)} & w_{22}^{(1)} & w_{23}^{(1)} \\ w_{31}^{(1)} & w_{32}^{(1)} & w_{33}^{(1)} \\ w_{41}^{(1)} & w_{42}^{(1)} & w_{43}^{(1)} \end{bmatrix} \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \\ x_{41} & x_{42} & x_{43} \\ x_{51} & x_{52} & x_{53} \end{bmatrix}$
which clearly doesn't work: $W$ is $4 \times 3$ and $X$ is $5 \times 3$, so the inner dimensions don't match. Thus we should actually have $WX^T + b$, right?
$\begin{bmatrix} w_{11}^{(1)} & w_{12}^{(1)} & w_{13}^{(1)} \\ w_{21}^{(1)} & w_{22}^{(1)} & w_{23}^{(1)} \\ w_{31}^{(1)} & w_{32}^{(1)} & w_{33}^{(1)} \\ w_{41}^{(1)} & w_{42}^{(1)} & w_{43}^{(1)} \end{bmatrix} \begin{bmatrix} x_{11} & x_{21} & x_{31} & x_{41} & x_{51} \\ x_{12} & x_{22} & x_{32} & x_{42} & x_{52} \\ x_{13} & x_{23} & x_{33} & x_{43} & x_{53} \end{bmatrix}$
which gives us the correct answer (I think): a $4 \times 5$ matrix with one column per sample. What is going on??
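To make the shape issue concrete, here is a quick NumPy check (the values are arbitrary; only the shapes matter):

```python
import numpy as np

# Shapes from the question: W is (4, 3) -- 4 hidden nodes, 3 features;
# X is (5, 3) -- 5 samples in the rows, 3 features in the columns.
W = np.arange(12, dtype=float).reshape(4, 3)
X = np.arange(15, dtype=float).reshape(5, 3)
b = np.zeros((4, 1))  # one bias per hidden node

# W @ X fails: (4, 3) @ (5, 3) -- inner dimensions 3 and 5 don't match.
try:
    W @ X
except ValueError as err:
    print("W @ X fails:", err)

# W @ X.T works: (4, 3) @ (3, 5) -> (4, 5), one column per sample.
Z = W @ X.T + b
print(Z.shape)  # -> (4, 5)
```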
In a neural network's activation formula, each neuron computes the dot product of its weights with its inputs. The transpose appears only because you laid out the $X$ matrix the other way around: you put the samples in the rows and the features in the columns. If you build $X$ the other way (features in rows, samples in columns), no transpose is needed. This is not a mistake; both conventions are in common use. Whether the formula reads $WX + b$ or $WX^T + b$ simply depends on how $X$ is laid out.
Remember that you first have to compute the weighted sum for each neuron in the hidden layer, or you cannot activate it. Using the question's notation, the first entry of the product $WX^T$ is
$h_{11} = w_{11} x_{11} + w_{12} x_{12} + w_{13} x_{13}$
where in $h_{ij}$, $i$ is the node number in the hidden layer and $j$ is the sample number.
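You can verify that entry against the matrix product directly; a small NumPy check with made-up numbers:

```python
import numpy as np

W = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [10.0, 11.0, 12.0]])  # (4 hidden nodes, 3 features)
X = np.array([[0.5, 1.0, 1.5],      # sample 1 (a row of X)
              [2.0, 2.5, 3.0]])     # sample 2

H = W @ X.T  # (4, 2): rows index hidden nodes, columns index samples

# h_11 computed by hand, exactly as in the formula above
h11 = W[0, 0] * X[0, 0] + W[0, 1] * X[0, 1] + W[0, 2] * X[0, 2]
print(h11, H[0, 0])  # -> 7.0 7.0
```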
[Figure: a simple representation of your network.]
Some time ago I used TensorFlow to solve the XOR problem; there, too, the input had the features in the columns and the samples in the rows.
In the output matrix (the matrix of hidden sums, sumH), each row is a neuron and each column a sample. Hope this helps; if you need some clarification, write a comment. Best regards, Marco.
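For completeness, here is a small NumPy sketch (with made-up values) showing that the two conventions carry exactly the same numbers, one result being the transpose of the other:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))  # (hidden nodes, features)
X = rng.normal(size=(5, 3))  # (samples, features) -- the framework layout
b = rng.normal(size=(4,))    # one bias per hidden node

# "Math" convention: columns of X^T are samples -> (4, 5),
# rows index neurons, columns index samples.
math_layout = W @ X.T + b[:, None]

# "Framework" convention: rows of X are samples -> (5, 4),
# rows index samples, columns index neurons.
fw_layout = X @ W.T + b

# Same numbers, transposed layout.
print(np.allclose(math_layout.T, fw_layout))  # -> True
```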