Calculating a weighted sum with matrices, as used in neural networks.


I have been working with neural networks for some time without employing matrices. For example, if you have an array of $n$ inputs $X_i$ to a neuron, each feeding that neuron via a connection with weight $W_i$, and the neuron has a sigmoid activation function $\sigma$, then the activation of that neuron (ignoring bias) will be:

$$ \sigma\left( \sum_{i=1}^{n} X_i W_i \right) $$
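For concreteness, here is a small NumPy sketch of that per-neuron computation (the values and variable names are my own, just for illustration):

```python
import numpy as np

def sigmoid(z):
    # logistic activation function: 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

# made-up example inputs and weights, n = 3
X = np.array([0.5, -1.0, 2.0])
W = np.array([0.1, 0.4, -0.2])

# weighted sum written as an explicit loop, mirroring the formula above
z = sum(X[i] * W[i] for i in range(len(X)))
activation = sigmoid(z)

# the same weighted sum as a single dot product
assert np.isclose(z, np.dot(X, W))
```

The loop and `np.dot` compute exactly the same scalar, which is what the matrix notation is abbreviating.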

But it seems that many other people think and talk about neural networks in terms of matrix operations. Sadly, I seem to have missed matrices in my education and am not comfortable with them at all. I am trying to educate myself, but I feel I am getting some things confused. In particular, I have seen people say that the aforementioned activation of a neuron can be expressed as:

$$ \sigma( W^T X ) $$

This confuses me because if both $W$ and $X$ are $n \times 1$ matrices, then surely $W^T X$ is an $n \times n$ matrix... and surely the $\sigma$ function wants to take a single scalar argument? What am I misunderstanding?
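To make my confusion concrete, here is a small NumPy check of the shapes involved (column vectors of my own choosing):

```python
import numpy as np

# column vectors of shape (n, 1), with n = 3
X = np.array([[0.5], [-1.0], [2.0]])
W = np.array([[0.1], [0.4], [-0.2]])

# W.T has shape (1, n), so W.T @ X is (1, n) @ (n, 1) -> shape (1, 1):
# a single number wrapped in a matrix, not an n x n matrix
inner = W.T @ X
print(inner.shape)   # (1, 1)

# by contrast, X @ W.T is (n, 1) @ (1, n) -> the n x n outer product
outer = X @ W.T
print(outer.shape)   # (3, 3)
```

So the shapes work out differently depending on which factor is transposed, which seems to be the crux of what I am mixing up.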