What is the formal definition for a neural network and do you have any good sources to read?


I'm trying to find a mathematical definition for a neural network. Does anyone have a source for a clear, mathematical definition of a neural network? From my own knowledge I would say something like: a neural network is a function $F:\mathbb{R}^n \to \mathbb{R}^m$ that is determined by training on a dataset through a process called backpropagation. And then explain what backpropagation is.

Can you help me find a concise, precise resource for neural networks (or just give a definition for a neural network)?

I did find this "definition" on wikipedia that is close to what I'm looking for but still feels a bit vague (wikipedia: Math of ANN): "Mathematically, a neuron's network function is defined as a composition of other functions, that can further be decomposed into other functions."


I found a paper, Neural Network Classification and Formalization by Fiesler, which goes into detail about the formal definition. I have summarized it here.

A neural network is a 4-tuple $\mathcal{N}= (C,T, S(0), \Phi)$ consisting of constraints, topology, initialization state, and transition functions, as defined below.

The constraints $C=(C_W,C_\Phi,C_A)$ dictate the range of values in the network, where $C_W \subset \mathbb{R}$ is called the weight constraint, $C_\Phi \subset \mathbb{R}$ is the local threshold or bias constraint, and $C_A \subset \mathbb{R}$ is the activity or neuron value constraint.

The topology is an ordered pair $T =(F,I)$, which consists of the framework and interconnection structure.

The framework $F=\{ c_l \in C_A^{N_l} : l \in \{ 1,2,\ldots L\}\}$ is the set of $L \in \mathbb{N}$ clusters $c_l$, where the $l$th cluster contains $N_l \in \mathbb{N}$ neurons $n_{l,i} \in C_A$. The majority of neural networks in practical use, including in this work, have ordered clusters, in which the clusters are called layers.
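As a concrete illustration (not from the paper), the framework of a small three-layer network can be written down directly; the layer sizes and the choice $C_A = [0,1]$ below are assumptions:

```python
# A minimal sketch of a framework F: L = 3 ordered clusters (layers) with
# N_l = 2, 3, 1 neurons each. Each activity value is assumed to lie in
# the constraint set C_A = [0, 1]; all sizes here are illustrative.
layers = [
    [0.0, 0.0],        # layer 1: N_1 = 2 neurons
    [0.0, 0.0, 0.0],   # layer 2: N_2 = 3 neurons
    [0.0],             # layer 3: N_3 = 1 neuron
]
L = len(layers)                       # number of clusters L
N = [len(layer) for layer in layers]  # sizes N_1, ..., N_L
```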

The interconnection structure is determined by a relation $R$ on $D_R \subset \Omega_{l} \times \nu$, where $\Omega_l = \mathcal{P}(\{n_{l,1},n_{l,2}, \ldots, n_{l,N_l}\})$ contains all possible sets of source neurons in layer $l$, and $\nu = \{ n_{m,j} : 1 \leq m \leq L,\ 1 \leq j \leq N_m\}$ is the set of all neurons. The set $D_R$ contains the $W\in \mathbb{N}$ related (connected) pairs, each consisting of a subset of a layer and the neuron to which that subset is connected: $I=\{ (\omega_{l,i}, n_{m,j}) \in D_R : \omega_{l,i}\, R\, n_{m,j},\ 1 \leq l < m,\ 1 \leq i \leq W_l \}$, where $W_l \in \mathbb{N}$ is the number of connections originating in layer $l$.
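The relation can be sketched concretely for a small three-layer network; the neuron names, the `frozenset` encoding of the source subsets $\omega_{l,i}$, and the particular connections below are all illustrative assumptions:

```python
# A hypothetical encoding of the interconnection structure I: each entry
# pairs a subset of source neurons (an element of some Omega_l, here a
# frozenset of (layer, index) names) with a destination neuron (m, j)
# in a strictly later layer, so 1 <= l < m must hold for every pair.
connections = [
    (frozenset({(1, 1)}), (2, 1)),           # ordinary first-order link
    (frozenset({(1, 1), (1, 2)}), (2, 2)),   # higher-order link: two sources
    (frozenset({(2, 1)}), (3, 1)),
]
W = len(connections)  # total number of connections W

# Verify the layer-ordering constraint for every connected pair.
ordered = all(l < m for sources, (m, _) in connections for (l, _) in sources)
```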

The initialization state is $S(0)=(W(0),\Theta(0),A(0))$, where $W(0) =\{ W_{\omega_{l,i} m_j}\in C_W: 1\leq l \leq L,\ \omega_{l,i} \in \Omega_l,\ 1 \leq m \leq L,\ 1 \leq j \leq N_m \}$ is the initial weight state, in which $W_{\omega_{l,i} m_j}$ is a (higher-order) weight from the source neurons $\omega_{l,i}$ of layer $l$ to neuron $j$ of layer $m$ (if $m > l+1$, the connection skips at least one layer); $\Theta(0) =\{ \theta_{l,i}\in C_\Phi: 1 \leq l < L,\ 1 \leq i \leq W_l\}$ is the initial threshold (bias) state; and $A(0) = \{a_{1,i}\in C_A: 1\leq i\leq N_1\}$ is the initial activity of the input layer.
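A minimal sketch of an initialization state, under the assumed constraint sets $C_W = [-1,1]$, $C_\Phi = \{0\}$, and $C_A = [0,1]$ (none of these choices, nor the connection names, come from the paper):

```python
import random

# A sketch of the initialization state S(0) = (W(0), Theta(0), A(0)):
# one weight per connection drawn from C_W = [-1, 1], one threshold per
# connection (all zero here), and initial activities for the input layer.
random.seed(0)
conns = [(frozenset({(1, 1)}), (2, 1)), (frozenset({(1, 2)}), (2, 2))]
W0 = {c: random.uniform(-1.0, 1.0) for c in conns}  # initial weights W(0)
Theta0 = {c: 0.0 for c in conns}                    # initial thresholds Theta(0)
A0 = [0.5, 0.5]                                     # input-layer activities A(0)
```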

The transition functions make up a 4-tuple $\Phi = (nf, lr, cf, of)$, which includes:

(this area could use more detail)

The neuron/activation/transfer function $nf:c^*\to C_A$, $c^* \subset c_l$ for some $l$, which specifies the output of a neuron given its inputs;

the learning rule $lr:C_A^{N_1} \times C_W^W \to C_W^W$, where $C_A^{N_1}$ is the input data and $C_W^W$ is the set of all possible weight states, which defines how weights and biases are updated;

the clamping function $cf$, which determines when certain neurons will be unaffected by new information;

and ontogenic functions $of$ that specify changes in the neural network topology.
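To make the abstract tuple $\Phi$ concrete, here are toy instances of the first two transition functions; the sigmoid neuron function and the Hebbian-style weight update are common choices, not the paper's definitions:

```python
import math

def nf(inputs, weights, theta):
    """Neuron function nf: weighted sum of inputs minus the threshold,
    squashed into C_A = (0, 1) by the logistic sigmoid (an assumed choice)."""
    net = sum(w * x for w, x in zip(weights, inputs)) - theta
    return 1.0 / (1.0 + math.exp(-net))

def lr(data, weights, eta=0.1):
    """Learning rule lr: map the current weight state to an updated one,
    nudged by the input data with learning rate eta (a toy Hebbian update)."""
    return {i: w + eta * data[i] for i, w in weights.items()}

y = nf([1.0, 0.5], [0.4, -0.2], 0.1)      # one neuron's activity in (0, 1)
new_w = lr([1.0, 0.0], {0: 0.2, 1: 0.2})  # updated weight state
```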