I am trying to understand the notation used in this famous tutorial
http://ufldl.stanford.edu/wiki/index.php/Neural_Networks
On the very first line, the $i$ in $(x^{(i)},y^{(i)})$ indicates the training example, i.e. the sample index (or time frame, correct me if I am wrong) of the input and output data. In the following lines, however, $x_i$ is used with $i$ as the neuron index. This is a bit confusing: I am not sure whether the two indices actually refer to the same thing.
This is also used in this other page on the softmax classifier, where the $x_i$ notation is dropped and $x^{(i)}$ is used instead: http://ufldl.stanford.edu/wiki/index.php/Softmax_Regression
My question is: what is the difference between $x^{(i)}$ and $x_i$? In other words, is the notation used here inconsistent, or is it actually accurate?
It looks like the superscript identifies the sample (both for inputs and outputs). Each sample is a vector, and the components of the vector are distinguished by the subscript.
There appear to be inputs $x^{(i)}$ and outputs $y^{(i)}$. So the training samples are the pairs $(x^{(i)},y^{(i)})$. The inputs and the outputs needn't have the same dimensions.
So $x^{(1)} = (x^{(1)}_1,x^{(1)}_2,\ldots,x^{(1)}_n)$, and $x^{(2)}=(x^{(2)}_1,x^{(2)}_2,\ldots,x^{(2)}_n)$, and so on.
Similarly $y^{(1)} = (y^{(1)}_1,y^{(1)}_2,\ldots,y^{(1)}_m)$, and $y^{(2)}=(y^{(2)}_1,y^{(2)}_2,\ldots,y^{(2)}_m)$, and so on.
Here, $n$ is the dimension of the inputs and $m$ is the dimension of the outputs.
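If it helps, the same convention shows up naturally when you store the training set as an array: the superscript (sample index) becomes the row index, and the subscript (component index) becomes the column index. A minimal sketch with NumPy and made-up toy numbers (the data here is purely illustrative):

```python
import numpy as np

# Toy training set: 3 samples, input dimension n = 4, output dimension m = 2.
X = np.arange(12).reshape(3, 4)  # row i holds the input sample x^{(i+1)}
Y = np.arange(6).reshape(3, 2)   # row i holds the output sample y^{(i+1)}

x1 = X[0]        # x^{(1)}: the whole first input vector (superscript = sample)
x1_2 = X[0, 1]   # x^{(1)}_2: its second component (subscript = component)

print(x1)    # → [0 1 2 3]
print(x1_2)  # → 1
```

So `X[i]` plays the role of $x^{(i+1)}$ (a full vector) while `X[i, j]` plays the role of $x^{(i+1)}_{j+1}$ (a scalar component), with the usual shift for zero-based indexing.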
A clearer notation would use boldface for the vectors, e.g. $\mathbf{x}^{(1)}=(x^{(1)}_1,x^{(1)}_2,\ldots,x^{(1)}_n)$, which makes the vector/component distinction explicit.