I got this example of a minimal Echo State Network (ESN), which I am analysing while trying to understand how ESNs work. Unfortunately, I have some problems understanding why it really works. It all comes down to these questions:
- What defines the echo state of an ESN?
- What is it that makes an ESN learn complex nonlinear functions like the Mackey-Glass time series so easily and quickly?
First, here is a small piece of code that shows the important part of the initialization:
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Generate the ESN reservoir
%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
rand('seed', 42);
trainLen = 2000;
testLen = 2000;
initLen = 100;
data = load('MackeyGlass_t17.txt');
% Input neurons
inSize = 1;
% Output neurons
outSize = 1;
% Reservoir size
resSize = 1000;
% Leaking rate
a = 0.3;
% Input weights
Win = (rand(resSize, 1+inSize) - 0.5) .* 1;
% Reservoir weights
W = rand(resSize, resSize) - 0.5;
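One thing I noticed: other minimal ESN examples I have found also rescale W by its spectral radius right after this step. As far as I understand, this rescaling is what is supposed to give the network its "echo state" property, which is exactly what my first question is about. A sketch of what that would look like (my addition; the target value 1.25 is just a choice I have seen used elsewhere, not something from this script):

% Rescale W by its spectral radius (my addition; common in other
% minimal ESN examples, the target value 1.25 is just a typical choice)
rhoW = max(abs(eig(W)));
W = W .* (1.25 / rhoW);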
Running the reservoir:
I understand that every single data point of the input data set is propagated from the input neuron to the reservoir neurons. After a warm-up of length initLen, the states are accepted and stored in the matrix X. When this is done, every single column of X represents a "vector of reservoir neuron activations". And here comes the point where I am not sure if I got it right:
The comment already calls X the "collected states" or "design matrix". Am I getting this right, that all this does is store the state of the whole network in the columns of the matrix X?
- If we assume that t is just a time parameter, then X(:,t) represents the network state at time t, doesn't it?
In my example this would mean that there are 1,900 time slices, each representing the whole network state at its corresponding time step (X is therefore a 1002x1900 matrix). Another question that occurs to me here is:
- Why are a 1 (I guess it is the bias) and the input value u prepended to this vector: X(:,t-initLen) = [1;u;x]?
So:
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Run the reservoir with the data and collect X.
%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Allocate memory for the design (collected states) matrix
X = zeros((1+inSize) + resSize, trainLen - initLen);
% Vector of reservoir neuron activations (used for calculation)
x = zeros(resSize, 1);
% Update of the reservoir neuron activations
xUpd = zeros(resSize, 1);
for t = 1:trainLen
    u = data(t);
    xUpd = tanh( Win * [1;u] + W * x );
    x = (1-a) * x + a * xUpd;
    if ( t > initLen )
        X(:,t-initLen) = [1;u;x];
    end
end
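To convince myself about the layout of X, I added a small sanity check after this loop (my own addition, not part of the original script):

% Sanity check (my addition): every column of X is one snapshot [1;u;x],
% so X should be (1+inSize+resSize) x (trainLen-initLen) = 1002 x 1900
disp(size(X));
% Column k corresponds to time step t = initLen + k of the training data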
Training part:
The training part is also still a bit of magic to me. I am familiar with how linear regression works, so that is not the problem here.
What I see is that this part just takes the whole state matrix X and performs a single linear regression step, mapping the collected states onto the target data, to generate the output weight vector Wout, and that's it.
So all that has been done so far, if I'm not mistaken, is to compute the output weights from the state matrix X, which itself was generated using the input data and the randomly generated (input and reservoir) weights.
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Train the output
%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Set the corresponding target matrix directly
Yt = data(initLen+2:trainLen+1)';
% Regularization coefficient
reg = 1e-8;
% Transpose X once, since it is needed twice below; this is a little faster
X_T = X';
% Ridge regression; for reg -> 0 this approaches Wout = Yt * pinv(X)
Wout = Yt * X_T * (X * X_T + reg * eye(1+inSize+resSize))^(-1);
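If I read this line correctly, it is just ridge regression written out with an explicit matrix inverse. An equivalent formulation using mrdivide, which as far as I know is numerically more stable, would be (my rewrite, same result up to numerical precision):

% Same ridge regression via mrdivide instead of an explicit inverse
% (my rewrite, not part of the original script)
Wout = Yt * X_T / (X * X_T + reg * eye(1+inSize+resSize));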
Running the ESN in a generative mode:
I can run this in two modes: generative or predictive. But this is the part where I can just say, "Well... it works", without having an exact idea of why it does.
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Run the trained ESN in generative mode. No need to initialize x here,
% because it still holds the last state from the training run and we
% continue from there.
%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Y = zeros(outSize,testLen);
u = data(trainLen+1);
for t = 1:testLen
    xUpd = tanh( Win*[1;u] + W*x );
    x = (1-a)*x + a*xUpd;
    % Generative mode:
    u = Wout*[1;u;x];
    % This would be a predictive mode:
    %u = data(trainLen+t+1);
    Y(:,t) = u;
end
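To put a number on how well it works, I compute the mean squared error of the first few hundred generated points against the true continuation of the data (my own addition, mirroring what I have seen in similar scripts):

% MSE of the generated signal vs. the true continuation (my addition);
% Y(:,1) predicts data(trainLen+2), hence the +2 offset
errorLen = 500;
mse = sum( (data(trainLen+2:trainLen+errorLen+1)' - Y(1,1:errorLen)).^2 ) / errorLen;
disp( ['MSE = ' num2str(mse)] );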
It works pretty well, as you can see (generative mode):

[plot: ESN output in generative mode over the test data]
I know this is quite a huge "question", if it can even be considered one. I feel like I understand the individual parts, but what I'm missing is the big picture of this magic black box called an Echo State Network.