Echo State Network learning Mackey-Glass function, but how?

528 Views Asked by At

I got this example of a minimal Echo State Network (ESN) which I analyse while trying to understand Echo State Networks. Unfortunately I have some problems understanding why this really works. It all breaks down to the questions:

  • [ What defines | What is] the echo state of an ESN?
  • What is it that makes an ESN so easy and fast learning of such complex nonlinear functions like the Mackey-Glass function?

First here is a little piece of code that shows the important part of initialization:

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Generate the ESN reservoir
% 
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

rand('seed', 42);

trainLen = 2000;
testLen  = 2000;
initLen  = 100;
data     = load('MackeyGlass_t17.txt');

%         Input neurons
inSize  = 1; 
%         Output neurons 
outSize = 1;
%         Reservoir size
resSize = 1000;
%         Leaking rate
a       = 0.3; 
%         Input weights
Win     = ( rand(resSize, (inSize+1) ) - 0.5) .* 1;
%         Reservoir weights
W       = rand(resSize, resSize) - 0.5;

Running the reservoir:

I understand that every single data-point of the input data set is propagated from the input neuron to the reservoir neurons. After a warm-up of size initLen the states are accepted and stored in matrix X. When this is done every single column of X represents a "vector of reservoir neuron activations". And here comes the point where I am not sure if I got it right:

The comment already says "collected states" or "design matrix" X. Am I getting this right, that all this does is storing the state of the whole network in the rows of matrix X?

  • If we assume that t was just a time parameter then X(:,t) represents the network state of time t , isn't it?

In my examples this would mean that there are 1.900 time slices which represent the whole network state of their corresponding timeframe (X therefore is a 1002x1900 matrix). Another question that occurs to me here is

  • why is a 1 (I guess it is the bias) and the input value u appended to this vector: X(:,t-initLen) = [1;u;x];

So:

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 
% Run the reservoir with the data and collect X.
% 
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%       Allocated memory for the design (collected states) matrix
X     = zeros((1+inSize) + resSize, trainLen - initLen);

%       Vector of reservoir neuron activations (used for calculation)
x     = zeros(resSize, 1);

%       Update of the reservoir neuron activations
xUpd  = zeros(resSize, 1);

for t = 1:trainLen
    
    u    = data(t);
    
    xUpd = tanh( Win * [1;u] + W * x );    
    x    = (1-a) * x + a * xUpd;
    
    if ( t > initLen )
        X(:,t-initLen) = [1;u;x];
    end
    
end

Training part:

The training part is also a little magic to me yet. I am familiar how linear regression works, so this is not the problem here.

What I see is that this part just uses the hole state matrix X and performs a single linear regression step on the input data to generate the output weight vector Wout and that's it.

So all that's been done so far - if I'm not mistaken - is initializing the output weights according to the state matri X which itself was generated using input data and randomly gernerated (input and reservoir) weights.

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 
% Train the output
% 
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%       Set the corresponding target matrix directly
Yt    = data(initLen+2:trainLen+1)';

%       Regularization coefficient
reg   = 1e-8;  

%       Get X transposed - needed twice therefore it is a little faster
X_T   = X';

%       Yt * pseudo_inverse(X); (linear regression task)
Wout  = Yt * X_T * (X * X_T + reg * eye(1+inSize+resSize))^(-1);

Running the ESN in a generative mode:

I can run this in two modes: generative or predictive. But well, this is the part where I just can say: "Well, .. it works." not having the exact idea why it is.

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 
% Run the trained ESN in a generative mode. no need to initialize here, 
% because x is initialized with training data and we continue from there.
% 
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Y = zeros(outSize,testLen);
u = data(trainLen+1);

for t = 1:testLen 
    
    xUpd   = tanh( Win*[1;u] + W*x );
    x      = (1-a)*x + a*xUpd;
    
    %        Generative mode:
    u      = Wout*[1;u;x];

    %      This would be a predictive mode:
    %u      = data(trainLen+t+1);

    Y(:,t) = u;
    
end

It works pretty well as you can see (generative mode):

enter image description here

I know this is a quiet huge "question" if this can even be considered as one. I feel like I am understanding the single parts but what I'm missing is the big picture of this magic black box called Echo State Network.