In today’s material we will revisit the concept of an artificial neural network (ANN) and consider how the forecasting problem is solved with ANNs in general and with recurrent ANNs in particular.
Neural networks
First, let’s recall what an artificial neural network is. One earlier article gives a great explanation of a particular type of NN, the convolutional network; it is worth checking before reading further.
An ANN is a network of artificial neurons (each a “black box” with multiple inputs and one output) that transforms a vector of input signals (data) into a vector of output signals by means of a function called an activation function. In a multilayer network, at least one intermediate layer sits between the layer of “receiving” neurons and the output layer.
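To make the “black box” concrete, here is a minimal sketch of a single artificial neuron in Python with numpy (our own illustration; the sigmoid activation and the sample values are assumptions, not prescribed by the definition above):

```python
import numpy as np

def sigmoid(x):
    """Logistic activation: squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of the inputs, then activation."""
    return sigmoid(np.dot(weights, inputs) + bias)

# Example: three input signals in, one output signal out.
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.1, -0.7])
print(neuron(x, w, bias=0.2))
```

Stacking layers of such neurons, where the outputs of one layer feed the inputs of the next, gives the multilayer structure described above.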
The network’s structure determines whether it has feedback: in a feedforward ANN, the signal travels sequentially from the input layer through the intermediate layers to the output; a recurrent structure implies the presence of feedback, where the signal from the output or intermediate neurons partially returns to the inputs of the input layer (or of one of the intermediate layers).
Recurrent neural networks
If we dwell on recurrent ANNs in a little more detail, it turns out that the most modern (and most “successful”) of them originate from a structure called the multilayer perceptron (a mathematical model of the brain: a feedforward ANN with intermediate layers). Since their inception they have changed significantly, and the “new generation” of recurrent ANNs is much simpler than its predecessors, yet it successfully solves the problem of memorizing sequences. For example, the Elman network, the most popular today, is designed so that the feedback signal from the inner layer goes not to the “main” input neurons but to additional inputs, the so-called context. These context neurons store information about the previous input vector (stimulus); as a result, the output signal (the network’s response) depends not only on the current stimulus but also on the previous one.
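Here is a minimal sketch of that structure (our own illustrative Python/numpy code; the tanh activation, the weight initialization, and the class name are assumptions, not requirements of the model). The hidden layer sees the current stimulus together with the context, and after each step the hidden activations are copied back into the context:

```python
import numpy as np

class ElmanNetwork:
    """Sketch of an Elman network: one hidden layer plus context units."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0, 0.1, (n_hidden, n_in))       # input  -> hidden
        self.W_ctx = rng.normal(0, 0.1, (n_hidden, n_hidden))  # context -> hidden
        self.W_out = rng.normal(0, 0.1, (n_out, n_hidden))     # hidden -> output
        self.context = np.zeros(n_hidden)                      # previous hidden state

    def step(self, x):
        # The hidden layer sees the current stimulus AND the previous
        # one, the latter through the context units.
        hidden = np.tanh(self.W_in @ x + self.W_ctx @ self.context)
        # Fixed, weight-one feedback: copy the hidden activations into the context.
        self.context = hidden.copy()
        return self.W_out @ hidden
```

The unweighted copy into `self.context` is exactly the fixed feedback described above; only the three weight matrices are adapted during training.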
Solution of the forecasting problem
It is clear that Elman networks are potentially suitable for forecasting, in particular time series forecasting. However, feedforward networks are also known to cope with this task, although not in all cases. As an example, consider one of the most popular variations of the forecasting problem: time series (TS) forecasting. The problem statement reduces to choosing an arbitrary time series with N samples, dividing the data into three sets (training, testing, and control), and feeding them to the input of the ANN. The result is the value of the time series at the required moment in time.
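As a hedged illustration of that setup (the helper names, the sliding-window framing, the toy sine series, and the 70/20/10 proportions are our assumptions, not part of the problem statement):

```python
import numpy as np

def make_windows(series, window):
    """Turn a 1-D series into (past window -> next value) pairs."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

def split(X, y, train=0.7, test=0.2):
    """Split chronologically into training, testing, and control sets."""
    n = len(X)
    i, j = int(n * train), int(n * (train + test))
    return (X[:i], y[:i]), (X[i:j], y[i:j]), (X[j:], y[j:])

series = np.sin(np.linspace(0, 20, 500))        # toy time series, N = 500
X, y = make_windows(series, window=10)
train_set, test_set, control_set = split(X, y)
```

The split is chronological rather than random, since shuffling a time series would leak future values into the training set.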
Why recurrent ANNs?
It is clear that the choice of ANN topology can influence the result. But back to the beginning of the conversation: why did we deliberately choose forecasting with a recurrent network as the topic of this article? After all, a quick search shows that TS prediction in published work is usually done with multilayer perceptrons (which, recall, are feedforward networks) and the backpropagation method. A clarification is in order here: yes, in theory such ANNs solve the forecasting problem well, provided that the level of noise (errors and gaps) in the input data, for example in the original time series, is minimal.
In practice, time series are quite noisy, which naturally causes problems when forecasting. Ensembles of feedforward networks can reduce the error, but this significantly increases both the complexity of the structure and the time needed to train it.
The Elman recurrent network makes it possible to solve the forecasting problem even on highly noisy time series (which is especially important for business). In the general case, this ANN is a three-layer structure plus a set of additional “context” elements (inputs). Feedback connections run from the hidden layer to these elements, and each such connection has a fixed weight of one. At each time step the input data propagates through the neurons in the forward direction, and then the learning rule is applied. Because the feedback weights are fixed, the context elements always hold a copy of the hidden-layer values from the previous step (the values are sent back before the learning rule is applied). In this way the noise of the time series is gradually smoothed out and the error is minimized along with it: we obtain a forecast that, in the general case, is more accurate than the result of the classical approach, which has been confirmed experimentally in Western work.
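To make that order of operations explicit, here is a simplified sketch of one training step, assuming the `ElmanNetwork` class from earlier. It updates only the output weights with a delta rule, which is an illustrative simplification: a full Elman trainer would also adapt `W_in` and `W_ctx`, for example via backpropagation through time.

```python
import numpy as np

def train_step(net, x, target, lr=0.01):
    """One step: the forward pass copies the hidden layer into the
    context *before* any weights change, then the learning rule is
    applied to the output weights."""
    y = net.step(x)            # after this call, net.context == current hidden layer
    error = target - y
    # Delta rule on the output layer only (illustrative simplification).
    net.W_out += lr * np.outer(error, net.context)
    return float(np.mean(error ** 2))
```

Fed with the windows from the split above (with the window size matching `n_in`), each call both produces a forecast and refreshes the context, so the next prediction already accounts for the previous stimulus.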
Conclusion
Having considered some aspects of the practical application of neural networks to the forecasting problem, we can conclude that the future of forecasting belongs to the recurrent model, at least for noisy time series; and, as we know, in practice, especially in business, inaccuracies and gaps in the data are unavoidable. Western science, and after it enthusiastic practitioners, have already understood this. In the post-Soviet space the general public has yet to reach these conclusions; we hope this material will help our readers draw their own conclusions today.