When considering memory structures in neural software systems, we are primarily dealing with the art of orderly forgetting. A neural system gains little or no information from merely storing data. The question is not so much "What must we remember?" but rather "What must we forget, when, and how?"
Normally, neural memory structures are not exact storage devices. In this respect, they resemble human memory. If we are asked, for example, what the air temperature was last Monday, we probably cannot give a precise answer. But we may well remember whether it was warmer or colder than it is today.
The techniques used to implement neural memory structures are familiar to many followers of technical analysis and technical trading, since many of them use so-called moving averages in the analysis of time series and charts. Few users, however, are aware that a moving average, regardless of how it is calculated, represents a memory structure. Usually, this mathematical tool is used merely as a filter that generalizes a large amount of information in a meaningful way. For example, if our task is to describe the set of natural numbers from 1 to 10 as precisely as possible using only a single number, the average of these numbers lends itself to this purpose: (1 + 2 + ... + 10) / 10 = 5.5.
The average indicates the center of the range within which the individual elements of the set lie. In the case of a moving average, a time window of length m is shifted step by step across the time series, so that for every point n of the time series with n >= m, the average of the last m data points is obtained. The parameter m is the span of the moving average.
We obtain a series of values that, for every point n with n >= m, states the center of the range within which the last m data points of the original time series moved. Thus, at every point n, we are able to remember the approximate range of the last m data points. The average values constitute a new time series, which can be plotted in the same coordinate system as the original time series (see Fig. 1).
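Written out, the windowed average described above takes the following form. The symbol x_i for the i-th data point is an assumption made here for illustration, since the text does not name it:

```latex
\mathrm{SMA}_n = \frac{1}{m} \sum_{i = n - m + 1}^{n} x_i , \qquad n \ge m
```

For m = 3 and the series 1, 2, 3, 4, the first valid value is SMA_3 = (1 + 2 + 3) / 3 = 2, and the next is SMA_4 = (2 + 3 + 4) / 3 = 3.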
Since in this procedure a rigid window is shifted across a time series, data points that lie more than m steps back no longer play a role in the calculation. They disappear, so to speak, beyond the reach of memory. Hence, the time span m of the moving average can also be regarded as the extent or depth of memory.
Such a Simple Moving Average (SMA) has several disadvantages. First, it only provides valid values from data point m onward. Second, the memory depth is rigid: data points that disappear beyond the horizon no longer play any role. They are forgotten. In addition, a system that follows this model requires storage space for the last m data points, which can quickly become a problem when m is large.
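The rigid window and its storage requirement can be made concrete with a minimal Python sketch. The function name and the sample values are chosen here for illustration only; the point is that the window must hold m data points and that no valid value exists before data point m.

```python
from collections import deque


def simple_moving_average(series, m):
    """Return one value per data point; None until m points have been seen."""
    window = deque(maxlen=m)  # holds the last m data points, older ones drop out
    result = []
    for x in series:
        window.append(x)
        if len(window) < m:
            result.append(None)  # no valid value before data point m
        else:
            result.append(sum(window) / m)
    return result


prices = [10.0, 11.0, 12.0, 13.0, 20.0, 13.0]  # made-up sample series
print(simple_moving_average(prices, 3))
```

Once the spike of 20.0 leaves the window, it has no influence at all on later values, which is the rigid forgetting described above.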
Fig. 1: The curves represented in this graph demonstrate the different characteristics of the various methods. The various types of memory display different reactions to strong and weak impressions. We can see that a Simple Memory on the basis of a Simple Moving Average (SMA) yields little information, since its memory window is rigid. Gamma and Adaptive Memory, on the other hand, show elements of long-term memory in addition to the short-term memory effect. Thus, a look at the starting points of the green and red curves is sufficient proof that the price must have been higher in the part of the past that is not represented, since the memory curves are significantly higher than the price curve.
Our short-term memory is more flexible. Within certain limits, the memory depth is variable. Strong impressions have an effect over longer periods of time, while weaker impressions lose their relevance more quickly. Such behavior can be reconstructed by means of a variant of the calculation of a moving average.
An Exponential Moving Average (EMA) only requires memory for one value and already yields valid values starting at n = 2. For a memory depth P, two exponents are calculated.
With the help of these exponents, the Exponential Moving Average can now be calculated for every data point n > 1.
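As a rough illustration, the calculation might look as follows in Python. The formulas for the two exponents are not reproduced in the text, so this sketch assumes the widely used convention alpha = 2 / (P + 1) for the new data point and 1 - alpha for the stored value, and it seeds the memory with the first data point. Both choices are assumptions, not necessarily the author's definitions.

```python
def exponential_moving_average(series, depth):
    """EMA that keeps only a single stored value.

    Assumption: the two weights are alpha = 2 / (depth + 1) and 1 - alpha,
    a common convention that the text itself does not confirm.
    """
    alpha = 2.0 / (depth + 1)
    keep = 1.0 - alpha
    memory = None
    result = []
    for x in series:
        if memory is None:
            memory = x  # seed with the first data point (assumption)
        else:
            memory = alpha * x + keep * memory
        result.append(memory)
    return result


print(exponential_moving_average([10.0, 10.0, 20.0, 10.0, 10.0], 3))
# prints [10.0, 10.0, 15.0, 12.5, 11.25]
```

The spike of 20.0 is still visible two steps later (11.25 instead of 10.0), although only one value is ever stored.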
The so-called Neural Gamma Memory uses precisely this algorithm. The formula has the effect that events of great significance that lie more than P steps in the past are remembered longer than a rigid window would allow. Their significance slowly wanes, and they disappear from memory only later. Gamma Memory can be formed in a neural network of only two neurons and one synapse, where the synapse must store a signal for the time period between two impressions (see Fig. 2).
Fig. 2: The illustration above shows the flow of information between two neurons in a Gamma Memory structure. The current signal from outside (x) is weakened and supplemented with the likewise weakened value of the memory cell (m). The resulting signal (s) is transmitted as a weighted stimulus (y) to neuron B. Furthermore, the synapse stores the value of s in the memory cell (m). The weighted signal thus contains information about the current stimulus and the stimulus level of the past.
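The signal flow in Fig. 2 can be sketched as a small Python class. The class name, the `weight` parameter, and the weakening factors (alpha = 2 / (depth + 1) and 1 - alpha) are assumptions made for this illustration; the text does not specify them.

```python
class GammaSynapse:
    """Synapse between neuron A and neuron B with one memory cell (m)."""

    def __init__(self, depth, weight=1.0):
        # Assumed weakening factors: alpha for x, 1 - alpha for m
        self.alpha = 2.0 / (depth + 1)
        self.weight = weight  # hypothetical synaptic weight applied to s
        self.m = None  # memory cell, empty before the first stimulus

    def transmit(self, x):
        """Take stimulus x from neuron A, return weighted stimulus y for neuron B."""
        if self.m is None:
            s = x  # nothing stored yet: pass the first stimulus through
        else:
            s = self.alpha * x + (1.0 - self.alpha) * self.m
        self.m = s  # the synapse stores s until the next stimulus arrives
        return self.weight * s


synapse = GammaSynapse(depth=3, weight=0.5)
print([synapse.transmit(x) for x in [10.0, 10.0, 20.0, 10.0]])
# prints [5.0, 5.0, 7.5, 6.25]
```

Each output y carries both the current stimulus and the stored level of the past, as the caption describes.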
If we consider the flow of information in Gamma Memory (see Fig. 2), it becomes clear that there is an essential difference between the memory structures of our brain and those of an artificial neural network. Gamma Memory knows only fixed temporal steps, from one stimulus to the next. This does not present a problem in the analysis of time series, since consecutive data points in a time series are normally separated by the same interval. In our brain, time plays a more significant role. The signal stored in the memory cell (m) weakens as a function of time. Without subsequent reinforcing stimuli on m, the energy level in m will drop. If it drops to zero, we have forgotten the stimulus.
In the analysis of time series with fixed temporal steps by means of neural software components, Gamma structures provide a relatively good emulation of memory with varying memory depth.
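The time-dependent weakening described for the brain can be imitated by letting the stored level decay between stimuli. The following is a minimal sketch under the assumption of exponential decay with a hypothetical `half_life` parameter. It is an illustration of the idea only and not part of Gamma Memory as described above.

```python
class DecayingMemoryCell:
    """Memory cell whose stored level halves every half_life time units."""

    def __init__(self, half_life):
        self.half_life = half_life  # hypothetical parameter, same unit as t
        self.level = 0.0
        self.last_time = None

    def level_at(self, t):
        """Stored level at time t, weakened by the time elapsed since the last stimulus."""
        if self.last_time is None:
            return 0.0
        elapsed = t - self.last_time
        return self.level * 0.5 ** (elapsed / self.half_life)

    def stimulate(self, t, strength):
        """Reinforce the cell at time t and return the new level."""
        self.level = self.level_at(t) + strength
        self.last_time = t
        return self.level


cell = DecayingMemoryCell(half_life=2.0)
cell.stimulate(0.0, 8.0)
print(cell.level_at(2.0))  # prints 4.0
print(cell.level_at(4.0))  # prints 2.0
```

With exponential decay the level only approaches zero, so a practical variant would probably treat any level below some small threshold as forgotten.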
Gamma Memory demonstrates that algorithms for the implementation of memory structures can be reduced to the formulas for the calculation of moving averages, as long as the temporal intervals between two stimuli remain constant.
Accordingly, it is also possible to establish memory structures in neural networks that feature the properties of Adaptive Moving Averages (AMA) or of Moving Linear Regression (MLR). For our systems of analysis, we have developed special memory structures based on regression methods, which combine these structures with features of Adaptive Memory. In this way, the Ivorix Adaptive Regression Memory implements a hybrid form of long- and short-term memory with predictive effects. This technology gives neural software components the ability to form intelligent reactions on the basis of experiences in the more recent as well as the more distant past.
Let us once more take an example from daily life: if the sky suddenly darkens on a summer day, we suspect that there will soon be a thunderstorm. Gamma Memory is only able to supply the information that it is now darker than during the past few hours. Ivorix Adaptive Regression Memory, by contrast, can deliver a signal that warns: it will rain soon.
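To give a sense of how a regression-based memory can look ahead, here is a minimal sketch of a generic Moving Linear Regression in Python. It is not the Ivorix Adaptive Regression Memory, whose details the text does not give. The function name and the `lookahead` parameter are hypothetical and serve only to illustrate the predictive effect of fitting a line to the last m data points.

```python
def moving_linear_regression(series, m, lookahead=0):
    """Fit a least-squares line to each window of m points and evaluate it
    at the window's last position plus lookahead steps."""
    if m < 2:
        raise ValueError("m must be at least 2")
    xs = range(m)  # positions 0 .. m-1 inside the window
    x_mean = (m - 1) / 2.0
    sxx = sum((x - x_mean) ** 2 for x in xs)
    result = []
    for n in range(len(series)):
        if n + 1 < m:
            result.append(None)  # not enough data points yet
            continue
        window = series[n + 1 - m : n + 1]
        y_mean = sum(window) / m
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, window))
        slope = sxy / sxx
        intercept = y_mean - slope * x_mean
        result.append(intercept + slope * (m - 1 + lookahead))
    return result


print(moving_linear_regression([1.0, 2.0, 3.0, 4.0, 5.0], 3, lookahead=1))
# prints [None, None, 4.0, 5.0, 6.0]
```

Unlike an average, which can only report the center of the recent range, the fitted line extrapolates the recent trend one step beyond the data seen so far.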
This is precisely the point where the difference between the retrieval of stored information and actual artificial intelligence becomes obvious. The mere flow of stimuli through the neural network suddenly gains a significance that makes it possible for the neural system to react in a meaningful way.