»An approximate solution to an exact problem is not the same
as an exact solution to an approximate problem.«
The great majority of today’s systems of »artificial intelligence« merely reproduce solutions previously laid out for them by human beings. Good teachers, however, teach their students how to learn by furnishing an environment that allows them to apply and exercise their abilities and special talents. Through careful guidance, the teacher promotes the full development of the student’s potential. A look at the current state of development of neural Artificial Intelligence (AI) applications reveals that, as teachers of these systems, humans do not exactly cover themselves in glory.
Have you ever wondered why computers cannot think (yet)? There is an – admittedly provocative – answer to this question: Because we do not allow it (yet).
The Problem of Classical Neural Networks
For over 15 years, neural networks have been used in various fields of application – from optical character recognition to the analysis of time series. Encouraged by successful developments in optical character recognition and other classification problems, the relevant training algorithms were transferred to other fields almost without modification – and often with dubious success.
While the task in optical character recognition is one of making reliable classifications and of recognizing patterns, a completely different set of tasks is at play in the use of neural networks in control engineering or in financial analysis.
Let us consider first the problem of optical character recognition. The task here is to translate one type of information (an image) into another (e.g. a certain letter). A capacity for memorization with a certain degree of fuzziness plays the most important role here. In the training process, the network is presented with a stimulus in the form of an image along with the desired response – for example: »This was an E«.
If we consider problems of control engineering or forecasting, on the other hand, different desirable properties come into focus. Here, the neural network must learn the mapping from a certain input to a certain output in terms of a mathematical model: an external stimulus, formulated as a vector of real numbers (inputs), causes a reaction that likewise consists of a single real number or a vector of real numbers (outputs). This fundamental difference is accounted for by the choice of a different network topology, one suitable for representing such a conversion as a continuous function. Apart from that, however, very similar training methods are employed. The network is presented with a stimulus along with the desired output, and the error that occurs serves as the basis for adjusting the internal network parameters. Thus the training aims at the best possible approximation of a solution provided by the teacher.
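A minimal sketch of this training scheme (all data and names here are invented for illustration): a tiny linear »network« is presented with stimuli and desired outputs, and the error alone drives the adjustment of its internal parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # external stimuli (input vectors)
y = X @ np.array([0.5, -1.0, 2.0])      # desired responses provided by the teacher

w = np.zeros(3)                          # internal network parameters
lr = 0.05                                # learning rate

for epoch in range(200):
    y_hat = X @ w                        # actual output of the network
    error = y_hat - y                    # deviation from the teacher's solution
    w -= lr * X.T @ error / len(X)       # adjust parameters to shrink the error

print("learned weights:", w)             # approaches [0.5, -1.0, 2.0]
```

The update rule does nothing but minimize the deviation from the example solution – precisely the imitation criticized in what follows.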
The electronic miniature brain is forced to learn something by heart, as it were, so as to react to a given input with the desired output as precisely as possible. Afterwards, control algorithms are used to fight this overly precise memorization – a dubious procedure, to say the least.
Does the storage of information equal intelligence? – If we understand intelligence as the ability to acquire information and knowledge and to use them purposefully in familiar and unfamiliar surroundings to solve a problem, it is clear that only one aspect is emphasized in the training of neural networks: the ability to acquire information. The conversion of this information into living knowledge, however, already presents a difficulty. And the situation becomes entirely problematic if we look for characteristics such as creativity and flexibility in neural systems.
The Small Error as the Great Goal
The minimization of the error between the desired and the actual output of the network plays a central role in neural training algorithms. The human »teacher« makes two assumptions in this regard:
- The best possible solution – which the network is to approximate – is known; and imitating the teacher will lead to the goal.
- A small error (small deviation from the desired output) brings us closer to the goal.
Upon closer examination, however, both assumptions are questionable.
Let us consider the problem in terms of an example from the area of finance: a neural network is to support our trading in a security by means of forecasts. To this end, the network should forecast the daily return (percentage change) of the security for the following trading day. The conventional approach uses the actual percentage change on the following day as a training sample – in the hope that, through a continuous and gradual adjustment of parameters, a stable mapping of (known) inputs to (unknown) outputs will form within the neural mesh. Such a straightforward function exists in the case of predicting the next point of a sine curve. But it is highly improbable that such a function exists if the target to be predicted is the result of a complex underlying process that is partly chaotic and/or contains a lot of noise (as is often the case with financial data).
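In code, the conventional setup looks roughly like this (the price series, window length, and variable names are illustrative assumptions): the actual next-day return is used directly as the training target, regardless of how much noise it contains.

```python
import numpy as np

# Hypothetical daily closing prices of the security.
prices = np.array([100.0, 101.5, 100.8, 102.3, 101.9, 103.0])
returns = np.diff(prices) / prices[:-1]          # daily percentage changes

window = 3                                        # look-back length (arbitrary)
X = np.array([returns[i:i + window]               # inputs: the last few returns
              for i in range(len(returns) - window)])
y = returns[window:]                              # target: tomorrow's actual return

# The network is then trained to reproduce y from X as exactly as possible --
# even where y is dominated by noise and no stable mapping exists.
```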
The network has a way of dealing with training samples that do not fit into an approximation function. Areas of the neural mesh that do not contribute to the general approximation function store the response to the inadequately representable sample. These areas are thus overspecialized in this one sample (or in these few samples) – a phenomenon known as »overfitting«. In this way, such a network can achieve good results in the training environment. The samples learned by heart, however, do not help the system prove its worth on unlearned samples – the proper task of such a forecasting network. On the contrary: since in an artificial neural mesh – in contrast to our brain – an external stimulus flows through all regions of the network, these overspecialized regions can later cause interference in practical use and lead to significant errors.
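The effect can be made visible with a deliberately over-flexible model standing in for an over-parameterized network (the polynomial degrees, noise level, and sample sizes are arbitrary choices for this sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.4, size=x.size)  # noisy process

x_train, y_train = x[::2], y[::2]        # training samples
x_test,  y_test  = x[1::2], y[1::2]      # unlearned samples

for degree in (3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    mse = lambda a, b: np.mean((np.polyval(coeffs, a) - b) ** 2)
    print(f"degree {degree}: train MSE {mse(x_train, y_train):.3f}, "
          f"test MSE {mse(x_test, y_test):.3f}")

# Typically the degree-9 fit memorizes the training points (tiny train MSE)
# while its test MSE explodes: the stored noise interferes on unseen data.
```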
Overly precise imitation leads to overspecialization or overfitting. This could be prevented through the use of training scenarios and algorithms that do not aim at imitation by presenting one and only one solution as an example.
An alternative is the formulation of a – to some degree fuzzy – solution space that offers sufficient latitude for various approaches toward a solution. Here it is assumed that we do not know the optimal solution to a given problem. This can be illustrated with the example of a one-day forecast of the movement of a stock – a forecast with which we are pursuing a goal, namely trading profitably. Let us formulate the problem as directly as it presents itself to us: the network is to issue trading recommendations that guarantee a stable monthly return above the money market rate. In this formulation, the pure one-day forecast is now only one – and perhaps a small – part of the solution; and many different approaches lead to our goal.
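A sketch of such a reformulated objective (the signal convention, hurdle rate, and the naive stand-in rule are assumptions for illustration): the model is scored by the outcome of the trades its recommendations generate, relative to a money-market hurdle, rather than by its forecast error.

```python
import numpy as np

def trading_fitness(signals, returns, monthly_hurdle=0.003):
    """Realized return of following the signals, minus a money-market hurdle.

    signals: +1 (long), -1 (short) or 0 (flat), decided at each day's close.
    """
    realized = signals[:-1] * returns[1:]   # today's position earns tomorrow's return
    months = len(returns) / 21              # ~21 trading days per month
    return np.sum(realized) - monthly_hurdle * months

# Any approach that clears the hurdle lies inside the solution space --
# a precise one-day forecast is only one of many admissible routes.
returns = np.array([0.004, 0.002, 0.006, 0.001, 0.003])
signals = np.sign(returns)                  # naive momentum rule as a stand-in
print("fitness:", trading_fitness(signals, returns))
```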
The approach of error minimization is problematic for another reason as well. Let us consider the following chart:
Fig. 1: The chart shows three data points of a time series (solid line) as well as forecasts of two neural models (dashed and dotted lines) issued at the respective points in time. Determining the forecast errors of the models, we see that both display exactly the same error: at every data point, the deviation of each model's forecast from the actual observation is of equal magnitude. Nevertheless, Model 1 is obviously more precise, for it matches the course of the curve, deviating from the observation only in terms of the level. Conventional training algorithms, however, assess both models as equivalent.
The example shows very clearly how an error assessment that is mathematically correct can often fail to meet the requirements of the user. Let us imagine that the pictured series represents the price of a stock and that we want to trade on the basis of the forecasts. If we follow the forecasts of Model 2, we will incur losses on each of the three days. With the forecasts of Model 1, by contrast, we would make a profit, even though its error is of the same magnitude.
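A small numeric analogue of Figure 1 (the values are invented for illustration): both forecast series miss the observations by the same magnitude at every point, so a squared-error assessment ranks the models as equivalent – yet trading on them has opposite outcomes.

```python
import numpy as np

today    = np.array([100.0, 101.0, 102.0])   # prices at forecast time
tomorrow = np.array([101.0, 102.0, 103.0])   # actual next-day prices

model_1 = tomorrow + 1.5    # wrong level, but correct direction (always "up")
model_2 = tomorrow - 1.5    # same error magnitude, but signals "down" each day

for name, f in (("Model 1", model_1), ("Model 2", model_2)):
    mse = np.mean((f - tomorrow) ** 2)
    position = np.sign(f - today)             # trade in the forecast direction
    pnl = np.sum(position * (tomorrow - today))
    print(f"{name}: MSE = {mse:.2f}, trading P&L = {pnl:+.1f}")

# Both models print MSE = 2.25, but Model 1 earns +3.0 while Model 2 loses -3.0.
```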
Thus, in the traditional applications of neural networks, there are at least two mistaken assumptions at the very beginning of the learning process. Provocatively, one could say that we are underrating the actual abilities of neural networks: we are not formulating the problems with the complexity they actually possess (e.g. as a trading strategy instead of a mere forecast).
In human learning, the promotion of learning includes the call or demand to learn. This demand is in the first instance a challenge to the teacher. For the teacher must furnish a suitable learning environment and admit intelligent solutions. Otherwise, such solutions cannot evolve.
Evolution as the Driving Force
We encounter a very similar problem in the use of evolutionary approaches in connection with neural networks. Genetic algorithms are primarily used to find optimal topologies for networks. In evaluating the fitness of a given topology, however, the same questionable error assessments are used as in the training process, with the result that what prevails is not intelligence but merely the best imitator of the teacher.
Especially in evolutionary environments, pure error assessments fall even deeper into the trap of overfitting. If we expect selection and specialization to result in the promotion of artificial intelligence in a learning environment modelled on evolutionary principles, we first have to invest our intelligence and set up criteria of evaluation that reward and promote creative and generalizing approaches.
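One way to encode such a criterion (a hedged sketch; the candidate interface, toy data, and penalty weight are assumptions): fitness is measured on data the candidate never saw during training, with a penalty for complexity, so that pure imitators of the training set no longer prevail.

```python
import numpy as np

def generalizing_fitness(candidate, X_holdout, y_holdout, complexity_weight=0.01):
    """Higher is better: out-of-sample accuracy minus a complexity penalty."""
    predictions = candidate.predict(X_holdout)         # never-trained-on samples
    out_of_sample_error = np.mean((predictions - y_holdout) ** 2)
    penalty = complexity_weight * candidate.n_parameters
    return -(out_of_sample_error + penalty)            # memorizers score poorly

class LinearCandidate:
    """Toy stand-in for an evolved network topology."""
    def __init__(self, w):
        self.w = np.asarray(w)
        self.n_parameters = self.w.size
    def predict(self, X):
        return X @ self.w

rng = np.random.default_rng(2)
X_holdout = rng.normal(size=(50, 3))
y_holdout = X_holdout @ np.array([0.5, -1.0, 2.0])
print(generalizing_fitness(LinearCandidate([0.5, -1.0, 2.0]), X_holdout, y_holdout))
```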
Lack of Courage
The currently established technologies in the area of neural networks betray a lack of courage. The artificial neural meshes contain many more possibilities than have been elicited from them so far. Human beings – developers and teachers of the networks – actually hinder the optimal development of the potential inherent in this technology. Ironically, not only do scientists and developers struggle with these problems, but so do many potential users …
The reason for this lies in archetypal patterns of behavior. There is the fear of losing control, the fear of the artificial »other« that solves problems very differently and perhaps much more efficiently than we do. A sense of superiority also plays a role: the creature must not surpass its creator. A solution we do not completely understand is not regarded as a useful solution, even if it fulfills all other requirements.
Is it perhaps the fear of the HAL phenomenon(1) that effectively leads us to hinder the development of artificial intelligence, even as we begin to use systems capable of artificial intelligence?
The focus of the research and product development of Ivorix is evolving software: programs and program components that are able to learn, to improve themselves and to adapt to changing environmental conditions. In order to be able to develop such intelligence, the applications must perceive and understand their surroundings. They must develop memory structures and a capacity for abstraction that allow them to distinguish between suitable and unsuitable solutions.
The goal of our methodology in the field of neural networks is to admit the unknown. In order to make possible and promote the development of actual artificial intelligence, the problems are formulated in the way they actually present themselves in practical application. In this approach, only indispensable properties of the desired solution are described, but not the path toward the solution of the problem – if it is even known.
The technological advances include more suitable error assessments and, above all, modified training scenarios that demand generally applicable solutions from the network. Methods of regularization, some of which have been rendered practically useful for the very first time, serve merely as »gentle indicators« of direction. And an evolutionary environment consistently oriented towards the promotion of suitably generalizing models makes »life« difficult for mere imitators.
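As one illustration of such a »gentle indicator« (a minimal sketch; the decay strength is an arbitrary choice), a small weight-decay term can be added to the error, nudging the network toward simpler parameter settings without dictating which solution it must produce:

```python
import numpy as np

def regularized_loss(w, X, y, decay=0.001):
    fit_error = np.mean((X @ w - y) ** 2)   # how well the data is matched
    nudge = decay * np.sum(w ** 2)          # gentle pull toward simplicity
    return fit_error + nudge

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.0, 0.0, -0.5, 0.0]) + rng.normal(scale=0.1, size=30)
print(regularized_loss(np.ones(4), X, y))
```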
Such an evolutionary environment allows for the development of neural software components that do not merely rehearse for us what we already know, but that present solutions and approaches to solutions for a variety of problems that were considered intractable or impossible to solve.