Normalization procedures are mathematical transformations of quantities such as time series or vectors. A normalization maps a vector from one value domain into another in such a way that the output vector has certain statistical properties, for example prescribed limit values, a given variance or standard deviation, or a desired mean.
A straightforward example of why such a transformation is needed is the plotting of a real time series in a chart. Suppose we want to chart the development of a stock price over a certain period. Within that period, the price fluctuates between $113.29 and $241.05, and for our chart we have an empty graph of 640×480 pixels. Computer output devices such as printers and displays use whole-number (integer) coordinates, since they assemble images from indivisible pixels. As a result, a price of $113.29 cannot be represented exactly on such a device, even though, as in our case, the price range lies within the range of the pixel coordinates. Two transformations are therefore desirable for drawing this chart:
- The prices (real numbers) would have to be converted to the integer coordinates of the output device in such a way that their proportions are preserved as precisely as possible.
- Since we want to use as much of the available output area as possible, the price range [113.29, 241.05] should also be scaled to the vertical coordinate range [0, 479], i.e. the 480 pixel rows of the chart.
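The two requirements above can be sketched in C++ as a single linear mapping followed by rounding to the nearest whole pixel. This is a minimal illustration under our own assumptions: the helper name `toPixelRow` is hypothetical, pixel row 0 is taken to correspond to the lowest price (a real screen, whose y axis usually points downward, would need the result flipped), and 479 is used as the highest row index of a 480-pixel-high chart.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: maps a price from [lo, hi] linearly onto the integer
// pixel range [0, maxPixel] and rounds to the nearest whole pixel.
int toPixelRow(double price, double lo, double hi, int maxPixel) {
    assert(hi > lo);  // an empty price range cannot be stretched
    // Real-valued position inside the chart; this step preserves proportions.
    const double scaled = (price - lo) / (hi - lo) * maxPixel;
    // Snap to an integer pixel coordinate (this step loses precision).
    return static_cast<int>(std::lround(scaled));
}
```

With the example's price range, `toPixelRow(150.0, 113.29, 241.05, 479)` returns 138, and the two range limits map to rows 0 and 479.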
Both transformations can be carried out in a single step by a suitable normalization procedure, in this case normalization into an interval. The normalization converts the input vector x into a normalized output vector x’, element by element:

$$x'_i = f(x_i), \quad i = 1, \dots, n$$

where f stands for the normalization function. This function can be implemented in various ways, depending on which value domain the transformed vector should occupy and which statistical properties it should exhibit. Here, we present five frequently used normalization procedures, mathematically as well as graphically and in C++ code:
- Normalization into an Interval
- Normalization to Zero Mean
- Normalization to Zero Mean and Unit Standard Deviation
- Normalization to Sum 1
- Normalization to Euclidean Norm 1
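For orientation, the five procedures can be sketched in C++ using their common textbook definitions. The function names and the use of `std::vector<double>` are our own assumptions, and the z-score variant uses the population standard deviation (division by n), which is only one of two common conventions; degenerate inputs (constant vectors, zero sums or norms) are rejected by assertions rather than handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

using Vec = std::vector<double>;

// Arithmetic mean; assumes a non-empty vector.
double mean(const Vec& x) {
    assert(!x.empty());
    return std::accumulate(x.begin(), x.end(), 0.0) / x.size();
}

// Normalization into an interval: maps [min(x), max(x)] linearly onto [a, b].
Vec toInterval(const Vec& x, double a, double b) {
    assert(!x.empty());
    const auto [itLo, itHi] = std::minmax_element(x.begin(), x.end());
    const double lo = *itLo, hi = *itHi;
    assert(hi > lo);  // a constant vector has no range to stretch
    Vec y;
    for (double v : x) y.push_back(a + (v - lo) / (hi - lo) * (b - a));
    return y;
}

// Normalization to zero mean: subtract the mean from every element.
Vec toZeroMean(const Vec& x) {
    const double m = mean(x);
    Vec y;
    for (double v : x) y.push_back(v - m);
    return y;
}

// Zero mean and unit standard deviation (z-score), using the population
// standard deviation (division by n rather than n - 1).
Vec toZScore(const Vec& x) {
    const double m = mean(x);
    double ss = 0.0;
    for (double v : x) ss += (v - m) * (v - m);
    const double sd = std::sqrt(ss / x.size());
    assert(sd > 0.0);
    Vec y;
    for (double v : x) y.push_back((v - m) / sd);
    return y;
}

// Normalization to sum 1: divide each element by the sum of all elements
// (meaningful mainly for non-negative data with a non-zero sum).
Vec toUnitSum(const Vec& x) {
    const double s = std::accumulate(x.begin(), x.end(), 0.0);
    assert(s != 0.0);
    Vec y;
    for (double v : x) y.push_back(v / s);
    return y;
}

// Normalization to Euclidean norm 1: divide by sqrt(sum of squares).
Vec toUnitNorm(const Vec& x) {
    const double norm = std::sqrt(std::inner_product(x.begin(), x.end(), x.begin(), 0.0));
    assert(norm > 0.0);
    Vec y;
    for (double v : x) y.push_back(v / norm);
    return y;
}
```

For x = {2, 4, 6, 8}, for instance, `toInterval(x, 0.0, 1.0)` maps 2 to 0 and 8 to 1, while `toZeroMean(x)` yields {-3, -1, 1, 3}.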
The transformation depends not only on the individual values in x but also on statistical properties of x as a whole, such as its range limits, variance and mean. For example, if we add to x an element that lies outside the range of the other elements, or that noticeably alters the variance of x, this has far-reaching effects on the values and statistical properties of the output vector: previously normalized elements can change even though their input values did not.
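A minimal C++ sketch makes this dependency visible. The helper `minMax01` is hypothetical and assumes normalization into [0, 1]: for the series {10, 15, 20}, the value 15 maps to 0.5, but once the new observation 30 is appended and the series is normalized again, the same value maps to 0.25 and 20 drops from 1 to 0.5.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: min-max normalization into [0, 1].
// The result depends on min(x) and max(x), i.e. on the whole vector.
std::vector<double> minMax01(const std::vector<double>& x) {
    assert(!x.empty());
    const auto [itLo, itHi] = std::minmax_element(x.begin(), x.end());
    const double lo = *itLo, hi = *itHi;
    assert(hi > lo);
    std::vector<double> y;
    y.reserve(x.size());
    for (double v : x) y.push_back((v - lo) / (hi - lo));
    return y;
}
```

Every previously normalized element has shifted after the re-normalization, although none of the underlying values changed.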
This sensitivity presents a problem for time series analyses that are extended daily with new observations. For example, the input vectors of a neural software component must be normalized so that their values lie within a range »visible« to an artificial neuron and so that certain statistical conditions required for the learning process are met.
The output of such a neural component, however, depends heavily on the normalized elements of its input vectors. In practice, the input vectors are normalized anew after each additional observation (e.g. a new price in a historical price series) and then passed through the neural system. The resulting input vector may deviate noticeably from the previous one, so that the signals computed for past data points no longer agree with the signals that were actually issued at the time.
This undermines the historical evaluation of systems that depend on normalized data: if the statistical properties and absolute values of the input vectors fluctuate from day to day, a historical evaluation of such a system says little about how it will behave in practical use.
Ivorix’s data preparation for neural software components accounts for this by keeping the normalization values and statistical properties constant until a component is retrained. This is a precondition for a fair historical assessment of any neural model.
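Ivorix's own implementation is not described here. As a generic illustration of the principle only, the hypothetical C++ class `FrozenMinMax` below fits its scaling parameters once, at training time, and reuses them unchanged for every later observation until `fit` is called again at retraining. New values outside the fitted range then fall outside [0, 1]; whether to clamp them is a design decision the sketch leaves to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical normalizer whose parameters are fixed at training time.
class FrozenMinMax {
public:
    // Record min and max of the training data; call again only when retraining.
    void fit(const std::vector<double>& train) {
        assert(!train.empty());
        const auto [itLo, itHi] = std::minmax_element(train.begin(), train.end());
        lo_ = *itLo;
        hi_ = *itHi;
        assert(hi_ > lo_);
    }

    // Normalize with the stored parameters. Values outside the fitted range
    // map outside [0, 1] unless clamping is requested.
    double transform(double v, bool clamp = false) const {
        const double y = (v - lo_) / (hi_ - lo_);
        return clamp ? std::clamp(y, 0.0, 1.0) : y;
    }

private:
    double lo_ = 0.0;
    double hi_ = 1.0;
};
```

After fitting on {10, 15, 20}, the value 15 keeps mapping to 0.5 even after the observation 30 arrives, which maps to 2.0 (or 1.0 when clamped); the parameters stay frozen until retraining.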
Users of other neural software systems should check whether the systems they use keep normalization values and statistical properties constant in this way, a condition that is essential for practical application.