There is a close correspondence between the behavior of self-organizing neural networks and the statistical method of principal component analysis (PCA). In fact, a self-organizing, fully interconnected 2-layer network with i inputs and j outputs (with i > j) can be used to extract the first j principal components from the input vector, thus reducing the size of the input vector by i − j elements. With j = 1, such a network acts as a maximum eigenfilter, extracting the first principal component from the input vector. We start the discussion of the underlying technique with this simplest case.
Fig. 1: Hebbian-based maximum eigenfilter. The nth sample x(n) of the input vector, consisting of m inputs, is transformed by the weight vector w(n) into the output y(n). For clarity: m is the number of inputs, n the number of training samples.
The algorithm used to train the network is based on Hebb's postulate of learning. It states that a synaptic weight varies with time, growing strong when the presynaptic signal and the postsynaptic signal coincide with each other (see Fig. 1). Assuming that all nodes use linear activation functions and no biases are applied such that

\[ y(n) = \sum_{i=1}^{m} w_i(n)\,x_i(n) \tag{1} \]

we may write

\[ w_i(n+1) = w_i(n) + \eta\,y(n)\,x_i(n) \tag{2} \]
Here n numbers the training samples and η denotes the learning rate. The attentive reader will notice that the unconstrained use of this learning algorithm would drive w_i(n) to infinity, because the weight would always grow but never be decreased. In order to overcome this problem, some sort of normalization or saturation factor needs to be introduced. A proportional decrease of w_i(n) by a normalization term introduces competition among the synapses, which, as a principle of self-organization, is essential for the stabilization of the learning process.
Including a normalization term into Eq. 2, we can rewrite the learning rule as

\[ w_i(n+1) = \frac{w_i(n) + \eta\,y(n)\,x_i(n)}{\Bigl(\sum_{k=1}^{m}\bigl[w_k(n) + \eta\,y(n)\,x_k(n)\bigr]^2\Bigr)^{1/2}} \tag{3} \]

where the sum in the denominator runs over all of the synapses connected to the output neuron. For a low learning rate η, this equation can – without much loss of precision – be simplified to

\[ w_i(n+1) = w_i(n) + \eta\,y(n)\bigl[x_i(n) - y(n)\,w_i(n)\bigr] \tag{4} \]
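The step towards Eq. 4 can be made explicit by a first-order expansion in η (a sketch, assuming the current weight vector has unit norm):

```latex
% Expand the normalized rule to first order in the learning rate \eta,
% assuming \sum_k w_k^2(n) = 1 (the weights are already normalized).
\begin{align*}
w_i(n+1)
  &= \bigl[w_i(n) + \eta\,y(n)\,x_i(n)\bigr]
     \Bigl[1 + 2\eta\,y(n)\textstyle\sum_k w_k(n)\,x_k(n)
           + O(\eta^2)\Bigr]^{-1/2} \\
  &= \bigl[w_i(n) + \eta\,y(n)\,x_i(n)\bigr]
     \bigl[1 - \eta\,y^2(n) + O(\eta^2)\bigr]
     \quad\text{since } \textstyle\sum_k w_k(n)\,x_k(n) = y(n) \\
  &= w_i(n) + \eta\,y(n)\bigl[x_i(n) - y(n)\,w_i(n)\bigr] + O(\eta^2)
\end{align*}
```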
The first two terms in Eq. 4, w_i(n) + η y(n) x_i(n), represent the usual Hebbian modification to w_i(n). They account for the self-amplification effect responsible for the self-organizing nature of the described infrastructure. The third term, −η y²(n) w_i(n), prevents an unlimited growth of w_i(n) and is responsible for stabilization. It transforms the input into a form1

\[ x_i'(n) = x_i(n) - y(n)\,w_i(n) \tag{5} \]

that is dependent on both the synaptic weight w_i(n) and the output y(n), so that the learning rule reads

\[ w_i(n+1) = w_i(n) + \eta\,y(n)\,x_i'(n) \tag{6} \]
During the course of training, we present all n samples of the input vector x to the small network. Initializing the weights to small positive values and choosing η appropriately small, the algorithm converges quickly to its final state. The estimated weight vector w can then be frozen by suspending further adaptation. We may now feed the input vector x into the network and obtain a transformed time series from the output neuron: the first principal component of the input vector.
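The training procedure above can be sketched as follows (a minimal illustration under our own names, not the original implementation; NumPy is assumed):

```python
import numpy as np

def train_eigenfilter(X, eta=0.01, epochs=20, seed=0):
    """Train a single linear neuron with the stabilized Hebbian rule (Eqs. 1, 4).

    X: array of shape (n_samples, m_inputs), assumed to be zero-mean.
    The weight vector converges to the first principal component, i.e. the
    dominant unit-norm eigenvector of the input correlation matrix.
    """
    rng = np.random.default_rng(seed)
    w = rng.uniform(0.0, 0.1, size=X.shape[1])  # small positive initial weights
    for _ in range(epochs):
        for x in X:
            y = w @ x                       # Eq. 1: linear activation
            w += eta * y * (x - y * w)      # Eq. 4: Hebbian growth + decay
    return w

# Toy data: dominant direction v1, weaker orthogonal direction v2.
rng = np.random.default_rng(42)
v1 = np.array([1.0, 1.0]) / np.sqrt(2.0)
v2 = np.array([1.0, -1.0]) / np.sqrt(2.0)
X = rng.normal(size=(500, 1)) * 3.0 * v1 + rng.normal(size=(500, 1)) * 0.5 * v2

w = train_eigenfilter(X)
print(np.linalg.norm(w), abs(w @ v1))  # both approach 1: unit-norm eigenvector
```

Freezing the weights then amounts to simply no longer calling the update; the frozen w projects each sample onto the first principal component via w @ x.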
Combining several maximum eigenfilters as proposed above into one 2-layer network and implementing competition between the output nodes, we can now generate an infrastructure able to perform a complete Hebbian-based PCA (see Fig. 2).
Fig. 2: Principal Component Analysis (PCA) using a Hebbian 2-layer network. The input and output nodes use linear activation functions and are fully interconnected with Hebbian links. The unsupervised Hebbian learning algorithm extracts the first j principal components from input vector x using the weight vector w. The resulting output vector y may then be used as the input vector for a regular feedforward network. In our implementation, the PCA component and the regular computation component are combined as symbionts into one network.
Now, the synaptic weight w_ji(n), connecting input neuron i to output neuron j, is adapted in accordance with a generalized form of Hebbian learning:

\[ \Delta w_{ji}(n) = \eta\Bigl[\,y_j(n)\,x_i(n) - y_j(n)\sum_{k=1}^{j} w_{ki}(n)\,y_k(n)\Bigr] \tag{7} \]
The adjusted weight is calculated as

\[ \mathbf{w}_j(n+1) = \mathbf{w}_j(n) + \Delta\mathbf{w}_j(n) \tag{8} \]

where the vector \mathbf{w}_j = (w_{j1}, …, w_{jm}) consists of all the synaptic links connected to neuron j, and the output of neuron j follows from the forward pass

\[ y_j(n) = \sum_{i=1}^{m} w_{ji}(n)\,x_i(n) \tag{9} \]

In contrast, the weights w_{ki} summed over in Eq. 7 form the vector of synaptic links leading from input neuron i to the output neurons k = 1, …, j. Using the notation as before, we can rewrite Eq. 7 as

\[ \Delta w_{ji}(n) = \eta\,y_j(n)\bigl[x_i'(n) - w_{ji}(n)\,y_j(n)\bigr] \tag{10} \]
where again x_i'(n) denotes a modified version of the ith element of the input vector, which is now a function of the index j:

\[ x_i'(n) = x_i(n) - \sum_{k=1}^{j-1} w_{ki}(n)\,y_k(n) \tag{11} \]
Going one step further, we can define

\[ x_i''(n) = x_i'(n) - w_{ji}(n)\,y_j(n) \tag{12} \]
and rewrite Eq. 7 in a form that corresponds to Hebb's postulate of learning:

\[ \Delta w_{ji}(n) = \eta\,y_j(n)\,x_i''(n) \tag{13} \]
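Substituting the modified inputs back in confirms that this compact Hebbian form is equivalent to Eq. 7 (a short check, using the definitions above):

```latex
\begin{align*}
\Delta w_{ji}(n)
  &= \eta\,y_j(n)\,x_i''(n)
   = \eta\,y_j(n)\bigl[x_i'(n) - w_{ji}(n)\,y_j(n)\bigr] \\
  &= \eta\,y_j(n)\Bigl[x_i(n) - \sum_{k=1}^{j-1} w_{ki}(n)\,y_k(n)
     - w_{ji}(n)\,y_j(n)\Bigr] \\
  &= \eta\Bigl[y_j(n)\,x_i(n)
     - y_j(n)\sum_{k=1}^{j} w_{ki}(n)\,y_k(n)\Bigr]
\end{align*}
```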
In contrast to the maximum eigenfilter, which requires only a forward information flow for the adaptation, the implementation of the PCA network requires an additional backward information flow. The algorithm can be split into the following steps:
- The input vector is passed forward through the network according to Eq. 9.
- During a feedback pass, we iterate over all neurons in the output layer and their backward links2 in order to calculate x_i''(n) for the particular link.
- The weights are updated according to Eq. 8.
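These steps can be sketched as follows (our own minimal array-level illustration of the two-pass scheme, not the original link-level object design):

```python
import numpy as np

def gha_step(W, x, eta):
    """One training step of the generalized Hebbian learning rule.

    W: weight matrix of shape (j_outputs, m_inputs); x: one input sample.
    Forward pass first, then a feedback pass that carries the modified
    input x' from output neuron to output neuron (Eqs. 9-13).
    """
    y = W @ x                          # forward pass (Eq. 9)
    x_prime = x.copy()                 # running modified input x'
    for j in range(W.shape[0]):        # feedback pass over output neurons
        x_pp = x_prime - W[j] * y[j]   # x'' for the links into neuron j (Eq. 12)
        W[j] += eta * y[j] * x_pp      # weight update (Eqs. 8, 13)
        x_prime = x_pp                 # x'' becomes x' for the next neuron
    return W

# Toy data spanned by two orthonormal directions with variances 9 and 1.
rng = np.random.default_rng(0)
basis = np.array([[1.0, 1.0, 0.0], [1.0, -1.0, 1.0]])
basis /= np.linalg.norm(basis, axis=1, keepdims=True)
X = (rng.normal(size=(1000, 1)) * 3.0 * basis[0]
     + rng.normal(size=(1000, 1)) * 1.0 * basis[1])

W = rng.uniform(0.0, 0.1, size=(2, 3))
for _ in range(30):
    for x in X:
        gha_step(W, x, eta=0.005)
print(np.round(W @ W.T, 2))            # ≈ identity once converged
```

Note that x'' is computed with the old weight W[j] before that weight is updated, and then reused as the next neuron's x'; this is exactly the bookkeeping role the footnoted variables play.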
The Hebbian-based maximum eigenfilter as well as the PCA network require the input vector to be normalized to zero mean. Additionally, we found a normalization to unit standard deviation useful. The reduced output vector y inherits the zero-mean property. However, the standard deviation of the elements of y varies.
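The required preprocessing takes only a few lines (a sketch; the function name is our own):

```python
import numpy as np

def normalize_inputs(X):
    """Shift each input channel to zero mean and scale it to unit standard
    deviation, as required by the Hebbian eigenfilter and PCA networks.
    X has shape (n_samples, m_inputs); statistics are taken per channel."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0.0] = 1.0              # guard: leave constant channels as-is
    return (X - mean) / std

X = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 20.0]])
Xn = normalize_inputs(X)
print(Xn.mean(axis=0), Xn.std(axis=0))  # ~[0. 0.] and [1. 1.]
```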
In general, the algorithm converges very quickly. There is, however, one serious problem: errors in the input vector (e.g. a forgotten split for a particular price) may cause some weights to break through the constraint mechanism and grow towards infinity, which destroys the PCA filter. The implementation should throw an exception when a particular weight grows above an appropriate upper limit.
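The suggested safeguard can be sketched as follows (hypothetical names; the appropriate limit is application-dependent):

```python
import numpy as np

class WeightOverflowError(Exception):
    """Raised when a weight breaks through the stabilizing constraint."""

def checked_oja_step(w, x, eta, limit=100.0):
    """One stabilized Hebbian update (Eq. 4) that aborts if any weight
    exceeds `limit`, e.g. after a corrupted input sample (a forgotten
    split in a price series) has destabilized the filter."""
    y = w @ x
    w = w + eta * y * (x - y * w)
    if np.any(np.abs(w) > limit):
        raise WeightOverflowError("weight grew beyond %.1f" % limit)
    return w

w = np.array([0.6, 0.8])
w = checked_oja_step(w, np.array([1.0, 0.5]), eta=0.01)  # normal sample: fine
try:
    # A grossly corrupted sample drives the update term off the scale.
    checked_oja_step(w, np.array([1e6, 1e6]), eta=0.01)
except WeightOverflowError as e:
    print("training aborted:", e)
```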
- We introduce the use of x_i'(n) in order to simplify the derivation of the more complex case of a complete Hebbian-based PCA.
- From a programmer’s point of view, x_i'(n) and x_i''(n) should be seen as variables. The proposed algorithm acts on the link level, and the update should be a method of the object representing the synaptic link. The variable x_i'(n) is a property of neuron i; the sum it subtracts from x_i(n) is set to zero before the feedback pass starts. The variable x_i''(n) is temporary, introduced merely in order to avoid multiple calculations of x_i'(n) − w_ji(n) y_j(n).