Betriebswirtschaftliche Publikationen - Gottfried Rudorfer's Homepage

Master Thesis of Gottfried Rudorfer

2010-09-13T19:26:51+00:00

Master Thesis of Gottfried Rudorfer

Title: Anwendung künstlicher Neuronaler Netzwerke als datengetriebenes Analysewerkzeug.

The master thesis is physically available at the library of the University of Economics and Business administration http://permalink.obvsg.at/wuw/AC00743182 .

The master thesis is available here.

Early Bankruptcy Detection Using Neural Networks

2010-09-13T18:42:04+00:00

Gottfried Rudorfer, Early Bankruptcy Detection Using Neural Networks,
APL Quote Quad, ACM New York, Volume 25, Number 4, June 1995, pages 171-178

Early Bankruptcy Detection Using Neural Networks

Gottfried Rudorfer

Department of Applied Computer Science

University of Economics and Business Administration

Augasse 2-6

A-1090 Vienna

Austria

gottfried@rudorfer.homedns.org

Download the PDF-version here.

Abstract

In 1993, Austria had the highest number of bankruptcies since 1945. The total liabilities came to approx. US$ 3 Billion.

Powerful tools for early detection of company risks are very important to avoid high economic losses. Artificial neural networks(ANN) are suitable for many tasks in pattern recognition and machine learning. In this paper we present an ANN for early detection of company failures using balance sheet ratios. The network has been successfully tested with real data of Austrian private limited companies. The research activities included the design of an APL application with a graphical user interface to find out the relevant input data and tune the ANN.

The developed APL workspace takes advantage of modern windowing features running on IBM compatible computers.

Keywords Artificial Neural Networks, Backpropagation, Discriminant Analysis, Bankruptcy, Balance Sheet Ratios, APL.

Motivation

Neural nets are currently being used to solve problems in many different research areas. Generally, they tend to assist or become an alternate way for traditional statistical and mathematical models.

Their main practical applications in economics have been for time series forecasting (i.e. [KR94]) and classification tasks (i.e. [TK90]).

Banks usually check the creditworthiness of companies to find the maximum amount of credit they are prepared to grant. The models used are formulated as a classification problem in a multidimensional space defined by a set of financial ratios calculated using the balance sheet.

In this paper we try to use neural networks as replacement for the widely used statistical discriminant analysis to early detect company failures. With this approach we bypass the disadvantages and problems with statistical discriminant analysis:

The major problem with statistical discriminant analysis is the assumption of a multivariate normal distribution of the sample data. This assumption is very often violated in practical problems.

We use multilayer feedforward networks also known as multilayered perceptrons (MLPs), because they are a class of universal approximators.

There exists a formal proof that "standard multilayer feedforward networks with as few as one hidden layer using arbitrary squashing functions are capable of approximating any Borel measurable function from one finite dimensional space to another to any desired accuracy, provided sufficiently many hidden units are available". [HSW89].

The networks under investigation have the following properties:

the net consists of many simple processing devices grouped into layers
one input/output pair is randomly selected and passed to the artificial neural network after transforming the data
the activation levels are propagated starting at the input layer through the hidden layers to the output layer
the activation level of the neurons in the output layer is fed to the environment
at least on hidden layer is necessary to be able to use the net as a universal approximator

Discriminant analysis tries to find a linear subspace of the given patterns such that projections of the patterns onto the subspace are grouped into well separated clusters. In our case, balance sheet ratios are used to assign companies into two clusters, one of sound and one for insolvent companies.

The relationships between statistical discriminant analysis and multilayered perceptrons (MLPs) shows the evidence of generic properties of MLPs classifiers [GTBFS91]. In other words, linear MLPs can be trained to perform mean square classification to discriminant analysis. The authors of the cited paper [GTBFS91] first concentrate on linear MLPs with one hidden layer. Such networks have two weight matrices W1 for the weights from the input-to-hidden and W2 for the weights from the hidden-to-output layers. These two weight matrices are adapted by the backpropagation learning algorithm. If this algorithm minimizes the error between desired and network output vectors, the first layer of the MLP performs discriminant analysis projection using the weights from input-to-hidden W1 and the second layer performs a classification on the output of the hidden units using the weights from hidden-to-output layers.

In this application neural nets perform statistical analysis of data. Therefore they require a large set of data to yield optimal estimation of the parameters. This also implies that the number of given patterns is larger than the dimension of the space in which they sit. Each cluster should be described by enough patterns belonging to that cluster.

Implementation

The heart of your forecasting system is a "multilayer feedforward network", also known as "multilayered perceptron" or "feed forward networks" trained with the "generalized delta rule", also known as "backpropagation" [RHW86]. For a detailed description of this type of network see [HKP91] and [Nil90] and their APL implementation see [KR94], [Alf91], [ES91], [Pee81] and [SS93]. It was fist implemented using Dyalog APL version 6.1 release 3 on HP9000/700 workstations [Dya91]. The code implementing the graphical user interface (GUI) was then ported to Dyalog APL Version 7.0 release 1 for MS-Windows on IBM compatible PCs [Dya94].

Figure 1: User interface of the simulator

The system consists of the following components:

The runtime version of the APL interpreter. With this APL dialect the vendor allows distribution of the runtime interpreter version free of charge.
The APL ANN workspace implements the classification system. The functions in the workspace are grouped into the following namespaces:
- Functions that create graphical objects like a window showing the multilayered perceptron network (see Figure 1). Often other than default properties are specified in GUI functions. Properties define an object's behavior, appearance and events that the object can generate. For each object able to generate events, a "callback function" can be assigned that is called when the corresponding event occurs. Events are very important for an application with a graphical front-end because they define how the objects react to the user. As an example, we show the APL code that allows the user to move the processing devices within the window.
- Functions for component file input and output. Because the execution time of such simulations is very high, logging facilities of the application must be provided. The application takes extensively advantage of nested vectors and matrices. With the component filesystem, APL variables can be created/appended directly to the hard disk.
- The last but not least part is the APL artificial neural network-code as described in [KR94]. The code consists of three individual parts: During network training, a main loop is used to present many input patterns to the artificial neural network. Within the loop there is a function that reads in and transforms the training data. The "FORWARD" function propagates the input pattern through the hidden layer(s) to the output layer. After the output value of the net is known, an error is calculated and the weight change is propagated backwards through the hidden layers to the input layer using the "BACKWARD" function.

The system is generally controlled via the graphical frontend because there is no session manager in the runtime version of the interpreter available.

Figure 1 shows the window of the "Multiple-Document Interface (MDI)" application. The MDI window consists of a menu bar and a tool bar at the top and a status field at the bottom. Within the main window, there are windows for general control of the system and for displaying details of the artificial neural network. The backprop view shows the activation level of the neurons (circles) and the weight values (lines). The window dump shows a net with 5 input, 3 hidden units and 1 output unit (5-3-1). A big filled circle indicates that the neuron fires and a transparent circle indicates (almost) no output of the neuron. The size of the circle is proportional to the output of a neuron. The lines connecting to each neuron from the previous layer are the graphical representation of the weight values. These can be negative or positive real numbers, which are encoded in two different colors. Again, a thick line represents a weight with a high negative or positive value.

Modeling

The Training Data

In order to indicate more practical relevance, we decided to use real data. First, we studied the insolvency statistics of the year 1993 [Hie93]:

total amount of insolvencies: 5,082 (+38 %)
total amount of obligations: US$ 2.9 Billion
17,000 employees evolved

In 1994, the "Atomic" company, one of the world's biggest producer of skiis became insolvent. Until now, the year 1994 seems to be another bad year of many insolvencies.

If we split up the total number of insolvencies according to types of company, we get the following statistics [Hie93]:

type of company	percentage of total
sole trader/partnership	33.87
unlimited companies	1.32
private limited companies	50.66
public limited companies	0.44
other	13.71

Table 1: Which company types have a high risk to become insolvent?

Another important information is the membership of an individual company in the branch of industry. If we consider the branch of industry, the classification capability will probably be better. The metal-processing, paper, building and construction and the textile industry are the most jeopardized area of business in Austria.

After studying the possible sources of data about insolvent and sound companies, we decided to only survey private limited companies with a minimum turnover of US$ 30,000. Firstly private limited companies tend to become more often insolvent than other company types. Secondly private limited companies with more than ATS 1 Million common share or more than 300 employees have to publish their balance sheet.

What relationships between balance sheet items should be calculated to best describe the financial position of a business firm?

We decided to calculate the following five financial ratios to train our artificial neural network[Ble85]:

cash flow / liabilities
quick (current) assets / current liabilities
quick (current) assets / total assets
liabilities / total assets
profit or loss / total assets

As test bed for our simulations we used 82 balance sheets, 59 of sound and 23 of insolvent companies. In order to average statistical outliers, we tried to get balance sheets of three successive years from each business firm. The 82 balance sheets were grouped in 62 training- and 20 test patterns.

The following figures show the distribution of the parameters:

Figure 2: Distribution of balance sheet ratio cash flow / liabilities

Figure 3: Distribution of balance sheet ratio quick (current) assets / current liabilities

Figure 4: Distribution of balance sheet ratio quick (current) assets / total assets

Figure 5: Distribution of balance sheet ratio liabilities / total assets

Figure 6: Distribution of balance sheet ratio profit or loss / total assets

The Algorithm

During the learning phase a randomly selected input/output pair is presented to the network. The input/output data is transformed into a range that the ANN can process. The output values are set to 1 for insolvent companies and to 0 for all others. The simulation environment helps the researcher to find a well performing network. Typically a session is divided into the following steps:

Select a file for saving the network status in the file box menu
Load the training data with the input/output pattern selection menu
Set the network parameters (learning rate, momentum term, output function, maximum number of iterations) in the control menu
Start learning
Load test patterns and test the network

We have conducted many experiments to find the size of a well performing network. Table 2 and Figure 7 show the results of different network topologies in the test data sample after 5,000 learning iterations, using a learning rate of 0.3 and a momentum of 0.8.

Table 2: Results for various network topologies

Figure 7: The performance of different network topologies

Because the network output has values between 0 and 1, we have to introduce a threshold to assign a company to the insolvent or sound cluster. For a threshold of 0.5, the 5-3-1 and 5-5-1 networks have the best performance with one wrong classification of an insolvent company (number 18).

We have chosen the smaller network for closer inspection.

Figure 8: Graphical view of the found 5-3-1 network

The weight matrix between input (L0) and hidden (L1) layer is given in Table 3:

Table 3: The weight matrix between input and hidden layer

The weight vector between hidden (L1) and output (L2) layer is shown in Table 4:

Table 4: Weight vector between hidden and output layer

The Data was transformed with linear functions (see Table 5).

Table 5: Minimum and maximum values of original and transformed data

One neuron in the hidden layer is a detector for a sound and another for an insolvent company. The third neuron has a very small impact on the final result because of small weight values.

A company seems to be in danger when the ratio liabilities / total assets or quick assets / assets has a high positive value. On the other side, sound companies have small liabilities / total assets and quick assets / assets ratios.

Dyalog APL GUI Code

In this paper we want to present a piece of code, that implements the graphical user interface (GUI). The following functions implement the neuron (circles in figure) movable functionality. This enables the user to reposition the processing devices in the "backprop view" window. The circle is redrawn by APL itself. Only the weights (lines) connected to the neuron needs to repositioned. This is done with the function MOVE.

Conclusion

In this paper we have presented an APL tool for early detection of company failures using neural networks. This computing devices proved themselves to be a viable alternative to discriminant analysis.

With this workspace we were able to find networks for detection of company failures. Graphical visualization helps understanding complex relations that exist in neural nets.

Further work will include the improvement of the APL product and a detailed documentation of the user interface.

References

[Alf91] M. Alfonseca. Advanced applications of APL: logic programming, neural networks and hypertext. IBM Systems Journal, 30(4):543-553, 1991.

[Ble85] Ernst Bleier. Insolvenzfrüherkennung mittels praktischer Anwendung der Diskriminanzanalyse. Service Fachverlag an der Wirtschaftsuniversität Wien, Augasse 2-6, 1090 Vienna, Austria, 1985.

[Dya91] Dyadic Systems Limited, Riverside View, Basing Road, Old Basing, Basingstoke, Hampshire RG24 0AL, England. Dyalog Apl Users Guide for version 6.1, 1991.

[Dya94] Dyadic Systems Limited, Riverside View, Basing Road, Old Basing, Basingstoke, Hampshire RG24 0AL, England. Dyalog Apl User Guide, Language Reference, Windows Interface and Outer Products for Version 7.0, 1994.

[ES91] Richard M. Evans and Alvin J. Surkan. Relating Numbers of Processing Elements in a Sparse Distributed Memory Model to Learning Rate and Generalization. ACM APL Quote Quad, 21(4):166-173, 1991.

[GTBFS91] P. Gallinari, S. Thiria, F. Badran, and F. Folgelman-Soulie. On the Relations Between Discriminant Analysis and Multilayer Perceptrons. Neural Networks, 4:349-360, 1991.

[Hie93] Klaus Hierzenberger. Bericht zur Insovenzstatistik 1993. Kreditschutzverband von 1870, 1993.

[HKP91] John Hertz, Anders Krogh, and Richard G. Palmer. Introduction to the Theory od Neural Computation. Addison Wesley, Redwood City, California, 1991.

[HSW89] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer Feedforward Networks are Universal Approximators. Neural Networks, 2:359-366, 1989.

[KR94] Thomas Kolarik and Gottfried Rudorfer. Time Series Forecasting Using Neural Networks. ACM APL Quote Quad, 25(1):86-94, 1994.

[Nil90] Nils J. Nilsson. The Mathematical Foundations of Learning Machines. Morgan Kaufmann Publishers Inc., San Mateo, 1990.

[Pee81] Howard A. Peele. Teaching A Topic in Cybernetics with APL: An Introduction to Neural Net Modelling. ACM APL Quote Quad, 12(1):235-239, 1981.

[RHW86] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by backpropagating errors. Nature, 323(9):533-536, October 1986.

[RW94] Gottfried Rudorfer and Harald Wenisch. Isolvenzprognose mit Künstlichen Neuronalen Netzen. University of Economics and Business Administration, Augasse 2-6, 1090 Wienna, Austria, 1994.

[SS93] Alexei N. Skurihin and Alvin J. Surkan. Identification of Parallelism in Neural Networks by Simulation with Language J. ACM APL Quote Quad, 24(1):230-237, 1993.

[TK90] Kar Yan Tam and Melody Kiang. Predicting Bank Failures: A Neural Network Approach. Applied Artificial Intelligence, 4:265-282, 1990.

Time Series Forecasting Using Neural Networks

2010-09-12T21:02:40+00:00

Time Series Forecasting Using Neural Networks

Thomas Kolarik and Gottfried Rudorfer
Department of Applied Computer Science
Vienna University of Economics and Business Administration
Augasse 2-6, A-1090 Vienna, Austria
gottfried@rudorfer.homedns.org

Abstract

Artificial neural networks are suitable for many tasks in pattern recognition and machine learning. In this paper we present an APL system for forecasting univariate time series with artificial neural networks. Unlike conventional techniques for time series analysis, an artificial neural network needs little information about the time series data and can be applied to a broad range of problems. However, the problem of network ``tuning'' remains: parameters of the backpropagation algorithm as well as the network topology need to be adjusted for optimal performance. For our application, we conducted experiments to find the right parameters for a forecasting network. The artificial neural networks that were found delivered a better forecasting performance than results obtained by the well known ARIMA technique.

Motivation

Time series analysis as described by most textbooks [Cha91] relies on explicit descriptive, stochastic, spectral or other models of processes that describe the real world phenomena generating the observed data.

Usually, the parameters of a standard model like the ARIMA technique [BJ76] are derived from the autocorrelation and frequency spectrum of the time series. Problems with the ARIMA approach arise with time series of increasing variance or when the time series represents nonlinear processes.

The usage of artificial neural networks for time series analysis relies purely on the data that were observed. As multi layer feed forward networks with at least one hidden layer and a sufficient number of hidden units are capable of approximating any measurable function [HSW89,SS91], an artificial neural network is powerful enough to represent any form of time series. The capability to generalize allows artificial neural networks to learn even in the case of noisy and/or missing data. Another advantage over linear models like the ARIMA technique is the network's ability to represent nonlinear time series.

The APL programming language is very suitable for the task of implementing neural networks [Alf91,Pee81,ES91,SS93] because of its ability of handle matrix and vector operations. The forward and backward paths of a fully connected feed forward network can be implemented by outer and inner products of vectors and matrices in a few lines of APL code.

For our application, we decided to use a fully connected, layered, feed forward artificial neural network with one hidden layer and the backpropagation learning algorithm. The next section gives a short overview of the relevant definitions and algorithms.

A Very Short Introduction to Artificial Neural Networks

As mentioned above, our simulations utilized the ``multi layered perceptron model'' (MLP), also known as ``feed forward networks'' trained with the ``generalized delta rule'', also known as ``backpropagation''.

The foundations of the backpropagation method for learning in neural networks were laid by [RHW86].

Artificial neural networks consist of many simple processing devices (called processing elements or neurons) grouped in layers. Each layer is identified by the index . The layers and L are called the ``input layer'' and ``output layer'', all other layers are called ``hidden layers''. The processing elements are interconnected as follows: Communication between processing elements is only allowed for processing elements of neighbouring layers. Neurons within a layer cannot communicate. Each neuron has a certain activation level a. The network processes data by the exchange of activation levels between connected neurons (see figure 1):

**Figure:** Exchange of activation values between neurons

The output value of the i-th neuron in layer l is denoted by x_i^(l). It is calculated with the formula

x_i^(l) = g(a_i^(l))

where is a monotone increasing function. For our examples, we use the function (the ``squashing function''). The activation level a_i^(l) of the neuron i in layer l is calculated by

a_i^(l) = f(u_i^(l))

where is the activation function (in our case the identity function is used).

The net input u_i^(l) of neuron i in layer l is calculated as

where w_ij^(l) is the weight of neuron j in layer l-1 connected to neuron i in layer l, x_j^(l-1) is the output of neuron j in layer l-1. is a bias value that is subtracted from the sum of the weighted activations.

The calculation of the network status starts at the input layer and ends at the output layer. The input vector I initializes the activation levels of the neurons in the input layer:

For the input layer, is the identity function. The activation level of one layer is propagated to the next layer of the network. Then the weights between the neurons are changed by the backpropagation learning rule. The artificial neural network learns the input/output mapping by a stepwise change of the weights and minimizes the difference between the actual and desired output vector.

The simulation can be divided into two main phases during network training: A randomly selected input/output pair is presented to the input layer of the network. The activation is then propagated to the hidden layers and finally to the output layer of the network.

In the next step the actual output vector is compared with the desired result. Error values are assigned to each neuron in the output layer. The error values are propagated back from the output layer to the hidden layers. The weights are changed so that there is a lower error for a new presentation of the same pattern. The so called ``generalized delta rule'' is used as learning procedure in multi layered perceptron networks.

The weight change in layer l at time v is calculated by

where is the learning rate and is the momentum. Both are kept constant during learning. is defined

1.

for the output layer (l = L) as

where g'(u_i^(L)) is the gradient of the output function at u_i^(L).The gradient of the output function is always positive.

**Figure 2:** Weight adaptation between two neurons

The formula can be explained as follows: When the output x_k^(l) of the neuron i in layer l is too small, has a negative value. Hence the output of the neuron can be raised by increasing the net input u_k^(l) by the following change of the weight values:

if x_i^(l-1) > 0, then increase w_ki^(l)

if x_i^(l-1) < 0, then decrease w_ki^(l)

The rule applies vice versa for a neuron with an output value that is too high (see figure 2).

2.

for all neurons underneath the output layer (l < L) is defined by:

Finally the weights of layer l are adjusted by

Implementation

The time series modeling and forecasting system was implemented using Dyalog APL on HP 9000/700 workstations [Dya91] using the X11 window interface routines provided by the Xfns auxiliary processor. The system consists of two main components:

A toolkit of APL functions that drive the neural network and log parameters and results of the simulation runs to APL component files.
An X11-based graphical user interface that allows the user to navigate through the simulations and to compare the actual time series with the one generated by the neural network.

Figure 3 presents a screen dump of the user interface: the broken line shows the actual time series, the solid line represents the network's output. The forecast data is separated from the historical data - the network's training set - by the vertical bar in the right quarter of the graph. The menu in the upper left corner of figure 3 allows the user to select a view of the network's forecasting capability at different states throughout the learning phase.

By browsing through the logfiles of the simulation runs, past and present results can be compared and analyzed.

**Figure 3:** User interface of the forecasting system

Modeling

The Training Sets

As test bed for our forecasting system we used two well known time series from [BJ76]: The monthly totals of international airline passengers (thousand of passengers) from 1949 to 1960 (see figure 4), and the daily closing prices of IBM common stock from May 1961 to November 1962 (see figure 5).

**Figure 4:** International airline passengers

**Figure 5:** IBM share price

Table 1 gives some characteristics of these two time series: is the standard deviation, the mean, and n the number of observations. The airline time series is an example of time series data with a clear trend and multiplicative seasonality, whereas the IBM share price shows a break in the last third of the series and no obvious trend and/or seasonality.

**Figure:** Properties of time series

The next section is concerned with the question: How can a neural network learn a time series?

The Algorithm

The neural network sees the time series in the form of many mappings of an input vector to an output value (see figure 6). This technique was presented by [CMMR92].

A number of adjoining data points of the time series (the input window ) are mapped to the interval [0,1] and used as activation levels for the units of the input layer. The size s of the input window corresponds to the number of input units of the neural network. In a forward path, these activation levels are propagated over one hidden layer to one output unit. The error used for the backpropagation learning algorithm is now computed by comparing the value of the output unit with the transformed value of the time series at time t+1. This error is propagated back to the connections between output and hidden layer and to those between hidden and output layer. After all weights have been updated accordingly, one presentation has been completed. Training a neural network with the backpropagation algorithm usually requires that all representations of the input set (called one epoch) are presented many times. In our examples, we used 60 to 138 epochs.

For the learning of time series data, the representations were presented in a randomly manner: As reported by [CMMR92], choosing a random location for each representation's input window ensures better network performance and avoids local minima.

**Figure 6:** Learning a Time Series

The next section is concerned with the selection of the right parameters for the learning algorithm and the selection of a suitable topology for the forecasting network.

Network Parameters

The following parameters of the artificial neural network were chosen for a closer inspection:

The learning rate
() is a scaling factor that tells the learning algorithm how strong the weights of the connections should be adjusted for a given error. A higher can be used to speed up the learning process, but if is too high, the algorithm will ``step over'' the optimum weights. The learning rate is constant across presentations.
The momentum
The momentum parameter () is another number that affects the gradient descent of the weights: To prevent each connection from following every little change in the solution space immediately, the momentum term is added that keeps the direction of the previous step [HKP91], thus avoiding the descent into local minima. The momentum term is constant across presentations.
The number of input and the number of hidden units (the network topology).
The number of input units determines the number of periods the neural network ``looks into the past'' when predicting the future. The number of input units is equivalent to the size of the input window.

Whereas it has been shown that one hidden layer is sufficient to approximate continuous function [HSW89], the number of hidden units necessary is ``not known in general [HKP91]''. Other approaches for time series analysis with artificial neural networks report working network topologies (number of neurons in the input-hidden-output layer) of 8-8-1, 6-6-1 [CMMR92], and 5-5-1 [Whi88].

To examine the distribution of these parameters, we conducted a number of experiments: In subsequent runs of the network, these parameters were systematically changed to explore their effect on the network's modeling and forecasting capabilities.

We used the following terms to measure the modeling quality s_m and forecasting quality s_f of our system: For a time series

(1)

(2)

where is the estimate of the artificial neural network for period i and r is the number of forecasting periods. The error s_m (equation 1) estimates the capability of the neural network to mimic the known data set, the error s_f (equation 2) judges the networks's forecast capability for a forecast period of length r. In our experiments, we used r=20.

Note: For reasons of clarity, in this section we only present graphics for the IBM share price time series. The graphics for the airline passenger time series are very similar.

The figures 7 and 8 demonstrate the effect of variations of the learning rate and the momentum on the modeling (figure 7) and forecast (figure 8) quality: both graphics give evidence for the robustness of the backpropagation algorithm, high values of both and should be avoided.

**Figure:** Learning rate and momentum, IBM share price, modeling quality

**Figure:** Learning rate and momentum, IBM share price, forecasting quality

The figures 10 and 11 present the effect of different network topologies on the modeling (figure 10) and forecasting (figure 11) quality: The number of input units and the number of hidden units open an interesting view: artificial neural networks with more than approx. 50 hidden units are not suited for the task of time series forecasting. This tendency of ``over-elaborate networks capable of data-miming'' is also reported by [Whi88].

Another parameter we have to consider is the number of presentations. A longer training period does not necessarily result in a better forecasting capability. Figure 9 demonstrates this ``overlearning'' effect for the IBM share price time series: with an increasing number of presentations, the network memorizes details of the time series data instead of learning its essential features. This loss of generalization power has a negative effect on the network's forecasting ability.

**Figure 9:** Modeling vs. forecasting ability

These estimations of the network's most important parameters, although rough, allowed us to choose reasonable parameters for our performance comparison with the ARIMA technique, described in the next section.

**Figure:** Number of input and hidden units, IBM share price, modeling quality

Figure: Number of input and hidden units, IBM share price, forecasting quality

Comparison with ARIMA Modeling

We compared our results with the results of the ARIMA procedure of the SAS software, an integrated system for data access, management, analysis and presentation. The implementation of the ARIMA procedure of SAS follows the programs described by Box and Jenkins in Part V of their classic [BJ76].

The ARIMA model is called an autoregressive integrated moving average process of order (p, d, q). It is described by the equation

where X_t stands for in time ordered values of a time series, for n observations. U_t is a sequence of random values called ``white noise'' process. The backward difference operator is defined as

The variable d states how often the difference should be calculated, z is the so called backward shift operator which is defined as z^m X_t = X_t-m. The autoregressive operator a(z) of order p is defined as

the moving average operator b(z) of order q is defined as

We fitted an ARIMA model for each time series using the SAS system and let it predict the next 20 observations of the time series. The last 20 observations were dropped from the time series and used to calculate the prediction error of the models.

The following ARIMA models were calculated for the airline passenger time series (after a logarithmic transformation):

(1-z)(1-z¹²)X_t = (1 - 0.24169z - 0.47962z¹²) U_t

and for the IBM time series:

(1-z) X_t = (1 - 0.10538z) U_t

As an opponent for the ARIMA modeling technique, we selected those networks that delivered the smallest forecast error s_f for the respective time series data:

**Figure:** Number of input and hidden units, IBM share price, forecasting quality
series			# input	# hidden
			units	units
airline	0.1	0.9	70	45
IBM	0.1	0.9	80	30

In Table 2 the prediction errors for the artificial neural network (ANN), the artificial neural network using the logarithmic and transformation (ANN log,) and the ARIMA model are compared: The artificial neural network using the logarithmic and transformed time series outperformed the ARIMA models for both time series, whereas the ``simple'' artificial neural network predicted more accurately only for the IBM shares time series. This behavior can be explained as follows: the larger data range of the airline passenger time series leads to a loss of precision for the untransformed input set. Differencing and logarithmic transformations helped to eliminate the trend and mapped the time series data into a smaller range.

Figure: Forecasting errors for ANN and ARIMA model

Dyalog APL ANN Code

These functions present a complete implementation of the definitions given in the introduction. The main function NET learns the well known XOR (exclusive or) mapping. For reasons of simplicity, the input-output mapping of the XOR function is coded into the function NET.

BACKWARD

FORWARD

GRADIENT

NET

SQUASH

Conclusion and Further Work

We have presented a forecasting system for univariate time series that uses artificial neural networks. This computing devices proved themselves to be viable alternatives to conventional techniques. The system can be used in conjunction with other techniques for time series analysis or as a stand-alone tool.

Further work will include the comparison with other time series analysis techniques, development of hybrid techniques that combine the strength of conventional approaches with artificial neural networks and the application of our system to multivariate time series.

References

Alf91: M. Alfonseca.
Advanced applications of APL: logic programming, neural networks and hypertext.
IBM Systems Journal, 30(4):543-553, 1991.
BJ76: George E. P. Box and Gwilym M. Jenkins.
Time Series Analysis - forecasting and control.
Series in Time Series Analysis. Holden-Day, 500 Sansome Street, San Franciso, California, 1976.
Cha91: E. Chatfield.
The Analysis of Time Series.
Chapman and Hall, New York, fourth edition, 1991.
CMMR92: Kanad Chakraborty, Kishan Mehrota, Chilukuri K. Mohan, and Sanjay Ranka.
Forecasting the Behaviour of Multivariate Time Series Using Neural Networds.
Neural Networks, 5:961-970, 1992.
Dya91: Dyadic Systems Limited, Riverside View, Basing Road, Old Basing, Basingstoke, Hampshire RG24 0AL, England.
Dyalog Apl Users Guide, 1991.
ES91: Richard M. Evans and Alvin J. Surkan.
Relating Numbers of Processing Elements in a Sparse Distributed Memory Model to Learning Rate and Generalization.
ACM APL Quote Quad, 21(4):166-173, 1991.
HKP91: John Hertz, Anders Krogh, and Richard G. Palmer.
Introduction to the Theory od Neural Computation.
Addison Wesley, Redwood City, California, 1991.
HSW89: Kurt Hornik, Maxwell Stinchcombe, and Halbert White.
Multilayer Feedforward Networks are Universal Approximators.
Neural Networks, 2:359-366, 1989.
Pee81: Howard A. Peele.
Teaching A Topic in Cybernetics with APL: An Introduction to Neural Net Modelling.
ACM APL Quote Quad, 12(1):235-239, 1981.
RHW86: David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams.
Learning representations by back-propagating errors.
Nature, 323(9):533-536, October 1986.
SS91: Hava Siegelmann and Eduardo D. Sontag.
Neural Nets Are Universal Computing Devices.
Technical Report SYSCON-91-08, Rutgers Center for Systems and Control, May 1991.
SS93: Alexei N. Skurihin and Alvin J. Surkan.
Identification of Parallelism in Neural Networks by Simulation with Language J.
ACM APL Quote Quad, 24(1):230-237, 1993.
Whi88: Halbert White.
Economic prediction using neural networks: the case of ibm daily stock returns.
In Proceedings of the IEEE International Conference on Neural Networks, pages II-451-II-459, 1988.