PREDICTING WATER LEVELS AND CURRENTS IN THE NORTH SEA USING CHAOS THEORY AND NEURAL NETWORKS

 

 

D. P. Solomatine

International Institute for Infrastructural, Hydraulic and Environmental Engineering (IHE)

Delft, The Netherlands.  Tel.: +31-15-2151815.  Fax: 2122921.

E-mail: sol@ihe.nl  Internet: www.ihe.nl/hi/sol

S. Velickov

International Institute for Infrastructural, Hydraulic and Environmental Engineering (IHE)

J. C. Wüst

North Sea Directorate, Ministry of Transport, Public Works and Water Management

P. O. Box 5807, 2280 HV Rijswijk, The Netherlands

 

 

Abstract: In ship guidance and navigation, predicting surge water levels and currents is extremely important. Surge, currents, temperature, air pressure and wind are correlated, and earlier publications dealt with input-output (connectionist) models such as neural networks to model these relationships. In the course of this research it also appeared that the surge time series in itself carries enough information to make predictions. Experiments with linear prediction methods, including autocorrelation and ARIMA models, demonstrated insufficient accuracy. Non-linear methods were then used and showed promising results for short-term prediction. Features of chaotic behaviour were identified in the surge, and methods of chaos theory were applied. The resulting predictions are quite accurate (RMSE of about 2.3 to 6.1 cm for horizons from 20 minutes to 3 hours). Possible techniques for increasing the prediction accuracy and horizon (wavelet analysis, data mining techniques) were also identified.

 

Keywords: chaos theory, neural networks, water levels, currents, prediction, ship guidance

1    INTRODUCTION

Chaotic behaviour of many systems was observed by many researchers over a number of decades, but was first described as such by Lorenz (1963). An important characteristic of chaos is high sensitivity of results to initial conditions. Since natural systems characterised by water level variability cannot be "restarted" with slightly different initial conditions (consider flood levels or levels in coastal waters), it is reasonable to follow a more practical definition: in a chaotic system, close state-space trajectories diverge and never close on themselves. Chaos comprises a class of signals intermediate between regular sinusoidal or quasiperiodic motions and unpredictable, truly stochastic behaviour. The main reason for applying chaos theory is the existence of methods that permit prediction of the future positions of the system in the state space. In this paper we base our considerations on the work of Abarbanel (1996) and Tsonis (1992).

During the past two decades, the theory of chaos has shown its applicability to a wide class of problems in many areas of the natural sciences. However, chaotic signal analysis is still a novel approach in many areas related to civil engineering, and to water-related problems in particular. Chaotic behaviour in various hydrological time series and water level data has been analysed for a number of years and reported by e.g. Hense (1987), Jayawardena and Lai (1994), Sivakumar et al. (1999a, 1999b) and Rahman (1999). Applications of non-linear dynamic analysis (chaos theory) to coastal waters, and comparisons with other methods, are reported by Frison et al. (1999) and Zaldivar et al. (2000).

Often physical reasoning does not easily explain why physical systems behave chaotically. However, regardless of why chaotic behaviour occurs, chaotic signal processing can provide the framework to describe non-linear system behaviour. Coastal ocean water levels are good candidates for chaotic signal processing because the governing Navier-Stokes equations are inherently non-linear, and the observed broadband and continuous Fourier spectra are indicators of chaos.

2    CASE STUDY

Here we cover parts of an effort aimed at analysing and predicting the surge water levels and currents in the coastal waters of the Netherlands at Hook of Holland (at the entrance to the port of Rotterdam). Such predictions play an important role in decisions on allowing ships to enter the estuary. Hydro-meteorological data were collected at several measurement locations along the North Sea. The sampling interval for water levels is 10 minutes, and the available parameters are: astronomical water level, surge water level (measured water level minus astronomical water level), wind speed, wind direction and air pressure. Besides observations, this data set contains analysed wind and air pressure averages representing 6 areas covering the region between the northern part of the North Sea and the English Channel. Most of the results reported here deal with water level predictions based on the water level data collected at the Hook of Holland station in the hydrological year 1994-1995. The results obtained on using chaos theory for predicting currents are still to be verified.

Previous studies showed the applicability of artificial neural networks (ANNs) to predicting currents (Wüst and van Noort 1994) and water levels (Vaziri 1997). The above-mentioned Hook of Holland data were also used to train an ANN to predict water levels, which has been in operation at the North Sea Directorate since 1996.

Fig. 1    Surge water level time series

The assumption for the subsequent experiments reported below was that the surge time series carries enough information in itself to make predictions. The other data are currently being used in research on other data mining techniques (neural networks, support vector machines, clustering and association rules).

Figure 1 shows surge water levels for the Hook of Holland station in the hydrological year 1994-1995. These data exhibit predominantly high-frequency fluctuations. However, embedded in these fluctuations there are some periodic tendencies, which suggest the presence of periodic components. Three techniques were applied to identify the periodic (deterministic) and stochastic components of the time series: autocorrelation analysis, spectral analysis and chaos theory analysis.

3    LINEAR METHODS OF PREDICTION

The autocorrelation function of the surge water level decreases almost linearly with lag, from 0.802 at a time lag of 36 × 10 min = 6 hours to 0.4 at 108 × 10 min = 18 hours, which suggests serial dependency in this series. Since autocorrelations for consecutive lags are formally dependent, the autocorrelation of the differenced series was also calculated (the time series was differenced with a time lag of 1), giving a very low correlation coefficient of 0.072 at lag 36 (6 hours). Some periodic components were identified after 6 and 12 hours, with correlation coefficients of 0.088 and 0.231 respectively. This behaviour can also be confirmed by examining the partial autocorrelation functions.
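The autocorrelation check described above can be sketched in a few lines. This is only an illustration under stated assumptions: the functions `autocorrelation` and `difference` are hypothetical helper names, and the `surge` series is a synthetic AR(1) stand-in, not the Hook of Holland measurements, so the numbers it prints will not match those quoted in the text.

```python
import random


def autocorrelation(series, lag):
    """Sample autocorrelation coefficient of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance_sum = sum((v - mean) ** 2 for v in series)
    covariance_sum = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance_sum / variance_sum


def difference(series, lag=1):
    """Difference the series: y(i) = x(i) - x(i - lag)."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Synthetic, slowly varying stand-in for a 10-minute surge record
# (an AR(1) process; NOT the measured data used in the paper).
random.seed(0)
surge = [0.0]
for _ in range(4999):
    surge.append(0.98 * surge[-1] + random.gauss(0.0, 1.0))

lag_6h = 36  # 36 samples x 10 min = 6 hours
acf_raw = autocorrelation(surge, lag_6h)
acf_diff = autocorrelation(difference(surge), lag_6h)
print(f"lag {lag_6h}: raw {acf_raw:.3f}, differenced {acf_diff:.3f}")
```

As in the case study, a persistent series keeps a substantial autocorrelation at long lags, while differencing with lag 1 removes most of it.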

Spectral analysis of the surge water level shows large spectral density values at the beginning of the spectrum, in both the sine and cosine components. Stationarity was not analysed. The observed broad spectrum may indicate chaotic behaviour.

Analysis of cross-correlations between all measured parameters showed that the component of wind in East-West direction has the largest negative correlation with the surge (–0.509) while the wind speed shows the largest positive correlation (0.418). These values are too low to be used for prediction.

An ARIMA (autoregressive integrated moving average) model was built as well (Box and Jenkins 1976). Various ARIMA parameters were tried, and it was found that for the whole data set the ARIMA(1,1,1) model gives better results than the others; still, its RMS prediction error for 2 hours ahead reaches 20 cm. Experiments were also performed with ARIMA models applied only to periods where the process was found to be stationary. ARIMA(4,1,0) performed reasonably well (RMSE in the order of several cm), but only for 0.5-hour prediction. Overall, the accuracy of the ARIMA predictions was considered unacceptable. The general reason is that the autocorrelation function of the surge time series, which is based on linear regression, does not clearly represent the amount of information that the system carries from the past. An approach in which the coefficients in the AR part of the ARIMA model were updated in time was also tried. The results were better but still not satisfactory. (It should be stressed that all these experiments were conducted without exogenous variables; ARIMAX models can be expected to perform better, although probably not as well as the non-linear ANN, which also uses exogenous variables.) The obvious next step was to use non-linear prediction methods.
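To make the linear baseline concrete, the sketch below fits a deliberately simplified ARIMA(1,1,0) model (one AR coefficient on the first differences, no moving-average term, no intercept) by least squares and iterates it forward. This is an assumption made for brevity, not the ARIMA(1,1,1) or ARIMA(4,1,0) models of the study; the function names and the `levels` values are hypothetical.

```python
def fit_ar1_on_differences(series):
    """Least-squares AR(1) coefficient for the first differences of `series`."""
    d = [series[i] - series[i - 1] for i in range(1, len(series))]
    numerator = sum(d[i] * d[i - 1] for i in range(1, len(d)))
    denominator = sum(v * v for v in d[:-1])
    return numerator / denominator if denominator else 0.0


def forecast(series, phi, steps):
    """Iterate d(t+1) = phi * d(t) and integrate the differences back to levels."""
    last = series[-1]
    d = series[-1] - series[-2]
    predictions = []
    for _ in range(steps):
        d = phi * d          # predicted next difference
        last = last + d      # undo the differencing
        predictions.append(last)
    return predictions


# Hypothetical water levels whose increments halve at every step.
levels = [0.0, 1.0, 1.5, 1.75, 1.875]
phi = fit_ar1_on_differences(levels)
print(phi, forecast(levels, phi, steps=3))
```

With 10-minute data, a 2-hour-ahead forecast corresponds to `steps=12`; the forecast of such a model decays smoothly toward a constant level, which is one way to see why purely linear extrapolation loses accuracy at longer horizons.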

4    NON-LINEAR DYNAMICS AND CHAOS

One of the important foundations of non-linear signal processing and chaos theory is the embedding theorem of the Dutch mathematician F. Takens. It shows that the use of a single measured variable x(n) = x(t0 + nΔt), with t0 some starting time and Δt the sampling time, together with its time delays, provides an N-dimensional space that is a proxy for the full multivariate state space of the observed system. The N-dimensional state vectors x(t) are then defined as:

$$\mathbf{x}(t) = \bigl[\,x(t),\; x(t-\tau),\; x(t-2\tau),\; \ldots,\; x(t-(N-1)\tau)\,\bigr]$$

where x(t) is the value of the time series at time t, τ is a suitable time delay (expressed in sampling intervals) and N is the embedding dimension. This vector fully represents the non-linear dynamics when N is large enough. There are various methods to estimate τ and N; in this work we used popular analytical methods.
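A minimal sketch of the time-delay embedding follows. It assumes the backward-delay convention, in which each state vector collects the current value and N - 1 earlier values spaced by the time delay; the function name `delay_embed` and the toy series are illustrative only.

```python
def delay_embed(series, dim, delay):
    """Return delay vectors [x(t), x(t - delay), ..., x(t - (dim - 1) * delay)].

    The first vector is built at t = (dim - 1) * delay, the earliest time
    for which all delayed values exist.
    """
    start = (dim - 1) * delay
    return [
        [series[t - k * delay] for k in range(dim)]
        for t in range(start, len(series))
    ]


# Toy series standing in for the surge record (values are just indices).
toy = list(range(100))
vectors = delay_embed(toy, dim=4, delay=9)  # parameters used in the case study
print(len(vectors), vectors[0])
```

With an embedding dimension of 4 and a delay of 9 samples, each vector spans 27 samples (4.5 hours of 10-minute data), so the first 27 samples of a record cannot be embedded.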

Estimation of time delay τ. To find the time delay τ we used the average mutual information (AMI) function, and to find N, the method of false nearest neighbours. The time delay τ must be large enough that each component of the vector carries independent information about the system. However, τ must not be so large that the components of the vectors x(t) become completely independent of each other. If the time delay is too short, the vector components will not be independent enough and will not contain any new information. For the considered case study τ was selected to be 9 (time steps of 10 minutes).
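The average mutual information between x(t) and x(t + lag) can be estimated from a two-dimensional histogram, as in the hedged sketch below. The number of bins, the rule of taking the first local minimum of the AMI curve as the delay (a common prescription, assumed here rather than taken from the text), the helper names and the noisy sinusoid are all illustrative assumptions.

```python
import math
import random


def average_mutual_information(series, lag, bins=16):
    """Histogram estimate (in bits) of the mutual information of x(t), x(t+lag)."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / bins or 1.0  # guard against a constant series

    def bin_of(value):
        return min(int((value - lo) / width), bins - 1)

    n = len(series) - lag
    joint = {}
    px = [0] * bins
    py = [0] * bins
    for i in range(n):
        a, b = bin_of(series[i]), bin_of(series[i + lag])
        joint[(a, b)] = joint.get((a, b), 0) + 1
        px[a] += 1
        py[b] += 1
    # Sum p(a,b) * log2( p(a,b) / (p(a) p(b)) ) over occupied cells.
    return sum(
        (count / n) * math.log2(count * n / (px[a] * py[b]))
        for (a, b), count in joint.items()
    )


def first_local_minimum(values):
    """Index of the first interior local minimum, or None if there is none."""
    for i in range(1, len(values) - 1):
        if values[i] < values[i - 1] and values[i] <= values[i + 1]:
            return i
    return None


# Noisy sinusoid as a stand-in signal (not the surge data).
random.seed(1)
signal = [math.sin(0.15 * i) + 0.1 * random.gauss(0.0, 1.0) for i in range(3000)]
ami_by_lag = [average_mutual_information(signal, lag) for lag in range(1, 41)]
index = first_local_minimum(ami_by_lag)
suggested_delay = index + 1 if index is not None else None  # lags start at 1
print(suggested_delay)
```

On noisy data the AMI curve may show shallow early minima, so in practice the curve is usually inspected rather than trusting the first minimum blindly.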

Estimation of embedding dimension N. The global embedding dimension N is the minimum number of time-delay co-ordinates needed so that the trajectories x(t) do not intersect in N dimensions. In dimensions less than N, trajectories can intersect because they are projected down into too few dimensions, and subsequent calculations, such as predictions, may then be corrupted. If the chosen dimension is too large, noise and other contamination may corrupt the calculations, because noise fills any dimension.

The method of finding a proper N can be described using geometrical considerations: as N increases, attractors "unfold" and the vectors that are close in dimension N move to a significant distance apart in N+1. They are "false" neighbours in dimension N. The method of false nearest neighbours measures the percentage of false neighbours as N increases. Points that are close in N are marked and the number of these points that become widely separated in N+1 is calculated.
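The counting procedure can be sketched as follows. This is a simplified version under stated assumptions: a brute-force nearest-neighbour search, backward delays, and a single distance-ratio test with an adjustable heuristic threshold of 15. The function name is hypothetical, and the Hénon map is used only as a well-known two-dimensional chaotic test signal, not as a model of the surge.

```python
import math


def false_neighbour_fraction(series, dim, delay, threshold=15.0):
    """Fraction of nearest neighbours in `dim` dimensions that separate
    strongly when the (dim + 1)-th delay coordinate is added."""
    first = dim * delay  # earliest t for which x(t - dim * delay) exists
    times = range(first, len(series))
    vectors = [[series[t - k * delay] for k in range(dim)] for t in times]
    extra = [series[t - dim * delay] for t in times]  # the added coordinate
    false_count = 0
    counted = 0
    for i, v in enumerate(vectors):
        best_j, best_dist = None, float("inf")
        for j, w in enumerate(vectors):
            if i == j:
                continue
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))
            if dist < best_dist:
                best_j, best_dist = j, dist
        if best_j is None or best_dist == 0.0:
            continue  # skip exact duplicates to avoid division by zero
        counted += 1
        # A neighbour is "false" if the added coordinate pushes it far away.
        if abs(extra[i] - extra[best_j]) / best_dist > threshold:
            false_count += 1
    return false_count / counted if counted else 0.0


# Test signal: x-component of the Hénon map, first 100 iterations discarded.
x_prev, x = 0.0, 0.0
henon = []
for step in range(600):
    x_prev, x = x, 1.0 - 1.4 * x * x + 0.3 * x_prev
    if step >= 100:
        henon.append(x)

fractions = [false_neighbour_fraction(henon, dim, delay=1) for dim in (1, 2, 3)]
print(fractions)
```

The embedding dimension is then taken as the smallest dimension at which the fraction drops to (nearly) zero. The quadratic-time search is acceptable for a few hundred points; for a record of 50000 samples a spatial index would be needed.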

System evolution over shorter time intervals can often be adequately described in fewer dimensions than N. The local embedding dimension NL is the number of degrees of freedom that describe the short-term evolution in small regions of phase space. The idea behind determining NL is to ensure that trajectories associated with close neighbours remain close for some period of time.

Assessment of prediction horizon. An approximate estimate of the prediction horizon can be made with the help of Lyapunov exponents. These parameters (their number is equal to the state dimension) are widely used in classical control theory as indicators of stability: they describe the rate at which close trajectories diverge or converge. If all exponents are zero or negative, the system trajectories in phase space do not diverge and the system is stable. Chaotic systems are not stable, so they have at least one positive Lyapunov exponent, and the largest Lyapunov exponent λ1 sets the upper limit on the accuracy of a predictive model such as the local models used in this study. The largest Lyapunov exponent for the considered case was found to be 0.5. The prediction horizon can then be assessed as T = τ / λ1 = 9 / 0.5 = 18 time steps (that is, 18 × 10 min = 3 hours), where τ is the time delay. This assessment is only an estimate and gives no direct indication of the associated error.

Prediction with the local models. Having identified the time delay τ and the embedding dimension N and reconstructed the phase space, one can build the prediction model in the form of a multidimensional map:

$$\mathbf{x}(t+T) = f_T\bigl(\mathbf{x}(t)\bigr)$$

where x(t) is the current state of the system in the phase space, x(t+T) is the state of the system after a time interval T, and fT is a mapping function. The problem is then to find a good expression (local models) for the function fT. A generalised scheme that was applied for constructing and testing the local models in this study is presented in Figure 2. The data are embedded and then divided into a training and a testing set. Based on the training set, the embedded data space is quantised (using the K-NN algorithm at this stage). Local data sets are then constructed for each of the prototype vectors. Finally, local data models (linear in this case study) are constructed from the local data sets and are then used to predict the dynamics of the system (moving the system from state x(t) into state x(t+T)).
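The idea of a local linear model can be illustrated with the sketch below. It is a simplification of the scheme described above, not the authors' implementation: instead of quantising the space into prototype vectors beforehand, it fits one linear model per query from the k nearest training vectors ("lazy" local modelling). The function names, the small ridge term added for numerical stability, the value k = 20 and the sinusoidal test signal are all assumptions.

```python
import math


def delay_embed(series, dim, delay):
    """Return (t, [x(t), x(t - delay), ..., x(t - (dim - 1) * delay)]) pairs."""
    start = (dim - 1) * delay
    return [(t, [series[t - k * delay] for k in range(dim)])
            for t in range(start, len(series))]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def predict_local_linear(train, query, horizon, dim, delay, k=20, ridge=1e-8):
    """Predict x(t + horizon) for the delay vector `query` using a linear
    model fitted to its k nearest neighbours in the training set."""
    candidates = [(t, v) for t, v in delay_embed(train, dim, delay)
                  if t + horizon < len(train)]
    candidates.sort(key=lambda tv: sum((a - b) ** 2 for a, b in zip(tv[1], query)))
    neighbours = candidates[:k]
    rows = [[1.0] + v for _, v in neighbours]            # intercept + delay vector
    targets = [train[t + horizon] for t, _ in neighbours]
    p = dim + 1
    # Normal equations (A^T A + ridge * I) w = A^T y of the local least-squares fit.
    ata = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    atb = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    w = solve(ata, atb)
    return w[0] + sum(wi * qi for wi, qi in zip(w[1:], query))


# Sinusoidal test signal (not the surge data): train on the first 600 samples.
series = [math.sin(0.3 * t) for t in range(700)]
train = series[:600]
t_query, horizon = 650, 6
query = [series[t_query], series[t_query - 1]]
prediction = predict_local_linear(train, query, horizon, dim=2, delay=1)
actual = series[t_query + horizon]
print(prediction, actual)
```

For the case study the corresponding call would use an embedding dimension of 4 and a delay of 9 samples; the pre-computed prototype scheme of Figure 2 avoids repeating the neighbour search and fit for every prediction.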

Fig. 2    A generalised scheme for constructing and testing local models

Fig. 3    Local linear model for surge time-series prediction (data for the hydrological year 1994/95 with a 10-minute interval are used; 50000 samples used for training and 2000 for testing). Embedding dimension = 4, time delay τ = 9 time steps. Prediction horizon = 6 time steps (1 hour). RMSE = 3.6 cm

Several local data models were built in order to predict the surge water level for different time horizons. Data for the hydrological year 1994/95 with a 10-minute interval were used: 50000 samples for training and 2000 for testing. The embedding dimension was 4 and the time delay τ was 9 time steps. See Table 1 and Figure 3 for details.

Table 1    Prediction errors (cm) of the model for different time horizons

Error    20 min    1 h      2 h      2.5 h    3 h
RMSE     2.277     3.614    5.481    5.928    6.116
MAE      1.707     2.656    4.005    4.32     4.451

A testing set (2000 samples in total) was chosen to contain two types of dynamic behaviour of the system. The first part is characterised by small amplitude and variance of the surge (cases 50000-51400), the second part is characterised by large variations both in variance and the surge amplitude (values between –47 cm and 79 cm). Such a selection of the testing set was done in order to test the predicting capabilities of the trained local linear models for contrasting dynamic states of the system.
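For reference, the two error measures reported in Table 1 can be computed as in the short sketch below. The function names and the sample values are hypothetical and serve only to show the definitions.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error between two equally long sequences."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical surge levels in cm (not values from the test set).
observed = [10.0, 12.0, 15.0, 11.0]
predicted = [11.0, 12.0, 13.0, 11.0]
print(rmse(observed, predicted), mae(observed, predicted))
```

Because RMSE weights large deviations more heavily than MAE, a growing gap between the two at longer horizons points to occasional large errors, such as those caused by a phase shift during rapid surge changes.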

5    CONCLUSIONS

Non-linear methods of prediction applied to univariate time series perform better than linear ones (autocorrelation and ARIMA models) in predicting surge water levels in the coastal zone. Results for prediction horizons from 20 minutes to 1 hour are excellent (RMSE between 2.3 and 3.6 cm). Extending the prediction horizon to 2 hours showed that there is still enough local predictive information embedded in the attractor of the system (RMSE around 5.5 cm). Finally, the 3-hour prediction showed that the local linear models are able to predict the amplitudes of the surge correctly, in “stormy” situations as well, but with a phase error (this pushed the RMSE up to 6.1 cm). This phase error may be systematic in nature; it may also stem from the presence of low-frequency periodic components and, of course, from the linearity of the local model used.

Identification, decomposition and removal of the components that produce the mentioned phase error can be done by transforming from the “amplitude-time” domain into the “frequency-time” domain using techniques such as wavelet analysis. In spite of the good results achieved with chaos theory, the prediction horizon may be increased further by including other hydrometeorological variables in the analysis. Current research is aimed at using pattern recognition methods (Velickov et al. 2000).

The performance of chaos theory-related methods in predicting currents was also investigated, and other data-driven techniques were applied as well. The restricted space of this paper does not allow covering them here; they are to be addressed in the presentation.

 

Acknowledgements

The authors are grateful to the North Sea Directorate of the Rijkswaterstaat (Ministry of Transport, Public Works and Water Management, the Netherlands) for permission to use the measurement data and for partial financial support of this work. This work was carried out in the framework of, and with partial financial support from, the project "Data mining, knowledge discovery and data-driven modelling" of the Delft Cluster programme.

References

[1]    Abarbanel, H. D. I., “Analysis of Observed Chaotic Data”, Springer-Verlag, New York, 1996.

[2]    Box, G. E. P. and Jenkins, G. M., “Time series analysis: Forecasting and control”, Holden-Day, San Francisco, 1976.

[3]    Fraser, A. M. and Swinney, H. L., “Independent Coordinates for Strange Attractors from Mutual Information”, Physical Review A 33 (2), pp. 1134-1140, 1986.

[4]    Frison, T. W., Abarbanel, H. D. I., Earle, M. D., Schultz, J. R. and Scherer, W., “Chaos and predictability in ocean water levels”, Journal of Geophysical Research, 104 (4), pp. 7935-7951, 1999.

[5]    Hense, A., “On the possible existence of a strange attractor for the southern oscillation”, Beitr. Phys. Atmosph. 60 (1), pp. 34-37, 1987.

[6]    Jayawardena, A. W. and Lai, F., “Analysis and prediction of chaos in rainfall and stream flow time series”, J. Hydrol. 153, pp. 23-52, 1994.

[7]    Lorenz, E. N., “Deterministic nonperiodic flow”, J. Atmos. Sci., 20, pp. 130-141, 1963.

[8]    Rahman, M., “Analysis and prediction of chaotic timeseries”, MSc Thesis, IHE/DHI, 1999.

[9]    Sivakumar, B., Liong, S. Y., Liaw, C. Y. and Phoon, K. K., “Singapore rainfall behaviour: chaotic?”, J. Hydrol. Eng., ASCE 4 (1), pp. 38-48, 1999.

[10]    Velickov, S., Price, R. K., Solomatine, D. P. and Yu, X., “Application of Data Mining Techniques for Remote Sensing Image Analysis”, Proc. 4th Int. Conference on Hydroinformatics, Iowa, USA, July 2000.

[11]    Vaziri, M., “Predicting Caspian sea surface water level by ANN and ARIMA models”, J. of Waterway, Port, Coastal and Ocean Engineering, July/August 1997, pp. 158-162.

[12]    Wüst, J. C. and van Noort, G. J. H. L., “Neural network current prediction for shipping guidance”, Proceedings Conf. Oceans'94 OSATES, Brest, France, 13-16 September 1994, pp. I-58 - I-63.

[13]    Zaldivar, J. M., Gutierrez, E., Galvan, I. M., Strozzi, F. and Tomasin, A., “Forecasting high waters at Venice Lagoon using chaotic time series analysis and non-linear neural networks”, Journal of Hydroinformatics, vol. 2, No. 1, pp. 61-84, 2000.