D. P. Solomatine
International Institute for
Infrastructural, Hydraulic and Environmental Engineering (IHE)
Delft, The Netherlands. Tel.: +31-15-2151815. Fax: 2122921.
E-mail: sol@ihe.nl Internet: www.ihe.nl/hi/sol
S. Velickov
International Institute for
Infrastructural, Hydraulic and Environmental Engineering (IHE)
J. C. Wüst
North Sea Directorate, Ministry of
Transport, Public Works and Water Management
P. O. Box 5807, 2280 HV Rijswijk,
The Netherlands
Abstract: In ship guidance and navigation, predicting
surge water levels and currents is extremely important.
Data on surge, currents, temperature, air pressure and wind are correlated,
and earlier publications dealt with input-output (connectionist) models
such as neural networks to model these relationships. In the
course of this research it also appeared that the surge time series in itself
carries enough information to make predictions. Experiments with linear
prediction methods, including autocorrelation and ARIMA models, demonstrated
insufficient accuracy. Non-linear methods were then used and showed promising results
for short-term prediction. Features of chaotic behaviour were identified in
the surge, and methods of chaos theory were applied. The predictions are quite
accurate. Techniques that may increase the prediction accuracy
and horizon (wavelet analysis, data mining techniques) were also identified.
Keywords: chaos theory, neural networks, water levels,
currents, prediction, ship guidance
Chaotic (highly
sensitive to initial conditions) behaviour of many systems was observed by many
researchers for a number of decades, but was first described as such by Lorenz
(1963). An important characteristic of chaos is high sensitivity of results to
initial conditions. Since the natural systems characterised by water level
variability cannot be "restarted" with slightly different initial conditions
(consider flood levels or levels in coastal waters), it is reasonable to follow
a more practical definition: in a chaotic system close state space trajectories
will diverge and they will never close on themselves. Chaos comprises a class of
signals intermediate between regular sinusoidal or quasiperiodic motions and
unpredictable, truly stochastic behaviour. The main reason for applying chaos
theory is the existence of methods that make it possible to predict the future
positions of the system in the state space. In this paper we base our
considerations on the work of Abarbanel (1996) and Tsonis (1992).
During the past two
decades, the theory of chaos showed its applicability in solving a wide class of
problems in many areas of natural sciences. However, chaotic signal analysis is
still a novel approach in many areas related to civil engineering and to
water-related problems in particular. Chaotic behaviour in various hydrological
time series and water level data has been analysed for a number of years, as
reported by e.g. Hense (1987), Jayawardena and Lai (1994), Sivakumar et al.
(1999a, 1999b) and Rahman (1999). Applications of non-linear dynamic analysis
(chaos theory) to coastal waters and their comparison to other methods are
reported by Frison et al. (1999) and Zaldivar (2000).
Often physical
reasoning does not easily explain why physical systems behave chaotically.
However, regardless of why chaotic behaviour occurs, chaotic signal processing
can provide the framework to describe non-linear system behaviour. Coastal ocean
water levels are good candidates for chaotic signal processing because the
governing Navier-Stokes equations are inherently non-linear, and the observed
broadband and continuous Fourier spectra are indicators of chaos.
Here we cover parts of an effort aimed at analysing the surge water levels and currents in the coastal waters of the Netherlands at Hook of Holland (at the entrance to the port of Rotterdam). Such predictions play an important role in deciding whether to allow ships to enter the estuary. Hydro-meteorological data were collected at several measurement locations along the North Sea. The sampling time for water levels is 10 minutes and the available parameters are: astronomical water level, surge water level (measured water level minus astronomical water level), wind speed, wind direction and air pressure. Besides observations, this data set contains analysed wind and air pressure averages representing 6 areas covering the region between the northern part of the North Sea and the English Channel. Most of the results reported deal with water level predictions based on the water level data collected at the Hook of Holland station in the hydrological year 1994-1995. The results obtained on using chaos theory for predicting currents are still to be verified.
Previous studies showed the applicability of artificial neural networks (ANNs) to predicting currents (Wüst et al. 1994) and water levels (Vaziri 1997). The above-mentioned data from Hook of Holland were also used to train an ANN to predict water levels, which has been in operation at the North Sea Directorate since 1996.

Fig. 1 Surge water level time series
The assumption behind the consecutive experiments
reported below was that the surge time series carries enough information in
itself to make predictions. The other data are currently used in research
on other data mining techniques (neural networks, support vector machines,
clustering and association rules).
Figure 1 shows surge water levels for the Hook of Holland station in the hydrological year 1994-1995. These data exhibit predominantly high-frequency fluctuations. However, embedded in these fluctuations there are some periodic tendencies, which suggest the presence of periodic components. Three techniques were applied to identify either the periodic (deterministic) or the stochastic components of the time series: autocorrelation analysis, spectral analysis and chaos theory analysis.
Autocorrelation function values of the surge water level
decrease practically linearly, being 0.802 at time lag 36 x 10 min = 6
hours and 0.4 at 108 x 10 min = 18 hours, which suggests that there is serial
dependency in this series. Since autocorrelations for consecutive lags are
formally dependent, the autocorrelation of the differenced series was also
calculated (the time series was differenced with a time lag of 1), giving a
correlation coefficient after lag 36 (6 hours) equal to a very low value of 0.072. Some
periodic components were identified after 6 and 12 hours with correlation
coefficients of 0.088 and 0.231 respectively. This behaviour can also be
confirmed by examining the partial autocorrelation functions.
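The autocorrelation check described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names and the synthetic `surge` series are assumptions, and the real 10-minute Hook of Holland record would take the place of the stand-in signal.

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation coefficient of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance_sum = sum((v - mean) ** 2 for v in series)
    covariance_sum = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance_sum / variance_sum

def difference(series, lag=1):
    """Difference the series: y(i) = x(i) - x(i - lag)."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

# Hypothetical stand-in for the 10-minute surge record (not the real data).
surge = [math.sin(2 * math.pi * i / 74.0) + 0.2 * math.sin(i / 3.0)
         for i in range(3000)]

for lag in (36, 72, 108):  # 6, 12 and 18 hours at a 10-minute sampling time
    raw = autocorrelation(surge, lag)
    diff = autocorrelation(difference(surge), lag)
    print(f"lag {lag}: raw {raw:.3f}, differenced {diff:.3f}")
```

Comparing the raw and the differenced coefficients at the same lag mirrors the comparison made in the text between 0.802 and 0.072 at lag 36.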
Spectral analysis of the surge water level shows large spectral
density values at the beginning with sine and cosine components. Analysis of
stationarity was not done. The observed broad spectrum may indicate chaotic
behaviour.
Analysis of cross-correlations between all measured parameters
showed that the component of wind in East-West direction has the largest
negative correlation with the surge (–0.509) while the wind speed shows the
largest positive correlation (0.418). These values are too low to be used for
prediction.
An ARIMA (autoregressive integrated moving average
process) model was built as well (Box and Jenkins 1976). Various ARIMA
parameters were tried and it was found that for the whole data set the ARIMA (1,1,1)
model gives better results than the others; still, its prediction RMS error for 2
hours ahead reaches 20 cm.
Experiments were also performed with ARIMA models applied only to some periods
where the process was found to be stationary. It was found that ARIMA (4, 1, 0)
performed reasonably well (RMSE in the order of several cm) only for 0.5 hour prediction.
Overall, the accuracy of ARIMA predictions was considered unacceptable. The general
reason for that is that the autocorrelation function of the surge time series,
which is based on linear regression, does not clearly represent the amount of
information that the system carries from the past. An approach in which the
coefficients in the AR part of the ARIMA prediction were updated in time was also
tried. The results were better but not satisfactory either. (It should be
stressed that all these experiments were conducted without exogenous variables
and that ARIMAX models will perform better, although probably not as well as the
non-linear ANN, which also uses exogenous variables.) The obvious next step was
to use non-linear prediction methods.
One of the
important foundations of the methods of non-linear signal processing and
chaos theory is the embedding theorem of the Dutch mathematician F. Takens. It
shows that a single measured variable x(n) = x(t0 + nτs),
with t0 some starting time and τs the sampling time, together with its time
delays, provides an N-dimensional space that is a proxy for the full multivariate
state space of the observed system. The N-dimensional state vectors x(t) are then defined as:

$$\mathbf{x}(t) = \left[\, x(t),\; x(t-\tau),\; \ldots,\; x(t-(N-1)\tau) \,\right]$$

where x(t) is the value of the time series at
time t, τ is a suitable time delay (a multiple of the sampling
time) and N is the embedding
dimension. This vector fully represents the non-linear dynamics when N is
large enough. There are various methods to estimate N; in this work we used
popular analytical methods.
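The construction of delay vectors can be illustrated with a short sketch. It assumes the backward-delay convention, in which each vector holds the current value and N - 1 earlier values spaced by the delay; the function name `delay_embed` and the toy input are hypothetical and only serve the illustration.

```python
def delay_embed(series, dim, tau):
    """Build delay vectors [x(t), x(t - tau), ..., x(t - (dim - 1) * tau)]."""
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    first = (dim - 1) * tau  # earliest index that has a full history
    return [
        [series[t - k * tau] for k in range(dim)]
        for t in range(first, len(series))
    ]

# Toy input: sample indices, so each vector shows which samples it combines.
samples = list(range(40))
vectors = delay_embed(samples, dim=4, tau=9)
print(vectors[0])    # [27, 18, 9, 0]
print(len(vectors))  # 13
```

With an embedding dimension of 4 and a delay of 9 time steps, the first usable state is at sample 27, because three delays of 9 steps must fit before it.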
Estimation of time delay τ. For finding τ we used the average mutual information
(AMI) function, and for finding N,
the method of false nearest neighbours. The time delay τ must be large enough that
each component of the vector carries independent information about the system.
However, τ must not be so large that the
components of the vectors x(t) are
independent of each other. If the time delay is too short, the
vector components will not be independent enough and will not contain any new
information. For the considered case study τ was selected to be 9 time steps.
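A histogram-based estimate of the average mutual information can be sketched as below. The number of bins, the rule of taking the first local minimum of the AMI curve as the delay, and the synthetic `signal` are assumptions made for illustration; the text does not state which estimator or selection rule was used.

```python
import math
from collections import Counter

def average_mutual_information(series, lag, bins=16):
    """Histogram estimate (in bits) of the AMI between x(t) and x(t + lag)."""
    low, high = min(series), max(series)
    width = (high - low) / bins or 1.0  # avoid zero width for a constant series

    def bin_index(value):
        return min(int((value - low) / width), bins - 1)

    labels = [bin_index(v) for v in series]
    pairs = list(zip(labels[: len(labels) - lag], labels[lag:]))
    n = len(pairs)
    joint = Counter(pairs)
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    # sum over occupied cells of p(a, b) * log2( p(a, b) / (p(a) * p(b)) )
    return sum(
        (count / n) * math.log2(count * n / (first[a] * second[b]))
        for (a, b), count in joint.items()
    )

def first_local_minimum(values):
    """Index of the first interior local minimum, or None if there is none."""
    for i in range(1, len(values) - 1):
        if values[i] < values[i - 1] and values[i] <= values[i + 1]:
            return i
    return None

# Hypothetical test signal, not the surge data.
signal = [math.sin(0.17 * i) + 0.5 * math.sin(0.41 * i) for i in range(4000)]
curve = [average_mutual_information(signal, lag) for lag in range(1, 41)]
index = first_local_minimum(curve)
tau = None if index is None else index + 1  # curve[0] corresponds to lag 1
print("suggested delay (time steps):", tau)
```

A delay chosen this way balances the two requirements stated above: the components share less redundant information than at very short delays, yet they are not so far apart that they become unrelated.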
Estimation of embedding dimension
N. The global
embedding dimension N is the minimum number of time-delay co-ordinates needed so
that the trajectories x(t) do not intersect in N dimensions. In
dimensions less than N, trajectories
can intersect because they are projected down into too few dimensions.
Subsequent calculations, such as predictions, may then be corrupted. If N is
too large, noise and other contamination may corrupt other calculations because
noise fills any dimension.
The method of
finding a proper N can be described using geometrical considerations: as N increases, attractors "unfold" and the
vectors that are close in dimension N
move to a significant distance apart in N+1. They are "false" neighbours in
dimension N. The method of false
nearest neighbours measures the percentage of false neighbours as N increases. Points that are close in N are marked and the number of these
points that become widely separated in N+1 is calculated.
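The counting procedure can be sketched with a brute-force neighbour search. This is an illustrative simplification: the ratio threshold of 15 is an assumed, commonly quoted value, the function name is hypothetical, and the quadratic search is only practical for short series.

```python
import math

def false_neighbour_fraction(series, dim, tau, ratio_threshold=15.0):
    """Fraction of nearest neighbours in `dim` dimensions that separate
    strongly when the (dim + 1)-th delay coordinate is added."""
    first = dim * tau  # every point needs dim + 1 delay coordinates
    times = range(first, len(series))
    vectors = [[series[t - k * tau] for k in range(dim)] for t in times]
    extra = [series[t - dim * tau] for t in times]  # the added coordinate
    false_count = 0
    counted = 0
    for i, v in enumerate(vectors):
        best_j, best_dist = None, math.inf
        for j, w in enumerate(vectors):
            if j != i:
                d = math.dist(v, w)
                if d < best_dist:
                    best_j, best_dist = j, d
        if best_j is None or best_dist == 0.0:
            continue  # skip exact duplicates to avoid dividing by zero
        counted += 1
        # growth of the distance caused by the extra coordinate alone
        if abs(extra[i] - extra[best_j]) / best_dist > ratio_threshold:
            false_count += 1
    return false_count / counted if counted else 0.0

# Hypothetical example: a sine wave unfolds completely in two dimensions.
wave = [math.sin(0.3 * i) for i in range(300)]
for dim in (1, 2, 3):
    print(dim, round(false_neighbour_fraction(wave, dim, tau=5), 3))
```

The embedding dimension is then taken as the smallest value at which the fraction of false neighbours drops to (nearly) zero.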
System evolution
over shorter time intervals can be often adequately described in fewer
dimensions than N. The local embedded dimension NL is the number of degrees
of freedom that describe the short-term evolutions in small regions of phase
space. The idea behind determining NL is to ensure that
trajectories associated with close neighbours have to remain close for some time
period.
Assessment of prediction horizon.
An approximate
estimate of the prediction horizon can be made with the help of Lyapunov exponents.
These parameters (their number is equal to the state dimension) are widely used
in classical control theory as indicators of stability: they describe the rate at
which close trajectories diverge or converge. If all exponents are zero or
negative, the system trajectories in phase space do not diverge and the system
is stable. Chaotic systems obviously are not stable, so they have positive
Lyapunov exponents, and the largest Lyapunov exponent λ1 describes the upper limit
of accuracy for a predictive model such as the one described above. The largest
Lyapunov exponent for the considered case was found to be 0.5. It is possible to
give an assessment of the prediction horizon as T = τ / λ1 = 9 / 0.5 = 18 (that is 18 x 10 min = 3 hours).
This assessment is only an estimate and does not give a direct indication of the
associated error.
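The arithmetic of this estimate can be written out as a small helper. The function name and the guard against non-positive exponents are illustrative assumptions; the numbers are those quoted in the text.

```python
def prediction_horizon_steps(delay_steps, largest_lyapunov):
    """Horizon estimate T = delay / lambda_1, expressed in time steps."""
    if largest_lyapunov <= 0:
        raise ValueError("the estimate needs a positive largest exponent")
    return delay_steps / largest_lyapunov

SAMPLING_MINUTES = 10  # sampling time of the water level record

steps = prediction_horizon_steps(delay_steps=9, largest_lyapunov=0.5)
hours = steps * SAMPLING_MINUTES / 60
print(steps, hours)  # 18.0 3.0
```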
Prediction with the local models.
Having the parameters
τ and N identified and the phase space
reconstructed, one can build the prediction model in the form of multidimensional
maps:

$$\mathbf{x}(t+T) = f_T\left(\mathbf{x}(t)\right)$$

where x(t) is the current state of the system in the phase space,
x(t+T) is the state of the system after a
time interval T, and f_T is a mapping function. The problem
is then to find a good expression (local models) for the function f_T. A generalised scheme
that was applied for constructing and testing the local models in this study is
presented in Figure 2. The data are embedded and then divided into training and
testing sets. Based on the training set, the embedded data space is quantised
(using the K-NN algorithm at this stage). Local data sets are then constructed for
each of the prototype vectors. Finally, local data models (linear in this case
study) are constructed based on the local data sets and are then used to
predict the dynamics of the system (move the system from state x(t) into state x(t+T)).
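A local linear predictor of this kind can be sketched as below. It is a simplified illustration under stated assumptions: a separate linear model is fitted for every query state from its k nearest training states (rather than for prototype vectors), delay vectors use backward delays, a small ridge term is added purely for numerical stability, and all function names and the sine-wave example are hypothetical.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def local_linear_predict(train, query, horizon, dim, tau, k=20, ridge=1e-8):
    """Predict x(t + horizon) for the delay vector `query`."""
    candidates = []
    for t in range((dim - 1) * tau, len(train) - horizon):
        state = [train[t - j * tau] for j in range(dim)]
        candidates.append((math.dist(state, query), state, train[t + horizon]))
    candidates.sort(key=lambda item: item[0])
    neighbours = candidates[:k]  # local data set around the query
    rows = [[1.0] + state for _, state, _ in neighbours]  # intercept + state
    targets = [target for _, _, target in neighbours]
    p = dim + 1
    # Normal equations of the least-squares fit, with a small ridge term.
    ata = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    coef = solve_linear(ata, aty)
    return coef[0] + sum(c * q for c, q in zip(coef[1:], query))

# Hypothetical example: a sine wave stands in for the surge series.
wave = [math.sin(0.3 * i) for i in range(600)]
history = wave[:500]            # training part
query = [wave[520], wave[515]]  # state at t = 520 with dim = 2, tau = 5
forecast = local_linear_predict(history, query, horizon=6, dim=2, tau=5)
print(forecast, wave[526])
```

Fitting one model per query keeps the sketch short; building the local data sets once per prototype vector, as in the scheme of Figure 2, trades some accuracy for much cheaper prediction on long records.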
Fig. 2 A generalised scheme for constructing and testing local models

Fig. 3 Local linear model for surge time series prediction (data for the hydrological year 1994/95 with a 10-minute interval are used; 50000 samples used for training and 2000 for testing). Embedding dimension = 4, time delay τ = 9 time steps. Prediction horizon = 6 time steps (1 hour). RMSE = 3.6 cm
Several local data
models were built in order to predict the surge water level for different time
horizons. Data for the hydrological year 1994/95 with a 10-minute interval were
used; 50000 samples were used for training and 2000 for testing. The embedding
dimension used was 4 and τ was equal to 9 time steps. See Table 1 and
Figure 3 for details.
Table 1 Prediction errors (cm) of the model for different time horizons
| Error | 20 min | 1 h | 2 h | 2.5 h | 3 h |
|-------|--------|-----|-----|-------|-----|
| RMSE | 2.277 | 3.614 | 5.481 | 5.928 | 6.116 |
| MAE | 1.707 | 2.656 | 4.005 | 4.32 | 4.451 |
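The two error measures reported in Table 1 follow their usual definitions and can be computed as sketched here. The function names and the toy numbers are illustrative assumptions, not values from the study.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(observed, predicted):
    """Mean absolute error between two equally long sequences."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)

# Toy values in cm, for illustration only.
observed = [10.0, 12.0, 9.0, 15.0]
predicted = [11.0, 10.0, 9.0, 18.0]
print(rmse(observed, predicted), mae(observed, predicted))
```

RMSE penalises large individual errors more strongly than MAE, which is why it is always at least as large as MAE for the same set of predictions.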
A testing set (2000 samples in total) was chosen to contain two types of dynamic behaviour of the system. The first part is characterised by small amplitude and variance of the surge (cases 50000-51400), the second part is characterised by large variations both in variance and the surge amplitude (values between –47 cm and 79 cm). Such a selection of the testing set was done in order to test the predicting capabilities of the trained local linear models for contrasting dynamic states of the system.
Non-linear methods
of prediction applied to univariate time series perform better than linear
ones (autocorrelation and ARIMA models) in predicting surge water levels in the
coastal zone. Results for prediction horizons from 20 minutes to 1 hour are
excellent (RMSE between 2.3 and 3.6 cm). Extending the prediction horizon to 2
hours showed that there is still enough local predictive information
embedded in the attractor of the system (RMSE around 5.5 cm). Finally, the
3-hour prediction showed that the local linear models are able to predict
the amplitudes of the surge correctly, in “stormy” situations as well, but with a
phase error (this pushed RMSE up to 6.1 cm). This phase error might be
systematic in nature; it may also stem from low-frequency periodic
components and, of course, from the linearity of the local model used.
Identification,
decomposition and removal of the components that produce the mentioned phase
error can be done by transforming from the “amplitude-time” domain into the
“frequency-time” domain using techniques such as wavelet analysis. In spite of the good
results achieved with chaos theory, the prediction horizon may be increased
further by including other hydrometeorological variables in the analysis.
Current research is aimed at using pattern recognition methods (Velickov
et al. 2000).
The performance of
chaos theory-related methods in predicting currents was also investigated, and other
data-driven techniques were applied as well. The restricted space of this paper
does not allow covering them here; they are to be addressed in the presentation.
Acknowledgements
The authors are grateful to the North Sea Directorate of the Rijkswaterstaat (Ministry of Transport, Public Works and Water Management, the Netherlands) for permission to use the measurement data and for partial financial support for this work. This work has been accomplished in the framework and with the partial financial support of the project "Data mining, knowledge discovery and data-driven modelling" of the Delft Cluster programme.
References
[1] Abarbanel,
H.D.I., “Analysis of Observed Chaotic Data”, Springer-Verlag, New York,
1996.
[2] Box, G. E. P.,
& Jenkins, G. M., “Time series analysis: Forecasting and control”, San
Francisco: Holden-Day, 1976.
[3] Fraser, A. M.
and H. L. Swinney, “Independent Coordinates for Strange Attractors from Mutual
Information”, Physical Review A 33
(2), pp. 1134-1140, 1986.
[4] Frison, T. W.,
H. D. I. Abarbanel, M. D. Earle, J. R. Schultz and W. Scherer, “Chaos and
predictability in ocean water levels”, Journal of Geophysical Research, 104 (4), pp. 7935-7951, 1999.
[5] Hense, A., “On
the possible existence of a strange attractor for the southern
oscillation”, Beitr. Phys. Atmosph. 60 (1),
34-37, 1987.
[6] Jayawardena,
A. W., Lai F., “Analysis and prediction of chaos in rainfall and stream flow
time series”, J. Hydrol. 153, pp. 23-52, 1994.
[7] Lorenz, E. N.,
“Deterministic nonperiodic flow”, J. Atmos. Sci., 20, pp.130-141, 1963.
[8] Rahman, M.
Analysis and prediction of chaotic timeseries. MSc Thesis, IHE/DHI, 1999.
[9] Sivakumar, B.,
Liong, S. Y., Liaw, C. Y., Phoon, K. K., “Singapore rainfall behaviour:
chaotic?”, J. Hydrol. Eng., ASCE 4
(1), pp. 38-48, 1999.
[10] Velickov, S.,
Price R.K., Solomatine D.P. and Yu, X. “Application of Data Mining Techniques
for Remote Sensing Image Analysis”. Proc. 4th Int. Conference on
Hydroinformatics, Iowa, USA, July 2000.
[11] Vaziri M.
Predicting Caspian sea surface water level by ANN and ARIMA models. J. of
Waterway, Port, Coastal and Ocean Engineering, July/August 1997, pp.
158-162.
[12] Wüst J.C. and
van Noort G.J.H.L. “Neural network current prediction for shipping guidance”.
Proceedings Conf. Oceans'94 OSATES, Brest, France, 13-16 September 1994, pp.
I-58 - I-63.
[13] Zaldivar, J.M., Gutierrez, E., Galvan, I.M., Strozzi, F. and Tomasin, A. “Forecasting high waters at Venice Lagoon using chaotic time series analysis and non-linear neural networks”. Journal of Hydroinformatics, vol. 2, No.1, pp. 61-84, 2000.