Accurate prediction of surge and currents at the entrances to the main Dutch harbours is of vital importance for everyday navigation and decision making. The measurement station at Hook of Holland plays an essential role for the economically important harbour of Rotterdam. Water level predictions strongly influence the waiting time of deep-draft ships under all weather conditions, so more accurate and more reliable predictions can save the Dutch industry a lot of money. The large amount of data collected at multiple marine platforms and other locations makes it possible to apply novel data analysis techniques, generally known as data mining.
Data mining, also known as Knowledge Discovery in Databases (KDD), is a young interdisciplinary field of research to which statistics, database technology, machine learning, expert systems, artificial neural networks, chaos theory, and data visualisation all contribute. Owing to the wide availability of huge amounts of data in electronic form, and the pressing need to turn such data into useful information and knowledge for many applications in water-related problems, data mining has attracted the attention of water resources engineers in recent years.
This study explores several data mining techniques: basic statistical analysis, cluster analysis, classification and rule induction algorithms, and prediction using linear and non-linear models. ARIMA was used as the linear model; artificial neural networks and chaos theory were used as the non-linear models.
The practical part of this research focused on investigating the applicability of these data mining techniques to surge water level characterisation, classification and prediction at the Hook of Holland coastal station. A hydro-meteorological data set collected at the coast of The Netherlands was used (surge water level, wind speed, wind direction and air pressure for 6 years at ten-minute intervals). A basic statistical analysis and an investigation of the spatial and temporal relationships in this data set were carried out in order to characterise the surge water level time series. Cluster analysis, classification algorithms, and linear and non-linear prediction methods were applied to predict surge water levels (as classes and as real values). The chaos theory technique appeared to be best suited to this problem, allowing a 3-hour prediction with an RMSE of only 6.56 cm. In general, the goal and objectives of this research were achieved, and some recommendations are given that could serve as a starting point for further investigation.
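The abstract does not say which chaos theory method was used. As an illustration only, the sketch below assumes a common approach: time-delay embedding of the series followed by averaging the futures of the k nearest neighbours in the reconstructed phase space. The function names, the parameter values (dim, tau, k) and the synthetic signal are all hypothetical; the only figure taken from the text is the horizon, since 3 hours at ten-minute intervals corresponds to 18 steps. The resulting RMSE is for synthetic data and is not comparable to the 6.56 cm reported above.

```python
import numpy as np


def delay_embed(series, dim, tau):
    """Rows are delay vectors [x(t), x(t - tau), ..., x(t - (dim - 1) * tau)]."""
    n = len(series) - (dim - 1) * tau
    cols = [series[(dim - 1 - j) * tau:(dim - 1 - j) * tau + n] for j in range(dim)]
    return np.column_stack(cols)


def local_forecast(train, test, dim, tau, horizon, k):
    """Forecast `horizon` steps ahead for each delay vector built from `test`.

    Assumed method (not confirmed by the source): average the known futures
    of the k nearest training delay vectors.
    """
    offset = (dim - 1) * tau
    # Training delay vectors whose value `horizon` steps later is known.
    library = delay_embed(train, dim, tau)[:-horizon]
    futures = train[offset + horizon:]
    # Test delay vectors and the values we try to predict.
    queries = delay_embed(test, dim, tau)[:-horizon]
    truth = test[offset + horizon:]
    preds = np.empty(len(queries))
    for i, q in enumerate(queries):
        dist = np.linalg.norm(library - q, axis=1)
        nearest = np.argpartition(dist, k)[:k]  # indices of the k closest vectors
        preds[i] = futures[nearest].mean()
    return preds, truth


def rmse(pred, truth):
    """Root mean square error between two equally long arrays."""
    return float(np.sqrt(np.mean((pred - truth) ** 2)))


def demo():
    """Run the sketch on a synthetic noisy periodic signal (not real surge data)."""
    rng = np.random.default_rng(0)
    t = np.arange(3000)
    x = (np.sin(2 * np.pi * t / 75)
         + 0.3 * np.sin(4 * np.pi * t / 75)
         + 0.05 * rng.standard_normal(t.size))
    train, test = x[:2000], x[2000:]
    # horizon=18 steps = 3 hours at a ten-minute sampling interval.
    preds, truth = local_forecast(train, test, dim=4, tau=6, horizon=18, k=5)
    return rmse(preds, truth), float(np.std(truth))


if __name__ == "__main__":
    error, spread = demo()
    print(f"RMSE: {error:.3f}  (standard deviation of target: {spread:.3f})")
```

Comparing the RMSE with the standard deviation of the target gives a rough skill check: a forecast that always returned the mean would score an RMSE equal to that standard deviation.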