Application of Machine Learning Techniques to Flood Forecasting in the Upper Reach of the Huai River

Yunpeng XUE

2001

Abstract

Accurate flood forecasting is critical for flood management. A good understanding of catchment hydrology plays a central role here. It can be complemented by the appropriate use of physically-based models, data-driven models and other instruments of hydroinformatics. This study focuses on the application of novel data-driven (machine learning) techniques.

Machine learning techniques were applied to flood forecasting and classification in the upper reach of the Huai River, China. Special attention was given to the general performance and applicability of the so-called M5 model tree inductive learning technique in flood forecasting. This technique was compared with the multi-layer perceptron artificial neural network (ANN) using the same input and output data.

The M5 model tree algorithm is an extension of the decision tree inductive learning algorithm. First, the samples with numeric attributes are grouped into classes according to their similarity; then a local linear regression model is built for each class. The global non-linear relation is thus simulated through a set of simple local linear regression models. One advantage of the model tree technique is that the resulting rule-based rainfall-runoff model is easy for practitioners to understand and use. Furthermore, the classification and regression steps are very fast and always converge, so training a model tree is much faster than training an ANN, which can easily become trapped in a local minimum.
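The two steps described above (partition the samples, then fit a local linear model per partition) can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a single input attribute, a single split chosen by standard deviation reduction (the splitting criterion used by M5), no pruning or smoothing, and synthetic rainfall and runoff numbers invented purely for illustration. All function names are hypothetical.

```python
from statistics import pstdev

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x for the samples in one leaf."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0  # all x equal: fall back to a constant model
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def best_split(xs, ys, min_leaf=2):
    """Return the threshold with the largest standard deviation reduction."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total_sd = pstdev(ys)
    best_thr, best_sdr = None, 0.0
    for i in range(min_leaf, n - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical attribute values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        sdr = total_sd - (i / n) * pstdev(left) - ((n - i) / n) * pstdev(right)
        if sdr > best_sdr:
            best_thr = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_sdr = sdr
    return best_thr

def fit_model_tree(xs, ys):
    """One-level model tree: a split rule plus one linear model per leaf."""
    thr = best_split(xs, ys)
    if thr is None:
        return {"threshold": None, "left": fit_line(xs, ys), "right": None}
    left = [(x, y) for x, y in zip(xs, ys) if x <= thr]
    right = [(x, y) for x, y in zip(xs, ys) if x > thr]
    return {"threshold": thr,
            "left": fit_line(*zip(*left)),
            "right": fit_line(*zip(*right))}

def predict(tree, x):
    """Route x to its leaf and evaluate that leaf's linear model."""
    if tree["threshold"] is None or x <= tree["threshold"]:
        a, b = tree["left"]
    else:
        a, b = tree["right"]
    return a + b * x

# Synthetic data (NOT Huai River observations): two linear regimes.
rain = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
runoff = [0.0, 0.1, 0.2, 0.3, 0.4, 12.5, 13.0, 13.5, 14.0, 14.5]
tree = fit_model_tree(rain, runoff)
```

The fitted tree reads as a rule of the form "if rainfall <= threshold use one linear model, otherwise use the other", which is the kind of rule-based model the paragraph above describes as easy for practitioners to interpret.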

The experiments show that the M5 model tree is at least as good as the ANN in rainfall-runoff forecasting. Both techniques predict most flood events quite well, except for the peaks of some special flood events (mainly due to data-related problems). Further experiments show that the prediction accuracy of a model tree can be improved by a modular model approach: flood samples with special hydrological characteristics are selected first, and a model tree is then used to classify these data into more accurate linear regression models in the tree leaves. The hybrid model, a combination of a model tree and an ANN, gives the best prediction results.

Another machine learning method, the decision tree inductive learning algorithm C4.5, was used to predict 'flood' and 'non-flood' conditions one day ahead. It gives very good results on flood generation conditions, which suggests another way of selecting the criteria for building a modular model.
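As a rough illustration of how a C4.5-style tree separates 'flood' from 'non-flood' samples, the sketch below selects a threshold on one numeric attribute using the gain ratio. It is a simplification under stated assumptions: full C4.5 handles many attributes, builds the tree recursively, prunes it, and treats continuous thresholds slightly differently. The attribute (previous-day rainfall in mm), the sample values and the function names are all hypothetical.

```python
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    ent = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        ent -= p * log2(p)
    return ent

def best_threshold(values, labels):
    """Return (threshold, gain_ratio) for the best split value <= threshold."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between identical attribute values
        left = [c for _, c in pairs[:i]]
        right = [c for _, c in pairs[i:]]
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        # Split information penalises very unbalanced partitions.
        split_info = entropy(["L"] * i + ["R"] * (n - i))
        ratio = gain / split_info
        if ratio > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, ratio)
    return best

def classify(value, threshold):
    """Assumes larger attribute values indicate the 'flood' class."""
    return "flood" if value > threshold else "non-flood"

# Hypothetical previous-day rainfall (mm) with next-day condition labels.
rain_mm = [2, 5, 8, 12, 30, 45, 60, 80]
condition = ["non-flood"] * 4 + ["flood"] * 4
threshold, ratio = best_threshold(rain_mm, condition)
```

A rule learned this way could in principle serve as the selection criterion for a modular model, with samples on each side of the threshold passed to a separate regression model.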

Conclusions and recommendations are drawn regarding practical application and further improvements.
