The application of data mining techniques for the regionalisation of hydrological variables

Hall, M. J.; Minns, A. W.; Ashrafuzzaman, A. K. M.

doi:https://doi.org/10.5194/hess-6-685-2002

Articles | Volume 6, issue 4

© Author(s) 2002. This work is licensed under
the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.

31 Aug 2002

Abstract. Flood quantile estimation for ungauged catchment areas continues to be a routine problem faced by the practising Engineering Hydrologist, yet the hydrometric networks in many countries are reducing rather than expanding. The result is an increasing reliance on methods for regionalising hydrological variables. Among the most widely applied techniques is the Method of Residuals, an iterative method of classifying catchment areas by their geographical proximity based upon the application of Multiple Linear Regression Analysis (MLRA). Alternative classification techniques, such as cluster analysis, have also been applied but not on a routine basis. However, hydrological regionalisation can also be regarded as a problem in data mining: a search for useful knowledge and models embedded within large data sets. In particular, Artificial Neural Networks (ANNs) can be applied both to classify catchments according to their geomorphological and climatic characteristics and to relate flow quantiles to those characteristics. This approach has been applied to three data sets: one from the south-west of England and Wales; one from England, Wales and Scotland (EWS); and one from the islands of Java and Sumatra in Indonesia. The results demonstrated that hydrologically plausible clusters can be obtained under contrasting conditions of climate. The four classes of catchment found in the EWS data set were found to be compatible with the three classes identified in the earlier study of a smaller data set from south-west England and Wales. Relationships for the parameters of the at-site distribution of annual floods can be developed that are superior to those based upon MLRA in terms of root mean square errors of validation data sets. Indeed, the results from Java and Sumatra demonstrate a clear advantage in reduced root mean square error of the dependent flow variable through recognising the presence of three classes of catchment. Wider evaluation of this methodology is recommended.

Keywords: regionalisation, floods, catchment characteristics, data mining, artificial neural networks