Towards a more representative parametrisation of hydrologic models via synthesizing the strengths of Particle Swarm Optimisation and Robust Parameter Estimation

The development of methods for estimating the parameters of hydrologic models considering uncertainties has been of high interest in hydrologic research over the last years. In particular methods which understand the estimation of hydrologic model parameters as a geometric search of a set of robust performing parameter vectors by application of the concept of data depth found growing research interest. B́ardossy and Singh (2008) presented a first Robust Parameter Estimation Method (ROPE) and applied it for the calibration of a conceptual rainfall-runoff model with daily time step. The basic idea of this algorithm is to identify a set of model parameter vectors with high model performance called good parameters and subsequently generate a set of parameter vectors with high data depth with respect to the first set. Both steps are repeated iteratively until a stopping criterion is met. The results estimated in this case study show the high potential of the principle of data depth to be used for the estimation of hydrologic model parameters. In this paper we present some further developments that address the most important shortcomings of the original ROPE approach. We developed a stratified depth based sampling approach that improves the sampling from non-elliptic and multi-modal distributions. It provides a higher efficiency for the sampling of deep points in parameter spaces with higher dimensionality. Another modification addresses the problem of a too strong shrinking of the estimated set of robust parameter vectors that might lead to overfitting for model calibration with a small amount of calibration data. This contradicts the principle of robustness. Therefore, we suggest to split the available calibration data into two sets and use one set to control the overfitting. All modifications were implemented into a further developed ROPE approach that is called Advanced Robust Parameter Estimation (AROPE). However, in this approach the estimation of the good parameters is still based on an ineffective Monte Carlo approach. Therefore we developed another approach called ROPE with Particle Swarm Optimisation (ROPE-PSO) that substitutes the Monte Carlo approach with a more effective and efficient approach based on Particle Swarm Optimisation. Two case studies demonstrate the improvements of the developed algorithms when compared with the first ROPE approach and two other classical optimisation approaches calibrating a process oriented hydrologic model with hourly time step. The focus of both case studies is on modelling flood events in a small c tchment characterised by extreme process dynamics. The calibration problem was repeated with higher dimensionality considering the uncertainty in the soil hydraulic parameters and another conceptual parameter of the soil module. We discuss the estimated results and propose further possibilities in order to apply ROPE as a well-founded parameter estimation and uncertainty analysis tool.

Abstract. The development of methods for estimating the parameters of hydrologic models considering uncertainties has been of high interest in hydrologic research over the last years. In particular methods which understand the estimation of hydrologic model parameters as a geometric search of a set of robust performing parameter vectors by application of the concept of data depth found growing research interest. Bárdossy and Singh (2008) presented a first Robust Parameter Estimation Method (ROPE) and applied it for the calibration of a conceptual rainfall-runoff model with daily time step. The basic idea of this algorithm is to identify a set of model parameter vectors with high model performance called good parameters and subsequently generate a set of parameter vectors with high data depth with respect to the first set. Both steps are repeated iteratively until a stopping criterion is met. The results estimated in this case study show the high potential of the principle of data depth to be used for the estimation of hydrologic model parameters. In this paper we present some further developments that address the most important shortcomings of the original ROPE approach. We developed a stratified depth based sampling approach that improves the sampling from non-elliptic and multi-modal distributions. It provides a higher efficiency for the sampling of deep points in parameter spaces with higher dimensionality. Another modification addresses the problem of a too strong shrinking of the estimated set of robust parameter vectors that might lead to overfitting for model calibration with a small amount of calibration data. This contradicts the principle of robustness. Therefore, we suggest to split the available calibration data into two sets and use one set to control the overfitting. All modifications were implemented into a further developed ROPE approach that is called Advanced Robust Parameter Estimation (AROPE). However, in this approach the estimation of the good parameters is still based on an ineffective Monte Carlo approach. Therefore we developed another approach called ROPE with Particle Swarm Optimisation (ROPE-PSO) that substitutes the Monte Carlo approach with a more effective and efficient approach based on Particle Swarm Optimisation. Two case studies demonstrate the improvements of the developed algorithms when compared with the first ROPE approach and two other classical optimisation approaches calibrating a process oriented hydrologic model with hourly time step. The focus of both case studies is on modelling flood events in a small catchment characterised by extreme process dynamics. The calibration problem was repeated with higher dimensionality considering the uncertainty in the soil hydraulic parameters and another conceptual parameter of the soil module. We discuss the estimated results and propose further possibilities in order to apply ROPE as a well-founded parameter estimation and uncertainty analysis tool.

Introduction
Hydrologic models are designed to approximate the general physical mechanism which govern the rainfall-runoff process within a specific catchment. This is why these models have found favour with many hydrologists and engineers in practice and research. Most of the hydrologic models are driven by a vector of model parameters. These parameters are supposed to be estimated in order to approximate the general system behaviour which governs the rainfall-runoff process within a specific catchment. In most cases the model

Published by Copernicus Publications on behalf of the European Geosciences Union.
parameters cannot be related to measurements in a direct way, but are supposed to be estimated through indirect methods such as calibration. In the process of calibration, the modeller adjusts the values of the model parameters such that the model is able to closely match the behaviour of the real system it is intended to represent. Hence the success of a model application is strongly dependent on a good estimation of the model parameters.
In the past models were calibrated by hand. This is very labour-intensive and requires an experienced modeller with profound hydrologic knowledge. Thus recently automatic methods for model calibration have evolved significantly (e.g. Duan et al., 1992;Gupta et al., 1998) and have found a common acceptance and broad use in the hydrologic community (e.g. Hogue et al., 2000;Cullmann, 2006;Kunstmann et al., 2006;Marx, 2007;Grundmann, 2010). The parameter estimation of hydrologic models is affected by numerous uncertainties. Beven and Binley (1992) described the probability to estimate the same model performance for different estimated parameter vectors as the equifinality problem. Recently developed approaches address this problem by estimating the uncertainty of the model parameter vectors considering uncertainties in the observations and the model structure. The uncertainty is often expressed by providing a set of optimal parameter vectors. One well established approach for a parameter estimation including uncertainty are the Markov Chain Monte Carlo (MCMC) methods (e.g. Vrugt et al., 2003bVrugt et al., , 2009aKuczera et al., 2006). These methods require the setup of a complete Bayesian uncertainty framework. One major advantage of such a framework is the possibility to describe all relevant sources of uncertainty in a closed form and consequently the estimation of mathematically well founded results. However, also for those kind of approaches a modeller has to make assumptions of all sources of uncertainty to be considered. Often these assumptions are quite arbitrary because the information for a well founded decision is not available. Subsequently these decisions have a non negligible influence on the results. Thus, the uncertainty estimates might get a rather subjective touch -a fact that contradicts the original intention of the application of a Bayesian framework. Furthermore in many real-world applications modellers call for purpose-specific objectives in calibration, this is difficult to integrate in the likelihood function Bayesian uncertainty framework. For example the formulation and implementation of a likelihood function considering both peak flow difference and the Nash-Suttcliffe efficiency is not straightforward. That is why also alternative approaches, e.g. the previously mentioned generalized likelihood uncertainty estimation (GLUE) or the robust parameter estimation approach (ROPE) (Bárdossy and Singh, 2008) attracted scientific interest in the hydrologic community. The development of these methods was accompanied by a strong debate in the hydrologic community regarding the requirements of an appropriate framework for uncertainty estimation. The results of recently published studies comparing the newly developed MCMC method DREAM with the GLUE framework suggest that "formal and informal Bayesian approaches have more common ground than the hydrologic literature and ongoing debate might suggest" (e.g. Vrugt et al., 2009b). Hence, the use of both Bayesian and non-Bayesian approaches might be reasonable regarding the requirements of a specific application.
Within this paper we will focus on a further development of the ROPE approach proposed by Bárdossy and Singh (2008). ROPE is a non-Bayesian approach that addresses the parameter and uncertainty estimation problem using the concept of data depth. Data depth is a statistical method used for multivariate data analysis which assigns a numeric value to a point with respect to a set of points based on its centrality. This provides the possibility of a center-outward orderings of points in Euclidean space of any dimension and opens up a new non-parametric multivariate statistical analysis method in which no distributional assumptions are needed. Recent studies of computational geometry and multivariate statistics (e.g. Liu et al., 2006;Bremner et al., 2008) showed that members that are in a geometrical central position with respect to a given point set or distribution, are more robust in order to represent the whole set. These points can be estimated applying the concept of data depth, which has recently attracted a lot of research interest in multivariate statistics and robust modelling (e.g. Cramer, 2003;Liu et al., 2006). This concept was basically adapted by an evolutionary parameter estimation method presented by Bárdossy and Singh (2008). They showed that it can be very useful for the estimation of robust hydrologic model parameters. This result was also found by preliminary studies with WaSiM in the Rietholzbach catchment (Pompe, 2009). In a simplified form the ROPE approach consists of two steps. In a first step a set of model parameters with good model performance is identified. According to Bárdossy and Singh (2008) these parameter vectors are from now on called the good parameter vectors. Thereafter a set of parameter vectors with high data depth with respect to the set of good parameter vectors is generated under the assumption that these parameter vectors are more likely to represent a robust solution than the complete set of good parameter vectors. Thus, they called this approach robust parameter estimation method. Within this scope the concept of robustness is related to the term of transferability. Thus, we call parameter vectors robust which not just lead to good model performance over a selected calibration time period but are transferable: they perform well for other time periods and might also perform well on other catchments. Such parameter vectors are more likely to lead to a hydrologically reasonable representation of the corresponding processes and are less sensitive. This means that small changes of the parameters should not lead to very different results on time periods.
The first results of the method provided by Bárdossy and Singh (2008) are very promising. Therefore we reviewed the presented methods and addressed some shortcomings of the Hydrol. Earth Syst. Sci., 16, 603-629, 2012 www.hydrol-earth-syst-sci.net/16/603/2012/ first approach by the implementation of more efficient and effective methods to sample deep parameter vectors and the introduction of well founded stopping criteria. All these ideas were assembled into a further developed version of the ROPE method, called Advanced Robust Parameter Estimation (A-ROPE). Nonetheless the A-ROPE method is still based upon the Monte Carlo method and consequently suffers from its main disadvantage, i.e. that it is slow and requires a large number of model runs to sample the feasible space with reasonable accuracy. The required number of samples increases exponentially with the number of considered parameters. That is in particular a problem for computationally intensive process-oriented models where the number of samples is strictly limited by the available computational capacity. However, the effectiveness of the depth based sampling of parameter vectors is highly dependent on the quality of the identified set of good parameter vectors. To overcome the shortcomings of the Monte Carlo method for the estimation of the good parameter vectors, we suggest to substitute it by an approved evolutionary search strategy for highdimensional parameter spaces. We are convinced that Particle Swarm Optimisation (PSO) is a suitable candidate for this task. PSO is a search strategy that bases on the concept of swarm intelligence in order to estimate the global optimum for a given single-objective optimisation problem. We modified the search strategy used in the normal PSO algorithm in order to identify a set of good parameter vectors with given tolerance. The modification adapts ideas coming from multiobjective PSO algorithms that also estimate a set of optimal parameters instead of just one global optimum. Afterwards the second step of the ROPE procedure, the depth based parameter sampling can be applied. The new approach merges the strength of PSO and depth based sampling. It is entitled ROPE with Particle Swarm Optimisation (ROPE-PSO). The remainder of this paper is organised as follows: After the introduction, we will introduce the concept of data depth and subsequently the ROPE method provided by Bárdossy and Singh (2008). This is followed by a presentation of the newly developed approaches A-ROPE and ROPE-PSO and the underlying concepts. The presentation of the algorithms is completed by a brief introduction of an approach provided by Grundmann (2010) that allows considering the uncertainty in soil hydraulic parameters for the calibration of hydrologic models. The general idea of this approach is important to understand the following case studies. We studied both algorithms calibrating a process-oriented hydrologic model with a high temporal resolution (hourly instead of daily time-step) in a catchment where the dominant processes have high dynamics. The focus of the model calibration is the modeling of flood events. The estimated results are discussed comparing them to estimates obtained by the first ROPE algorithm presented by Bárdossy and Singh (2008) and other automatic parameter estimation approaches. Krauße and Cullmann: Particle Swarm Optimisation and Robust Parameter Estimation 3 the Monte Carlo method and consequently suffers from its main disadvantage, i.e. that it is slow and requires a large number of model runs to sample the feasible space with reasonable accuracy. The required number of samples increases exponentially with the number of considered parameters. That is in particular a problem for computationally intensive process-oriented models where the number of samples is strictly limited by the available computational capacity. However, the effectiveness of the depth based sampling of parameter vectors is highly dependent on the quality of the identified set of good parameter vectors. To overcome the shortcomings of the Monte Carlo method for the estimation of the good parameter vectors, we suggest to substitute it by an approved evolutionary search strategy for highdimensional parameter spaces. We are convinced that Particle Swarm Optimisation (PSO) is a suitable candidate for this task. PSO is a search strategy that bases on the concept of swarm intelligence in order to estimate the global optimum for a given single-objective optimisation problem. We modified the search strategy used in the normal PSO algorithm in order to identify a set of good parameter vectors with given tolerance. The modification adapts ideas coming from multiobjective PSO algorithms that also estimate a set of optimal parameters instead of just one global optimum. Afterwards the second step of the ROPE procedure, the depth based parameter sampling can be applied. The new approach merges the strength of PSO and depth based sampling. It is entitled ROPE with Particle Swarm Optimisation (ROPE-PSO). The remainder of this paper is organised as follows: After the introduction, we will introduce the concept of data depth and subsequently the ROPE method provided by Bárdossy and Singh (2008). This is followed by a presentation of the newly developed approaches A-ROPE and ROPE-PSO and the underlying concepts. The presentation of the algorithms is completed by a brief introduction of an approach provided by Grundmann (2010) that allows considering the uncertainty in soil hydraulic parameters for the calibration of hydrologic models. The general idea of this approach is important to understand the following case studies. We studied both algorithms calibrating a process-oriented hydrologic model with a high temporal resolution (hourly instead of daily time-step) in a catchment where the dominant processes have high dynamics. The focus of the model calibration is the modeling of flood events. The estimated results are discussed comparing them to estimates obtained by the first ROPE algorithm presented by Bárdossy and Singh (2008) and other automatic parameter estimation approaches.
2 Parameter estimation using data depth metrics

Data depth
The algorithm applies the technique of data depth, a new approach used for multivariate data analysis that provides the Fig. 1: 2-dimensional point set shaded according to assigned depth. A darker point represents higher depth. The lines indicate convex hulls enclosing the 25%, 50%, 75% and 100% deepest points. The used depth function was halfspace depth. possibility to analyse, quantify and visualise data sets. Most proposed metrics used in data depth function are inherently geometric, with a numeric value assigned to each data point that represents its centrality within the given data set. Data depth is a statistical method used for multivariate data analysis which assigns a numeric value to a point with respect to a set of points based on its centrality. Tukey (1975) introduced this concept first in order to estimate the center of a multivariate dataset. A formal definition of an arbitrary depth function D for the d-dimensional space R d is given as follows: The following concepts apply to the data depth methodology and distinguish it from other statistical methods.
-Non-parametric methodology: Scientific measurements can be viewed as sample points drawn from some unknown probability distribution, where the analysis of the measurements involves computation of quantitative characteristics of the probability distribution (estimators), based on the data set. If the underlying distribution is known (for example normal distribution, lognormal distribution, Cauchy, etc.), the characteristics of the data can be computed using methods from classical statistics. However, in most real life experiments the underlying distribution is not known. The concept of data depth requires no assumption about the underlying distribution and data is analysed according to the relative position of the data points.
-Center-outward ordering of points: The data depth concept allows the creation of a multivariate analog to 2 Parameter estimation using data depth metrics

Data depth
The algorithm applies the technique of data depth, a new approach used for multivariate data analysis that provides the possibility to analyse, quantify and visualise data sets. Most proposed metrics used in data depth function are inherently geometric, with a numeric value assigned to each data point that represents its centrality within the given data set. Data depth is a statistical method used for multivariate data analysis which assigns a numeric value to a point with respect to a set of points based on its centrality. Tukey (1975) introduced this concept first in order to estimate the center of a multivariate dataset. The possibilities of the concept of data depth is illustrated in Fig. 1 using a small 2-dimensional synthetical data set. A formal definition of an arbitrary depth function D for the d-dimensional space R d is given as follows: The following concepts apply to the data depth methodology and distinguish it from other statistical methods.
-Non-parametric methodology: scientific measurements can be viewed as sample points drawn from some unknown probability distribution, where the analysis of the measurements involves computation of quantitative characteristics of the probability distribution (estimators), based on the data set. If the underlying distribution is known ( the underlying distribution is not known. The concept of data depth requires no assumption about the underlying distribution and data is analysed according to the relative position of the data points.
-Center-outward ordering of points: the data depth concept allows the creation of a multivariate analog to the univariate statistical analysis tool of rank statistics. Rank statistics is based on the ordering of onedimensional observations, where the order reflects extremeness, contiguity, variability or the effect of external contamination. In higher dimensions the order of multivariate data is not well defined, and several ordering methods were suggested. The data depth concept provides a method of extending order statistics to any dimension by ordering the points according to their depth values.
-Application to multivariate (high-dimensional) data sets: the concept of data depth is defined with respect to points in Euclidean space in any dimension, thus enabling the derivation of multivariate distributional characteristics of a data set.
-Robustness: in the statistical analysis of data sets, observations that deviate from the main part of the data (outliers) can have an undesirable influence on the analysis of the data. Many depth functions are robust against the possibility of several outliers that may occur in the data and yield nevertheless reasonable results. Tukey (1975) introduced this concept first with the definition of the halfspace depth. According to Donoho and Gasko (1992) the halfspace depth of an arbitrary point θ ∈ R d with respect to a d-dimensional data set Z = {z i = (z i1 ,··· ,z id ); i = 1,··· ,n} is defined as the smallest number of data points in any closed halfspace with boundary through θ. This is also called the Tukey or location depth, and it can be written as where u ranges over all vectors in R d with ||u|| = 1. Very often the halfspace depth is normalised by division with the number of points in the set Z: The first publication of Tukey (1975) was then followed by many generalizations and other definitions of this concept, e.g. convex-hull peeling depth, simplicial depth, regression depth and L1 depth. A good overview of a broad range of different definitions of the concept of data depth and its application for multivariate data analysis is given by Hugg et al. (2006) and Liu et al. (2006). A comprehensive study of different data depth measures in robust parameter estimation is provided in Krauße and Cullmann (2011b).

The ROPE algorithm
Algorithm 1 ROPE 1: Select d model parameters, to be considered for calibration and identify prior boundaries [x lb ,x ub ] for all selected parameters 2: n random parameter vectors forming the set X n are generated in the d-dimensional rectangle bounded by the defined boundaries. 3: repeat 4: The hydrologic model is run for each parameter vector in X n and the corresponding model performances are calculated 5: The subset X * n of the best performing parameters is identified. This might be for example the best 10 % of X n 6: m random parameter vectors forming the set Y m are generated, such that ∀θ ∈ Y m : D(θ |X * n ) ≥ L where L ≥ 1 7: The set Y m is relabeled as X n and steps 3-6 are repeated until . 8: until the performance corresponding to X n and Y m does not differ more than what one would expect from the observation errors 9: return Y m Bárdossy and Singh (2008) applied the principle of data depth in order to generate parameter vectors that are deep with respect to a previously identified set of good parameter vectors. The algorithm is called ROPE. All details are provided in the pseudocode listing in Algorithm 1. Note that the notation was marginally changed from Bárdossy and Singh (2008) in order to have a consistent syntax with other publications in the field of data depth. In principle the general proceeding of this algorithm, can be divided into three important parts. After handling and pre-processing the input, a set of good parameter vectors is identified (line 4 and 5). Afterwards a set of deep parameter vectors (w.r.t. the good ones) is generated (line 6). These two operations are evolutionary repeated and after each iteration a stopping criterion is checked (line 8). The iterated Monte Carlo sampling is used in order to circumvent the shortcoming of a normal Monte Carlo simulation in order to improve the sampling quality with a limited number of samples and simultaneously applying the principle of data depth. The general approach of the presented ROPE algorithm is well-founded. Bárdossy and Singh (2008) showed that ROPE might be very useful for the estimation of robust hydrologic model parameters. Bárdossy and Singh (2008) demonstrated the performance of the ROPE algorithm calibrating the conceptual hydrologic model HBV for a catchment in south-west Germany on a daily time step. The estimated results support the concept of depth based sampling. Parameter vectors with high data depth corresponded to a better transferability to other time periods. We could reproduce these results calibrating the process-oriented model WaSiM on a daily time step However, in further studies focussing on flood events we experienced problems in particular with the application of the latter two parts of the ROPE algorithm, the generation of deep parameter vectors and the exact definition of the stopping criterion. In the following we will give a brief overview of the problems and explain how the new A-ROPE algorithm addresses these shortcomings. One of the major premises of the application of the concept of data depth is the assumption that the set of good parameter vectors is geometrically well-structured. In concrete terms we rely on the assumption that the depth contours will be indicative of the shape of the cloud of good parameter vectors, while generating deep parameters. However, most existing depth measures are unable to cope with non-elliptic, non-convex or multimodal data sets (Hugg et al., 2006). This can affect the robustness of the estimated deep parameter vectors because the parameter space of most hydrologic models is dominated by distinct regions of attraction and non-convex multidimensional ridges (e.g. Duan et al., 1992;Sorooshian et al., 1993;Grundmann, 2010). Another issue is the efficiency of the depth based sampling. A very simple sampling strategy of candidates is a uniform sampling within the bounding box for the considered set of good parameter vectors. This strategy gets ineffective and computationally intensive for higher dimensions. That is due to the fact that the volume ratio of the bounding box to the set of parameter vectors itself decreases with rising dimension. This issue is illustrated by Fig. 2 where the ratio between the volume of the unit sphere and the unit cube is plotted. In addition, the computational complexity of most depth functions increases tremendously for higher dimensions. An approximation of the halfspace depth used in this paper can for instance be computed in polynomial time: O(md 3 + mdn). In this equation d denotes the number of dimensions, n is the number of points in the reference point set and m is the number of iterations that determine the accuracy of the results.

A-ROPE
To overcome these problems we propose to substitute the uniform sampling of deep parameter vectors with an alternative stratified sampling strategy that samples candidate points that are more likely to have a high data depth. In order to do this, a Gaussian mixture model is fit to the underlying set of good parameter vectors. This is done using an expectation maximization (EM) algorithm that assigns posterior probabilities to each component density with respect to each point in the set. The number of components can either be set due to prior knowledge or estimated by the EM algorithm. Afterwards the candidate points can be sampled from a Gaussian mixture model that is an approximation of the set of the reference point set. Consequently good parameter vectors are more likely to be deep with respect to the reference set. In addition this offers advantages to depth-based sampling from non-elliptic or multimodal distributions. Most existing depth measures are unable to cope with non-convex or However, in further studies focussing on flood events we experienced problems in particular with the application of the latter two parts of the ROPE algorithm, the generation of deep parameter vectors and the exact definition of the stopping criterion.
In the following we will give a brief overview of the problems and explain how the new A-ROPE algorithm addresses these shortcomings. One of the major premises of the application of the concept of data depth is the assumption that the set of good parameter vectors is geometrically well-structured. In concrete terms we rely on the assumption that the depth contours will be indicative of the shape of the cloud of good parameter vectors, while generating deep parameters. However, most existing depth measures are unable to cope with non-elliptic, non-convex or multimodal data sets (Hugg et al., 2006). This can affect the robustness of the estimated deep parameter vectors because the parameter space of most hydrologic models is dominated by distinct regions of attraction and non-convex multidimensional ridges (e.g Duan et al., 1992;Sorooshian et al., 1993;Grundmann, 2010). Another issue is the efficiency of the depth based sampling. A very simple sampling strategy of candidates is a uniform sampling within the bounding box for the considered set of good parameter vectors. This strategy gets ineffective and computationally intensive for higher dimensions. That is due to the fact that the volume ratio of the bounding box to the set of parameter vectors itself decreases with rising dimension. multimodal data sets (Hugg et al., 2006). A clustering of such data sets and a subsequent sampling from single clusters provides the possibility to filter points that are enclosed by the convex hull of the data set, but are actually located outside. The new strategy, entitled GenDeep, is provided in pseudocode in Algorithm 2. Further details of this approach and a number of case studies are provided in Krauße and Cullmann (2011b). Just for a small insight we present the result of a case study discussed in that paper where we performed the depth-based sampling with respect to a previously sampled data set following a banana-shaped and multi-modal distribution (cf. Fig. 3).

Algorithm 2 GenDeep
1: Perform a cluster analysis on the set of good parameter vectors X * n , e.g. with the expectation maximization (EM) algorithm according to Dempster et al. (1977), which identifies the most probable number of clusters k in X * n and assigns all members of the set X * n to one (in case of ambiguity also to more than one) of the clusters c i , where i ∈ {1,...,k}.
A third issue of the ROPE algorithm is the loosely defined stopping criterion: "until the performance corresponding to X n and Y m does not differ more than what one would expect from the observation errors" (Bárdossy andSingh, 2008, p. 1280). The problem is that there are countless possibilities in the prior estimation of the tolerance in the model performance due to uncertainty in the observation data and it  ties in the prior estimation of the tolerance in the model performance due to uncertainty in the observation data and it can hardly be determined exactly. A broad definition of this tolerance can lead to sets with inferior model performance, whereas a tighter tolerance can easily result in overfitting. This is a severe shortcoming because it undermines the actual goals of the algorithm. Overfitting in the context of robust parameter estimation means that the model performance on the calibration data still can be increased by further shrinking the estimated set of the deep model parameter vectors, whereas the model performance on (reasonably similar) control data decreases by further shrinking. Fig. 4 illustrates this fact with the results of the calibration of WaSiM in the Rietholzbach catchment w.r.t. to flood events. The FloodSkill criterion was used as objective and the flood events no.4 and no.14 were used as calibration and control data, respectively. It is evident that the model performance on the control data considerably decreases from iteration 3 whereas the model performance on the calibration data could be increased by further iterations.
To address the problem of overfitting, we implemented two changes to the algorithm. First, we slightly changed the evolutionary shrinking of the generated deep parameter vectors. To avoid the unintended exclusion of possibly robust parameter vectors close to the boundary of the initial set of good model parameters, we suggest merging the set of generated deep parameter vectors and the identified good parameter vectors as initial set for the next iteration, as follows: Second, we suggest the splitting of the data used for model calibration in a calibration and a control set. The calibration set is used for the actual model calibration, whereas the control set is just used to supervise the control process in order to avoid overfitting. In each iteration of the algorithm the model performance is estimated both on the calibration and control set. The moment the performance does not improve anymore for the control set, the algorithm is stopped. This kind of   Bárdossy and Singh (2008); Flood event no. 4 was used for calibration and event no. 14 was used as control set approach is a state of the art method in the supervised training of artificial neural networks in order to avoid overfitting (Tetko et al., 1995). The splitting of the calibration data can be done according to an approach based on self-organising maps (SOM) (May et al., 2010) or any other possible strategy. In the case of limited calibration data as for the calibration of hydrological models for flood forecasting we suggest a subjective balanced split, e.g. such that the both calibration and control set contain small and large flood events and all types of characteristics. All these improvements were integrated into a new approach, we call Advanced Robust Parameter Estimation (A-ROPE). A flowchart of this advanced Monte Carlo-Depth Fig. 3. Two-dimensional data sets (crosses) and corresponding deep points estimated by uniform sampling (blue dots) and stratified sampling with GenDeep (red dots) using the halfspace depth. The datasets were sampled from a non-elliptic banana-shaped distribution (left) and a Gaussian mixture model (right).
can hardly be determined exactly. A broad definition of this tolerance can lead to sets with inferior model performance, whereas a tighter tolerance can easily result in overfitting. This is a severe shortcoming because it undermines the actual goals of the algorithm. Overfitting in the context of robust parameter estimation means that the model performance on the calibration data still can be increased by further shrinking the estimated set of the deep model parameter vectors, whereas the model performance on (reasonably similar) control data decreases by further shrinking. Figure 4 illustrates this fact with the results of the calibration of WaSiM in the Rietholzbach catchment w.r.t. to flood events. The FloodSkill criterion was used as objective and the flood events no. 4 and no. 14 were used as calibration and control data, respectively. It is evident that the model performance on the control data considerably decreases from iteration 3 whereas the model performance on the calibration data could be increased by further iterations.
To address the problem of overfitting, we implemented two changes to the algorithm. First, we slightly changed the evolutionary shrinking of the generated deep parameter vectors. To avoid the unintended exclusion of possibly robust parameter vectors close to the boundary of the initial set of good model parameters, we suggest merging the set of generated deep parameter vectors and the identified good parameter vectors as initial set for the next iteration, as follows: Second, we suggest the splitting of the data used for model calibration in a calibration and a control set. The calibration set is used for the actual model calibration, whereas the control set is just used to supervise the control process in order to avoid overfitting. In each iteration of the algorithm the model performance is estimated both on the calibration and control set. The moment the performance does not improve anymore for the control set, the algorithm is stopped. This kind of approach is a state of the art method in the supervised training of artificial neural networks in order to avoid overfitting 6 Krauße and Cullmann: Particle Swarm Optimisation and Robust Parameter Estimation Fig. 3: Two-dimensional data sets (crosses) and corresponding deep points estimated by uniform sampling (blue dots) and stratified sampling with GenDeep (red dots) using the halfspace depth. The datasets were sampled from a non-elliptic bananashaped distribution (left) and a Gaussian mixture model (right).
ties in the prior estimation of the tolerance in the model performance due to uncertainty in the observation data and it can hardly be determined exactly. A broad definition of this tolerance can lead to sets with inferior model performance, whereas a tighter tolerance can easily result in overfitting. This is a severe shortcoming because it undermines the actual goals of the algorithm. Overfitting in the context of robust parameter estimation means that the model performance on the calibration data still can be increased by further shrinking the estimated set of the deep model parameter vectors, whereas the model performance on (reasonably similar) control data decreases by further shrinking. Fig. 4 illustrates this fact with the results of the calibration of WaSiM in the Rietholzbach catchment w.r.t. to flood events. The FloodSkill criterion was used as objective and the flood events no.4 and no.14 were used as calibration and control data, respectively. It is evident that the model performance on the control data considerably decreases from iteration 3 whereas the model performance on the calibration data could be increased by further iterations.
To address the problem of overfitting, we implemented two changes to the algorithm. First, we slightly changed the evolutionary shrinking of the generated deep parameter vectors. To avoid the unintended exclusion of possibly robust parameter vectors close to the boundary of the initial set of good model parameters, we suggest merging the set of generated deep parameter vectors and the identified good parameter vectors as initial set for the next iteration, as follows: Second, we suggest the splitting of the data used for model calibration in a calibration and a control set. The calibration set is used for the actual model calibration, whereas the control set is just used to supervise the control process in order to avoid overfitting. In each iteration of the algorithm the model performance is estimated both on the calibration and control set. The moment the performance does not improve anymore for the control set, the algorithm is stopped. This kind of   Bárdossy and Singh (2008); Flood event no. 4 was used for calibration and event no. 14 was used as control set approach is a state of the art method in the supervised training of artificial neural networks in order to avoid overfitting (Tetko et al., 1995). The splitting of the calibration data can be done according to an approach based on self-organising maps (SOM) (May et al., 2010) or any other possible strategy. In the case of limited calibration data as for the calibration of hydrological models for flood forecasting we suggest a subjective balanced split, e.g. such that the both calibration and control set contain small and large flood events and all types of characteristics. All these improvements were integrated into a new approach, we call Advanced Robust Parameter Estimation (A-ROPE). A flowchart of this advanced Monte Carlo-Depth  Bárdossy and Singh (2008); flood event no. 4 was used for calibration and event no. 14 was used as control set. (Tetko et al., 1995). The splitting of the calibration data can be done according to an approach based on self-organising maps (SOM) (May et al., 2010) or any other possible strategy. In the case of limited calibration data as for the calibration of hydrological models for flood forecasting we suggest a subjective balanced split, e.g. such that the both calibration and control set contain small and large flood events and all types of characteristics.
All these improvements were integrated into a new approach, we call Advanced Robust Parameter Estimation (A-ROPE). A flowchart of this advanced Monte Carlo-Depth based sampling approach is provided in Fig. 5 Sample n parameter vectors in the feasible space, Θ, according to the given priors and constraints: The model m h is run for the calibration and control set; the corresponding model performances are calculated by a purpose-specific objective function f : A subset X n * ⊂ X n of the good performing parameter vectors in X n is identified, such that X * n comprises all parameter vectors in X n better than a given percentile b. b is dynamically adapted from 0.1 at the beginning up to 0.9:

Identify good
Generate a set of deep parameters Y m w.r.t. X * n GenDeep strategy

Generate robust
Stopping criterion satisfied?
(2) The spread of the model performances gets smaller than tol f ?
(3) Improvement on calibration data gets smaller than a tolerance?
(4) Improvement on control data decreases w.r.t previous iteration?
Output: Y m =set of robust parameter vectors ζ=calibration performance  and velocity v i = 0 5: end for 6: while stop criteria not met do 7: for all particles i do 8: update local best positionx i as best position found so far from the particle with index i 9: update global best positionǧ and corresponding fitness value g as best position found so far from the whole swarm 10: end for 11: for all particles i do 12: update velocity using equation update position using equation end for 15: end while 3 Merging the strengths of swarm intelligence and depth based parameter sampling

Particle Swarm Optimisation
An approved search and optimisation strategy for highdimensional parameter spaces is the Particle Swarm Optimisation which was first presented by Kennedy and Eberhart (1995). It is a population based search algorithm which tries to solve an optimisation problem with arbitrary dimensionality by having a population (swarm) of candidate solutions, here called particles. The performance of each particle is computed and afterwards these particles are moved around in the search-space. The movement is guided by the best found positions of each of the particles and the currently found global best solution. Often those optimisation problems are formulated as the problem of finding the minimum of a function. Algorithm 3 gives a simple version of a PSO algorithm for the minimisation of a function f with upper and lower boundaries x lb and x ub respectively. A good introduction into the ideas of swarm intelligence and further reading is given in Kennedy et al. (2001).

Robust parameter estimation applying Particle Swarm Optimisation: ROPE-PSO
The so far presented depth based parameter estimation algorithms rely on the Monte Carlo method in order to identify a set of good parameter vectors with a given tolerance. Hence the proposed method suffers from the shortcomings of the Monte Carlo method, namely a slow convergence and therefore a large number of samples are needed to estimate a stable solution. This is a major disadvantage for the calibration of computationally intensive process-oriented models. Thus in real-world application the maximum number of model runs has to be limited to a computationally feasible maximum. We try to overcome this problem by substituting the Monte Carlo based estimation of a set of good model parameter vectors with a PSO based approach. Although PSO and other evolutionary optimisation approaches were not designed as uncertainty analysis methods they can be adapted in order to be used for a uncertainty quantification with a given tolerance (Mohamed et al., 2010). One possibility to do this is to integrate a Markov Chain element and use them within the scope of a Bayesian framework. Vrugt et al. spent a lot of effort on the development of effective and efficient algorithms following this approach, e.g. SCEM-UA (Vrugt et al., 2003b) and DREAM (Vrugt et al., 2009a). Another approach is to store all so far found parameter vectors within a given uncertainty limit to an archive and direct the search accordingly. One well-known example is the use of PSO algorithms for the approximation of a Pareto optimal set of parameter vectors as the solution of a multi-objective calibration problem, e.g. as provided by Gill et al. (2006).

Algorithm 4 VPAC operator
1: pick random numbers φ 1 ,φ 2 ∼ U (0,1) 2: update positions using equations We will focus on the latter kind of approach. Thus, we developed an algorithm that thoroughly but economically explores the space and stores all solutions within a given uncertainty tolerance to an archive. The basis of our algorithm is a modified version of the PSO presented by Settles and Soule (2005) which is actually a hybrid between a genetic algorithm (GA) and PSO. Thus, we call this algorithm PSO-GA u . It can be controlled by a parameter ψ that is called breeding ratio. This parameter determines the proportion of the population that is not moved according to PSO but is transformed using the GA. Thus, values for the breeding ratio parameter range from [0 − 1]. Settles and Soule (2005)  for all particles θ i ∈ X n do 9: evaluate model performance f (θ i ) 10: update the local bestθ i as the best position found so far from the particle with index i in case that f (θ i )better than f(ǧ + tol f ) 11: end for 12: update global best positionǧ as best position found so far from the whole swarm 13: add all current positions with a performance better than f (ǧ) + tol f to the archive X * 14: remove all solutions with a performance worse than f (ǧ) + tol f from the archive X * // GA 15: n GA ← ψ · n; discard n GA particles from the population while preserving the 10% best (elitism) 16: init empty genetic offspring X GA ← ∅ 17: for i = 1 to n GA 2 do 18: select a pair {θ 1 ,θ 2 } from the population by tournament selection 19: apply the VPAC operator to generate new offspring: {θ 1 ,θ 2 } ← VPAC({θ 1 ,θ 2 }); consider that this notation includes an update of the velocities and personal best according to the VPAC operator 20: end for // PSO 22: assign to each particle a random memberǧ i ∈ X * as a "personal" global best 23: for all particles θ i ∈ X n do 24: update velocity using equation merge the population with genetic offspring X n ← X n ∪ X GA 28: end while 29: return the set X * as an approximation of the distribution of good parameter vectors within the uncertainty bounds defined by tol f would be with an even mix of both GA and PSO. However, other values for the breeding ratio may provide better results depending on the characteristics of the considered calibration problem. For further details we refer to Settles and Soule (2005) and referred literature. The evolution of the particles by the GA is done using the following approach: from the pool of possible breeding particles candidates are nominated by tournament selection and recombined. In order to do this they introduced the Velocity Propelled Averaged Crossover (VPAC) operator. The goal is to create two child particles whose position is between the parent's position, but accelerated away from the parent's current direction (negative velocity) in order to increase diversity in the popula-tion. Algorithm 4 shows how the new child position vectors and velocities are calculated using VPAC. The child particles retain their parent's velocity vector. The previous best vector is set to the new position vector, restarting the child's memory. Towards the end of a typical PSO run, the population tends to be highly concentrated in a small portion of the search space, effectively reducing the search space. With the addition of the VPAC crossover operator, a portion of the population is always pushed away from the group, increasing the diversity of the population and the effective search space. The general movement of the PSO part is a standard implementation that can be controlled by the following parameters: the particle inertia weight ω determines the velocity of the proper motion of the particles, the cognitive attraction φ 1 that controls the degree of movement towards the local optimum and the social attraction φ 2 doing the same with respect to the global optimum of the swarm. We set the algorithm's parameters according to literature recommendations (cf. Table 1). For further details and studies regarding the setting of the algorithm's parameters refer to Perez and Behdinan (2007), Settles (2005) that also provide references to additional literature and materials. The used stopping criterion is either a fixed number of iteration steps that has to be set according to the given problem or a maximum number of members in the set of good parameter vectors. Another option might be a check that assesses the stability of the estimated set. We suggest to carry out some test runs with different limits and check the stability of the estimated parameter vectors. For further details regarding this issue we refer to Gill et al. (2006) and Cabrera and Coello (2010).
Algorithm 6 ROPE-PSO 1: Execute the PSO based PSO-GA u procedure to estimate a set of good model parameter vectors X * with a model performance within a given tolerance tol f 2: Apply the GenDeep algorithm to sample a set of parameter vectors Y with high data depth w.r.t. X * 3: return Y An important difference is done considering the local and global optimum of the swarm. Contrary to normal PSO algorithms the PSO-GA u algorithm does not just account for one global optimum, but for a set of good parameter vectors. Good parameter vectors are all points evaluated so far that correspond to a model performance better than the global optimum found so far plus a given uncertainty tolerance tol f which must be set with regards to the specific problem. For environmental model calibration tol f should be set according to the accuracy of the used observations and other sources of uncertainty to be considered. To ensure a sufficient but nonetheless economical cover of the feasible space not just towards the global optimum the PSO-GA u algorithm follows an idea that is used in multi-objective PSO algorithms: all so far found good parameter vectors are stored in an archive X * . An important difference between a normal PSO algorithm considers the movement at the end of each iteration. Instead of moving the whole swarm towards the so far found global best position, the algorithm assigns to each particle one random member of the archiveǧ i ∈ X * . We call this position "personal" global best (cf. Algorithm 5, line 14). Another issue affects the update of the local best positionx i . Unlike in a normal PSO, it will only be changed in case that the old local optimum corresponds to a model performance worse than the global optimum plus the uncertainty bound. This prevents a too small shrinkage of the swarm and ensures that the algorithm not just searches into the direction of the so far found global optimum but samples from the whole region within the given tolerance. Consider that this algorithm can also prevent overfitting just by adding another objective function that assesses the performance on a control set. In this case the position of a particle is considered to be better in case that it shows an improvement on both the calibration and the control data. This basic idea can be used to advance this method into a full multi-objective calibration procedure. A pseudocode listing of the complete proposed PSO-GA u algorithm is provided in Algorithm 5.
The set of good parameter vectors estimated by PSO-GA u should not only comprise the global optimum but cover the complete region within the given uncertainty bounds. That is an issue in so far that PSO, i.e. the underlying algorithm of ROPE-PSO, was unlike Monte Carlo not designed to be used as an uncertainty analysis method. Therefore we studied the sets of good parameter vectors estimated by PSO-GA u and checked whether they follow the same distribution as those estimated by iterative Monte Carlo used in the ROPE and A-ROPE algorithm. We demonstrate this issue using the Rosenbrock function as defined in Eq. (5) as an example. It is a smooth single extremum test function that represents the existence of large flat regions on the error surface. This is quite typical for hydrologic models (Duan et al., 1992). We set a (subjective) tolerance value of 0.2 and applied the iterative Monte Carlo method from in both ROPE and A-ROPE, and the PSO-GA u algorithm in order to estimate a set of good parameter vectors within the given tolerance.
The estimated scatter plots are provided in Fig. 6. The dark grey regions represent the true distribution of parameter vectors with a function value less than the given tolerance. It was estimated by a Monte Carlo simulation using Latin hypercube sampling with a number of 1000.000 samples in the target region. Considering the fact that the global optimum of the Rosenbrock function is zero at the position θ opt = {1,1} this region matches exactly the targeted distribution. It is evident that the parameters provided by PSO-GA u cover the complete region of good parameter vectors, suggesting that the modified PSO approach provides a correct estimation of the target distribution and hence does not collapse to a small region comprising the estimated global minimum. The developed PSO-GA u approach can be easily used to substitute the Monte Carlo based approach in a robust parameter estimation algorithm. The new approach called Robust Parameter Estimation using PSO (ROPE-PSO) applies PSO-GA u in order to obtain a set of good parameter vectors X * with a given uncertainty. Afterwards a set of deep parameter vectors with respect to X * is sampled using the previously introduced by the GenDeep sampling strategy. A pseudocode listing of the developed ROPE-PSO approach is given in Algorithm 6. The proposed approach methodology can be extended to be used for an uncertainty analysis Monte Carlo method from in both ROPE and A-ROPE, and the PSO-GA u algorithm in order to estimate a set of good parameter vectors within the given tolerance.
The estimated scatter plots are provided in Fig. 6. The dark grey regions represent the true distribution of parameter vectors with a function value less than the given tolerance. It was estimated by a Monte Carlo simulation using Latin hypercube sampling with a number of 1000.000 samples in the target region. Considering the fact that the global optimum of the Rosenbrock function is zero at the position θ opt = {1,1} this region matches exactly the targeted distribution. It is evident that the parameters provided by PSO-GA u cover the complete region of good parameter vectors, suggesting that the modified PSO approach provides a correct estimation of the target distribution and hence does not collapse to a small region comprising the estimated global minimum.
The developed PSO-GA u approach can be easily used to substitute the Monte Carlo based approach in a robust parameter estimation algorithm. The new approach called Robust Parameter Estimation using PSO (ROPE-PSO) applies PSO-GA u in order to obtain a set of good parameter vectors X * with a given uncertainty. Afterwards a set of deep parameter vectors with respect to X * is sampled using the previously introduced by the GenDeep sampling strategy. A pseudocode listing of the developed ROPE-PSO approach is given in Algorithm 6. The proposed approach methodology can be extended to be used for an uncertainty analysis within a proper statistical context, by relating the likelihood of the parameter vectors to their depth. However further research is required to complete this task (e.g. Bárdossy and Singh, 2008).
The algorithm was implemented in a robust parameter estimation framework which comprises other published ROPE approaches. The implementation was done in the MATLAB programming language. It is open source and available from the author.

Accounting for uncertain soil information on hydrologic parameter estimation
The soil hydraulic parameters determine the water retention and conductivity curves and thus govern the process of water movement in the unsaturated zone. For this reason they also influence the generation of direct runoff and interflow in a hydrologic model. In many studies the soil hydraulic parameters are considered as physically based parameters and are used as fixed values. Often those values are simply estimated by applying a pedotransfer function to physical soil properties, e.g. the distribution of the grain-size fractions, humus content and bulk density. Typically the soil information is given in a classified form which provides a possible range of the physical soil properties referring to the used classification system. This information is often visualised in a soil texture triangle. However, in most cases the pedotransfer function is just applied to the mean value for the considered soil type. The uncertainty due to classified soil information is typically not considered. However, neglecting this uncertainty can influence the accuracy and uncertainty of the estimation of other conceptual model parameters. Grundmann (2010) studied this problem and presented a new method to account for those uncertainties. The proposed approach has a high degree of general applicability because it is independent of the used soil classification system and the used pedotransfer function. Their approach can be summarized by the following algorithmic steps: 1. Identify the lower and upper boundaries of the grainsize fractions for each pre-dominant soil type in the within a proper statistical context, by relating the likelihood of the parameter vectors to their depth. However further research is required to complete this task (e.g. Bárdossy and Singh, 2008). The algorithm was implemented in a robust parameter estimation framework which comprises other published ROPE approaches. The implementation was done in the MATLAB programming language. It is open source and available from the author.

Accounting for uncertain soil information on hydrologic parameter estimation
The soil hydraulic parameters determine the water retention and conductivity curves and thus govern the process of water movement in the unsaturated zone. For this reason they also influence the generation of direct runoff and interflow in a hydrologic model. In many studies the soil hydraulic parameters are considered as physically based parameters and are used as fixed values. Often those values are simply estimated by applying a pedotransfer function to physical soil properties, e.g. the distribution of the grain-size fractions, humus content and bulk density. Typically the soil information is given in a classified form which provides a possible range of the physical soil properties referring to the used classification system. This information is often visualised in a soil texture triangle. However, in most cases the pedotransfer function is just applied to the mean value for the considered soil type. The uncertainty due to classified soil information is typically not considered. However, neglecting this uncertainty can influence the accuracy and uncertainty of the estimation of other conceptual model parameters. Grundmann (2010) studied this problem and presented a new method to account for those uncertainties. The proposed approach has a high degree of general applicability because it is independent of the used soil classification system and the used pedotransfer function. Their approach can be summarized by the following algorithmic steps: 1. Identify the lower and upper boundaries of the grainsize fractions for each pre-dominant soil type in the catchment, according to the given soil information and classification system.
2. For each considered soil type, draw a set of possible samples of the grain-size fractions by uniformed sampling (uniform distribution) over the identified range.
3. Apply a suitable pedotransfer function to each sample in order to estimate a set of soil hydraulic parameters describing their prior distribution.
4. The estimated parameters can be scaled to a scaling parameter β using a similar media concept in order to reduce their dimensionality. A suitable approach is presented in Warrick et al. (1977). The scaling parameter β can now be considered for calibration.
The estimated distribution of the scaling parameters is a wellfounded a priori estimate of the uncertainty in the soil hydraulic parameters and can be used to study the influence of the uncertain soil information on the simulation results of hydrologic models. Furthermore this information can be used as a well-founded prior distribution for a subsequent model calibration considering the soil hydraulic parameters as calibration parameters. This approach contradicts the physically based philosophy of the Richards equation model. On the other hand, this opens the possibility to adapt this model to the processes in the catchment. We are convinced that this is a reasonable approach because the soil hydraulic parameters can hardly be determined with the necessary precision on a catchment scale just be using soil information maps. During calibration the scaling parameter β is adjusted to suitable values. The model however cannot directly be driven with β but with a vector of soil hydraulic parameters. They can be derived from β using a mapping or rescaling procedure as follows. First, the mapping assigns a soil hydraulic parameter vector from the previously generated set to every  ies (Cullmann, 2006;Pompe, 2009;Grundmann, 2010) these three parameters have been proven to be sensitive with respect to modelling flood events. Besides the specified upper and lower boundaries of the model parameters, the additional constraint k i ≥ 1.05 k d was introduced in order to account for the basic consideration that the direct runoff from a cell has a shorter travel time to the catchment outlet than the generated interflow in the unsaturated zone. The reference parameter vector θ wb estimated for water-balance simulations was used to estimate reasonable pre-event model states.
Within our case studies we assume that the influence of observation errors in temperature measurements is negligible for the simulation of flood events whereas the uncertainty of the measured precipitation can be expressed by an ensem- ble. To keep the problem still computationally feasible we do not consider the influence on the estimated parameter sets due to the uncertainties in the observed precipitation and just use the ensemble mean for the model calibration, but just focus on the influence on the observation errors of the measured discharge. Following the assumptions of Bárdossy and Singh (2008) we assume an accuracy of the measured discharge q obs (t) of 5%. Thus, the real but unknown discharge q(t) can be written as: with (t) being a random error. This random error is due to uncertainties of the rating curve, non-uniqueness of the stage discharge relationship, changes of the cross section etc. (Bárdossy and Singh, 2008). As many other authors (e.g. Kuczera et al., 2006;Bárdossy and Singh, 2008) we assume that this error obeys a normal distribution with a standard deviation of the measurement accuracy: N (0,0.05). For each observed discharge time series we used this model and produced an ensemble with 100 members. This ensemble can be used in order to assess the uncertainty due to observation errors. It also opens the possibility to assess the uncertainty of the results of classical optimisation algorithms as follows. For such an algorithm we calibrate the model with respect to every single ensemble member of the set of possible observa- estimated scaling parameter and stores this information into a lookup table. For a new β value that is not stored in the database the rescaling selects the vectors of soil hydraulic parameters from the previously generated database with the closest corresponding scaling value. Grundmann (2010) used the estimated prior distribution and estimated the posteriori distribution of the soil hydraulic parameters in the context of a Bayesian framework. The parameter estimation framework used in this paper is no closed Bayesian uncertainty framework. The priori estimates serve however as a wellfounded starting population of the evolutionary approach estimating the good model parameter vectors. The mapping used in the rescaling ensures that the soil hydraulic parameters cannot leave the variation intervals given by the prior distributions. Consider once again that we want to adapt the soil hydraulic parameters to the catchment characteristics. Therefore we do not use a probability-possibility transformation and propagate the prior uncertainty described by the possibility distribution through the model. For further information on this approach we refer to Grundmann (2010) and Warrick et al. (1977).  ies (Cullmann, 2006;Pompe, 2009;Grundmann, 2010) these three parameters have been proven to be sensitive with respect to modelling flood events. Besides the specified upper and lower boundaries of the model parameters, the additional constraint k i ≥ 1.05 k d was introduced in order to account for the basic consideration that the direct runoff from a cell has a shorter travel time to the catchment outlet than the generated interflow in the unsaturated zone. The reference parameter vector θ wb estimated for water-balance simulations was used to estimate reasonable pre-event model states.
Within our case studies we assume that the influence of observation errors in temperature measurements is negligible for the simulation of flood events whereas the uncertainty of the measured precipitation can be expressed by an ensem- ble. To keep the problem still computationally feasible we do not consider the influence on the estimated parameter sets due to the uncertainties in the observed precipitation and just use the ensemble mean for the model calibration, but just focus on the influence on the observation errors of the measured discharge. Following the assumptions of Bárdossy and Singh (2008) we assume an accuracy of the measured discharge q obs (t) of 5%. Thus, the real but unknown discharge q(t) can be written as: with (t) being a random error. This random error is due to uncertainties of the rating curve, non-uniqueness of the stage discharge relationship, changes of the cross section etc. (Bárdossy and Singh, 2008). As many other authors (e.g. Kuczera et al., 2006;Bárdossy and Singh, 2008) we assume that this error obeys a normal distribution with a standard deviation of the measurement accuracy: N (0,0.05). For each observed discharge time series we used this model and produced an ensemble with 100 members. This ensemble can be used in order to assess the uncertainty due to observation errors. It also opens the possibility to assess the uncertainty of the results of classical optimisation algorithms as follows. For such an algorithm we calibrate the model with respect to every single ensemble member of the set of possible observa- In two real world case studies the developed approach is tested on the calibration of the hydrologic model WaSiM-ETH/6.4 (in the further referred to as WaSiM). WaSiM is a spatial distributed process-oriented rainfall-runoff model and was developed by Schulla (1997) at the ETH Zurich. WaSiM has been used successfully for modeling the rainfall-runoff processes in several studies in catchments located within mid mountain ranges (e.g Grundmann, 2010) and especially also in the pre-alpine Rietholzbach catchment Gurtz et al. (1999Gurtz et al. ( , 2003b; Krauße and Cullmann (2011b). Furthermore WaSiM-ETH has been used for extrapolation of extreme flood events by Cullmann (2006). For this study we used the version with the approach according to Richards for the simulation of the unsaturated zone. An overview of the model structure is given in Fig. 7. For further details of the model, we refer to Schulla and Jasper (2007) and the official website of the model http://www.wasim.ch. The model was calibrated focussing on flood events in the small prealpine Rietholzbach catchment (3.18 km 2 ). As a sub-basin of the Thur catchment it is located in the north-east of Switzerland. A 3-D-view of the catchment area is given in Fig. 8. This basin has been observed as a research catchment by the ETH Zurich since 1975. Continuous hourly measurements have taken place since 1981. For the case studies presented in this paper we used a time series consisting 27 yr of meteorological (temperature, precipitation, global radiation, and wind speed) and discharge measurements. Due to its longterm observation as a research catchment and its limited size, the Rietholzbach catchment has a long record of hourly data sets and the perturbing impact of data heterogeneity is relatively small in this catchment. The data we based our study upon is a time series consisting 27 yr of meteorological and discharge measurements. Out of this time series we selected a set of 24 flood events with a peak flow of at least 1 mm h −1 . All events are in the time from May until October to avoid the problem of modeling snow accumulation Hydrol. Earth Syst. Sci., 16, 603-629, 2012 www.hydrol-earth-syst-sci.net/16/603/2012/                         and melting processes. An overview of all selected 24 flood events with their specific characteristics and pre-conditions is given in Table 2. A significant number of studies have been conducted in this basin. For further information we refer to Gurtz et al. (1999); Zappa (2002) and the website http://www.iac.ethz.ch/research/rietholzbach. Table 3 gives the model parameters considered for calibration. Those are the storage coefficients of direct runoff and interflow, k d and k i , and the drainage density dr which is a scaling parameter of interflow generation. In previous studies (Cullmann, 2006;Pompe, 2009;Grundmann, 2010) these three parameters have been proven to be sensitive with respect to modelling flood events. Besides the specified upper and lower boundaries of the model parameters, the additional constraint k i ≥ 1.05 k d was introduced in order to account for the basic consideration that the direct runoff from a cell has a shorter travel time to the catchment outlet than the generated interflow in the unsaturated zone. The reference parameter vector θ wb estimated for water-balance simulations was used to estimate reasonable pre-event model states.
Within our case studies we assume that the influence of observation errors in temperature measurements is negligible for the simulation of flood events whereas the uncertainty of the measured precipitation can be expressed by an ensemble. To keep the problem still computationally feasible we do not consider the influence on the estimated parameter sets due to the uncertainties in the observed precipitation and just use the ensemble mean for the model calibration, but just focus on the influence on the observation errors of the measured discharge. Following the assumptions of Bárdossy and Singh (2008) we assume an accuracy of the measured discharge q obs (t) of 5%. Thus, the real but unknown discharge q(t) can be written as: with (t) being a random error. This random error is due to uncertainties of the rating curve, non-uniqueness of the stage discharge relationship, changes of the cross section etc. (Bárdossy and Singh, 2008). As many other authors (e.g. Kuczera et al., 2006;Bárdossy and Singh, 2008) we assume that this error obeys a normal distribution with a standard deviation of the measurement accuracy: N (0,0.05). For each observed discharge time series we used this model and produced an ensemble with 100 members. This ensemble can be used in order to assess the uncertainty due to observation errors. It also opens the possibility to assess the uncertainty of the results of classical optimisation algorithms as follows.
For such an algorithm we calibrate the model with respect to every single ensemble member of the set of possible observations. All estimated best parameter vectors are merged into a set that expresses the uncertainty due to the observation errors. For further details refer to Bárdossy and Singh (2008).
To ensure comparability the same should be done for the different ROPE algorithms either. However, in order to reduce the computation time, we previously checked the stability of Table 4. Objective functions used in this study, where x i and y i (θ ) are the observed and simulated discharge (by the parameter vector θ) at time-step i respectively and n is the number of observation points FloodSkill NS − rPD the estimates of the considered ROPE algorithms for a subset of the discharge ensemble members and subsequently just used the ensemble mean for calibration. A successful calibration requires the definition of a performance criterion that quantifies the matching with the intended objective. Focussing on flood events a suitable performance criterion should quantify both the model's ability to provide a good estimate of the peak flow values and to provide a behaviour that is at least in fundamentals similar to the catchment behaviour. A good measure to asses the the first point is the relative peak flow deviation (rPD) which is defined as the absolute value of the relative deviation of the simulated with respect to the observed peak flow value. The general quality of the fit of the model and the catchment behaviour can be assessed by a global performance criterion, e.g. the efficiency criterion according to Nash and Sutcliffe (1970) (NS). It has been widely used to quantify the global performance of hydrologic models. We combined both the Nash efficiency and the relative peak flow deviation into an aggregated performance criterion, we call Flood-Skill. The higher the FloodSkill the better is the model's ability to represent the catchment's behaviour focussing on flood events. A perfect fit corresponds to a FloodSkill of one. A least acceptable model with Nash value of 0.5 and a relative peak flow deviation of 0.5 correspond to a Flood-Skill of zero 1 . A formal definition of the FloodSkill and the previously introduced criteria is given in Table 4.

Case study I: Comparison of ROPE, A-ROPE and ROPE-PSO considering just the conceptual model parameters
In a first case study we studied the developed ROPE approaches on the basis of a first real world case study. The original ROPE approach and the two classical optimisation algorithms, the interior-point method (IPM) according to  ,2,3,4,5,7,8,9,10,12,... 13,15,16,17,19,21,22,23} Waltz et al. (2006) which is a gradient based method and a genetic algorithm (GA) according to Conn et al. (1997) served as additional benchmarks. Nonetheless the focus always remains on A-ROPE and ROPE-PSO. The conceptual model parameters of the model WaSiM were calibrated focussing on flood events. Out of the mentioned set of 24 flood events, three events were used for calibration, three for overfitting control and the remaining events were used for validation (cf. Table 5). Both the calibration and control set contain three events covering the possible range of peak flow values in our database of 24 flood events. The ROPE and A-ROPE were set with a population size of 2500 2 and a maximum of five iterations. For PSO-GA u we set the population size to 50. These settings were set after a small number of test runs with different population sizes. The used objective was the proposed FloodSkill criterion. Before deep parameters were generated we validated all good parameter vectors and studied the relationship between their corresponding data depth and model performance on the validation data. An overview of the results is provided in Fig. 9. Especially for the results estimated by ROPE-PSO the correlation between the calibration objective, i.e. the FloodSkill criterion, and the data depth is clearly positive. The correlation for the results estimated by the Monte Carlo based ROPE algorithms is less intense. This a first hint that the set estimated by ROPE-PSO is a stable solution and the application of depth based sampling makes sense in order to improve the calibration results. The scatterplot given in Fig. 10 where each parameter vector of the estimated set is shaded according to its validation performance shows that the parameter vectors with worse model performance are particularly located at the boundary of the estimated set. However, this conclusion does not hold true for the single performance criteria the FloodSkill consists of 3 . As shown in Fig. 11, the optimal regions for this criteria are even located on opposite sides of the estimated set. This is an indication of a clear tradeoff between the two objective criteria rPD and problems of the model to represent the global system behaviour of the catchment and a good representation of the peak flow values. This may also be due to the relatively coarse time step considering the small catchment size. In this case a multi-objective calibration might be useful, but this is in the scope of future research. In a second step deep parameter vectors were sampled with respect to the estimated set of good parameter vectors. The statistics of the estimated samples in comparison with those estimated by ROPE and A-ROPE are given in Table 6. Although both sets overlap each other, it is evident that the mean value of the storage coefficients k d and k i of the ROPE-PSO estimates are considerably higher than those estimated by A-ROPE. However, the correlation between the two most sensitive model parameters focussing on flood events, k d and dr is approximately the same for both algorithms. The standard deviation for the ROPE-PSO estimates is smaller than the estimates of the Monte Carlo based estimates or all considered model parameters except for the less sensitive parameter k i . We assume that the iterative Monte Carlo based sampling cannot identify the region with the best parameters as exact as the PSO based ROPE-PSO due to its limited sample size (2500 samples per iteration) and the subjective fixed boundary 4 that is used for the determination of the best parameter in the individual iteration steps. Due to a coarse resolution in the first iterations, this might exclude regions at the outer boundary of the good parameters that might contain better parameters. An adaptive selection of the boundary might solve this problem for the Monte Carlo based search. The validation results of the estimated parameter vectors are shown in Table 7. The ROPE-PSO estimates provide a slightly better validation performance than those estimated by A-ROPE. ROPE-PSO also outperforms all other compared algorithms compared in the case study of Krauße and Cullmann (2011a). In comparison with the classic optimisation algorithms IPM and GA the model performances achieved by the depth based estimates in validation are not just better in mean but show a clearly smaller standard deviation. Among themselves the variances of ROPE, A-ROPE and ROPE-PSO are approximately in the same range.
These results indicate that the estimated solution is robust and transferable. The proposed algorithm also converges faster with less parameter vectors to be evaluated. We studied the stability of the solution estimated by ROPE-PSO for a maximum number of model evaluations of 3000, 1000 and 500. According to this test we set the limit to 1000 whereas the Monte Carlo based A-ROPE made full use of the 10 000 parameter evaluations. This advantage in terms of computational efficiency gets even more weight considering that one parameter evaluation takes approximately three minutes on a standard CPU 5 . Table 6. Mean value (Mean), standard deviation (Std), coefficient of variation (CV), minimum, maximum and correlation coefficients between the generated samples for the conceptual model parameters estimated by ROPE (a), A-ROPE (b) and ROPE-PSO (c).   6: Mean value (Mean), standard deviation (Std), coefficient of variation (CV), minimum, maximum and correlation coefficients between the generated samples for the conceptual model parameters estimated by ROPE (a), A-ROPE (b) and ROPE-PSO (c).  the single performance criteria the FloodSkill consists of 3 . As shown in Fig. 11, the optimal regions for this criteria are even located on opposite sides of the estimated set. This is an indication of a clear tradeoff between the two objective criteria rPD and problems of the model to represent the global system behaviour of the catchment and a good representation of the peak flow values. This may also be due to the relatively coarse time step considering the small catchment size. In this case a multi-objective calibration might be useful, but this is in the scope of future research. In a sec-3 These criteria are NS and rPD. ond step deep parameter vectors were sampled with respect to the estimated set of good parameter vectors. The statistics of the estimated samples in comparison with those estimated by ROPE and A-ROPE are given in Table 6. Although both sets overlap each other, it is evident that the mean value of the storage coefficients k d and k i of the ROPE-PSO estimates are considerably higher than those estimated by A-ROPE. However, the correlation between the two most sensitive model parameters focussing on flood events, k d and dr is approximately the same for both algorithms. The standard deviation for the ROPE-PSO estimates is smaller than the estimates

Case study II: Calibrating WaSiM for flood forecasting considering the uncertainty of the soil hydraulic parameters
In a second case study we calibrated WaSiM again, however this time we additionally considered the uncertainty in the soil hydraulic parameters in the model calibration. As already introduced in Sect. 4 the uncertainty due to coarse soil information and the resulting uncertainty in the soil hydraulic parameters can have a tremendous influence on the model uncertainty in the case of flood events. In a preliminary study with WaSiM in the Rietholzbach catchment (e.g. Seifert, 2010) we could prove this conclusion. From the five predominant soil types in the basin (Table 8), we found the soil hydraulic parameters of SL and SiL and the soil parameter k rec to be sensitive referring to the simulated discharge for flood events. k rec defines the gradient of the saturated conductivity k s with increasing soil depth and has a valid range between 0.01 and 1. Its default value estimated for water balance calibration runs was set to 0.1. This value of k rec is used to estimate reasonable pre-event model states. We considered both k rec and the introduced soil hydraulic parameters together with the conceptual model parameters for model calibration.
In a first step prior to the actual model calibration we estimated the prior uncertainty in the soil hydraulic parameters of the soils SL and SiL according to the approach developed in Grundmann (2010). As a results we obtained a set of 10 000 soil hydraulic parameter vectors for each soil. Following the ideas of this approach we mapped the computed set of soil hydraulic parameter vectors to two scaling parameters β LS and β SiL , one for each soil. The scaling was Hydrol. Earth Syst. Sci., 16, 603-629

17
FloodSkill     Table 8. Expectation values of the physical properties of the prevailing soil types in the Rietholzbach catchment, classified according to USDA, and corresponding soil hydraulic parameters; the parameterisation of the soil hydraulic parameters is done for each soil according to the approach provided in Grundmann (2010) by the help of the pedotransfer functions provided in Wösten et al. (1999) and Brakensiek et al. (1984); the expectation values are the mean over 10 000 realisations done using the algorithm provided in Warrick et al. (1977). The prior uncertainty of the soil hydraulic parameters with best fits for Gaussian (N ), logarithmic Gaussian (logN ), Gamma ( ) and bimodal Gaussian (GM) distribution is given in Fig. 12. The fitted distributions are just an additional information in order to show that the estimated prior distributions could also be described by commonly used distribution functions. Consider that the residual water content θ r was constantly 0.01 for both SL and SiL with a deviation of less than 10e − 14 and thus is not shown in the plots. The distribution of the saturated conductivities has the maximum density in the range of the lowest possible values and is characterised by a high skewness. The other soil hydraulic parameters according to the Mualem -Van Genuchten model have distributions which can be approximated by a normal distribution. The distribution of the corresponding scaling parameters is given in Fig. 13. It is evident that the distribution of the scaling parameters is strongly influenced by the distribution of the saturated conductivities. This is due to the relatively high spread of this parameter. The conceptual and the soil parameters form a six dimensional calibration problem with the model parameters {k d ,k i ,dr,β SL ,β SiL ,k rec } to be estimated. Considering that the error surface of WaSiM is very bumpy this is already a challenging calibration problem, especially for Monte Carlo based approach. Again we estimated the parameters of WaSiM with ROPE, A-ROPE and ROPE-PSO. The Monte Carlo based algorithms were limited to a maximum of 10 000 model parameter vector evaluations whereas the ROPE-PSO was tested with both a maximum of 2000, 3000, 5000 and 10 000 model runs. There was not much difference between the estimates of the different runs. This is why we set the maximum to 3000 in order to save computing time. This is another confirmation of the computational efficiency of the ROPE-PSO approach.
The distribution of the parameter vectors estimated by A-ROPE and ROPE-PSO is given in Fig. 14. Furthermore the distribution of the soil hydraulic parameters corresponding to the scaling parameters β SL and β SiL estimated by both algorithms are given in Fig. 15. The soil hydraulic parameters corresponding to the identified distribution of the scaling parameter parameters were generated using the rescaling procedure provided in Sect. 4. For further details we refer to Warrick et al. (1977) and Grundmann (2010  conductivity in the soil model. Preferential flow in macropores can become a dominant process within the Rietholzbach catchment (Germann, 1981). However, the Richards equation implemented in WaSiM just describes the process of matrix flow in the unsaturated zone and is not able to describe the process of preferential flow. With WaSiM it can either be modelled by adjusting the saturated conductivities in the soil module to higher values or by adapting the conceptual model parameters, in particular the parameters k d and k rec , in order to represent this phenomena. From a process-oriented point of view it might be better to "blame" the preferential flow on the conceptual model parameters instead of trying to describe a physically completely different process by a physically based model, i.e. trying to fit the Richards equation to represent preferential flow and matrix flow instead of just matrix flow. In general the distribution of the parameters estimated ROPE-PSO has a much smaller spread than the ones estimated by A-ROPE. This suggests that the Monte Carlo based algorithm cannot identify the region with the best model performance as the PSO based ROPE-PSO. Consequently the region comprising the good parameter vectors, i.e. the best plus a given uncertainty tolerance, gets larger and might be less robust, i.e. less transferable to other flood events, which means the validation set.
The calibration performance of all considered algorithms is given in Fig. 16. The results of PSO-GA u (before deep samples were drawn) are given for an additional comparison. The calibration performance results are better than the results for the calibration in the previous case study just considering the conceptual model parameters. That result is not surprising. The better fit in the calibration might just be due to a larger number of free model parameters and has to be confirmed on the validation data in order to be considered as improvement. A better model fit is always possible with more free model parameters, but just makes sense if the estimated parameters can be transferred on other time periods or events. Referring to the mean estimated FloodSkill the original ROPE with a value of 0.43 is slightly outperformed by the A-ROPE (0.46) and a bit more by the ROPE-PSO algorithm (0.48). The corresponding uncertainty intervals according to the uncertainty in the observations are nearly the same for all algorithms. They correspond to a bandwidth of the FloodSkill of approximately 0.1. Hence the PSO provides advantages in finding the region with the best model performance, however the differences are just marginal. Consider however that the slightly worse model performance of the A-ROPE estimates correspond to a much wider distribution of the model parameters as already discussed based on the results provided in Fig. 14. Furthermore it is evident that the deep parameter vectors estimated by ROPE-PSO in comparison with PSO-GA u do not have a better model performance on the calibration data. The deep parameter vectors just show a small decrease of the standard deviation of the corresponding model performances.
The transferability of the estimated parameters and thus show that the parameter vectors estimated by ROPE-PSO are distributed over a considerably smaller region than those estimated by A-ROPE. The estimated parameter distributions even indicate that the ROPE-PSO estimates form a subregion of the large region described by the parameter vectors estimated by A-ROPE. This suggests that the PSO based search strategy in ROPE-PSO can more precisely identify the region in the parameter space that corresponds to the highest model performance. Consequently the upper boundary of the least model performance defined by the assumed measurement uncertainty gets higher and the region of good model parameter vectors can be more precisely identified. Another issue concerns the distribution of the soil hydraulic parameters only. By means of comparison of observed and sim-   Fig. 13. Prior distribution of the scaling paramters β SL and β SiL ulated discharge, the parameter estimation algorithms try to reject soil hydraulic parameter vectors that are not suitable to represent the catchment's behaviour and identify a distribution with the most suitable model parameters. It is obvious that the spread of the distributions of the soil hydraulic parameters compared to their prior uncertainty gets smaller for both algorithms. Furthermore it stands out that the mean ks of the ROPE-PSO estimates is smaller than the prior expectancy whereas the mean ks of the A-ROPE values is higher than the prior value. In terms of the model that means that ROPE-PSO identifies parameter vectors that try to simulate just the slow matrix flow in the unsaturated zone whereas faster runoff processes in the unsaturated zone, e.g. preferential flow, are approximated by a fit of the conceptual model parameters controlling direct runoff. Probably that is also the reason why the less sensitive conceptual parameter k i can be much better identified than in the previous case study. In contrast A-ROPE identifies parameter sets which try to represent the faster components by a higher saturated conductivity in the soil model. Preferential flow in macropores can become a dominant process within the Rietholzbach catchment (Germann, 1981). However, the Richards equation implemented in WaSiM just describes the process of matrix flow in the unsaturated zone and is not able to describe the process of preferential flow. With WaSiM it can either be modelled by adjusting the saturated conductivities in the soil module to higher values or by adapting the conceptual model parameters, in particular the parameters k d and k rec , in order to represent this phenomena. From a process-oriented point of view it might be better to "blame" the preferential flow on the conceptual model parameters instead of trying to describe a physically completely different process by a physically based model, i.e. trying to fit the Richards equation to represent preferential flow and matrix flow instead of just matrix flow. In general the distribution of the parameters estimated ROPE-PSO has a much smaller spread than the ones estimated by A-ROPE. This suggests that the Monte Carlo based algorithm cannot identify the region with the best model performance as the PSO based ROPE-PSO. Consequently the region comprising the good parameter vectors, i.e. the best plus a given uncertainty tolerance, gets larger and might be less robust, i.e. less transferable to other flood events, which means the validation set. The calibration performance of all considered algorithms is given in Fig. 16. The results of PSO-GA u (before deep samples were drawn) are given for an additional comparison. The calibration performance results are better than the results for the calibration in the previous case study just considering the conceptual model parameters. That result is not surprising. The better fit in the calibration might just be due to a larger number of free model parameters and has to be confirmed on the validation data in order to be considered as improvement. A better model fit is always possible with more free model parameters, but just makes sense if the estimated parameters can be transferred on other time periods or events. Referring to the mean estimated FloodSkill the original ROPE with a value of 0.43 is slightly outperformed by the A-ROPE (0.46) and a bit more by the ROPE-PSO algorithm (0.48). The corresponding uncertainty intervals according to the uncertainty in the observations are nearly the same for all algorithms. They correspond to a bandwidth of the Flood-Skill of approximately 0.1. Hence the PSO provides advantages in finding the region with the best model performance, however the differences are just marginal. Consider however that the slightly worse model performance of the A-ROPE estimates correspond to a much wider distribution of the model parameters as already discussed based on the results provided in Fig. 14. Furthermore it is evident that the deep parameter vectors estimated by ROPE-PSO in comparison with PSO-GA u do not have a better model performance on the calibration data. The deep parameter vectors just show a small decrease of the standard deviation of the corresponding model performances.
The transferability of the estimated parameters and thus the performance of the algorithms they are generated by can just be shown using the validation data. An overview of the estimated model performances on the validation events 6 is given in Fig. 17. Regardless of the used parameter estimation procedure, the achieved model performance on the validation data is better than the calibration with the conceptual model parameters only. This result confirms the results of Grundmann (2010), that a consideration of the uncertainty in the soil hydraulic parameters for flood events can improve the performance of a process-oriented hydrologic model when focus lies on modeling on flood events. Although the A-ROPE provides a slight advantage over the original ROPE approach, both Monte Carlo based algorithms ROPE and A-ROPE are clearly outperformed by the PSO based PSO-GA u and related ROPE-PSO estimates. This suggests that the previously discussed advantages of the PSO approach providing a better global optimum and a consequently smaller region of good parameter vectors correspond to a better robustness. That means that the set of estimated parameters contains less outliers with clearly below-average performance. Consid-6 For details refer to Table 5 ering the limited number of maximum function evaluations used for the ROPE-PSO algorithm these results also confirm the efficiency of the PSO based approach. This is not surprising but should be taken into consideration when choosing between the Monte Carlo based and PSO based ROPE algorithms. Referring to the results of PSO-GA u and ROPE-PSO it is obvious that the parameters with high data depth do not have just a marginal better model performance on the validation data but also much less outliers on the side of the distribution corresponding to a lower model performance. For instance the worst overall FloodSkill for PSO-GA u is 0.18 whereas it is 0.26 after the depth based sampling, i.e. for ROPE-PSO. These results show the advantages of the depth based sampling, namely the possibility to filter out parameter vectors corresponding to a more volatile and consequently lower model performance. However a comparison of the Monte Carlo based algorithms (ROPE and A-ROPE) with the PSO based ROPE-PSO also show the limits of this approach. The performance of the sampled deep parameter vectors requires both an optimal sampling of the set of good parameter vectors and an effective sampling of deep parameters with respect to this set. This result is also supported by a clearly lower standard deviation of the ROPE-PSO results referring to all compared performance criteria. The better accuracy of the ROPE-PSO estimates together with less negative outliers is also reflected by a reduced model uncertainty. That means that not just the parameter uncertainty but also the complete model uncertainty can be tremendously reduced. Figure 18 shows the hydrographs and the corresponding parameter and model uncertainties for both algorithms. The model errors were computed by two normal distribution fitted on the positive and negative discharge errors, transformed with the normal quantile transformation (NQT) (Krzysztofowicz, 1997) according to a method presented by Engeland et al. (2010).

Discussion and conclusions
-This paper presents two new depth based parameter estimation method, that are well suited for the robust calibration of hydrologic models considering uncertainties. The Advanced Robust Parameter Estimation (A-ROPE), is a modified version of the depth based parameter estimation procedure presented by Bárdossy and Singh (2008    the performance of the algorithms they are generated by can just be shown using the validation data. An overview of the estimated model performances on the validation events 6 is given in Fig. 17. Regardless of the used parameter estimation procedure, the achieved model performance on the validation data is better than the calibration with the conceptual model parameters only. This result confirms the results of Grundmann (2010), that a consideration of the uncertainty in the soil hydraulic parameters for flood events can improve the performance of a process-oriented hydrologic model when focus lies on modeling on flood events. Although the A-ROPE provides a slight advantage over the original ROPE approach, both Monte Carlo based algorithms ROPE and A-ROPE are clearly outperformed by the PSO based PSO-GA u and related ROPE-PSO estimates. This suggests that the pre-6 For details refer to Table 5 viously discussed advantages of the PSO approach providing a better global optimum and a consequently smaller region of good parameter vectors correspond to a better robustness. That means that the set of estimated parameters contains less outliers with clearly below-average performance. Considering the limited number of maximum function evaluations used for the ROPE-PSO algorithm these results also confirm the efficiency of the PSO based approach. This is not surprising but should be taken into consideration when choosing between the Monte Carlo based and PSO based ROPE algorithms. Referring to the results of PSO-GA u and ROPE-PSO it is obvious that the parameters with high data depth do not have just a marginal better model performance on the validation data but also much less outliers on the side of the distribution corresponding to a lower model performance. For instance the worst overall FloodSkill for PSO-GA u is 0.18 whereas it is 0.26 after the depth based sampling, i.e. for of good parameter vectors by a newly developed algorithm, entitled PSO-GA u . We study the effectiveness of the newly developed algorithms in two case studies calibrating a process-oriented hydrologic model focussing on flood events. The results are compared with estimates generated by the original ROPE algorithm and two stats-of-the-art optimisation algorithms.
-In a first case study we compared the original ROPE, and the newly developed A-ROPE and ROPE-PSO approaches estimating three conceptual model parameters of the model WaSiM. We study the effect of observation uncertainty and confirm the results of Bárdossy and Singh (2008): the parameter vectors estimated by classic optimisation algorithms can lead to very different results in the validation and are not robust. Parameter vectors with equal model performance on the calibration data can lead to very different results in validation. Considering a set of identified parameters with good model performance on the calibration data, members with shallow data depth near the boundary are sensitive to small changes and have a less probability to perform well on other time periods than solutions with high depth. The depth based parameter estimation approaches can identify a set of parameter vectors that shows a clearly better performance in the validation with tighter variation intervals, i.e. less outliers. In comparison with the original ROPE, the modifications of the Monte Carlo based A-ROPE provide slight advantages in terms of the validation performance. These results are even outperformed by the substitution of the Monte Carlo based sampling by ROPE-PSO approach where the good parameters are determined using an adapted PSO strategy.
-In a further case study we increased the number of considered model parameters. The additional parameters allow to account for the uncertainty in the soil hydraulic  The performance of the sampled deep parameter vectors requires both an optimal sampling of the set of good parameter vectors and an effective sampling of deep parameters with respect to this set. This result is also supported by a clearly lower standard deviation of the ROPE-PSO results referring to all compared performance criteria. The better accuracy of the ROPE-PSO estimates together with less negative outliers is also reflected by a reduced model uncertainty. That means that not just the parameter uncertainty but also the complete model uncertainty can be tremendously reduced. Fig. 18 shows the hydrographs and the corresponding parameter and model uncertainties for both algorithms. The model errors were computed by two normal distribution fitted on the positive and negative discharge errors, transformed with the normal quantile transformation (NQT) (Krzysztofowicz, 1997) according to a method presented by Engeland et al. (2010).
6 Discussion and conclusions -This paper presents two new depth based parameter estimation method, that are well suited for the robust calibration of hydrologic models considering uncertainties. The Advanced Robust Parameter Estimation (A-ROPE), is a modified version of the depth based parameter estimation procedure presented by Bárdossy and Singh (2008). There are two differences between the A-ROPE algorithm and the original ROPE algorithm. The further development enables us sampling from different non-convex regions of attraction and at the same time preventing the algorithm from overfitting. The second algorithm is a PSO based Robust Parameter Estimation algorithm algorithm, entitled Robust Parameter Estimation with Particle Swarm Optimisation (ROPE-PSO). The major difference between ROPE-PSO and the previously presented ROPE algorithm is a substitution of the Monte Carlo based approach for the identification parameters and an additional conceptual parameter of the soil module in WaSiM. The results of this case study show the limits of the Monte Carlo based ROPE and A-ROPE approaches for problems with a nonsmooth parameter surfaces with large flat areas in higher dimensions. Nonetheless the modifications in A-ROPE help to achieve an improved set of robust solutions. In this case study the effective and efficient PSO based search strategy in ROPE-PSO can show its full potential estimating an optimal approximation of the set of good parameter vectors. This is an important pre-requisite for the effectivity of the depth based sampling. The PSO based strategy can identify a much more concentrated set of good parameter vectors with the same tolerance interval on the calibration data. As a consequence the final deep solutions estimated by ROPE-PSO outperform the ROPE and A-ROPE estimates by lengths.
-The case studies in this paper revealed that the used hydrologic model WaSiM is not able to represent the correct peak flow values and the global catchment behaviour in terms of the streamflow at the catchment outlet with the same parameter vectors equally well. The tradeoff in these two objectives that are important for the modelling of flood events suggests the application of a multi-objective calibration strategy. Consider that the presented algorithm can be easily altered to a general multi-objective parameter estimation procedure.
-The application of data depth metrics can help to identify sets of robust parameter vectors. In general parameters with low data depth are near the boundary of a set of good model performance in the calibration are sensitive to small changes and do transfer to other time periods less well as high depth ones. However the model performance of the sampled deep parameters is also dependent to the quality of the estimated good parameter vectors.  We study the effectiveness of the newly developed algorithms in two case studies calibrating a process-oriented hydrologic model focussing on flood events. The results are compared with estimates generated by the original ROPE algorithm and two stats-of-the-art optimisation algorithms.
-In a first case study we compared the original ROPE, and the newly developed A-ROPE and ROPE-PSO approaches estimating three conceptual model parameters of the model WaSiM. We study the effect of observation uncertainty and confirm the results of Bárdossy and Singh (2008): the parameter vectors estimated by classic optimisation algorithms can lead to very different results in the validation and are not robust. Parameter vectors with equal model performance on the calibration data can lead to very different results in validation. Considering a set of identified parameters with good model performance on the calibration data, members with shallow data depth near the boundary are sensitive to small changes and have a less probability to perform well on other time periods than solutions with high depth. The depth based parameter estimation approaches can identify a set of parameter vectors that shows a clearly better performance in the validation with tighter variation intervals, i.e. less outliers. In comparison with the original ROPE, the modifications of the Monte Carlo based A-ROPE provide slight advantages in terms of the validation performance. These results are even outperformed by the substitution of the Monte Carlo based sampling by ROPE-PSO approach where the good parameters are determined using an adapted PSO strategy.
-In a further case study we increased the number of considered model parameters. The additional parameters allow to account for the uncertainty in the soil hydraulic parameters and an additional conceptual parameter of the soil module in WaSiM. The results of this case study show the limits of the Monte Carlo based ROPE and A-ROPE approaches for problems with a nonsmooth parameter surfaces with large flat areas in higher dimensions. Nonetheless the modifications in A-ROPE help to -The case studies in this paper just consider a limited number of calibration parameters parameters. This is sufficient for the given model setup considering the small amount of observation data for flood events. The suggested method might perform even better for calibration tasks with a higher amount of useful calibration data or lower process dynamics where the set of good parameter vectors is much more clearly defined, e.g. water balance simulations. Bárdossy and Singh (2008) applied the original ROPE method for the estimation of nine parameters in the conceptual model HBV in a much larger catchment on daily basis. According to these results and our experience with other small test problems, we strongly believe that the developed technique might provide good results for higher dimensions as well. Consider however that the application of complex data depth functions, e.g. halfspace depth is limited on lower dimensions due to the computational complexity and the required number of solutions. For a limited number of points and a high dimensionality, all points are in the convex hull and have low depth.
The robust parameter estimation approach is a relatively new method which was applied to a limited number of case studies. We strongly propose a comparison with established uncertainty estimation methods, e.g. MCMC, GLUE or multi-objective calibration, in further research. Furthermore we suggest the further development of the ROPE method to a well-founded calibration tool considering uncertainties, e.g. assigning a likelihood based on the depth of the estimated parameters instead of their model performance as it is done in classical approaches. Due to the probably high tradeoff between the model's ability to represent both the peak flow values and the global system behaviour equally well, we propose the development and application of a multiobjective version of the presented approach.
Hydrol. Earth Syst. Sci., 16, 603-629 Fig. 18: Hydrograph prediction uncertainty associated with the uncertainty in the model (lighter shading) and parameter estimates (darker shading) for the flood events 4 (a), 12 (b) and 19 (c), estimated by A-ROPE (left column) and ROPE-PSO (right column). The dots correspond to the observed streamflow data. The shaded areas of uncertainty correspond to the 95% confidence intervals.  12 (b) and 19 (c), estimated by A-ROPE (left column) and ROPE-PSO (right column). The dots correspond to the observed streamflow data. The shaded areas of uncertainty correspond to the 95% confidence intervals.