The influence of conceptual model structure on model performance: a comparative study for 237 French catchments

Models with a fixed structure are widely used in hydrological studies and operational applications. For various reasons, these models do not always perform well. As an alternative, flexible modelling approaches allow the identification and refinement of the model structure as part of the modelling process. In this study, twelve different conceptual model structures from the SUPERFLEX framework are compared with the fixed model structure GR4H, using a large set of 237 French catchments and discharge-based performance metrics. The results show that, in general, the flexible approach performs better than the fixed approach. However, the flexible approach has a higher chance of inconsistent results when calibrated on two different periods. When analysing the subset of 116 catchments where the two approaches produce consistent performance over multiple time periods, their average performance relative to each other is almost equivalent. From the point of view of developing a well-performing fixed model structure, the findings favour models with parallel reservoirs and a power function to describe the reservoir outflow. In general, conceptual hydrological models perform better on larger and/or wetter catchments than on smaller and/or drier catchments. The model structures performed poorly when there were large climatic differences between the calibration and validation periods, in catchments with flashy flows, and in catchments with unexplained variations in low flow measurements.


Introduction
Building efficient hydrological models remains a challenging issue, despite the huge efforts by the community to develop and improve models since the pioneering work of NAM (DHI, 2008) or GR4J, to name a few. Although these models have been continuously improved and adapted over the years, the core of their structure has remained more or less the same, and it was assumed to be general enough to be applicable in a variety of basins. For example, Bergström (1995) provides a detailed review of the applications of the HBV model on catchments on five continents. Hence, an end-user would take the model as proposed by the developers and apply it to their case study. This approach has several advantages. It saves time for the end-user, who can take the model off the shelf without going through the tedious process of developing a model. This is particularly useful when one wishes to apply the model to many catchments, as may be the case in operational conditions. It is also useful for catchments where data may be too limited to develop a full model from scratch. Despite its attractiveness, the fixed modelling approach is based on the assumption that a given model structure can be applied in many different catchments. However, this may not be the case. Processes included in the model may not correspond to the dominant ones occurring in the studied catchment. This may cause [...] of this approach is to give room to reduce structural uncertainty when applying the model, which should result in more robust model applications. Models such as RRMT (Wagener et al., 2001), MMS (Leavesley et al., 2002), FLEX (Fenicia et al., 2008), FUSE (Clark et al., 2008) and SUPERFLEX (Fenicia et al., 2011; Kavetski and Fenicia, 2011) are aimed at increasing model flexibility in the model construction phase of a modelling study. However, this approach may be more time-consuming for the end-user. It may also end up with several model structures performing equivalently.
Several researchers have already compared the performance of different model structures. The results of these studies have provided different insights. For instance, Chiew et al. (1993) indicated that relatively simple model structures can be used for larger time scales (months, years), whereas Refsgaard and Knudsen (1996) concluded that models of different complexity performed equally well. Perrin et al. (2001) showed that complex models outperform simple ones in calibration but not in validation. Reed et al. (2004) found that a lumped model, used as a benchmark, generally showed equivalent or better overall performance than distributed models. More recent studies by Breuer et al. (2009) and Seiller et al. (2012) showed slight differences between models of differing complexity. In most of these intercomparisons, and in hydrological modelling studies in general, there is a common aim to find an appropriate model for a particular purpose or condition, considering hydrological signature, catchment type and spatial and temporal scale (Rogers, 1978; Moussa and Bocquillon, 1996; Booij, 2003).

The objective of this study is to evaluate the influence of model structure on model performance for different catchment properties. It extends the work of Fenicia et al. (2011), Kavetski and Fenicia (2011) and Fenicia et al. (2013), which was carried out on only a few catchments. Based on a large set of 237 French catchments, this study seeks more general conclusions on the connections between model structure, catchment characteristics and model performance, and ultimately aims to help reduce errors or subjectivity in model conceptualization.
The questions we wish to answer are:
- What is the influence of different catchment characteristics on the relationship between model structure and performance?
- What are the differences in performance between a fixed and a flexible model structure?
To address these questions, we will use SUPERFLEX as a modelling framework to construct and compare different model structures. SUPERFLEX is an example of a flexible modelling approach and is based on 12 individual model structures. The GR4H model is tested as an example of a fixed modelling approach. This hourly model and its daily version, GR4J, are widely applied models that have shown good average performance on many catchments in different parts of the world (Le Moine et al., 2007; Coron et al., 2012; Perrin et al., 2003; Valéry et al., 2010), but for which structural inadequacy or the lack of flexibility in model structure is suggested as one of the main reasons for their failures (Kavetski and Fenicia, 2011; Perrin et al., 2003; Le Moine, 2008; Wagener, 2003; Andréassian et al., 2010).
In Sect. 2, we will present the data and models used, as well as the evaluation methodology. Section 3 discusses the general results, followed by a discussion and conclusions of this work in Sects. 4 and 5 respectively.

Data
This study is based on a large data set of 237 catchments spread throughout France (Fig. 1). They represent a large variety of conditions in terms of physical characteristics (size, geology, etc.) and climate (Table 1) [...] all catchments. Precipitation originates from the reanalysis produced by Météo-France (Tabary et al., 2012) and discharge data were taken from the French database Banque Hydro (MEDD, 2007). Physical characteristics of the 237 catchments were also available (Bourgin et al., 2011; European Environment Agency, 2006).

To investigate the performance of the model structures on different types of catchments, the 237 catchments were classified based on four catchment characteristics: area, Wetness Index, permeability and the ratio of runoff coefficients in summer and winter (RC_S/W). The latter is considered a more integral property of climate and catchment characteristics and distinguishes groundwater dominated catchments from catchments where runoff occurs more directly. RC_S/W was calculated using:

$$RC_{S/W} = \frac{Q_S / P_S}{Q_W / P_W} \qquad (1)$$

where P_S and Q_S (resp. P_W and Q_W) are the mean precipitation and discharge during three summer (resp. winter) months (July-September, resp. January-March) over the ten-year time series (1997-2006). This ratio is best explained by two extreme cases. (1) High RC_S/W: when runoff compared to rainfall is high in summer, it is likely that the river is fed by additional groundwater that was stored in winter. This means that some rainfall in winter did not go to runoff, thus lowering the runoff coefficient in winter. This case is classified as "groundwater dominated runoff".
(2) Low RC_S/W: when runoff compared to rainfall in summer is low, water might be lost to potential evapotranspiration and little water from storage will flow in the river. Rainfall in winter is then likely to flow in the river more directly as there is little storage. This case is classified as "direct runoff".

The Wetness Index (also referred to as Aridity Index, Middleton and Thomas, 1992) was calculated as the ratio between mean precipitation and PE over the ten-year time series. For catchment area, Wetness Index and RC_S/W, the catchments were divided into three classes with approximately equal numbers of catchments. Table 2 shows the ranges of each class and the number of catchments per class. The qualitative scales (e.g. small, medium and large for catchment area) chosen here may not correspond to the scales often found in the literature for these four characteristics, but were chosen to distinguish between catchments with below-median, median or above-median characteristics. Permeability was determined following the information provided by the European Environment Agency (2006), based on the type of bedrock underlying the catchments.
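The classification variables described above can be illustrated with a short sketch. It assumes that each runoff coefficient is mean discharge divided by mean precipitation over the same months, and that the three classes are formed by rank; the function names and the rank-based split are illustrative assumptions, not the procedure used in the study.

```python
from statistics import mean


def runoff_coefficient_ratio(p_summer, q_summer, p_winter, q_winter):
    """Ratio of summer to winter runoff coefficients (RC_S/W).

    Each argument is a sequence of precipitation or discharge values
    (same units, e.g. mm per time step) for the summer or winter months.
    """
    rc_summer = mean(q_summer) / mean(p_summer)
    rc_winter = mean(q_winter) / mean(p_winter)
    return rc_summer / rc_winter


def wetness_index(precipitation, potential_evaporation):
    """Mean precipitation divided by mean potential evaporation (PE)."""
    return mean(precipitation) / mean(potential_evaporation)


def tercile_classes(values, labels=("low", "medium", "high")):
    """Assign each value to one of three classes of roughly equal size.

    Assumption: classes are formed by rank, which is one simple way to
    obtain approximately equal numbers of catchments per class.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [None] * len(values)
    for rank, index in enumerate(order):
        classes[index] = labels[min(rank * 3 // len(values), 2)]
    return classes
```

A ratio above 1 would point towards the "groundwater dominated runoff" case and a ratio well below 1 towards "direct runoff"; the actual class boundaries are those listed in Table 2.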

The SUPERFLEX modelling approach, which includes 12 different model structures, and the GR4H model were used to investigate the influence of model structure on model performance. All thirteen structures are lumped, use the same rainfall and PE over the whole catchment as inputs, and generate discharge as output.

Twelve structures (SF01-SF12) as proposed by Fenicia et al. (2013) are used in the SUPERFLEX framework. They cover a broad range of model complexities (Fig. 2). Starting from a very simple structure (SF01), the complexity is gradually increased by adding reservoirs, lag-functions and junction elements. In this way, the influence of individual components can be assessed. In the SUPERFLEX structures, rainfall (Pt) and potential evaporation are used as inputs. Potential evapotranspiration is systematically corrected with a calibrated ratio Ce to fulfil the water balance. Actual evapotranspiration (noted Ei, Eu or Ef for the interception, unsaturated zone and fast reservoirs, respectively) empties one or two reservoirs in each structure.

SF01 consists of a single fast reservoir (FR) with a residence time Kf and a power function denoted by α. SF02 consists of a single reservoir as well, but it is an unsaturated zone reservoir (UR) that uses both a linear and a threshold type power function (β) to describe the outflow from the reservoir.
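To make the role of Kf and α concrete, the following minimal sketch routes rainfall through a single reservoir. It assumes an outflow of the form Q = S^α / Kf and an explicit Euler time step; both are illustrative assumptions, and the exact constitutive function and numerical scheme of the SUPERFLEX structures (see Fenicia et al., 2013) may differ. Evaporation is ignored here.

```python
def simulate_fast_reservoir(rainfall, k_f, alpha, storage=0.0, dt=1.0):
    """Route rainfall through one reservoir with outflow Q = S**alpha / k_f.

    Assumed form for illustration only; alpha = 1 gives a linear reservoir.
    Explicit Euler steps; outflow is capped so storage never goes negative.
    """
    discharge = []
    for p in rainfall:
        storage += p * dt                                  # add rainfall input
        q = min(storage ** alpha / k_f, storage / dt)      # power-law outflow
        storage -= q * dt                                  # update storage
        discharge.append(q)
    return discharge
```

With α = 1 the recession is exponential; with α > 1 the outflow reacts more strongly to high storage, which is one way a single parameter can adapt the response to flashier or more damped catchments.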
In structures SF03 to SF05, an unsaturated zone reservoir (UR) is connected in series to the fast reservoir. These three structures vary in the functions by which the flow to and from the reservoirs is described and in the number of calibrated parameters. SF05 uses power functions to describe flow from both reservoirs, denoted by α and β, and a lag-function that distributes flow over multiple time steps. Structures SF06, SF07, SF09, SF10 and SF11 all use three reservoirs, but of different types and with the introduction of more complex connections and functions. SF07 is the first structure with a riparian zone reservoir (RR) connected in parallel, allowing for more independent flow along two flow paths. SF08 is a simple structure with only two parallel reservoirs: a slow reservoir (SF) is introduced with a residence time Ks independent of that of the fast reservoir. From this structure onward, complexity is again increased up to the introduction of an interception reservoir (IR) with a threshold type function in structure SF12. Parameters M and D are ratios that divide water between different reservoirs. Table 3 shows an overview of the model structures and the type of functions used. A more detailed description of these structures can be found in Fenicia et al. (2013).

The GR4H model (Le Moine, 2008) is used as an example of a fixed model structure. In GR4H (see diagram in the bottom right corner of Fig. 2), rainfall and PE are subtracted to find the net precipitation Pn or net evaporation En. Pn is partitioned between storage in a soil moisture accounting reservoir S (Ps) and effective rainfall (Pt = Pn − Ps). This reservoir is depleted by a percolation function Perc that is added to effective rainfall.

Effective rainfall is then routed to the outlet via a two-branch routing module. The first branch (10 % of effective rainfall) is routed by a single unit hydrograph. The other 90 % are routed by a unit hydrograph and a non-linear reservoir named routing store R. A water exchange function F is applied to the two flow components, to simulate import or export of groundwater with the underlying aquifer or neighbouring catchments.

Evaluation procedure
All model structures were calibrated and validated using the split sample test (Klemeš, 1986), in which ten years of the available data were split into two independent sub-periods (1997-2001 and 2002-2006). Calibration was performed on each sub-period, followed by validation on the other sub-period. To reduce initialisation problems, an initial value was given to the level in the reservoirs (as a fraction of reservoir capacity) and the three years of data preceding each test sub-period (1994-1996 and 1999-2001, respectively) were used for model warm-up. Note that the data used for warm-up prior to 1997 originate from the SAFRAN reanalysis (Vidal et al., 2010). Parameter optimization was done according to the methods described in Kavetski and Fenicia (2011) using the Bayesian Total Error Analysis (BATEA) framework (Kavetski et al., 2006; Kavetski and Evin, 2011). Here, the objective function is a weighted least squares (WLS) scheme. It accounts for the knowledge that prediction errors are commonly larger for high flows and is intended to give more balance between high and low flows. A quasi-Newton optimization was applied using twenty different initial values across the parameter space. This local optimization method was shown to be effective and efficient using a limited number of multistarts (i.e. twenty) on a smoothed parameter space (Kavetski and Kuczera, 2007).
Model performance was evaluated based on the validation results using the four criteria described in the next section. The thirteen model structures were compared using the average score of the evaluation criteria for all catchments. For the comparison of the flexible approach with the fixed structure of GR4H, consistency rules were applied.

Evaluation criteria
The four evaluation criteria (CR1 to CR4, given by Eqs. (2)-(5)) focus on different aspects of model performance (high flow, low flow, volume error and variability of predictions). In these equations, Qobs and Qsim represent the observed and simulated discharge, respectively, at time step i, N is the number of time steps, the overbar represents an average over the selected period, ε is a small constant (one-hundredth of mean flow, see Pushpalatha et al., 2012) and σ is the standard deviation over the selected period. CR1* is the well-known Nash-Sutcliffe efficiency (Nash and Sutcliffe, 1970), which is most sensitive to peaks in discharge. CR2* is the Nash-Sutcliffe efficiency based on the inverse discharge, emphasizing low flow errors (Pushpalatha et al., 2012). CR3* is based on the relative volume error and thus emphasizes any error in the water balance between observed and simulated discharge. CR1* to CR3* have values between 1 (perfect fit) and −∞ and are transformed to a value between 1 and −1 to avoid the influence of very low negative values on the calculation of mean performance (Mathevet et al., 2006; Pushpalatha et al., 2012). The fourth criterion (CR4) is the ratio between standard deviations of observed and simulated discharges, with a maximum value of 1 meaning that the simulated discharge reproduces the variability in the observed discharge and a minimum value of −1 when the difference in standard deviation becomes very large (Gupta et al., 2009). Since all the selected criteria are non-dimensional, the average of CR1-CR4 over the validation periods was used as the measure of overall performance to compare model structures.
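As an illustration of the first two criteria and of the bounding step, the sketch below uses the standard Nash-Sutcliffe definition, its inverse-flow variant with ε set to one-hundredth of the mean observed flow, and the transformation C / (2 − C). The last of these is an assumption about which bounding transformation is meant; the volume-error and variability criteria are not sketched because their exact form cannot be recovered from the text.

```python
from statistics import mean


def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, no lower bound."""
    obs_mean = mean(obs)
    error = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - error / spread


def nash_sutcliffe_inverse(obs, sim):
    """NSE on inverse flows, with epsilon = mean observed flow / 100."""
    eps = mean(obs) / 100.0
    return nash_sutcliffe([1.0 / (o + eps) for o in obs],
                          [1.0 / (s + eps) for s in sim])


def bounded(criterion):
    """Map a criterion from (-inf, 1] onto (-1, 1] via C / (2 - C).

    Assumption: this is the bounding transformation meant in the text.
    """
    return criterion / (2.0 - criterion)
```

Bounding matters when scores are averaged over many catchments: an unbounded score of −2 becomes −0.5, so a single failed catchment cannot dominate the mean.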

The stability of the model between the two calibration periods for a given catchment was evaluated by checking for parameter and structural consistency. Here, a model structure was considered parametrically inconsistent when at least one of the parameters departed by more than 50 % of the average between the two periods. In that case, this structure was left out of the final comparison.
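The parametric consistency rule can be sketched as follows. The wording of the rule leaves some room for interpretation; this sketch assumes that a parameter is inconsistent when its calibrated value in either period differs from the two-period average by more than 50 % of that average. The function name and the dictionary layout are hypothetical.

```python
def is_parametrically_consistent(params_period_1, params_period_2, tolerance=0.5):
    """Return True if no parameter departs too far from its two-period mean.

    Assumed reading of the rule: a parameter is inconsistent when a
    calibrated value differs from the two-period average by more than
    `tolerance` (50 %) of that average. With two periods, both values lie
    equally far from the average, so checking one of them is enough.
    """
    for name, value_1 in params_period_1.items():
        value_2 = params_period_2[name]
        average = (value_1 + value_2) / 2.0
        if abs(value_1 - average) > tolerance * abs(average):
            return False
    return True
```

Under this reading, values of 1 and 5 for the same parameter (average 3, departure 2) would be flagged, whereas 10 and 12 would not.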

In the case of SUPERFLEX, structural inconsistency was considered to occur when the best model structure identified on the two calibration periods was not the same. Therefore, a SUPERFLEX structure is considered consistent and eligible for comparison only if it scores within the best 10 % in both calibration periods (of one catchment). The best SUPERFLEX structure for a catchment is the simplest consistent structure within 10 % of the highest scoring structure (see "Rank" in Table 3). Here, complexity is measured by the number of parameters (see Table 3). When two equivalently performing structures have the same number of parameters, the one with the lower number of functions is preferred.
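The selection rule can be sketched in a few lines. The sketch assumes that "within 10 %" means a score of at least 90 % of the best score in that period (which only makes sense for a positive best score) and that the complexity rank already encodes the tie-break on the number of functions; the function name and data layout are hypothetical.

```python
def select_structure(scores_period_1, scores_period_2, complexity_rank, margin=0.10):
    """Pick the simplest structure that is near-best in both periods.

    scores_period_*: dict mapping structure name to performance score.
    complexity_rank: dict mapping structure name to rank (1 = simplest).
    Assumption: 'within 10 %' means score >= (1 - margin) * best score,
    which only makes sense when the best score is positive.
    Returns None when no structure qualifies in both periods.
    """
    def near_best(scores):
        threshold = (1.0 - margin) * max(scores.values())
        return {name for name, score in scores.items() if score >= threshold}

    consistent = near_best(scores_period_1) & near_best(scores_period_2)
    if not consistent:
        return None
    return min(consistent, key=lambda name: complexity_rank[name])
```

A return value of `None` corresponds to a structurally inconsistent catchment, which would be excluded from the comparison with GR4H.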
Thanks to the double model calibration and validation procedure and the consistency rules, models that give efficient and consistent results on a catchment could be identified. Despite being somewhat arbitrary, the consistency rules have made the model evaluation more severe.

Figure 3 shows the distribution of performance for all model structures on the 237 catchments for the two validation sub-periods. The figure shows that the seven best performing model structures (GR4H, SF04-SF07, SF11 and SF12) have very similar average performance despite their structural differences. Six of the model structures (SF01, SF02, SF03, SF08, SF09 and SF10) perform poorly compared to the best ones and do not seem to be good candidate structures for a fixed-structure approach. By comparing the performance and complexity, one can see that structures with an unsaturated zone reservoir or a power function perform on average considerably better than structures without these components.

Average performance of the thirteen individual structures
The calibrated values of the power function parameter α (describing the outflow of the fast reservoir FR) show a wide range for the different catchments and a high correlation between the two periods (not shown here; see Van Esse, 2012, for more details). This indicates that this is an effective parameter to calibrate. The results are similar for power function parameter β (describing the outflow of the unsaturated zone reservoir UR). However, for structures with both power functions, the values of power β are clustered around 1. This indicates that adding a second power function to the structure is far less effective than the first one and that the power function at the outflow of a reservoir at the end of a structure (like FR) is more effective. Comparing several pairs of structures showed that the lag-function and interception reservoir (IR) do not increase performance for most catchments. For the riparian zone reservoirs (RR), the analysis shows larger differences: for some catchments this reservoir does increase performance, while for others the effects are negative. These results show that, at least within the SUPERFLEX framework, the lag-function and the interception and riparian zone reservoirs do not increase model performance on average, which questions their usefulness.

The average performance and performance range of the GR4H model are close to those of models SF04-SF07. The fixed power functions describing reservoir outflow in GR4H are expected to be important components, just as the power functions in the SUPERFLEX structures. The more complex models SF11 and SF12 are more robust in that they limit the number of strong model failures, which is very valuable. However, on average, they are not able to outperform the simpler models (including GR4H).

Performance across catchment classes
To investigate the effect of catchment characteristics on model performance, average performance is analysed for four catchment characteristics: catchment area, Wetness Index, permeability and the ratio of runoff coefficients in summer and winter. Figure 4a shows that model performance on large catchments is generally better than that on small catchments. Apparently, it is easier to model the rainfall-runoff relationship in catchments where hydrological processes are mixed and have a smoother response. This corroborates previous findings by Merz et al. (2009) on a large set of Austrian catchments.
Figure 4b shows that model performance on wet catchments is generally better than that on dry catchments. The difference between dry catchments and the catchments that are classified as wet or moist is large for all structures. This is in agreement with literature showing that dry catchments are generally more difficult to model due to the higher non-linearity in processes (e.g. Atkinson et al., 2002). Only SF11 and SF12 show little difference in performance over the three classes (however, their performance is still worse for dry catchments). The parallel fast and slow reservoirs plus the unsaturated zone reservoir with a power function seem to be better able to simulate both wet and dry catchments.

Figure 4c shows that model structures with two reservoirs in series (SF03-SF07) perform better on impermeable catchments than on semi-permeable or permeable catchments. Adding a parallel slow reservoir in structures SF08 to SF12 decreases performance on impermeable catchments. The performance is again somewhat better for SF11 and SF12, as these structures likely benefit from a power function that was introduced in these structures. GR4H shows the same order in efficiencies over the permeability classes as the structures with reservoirs in series (SF04-SF06). The GR4H structure does include a parallel flow path, but there are some distinct differences: one path is without a reservoir, and it is not possible to calibrate the paths independently since the division over the flow paths is fixed and both paths use routing unit hydrographs based on the same parameter x4.

The RC_S/W classification in Fig. 4d separates the structures in an interesting way. Structures SF04 to SF07 perform much worse on the groundwater driven catchments than on the other catchments, while SF09 and SF10 simulate these catchments best of the three classes. This reversed order of performance can be explained by the parallel slow reservoir component in SF09 and SF10.
This component allows independent fast and slow flow, hence enabling a slow groundwater component while maintaining the ability to produce high flow in case of a storm event. The parallel riparian zone reservoir in SF07 does not have this effect because it is not connected to the unsaturated zone reservoir and only a maximum of 20 % of rainfall is routed through this reservoir.
The difference in performance between structures with reservoirs in series or parallel was already established by e.g. Jakeman and Hornberger (1993), and more recently by Kavetski and Fenicia (2011) and Fenicia et al. (2013). Jakeman and Hornberger found that the most commonly identified configuration for a rainfall-runoff model is two storages in parallel. This difference is clearly observed when comparing the performance per catchment of SF05 and SF11 using the RC_S/W classification. Figure 5 shows the average performance of SF05 against that of SF11, two identical models apart from the use of a parallel slow reservoir in SF11. The figure confirms that the more complex SF11 does not perform better overall than SF05 (mean performance of 0.57 and 0.56, respectively), but that it significantly improves the performance for most groundwater dominated catchments. This shows the value of the flexible distribution of flow (ratio D) and the independent residence times (Kf and Ks) of the parallel reservoirs for these types of catchments.

Figure 6 shows the performance distribution of the flexible and fixed approaches with and without accounting for the cases where inconsistent results between calibration periods occurred. Without the consistency rules, the flexible approach performs better on the catchment set, which is a clear advantage of the flexible approach. However, when the inconsistent catchments are removed from the set, both approaches perform very similarly. The flexible approach is inconsistent more often (90 times) than the fixed approach (67), possibly because the multiple model structures in the flexible approach result in structural inconsistency in addition to the parametric inconsistency that affects both approaches. Note that the comparison of performance when accounting for consistency rules is made on two different sub-sets, hence the results are not completely comparable.
However, the mean results on the overlapping subset (116 catchments) are very similar: 0.62 for GR4H and 0.61 for SUPERFLEX. It must be noted that the choice of the 50 % threshold for parameter inconsistency and the 10 % range for structural inconsistency considerably influences these results: e.g. a 5 % structural inconsistency range would yield 134 catchments with inconsistent results for the flexible approach, while a 25 % range results in only 26 inconsistent catchments. Also, many of the SUPERFLEX structures are more prone to parameter inconsistency as they have more parameters.

Comparison of fixed and flexible modelling approaches
To find out more about which of the flexible model structures perform well, we look at the number of times each structure is selected as best. Table 4 shows that SF04, SF09 and SF11 are selected most often. These structures are successful because of their use of only the most effective components. Very simple structures and structures with only small differences from these best three are selected much less often. This shows a disadvantage of hypothesising a model structure a priori: many structures give similar results, which makes choosing the best one difficult. Combining all 13 structures, GR4H is selected the largest number of times (138), showing the good generality of the model. For 63 other catchments, however, one of the twelve SUPERFLEX structures [...] Perrin et al. (2001), who showed that choosing a specific structure for a given catchment significantly improved the performance compared to using a fixed structure. Second, this research has shown some of the limitations of using the SUPERFLEX approach on a large scale: large similarity in performance of different models makes selection of the best model difficult and thus reduces the amount of knowledge on connections between model structure and hydrological processes that can be gained. This becomes even more difficult given the large differences in results between the two calibration periods and the demand that the best model structure should work well for that specific catchment under all possible conditions (see also best-compromise model, Seiller et al., 2012). Furthermore, the SUPERFLEX approach poses a practical limitation in that modellers have to deal with a different model for every catchment, and thus becoming experienced with the approach takes more time.
The general advantages and disadvantages of model structures and components and their relevance for different catchment types can only be studied based on a large set of catchments (Andréassian et al., 2006). For the calibration, a weighted least squares (WLS) scheme was used, putting more emphasis on low flow than the usual least squares approach. However, in spite of this improved objective function, all model structures scored poorly on the low flow criterion (CR2*). As noted by Thyer et al. (2009), the WLS is expected to only be a small [...]

Conclusions
This study found that relatively simple model structures with some key components can lead to a good simulation of the discharge from a catchment, which corroborates previous findings (Jakeman and Hornberger, 1993; Perrin et al., 2001). The analysis of thirteen individual lumped conceptual model structures on a large catchment set with different characteristics showed that:
- increasing model complexity does not always lead to higher performance; however, complex structures perform poorly for fewer catchments;
- conceptual hydrological models generally perform better on large or wet catchments than on small or dry catchments;
- the use of a power function to describe reservoir outflow significantly increases mean model performance compared to models that use linear functions;
- independently calibrated parallel reservoirs increase model performance in permeable, groundwater dominated catchments; and
- a lag-function between reservoirs or an interception reservoir is unlikely to lead to a significant increase in model performance.
On the whole catchment set, the flexible modelling approach provides better average results than the fixed modelling approach. Generally, selecting the best model structure for each catchment gives the best results. However, the results of the two approaches are equivalent when applying consistency rules on parameters and structures for two calibration periods. The risk of equifinality between structures and differences in model performance between periods mean that care should be taken when selecting the best model structure, and show a disadvantage of hypothesizing a model structure a priori. Therefore, appropriate model structure evaluation schemes should be designed to enhance the robustness of the selected model structures and avoid unexpected model failures (Andréassian et al., 2009, 2010).

Table 3. Main characteristics of the twelve SUPERFLEX structures and GR4H, where Nres and Nθ are the number of reservoirs and calibrated parameters, respectively. "Rank" denotes the complexity rank of each structure based on the number of parameters, and number and type of function(s), where 1 is the simplest and 13 is the most complex.