Ecologically relevant streamflow characteristics (SFCs) of ungauged catchments are often estimated from simulated runoff of hydrologic models that were originally calibrated on gauged catchments. However, SFC estimates of the gauged donor catchments and subsequently the ungauged catchments can be substantially uncertain when models are calibrated using traditional approaches based on optimization of statistical performance metrics (e.g., Nash–Sutcliffe model efficiency). An improved calibration strategy for gauged catchments is therefore crucial to help reduce the uncertainties of estimated SFCs for ungauged catchments. The aim of this study was to improve SFC estimates from modeled runoff time series in gauged catchments by explicitly including one or several SFCs in the calibration process. Different types of objective functions were defined consisting of the Nash–Sutcliffe model efficiency, single SFCs, or combinations thereof. We calibrated a bucket-type runoff model (HBV – Hydrologiska Byråns Vattenavdelning – model) for 25 catchments in the Tennessee River basin and evaluated the proposed calibration approach on 13 ecologically relevant SFCs representing major flow regime components and different flow conditions. While the model generally tended to underestimate the tested SFCs related to mean and high-flow conditions, SFCs related to low flow were generally overestimated. The highest estimation accuracies were achieved by a SFC-specific model calibration. Estimates of SFCs not included in the calibration process were of similar quality when comparing a multi-SFC calibration approach to a traditional model efficiency calibration. For practical applications, this implies that SFCs should preferably be estimated from targeted runoff model calibration, and modeled estimates need to be carefully interpreted.
Reliable runoff information is fundamental for many water resources-related tasks such as flood prevention, drought mitigation, management of drinking water supply and hydropower, or river restoration. Runoff modeling is a tool that can be used to create runoff time series when observed time series are not available. Runoff simulations usually focus on either representing the general shape of the hydrograph or accurately simulating specific streamflow characteristics relevant to a given application. However, the extraction of streamflow characteristics (SFCs) from a simulated time series may produce poor estimates when these characteristics were not included in model calibration. Ecologically relevant SFCs are properties of the annual streamflow hydrograph defining the structure and functioning of aquatic and riparian biodiversity (Richter et al., 1996; Poff et al., 1997). The accurate prediction of streamflow characteristics is a core determinant in defining how streamflow and aquatic communities relate. A large number of SFCs have been suggested to characterize ecologically relevant aspects of the flow regime (Tharme, 2003) and have become the basis for decision-support systems integrating resource management with ecological response (Cartwright et al., 2017).
Multivariate regression or runoff models are used to estimate SFCs when observed streamflow time series data are not available (Hailegeorgis and Alfredsen, 2016). The estimation of SFCs with linear regression usually relates a single SFC to catchment characteristics such as climate, land cover, and geographic and geologic variables (e.g., Sanborn and Bledsoe, 2006; Carlisle et al., 2010; Knight et al., 2012). This approach is inflexible in the sense that the regression is SFC-specific and does not allow for the analysis of potential water-use and land-management scenarios (Murphy et al., 2013). These disadvantages can be partially overcome by applying runoff models. Simulated streamflow time series from runoff models can be used to calculate any SFC and, by changing model input and parameters, different scenarios such as climate change, groundwater withdrawals, land use, and riverine change can be simulated (Poff et al., 2010; Murphy et al., 2013; Olsen et al., 2013; Shrestha et al., 2014). While statistical models such as multiple linear regressions often provide greater accuracy (Murphy et al., 2013), runoff models also provide opportunities for evaluating climate or land-use change scenarios.
Runoff models are used in both ecohydrology and hydrological modeling as tools to simulate specific aspects of the runoff regime. The terms SFCs or ecological flow indices are often used to refer to such specific aspects of the flow regime in ecohydrology studies, whereas the more recently introduced term hydrological signatures has been used in hydrological modeling (Jothityangkoon et al., 2001; Wagener et al., 2007). Hydrological signatures can often support a physical interpretation of the way a catchment functions and are seen as valuable metrics especially for modeling ungauged catchments (Jothityangkoon et al., 2001), for selecting appropriate model structures (Euser et al., 2013) or guiding model parameter selection in a meaningful way (Yilmaz et al., 2008), and for classifying catchments (Wagener et al., 2007; Sawicz et al., 2011). Regardless of the terminology and the ultimate goal, the basic goal is the quantification of certain aspects of a streamflow time series. In this paper, we use the term SFC as equivalent to hydrological signature, but generally prefer the term SFC to emphasize their ecological relevance.
Location of the 25 study catchments in the Tennessee River basin (see Table 1 in Vis et al., 2015, for more information).
Estimated streamflow characteristics are prone to significant errors when calculated from simulated time series (Murphy et al., 2013; Shrestha et al., 2014; Vis et al., 2015). This is due in part to the objective functions used for evaluating the model error such as the commonly used model efficiency (Nash and Sutcliffe, 1970) or volume error, which do not ensure that a model reproduces particular streamflow characteristics. These objective functions subsequently guide model parameter calibration, which strongly influences the simulated hydrograph (for an overview, see Pfannerstill et al., 2014) in terms of annual, seasonal, and monthly volumes and magnitudes. For example, Vis et al. (2015) compared model simulations from calibrations based on only the model efficiency with calibrations based on the combination of multiple objectives such as model efficiency, model efficiency of log-transformed flow, volume error, and Spearman rank correlation. All these calibration approaches tended to overestimate low flows and underestimate medium and high-flow-related SFCs. Estimation accuracy varied greatly between SFCs, with absolute biases between 3 and 33 %. Large differences in estimation accuracy are also reported by Shrestha et al. (2014) and Ryo et al. (2015). Their multi-objective calibration approach resulted in runoff simulations favoring high flows at the expense of the estimation accuracy of low flows. The large variability in estimated SFC accuracy as well as the bias in the estimates can generally be observed independently of the model used to simulate the runoff time series (Caldwell et al., 2015). A remedy to this large variability and bias is to incorporate SFCs into model calibration schemes. For example, Westerberg et al. (2011) and Pfannerstill et al. (2014) focused on specific evaluation points or segments of the flow-duration curve (FDC) during model calibration.
Both studies report better overall performance for the simulated hydrograph with a FDC-based calibration compared to a more traditional calibration approach using, for example, the model efficiency (Nash and Sutcliffe, 1970). However, runoff models calibrated using the FDC have to be constrained by additional SFCs if one is interested in the exact timing of events or when snow-related runoff processes are of importance (Westerberg et al., 2011). Yilmaz et al. (2008) combined information on different segments of the FDC with the runoff ratio and the rainfall–runoff lag time to guide model parameter selection in terms of primary catchment functions. These hydrologically meaningful signatures generally improved hydrograph simulation, but their value was limited for the process of vertical redistribution of excess rainfall in the catchment. In a recent study, Kiesel et al. (2017) compared estimates of ecologically relevant SFCs simulated from model calibrations using different objective functions including SFCs and the Kling–Gupta efficiency (Gupta et al., 2009). They found that including all SFCs of interest in the model calibration resulted in better SFC estimates than a calibration using the Kling–Gupta efficiency. Instead of aiming at a well-simulated, general hydrograph, Hingray et al. (2010) and Olsen et al. (2013) focused on certain aspects of the streamflow regime that were considered most important. Their results, which are echoed by Murphy et al. (2013), suggest that the runoff model performs reasonably well for the aspects on which it is calibrated, whereas it only modestly represents other runoff characteristics. Hence, developing an approach to increase the accuracy of estimated SFCs from runoff model time series continues to be an open challenge in hydrological modeling.
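For reference, the model efficiency of Nash and Sutcliffe (1970) mentioned above compares the squared simulation errors with the variance of the observed runoff. A minimal Python sketch is given here; the function name and the sample values are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit; 0 means the simulation
    is no better than using the mean of the observations."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Illustrative daily runoff values (mm/day); not data from the study
obs = [1.0, 2.0, 4.0, 3.0, 1.5]
sim = [1.2, 1.8, 3.5, 3.1, 1.6]
print(round(nash_sutcliffe(obs, sim), 3))
```

Because the squared errors are dominated by the largest flows, a high efficiency value does not by itself guarantee that a particular SFC, for example one describing low flows, is well reproduced.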
This study expands on the study of Vis et al. (2015) where various
combinations of traditionally used objective functions were evaluated with
respect to a suite of ecologically relevant SFCs. Their model calibrations
with the model efficiency (
The following questions are addressed in this paper:
1. How well is a single SFC simulated when that SFC is used as the model objective function? (Objective function is the SFC of interest.)
2. How well is a single SFC simulated when the model objective function contains one or multiple other SFCs? (Objective function can include the SFC of interest, but generally contains one or multiple other SFCs.)
3. How does the accuracy of estimated SFCs vary between traditional calibration approaches and those where the SFCs of interest are included? (Objective functions are different combinations of SFC(s) and the model efficiency.)
Throughout this study, we refer to traditional and “SFC-based” objective
functions. Traditional objective functions were defined as efficiency
criteria based on statistical performance metrics computed from (transformed)
model residuals (e.g.,
The study catchments are all located in the 106 000
Description of streamflow characteristics used to calibrate the
runoff model (adapted from Knight et al., 2014; U.S. Geological Survey, 2014)
(
The 13 SFCs assessed in this study were chosen for use in model scenarios based on discernible functional connections with fish community diversity (Knight et al., 2008, 2014). This set of 13 SFCs represents each of the major flow regime components commonly used in ecological studies (e.g., Olden and Poff, 2003; Arthington et al., 2006; Caldwell et al., 2015): magnitude, ratio, frequency, variability, and date (Table 1). For this study the SFCs were additionally grouped according to flow conditions (mean, low, and high flow), because different aspects of the hydrograph have been shown to be sensitive to the objective function used for model calibration (for an overview, see Pfannerstill et al., 2014). The SFCs were calculated using the U.S. Geological Survey (2014) EflowStats R package. Please note that some of the tested SFCs (DH13, ML20, MA26, DH16, and FL2) are defined as scaled with the median, mean, or total runoff. The scaling leads to SFC values that are dependent on flow magnitudes. The magnitude of the simulation error for DH13, ML20, MA26, DH16, and FL2 is therefore dependent on runoff magnitudes, whereas the sign of the simulation error is not affected by the normalization.
The HBV (Hydrologiska Byråns Vattenavdelning) model (Bergström, 1976; Lindström et al., 1997) is a bucket-type hydrologic model for simulating continuous runoff series. Model inputs are daily rainfall and air temperature, as well as daily potential evaporation values. Hydrologic processes are represented by four different routines corresponding to snow, soil water, groundwater, and runoff routing, with a combined total of 16 parameters. In the snow routine, snow accumulation and snowmelt are calculated by a degree-day method. Snowmelt together with rainfall and potential evaporation are input to the soil-water routine, where the actual evaporation and the groundwater recharge are computed based on the soil-moisture storage. The groundwater (or response) routine consists of a connected shallow and deep groundwater reservoir and simulates peak flow, intermediate runoff, and baseflow. These three runoff components are summed and transformed by a triangular weighting function during the routing process to calculate the runoff at the catchment outlet. Runoff can be modeled in a semi-distributed way by separating a catchment into elevation bands. In that case, the snow and soil-water routines are calculated for each elevation band, whereas the groundwater storage and the runoff routing routines are treated as a lumped representation of the entire catchment. HBV exists in different versions, all of which share the same general model structure. The version applied in this study is HBV-light (Seibert and Vis, 2012). As with all bucket-type models, parameters in the HBV model cannot be determined a priori and are instead identified by model calibration. More detailed information on the HBV model can be found in Bergström (1976), Lindström et al. (1997), and Seibert and Vis (2012).
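To illustrate the degree-day idea behind the snow routine, a strongly simplified Python sketch is given here. It is not the HBV-light implementation: refreezing, the water-holding capacity of the snowpack, and snowfall correction are omitted, and the function name, parameter values (a threshold temperature `tt` and a degree-day factor `cfmax`), and input data are assumptions made for illustration only.

```python
def degree_day_snow(precip, temp, tt=0.0, cfmax=3.0):
    """Simplified degree-day snow routine (illustrative, not HBV-light).

    precip: daily precipitation (mm/day)
    temp:   daily mean air temperature (deg C)
    tt:     threshold temperature (deg C), assumed value
    cfmax:  degree-day factor (mm per deg C per day), assumed value
    Returns the daily liquid water leaving the snow routine (rain + melt).
    """
    snowpack = 0.0
    outflow = []
    for p, t in zip(precip, temp):
        if t < tt:
            # Below the threshold, precipitation accumulates as snow
            snowpack += p
            outflow.append(0.0)
        else:
            # Melt is proportional to degrees above the threshold,
            # limited by the snow that is actually available
            melt = min(snowpack, cfmax * (t - tt))
            snowpack -= melt
            outflow.append(p + melt)
    return outflow

# Illustrative input; not data from the study
print(degree_day_snow([10, 5, 0, 0, 2], [-2, -1, 1, 3, 4]))
```

In the semi-distributed setup described above, such a routine would be evaluated separately for each elevation band with band-specific temperature and precipitation.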
For each of the 25 catchments the number of elevation bands was defined by
splitting the catchment into elevation zones of 200
Objective functions used in model calibration. Objective functions
were calculated with observed (obs) and simulated (sim) runoff (
Model simulations were run for two time periods, one lasting from the hydrological years (1 October until 30 September) 1984 to 1996 and the other lasting from 1997 to 2009. The approximately 3 years preceding each simulation period (January 1982 to September 1984 and January 1995 to September 1997, respectively) served to establish state variables of the model. A warm-up period was needed to ensure that the different state variables at the beginning of the simulation period were consistent with the preceding meteorological conditions and parameter values. The two simulation periods were used for model calibration and validation. For calibration, a genetic algorithm (Seibert, 2000) was used and the range of possible parameter values was specified based on previous studies (Lindström et al., 1997; Seibert, 1999; Table 2 in Vis et al., 2015). For each objective function, 100 independent calibration trials were run, which allowed us to account for parameter uncertainty or equifinality (Beven and Freer, 2001) and resulted in 100 calibrated parameter sets (Fig. 2).
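The idea of repeating independent calibration trials can be sketched as follows. This is a toy illustration under explicit assumptions: a single linear reservoir stands in for the runoff model, plain random search replaces the genetic algorithm used in the study, and all names, ranges, and values are hypothetical.

```python
import random

def toy_model(k, precip):
    """Single linear reservoir; a stand-in for the runoff model, not HBV."""
    storage, runoff = 0.0, []
    for p in precip:
        storage += p
        q = k * storage
        storage -= q
        runoff.append(q)
    return runoff

def neg_sse(obs, sim):
    """Objective to maximize: negative sum of squared errors."""
    return -sum((o - s) ** 2 for o, s in zip(obs, sim))

def calibrate_once(obs, precip, k_range, n_iter, rng):
    """One calibration trial using random search (not a genetic algorithm)."""
    best_k, best_score = None, float("-inf")
    for _ in range(n_iter):
        k = rng.uniform(*k_range)
        score = neg_sse(obs, toy_model(k, precip))
        if score > best_score:
            best_k, best_score = k, score
    return best_k

precip = [5.0, 0.0, 12.0, 3.0, 0.0, 0.0, 8.0, 1.0]
obs = toy_model(0.3, precip)  # synthetic "observations" with a known parameter

# 100 independent trials, each started from its own random seed
param_sets = [
    calibrate_once(obs, precip, (0.05, 0.95), 200, random.Random(seed))
    for seed in range(100)
]
print(min(param_sets), max(param_sets))
```

The spread of the resulting parameter values gives a rough impression of how much the calibrated parameters vary between trials; with a multi-parameter model, quite different parameter sets can reach similar objective function values.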
Flow chart of the modeling approach consisting of calibration,
validation, and evaluation in time period 1 (1984–1996) and time period 2
(1997–2009) and completed for each of the five objective function types
The complete model calibration process was conducted for 25 catchments and
using data from all five different types of objective functions (see Table 2
for the exact equations) that focused on different aspects of the hydrograph.
In the first step, model parameters were constrained by maximizing the model
efficiency (
Next, a new efficiency measure that consisted of one single SFC
(
Performance measures used in model evaluation. Performance measures
were calculated with observed (obs) and simulated (sim) runoff (
Based on the results from the individual SFCs, an objective function
consisting of equally weighted normalized SFCs was defined
(
Model performance in calibration and validation was evaluated by means of
normalized SFC error,
As there are significant differences in the SFC ranges, a normalization was needed that allowed comparison of the different SFCs. Instead of normalizing in terms of relative error, an approach was applied that normalizes the SFC estimation error. The normalized error of a SFC was computed as the absolute simulation error divided by the range of possible values for that SFC in the respective catchment (Table 3). To calculate these SFC ranges, 10 000 Monte Carlo simulations were run for each catchment using randomly chosen parameter values from the previously identified parameter space (Lindström et al., 1997; Seibert, 1999; Table 2 in Vis et al., 2015). The Monte Carlo simulations represented the potential variation in a certain SFC if no information was available to constrain the runoff model. The range was then calculated as the difference between the 90th and 10th percentiles of the simulated SFC values.
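The normalization described above can be sketched in a few lines of Python. The function and variable names are assumptions, and the uniform random numbers merely stand in for SFC values that would come from the Monte Carlo model runs.

```python
import numpy as np

def normalized_sfc_error(sfc_obs, sfc_sim, sfc_monte_carlo):
    """Absolute SFC error divided by the 10th-90th percentile range
    of the SFC values obtained from Monte Carlo simulations."""
    p10, p90 = np.percentile(sfc_monte_carlo, [10, 90])
    return abs(sfc_sim - sfc_obs) / (p90 - p10)

# Placeholder for SFC values from 10 000 Monte Carlo runs of one catchment
rng = np.random.default_rng(0)
mc_values = rng.uniform(0.0, 10.0, size=10_000)

print(normalized_sfc_error(sfc_obs=4.0, sfc_sim=5.0, sfc_monte_carlo=mc_values))
```

A normalized error of 1 then means that the simulation error is as large as the spread of SFC values expected from an unconstrained model, which makes errors comparable across SFCs with very different units and ranges.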
The HBV model was capable of reproducing the observed runoff for the study
catchments reasonably well. Model calibration on
Model calibration results for the 13 SFCs confirmed that HBV-light is capable
of estimating different SFCs with a high level of precision if the respective
SFC was used as an objective function (
Validation results (Fig. 3b) exhibited a similar pattern in model performance
to the calibration results. The median absolute normalized error of the 13
SFCs was relatively low for model runs based on the objective functions
Model performance in
Absolute normalized TA1 error (nSFC) in
Comparison of absolute normalized SFC errors (nSFC) in validation
calculated from model calibrations with the objective functions
Comparison of absolute normalized SFC errors (nSFC) in validation
calculated from model calibrations with the objective functions
Normalized SFC errors (nSFC) in validation depending on the objective function used in calibration. Model performance values correspond to the median of the 25 catchments and are shown for both modeling time periods (period 1, 1984–1996, on the left side and period 2, 1997–2009, on the right side).
The calibrations for all 13 versions of
Figure 6a shows simulation results for the objective function
Median estimates of the 13 SFCs in the calibration period were slightly lower
when the model was calibrated with
Figure 8 provides an overview (median of all 25 catchments) of how well SFCs
were simulated by presenting the results for both modeling time periods and
all five objective function types. Error magnitudes ranged between
The median error (illustrated by stars in Fig. 8) was used for the evaluation of the underestimation or overestimation of SFCs. Among the tested SFCs, an underestimation was observed for all five SFCs representing high-flow conditions as well as for three of four mean-flow-related SFCs. With one exception, low-flow SFCs were overestimated. This overall pattern was less evident when evaluating each objective function and time period separately (Figs. 8 and 9). The SFCs DH16 and MH10 indicate two typically observed deviations in the overall pattern. DH16 is an example of a SFC that could be regarded as being clearly underestimated by the model, because of its negative bias in 9 out of 10 cases (median values in Fig. 9a). However, for objective functions or modeling time periods with a low magnitude in the median bias, the underestimation of the SFC was not statistically significant. Even in the case of a median pointing to statistically significant underestimation, there might be a substantial number of catchments for which DH16 was overestimated. A second commonly observed phenomenon is shown by the SFC MH10 (Fig. 9b). While MH10 had mostly small but statistically significant median errors, there were many catchments with considerably higher errors. Although MH10 was the most extreme example, it illustrates that small median errors do not guarantee good results for all catchments.
The results demonstrated that the objective function used for model
calibration strongly influences the estimation accuracy of SFCs. This result
confirms the findings of previous studies (e.g., Hingray et al., 2010;
Westerberg et al., 2011; Murphy et al., 2013; Olsen et al., 2013;
Pfannerstill et al., 2014; Shrestha et al., 2014; Caldwell et al., 2015; Vis
et al., 2015) and points out the importance of making a careful choice of the
objective function for model calibration. The benefit of optimizing one
specific SFC lies in the relatively accurate estimation of the respective SFC
compared to a calibration with
A notable result from the current study is the distinct difference in
model performance in calibration and validation when using the objective
function
The two least robust SFCs are MH10 and TL1. MH10 simulations with
The runoff model tends to underestimate SFCs related to mean and high-flow conditions, while SFCs representing low-flow conditions are generally overestimated. These results are consistent with those of Olsen et al. (2013), Caldwell et al. (2015), Vis et al. (2015), and Kiesel et al. (2017) and can partly be explained by the model behavior characterized by a less pronounced runoff response to precipitation events but increased groundwater discharge to the stream during drier periods compared to the observed data (Vis et al., 2015). The observations that average flow conditions are better simulated than extremes (Caldwell et al., 2015; Vis et al., 2015) or that high-flow-related SFCs are more accurately estimated than those related to low flow (Shrestha et al., 2014; Ryo et al., 2015) cannot be confirmed with our results. None of these earlier studies explicitly included SFCs in model calibration and the deviating results could be attributed to the differing approaches to defining the objective function(s). This presumption is supported by the previously described differences in results of Vis et al. (2015), although they applied the same runoff model, catchments, and SFCs.
The current study supports the assumption that including SFCs in model
calibration helps to preserve most hydrograph aspects relevant to those SFCs.
Thus, an objective function based on several SFCs is expected to result in a
hydrograph from which a suite of SFCs can be calculated. Not knowing which
SFCs will be relevant for a given study, a guideline as to which SFCs the
model calibration could be based on would be helpful. The first step towards
a guideline consists of selecting SFCs that are potentially valuable for
model calibration. This selection was based on the concept of robustness and
information value of SFCs, which is comparable to the approach used by Euser
et al. (2013), who assessed the realism of model structures. Like Euser et
al. (2013), results from the current study indicated that high robustness was
not necessarily related to high information value, emphasizing the importance
of selecting SFCs by jointly evaluating robustness and information value. The
concept of information value and robustness favors simulations that preserve
important hydrograph characteristics, as can be seen from the slightly
improved median estimation accuracy of SFCs with the objective functions
A model calibrated on certain flow conditions (low, medium, and high flow) is beneficial for SFCs representing these flow conditions (see, e.g., Murphy et al., 2013), so it was hypothesized that the information value of the selected SFCs is highest for SFCs belonging to the same group of flow conditions. The confirmation of this hypothesis would allow us to draw general conclusions about a minimum number of SFCs required for model calibration. Surprisingly, the results did not reveal any pattern related to flow conditions and thus no recommendation for the final selection of SFCs can be made. It seems that the selection of SFCs for an informative and robust objective function depends on the type and the combination of SFCs one is interested in. Since this study was based on a limited number of SFCs, it could be interesting to test the hypothesis by analyzing a greater number of SFCs, which might reveal relations that are difficult to see with a small sample. Furthermore, more knowledge about the effect of single SFCs or the combination of SFCs used as objective functions on runoff simulations could be gained by using synthetic data and a modeling approach where an excellent hydrograph fit is possible (e.g., “HBV-land” in Seibert and Vis, 2012).
The emphasis of SFC-related modeling studies has shifted from estimating single
SFCs to simulating a suite of SFCs (Olden and Poff, 2003). The modeling
design of this study combined both approaches for the same SFCs and
catchments and thus enabled a direct comparison of the results. Ideally, the
runoff model could be calibrated to simulate a hydrograph for each catchment
from which any SFC can be calculated. Such an approach ensures a relatively
small calibration effort, which is especially valuable if one is interested
in modeling many catchments and/or various scenarios. However, results
indicate that SFCs related to a more generally calibrated model (e.g.,
As with regional statistical approaches, incorporating SFCs into model objective functions implies that a modeler knows which SFCs are relevant and that the model must be recalibrated if one is interested in additional SFCs. Advantages of runoff models over multivariate regressions and observed streamflow series include their use for climate scenario analysis and for simulating runoff in ungauged catchments, with the latter being one of the ultimate aims in the ELOHA framework (Poff et al., 2010). Modeling SFCs becomes even more challenging when moving from a gauged to an ungauged catchment. An appropriate calibration strategy targeted to the main simulation goal is crucial for any subsequent regionalization.
When comparing SFCs estimated from simulations of different runoff models, the question can be raised whether the results depend on the selected model. This question is especially important for resource managers who need to make decisions based on model results from different studies (Caldwell et al., 2015). A comparison of runoff models with different spatial scales that rely on different data inputs was conducted by Caldwell et al. (2015). Their results do not indicate that a certain runoff model is more suited for predicting SFCs than others, but rather that the calibration process probably has as much influence as the model structure. Thus, it can be assumed that the conclusions of this study would be similar if a different calibrated runoff model were applied.
In this study, we evaluated the value of using SFCs for the calibration of a runoff model used to estimate SFCs. The results suggest that the choice of the objective function used for model calibration strongly influences the estimation accuracy of SFCs. While the model was capable of correctly simulating any of the tested SFCs, a good reproduction of a particular SFC was generally achieved only when this SFC was included in the objective function. SFC estimates from model simulations with an objective function consisting of a representative selection of SFCs resulted in comparable accuracies to the estimates from model runs based on the commonly used model efficiency when evaluated against SFCs not included in the objective function. Estimates of SFCs that are less dependent on the short-term weather input or SFCs representing average flow conditions were more robust than other SFCs. Since the results imply that one has to consider significant uncertainties when simulated time series are used to derive SFCs that were not included in the calibration, we strongly recommend calibrating the runoff model explicitly for the SFCs of interest.
Data used in this study are available at the U.S. Department of Commerce (2007a, b) and the U.S. Geological Survey (2016a, b).
SP, MV, RK, and JS designed this study based on a previous collaboration; MV performed the runoff simulations; SP analyzed the results, which were discussed with all co-authors. Writing of the paper was led by SP with contributions from all co-authors.
The authors declare that they have no conflict of interest.
This paper is a product of discussions and activities that took place at the
U.S. Geological Survey John Wesley Powell Center for Analysis and Synthesis
as part of the workgroup focusing on Water Availability for Ungauged Rivers
(