Journal topic
Hydrol. Earth Syst. Sci., 22, 5021–5039, 2018
https://doi.org/10.5194/hess-22-5021-2018
Hydrol. Earth Syst. Sci., 22, 5021–5039, 2018
https://doi.org/10.5194/hess-22-5021-2018

Research article 28 Sep 2018

Research article | 28 Sep 2018

# Parameter uncertainty analysis for an operational hydrological model using residual-based and limits of acceptability approaches

Parameter uncertainty analysis for an operational hydrological model using residual-based and limits of acceptability approaches
Aynom T. Teweldebrhan1, John F. Burkhart1,2, and Thomas V. Schuler1 Aynom T. Teweldebrhan et al.
• 1Department of Geosciences, University of Oslo, Oslo, Norway
• 2Statkraft, Oslo, Norway

Correspondence: Aynom T. Teweldebrhan (aynomtt@geo.uio.no)

Abstract

Parameter uncertainty estimation is one of the major challenges in hydrological modeling. Here we present parameter uncertainty analysis of a recently released distributed conceptual hydrological model applied in the Nea catchment, Norway. Two variants of the generalized likelihood uncertainty estimation (GLUE) methodologies, one based on the residuals and the other on the limits of acceptability, were employed. Streamflow and remote sensing snow cover data were used in conditioning model parameters and in model validation. When using the GLUE limit of acceptability (GLUE LOA) approach, a streamflow observation error of 25 % was assumed. Neither the original limits nor relaxing the limits up to a physically meaningful value yielded a behavioral model capable of predicting streamflow within the limits in 100 % of the observations. As an alternative to relaxing the limits, the requirement for the percentage of model predictions falling within the original limits was relaxed. An empirical approach was introduced to define the degree of relaxation. The result shows that snow- and water-balance-related parameters induce relatively higher streamflow uncertainty than catchment response parameters. Comparable results were obtained from behavioral models selected using the two GLUE methodologies.

1 Introduction

Hydrological models have numerous applications of central importance to society including for planning, design, and management of environmental and water resources. The operation of hydropower systems is mainly constrained by the availability of water resources. Hydrological models play an important role in forecasting the local inflows to the system on scales ranging from hours to years. With due recognition of the need for accurate prediction of streamflow and snow storage, Statkraft (2018) has recently released a new modeling framework mainly tailored for an operational purpose. In this study, one of the conceptual models of this framework was subjected to uncertainty analysis. Conceptual hydrological models typically have one or more calibration parameters and commonly require some form of inverse modeling to estimate model parameters from observations (Crawford and Linsley, 1966). During calibration, equifinality arises when different parameter sets give equally good results in terms of predefined efficiency criteria (Beven, 1993; Savenije, 2001; Wagener et al., 2003). The generalized likelihood uncertainty estimation (GLUE) methodology (Beven and Binley, 1992) is an extension of the generalized sensitivity analysis concept of Hornberger and Spear (1981), and it accepts equifinality as a working paradigm for parameter calibration of hydrological models (Choi and Beven, 2007). It is based on the concept that all models of hydrological systems are highly simplified representations of reality (e.g., Reichert and Omlin, 1997), and hence it is expected to have several different model structures and parameter sets that describe the system in an adequate way (Blazkova and Beven, 2002). When dealing with nonlinear systems, the classic hydrological approach of using a single set of model parameters may lead to large predictive biases (e.g., Mantovan and Todini, 2006).

Hydrological modeling is affected by four main sources of uncertainty related to input data, validation data, model structure, and model parameters (e.g., Renard et al., 2010). Input data uncertainties may arise from measurement limitations and scaling issues, for example, due to forcing data downscaling. Errors of the rating curve affect streamflow estimates and thereby lead to validation data uncertainty. Structural uncertainty may result from the underlying assumptions and simplifications in the model formulation as well as from application of the model to conditions inconsistent with the model structure (Tripp and Niemann, 2008). Parametric uncertainty reflects the inability to specify exact values of model parameters (Renard et al., 2010) and it may stem from errors in input data and observations used for model conditioning as well as be due to epistemic errors in model structure. An increased awareness of these modeling uncertainties and the need for quality control of such models requires the integration of uncertainty analysis into the modeling process from the very beginning (Beven, 1989; Saltelli et al., 2006; Refsgaard et al., 2007).

Uncertainty analysis techniques can be classified as frequentist or Bayesian approaches, probabilistic or non-probabilistic approaches (e.g., Montanari et al., 2009), or as formal or informal approaches (e.g., Vrugt et al., 2009). Among the most widely used techniques in hydrological modeling are the formal Bayesian and the GLUE methods (Jin et al., 2010). The formal Bayesian approach makes strong assumptions about the statistics of observed data; with the likelihood function defined based on assumptions about the nature of the residuals (Schoups and Vrugt, 2010). However, the choice of an adequate likelihood function has been the subject of considerable debate. According to Beven and Smith (2015), a formal probabilistic likelihood function will have limited value since non-stationary epistemic uncertainties cannot be adequately represented by a statistical model. In GLUE, the likelihood measure is associated with a parameter set and should ideally reflect all the different sources of uncertainty (Beven and Smith, 2015). The original GLUE methodology has been the subject of debate for using a subjectively set threshold of behavioral models (e.g., Mantovan and Todini, 2006; Stedinger et al., 2008; Clark et al., 2011; Nearing et al., 2016). This problem is common to most residual-based model selection methods (Schaefli, 2016). The extended concept of behavioral models in the GLUE limits of acceptability approach (GLUE LOA) (Beven, 2006) attempts to overcome this drawback through use of error bounds of the observational dataset.

The GLUE LOA methodology involves specifying limits around some observational data within which model predictions are required to lie and thereby considered acceptable for the intended model application. The acceptability limits are set prior to running a model and, among other considerations, they are expected to take into account incommensurability and uncertainty in both the input and evaluation data (Beven, 2009). However, identification of models that reproduce the observed system behavior within the limits of measurement error is not easy due to time-varying errors in the input data and model structure (e.g., Beven, 2016). This difficulty is even more pronounced when input and other sources of errors are not explicitly accounted for in defining the LOA.

Good quality time series data and associated uncertainties are not always readily available. For example, in regulated catchments the inflow hydrograph is often estimated from changes in storage volume and outflows using the water balance equation. Thus, as in the case of our study catchment, no stage–discharge relationship exists for estimating the streamflow uncertainty using the usual practice, i.e., by fitting different rating curves. In such instances the alternative is to assume an observation error proportional to the observational data. However, the identification of behavioral models without due consideration to such less precise observation error estimates may lead to the rejection of a useful model (i.e., making a type II error). Some of the measures taken to minimize the risk of making a type II error when identifying behavioral models using the GLUE LOA include extending the limits (e.g., Blazkova and Beven, 2009; Liu et al., 2009) and using different model realizations for different periods of a hydrological year (e.g., Choi and Beven, 2007). In this study, instead of relaxing the limits, the percentage of observations where model predictions are required to fall within the acceptability limits was relaxed.

The GLUE methodology has been widely used in various disciplines (Beven, 2009; Efstratiadis and Koutsoyiannis, 2010) primarily due to its conceptual simplicity and ease of implementation. Further, its suitability for parallel implementation on distributed computer systems as well as its general strategy in dealing with equifinality in model calibration make it an appealing framework (Blasone et al., 2008; Shen et al., 2012; Mirzaei et al., 2015).

In this study model parameters were constrained using streamflow and the MODIS snow cover product (Hall et al., 2006). Multi-criteria model conditioning helps to reduce prediction uncertainty through improved parameter identification (e.g., Efstratiadis and Koutsoyiannis, 2010; Finger et al., 2015), and GLUE provides a flexible approach for using multi-criteria methods through different ways of combining measures. Besides streamflow, one of the observations commonly used in multi-criteria conditioning of rainfall-runoff models in snow-dominated catchments is snow data. Remote sensing snow cover data have been used in several hydrological modeling studies for deriving and updating a snow depletion curve (SDC) (e.g., Lee et al., 2005; Kolberg and Gottschalk, 2006; Bavera et al., 2012), as well as in multi-criteria-based model calibration and simulated snow cover validation (e.g., Udnaes et al., 2007; Parajka and Bloschl, 2008; Berezowski et al., 2015). However, studies involving combined uncertainty of streamflow and snow cover predictions using the GLUE methodology are still missing in the literature.

The main objective of this study is to assess parameter uncertainty for a recently developed distributed conceptual hydrological model using the GLUE methodology with due consideration to the model's main application as an operational hydrological model. The second objective is to investigate the potential value of snow cover data as additional observation in conditioning model parameters in the study area. The third objective is to assess the possibility of using a time-relaxed GLUE LOA approach for constraining model parameters. In doing so, we employ a novel empirical approach for implicitly accounting for the effects of input and observational data errors by relaxing the percentage of time steps in which predictions of model realizations fall within the limits.

This paper is organized as follows. First the (i) hydrological model and (ii) the study site and relevant data used in this study are briefly described in Sect. 2.1 and 2.2. The procedures followed to set up the uncertainty analyses are then outlined in Sect. 2.3. In Sect. 3, the results from parameter uncertainty as well as the uncertainty of streamflow and snow cover predictions using the residual-based GLUE approach are presented. The results from the relaxed GLUE LOA are also presented in this section. Finally, in Sects. 4 and 5, the analysis results and their implication on the hydrologic model, the data and the methodologies followed are discussed and conclusions are drawn.

2 Methods and materials

## 2.1 The hydrological model

The Statkraft Hydrological Forecasting Toolbox, Shyft (https://github.com/statkraft/shyft, last access: 1 March 2018), is an open-source distributed hydrological modeling framework developed by Statkraft (Burkhart et al., 2016). The modeling framework has three main models (method stacks) and, in this study, the PT_GS_K model was used for uncertainty analysis. PT_GS_K is a conceptual model with several adjustable parameters depending on the climatic and physiographic characteristics of the study area where the model is applied. This model requires temperature, precipitation, radiation, relative humidity, and wind speed as forcing data. PT_GS_K uses the Priestley–Taylor (PT) method (Priestley and Taylor, 1972) for estimating potential evaporation; a quasi-physical-based method for snowmelt, sub-grid snow distribution and mass balance calculations (GS method); and a simple storage–discharge function (Lambert, 1972; Kirchner, 2009) for catchment response calculation (K). Overall, these three methods constitute the PT_GS_K model in Shyft. The framework establishes a sequence of spatially distributed cells of arbitrary size and shape. As such it can provide lumped (single cell) or discretized (spatially distributed) calculations, as in this study. The model was applied to each of the grid cells and for each time step.

Within the GS method, precipitation falling in each grid cell is classified as solid or liquid precipitation depending on a threshold temperature (tx) and on the local temperature values. The snowmelt energy is the sum effect of different energy sources in the system such as shortwave and long-wave radiation as well as the turbulent sensible and latent energy fluxes. Among other factors, the energy contribution from shortwave radiation depends on snow albedo. For a given time step (t), the snow albedo of each grid cell depends on the minimum (αmin) and maximum (αmax) albedo values as well as on air temperature (Ta) (Eq. 1). In this method the decay rates of albedo due to snow ageing as a function of temperature, i.e., the fast (fast ADR, αfdr) and slow (slow ADR, αsdr) albedo decay rates corresponding to temperature conditions above and below 0 C, respectively, are parameterized. Turbulent heat contribution is the sum of latent and sensible heat. Wind turbulence is linearly related to wind speed using a wind constant and wind scale from the intercept and slope of the linear function, respectively (Hegdahl et al., 2016).

$\begin{array}{}\text{(1)}& {\mathit{\alpha }}_{t}=\phantom{\rule{0.125em}{0ex}}\left\{\begin{array}{l}{\mathit{\alpha }}_{\mathrm{min}}+\left({\mathit{\alpha }}_{t-\mathrm{1}}-{\mathit{\alpha }}_{\mathrm{min}}\right)\cdot \left(\frac{\mathrm{1}}{{\mathrm{2}}^{\mathrm{1}/{\mathit{\alpha }}_{\mathrm{fdr}}}}\right)\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}{T}_{\mathrm{a}}>\mathrm{0}\phantom{\rule{0.125em}{0ex}}{}^{\circ }\mathrm{C}\\ {\mathit{\alpha }}_{t-\mathrm{1}}+\left({\mathit{\alpha }}_{\mathrm{max}}-{\mathit{\alpha }}_{\mathrm{min}}\right)\cdot \left(\frac{\mathrm{1}}{\mathrm{2}\phantom{\rule{0.125em}{0ex}}\left({\mathit{\alpha }}_{\mathrm{sdr}}\right)}\right)\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}{T}_{\mathrm{a}}\le \phantom{\rule{0.125em}{0ex}}\mathrm{0}\phantom{\rule{0.125em}{0ex}}{}^{\circ }\mathrm{C}\end{array}\right\\end{array}$

The sub-grid snow distribution is described by a three-parameter gamma probability distribution snow depletion curve (Liston, 1999; Kolberg and Gottschalk, 2006). The traditional gamma distribution is parameterized with two values, i.e., the average amount of snow at the onset of the melt season m (mm) and the shape value (k), based on the assumption that the ground is completely snow covered before the onset of melt. Since this assumption may not hold true for a number of grid cells especially in alpine areas, a third parameter representing the bare ground fraction at the onset of the snowmelt season has been introduced (Kolberg and Gottschalk, 2006). The two-parameter gamma distribution (Eq. 2) is thus applied only to the remaining portion of a grid cell to estimate the fraction of the initially snow-covered area where snow has disappeared (y). The initial bare ground fraction parameter is constant for all years. At each time step, the state parameters such as snow water equivalent (SWE) and snow cover area (SCA) are updated using the SDC function. In the GS method, the shape value is a direct transformation of the sub-grid snow coefficient of variation (CVs).

$\begin{array}{}\text{(2)}& y{}^{\prime }=\underset{\mathrm{0}}{\overset{\mathit{\lambda }\left(t\right)}{\int }}f\left(x;k,\mathit{\theta }\right)\mathrm{d}x=\mathit{\gamma }\left(k,\phantom{\rule{0.125em}{0ex}}\frac{\mathit{\lambda }}{\mathit{\theta }}\right),\end{array}$

where f denotes the gamma probability density function and γ is the incomplete gamma function. x and λ(t), respectively, refer to point snow storage and the accumulated melt depth (mm) at time t since the onset of the melt season. θ represents the scale parameter with m=kθ and $k={\mathrm{CV}}_{\mathrm{s}}^{-\mathrm{2}}$.

Table 1Range of model parameters used for the PT_GS_K model stack uncertainty analysis.

The catchment response function (CRF) is based on the storage–discharge relationship concept described in Kirchner (2009) and represents the sensitivity of discharge to changes in storage (Eq. 3). This method is based on the idea that catchment sensitivity to changes in storage, i.e., g(Q), can be estimated from the time series of discharge alone through fitting empirical functions to the data such as the quadratic equation. Since discharge is generally nonlinear and typically varies by many orders of magnitude, the recommended approach is to use log-transformed discharge values in order to avoid the risk of numerical instability. In this method, the three parameters of the catchment response function, i.e., c1, c2, and c3, are parameterized.

$\begin{array}{}\text{(3)}& \frac{\mathrm{d}\left(\mathrm{ln}\left(Q\right)\right)}{\mathrm{d}t}=\phantom{\rule{0.125em}{0ex}}g\left(Q\right)\left(\frac{P-E}{Q}-\mathrm{1}\right),\end{array}$

with $g\left(Q\right)={e}^{c\mathrm{1}\phantom{\rule{0.125em}{0ex}}+\phantom{\rule{0.125em}{0ex}}c\mathrm{2}\left(ln\left(Q\right)\right)\phantom{\rule{0.125em}{0ex}}+\phantom{\rule{0.125em}{0ex}}c\mathrm{3}\left(ln\left(Q\right){\right)}^{\mathrm{2}}}$,

in which E and Q, respectively, represent actual evapotranspiration and discharge. In the original formulation P refers to precipitation, whereas in this method it refers to the liquid water supply from rainfall and snowmelt.

The potential evaporation calculation in the PT method requires net radiation and the slope of saturated vapor pressure as well as the Priestley–Taylor parameter, the psychometric constant, and the latent heat of vaporization (e.g., Matt et al., 2018). The latter three variables are kept constant in the PT method. Actual evapotranspiration is assumed to take place only from snow-free areas and it is estimated as a function of potential evapotranspiration and a scaling factor.

In the default parameter settings of the PT_GS_K model seven parameters are considered as influential and thus allowed to vary in conditioning the model. Preliminary model calibration using the BOBYQA algorithm (Powell, 2009) and the default setting gave reasonable model performance. Hence, the same setting was also followed in this study with the addition that the sub-grid snow coefficient of variation was also considered an uncertain model parameter. A similar result was also observed when this setting was later verified using the method of Morris (Morris, 1991; Saltelli et al., 2008) for screening the most influential out of the relevant model parameters. The feasible ranges of parameter values are set based on relevant literature and previous modeling studies in the Nea-Nidelva catchment. Table 1 shows a list of these parameters with their range of possible values.

Figure 1Physiographic and location map of the Nea catchment in Norway.

## 2.2 Study area and data

This study was conducted using climatic and catchment data from the Nea catchment (11.67390–12.46273 E, 62.77916–63.20405 N). The Nea catchment constitutes the headwaters of the Nea-Nidelva water resources management area which is situated in Sør-Trøndelag county, Norway (Fig. 1). The hydropower generated from this area is the main source of electric supply to several places in mid-Norway including to one of the biggest cities in the country, Trondheim. As a result this area has significance for Statkraft AS and other stakeholders responsible for the development and management of water resources in the region and has been selected for research focused on better prediction and understanding of the snow processes and their impact on hydrology of the downstream area.

The Nea catchment covers a total area of 703 km2 and it is characterized by a wide range of physiographic and land cover characteristics. Altitude of the catchment ranges from 1783 m a.s.l. on the eastern part around the mountains of Storsylen to 649 m a.s.l. at its outlet on the western part of the catchment. Mean annual precipitation for the hydrological years 2011–2014 was 1120 mm. The highest and lowest average daily temperature values for this period were 28 and −30C, respectively.

As mentioned in Sect. 2.1, the PT_GS_K model requires temperature, precipitation, radiation, relative humidity, and wind speed as forcing data. In this study, daily time series data of these variables for the study area were obtained from Statkraft (2018) as point measurements, with the exception of relative humidity. Daily gridded relative humidity data were retrieved from ERA-Interim (Dee et al., 2011). The Model uses a Bayesian kriging approach to distribute the point temperature data over the domain, while for the other forcing variables it uses an inverse distance weighting approach.

Two observational datasets, streamflow and snow cover, were used in this study. Daily observed streamflow measurements covering 4 hydrological years (1 September to 31 August) were provided for the study area. The climatic data show that these hydrological years represented periods both above and below the long-term average annual precipitation. Years 2011 and 2013, respectively, were the wettest and driest years in over 10 years. Daily snow cover fraction (SCF) data were retrieved from NASA MODIS snow cover products (MODIS SCF) (Hall et al., 2006). Frequent cloud cover is one of the major challenges when using MODIS and other optical remote sensing data in Norway. In order to minimize the effect of obstructions and misclassification errors emanating from clouds and other sources, a composite dataset was formed using data retrieved from the Aqua and Terra satellites, MYD10A1 and MOD10A1 products, respectively.

In this analysis, PT_GS_K was set up in distributed mode over 812 grid cells, requiring the following physiographic data of each grid cell: average elevation and grid cell total area, as well as the areal fractions of forest, reservoir, lake, and glacier. Data for these physiographic variables were retrieved from two sources: the land cover data from Copernicus land monitoring service (2016) and the 10 m digital elevation model (10 m DEM) from the Norwegian mapping authority (2016).

## 2.3 The uncertainty analysis methods

In this study a modeling and parameter uncertainty analysis was conducted using two GLUE variants. First, the hydrological model and its snow sub-model were subjected to uncertainty analysis using the residual-based GLUE methodology. When using this approach, the relevant model parameters were initially conditioned using either streamflow or MODIS SCF. In the subsequent analysis, they were conditioned using both streamflow and SCF. Following that, the uncertainty analysis was conducted using the relaxed GLUE LOA approach.

### 2.3.1 Sampling the parameter dimensions

The performance of all uncertainty analysis techniques depends on the efficiency of the sample in representing the entire response surface (Pappenberger et al., 2008). In this study, prior distributions of the uncertain model parameters were not known and hence a uniform distribution was assumed. The challenge in using uniform distribution is, however, to adequately sample the entire parameter dimensions. To overcome this challenge and to better identify regions of behavioral simulations, a sample size of 100 000 runs was used. Each model run is a realization of a parameter set randomly drawn from the domains of the model parameters. An all-at-a-time (AAT) sampling method (e.g., Pianosi et al., 2016) was employed. This method involves random selection of all parameter values simultaneously. The residual-based GLUE (Sect. 2.3.2) and the relaxed GLUE LOA (Sect. 2.3.3) approaches are used to identify the behavioral model runs. Matlab scripts from the SAFE toolbox (Pianosi et al., 2015) were used as a basis to characterize behavioral and non-behavioral models.

### 2.3.2 The residual-based GLUE approach

In this study, the performance of each model realization was evaluated by using relevant likelihood measures. Residual-based informal likelihood measures are considered suitable measures of fit when large datasets such as rainfall-runoff time series exist for model conditioning (Hassan et al., 2008). The Nash–Sutcliffe efficiency (NSE, Eq. 4) belongs to these groups of likelihood measures, and it is the most widely used likelihood measure for assessing the fitness of model parameters in hydrological modeling (Xiong and O'Connor, 2008). Further, the main end users of the model commonly use NSE both in calibration and evaluation of hydrological models. Thus, use of this performance measure as a streamflow likelihood measure makes it easier both in setting the threshold value for behavioral models (i.e., based on previous experience) and in communicating model performance outputs. However, the NSE calculated using raw values tends to overestimate model performance during peak streamflow and underestimate during low-streamflow conditions (e.g., Krause et al., 2005). To partly overcome this problem, NSE is often calculated with log-transformed observed and simulated values. In this study, both NSE and NSE with log-transformed streamflow values (LnNSE) were thus employed as likelihood measures in evaluating each model run.

$\begin{array}{}\text{(4)}& \mathrm{NSE}=\mathrm{1}-\frac{\sum _{i=\mathrm{1}}^{n}\left({Q}_{\mathrm{sim},\phantom{\rule{0.125em}{0ex}}i}-{Q}_{\mathrm{obs},\phantom{\rule{0.125em}{0ex}}i}{\right)}^{\mathrm{2}}}{\sum _{i=\mathrm{1}}^{n}\left({Q}_{\mathrm{obs},\phantom{\rule{0.125em}{0ex}}i}-{\overline{Q}}_{\mathrm{obs}}{\right)}^{\mathrm{2}}},\end{array}$

in which Qsim represents simulated streamflow, Qobs is observed streamflow, and ${\overline{Q}}_{\mathrm{obs}}$ represents the mean value of observed streamflow series.

Within the residual-based GLUE procedure, the definition of threshold likelihood value at which the model performance is judged reasonable is a subjective choice by the modeler. In this study, NSE and LnNSE of 0.7 and 0.6, respectively, were considered as the threshold values for behavioral models. These values were chosen with due consideration to the input and observational data quality as well as the relative importance given to high streamflow in relation to low-streamflow conditions in the hydropower industries. In the case of the combined likelihood measure, a weighted average threshold value (e.g., Hassan et al., 2008) was calculated assuming each likelihood measure to have a weight proportional to its threshold value. Accordingly, the NSE and LnNSE likelihood measures were respectively assigned weights of 0.54 and 0.46 (Eq. 5).

$\begin{array}{}\text{(5)}& {L}_{\mathrm{NS}}\left(O\phantom{\rule{0.125em}{0ex}}\mid \phantom{\rule{0.125em}{0ex}}M\left({\mathit{\theta }}_{i}\right)\right)=\mathrm{0.46}\left({L}_{\mathrm{LnNSE}}\right)+\mathrm{0.54}\left({L}_{\mathrm{NSE}}\right),\end{array}$

where LNS(O ∣ M(θi)) represents the combined likelihood measure for the ith model realization with model prediction of M(θi), which is a function of the set of model parameters θi, and corresponding to the observations (O). LNSE and LLnNSE, respectively, represent the likelihood measures based on NSE and LnNSE. Models producing likelihood measure values greater than or equal to the threshold value were labeled as behavioral models and were retained for use in further analysis.

The root mean squared error (RMSE) of simulated and MODIS fractional snow cover was used as a likelihood measure of SCF. A threshold value of 0.17 was set when using the RMSE in model conditioning. This value was fixed based on the average performance of similar conceptual hydrological models as a reference (e.g., Skaugen and Weltzien, 2016) and with due consideration to the inherent error in the MODIS SCF data. The estimated annual average error of MODIS SCF maps for the Northern Hemisphere is approximately 8 % in the absence of cloud (Pu et al., 2007), and in forest-dominated areas it may reach up to 15 % (Hall et al., 2001).

Preliminary assessment of model performance indicates that the snow yes/no-based model performance (critical success index, CSI; Table 2) is very high both before the onset of snowmelt and during the complete melt-out period. The lowest match between simulated and MODIS SCF was observed during early summer. It was thus decided to use a weighted mean likelihood measure of SCF, with maximum weight assigned to likelihoods from the middle part of the observation period. The likelihood of each SCF observation was assigned a specific weight based on the location of the observation date in a trapezoidal membership function (TMF). The start and end of the MODIS SCF observation period locate the feet of the trapezoid and the start and end of the month of June locate the shoulders (Fig. 2). For each model realization, the weighted average RMSE (wRMSE) of all SCF observations and their corresponding simulated values for the calibration period were calculated and model realizations with wRMSE below the threshold value were considered behavioral. The weight of each behavioral model was calculated as the inverse of wRMSE and was used in constructing the cumulative distribution function (CDF), based on which the predicted SCF values for different quantiles can be extracted.

Table 2Setup of the two-by-two contingency table for binary snow cover data comparison. O and S, respectively, represent observed and simulated binary snow cover and the subscripts refer to a snow-free (0) and snow-covered (1) grid cell.

Figure 2A trapezoidal membership function for SCF likelihoods in the observational period.

When selecting behavioral models using the combined likelihoods of streamflow and SCF, the merging of these likelihoods was carried out in two steps. First the likelihoods representing low- and high-flow conditions, viz. LnNSE and NSE, were combined following a similar procedure as described above. The likelihoods of streamflow and SCF were separately rescaled such that their respective weights would sum to unity following a similar procedure to that used in Brazier et al. (2000). The combined streamflow likelihood and the SCF likelihood were subsequently multiplied to get a combined likelihood measure of streamflow and SCF.

### 2.3.3 The relaxed GLUE LOA approach

Unlike the residual-based model selection approaches, including the residual-based GLUE methodology, the GLUE LOA approach relies on an assessment of uncertainty in the observational data. The uncertainty analysis was also thus conducted in this study using the GLUE LOA approach and its results compared against those from the residual-based GLUE methodology.

In this study when using the GLUE LOA approach, both the streamflow and MODIS SCF data were considered as uncertain observations. Since no uncertainty data were available for streamflow observations in the study site, mean streamflow uncertainty of 25 % was assumed and the streamflow limits were defined using this value. Although, the maximum expected error of MODIS snow cover products under clear-sky conditions is reported to be 15 % for forest areas (Hall et al., 2001), cloud coverage coupled with a lack of contrast between clouds and snow cover may severely affect the accuracy. And in some cases this leads to misclassification of snow as land (e.g., Parajka et al., 2012). Thus, a SCF uncertainty of 25 %–50 % was assumed to represent the errors associated with the SCF observations and the input data.

An alternative approach was employed to minimize the risk of rejecting useful model realizations due to using assumed average observational error bounds and due to a lack of a viable means for explicitly accounting for the time-varying level of observational and input data uncertainties. The procedure involves relaxing the percentage of observations where model predictions fall within the acceptability limits. Model realizations whose predictions fall within the acceptable bounds in a defined percentage of the observations were considered behavioral. The minimum acceptable percentage of observations where model predictions fall within the limits (hereafter referred as threshold pLOA) in turn was set such that the 5 %–95 % prediction limit of streamflow, reported as the containing ratio (CR, see Eq. 6), is close to the value obtained using the residual-based GLUE methodology. The procedure for relaxing the original GLUE LOA requirement during the calibration period involves the following steps.

• Step 1: define an acceptable prediction limit (CR) at a chosen certainty level (e.g., 5 %–95 %). In this study the CR value obtained for the calibration period using the residual-based GLUE methodology was adopted as an acceptable CR value.

• Step 2: relax the acceptable percentage of observations where model predictions fall within the limits. This is done by gradually lowering the requirement for bracketing the observations in 100 % of the time steps up to the acceptable pLOA.

• Step 3: run a calibration and test whether each model realization prediction falls within the limits at least for the specified percentage of the total observations. If model realizations that satisfy the relaxed acceptability criteria are found, proceed to step 4, otherwise lower the threshold pLOA further and repeat this step.

• Step 4: calculate the new CR and check if it is close to the predefined acceptable CR value. If the calculated CR is less than the predefined CR, repeat steps 2 to 4, whereas if the two CR values are close (e.g., within 5 %) then accept all model realizations that satisfy this pLOA as behavioral and store their indices for use in further analysis.

Model realizations that fulfill this relaxed LOA criteria both in streamflow and SCF observations were considered behavioral. A triangular membership function was used to define the weights of each criterion, where a maximum weight of 1.0 was assigned to predictions with a perfect match to the observation and a minimum weight of 0.0 to predictions outside the acceptability limits. For each model realization, the weights of individual time steps were added to give a generalized weight. Following the procedure by Blazkova and Beven (2009), the weights associated with streamflow and MODIS SCF were combined by taking the sum of these two criteria and rescaling them such that the sum of the weights for behavioral models is unity. The behavioral model realizations were used for prediction weighted by their overall degree of performance.

### 2.3.4 GLUE output analysis

A split-sample-based cross-validation of streamflow predictions was used to alternately evaluate how well the behavioral models identified at a given calibration period are able to reproduce the observed values from another period. The hydrologic model was run for 4 years at a daily time step. The first month of each hydrological year was considered as a spin-up period and hence excluded from all uncertainty analyses. Each of the 4 years was alternately used to identify behavioral models and the remaining 3 years were individually used to assess the modeling uncertainty.

Table 3Statistical summary of the posterior distribution for model parameters.

In this study the modeling uncertainty was evaluated using both qualitative and quantitative evaluation techniques. The upper and lower streamflow prediction limits as well as observed values were plotted on the same graph to visually assess the capability of the identified behavioral models in bracketing the observations. The containing ratio (CR) index was also used to analyze the prediction uncertainty following a similar procedure to that used in some studies involving the GLUE methodology (e.g., Xiong et al., 2009; He et al., 2011). CR is expressed as the ratio of the number of observations falling within respective prediction bounds to the total number of observations (Eq. 6).

$\begin{array}{ll}& \mathrm{CR}=\frac{\sum _{i=\mathrm{1}}^{n}I\left({Q}_{\mathrm{obs},\phantom{\rule{0.125em}{0ex}}i}\right)}{n},\\ & \mathrm{where}\\ \text{(6)}& & I\left({Q}_{\mathrm{obs},\phantom{\rule{0.125em}{0ex}}i}\right)=\left\{\begin{array}{l}\mathrm{1},\phantom{\rule{0.125em}{0ex}}\phantom{\rule{0.125em}{0ex}}\phantom{\rule{0.125em}{0ex}}{L}_{\mathrm{lim},\phantom{\rule{0.125em}{0ex}}i}<{Q}_{\mathrm{obs},\phantom{\rule{0.125em}{0ex}}i}<{U}_{\mathrm{lim},\phantom{\rule{0.125em}{0ex}}i}\\ \mathrm{0},\phantom{\rule{0.125em}{0ex}}\phantom{\rule{0.125em}{0ex}}\phantom{\rule{0.125em}{0ex}}\mathrm{Otherwise}\end{array}.\right\\end{array}$

Qobs, i represents observed streamflow at the ith time step, and Llim, i and Ulim, i are the lower and upper prediction bounds, respectively.

Figure 3Dotty plots of the likelihood measure for behavioral and non-behavioral models identified using the residual-based GLUE methodology.

Figure 4Distribution of model parameters within their variability ranges.

As an alternative to a crisp prediction for an observation (e.g., Xiong and O'Connor, 2008), the median (50 %) streamflow prediction was also estimated from the behavioral model simulations and compared against observations using both NSE and LnNSE as goodness-of-fit measures. Similarly, the critical success index (Table 2) and RMSE were used as goodness-of-fit measures for median SCF prediction. When using RMSE, the fractional snow cover data of each grid cell were directly employed in validating median predictions. CSI represents the number of grid cells where the snow events are correctly predicted out of the total number of grid cells where snow is predicted in the model. It was calculated based on binary snow cover data using the two-by-two contingency table analysis (Table 2) following a similar procedure to that used in Hanzer et al. (2016). When converting the snow cover fraction to a binary measure, a grid cell was classified as snow covered if at least 50 % of its area is snow covered.

3 Results

## 3.1 Uncertainty analysis using the residual-based GLUE approach

### 3.1.1 Uncertainty of model parameters

The uncertainty of model parameters was analyzed using all years of record together as single time series data. The dotty plots (Fig. 3) depict the goodness-of-fit response surface projected onto individual parameter dimensions. The parallel coordinate plots (Fig. 4) also show the distribution of model parameters within their respective parameter dimensions. The distribution of behavioral simulations across a parameter dimension varies from one parameter to another. The behavioral models are scattered nearly across the entire range of parameter dimension for fast ADR, slow ADR, and snow CV, indicating low model sensitivity to these parameters. On the other hand, the relatively localized distribution of behavioral models towards lower values when projected onto the parameter ranges of c1, c2, tx, and wind scale as well as towards higher values of c3 reflects higher sensitivity of simulated streamflow to these calibration parameters. Furthermore, the parallel coordinate plots show an increase in likelihood measure value towards the lower (for c1, c2, tx, and wind scale) and higher (for c3) parts of their respective parameter dimensions.

Figure 5Model performance in response to the interaction between model parameters (upper diagonal cells) and correlation coefficient scores between the parameters (lower diagonal cells)

Figure 6Posterior distribution of calibration parameters after conditioning on flow observations.

Table 4Cross-validation of streamflow predictions against observed values. Bold numbers show the result for the calibration period.

The aforementioned less sensitive model parameters can, however, have a high effect on model outputs through interaction with other parameters. Some degree of interaction between model parameters can be seen from the correlation shown in Fig. 5. For example, a general decreasing trend in model performance can be noticed with a joint increase in c1 and c2. The strong influence of tx in constraining the output is also evident in these plots. A considerable level of interaction can also be observed from the correlation coefficient scores between c1 and c2 (0.56), c2 and c3, (0.53) and between tx and wind scale (0.66).

Figure 7Median, 5–95 percentile range, and observed values of streamflow for the sample calibration period (a) and validation periods (b, c and d). The calibration result (a) and the validation results presented in (b) and (d) are based on behavioral models identified using the combined likelihood, while the result shown in (c) is based on behavioral models identified using NSE alone.

The posterior distribution histograms (Fig. 6) and the statistical summary table of posterior distribution (Table 3) illustrate variability in distribution characteristics of the model parameters. The catchment response parameters, viz. c1, c2, and c3, showed relatively well-defined peaks, whereas fast ADR, slow ADR, and snow CV appear less identifiable with a relatively flat distribution across their respective parameter dimensions. It should, however, be noted that, in the GLUE methodology, it is the set of parameter values that gives a behavioral model.

### 3.1.2 Uncertainty of streamflow predictions

Figure 7 shows a sample cross-validation of daily streamflow prediction limits against observed values. The upper and lower prediction bounds as well as the median values are generated with behavioral models identified in year 2011 using the combined NSE and LnNSE likelihood measure. The calculated uncertainty in streamflow prediction indicated by the 5–95 percentile range (shaded band) varied over time and relatively higher uncertainty was noticed during high-streamflow than low-streamflow periods.

As can be seen from the summary table of cross-validation results (Table 4), the CR values range from 0.62 to 0.91 with an overall mean value of 0.77. The mean CR values for the calibration and validation periods are 0.78 and 0.76, respectively. The evaluation result generally shows that the median prediction of behavioral models selected using the combined likelihood was able to reproduce the observed values remarkably well with average NSE and LnNSE of 0.86 and 0.72, respectively, for the validation period. However, performance of the behavioral models identified using NSE was very low when evaluated using LnNSE in year 2014. This phenomenon can be attributed to the relatively low quality of streamflow observations during the low-streamflow period of this year. The validation result was also highly affected by the nature of the likelihood measure used during the identification of behavioral models. For example, a persistent low performance was observed during the early months of the hydrologic year when validating behavioral models identified using NSE alone (Fig. 7c) as compared to those identified using the combined likelihood (Fig. 7d). Similarly, excluding the first 30 observations from the validation dataset resulted in an improvement of LnNSE from −0.53 to 0.44.

### 3.1.3 Uncertainty of snow cover predictions

Snow cover fractions and snow water equivalent are two main outputs of the snow sub-model (GS) of the PT_GS_K model. In this study an initial single-likelihood-based conditioning of the GS specific parameters was carried out using MODIS SCF only and RMSE as a measure of model performance.

Table 5Cross-validation of SCF predictions against MODIS SCF.

The cross-validation result of predicted median values against MODIS SCF observations is shown in Table 5. The highest and lowest RMSE values during the calibration period were 0.15 and 0.06, respectively, with an average RMSE value of 0.11. Minimum and maximum RMSE values of 0.06 and 0.22, respectively, were observed during the validation period with an average RMSE value of 0.13. Similarly the lowest CSI values during the calibration and validation periods were 0.99 and 0.88, respectively. Comparable maximum CSI results were observed between the two periods. The 5 %–95 % SCF prediction interval was able to reasonably bracket the observations in most of the calibration and validation periods with mean CR values of 0.60 and 0.71, respectively, without any explicit accounting for model residuals for each parameter set. The inter-annual comparison of model performance shows that relatively lower performance was observed in years 2011 and 2012 as compared to the other periods.

### 3.1.4 Uncertainty of streamflow and snow cover predictions using both observations

The cross-validation result of simulated streamflow and SCF against observations is shown in Table 6. A similar model performance was observed when model parameters are conditioned using both streamflow and MODIS SCF as compared to when only streamflow was used for model conditioning. The mean NSE and LnNSE values of the median streamflow prediction in the validation periods were 0.85 and 0.71, respectively. The average streamflow prediction uncertainty (CR) in the validation period was 0.70. For SCF, average RMSE and CSI values of 0.11 and 0.99, respectively, were obtained when using the combined likelihood. The streamflow and SCF median predictions obtained in this analysis are similar to the results when model parameters are respectively conditioned with streamflow only or MODIS SCF only. This result shows that contribution from the MODIS SCF was less significant in constraining the model parameters. The relatively low quality of MODIS SCF data as compared to the streamflow data for the study site may also partly explain this phenomenon.

Table 6Cross-validation of streamflow and SCF predictions.

Figure 8Prediction and acceptable flow bounds for the sample calibration period (a) and validation period (b).

## 3.2 Uncertainty analysis using the relaxed GLUE LOA approach

The median streamflow prediction of behavioral models identified using the relaxed GLUE LOA was able to mimic the observed values very well with a mean NSE and LnNSE of 0.85 and 0.7, respectively, for the validation period (Table 7). A comparable performance was observed between models selected using the residual-based GLUE and the relaxed GLUE LOA. The similarity in median predicted streamflow by these two GLUE methodologies can also be noticed from visual comparison of the resulting hydrographs (Figs. 7 and 8). A mean streamflow CR value of 0.75 was obtained for the validation period when using the relaxed GLUE LOA. This shows slightly better capability of the 5 %–95 % prediction bounds in bracketing the observations as compared to predictions using the residual-based GLUE methodology when both streamflow and SCF are used in model conditioning.

Table 7Cross-validation of streamflow and SCF predictions after relaxing the LOA criteria.

Figure 9Prediction and acceptable bounds of average SCF for the sample calibration period (a) and validation period (b).

The behavioral models selected using the relaxed GLUE LOA approach were also able to adequately reproduce observed SCF with a mean RMSE and CSI of 0.11 and 0.98, respectively, for the validation period. Generally, high prediction uncertainty of SCF was observed during the onset of snowmelt and low uncertainty during the summer with an average CR of 0.63. Thus, hydrological year 2011, having most of its observations coming from April, showed the lowest CR as compared to the other periods. Figure 9 shows observed and simulated average catchment SCF for the sample calibration period (2011) and validation period (2012). From this figure it can be noticed that the median prediction tends to overestimate the observed SCF values, and many of the observed values from the month of April fall outside the 5 %–95 % prediction bounds. The overall result, however, indicates an improved capability of the 5 %–95 % prediction bounds in bracketing the SCF observations as compared to predictions using the residual-based GLUE methodology.

4 Discussion

The streamflow prediction uncertainty analyses results show that model performance was relatively lower during low-streamflow than high-streamflow conditions throughout most validation periods (e.g., Table 4). A similar result was reported by Choi and Beven (2007) in their multiperiod cluster-based uncertainty analysis in the Bukmoon catchment, South Korea, where a high percentage of simulation bias was observed during the drier seasons due to relatively poor model performance during these periods. The result of this study is thus consistent with the general observation that catchment hydrologic models perform relatively well in wet conditions but break down during low-streamflow conditions (e.g., Kirchner, 2009). In the case of results from the residual-based GLUE methodology, this can also be partly attributed to the nature of the likelihood measure used to identify the behavioral models. The result reveals this observation, where model performance during low-streamflow periods (LnNSE) was improved when using the combined likelihood measures as compared to using NSE alone. This is because models identified using NSE alone strongly reflect the hydrologic characteristics of the high-streamflow periods and are expected to perform more poorly during low-streamflow conditions.

Figure 10Box plot showing posterior distribution of model parameters when separately conditioned using streamflow and SCF. Parameter values are scaled between 0 and 1.

In order to assess the potential value of MODIS SCF in constraining model parameters, the snow sub-model parameters were constrained using this observation and the posterior distribution of the individual parameters were compared against corresponding distributions that resulted from model conditioning using streamflow only. Parameter inference based on SCF only as a conditioning observation gave some parameter estimates that deviate significantly from those obtained when conditioned with streamflow only (Fig. 10). The box plots depict the posterior distribution of the snow-related parameters separately conditioned using streamflow and SCF. For the ease of comparison, parameter values were scaled between 0 and 1. From these plots it can be seen that tx and wind scale are the model parameters most sensitive to the conditioning data type with a significant shift in their quartiles towards the upper part of the parameter dimensions when conditioned using SCF, whereas the fast ADR, slow ADR, and snow CV did not show significant displacement in their posterior distribution. These parameters were also identified as the least sensitive model parameters when the model was constrained using streamflow only.

Generally, in snow models with the sub-grid snow distribution component parameterized using the statistical probability distribution function, low snow CV results in a faster depletion rate of the snow-covered fraction (e.g., Liston, 2004). Thus, the slight displacement of snow CV posterior values towards the lower part of its parameter dimensions coupled with the increased posterior values of wind scale would give rise to lower snow cover fraction during the melting period when model parameters are constrained using SCF only. On the other hand, the increased posterior values of the rain–snow threshold (tx) would result in an increase in snow deposition and thereby in a partial or full canceling out of the effects of changes in snow CV and wind scale. This phenomenon may thus lead to equifinality, where different sets of model parameters give comparable SCF responses.

In the GLUE LOA approach a particular model realization is classified as acceptable if its prediction falls within the limits for all observed values. In continuous rainfall-runoff modeling it is difficult for all predictions of a given model realization to lie within the observation limits in a time series. In some cases this phenomenon can be attributed to different specific processes dominating the hydrologic behavior of a catchment at different sub-periods, while in other instances it may be due to a lack of a viable means for explicitly accounting for the effect of variable sources and level of uncertainties from the input data errors, which are difficult to set a priori. Thus, the time-varying likely effects of other sources of errors such as input errors on prediction uncertainty need also be implicitly taken into account when defining the limits of acceptability.

The use of GLUE LOA for testing hydrologic models as hypotheses without a due consideration of errors in input data may lead to a rejection of useful models that might adequately represent the catchment behavior and thereby to making a type II error (false negative). In the past, various attempts have been made to minimize the risk of making type II errors in model calibration studies using the GLUE and other frameworks. In some studies an improved calibration of hydrologic models was obtained through independent calibration of sub-periods of a time series (e.g., Boyle et al., 2000; Samanta and Mackay, 2003). When it comes to the GLUE LOA approach, extending the limits (e.g., Blazkova and Beven, 2009; Liu et al., 2009) and using different model realizations for different periods of a hydrological year (e.g., Choi and Beven, 2007) are some of the measures taken to minimize the risk of making type II errors. Common to all these measures is that they attempt to relax the selection criteria for behavioral models.

In this study when using the GLUE LOA approach, the streamflow bounds were set to ±25 % and the result shows that none of the model realizations were able to satisfy the LOA criteria without one or more of their predictions falling outside the acceptable streamflow bounds. The failure rate was higher during low-flow conditions as compared to high-flow conditions. An initial attempt was made to relax the limit of acceptability by extending the streamflow bounds. Regardless, no model realization with its predictions falling within the error bounds for all observations was found until the limits were extended to over ±85 %. This relaxed acceptability limit seems less reasonable in terms of its physical meaning as an error bound. Therefore, rather than relaxing the limits, an alternative empirical approach was followed by relaxing the number of simulation time steps which fulfilled the original LOA criterion. The procedure involves defining the acceptable percentage of observations that are required to be bracketed by model predictions (during the calibration period) based on a predefined acceptable CR value.

Figure 11The effect of the percentage of observations required to be bracketed by each model realization (pLOA) on prediction uncertainty (CR) and efficiency of the median prediction (NSE) for the sample calibration periods (years 2011 and 2012).

This empirical approach is based on the observed relationship between prediction uncertainty and number of behavioral models, which in turn is a function of the selection criterion. As the threshold value of a likelihood measure increases (in the case of residual-based GLUE) or absolute value of the limits decreases (in the case of GLUE LOA), the simulated runoff series gradually converges, though not necessarily to the observations. A similar observation was also reported in other GLUE-based uncertainty studies (e.g., Xiong et al., 2008). A further analysis in this study reveals that, as the percentage of observations required to be bracketed by each model realization (pLOA) increases, the number of behavioral models decreases and thereby the simulated runoff series converges, resulting in a low CR (Fig. 11). In this study, the threshold pLOA for each calibration period was defined in such a way that the 5 %–95 % prediction uncertainties of streamflow using the residual and the LOA-based GLUE methodologies are similar. Defining the threshold pLOA this way helps to set a reasonable value that minimizes the risk of making type II errors while maintaining the overall model accuracy by rejecting the inclusion of non-behavioral models. Furthermore, it helps to roughly compare the performance of behavioral models selected using the relaxed GLUE LOA against the residual-based GLUE in terms of their ability to reproduce the median streamflow and SCF predictions at a similar level of uncertainty (i.e., the CR used to set pLOA).

Although it is difficult to single out the effects of input data error from model structural error on model performance using the GLUE methodology, the error patterns may aid in assessing model performance in different periods of the hydrologic year. Generally, a good model structure coupled with good data is not expected to give a consistent bias (e.g., Liu et al., 2009). Figure 12 shows a sample daily percentage of acceptable simulations satisfying the LOA criteria during the hydrologic year 2012. The percentage of the acceptable number of model realizations in each time step was generally low during the calibration period (<65 %). However, for each time step, predictions from some behavioral models are able to mimic the corresponding observation within the assumed error bound. The percentage of acceptable models was relatively higher during high than low-streamflow conditions. And this result is consistent with the general observation that most hydrological models perform relatively well during high-streamflow compared to low-streamflow periods. The spike in the percentage of acceptable models in the month of February 2012 when time steps around are so low, however, reveals how model performances can unexpectedly vary between time steps in response to input data errors and/or the observational error bounds. The observed spike could thus be attributed to relatively low input data errors and/or lower actual observational error bounds as compared to the assumed average values for the particular time step. The distribution of the behavioral model weights over the calibration period shows that the mean weight during the period where the spike occurred is very low. Similarly the median weight of behavioral models during this period is close to zero, implying that most of the model realizations have their predictions that barely fall within the limits.

Figure 12Daily percentage of acceptable model realizations with their predictions falling within the observation error bounds (a) and the daily weight associated with each acceptable model realization as well as daily mean and median value of the weights (b) in a sample calibration period.

This result reveals that the GLUE LOA with relaxation in percentage of observations where model predictions fall within observational error bounds can be used as an alternative approach for conditioning model parameters and conducting an uncertainty analysis when there is a lack of metadata on input and observational data uncertainty coupled with a highly time-varying level of uncertainty from such sources. After relaxation, a limited sample of the total observations, i.e., 30 %–40 % of a hydrologic year, was able to effectively identify behavioral models, and this result is consistent with findings of other studies dealing with the effect of observation size on constraining model parameters (e.g., Seibert and Beven, 2009; Liu and Han, 2010; Sun et al., 2017). The relative accuracy of an event and other factors that affect the information content of the input and observation datasets (e.g., Beven and Smith, 2015) are more important than the length of the datasets, especially in continuous rainfall-runoff modeling.

5 Conclusions

Two GLUE methodology variants were applied for parameter uncertainty analysis of a distributed conceptual hydrological model. The analysis result from the residual-based GLUE methodology shows that the catchment response parameters, viz. c1, c2, and c3 as well as the wind scale, are the most sensitive model parameters. More caution is thus required when defining the value range of these parameters. On the other hand, the fast and slow albedo decay rates as well as the snow CV are relatively more uncertain model parameters.

Model conditioning using combined streamflow and MODIS SCF did not improve the median prediction of streamflow as compared to the result when model parameters are conditioned using streamflow only. A similar result was also observed for SCF predictions. The additional information from the MODIS SCF data was generally less significant in constraining the rainfall-runoff model parameters.

When using the GLUE LOA approach, the model did not provide any behavioral simulation that yields predictions within the assumed observational error bound in over 90% of the time steps. A relaxation was needed in order to partly overcome the limitations of using constant observational error proportionality and not taking an explicit account of the other sources of uncertainty such as from input data errors. A relaxed GLUE LOA approach was introduced that allows a relaxation on the number of time steps required to achieve the LOA. Similar results are obtained using both the residual-based GLUE and the relaxed GLUE LOA approaches. Relaxing the percentage of observations required to be bracketed per simulation period by a particular model realization (pLOA) was found to be more effective than relaxing the observational error bounds. In this study the 5 %–95 % prediction uncertainty of the residual-based GLUE methodology was used as a reference to define the pLOA in the relaxed GLUE LOA analysis using forcing and observational datasets from a single catchment. More similar case studies should be conducted on catchments with different hydrologic characteristics to assess the scope of this approach under different condition.

Data availability
Data availability.

The underlying hydrologic observations for this analysis were provided by Statkraft AS and are proprietary within their hydrologic forecasting system. However, the data may be made available upon request. Please contact John Burkhart (john.burkhart@statkraft.com) for further information and access to the data.

Author contributions
Author contributions.

ATT designed and performed the analysis. JFB and TSV supervised the study. ATT wrote the manuscript. JFB and TSV reviewed the manuscript.

Competing interests
Competing interests.

The authors declare that they have no conflict of interest.

Acknowledgements
Acknowledgements.

This work was conducted within the Norwegian Research Council's Enhancing Snow Competency of Models and Operators (ESCYMO) project (NFR no. 244024) and in cooperation with the strategic research initiative LATICE (Faculty of Mathematics and Natural Sciences, University of Oslo https://mn.uio.no/latice, last access: 1 March 2018). Computational and data storage resources were provided by NOTUR/NORSTORE projects NS9333K and NN9333K. We are grateful for Keith Beven and Chong-Yu Xu for their helpful comments. Furthermore, we thank Sigbjorn Helset and Statkraft AS, in general, for helping us to set up Shyft in a windows environment and for providing us the data. We also thank Kristoffer Aalstad and Sebastian Westermann for providing us a Matlab script for retrieving the composite MODIS SCF data.

Edited by: Alberto Guadagnini
Reviewed by: two anonymous referees

References

Bavera, D., Michele, C., Pepe, M., and Rampini, A.: Melted snow volume control in the snowmelt runoff model using a snow water equivalent statistically based model, Hydrol. Process., 26, 3405–3415, 2012.

Berezowski, T. and Batelaan, O.: Skill of remote sensing snow products for distributed runoff prediction, J. Hydrol., 524, 718–732, 2015.

Beven, K.: Changing ideas in hydrology – the case of physically-based models, J. Hydrol., 105, 157–172, 1989.

Beven, K. and Binley, A.: The future of distributed models: model calibration and uncertainty prediction, Hydrol. Process., 6, 279–298, 1992.

Beven, K.: Prophecy, reality and uncertainty in distributed hydrological modelling, Adv. Water Resour., 16, 41–51, 1993.

Beven, K.: A manifesto for the equifinality thesis, J. Hydrol., 320, 18–36, 2006.

Beven, K.: Environmental modelling: An uncertain future?, CRC Press, London, 2009.

Beven, K.: Facets of uncertainty: epistemic uncertainty, non-stationarity, likelihood, hypothesis testing, and communication, Hydrolog. Sci. J., 61, 1652–1665, 2016.

Beven, K. and Smith, P.: Concepts of information content and likelihood in parameter calibration for hydrological simulation models, J. Hydrol. Eng., 20, A4014010, https://doi.org/10.1061/(ASCE)HE.1943-5584.0000991, 2015.

Blasone, R.-S., Vrugt, J. A., Madsen, H., Rosbjerg, D., Robinson, B. A., and Zyvoloski, G. A.: Generalized likelihood uncertainty estimation (GLUE) using adaptive Markov Chain Monte Carlo sampling, Adv. Water Resour., 31, 630–648, 2008.

Blazkova, S. and Beven, K.: Flood frequency estimation by continuous simulation for a catchment treated as ungauged (with uncertainty), Water Resour. Res., 38, 1139, https://doi.org/10.1029/2001WR000500, 2002.

Blazkova, S. and Beven, K.: A limits of acceptability approach to model evaluation and uncertainty estimation in flood frequency estimation by continuous simulation: Skalka catchment, Czech Republic, Water Resour. Res., 45, W00B16, https://doi.org/10.1029/2007WR006726, 2009.

Boyle, D. P., Gupta, H. V., and Sorooshian, S.: Toward improved calibration of hydrologic models: Combining the strengths of manual and automatic methods, Water Resour. Res., 36, 3663–3674, 2000.

Brazier, R. E., Beven, K. J., Freer, J., and Rowan, J. S.: Equifinality and uncertainty in physically based soil erosion models: application of the GLUE methodology to WEPP – the Water Erosion Prediction Project – for sites in the UK and USA, Earth Surf. Proc. Land., 25, 825–845, 2000.

Burkhart, J. F., Helset, S., Abdella, Y. S., and Lappegard, G.: Operational Research: Evaluating Multimodel Implementations for 24/7 Runtime Environments, Abstract H51F-1541 presented at the Fall Meeting, AGU, San Francisco, California, 11–15 December 2016.

Choi, H. T. and Beven, K.: Multi-period and multi-criteria model conditioning to reduce prediction uncertainty in an application of TOPMODEL within the GLUE framework, J. Hydrol., 332, 316–336, 2007.

Clark, M. P., Kavetski, D., and Fenicia, F.: Pursuing the method of multiple working hypotheses for hydrological modeling, Water Resour. Res., 47, W09301, https://doi.org/10.1029/2010WR009827, 2011.

Copernicus land monitoring service: CORINE land cover, available at: https://land.copernicus.eu/pan-european/corine-land-cover, last access: 29 August 2016.

Crawford, N. H. and Linsley, R. K.: Digital simulation in hydrology, Stanford Watershed Model IV, Department of Civil Engineering, Stanford University, California, 1966.

Dee, D. P., Uppala, S., Simmons, A., Berrisford, P., Poli, P., Kobayashi, S., Andrae, U., Balmaseda, M., Balsamo, G., and Bauer, P.: The ERA-Interim reanalysis: Configuration and performance of the data assimilation system, Q. J. Roy. Meteor. Soc., 137, 553–597, 2011.

Efstratiadis, A. and Koutsoyiannis, D.: One decade of multi-objective calibration approaches in hydrological modelling: a review, Hydrolog. Sci. J., 55, 58–78, 2010.

Finger, D., Vis, M., Huss, M., and Seibert, J.: The value of multiple data set calibration versus model complexity for improving the performance of hydrological models in mountain catchments, Water Resour. Res., 51, 1939–1958, 2015.

Hall, D. K., Riggs, G. A., Salomonson, V. V., Barton, J., Casey, K., Chien, J., DiGirolamo, N., Klein, A., Powell, H., and Tait, A.: Algorithm theoretical basis document (ATBD) for the MODIS snow and sea ice-mapping algorithms, Nasa Gsfc, 45, 2001.

Hall, K., George, R., Vincent, S., and Grid, V.: Updated daily MODIS/Terra Snow Cover Daily L3 Global 500 m Grid V005, April 2011 to August 2014, in: National Snow and Ice Data Center, Digital media, Boulder, Colorado USA, 2006.

Hanzer, F., Helfricht, K., Marke, T., and Strasser, U.: Multilevel spatiotemporal validation of snow/ice mass balance and runoff modeling in glacierized catchments, The Cryosphere, 10, 1859–1881, https://doi.org/10.5194/tc-10-1859-2016, 2016.

Hassan, A. E., Bekhit, H. M., and Chapman, J. B.: Uncertainty assessment of a stochastic groundwater flow model using GLUE analysis, J. Hydrol., 362, 89–109, 2008.

He, M., Hogue, T. S., Franz, K. J., Margulis, S. A., and Vrugt, J. A.: Characterizing parameter sensitivity and uncertainty for a snow model across hydroclimatic regimes, Adv. Water Resour., 34, 114–127, 2011.

Hegdahl, T. J., Tallaksen, L. M., Engeland, K., Burkhart, J. F., and Xu, C. Y.: Discharge sensitivity to snowmelt parameterization: a case study for Upper Beas basin in Himachal Pradesh, India, Hydrol. Res., 47, 683–700, 2016.

Hornberger, G. M. and Spear, R. C.: Approach to the preliminary analysis of environmental systems, J. Environ. Mgmt., 12, 7–18, 1981.

Jin, X., Xu, C. Y., Zhang, Q., and Singh, V. P.: Parameter and modeling uncertainty simulated by GLUE and a formal Bayesian method for a conceptual hydrological model, J. Hydrol., 383, 147–155, 2010.

Kirchner, J. W.: Catchments as simple dynamical systems: Catchment characterization, rainfall-runoff modeling, and doing hydrology backward, Water Resour. Res., 45, W02429, https://doi.org/10.1029/2008WR006912, 2009.

Kolberg, S. A. and Gottschalk, L.: Updating of snow depletion curve with remote sensing data, Hydrol. Process., 20, 2363–2380, 2006.

Krause, P., Boyle, D. P., and Bäse, F.: Comparison of different efficiency criteria for hydrological model assessment, Adv. Geosci., 5, 89–97, https://doi.org/10.5194/adgeo-5-89-2005, 2005.

Lambert, A.: Catchment models based on ISO-functions, J. Instn. Water Engrs., 26, 413–422, 1972.

Lee, S., Klein, A. G., and Over, T. M.: A comparison of MODIS and NOHRSC snow-cover products for simulating streamflow using the Snowmelt Runoff Model, Hydrol. Process., 19, 2951–2972, 2005.

Liston, G. E.: Interrelationships among snow distribution, snowmelt, and snow cover depletion: Implications for atmospheric, hydrologic, and ecologic modeling, J. Appl. Meteorol., 38, 1474–1487, 1999.

Liston, G. E.: Representing subgrid snow cover heterogeneities in regional and global models, J. Climate, 17, 1381–1397, 2004.

Liu, J. and Han, D.: Indices for calibration data selection of the rainfall-runoff model, Water Resour. Res., 46, W04512, https://doi.org/10.1029/2009WR008668, 2010.

Liu, Y., Freer, J., Beven, K., and Matgen, P.: Towards a limits of acceptability approach to the calibration of hydrological models: Extending observation error, J. Hydrol., 367, 93–103, 2009.

Mantovan, P. and Todini, E.: Hydrological forecasting uncertainty assessment: Incoherence of the GLUE methodology, J. Hydrol., 330, 368–381, 2006.

Matt, F. N., Burkhart, J. F., and Pietikäinen, J.-P.: Modelling hydrologic impacts of light absorbing aerosol deposition on snow at the catchment scale, Hydrol. Earth Syst. Sci., 22, 179–201, https://doi.org/10.5194/hess-22-179-2018, 2018.

Mirzaei, M., Huang, Y. F., El-Shafie, A., and Shatirah, A.: Application of the generalized likelihood uncertainty estimation (GLUE) approach for assessing uncertainty in hydrological models: a review, Stoch. Env. Res. Risk. A., 29, 1265–1273, 2015.

Montanari, A., Shoemaker, C. A., and van de Giesen, N.: Introduction to special section on Uncertainty Assessment in Surface and Subsurface Hydrology: An overview of issues and challenges, Water Resour. Res., 45, W00B00, https://doi.org/10.1029/2009WR008471, 2009.

Morris, M. D.: Factorial sampling plans for preliminary computational experiments, Technometrics, 33, 161–174, 1991.

Nearing, G. S., Tian, Y., Gupta, H. V., Clark, M. P., Harrison, K. W., and Weijs, S. V.: A philosophical basis for hydrological uncertainty, Hydrolog. Sci. J., 61, 1666–1678, 2016.

Norwegian mapping authority: Kartverket, available at: https://www.kartverket.no/, last access: 1 September 2016.

Pappenberger, F., Beven, K. J., Ratto, M., and Matgen, P.: Multi-method global sensitivity analysis of flood inundation models, Adv. Water Resour., 31, 1–14, 2008.

Parajka, J. and Blöschl, G.: The value of MODIS snow cover data in validating and calibrating conceptual hydrologic models, J. Hydrol., 358, 240–258, 2008.

Parajka, J., Holko, L., Kostka, Z., and Blöschl, G.: MODIS snow cover mapping accuracy in a small mountain catchment – comparison between open and forest sites, Hydrol. Earth Syst. Sci., 16, 2365–2377, https://doi.org/10.5194/hess-16-2365-2012, 2012.

Pianosi, F., Sarrazin, F., and Wagener, T.: A Matlab toolbox for global sensitivity analysis, Environ. Modell. Softw., 70, 80–85, 2015.

Pianosi, F., Beven, K., Freer, J., Hall, J. W., Rougier, J., Stephenson, D. B., and Wagener, T.: Sensitivity analysis of environmental models: A systematic review with practical workflow, Environ. Modell. Softw., 79, 214–232, 2016.

Powell, M. J.: The BOBYQA algorithm for bound constrained optimization without derivatives, Cambridge NA Report NA2009/06, University of Cambridge, Cambridge, 26–46, 2009.

Priestley, C. and Taylor, R.: On the assessment of surface heat flux and evaporation using large-scale parameters, Mon. Weather Rev., 100, 81–92, 1972.

Pu, Z., Xu, L., and Salomonson, V. V.: MODIS/Terra observed seasonal variations of snow cover over the Tibetan Plateau, Geophys. Res. Lett., 34, L06706, https://doi.org/10.1029/2007GL029262, 2007.

Refsgaard, J. C., van der Sluijs, J. P., Højberg, A. L., and Vanrolleghem, P. A.: Uncertainty in the environmental modelling process-a framework and guidance, Environ. Modell. Softw., 22, 1543–1556, 2007.

Reichert, P. and Omlin, M.: On the usefulness of overparameterized ecological models, Ecol. Model., 95, 289–299, 1997.

Renard, B., Kavetski, D., Kuczera, G., Thyer, M., and Franks, S. W.: Understanding predictive uncertainty in hydrologic modeling: The challenge of identifying input and structural errors, Water Resour. Res., 46, W05521, https://doi.org/10.1029/2009WR008328, 2010.

Saltelli, A., Ratto, M., Tarantola, S., Campolongo, F., and Commission, E.: Sensitivity analysis practices: Strategies for model-based inference, Reliab. Eng. Syst. Safe., 91, 1109–1125, 2006.

Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., Saisana, M., and Tarantola, S.: Global sensitivity analysis: the primer, John Wiley & Sons, Chichester, 2008.

Samanta, S. and Mackay, D. S.: Flexible automated parameterization of hydrologic models using fuzzy logic, Water Resour. Res., 39, 1009, https://doi.org/10.1029/2002WR001349, 2003.

Savenije, H. H.: Equifinality, a blessing in disguise?, Hydrol. Process., 15, 2835–2838, 2001.

Schaefli, B.: Snow hydrology signatures for model identification within a limits-of-acceptability approach, Hydrol. Process., 30, 4019–4035, 2016.

Schoups, G. and Vrugt, J. A.: A formal likelihood function for parameter and predictive inference of hydrologic models with correlated, heteroscedastic, and non-Gaussian errors, Water Resour. Res., 46, W10531, https://doi.org/10.1029/2009WR008933, 2010.

Seibert, J. and Beven, K. J.: Gauging the ungauged basin: how many discharge measurements are needed?, Hydrol. Earth Syst. Sci., 13, 883–892, https://doi.org/10.5194/hess-13-883-2009, 2009.

Shen, Z. Y., Chen, L., and Chen, T.: Analysis of parameter uncertainty in hydrological and sediment modeling using GLUE method: a case study of SWAT model applied to Three Gorges Reservoir Region, China, Hydrol. Earth Syst. Sci., 16, 121–132, https://doi.org/10.5194/hess-16-121-2012, 2012.

Skaugen, T. and Weltzien, I. H.: A model for the spatial distribution of snow water equivalent parameterized from the spatial variability of precipitation, The Cryosphere, 10, 1947–1963, https://doi.org/10.5194/tc-10-1947-2016, 2016.

Statkraft: Statkraft information page, available at: https://www.statkraft.com/, last access: 20 June 2018.

Stedinger, J. R., Vogel, R. M., Lee, S. U., and Batchelder, R.: Appraisal of the generalized likelihood uncertainty estimation (GLUE) method, Water Resour. Res., 44, W00B06, https://doi.org/10.1029/2008WR006822, 2008.

Sun, W., Wang, Y., Wang, G., Cui, X., Yu, J., Zuo, D., and Xu, Z.: Physically based distributed hydrological model calibration based on a short period of streamflow data: case studies in four Chinese basins, Hydrol. Earth Syst. Sci., 21, 251–265, https://doi.org/10.5194/hess-21-251-2017, 2017.

Tripp, D. R. and Niemann, J. D.: Evaluating the parameter identifiability and structural validity of a probability-distributed model for soil moisture, J. Hydrol., 353, 93–108, 2008.

Udnæs, H. C., Alfnes, E., and Andreassen, L. M.: Improving runoff modelling using satellite-derived snow covered area, Hydrol. Res., 38, 21–32, 2007.

Vrugt, J. A., Ter Braak, C. J., Gupta, H. V., and Robinson, B. A.: Equifinality of formal (DREAM) and informal (GLUE) Bayesian approaches in hydrologic modeling, Stoch. Env. Res. Risk. A., 23, 1011–1026, 2009.

Wagener, T., McIntyre, N., Lees, M., Wheater, H., and Gupta, H.: Towards reduced uncertainty in conceptual rainfall-runoff modelling: Dynamic identifiability analysis, Hydrol. Process., 17, 455–476, 2003.

Xiong, L. and O'Connor, K. M.: An empirical method to improve the prediction limits of the GLUE methodology in rainfall-runoff modeling, J. Hydrol., 349, 115–124, 2008.

Xiong, L., Wan, M., Wei, X., and O'connor, K. M.: Indices for assessing the prediction bounds of hydrological models and application by generalised likelihood uncertainty estimation, Hydrolog. Sci. J., 54, 852–871, 2009.