Parameter uncertainty analysis for an operational hydrological model using residual based and limits of acceptability approaches

Parameter uncertainty estimation is one of the major challenges in hydrological modelling. Here we present a parameter uncertainty analysis of a recently released distributed conceptual hydrological model applied in the Nea catchment, Norway. Two variants of the generalized likelihood uncertainty estimation (GLUE) methodology, one based on residuals and the other on limits of acceptability, were employed. Streamflow and remote sensing snow cover data were used in conditioning model parameters and in model validation. When using the GLUE limits of acceptability (GLUE LOA) approach, a streamflow observation error of 25 % was assumed. Neither the original limits, nor relaxing the limits up to a physically meaningful value, yielded a behavioural model capable of predicting streamflow within the limits in 100 % of the observations. As an alternative to relaxing the limits, the requirement for the percentage of model predictions falling within the original limits was relaxed, and an empirical approach was introduced to define the degree of relaxation. The results show that snow and water balance related parameters induce relatively higher streamflow uncertainty than catchment response parameters. Comparable results were obtained from behavioural models selected using the two GLUE methodologies.

thereby lead to validation data uncertainty. Structural uncertainty may result from the underlying assumptions and simplifications in the model formulation as well as from application of the model to conditions inconsistent with the model structure (Tripp and Niemann, 2008). Parametric uncertainty reflects the inability to specify exact values of model parameters (Renard et al., 2010) and may stem from errors in input data and observations used for model conditioning, as well as from epistemic errors in model structure. An increased awareness of these modelling uncertainties and the need for quality control of such models require the integration of uncertainty analysis into the modelling process from the very beginning (Beven, 1989; Saltelli et al., 2006; Refsgaard et al., 2007). Uncertainty analysis techniques can be classified as frequentist or Bayesian approaches, probabilistic or non-probabilistic approaches (e.g. Montanari et al., 2009), or as formal or informal approaches (e.g. Vrugt et al., 2009). Among the most widely used techniques in hydrological modelling are the formal Bayesian and the GLUE methods (Jin et al., 2010). The formal Bayesian approach makes strong assumptions about the statistics of observed data, with the likelihood function defined based on assumptions about the nature of the residuals (Schoups and Vrugt, 2010). However, the choice of an adequate likelihood function has been the subject of considerable debate. According to Beven and Smith (2014), a formal probabilistic likelihood function will have limited value since non-stationary epistemic uncertainties cannot be adequately represented by a statistical model. In GLUE, the likelihood measure is associated with a parameter set and should ideally reflect all the different sources of uncertainty (Beven and Smith, 2014). The original GLUE methodology has been the subject of debate for using a subjectively set threshold for behavioral models. This problem is common to most residual-based model selection methods (Schaefli, 2016). The extended concept of behavioral models in the GLUE limits of acceptability approach (GLUE LOA) (Beven, 2006) attempts to overcome this drawback through use of error bounds of the observational dataset.
The GLUE LOA methodology involves specifying limits around some observational data within which model predictions are required to lie and thereby be considered acceptable for the intended model application. The acceptability limits are set prior to running a model and, among other considerations, they are expected to take into account incommensurability and uncertainty in both the input and evaluation data (Beven, 2009). However, identification of models that reproduce the observed system behavior within the limits of measurement error is not easy due to time-varying errors in the input data and model structure (e.g. Beven, 2016). This difficulty is even more pronounced when input and other sources of errors are not explicitly accounted for in defining the LOA.
Good quality time series data and associated uncertainties are not always readily available. For example, in regulated catchments the inflow hydrograph is often estimated from changes in storage volume and outflows using the water balance equation. Thus, as in the case of our study catchment, no stage-discharge relationship exists for estimating the streamflow uncertainty using the usual practice, i.e. by fitting different rating curves. In such instances the alternative is to assume an observation error proportional to the observational data. However, the identification of behavioral models without due consideration of such less precise observation error estimates may lead to the rejection of a useful model (i.e. making a Type II error). Measures taken to minimize the risk of making a Type II error when identifying behavioral models using the GLUE LOA include extending the limits (e.g. Blazkova and Beven, 2009; Liu et al., 2009) as well as using different model realizations for different periods of a hydrological year (e.g. Choi and Beven, 2007). In this study, instead of relaxing the limits, the percentage of observations where model predictions are required to fall within the acceptability limits was relaxed.
The GLUE methodology has been widely used in various disciplines (Beven, 2009; Efstratiadis and Koutsoyiannis, 2010), primarily due to its conceptual simplicity and ease of implementation. Further, its suitability for parallel implementation on distributed computer systems as well as its general strategy in dealing with equifinality in model calibration make it an appealing framework (Blasone et al., 2008; Shen et al., 2012; Mirzaei et al., 2015). In this study model parameters were constrained using streamflow and the MODIS snow cover product (Hall et al., 2006). Multi-criteria model conditioning helps to reduce prediction uncertainty through improved parameter identification (e.g. Efstratiadis and Koutsoyiannis, 2010; Finger et al., 2015), and GLUE provides a flexible approach for multi-criteria methods through different ways of combining measures. Besides streamflow, one of the observations commonly used in multi-criteria conditioning of rainfall-runoff models in snow dominated catchments is snow data. Remote sensing snow cover data have been used in several hydrological modelling studies for deriving and updating a snow depletion curve (e.g. Lee et al., 2005; Kolberg and Gottschalk, 2006; Bavera et al., 2012), as well as in multi-criteria based model calibration and simulated snow cover validation (e.g. Udnaes et al., 2007; Parajka and Bloschl, 2008; Berezowski et al., 2015). However, studies involving combined uncertainty of streamflow and snow cover predictions using the GLUE methodology are still missing in the literature.

(Hydrol. Earth Syst. Sci. Discuss., https://doi.org/10.5194/hess-2018-158. Manuscript under review for journal Hydrol. Earth Syst. Sci. Discussion started: 16 April 2018. © Author(s) 2018. CC BY 4.0 License.)
The main objective of this study is to assess parameter uncertainty for a recently developed distributed conceptual hydrological model using the GLUE methodology, with due consideration to the model's main application as an operational hydrological model. The second objective is to investigate the potential value of snow cover data as an additional observation in conditioning model parameters in the study area. The third objective is to assess the possibility of using a time-relaxed GLUE LOA approach for constraining model parameters. In doing so, we employ a novel empirical approach for implicitly accounting for the effects of input and observational data errors by relaxing the percentage of time steps in which predictions of model realizations fall within the limits.
This paper is organized as follows. First the hydrological model as well as the study site and relevant data used in this study are briefly described in sections 2.1 and 2.2. The procedures followed to set up the uncertainty analyses are then outlined in section 2.3. In section 3, the results from parameter uncertainty as well as uncertainty of streamflow and snow cover predictions using the residual based GLUE approach are presented. The results from the relaxed GLUE LOA are also presented in this section. Finally, in sections 4 and 5, the analysis results and their implications for the hydrologic model, the data, and the methodologies followed are discussed and conclusions are drawn.

The hydrological model

The modelling framework has three main models (method stacks) and in this study the PT_GS_K model was used for uncertainty analysis. PT_GS_K uses the Priestley-Taylor (PT) method (Priestley, 1972) for estimating potential evaporation; a quasi-physical based method for snow melt, sub-grid snow distribution and mass balance calculations (GS method); and a simple storage-discharge function (Lambert, 1972; Kirchner, 2009) for catchment response calculation (K).
Overall, these three methods constitute the PT_GS_K model in Shyft. The framework establishes a sequence of spatially distributed cells of arbitrary size and shape. As such it can provide lumped (single cell) or discretized (spatially distributed) calculations, as in this study. The model was applied to each of the grid cells and for each time step.
Within the GS method, precipitation falling in each grid cell is classified as solid or liquid precipitation depending on a threshold temperature (tx) and on the local temperature values. The snow melt energy is the sum effect of different energy sources in the system, such as short wave and long wave radiation as well as the turbulent sensible and latent energy fluxes.
Among other factors, the energy contribution from short wave radiation depends on snow albedo. For a given time step (t), the snow albedo of each grid cell depends on the minimum (α_min) and maximum (α_max) albedo values as well as on air temperature (T) (Eq. 1). In this method the decay rates of albedo due to snow ageing as a function of temperature, i.e. the fast (fast ADR, α_fast) and slow (slow ADR, α_slow) albedo decay rates corresponding to temperature conditions above and below 0 °C respectively, are parameterized. Turbulent heat contribution is the sum of latent and sensible heat. Wind turbulence is linearly related to wind speed, using a wind constant and wind scale from the intercept and slope of the linear function, respectively (Hegdahl et al., 2016).
Sub-grid snow distribution is described by a three-parameter Gamma probability distribution snow depletion curve (SDC) (Liston, 1999; Kolberg and Gottschalk, 2006). The traditional Gamma distribution is parameterized with two values, i.e. the average amount of snow at the onset of the melt season, m (mm), and the shape value (α), based on the assumption that the ground is completely snow covered before the onset of melt. Since this assumption may not hold true for a number of grid cells, especially in alpine areas, a third parameter representing the bare ground fraction at the onset of the snow melt season has been introduced (Kolberg and Gottschalk, 2006). The two-parameter Gamma distribution (Eq. 2) is thus applied only to the remaining portion of a grid cell to estimate the fraction of the initially snow-covered area where snow has disappeared (y').

The initial bare ground fraction parameter is constant for all years. At each time step, state variables such as snow water equivalent (SWE) and snow cover area (SCA) are updated using the SDC function. In the GS method, the shape value is a direct transformation of the sub-grid snow coefficient of variation (CV_s).
y'(t) = ∫₀^λ(t) f(x; α, β) dx = γ(α, λ(t)/β) / Γ(α)  (2)
where f denotes the Gamma probability density function and γ is the (lower) incomplete Gamma function; m and λ(t) respectively refer to the mean point snow storage and the accumulated melt depth (mm) at time t since the onset of the melt season; and β represents the scale parameter, with β = m/α and α = CV_s^-2.
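As a hedged illustration, the SDC evaluation above can be sketched as follows, computing the bare fraction y' by numerically integrating the Gamma density; the function name and the integration scheme are illustrative, not Shyft's actual implementation:

```python
import math

def snow_depletion_fraction(m, cv_s, melt_depth, n=20000):
    """Fraction y' of the initially snow-covered area that has become bare
    after an accumulated melt depth `melt_depth` (mm), assuming point snow
    storage follows a Gamma distribution with mean m (mm) and sub-grid
    coefficient of variation cv_s (a sketch of Eq. 2)."""
    alpha = cv_s ** -2            # shape parameter, alpha = CV_s^-2
    beta = m / alpha              # scale parameter, so that mean = alpha * beta
    if melt_depth <= 0.0:
        return 0.0
    # Right-rectangle numerical integration of the Gamma pdf from 0 to melt_depth
    dx = melt_depth / n
    norm = math.gamma(alpha) * beta ** alpha
    total = 0.0
    for i in range(1, n + 1):
        x = i * dx
        total += x ** (alpha - 1) * math.exp(-x / beta) / norm * dx
    return min(total, 1.0)
```

With, say, m = 100 mm and CV_s = 0.5 (α = 4), about half of the initially snow-covered area is bare once the accumulated melt equals the mean storage, and the fraction approaches 1 as melt continues.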
The catchment response function is based on the storage-discharge relationship concept described in Kirchner (2009) and represents the sensitivity of discharge to changes in storage (Eq. 3). The method is based on the idea that catchment sensitivity to changes in storage, i.e. g(Q), can be estimated from the time series of discharge alone by fitting empirical functions, such as a quadratic equation, to the data. Since discharge is generally non-linear and typically varies by many orders of magnitude, the recommended approach is to use log-transformed discharge values in order to avoid the risk of numerical instability. In this method, the three coefficients of the catchment response function, i.e. c1, c2, and c3, are parameterized.
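Kirchner (2009) commonly expresses ln g(Q) as a quadratic in ln Q; the following is a minimal sketch assuming that parameterization (the exact functional form, units, and integration scheme used in Shyft's K routine may differ):

```python
import math

def discharge_sensitivity(q, c1, c2, c3):
    """g(Q) = dQ/dS evaluated via a quadratic in log-discharge,
    ln g(Q) = c1 + c2*ln(Q) + c3*ln(Q)**2 (after Kirchner, 2009)."""
    ln_q = math.log(q)
    return math.exp(c1 + c2 * ln_q + c3 * ln_q ** 2)

def step_discharge(q, precip, actual_et, dt, c1, c2, c3):
    """One explicit Euler step of dQ/dt = g(Q) * (P - ET - Q); a sketch
    only, real implementations use more robust integration."""
    return q + dt * discharge_sensitivity(q, c1, c2, c3) * (precip - actual_et - q)
```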
The potential evaporation calculation in the PT method requires net radiation and the slope of the saturated vapor pressure curve as well as the Priestley-Taylor parameter, the psychrometric constant, and the latent heat of vaporization (e.g. Matt et al., 2018).

The latter three variables are kept constant in the PT method. Actual evapotranspiration is assumed to take place only from snow free areas and is estimated as a function of potential evapotranspiration and a scaling factor.
In the default parameter settings of the PT_GS_K model, seven parameters are allowed to vary in conditioning the model. The same setting was also followed in this study, with the addition that the sub-grid snow coefficient of variation was also considered an uncertain model parameter. Table 1 shows a list of these parameters with their ranges of possible values.

Study area and data
This study was conducted using climatic and catchment data from the Nea catchment. The PT_GS_K model requires temperature, precipitation, radiation, relative humidity, and wind speed as forcing data. In this study, daily time series data of these variables for the study area were obtained from Statkraft as point measurements, with the exception of relative humidity. Daily gridded relative humidity data were retrieved from ERA-Interim (Dee et al., 2011). The model uses a Bayesian Kriging approach to distribute the point temperature data over the domain, while for the other forcing variables it uses an inverse distance weighting approach.
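The inverse distance weighting step can be sketched generically as follows; this is a simplified illustration in which the station layout, the distance power, and the function name are assumptions, not Shyft's actual API:

```python
def idw_interpolate(stations, target, power=2.0):
    """Interpolate point forcing data to a target location using inverse
    distance weighting. `stations` is a list of ((x, y), value) tuples."""
    weighted_sum = weight_total = 0.0
    for (x, y), value in stations:
        d2 = (x - target[0]) ** 2 + (y - target[1]) ** 2
        if d2 == 0.0:
            return value  # target coincides with a station
        w = d2 ** (-power / 2.0)  # equivalent to 1 / distance**power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total
```

A target midway between two stations simply receives the average of their values.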
Two observational datasets, streamflow and snow cover, were used in this study. Daily observed streamflow measurements covering four hydrological years (September 1 to August 31) were provided for the study area. The climatic data show that these hydrological years represented periods both above and below the long-term average annual precipitation. Years 2011 and 2013, respectively, were the wettest and driest years in over 10 years. Daily snow cover fraction (SCF) data were retrieved from the NASA MODIS snow cover products (MODIS SCF) (Hall et al., 2006). Frequent cloud cover is one of the major challenges when using MODIS and other optical remote sensing data in Norway. In order to minimize the effect of obstructions and misclassification errors emanating from clouds and other sources, a composite dataset was formed using data retrieved from the Aqua and Terra satellites (the MYD10A1 and MOD10A1 products, respectively).
In this analysis, PT_GS_K was set up in distributed mode over 812 grid cells, requiring the following physiographic data for each grid cell: average elevation and grid cell total area, as well as the areal fractions of forest, reservoir, lake, and glacier.
Data for these physiographic variables were retrieved from two sources: the land cover data from the Copernicus Land Monitoring Service (https://land.copernicus.eu/pan-european/corine-land-cover) and the 10 m digital elevation model.

The uncertainty analysis methods
In this study, modelling and parameter uncertainty analysis was conducted using two GLUE variants. First, the hydrological model and its snow sub-model were subjected to uncertainty analysis using the residual based GLUE methodology. When using this approach, the relevant model parameters were initially conditioned using either streamflow or MODIS SCF. In subsequent analysis, they were conditioned using both streamflow and SCF. Following that, uncertainty analysis was conducted using the relaxed GLUE LOA approach.

Sampling the parameter dimensions
The performance of all uncertainty analysis techniques depends on the efficiency of the sample in representing the entire response surface (Pappenberger et al., 2008). In this study, prior distributions of the uncertain model parameters were not known and hence a uniform distribution was assumed. The challenge in using a uniform distribution is, however, to adequately sample the entire parameter dimensions. To overcome this challenge and to better identify regions of behavioral simulations, a sample size of 100,000 runs was used. Each model run is a realization of a parameter set randomly drawn from the domains of the model parameters. Matlab scripts from the SAFE toolbox (Pianosi et al., 2015) were used as a basis to characterize behavioral and non-behavioral models.
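The Monte Carlo sampling step can be sketched as below; the parameter names follow the text, but the ranges shown here are placeholders (Table 1 gives the actual bounds used in the study), and the study itself used Matlab rather than Python:

```python
import random

# Placeholder ranges for the eight uncertain parameters; see Table 1 for
# the actual bounds used in the study.
PARAM_RANGES = {
    "tx": (-3.0, 2.0),
    "wind_scale": (1.0, 6.0),
    "fast_adr": (1.0, 15.0),
    "slow_adr": (20.0, 40.0),
    "snow_cv": (0.1, 0.8),
    "c1": (-8.0, 0.0),
    "c2": (0.0, 1.2),
    "c3": (-0.2, 0.0),
}

def sample_parameter_sets(n, ranges=PARAM_RANGES, seed=1):
    """Draw n parameter sets from independent uniform priors, as done for
    the 100,000 GLUE realizations; each set drives one model run."""
    rng = random.Random(seed)
    return [{name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
            for _ in range(n)]
```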

In this study, the performance of each model realization was evaluated using relevant likelihood measures. Residual based informal likelihood measures are considered suitable measures of fit when large data sets, such as rainfall-runoff time series, exist for model conditioning (Hassan et al., 2008). The Nash-Sutcliffe efficiency (NSE, Eq. 4) belongs to this group of likelihood measures, and it is the most widely used likelihood measure for assessing the fitness of model parameters in hydrological modelling (Xiong and O'Connor, 2008). Further, the main end users of the model commonly use NSE both in calibration and evaluation of hydrological models. Thus, use of this performance measure as a streamflow likelihood measure makes it easier both to set the threshold value for behavioral models (i.e. based on previous experience) and to communicate model performance outputs. However, the NSE calculated using raw values tends to overestimate model performance during peak streamflow and underestimate it during low streamflow conditions (e.g. Krause et al., 2005). To partly overcome this problem, NSE is often calculated with log transformed observed and simulated values. In this study, both NSE and NSE with log transformed streamflow values (LnNSE) were thus employed as likelihood measures in evaluating each model run.
NSE = 1 − Σ_t (Q_sim,t − Q_obs,t)² / Σ_t (Q_obs,t − Q̄_obs)²  (4)
in which Q_sim represents simulated streamflow, Q_obs is observed streamflow, and Q̄_obs represents the mean value of the observed streamflow series.
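The two likelihood measures follow directly from their standard definitions; a minimal sketch:

```python
import math

def nse(sim, obs):
    """Nash-Sutcliffe efficiency (Eq. 4)."""
    mean_obs = sum(obs) / len(obs)
    err = sum((s - o) ** 2 for s, o in zip(sim, obs))
    var = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - err / var

def ln_nse(sim, obs):
    """NSE computed on log-transformed flows (LnNSE), which gives more
    weight to low-flow periods."""
    return nse([math.log(s) for s in sim], [math.log(o) for o in obs])
```

A perfect simulation yields NSE = 1, while predicting the observed mean at every time step yields NSE = 0.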

Within the residual based GLUE procedure, the definition of the threshold likelihood value at which model performance is judged reasonable is a subjective choice by the modeler. In this study, NSE and LnNSE values of 0.7 and 0.6 were respectively considered as the threshold values for behavioral models. These values were chosen with due consideration to the input and observational data quality as well as the relative importance given to high streamflow relative to low streamflow conditions in the hydropower industry. In the case of the combined likelihood measure, a weighted average threshold value (e.g. Hassan et al., 2008) was calculated assuming each likelihood measure to have a weight proportional to its threshold value. Accordingly, the NSE and LnNSE likelihood measures were respectively assigned weights of 0.54 and 0.46 (Eq. 5).
L(θ_i | Y(θ_i)) = 0.54 L_NSE(θ_i) + 0.46 L_LnNSE(θ_i)  (5)
where L(θ_i | Y(θ_i)) represents the combined likelihood measure for the i-th model realization with model prediction Y(θ_i), which is a function of the set of model parameters θ_i and corresponds to the observations. L_NSE and L_LnNSE respectively represent the likelihood measures based on NSE and LnNSE. Models producing likelihood measure values greater than or equal to the threshold value were labeled as behavioral models and were retained for use in further analysis.
The root mean squared error (RMSE) of simulated and MODIS fractional snow cover was used as a likelihood measure for SCF. A threshold value of 0.17 was set when using RMSE in model conditioning. This value was fixed using the average performance of similar conceptual hydrological models as a reference (e.g. Skaugen and Weltzien, 2016). The likelihood values of the retained behavioral models were kept, based on which the predicted SCF values for different quantiles can be extracted.
When selecting behavioral models using the combined likelihoods of streamflow and SCF, the merging of these likelihoods was carried out in two steps. First, the likelihoods representing low and high flow conditions, viz. LnNSE and NSE, were combined following a similar procedure to that described above. The likelihoods of streamflow and SCF were then separately rescaled such that their respective weights would sum to unity, following a similar procedure to that used in Brazier et al. (2000). The combined streamflow likelihood and the SCF likelihood were subsequently multiplied to obtain a combined likelihood measure of streamflow and SCF.

The relaxed GLUE LOA approach
Unlike residual based model selection approaches, including the residual based GLUE methodology, the GLUE LOA approach relies on an assessment of uncertainty in the observational data. Uncertainty analysis was thus also conducted in this study using the GLUE LOA approach, and its results were compared against those from the residual based GLUE methodology.

In this study, when using the GLUE LOA approach, both the streamflow and MODIS SCF data were considered as uncertain observations. Since no uncertainty data were available for streamflow observations in the study site, a mean streamflow uncertainty of 25 % was assumed and the streamflow limits were defined using this value. Although the maximum expected error of MODIS snow cover products under clear-sky conditions is reported to be 15 % for forest areas (MODIS, 2010), cloud coverage coupled with a lack of contrast between clouds and snow cover may severely affect the accuracy, and in some cases leads to misclassification of snow as land (e.g. Parajka et al., 2012). Thus an SCF uncertainty of 25-50 % was assumed to represent the errors associated with the SCF observations and the input data.
An alternative approach was employed to minimize the risk of rejecting useful model realizations due to using assumed average observational error bounds and due to the lack of explicitly accounting for the time-varying level of observational and input data errors. Instead of requiring predictions to bracket the observations at every time step, the required percentage of time steps in which model predictions fall within the limits (hereafter referred to as the threshold pLOA) was relaxed. The threshold pLOA in turn was set such that the 5-95 % prediction limit of streamflow, reported as the Containing Ratio (CR, see Eq. 5), is close to the value obtained using the residual based GLUE methodology.

The procedure for relaxing the original GLUE LOA requirement during the calibration period involves the following steps:
Step 1: define an acceptable prediction limit (CR) at a chosen certainty level (e.g. 5-95 %). In this study the CR value obtained for the calibration period using the residual based GLUE methodology was adopted as the acceptable CR value.
Step 2: relax the acceptable percentage of observations where model predictions fall within the limits. This is done by gradually lowering the requirement for bracketing the observations in 100 % of the time steps down to the acceptable pLOA.

Step 3: run calibration and test whether each model realization's prediction falls within the limits for at least the specified percentage of the total observations. If model realizations that satisfy the relaxed acceptability criteria are found, proceed to step 4; otherwise lower the threshold pLOA further and repeat this step.
Step 4: assign each behavioural model realization a likelihood weight. Following the procedure of Blazkova and Beven (2009), the weights associated with streamflow and MODIS SCF were combined by taking the sum of these two criteria and rescaling them such that the sum of the weights for behavioral models is unity. The behavioral model realizations were then used for prediction, weighted by their overall degree of performance.
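Steps 2-3 of the relaxation procedure can be sketched as follows; this is a simplified illustration in which the limits come from a flat 25 % observation error and the CR-based stopping value of Step 1 enters only as the lower bound `p_min`:

```python
def fraction_within_limits(sim, obs, rel_error=0.25):
    """Fraction of time steps where the simulation lies inside the limits
    of acceptability obs*(1 - rel_error) .. obs*(1 + rel_error)."""
    hits = sum(1 for s, o in zip(sim, obs)
               if o * (1.0 - rel_error) <= s <= o * (1.0 + rel_error))
    return hits / len(obs)

def relaxed_loa_selection(realizations, obs, p_min=0.7, p_step=0.01):
    """Lower the required bracketing percentage (pLOA) from 100 % until
    behavioural realizations are found or the acceptable p_min is reached."""
    p = 1.0
    while p >= p_min:
        behavioural = [sim for sim in realizations
                       if fraction_within_limits(sim, obs) >= p]
        if behavioural:
            return behavioural, p
        p -= p_step
    return [], p_min
```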

A split-sample based cross-validation of streamflow predictions was used to alternately evaluate how well the behavioral models identified in a given calibration period are able to reproduce the observed values from another period. The hydrologic model was run for four years at a daily time step. The first month of each hydrological year was considered a spin-up period and hence excluded from all uncertainty analyses. Each of the four years was alternately used to identify behavioral models, and the remaining three years were individually used to assess the modelling uncertainty.
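The alternating calibration/validation design can be expressed as leave-one-year-out folds; a trivial sketch of the design, not code from the study:

```python
def split_sample_folds(years):
    """Each hydrological year is used once to identify behavioural models
    (calibration); the remaining years serve as validation periods."""
    return [(cal, [y for y in years if y != cal]) for cal in years]
```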

In this study the modelling uncertainty was evaluated using both qualitative and quantitative evaluation techniques. The upper and lower streamflow prediction limits as well as the observed values were plotted on the same graph to visually assess the capability of the identified behavioral models in bracketing the observations. The Containing Ratio (CR) index was also used to analyze the prediction uncertainty, following a similar procedure to that used in other studies involving the GLUE methodology (e.g. Xiong et al., 2009; He et al., 2011). CR is expressed as the ratio of the number of observations falling within the respective prediction bounds to the total number of observations (Eq. 5).
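The CR index follows directly from this definition; a minimal sketch:

```python
def containing_ratio(lower, upper, obs):
    """Containing Ratio: fraction of observations that fall within the
    corresponding prediction bounds (e.g. the 5-95 % limits)."""
    hits = sum(1 for lo, up, o in zip(lower, upper, obs) if lo <= o <= up)
    return hits / len(obs)
```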
As an alternative to a crisp prediction for an observation (e.g. Xiong and O'Connor, 2008), the median (50 %) streamflow prediction was also estimated from the behavioral model simulations and compared against observations using both NSE and LnNSE as goodness of fit measures. Similarly, the critical success index (CSI, Table 2) and RMSE were used as goodness of fit measures for the median SCF prediction. When using RMSE, the fractional snow cover data of each grid cell were directly employed in validating median predictions. CSI represents the number of grid cells where snow events are correctly predicted out of the total number of grid cells where snow is predicted by the model. It was calculated from binary snow cover data using a two-by-two contingency table analysis (Table 2), following a similar procedure to that used in Hanzer et al. (2016). When converting the snow cover fraction to a binary measure, a grid cell was classified as snow covered if at least 50 % of its area is snow covered.
The aforementioned less sensitive model parameters can, however, have a high effect on model outputs through interaction with other parameters. Some degree of interaction between model parameters can be seen from the correlations shown in Fig. 4. For example, a general decreasing trend in model performance can be noticed with a joint increase in c1 and c2. The strong influence of tx in constraining the output is also evident in these plots.
The posterior distribution histograms (Fig. 5) and the statistical summary table of the posterior distributions (Table 3) illustrate variability in the distribution characteristics of the model parameters. Several parameters show a relatively flat distribution across their respective parameter dimensions. It should however be remembered that in the GLUE methodology, it is the set of parameter values that gives a behavioral model. The width of the 5-95 percentile range (shaded band) varied over time, and relatively higher uncertainty was noticed during high streamflow than during low streamflow periods.

Uncertainty of streamflow predictions
As can be seen from the summary table of cross-validation results (Table 4), the evaluation generally shows that the median prediction of behavioral models selected using the combined likelihood was able to reproduce the observed values remarkably well, with average NSE and LnNSE of 0.86 and 0.72 respectively for the validation period. However, the performance of the behavioral models identified using NSE alone was very low when evaluated using LnNSE in year 2014.

Snow cover fraction (SCF) and snow water equivalent (SWE) are the two main outputs of the snow sub-model (GS) of the PT_GS_K model. In this study, an initial single-likelihood based conditioning of the GS specific parameters was carried out using MODIS SCF only, with RMSE as the measure of model performance.
The cross-validation results of predicted median values against MODIS SCF observations are shown in Table 5.

Uncertainty of streamflow and snow cover predictions using both observations
The cross-validation results of simulated streamflow and SCF against observations are shown in Table 6. A similar model performance was observed when model parameters were conditioned using both streamflow and MODIS SCF as compared to conditioning using streamflow alone. The relatively low quality of the MODIS SCF data as compared to the streamflow data for the study site may partly explain this result.

Uncertainty analysis using the relaxed GLUE LOA approach
The median streamflow prediction of behavioural models identified using the relaxed GLUE LOA was able to mimic the observed values very well, with mean NSE and LnNSE of 0.85 and 0.7, respectively, for the validation period (Table 7). A comparable performance was observed between models selected using the residual based GLUE and the relaxed GLUE LOA.
The similarity in median predicted streamflow between these two GLUE methodologies can also be noticed from a visual comparison of the resulting hydrographs (Fig. 6 and Fig. 7). A mean streamflow CR value of 0.75 was obtained for the validation period when using the relaxed GLUE LOA. This shows a slightly better capability of the 5-95 % prediction bounds in bracketing the observations as compared to predictions using the residual based GLUE methodology when both streamflow and SCF are used in model conditioning.
The behavioural models selected using the relaxed GLUE LOA approach were also able to adequately reproduce the observed SCF, with mean RMSE and CSI of 0.11 and 0.98 respectively for the validation period. The overall result indicates an improved capability of the 5-95 % prediction bounds in bracketing the SCF observations as compared to predictions using the residual based GLUE methodology.

Discussion
The streamflow prediction uncertainty analysis results show that model performance was relatively lower during low-streamflow than high-streamflow conditions throughout most validation periods (e.g. Table 4). A similar result was reported by Choi and Beven (2007) in their multi-period, cluster based uncertainty analysis in the Bukmoon catchment, South Korea, where a high percentage of simulation bias was observed during the drier seasons due to relatively poor model performance during these periods. The result of this study is thus consistent with the general observation that catchment hydrologic models perform relatively well in wet conditions but break down during low streamflow conditions (e.g. Kirchner, 2009). For results from the residual based GLUE methodology, this can also be partly attributed to the nature of the likelihood measure used to identify the behavioral models. The results support this observation: model performance during low streamflow periods (LnNSE) improved when using the combined likelihood measures as compared to using NSE alone. This is because models identified using NSE alone strongly reflect the hydrologic characteristics of high streamflow periods and are expected to perform more poorly during low streamflow conditions.
In order to assess the potential value of MODIS SCF in constraining model parameters, the snow sub-model parameters were constrained using this observation and the posterior distribution of the individual parameters were compared against corresponding distributions that resulted from model conditioning using streamflow only. Parameter inference based on SCF
only as a conditioning observation gave some parameter estimates that deviate significantly from those obtained when conditioned with streamflow only (Fig. 9). The box plots depict the posterior distributions of the snow related parameters separately conditioned using streamflow and SCF. For ease of comparison, parameter values were scaled between 0 and 1. From these plots it can be seen that tx and wind scale are the model parameters most sensitive to the conditioning data type, with a significant shift of their quartiles towards the upper part of the parameter dimensions when conditioned using SCF.
In contrast, the fast ADR, slow ADR, and snow CV did not show a significant displacement in their posterior distributions. These parameters were also identified as the least sensitive model parameters when the model was constrained using streamflow only.
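The snow CV parameter controls the assumed sub-grid SWE distribution. A minimal sketch of how it shapes snow cover depletion, assuming a lognormal sub-grid SWE distribution in the spirit of Liston (2004) (the function and its melt-depth formulation are illustrative, not the actual scheme of the model studied here):

```python
import math

def scf_lognormal(melt_depth, mean_swe, cv):
    """Snow covered fraction after a cumulative melt `melt_depth`,
    assuming sub-grid SWE ~ lognormal with areal mean `mean_swe` and
    coefficient of variation `cv`: SCF = P(SWE > melt_depth).
    `melt_depth` must be > 0."""
    sigma2 = math.log(1.0 + cv ** 2)           # lognormal shape from CV
    mu = math.log(mean_swe) - 0.5 * sigma2     # so that E[SWE] = mean_swe
    z = (math.log(melt_depth) - mu) / math.sqrt(sigma2)
    # standard normal CDF via the error function
    return 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

For a fixed areal mean SWE, a low CV keeps SCF near one until melt approaches the mean and then collapses it rapidly, i.e. a faster depletion rate once melt-out begins, whereas a high CV spreads depletion over a wider range of melt.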
Generally, in snow models whose sub-grid snow distribution component is parameterized using a statistical probability distribution function, a low snow CV results in a faster depletion rate of the snow covered fraction (e.g. Liston, 2004).

The use of GLUE LOA for testing hydrologic models as hypotheses without due consideration of errors in input data may lead to the rejection of useful models that might adequately represent the catchment behaviour, and thereby to making a type II error (false negative). In the past, various attempts have been made to minimize the risk of making type II errors in model calibration studies using the GLUE and other frameworks. In some studies an improved calibration of hydrologic models was obtained through independent calibration of sub-periods of a time series (e.g. Boyle et al., 2000; Samanta and Mackay, 2003). Within the GLUE LOA approach, extending the limits (e.g. Blazkova and Beven, 2009; Liu et al., 2009) and using different model realizations for different periods of a hydrological year (e.g. Choi and Beven, 2007) are some of the measures taken to minimize the risk of making type II errors. Common to all these measures is that they relax the selection criteria for behavioural models.
In this study, when using the GLUE LOA approach, the streamflow bounds were set to +/-25 %, and the results show that none of the model realizations satisfied the LOA criteria without one or more of their predictions falling outside the acceptable streamflow bounds. The failure rate was higher during low flow than during high flow conditions. An initial attempt was made to relax the limits of acceptability by extending the streamflow bounds. However,
no model realization with its predictions falling within the error bounds for all observations was found until the limits were extended to over +/-85 %. Such a relaxed acceptability limit is hard to justify physically as an error bound. Therefore, rather than relaxing the limits, an alternative empirical approach was followed by relaxing the number of simulation time steps that must fulfil the original LOA criterion. The procedure involves defining an acceptable percentage of observations that must be bracketed by each model realization. This empirical approach is based on the observed relationship between prediction uncertainty and the number of behavioural models, which in turn is a function of the selection criterion. As the threshold value of a likelihood measure increases (in the case of residual based GLUE) or the absolute value of the limits decreases (in the case of GLUE LOA), the simulated runoff series gradually converges, though not necessarily to the observations. A similar observation was reported in other GLUE based uncertainty studies (e.g. Xiong et al., 2008). A further analysis in this study reveals that, as the percentage of observations required to be bracketed by each model realization (pLOA) increases, the number of behavioural models decreases, and the simulated runoff series converges, resulting in a low CR (Fig. 10). In this study, the threshold pLOA for each calibration period was defined such that the 5-95 % prediction uncertainties of streamflow from the residual based and the LOA based GLUE methodologies are similar. Defining the threshold pLOA this way helps to set a reasonable value that minimizes the risk of making type II errors while maintaining overall model accuracy by rejecting non-behavioural models.
Furthermore, it enables a broad comparison of the performance of behavioural models selected using the relaxed GLUE LOA against the residual based GLUE in terms of their ability to reproduce the median streamflow and SCF predictions at a similar level of uncertainty (i.e. the CR used to set pLOA).
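The relaxed selection rule and the resulting containment ratio can be sketched as follows, assuming constant +/-25 % relative error bounds and simple unweighted empirical quantiles for the 5-95 % prediction bounds (a simplification; GLUE normally weights the quantiles by likelihood):

```python
import math

def within_limits(obs, sim, rel_err=0.25):
    """Fraction of time steps where the simulation falls inside
    +/-25 % error bounds around each observation."""
    inside = sum(1 for o, s in zip(obs, sim)
                 if (1 - rel_err) * o <= s <= (1 + rel_err) * o)
    return inside / len(obs)

def behavioural(obs, simulations, p_loa):
    """Relaxed LOA selection: keep realizations whose fraction of
    in-bound time steps meets the threshold p_loa (p_loa = 1.0
    recovers the strict criterion)."""
    return [sim for sim in simulations if within_limits(obs, sim) >= p_loa]

def containment_ratio(obs, ensemble, lo=0.05, hi=0.95):
    """CR: fraction of observations bracketed by the lo-hi empirical
    prediction bounds of the behavioural ensemble."""
    bracketed = 0
    n = len(ensemble)
    for t, o in enumerate(obs):
        vals = sorted(sim[t] for sim in ensemble)
        q_lo = vals[math.floor(lo * (n - 1))]
        q_hi = vals[math.ceil(hi * (n - 1))]
        if q_lo <= o <= q_hi:
            bracketed += 1
    return bracketed / len(obs)
```

Raising `p_loa` shrinks the behavioural ensemble and narrows the prediction bounds, which lowers the CR; the threshold can then be tuned until the CR matches that of the residual based GLUE ensemble, as done in the study.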

Although it is difficult to separate the effects of input data error from model structural error on model performance using the GLUE methodology, the error patterns may aid in assessing model performance in different periods of the hydrologic year. Generally, a good model structure coupled with good data is not expected to give a consistent bias (e.g. Liu et al., 2009). Figure 11 shows the error patterns over the hydrologic year. Model conditioning was able to effectively identify behavioural models even with a limited observation record; this result is consistent with findings of other studies on the effect of observation record size on constraining model parameters (e.g. Seibert and Beven, 2009; Liu and Han, 2010; Sun et al., 2017). The information content of the input and observation datasets is more important than the length of the datasets, especially in continuous rainfall-runoff modelling.

Conclusions
Two GLUE methodology variants were applied for parameter uncertainty analysis of a distributed conceptual hydrological model. The results from the residual based GLUE methodology show that the catchment response parameters, viz.
c1, c2, and c3 as well as the wind scale are the most sensitive model parameters. More caution is thus required when defining the value range of these parameters. On the other hand, the fast and slow albedo decay rates as well as the snow CV
are relatively more uncertain model parameters.
Model conditioning using combined streamflow and MODIS SCF did not improve the median prediction of streamflow as compared to the result when model parameters were conditioned using streamflow only. A similar result was observed for SCF predictions. The additional information from the MODIS SCF data was thus of limited value in constraining the rainfall-runoff model parameters.

When using the GLUE LOA approach, none of the sampled model realizations provided a behavioural simulation when the assumed observational error bound was treated as a 5-95 % error. A relaxation was needed to partly overcome the limitations of assuming a constant observational error proportionality and of not taking explicit account of other sources of uncertainty, such as input data errors. A relaxed GLUE LOA approach was therefore introduced that relaxes the number of time steps required to satisfy the LOA. Similar results are obtained using the residual based GLUE and the relaxed GLUE LOA approaches. Relaxing the percentage of observations required to be bracketed per simulation period by a particular model realization (pLOA) was found to be more effective than relaxing the observational error bounds.