Accounting for dependencies in regionalized signatures for predictions in ungauged catchments.

. A recurrent problem in hydrology is the absence of streamﬂow data to calibrate rainfall–runoff models. A commonly used approach in such circumstances condi-tions model parameters on regionalized response signatures. While several different signatures are often available to be included in this process, an outstanding challenge is the selection of signatures that provide useful and complementary information. Different signatures do not necessarily provide independent information and this has led to signatures being omitted or included on a subjective basis. This paper presents a method that accounts for the inter-signature error correlation structure so that regional information is neither neglected nor double-counted when multiple signatures are included. Using 84 catchments from the MOPEX database, observed signatures are regressed against physical and climatic catchment attributes. The derived relationships are then utilized to assess the joint probability distribution of the signature regionalization errors that is subsequently used in a Bayesian procedure to condition a rainfall–runoff model. The results show that the consideration of the inter-signature error structure may improve predictions when the error correlations are strong.


Introduction
In many areas of the world the absence of past observational streamflow time series to calibrate rainfall-runoff models limits the ability to apply such models reliably to predict streamflow and inform effective water resources management. Whilst a large and increasing number of regions across the world are insufficiently gauged (Mishra and Coulibaly, 2009), there are also many highly monitored catchments (Gupta et al., 2014). Transferring the knowledge gained in data-rich areas to ungauged catchments -a process known as regionalization -offers a possible way of overcoming the absence of streamflow observations in data-scarce regions. Several techniques for transferring information are reported in the literature (for an overview of different methods used in continuous streamflow regionalization, see He et al. (2011), Peel andBlöschl (2011), and Razavi and Coulibaly (2013); and for a recent comparative assessment of some of the most commonly used methods, see Parajka et al., 2013).
A commonly applied approach is to use response signatures (e.g., the runoff ratio and the base flow index), which can provide insight into the hydrological functional behavior of a catchment . Response signatures are calculated from available system output or inputoutput time series for numerous gauged catchments with known catchment attributes, i.e., physiographic and/or meteorological attributes (drainage area, latitude and longitude, average annual temperature, average monthly precipitation, etc.). Subsequently, statistical models relating each response signature to a set of catchment attributes can be identified.
Given the attributes of an ungauged catchment, the signatures for the ungauged location can then be estimated using the derived statistical models. Numerous regional models of this type can be found in the literature (e.g., Boorman et al., 1995). These regionalized signatures can be used to constrain the prior range of streamflow simulations generated using a preselected rainfall-runoff model structure and hence restrict the model parameter space (Yadav et al., 2007;Zhang et al., 2008;Bulygina et al., 2009;Castiglioni et al., 2010). Advantages of this approach include (1) the flexibility in the selection of the response signatures allowing it to be based on the specific parts of the hydrograph that are of greatest importance for a given application and, if known, on the dominant hydrological processes of the catchment, and (2) access to readily available regional models for different signatures in the literature (such as base flow index from the Hydrology of Soil Types system (Boorman et al., 1995) and curve number from the United States Department of Agriculture's Soil Conservation Service soil and land use classification; USDA, 1986) hence eliminating the need to build new regional regression models; in addition, (3) the relationships between response signatures and catchment and climatic characteristics are not specific to any rainfall-runoff model nor to a particular calibration method used in the gauged catchments and are therefore not obscured by model structural error and can be used to condition any model. Different ways of incorporating the regionalized information into a catchment model have been suggested in the literature. This includes set-theoretic approaches (e.g., Yadav et al., 2007;Winsemius et al., 2009) and more formal Bayesian data assimilation frameworks (e.g., Bulygina et al., 2009Bulygina et al., , 2011Castiglioni et al., 2010;Singh et al., 2011). Where probability distributions characterizing regionalization quality have been estimated, a Bayesian conditioning procedure is one of the possibilities (Bulygina et al., 2009(Bulygina et al., , 2011. This provides a framework for combining prior knowledge with the regionalized data and/or other sources of information (e.g., small-scale physics-based knowledge and hydrological measurements as in Bulygina et al., 2012), which has the potential to formally encompass the nature of the errors arising from the regionalization.
Conditioning a rainfall-runoff model on multiple independent signatures would reflect a spectrum of processes and, in principle, lead to an accurate prediction of flow time series (Parajka et al., 2013). However, regionalized signatures have correlated errors, for example, if the signatures have been estimated using a common data set of catchment attributes or using the same hydroclimatic data; in general, the correlations are expected to be stronger for pairs of signatures that represent similar functional behaviors of the catchment. This raises the questions of not only how many and which signatures should be used but also how to avoid double counting of the information in signatures with correlated error distributions. Previous applications have tended to use a small number of signatures (e.g., Bulygina et al., 2009Bulygina et al., , 2011 and/or have tended to select signatures that are considered to be independent (e.g., Yadav et al., 2007). When multiple signatures are used, the correlations between the errors in the different sources of information are commonly disregarded (e.g., Bulygina et al., 2012). To make better use of information in available sets of signatures, a formal way of combining them, so that information is neither double-counted nor neglected, is required. Using formal methods to include autocorrelated data errors in model calibration is well researched (e.g., Sorooshian and Dracup, 1980); an application of comparable methods in the regionalization context will allow making more formal and rigorous assessments of the value of correlated information sources.
Formally, in a Bayesian context, it is necessary to distinguish between correlated signatures and correlated signature errors. It is the correlation between the errors that should be accounted for in the likelihood function to avoid double counting of information. It is possible to have two highly correlated signatures that are derived from independent information sources and therefore have uncorrelated errors. In that case, it would be valid to include both signatures in the likelihood function without accounting for correlation. This principle is well established when considering Bayesian calibration to a time series of flow observations, where flow values are typically strongly autocorrelated -but it is the observation error autocorrelation that is relevant to the likelihood function derivation (e.g., Sorooshian and Dracup, 1980). The same principle applies to adopting signatures as the observations. In the case study below, the signatures are derived from a common data set using a common approach, so in practice the signature correlations are comparable to the signature error correlations; nevertheless, for the sake of formality, we use the term "signature error correlations" (or "covariance").
In this paper, we introduce and test a method that considers multiple regionalized signatures, explicitly accounting for the signature error correlations. By formally accounting for the error covariance, we hypothesize that accuracy of flow predictions will generally improve and a greater number of signatures can usefully be included without introducing avoidable bias related to the duplication of information. This should allow the modeler to use all signatures available without having to select, on a more or less subjective basis, the most relevant (independent) signatures. The objective is thus to explore how to get fuller value out of a set of regionalized information than has been achieved in past applications. The method is applied to a set of 84 United States catchments with a broad range of hydrometeorological characteristics, obtained from the Model Parameter Estimation Experiment (MOPEX) data set Schaake et al., 2006). The impact of signature error covariance is assessed using pairs of signatures to condition a rainfall-runoff model. Along with the real data, synthetic streamflow data are used to isolate the effect of model structural error. Further, the model is conditioned on a variable number of regionalized signatures to evaluate whether an increasing number of sig-natures is justifiable when formally accounting for the error covariance.

Bayesian method for signature assimilation
Using a simple least-squares regression, observed signatures of catchments' functional responses are related to physical and climatic attributes of the catchments. Assuming that the same catchment attributes are available for an ungauged location, it is possible to obtain an estimate of the set of signatures for the location. Further, the parametric distribution of regression errors can be directly translated to a response signature(s) likelihood function. The likelihood function can then be used to update the prior available knowledge about model parameters via Bayes' law, which is expressed as where, for one catchment, s * represents the regionalized response signature(s); p( |I, M) is the prior distribution of parameters for a model structure M and inputs I; L(s( )|s * , I, M) is the likelihood function of the modeled response signature(s) s( ) given s * , I and M; p(s * |I, M) is the marginal density of s * ; and p( |s * , I, M) is the posterior distribution of given s * , I and M. For the purpose of this paper, M is selected in advance and considered to be fixed (as it is the common practice in regionalization studies; Wagener and Montanari, 2011), as is I for any one catchment, and so both these terms are dropped from (Eq. 1) for convenience, resulting in p( |s * ) = L(s( )|s * ) × p( ) p(s * ) . (2) Parameter sets are then sampled from the parameter posterior to allow an ensemble of rainfall-runoff simulations and a posterior distribution of flow at each time step to be estimated and evaluated against observed flow. This can be repeated using different sets of signatures and different assumptions about their error correlations.

Prior distribution
To apply Bayes' law (Eq. 2), it is necessary to specify the likelihood function (L(s( )|s * ) in Eq. 2) and the prior distribution (p( ) in Eq. 2). The prior is defined so that it reflects our initial lack of knowledge. We follow Almeida et al. (2013) and sample sets of signature values from uniform distributions representing the feasible ranges of signatures. This approach allows the signatures to be sampled uniformly using a simple amendment to the commonly applied approach of sampling from uniform parameter priors, which avoids highly skewed signature priors that have undue influence on the posterior likelihood. More specifically, N parameter sets (N is equal to 10 000 in our study) are sampled from a uniform distribution using Latin hypercube sampling, so that probability of each parameter set is 1/N (10 −4 in our study). Subsequently, to provide parameter samples that correspond to a uniform in signatures' prior distribution, the parameter probabilities are reweighted (see Almeida et al., 2013), and used in the further posterior distribution approximation. This allows accounting for correlation among the parameters imposed by the uniform in signatures' prior distribution.

Likelihood function approximation
The likelihood functions are defined using joint distributions of respective signature errors obtained from the regionalization model. Errors introduced by the regionalization procedure may come from at least five sources. First, errors are introduced by the fact that the regression model is estimated using a specific sample of catchments rather than the entire population; second, differences may exist between the observed and the true value of the response signature due, for example, to factors such as the discharge record length and time period of record used in the computation (Kennard et al., 2010); third, errors are present due to errors in the catchment properties data; fourth, errors exist due to the incomplete set of catchment properties used as explanatory variables in the regression equations; and, fifth, they exist due to the assumed linear regression structure. It is assumed that the total error model for the regionalized signature(s) s * can be estimated using the following procedure: 1. Considering all available gauged catchments, stepwise regression is applied to each signature independently to determine which predictors to include. The predictors are then fixed for the remaining steps.
2. Considering all available gauged catchments, one catchment is left out and the remaining are used in the fitting of the regression models for each signature.
3. The regression models obtained in step 2 are used to estimate the signature values for the omitted catchment.
4. The error for each signature is calculated for the omitted catchment by comparing the regionalized and observed signature values.
5. The process is repeated for all catchments.
6. A parametric joint probability distribution is fitted to all the computed errors. Furthermore, the errors are tested for independence that allows (approximately) factorizing a joint distribution into a product of marginal distributions.

S. Almeida et al.: Accounting for dependencies in regionalized signatures
The resultant error distribution defines the likelihood function L in Eq.
(2). The main assumption here is that the potentially complex nature of errors in the set of signature values can be usefully represented by the fitted error distributions.

Synthetic case and likelihood functions
To avoid masking the potential value of the regionalized signatures with model structure and observational errors, a "perfect model" is first employed. This involves using the preselected rainfall-runoff model and the observed forcing data to generate the "observed" catchment signatures. The Nash-Sutcliffe criteria (NSE) (Nash and Sutcliffe, 1970) optimal parameter set is taken to generate a "perfect model" streamflow time series for each catchment. To produce regionalized signature analogues in this case, two types of imposed errors are introduced to these "observed" signatures. The first error type is characterized by a range of standard deviations (1, 5, 10 and 20 % of the signature value range observed over all catchments used in this study) and a range of inter-signature error correlations (Pearson correlation coefficients equal to 0, 0.25, 0.50, 0.75, and 0.90). This allows the sensitivity of the results to the regionalization quality and the regionalization errors' correlations to be evaluated. The second error type is set to be equal to the observation-based likelihood function (Sect. 2.2.2). These error structures are the likelihoods used in Eq.
(2) for the synthetic case when flows are generated by a "perfect model".

Study catchments
A set of 84 medium-sized United States catchments (242 to 8657 km 2 ) from the MOPEX database Duan et al., 2006), for which a variety of regional response signature models have been determined in Almeida et al. (2012) (namely runoff ratio, base flow index, streamflow elasticity, slope of slow duration curve, and high pulse count), are used to test the method proposed in this paper. It has proven difficult to derive regionalization equations of acceptable prediction quality for all catchments in the MOPEX data set (Almeida, 2014). This is due to the lack of descriptive power in the set of available catchment attributes, e.g., the attributes do not provide satisfactory information about catchment geology. To isolate the effect of variable geology on the regression equations, the selected 84 catchments are grouped based on the underlying geology, namely, middle Paleozoic sedimentary rocks. Use of more catchments from the MOPEX database would require different regionalization equations due to changing process controls and would be unnecessary given that the focus of the study is on signature error correlations in regionalization models. For more details on the motivation for choosing these specific 84 catchments, see Almeida et al. (2012) and Almeida (2014). The 84 catchments are hydrologically varied with a selection of properties summarized in Table 1. Daily time series for the period from 1 October 1949 to 30 September 1959 are employed. As highlighted in Almeida et al. (2012), these 10 years of data, representing only a subset of all the data available, are assumed to be of sufficient length to capture climatic variability but short enough to avoid effects of longterm climatic trends (Sawicz et al., 2011).

Response signatures
Five response signatures are considered: runoff ratio (RR), base flow index (BFI), streamflow elasticity (SE), slope of flow duration curve (SFDC), and high pulse count (HPC) ( Table 1). This specific subset of signatures is selected to cover a wide range of different qualities of regionalized information and also to ensure that some signature errors are largely uncorrelated, whilst others are strongly correlated (see also Sect. 3.1).
RR reflects the amount of precipitation that becomes streamflow over a certain area and time. It is determined as the ratio of catchment's outlet streamflow and catchment average precipitation over the 10 years used in this study. BFI gives the proportion of streamflow that is considered to be base flow. A simple one-parameter single-pass digital filter method is used to derive BFI (Arnold and Allen, 1999). SE provides a measure of the sensitivity of streamflow to changes in precipitation (Sankarasubramanian et al., 2001). It is calculated as a median of the inter-annual variation in total annual streamflow to the inter-annual variation in total annual precipitation ratios normalized by the long-term runoff ratio (Sawicz et al., 2011;Sankarasubramanian et al., 2001). SFDC gives an indication of the streamflow variability and is calculated as the slope of the flow duration curve between the 33 and 66 % flow exceedance values in a semilog scale (Sawicz et al., 2011). HPC reflects aspects of the high flow regime and catchment flashiness and is calculated as the average number of events per year that exceed 3 times the median daily flow (Clausen and Biggs, 2000;Yadav et al., 2007).

Rainfall-runoff model choice
The probability distributed moisture (PDM) model (Moore, 2007), together with two parallel linear routing stores and a simple snow model (Hock, 2003), is selected with two major motivations (a detailed description of the model is given in Appendix A). First, this type of model has been shown to have a suitable complexity for modeling daily rainfall-runoff over a large sample of the MOPEX catchments . Second, the model has been successfully applied in other regionalization studies across a wide range of climate and physiographic conditions, for example, Calver et al. (1999), Lamb and Kay (2004), McIntyre et al. (2005), Young (2006), and De Vleeschouwer and Pauwels (2013). Even though other model structures may be better suited for some specific catchments, it is prohibitively difficult to vary model structure between catchments and no single model structure will ever be best for all catchments (Lidén and Harlin, 2000;Clark et al., 2008;van Werkhoven et al., 2008). Consequently, the selected model structure is believed to be a sufficient choice for the purposes of this paper. Most importantly, the general framework is independent of the rainfall-runoff model choice.

Posterior distribution and performance assessment
Employing Bayes' law (Eq. 2), the rainfall-runoff model is conditioned on different combinations of signatures: (1) assuming independence between the signature regionalization errors (setting the correlation values to zero in the joint probability function); and (2) accounting for the inter-signature error correlations (using the estimated covariance in the joint probability function).
Two metrics are used to assess the effectiveness of the parameter conditioning procedure: (1) the Bayes factor (Jeffreys, 1961) to assess convergence of the parameter posteriors to known parameter values; (2) the probabilistic Nash-Sutcliffe efficiency (Bulygina et al., 2009) to assess convergence of the flow ensembles to the observed flows.
The Bayes factor (BF) is defined as the ratio between two marginal distributions of the data y (e.g., observed streamflow time series) for two competing hypotheses (H 1 and H 2 ) (Kass and Raftery, 1995) (more detail is given in Appendix B): Thus, to test the impact of representing the error correlations, the hypothesis H 1 corresponds to the inter-signature errors being treated as correlated, while the hypothesis H 2 corresponds to the inter-signature errors assumed to be independent. If the resulting Bayes factor is greater than 1, there is more support for hypothesis H 1 , and the inter-signature error correlation is worth considering. When using synthetic streamflow data ("perfect model" approach), with the streamflow time series generated by a preselected parameter set, p(y|H ) in Eq. (3) can be seen as either the posterior probability of the known observed streamflow time series under hypothesis H or the probability of the known parameter set that generated that particular flow time series under hypothesis H . As in a "perfect model" approach there is no observational error, p(y|H ) is the probability estimated for the known value of the parameter set that generated the observed streamflow under each of the hypotheses H 1 and H 2 . Since there is no known parameter value corresponding to the real data, the application of the Bayes factor is less useful in this situation. In this case, defining y as an NSE-optimal parameter set allows an indication of the relative degree of convergence around the chosen point.
The probabilistic Nash-Sutcliffe efficiency NSEprob (Bulygina et al., 2009) is a probabilistic analogue of the traditional Nash-Sutcliffe efficiency coefficient (Nash and Sutcliffe, 1970), and allows both prediction accuracy and precision to be summarized by a single statistic (Eq. 4), where q t denotes a set of streamflow observations for time t = 1, . . ., T , E[q] is the average value for the q t time series, q t is the simulated time series of streamflow for time t = 1, . . ., T , Var[ q t ] is the prediction variance at time t, E[ q t ] is the mathematical expectation of the predictions at time t, and T is the total number of time steps in the sequence. The first part of Eq. (4) corresponds to the traditional Nash-Sutcliffe efficiency coefficient (Nash and Sutcliffe, 1970) in which expected streamflow values are considered as predictors. The latter part of the equation represents the variance, whereby higher predictor variance corresponds to less precise predictions (Bulygina et al., 2009). An NSEprob of 1 indicates a perfect fit, i.e., the results are both accurate and precise. The incremental improvement in the NSEprob can be used to measure the value of adding signatures into the conditioning or otherwise changing the likelihood function.
For model validation, we use a jack-knife approach (or leave-one-out strategy), commonly employed in regionalization studies (e.g., Merz and Blöschl, 2004;Shu and Ouarda, 2012). One catchment at a time is removed as a test "ungauged" catchment and the remaining gauged catchments are used to support the regionalization process, including steps 2-6 listed in Sect. 2.2.2 The procedure is repeated for each of the available catchments.

Regionalized signature errors and likelihood functions
The regionalization error probability distributions (that define the likelihoods) are generated following steps 2-6 in Sect. 2.2.2 and are shown in Fig. 1. The marginal error distributions, shown on the Fig. 1 diagonal, are approximated using histograms, and parameters of normal distributions are fitted using the method of moments. The univariate Kolmogorov-Smirnov test shows that the marginal distribution normality cannot be rejected at the 95 % confidence level for each of the five signatures. The off-diagonal shows the regionalization errors for different signature pairs (lower off-diagonal), the corresponding correlation coefficient values and their statistical significance (upper off-diagonal). The joint error distributions are approximated using multivariate normal distributions that are fitted using estimates of the marginal normal distribution parameters and the intersignature error correlations. These marginal and joint distributions define the likelihood functions in Eq.
(2). Note that Fig. 1 represents the regionalization errors based on all 84 catchments. Meanwhile, the jack-knife procedure (see Sect. 2.4) utilized in the performance assessment employs only 83 catchments at a time.

The impact of inter-signature error correlations (pairs of signatures)
This section considers the role of inter-signature error correlation on model parameter estimation when pairs of signatures are used. First, different imposed error variances and correlations together with synthetic streamflow data are employed to test the impact of inter-signature error correlation without the impact of model structural error. Then, the results obtained using the observation-based error structure, for both synthetic and observed data streamflow, are analyzed.

Synthetic streamflow data (imposed likelihoods)
Synthetic streamflow data are generated as described in Sect. 2.2.3 and the imposed likelihood functions are defined as described in Sect. 2.2.3. The imposed likelihoods are considered to have standard deviations equal to 1, 5, 10, and 20 % of the signature value range observed over all catchments. A comparison of the imposed error structures under the different levels of variance and the observed error structure is given in Table 2. Furthermore, different inter-signature error correlations are also tested, namely 0 (linear independence), 0.25, 0.50, 0.75, and 0.90. Ten possible pairs of the five response signatures are used in parameter conditioning, and the median Bayes factor, calculated over the 84 MOPEX catchments, is calculated for each pair. The Bayes factor (Eq. 3) compares the two following hypotheses: H 1 , the inter-signature error correlation is to be taken into account, and H 2 , the errors between the different sources of information can be assumed independent. The Bayes factor is found to be relatively insensitive to the selection of response signature pairs (Kruskal-Wallis test). Table 3 summarizes the 95 % pooled confidence intervals for the median Bayes factor across all catchments and across all 10 signature pairs, for each choice of the likelihood (i.e., 20 likelihoods). This provides reference values indicative of the error interdependency importance in model regionalization depending on the signature pair correlations and marginal distribution variances. As it would be expected, the median Bayes factor is equal to 1 when signatures errors are not correlated (i.e., ρ = 0). However, as correlations between signatures errors increase the median Bayes factor increases noticeably. This suggests that considering error correlations allocates higher likelihoods to parameter sets that capture a considered signature pair. Furthermore, the results shown in Table 3 also imply that the median Bayes factor is relatively insensitive to the precision with which the signatures are regionalized.  ble different pairs of signatures, when the observation-based error structure is used for each catchment. Figure 2a shows the results for the observed streamflow data with regionalized signatures calculated from the derived regressions; Fig. 2b shows the results for the synthetic streamflow data with regionalized signatures calculated by adding noise to the exact signature values. The Tukey boxplots in red correspond to pairs of signatures whose errors are statistically significantly correlated (see Fig. 1). The upper whisker represents the upper quartile plus 1.5 times the interquartile range and the lower whisker represents the lower quartile minus 1.5 times the interquartile range. The matrix below Fig. 2b shows the pairs of signatures used. The signature pair [SFDC, HPC] shows the strongest correlation between errors (ρ = 0.65, Fig. 1). A likelihood function with a standard deviation equal to 10 % of the observed signature ranges and ρ = 0.75 in Table 3 is comparable to the observation-based likelihood of the pair [SFDC, HPC] (Table 2), with Table 3 indicating [1.45, 1.53] as a 95 % confidence interval for the median Bayes factor. However, a median Bayes factor of 2.17 is obtained for the observed streamflow data (Fig. 2a). Similar differences are found for the other pairs of signatures, although the comparison with the reference table (Table 3) becomes challenging, as the individual signatures have not been regionalized necessarily with similar quality. On the other hand, Fig. 2b shows that the Bayes factors for the synthetic study (when there is no model structural error) are consistent with the values provided in the look-up Table 3. The difference between the median Bayes factor for the two cases is likely to be caused by the model structure error, or may be related to the location of the NSEoptimal in the parameter space.

Synthetic and observed streamflow data (observation-based likelihoods)
Nevertheless, it is clear from Fig. 2 that those pairs of signatures whose errors are significantly correlated (i.e., [SFDC, HPC], [BFI, HPC], [BFI, SFDC] and [BFI, SE]) have wider interquartile ranges. Furthermore, the pair of signatures with the strongest correlation between errors [SFDC, HPC] presents the greatest interquartile range. Therefore the inclusion of significant correlations in the likelihood function matters, but whether or not it is beneficial to conditioning the parameters seems to depend on the interplay between model structure error, parameter space and likelihood function. Only strong correlations (as in the [SFDC, HPC] case) can be expected to result in a median Bayes factor clearly above 1.

The impact of inter-signature error correlations (multiple signatures)
Multiple signatures are used for parameter constraining and flow prediction. The information value of multiple signatures and its dependence on inter-signature error correlations is explored in this section. Figure 3 shows Bayes factors derived for the synthetic streamflow data (generated using the NSE-optimal parameter set) when the observation-based likelihood is used. The Bayes factor considers p(.|H 2 ) to be the prior parameter distribution and p(.|H 1 ) to be one of the parameter posteriors that includes or ignores the inter-signature error correlations. Figure 3 summarizes the variability in the Bayes factor for the different combinations of signatures for all 84 catchments. Boxplots are color coded by the total number of signatures combined, when the inter-signatures error correlation is considered in the likelihood function definition. The grey dashed boxplots correspond to the results obtained assuming that the inter-signature errors are independent when defining the likelihood function. Although the colored boxplots visually seem to have higher values than the grey dashed boxplots, these differences are not statistically significant at a 95 % confidence level (Kolmogorov-Smirnov two-sided tests).

Synthetic streamflow data (observation-based likelihood)
To better evaluate whether the incorporation of additional sources of information improves parameter identification, one-sided Kolmogorov-Smirnov tests are applied between any combination of certain signatures (e.g., [SE, SFDC]) and any other combination that contains the same signatures and a new one (e.g., [SE, SFDC, HPC]). It is found that adding more signatures improves parameter identification in 82.5 % of the cases (66 out of 80 cases) at a 95 % confidence level. Figure 4 summarizes the variability in the analog Nash-Sutcliffe efficiency measure NSEprob for different combinations of signatures for all 84 catchments. The colored boxplots correspond to the results obtained when the intersignature error correlations are considered in the likelihood definition and the grey dashed boxplots correspond to the results when the inter-signature errors are assumed to be independent. There is no visual or statistical (two-sided Kolmogorov-Smirnov tests) difference between the colored boxplots and the grey dashed boxplots in Fig. 4. Moreover, visually, adding more response signatures seems to improve streamflow predictions in terms of accuracy and precision when no model structure error exists. However, only in 59 % of the cases (47 out of 80 cases) more signatures contribute to improved streamflow predictions at a 95 % confidence level (one-sided Kolmogorov-Smirnov test). The other 33 cases always involve the inclusion of the most poorly regionalized signatures (with the highest variance from the five regional- ized signatures) -SE, SFDC, or HPC -as additional sources of information (see Table 2).
It is worth noting that very similar results (not shown here) are obtained when instead of regionalized signatures, "observed" signatures are used but with the same error derived from regionalization. This suggests that the uncertainty around the regionalized signatures values, as well as signature information content, are the key factors leading to the results shown in Fig. 4. Figure 5 shows the results when the same methodology as in the Sect. 3.3.1 is applied using the observed streamflow data. As in the synthetic streamflow case, the differences between the Bayes factor distributions when inter-signature error correlations are considered and when inter-signature errors are assumed to be independent are not statistically significant at a 95 % confidence level (Kolmogorov-Smirnov two-sided tests). Further, by comparing Fig. 5 with Fig. 3, it becomes clear that the signatures contribute less information and there is a smaller increase in performance as more signatures are added. It is found that adding more signatures tends to im- prove parameter identification only in half of the cases when compared to the synthetic streamflow case at a 95 % confidence level (42.5 vs. 82.5 % in the synthetic streamflow case). Furthermore, and contrastingly to the case where no structural error exists, in five situations, adding more signatures contributes to a decrease in performance. These five cases always involve adding either SFDC or HPC as an additional source of information. This performance deterioration can be attributed to model structure and observational error. Overall, a statistically significant drop in performance with regard to the Bayes factor is observed most of the time when model structural error is present. Figure 6 presents the results in terms of NSEprob using the observed streamflow data. As in the synthetic study in Sect. 3.3.1, there is no statistically significant difference at a 95 % confidence difference between the NSEprob distributions when the inter-signature error correlations are considered and when the errors are treated independently (Kolmogorov-Smirnov two-sided tests). Figure 6 shows that better results in terms of NSEprob are not necessarily achieved when all five signatures are used simultaneously. It is found that adding more signatures tends to improve parameter identification only in 36 % of the cases at a 95 % confidence level (compared to 59 % when there is no model structure error). Furthermore, and contrasting the case where no model structure error exists, in two situations, adding more signatures may contribute to a decrease in performance (when we start with [RR, BFI] and add HPC, and when we start with [RR, BFI] and add SFDC). This might be due to regionalization biases in SFDC and HPC and/or due to the inability of the PDM model to maintain a satisfactory overall performance when conditioned on high peak flow and medium flow information. This negative impact is not observed when synthetic streamflow data are used (Fig. 4), indicating that the decrease in performance may be due to model structural deficiencies. Moreover, a statistically significant drop in performance with regard to NSEprob is observed most of the time when there is model structural error.

Observed streamflow data (observation-based likelihood)
In summary, unless there is no model structural error, an all-round performance improvement is not guaranteed by adding more signatures. Furthermore, model structure uncertainty seems to have a much bigger effect on the performance than the explicit inclusion of the inter-signature error correlations.

Limitations and applicability
The main feature of the method suggested in this paper lies in the possibility of allowing a large number of signatures to be added to the conditioning process, without worrying about double counting of information or degree of uncertainty in Figure 6. Boxplots representing the distribution of NSEprob values for each combination of signatures for observed streamflow data. The colored boxplots correspond to the results obtained when intersignature error correlations are considered in the likelihood function, whereas the grey dashed boxplots correspond to the results obtained assuming that the inter-signature errors are independent. signature estimates and avoiding subjective decisions about removal of possibly non-independent information. Although the proposed framework can be applied to any number of signatures, the limited sample size (i.e., number of gauged catchments available) can have an impact on the definition of the likelihood distribution. For this specific study 83 samples were available to define that distribution. When a single response signature is used to condition the hydrological model, this sample size is likely to be sufficient to confidently judge whether the normal distribution assumption is sufficient. However, when moving to multidimensional problems, in which various signatures may be used simultaneously to condition the hydrological model, it is increasingly difficult to judge the adequacy of any multivariate parametric distribution and to judge which catchments are outliers. This implies that as more signatures are used simultaneously in the conditioning of the hydrological model, the more gauged catchments should be used to define the likelihood function. As stressed by Gupta et al. (2014), large samples are of great importance to support statistical regionalization of uncertainty estimates and this is particularly the case if dependencies between information sources are to be specified.
While the work presented in this paper addresses a number of issues associated with model regionalization, it is important to highlight some additional areas for future research. An important source of uncertainty comes from model structure error (Gupta et al., 1998;Kuczera et al., 2006). The conditioning framework suggested here is independent of the selected model and, in principle, Figs. 5 and 6 could be created by using the model structure that is considered suitable for each catchment rather than using a model structure that we consider good for generalizing. Further research is needed to diagnose the relative importance of different model structures in various climate regimes and for different catchment characteristics (Clark et al., 2008;Hrachowitz et al., 2013). This is crucial to both identifying the most appropriate model structure for an ungauged location and quantifying the uncertainty in the model structure that should be integrated into the likelihood, thus allowing virtually any model choice. Similarly, other sources of uncertainty, namely observational error (e.g., rainfall error), should ideally be evaluated and integrated into the likelihood function. By accounting for all the important sources of uncertainty, further insight should be achieved into the information value of sets of signatures and the value of including their dependencies in the likelihood function.
Some of the results presented may be sensitive to the response signatures used. The relationship between value of signatures and catchment type remains ambiguous and an interesting aspect for posterior evaluation would be how the value of signatures depends on catchment type. Other aspects that are worth further research include whether a similar framework could be applied to different types of information source, e.g., can some discharge measurements be added into the model conditioning process? While Bulygina et al. (2012) suggests a framework capable of combining multiple sources of knowledge, namely physically based information, regionalized signatures and spot observations to identify parameters for models of ungauged catchments, the errors between them were assumed to be independent in their case study. A combination of the framework suggested by Bulygina et al. (2012) and the method proposed in this paper may be the way forward to maximizing the value of the available information within a framework of uncertainty reduction.

Conclusions
Uncertainty in streamflow estimation in ungauged catchments originates not only from the traditional sources of error generally identified in rainfall-runoff modeling (i.e., model structural, parameter, and data errors) but also by errors introduced by the transposition of information from data-rich areas and use of this information to condition model simulations. To identify which and how many types of signatures can usefully be included in model conditioning, it is critical to understand the effects of all these uncertainties. Moreover, when multiple signatures are used simultaneously to condition model simulations, inter-signature error dependencies may also introduce uncertainty and affect decisions about the value of information. While error and uncertainty analyses are quite common in regionalization studies, the question of how much information can be taken from a set of uncertain signatures and determining how many and which signatures should be used given their error dependencies has not been extensively studied.
The method suggested in this paper allows the specification of a signature error structure. A common reason for not including large numbers of signatures in regionalization studies is the potential for underestimation of uncertainty due to duplication of information. This study helps to justify the inclusion of larger sets of signatures in the regionalization procedure if their error correlations are formally accounted for and thus enables a more complete use of all available information. The results show that adding response signatures to constrain the hydrological model, while accounting for intersignature error correlations, can contribute to a stronger identification of the optimum parameter set when the error correlations between different sources of information are strong. Furthermore, the results show that assuming independency of errors does not result in significant deterioration in model performance, unless the error correlation is very strong. The results also show that the effect of error correlations is likely to be overwhelmed by model structure and observation errors. The method suggested here can therefore become more relevant if observational and structural errors are reduced. In addition, it is illustrated that using more signatures, with and without considering their error correlations, may lead to deterioration in performance. In our case, there were particular problems when adding the slope of the flow duration curve and/or the high pulse count. As this is likely to be specific to the rainfall-runoff model used, the selected performance criteria and the set of catchments, it is recommended that the misinformative information sources are identified as part of any regionalization study, in a similar manner as has been done here.

Appendix A: Model structure
A schematic representation of the model structure used in this study is shown in Fig. A1. The snowmelt routine is based on the degree-day method. Precipitation accumulates as snow or rain depending whether the air temperature is above or below a threshold temperature (T th ). When the air temperature is above the temperature threshold for snowmelt (T m ), snowmelt occurs at a rate that is proportional to the degree-day factor (DDF). The soil moisture storage component describes the water balance at the soil level. The PDM model uses a probability density function to represent changes in the catchment storage capacity, defined by the maximum soil moisture storage (c max ) -the maximum soil water storage capacity within the modeled elementand a shape parameter (b) that controls the degree of spatial variability of storage capacity over the catchment. Interception is not explicitly modeled. Transpiration and evaporation are lumped into a single term. The actual evapotranspiration (AE) is determined based on a relationship between evapotranspiration and soil moisture deficit (Moore, 2007). After evapotranspiration, the remaining available water is used to fill the soil moisture store. When effective rainfall is produced through overflow of the storage elements, excess water is passed to the routing stores. The routing module channels this water into two reservoirs, according to a fraction split coefficient (α). A proportion α of the water excess goes to the quick flow reservoir, controlled by the quick flow residence time (k q ) and (1 − α) of the water excess goes to the slow flow reservoir, controlled by the slow flow residence time (k s ). The streamflow at the catchment outlet is the sum of the outputs from each of these quick and slow flow reservoirs.
The parameter ranges (Table A1) are selected after Kollat et al. (2012) and based largely on the maximum range sampled from several recent studies, such that only sufficiently extreme values are ruled out.  When evaluating the impact of inter-signature error correlations on model parameter identification, results are assessed in terms of Bayes factor (Jeffreys, 1961). This form of assessment is preferred to the most commonly used QQ plots (Laio and Tamea, 2007), due to the particular nature of the problem under analysis. When a signature(s) (either regionalized for the case of an ungauged catchment, or derived from actual observations for the case of gauged catchments) is employed to reduce uncertainty beyond what is possible by defining the priors on model parameters, QQ plots may not be the most effective form of assessment. Although response signatures are measures of theoretically relevant system process behaviors Wagener et al., 2007), they reflect fragmented knowledge as different signatures capture different catchment processes. Consequently, the quantiles of observed flows are not conditioned to follow a uniform distribution, as QQ plots assess. Rather, quantiles of response signatures should follow this condition (for all catchments considered - Almeida et al., 2013). Therefore, an alternative performance measure that more adequately reflects the aim of this particular application (i.e., the reproduction of certain aspects of the hydrograph) is used. The Bayes factor BF is particularly relevant in the current context as it allows comparison of predictions based on two competing theories (Jeffreys, 1961). It is defined as the ratio between the marginal distributions of the data y for the two hypotheses (H 1 and H 2 ) being compared (Kass and Raftery, 1995): When the two hypotheses are equally likely a priori, the Bayes factor is the posterior odds in favor of H 1 (Kass and Raftery, 1995). In other words, a value of BF greater than 1 means that H 1 is more strongly supported by the data than H 2 . For example, a Bayes factor equal to 2 implies that H 1 is favored over H 2 with 2 : 1 odds given the evidence provided by the data. For a given hypothesis H , parameterized by model parameter set , the marginal density p(y|H ) represents the likelihood of the data and it is given by where (i) is the ith of N draws from p(.| ) and N is the size of the Monte Carlo sample (in this paper N is equal to 10 000).
In a "perfect model" study, data y are generated by a model with parameter set * , so that there is no model structural or observational error. This means that p(y| (i) , H ) is always equal to zero, except when (i) = * . Mathematically this is expressed as p(y (i) , H ) = δ (i) = * , where δ is the Dirac delta function. Therefore Eq. (B3) is equal to 1/N times p( (i) = * |H ) and the Bayes factor is given by While other choices can be made, two cases are considered in this paper. First, the two distributions in Eq. (B4) are posterior distributions, but with different assumptions about the likelihood functions. Given that we are particularly interested in evaluating the impact of considering the inter-signature error correlations versus ignoring them, H 1 will correspond to the joint likelihood defined such that inter-signature error correlations are considered, while H 2 corresponds to the likelihood when inter-signature error correlations are ignored. For the Bayes factor defined in this way, a value greater than 1 supports the idea that considering inter-signature error correlations contributes to an improved specification of the optimum parameter set. In this paper we are also interested in the value of adding/not adding more signatures in model conditioning, and so the Bayes factor will be also calculated for p(.|H 2 ) set to be the prior parameter distribution, and p(.|H 1 ) set to one of the derived parameter posteriors. For the Bayes factor defined in this way, a value greater than 1 supports the idea that additional sources of information contribute to a stronger identification of the optimum parameter set.