Regional water balance modelling using flow-duration curves with observational uncertainties

: Robust and reliable water-resource mapping in ungauged basins requires estimation of the uncertainties in the hydrologic model, the regionalisation method, and the observational data. In this study we investigated the use of regionalised flow-duration curves (FDCs) for constraining model predictive uncertainty, while accounting for all these uncertainty sources. A water balance model was applied to 36 basins in Central America using regionally and globally available precipitation, climate and discharge data that were screened for inconsistencies. A rating-curve analysis for 35 Honduran discharge stations was used to estimate discharge uncertainty for the region, and the consistency of the model forcing and evaluation data was analysed using two different screening methods. FDCs with uncertainty bounds were calculated for each basin, accounting for both discharge uncertainty and, in many cases, uncertainty stemming from the use of short time series, potentially not representative for the modelling period. These uncertain FDCs were then used to regionalise a FDC for each basin, treating it as ungauged in a cross-evaluation, and this regionalised FDC was used to constrain the uncertainty in the model predictions for the basin. There was a clear relationship between the performance of the local model calibration and the degree of data set consistency – with many basins with inconsistent data lacking behavioural simulations (i.e. simulations within predefined limits around the observed FDC) and the basins with the highest data set consistency also having the highest simulation reliability. For the basins where the regionalisation of the FDCs worked best, the uncertainty bounds for the regionalised simulations were only slightly wider than those for a local model calibration. The predicted uncertainty was greater for basins where the result of the FDC regionalisation was more uncertain, but the regionalised simulations still had a high reliability compared to the locally calibrated simulations and often encompassed them. Abstract. Robust and reliable water-resource mapping in ungauged basins requires estimation of the uncertainties in the hydrologic model, the regionalisation method, and the observational data. In this study we investigated the use of regionalised ﬂow-duration curves (FDCs) for constraining model predictive uncertainty, while accounting for all these uncertainty sources. A water balance model was applied to 36 basins in Central America using regionally and globally available precipitation, climate and discharge data that were screened for inconsistencies. A rating-curve 35

estimated by calibration to measured data in the first two cases whereas the last case requires a regionalisation procedure. Discharge data are non-existent, intermittent or nonavailable for many basins, which make Predictions in Ungauged Basins (PUB) an important prerequisite for comprehensive water-resource mapping (Bloeschl et al., 2013). However, estimating the response of an ungauged basin always involves some uncertainty, and one of the features of the PUB science plan was the development of methods to constrain that uncertainty (Hrachowitz et al., 2013;Sivapalan et al., 2003). In this study we addressed uncertainties in the observational data, the hydrological model parameterisation and the regionalisation method (based on regionalised flowduration curves, FDCs).
Conceptual water balance models have traditionally been regionalised by transferring parameter values from gauged to ungauged basins using some measure of hydrologic similarity or a regression with model parameter values as dependent variables and physical characteristics of the basins as independent variables (Seibert, 1999;Jakeman et al., 1992;Parajka et al., 2005;Xu, 2003). Such procedures are often limited by their assumption of model parameter independence and incomplete assessment of predictive uncertainty for gauged and ungauged basins (McIntyre et al., 2005;Bardossy, 2007;Buytaert and Beven, 2009).
Wagener and Montanari (2011) discuss a convergence of approaches for PUB in recent years where regionalisation is based on the expected functional behaviour of the ungauged watershed rather than the model and its parameters. Watershed behaviour has been quantified in the form of information or "signatures" derived from discharge or other types of data for model calibration in recent studies (Winsemius et al., 2009;Son and Sivapalan, 2007;Yu and Yang, 2000;Castiglioni et al., 2010;Westerberg et al., 2011b;Blazkova and Beven, 2009;Yadav et al., 2007). Many of these studies have been made within a set-theoretic approach for uncertainty estimation (e.g. Blazkova and Yadav et al., 2007;Winsemius et al., 2009), but Bayesian statistical approaches have also been used (e.g. Bulygina et al., 2009). The types of information that have been used include recession curves (Winsemius et al., 2009), slope of the FDC (Yilmaz et al., 2008;Yadav et al., 2007), base-flow index (Bulygina et al., 2009), spectral properties (Montanari and Toth, 2007) and flood discharge and snow-water equivalent frequency quantiles (Blazkova and ). Calibration approaches focused on matching hydrological signatures thus allow regionalisation to be performed directly on a wide range of hydrologic information, which is then used to constrain model parameters at ungauged sites. Yadav et al. (2007) regionalise constraints on expected watershed response behaviour in the UK and account for uncertainty in the regionalisation method. Kapangaziwiri et al. (2012) use regionalised signature constraints for runoff ratio (long-term ratio of runoff over precipitation) and slope of the FDC in combination with prior parameter estimation. Yu and Yang (2000) regionalise FDCs and calibrate their model against a performance measure based on specific exceedance percentages of the FDC using an optimisation algorithm.
Uncertainties in observational data affect the information content of data and derived signatures and it is therefore important to estimate and account for these uncertainties also in rainfall-runoff model regionalisation (Hrachowitz et al., 2013). However, as noted in the recent review by McMillan et al. (2012) no studies have so far explicitly investigated the role of observational uncertainties in this context. Dischargedata uncertainty can often be estimated based on ratingcurve analyses and has received increasing attention in recent years. Relative errors of around 10-20 % for medium to high flows, with higher ranges for low flows (50-100 %) and out-of-bank flows (40 %) are typically reported (McMillan et al., 2012). The main uncertainties relate to the approximation of the true stage-discharge relation by the rating curve. Discharge data are therefore especially uncertain in alluvial rivers with non-stationary stage-discharge relationships (Jalbert et al., 2011;Guerrero et al., 2012) and for flow conditions outside those used for constructing the rating curve. Model input data, especially precipitation, are also affected by sometimes substantial uncertainties that are more difficult to estimate and may have non-stationary characteristics, e.g. because of temporal changes in the number and quality of precipitation gauges (Westerberg et al., 2010;Brath et al., 2004). In some cases the observational uncertainties can be so large that the model forcing and evaluation data are physically inconsistent (Beven and Westerberg, 2011), e.g. because of inferred actual evaporation greater than potential evaporation (Kauffeldt et al., 2013) or runoff ratios greater than one . Such data inconsistencies will be "disinformative" in calibration of a model built on such assumptions. Data sets can be screened for inconsistencies prior to modelling (Kauffeldt et al., 2013;. However, identification of inconsistent data might prove difficult in cases where auxiliary information is not available or where disinformation is not easily identified. The aim of this study was to investigate whether regionalised FDCs could be used to reliably constrain water balance prediction uncertainty in ungauged basins, while estimating and analysing uncertainties in the observational data and regionalisation method as well as the model parameterisation. We used the FDC calibration method of Westerberg et al. (2011b) together with regionalised FDCs, therefore also testing this method for a wider range of basins than in the previous study. A variety of approaches have been used for regionalisation of FDCs (reviewed by Bloeschl et al., 2013), including the fitting of a frequency distribution (Castellarin et al., 2004) or a parametric equation (Yu et al., 2002) to the FDCs where the parameters are regionalised through regression with basin characteristics as independent variables. Holmes et al. (2002), building on the work of Burn (1990a, b), use a region-of-influence (ROI) approach to predict FDCs for the UK, with a dynamic definition of a Hydrol. Earth Syst. Sci., 18, 2993-3013, 2014 www.hydrol-earth-syst-sci.net/18/2993/2014/ ROI based on hydro-geologic similarity. While some studies explore uncertainty in the regionalised FDCs (e.g. Yu et al., 2002), data uncertainties in snow-model regionalisation (He et al., 2011) and rainfall and parameter uncertainties in modelling a poorly gauged urban basin (Sikorska et al., 2012), none has, to our knowledge, accounted for discharge and input-output data uncertainties in FDC or rainfall-runoff model regionalisation.
2 Study area and data

Study area
Central America is a region with a highly variable climate in both space and time despite its small extent (around 520 000 km 2 ). This has resulted in many water-related disasters; flooding with severe consequences such as inundations and destruction of important crops, promulgation of landslides, and loss of lives (Waylen and Laporte, 1999); and sustained droughts with severe consequences for hydro-power generation, water supply, irrigation and tourism (George et al., 1998). The characteristics of the complex regional climate have been well studied (e.g. Alfaro, 2002;Amador et al., 2006;Magaña et al., 1999;Enfield and Alfaro, 1999), but there are relatively few published hydrological modelling studies (but see e.g. Birkel et al., 2012;Westerberg et al., 2011b;Hidalgo et al., 2013). One reason for the scarcity of peer-reviewed literature is the difficulty of accessing comprehensive and good-quality hydro-meteorological data, and several studies point to the need for data quality control in this region (Aguilar et al., 2005;Westerberg et al., 2010;Flambard, 2003). The regional precipitation regime has a less marked seasonal variability on the Atlantic Coast compared to the Pacific Coast, where around 80 % of the precipitation falls in the rainy season from May to October-November (Portig, 1976). There is also a rainfall minimum, the socalled midsummer drought or veranillo in July-August on the Pacific coast, resulting in a bimodal regime with two peaks in June and September-October (Magaña et al., 1999). The spatiotemporal variability of precipitation is high, since precipitation is often convective, and associated with different mechanisms such as hurricanes, tropical storms and easterly waves in the atmosphere (Peña and Douglas, 2002). Temperature variability is low, with a greater diurnal than seasonal variation that is characteristic of the tropics. Climate variability on an inter-annual timescale is pronounced with large differences between wet and dry years; this variability is modulated by ENSO (El Niño-Southern Oscillation) and Atlantic sea-surface temperatures (Diaz et al., 2001;Enfield and Alfaro, 1999).

Model forcing data
The water balance model we used was driven with daily precipitation and daily potential evaporation data and calibrated and evaluated using daily discharge. Comprehensive local climate and discharge data sets covering the whole of Central America are difficult to obtain as observation data are either non-existing or cannot be made available with a reasonable effort. We therefore used globally or regionally available gridded meteorological data in this study. In early attempts with the regional model, potential evaporation calculated from ERA-Interim (Dee et al., 2011) climate variables at a 0.75 resolution and TRMM precipitation data (Huffman et al., 2007) with a spatial resolution of 0.25 were used for the period 1998-2009. However, this resulted in inconsistently simulated hydrographs in a few test basins since the TRMM precipitation did not compare well to local precipitation data. We therefore used daily precipitation data from the CRN073 data set (Magaña et al., 1999(Magaña et al., , 2003 at a spatial resolution of 0.5 that covers Central America, Mexico and the Caribbean region for the period 1958-2000. It is based on station data from the national weather services blended with satellite precipitation estimates for the oceans. The station data cover different time periods resulting in time-varying errors and some obvious inhomogeneities could be seen for many stations in the late 1990s, which may result from inclusion of malfunctioning automatic rain gauges. Since the temporal coverage of this data set did not overlap sufficiently with the potential evaporation calculated from the ERA-Interim data, we used the WATCH Forcing Data (WFD; Weedon et al., 2010) for the period 1958-2000 at a 0.5 spatial resolution. The WFD provide bias-corrected variables based on the ERA-40 reanalysis (Uppala et al., 2005) and we used specific humidity, atmospheric pressure, 2 m air temperature, 10 m wind speed, net short-wave radiation and net long-wave radiation to calculate potential evaporation using the Penman-Monteith FAO-56 equation (Allen et al., 1998). Specific humidity was first converted to relative humidity using a mixing-ratio method and 10 m wind speed was converted to 2 m wind speed using a logarithmic relationship (Allen et al., 1998). Prior to the calculation of potential evaporation, the quality of the WFD data was evaluated using daily weather data (Global Surface Summary of the Day, or GSOD) from the National Climatic Data Center (NCDC, 2011). The comparison was made for 18 half-degree cells spread over the study area, each of which contained at least one GSOD station with at least 5 years of daily data. The evaluation showed that WFD air temperature and the WFD-derived relative humidity were reasonably correlated with GSOD data although with average biases of 1.7 C and +6 % respectively. No significant correlation was found between WFD and GSOD wind-speed data, which is often the least sensitive variable for the estimation of potential evaporation on the daily scale. The WFD radiation components showed good agreement when compared with radiation components derived from sunshine hours recorded at the airport in Tegucigalpa, Honduras.
I. K. Westerberg et al.: Regional water balance modelling using flow-duration curves

Discharge data and basin delineation
The discharge data were obtained from the Global Runoff Data Centre (GRDC, 2010), which includes data from 91 discharge stations from all Central American countries except Belize. Daily data were only available for 77 stations of which none were located in Guatemala or El Salvador. In addition to these 77 stations, we included two Honduran stations (Paso La Ceiba on the Choluteca River and La Chinda on the Ulúa River) for which daily discharge and its uncertainty had been calculated using a time-variable rating curve in a fuzzy regression based on estimated uncertainties in the stage and discharge measurements (Westerberg et al., 2011a describe the calculation for the Paso La Ceiba basin). The total period for which there were data for at least one station was 1952-2009, with most of the data available for the period 1965-1994. We used official rating curves and stagedischarge measurements for another 35 stations in Honduras (see Sect. 4.2) to estimate discharge-data uncertainty for all GRDC stations in this study. Paso La Ceiba and La Chinda were included in this data set together with three of the GRDC stations; but discharge time series were not available for the remainder and they could therefore not be included in the rest of the study. The GRDC discharge data and the station locations were analysed to select stations with: (1) a sufficient number of years with data ( 5 years), (2) discharge that appeared to have sufficient quality from a visual inspection of the time series, (3) no detected influence from major dams in the basin during 1965-1994, and (4) a location that was not in the basin of another of the stations. Obvious outliers in the series (values orders of magnitudes too large) were removed. This procedure resulted in a set of 36 basins that could potentially be used for regionalisation. These basins ( Fig. 1) were delineated from the HydroSHEDS elevation data (Lehner et al., 2008), a gridded global hydrography data set with the highest resolution (3 00 ) publicly available at present. Upstream areas for HydroSHEDS pixels were derived by Gong et al. (2011). The basins were registered in the HydroSHEDS flow network overlaid with 0.25 ⇥ 0.25 cells. Only the parts of the boundary cells that were in the catchment, as delineated by the HydroSHEDS pixels, contributed discharge to the downstream gauging station. The GRDC station coordinates sometimes had a low precision and were adjusted to obtain basins with the right basin area using visual inspection of river locations from satellite images and/or coordinates of higher quality from local sources. We used a tolerance of 10 % difference between the area reported in the GRDC database and that obtained from the delineation together with a visual inspection of basin boundaries. Since a large part of Central America is mountainous, the greatest source of uncertainty in basin areas is likely the exact location of the stations and not the precision of the delineation algorithm. While all calculations were made on a depth per unit area basis, uncertainty in catchment area has a direct effect on the water balance calculation. Many discharge series had frequent gaps and the temporal availability of data at the stations varied substantially in the region, with most data available for Panama and the least for Costa Rica (Fig. 2).

Regional water balance model
We tested a simple lumped version of the water balance model WASMOD (Xu, 2002) that was previously used with good results in Honduras (Westerberg et al., 2011b), and we used the same model equations as in this earlier study. The model has four parameters (sampling ranges for uncertainty estimation given in parentheses); for actual evaporation ([0, 1] -), routing of fast flow ([0, 1] d 1 ), fast flow ([e 11 , 1] mm 1 ) and slow flow ([e 12 , 1] mm 0.5 d 1 ), see model equations in Table 1 in Westerberg et al. (2011b). These parameter intervals where used for all catchments since no information on parameter regionalisation was available. The 0.25 spatial resolution used with the TRMM and ERA-Interim data in the early model version was retained for the CRN073 and WFD data at a 0.5 scale since the centre locations of the CRN073 and WFD cells differed by 0.25 . The precipitation and evaporation data were interpolated to the higher resolution using nearest-neighbour interpolation. Monte Carlo simulations with 150 000 model runs were performed for each basin using uniformly sampled parameter values and a 4-year model warm-up period.

Method
This study was carried out in five steps ( Fig. 3): (1) observational uncertainties were first analysed and estimated through: (a) a screening for data set inconsistencies, (b) estimation of discharge uncertainty using a rating-curve analysis, and (c) estimation of the temporal uncertainty in FDCs  stemming from short time series; (2) regionalisation of FDCs; (3) local calibration of the water balance model using all available data (for comparison to the regionalised results); (4) regional modelling by constraining the uncertainty in basins treated as ungauged with the regionalised FDCs; and (5) posterior performance analysis of the results. We used the period 1965-1994 because of a comparably large availability of discharge data, and since the CRN073 precipitation data did not show the same occurrence of inhomogeneities as in the later period.

Screening for data inconsistencies
The consistency of the model input and evaluation data for each basin was evaluated for both long-term averages and the daily time-series scale. The long-term analysis used a Budyko curve (Budyko, 1974), which shows the relationship between the aridity index (long-term ratio of potential evaporation over precipitation) and the runoff ratio. The Budyko relation was plotted to identify stations with inconsistent data; either a runoff ratio greater than one or inferred actual evaporation greater than potential evaporation (grey areas in Fig. 4). The second quality check was the calculation of the correlation between the Current Precipitation Index (CPI; Smakhtin and Masse, 2000) and discharge for intermediate and high flows. The CPI is essentially the sum of the Antecedent Precipitation Index (API, Kohler and Linsley, 1951) and the precipitation on the current day and was calculated using a decay coefficient of K = 0.85 (the lowest value in the range quoted by Smakhtin and Masse) so that for day t the index is where R t was precipitation at day t. All basins with a correlation between CPI and discharge lower than 0.3 were identified in red on the Budyko curve (Fig. 4). It could be seen that these basins were mostly located in the inconsistent, grey areas in Fig which in this case might result from an uncertain basin area). The long-and short-term analyses thus gave similar results, which increased our confidence in the screening methods. There were four basins with unrealistic runoff ratios ( 1) and these were excluded leaving a final 32 basins for the regionalisation. The four excluded basins were all small basins in the mountainous parts of Costa Rica (maximum elevations between 1800 and 3000 m a.s.l.) and the precipitation data at a scale of 0.5 were likely not sufficiently representative for these basins. There were three basins with runoff ratios close to one as well as low correlations between discharge and CPI, which indicated that the data may be inconsistent, but these were kept for further study since such runoff-ratio values may be a result of discharge-data uncertainty. Two additional basins (Laja Blanca and Boca de Cupe) had combinations of aridity-index and runoff-ratio values that were far from the theoretical line but were not excluded ( Fig. 4 and Table A1 in Appendix A). Both basins were located in the easternmost part of Panama and had seemingly too high mean annual precipitation values, which might be a result of poor coverage of local precipitation stations in the CRN073 data set in that area. Mean annual precipitation 1971-2002 data presented by the Panamanian hydroelectric company show around 1000-2500 mm year 1 lower values (ETESA, 2007), which indicates a major source of uncertainty.

Estimation of discharge uncertainty
Stage-discharge measurements for the 35 discharge stations in Honduras (basin areas 110-21 400 km 2 , see also Sect. 2.3) were used to estimate the uncertainty in the discharge data as an upper and lower uncertainty bound. These 35 stations had rating curves that had been classified as having an acceptable or good quality in a previous Honduran water-resource project and the rating-curve equations reported in that project (Flambard, 2003) were used here. Rating-curve data from other countries were not available and it was assumed that the errors of the reported discharge data were similar to those in Honduras, i.e. that the Honduran stations were representative of measurement practices and conditions in the region. The discharge uncertainty could therefore be underestimated in cases where discharge data from the other countries include stations with poorer rating curves. Site-dependent uncertainties, e.g. related to a poor choice of measurement location, could not be quantified. For many stations there was considerable temporal variability in the rating measurements. For these stations a rating curve for a period with many measurements covering a large part of the flow range was selected. The residuals along each rating curve were then calculated as a percentage of the rating-curve-calculated discharge corresponding to the same stage measurement. To facilitate comparison between the residuals at different stations for different flow ranges, the discharge data were normalised by the mean discharge for each basin, using mean discharges reported in the Honduran national water balance study (Balairón Pérez et al., 2004) as we had no discharge time series data. The normalised discharge was grouped in frequency intervals limited by the percentiles 1, 5, 10, . . . , 95, 100; the 1 percentile was used instead of zero to exclude the very lowest flows that resulted in large relative residuals because of division by values close to zero. The 2.5 and 97.5 percentile values for the residuals belonging to each group of normalised discharges were calculated and used together with the median normalised discharge in each group to calculate the rating-curve uncertainty as a function of the normalised discharge. Exponential and power-law functions were fitted to the positive and negative residual percentiles respectively, and these functions were then used to estimate discharge uncertainty for all the GRDC stations in the regionalisation.
When mean daily discharge is calculated, it is important to realise that the actual observations might have been collected with different temporal resolutions. If stages are not registered continuously this can result in a commensurability error in daily discharge data especially if measurements are taken in between flow peaks. In Honduras, three measurements were taken during the day and in some cases more around flow peaks (Westerberg et al., 2011a;Flambard, 2003). The size of this error depends on the size and response time of the basin, with larger values for small basins and those that have a quick response. We used a value of Hydrol. Earth Syst. Sci., 18, 2993-3013, 2014 www.hydrol-earth-syst-sci.net/18/2993/2014/ 17 %, previously estimated using 15-minute-resolution stage data for the 1766 km 2 Paso La Ceiba basin in Honduras which responds quickly to rainfall and is comparably small (Westerberg et al., 2011a). The estimate can therefore be considered conservative for most of the stations in the regionalisation. In Costa Rica, stage was recorded continuously using limnigraphs; this error source was therefore excluded for these stations. For the other countries we had no information on the stage-recording method and the Honduran practice was assumed. An estimated error in the actual stage reading of 5% was also added to the uncertainty bounds, as previously used in the fuzzy rating-curve method by Westerberg et al. (2011a). The different uncertainties were assumed to be additive when calculating the daily discharge uncertainty. This is a simplification that may have resulted in overestimated uncertainty bounds.

Calculation of FDCs and temporal uncertainty from short time series
The discharge uncertainty estimates were used in the calculation and regionalisation of FDCs for all basins. The FDC, traditionally calculated for a period of record, describes the time duration that a certain flow is equalled or exceeded, and is a compact signature of runoff variability that has often been regionalised to ungauged basins (Bloeschl et al., 2013). Our regionalisation was based on data for the period 1965-1994 and in all the following analyses only years with at least 80% complete data (either calendar year or hydrological year depending on reported format) were used to avoid biases in the FDCs. First, evaluation points (EPs) were defined as specific exceedance percentages on the FDCs (using the same method as Westerberg et al., 2011b). The choice of EPs emphasises different aspects of the hydrograph; some previous studies have only used low-flow EPs for FDC regionalisation (e.g. 30-99 % exceedance by Castellarin et al., 2004), while others have used EPs covering a large part of the flow range from 0.1 to 99 % exceedance (Mohamoud, 2008). We did not include the very lowest or highest flows since these would likely be associated with the largest uncertainty, but used a volume-weighting method for calculating EPs (Westerberg et al., 2011b), which resulted in simulations with a good match to the whole flow range in this previous study. This means that EPs for each basin (local EPs) were determined so they were evenly spaced according to the area under the FDC (that equals the volume of water contributed by flows in a certain magnitude range) with increments of 5 %. This resulted in 19 EPs when excluding the maximum and minimum flows. The same EPs had to be used for all basins in the regionalisation and we chose these as the median EP values of all the different sites for each of the 19 EPs (regional EPs). The calibration using the at-site data for each basin was assessed using both the local and regional EPs to evaluate the effect of this difference. Uncertain FDCs consisting of the best-estimate specific discharge with uncertainty limits were calculated using the observed discharge data and their estimated uncertainty bounds. This calculation of the uncertainty in the FDC implied an assumption that the uncertainty may consist of non-stationary bias rather than a random error (see also Westerberg et al., 2011b). Varying temporal data availability (stations that do not have data covering the whole 30-year period used for the regionalisation, Fig. 2) results in added uncertainty to the calculated FDCs because the FDC based on the available data might differ from that for the entire period. We estimated this temporal uncertainty in the upper and lower uncertainty bounds as a function of the number of years with data using the nine stations that had long-term data (at least 80 % complete daily data in total in . Seven of these were located in Panama, one station in Honduras and one in Nicaragua. In terms of the variability of the FDCs, these stations covered most of the observed range of the normalised FDC discharge values. There were between 5 and 30 years of data at all the stations in the modelling period 1965-1994 and the uncertainty was estimated using all possible consecutive 5, 6, . . . , 29-year periods and 1000 randomly generated series of non-consecutive years. For the latter the order of the years was not maintained and individual years could not be selected more than once per realisation when the 5-29 year series were generated. The uncertainty was calculated from the realisations as the 2.5 and 97.5 percentiles of the percentage uncertainty in the specific discharge values at the upper/lower uncertainty bounds for the FDC EPs. The largest uncertainty from the two sampling schemes (random and consecutive) for each number of years with data was used. This temporal uncertainty was finally added to the FDC uncertainty bounds as a function of the number of years of discharge data at each station in 1965-1994.

Regionalisation of FDCs with uncertainty
These uncertain FDCs were regionalised using a weighted linear combination of the N most similar basins. We defined similarity based on a number of climate and basin characteristics which all had been found to be related to the FDC discharge values in a correlation analysis (Table 1). These characteristics were standardised by subtracting the mean and dividing by the standard deviation for all basins. The similarity was then calculated using the similarity measure defined by Burn (1990a, b) as the Euclidean distance in the space spanned by the standardised characteristics (Eq. 2): where d it is the Euclidean distance between the target basin t and basin i in the data pool, and X mi is the standardised characteristic m for basin i. While geographic distance was not included explicitly, differences in the characteristic QLONG essentially agree with geographic distance because of the  Figure 5. Regionalisation of uncertain FDCs using the general weighted mean operator for fuzzy numbers by Dubois and Prade (1980) for each EP. The individual membership functions for the fuzzy FDC discharge for each of the N surrounding stations were rescaled so that the area under the curves equalled the weights and then summed over the range of the support to a new membership function for the regionalised FDC (top panel). The 2.5, 50 and 97.5 percentiles of the cumulative distribution of the aggregated membership function were then used as lower, crisp and upper uncertainty bounds for the regionalised FDC (red circles). spatial distribution of the basins. The weights for each basin in the regionalisation were, similar to Holmes et al. (2002), calculated based on the relative inverse distances (Eq. 3): where w it is the weight of basin i in prediction of target basin t and N is the number of basins in the data pool. For calculating the predicted FDCs using these weights, the uncertain discharge at each EP was defined as a fuzzy number with a triangular membership function defined by the lower, crisp (best-estimate) and upper uncertainty limits. The uncertainty in the regionalisation was accounted for through a weighted aggregation of the fuzzy discharge at each EP using the N most similar basins. The general weighted mean operator for fuzzy numbers of Dubois and Prade (1980) was used to aggregate these membership functions to a new membership function; the individual membership functions were rescaled so that the area under the curves equalled the weights w it and then summed over the range of the support (Fig. 5). The 2.5, 50 and 97.5 percentiles of the cumulative distribution of the aggregated membership function were finally used as lower, crisp and upper uncertainty bounds for the regionalised FDC.
Hydrol. Earth Syst. Sci., 18, 2993-3013, 2014 www.hydrol-earth-syst-sci.net/18/2993/2014/ The FDC regionalisation was evaluated in a jack-knife cross-evaluation by excluding one basin at a time because the low number of stations did not allow for separate calibration and validation sets. The correspondence between the predicted and observed FDC discharge uncertainty bounds at the EPs was evaluated by two measures. The reliability of the predicted uncertainty bounds was calculated as the overlapping range between the observed and simulated uncertainty bounds as percentage of the observed range. The precision of the predicted uncertainty bounds was calculated as the overlapping range as percentage of the simulated range. These measures were previously used by Westerberg et al. (2011b) and Guerrero et al. (2013). They are similar to the ones used by Yadav et al. (2007) and Breinholt et al. (2012), but differ in that they incorporate an estimate of the uncertainty in the observed discharge data, where that estimate consists of an upper and lower bound that allows for non-stationary biases in between the bounds.

Local and regional water balance modelling
The simulated uncertainty from the Monte Carlo runs was first constrained (in a local calibration) using limits of acceptability in the extended Generalised Likelihood Uncertainty Estimation (GLUE) method (Beven, 2006) for the locally calculated FDCs (Westerberg et al., 2011b). This was done both for the local EPs and the regional (median) EPs used in the regionalisation, using the discharge data for each station in 1965-1994. Behavioural simulations were required to be within the limits of acceptability defined from the discharge-data uncertainty at each of the 19 EPs. Then the simulations were constrained with the regionalised FDCs. In both cases an informal likelihood was calculated in the same way as Westerberg et al. (2011b), using the sum of a triangular weighting at each EP of the simulated value relative to the observed data and its limits of acceptability. Simulations with correlation in deviations across successive EPs then obtain a lower weight but can still be behavioural if they are inside all limits of acceptability, i.e. a systematically underor overestimated FDC for (part of) the flow range can still be behavioural but get a lower weight. The simulated uncertainty bounds were calculated at each time step as the 2.5 and 97.5 percentiles of the likelihood-weighted distribution of the simulated discharge of all behavioural parameter value sets.

Posterior performance analysis
The resulting simulated uncertainty bounds were analysed, as with the FDC regionalisation, by calculating two different model diagnostics that assess the similarity between the uncertainty bounds for the simulated and observed discharge. Reliability was in this case defined as the percentage of time that the simulated and observed uncertain intervals overlapped, and precision was in the same way as for the FDC regionalisation the overlapping range expressed as a percentage of the simulated range, but here calculated as the average value for the number of days with observations. All the model diagnostics were calculated for low, intermediate and high flows separately. Low flows were defined as flows smaller than the median flow, high flows as flows that were exceeded 1 % of the time, and intermediate flows were all flows in between these limits.

Estimation of discharge uncertainty
The analysis of discharge uncertainty for the 35 Honduran stations showed that five stations had most medium to highflow residuals in the range ±10 % of the discharge calculated from the official curves. The remainder had larger deviations and the 2.5 and 97.5 percentiles of the distributions were around ±25 %, with larger percentage uncertainties for low flows (Fig. 6). Underestimation was larger than overestimation and there were sometimes poor rating-curve fits to the lowest measurements. For some stations the average residual values varied with flow as a result of poorly fitted rating curves. The exponential and power-law functions fitted to the positive and negative residual percentiles respectively fitted well to the data with adjusted R 2 values of 0.80 and 0.98 (Fig. 6). Uncertainty values for normalised discharges smaller/larger than the smallest/largest point used in the fitting were set to the smallest/largest value when these functions were used to calculate the discharge uncertainties for the GRDC stations. The final calculated uncertainty in discharge after the stage and temporal commensurability error had been added varied between 266 and +64 % of the crisp discharge for the low-flow range and between 52% and +45 % for the high-flow range, where negative (positive) values denote underestimation (overestimation) as in Fig. 6. The uncertainty ranges for the lowest flows were larger than the previously calculated discharge-uncertainty limits at Paso La Ceiba (Westerberg et al., 2011a) and La Chinda as an effect of larger uncertainty in the fitting of some official rating curves. The medium to high flow range was almost identical to that for Paso La Ceiba but around 5 % larger in this calculation than that for La Chinda where the non-stationarity in the stage-discharge relationship was less pronounced compared to at Paso La Ceiba.

Calculation and regionalisation of FDCs with uncertainty
The added uncertainty to the FDC discharge as a result of time series shorter than the 30-year modelling period varied in the range of 3-45 % (4-33 %) for the upper (lower) uncertainty bound (for time series with 5-29 years of data). This temporal uncertainty was added to the uncertainty bounds for the FDC discharge values for the stations with incomplete time series data before the regionalisation. The FDCs Functions were fitted to the 2.5 and 97.5 percentiles against the median normalised discharge in each group respectively to calculate rating-curve uncertainty as a function of the normalised discharge. The residuals were calculated as rating-curve discharge minus observed discharge as a percentage of the rating-curve discharge and the plot excludes a few smaller and larger residuals to improve the visibility for the main flow.
showed great variability in the region; normalised discharge (by mean discharge) varied in the range 3.8-27 (0.05-0.59) for the lowest (highest) regional EP at an exceedance percentage of 0.52 % (75 %). The number of surrounding basins to be included in the FDC regionalisation was chosen as eight as a trade-off between increase in reliability and decrease in precision (Fig. 7). In 12 of the 32 basins the regionalised FDCs encompassed the observed FDCs (reliability = 100 % for all EPs). At some of these basins (e.g. nos. 5, 12, 18, 23 and 24, Fig. 7) there were also high precision values. There were six stations where the minimum reliability was less than 50 % (Fig. 7). Observations from these stations are plotted in the upper and lower extremes of the Budyko curve and include the most extreme FDCs in the region in terms of shape and magnitude of specific discharge, two of these stations had been identified as having likely disinformative data. The poorer performance for the most extreme FDCs was not surprising given that the linear weighted combination method used for regionalisation makes it difficult to predict the most extreme FDC shapes. There was a clear relation between runoff ratio and precision (not shown), with higher precision in humid basins (except for Guatuso, no. 1, which had an inconsistent runoff ratio of 1.05 and a greatly underestimated regionalised FDC at all EPs). Examples of regionalised FDCs for four stations, including one of the best (San Francisco, no. 24) and one of the worst (Tamarindo, no. 16), are given in Fig. 8.

Water balance modelling using local calibration
Local calibration of the model parameters to the observed FDCs resulted in behavioural simulations in 26 of the 32 basins using the regional EPs, of which basin no. 17 had no behavioural simulations when using the local EPs Hydrol. Earth Syst. Sci., 18, 2993-3013 Observed discharge Simulated discharge Figure 10. Precipitation, observed and simulated discharge (mm day 1 ) at Bratsi, station no. 10 (top panel), one of the stations that had a poor correlation between observed discharge and CPI (0.12), and at Paiwas, station no. 18 (bottom panel) that had a high correlation between observed discharge and CPI (0.60). The simulated discharge was calibrated using FDCs calculated from local observed discharge and using the regional EPs. (Fig. 9). The basins with no behavioural simulations included three basins in northern Costa Rica (no. 2-4) that had runoff ratios of different magnitudes but approximately the same mean annual precipitation (Table A1 in Appendix A), as well as the two Panamanian stations (no. 27 and 28) that deviated substantially from the Budyko curve (Fig. 4). The differences in the reliability and precision between the simulations calibrated using local and regional EPs were small (Fig. 9). There were 13 basins for the regional EP calibration with reliability 50 % for low, intermediate and high flows. Unrepresentative precipitation data likely had an important contribution to the poorer performance in the other basins since a visual inspection showed obvious differences between basins with lower and higher high-flow reliability (Fig. 10). To further test this hypothesis, the correlation between the observed discharge for intermediate and high flows and CPI was plotted against the high-flow reliability for the local calibration with regional EPs (Fig. 11), and it could be seen that the basins with poor performance also had a poor agreement between CPI and observed discharge. For some basins (Fig. 10, bottom panel) there appeared to be a frequent timing difference of 1 day for the flow peaks, which may be related to commensurability uncertainty between precipitation and discharge stemming from precipitation measurements taken in the morning but discharge representing daily

Regional water balance modelling
The reliability of the regionalised simulations was comparable to that of the local calibration, with generally higher values for the regionalisation with some exceptions for intermediate (Guatuso, basin no. 1, see below) and high flows (Fig. 12a-c). The precision values were often lower, in particular for low and intermediate flows; this was in general related to the wider uncertainty bounds for the regionalised simulations (as a consequence of the greater uncertainty in the regionalised FDCs). The predicted uncertainty bounds for the regionalised simulations always overlapped with the locally calibrated simulation bounds (except for Guatuso, basin no. 1, which had an inconsistent runoff ratio of 1.05 and a regionalised FDC that was greatly underestimated), and also encompassed them for a large part of the time for most basins (Fig. 12d, 100 % overlap as percentage of the locally calibrated bounds means that they are encompassed). The overlap in percentage of the regional bounds (with a low value indicating relatively wide regional bounds) showed a similar pattern to the precision of the FDC regionalisation. There was also a clear relation for the aridity index with relatively wider regionalised bounds in more arid basins (Fig. 12e), which appears to be a result of relatively greater uncertainty for regionalised FDCs in arid basins in combination with narrow locally calibrated bounds as a result of few behavioural simulations in the most arid basins. Similar results with greater uncertainty in regionalisation in arid basins were also found by Bloeschl et al. (2013).
There was almost no difference between the locally and regionally simulated hydrographs where the regionalisation of the FDCs worked best (e.g. Camaron, basin no. 22, Figs. 12 and 13). Where the regionalised FDCs had wider uncertainty bounds, the predicted simulation uncertainty was greater than that from the local calibration (e.g. Balsa, no. 6 and Agua Caliente, no. 12, Fig. 13). In such cases additional regionalised information, e.g. recession behaviour (Winsemius et al., 2009), might provide additional constraints. For basins where the regionalisation worked less well, such as at Guanas (no. 14, that, except for Guatuso, had the poorest regionalisation results of the stations with behavioural local simulations) there was, apart from wide uncertainty bounds, also a systematic shift to the uncertainty bound for the less well regionalised part of the flow range (here high flows) but still a high degree of overlap with the locally calibrated uncertainty bounds (Figs. 12 and 13). There were six basins with Reg. Figure 13. Precipitation (dark blue), comparison of simulated uncertainty bounds from regionalisation (red) and local calibration (black) with observed discharge (light blue) at Camaron (no. 22 with one of the best FDC regionalisation results), Guanas (no. 14 that, except for Guatuso, had the poorest FDC regionalisation when there were behavioural local simulations), Balsa (no. 6 with high FDC regionalisation uncertainty), Agua Caliente (no. 12 with a good FDC regionalisation but poorer data consistency and local calibration), and Guardia (no. 2 with inconsistent data and no local behavioural simulations). The regionalised (red) and observed uncertain (blue) FDCs are shown in log-log space (right in each plot) together with the correlation between discharge and CPI for intermediate and high flows. The observed FDCs are plotted as used in the local calibration, i.e. without added temporal uncertainty.
Hydrol. Earth Syst. Sci., 18, 2993-3013, 2014 www.hydrol-earth-syst-sci.net/18/2993/2014/ behavioural simulations when the wider regionalised FDCs were used to constrain the simulations but not when using the local data (e.g. Guardia, no. 2). In all these cases the data seemed inconsistent when inspecting the time series of discharge and precipitation.

Discussion and concluding remarks
This study has explored a method for predictions in ungauged basins based on FDCs that accounts for uncertainty in the observed data, the FDC regionalisation method and the model parameterisation. This method is novel in for the first time explicitly incorporating observational uncertainties in rainfall-runoff model regionalisation; uncertainty in discharge from rating-curve analyses, uncertainties stemming from the use of short discharge time series, and analyses of uncertainties stemming from disinformative data. It also addresses the need for reliable predictions in ungauged basins in developing regions, where data limitations are often important, as highlighted by Hrachowitz et al. (2013).

Discharge data uncertainty
Discharge-data uncertainty can often be an important source of error (McMillan et al., 2010), which to our knowledge has not previously been accounted for in regionalisation. We estimated the uncertainty in the GRDC discharge data using 35 rating stations in Honduras, with the assumption that measurement practices and rating-curve derivation were similar in the rest of the region. The different uncertainties in the discharge-uncertainty estimation were assumed to be additive which may have resulted in overestimated uncertainties. It was, however, likely a conservative estimate that reflected the lack of information about site-specific conditions. The estimated discharge uncertainty was similar but somewhat higher to that reported in the review by McMillan et al. (2012), with the largest uncertainties for low flows for many stations as a result of poor rating-curve fits in combination with higher natural variability and relative measurement uncertainties for low flows (Pelletier, 1988). Patterns could be seen for some of the Honduran discharge stations in the variation of the residuals as a function of normalised flow as a result of poor rating-curve fits. An assumption of errors with a simple structure within the bounds was therefore not appropriate when the estimated uncertainty bounds were used for the GRDC discharge station data in model evaluation, but the limits-of-acceptability approach we used allowed for non-stationary biases within the observed uncertainty bounds.

Precipitation data uncertainty
Overall, precipitation data quality was probably the most limiting factor. The WFD variables used to calculate potential evaporation differed somewhat to local station data, but precipitation data quality is more important than evaporation data quality in many cases (Paturel et al., 1995). Because of lack of information about the magnitude of the precipitation errors, we only treated this uncertainty source implicitly through data-screening analyses and visual inspections of the time series. The CRN073 precipitation data were the best available gridded data for the Central American region. However, because of the high spatial variability of precipitation (Alfaro, 2002;Magaña et al., 1999), the resolution of the CRN073 data was not sufficient for many basins -in particular those located in mountainous regions where runoff ratios greater than one were found likely because of underestimated precipitation. In such circumstances no hydrological model that assumes mass balance can be expected to give good predictions . There were also noticeable time-variable errors in the precipitation data set as a result of changes in station density and/or measurement equipment.

Detection and impact of data set inconsistencies
The two methods that were used to screen the data set for inconsistencies between the runoff and climate data gave mostly similar results. The disinformative outliers on the Budyko curve resulted from runoff ratios much greater than one (Sect. 6.1.2) and from some basins with overestimated precipitation compared to higher-quality local information.
Most basins with low discharge-CPI correlation were outliers on the Budyko curve, with often obvious mismatches between the precipitation and discharge data time series, and there was a strong relation between the discharge-CPI correlation and high-flow reliability in the local calibration. This suggests that this method was useful for identifying inconsistent data in this region, and we recommend the use of data-screening methods in future regional studies. It should be remembered, however, that there may be shorter informative periods even if long-term averages are inconsistent, and matching peaks in precipitation and discharge should not be expected under all circumstances. Event-based runoff ratios may be useful to identify data with inconsistent events in basins with low baseflow but require sub-daily data in most basins . Identification of disinformative data prior to modelling may not always be possible, and another method for dealing with such data inconsistencies is therefore to use modelevaluation criteria that are robust to moderate disinformation (Beven and Westerberg, 2011). Calibration focused on hydrological signatures, such as FDCs, could be expected to be more robust to moderate disinformation, such as the presence of a few events with inconsistent inputs and outputs (Westerberg et al., 2011b). Our study combined these two methods for addressing the significant data uncertainties in studies of this type, and both were necessary considering that all disinformation could not be identified in the data screening and that the calibration method in some cases resulted in behavioural simulations even with highly disinformative input data. The latter cases can be detected in gauged catchments, but calls for discharge-data independent datascreening methods and/or the use of multiple signature constraints in ungauged catchments. Further research is needed to investigate the effects of disinformation on signature calibration and how best to estimate the effect of observational uncertainties on the values of different types of signatures. The choice of an appropriate likelihood in the face of the errors that affect hydrological inference has been discussed in great detail Clark et al., 2012). In this study we found a high presence of non-stationary errors in the model input and evaluation data with little information about the magnitudes. This made the informal likelihood function we used a suitable choice since it allowed implicitly for some of these errors without requiring an error model to statistically represent the error characteristics.

The use of FDCs for regional water balance modelling
The regionalised simulations were generally reliable compared to local simulations in the basins where behavioural simulations were found in local calibration. In the basins where the regionalisation of the FDCs worked best there was little difference between the regionalised and local simulations. Where it worked less well the predicted uncertainty was sometimes much wider than the local uncertainty bounds and the most extreme FDC shapes were less well predicted, leading to some systematic shifts to the uncertainty bounds compared to the local calibrations in those cases. Greater uncertainty in the regionalised compared to the local FDCs reduced their information content for constraining model predictive uncertainty in ungauged basins. This was especially important in the presence of disinformative input data, where simulations within the regionalised FDC uncertainty bounds were found in some basins but not within the locally estimated FDC bounds that were narrower. In local model calibration, posterior-performance analyses are useful to check whether the chosen signatures (e.g. the FDC) provide sufficient constraints for the particular modelling application (type of model structure, basin, climate, etc.) or whether additional information is needed to constrain the simulations (Westerberg et al., 2011b). However, in regionalisation such analyses cannot be made for the ungauged catchments and it would be advisable to always apply several different regionalised signatures (Yadav et al., 2007;Castiglioni et al., 2010) to ensure greater robustness of the predictions -especially in the presence of completely disinformative input data. It would, however, still be important to perform data screening and posterior performance analyses in the nearby gauged basins since similar behaviour, uncertainties and conditions might be expected. The use of other signatures requires further investigation of how observational uncertainties affect the uncertainty in different types of signatures and their regionalisation.
The method for FDC calibration developed by Westerberg et al. (2011b) was here tested for a wider range of basins and resulted in a high reliability in the local calibration in basins where the data screening indicated that the data had good quality. An assessment of the performance for different hydrograph aspects and of different ways of choosing the EPs on the FDCs, as in the previous study, was not made here but would be useful to assess the performance of the FDC calibration for the wider range of hydrological conditions in this study. It could be seen that in arid basins the discharge was often more constrained in recession periods compared to in humid basins (which could be a result of the more non-linear FDC shape), indicating that recession information (e.g. Winsemius et al., 2009;McMillan et al., 2013) might be useful to further constrain the uncertainty bounds in the latter case. Further conclusions on the strengths and weaknesses of the FDC calibration for this wider range of basins could also be drawn through the use of different model structures, e.g. different conceptualisations of groundwater storage and runoff generation in groundwater-dominated basins. The parsimonious model structure used here might be overly simple in many cases even if it showed good results previously at Paso La Ceiba (Westerberg et al., 2011b). Compared to those results, the average reliability was lower here (86 %, compared to 95 % previously), with the main difference between the simulations being the precipitation data. The CRN073 precipitation used here had a correlation of only 0.77 with the locally interpolated precipitation in that study. It might also be possible to estimate the prior parameter ranges based on catchment and climate characteristics, however such an analysis was outside the scope of this paper and would also be affected by disinformation in the regionalisation data.

Regionalisation of FDCs with uncertainty
The FDC regionalisation method was based on a fuzzy aggregation of the FDCs from the hydrologically most similar basins, which accounted for uncertainty in the data as well as the regionalisation relation. It resulted in generally reliable results except for the most extreme FDC shapes. This was because of the weighted combination of the FDCs in combination with relatively few gauged stations for a quite heterogeneous region. We found it important to include climate as well as basin characteristics in the definition of hydrologic similarity since rainfall is a dominating factor in shaping the hydrological regime in Central America (George et al., 1998;Waylen and Laporte, 1999). The representativeness of the climate data likely affected the calculation of hydrologic similarity and therefore the FDC regionalisation. The different Hydrol. Earth Syst. Sci., 18, 2993-3013, 2014 www.hydrol-earth-syst-sci.net/18/2993/2014/ lengths of the discharge series resulted in a temporal uncertainty that we estimated as a function of the number of years with data. The FDC regionalisation approach we used was similar to that of Holmes et al. (2002) who used a much larger set of basins. The effect of the chosen number of hydrologically similar catchments was evaluated in a cross-evaluation, and we recommend performing this type of analysis to inform the choice. Further conclusions about the advantages and disadvantages of the regionalisation method could be drawn by testing it in other regions with better-quality data.

Concluding remarks
The FDC contains important information about hydrological behaviour that is needed for most water balance investigations in ungauged basins, and it is therefore of interest on its own as well as a basic regionalised model constraint in many cases. Further research will be required to reveal what additional regionalised information is needed to ensure robust predictions under different circumstances and how uncertainties in such additional regionalised information can be reliably estimated. This study provides a strong demonstration of the need to assess the quality of the data used to inform the estimation of ungauged basin responses in a regionalisation study. The potential for non-stationary epistemic errors and hydrological inconsistencies means that the regionalisation might be subject to significant uncertainties that are difficult to estimate by standard statistical methods. This implies that deterministic predictions might be misleading, and that explicit recognition of uncertainty should be used in decision making. Where the estimates of uncertainty are particularly high, further data collection might be valuable in making decisions for water-resource management.
I. K. Westerberg et al.: Regional water balance modelling using flow-duration curves Appendix A: Discharge stations and basin characteristics Table A1. Discharge stations and basin characteristics, indices calculated for 1965-1994 except for RR and E POT /P that were calculated for the period of discharge record (i.e. the same as in the Budyko plot, Fig. 4 1 RElev is the elevation range in metres. 2 E POT /P is the aridity index, where E POT is potential evaporation and P is precipitation, here calculated for the period with discharge data at each station. 3 RR is the runoff ratio, total runoff divided by total precipitation calculated for the period with discharge data at each station. 4 MAP is the mean annual precipitation. 5 RL5 is the average number of days per year with precipitation below 5 mm. 6 NYr is the number of years with 80 % complete data in a year or hydrological year in 1965-1994. Hydrol. Earth Syst. Sci., 18, 2993-3013, 2014 www.hydrol-earth-syst-sci.net/18/2993/2014/