Uncertainty in hydrological signatures. Hydrology and Earth System Sciences.

Abstract. Information about rainfall–runoff processes is essential for hydrological analyses, modelling and water-management applications. A hydrological, or diagnostic, signature quantifies such information from observed data as an index value. Signatures are widely used, e.g. for catchment classification, model calibration and change detection. Uncertainties in the observed data – including measurement inaccuracy and representativeness as well as errors relating to data management – propagate to the signature values and reduce their information content. Subjective choices in the calculation method are a further source of uncertainty. We review the uncertainties relevant to different signatures based on rainfall and flow data. We propose a generally applicable method to calculate these uncertainties based on Monte Carlo sampling and demonstrate it in two catchments for common signatures, including rainfall–runoff thresholds, recession analysis and basic descriptive signatures of flow distribution and dynamics. Our intention is to contribute to awareness and knowledge of signature uncertainty, including typical sources, magnitudes and methods for its assessment. We found that the uncertainties were often large (i.e. typical intervals of ±10–40 % relative uncertainty).


Hydrological signatures and observational uncertainty
Information about rainfall-runoff processes in a catchment is essential for hydrological analyses, modelling and water-management applications. Such information derived as an index value from observed data series (rainfall, flow and/or other variables) is known as a hydrological or diagnostic signature and is widely used in both hydrology and ecohydrology (Olden and Poff, 2003). The reliability of signature values depends on uncertainties in the data and calculation method, and some signatures may be particularly susceptible to uncertainty. Signature uncertainties have so far received little attention in the literature; therefore, guidance on how to assess uncertainty and typical uncertainty magnitudes would be valuable.
Signatures are used to identify dominant processes and to determine the strength, speed and spatiotemporal variability of the rainfall-runoff response. Common signatures describe the flow regime (e.g. flow duration curve, FDC, and recession characteristics) and the water balance (e.g. runoff ratio and catchment elasticity; Harman et al., 2011). Field studies have identified drivers of catchment function, such as a threshold response to antecedent wetness (Graham et al., 2010b; Penna et al., 2011; Tromp-van Meerveld and McDonnell, 2006a), which have been captured as signatures (McMillan et al., 2014). Signatures often incorporate multiple data types, including soft data (Seibert and McDonnell, 2002; Winsemius et al., 2009).
Published by Copernicus Publications on behalf of the European Geosciences Union. I. K. Westerberg and H. K. McMillan: Uncertainty in hydrological signatures.

There is a long history of using flow signatures in ecohydrology to assess instream habitat, including the seasonal streamflow pattern and the timing, frequency and duration of extreme flows (e.g. Jowett and Duncan, 1990). Signatures are used to detect hydrological change; e.g. Archer and Newson (2002) used flow signatures to assess the impacts of upland afforestation and drainage. Signatures can define hydrological similarity between catchments (McDonnell and Woods, 2004; Sawicz et al., 2011; Wagener et al., 2007) and assist prediction in ungauged basins (Blöschl et al., 2013). Model calibration criteria using signatures are useful because they preserve information in measured data (Gupta et al., 2008; Refsgaard and Knudsen, 1996; Sugawara, 1979). Signatures used in calibration include the FDC (Westerberg et al., 2011), flow entropy (Pechlivanidis et al., 2012), the spectral density function (Montanari and Toth, 2007), and combinations of multiple signatures (Pokhrel et al., 2012). By using signatures that target individual modelling decisions, model components can be tested for compatibility with observed data (Coxon et al., 2013; Hrachowitz et al., 2014; Kavetski and Fenicia, 2011; Li and Sivapalan, 2011; McMillan et al., 2011). Hydrological signatures have been regionalised to ungauged basins and then used to constrain a model for the ungauged basin (Kapangaziwiri et al., 2012; Westerberg et al., 2014; Yadav et al., 2007).
Some authors have considered the effect of data uncertainty on hydrological signatures, particularly in model calibration. Blazkova and Beven (2009) incorporate uncertainties in signatures used as limits of acceptability to constrain hydrological models. Juston et al. (2014) investigate the impact of rating-curve uncertainty on FDCs and change detection for a Kenyan basin. They show that uncertainty in extrapolated high flows creates significant uncertainty in the FDC and the total annual flow. Kennard et al. (2010) discuss the uncertainties affecting ecohydrological flow signatures arising from measurement error, data retrieval and preprocessing, data quality, and hydrologic metric estimation.

Uncertainty considerations relevant for hydrological signatures
We present a short description of data uncertainties relevant to hydrological signatures (see McMillan et al., 2012, for a longer review). In general, data uncertainties stem from (1) measurement uncertainty (e.g. instrument inaccuracy or malfunction), (2) measurement representativeness for the variable under study (e.g. point rainfall compared to catchment average rainfall), and (3) data management uncertainty (e.g. data entry errors, filling of missing values or station coordinate errors). Errors from data management, equipment malfunction or human error can often be detected and corrected in quality control (Bengtsson and Milloti, 2010; Eischeid et al., 1995; Viney and Bates, 2004; Westerberg et al., 2010). But some data errors, e.g. poorly calibrated or off-level rain gauges, are difficult to correct post hoc (Sieck et al., 2007). The calculation of some signatures requires subjective decisions that introduce extra uncertainty, for example storm identification criteria, the data time step, and whether to split the data by month or season (e.g. Stoelzle et al., 2013). Each uncertainty component requires an error model that specifies the error distribution and dependencies (e.g. errors may be heteroscedastic and/or autocorrelated). It is essential that the error model accurately reflects the uncertainty, rather than simply adding random noise, as hydrological uncertainties are typically highly structured. Some measurement uncertainties can be estimated by repeated sampling, whereas representativeness errors are difficult to estimate. The latter are often epistemic, stemming from a lack of knowledge at unmeasured locations or time periods (e.g. rainfall distant from rain gauges). The most appropriate method to assess data uncertainty depends on the information available and the hydrologist's knowledge of the catchment. For example, the choice of likelihood function may depend on characteristics of the data errors and the measurement site.
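As an illustration of the difference between structured errors and plain white noise, the following sketch generates a multiplicative error series that is both heteroscedastic (it scales with flow) and autocorrelated (AR(1)). This is a hypothetical example with illustrative parameter values, not the paper's specific error model:

```python
import numpy as np

def structured_error_series(flow, rel_sd=0.1, rho=0.8, seed=0):
    """Sample one realisation of a structured observation error:
    multiplicative (scales with flow, hence heteroscedastic in absolute
    terms) and AR(1)-autocorrelated, instead of independent white noise.
    rel_sd and rho are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n = len(flow)
    eps = np.empty(n)
    # innovation sd chosen so the marginal std of the AR(1) process is rel_sd
    innov_sd = rel_sd * np.sqrt(1.0 - rho**2)
    eps[0] = rng.normal(0.0, rel_sd)
    for t in range(1, n):
        eps[t] = rho * eps[t - 1] + rng.normal(0.0, innov_sd)
    return flow * (1.0 + eps)  # one perturbed realisation of the series

flow = np.linspace(1.0, 10.0, 1000)
sample = structured_error_series(flow, rel_sd=0.1, rho=0.8)
```

Neighbouring time steps then share part of their error, which matters for signatures that depend on flow dynamics rather than on the flow distribution alone.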
Uncertainty estimation depends on the perceptual understanding of the uncertainty sources as well as the studied system, and there is potential for a false sense of certainty about uncertainty where strong error-model assumptions are made (Brown, 2004). Juston et al. (2014) show how the interpretation of uncertainties as random vs. systematic affects hydrologic change detection. This paper focuses on signature uncertainty rather than data uncertainty; we stress that alternative data uncertainty assessment methods could be used where the perceptual understanding of the uncertainty sources is different.
The objectives of this paper were (1) to contribute to the community's awareness and knowledge of observational uncertainty in hydrologic signatures, (2) to propose a general method for estimating signature uncertainty, and (3) to demonstrate how typical uncertainty estimates translate to magnitude and distribution of signature uncertainty in two example catchments.

Catchments and data
We used two catchments: the Brue catchment in the UK, and the Mahurangi catchment in New Zealand. This enabled us to compare signature uncertainties in different locations and with different uncertainty sources. Both catchments have excellent rain-gauge networks that allowed us to quantify uncertainty in rainfall data, and there is some existing knowledge of the dominant hydrological processes.

The Mahurangi catchment
The Mahurangi is a 50 km² catchment in the North Island of New Zealand. It has a warm and humid climate, with a mean annual rainfall of 1600 mm yr⁻¹. The catchment has hills and gently rolling lowlands, and land use is a mixture of pasture, native forest and pine plantation. The soils are clay loams, less than 1 m deep. Extensive data sets of rainfall and flow were collected during the Mahurangi River Variability Experiment 1997–2001 (Woods et al., 2001). We used hourly data from the 13 tipping-bucket rain gauges and the catchment outlet flow gauge for 1 January 1998–31 December 2000 (Fig. 1). Missing rainfall values were available from a previous study that had infilled them using linear correlation with a nearby site. The flow gauge has a two-part triangular weir for low to medium flows, and a rated section with confining wooded banks for high flows. During the study period, the maximum recorded stage was 3.8 m, but the highest gauged stage is 2.7 m.

The Brue catchment
The predominantly rural 135 km² Brue catchment in southwest England has low grassland hills of up to 300 m a.s.l. (Fig. 2). Clay soils overlie alternating bands of permeable and impermeable rocks. An extensive precipitation data set consisting of 49 tipping-bucket rain gauges and radar data with 15 min resolution was created by the HYREX (Hydrological Radar Experiment) project (Wood et al., 2000). We used the data from 1 January 1994 to 31 December 1997, with a mean annual precipitation of 820 mm yr⁻¹. The extensive quality control described by Wood et al. (2000) included analyses of monthly cumulative rainfall totals and correlation analyses of timing errors. The detected errors included those caused by instrument malfunctions, such as funnels blocked by debris and damage to electrical cables by mice. Substantial periods of missing data thus remained after quality control (Fig. 2), even for these carefully maintained rain gauges. We interpolated the missing precipitation values with inverse-distance weighting to obtain a complete data set for the subsampling analysis.
The Lovington discharge station has a Crump-profile weir for low flows and a rated section above 0.6 m. The whole stage range was gauged, and the water was below bankfull level for the chosen period. The stage-discharge relationship is affected by downstream summer weed growth, resulting in scatter in the low-flow part of the rating curve.

Method: estimation of uncertainty in hydrological signatures
Uncertainty sources and distributions are application specific, so a general analytic solution for the signature uncertainty is not available. We suggest that Monte Carlo simulation provides a generally applicable and flexible method, sampling equally likely possible realisations of the true data values (e.g. rainfall or flow series), conditioned on the observed data. Where multiple data sources are needed (e.g. for the calculation of the runoff ratio), paired samples are used. Each sampled data series is used to calculate the signature value, and the values are collated to give the signature distribution. This technique has previously been used to determine uncertainty in discharge (McMillan et al., 2010) and rainfall (Villarini and Krajewski, 2008). We applied the Monte Carlo (MC) approach to estimate uncertainty in signatures of different complexity. We used signatures that require rainfall and/or streamflow data only. Our method is described in Fig. 3 and has four steps: (1) identification of uncertainty sources in the data and from subjective decisions in the signature calculation, (2) specification of uncertainty models for each uncertainty source, either from the literature or from catchment-specific analyses, (3) Monte Carlo sampling from the different uncertainty models and calculation of signature values for each sample, and (4) analyses of the estimated signature distributions, their dependence on individual uncertainty sources and comparisons between catchments. We analysed both the absolute and relative uncertainty distributions, where the relative uncertainties were defined using the signature value from the best-estimate discharge and precipitation.
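The Monte Carlo propagation described above can be sketched generically. In this hypothetical illustration, the signature is simply the mean flow and the error model is plain multiplicative noise; in practice both would be replaced by the signature and error models of interest:

```python
import numpy as np

def monte_carlo_signature(obs, sample_realisation, signature, n=1000, seed=42):
    """Generic Monte Carlo propagation: draw equally likely realisations
    of the true data conditioned on the observations, evaluate the
    signature on each, and return the resulting distribution."""
    rng = np.random.default_rng(seed)
    return np.array([signature(sample_realisation(obs, rng)) for _ in range(n)])

# Illustrative stand-ins (not the paper's error models):
flow_obs = np.abs(np.sin(np.linspace(0, 20, 500))) + 0.1  # synthetic flow series
noisy = lambda q, rng: q * (1.0 + rng.normal(0.0, 0.1, q.size))
dist = monte_carlo_signature(flow_obs, noisy, np.mean)
lo, hi = np.percentile(dist, [5, 95])  # 5-95 percentile uncertainty interval
```

The same skeleton covers paired sampling: `sample_realisation` can return a (rainfall, flow) pair and `signature` can consume both, as needed for the runoff ratio.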

Method: data uncertainty sources and their estimation
We first describe the error models for uncertainties relating to rainfall and flow. Further uncertainty sources that are specific to a particular signature are described separately in Sect. 3.2. Table 1 presents a summary of all uncertainty sources together with literature references for the uncertainty estimation methods.

Identification of uncertainty sources
We considered catchment average rainfall estimated from a network of rain gauges, with three main uncertainty sources: point measurement uncertainty, spatial interpolation uncertainty and equipment malfunction uncertainty (e.g. unrecognised blocked gauges). Point uncertainty includes random errors, such as turbulent airflow around the gauge (Ciach, 2003), and is usually assessed using co-located gauges. Systematic point errors are also common (e.g. undercatch due to wind loss, wetting loss, splash-in/out). In theory, systematic errors can be corrected for, but this is difficult and the site-specific information required is not always available (Sieck et al., 2007). In this study, we considered random point uncertainty but not systematic components. Interpolation errors occur when estimating catchment average rainfall from the point measurements at the gauges and depend on rainfall spatial variability (affected by topography, rain rate and storm type), the density of gauges and the network design.

Uncertainty estimation method
Point uncertainty was calculated using the formula derived by Ciach (2003) from a study of 15 co-located tipping-bucket rain gauges over 12 weeks:

σ(r) = e0 + R0 / r,

where r is the rainfall rate (in mm h⁻¹), σ is the standard deviation of the relative error in 1 h measurements, and e0 and R0 are constants fitted to the co-located gauge data. No information about the distribution of the errors was given; we assumed a Gaussian distribution with zero mean. Interpolation uncertainty was estimated by subsampling from the gauge network. We subsampled using 1–13 (1–49) gauges for Mahurangi (Brue) for the basic signatures. For the combined rainfall-runoff signatures, three gauge densities were used: 1 gauge per 45 km², 1 gauge per 10 km² and 1 gauge per 5 km², which equalled 1 (3), 5 (14) and 10 (28) gauges in Mahurangi (Brue) respectively. We also used the single-gauge case for Brue. Each subsampled data set was used to estimate areal average rainfall at each time step using Thiessen polygon interpolation. Equipment malfunction uncertainty was investigated for Brue, where a quality-assured set of reliable periods was available (Sect. 2.2). We repeated our analyses using both the raw and quality-controlled data sets.

Table 1 (excerpt). Subjective decisions tested: recession analysis – flow data time step (tested hourly vs. daily) and seasonality of response (tested using all data or split by season; Shaw and Riha, 2012); rainfall-runoff threshold – effects of base flow (tested with/without base-flow separation; Gustard et al., 1992) and rainfall event definition (tested with/without inclusion of smaller events).
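Sampling a point-uncertainty realisation of this kind can be sketched as follows. The hyperbolic σ(r) form follows Ciach (2003), but the constants `e0` and `r0` used here are placeholder assumptions, not his fitted estimates:

```python
import numpy as np

def sample_point_rainfall(rain, e0=0.05, r0=0.2, seed=1):
    """Perturb an hourly rain series with zero-mean Gaussian relative
    errors whose standard deviation decreases with rain rate r as
    sigma(r) = e0 + r0 / r (hyperbolic form; constants are illustrative)."""
    rng = np.random.default_rng(seed)
    rain = np.asarray(rain, dtype=float)
    wet = rain > 0.0
    sigma = np.zeros_like(rain)
    sigma[wet] = e0 + r0 / rain[wet]          # relative error sd per time step
    perturbed = rain * (1.0 + rng.normal(0.0, 1.0, rain.size) * sigma)
    return np.clip(perturbed, 0.0, None)      # rainfall cannot be negative

rain = np.array([0.0, 0.5, 2.0, 10.0, 0.0])  # mm/h, synthetic
sample = sample_point_rainfall(rain)
```

Dry time steps stay dry, and small rain rates receive proportionally larger relative errors, matching the behaviour of the hyperbolic error model.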

Identification of uncertainty sources
We considered discharge as estimated from a measured stage series and a rating curve that relates stage to discharge. This is the most common method and is used at both our case study sites. The following are the main uncertainty sources.
1. Uncertainty in the gaugings (i.e. the measurements of stage and discharge used to fit the rating curve). Discharge uncertainty is typically larger; however, during high-flow gaugings, stage can change rapidly and its average may be difficult to estimate.
2. Approximation of the true stage-discharge relation by the rating curve. This is usually the dominant uncertainty, especially when the stage-discharge relation changes over time. In both catchments, low to medium flows are contained within a weir, which constrains the uncertainty. However, for Brue, considerable low-flow uncertainty remains as a consequence of seasonal vegetation growth.
Uncertainty in the stage time series was not assessed apart from correcting obvious outliers. For Brue, occasional periods where stage data had been interpolated linearly from lower-frequency measurements were excluded from the recession analysis.

Uncertainty estimation method
We used the voting point likelihood method to estimate discharge uncertainty by sampling multiple feasible rating curves (McMillan and Westerberg, 2015). In brief, discharge gauging uncertainty was approximated by logistic distribution functions based on an analysis of 26 UK flow gauging stations with stable rating sections (Coxon et al., 2015). This analysis gave 95 % relative error bounds of 13–14 % for high flows and of 30–40 % for low flows (noting that the logistic distribution is heavy-tailed). Stage gauging uncertainty was approximated by a uniform distribution of ±5 mm, a mid-range value based on previous studies. Rating-curve uncertainties, including extrapolation and temporal variability, were jointly estimated using Markov chain Monte Carlo (MCMC) sampling of the posterior distribution of rating curves consistent with the uncertain gaugings. The voting point likelihood draws on previous methods that account for multiple sources of discharge uncertainty (Juston et al., 2014; Krueger et al., 2010; McMillan et al., 2010). The rating-curve forms were based on the official curves: Mahurangi had a three-segment power-law curve and Brue a two-segment power-law curve (for the range of flows analysed here). The power-law parameters and the breakpoints were treated as parameters for estimation.
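A much-simplified stand-in for this idea (not the voting point likelihood or its MCMC sampler) is to rejection-sample single-segment power-law rating curves and keep those that pass within relative error bounds of every uncertain gauging; the parameter ranges and the 20 % bound below are illustrative assumptions:

```python
import numpy as np

def sample_rating_curves(stage, discharge, n_try=20000, rel_bound=0.2, seed=3):
    """Rejection-sample power-law rating curves q = a * h**b that stay
    within +/- rel_bound of all uncertain gaugings. A crude stand-in for
    MCMC sampling of the rating-curve posterior."""
    rng = np.random.default_rng(seed)
    accepted = []
    for _ in range(n_try):
        a = rng.uniform(0.5, 5.0)   # prior ranges are illustrative
        b = rng.uniform(1.0, 3.0)
        q = a * stage**b
        if np.all(np.abs(q - discharge) <= rel_bound * discharge):
            accepted.append((a, b))
    return np.array(accepted)

# Synthetic gaugings from a "true" curve q = 2 * h**1.5
stage = np.array([0.2, 0.5, 1.0, 1.5, 2.0])
discharge = 2.0 * stage**1.5
curves = sample_rating_curves(stage, discharge)
```

Each accepted (a, b) pair is one feasible rating curve; converting the stage series through each curve yields one flow realisation for the Monte Carlo signature calculation.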

Basic signatures
A set of signatures describing different aspects of the rainfall-runoff behaviour was calculated (Table 2). We used signatures describing the flow distribution, event characteristics, flow dynamics and rainfall; flow timing would be less affected by the data uncertainties studied here. Only data uncertainty (i.e. no subjective decisions) was considered for the basic signatures.

Recession analysis
Recession analysis is widely used to study the storage-discharge relationship of a catchment (Hall, 1968; Tallaksen, 1995), which gives insights into the size, heterogeneity and release characteristics of catchment water stores (Staudinger et al., 2011). We used the established method of characterising the relationship between flow and its time derivative. In the theoretical case where flow Q is a power function of storage, and evaporation is negligible, the relationship is

−dQ̃/dt = Q̃^b / T0,

where Q̃ = Q/Q0 is flow scaled by the median flow Q0. T0 and b are found by plotting −dQ̃/dt against Q̃ on logarithmic axes; b is the slope and T0 is derived from the intercept. T0 is the characteristic recession time at the median flow. b indicates the nonlinearity of the response: b = 1 implies a linear reservoir; b > 1 implies greater nonlinearity or multiple water stores with different drainage rates (Clark et al., 2009; Harman et al., 2009). Subjective decisions in recession analysis include how recession periods are defined, the delay after rainfall used to eliminate quickflow, the data time step, and whether to extend time steps during low flows to improve flow-derivative accuracy (Rupp and Selker, 2006). A moving average can be used to smooth diurnal flow fluctuations. Options to estimate T0 and b include linear regression, total least squares regression to allow for errors in both variables (Brutsaert and Lopez, 1998), or regression on binned data values (Kirchner, 2009). If water distributions vary seasonally, the results are sensitive to whether recessions are fitted using all data combined or split by season, month or event (Shaw and Riha, 2012).

Table 2 (excerpt). Event and flow-dynamics signatures: Q_HF, high-flow event frequency – average number of daily high-flow events per year (yr⁻¹), with a threshold of 9 times the median daily flow (Clausen and Biggs, 2000); Q_HD, high-flow event duration – average duration (days) of daily flow events higher than 9 times the median daily flow (Clausen and Biggs, 2000); Q_LF, low-flow event frequency – average number of daily low-flow events per year (yr⁻¹), with a threshold of 0.2 times the mean daily flow (Olden and Poff, 2003, who used a 5 % threshold); Q_LD, low-flow event duration – average duration (days) of daily flow events lower than 0.2 times the mean daily flow (see Q_LF); BFI, base-flow index – contribution of base flow to total streamflow, calculated from daily flows using the Flood Estimation Handbook method (Gustard et al., 1992); S_FDC, slope of the normalised FDC – slope of the FDC between the 33 and 66 % exceedance values of streamflow normalised by its mean (Yadav et al., 2007); Q_CV, overall flow variability – coefficient of variation in streamflow, i.e. standard deviation divided by mean flow (Clausen and Biggs, 2000; Jowett and Duncan, 1990); Q_LV, low-flow variability – mean of the annual minimum flow divided by the median flow (Jowett and Duncan, 1990); Q_HV, high-flow variability – mean of the annual maximum flow divided by the median flow (Jowett and Duncan, 1990); Q_AC, flow autocorrelation – autocorrelation at a 1-day (24 h) lag (Euser et al., 2013; Winsemius et al., 2009).
We assessed subjective uncertainty in recession analysis by comparing the distributions of recession parameters b and T 0 in the following cases, which in our experience have the most potential to affect recession parameter values: (1) using hourly vs. daily flow data, and (2) calculating recession parameters using all data combined vs. calculating parameters by season and taking the mean.
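The log-log regression underlying the recession parameters can be sketched as follows. This is a simplified illustration (ordinary least squares on a single clean recession segment, with no recession-period selection, smoothing or seasonal splitting):

```python
import numpy as np

def fit_recession(q, dt=1.0):
    """Fit -dQ~/dt = Q~**b / T0 by linear regression in log space.
    q: flow series over one recession segment (falling limb assumed).
    Returns (b, T0). Simplified: no recession-period selection."""
    q0 = np.median(q)
    qs = q / q0                        # scale by median flow
    dq = np.diff(qs) / dt              # finite-difference derivative
    mid = 0.5 * (qs[1:] + qs[:-1])     # flow at interval midpoints
    keep = dq < 0                      # keep falling (recession) steps only
    x = np.log(mid[keep])
    y = np.log(-dq[keep])
    b, c = np.polyfit(x, y, 1)         # y = b*x + c, with c = -ln(T0)
    return b, np.exp(-c)

# Synthetic linear-reservoir recession: Q(t) = exp(-t / 20)
t = np.arange(0.0, 60.0, 1.0)
q = np.exp(-t / 20.0)
b, t0 = fit_recession(q)               # expect b near 1, T0 near 20
```

For a linear reservoir the fitted b is 1 and T0 recovers the reservoir time constant, which makes this a convenient check before applying the regression to sampled flow realisations.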

Thresholds in rainfall-runoff response
Threshold behaviour in the relationship between rainfall depth and flow contributes to hydrological complexity (Ali et al., 2013) and exerts a strong control on model predictions. Threshold identification depends on both rainfall and flow data, making it a good candidate to test the effect of multiple uncertainty sources. Rainfall-runoff thresholds have been found in many catchments (Graham et al., 2010b; Tromp-van Meerveld and McDonnell, 2006a, b), including the Mahurangi (McMillan et al., 2014). We only studied threshold signatures in Mahurangi, as Brue did not display any rainfall-runoff threshold. The signatures that we used were threshold location (in millimetres of rain per event) and threshold strength. We quantified threshold strength based on the method of McMillan et al. (2014). Storm events were identified, and event rainfall was plotted against event runoff. Strong threshold behaviour was defined as an abrupt increase in the slope of the event rainfall-runoff relationship. This attribute was tested by fitting each data set with two intersecting lines (a "broken-stick" fit), using total least squares to optimise the slopes and the intersection point. The corresponding null hypothesis was that the two lines have equal slopes. This test returns a z statistic which quantifies the strength of evidence for the alternative hypothesis: where the absolute value exceeds 1.96, the null hypothesis can be rejected at the 5 % level.
We defined events based on McMillan et al. (2011), such that events require at least 2 mm h⁻¹ or 10 mm day⁻¹ of precipitation and are deemed to end either when a new event begins or 5 days after the last rainfall. Events are distinct if they are separated by 12 dry hours. We assessed uncertainty due to subjective decisions by using or not using base-flow separation and by changing the event definition to include smaller events, for which at least 1 mm h⁻¹ or 5 mm day⁻¹ of precipitation fell. We used the base-flow separation method of Gustard et al. (1992), which interpolates linearly between 5-day flow minima to create the base-flow series.
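A broken-stick fit with a slope-difference z statistic can be sketched as follows. This is a simplified version: ordinary least squares per segment with a grid search over the breakpoint, whereas the paper optimises a total least squares fit of two intersecting lines:

```python
import numpy as np

def fit_segment(x, y):
    """OLS line fit; return slope, its standard error, and residual SSE."""
    xm, ym = x.mean(), y.mean()
    sxx = np.sum((x - xm) ** 2)
    slope = np.sum((x - xm) * (y - ym)) / sxx
    resid = y - (ym + slope * (x - xm))
    sse = np.sum(resid ** 2)
    se = np.sqrt(sse / (len(x) - 2) / sxx)
    return slope, se, sse

def broken_stick(x, y, min_pts=4):
    """Two-line ("broken-stick") fit by grid search over the breakpoint.
    Returns (threshold, z), where z tests equality of the two slopes."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best = (np.inf, None, None)
    for i in range(min_pts, len(x) - min_pts + 1):
        s1, se1, sse1 = fit_segment(x[:i], y[:i])
        s2, se2, sse2 = fit_segment(x[i:], y[i:])
        if sse1 + sse2 < best[0]:
            z = (s2 - s1) / np.hypot(se1, se2)
            best = (sse1 + sse2, x[i - 1], z)
    return best[1], best[2]

# Synthetic events with a threshold near 50 mm event rainfall
rng = np.random.default_rng(7)
rain = np.linspace(5.0, 100.0, 40)
runoff = np.where(rain < 50.0, 0.05 * rain, 2.5 + 0.5 * (rain - 50.0))
runoff = runoff + rng.normal(0.0, 0.1, rain.size)
thr, z = broken_stick(rain, runoff)
```

Repeating this fit on each sampled rainfall-flow realisation yields the uncertainty distributions of the threshold location and of the z statistic (threshold strength).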

Rainfall data
The standard deviation of the error in catchment average rainfall resulting from different numbers of subsampled stations was calculated. It was plotted as a function of hourly rain rate using the moving-average window method of Villarini and Krajewski (2008), with a bandwidth equal to 0.7 times the rain rate at the centre of the window (results for Brue in Fig. 4). The errors decreased with rain rate, and there was a large initial decrease in the error when the number of subsampled stations increased from 1 to around 5. The point uncertainty had only a small effect on the error standard deviation. The number of gauges had a large effect on the estimated mean annual precipitation; if only one rain gauge was used, there was a range of 200–300 mm yr⁻¹ that would clearly affect catchment water balance analyses (Fig. 5). One rain gauge in a catchment of this size is still well above the WMO-recommended station density of 1 gauge per 575 km² in hilly terrain (WMO, 2008). Here there was also a large initial decrease in the range when the number of gauges increased to around five. But even when three or four gauges were used (1 gauge per 12–16 km²) for Mahurangi, there was a 1430–1660 mm yr⁻¹ range in mean annual precipitation. When the non-quality-controlled data set was used for Brue (Fig. 5a and b), there was a decrease in both the mean annual values and the standard deviation. At the same time, the range in the standard deviation increased because stations with erroneously high or missing precipitation values were retained (blocked rain gauges were a particular problem in this catchment; Wood et al., 2000). The estimated precipitation standard deviation was uncertain for one subsampled gauge in Mahurangi (Fig. 5c), where gauges were located in both the wettest and driest parts of the catchment.
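The subsampling experiment behind Fig. 5 can be sketched as follows. As a simplification, the catchment average here is an unweighted mean of the subsampled gauges rather than Thiessen interpolation, and the annual totals are synthetic:

```python
import numpy as np

def annual_mean_range(gauge_annual, n_gauges, n_draws=2000, seed=5):
    """Spread (min, max) of the catchment mean annual precipitation when
    only n_gauges of the network are used. Simplification: unweighted
    gauge mean instead of Thiessen-polygon interpolation."""
    rng = np.random.default_rng(seed)
    estimates = [
        gauge_annual[rng.choice(gauge_annual.size, n_gauges, replace=False)].mean()
        for _ in range(n_draws)
    ]
    return min(estimates), max(estimates)

# Synthetic 13-gauge network with annual totals spread around 1600 mm
rng = np.random.default_rng(0)
annual = 1600.0 + rng.normal(0.0, 120.0, 13)
r1 = annual_mean_range(annual, 1)   # single-gauge spread
r5 = annual_mean_range(annual, 5)   # five-gauge spread
```

The spread narrows as more gauges are included, mirroring the large initial decrease in interpolation uncertainty between one and roughly five gauges reported above.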

Discharge data
The estimated rating-curve uncertainty is shown in Fig. 6, with the corresponding flow percentile uncertainty summarised using boxplots. The 5–95 percentile uncertainty bounds enclose almost all of the uncertain gaugings, apart from a small number of outliers. Low-flow uncertainty is larger in Brue, where vegetation growth affects the stability of the stage-discharge relation. High-flow uncertainty is larger in Mahurangi, where fewer, more scattered high-flow gaugings cause a wider range in the extrapolated flows. Mahurangi has a fast rainfall-runoff response with little base flow and peak-flow events that are infrequent but have large magnitudes (up to 11 mm h⁻¹; Fig. 7a, right inset plot). Brue, by contrast, has a higher base flow and more peak-flow events of longer duration and lower magnitudes (up to 1 mm h⁻¹; Fig. 7b, right inset plot). Large high-flow uncertainty is likely in catchments such as Mahurangi where peak flows occur seldom and last only a few hours – this makes reliable high-flow gauging practically difficult and rating-curve extrapolation likely necessary. The larger high-flow rating-curve uncertainty in Mahurangi (Fig. 6a) is reflected in a wider peak-flow uncertainty distribution (Fig. 7a, left inset plot). In Brue, the whole flow range is gauged and the high-flow rating-curve uncertainty is smaller (Fig. 6c), and the peak-flow distribution has higher kurtosis with heavier tails (Fig. 7b, left inset plot).

Figure 6 caption: uncertainties are calculated relative to the optimal rating curve from the MCMC. For Brue, the official rating curve is dissimilar to the optimal MCMC rating curve because it was calculated for a longer gauging data set starting in the 1960s, with considerably more variability. The rating curve is shown in linear space, with an inset plot in log space for the low-flow range. The flow percentiles for the optimal rating are given as hourly averages (in mm h⁻¹) at the bottom of panels (b) and (d). The boxplot whiskers extend to the 5th and 95th percentiles, and the box covers the interquartile range.

Basic signatures
Flow percentile uncertainties mirrored those of the rating curves, with larger uncertainties in the high-flow percentiles for Mahurangi and larger uncertainties in the low-flow percentiles for Brue (Fig. 6). Uncertainty in the mean discharge was around ±10 % for both catchments; this is the 5–95 percentile interval, while the full distributions are shown in Fig. 8. Signatures describing the flow variability (S_FDC, Q_CV and Q_AC) had much higher uncertainties in Mahurangi (±20–50 %), where there was a fast rainfall-runoff response and greater high-flow rating uncertainty. The uncertainty in S_FDC was particularly large for Mahurangi because the rating curve had a breakpoint in the 33–66 percentile interval used to calculate the slope. Signatures describing the frequency and duration of high- and low-flow events (Q_HF, Q_HD, Q_LF and Q_LD) had large uncertainties in both catchments (±10–35 %). This arises because the event threshold is defined as a multiplier of the mean or median flow, so the (uncertain) gradient of the rating curve greatly affects the flow percentile equivalent to the threshold value. Frequency and duration signatures have alternatively had the event threshold defined directly as a flow percentile (Kennard et al., 2010; Olden and Poff, 2003); we suggest this is preferable, as those signatures were insensitive to the uncertainties analysed here, apart from sometimes small effects when using daily averages.
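The event frequency and duration signatures discussed above can be computed as in the following sketch (a minimal illustration on a synthetic daily series, using the 9 x median threshold of Q_HF and Q_HD):

```python
import numpy as np

def high_flow_events(daily_q, multiplier=9.0):
    """Count contiguous high-flow events where daily flow exceeds
    multiplier * median(daily_q); return (n_events, mean_duration_days)."""
    thresh = multiplier * np.median(daily_q)
    above = daily_q > thresh
    # an event starts on the first day above threshold after a day below it
    starts = int(np.sum(above[1:] & ~above[:-1])) + int(above[0])
    mean_dur = above.sum() / starts if starts else 0.0
    return starts, mean_dur

# Synthetic year: unit base flow with two 3-day spikes at 20x the median
q = np.ones(365)
q[100:103] = 20.0
q[200:203] = 20.0
n, dur = high_flow_events(q)  # expect 2 events of mean duration 3 days
```

Because the threshold is a multiplier of the median, any rating-curve realisation that shifts the median also shifts the threshold, which is exactly why these signatures inherit so much discharge uncertainty.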

Total runoff ratio
For the total runoff ratio, we tested the contribution of each uncertainty source by including or excluding different sources. We calculated total uncertainty (Fig. 8c, d, black bars) using different rain-gauge densities. Total uncertainty was approximately ±15 % using a single rain gauge, decreasing slowly with more gauges. The distributions were largely unbiased when using quality-controlled data. The contribution of point precipitation uncertainty was minimal: excluding this source made no difference to the uncertainty distribution (Fig. 8, green bars). Precipitation uncertainty is therefore due to interpolation and was evaluated by excluding flow uncertainty and calculating the remaining uncertainty (Fig. 8, blue bars). This uncertainty was noticeable for one gauge (approximately ±10 % for Mahurangi, ±9 % for Brue) but decreased quickly with more gauges and was negligible at a density of 1 gauge per 5 km². Total uncertainty was dominated by discharge uncertainty (dark blue bars), which was greater than precipitation uncertainty (blue bars). For the Brue catchment, the effect of using non-quality-controlled data was assessed (red and purple bars); this increased and biased the uncertainty, particularly at low gauge densities.
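The paired-sampling idea for the runoff ratio can be sketched as follows. This is a deliberately coarse illustration: each Monte Carlo iteration perturbs the rainfall and flow totals with a single multiplicative Gaussian error, and the relative error magnitudes are assumptions, not the paper's fitted error models:

```python
import numpy as np

def runoff_ratio_distribution(p_obs, q_obs, n=2000, p_rel_sd=0.05,
                              q_rel_sd=0.08, seed=11):
    """Monte Carlo distribution of the total runoff ratio sum(Q)/sum(P),
    drawing paired rainfall and flow realisations in each iteration.
    Error magnitudes are illustrative placeholders."""
    rng = np.random.default_rng(seed)
    p_tot, q_tot = p_obs.sum(), q_obs.sum()
    ratios = (q_tot * (1.0 + rng.normal(0.0, q_rel_sd, n))) / (
              p_tot * (1.0 + rng.normal(0.0, p_rel_sd, n)))
    return ratios

p = np.full(365, 4.0)   # mm/day, synthetic rainfall
q = np.full(365, 2.0)   # mm/day, synthetic flow
ratios = runoff_ratio_distribution(p, q)
lo, hi = np.percentile(ratios, [5, 95])
```

Because numerator and denominator are sampled together, correlated rainfall and flow errors could also be represented by drawing the two perturbations jointly.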

Recession analysis
We tested the effect of data uncertainty on the recession analysis results by plotting histograms of the recession parameters b (nonlinearity of recession shape) and T0 (characteristic recession time at the median flow). We considered subjective uncertainty by using data at daily or hourly time steps and by calculating parameters using all data together or splitting by season and then taking parameter averages (Fig. 9). Uncertainty in the recession descriptors was typically (1) greater for Brue than for Mahurangi, in particular for hourly flow data, and (2) greater for hourly flow data than for daily flow data. Recessions are calculated from flow derivatives and are therefore affected by relative changes in flow (e.g. channel shape). The linear regression used to calculate the recession parameters is particularly sensitive to uncertainties in extreme low or high flows. The low-flow uncertainty at Brue resulting from summer weed growth creates higher uncertainties at that site. Daily flow values are based on an aggregation of measured values and are therefore more robust to data uncertainty. However, using daily data in small catchments can mask details of the recession shape, as the slope can change markedly during a single day. In our case, this difference caused shifts in the parameter distributions between hourly and daily data and would therefore affect our ability to compare parameter values between catchments. For example, b values were similar in the two catchments when using daily data but different when using hourly data, and the converse is true for T0. This was caused by differences in the hydrographs, such as low-flow fluctuations in Brue and flashy peak-flow events in Mahurangi.
Recession parameters calculated per season were highly uncertain in Brue for the T0 parameter. This was because some seasons had very few recession data points, making the fitted regression relationships sensitive to changes in those points. Recession parameters were also highly sensitive to subjective decisions in defining recession periods, as also found by Stoelzle et al. (2013). Such definitions could result in particular recession periods being included or excluded from the analysis depending on the sampled rating curve. When the excluded periods included extreme high- or low-flow values, this could significantly skew the fitted parameters and therefore give multimodal parameter distributions according to the particular set of valid recession periods. For the daily timescale, the starting hour used in calculating the daily averages could also have a large effect on the resulting recession parameters.
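The core of this analysis, a log-log regression of the flow derivative against flow, can be sketched as follows. The segment-selection rule (any sustained declining-flow run) is a simplified stand-in for the study's recession-definition criteria, and the derived b and T0 follow the power-law form -dQ/dt = a Q^b discussed above.

```python
import numpy as np

def recession_parameters(q, min_len=5):
    """Fit the power-law recession -dQ/dt = a * Q**b by log-log linear
    regression over declining-flow segments.  Returns b (nonlinearity)
    and T0 = Qmed**(1 - b) / a, a characteristic timescale at the median
    flow.  The segment-selection rule is a simplified stand-in for the
    recession-definition criteria used in the study."""
    q = np.asarray(q, dtype=float)
    dq = np.diff(q)
    rec_q, rec_dq = [], []
    start = None
    for i in range(len(dq) + 1):
        falling = i < len(dq) and dq[i] < 0
        if falling and start is None:
            start = i
        elif not falling and start is not None:
            if i - start >= min_len:           # keep only sustained recessions
                rec_q.append(q[start:i])
                rec_dq.append(-dq[start:i])
            start = None
    if not rec_q:
        raise ValueError("no recession segments found")
    x = np.log(np.concatenate(rec_q))
    y = np.log(np.concatenate(rec_dq))
    b, log_a = np.polyfit(x, y, 1)             # slope b, intercept log(a)
    t0 = np.median(q) ** (1.0 - b) / np.exp(log_a)
    return b, t0

# Check against an exact recession: q(t) = 1 / (1 + a*t) solves
# dq/dt = -a * q**2, i.e. b = 2.
t = np.arange(200, dtype=float)
q = 1.0 / (1.0 + 0.05 * t)
b, t0 = recession_parameters(q)
print(f"b = {b:.2f}, T0 = {t0:.1f}")   # b should be close to 2
```

The sensitivity discussed above enters through the selection step: a different rating-curve realisation can flip short segments in or out of the `min_len` criterion, changing the point set fed to the regression.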

Thresholds in rainfall-runoff response
We tested for uncertainty in the estimated threshold in the event rainfall-runoff relationship in Mahurangi using box-plots of the threshold location and strength under different uncertainty scenarios (Fig. 10). The threshold broken-stick fit is illustrated in Fig. 10a for the best-estimate data (in blue) and for an example realisation with uncertainty (in grey).
The threshold was 65 mm when using best-estimate rainfall and flow data. Total uncertainty was a largely unbiased distribution with a range of ∼ 20 mm. Total uncertainty was a combination of flow uncertainty (slight low bias) with rainfall interpolation uncertainty (slight high bias). Point rainfall uncertainty was not important when using multiple gauges. Threshold location was highly sensitive to the number of rain gauges used: using only one gauge created a very wide uncertainty distribution. As with the rainfall uncertainty analysis, there was a large decrease in the uncertainty when increasing to five gauges (Sect. 4.1.1). The use of base-flow separation did not greatly change the median threshold but did increase the range. Event definition parameters had little effect on the threshold uncertainty.
Threshold strength was defined using a change-in-slope statistic where higher values indicate a stronger threshold. Considering flow or rainfall uncertainty weakened the calculated threshold. For flow uncertainty this was because the optimal rating curve had its first breakpoint and mid-section slope above the median values of the sampled rating-curve distribution, both of which were associated with a stronger threshold. As with SFDC, this shows the strong impact of rating-curve breakpoint locations on signature uncertainty. For rainfall, uncertainty adds noise to the event rainfall depth and therefore corrupts the estimated rainfall-runoff relationship, weakening the threshold. Consequently, the number of rain gauges is an important control on estimated threshold strength, with fewer gauges causing a weakened threshold. As the underlying threshold was strong, the case of one rain gauge was the only scenario that could cause the threshold statistic not to be significant at the 5 % level. However, in other catchments with weaker thresholds, a lack of good rainfall data is likely to result in thresholds being missed. Using base-flow separation increased the derived threshold strength, as it typically reduced runoff depths for smaller events below the threshold. Event definition had only a small effect on derived threshold strength; when smaller events were included the threshold strength statistic increased, as the fit was based on a greater number of points.
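A minimal version of the broken-stick fit and its change-in-slope statistic can be sketched as follows. The grid-search fit with two independent line segments is a simplified stand-in for the study's fitting and significance-testing procedure, and the synthetic events are illustrative only (the 65 mm kink echoes the Mahurangi value).

```python
import numpy as np

def broken_stick(p_event, q_event):
    """Two-segment ('broken stick') fit of event runoff against event
    rainfall: grid-search candidate thresholds, fit a line to each side,
    keep the split with the smallest total squared error.  Returns
    (threshold, change_in_slope)."""
    p = np.asarray(p_event, dtype=float)
    q = np.asarray(q_event, dtype=float)
    best_thr, best_dslope, best_sse = None, None, np.inf
    for thr in np.linspace(p.min(), p.max(), 200)[1:-1]:
        lo, hi = p <= thr, p > thr
        if lo.sum() < 3 or hi.sum() < 3:       # need points on both sides
            continue
        s1, i1 = np.polyfit(p[lo], q[lo], 1)   # below-threshold line
        s2, i2 = np.polyfit(p[hi], q[hi], 1)   # above-threshold line
        sse = (np.sum((q[lo] - (s1 * p[lo] + i1)) ** 2)
               + np.sum((q[hi] - (s2 * p[hi] + i2)) ** 2))
        if sse < best_sse:
            best_thr, best_dslope, best_sse = thr, s2 - s1, sse
    return best_thr, best_dslope

# Synthetic events with a known threshold at 65 mm
rng = np.random.default_rng(1)
p = rng.uniform(0.0, 150.0, 300)                       # event rainfall, mm
q = np.where(p < 65, 0.05 * p, 3.25 + 0.6 * (p - 65))  # event runoff, mm
q = q + rng.normal(0.0, 1.0, p.size)                   # observation noise
thr, strength = broken_stick(p, q)
print(f"threshold = {thr:.0f} mm, change in slope = {strength:.2f}")
```

The sensitivity to large storms noted above is visible in the structure of the fit: the few highest-rainfall events dominate `s2`, the gradient of the second line, and hence the strength statistic.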

Summary of the signature uncertainties
To summarise our results, we tabulated examples of each signature type together with their dominant uncertainty sources and summary statistics of the total uncertainty distribution, for each catchment (Table 3). Our aim is to allow for an easy comparison of the signature uncertainties in our study with those of other studies. We therefore chose commonly used distribution statistics, i.e. the first three distribution moments (mean, standard deviation, skewness) and the half-width of the 5-95 percentile range, which is commonly quoted in uncertainty studies (e.g. McMillan et al., 2012). We hope that authors of future studies will consider using similar statistics, to enable the community to compile a generalised understanding of signature uncertainties across different catchments, scales and landscapes.
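The tabulated statistics are straightforward to compute from any Monte Carlo sample of signature values; a sketch (with a hypothetical normally distributed signature sample as input) is:

```python
import numpy as np

def summarize(samples):
    """Summary statistics as tabulated per signature: the first three
    distribution moments plus the half-width of the 5-95 percentile range."""
    x = np.asarray(samples, dtype=float)
    mu, sd = x.mean(), x.std(ddof=1)
    p5, p95 = np.percentile(x, [5, 95])
    return {"mean": mu,
            "std": sd,
            "skewness": np.mean(((x - mu) / sd) ** 3),
            "half_width_5_95": (p95 - p5) / 2.0}

# e.g. applied to a hypothetical Monte Carlo sample of a signature value
rng = np.random.default_rng(0)
sample = rng.normal(0.5, 0.05, 10_000)
summary = summarize(sample)
print({k: round(v, 4) for k, v in summary.items()})
```

Reporting these four quantities per signature and catchment would make results directly comparable between studies, as proposed above.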

Uncertainty in different types of signatures
Uncertainty distributions were highly variable between signatures and therefore the impact of the uncertainty depends on which signatures are used (Table 3). There was greater uncertainty in signatures that use high-frequency responses (e.g. variations over short timescales, thresholds based on event precipitation totals), subsets of data more prone to measurement errors (e.g. extreme high and low flows, QHV and Q99), and signatures based on small numbers of values (e.g. seasonal recession characteristics in the Brue catchment). Signatures describing flow variability were uncertain in the Mahurangi catchment, which has a flashy rainfall-runoff response and where stage significantly exceeded the highest gaugings, leading to large discharge uncertainty at high flows. This is likely to be a common situation in small, fast-responding catchments with few high-flow events, due to the practical difficulties of gauging during such short time windows. There was lower uncertainty in signatures that use spatial or temporal averages (e.g. total runoff ratio and BFI). Uncertainty in signatures calculated from averages depends on the type of data uncertainty, e.g. random errors are reduced by averaging, but some systematic errors such as rainfall undercatch are not. Rating-curve uncertainty is an intermediate case as it depends on error magnitudes that vary across the flow range. Some signatures are sensitive to particular types of data uncertainty. For example, in Mahurangi high uncertainty in SFDC relates to uncertainty in rating-curve shape, and in Brue high uncertainty in QLD relates to uncertainty of the low-flow rating in combination with the shape of the hydrograph. Signatures that describe the rainfall-runoff relationship for individual events (e.g. threshold location and strength) were particularly sensitive to precipitation uncertainties for low gauging densities.

[Figure caption: Horizontal grey lines show baseline signature values from the optimal rating-curve and precipitation data. The orange line in Fig. 9c shows the value above which the change in slope of the rainfall-runoff relationship is significant at the 5 % level. Boxplot whiskers for the uncertainty distribution in the one-rain-gauge scenario are truncated for clarity. The total uncertainty scenario used 1 rain gauge per 10 km².]
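The contrast between random and systematic errors under averaging can be demonstrated in a few lines; the daily error magnitudes here (20 % random noise, 5 % undercatch) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
true_p = rng.gamma(0.5, 8.0, 3650)   # ten years of synthetic daily rainfall, mm

noisy = true_p * rng.normal(1.0, 0.2, true_p.size)  # 20 % random daily error
undercatch = true_p * 0.95                          # 5 % systematic undercatch

# Relative error in the long-term total, as used by an averaging signature
# such as the runoff ratio:
rel_random = abs(noisy.sum() / true_p.sum() - 1.0)
rel_sys = abs(undercatch.sum() / true_p.sum() - 1.0)
print(f"random errors after averaging:     {100 * rel_random:.2f} %")
print(f"systematic errors after averaging: {100 * rel_sys:.2f} %")
```

The random daily errors largely cancel in the ten-year total, whereas the systematic undercatch propagates to the averaged signature undiminished.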
Signatures can be designed to be robust to some data uncertainty sources. A clear example is signatures describing the frequency and duration of high- and low-flow events. If these events are defined using a threshold set as a multiplier of the mean or median flow, they are highly sensitive to rating-curve uncertainty. If, instead, the events are directly defined using a flow percentile threshold, they are little affected by rating-curve uncertainty (see Sect. 4.2.1). This simple change in signature definition reduces sensitivity to data uncertainty. We found that any cut-offs imposed in signature calculation, such as event or recession definition criteria, could have a strong and unpredictable effect on signature uncertainty. For example, rainfall-runoff threshold strength calculations were particularly sensitive to large storm events, which control the gradient of the second line in the "broken stick". If such events were conditionally excluded (e.g. classified as disinformative and removed when runoff exceeded rainfall, which depends on the rating curve and rain gauge(s) selected), the resulting uncertainty could overwhelm all other uncertainty sources. We suggest that signatures including cut-off-type definitions should be carefully evaluated and the cut-offs removed if possible.
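The difference between the two event definitions can be made concrete with synthetic flows and a hypothetical monotonic rating-curve perturbation. The multiplier (9 × median) and percentile (95th) values are illustrative choices, not the study's exact definitions:

```python
import numpy as np

def high_flow_threshold(q, method="percentile"):
    """Two possible definitions of a high-flow event threshold."""
    q = np.asarray(q, dtype=float)
    if method == "median_multiple":
        return 9.0 * np.median(q)
    return np.percentile(q, 95)

rng = np.random.default_rng(3)
q = rng.lognormal(0.0, 1.0, 5000)        # synthetic flow series
perturbed = 0.8 * q ** 1.1               # hypothetical monotonic rating error

# Percentile definition: a monotonic perturbation preserves flow ranks,
# so exactly the same time steps are classified as high-flow events.
ev = q > high_flow_threshold(q)
ev_pert = perturbed > high_flow_threshold(perturbed)
print((ev == ev_pert).all())             # True

# Median-multiple definition: the nonlinear perturbation moves the
# threshold relative to the flows, so the selected events change.
m = q > high_flow_threshold(q, "median_multiple")
m_pert = perturbed > high_flow_threshold(perturbed, "median_multiple")
print((m == m_pert).all())               # almost always False here
```

Any monotonic rating-curve error leaves the ranks of the flow series unchanged, which is why the percentile-based definition identifies an identical event set while the median-multiple definition does not.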

Method limitations and future developments
The quality of signature uncertainty estimates relies on accurate assessment of data uncertainty and therefore on sufficient information. An example of insufficient uncertainty information would be for a gauge where out-of-bank flows occur, but there is no information on the out-of-bank rating. As discussed by Juston et al. (2014) for rating-curve uncertainty, it is essential to understand whether data errors are random or systematic, aleatory or epistemic. In our study, point rainfall errors were not important in signature uncertainty, but there is scope to improve their representation as systematic or random (e.g. systematic wind-related undercatch, or random turbulence effects). However, quantification of these errors is not straightforward (Sieck et al., 2007).
We recognise that the inferred distributions of signature uncertainty will be sensitive to the assumptions and methods used to estimate distributions of data uncertainty. This introduces some subjectivity into the uncertainty estimation, and it is therefore important to make the assumptions explicit and to motivate method choices by the perceptual understanding of the uncertainty sources. For example, the optimal methods for estimating rating-curve uncertainty under typical time-varying, poorly specified errors remain an active debate in the hydrological community. Using an informal likelihood, as we did, rather than a formal statistical likelihood can be more robust to multiple epistemic error sources but can also be criticised for not obeying a formal statistical framework (as discussed by Smith et al., 2008). Future progress in understanding how perceptual models and data jointly contribute to system identification may help to resolve this dichotomy (Gupta and Nearing, 2014). At present, we recognise that uncertainty distributions are more subjective in signatures that emphasise poorly described aspects of data uncertainty such as out-of-bank flows.
For signatures calculated over a long time period, it may be appropriate to incorporate nonstationary error characteristics, such as rating-curve shifts or the example explored by Hamilton and Moore (2012) where the best-practice method for infilling discharge values under ice changed over time. The time period used is important if signatures are used for catchment classification: an unusual event such as a large flood may shift the signature values (Casper et al., 2012). Additional uncertainty sources can be important in other catchments, such as catchment boundary uncertainty and flow bypassing the gauge (Graham et al., 2010a).

Implications for use of signatures in hydrological analyses
Our results are pertinent to any hydrological analysis that uses signatures to assess catchment behaviour. Examples of applications whose reliability could be affected by signature uncertainty include testing bias correction of a climate model using signatures in a coupled hydrological model (Casper et al., 2012), predicting signatures in ungauged catchments (Zhang et al., 2014), classifying catchments using flow complexity signatures (Sivakumar et al., 2013), and assessing spatial variability of hydrological processes (McMillan et al., 2014). In some cases, absolute signature values are not used; rather, it is the pattern or gradient over the landscape, or the trend over time, that is important. Data uncertainties may obscure such patterns depending on the magnitude of the uncertainty in relation to the strength of the measured pattern. The range of signature values found by McMillan et al. (2014) across Mahurangi was large compared to the uncertainty magnitudes found in this study. This suggests that the conclusions regarding the signature patterns would still hold, assuming that the uncertainty at the catchment outlet is representative of the internal subcatchments. Some subjective uncertainty sources may not be relevant in catchment comparisons, as choices such as how to define recession periods or whether to apply base-flow separation can be made consistently. However, subjective uncertainties can still change the conclusions drawn, as with the cut-offs described above and as discussed in Sect. 4.2.3, where daily data suggested similar recession b parameters in Mahurangi and Brue but hourly data showed strong differences. When signatures are used as a performance measure in model calibration (e.g. Blazkova and Beven, 2009), reliable uncertainty estimates are crucial so that the model is not overfitted. Previous studies have quantified data and signature uncertainty using upper and lower bounds (e.g. the fuzzy estimates used by Coxon et al., 2013; Hrachowitz et al., 2014; Westerberg et al., 2011). However, this does not allow the straightforward estimation of uncertainty in all types of signatures that is made possible by our method of generating multiple feasible realisations of rainfall and discharge time series.

Conclusions
This study investigated the effect of uncertainties in data and calculation methods on hydrological signatures. We present a widely applicable method to evaluate signature uncertainty, and show results for two example catchments. The uncertainties were often large (i.e. typical intervals of ± 10-40 % relative uncertainty) and highly variable between signatures. It is therefore important to consider uncertainty when signatures are used for hydrological and ecohydrological analyses and modelling. Uncertainties of these magnitudes could change the conclusions of analyses such as cross-catchment comparisons or inferences about dominant processes.
Although we show that significant uncertainty can exist in hydrological signatures, this paper is not intended to carry a negative message. Consideration of uncertainty is equivalent to extracting the signal from noisy data and not overestimating the information content of the data. As argued by Juston et al. (2013), ignorance is not bliss when it comes to hydrological uncertainty; incorporation of uncertainty analysis leads to many advantages, including more reliable and robust conclusions, reduction in predictive bias, and improved understanding. In particular, we hope that this paper encourages others to estimate data uncertainty in their catchments, either individually or by reference to typical uncertainty magnitudes; to design diagnostic signatures and hypothesis-testing techniques that are robust to data uncertainty; and to evaluate analysis results in the context of signature uncertainty.