Interactive comment on “ Event-scale power law recession analysis : Quantifying methodological uncertainty ”

The authors state that, while there is increasing attention for single streamflow recession characterization, it is unknown what the dependence of estimated model parameters on methodological choices is. To resolve this problem, they use daily streamflow data from 16 catchments located in California and Oregon and investigate how commonly used streamflow recession definitions influence the parameters of a power-law recession model that describes the recession. The methodological choices include: the start of a recession the end of a recession the minimum length of a recession the method of power law model fitting Results indicate that these choices can impact parameter value estimates, whereby the recession parameter distributions are method-dependent, but a particular method affects a given parameter in similar ways C1


Introduction
Streamflow recession analysis has the goal of characterizing recession behavior in terms of phenomenological models of decreases in flow (Q, with units of L T −1 or L 3 T −1 ) over time, typically represented with a power law differential equation (Boussinesq, 1877;Hall, 1968;Tallaksen, 1995): There is no universally agreed upon procedure for performing power law recession analysis; however, most approaches are comprised of two key steps: (i) identify and isolate periods of flow recession using the hydrograph and (optionally) other hydroclimatic datasets -a step referred to here as "recession extraction"; and (ii) use the isolated periods of recession to parameterize the power law model -a step we refer to as "fitting".
Published by Copernicus Publications on behalf of the European Geosciences Union.
Classical recession seeks a single, effective parameterization of the power law recession model.With some exceptions (e.g., Lamb and Beven, 1997;Wittenberg, 1994), these approaches typically perform the fitting step in a single operation: [log(Q), log(−dQ/dt)] point pairs are computed for multiple recession periods, and the recession parameters are then obtained from the slope and intercept of a line fitted to the [log(Q), log(−dQ/dt)] point cloud (e.g., Brutsaert and Nieber, 1977;Stoelzle et al., 2013;Tague and Grant, 2004;Basso et al., 2015;Clark et al., 2009;Kirchner, 2009;Sawaske and Freyberg, 2014;Bogaart et al., 2016).This form of "lumped" recession analysis is empirically and theoretically motivated.Practically, it reasonably captures observed nonlinearity in the hydrograph recession.Theoretically, it uses a model form that is predicted by solutions of the hydraulic groundwater equations (Boussinesq, 1904;Troch et al., 2013).Lumped recession analysis has been used for inverse modeling, the development of flow separation algorithms, characterization of aquifer properties, and parameterization of hydrologic models, among other applications (Vogel and Kroll, 1992;Rupp and Selker, 2006a;Rupp et al., 2004;Szilagyi et al., 1998;Huyck et al., 2005;Bogaart et al., 2016;Tague and Grant, 2004).
Among the many issues associated with event-scale analysis (Dralle et al., 2015), perhaps the most challenging are the numerous subjective choices needed to establish consistent criteria for recession identification and fitting (Westerberg and McMillan, 2015).For lumped analyses, Brutsaert and Nieber (1977) established a derivative-based method, which avoids the issue of needing to determine the precise start day of a recession event.Event-scale analyses, however, must identify the start and end of each recession event and select one of many fitting techniques to obtain (a, b) values.
Despite the growing number of event-scale recession studies, it remains unclear to what extent the particular method of recession extraction and fitting could alter features of the computed populations of recession parameters.If uncertainty due to methodological choices exceeds physically derived variations in the recession parameters, new and less ambiguous methods will be needed to allow empirical comparative analyses, and to test hypotheses derived from novel theories (e.g., Biswal and Marani, 2010;Clark et al., 2009;Harman et al., 2009).Previous work has demonstrated that method dependent variability in recession parameters in lumped analysis can be larger than natural variability between catchments (Stoelzle et al., 2013).For event-scale recession analysis, Chen and Krajewski (2016) demonstrate sensitivity of the recession exponent to recession length and start time relative to a flow peak.However, no systematic study has been undertaken to examine sensitivity of both a and b to some of the most common methodological choices made during event-scale power law recession analysis.Given the early stage of event-scale recession exploration, it is an opportune time to determine the methodological limitations associated with event-scale techniques, hopefully supporting intercomparability and consistency in future work.
Analogously to Stoelzle et al. (2013) and Chen and Krajewski (2016), this study examines the sensitivity of recession parameter values to the various methodological choices to be made when performing event-scale recession extraction and fitting.Specifically, we seek to address four primary research questions: -Research question 1 -how do methodological choices impact fit quality of the power law recession model?
-Research question 2 -when catchments are ranked by fitted recession parameter statistics, is the rank order dependent on methodological choices?
-Research question 3 -how do methodological choices affect the empirical frequency distributions (over the period of record) of recession parameter values?
-Research question 4 -how might methodological choices affect relationships between a given recession parameter and other physical measures of catchment state, such as catchment wetness?
In seeking answers to these questions, we recognize that, unlike lumped recession analysis (Vogel and Kroll, 1992;Brutsaert and Nieber, 1977;Kirchner, 2009), no set of canonical methods of event-scale recession analysis have been established.We propose breaking down the two steps of recession analysis -recession extraction and power law model parameterization -into four methodological choices: three concerning recession extraction and one concerning model parameterization: 1.The minimum allowable length of a recession eventthis choice sets a minimum duration (units of days in our analysis) for a recession period to be selected for analysis.Recessions less than the minimum duration are discarded.
2. The definition of the beginning of a recession eventthe start of a recession event is usually determined by a flow peak filtering algorithm applied to the streamflow time series.Commonly, peaks are identified using a simple flow threshold, wherein flow peaks exceeding the threshold are flagged as potential starts to a recession event.
3. The definition of the end of a recession event -recession ends can be identified by the occurrence of a rainfall event, a transition from decreasing discharge to increasing discharge (dQ/dt < 0 → dQ/dt > 0), or a break in the upward concavity of the flow time series In this work, we select two end-member settings that define realistic methodological limits for each of the above four choices.The resulting 16 combinations of method choices, as applied to a broad flow dataset, provide a basis for constraining method-dependent uncertainty in the populations of recession parameters. 2 Methods

Study sites
The analyses in this study are performed using United States Geologic Survey daily streamflow data for the set of 16 US catchments from northern California and southern Oregon summarized in Table 1.While more frequently sampled discharge data can be used for recession analysis, we use daily data because it is the most common choice in event-scale recession literature.The periods of record for each catchment are visualized in Fig. 1.
The study catchments are relatively steep, forested, and situated within the US western coastal mediterranean climate region (Peel and Finlayson, 2007), which is characterized by a distinct rainy season (with little or no snowfall), followed by a pronounced dry season during which precipitation makes a minimal contribution to the water balance (Power et al., 2015;Dralle et al., 2015).While average annual rainfall for the study catchments ranges from about 1m to 2m, the highly variable rainfall climatology characteristic of mediterranean regions (Fatichi et al., 2012) might be considered ideal for a recession sensitivity analysis, as catchments experience a large range of recession behaviors and wetness states.A typical year of runoff data for Elder Creek near Branscomb, California (USGS Gage: 11475560), is presented in Fig. 2.

Overview of the methods varied across recession analyses
Presumably there is an unbounded range of methodological choices that could be made regarding event-scale recession analysis.of criteria to confirm the continuation of a hydrograph segment that merits analysis (e.g., a slope or concavity requirement), (iv) the selection of a fitting methodology by which to analyze a chosen recession.While other choices undoubtedly have impacts on the characteristics of a population of analyzed recessions, the selection of these four criteria represents the most constrained and fundamental set of methodological choices to explore.

Nomenclature and symbols used
To concisely describe the combinations of methods tested here, we first represent the four methodological choices with four binary (taking values of 0 or 1) variables: 1. minimum recession length (M) 2. peak selectivity (S)

fitting method (L).
The extraction related variables (M, S, and C) are defined so that a value of 1 corresponds to a more restrictive extraction method; that is, the method choice filters out more recessions if its corresponding variable is 1 than if the variable is 0. For example, M = 1 corresponds to a minimum recession length of 10 days, which is more restrictive than a minimum recession length of 4 days (M = 0).Figure 3 enumerates the 16 method combinations using a decision tree, where each level of the tree sketches the effect of that level's method choice on the features of extracted recessions.

Defining the minimum allowable length of recession event (M)
Owing to the derivative-based methods developed by Brutsaert and Nieber (1977), most lumped recession analyses do not set a minimum duration recession events.However, nearly all event-scale recession studies set a minimum duration for chosen recession periods.Reasons for this choice vary; authors cite the removal of noise from short events (Ye et al., 2014), the necessity of capturing late time flow processes (Chen and Krajewski, 2015), and data quality concerns related to sample size (Shaw, 2016).Event-scale recession analyses have typically chosen a minimum of 4 to 5 days of recession for daily data (e.g., Shaw and Riha, 2012;Biswal and Marani, 2010), although values upwards of 10 days (e.g., Howe, 1966) and as low as 12 h (e.g., McMillan et al., 2014, for high-frequency data) have been used.
To logically examine sensitivity to minimum recession length, the "liberal" and "restrictive" end-member values should be chosen to be consistent with typical recession timescales of the study catchments.By fitting a linear recession model (dQ/dt = −kQ) to a representative collection of recessions from each catchment in our dataset, we find that median recession response timescales (1/k [T]) range from about 2 to 4 days.To capture the important features of extracted recessions, while also varying the minimum recession length significantly with respect to typical response times (and also without choosing values so restrictive as to limit the size of our sample sets), we extract restricted sets of recessions using a minimum length of 10 days (M = 1), and less restricted sets of recessions with a minimum length of 4 days (M = 0).

Method choices decision tree
MSCL 0101 MSCL 0100 Figure 3. Graphical illustration of the 16 method choices.Minimum recession length (M) determines whether extracted recessions are required to have a minimum length of either 4 (M = 0) or 10 days (M = 1).Recession peak selectivity (S) determines whether the peak selection algorithm is highly restrictive (S = 1) or relatively permissive (S = 0).Recession concavity (C) determines whether recessions are required to be both decreasing and concave up (C = 1), or simply decreasing (C = 0).Finally, Linearity (L) determines whether or not the values of the recession parameters a and b are determined using linear regression on the plot of log[−dQ/dt] vs. log Q (L = 1), or using a nonlinear least squares fit to the raw recession time series (L = 0).

Identifying potential recession starts (S)
Ideally, rainfall data would be used to identify periods of recession.However, high-quality precipitation records are often unavailable, and so the majority of event-scale recession analyses rely on flow data alone for recession identification.We therefore only consider methods of recession analysis that can be applied to any daily streamflow record, with or without rainfall data.More stringent extraction methods that require rainfall data would be expected to reduce uncertainty in recession analysis, as extracted recession periods with rainfall data can reasonably be expected to be a subset of those extracted without rainfall data.Without rainfall data, recession starts are typically identified by locating days with discharge peaks -that is, times when dQ/dt changes sign from positive to negative.However, some recession starts, while consistent with this definition, do not satisfy other important criteria for robust analysis and should be excluded.Rationales for exclusion might include discarding minor peaks that are small relative to measurement error, or which have dynamics that would be expected to be unresolvable on daily timescales, although few authors give a strong justification for their choices in this regard.For example, Ye et al. (2014) discard peak flows less than the 10th flow percentile to filter noise from small events.Mutzner et al. (2013) and Biswal and Marani (2010) choose only recession events where initial flow conditions are greater than mean annual flow in order to avoid "minor events, which may not have significantly increased the average soil saturation, thus not triggering a significant response of the groundwater".Without identifying some justifiable tolerance for noise associated with small peaks, or defining what constitutes a significant groundwater response, it is difficult to objectively determine a peak threshold below which recessions should be excluded from analysis.
To test the effect of peak filtering decisions on recession analysis, we implement a peak selection procedure that is sensitive to the "distinctness" of any given peak relative to the data around it (Yoder, 2009).Our scheme selects a peak if all of the following are true: (i) it is a local maximum; (ii) it is greater by some threshold amount (X) than the local minimum lying between it and the previously chosen peak; and (iii) discharge decays to a local minimum by the same threshold amount before the next greater local maximum is found.The peak extraction algorithm is illustrated in Fig. 4. We define the threshold as Here, d is a tunable parameter that we set to be 50 for highly selective extraction (S = 1; only larger, more distinct peaks are analyzed) and set to 500 for less selective extraction (S = 0, a broad range of peaks are analyzed).
In most studies, once a significant discharge peak has been identified, a recession start time, which is often lagged from the discharge peak, is chosen.The most commonly cited rationale for this lagged recession start is to ensure the dominance of groundwater dynamics in the recession signal, rather than overland flow processes (e.g., Biswal and Marani, 2014;Patnaik et al., 2015).Most event-scale recession anal- The empty star identifies a local maximum that will not be selected due to the fact that the subsequent recession does not decay by an amount X before the next local maximum.The filled star is selected as a recession peak because it is at least X greater than the local minimum between it and the previously selected recession peak (square), and is followed by a flow decrease of at least X.
yses lag recession starts by at least 1 day (Patnaik et al., 2015;Bart and Hope, 2014;Biswal and Marani, 2014), although it is not clear that such lagging is necessary to enable proper interpretation of event-scale dynamics (e.g., Harman et al., 2009).Fast-flow processes, as well as slow-flow processes, may also contribute to the hypothesized dynamics which generate power law recession behavior.For example, Harman et al. (2009) postulate that heterogeneous transport timescales alone give rise to power law recession dynamics, with no restriction on the "fastest allowable" timescale.Without a priori information that surface flow processes are a significant source of runoff generation in a watershed, lagging each recession start may be unnecessary.For example, no surface flow processes have been observed at the Elder Creek watershed in our collection of study watersheds; all runoff is generated by a highly responsive perched water table system (Salve et al., 2012).While adopting either approach -lagging the recession time or not -involves some risk, we seek and analyze distinct streamflow peaks without removing any days following the recession start.

Identifying the end of a recession event (C)
A number of criteria have been used to determine the end of a recession event.Without a reliable rainfall record, many event-scale analyses halt recession extraction upon the first day where flow does not decrease, that is, as soon as dQ/dt ≥ 0 (e.g., Mutzner et al., 2013).Vogel and Kroll (1996) define the recession end as the first day of increase in the 3-day moving average of streamflow.Shaw and Riha (2012) end the extracted recession 2 days before dQ/dt changes from negative to positive following a recession start.Some studies use the inflection point of the recession curve -the first day following a rainfall event for which the hydro-graph is concave down -to identify the start of the extracted recession (Singh and Stall, 1971;Wittenberg and Sivapalan, 1999).A similar concavity criterion, paired with the requirement of decreasing flow, could also be used to define the end of a recession event.Exploring every possible combination of the above (and other) methods would lead to an intractably large number of methodological combinations.We therefore define two consensus strategies derived from the above criteria.
The first (C = 0) considers a recession as any hydrograph segment with dQ/dt < 0 following an identified peak.The second, more restrictive strategy (C = 1) requires that the raw flow time series is strictly decreasing (again, dQ/dt < 0) and classified as concave up.A recession day is classified as concave up if either the raw time series or a 3-day averaged time series is concave up -that is, if the second difference of either the raw flow time series or a smoothed flow time series is greater than or equal to zero.The inclusion of the criterion based on the 3-day moving average has the effect of including days with small "bumps" in concavity in the raw time series, while consideration of the raw time series ensures inclusion of days immediately after sharply peaked events, which are often classified as convex by the smoothed time series.This simple criterion could serve as an improvement to methods that only require dQ/dt < 0, which could inadvertently extract highly convex recessions that are likely associated with continued rainfall.

Choosing a fitting procedure (L)
Fitting methods can be broken down into one of three categories: (i) linear regression or enveloping of a binned collection of [log(Q), log(−dQ/dt)] points (e.g., Kirchner, 2009;Parlange et al., 2001); (ii) linear regression or enveloping of a raw collection of [log(Q), log(−dQ/dt)] points (e.g., Brutsaert and Nieber, 1977;Biswal and Marani, 2010); or (iii) nonlinear regression (e.g., Wittenberg, 1994).Within these three general categories, a wide variety of specific regression techniques can be applied (e.g., Thomas et al., 2015;Zecharias and Brutsaert, 1988).Importantly, many of these approaches require a large number of data points and are thus unsuitable for event-scale methods.
For event-scale recession fitting, the most popular method is to find a regression line through raw [log(Q), log(−dQ/dt)] point data corresponding to each recession event.There is evidence, however, that nonlinear fitting methods produce more consistent values for recession parameter fits (Wittenberg, 1999;Chen and Krajewski, 2016).Moreover, nonlinear techniques have been used to successfully parameterize hydrologic models (Müller et al., 2014;Dralle et al., 2016), and to avoid numerical issues associated with computing the time derivative of a flow time series (Rupp and Selker, 2006b).
For the purposes of the present study, we again frame the problem in terms of the most fundamental methodolog-ical dichotomy between linear and nonlinear fitting.Linear fitting (L = 1) is performed on the log-transformed values, [log(Q), log(−dQ/dt)].Values of the flow derivative are computed for each 2-day window (days i and i − 1, with t = 1 day) over the duration of the recession as dQ/dt = (Q i − Q i−1 )/ t, with corresponding values of Q computed as the average flow value over both days, (Brutsaert and Nieber, 1977).Nonlinear fitting (L = 0) is performed using nonlinear least squares minimization on extracted, non-transformed recession segments.

Method combination comparisons
In general, only fitted exponents can be reliably compared between different recession events (e.g., Berghuijs et al., 2016;Sawaske and Freyberg, 2014).This is a consequence of a mathematical artifact that impacts fitted values of the recession scale parameter, arising when fitting power laws to datasets with arbitrarily chosen scaling (Dralle et al., 2015).The issue can be avoided by setting the recession exponent to a fixed value (e.g., the median; Biswal and Marani, 2010), but this comes at the expense of biasing the fitted values of a due to constraints on the exponent.Dralle et al. (2015) present a technique that removes the scaling artifact from the recession scale parameter without constraint on the recession exponent.
The scale correction procedure begins by first fitting each recession curve to obtain an initial population (of size n) of recession parameters a i and b i .The flow time series is then re-scaled by a constant, Q 0 , computed as where log a is the mean of the natural logarithm of the a i , and b is the mean of the b i .Following re-scaling, the flow time series is re-fit to the power law recession model.While the recession exponent is scale-independent, the recession scale parameter will be altered by the scaling procedure in such a way as to eliminate artifactual linear correlation between log a and b.The resultant population of recession scale parameters has units of inverse time and has been shown empirically to correlate strongly with measures of catchment wetness (Dralle et al., 2015).
With this in mind, we choose three primary recession measures for comparison between recession events: the recession exponent (b), the scale-corrected (Dralle et al., 2015) recession scale parameter (a), and the recession time (T R ), defined by Stoelzle et al. (2013) as the amount of time required for flow levels to decline from the median flow to the 10th flow percentile.The measure T R , which depends on both a and b, belongs to a class of widely calculated recession timescales for the general, nonlinear form of Eq. ( 1) (e.g., Stoelzle et al., 2013;Westerberg and McMillan, 2015).
To see how methodological choices might impact the interpretation of a, b, and T R , we organize our analysis around four primary questions, which were outlined in the introduction and are detailed in the following sections.

Research question 1 -how do methodological
choices impact the overall quality of individual recession fits?
Fit quality is one measure of confidence in the estimated value for each recession measure.Testing event-scale recession theories that predict specific values for recession measures (e.g., Biswal and Marani, 2010) can be expected to be constrained by the degree of this confidence.This question looks to identify method combinations that consistently produce high-quality fits, and thus high confidence in recession parameter estimates.
We report two measures of the overall quality of recession fits as a function of combinations of method choices.First we compute the mean average percent error (abbreviated as MAPE and denoted mathematically as E MAP ) for each method combination, across all catchments.MAPE is computed as where Q i and Qi are the observed and predicted flows on the ith day following the start of the recession event.(Note that comparing goodness of fit using R 2 is not appropriate, because one of our fitting methods is nonlinear (Kvalseth, 1985).)We also report, for each method, the percentage of all fits that yield "non-physical" estimates for the recession parameters, which we define as b < 0. In all subsequent analyses, the recession parameters are filtered so that b ≥ 0 (b < 0 occurs for less than 3 % of all recession events).

2.3.2
Research question 2 -are a, b, and T R "characteristic" across various methodological choices?
That is, do catchments rank in a similar order according to different statistical measures (in the present study, the median and interquartile range) of the populations of a, b, and T R across the 16 method combinations (cf.Stoelzle et al., 2013)?
The results of comparative hydrologic studies (e.g., Bogaart et al., 2016), which rely on relative relationships between recession measures, can be expected to be affected by any methodological sensitivity demonstrated here.
While Stoelzle et al. (2013) perform lumped recession analysis and obtain single recession parameter values for each catchment and method combination, our event-scale analysis yields distributions for a, b, and T R .We therefore report measures of central tendency and variability for the computed recession measures.We choose the median as a measure of central tendency and the interquartile range as a measure of variability, both of which are robust against the biasing effects of occasional outlier fits.Following Stoelzle et al. (2013), we compute Spearman rank correlation coefficients by ranking catchments between all method combination pairs based on the following measures (recession characteristics): median(a), median(b), median(T R ), IQR(a), IQR(b), and IQR(T R ), where IQR is the interquartile range.The rank correlation can take a value between −1 and 1, where a rank correlation of 1 indicates that two method combinations produce identical rankings and a rank correlation of −1 indicates that two method combinations produce exactly opposite rankings.
Even if the absolute magnitudes of the values of a, b, and T R vary between the method combinations, these rank tests will determine whether catchments rank in the same order by the recession characteristic for all methods.Determining the consistency of such ranked comparisons has implications for efforts to develop effective metrics for catchment classification, where relative differences in recession characteristics have been used to compare or categorize catchments (e.g Bogaart et al., 2016;Mutzner et al., 2013;Guzmán et al., 2015).

2.3.3
Research question 3 -for each catchment, are the empirical frequency distributions of a, b, and T R statistically similar across method combinations, and, if not, what method choices have the greatest impact on recession parameter distributions?
Event-scale theories of the streamflow recession suggest that, beyond measures of central tendency, higher-order moments of recession parameter distributions (such as the variance) should vary in systematic ways, depending on climate or catchment physiographic properties (Biswal and Nagesh, 2014;Harman et al., 2009).By addressing research question 3, we seek to identify the methodological choices which could most significantly impact testing of event-scale recession theories.
While shifts in the Spearman rank correlation between method combinations allow a comparative analysis of the effects of method choice, they do not provide information about variations in the specific values of the recession parameters obtained by each method.To address the specific values of the recession parameters, which is important for testing theories that make such specific predictions (Biswal and Marani, 2010;Brutsaert, 1994), we therefore also explore the empirical frequency distributions of parameter populations estimated with each methodological combination.
We first illustrate general patterns of a, b, and T R , across all method combinations with Tukey box plots for a single representative catchment -the Elder Creek watershed, a tributary of the Eel River in northern California.These plots provide visual representation of the observed difference between the character of recession measure distributions for different The recessions identified by the more restrictive method will be "shared" by the two methods, in the sense that they will by definition also be identified by the less restrictive method.
Recessions identified by only the less restrictive method (d) are classified as "unshared".method combinations.However, they do not represent the absolute effect of changing individual method choices.This is because more restrictive extraction procedures produce populations of recessions that are a subset of the populations generated by less restrictive extraction measures.For example, fixing all other method choices, recessions extracted with a minimum length of 10 days must be a subset of recessions extracted with a minimum length of 4 days.This "dilutes" the true effect of the shift in choice of minimum recession length on the recession measures derived from the two resulting populations.
One way to isolate the absolute effect of a given method choice is to compare recessions that are shared between the restrictive choice and non-restrictive choice, to those that are unshared between the restrictive and non-restrictive choices.This procedure is illustrated for the minimum length choice in Fig. 5. Here, the raw streamflow data (Fig. 5a) are subjected to extraction procedures with a minimum length of 4 days (Fig. 5b) and with a minimum length of 10 days (Fig. 5c).All other method choices are fixed.Clearly the 10-day minimum length recessions are a subset of the 4-day minimum length.Two distinct groups can then be formed: a set of recessions shared between the two extractions (Fig. 5c) and a set of unshared recessions (Fig. 5d; those extracted by the minimum 4-day extraction, but not the 10-day ex-Hydrol.Earth Syst.Sci., 21, 65-81, 2017 www.hydrol-earth-syst-sci.net/21/65/2017/ traction).Differences between these disjoint "shared" and "unshared" sets of recessions embody the absolute effect of an individual method choice on a recession measure.Recession measures between the two groups should be comparable if the particular recession measure is not sensitive to the method choice.We compare shared and unshared recession measure distributions in two ways.First, for a high-level overview, we show Tukey box plots of shared vs. unshared distributions of the recession exponent (b) for a single catchment (the Elder Creek watershed) for two recession extraction choices (minimum length and concavity).We then use a two-sided Mann-Whitney U test (Mann and Whitney, 1947) to compare shared vs. unshared distributions for each recession measure across all method choices and all catchments.The null hypothesis for this non-parametric test is that the shared and unshared distributions are sampled from the same population.For a given catchment and for each method choice, we compute p values for the Mann-Whitney U test by comparing shared to unshared distributions for each of the eight combinations of the other method choices.If the test rejects the null hypothesis, then we conclude that the method choice significantly changes the distribution of the recession measure (note: we also compare populations between linear and nonlinear fitting; although it is not a "shared" vs. "unshared" comparison, the two fitting methods nonetheless produce distinct distributions for the recession measures).
For each catchment, we then rank the four method choices by the number of Mann-Whitney U tests (eight tests for each method choice) that detect significant differences between shared and unshared distributions.A method choice is assigned a higher rank if more shared/unshared comparisons detect significant differences.We use this rank as an indicator of the sensitivity of a recession measure to a given method choice.We perform this procedure for all recession measures, a, b, and T R .Since each measure requires 512 total comparisons (16 catchments × 8 tests × 4 method choices), we apply a Bonferroni correction for the critical p value of each test, which is required when a statistical test is applied many times for multiple comparisons (Abdi, 2007).For an overall level of significance of α = 0.05, the correction requires a critical p value for each test set to p = α/512.

Research question 4 -can methodological choices affect features of the observed relationship between measures of catchment wetness and the recession scale parameter?
Overall, few studies have attempted to tease apart the convergent predictions of power law recession theories.Some works informed by Biswal and Marani (2010) demonstrate a relationship between measures of antecedent catchment wetness and the power law scale parameter (e.g., Bart and Hope, 2014;Biswal and Nagesh, 2014;Patnaik et al., 2015), although explicit connections to wetted channel network expansion and contraction still require elucidation (Ghosh et al., 2016).Whatever its physical basis, we observe similar correlations between measures of antecedent wetness and the scale-corrected recession scale parameter.To demonstrate that method choices can significantly impact the quantitative nature of such emergent relationships, we explore the functional relationship between the recession scale parameter and a measure of antecedent wetness for the Elder Creek catchment for three methodological combinations.The antecedent wetness measure (W ) is computed as a weighted sum of streamflow prior to each recession event: where i is the number of days prior to the start of the recession event.The weighting coefficient, 0.95 i , is included to discount the effect of less recent events on the catchment wetness state.Following recession extraction and fitting, a regression line is fit to observed log-log linear relationships between a and W and the resulting regression slopes are compared between the three method combinations.

Results
While the lengths of record for study catchments vary from 35 to 99 years, we find that subsetting flow records and reperforming analyses does not significantly impact our findings.We also find that, at confidence level p = 0.05, approximately 6 % of the (16 catchments × 16 method combinations × 3 recession measures) = 768 populations of recession measures exhibit significant trends over time.At a confidence level of 0.05, one would expect 5 % of the tests to flag significance purely by chance.We conclude that any potential trends in recession parameters over time will have a minimal impact on the results of this study.

Recession fit quality
The box plots in Fig. 6 are generated using computed MAPE values for all fits across all catchments.The boxes are grouped using certain combinations of linearity and concavity, the two method choices found to be the strongest controllers of fit quality.There is a clear increase in fit quality associated with extraction of concave-only recessions and use of nonlinear fitting procedures.

Catchment rankings by recession characteristic
Catchments were ranked by the values of six recession characteristics -median(a), median(b), median(T R ), IQR(a), IQR(b), and IQR(T R ) -for all pairs of method combinations.The collection of corresponding Spearman rank correlations is presented as box plots in Fig. 7.We performed a thorough investigation of the rank correlations between different method combinations across all 16 study catchments but found few patterns related to individual method choices.Therefore, we present aggregated box plots of the Spearman rank correlation for each of the recession characteristics.Overall, none of the rank correlations were negative.The most "characteristic" measure, in the sense that its ability to rank catchments is least sensitive to the method choice, is median(a).

Recession measure distributions
Figure 8 presents box plots of the three recession measures across all method combinations for the Elder Creek study catchment.For any given method combination, variability in the recession measures can be significant.The recession exponent b regularly falls between b = 1 and b = 2.5, while interquartile ranges for a and T R typically span upwards of an order of magnitude.Figure 9 is provided to help illustrate the comparisons between shared and unshared distributions of recession measures.In this case, we compare recession exponent shared vs. unshared distributions for the Elder Creek watershed for the minimum recession length method choice and the concavity method choice.The light green boxes represent the distribution of the recession exponent for shared recessions, while the dark green boxes represent the unshared population of recession exponent values (extracted by only the less restrictive procedure).The horizontal axes in each subplot enumerate the eight possible combinations of the other method choices, showing how these shared and unshared distributions vary for different combinations of the other method variables.Significant differences between the shared and unshared distributions as determined by a two-sided Mann- Whitney U test are indicated with red highlighting in Fig. 9. Two of the eight unshared/shared distribution pairs are identified as significantly different for the minimum length method choice, while eight of the eight pairs are identified as different for the concavity choice.
The results from Fig. 9 feed into the larger shared vs. unshared analysis for all recession measures across all watersheds presented in Fig. 10.This figure plots outcomes of all Mann-Whitney U tests between shared and unshared distributions for each recession measure, for all method choices, and for all catchments.The Elder Creek (catchment number 11475560) recession exponent distributions in Fig. 9 correspond to the recession exponent subplot (center) of Fig. 10, row 11475560, columns M and C. In agreement with Fig. 9, the recession exponent is most significantly affected by the choice to extract only concave recessions, and less so by the minimum recession length.The strong dependence on concavity demonstrated in Fig. 9 manifests in Fig. 10 as the very dark rectangle in the concavity column of the recession exponent parameter for the Elder Creek catchment, gage number 11475560.

Catchment wetness and the recession scale parameter
Figure 11 plots the logarithm of the recession scale parameter against logarithm of the antecedent wetness variable (W ) for three method combinations.The first and second plots are less restrictive with respect to recession length and peak size; the first plot extracts concave recessions and uses nonlinear fitting, while the second plot extracts decreasing recessions with linear fitting.The third plot uses a highly selective extraction procedure and fits the recession model with nonlinear regression.All plots demonstrate a decreasing log-log linear relationship between the antecedent wetness measure and the recession scale parameter.

Recession fit quality
The pattern in Fig. 6 hints at a hierarchy of the importance of method choices in terms of their impact on the "quality" of extracted recessions and their corresponding power law fits.Specifically, we found that the concavity (C) and linearity (L) method choices roughly subdivide the results into three groups.The worst fits observed were those performed without the concavity requirement (only the decreasing requirement) and with linear regression.Fits that use concave recessions or nonlinear fitting, but not both, are of intermediate quality.The best fits are those that combined the concavity requirement with nonlinear regression.Overall, the results suggest an additive increase in goodness of fit associated with the concavity requirement and nonlinear fitting.
The finding that concavity and linearity play primary roles in determining the quality of recession fits is notable in light of the fact that minimum recession length and minimum recession peak size are more commonly emphasized as the most important methodological choices made during eventscale recession analysis (e.g., Biswal and Marani, 2010;Patnaik et al., 2015;Mutzner et al., 2013).We did not find that these method choices were important determinants of fit quality.Evidence here suggests concavity requirements and nonlinear fitting greatly improve the quality of event-scale recession analyses and that these improvements are additive when we impose these methodological choices together.In .Each cell is colored by a sensitivity rank.A cell shading of 4 (darkest) means that method choice had the highest number of significantly different shared and unshared distributions for that recession measure in that catchment, indicating that the particular recession measure is highly sensitive to the corresponding method choice.
fact, the often-used definition of flow recession -that the flow derivative is negative -could be misleading; the simple dynamical system model developed by Kirchner (2009) predicts that streamflow can decrease during precipitation events.The use of improved, flow-derived recession extraction methods, such as the concavity requirement, could reduce the frequency of "false" recession extraction, increasing the quality of recession measure estimates.
Beyond the tendency to produce lower-quality fits, the linear fitting procedures applied in the majority of recession studies have other well-documented drawbacks.Linear regression on log-transformed flow values disproportionately weights errors for smaller model values, creating a risk of bias in the fit (Miller, 1984;Pattyn and Van Huele, 1998).Linear fitting also requires computation of the flow derivative, which introduces a number of documented numerical and data quality challenges (Rupp and Selker, 2006b).The various differencing schemes that can be implemented to obtain the flow derivative (e.g., Thomas et al., 2015) add another potential source of method dependent bias in the fitting scheme.There are downsides, however, associated with nonlinear fitting (Motulsky and Ransnas, 1987).Fit bias may be introduced by the optimization algorithm used, or the necessity of specifying an initial condition for the nonlinear fitting procedure.Choice of initial values can be relatively clear for recession measures like b that can be expected to have tightly constrained values, but for other more variable recession measures, such as a, this choice could also be opaque, and differing initial conditions could lead to differing recession parameter estimations.

Recession measures are characteristic
We find that the medians and IQRs of a, b, and T R are all fairly characteristic, though to varying degrees.All rank correlations are positive, indicating that, at worst, no method combination predicts a characteristic ranking that is inverted relative to another method combination.Median(a) is more characteristic than median(b) or median(T R ), and each IQR is less characteristic than its corresponding median.One might expect that median(a) is highly characteristic because it spans many orders of magnitude, while other parameters are more tightly constrained.However, the derived measure T R also has a wide range but does not display the same level of rank stability as a.
The relatively stable ranking of catchments by recession measures has potential implications for testing event-scale recession theory.Recent work by Harman et al. (2009) hypothesizes that b can be interpreted as a measure of the diversity of water transport timescales throughout the various parts of the catchment.In this framework, measures of variability of b could be interpreted as representative of the "realizable" range of catchment states, with respect to the relative dominance of various water transit times in the catchment.Strongly characteristic measures of b suggest the potential to use the recession exponent to develop relative measures of catchment complexity, if the Harman et al. (2009) theory applies to catchment populations.
Results also provide support for application of the recession scale parameter scale-correction procedure presented by Dralle et al. (2015).Medians of the scale-corrected recession scale parameters rank catchments more consistently than all other recession characteristics.Moreover, the fact that a has units of inverse time suggests it can be interpreted physically in a manner similar to more commonly computed response timescales, such as T R (e.g., Stoelzle et al., 2013;Westerberg and McMillan, 2015).In fact, the median and IQR of T R are the least consistent catchment ranking characteristics.Considering that T R is a measure derived from both a and b, it has likely inherited catchment ranking uncertainty from both these parameters.Numerous derived recession measures have been used for comparative catchment analysis (Sawaske and Freyberg, 2014;Berghuijs et al., 2016;Stoelzle et al., 2013), and the findings here suggest a trade-off; the development of more complex derived measures comes at the risk of compounding uncertainty.

Comparing distributions of recession measures
Patterns in the recession measures for Elder Creek plotted in Fig. 8 are generally representative of the patterns observed in the other 15 study catchments.The four-step repeating pattern for b seen in Fig. 8 indicates that concavity and linearity play important roles in shifting the distributions of the recession exponent.When other methodological choices are fixed, nonlinear fitting and concavity both produce noticeably higher values for the recession exponent.Without the concave requirement, the "decreasing only" extraction procedures will produce lower values due to decreased concavity.Table 2 supports this conclusion; the concavity requirement greatly decreases the number of "non-physical" (b < 0) extracted recessions.
The recession scale parameter (a) varies most strongly with minimum length and selectivity, and the recession time (T R ) varied most strongly with linearity and concavity.The latter likely results from the fact that T R is a measure derived from the recession exponent, and thus inherits b's strong sensitivity to the concavity and linearity method choices.The degree of variability in a is comparable to eventscale recession studies that impose a fixed value on the recession exponent (e.g., Shaw and Riha, 2012).While the scale-correction procedure for a has only been applied in one previous study (Dralle et al., 2015), the median values of scale-corrected a are consistent with inverse reces- sion timescales (commonly referred to as the "recession constant") extracted from linear reservoir models (e.g., Sánchez-Murillo et al., 2015;Botter et al., 2013).The observed median values of b and T R are also consistent with those typically found in lumped recession analyses (e.g., Tague and Grant, 2004;Palmroth et al., 2010;Szilagyi et al., 2007;Wang, 2011;Stoelzle et al., 2013;McMillan et al., 2014).
A pattern of shorter whiskers from left to right in Fig. 8 shows that the variability of the recession measures decreases as extraction procedures become more restrictive.For a minimum recession length of 10 days and highly selective peak filtering (M = 1, S = 1), this decrease in variability is likely due to the fact that the collection of extracted recessions becomes less "diverse" as the extraction method becomes more restrictive, as suggested by Stoelzle et al. (2013).As compared to minimum length and peak selectivity, which had a minimal impact on fit quality (see Fig. 6), the larger variability for non-concave data paired with nonlinear fitting is due, at least in part, to more noise from persistent rainfall during the recession.This suggests that peak size and recession length data quality concerns cited by some authors (e.g., Ye et al., 2014;Shaw, 2016) could be augmented to consider fitting methods and the "quality" of the shape of extracted recessions.
Patterns displayed in Fig. 9 are largely similar to distributions of b in other watersheds.Whiskers are shorter for "shared" distributions for minimum length (as well as for selectivity, although it is not plotted here to simplify the presentation).This could be because very large storms and very long recessions, which are more likely to be shared, represent the asymptotic behavior of the catchment response associated with more extreme states.Despite this disparity in variability between shared and unshared distributions, the Mann-Whitney U test identified only two of the eight distribution pairs as significantly distinct from one another, suggesting moderate sensitivity of b to the minimum recession length choice.The second subplot displays shared and unshared distributions for the concavity method choice, which again emerges as the most important choice for determining the absolute magnitude of b.The Mann-Whitney U test identified all eight distribution pairs as significantly distinct, indicating high sensitivity of b to the concavity method choice.
Whereas Fig. 6 indicated certain method choices play an important role in determining quality of fit, Fig. 10 demonstrates that other choices could play a more important role in determining realized values of a and b.This finding makes it difficult to determine the "best" method combination.Whereas concavity and linearity were the dominant drivers of goodness of fit, it is selectivity and concavity that exert the strongest control over the distribution of b.Minimum recession length seems to exert the strongest control over the distribution of the recession scale parameter (a).Along with the inconsistencies in controls on each recession measure, we also note that some recession measures are uniformly sensitive to a given method for all catchments (e.g., concavity strongly affects b for almost all catchments), while others seem to vary between catchments.For example, linearity exerts a strong control on the distribution of b for catchment 11468500 (Noyo River), but apparently makes very little difference for catchment 11143000 (Big Sur River).

The relationship between catchment antecedent wetness and the recession scale parameter
Despite moderate sensitivity to concavity and linearity in the Elder Creek data (catchment 11475560) displayed in Fig. 10, the first two fits in Fig. 11 are not statistically different as shown by 95 % confidence intervals for the fitted slopes.The slope on the third plot differs significantly from the first two, likely due to the fact that the population of recessions originates from a highly selective extraction procedure that triggers a's strong sensitivity to the minimum recession length method choice.This highlights the potential for recession extraction bias in parameter populations; without good cause to discard smaller or shorter recessions, such choices can lead to quantitatively different interpretations of recession parameter values.In general, Fig. 11 demonstrates that quantitative validation of recession theories that predict emergent relationships between recession measures and catchment state may be hindered by uncertainty relating to methodological choices.

Methodological recommendation
Results demonstrate that nonlinear fitting and concavity significantly increase recession fit quality, while minimum length and peak selectivity do not adversely affect it.The re-cession measure a is highly sensitive to the minimum length choice, and the recession exponent is highly sensitive to concavity and moderately sensitive to peak selectivity, as demonstrated by Figs. 8 and 10.Taken all together, we conclude that this suggests an ideal combination of method choices that will both maximize fit quality and minimize bias in the "type" of recession identified for fitting: 4-day minimum length (M = 0), permissive recession peak selectivity (S = 0), concavity requirement (C = 1), and nonlinear least squares fitting (L = 0).

Conclusion
This study quantified the sensitivity of the power law streamflow recession parameters a and b to four common methodological choices made during recession extraction and fitting.
While rankings of study catchments in terms of the descriptive statistics of a and b were relatively insensitive to the methods used, individual method choices did significantly impact observed parameter distributions, though each parameter had a distinct sensitivity profile.These results highlight the importance of accounting for methodological uncertainty when performing event-scale recession analysis.
among other criteria.4. The method of power law model fitting -numerous methods for fitting the power law recession model have been developed.Most such methods involve either some form of linear regression on the log-transformed version of Eq. (1) (log[−dQ/dt] = log a + b log Q), or nonlinear regression on the solution to Eq. (1).

Figure 2 .
Figure 2. Typical, highly erratic runoff time series for northern California coastal mediterranean watersheds.

Figure 4 .
Figure 4. Illustration of the peak extraction algorithm.The square represents the most recent recession peak identified for selection.The empty star identifies a local maximum that will not be selected due to the fact that the subsequent recession does not decay by an amount X before the next local maximum.The filled star is selected as a recession peak because it is at least X greater than the local minimum between it and the previously selected recession peak (square), and is followed by a flow decrease of at least X.

Figure 5 .
Figure5.Example recession extraction from the hydrograph (a) using a less restrictive method M = 0 (b) and a more restrictive method M = 1 (c).The recessions identified by the more restrictive method will be "shared" by the two methods, in the sense that they will by definition also be identified by the less restrictive method.Recessions identified by only the less restrictive method (d) are classified as "unshared".

Figure 7 .
Figure 7. Box and whisker plots of Spearman rank correlations for all six descriptive measures of the distributions of a, b, and T R .Per characteristic, there are 15 × 16 = 240 unique pairwise comparisons between method combinations.

Figure 8 .
Figure 8. Box plots for a, b, and T R across all method combinations for Elder Creek watershed.

Figure 9 .Figure 10 .
Figure9.Box plots comparing recession exponent shared vs. unshared distributions for minimum recession length and concavity method choices for Elder Creek.Each subplot corresponds to a particular method choice; the shared boxes are generated with the b values from the recessions shared between the more and less restrictive values of the method choice for that subplot.The unshared boxes are those values of b from the recessions extracted by only the less restrictive value of the subplot method choice.The independent axis enumerates the eight combinations of the method choices other than the subplot method choice.

Figure 11 .
Figure11.The recession scale parameter plotted against antecedent catchment wetness for three method combinations, together with a linear fit for each point cloud, and a 95 % confidence interval for each fitted slope.
To constrain the problem, we address, in the sim-

Table 2 .
Fraction of recessions with non-physical recession exponent (b < 0) for each method combination.