Methods for estimating mean transit times from chemical or isotopic tracers
(such as Cl

Seasonal tracer cycles in the two-box model are very poor predictors of mean
transit times, with typical errors of several hundred percent. However, the
same tracer cycles predict time-averaged young water fractions (

In a companion paper (Kirchner, 2016, hereafter referred to as Paper 1), I pointed out that although catchments are pervasively heterogeneous, we often model them, and interpret measurements from them, as if they were homogeneous. This makes our measurements and models vulnerable to so-called “aggregation error”, meaning that they yield inconsistent results at different levels of aggregation. I illustrated this general problem with the specific example of mean transit times (MTTs) estimated from seasonal tracer cycles in precipitation and discharge. Using simple numerical experiments with synthetic data, I showed that these MTT estimates will typically exhibit strong bias and large scatter when they are derived from spatially heterogeneous catchments. Given that spatial heterogeneity is ubiquitous in real-world catchments, these findings pose a fundamental challenge to the use of MTTs to characterize catchment behavior.

In Paper 1 I also showed that seasonal tracer cycles in precipitation and
streamflow can be used to estimate the young water fraction

But real-world catchments are not only heterogeneous. They are also nonstationary: their travel-time distributions shift with changes in their flow regimes, due to shifts in the relative water fluxes and flow speeds of different flowpaths (e.g., Kirchner et al., 2001; Tetzlaff et al., 2007; Hrachowitz et al., 2010; Botter et al., 2010; Van der Velde et al., 2010; Birkel et al., 2012; Heidbüchel et al., 2012; Peters et al., 2014). This nonstationarity is more than simply a time-domain analogue to the heterogeneity problem explored in Paper 1, because variations in flow regime may alter both the transit-time distributions of individual flowpaths and the mixing ratios between them. Intuition suggests that catchment nonstationarity could play havoc with estimates of MTTs, and perhaps also with estimates of the young water fraction.

This paper explores three central questions. First, does nonstationarity
lead to aggregation errors in MTT and thus to bias or scatter in MTT
estimates derived from seasonal tracer cycles? Second, is the young water
fraction

Schematic diagram of conceptual model. Drainage from the upper and
lower boxes is determined by power functions of the storage volumes

In keeping with the spirit of the approach developed in Paper 1, here I explore the consequences of catchment nonstationarity through simple thought experiments. These thought experiments are based on a simple two-compartment conceptual model (Fig. 1). This model greatly simplifies the complexities of real-world catchments, but it is sufficient to illustrate the key issues at hand. It is not intended to simulate the behavior of a specific real-world catchment, and thus its “goodness of fit” to any particular catchment time series is unimportant. Instead, its purpose is to simulate how nonstationary dynamics may influence tracer concentrations across wide ranges of catchment behavior, and thus to serve as a numerical “test bed” for exploring how catchment nonstationarity affects our ability to infer catchment transit times from tracer concentrations. One can of course construct more complicated and (perhaps) more realistic models, but that is not the point here. The point here is to explore the consequences of catchment nonstationarity in the context of one of the simplest possible models that nonetheless exhibits a wide range of nonstationary behaviors.

The model catchment consists of two compartments, an upper box and a lower
box (Fig. 1). In typical conceptual models the upper box might represent
soil water storage and the lower box might represent groundwater, but for
the present purposes it is unnecessary to assign the two boxes to specific
domains in the catchment. The upper box storage

Discharge from both boxes is assumed to be non-age-selective, meaning that
discharge is taken proportionally from each part of the age distribution;
thus, the flow from each box will have the same tracer concentration, the
same young water fraction

The model is solved on a daily time step, using a weighted combination of the partly implicit trapezoidal method (for greater accuracy) and the fully implicit backward Euler method (for guaranteed stability). Details of the solution scheme are outlined in Appendix A.
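The structure described above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the model of Appendix A. It assumes that drainage from each box has the form k (S/S_ref)^b. It also assumes that a fraction eta of upper-box drainage recharges the lower box while the rest goes to the stream, and it treats both boxes as well mixed. Instead of the weighted trapezoidal/backward Euler scheme, it uses a plain backward Euler step, solved by bisection. All function names and parameter values (`ku`, `kl`, `Su_ref`, `Sl_ref`, `bu`, `bl`, `eta`) are hypothetical.

```python
import numpy as np


def implicit_step(S_old, inflow, k, S_ref, b, iters=60):
    """One backward-Euler step of dS/dt = inflow - k*(S/S_ref)**b (dt = 1 day).

    Solves S = S_old + inflow - k*(S/S_ref)**b by bisection; the residual
    increases monotonically with S, so the root lies in [0, S_old + inflow].
    Returns the new storage and the drainage over the step.
    """
    lo, hi = 0.0, S_old + inflow
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid - S_old - inflow + k * (mid / S_ref) ** b > 0.0:
            hi = mid
        else:
            lo = mid
    S_new = 0.5 * (lo + hi)
    return S_new, S_old + inflow - S_new  # drainage closes the water balance


def two_box(P, Cp, eta=0.5, ku=2.0, Su_ref=100.0, bu=10.0,
            kl=1.0, Sl_ref=1000.0, bl=20.0):
    """Hypothetical two-box sketch driven by daily precipitation P (mm/day)
    with tracer concentration Cp. Returns streamflow, its tracer
    concentration, and the final storages of both boxes."""
    n = len(P)
    Su, Sl = Su_ref, Sl_ref           # start at the reference storages
    Cu = Cl = float(np.mean(Cp))      # start at the mean input concentration
    Q, Cq = np.zeros(n), np.zeros(n)
    for t in range(n):
        # Upper box: rain mixes into storage (well mixed), then drains.
        Su_new, Lu = implicit_step(Su, P[t], ku, Su_ref, bu)
        Cu = (Su * Cu + P[t] * Cp[t]) / (Su + P[t])
        Su = Su_new
        # Fraction eta of upper-box drainage recharges the lower box.
        Qu, R = (1.0 - eta) * Lu, eta * Lu
        Sl_new, Ll = implicit_step(Sl, R, kl, Sl_ref, bl)
        Cl = (Sl * Cl + R * Cu) / (Sl + R)
        Sl = Sl_new
        # Streamflow is the flow-weighted mixture of the two outflows.
        Q[t] = Qu + Ll
        Cq[t] = (Qu * Cu + Ll * Cl) / Q[t]
    return Q, Cq, Su, Sl
```

In each step, the tracer is mixed into the box before the box drains at the mixed concentration. This bookkeeping conserves both water and tracer mass. Because each update is implicit, the step stays stable even for strongly nonlinear drainage (large exponents such as the hypothetical `bl=20.0` here), and that stability is the motivation the text gives for including the backward Euler component.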

Excerpts of daily precipitation records used to drive the model:

The drainage coefficients

The storages are initialized at the reference values

Here I drive the model with three different real-world rainfall time series,
representing a range of climatic regimes: a humid maritime climate with
frequent rainfall and moderate seasonality (Plynlimon, Wales; Köppen
climate zone Cfb), a Mediterranean climate marked by wet winters and very
dry summers (Smith River, California, USA; Köppen climate zone Csb), and
a humid temperate climate with very little seasonal variation in average
rainfall (Broad River, Georgia, USA; Köppen climate zone Cfa). Figure 2
shows the contrasting frequency distributions and seasonalities of the three
rainfall records. The Plynlimon rain gauge data were provided by the Centre
for Ecology and Hydrology (UK), and the Smith River and Broad River
precipitation data are reanalysis products from the MOPEX (Model Parameter Estimation Experiment) project (Duan et al.,
2006;

The model used here resembles many other conceptual models in its overall structure (e.g., Benettin et al., 2013), although with several simplifications. Although the model is typical in many respects, I will use it in an unusual way. Usually, one calibrates a model to reproduce the behavior of a real-world catchment and then draws inferences about that catchment from the parameters and behavior of the calibrated model. Here, however, the model is not intended to represent any particular real-world system. Instead, the model itself is the system under study, across wide ranges of parameter values, because the goal is to gain insight into how nonstationarity affects general patterns of tracer behavior. Thus, the fidelity of the model in representing any particular catchment is not a central issue.

For the simulations shown here, the drainage exponents

My main purpose is to use the simple two-box model to explore how catchment nonstationarity affects our ability to infer water ages from tracer time series. I will take up that issue beginning in Sect. 3.3. As background for that analysis, however, it is helpful to first characterize the nonstationary behavior of the simple model system.

Illustrative time series from the two-box model, using the
reference parameter set and the Smith River (Mediterranean climate)
precipitation time series. Responses to precipitation events

Figure 3 shows excerpts from the time series generated by the model with the
Smith River (Mediterranean climate) precipitation time series and the
reference parameter set. One can immediately see that the upper and lower
boxes have markedly different mean ages (Fig. 3e), young water fractions
(Fig. 3d), and tracer concentrations (Fig. 3c), which also vary differently
through time. Tracer concentrations in the upper box (the orange line in
Fig. 3c) show a blocky, irregular pattern, remaining almost constant during
periods of little rainfall, and then changing rapidly when the box is
episodically flushed by large precipitation events. The lower box's tracer
concentrations (the red line in Fig. 3c) are much more stable than the upper
box's, because its mean residence time is roughly 40 times longer
(

In this regard, the most striking feature of Fig. 3 is the volatility of the tracer concentrations, young water fractions, and mean transit times in discharge (the dark blue lines in Fig. 3c–e), as the mixing ratio between the two boxes (Fig. 3b) shifts in response to precipitation events. This mixing ratio is not a simple function of discharge (Fig. 4c); instead it is both hysteretic and nonstationary, varying in response both to precipitation forcing and to the antecedent moisture status of the two boxes (and thus to the prior history of precipitation). This dependence on prior precipitation reflects the fact that the boxes typically retain their water age and tracer signatures over timescales much longer than the timescale of hydraulic response, because their residual storage is large compared to their dynamic storage (see Sect. 3.2). As a result, both the young water fraction and mean age of discharge and storage are widely scattered functions of discharge (Fig. 4a, b). Likewise, there is no simple relationship between either the young water fraction or mean age in storage and the corresponding quantities in discharge (Fig. 4d), although there is a strong overall bias toward water in discharge being much younger than the average water in storage.

Even though drainage from each box is non-age-selective (that is, the young water fraction and mean age in drainage from each box are identical to those in storage), this is emphatically not true at the level of the two-box system, because the two boxes account for different proportions of discharge than of storage. Furthermore, because the fractional contributions to streamflow from the (younger, smaller) upper box and the (older, larger) lower box are highly variable, the water age and young water fraction in discharge are not only strongly biased, but also highly scattered, indicators of the same quantities in storage (Fig. 4d).
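A small calculation makes this weighting argument concrete. The box mean ages used here are the age-tracking mean transit times quoted in the perturbation analysis (0.03 and 1.44 years). The storages and discharges are hypothetical numbers chosen only for illustration, and the function name is likewise hypothetical.

```python
import numpy as np


def weighted_mean_ages(storage, discharge, box_age):
    """Mean age of whole-system storage vs. whole-system discharge.

    Each box drains non-age-selectively, so its outflow has the same mean
    age as its storage; the two system-level means differ only because the
    boxes are weighted by storage in one case and by discharge in the other.
    """
    S = np.asarray(storage, float)
    Q = np.asarray(discharge, float)
    a = np.asarray(box_age, float)
    return np.sum(S * a) / np.sum(S), np.sum(Q * a) / np.sum(Q)


# Hypothetical snapshot: a small, young upper box and a large, old lower box
# (storage in mm, discharge in mm/day, mean ages in years).
age_storage, age_discharge = weighted_mean_ages(
    storage=[50.0, 2000.0], discharge=[3.0, 2.0], box_age=[0.03, 1.44])
```

With these numbers, the discharge-weighted mean age is about 0.59 years, whereas the storage-weighted mean age is about 1.41 years, even though neither box is age-selective.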

The aggregate long-term implications of these dynamics are evident in the marginal (time-averaged) age distributions of storage and discharge (Fig. 5). From Fig. 5 it is immediately obvious that the age distributions in discharge are strongly skewed toward young ages, compared to the age distributions in storage, both for each box individually and for the catchment as a whole. This skew toward young ages arises for two main reasons. First, although drainage from each box is not age-selective, more outflow occurs during periods of stronger precipitation forcing and thus shorter residence times. Consequently, the average ages of the outflow and the storage can differ greatly. Second, under high-flow conditions a larger proportion of discharge is derived from the upper box (which has a relatively short transit time), and at base flow more discharge is derived from the lower box (which has a larger volume and a relatively long transit time). Thus, the short-transit-time components of the system dominate the discharge, while the long-transit-time components of the system dominate the storage. As a result, the mean age in discharge will generally be much younger than the mean age in whole-catchment storage, and likewise the young water fraction in discharge will be much larger than the young water fraction in storage. Note that this is the opposite of what one would expect from conceptual models like those of Botter (2012), in which the mean water age in discharge either equals the mean age in storage (for well-mixed systems) or is older than the mean age in storage (for piston-flow systems).

Daily values of young water fractions

Marginal (time-averaged) age distributions in storage

More generally, and more importantly, these results imply that estimates of
water age in streamflow cannot be translated straightforwardly into
estimates of water age in storage. Instead, they may underestimate the age
of water in storage by large factors, although in the particular example
shown in Fig. 5, the difference is only about a factor of 2. Three closely
related theoretical functions have recently been proposed to quantify the
long-recognized (Kreft and Zuber, 1978) disconnect between the age
distributions in storage and in discharge. These include the time-dependent
StorAge Selection (SAS) function

A further implication of the analysis above is that the marginal age
distributions are not exponential, even for individual boxes, and even
though drainage from each box is not age-selective. In steady state,
non-age-selective drainage (i.e., the well-mixed assumption) would yield an
exponential distribution of ages in the upper box and in the short-time age
distribution in streamflow. However, when the system is not in steady state
and we aggregate its behavior over time, we are combining different age
distributions from different moments in time with different precipitation
forcing. This creates an aggregation error in the time domain, in the sense
that the steady-state approximation will be a misleading guide to the
non-steady-state behavior of the system,
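A short calculation can check this time-domain aggregation argument. As a deliberately idealized sketch, suppose the marginal distribution is a flux-weighted mixture of exponential distributions whose means differ from one moment to the next. The weights and means below are hypothetical. An exponential distribution has a coefficient of variation (CV) of exactly 1. For a mixture, CV² = 2E[T²]/E[T]² − 1, and Jensen's inequality implies this is greater than 1 whenever the component means differ. Such a mixture therefore cannot be exponential.

```python
import numpy as np


def exponential_mixture_cv(weights, means):
    """Coefficient of variation of a weighted mixture of exponential
    distributions with the given means (each component alone has CV = 1)."""
    w = np.asarray(weights, float)
    w = w / w.sum()                  # normalize the (flux) weights
    T = np.asarray(means, float)
    m1 = np.sum(w * T)               # mean of the mixture
    m2 = np.sum(w * 2.0 * T ** 2)    # second moment: E[tau^2] = 2*T^2 per component
    return np.sqrt(m2 - m1 ** 2) / m1


cv_mixed = exponential_mixture_cv([0.5, 0.5], [0.1, 1.0])   # about 1.53
cv_single = exponential_mixture_cv([0.3, 0.7], [0.5, 0.5])  # equal means: CV = 1
```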

One can further explore these issues by examining the marginal (time-averaged) age distributions for separate ranges of discharge (Fig. 6). Figure 6 shows that at higher discharges, age distributions in streamflow are much more strongly skewed toward younger ages, reflecting the increased dominance of the upper box at higher flows. For the upper half of all discharges, the age distributions are more skewed than exponential; that is, they plot as upward-curving lines in Fig. 6b. For the top 25 % of discharges, water ages follow approximate power-law distributions, plotting as nearly straight lines in Fig. 6c. The slopes of these lines are steeper than 1, however, implying that the distributions must deviate from this trend at very short ages; otherwise their integrals (i.e., their cumulative distributions) would become infinite. It is important to note that the mean ages quoted in Fig. 6a imply that the tails of all the distributions extend far beyond the plot axes, which are truncated at 90 days. Note also that the distributions shown in Fig. 6 have different shapes in different flow regimes, suggesting that the model's high-flow behavior is not simply a re-scaled transform of its low-flow behavior.
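The divergence mentioned above follows from integrating a power law p(τ) ∝ τ^(−k) with k > 1 from a small lower limit ε:

```latex
\int_{\varepsilon}^{T} \tau^{-k}\,d\tau
  = \frac{\varepsilon^{1-k} - T^{1-k}}{k-1}
  \;\longrightarrow\; \infty
  \quad \text{as } \varepsilon \to 0, \qquad k > 1 .
```

The same integral also diverges, logarithmically, for k = 1. A power-law segment with a slope steeper than 1 in magnitude can therefore belong to a normalizable distribution only if the distribution flattens at short ages.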

The model's complex, nonstationary water age and tracer dynamics arise from the disconnect between the timescales of hydraulic response and catchment storage in each box, and from the divergence in both these timescales between the two boxes. These contrasting timescales can be estimated through simple scaling and perturbation analyses, as outlined in this section.

Total catchment storage consists of two components: the dynamic storage that is linked to discharge fluctuations through storage–discharge relationships like Eqs. (6)–(7), plus the residual or “passive” storage that remains when discharge has declined to very slow rates. The range of dynamic storage exerts an important control on timescales of catchment hydrologic response, while the much larger residual (or “passive”) storage has little effect on water fluxes but is an essential control on residence times (Kirchner, 2009; Birkel et al., 2011).

In real-world catchments, sharply nonlinear storage–discharge relationships
(Kirchner, 2009) guarantee that dynamic storage will be small compared to
residual storage. This behavior is mirrored in the model, where if
Eqs. (6) and (7) are strongly nonlinear (i.e., if the drainage exponents

Marginal (time-averaged) transit-time distributions (TTDs) for
selected ranges of daily discharges in the two-box model, with the reference
parameter set and Smith River (Mediterranean climate) precipitation forcing,
on linear

One can express this concept more quantitatively (though only approximately)
using a simple perturbation analysis. A first-order Taylor expansion of Eqs. (6)
and (7) shows directly that the fractional variability in drainage rates
and storage are related by the drainage exponents in the two boxes:
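As a sketch of this step, suppose that Eqs. (6) and (7) have the generic power-law form Q = kS^b suggested by the Fig. 1 caption. Here k and b are generic symbols, standing in for either box. The first-order expansion is then:

```latex
Q = k\,S^{b}
\;\Longrightarrow\;
\frac{dQ}{dS} = b\,k\,S^{b-1} = b\,\frac{Q}{S}
\;\Longrightarrow\;
\frac{\Delta Q}{Q} \approx b\,\frac{\Delta S}{S}
```

To first order, then, a 1 % change in storage accompanies a b % change in drainage. For large b, even large fractional swings in drainage require only small fractional changes in storage.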

The perturbation analysis also yields estimates for the timescale of
hydraulic response (which controls how “flashy” the discharge will be),
through a rearrangement of Eqs. (8) and (9) as follows:

Equations (12) and (13) imply that the mean transit times in the upper and
lower boxes should be roughly 13 days (or 0.036 years) and 529 days (or
1.45 years), respectively, in good agreement with the mean transit times of 0.03 and
1.44 years determined from age tracking (Fig. 5d). However, Eqs. (10) and (11)
imply that these transit times will differ by factors of 10 and 20 (the
values of
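The same expansion can be rearranged into a hydraulic response timescale, dS/dQ = S/(bQ). This is the turnover time S/Q divided by b. The worked example below applies this relation to the turnover times quoted above. It assumes, as an inference from the truncated text rather than a stated fact, that the factors of 10 and 20 are the drainage exponents of the upper and lower boxes. The function name is hypothetical.

```python
def hydraulic_timescale(turnover_time, b):
    """First-order hydraulic response time dS/dQ = S/(b*Q) of a power-law
    store Q = k*S**b, i.e., its turnover time S/Q divided by b."""
    return turnover_time / b


# Turnover times from the text (days); exponents 10 and 20 are assumed.
tau_upper = hydraulic_timescale(13.0, 10.0)    # 1.3 days
tau_lower = hydraulic_timescale(529.0, 20.0)   # 26.45 days
```

Under these assumptions, storage in the two boxes turns over in roughly 13 and 529 days. Each box, however, responds hydraulically within roughly 1 and 26 days.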

The analysis above shows that the simple two-box model gives hydrograph and tracer behavior that is complex and nonstationary (Figs. 3–6). Furthermore, even this simple five-parameter model exhibits strong equifinality (Appendix B). Much of this equifinality can be alleviated (compare Figs. B1 and B2) through parameter transformations based on the perturbation analysis outlined above. However, because the timescales of catchment storage and hydraulic response are controlled by different combinations of parameters, parameter calibration to the hydrograph cannot constrain the storage volumes or streamwater age (Figs. B2, B3). These model results demonstrate general principles that have been recognized for years: (a) the hydrograph responds to and, thus, can help to constrain dynamic storage but not passive storage; and (b) because passive storage is often large, timescales of hydrologic response and catchment water storage are decoupled from one another, such that water ages cannot be inferred from hydrograph dynamics. Thus, for understanding how catchments store and mix water, tracer data are essential.

But how should these tracer data be used? One approach is to explicitly include tracers in a catchment model and calibrate that model against both the hydrograph and the tracer chemograph (e.g., Birkel et al., 2011; Benettin et al., 2013; Hrachowitz et al., 2013). The usefulness of that approach depends on whether the model parameters can be constrained and, more importantly, whether the model structure adequately characterizes the system under study (whose true structure is usually unknown, and possibly unknowable). Except in multi-model studies, it will be unclear how much the conclusions depend on the particular model that was used and on the particular way that it was fitted to the data. Furthermore, adequate tracer data for calibrating such models are rare, particularly because dynamic models require input data with no gaps. The mismatch between model complexity and data availability means that, in some cases, all of the data are used for calibration, so validation must be skipped, leaving the reproducibility of the model results unclear (e.g., Benettin et al., 2015).

For all of these reasons, there will be an ongoing need for methods of
inferring water ages that have modest data requirements and that are not
dependent on specific model structures and parameters. Sine-wave fitting of
seasonal tracer cycles, for example, is not based on a particular
mechanistic model but, instead, is based on a broader conceptual framework in
which stream output is some convolution of previous precipitation inputs.
That premise is of course open to question but, nevertheless, seasonal tracer
cycles (of, e.g.,

As detailed more fully in Paper 1, the seasonal tracer cycle method is based
on the principle that when one convolves a sinusoidal tracer input with a
TTD, one obtains a sinusoidal output that is
damped and phase-lagged by an amount that depends on the shape of the TTD
and also on its scale, as expressed, for example, by its MTT.
Conventionally one assumes an exponential TTD, which is the
steady-state solution for a well-mixed reservoir. More generally, one might
assume that transit times are gamma-distributed, recognizing that the
exponential distribution is a special case of the gamma distribution (with
the shape factor
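For a gamma TTD, the amplitude damping and phase lag follow from the Fourier transform of the gamma density, (1 + i2πfβ)^(−α). The sketch below evaluates them. Function and argument names are hypothetical; f is in cycles per year and the scale β is in years, so that MTT = αβ. Setting α = 1 recovers the exponential case, with amplitude ratio 1/√(1 + (2πfβ)²) and phase lag arctan(2πfβ).

```python
import numpy as np


def gamma_tracer_response(alpha, beta, f=1.0):
    """Amplitude ratio and phase lag (radians) of a sinusoidal tracer cycle
    of frequency f after convolution with a gamma TTD (shape alpha, scale
    beta; mean transit time alpha*beta)."""
    w = 2.0 * np.pi * f
    amplitude_ratio = (1.0 + (w * beta) ** 2) ** (-alpha / 2.0)
    phase_lag = alpha * np.arctan(w * beta)
    return amplitude_ratio, phase_lag


# Exponential special case (alpha = 1) with 2*pi*f*beta = 1:
A, lag = gamma_tracer_response(1.0, 1.0 / (2.0 * np.pi))  # 1/sqrt(2), pi/4
```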

The procedure is as follows. One first measures the amplitudes and phases of
the seasonal tracer cycles in precipitation and streamflow using
Eqs. (4)–(6) of Paper 1. If one assumes an exponential TTD, one can estimate the
MTT directly from the amplitude ratio
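The sketch below illustrates these steps. It fits each seasonal cycle by linear least squares on cosine, sine and constant terms, which is one common way to measure amplitude and phase; it is not necessarily identical to Eqs. (4)–(6) of Paper 1. For an exponential TTD with mean T, the streamflow cycle is damped by 1/√(1 + (2πfT)²) and lagged by arctan(2πfT). T can therefore be recovered either from the amplitude ratio or from the phase shift. Function names are hypothetical; time t is in years and f is 1 cycle per year.

```python
import numpy as np


def fit_seasonal_cycle(t, c, f=1.0):
    """Least-squares fit of c(t) = a*cos(2*pi*f*t) + b*sin(2*pi*f*t) + k.

    Returns (amplitude, phase) with amplitude = sqrt(a^2 + b^2) and
    phase = atan2(b, a), so that c(t) ~ amplitude*cos(2*pi*f*t - phase) + k.
    """
    w = 2.0 * np.pi * f
    X = np.column_stack([np.cos(w * t), np.sin(w * t), np.ones_like(t)])
    (a, b, _), *_ = np.linalg.lstsq(X, c, rcond=None)
    return np.hypot(a, b), np.arctan2(b, a)


def exponential_mtt(A_p, phi_p, A_s, phi_s, f=1.0):
    """MTT implied by an exponential TTD, estimated two ways:
    from the amplitude ratio and from the phase shift."""
    w = 2.0 * np.pi * f
    mtt_from_amplitude = np.sqrt((A_p / A_s) ** 2 - 1.0) / w
    mtt_from_phase = np.tan(phi_s - phi_p) / w
    return mtt_from_amplitude, mtt_from_phase
```

With noise-free synthetic cycles generated by an exponential TTD, both estimates recover the true T. The amplitude-based estimate requires the streamflow amplitude to be smaller than the precipitation amplitude.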

Paper 1 shows that both of these MTT measures are extremely vulnerable to
aggregation bias in spatially heterogeneous catchments. Therefore, Paper 1
proposes an alternative measure of travel times: the young water fraction

Young water fractions (

One can use seasonal tracer cycles to infer the young water fraction
following either of two strategies. As shown in Sect. 4.1 of Paper 1, in
many situations

These methods for inferring the young water fraction

Figure 7 shows the true young water fractions

One additional complication in nonstationary situations, compared to the
time-invariant examples explored in Paper 1, is that the young water
fraction

The underestimation bias in

Panels g–i of Fig. 7 compare the MTT in streamflow with estimates of MTT as they
are conventionally calculated, that is, from the seasonal tracer cycle
amplitude assuming an exponential TTD. These plots show that these
conventional estimates are subject to a strong underestimation bias, which
can exceed an order of magnitude. Some of the MTT estimates do fall close to
the 1 : 1 line, but these are mostly cases in which the partition
coefficient

The implication of Fig. 7g–i (and of Paper 1) is that many of the MTT values in the literature are likely to be underestimated by large factors and, thus, that real-world catchment MTTs are likely to be much longer than we thought. This observation raises the question: where is all that water being stored? In steady state, the storage volume must equal the discharge multiplied by the MTT (see Sect. 3.2). Thus, if we have been underestimating MTTs by large factors, then we have also been underestimating catchment storage volumes by similar multiples. Where is the storage volume that can accommodate all this water?

One possible answer is that in a non-steady-state system, the MTT decreases
with increasing discharge (e.g., Fig. 4b), and the storage volume equals the
discharge multiplied by the

It is important to recognize that the predicted

Figures 3 and 4 show that high-flow periods are characterized by shorter mean transit times and higher young water fractions, reflecting the increased dominance of drainage from the upper box with its younger water ages. Although instantaneous transit-time distributions (TTDs) can be highly variable and, thus, instantaneous mean transit times and young water fractions can exhibit scattered relationships with discharge (Fig. 4), the marginal (time-averaged) TTDs in Fig. 6 clearly show a systematically stronger skew toward younger water ages in higher ranges of streamflow. Thus, as Fig. 6 shows, the TTD varies in shape, not just in scale, between different flow regimes.

This observation leads naturally to the question of whether these variations in TTDs are also reflected in streamflow tracer concentrations and whether those tracer signatures can be used to draw inferences about the TTDs that characterize individual flow regimes. Figure 3 shows that high-flow periods typically exhibit wider variations in tracer concentrations, reflecting greater contributions from the upper box, which has shorter residence times and thus more labile tracer concentrations than the lower box does. To test how systematic these variations in concentrations are, I ran the model with the reference parameter set and Plynlimon (temperate maritime) precipitation forcing and separated the resulting time series into six discharge ranges. Figure 8 shows these six discharge ranges and the corresponding tracer concentrations in dark blue, superimposed on the entire discharge and concentration time series in light gray. As Fig. 8 shows, seasonal tracer cycles at higher flows are systematically less damped and less phase-shifted (relative to the tracer cycle in precipitation, shown by the dotted gray line), implying shorter MTTs and larger young water fractions.

To test whether these changes in the seasonal tracer cycles are
quantitatively consistent with the shifts in water age across the six flow
regimes, I fitted sinusoids separately to the tracer concentrations in each
individual discharge range (Fig. 8). I compared these with a single sinusoid
fitted to the entire precipitation tracer time series (because it is not
possible to assign discrete precipitation events to individual discharge
ranges). From the resulting amplitude ratios and phase shifts for each
discharge range, I then estimated

To test whether this result is general, I repeated this thought experiment
for 200 random parameter sets and all three precipitation drivers. The
results are shown in Fig. 10, with each discharge range plotted in a
different color. The colors overlap because the discharge ranges,

Daily discharges (left panels) and tracer concentrations (right panels) in streamflow from the two-box model with reference parameter values and Plynlimon precipitation forcing. Individual discharge ranges and corresponding tracer concentrations are highlighted in dark blue. In the right-hand panels, precipitation tracer concentrations are shown by dashed gray lines and sinusoidal fits to streamflow tracer concentrations are shown in light blue. At higher discharges, tracer cycles are less damped and less phase-shifted, indicating greater fractions of young water in streamflow.

Paper 1 explored whether mean travel times and young water fractions can be
reliably inferred from tracer dynamics in spatially heterogeneous (but
stationary) catchments, composed of diverse subcatchments with different
(but time-invariant) TTDs. The sections above have presented a similar
analysis for nonstationary (but spatially homogeneous) catchments. However,
real-world catchments are not

Time-averaged, flow-specific young water fractions

Young water fractions (

Scheme for simulating spatially heterogeneous catchments with
nonstationary tributary subcatchments. A single precipitation time series

As illustrated in Fig. 11, I ran eight copies of the nonstationary model developed in Sect. 2, representing eight different tributaries, each with a different, randomly chosen parameter set. I chose eight tributaries as a compromise between adequate complexity and heterogeneity on the one hand and computational efficiency on the other. I supplied the same precipitation forcing (Fig. 11a) to all eight models (Fig. 11b) to simulate the behavior of the eight hypothetical tributary streams (Fig. 11c). I then simulated the merging of these streams by averaging their discharges and taking volume-weighted averages of their tracer concentrations, young water fractions, and water ages (Fig. 11d). Because the instantaneous flows from the eight tributaries vary differently through time, their mixing ratios also fluctuate. The individual random parameter sets create a wide range of model structures at the whole-catchment level, since the eight parallel subcatchments in Fig. 11 jointly comprise a 16-box, 40-parameter model incorporating wide ranges of large and small reservoirs with varying degrees of nonlinearity.
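The merging step can be written compactly. In this sketch, rows are tributaries and columns are days. As described above, merged discharge is the across-tributary mean, which implicitly treats the tributaries as having equal drainage areas (an assumption of this sketch). Any concentration-like quantity is averaged using that day's discharges as weights. The function name is hypothetical.

```python
import numpy as np


def merge_tributaries(Q, C):
    """Merge tributary time series (rows = tributaries, columns = days).

    Returns the mean discharge across tributaries and the discharge-weighted
    (volume-weighted) average of C, which may be a tracer concentration,
    a young water fraction, or a water age."""
    Q = np.asarray(Q, float)
    C = np.asarray(C, float)
    return Q.mean(axis=0), (Q * C).sum(axis=0) / Q.sum(axis=0)


Q_merged, C_merged = merge_tributaries([[1.0, 1.0], [3.0, 1.0]],
                                       [[10.0, 20.0], [20.0, 10.0]])
```

Because the weights change from day to day, the merged concentration can vary even when no individual tributary's concentration does.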

In any spatially heterogeneous catchment (which is to say, any real-world catchment), one will typically only have observations from the merged whole-catchment streamflow (i.e., the blue time series in Fig. 11d). One will typically have no information about the behavior of the individual tributaries (i.e., the colored time series in Fig. 11c), and if one did, then those tributaries would themselves have their own spatially heterogeneous tributary streams or flowpaths, and so on. Thus, the heterogeneity of any real-world catchment will remain poorly quantified (and possibly even unrecognized), and rigorously reductionist attempts to fully characterize such complex multiscale heterogeneity would be impractical.

Thus, we face the problem: how much can we infer from the behavior of the
merged whole-catchment streamflow, given that it originates from processes
that are heterogeneous and nonstationary (to a degree that is unknown and
unknowable)? Figure 12 explores this general question in the specific
context of young water fractions and mean travel times, presenting results
from 200 iterations of the heterogeneous nonstationary model shown in Fig. 11
with all three precipitation drivers. In Fig. 12 the merged streamflow is
separated into discrete flow regimes, following the approach outlined in
Sect. 3.4. As Fig. 12 shows,

Figure 12 is analogous to Fig. 10, with the difference that Fig. 10 shows
model runs for individual random parameter sets, whereas Fig. 12 shows
results from eight runs merged together. Merging the model outputs will tend
to average out the idiosyncrasies of the individual parameter sets, which is
why the clusters of points in Fig. 12 are more compact than the
corresponding point clouds in Fig. 10. As a result, the individual discharge
ranges overlap less in Fig. 12 than in Fig. 10. The compact scatterplots
in Fig. 12 show only small deviations from the 1 : 1 line for estimates
of the young water fraction

The results reported above, together with the results reported in Paper 1,
show that unlike mean transit times, young water fractions can be estimated
reliably from seasonal tracer cycles in catchments that are spatially
heterogeneous, nonstationary, or both. These findings then raise the obvious
question: we can measure young water fractions reliably, but what are they
good for? One answer is that young water fractions can be considered as a
catchment characteristic, analogous (but far from equivalent) to MTT. In
theory MTT should be particularly useful as a catchment descriptor, because
the MTT times the mean annual discharge yields the total catchment storage.
But because estimates of MTT will often be substantially in error, estimates
of catchment storage derived from MTT are likely to be equally unreliable.
If the shape of the TTD were known, of course,
there would be a clear functional relationship between MTT and

Because the young water fraction is indifferent to the age of the older
water, it cannot be used to estimate residual storage. What

One can use

Actual and inferred young water fractions (

Correlations between flow-weighted young water fractions

Concentrations of reactive chemical species as functions of
discharge (left panels), young water fractions (middle panels), and
reciprocal young water fractions (right panels) for streams draining three
contrasting catchments at Plynlimon, Wales. Symbols show means for 20 %
intervals of each catchment's discharge distribution, and error bars
indicate

The young water fraction

Figure 14 illustrates a preliminary proof of concept for this approach,
based on 20–28 years of weekly precipitation and streamflow samples from
three catchments at Plynlimon, Wales (Neal et al., 2011) with contrasting
geochemical behavior. I separated the streamflow samples into five discharge
ranges (lowest 20 %, next 20 %, and so on), then fitted the
seasonal chloride concentration cycles in each discharge range and
calculated the corresponding young water fractions using the approach
outlined in Sect. 3.4. I then examined the relationships between these
young water fractions and the mean streamwater concentrations of reactive
chemical species in each discharge range. Figure 14 shows three different
views of how reactive tracer chemistry varies with discharge across the
three catchments. The left-hand panels show the average concentrations in
each discharge range, as functions of the logarithm of discharge. The middle
panels show the same concentrations as functions of the inferred

The three catchments are characterized by contrasts in soil hydrology, with
the abundance of impermeable gley soils and boulder clay tills increasing in
the rank order Hafren

It is tempting to interpret the concentration differences between the young and old end-members as reflecting chemical kinetics, but this should be approached with caution. A kinetic interpretation makes sense if the young and old end-members differ only in age (albeit by an unspecified amount since we cannot know how old the “old” end-member is), but not if they differ in other respects as well. At Plynlimon, for example, porewaters in the acidic soil layers have relatively high concentrations of aluminum and transition metals, and relatively low concentrations of base cations and silica, whereas waters infiltrating deep into the fractured bedrock react with calcite and layer lattice silicates and thus become enriched in base cations and silica, and depleted in aluminum and transition metals (Neal et al., 1997). Thus, one must also consider the alternative hypothesis that the young end-member represents mostly soil water, that the old end-member represents mostly deeper groundwater, and that the two end-members exhibit different chemistry because of their sources rather than their ages. In this case, the end-member compositions identified through plots like Fig. 14 may help in characterizing the chemistries, and thus localizing the physical sources, of the young and old waters. In this proof-of-concept example, all three catchments appear to have geochemically similar young water end-members, with a composition suggesting a shallow soil source, but each has a different old water end-member, suggesting deeper groundwater sources with differing amounts of carbonate minerals. This is consistent with independent geochemical evidence at Plynlimon (Neal et al., 1997).

It is also important to note that if the ideal end-member mixing assumptions
hold (i.e., the young and old end-members are invariant, and the mixture
undergoes no further chemical reactions), then the mixing relationships in
the middle plots of Fig. 14 should be straight lines, and they should
extrapolate to physically realistic (non-negative) concentrations at both

It is important to recognize that the inferred young water fractions

When these results are applied in practice, however, one must keep in mind
that in contrast to typical field studies, these thought experiments are
based on synthetic data sets that are dense (daily measurements for
10 years) and error-free. Furthermore, these thought experiments use a
sinusoidal precipitation tracer signal that varies only seasonally, with no
confounding variation on shorter or longer timescales. Further benchmark
tests will be needed to assess the accuracy of

One can of course also question the realism of the particular model that I
have used for these thought experiments. This model can be calibrated to
reproduce the stream discharge with a Nash–Sutcliffe efficiency (NSE) of better
than 0.85 at two of the three sites, but there is no guarantee that it is
getting the right answer for the right reasons. All models – whether lumped
conceptual models or “physically based” spatially explicit models – necessarily
involve approximations and simplifications. In plain language:
any model, including this one, incorporates assumptions that are false and
are known to be false. One obvious idealization (a less euphemistic word
would be

What is different, however, is that here the model is used for purposes that make its literal realism unnecessary. Typical modeling studies draw conclusions about real-world systems from model behavior; thus, those conclusions depend critically on the realism of the model. Here, the primary goal is not to test how catchments work but instead to test specific methods for inferring water ages from complex, nonstationary time series of tracer concentrations. All the model must do is generate outputs with reasonable degrees of complexity and nonstationarity; it is not essential that the model generate these time series by the same mechanisms that real-world catchments do. The only inductive leap is the inference that if a method correctly infers water ages from tracer patterns in these complex, nonstationary time series, it will also correctly infer water ages in complex, nonstationary time series generated by real-world catchments.

It is important to highlight an essential difference between the approach
developed here and typical studies that infer water ages or transit-time
distributions from calibrated models (e.g., Birkel et al., 2011; Van der
Velde et al., 2012; Heidbüchel et al., 2012; Hrachowitz et al., 2013;
Benettin et al., 2013, 2015). When one draws inferences
from a model, their validity depends on whether that model is structurally
adequate and whether its parameter values are realistic, both of which are
usually in doubt. Here, by contrast, I have developed an inferential method
(for estimating the young water fraction

The results reported here, together with those in Paper 1, show that MTTs cannot be estimated reliably by fitting sine waves to
seasonal tracer cycles from nonstationary or spatially heterogeneous
catchments. These results do not imply that other methods for estimating
MTTs are any better; instead, they imply only that sine wave fitting has
been subjected to rigorous benchmark testing and has failed. The other
methods have not yet been similarly tested, and it is unclear whether they
too will fail. Efforts to fill this knowledge gap are underway. But in the
meantime, ignorance is not bliss; one should not simply assume that these
other methods work as intended, just because they have not yet been
rigorously tested. In that regard, the most general contribution of this
analysis is not that it reveals specific problems with MTT estimation from
seasonal tracer cycles, or that it demonstrates the reliability of

The age of streamflow – i.e., the time that has elapsed since it fell as
precipitation – is an essential descriptor of catchment functioning with
broad implications for runoff generation, contaminant transport, and
biogeochemical cycling (Kirchner et al., 2000; McGuire and McDonnell, 2006).
The age of streamflow is commonly measured by its MTT,
which in turn has often been estimated from the damping of seasonal cycles
of chemical and isotopic tracers (such as Cl

Here I have explored how catchment nonstationarity affects estimates of MTT
and

Marginal (time-averaged) age distributions in drainage are skewed toward
younger ages than the storage distributions they come from, because storage
is flushed more quickly (and thus is younger) during periods of higher
discharge (Fig. 5). The age distributions in whole-catchment storage and
discharge are approximate power laws, with markedly different slopes (Fig. 5).
The age distribution in streamflow becomes increasingly skewed at higher
discharges, with a marked increase in the young water fraction and decrease
in the mean water age (Fig. 6), reflecting the increased dominance of the
upper box at higher flows. Flow-weighted average MTTs are typically close
to the steady-state MTT, estimated as the ratio of the total storage to the
throughput rate. However, the marginal age distributions are markedly
different from the distributions that would be expected in steady state,
demonstrating that steady-state approximations are misleading guides to the
non-steady-state behavior of the system,

Even this simple two-box model exhibits strong equifinality (Fig. B1), with four of its five parameters having virtually no identifiability through hydrograph calibration. However, scaling arguments based on simple perturbation analyses (Sect. 3.2) reveal ratios of parameters that can be constrained through hydrograph calibration (Fig. B2), greatly reducing the equifinality in the parameter space. Unfortunately, water age is primarily controlled by residual storage, which cannot be constrained through hydrograph calibration (Fig. B2). Thus, parameter sets that yield virtually identical hydrographs imply widely differing young water fractions and mean water ages (Fig. B3).
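As a simplified illustration of both points (a single hypothetical box, not the two-box model itself), suppose that drainage Q is a power law, with exponent b, of the storage above a residual storage S_0 that never drains, and linearize about the steady state in which precipitation P equals the reference discharge Q_ref at reference dynamic storage S_ref. All of these symbols are assumptions of this sketch.

```latex
\begin{aligned}
Q &= Q_{\mathrm{ref}}\left(\frac{S - S_0}{S_{\mathrm{ref}}}\right)^{b},
\qquad \frac{dS}{dt} = P - Q . \\[4pt]
\text{Let } S' &= S - S_0 :
\qquad \frac{dS'}{dt} = P - Q_{\mathrm{ref}}\left(\frac{S'}{S_{\mathrm{ref}}}\right)^{b}
\quad\Rightarrow\quad Q(t) \text{ does not depend on } S_0 . \\[4pt]
\delta Q &\approx \left.\frac{dQ}{dS'}\right|_{S' = S_{\mathrm{ref}}} \delta S'
  = \frac{b\,Q_{\mathrm{ref}}}{S_{\mathrm{ref}}}\,\delta S'
\quad\Rightarrow\quad
\frac{d\,\delta Q}{dt} = \frac{\delta P - \delta Q}{\tau},
\qquad \tau = \frac{S_{\mathrm{ref}}}{b\,Q_{\mathrm{ref}}} . \\[4pt]
T_{\mathrm{ss}} &= \frac{S_{\mathrm{ref}} + S_0}{Q_{\mathrm{ref}}}
\qquad \text{(steady-state turnover time at } P = Q_{\mathrm{ref}}\text{)} .
\end{aligned}
```

Small perturbations of the hydrograph thus constrain only the response timescale τ, that is, the ratio S_ref/b for a given Q_ref, so S_ref and b trade off along a line of constant ratio. The residual storage S_0 leaves the hydrograph unchanged but shifts the steady-state turnover time linearly, which is why it cannot be recovered by hydrograph calibration.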

The simple two-box model was used to simulate discharge, water ages, and the
propagation of seasonal tracer cycles through the catchment, across wide
ranges of random parameter sets. MTTs inferred from the damping and phase
shift of the seasonal tracer cycles exhibited strong underestimation bias
and large scatter (Fig. 7). This result implies that many literature MTT
values (and thus also residual storage volumes) may have been underestimated
by large factors. By contrast, the seasonal tracer cycles accurately
predicted the actual

Flow-weighted fits to the seasonal tracer cycles accurately predicted the
flow-weighted average

The relationship between

These findings extend the results of Paper 1 by showing that estimates of
MTT from seasonal tracer cycles are unreliable under nonstationarity as well
as spatial heterogeneity. These findings also extend the results of Paper 1
by showing that

For simplicity and efficiency, the hydrological model is solved on a fixed
daily time step. This requires some care with the numerics, given the clear
(though often overlooked) dangers in naive forward-stepping simulations of
nonlinear equations (Clark and Kavetski, 2010; Kavetski and Clark, 2010,
2011). Here I use a weighted combination of the trapezoidal method (which is
partly implicit, for enhanced accuracy) and the backward Euler method (which
is fully implicit, for guaranteed stability). The hydrological solution
scheme is illustrated here for the upper box; the lower box is handled
analogously. The storage in the upper box is updated using the following equation:

The tracer concentrations are determined under the assumption that each box
is well mixed, implying that individual water parcels within each box do not
need to be tracked, and also that the concentration draining from each box
equals the average concentration within the box. I make the simplifying
assumption that each box's inflow and outflow rates (and also inflow
concentrations) are constant over each day. Again taking the upper box as an
example, these assumptions imply that starting from

The mean age within each box is modeled analogously to the tracer
concentrations, following the “age mass” concept widely used in groundwater
hydrology. Here I will illustrate the approach using the example of the
lower box, since it is the more complex case (for the upper box, the input
age in precipitation is zero, but this is not true for the upper-box
drainage that recharges the lower box). Assuming that the inflow and outflow
rates

The approach used here for concentrations and water ages requires the assumption that input fluxes to each box are constant within each time interval (but constant at their average values, not their initial values). This is a reasonable approximation, particularly when we have no sub-daily precipitation data. And in exchange for this simplifying assumption, Eqs. (A5), (A6), and (A9) provide something important, namely, the exact analytical solution for the evolution of concentration and age during each time interval. Thus, these equations directly solve for the correct result even if, for example, an individual day's rainfall is much greater than the total volume of the upper box. The equations above will correctly calculate the consequences of the (potentially many-fold) flushing that occurs in such cases. The approach outlined above also guarantees exact consistency between stocks and fluxes (but note that this is not done in the usual way by updating stocks with fluxes, but rather by calculating output fluxes from inputs and changes in stocks). Readers should keep in mind that all stocks and properties of stocks (i.e., storage volumes, concentrations, and ages) are expressed as the instantaneous values at the beginning of each time interval, and that fluxes and properties of fluxes (i.e., water fluxes and their concentrations and ages) are expressed as averages over each time interval. Otherwise it could be difficult to make sense of the equations above.
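A minimal sketch of such exact per-interval solutions, assuming a well-mixed box with constant inflow I and outflow O over an interval dt (so storage changes linearly), input concentration C_in, and input age A_in. Solute balance d(SC)/dt = I C_in - O C gives dC/dt = (I/S)(C_in - C); the age-mass balance d(SA)/dt = S + I A_in - O A adds a unit source term because all stored water ages at unit rate, giving dA/dt = 1 + (I/S)(A_in - A). Both have closed-form solutions, and the interval-averaged outflow values are obtained from inputs and changes in stocks rather than by integrating fluxes. The function names and the particular closed forms below are a standard derivation written for illustration; they may differ in form from Eqs. (A5), (A6), and (A9).

```python
import math


def _flush_factor(S0, S1, I, O, dt):
    """Fraction of an initial deviation from the input value that survives.

    Solves dX/dt = -(I / S) X with S(t) = S0 + (I - O) t.
    """
    k = I - O
    if I == 0.0:
        return 1.0                          # no inflow: nothing is displaced
    if abs(k) <= 1e-12 * max(I, O):
        return math.exp(-I * dt / S0)       # constant storage
    return (S1 / S0) ** (-I / k)            # linearly changing storage


def step_concentration(S0, C0, I, O, C_in, dt):
    """Return (S1, C1, interval-mean outflow concentration)."""
    S1 = S0 + (I - O) * dt
    if S0 <= 0.0 or S1 < 0.0:
        raise ValueError("storage must start positive and stay non-negative")
    C1 = C_in + (C0 - C_in) * _flush_factor(S0, S1, I, O, dt)
    # Solute out = initial solute + solute in - final solute (exact balance).
    C_out = (S0 * C0 + I * C_in * dt - S1 * C1) / (O * dt) if O > 0.0 else C1
    return S1, C1, C_out


def step_age(S0, A0, I, O, A_in, dt):
    """Return (S1, A1, interval-mean outflow age) from the age-mass balance."""
    k = I - O
    S1 = S0 + k * dt
    if S0 <= 0.0 or S1 < 0.0:
        raise ValueError("storage must start positive and stay non-negative")
    D0 = A0 - A_in                          # age excess over the input age
    m = 2.0 * I - O
    if I == 0.0:
        D1 = D0 + dt                        # no inflow: stored water just ages
    elif abs(k) <= 1e-12 * max(I, O):
        D_eq = S0 / I                       # steady-state age excess is S / I
        D1 = D_eq + (D0 - D_eq) * math.exp(-I * dt / S0)
    elif abs(m) <= 1e-12 * max(I, O):
        # O = 2I: the general particular solution S / (2I - O) is singular
        D1 = 0.0 if S1 == 0.0 else S1 * (D0 / S0 - math.log(S1 / S0) / I)
    else:
        D1 = S1 / m + (D0 - S0 / m) * (S1 / S0) ** (-I / k)
    A1 = A_in + D1
    # Age mass out = initial + aging of storage (S linear in t) + input - final.
    aging = 0.5 * (S0 + S1) * dt
    A_out = (S0 * A0 + aging + I * A_in * dt - S1 * A1) / (O * dt) if O > 0.0 else A1
    return S1, A1, A_out
```

Because these are closed-form solutions rather than forward steps, a day whose inflow is many times the box volume is handled correctly: with S0 = 10, I = O = 1000, and dt = 1, the end-of-day concentration equals C_in to machine precision, while the mean outflow concentration still reflects the initial water that was flushed out.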

The analysis outlined in Sect. 3.2 implies that approximate equifinality is
inevitable, even in such a simple model, because variations in the exponents

This equifinality problem can be readily visualized by plots like Fig. B1.
To generate Fig. B1, I ran the model with Smith River precipitation forcing
and the reference parameter set (shown by the red squares in Fig. B1) and
used the resulting daily hydrograph (after the spin-up period) as virtual
“ground truth” for model calibration. I then ran the model with 1000 random
parameter sets and used the NSE of the
logarithms of discharge to measure how well their hydrographs matched the
reference hydrograph (thus the reference hydrograph has an NSE of 1 by
definition). The 50 best-fitting parameter sets, all with NSE

The other panels of the scatterplot matrix also give important clues to the
origins of the observed equifinality. In particular, the best-fitting
parameter sets show strong correlations between

Equifinality in discharge predictions. The scatterplot matrix
shows relationships among 1000 random parameter sets and the Nash–Sutcliffe
efficiency (NSE) of discharge time series driven by Smith River
(Mediterranean climate) precipitation forcing. The red square indicates the
“reference” parameter set that was used to generate the discharge time
series that the other parameter sets were tested against; these reference
parameters thus correspond to NSE

Equifinality partly cured by parameter transformations. The
scatterplot matrix shows relationships among 1000 random parameter sets and
the NSE of discharge time series driven by Smith
River (Mediterranean climate) precipitation forcing, along with two key
model outputs, the young water fraction and mean transit time in discharge
(bottom two rows). As in Fig. B1, the red square indicates the “reference”
parameter set that was used to generate the discharge time series that the
other parameter sets were tested against; these reference parameters thus
correspond to NSE

This information can be exploited to design parameter spaces that are more
identifiable through calibration (e.g., Ibbitt and O'Donnell, 1974). An
ideal parameter space would be one in which (1) all parameters are highly
identifiable, meaning the goodness-of-fit surface is strongly curved along
each parameter axis, and (2) in the best-fitting parameter sets, no
parameters are strongly correlated with one another. The second of these
criteria is necessary (although not sufficient) for the first, as Fig. B1
illustrates. A third criterion is that all parameters that are needed for
simulating any quantities of interest must be determined somehow within the
parameter space, either individually or through combinations of other
parameters. Thus, for example, although the volumes of the boxes
(

Excerpts from time series of discharge, tracer concentrations,
young water fractions, and mean transit times in the two-box model with Smith
River (Mediterranean climate) precipitation forcing and the reference
parameter set (the dark lines, for the parameter values shown by the red
squares in Figs. B1 and B2) and the 50 parameter sets that come closest to
matching the reference discharge time series (the light gray lines, for the
parameter sets shown by the solid blue dots in Figs. B1 and B2). The 50 gray
hydrographs

Figure B2 shows that this parameter space exhibits much less equifinality than the parameter space shown in Fig. B1, although the underlying parameter sets and model simulations are exactly the same. All that has been done is to re-project the parameter space onto a different set of coordinate axes in which the curvature of the goodness-of-fit surface is more clearly visible. Thus, much of the apparent equifinality in the parameter space has been eliminated by simple transformations of variables. These transformations can be designed by eye in this case, because the dimensionality of the original parameter space is low. In higher-dimension parameter spaces, multivariate techniques such as factor analysis may be helpful. Nonetheless, given the obvious utility of this simple correlation analysis and the perturbation analysis of Sect. 3.2, it is surprising that they are not more widely used in hydrological modeling.
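A minimal sketch of the screening-and-correlation analysis described for Figs. B1 and B2: score many random parameter sets by the NSE of log discharge against a reference hydrograph, keep the best k, and inspect correlations among the retained parameters. The toy model, its parameter names S and b, and their ranges are hypothetical stand-ins for the two-box model; the toy hydrograph depends only on the ratio b/S, so the best-fitting sets are strongly correlated in (S, b) but tightly clustered along the transformed coordinate b/S.

```python
import math
import random


def nse_log(q_ref, q_sim):
    """Nash-Sutcliffe efficiency of log discharge (1 = perfect match)."""
    x = [math.log(q) for q in q_ref]
    y = [math.log(q) for q in q_sim]
    mean_x = sum(x) / len(x)
    sse = sum((a - b) ** 2 for a, b in zip(x, y))
    sst = sum((a - mean_x) ** 2 for a in x)
    return 1.0 - sse / sst


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def best_fitting(param_sets, simulate, q_ref, k=50):
    """Rank parameter sets by NSE of log discharge; keep the top k."""
    scored = [(nse_log(q_ref, simulate(p)), p) for p in param_sets]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def toy_simulate(p, n_days=30):
    """Hypothetical recession whose shape depends only on b / S."""
    return [1.0 + math.exp(-t * p["b"] / p["S"]) for t in range(n_days)]


def toy_demo(n_sets=1000, k=50, seed=1):
    """Return the S-b correlation and the b/S ratios of the best k sets."""
    rng = random.Random(seed)
    q_ref = toy_simulate({"S": 100.0, "b": 2.0})   # reference ratio 0.02
    sets = [{"S": rng.uniform(50.0, 200.0), "b": rng.uniform(1.0, 4.0)}
            for _ in range(n_sets)]
    best = [p for _, p in best_fitting(sets, toy_simulate, q_ref, k)]
    r_sb = pearson([p["S"] for p in best], [p["b"] for p in best])
    ratios = [p["b"] / p["S"] for p in best]
    return r_sb, ratios
```

In this toy case, neither S nor b is identifiable on its own axis, but the strong correlation between them among the best-fitting sets points directly to the transformed coordinate b/S, along which the goodness-of-fit surface is sharply curved.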

Despite the improved identifiability of the parameter space, it is
still not possible to constrain the mean transit time by calibration to the
hydrograph. As the bottom row of scatterplots in Fig. B2 shows, the MTT is almost entirely determined by the lower box's
reference volume

Figure B3 provides a different visualization of the same equifinality
problem. Figure B3 shows a 2-year excerpt from the simulated time series
of streamflows, tracer concentrations, young water fractions, and mean
transit times for the reference parameter set (the blue curves), along with
the 50 parameter sets that gave the best fit to the reference hydrograph
(the gray curves). Because these 50 parameter sets were those that matched
the reference hydrograph best, it is unsurprising that the 50 gray
hydrographs generally follow the blue reference hydrograph in Fig. B3a. The
50 gray tracer concentration time series also follow the blue reference time
series (Fig. B3b), but with somewhat greater variability than the
hydrographs, indicating that the parameter values affect the chemographs and
the hydrographs in somewhat different ways. But the most striking feature of
Fig. B3 is the much greater variability among the young water fractions

I thank Scott Jasechko and Jeff McDonnell for the intensive discussions that motivated this analysis, and Markus Weiler and an anonymous reviewer for their comments. I thank the Centre for Ecology and Hydrology for making the Plynlimon data available. Edited by: T. Bogaard