Environmental heterogeneity is ubiquitous, but environmental systems are
often analyzed as if they were homogeneous instead, resulting in aggregation
errors that are rarely explored and almost never quantified. Here I use
simple benchmark tests to explore this general problem in one specific
context: the use of seasonal cycles in chemical or isotopic tracers (such as
Cl

Environmental systems are characteristically complex and heterogeneous. Their processes and properties are often difficult to quantify at small scales and difficult to extrapolate to larger scales. Thus, translating process inferences across scales and aggregating across heterogeneity are fundamental challenges for environmental scientists. These ubiquitous aggregation problems have been a focus of research in some environmental fields, such as ecological modeling (e.g., Rastetter et al., 1992), but have received surprisingly little attention elsewhere. In the catchment hydrology literature, for example, spatial heterogeneity has been widely recognized as a fundamental problem but has rarely been the subject of rigorous analysis.

Instead, it is often tacitly assumed (although

This is a testable proposition, and the answer will depend partly on the nature of the underlying model. All models obscure a system's spatial heterogeneity to some degree, and many conceptual models obscure it completely, by treating spatially heterogeneous catchments as if they were spatially homogeneous instead. Doing so is not automatically disqualifying, but neither is it obviously valid. Rather, this spatial aggregation is a modeling choice whose consequences should be explicitly analyzed and quantified. What do I mean by “explicitly analyzed and quantified”? As an example, consider the Kirchner et al. (1993) analysis of how spatial heterogeneity affected a particular geochemical model for estimating catchment buffering of acid deposition. The authors began by noting that spatial heterogeneities will not “average out” in nonlinear model equations and by showing that the resulting aggregation bias will be proportional to the nonlinearity in the model equations (which can be directly estimated) and proportional to the variance in the heterogeneous real-world parameter values (which is typically unknown but may at least be given a plausible upper bound). They next showed that their geochemical model's governing equations were sufficiently linear that the effects of spatial heterogeneity were likely to be small, and then confirmed this theoretical result by mixing measured runoff chemistry time series from random pairs of geochemically diverse catchments (which do not flow together in the real world). The geochemical model correctly predicted the buffering behavior of these spatially heterogeneous pseudo-catchments, without knowing that those catchments were heterogeneous and without knowing anything about the nature of their heterogeneities.
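The rule of thumb described above can be made concrete with a second-order Taylor expansion, which is one standard way (assumed here; not necessarily the derivation used by Kirchner et al., 1993) to relate aggregation bias to curvature and variance: E[f(x)] − f(E[x]) ≈ ½ f″(E[x]) Var(x). The sketch below checks that approximation numerically. The model equation f (a square root), the normal distribution of the heterogeneous parameter x, and all numbers are hypothetical choices for illustration, not the geochemical model discussed in the text.

```python
import numpy as np


def f(x):
    """Hypothetical nonlinear 'model equation' (a square root, chosen for illustration)."""
    return np.sqrt(x)


def f_second_derivative(x):
    """Curvature of f, i.e. the 'nonlinearity' term in the rule of thumb."""
    return -0.25 * x ** -1.5


def aggregation_bias(mu, sigma, n=200_000, seed=0):
    """Compare the actual aggregation bias with its second-order Taylor estimate.

    The heterogeneous parameter x is drawn from a normal distribution (a
    hypothetical choice); negative draws are discarded so that f stays defined.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(mu, sigma, size=n)
    x = x[x > 0]
    actual = f(x).mean() - f(x.mean())  # average of outputs minus output of the average
    taylor = 0.5 * f_second_derivative(x.mean()) * x.var()
    return actual, taylor


for sigma in (0.2, 0.5, 1.0):
    actual, taylor = aggregation_bias(mu=4.0, sigma=sigma)
    print(f"sigma={sigma}: actual bias {actual:+.5f}, Taylor estimate {taylor:+.5f}")
```

Because the bias scales with the variance, doubling the spread of x roughly quadruples it, while a linear f would give zero bias no matter how heterogeneous x is.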

Here I use similar thought experiments to explore the consequences of
spatial heterogeneity for catchment mean transit-time estimates derived from
seasonal tracer cycles in precipitation and streamflow. Catchment

A catchment is characterized by its travel-time distribution (TTD), which reflects the diversity of flowpaths (and their velocities) connecting each point on the landscape with the stream. Because these flowpaths and velocities change with hydrologic forcing, the TTD is nonstationary (Kirchner et al., 2001; Tetzlaff et al., 2007; Botter et al., 2010; Hrachowitz et al., 2010a; Van der Velde et al., 2010; Birkel et al., 2012; Heidbüchel et al., 2012; Peters et al., 2014); but time-varying TTDs are difficult to estimate in practice, so most catchment studies have focused on estimating time-averaged TTDs instead. Both the shape of the TTD and its corresponding mean travel time (MTT) reflect storage and mixing processes in the catchment (Kirchner et al., 2000, 2001; Godsey et al., 2010; Hrachowitz et al., 2010a). However, because of the difficulty of reliably estimating the shape of the TTD, and the volumes of data required to do so, many catchment studies have simply assumed that the TTD has a given shape, and have estimated only its MTT. As a result, and also because of its obvious physical interpretation as the ratio between the storage volume and the average water flux (in steady state), the MTT is by far the most widely reported parameter in catchment travel-time studies. Estimates of MTTs have been correlated with a wide range of catchment characteristics, including drainage density, aspect, hillslope gradient, depth to groundwater, hydraulic conductivity, and the prevalence of hydrologically responsive soils (e.g., McGuire et al., 2005; Soulsby et al., 2006; Tetzlaff et al., 2009; Broxton et al., 2009; Hrachowitz et al., 2009, 2010b; Asano and Uchida, 2012; Heidbüchel et al., 2013).

Travel-time distributions and mean travel times cannot be measured directly,
and they differ – often by orders of magnitude – from the hydrologic
response timescale, because the former is determined by the velocity of
water flow, and the latter is determined by the celerity of hydraulic
potentials (Horton and Hawkins, 1965; Hewlett and Hibbert, 1967; Beven,
1982; Kirchner et al., 2000; McDonnell and Beven, 2014). Nor can travel-time
characteristics be reliably determined a priori from theory. Instead, they must be
determined from chemical or isotopic tracers, such as Cl

As reviewed by McGuire and McDonnell (2006), three methods are commonly used to infer catchment travel times from conservative tracer time series: (1) time-domain convolution of the input time series to simulate the output time series, with parameters of the convolution kernel (the travel-time distribution) fitted by iterative search techniques; (2) Fourier transform spectral analysis of the input and output time series; and (3) sine-wave fitting to the seasonal tracer variation in the input and output. In all three methods, the greater the damping of the input signal in the output, the longer the inferred mean travel time. Sine-wave fitting can be viewed as the simplest possible version of both spectral analysis (examining the Fourier transform at just the annual frequency) and time-domain convolution (approximating the input and output as sinusoids, for which the convolution relationship is particularly easy to calculate). Whereas time-domain convolution methods require continuous, unbroken precipitation isotopic records spanning at least several times the MTT (McGuire and McDonnell, 2006; Hrachowitz et al., 2011), and spectral methods require time series spanning a wide range of timescales (Feng et al., 2004), sine-wave fitting can be performed on sparse, irregularly sampled data sets. Because sine-wave fitting is mathematically straightforward, and because its data requirements are modest compared to the other two methods, it is arguably the best candidate for comparison studies based on large multi-site data sets of isotopic measurements in precipitation and river flow. For that reason – and because it presents an interesting test case of the general aggregation issues alluded to above, in which some key results can be derived analytically – the sinusoidal fitting method will be the focus of my analysis.
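Because a sinusoid with known frequency is linear in its cosine and sine coefficients, sine-wave fitting reduces to ordinary linear least squares, which is why it tolerates sparse, irregular sampling. The sketch below assumes the form c(t) = a cos(2πft) + b sin(2πft) + k with f = 1 cycle per year, from which the amplitude is sqrt(a² + b²) and the phase is atan2(b, a). The function name, sampling dates, amplitudes, offsets, and noise levels are all hypothetical.

```python
import numpy as np


def fit_seasonal_cycle(t_years, conc, freq=1.0):
    """Fit c(t) = a*cos(2*pi*f*t) + b*sin(2*pi*f*t) + k by linear least squares.

    Returns (amplitude, phase, offset), where amplitude = sqrt(a^2 + b^2) and
    phase = atan2(b, a) in radians, so that c(t) = A*cos(2*pi*f*t - phase) + k.
    Sampling times need not be regularly spaced.
    """
    w = 2 * np.pi * freq
    design = np.column_stack([np.cos(w * t_years), np.sin(w * t_years), np.ones_like(t_years)])
    (a, b, k), *_ = np.linalg.lstsq(design, conc, rcond=None)
    return np.hypot(a, b), np.arctan2(b, a), k


# Hypothetical, irregularly sampled delta-18O records (per mil) over three years.
rng = np.random.default_rng(1)
t_p = np.sort(rng.uniform(0.0, 3.0, 40))
t_s = np.sort(rng.uniform(0.0, 3.0, 60))
precip = -10.0 + 4.0 * np.cos(2 * np.pi * t_p - 0.5) + rng.normal(0.0, 0.3, t_p.size)
stream = -10.0 + 1.0 * np.cos(2 * np.pi * t_s - 1.2) + rng.normal(0.0, 0.1, t_s.size)

A_p, phi_p, _ = fit_seasonal_cycle(t_p, precip)
A_s, phi_s, _ = fit_seasonal_cycle(t_s, stream)
print(f"amplitude ratio A_S/A_P = {A_s / A_p:.3f}, phase shift = {phi_s - phi_p:.2f} rad")
```

With the values assumed here, the fitted amplitude ratio should come out near 0.25 and the phase shift near 0.7 rad, even though the two records are sampled on different, irregular dates.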

The isotopic composition of precipitation varies seasonally as shifts in
meridional circulation alter atmospheric vapor transport pathways (Feng et
al., 2009) and as shifts in temperature and storm intensity alter the
degree of rainout-driven fractionation that air masses undergo (Bowen,
2008). The resulting seasonal cycles in precipitation (e.g., Fig. 1a) are
damped and phase-shifted as they are transmitted through catchments
(e.g., Fig. 1b), by amounts that depend on – and thus can be used to infer
properties of – the travel-time distribution. Figure 1 shows an example of
sinusoidal fits to seasonal

Seasonal cycles in

That particular estimate of mean transit time, like practically all such estimates in the literature, was made by methods that assume that the catchment is homogeneous and therefore that the shape of its TTD can be straightforwardly characterized. Typical catchments violate this assumption, but the consequences for estimating MTTs have not been systematically investigated, either for sine-wave fitting or for any other methods that infer travel times from tracer data. Are any of these estimation methods reliable under realistic degrees of spatial heterogeneity? Are they biased, and by how much? We simply do not know, because they have not been tested. Instead, we have been directly applying theoretical results, derived for idealized hypothetical cases, to complex real-world situations that do not share those idealized characteristics. Methods for estimating catchment travel times urgently need benchmark testing. The work presented below is intended as one small step toward filling that gap.

Any method for inferring transit-time distributions (or their parameters,
such as mean transit time) must make simplifying assumptions about the
system under study. Most such methods assume that conservative tracers in
streamflow can be modeled as the convolution of the catchment's transit-time
distribution with the tracer time series in precipitation (Maloszewski et
al., 1983; Maloszewski and Zuber, 1993; Barnes and Bonell, 1996; Kirchner et al.,
2000).
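In discrete time, this convolution, c_S(t) = ∫ h(τ) c_P(t − τ) dτ, becomes a weighted sum of past inputs, with weights given by the TTD h(τ). The sketch below assumes a steady exponential TTD with a hypothetical mean of 0.5 years, a daily time step, and an idealized sinusoidal input. It then checks the simulated damping of the seasonal cycle against the exponential TTD's analytical amplitude ratio, 1/sqrt(1 + (2πfτ)²).

```python
import numpy as np

dt = 1.0 / 365.0                      # daily time step, in years
mtt = 0.5                             # hypothetical mean transit time (years)
lags = np.arange(0.0, 10 * mtt, dt)   # truncate the TTD tail at 10 mean transit times
ttd = np.exp(-lags / mtt) / mtt       # exponential TTD, h(tau)
weights = ttd * dt
weights /= weights.sum()              # renormalize so the discretized TTD sums to 1

t = np.arange(0.0, 20.0, dt)          # 20 years of daily input
c_precip = np.cos(2 * np.pi * t)      # idealized seasonal tracer cycle, amplitude 1

# c_stream[n] = sum_k weights[k] * c_precip[n - k]  (causal discrete convolution)
c_stream = np.convolve(c_precip, weights)[: t.size]

# Discard the spin-up period (the first 10 MTTs) before measuring the amplitude.
steady = t >= 10 * mtt
amp_simulated = 0.5 * (c_stream[steady].max() - c_stream[steady].min())
amp_analytical = 1.0 / np.sqrt(1.0 + (2 * np.pi * mtt) ** 2)
print(f"simulated amplitude ratio {amp_simulated:.4f}, analytical {amp_analytical:.4f}")
```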

The transit-time distribution

In much of the analysis that follows, I will assume that the transit-time
distribution

Gamma distributions for the range of shape factors

Figure 2 shows gamma distributions spanning a range of shape factors

For present purposes, it is sufficient to note that the family of gamma
distributions encompasses a wide range of shapes that approximate many
plausible TTDs (Fig. 2). The moments of the gamma distribution vary
systematically with the shape factor

Studies that have used tracers to constrain the shape of catchment TTDs
have generally found shape factors

Because convolutions (Eq. 1) are linear operators, they transform any
sinusoidal cycle in the precipitation time series

The amplitudes

The key to calculating the amplitude damping and phase shift that will
result from convolving a sinusoidal input with a gamma-distributed

Amplitude ratio and phase shift between seasonal cycles in
precipitation and streamflow, for gamma-distributed catchment transit-time
distributions with a range of shape factors

Conceptual diagram illustrating the mixture of seasonal tracer cycles in runoff from a heterogeneous catchment comprising two subcatchments with strongly contrasting MTTs, which therefore damp the tracer cycle in precipitation (light blue dashed line) by different amounts. The tracer cycle in the combined runoff from the two subcatchments (dark blue solid line) will average together the highly damped cycle from subcatchment 1, with long MTT (solid red line), and the less damped cycle from subcatchment 2, with short MTT (solid orange line).

If the shape factor

The methods outlined above can be applied straightforwardly in a homogeneous catchment characterized by a single transit-time distribution. Real-world catchments, however, are generally heterogeneous; they combine different landscapes with different characteristics and thus different TTDs. The implications of this heterogeneity can be demonstrated with a simple thought experiment. What if, instead of a single homogeneous catchment, we have two subcatchments with different MTTs and therefore different tracer cycles, which then flow together, as shown in Fig. 4? If we observed only the tracer cycle in the combined runoff (the solid blue line in Fig. 4), and not the tracer cycles in the individual subcatchments (the red and orange lines in Fig. 4), would we correctly infer the whole-catchment MTT? Note that although I refer to the different runoff sources as “subcatchments”, they could equally well represent alternate slopes draining to the same stream channel or even independent flowpaths down the same hillslope; nothing in this thought experiment specifies the scale of the analysis. And, of course, real-world catchments are much more complex than the simple thought experiment diagrammed in Fig. 4, but this two-component model is sufficient to illustrate the key issues at hand.
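The thought experiment can be worked through in a few lines if one assumes, hypothetically, that both subcatchments have exponential TTDs and contribute equal flows. The complex frequency response of an exponential TTD with mean τ is 1/(1 + i2πfτ), and its modulus is the amplitude ratio A_S/A_P. The mixed runoff's response is the flow-weighted average of the two complex responses, and inverting the single-exponential formula, τ = sqrt((A_P/A_S)² − 1)/(2πf), gives the MTT that an observer of the mixed runoff alone would infer. The MTTs of 1 and 0.1 years match the example in Fig. 6; the function names are illustrative.

```python
import numpy as np

FREQ = 1.0  # seasonal cycle, cycles per year


def exp_response(mtt, freq=FREQ):
    """Complex frequency response of an exponential TTD with mean `mtt` (years)."""
    return 1.0 / (1.0 + 1j * 2 * np.pi * freq * mtt)


def apparent_mtt_exponential(amp_ratio, freq=FREQ):
    """Invert A_S/A_P = 1/sqrt(1 + (2*pi*f*tau)^2) for tau, assuming a single exponential TTD."""
    return np.sqrt(amp_ratio ** -2 - 1.0) / (2 * np.pi * freq)


mtts = np.array([1.0, 0.1])     # subcatchment MTTs in years (as in Fig. 6)
weights = np.array([0.5, 0.5])  # equal contributions to the combined runoff

mixed_response = np.sum(weights * exp_response(mtts))
amp_mixed = np.abs(mixed_response)  # phase information is ignored, as in amplitude-only fitting
true_mtt = np.sum(weights * mtts)
print(f"amplitude ratio of mixed runoff: {amp_mixed:.3f}")
print(f"true whole-catchment MTT: {true_mtt:.2f} yr, "
      f"apparent MTT: {apparent_mtt_exponential(amp_mixed):.2f} yr")
```

Under these assumptions the mixed runoff has an amplitude ratio of about 0.48, which the single-exponential formula translates into an apparent MTT of about 0.29 years, roughly half of the true whole-catchment value of 0.55 years.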

From assumed MTTs

One can immediately see that this situation is highly prone to aggregation
bias, following the Kirchner et al. (1993) rule of thumb that the degree of
aggregation bias is proportional to the nonlinearity in the governing
equations and the variance in the heterogeneous parameters. The amplitude
ratios

Illustration of the aggregation error that arises when mean
transit time is inferred from seasonal tracer cycles in mixed runoff from
two landscapes with contrasting transit-time distributions (e.g., Fig. 4).
The relationship between MTT and the amplitude ratio
(

Figure 5 illustrates the crux of the problem. The plotted curve shows the
relationship between

Combining flows from two subcatchments with different mean transit times
will result in a combined TTD that differs in shape, not just in scale, from
the TTDs of either of the subcatchments. For example, combining two
exponential distributions with different mean transit times does not result
in another exponential distribution but rather a hyperexponential
distribution, as shown in Fig. 6. The characteristic function of the
hyperexponential distribution (Walck, 2007) yields the following expression
for the amplitude ratio of tracer cycles in precipitation and streamflow,

Exponential transit-time distributions for subcatchments 1 and 2
in Fig. 4 (with mean transit times of 1 and 0.1 years, shown by the orange and
red dashed lines, respectively), and the hyperexponential distribution
formed by merging them in equal proportions (solid blue line).

Apparent MTT inferred from seasonal tracer
cycles, showing order-of-magnitude deviations from true MTT for
1000 synthetic catchments. Each synthetic catchment comprises two subcatchments
with individual MTTs randomly chosen from a uniform distribution of
logarithms spanning the interval between 0.1 and 20 years, with each pair
differing by at least a factor of 2. In

Amplitude ratio (

To test the generality of this result, I repeated the thought experiment
outlined above for 1000 hypothetical pairs of subcatchments, each with
individual MTTs randomly chosen from a uniform distribution of logarithms
spanning the interval between 0.1 and 20 years (Fig. 7). Pairs with MTTs
that differed by less than a factor of 2 were excluded, so that the entire
sample consisted of truly heterogeneous catchments. I then applied Eq. (10)
to calculate the apparent MTT from the combined runoff. As Fig. 7 shows,
apparent MTTs calculated from the combined runoff of the two subcatchments
can underestimate true whole-catchment MTTs by an order of magnitude or
more, and this strong underestimation bias persists across a wide range of
shape factors

In most real-world cases, unlike these hypothetical thought experiments, one will only have measurements or samples from the whole catchment's runoff. The properties of the individual subcatchments and thus the degree of heterogeneity in the system will generally be unknown. And even if data were available for the subcatchments, those subcatchments would be composed of sub-subcatchments, which would themselves be heterogeneous to some unknown degree, and so on. Thus, it will generally be difficult or impossible to characterize the system's heterogeneity, but that is no justification for pretending that this heterogeneity does not exist. Nonetheless, in such situations it will be tempting to treat the whole system as if it were homogeneous, perhaps using terms like “apparent age” or “model age” to preserve a sense of rigor. But whatever the semantics, as Fig. 7 shows, assuming homogeneity in heterogeneous catchments will result in strongly biased estimates of whole-catchment mean transit times.
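The Fig. 7 experiment can be sketched for the special case of exponential subcatchments (a shape factor of 1). The sketch assumes equal contributions from the two subcatchments and uses the single-exponential inversion to compute the apparent MTT; because the experiment in the text spans a range of shape factors, the numbers here are not expected to reproduce Fig. 7 exactly. The random seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2016)
omega = 2 * np.pi  # angular frequency of the annual cycle (rad/yr)

# Draw MTT pairs log-uniformly between 0.1 and 20 years; keep only pairs whose
# MTTs differ by at least a factor of 2, so every catchment is truly heterogeneous.
pairs = []
while len(pairs) < 1000:
    t1, t2 = np.exp(rng.uniform(np.log(0.1), np.log(20.0), size=2))
    if max(t1, t2) / min(t1, t2) >= 2.0:
        pairs.append((t1, t2))
pairs = np.array(pairs)

# Equal-weight mixture of two exponential TTDs (an assumption of this sketch).
response = 0.5 * (1 / (1 + 1j * omega * pairs[:, 0]) + 1 / (1 + 1j * omega * pairs[:, 1]))
amp_ratio = np.abs(response)

true_mtt = pairs.mean(axis=1)
apparent_mtt = np.sqrt(amp_ratio ** -2 - 1) / omega  # single-exponential inversion
ratio = apparent_mtt / true_mtt

print(f"median apparent/true MTT: {np.median(ratio):.2f}")
print(f"fraction underestimated by more than a factor of 3: {np.mean(ratio < 1 / 3):.2f}")
```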

The analysis in Sect. 3 demonstrates what can be termed an “aggregation error”:
in heterogeneous systems, mean transit times estimated from seasonal tracer
cycles yield inconsistent results at different levels of aggregation. The
aggregation bias demonstrated in Figs. 5 and 7 implies that seasonal cycles
of conservative tracers are unreliable estimators of catchment mean transit
times. This observation raises the obvious question: is there anything

One hint is provided by the observation that when two tributaries are mixed, the tracer cycle amplitude in the mixture will almost exactly equal the average of the tracer cycle amplitudes in the two tributaries (Fig. 8). This is not intuitively obvious, because the tributary cycles will generally be somewhat out of phase with each other, so their amplitudes will not average exactly linearly. But when the tributary cycles are far out of phase (because the subcatchments have markedly different mean transit times or shape factors), the two amplitudes will also generally be very different and thus the phase angle between the tributary cycles will have little effect on the amplitude of the mixed cycle.
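The near-linear averaging of amplitudes can be stated precisely. Suppose (notation introduced here for illustration) that two tributaries with amplitude ratios A_1 and A_2 and phases φ_1 and φ_2 contribute fractions p and 1 − p of the combined flow. The mixed cycle is then the sum of two phasors, so its amplitude obeys the identity

```latex
A_{\mathrm{mix}}^{2}
  = p^{2}A_{1}^{2} + (1-p)^{2}A_{2}^{2} + 2p(1-p)A_{1}A_{2}\cos(\varphi_{1}-\varphi_{2})
  = \bigl[pA_{1} + (1-p)A_{2}\bigr]^{2}
    - 2p(1-p)A_{1}A_{2}\bigl[1-\cos(\varphi_{1}-\varphi_{2})\bigr]
```

The correction term vanishes when the tributary cycles are in phase. When they are far out of phase, one amplitude is usually much smaller than the other; writing A_2 = εA_1 with ε ≪ 1, the relative difference between A_mix and the linear average pA_1 + (1 − p)A_2 is approximately ((1 − p)/p)·ε·[1 − cos(φ_1 − φ_2)], which remains small.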

Because tracer cycle amplitudes will average almost linearly when two
streams merge and thus are virtually free from aggregation bias (Fig. 8),
anything that is proportional to tracer cycle amplitude will also be
virtually free from aggregation bias. So, what is proportional to tracer
cycle amplitude? One hint is provided by the observation that in Fig. 5, for
example, the tracer cycle amplitude in the mixture is highly sensitive to
transit times that are much shorter than the period of the tracer cycle (for
a seasonal cycle, this period is

These lines of reasoning lead to the conjecture that for many realistic
transit-time distributions, the amplitude ratio

Numerical experiments verify these conjectures for gamma distributions
spanning a wide range of shape factors (see Fig. 9). I define the young
water fraction

The young water fraction

The analysis presented in Sect. 4.1 shows that the amplitude ratio

We can interpret the uncertainty in

Best-fit young water thresholds for gamma transit-time
distributions, as a function of shape factors

True and apparent young water fractions for the same
1000 synthetic catchments shown in Fig. 7. The tracer cycle amplitude ratio in
the combined runoff of the two subcatchments (vertical axes) corresponds
closely to the average young water fraction in the combined runoff
(horizontal axes). As in Fig. 7, each synthetic catchment comprises two
subcatchments with individual MTTs randomly chosen from a uniform
distribution of logarithms spanning the interval between 0.1 and 20 years,
and with each pair of MTTs differing by at least a factor of 2. In

First, from Fig. 10 we can estimate how uncertainty in

Alternatively, we can treat the uncertainty in

For comparison, we can contrast this uncertainty with the corresponding
uncertainty in the mean transit time

We can extend these sample calculations over a range of shape factors

Sensitivity analysis showing how variations in shape factor

Similar sensitivity of mean transit time to model assumptions was also
observed by Kirchner et al. (2010) in two Scottish streams and by Seeger
and Weiler (2014) in their study calibrating three different transit-time
models to monthly

Because both the young water fraction

True and apparent young water fractions

Effect of including phase information in estimates of young water
fraction (

It is important to recognize that the two-tributary catchments that were
merged in Fig. 13 are not characterized by gamma transit-time distributions
(although their tributaries are), because mixing two gamma distributions
does not create another gamma distribution. Thus, Fig. 13 demonstrates the
important result that although the analysis presented here was based on
gamma distributions for mathematical convenience, the general principles
developed here – namely, that the amplitude ratio

For example, as Fig. 6 showed, mixing two exponential distributions will not
create another exponential distribution, nor any other member of the gamma
family, but rather a hyperexponential distribution. Thus, Fig. 13b implies
that

One interpretation of the strong aggregation bias in mean transit-time estimates, as documented in Figs. 7 and 13, is that when the transit-time distributions of the individual tributaries are averaged together, the result has a different shape (i.e., averages of exponentials are not exponentials and averages of gamma distributions are not gamma-distributed). Thus, it is unsurprising that a formula for estimating mean travel times based on exponential distributions (for example) will be inaccurate when applied to nonexponential distributions. The practical issue in the real world, of course, is that the shape of the transit-time distribution will usually be unknown, so the problem of fitting the “wrong” distribution will be difficult to solve.
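One quick way to see that a mixture of exponentials is not itself exponential is through its coefficient of variation (CV). Every exponential distribution has CV = 1, whereas a mixture of exponentials with different means has CV > 1. The sketch below computes the CV from the mixture's first two moments (mean Σ p_i τ_i and second moment Σ p_i · 2τ_i²), using the 1-year and 0.1-year subcatchments of Fig. 6; the function name is illustrative.

```python
import numpy as np


def mixture_cv(mtts, weights):
    """Coefficient of variation of a mixture of exponential TTDs.

    Each exponential with mean tau has second moment 2*tau**2, so the
    mixture's moments are the flow-weighted averages of the components' moments.
    """
    mtts = np.asarray(mtts, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mean = np.sum(weights * mtts)
    second_moment = np.sum(weights * 2 * mtts ** 2)
    return np.sqrt(second_moment - mean ** 2) / mean


print(mixture_cv([0.55], [1.0]))           # single exponential: CV equals 1 (up to rounding)
print(mixture_cv([1.0, 0.1], [0.5, 0.5]))  # equal mix of the Fig. 6 subcatchments: CV > 1
```

Note that a gamma distribution with shape factor α has CV = 1/√α, so a CV above 1 by itself shows only that the mixture is no longer exponential; it does not rule out a gamma shape with α < 1.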

In the specific case of fitting seasonal sinusoidal patterns, the only information one has for estimating the transit-time distribution is the amplitude ratio and the phase shift of streamflow relative to precipitation. The phase shift has heretofore been ignored as a source of additional information. Could it be helpful?
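If the TTD is assumed to be gamma-distributed, with shape factor α, scale β = τ/α, and angular frequency ω = 2πf, the gamma characteristic function gives the amplitude ratio A_S/A_P = (1 + ω²β²)^(−α/2) and the phase lag φ = α·arctan(ωβ). Two observed quantities can therefore pin down two parameters. The sketch below eliminates α by forming −2 ln(A_S/A_P)/φ = ln(1 + x²)/arctan(x) with x = ωβ. This ratio increases monotonically in x, so it can be solved by bracketed root-finding (here in log x). The function names and the parameter values in the round-trip check are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq


def gamma_amp_phase(alpha, mtt, freq=1.0):
    """Amplitude ratio and phase lag (radians) for a gamma TTD with shape alpha and mean mtt."""
    x = 2 * np.pi * freq * mtt / alpha  # omega * beta, with scale beta = mtt / alpha
    return (1 + x ** 2) ** (-alpha / 2), alpha * np.arctan(x)


def gamma_from_amp_phase(amp_ratio, phase, freq=1.0):
    """Recover (alpha, mtt) from an amplitude ratio and phase lag.

    Assumes a gamma-distributed TTD and a positive, unwrapped phase lag.
    """
    target = -2 * np.log(amp_ratio) / phase

    def mismatch(u):
        # ln(1 + x^2) / arctan(x), with x = exp(u), rises monotonically from 0,
        # so the root in u is unique.
        x = np.exp(u)
        return np.log1p(x ** 2) / np.arctan(x) - target

    x = np.exp(brentq(mismatch, -20.0, 20.0))
    alpha = phase / np.arctan(x)
    mtt = alpha * x / (2 * np.pi * freq)
    return alpha, mtt


amp, phase = gamma_amp_phase(alpha=0.7, mtt=2.0)  # hypothetical "true" catchment
print(gamma_from_amp_phase(amp, phase))           # recovers approximately (0.7, 2.0)
```

The inversion is exact only for gamma-shaped TTDs, so when it is applied to mixed runoff whose TTD is not gamma-distributed, it is still exposed to the aggregation problems discussed above.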

As described in Sect. 2.2, one can use the amplitude ratio and phase
shift to jointly estimate the shape factor

This approach assumes that the catchment's transit times are
gamma-distributed. To test whether it can nonetheless improve estimates of
the mean transit time or the young water fraction, even in catchments whose
transit times are not gamma-distributed, I applied this method to the
eight-tributary synthetic catchments shown in Fig. 13. As pointed out in
Sect. 4.3, the TTDs of these catchments (and even their two-subcatchment
tributaries) will be sums of gammas and thus not gamma-distributed
themselves. Figure 14 shows the new estimates based on amplitude ratios and
phase shifts (in dark blue), superimposed on the previous estimates from
Fig. 13, based on amplitude ratios alone, as reference (in light blue). Mean
transit-time estimates based on both phase and amplitude information are
somewhat more accurate than those based on amplitude ratios alone (Fig. 14d–f),
but they still exhibit very large aggregation bias. Incorporating
phase information in estimates of

Two main results emerge from the analysis presented above. First, MTTs estimated from seasonal tracer cycles exhibit severe aggregation bias in heterogeneous catchments, underestimating the true MTT by large factors. Second, seasonal tracer cycle amplitudes accurately reflect the fraction of young water in streamflow and exhibit very little aggregation bias. Both of these results have important implications for catchment hydrology.
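The second result can be checked numerically for the exponential case. The sketch below (a) finds a single threshold age t* for which the fraction of water younger than t*, F(t*) = 1 − exp(−t*/τ), best matches the exponential amplitude ratio 1/sqrt(1 + (2πτ)²) across a range of MTTs. It then (b) compares the amplitude ratio of an equal mix of two exponential subcatchments with the young water fraction of that mix. The least-squares criterion, the MTT grid, the subcatchment MTTs, and the restriction to exponential TTDs are assumptions of this sketch, so t* need not match the best-fit thresholds the text reports for gamma distributions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

OMEGA = 2 * np.pi  # annual cycle, rad/yr


def amp_ratio_exp(mtt):
    """Amplitude ratio for a single exponential TTD with mean mtt (years)."""
    return 1.0 / np.sqrt(1.0 + (OMEGA * mtt) ** 2)


def young_fraction_exp(mtt, threshold):
    """Fraction of an exponential TTD younger than `threshold` (years)."""
    return 1.0 - np.exp(-threshold / mtt)


# (a) Best-fit young water threshold over log-spaced MTTs from 0.01 to 100 years.
grid = np.logspace(-2, 2, 400)


def loss(th):
    return np.sum((young_fraction_exp(grid, th) - amp_ratio_exp(grid)) ** 2)


threshold = minimize_scalar(loss, bounds=(0.01, 1.0), method="bounded").x

# (b) Equal mix of two contrasting exponential subcatchments (hypothetical MTTs, years).
mtts = np.array([0.1, 1.0])
weights = np.array([0.5, 0.5])
amp_mix = np.abs(np.sum(weights / (1 + 1j * OMEGA * mtts)))
fyw_mix = np.sum(weights * young_fraction_exp(mtts, threshold))

print(f"best-fit threshold: {threshold:.3f} yr")
print(f"amplitude ratio of mix: {amp_mix:.3f}, young water fraction of mix: {fyw_mix:.3f}")
```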

Figures 7, 13, and 14 indicate that in spatially heterogeneous catchments (which is to say, all real-world catchments), MTTs estimated from seasonal tracer cycles are fundamentally unreliable. The relationship between true and inferred MTTs shown in these figures is not only strongly biased, but also wildly scattered – so much so that it can only be visualized on logarithmic axes. The huge scatter in the relationship means that there is little point in trying to correct the bias with a calibration curve, because most of the resulting estimates would still be wrong by large factors. This scatter also implies that one should be careful about drawing inferences from site-to-site comparisons of MTT values derived from seasonal cycles, since a large part of their variability may be aggregation noise.

The underestimation bias in MTT estimates arises because, as Figs. 3a and 5 show, travel times significantly shorter than 1 year have a much larger effect on seasonal tracer cycles than travel times of roughly 1 year and longer. DeWalle et al. (1997) calculated that an exponential TTD with an MTT of 5 years would result in such a small isotopic cycle in streamflow that it would approach the analytical detection limit of isotope measurements. But while this may be the hypothetical upper limit to MTTs determined from seasonal isotope cycles, my results show that even MTTs far below that limit cannot be reliably estimated in heterogeneous landscapes. Indeed, Fig. 7 shows that MTTs can only be reliably estimated (that is, they will fall close to the 1 : 1 line) in heterogeneous systems where the MTT is roughly 0.2 years or so – in other words, only when most of the streamflow is “young” water.
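The arithmetic behind the DeWalle et al. (1997) limit is easy to reproduce for an exponential TTD, whose amplitude ratio is 1/sqrt(1 + (2πfτ)²). The sketch below tabulates this ratio for several MTTs. The 3 ‰ precipitation amplitude used to translate ratios into streamflow amplitudes is a hypothetical value, not one taken from the text.

```python
import numpy as np


def amp_ratio_exponential(mtt, freq=1.0):
    """A_S/A_P for an exponential TTD with mean transit time mtt (years)."""
    return 1.0 / np.sqrt(1.0 + (2 * np.pi * freq * mtt) ** 2)


precip_amplitude = 3.0  # hypothetical seasonal amplitude in precipitation (per mil)
for mtt in (0.1, 0.2, 0.5, 1.0, 2.0, 5.0):
    ratio = amp_ratio_exponential(mtt)
    print(f"MTT {mtt:4.1f} yr: A_S/A_P = {ratio:.3f}, "
          f"stream amplitude = {ratio * precip_amplitude:.2f} per mil")
```

Beyond about 1 year the ratio falls off roughly as 1/(2πτ), so each further increase in MTT changes the streamflow cycle by only a small absolute amount.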

It is becoming widely recognized that stable isotopes are effectively blind to the long tails of travel-time distributions (Stewart et al., 2010, 2012; Seeger and Weiler, 2014). The results presented here reinforce this point, showing how in heterogeneous catchments any stable isotope cycles from long-MTT subcatchments (or flowpaths) will be overwhelmed by much larger cycles from short-MTT subcatchments (or flowpaths). Furthermore, the nonlinearities in the governing equations (Figs. 3, 5) imply that the shorter-MTT components will dominate MTT estimates, which will thus be biased low. This underestimation bias may help to explain the discrepancy between MTT estimates derived from stable isotopes and those derived from other tracers, such as tritium (Stewart et al., 2010, 2012). However, one should note that, like any radioactive tracer, tritium ages should themselves be vulnerable to underestimation bias in heterogeneous systems (Bethke and Johnson, 2008). Until tritium ages are subjected to benchmark tests like those I have presented here for stable isotopes, one cannot estimate how much they, too, are distorted by aggregation bias.
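The same kind of reasoning applies to radioactive tracers. As a minimal sketch, assume (hypothetically) a constant tritium input concentration and piston-flow parcels, so that a parcel of age t retains a fraction exp(−λt) of its tritium, with λ = ln 2/12.32 yr (the tritium half-life). Because exp(−λt) is convex, the apparent age computed from a mixed sample is always younger than the mean age of its parcels. Real tritium dating must also contend with time-varying atmospheric input, which this sketch ignores; the function name, parcel ages, and weights are illustrative.

```python
import numpy as np

HALF_LIFE = 12.32               # tritium half-life, years
LAMBDA = np.log(2) / HALF_LIFE  # decay constant, 1/yr


def apparent_tritium_age(ages, weights):
    """Age inferred from the decay of a mixed sample, assuming constant tritium input.

    Each water parcel of age t retains exp(-LAMBDA * t) of its initial tritium;
    the mixture's remaining fraction is the flow-weighted average of these.
    """
    ages = np.asarray(ages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    remaining = np.sum(weights * np.exp(-LAMBDA * ages))
    return -np.log(remaining) / LAMBDA


ages = [2.0, 40.0]    # hypothetical parcel ages (years)
weights = [0.5, 0.5]  # equal mixing
print(f"mean age: {np.dot(weights, ages):.1f} yr, "
      f"apparent tritium age: {apparent_tritium_age(ages, weights):.1f} yr")
```

In this example the mixed sample's apparent age is roughly 12 years, although the mean age of the two parcels is 21 years.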

Sine-wave fitting to seasonal tracer cycles is just one of several methods for estimating MTTs from tracer data. I have focused on this method because the relevant calculations are easily posed, and several key results can be obtained analytically. My results show that MTT estimates from sine-wave fitting are subject to severe aggregation bias, but they do not show whether other methods are better or worse in this regard. This is unknown at present and needs to be tested. But until this is done, there is little basis for optimism that other methods will be immune to the biases identified here. One would expect the results presented here to translate straightforwardly to spectral methods for estimating MTTs, because these methods essentially perform sine-wave fitting across a range of timescales. Thus, one should expect aggregation bias at each timescale. The upper limit of reliable MTT estimates should be expected to be a fraction of the longest observable cycles in the data (as it is for the annual cycles analyzed here). Thus, this upper limit will depend on the lengths of the tracer time series and also on whether they contain significant input and output variability at long wavelengths (longer records will not help unless the tracer concentrations are actually variable on those longer timescales). The same principles are likely to apply to convolution modeling of tracer time series, because of the formal equivalence of the time and frequency domains under Fourier's theorem. Furthermore, to the extent that seasonal cycles are the dominant features of many natural tracer time series, convolution modeling of tracer time series may effectively be an elaborate form of sine-wave fitting, with all the attendant biases outlined here. Until these conjectures are tested, however, they will remain speculative. Given the severe aggregation bias identified here, there is an urgent need for benchmark testing of the other common methods for MTT estimation.
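The expectation of aggregation bias at each timescale can be illustrated by repeating the single-exponential inversion at several periods for one heterogeneous catchment. The sketch below assumes an equal mix of two exponential subcatchments with hypothetical MTTs of 0.1 and 5 years. A homogeneous exponential catchment returns the same apparent MTT at every period; the mixture does not, and its apparent MTT differs from the true mean by an amount that changes with the period. This illustrates the conjecture in the text under these simplified assumptions; it is not a test of any particular spectral method.

```python
import numpy as np


def apparent_mtt(mtts, weights, period):
    """Apparent MTT from the amplitude ratio at one period, assuming a single exponential TTD."""
    omega = 2 * np.pi / period
    response = np.sum(np.asarray(weights) / (1 + 1j * omega * np.asarray(mtts)))
    amp = np.abs(response)
    return np.sqrt(amp ** -2 - 1) / omega


mtts, weights = [0.1, 5.0], [0.5, 0.5]  # hypothetical heterogeneous catchment
true_mtt = np.dot(weights, mtts)
for period in (0.25, 1.0, 4.0, 16.0, 64.0):
    print(f"period {period:5.2f} yr: apparent MTT {apparent_mtt(mtts, weights, period):.2f} yr "
          f"(true {true_mtt:.2f} yr)")

# Sanity check: a homogeneous exponential catchment gives the same answer at every period.
print([round(apparent_mtt([2.0], [1.0], p), 6) for p in (0.25, 1.0, 16.0)])
```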

It should also be noted that methods for estimating MTTs assume not only homogeneity but also stationarity, and real-world catchments violate both of these assumptions. The results presented here suggest that nonstationarity (which is, very loosely speaking, heterogeneity in time) is likely to create its own aggregation bias, in addition to the spatial aggregation bias identified here. This aggregation bias can also be characterized using benchmark tests, as I show in a companion paper (Kirchner, 2016).

The analysis presented here implies that many literature values of MTT are likely to be underestimated by large factors or, in other words, that typical catchment travel times are probably several times longer than previously thought. This result sharpens the “rapid mobilization of old water” paradox: how do catchments store water for weeks or months, and then release it within minutes or hours in response to precipitation events (Kirchner, 2003)? This result also sharpens an even more basic puzzle: where can catchments store so much water that it can be so old, on average?

Many studies have sought to link MTTs to catchment characteristics, often
with inconsistent results. For example, McGuire et al. (2005) reported that
MTT was positively correlated with the ratio of flowpath distance to
average hillslope gradient at experimental catchments in Oregon, but
Tetzlaff et al. (2009) reported that MTT was

More generally, though, my analysis implies that the young water fraction

Of course, because

It should be kept in mind that in real-world data, unlike the thought
experiments analyzed here, the tracer measurements themselves will be
somewhat uncertain, and this uncertainty will also flow through to estimates
of either MTT or

Since young water fractions are estimated from amplitude ratios and phase
shifts of seasonal tracer cycles, one could ask whether they add any new
information or whether we could characterize catchments equally well by
their amplitude ratios and phase shifts instead. One obvious answer is that
amplitude ratios and phase shifts, by themselves, are purely
phenomenological descriptions of input–output behavior. Young water
fractions, by contrast, offer a mechanistic explanation for how that
behavior arises, showing how it is linked to the fraction of precipitation
that reaches the stream in much less than 1 year. Not only is this
potentially useful for understanding the transport of contaminants and
nutrients, it also directly quantifies the importance of relatively fast
flowpaths in the catchment. These fast flowpaths are likely to be shallow
(since permeability typically decreases rapidly with depth:
Brooks et al., 2004; Bishop et al., 2011) and to originate relatively
close to flowing channels. One would expect

Because one can estimate

One final note: it has not escaped my notice that because the young water threshold is defined as a fraction of the period of the fitted sinusoid (here, an annual cycle), and because spectral analysis is equivalent to fitting sinusoids across a range of timescales, the input and output spectra of conservative tracers can be re-expressed as a series of young water fractions for a series of young water thresholds. In principle, then, this cascade of young water fractions (and their associated threshold ages) should directly express the catchment's cumulative distribution of travel times, thus solving the longstanding problem of measuring the shape of the transit-time distribution. A proof-of-concept study of this direct approach to deconvolution is currently underway.

I used benchmark tests with data from simple synthetic catchments (Fig. 4) to test how catchment heterogeneity affects estimates of mean transit times (MTTs) derived from seasonal tracer cycles in precipitation and streamflow (e.g., Fig. 1). The relationship between tracer cycle amplitude and MTT is strongly nonlinear (Fig. 3), with the result that tracer cycles from heterogeneous catchments will underestimate their average MTTs (Fig. 5). In heterogeneous catchments, furthermore, the shape of the transit-time distribution (TTD) in the mixed runoff will differ from that of the tributaries; e.g., mixtures of exponential distributions are not exponentials (Fig. 6) and mixtures of gamma distributions are not gamma-distributed. These two effects combine to make seasonal tracer cycles highly unreliable as estimators of MTTs, with large scatter and strong underestimation bias in heterogeneous catchments (Figs. 7, 13). These results imply that many literature values of MTT are likely to be underestimated by large factors and thus that typical catchment travel times are much longer than previously thought.

However, seasonal tracer cycles can be used to reliably estimate the

More generally, these results vividly illustrate how the pervasive
heterogeneity of environmental systems can confound the simple conceptual
models that are often used to analyze them. But not all properties of
environmental systems are equally susceptible to aggregation error. Although
environmental heterogeneity makes some measures (like MTT) highly
unreliable, it has little effect on others (like

This analysis was motivated by intensive discussions with Scott Jasechko and Jeff McDonnell; I thank them for their encouragement, and for many insightful comments. I also appreciate the comments by Markus Weiler and two anonymous reviewers, which spurred improvements in the final version of the manuscript.

Edited by: T. Bogaard