Interactive comment on “Aggregation effects on tritium-based mean transit times and young water fractions in spatially heterogeneous catchments and groundwater systems, and implications for past and future applications of tritium” by M. K

exactly, and where the distribution is due to the different path lengths in the catchment, and is not accurate at all for the dispersion model and the gamma model either, both of which include dispersion and diffusion, but ALSO the distribution of path lengths. The distribution collapses to piston flow ONLY if all flow lines can be assumed to be of equal length. In the rest of that first comment, referee #1 seems to confuse two related issues: complex systems and heterogeneous systems. In a complex system, different and conceptually clearly separate reservoirs sustain discharge at the system’s outlet. Usually, hydrogeological understanding can lead to a choice of LPM combination which best simulates that system (say two exponential components for the quickflow and baseflow reservoirs), or the fit improves significantly by doing so. This assumes that each reservoir is sufficiently homogeneous to use a given LPM shape, all of which have been developed for a homogeneous medium. Piotr Maloszewski and co-authors have over the years shown quite a few examples of improving the fit to measured tritium activity by combining models and hydrogeological understanding. Kirchner’s model is somewhat similar in that TTDs are added up to simulate a heterogeneous system. This is not the same, however: first, because one would expect a heterogeneous catchment to be made up of more than two or three subcatchments (and hence two or three TTDs) which flow into one another, and second, because while this combination is a conceptual contrivance, the combination used in complex systems is a conscious decision of the experimenter based on data. The last sentence of that comment is not clear to me. What does the referee mean by "a specific set"? What other set of heterogeneities is he thinking about? Maybe he is pointing to the shortcoming of the model I was writing about in my comment.

Abstract. Kirchner (2016a) demonstrated that aggregation errors due to spatial heterogeneity, represented by two homogeneous subcatchments, could cause severe underestimation of the mean transit times (MTTs) of water travelling through catchments when simple lumped parameter models were applied to interpret seasonal tracer cycle data. Here we examine the effects of such errors on the MTTs and young water fractions estimated using tritium concentrations in two-part hydrological systems. We find that MTTs derived from tritium concentrations in streamflow are just as susceptible to aggregation bias as those from seasonal tracer cycles. Likewise, groundwater wells or springs fed by two or more water sources with different MTTs will also have aggregation bias. However, the transit times over which the biases are manifested are different because the two methods are applicable over different time ranges, up to 5 years for seasonal tracer cycles and up to 200 years for tritium concentrations. Our virtual experiments with two water components show that the aggregation errors are larger when the MTT differences between the components are larger and the amounts of the components are each close to 50 % of the mixture. We also find that young water fractions derived from tritium (based on a young water threshold of 18 years) are almost immune to aggregation errors, as were those derived from seasonal tracer cycles with a threshold of about 2 months.

Published by Copernicus Publications on behalf of the European Geosciences Union.

Introduction
Environmental tracers are commonly used to obtain transit time distributions (TTDs) in groundwater systems (Małoszewski and Zuber, 1982) or catchments (McDonnell et al., 2010). Transit time is the time it takes for rainfall to travel through a system from recharge to emergence in a well, spring, or stream. TTDs provide important information about transport, mixing, and storage of water in systems and therefore on the retention and release of pollutants. In addition, mean transit times (MTTs) determined from these distributions provide practical information for various aspects of water resources management. For example, MTTs have been used to estimate the volume of groundwater storage providing baseflow in catchments (Morgenstern et al., 2010; Gusyev et al., 2016) and to predict lag times and life expectancies of contaminants in the subsurface (Hrachowitz et al., 2016). The drinking water security of wells in New Zealand is partly assessed by the absence of water with less than 1-year travel time under the New Zealand drinking water quality standard (Ministry of Health, 2008). As useful as they are, TTDs cannot be measured directly in the field and have to be inferred from age-dependent tracer concentrations with the use of lumped parameter models (LPMs).
Catchments are inherently heterogeneous on various scales. Point-scale properties vary greatly from place to place, while streams integrate the various catchment outputs. The top-down approach uses catchment outputs, such as streamflow and stream chemistry, to infer or predict catchment TTDs. The hope is that these average out local heterogeneities, allowing one simple LPM to provide a good fit and its parameters to be representative of the catchment. But individual areas within catchments can vary greatly because of geology, geography, aspect, etc. Groundwater systems also show heterogeneity. Kirchner (2016a) showed by means of virtual experiments that aggregating subcatchments with different TTDs can lead to severe underestimation of the composite MTT when simple LPMs were applied to interpret seasonal tracer cycles. This is because the smoothing out of the seasonal cycles is a non-linear process which acts more rapidly on the younger water components, thereby causing underestimation of the composite MTT. He also found that the young water fraction was a much more robust metric than the MTT against aggregation error. These results raise an important question: are tritium-derived MTTs also susceptible to aggregation error due to spatial heterogeneity? This work aims to answer this question.
Seasonal tracer cycle and tritium-based MTTs are determined by different methods and have given very different results in catchments. The seasonal tracer cycle method depends on damping of input cycles on passing through a system into the output, whereas the tritium method depends on radioactive decay of tritium between input and output (with a half-life of 12.32 years). Effects of mixing within systems need to be accounted for in both cases (Małoszewski and Zuber, 1982). Results from seasonal tracer cycles have given MTTs up to about 5 years, at which point the input cycles in homogeneous systems are completely damped within tracer measurement errors, while results from tritium measurements have shown that large proportions of the flow in many streams have MTTs of 1-2 decades or more (Stewart et al., 2010; Seeger and Weiler, 2014; Michel et al., 2015). Aggregation errors due to the non-linearity of the damping of the seasonal tracer cycles in time (noted above) add to this loss of signal in seasonal tracer cycles, thereby increasing the underestimation of the real MTTs in streams. Similarly, radioactive decay of tritium is a non-linear process and therefore spatial aggregation errors are expected when water components with different MTTs are combined (Bethke and Johnson, 2008).
Calibration of LPMs using environmental radioisotope and stable isotope data has been the subject of study for many years (see Małoszewski and Zuber, 1982, and early work summarised therein). If a catchment outflow is a mixture of two or more components of different water ages, it can be difficult to calibrate an LPM uniquely when we only have data for tracers. For example, for springs in Czatkowice, Poland, a unique answer based on tritium measurements could be found only when the proportion in which the water components (water fluxes) were mixed was known (Grabczak et al., 1984; Małoszewski and Zuber, 1993). In heterogeneous catchments, it is always helpful (i) to measure a variable tracer periodically, and (ii) to combine those data with water fluxes in the inputs and outputs to separate "fast" and "slow" components; see for example studies at Lainbach Valley, Germany (Małoszewski et al., 1983), and Schneealpe, Austria (Małoszewski et al., 2002). The choice of LPM, or equivalently the TTD function, must be based more on the hydrogeological situation and not on artificial mathematical (fitting) considerations. Consideration of hydrological parameters known independently (e.g. mean thickness of the water-bearing layers in the catchment) is required for model validation in order to examine whether the model is likely to be applicable to the real situation. We can have a very well-calibrated model in terms of tracer data being fitted by an LPM, but the MTT can be far from the hydrological reality.
The aim of this paper is to examine the aggregation effects of spatially heterogeneous catchments and groundwater systems on MTTs and young water fractions determined using tritium concentrations. We conducted our investigation by combining two dissimilar water components in virtual experiments and comparing the true mixed MTTs with the tritium-inferred apparent MTTs, as Kirchner (2016a) did with seasonal tracer cycles. Our experiments did not include examination of non-stationary hydrological systems, for which Kirchner (2016b) had found similar underestimation of MTTs with seasonal tracer cycles. We also examined aggregation effects for young water fractions estimated using tritium. Our calculations are based on the gamma LPM with shape factors (α) between 1 and 10, which is also representative of other frequently used simple LPMs such as the exponential, exponential piston flow, and dispersion models. The different tritium input functions for Northern and Southern Hemisphere locations were also tested.

Transit time determination: simple and compound lumped parameter models
The varied flow paths of water through the subsurface of catchments imply that outflows contain mixtures of water with different transit times. That is, the water in the stream does not have a discrete transit time, but has a TTD. This distribution is often described by a conceptual flow or mixing model, which reflects the average (steady-state) conditions in the catchment or groundwater system. Rainfall incident on a catchment is affected by immediate surface/near-surface runoff and longer-term evapotranspiration loss. The remainder constitutes recharge to the subsurface water stores. Tracer inputs to the subsurface water stores (i.e. seasonal tracer cycles and tritium concentrations in the recharge water) are modified during passage through the hydrological system by mixing of water with different transit times (represented by the flow model) and radioactive decay in the case of tritium before appearing in the output. The convolution integral and an appropriate flow model are used to relate the tracer input and output. The convolution integral is given by
$$C_{\mathrm{out}}(t) = \int_0^{\infty} C_{\mathrm{in}}(t-\tau)\, h(\tau)\, \exp(-\lambda\tau)\, \mathrm{d}\tau, \tag{1}$$
where $C_{\mathrm{in}}$ and $C_{\mathrm{out}}$ are the input and output concentrations in the recharge and baseflow respectively; $t$ is calendar time and the integration is carried out over the transit times $\tau$; $h(\tau)$ is the flow model of the hydrological system based on the distribution of water fluxes in the catchment; the exponential term accounts for radioactive decay of tritium; and $\lambda$ is the tritium decay constant ($= \ln 2/T_{1/2}$), where $T_{1/2}$ is the half-life of tritium (12.32 years).
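As an illustration of how the convolution integral can be evaluated numerically, the following minimal sketch approximates it with a midpoint sum. It assumes an exponential flow model and a constant input of 2 TU purely for demonstration; the function names, the step size and the truncation of the integral at 2000 years are choices made here, not part of the original calculations. For this special case the integral has the closed form C0/(1 + λτm), which the sketch uses as a check.

```python
import math

HALF_LIFE_YR = 12.32                    # tritium half-life quoted in the text
LAMBDA = math.log(2.0) / HALF_LIFE_YR   # decay constant (1/yr)

def exponential_ttd(tau, mtt):
    """Exponential flow model h(tau) with mean transit time mtt (yr)."""
    return math.exp(-tau / mtt) / mtt

def convolve_output(c_in_at_lag, h, dt=0.05, tau_max=2000.0):
    """Midpoint-rule approximation of the convolution integral.

    c_in_at_lag(tau) returns the input concentration (TU) that entered the
    system tau years before the sampling date, i.e. C_in(t - tau) for fixed t.
    """
    n_steps = int(tau_max / dt)
    total = 0.0
    for k in range(n_steps):
        tau = (k + 0.5) * dt
        total += c_in_at_lag(tau) * h(tau) * math.exp(-LAMBDA * tau) * dt
    return total

# Hypothetical example: constant 2 TU input, exponential model, MTT = 10 yr.
c_out = convolve_output(lambda tau: 2.0, lambda tau: exponential_ttd(tau, 10.0))
analytic = 2.0 / (1.0 + LAMBDA * 10.0)  # closed form for this special case
print(f"numerical {c_out:.4f} TU, analytic {analytic:.4f} TU")
```

With a measured input record, `c_in_at_lag` would instead look up the precipitation value for the corresponding year.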
Tritium concentrations in precipitation were different in each hemisphere, and are proxies for tritium recharge concentrations ($C_{\mathrm{in}}$). Input functions (tritium concentrations in monthly samples of precipitation) at Kaitoke, New Zealand, in the Southern Hemisphere (Morgenstern and Taylor, 2009) and Trier, Germany, in the Northern Hemisphere (IAEA/WMO, 2016) are given in Fig. 1. Tritium data for Trier before 1978 were calculated by regression from data for Vienna, Austria. Both curves in Fig. 1 have pronounced bomb peaks due to nuclear weapons testing mainly in the Northern Hemisphere during the 1950s and 1960s, but the peak was much larger in the Northern Hemisphere than in the Southern Hemisphere. Since then there have been steady declines due to leakage of tritium from the stratosphere into the troposphere followed by removal by rainout and radioactive decay. However, the tritium concentrations in the troposphere are now reaching the background cosmogenic levels which they had before the dawn of the nuclear age (conventionally taken as 1950). The levelling-out process occurred about 20 years ago in the Southern Hemisphere and 5-10 years ago in the Northern Hemisphere. The bomb peaks have been good markers of 1960s precipitation in past tritium studies, but the steady declines which mimic radioactive decay of tritium have caused problems with ambiguous (i.e. multiple) age estimations for given tritium values (Stewart et al., 2010).
The curves also show smaller variations due to annual peaks in tritium concentrations caused by increased stratospheric leakage during spring in each hemisphere, and possibly small longer-term variations related to sunspot cycles. Tritium concentrations are expected to remain at the present cosmogenic levels for the foreseeable future, and this means that multiple age solutions are becoming less of a problem (Stewart et al., 2012; Stewart and Morgenstern, 2016; Gusyev et al., 2016). However, the minimal variation will mean that tritium will not be effective for identifying flow models in the future.
Figure 1. Tritium concentrations (TU) in monthly precipitation samples at Kaitoke, New Zealand, in the Southern Hemisphere, and Trier, Germany, in the Northern Hemisphere.
Several simple flow models are commonly used in tracer studies. The piston flow model (PFM) describes systems in which all of the water in the output has the same transit time (MTT or $\tau_m$). Its TTD is
$$h(\tau) = \delta(\tau - \tau_m), \tag{2}$$
where the single parameter is $\tau_m$ (yr), and $\delta(\tau - \tau_m)$ is a δ-function that gives a spike when $\tau = \tau_m$ (see Fig. 2a). The output tritium concentration is
$$C_{\mathrm{out}}(t) = C_{\mathrm{in}}(t - \tau_m)\, \exp(-\lambda\tau_m), \tag{3}$$
i.e. the output concentration equals the input concentration delayed in time by $\tau_m$ and, for tritium, reduced by radioactive decay during the delay. The exponential model (EM) is given by
$$h(\tau) = \frac{1}{\tau_m} \exp\!\left(-\frac{\tau}{\tau_m}\right), \tag{4}$$
where again the single parameter is $\tau_m$ (yr). In this model, water parcels with different transit times combine in the outflow to approximate the exponential TTD. This is mathematically equivalent to the well-mixed model (also called the linear reservoir), but it does not imply that full mixing occurs within real systems.
The gamma model (GM) has TTDs based on the gamma distribution:
$$h(\tau) = \frac{\tau^{\alpha-1}}{\beta^{\alpha}\,\Gamma(\alpha)} \exp\!\left(-\frac{\tau}{\beta}\right), \tag{5}$$
where the two parameters α (-) and β (yr) are shape and scale factors respectively, and $\tau_m = \alpha\beta$ (Kirchner et al., 2000). The gamma distribution reduces to the exponential distribution for the special case of α = 1. The exponential piston flow model (EPM) combines a volume with exponential transit times followed by a piston flow volume to give a model with two parameters (Małoszewski and Zuber, 1982). The TTD is given by
$$h(\tau) = \begin{cases} 0, & \tau < \tau_m(1-f), \\ \dfrac{1}{f\tau_m} \exp\!\left(-\dfrac{\tau}{f\tau_m} + \dfrac{1}{f} - 1\right), & \tau \ge \tau_m(1-f), \end{cases} \tag{6}$$
where $f$ is the ratio of the exponential volume to the total volume. Małoszewski and Zuber (1982) used the parameter η instead of $f$, where η = 1/$f$. $f\tau_m$ is the time required for water to flow through the exponential volume, while $\tau_m(1 - f)$ is the time in the piston flow section. The dispersion model (DM) assumes a tracer transport which is controlled by advection and dispersion processes (Małoszewski and Zuber, 1982), with a TTD of
$$h(\tau) = \frac{1}{\tau \sqrt{4\pi P_D\, \tau/\tau_m}} \exp\!\left[-\frac{(1 - \tau/\tau_m)^2}{4 P_D\, \tau/\tau_m}\right], \tag{7}$$
where $P_D$ (-) is the dispersion parameter (being the measure of the variance of the transit time distribution, i.e. the sum of the variance resulting from the space distribution of the infiltration through the catchment surface and variance resulting from the dispersive flow through the underground). The two parameters are $\tau_m$ and $P_D$.
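The relation between the gamma parameters and the mean transit time can be checked numerically. The sketch below writes the gamma model in its standard form (shape α, scale β in years) and integrates it with a midpoint sum to confirm that the area is 1 and the mean is αβ. The function names and the integration settings are illustrative assumptions, and the midpoint scheme is only intended for α ≥ 1, the range considered in the text.

```python
import math

def gamma_ttd(tau, alpha, beta):
    """Standard gamma density with shape alpha (-) and scale beta (yr)."""
    return (tau ** (alpha - 1.0) * math.exp(-tau / beta)
            / (beta ** alpha * math.gamma(alpha)))

def numerical_moments(alpha, beta, dt=0.01, n_means=50.0):
    """Return (area, mean) of the gamma TTD from a midpoint sum.

    The integral is truncated at n_means times the mean transit time,
    which is an arbitrary cut-off chosen for this sketch.
    """
    tau_max = n_means * alpha * beta
    n_steps = int(tau_max / dt)
    area = 0.0
    first_moment = 0.0
    for k in range(n_steps):
        tau = (k + 0.5) * dt
        h = gamma_ttd(tau, alpha, beta)
        area += h * dt
        first_moment += tau * h * dt
    return area, first_moment

# Hypothetical example: alpha = 3 and beta = 10/3 yr, so the MTT should be 10 yr.
area, mean = numerical_moments(3.0, 10.0 / 3.0)
print(f"area = {area:.4f}, mean transit time = {mean:.3f} yr")
```

Setting α = 1 in `gamma_ttd` reproduces the exponential model, in line with the special case noted above.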
This paper makes a particular distinction between simple LPMs (meaning specifically the GM, the EPM with end members piston flow and exponential models, and the DM) and compound LPMs (binary or other parallel combinations of simple LPMs).Simple LPMs describe homogeneous systems, while compound LPMs can accommodate heterogeneity in the system.
Compound LPMs have generally only been explored for more complicated systems or when simple LPMs have given poor fits to data (such as seasonal tracer cycles or tritium concentrations) (e.g. Małoszewski et al., 1983; Stewart and Thomas, 2008; Blavoux et al., 2013; Morgenstern et al., 2015). The binary parallel LPM is given by
$$h(\tau) = b\,\mathrm{LPM}_1 + (1 - b)\,\mathrm{LPM}_2, \tag{8}$$
where $\mathrm{LPM}_1$ and $\mathrm{LPM}_2$ are simple LPMs with individual PDFs representing two water components contributing to the system output, and $b$ (-) is the fraction of the first component in the combined output. The overall combined MTT ($\tau_m$) is
$$\tau_m = b\,\tau_1 + (1 - b)\,\tau_2, \tag{9}$$
where $\tau_1$ and $\tau_2$ are the MTTs of the two components. An example of a compound LPM is the parallel combination of two exponential models describing a system with young and old water components. This is called the "double exponential model" when applied to tritium (Michel, 1992; Taylor et al., 1992) and the "two parallel linear reservoirs" (TPLR) model when applied to seasonal tracer cycles (Weiler et al., 2003). The PDF is given by
$$h(\tau) = \frac{b}{\tau_f} \exp\!\left(-\frac{\tau}{\tau_f}\right) + \frac{1 - b}{\tau_s} \exp\!\left(-\frac{\tau}{\tau_s}\right), \tag{10}$$
where $\tau_f$ and $\tau_s$ are the MTTs of the fast and slow reservoirs respectively. The model has three parameters, with the overall combined MTT ($\tau_m$) being
$$\tau_m = b\,\tau_f + (1 - b)\,\tau_s. \tag{11}$$
Other compound LPMs referred to in this work are the double gamma model (DGM), double exponential piston flow model (DEPM), and the double dispersion model (DDM), which are binary parallel combinations of the respective models. They each have five parameters.

Estimation of spatial aggregation effects on mean transit times
To estimate the effects of spatial aggregation on mean transit times (MTTs), we perform virtual experiments by combining two homogeneous subsystems. Each subsystem or water component is described by a simple LPM (a GM with assumed parameters α and β). The combined or mixed system is then describable by a compound LPM (Eq. 8), which yields the "true" MTT via Eq. (9) using the assumed MTTs of the components.
To determine the "apparent" MTT, the tritium concentrations of the water components from 1940 to the present are calculated from the GMs applying to each component using the convolution process described above (Eq. 1). The input function was first assumed to be constant at 2 TU for the calculations given in Sect. 3.1.1; then the Kaitoke or Trier input functions (Fig. 1) were used for the calculations in Sect. 3.1.2 and 3.1.3. In all cases, the tritium concentrations of the mixed system ($C_m$) are given by
$$C_m = b\,C_1 + (1 - b)\,C_2, \tag{12}$$
where $C_1$ (TU) and $C_2$ (TU) are the tritium concentrations in components 1 and 2 respectively. The mixed system is then treated as if it is homogeneous to produce the "apparent" MTT by fitting a simple LPM (a GM) to the tritium concentrations of the mixture ($C_m$). The true and apparent MTTs of the mixture are compared for different assumed values of the MTTs of the components. $b$ is assumed to be 0.5 for simplicity in what follows. Following Kirchner (2016a), we did not consider evapotranspiration in our analysis of tritium aggregation effects.

Determination of young water fractions
The young water fraction ($Y_f$) is the fraction of water with transit times between zero and a young water threshold ($t_y$), i.e.
$$Y_f = \int_0^{t_y} h(\tau)\, \mathrm{d}\tau. \tag{13}$$
The young water threshold for tritium was estimated by trial and error using the GM with parameter α in the range 1 to 10. It was found that a constant threshold value of 18 years gave agreement between the apparent and true young water fractions to within about 10 %. This included the case with the greatest difference in ages between the two water components (i.e. waters with MTTs of 3 and 397 years respectively in this study). Accordingly, the young water threshold has been taken as 18 years in what follows. The "true" $Y_f$ is determined by mixing the two waters according to the equation
$$Y_f = b\,Y_{f1} + (1 - b)\,Y_{f2}, \tag{14}$$
in analogy with Eq. (9). $b$ is the fraction of component 1 in the mixture, and $Y_{f1}$ and $Y_{f2}$ are the young water fractions of the two components. The "apparent" $Y_f$ is determined by fitting a simple LPM to the tritium concentrations of the mixture (Eq. 12). $b$ is assumed to be 0.5.
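The "true" young water fraction of a binary mixture can be computed directly from the definitions above. The sketch below is restricted to integer shape factors, for which the gamma cumulative distribution has a closed form (the Erlang distribution); that restriction, the function names and the example MTTs are assumptions made for illustration only.

```python
import math

def young_water_fraction(mtt, alpha, t_y=18.0):
    """Fraction of a gamma TTD with transit times below t_y (yr).

    Uses the closed-form Erlang CDF, so alpha must be a positive integer.
    """
    beta = mtt / alpha            # scale factor (yr), since mtt = alpha * beta
    x = t_y / beta
    partial_sum = sum(x ** k / math.factorial(k) for k in range(alpha))
    return 1.0 - math.exp(-x) * partial_sum

def mixed_young_water_fraction(yf1, yf2, b=0.5):
    """Flow-weighted young water fraction of a two-component mixture."""
    return b * yf1 + (1.0 - b) * yf2

# Hypothetical example: exponential components (alpha = 1) with MTTs of 3 and 397 yr.
yf_young = young_water_fraction(3.0, 1)
yf_old = young_water_fraction(397.0, 1)
yf_true = mixed_young_water_fraction(yf_young, yf_old)
print(f"Yf1 = {yf_young:.3f}, Yf2 = {yf_old:.3f}, true mixed Yf = {yf_true:.3f}")
```

The "apparent" young water fraction would additionally require fitting a single gamma model to the mixed tritium concentrations, which is not shown here.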

Comparison of transit time distributions of different flow models
The transit time distributions of the three cases of the GM investigated in this work are illustrated in Fig. 2a, as normalised PDFs (i.e. $h(\tau) \times \tau_m$) versus normalised transit times ($\tau/\tau_m$). These cover the range of shapes observed in streams and groundwater using tritium concentrations. They are also approximately representative of the other simple flow models described above. The GM case with α = 1 is the exponential distribution (linear storage), the same as the EPM with $f$ = 1. GM cases with α = 3 and 10 are more peaked and have smaller tails (short and long transit times are reduced compared to transit times close to the mean). The PFM is the end member of the series, being all peak and no tail (see Fig. 2a).
The other simple flow models are compared with the GM in Table 1 and Fig. 2b-c. The standard deviation (SD) and Nash-Sutcliffe efficiency (NSE) are used to quantify the goodness of fit between the GM ($\mathrm{GM}_i$) and the best-fitting version of each of the other models ($\mathrm{LPM}_i$), where
$$\mathrm{SD} = \sqrt{\frac{\sum_i (\mathrm{GM}_i - \mathrm{LPM}_i)^2}{N}}, \tag{15}$$
$$\mathrm{NSE} = 1 - \frac{\sum_i (\mathrm{GM}_i - \mathrm{LPM}_i)^2}{\sum_i (\mathrm{LPM}_i - \overline{\mathrm{LPM}})^2}, \tag{16}$$
with $N$ the number of values compared and $\overline{\mathrm{LPM}}$ the mean of the other model. The NSE can vary between −∞ and 1. NSE = 1 indicates a perfect fit between the GM and the other model, while NSE = 0 means that the variation between the models is the same as the variation about the mean of the other model. The standard deviation and NSE gave the same results in terms of identifying the most similar shapes of the other models (Table 1). TTD shapes for the GM with α between 1 and 10 are equivalent to EPM shapes with exponential fractions ($f$) between 1.0 and 0.44 (Table 1), which have been found suitable for interpreting tritium concentrations in baseflow and groundwater (e.g. Małoszewski et al., 1983; Stewart et al., 2007; Morgenstern and Stewart, 2004). The useful range of the DM has dispersion parameters ($P_D$) between about 1.3 and 0.05 corresponding to the GM with α between 1 and 10 (Table 1). The GM and EPM shapes become less similar to each other as α increases to 10, while the GM and DM shapes become more similar.
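The two goodness-of-fit measures can be sketched as follows. The NSE follows the verbal definition given above (it equals 1 for a perfect match and 0 when the misfit equals the variation of the other model about its mean). The SD is written here as a root-mean-square difference, which is an assumption about its exact form; the function names and the sample values are likewise hypothetical.

```python
import math

def sd_between(gm_values, lpm_values):
    """Root-mean-square difference between two TTD curves (assumed form of SD)."""
    n = len(gm_values)
    squared = sum((g - l) ** 2 for g, l in zip(gm_values, lpm_values))
    return math.sqrt(squared / n)

def nse_between(gm_values, lpm_values):
    """Nash-Sutcliffe efficiency of the GM relative to the other model."""
    mean_lpm = sum(lpm_values) / len(lpm_values)
    misfit = sum((g - l) ** 2 for g, l in zip(gm_values, lpm_values))
    spread = sum((l - mean_lpm) ** 2 for l in lpm_values)
    return 1.0 - misfit / spread

# Hypothetical TTD values sampled at the same normalised transit times.
lpm_curve = [1.0, 2.0, 3.0]
perfect = nse_between(lpm_curve, lpm_curve)        # identical curves
flat = nse_between([2.0, 2.0, 2.0], lpm_curve)     # GM equal to the LPM mean
print(perfect, flat, sd_between([2.0, 2.0, 2.0], lpm_curve))
```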

Results
3.1 Aggregation effects on mean transit times determined using tritium

Relationships between mean transit time and tritium concentration
We first demonstrate the relationships between mean transit time and tritium concentration for mixed systems (Fig. 3) by assuming a constant annual input tritium concentration of 2 TU over time, i.e. without the bomb pulse of the nuclear age, so that only natural background concentrations are present. This simplifying assumption is necessary to allow for the analysis shown in Fig. 3; with the real peaked input the figures would be much more complicated. The assumption of a constant tritium input function is, however, becoming increasingly realistic in the Southern Hemisphere, with the bomb tritium from 50 years ago now fading away and assuming no more large-scale releases of tritium to the atmosphere. This assumption is not limited to tritium but would also be valid for all radioactive tracers with constant input such as carbon-14 and argon-39. Figure 3a shows the relationship for the GM with shape factor α = 1. The red points indicate the assumed water components (with MTTs of 3 and 197 years respectively) and the red dashed line is the mixing relationship between them described by Eqs. (9) and (12). The "true" MTT (100 years) of a 50 : 50 mixture of the components (i.e. $b$ = 0.5) is shown on the red dashed line. The black curve is the result of applying the GM with α = 1 to the mixed tritium concentrations (Eq. 12). A 50 : 50 mixture of the components gives the "apparent" MTT shown (20.5 years), which is much less than the "true" MTT. This results from the strongly non-linear character of the black curve (Fig. 3a) and therefore combining two dissimilar subsystems causes aggregation bias in a similar way to that demonstrated for seasonal tracer cycles by Kirchner (2016a) in his Fig. 5 (and also for radioactive decay by Bethke and Johnson, 2008, in their Fig. 3a).
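For a constant input the virtual experiment can be reproduced in closed form, because the convolution of a constant input C0 with a gamma TTD and radioactive decay gives C0 (1 + λβ)^(−α), with β = τm/α. The sketch below uses this relation (an analytical shortcut assumed here, not necessarily how the original calculations were made) to compare the true and apparent MTTs of a 50 : 50 mixture; the function names are hypothetical.

```python
import math

LAMBDA = math.log(2.0) / 12.32   # tritium decay constant (1/yr)

def gm_concentration(mtt, alpha, c0=2.0):
    """Steady-state output (TU) of a gamma model fed by constant input c0."""
    beta = mtt / alpha
    return c0 * (1.0 + LAMBDA * beta) ** (-alpha)

def gm_apparent_mtt(conc, alpha, c0=2.0):
    """Invert gm_concentration: MTT a single gamma model assigns to conc."""
    return alpha * ((c0 / conc) ** (1.0 / alpha) - 1.0) / LAMBDA

def aggregate(mtt1, mtt2, alpha, b=0.5):
    """Return (true MTT, apparent MTT) for a two-component mixture."""
    true_mtt = b * mtt1 + (1.0 - b) * mtt2
    c_mix = (b * gm_concentration(mtt1, alpha)
             + (1.0 - b) * gm_concentration(mtt2, alpha))
    return true_mtt, gm_apparent_mtt(c_mix, alpha)

for shape in (1, 3, 10):
    true_mtt, apparent_mtt = aggregate(3.0, 197.0, shape)
    print(f"alpha = {shape:2d}: true {true_mtt:.0f} yr, apparent {apparent_mtt:.1f} yr")
```

For α = 1 this shortcut gives an apparent MTT of roughly 20 years against a true MTT of 100 years, close to the 20.5 years quoted above; exact agreement is not expected, since the closed form ignores any time discretisation used in the original calculations.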
Figure 3b-d show the same calculations applied to the GMs with α = 3 and 10 and the PFM.The different shape factors describe different fractional contributions of past water inputs to the present water output as illustrated by the transit time distributions in Fig. 2a.The GMs with α = 3 and 10 have slightly greater differences between the true and apparent MTTs than the GM with α = 1.The PFM is the most sharply peaked of all, and has the greatest true-apparent MTT difference of 100 years to 15 years.Since there is no mixing, the non-linearity of the black curve is solely due to radioactive decay of tritium (Fig. 3d).

Effect of young component fraction (b) on aggregation
Figure 4 shows the effect of changing the fraction of the young component (b) on the aggregation error for a mixture of two components with MTTs of 10 and 90 years.As b increases from zero, the aggregation effect increases from zero reaching a maximum near b = 0.5 and then decreasing to zero again at b = 1.This is an important factor in the aggregation error.The virtual experiments below (carried out with b = 0.5) showed the maximum effects.
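The dependence of the aggregation error on the mixing fraction can be explored with a short calculation. The sketch below assumes an exponential model and a constant 2 TU input, which is simpler than the setup behind Fig. 4, so only the qualitative behaviour (zero error at b = 0 and b = 1, positive error in between) should be compared; the exact position of the maximum under these assumptions need not coincide with b = 0.5. All names are hypothetical.

```python
import math

LAMBDA = math.log(2.0) / 12.32   # tritium decay constant (1/yr)
C0 = 2.0                         # assumed constant input concentration (TU)

def em_concentration(mtt):
    """Steady-state output (TU) of an exponential model with constant input."""
    return C0 / (1.0 + LAMBDA * mtt)

def em_apparent_mtt(conc):
    """MTT that a single exponential model assigns to concentration conc."""
    return (C0 / conc - 1.0) / LAMBDA

def aggregation_error(b, mtt_young=10.0, mtt_old=90.0):
    """True minus apparent MTT (yr) for young-component fraction b."""
    true_mtt = b * mtt_young + (1.0 - b) * mtt_old
    c_mix = (b * em_concentration(mtt_young)
             + (1.0 - b) * em_concentration(mtt_old))
    return true_mtt - em_apparent_mtt(c_mix)

errors = {step / 10.0: aggregation_error(step / 10.0) for step in range(11)}
for b, err in errors.items():
    print(f"b = {b:.1f}  true - apparent MTT = {err:6.2f} yr")
```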

True versus apparent mean transit times
The true versus the apparent MTTs calculated using the real tritium input function from Kaitoke (expressed as annual values) are given in Fig. 5. The calculations were structured so that the two water components were initially assumed to have the same MTTs (i.e. $\tau_1 = \tau_2$) and therefore the mixture had the same true and apparent MTTs, and plotted on the 1 : 1 lines. The second component (MTT2) was then allowed to become older in 50-year steps so that the difference in MTTs between the two components increased. This caused the apparent MTTs to become younger than the true MTTs and the points to move further and further away from the 1 : 1 line as shown by the curves in Fig. 5. The dots show the effects of the step changes in MTT2. As expected, the greatest age differences caused the biggest deviations from the 1 : 1 lines. The different values of α cause differences to the patterns observed, but the patterns are similar overall. They are tighter around the 1 : 1 line for α = 1, showing smaller aggregation effects, and are most divergent for α = 10. Errors of fitting for determining the apparent MTTs (expressed as standard deviations; Eq. 15) are greatest when component 1 is youngest; these are shown by fine dashed lines above and below the curves. The errors are largest with α = 10. The fitting errors are important because big errors would lead researchers to apply more complicated and therefore more realistic LPMs (such as binary LPMs), as many have in the past (e.g. Małoszewski et al., 1983; Uhlenbrook et al., 2002; Stewart and Thomas, 2008; Morgenstern et al., 2015).
Using the Trier (Northern Hemisphere) tritium input function (Fig. 1) results in very similar aggregation biases for tritium MTTs (Fig. 6) compared to those obtained with the Kaitoke input (Fig. 5).Using Northern Hemisphere or Southern Hemisphere tritium input functions makes only slight differences to the curves.Note that the problem of multiple age solutions often experienced using tritium with the Northern Hemispheric input function (e.g.Stewart et al., 2012) does not arise here because we calculate around 75 tritium values (one for each year) and this constrains the final "apparent" fitting to a single unique solution.However, the fitting errors for the apparent MTTs with the Trier input function are much larger than those determined with the Kaitoke input function.
Some of the calculation results are replotted in Fig. 7 [...] Northern Hemisphere locations by the young water threshold (18 years).

Aggregation effects on young water fractions
The effects of combining two different water components on the true and apparent young water fractions ($Y_f$) of a mixture are examined in this section using the same procedure as before (i.e. testing mixtures with MTT1 at 10 years, 25 years).
The two water components were initially assumed to have the same MTTs and young water fractions (i.e. $Y_{f1} = Y_{f2}$) and therefore the mixture had the same true and apparent young water fractions, which plot on the 1 : 1 lines in Figs. 8 and 9. The second component (MTT2) was then allowed to become older in 50-year steps so that the differences in MTTs and young water fractions between the two components increased. But now the true and apparent young water fractions did not diverge very much from each other (Figs. 8 and 9). The figures show the young water fractions decreasing as the mixtures become older, but the curves lie mostly along the 1 : 1 lines. There are only small divergences from an apparent to true young fraction ratio of one (up to about 10 %).
The maximum divergences from this ratio are affected by the choice of young water threshold (Eq. 13). The present calculations have been made using a young water threshold of 18 years. With higher values for the threshold, the maximum divergences from the 1 : 1 line were found to become larger. Consequently, 18 years is taken as the recommended value for the young water threshold.
For stable isotopes, Kirchner (2016a) reported a young water threshold range from 0.1 to 0.25 years (or approximately 2 months) for the GM shape factor α ranging from 0.2 to 2. From our tritium evaluation with MTT1 at 10 years, the young water threshold of tritium-based transit times was 18 years for all values of the shape factor α between 1 and 10.
Young water fractions evaluated using tritium are of practical interest for various threshold ages, for example 1 year for assessing drinking water security of groundwater wells (water mixtures without any fraction of water of less than 1 year are regarded as secure in terms of potential for pathogen contamination; Close et al., 2000; Ministry of Health, 2008), or 60 years to assess the fraction of water that has already been impacted by high-intensity industrial agriculture starting after WWII (e.g. Morgenstern et al., 2015).

Aggregation effects on MTTs for seasonal tracer cycles
Aggregation effects for seasonal tracer cycles have been determined by the methods of Kirchner (2016a) for comparison with the tritium effects.The rainfall input variation has been approximated as a sine wave with a 1-year period to imitate the seasonal tracer cycle, and the sine wave has been traced through the convolution using the gamma distribution.Figure 10 shows the aggregation effects for the GM with α = 1.The pattern is very like those observed using tritium concentrations (Fig. 5), so it is clear that the effects are effectively the same whether seasonal tracer cycles or radioactive isotopes are being used.Although our methodology was the same as Kirchner's in that two components were combined, we followed the process of starting with the same MTTs and then allowing the second component to become older.For this reason, the results show the dependence of the aggregation error on the difference in MTTs more explicitly than the random sampling of non-similar MTT components method of Kirchner.
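The seasonal-cycle counterpart of the tritium calculation can be sketched with the frequency response of the exponential model (the α = 1 case). A sinusoidal input with angular frequency ω passes through an exponential TTD with complex response 1/(1 + iωτm); mixing two such responses and inverting the resulting amplitude ratio as if the system were a single exponential reservoir gives the apparent MTT. The restriction to α = 1, the amplitude-only inversion and the function names are assumptions made for this illustration.

```python
import math

OMEGA = 2.0 * math.pi   # angular frequency of a 1-year cycle (rad/yr)

def em_response(mtt):
    """Complex response of an exponential model to the annual sinusoid."""
    return 1.0 / complex(1.0, OMEGA * mtt)

def apparent_mtt_from_amplitude(amplitude_ratio):
    """MTT of the single exponential model that damps the cycle equally."""
    return math.sqrt(1.0 / amplitude_ratio ** 2 - 1.0) / OMEGA

def seasonal_aggregate(mtt1, mtt2, b=0.5):
    """Return (true MTT, apparent MTT) from the damped seasonal cycle."""
    true_mtt = b * mtt1 + (1.0 - b) * mtt2
    mixed = b * em_response(mtt1) + (1.0 - b) * em_response(mtt2)
    return true_mtt, apparent_mtt_from_amplitude(abs(mixed))

# Hypothetical example: components with MTTs of 0.1 and 2 yr mixed 50 : 50.
true_mtt, apparent_mtt = seasonal_aggregate(0.1, 2.0)
print(f"true {true_mtt:.2f} yr, apparent {apparent_mtt:.2f} yr")
```

As with tritium, the apparent MTT equals the true MTT when both components are identical and falls below it as the components diverge.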

Implications of tritium MTT aggregation bias
The analysis of Sect. 3.1 and 3.2 has shown that tritium-derived MTTs are just as susceptible to aggregation bias as seasonal tracer cycles when flows from dissimilar parts of catchments are combined using simple LPMs. Likewise, groundwater wells or springs fed by two or more water sources with different MTTs will also show aggregation bias. However, the transit times over which the biases are manifested are different because the two methods are applicable to different time ranges, up to 5 years for seasonal tracer cycles and up to 200 years for tritium concentrations (based on appropriate mixing models). Note particularly that the bias applies not only to samples at the limits of the methods (i.e. with very small tracer cycles or near-zero tritium concentrations) but also to MTTs far below these limits. The calculations have been made for extreme cases to highlight the aggregation bias. Firstly, the heterogeneity is assumed to be represented by just two homogeneous but different areas of hydrological systems. This is the worst type of heterogeneity for aggregation bias. Secondly, the water components from these areas are assumed to combine in the proportions of 1 : 1 in the outlet. This causes close to the maximum aggregation bias for a given pair of waters, since it ranges from zero at $b$ = 0 or 1 to a maximum around $b$ = 0.5 (Fig. 4). Obviously there is no aggregation bias when the MTTs of the two components are the same, and the bias increases as the difference between the MTTs increases (Figs. 5 and 6). The bigger the span of MTTs between the two components, the bigger the aggregation error.
When the old component is so old that it has essentially no tritium and could have any age (hundreds, thousands, or even millions of years), the aggregation error could be very large. This is a well-recognised problem with the use of many radioactive isotopes and chemicals for dating water (e.g. Cook and Böhlke, 2012; Stewart, 2012). The analogous problem with seasonal tracer cycles is when the old component is too old to have any seasonal variation at all and the age is effectively truncated at around 5 years (Stewart et al., 2010).
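The dependence of the bias on b and on the age of the old component can be illustrated with a deliberately simplified sketch. It assumes a constant tritium input of 2 TU (as in Fig. 3), a tritium half-life of 12.32 years, and the steady-state output of a gamma TTD under constant input, C = C_in (1 + λβ)^(−α) with β = MTT/α. It does not use the Kaitoke input function, so the percentages it produces are not those of Figs. 4 to 6, and the function names are illustrative assumptions.

```python
import math

HALF_LIFE = 12.32                  # tritium half-life in years
LAMBDA = math.log(2.0) / HALF_LIFE # decay constant, 1/yr
C_IN = 2.0                         # assumed constant tritium input, TU


def tritium_out(mtt, alpha=1.0, c_in=C_IN):
    """Steady-state output concentration of a gamma TTD under constant input."""
    beta = mtt / alpha
    return c_in * (1.0 + LAMBDA * beta) ** (-alpha)


def apparent_mtt(conc, alpha=1.0, c_in=C_IN):
    """MTT inferred from a concentration, assuming one homogeneous gamma TTD."""
    beta = ((conc / c_in) ** (-1.0 / alpha) - 1.0) / LAMBDA
    return alpha * beta


def aggregation_error(mtt1, mtt2, b=0.5, alpha=1.0):
    """Return (true MTT, apparent MTT, percentage deviation of apparent from true)."""
    mixed = b * tritium_out(mtt1, alpha) + (1.0 - b) * tritium_out(mtt2, alpha)
    true_mtt = b * mtt1 + (1.0 - b) * mtt2
    app = apparent_mtt(mixed, alpha)
    return true_mtt, app, 100.0 * (app - true_mtt) / true_mtt


if __name__ == "__main__":
    # Vary the young fraction b for components with MTTs of 10 and 90 years.
    for b in (0.0, 0.25, 0.5, 0.75, 1.0):
        true_mtt, app, err = aggregation_error(10.0, 90.0, b)
        print(f"b = {b:4.2f}  true = {true_mtt:5.1f}  apparent = {app:5.1f}  error = {err:6.1f} %")
    # Let the old component become arbitrarily old at b = 0.5.
    for mtt2 in (100.0, 1.0e3, 1.0e6):
        true_mtt, app, err = aggregation_error(10.0, mtt2)
        print(f"MTT2 = {mtt2:9.0f}  true = {true_mtt:9.0f}  apparent = {app:5.1f}")
```

Under these assumptions the error vanishes at b = 0 and b = 1, and the apparent MTT approaches a finite ceiling as the old component loses all of its tritium, while the true MTT grows without limit.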

How can aggregation error be detected in tritium-based MTTs?
Both simple and compound LPMs can be free of aggregation error or, conversely, be affected by it, depending on whether or not they capture the nature of the heterogeneity in the catchment or groundwater system relevant to the error. Simple LPMs have fewer parameters, but have no ability to capture heterogeneity because of their underlying perceptual model (i.e. the assumption of homogeneity), and therefore would be expected to underestimate MTTs because of aggregation error if there is heterogeneity producing flows with different MTTs in the system. (However, note that a highly skewed simple LPM (in that case a GM with α = 0.24) was able to mimic the MTT of a specific case of binary flow in Sect. 3.1.1 (see Reviewer's Report #1). This did not mean that the model gave a good representation of the TTD of the binary system. The dispersion model has the same ability to be highly skewed if the dispersion parameter (D_P) is allowed to become very much larger than normal.) Compound LPMs have more parameters and therefore more flexibility to capture heterogeneity, but the model structure must be based on the underlying perceptual model or the parameters extracted could misrepresent the system and lead to aggregation error. Calibration of compound LPMs requires data of sufficient quantity and quality, and with enough variation, to enable retrieval of the larger number of parameter values.
In the past, the bomb tritium pulse introduced strong variation in tritium concentrations of precipitation, but variation in the future will be very much less because of the passing of the bomb pulse from the atmosphere. We therefore suggest that the answer to the question in the title of this section may be what has often been practised in the past, even though the term "aggregation error" was not used (e.g. Małoszewski et al., 1983; Uhlenbrook et al., 2002; Stewart and Thomas, 2008; Morgenstern and Taylor, 2009; Stewart et al., 2010; Blavoux et al., 2013; Morgenstern et al., 2015). This ideally involves evaluation of many types of information about a hydrological system (geological, hydrological, hydrochemical, tritium, and other isotopes) to establish a perceptual model, and experiments with simple and compound LPMs in harmony with the derived perceptual model to fit tritium data (and, if available, other types of chemical or isotopic data). Compound LPMs in harmony with the perceptual model would be expected to yield MTTs with less aggregation error than simple LPMs, because the former have the ability to separate young and old water components while the latter do not. Comparison of MTTs from simple and compound models should then show whether there is much aggregation error. Parameters yielded by best-fitting models have been used in the past, but they may not be the most appropriate ones if the parameters are to be used in other contexts. There is also a risk of missing less apparent (alternative) parameter solutions if there are any elsewhere in the parameter space. Gallart et al. (2016) applied a GLUE-based uncertainty assessment method which used Monte Carlo searching of the parameter space of the EPM to estimate MTTs from tritium. This allowed the uncertainties of the parameters to be quantified.

Summary
MTT estimations based on tritium concentrations show very similar aggregation effects to those for seasonal tracer variations. Our virtual experiments with two water components show that the aggregation errors are largest when the MTT differences between the two components are largest and the two components are present in nearly equal amounts. We also find that young water fractions derived from tritium based on a young water threshold of 18 years are almost immune to aggregation errors, as were those derived from seasonal tracer cycles with a threshold of about 2 months. We conclude with a discussion of the implications of aggregation bias on tritium MTTs and detection of aggregation errors in past studies.
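The contrast between MTTs and young water fractions can be seen in a single worked case. The sketch below is only an illustration under strong simplifying assumptions: a constant tritium input, the exponential model (GM with α = 1) for both components and for the fitted simple model, a tritium half-life of 12.32 years, and the 18-year threshold mentioned above. It does not reproduce the Kaitoke-based calculations, and the function names are hypothetical.

```python
import math

LAMBDA = math.log(2.0) / 12.32  # tritium decay constant, 1/yr
THRESHOLD = 18.0                # young water threshold, years


def conc_ratio(mtt):
    """Output/input tritium ratio of an exponential TTD under constant input."""
    return 1.0 / (1.0 + LAMBDA * mtt)


def mtt_from_ratio(ratio):
    """Invert conc_ratio, assuming a single exponential TTD."""
    return (1.0 / ratio - 1.0) / LAMBDA


def young_fraction(mtt, threshold=THRESHOLD):
    """Fraction of an exponential TTD that is younger than the threshold."""
    return 1.0 - math.exp(-threshold / mtt)


def compare(mtt1, mtt2, b=0.5):
    """Return true and apparent MTT and young water fraction for a two-part mixture."""
    ratio = b * conc_ratio(mtt1) + (1.0 - b) * conc_ratio(mtt2)
    app_mtt = mtt_from_ratio(ratio)
    true_mtt = b * mtt1 + (1.0 - b) * mtt2
    true_fyw = b * young_fraction(mtt1) + (1.0 - b) * young_fraction(mtt2)
    app_fyw = young_fraction(app_mtt)
    return true_mtt, app_mtt, true_fyw, app_fyw


if __name__ == "__main__":
    true_mtt, app_mtt, true_fyw, app_fyw = compare(10.0, 90.0)
    print(f"MTT: true = {true_mtt:.1f} yr, apparent = {app_mtt:.1f} yr")
    print(f"Fyw: true = {true_fyw:.3f}, apparent = {app_fyw:.3f}")
```

For this one pair of components (MTTs of 10 and 90 years mixed 1 : 1) the apparent MTT is far below the true value, while the apparent and true young water fractions differ only slightly. Other pairs and other input functions would need to be checked separately.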
Data availability. No data sets were used in this paper. Readers can consult with the authors regarding the methods used in the virtual experiments.
Competing interests. The authors declare that they have no conflict of interest.

Figure 2. (a) Gamma model (GM) distributions for shape factors α between 1 and 10. The axes show normalised transit time (τ/τ_m) and normalised probability density function (PDF) (h(τ) × τ_m). Note that the distribution for GM (α = 1) is the same as that for the exponential model (EM). (b, c) Comparison between the GM with α = 3 and the best-fitting exponential piston flow and dispersion models.

Figure 3. Aggregation errors when the tritium input concentration is assumed to be constant at 2 TU. Mean transit times (MTTs) are inferred from tritium concentrations in mixed runoff from two subcatchments with different tritium concentrations and MTTs (shown by red dots) using a range of GMs and the PFM. The relationships between MTTs and tritium concentrations given by the simple models (black curves) are strongly non-linear, causing marked differences between the true and apparent MTTs.

Figure 4. Effect of changing b (the fraction of the young component) on the aggregation error with the GM with α = 1 for mixing of two components with MTTs of 10 and 90 years. The blue line shows the effects at b = 0.5.

to compare results for the Northern Hemisphere and Southern Hemisphere. This figure shows the possible aggregation error (expressed as percentage deviation of the apparent from the true MTTs) versus the MTT of component 2 (MTT2) for the GM with α = 1. The curves show results for MTT1 = 10 years; these are restatements of the curves in Figs. 5a and 6a for α = 1 and MTT1 = 10 years. Aggregation errors are 8 % for Southern Hemisphere and 15 % for

www.hydrol-earth-syst-sci.net/21/4615/2017/ Hydrol. Earth Syst. Sci., 21, 4615-4627, 2017

Figure 5. Aggregation effects for tritium MTTs for GMs with different values of α using the Kaitoke input function. Curves show changes as component 2 (MTT2) becomes older in 50-year steps and therefore the mixtures become older in 25-year steps (shown by dots). The first step of the MTT1 = 10 years curve is 15 years. Fitting errors in the apparent MTTs are shown by fine dashed lines.

Figure 8. True versus apparent tritium young water fractions for GMs with different values of α using the Kaitoke input function.Curves show changes as component 2 (MTT2) becomes older in 50-year steps and therefore the mixtures become older in 25-year steps (shown by dots).

Table 1. Comparison of the shapes of the gamma (GM), exponential piston flow (EPM), and dispersion (DM) model transit time distributions. The shape parameters of the best-fitting versions of the other models and the goodness of fit (standard deviation, SD; Nash-Sutcliffe efficiency, NSE) between them and the GM are given.