Approximating uncertainty of annual runoff and reservoir yield using stochastic replicates of global climate model data

Two key sources of uncertainty in projections of future runoff for climate change impact assessments are uncertainty between global climate models (GCMs) and within a GCM. Within-GCM uncertainty is the variability in GCM output that occurs when running a scenario multiple times but each run has slightly different, but equally plausible, initial conditions. The limited number of runs available for each GCM and scenario combination within the Coupled Model Intercomparison Project phase 3 (CMIP3) and phase 5 (CMIP5) data sets, limits the assessment of within-GCM uncertainty. In this second of two companion papers, the primary aim is to present a proof-of-concept approximation of within-GCM uncertainty for monthly precipitation and temperature projections and to assess the impact of withinGCM uncertainty on modelled runoff for climate change impact assessments. A secondary aim is to assess the impact of between-GCM uncertainty on modelled runoff. Here we approximate within-GCM uncertainty by developing nonstationary stochastic replicates of GCM monthly precipitation and temperature data. These replicates are input to an off-line hydrologic model to assess the impact of withinGCM uncertainty on projected annual runoff and reservoir yield. We adopt stochastic replicates of available GCM runs to approximate within-GCM uncertainty because large ensembles, hundreds of runs, for a given GCM and scenario are unavailable, other than the Climateprediction.net data set for the Hadley Centre GCM. To date within-GCM uncertainty has received little attention in the hydrologic climate change impact literature and this analysis provides an approximation of the uncertainty in projected runoff, and reservoir yield, due to withinand between-GCM uncertainty of precipitation and temperature projections. In the companion paper, McMahon et al. (2015) sought to reduce between-GCM uncertainty by removing poorly performing GCMs, resulting in a selection of five better performing GCMs from CMIP3 for use in this paper. Here we present withinand between-GCM uncertainty results in mean annual precipitation (MAP), mean annual temperature (MAT), mean annual runoff (MAR), the standard deviation of annual precipitation (SDP), standard deviation of runoff (SDR) and reservoir yield for five CMIP3 GCMs at 17 worldwide catchments. Based on 100 stochastic replicates of each GCM run at each catchment, withinGCM uncertainty was assessed in relative form as the standard deviation expressed as a percentage of the mean of the 100 replicate values of each variable. The average relative within-GCM uncertainties from the 17 catchments and 5 GCMs for 2015–2044 (A1B) were MAP 4.2 %, SDP 14.2 %, MAT 0.7 %, MAR 10.1 % and SDR 17.6 %. The Gould– Dincer Gamma (G-DG) procedure was applied to each annual runoff time series for hypothetical reservoir capacities of 1×MAR and 3×MAR and the average uncertainties in reservoir yield due to within-GCM uncertainty from the 17 catchments and 5 GCMs were 25.1 % (1×MAR) and 11.9 % (3×MAR). Our approximation of within-GCM uncertainty is expected to be an underestimate due to not replicating the GCM trend. However, our results indicate that within-GCM uncertainty is important when interpreting climate change impact assessments. Approximately 95 % of values of MAP, SDP, MAT, MAR, SDR and reservoir yield from 1×MAR or 3×MAR capacity reservoirs are expected to fall within Published by Copernicus Publications on behalf of the European Geosciences Union. 1616 M. C. Peel et al.: Approximating uncertainty of annual runoff and reservoir yield using stochastic replicates twice their respective relative uncertainty (standard deviation/mean). Within-GCM uncertainty has significant implications for interpreting climate change impact assessments that report future changes within our range of uncertainty for a given variable – these projected changes may be due solely to within-GCM uncertainty. Since within-GCM variability is amplified from precipitation to runoff and then to reservoir yield, climate change impact assessments that do not take into account within-GCM uncertainty risk providing water resources management decision makers with a sense of certainty that is unjustified.


Abstract.
Two key sources of uncertainty in projections of future runoff for climate change impact assessments are uncertainty between global climate models (GCMs) and within a GCM. Within-GCM uncertainty is the variability in GCM output that occurs when running a scenario multiple times but each run has slightly different, but equally plausible, initial conditions. The limited number of runs available for each GCM and scenario combination within the Coupled Model Intercomparison Project phase 3 (CMIP3) and phase 5 (CMIP5) data sets, limits the assessment of within-GCM uncertainty. In this second of two companion papers, the primary aim is to present a proof-of-concept approximation of within-GCM uncertainty for monthly precipitation and temperature projections and to assess the impact of within-GCM uncertainty on modelled runoff for climate change impact assessments. A secondary aim is to assess the impact of between-GCM uncertainty on modelled runoff. Here we approximate within-GCM uncertainty by developing nonstationary stochastic replicates of GCM monthly precipitation and temperature data. These replicates are input to an off-line hydrologic model to assess the impact of within-GCM uncertainty on projected annual runoff and reservoir yield. We adopt stochastic replicates of available GCM runs to approximate within-GCM uncertainty because large ensembles, hundreds of runs, for a given GCM and scenario are unavailable, other than the Climateprediction.net data set for the Hadley Centre GCM. To date within-GCM uncertainty has received little attention in the hydrologic climate change impact literature and this analysis provides an approximation of the uncertainty in projected runoff, and reservoir yield, due to within-and between-GCM uncertainty of precipitation and temperature projections. In the companion paper, McMahon et al. (2015) sought to reduce between-GCM uncertainty by removing poorly performing GCMs, resulting in a selection of five better performing GCMs from CMIP3 for use in this paper. Here we present within-and between-GCM uncertainty results in mean annual precipitation (MAP), mean annual temperature (MAT), mean annual runoff (MAR), the standard deviation of annual precipitation (SDP), standard deviation of runoff (SDR) and reservoir yield for five CMIP3 GCMs at 17 worldwide catchments. Based on 100 stochastic replicates of each GCM run at each catchment, within-GCM uncertainty was assessed in relative form as the standard deviation expressed as a percentage of the mean of the 100 replicate values of each variable. The average relative within-GCM uncertainties from the 17 catchments and 5 GCMs for 2015-2044 (A1B) were MAP 4.2 %, SDP 14.2 %, MAT 0.7 %, MAR 10.1 % and SDR 17.6 %. The Gould-Dincer Gamma (G-DG) procedure was applied to each annual runoff time series for hypothetical reservoir capacities of 1 × MAR and 3 × MAR and the average uncertainties in reservoir yield due to within-GCM uncertainty from the 17 catchments and 5 GCMs were 25.1 % (1 × MAR) and 11.9 % (3 × MAR). Our approximation of within-GCM uncertainty is expected to be an underestimate due to not replicating the GCM trend. However, our results indicate that within-GCM uncertainty is important when interpreting climate change impact assessments. Approximately 95 % of values of MAP, SDP, MAT, MAR, SDR and reservoir yield from 1 × MAR or 3 × MAR capacity reservoirs are expected to fall within twice their respective relative uncertainty (standard deviation/mean). Within-GCM uncertainty has significant implications for interpreting climate change impact assessments that report future changes within our range of uncertainty for a given variable -these projected changes may be due solely to within-GCM uncertainty. Since within-GCM variability is amplified from precipitation to runoff and then to reservoir yield, climate change impact assessments that do not take into account within-GCM uncertainty risk providing water resources management decision makers with a sense of certainty that is unjustified.

Introduction
This study is part of a research project that seeks to enhance our understanding of the uncertainty of future annual river flows, leading to more informed decision-making for the sustainable management of scarce water resources. This is the second of two papers examining the uncertainty of streamflow estimates derived from global climate models (GCMs). In the first paper, McMahon et al. (2015) assessed the adequacy of GCMs from phase 3 of the Coupled Model Intercomparison Project (CMIP3; Meehl et al., 2007) to simulate observed values of mean annual precipitation, standard deviation of annual precipitation, mean annual temperature, monthly patterns of precipitation and temperature, and Köppen climate classification. Five GCMs (HadCM3, MIROCM, MIUB, MPI and MRI; see Table 1 of McMahon et al. (2015) for full GCM names) were selected as better performing GCMs for use in this second paper.
In this paper we address a significant limitation to characterising the uncertainty of future runoff which is the lack of sufficient GCM runs of historical (20C3M) and future projections (e.g. A1B). Modelling historical runoff involves numerous uncertainties (Peel and Blöschl, 2011) including uncertainties in observed input data used to drive the hydrologic model (Andréassian et al., 2004;McMillan et al., 2011), observed data against which the hydrologic model is calibrated (Di Baldassarre and Montanari, 2009;McMillan et al., 2010), the calibration method and objective function adopted (Efstratiadis and Koutsoyiannis, 2010) and the hydrologic model structure itself (Andréassian et al., 2009;Vogel and Sankarasubramanian, 2003). Additional uncertainty is introduced when modelling future runoff through (1) assuming the hydrologic model calibration applies into the future (Chiew et al., 2014), (2) assuming a bias correction for adjusting GCM data developed over the observed period applies into the future and (3) through differences in future climate projections between GCMs and within a GCM. Recent investigations into uncertainty introduced at different stages of the model train, from GCM to hydrologic model, for climate change impact assessments include Bosshard et al. (2013), Dobler et al. (2012), Hingray and Saïd (2014), Kay et al. (2009), Lafaysse et al. (2014), Prudhomme and Davies (2009a, b), Steinschneider et al. (2012), Teng et al. (2012) and Woldemeskel et al. (2014).
The uncertainty between GCM projections of future climate can be assessed through analysis of runs from a wide range of GCMs, such as those available from CMIP3 and being collated within the Coupled Model Intercomparison Project phase 5 (CMIP5). Our selection of five better performing GCMs from CMIP3 in the companion paper (McMahon et al., 2015) is an attempt to reduce between-GCM uncertainty by removing poorly performing GCMs from the analysis conducted in this paper. Our primary aim in this proof-of-concept paper is to present an approximation of within-GCM uncertainty, which is the variability in GCM output that occurs when running a scenario multiple times but each run has slightly different, but equally plausible, initial conditions. Although the importance of within-GCM uncertainty for climate change impact assessments has been highlighted by Tebaldi and Knutti (2007), Sutton (2009, 2011) and Deser et al. (2012Deser et al. ( , 2014, to date it has received little attention in the hydrology climate change impact literature. Here we develop an approximation of within-GCM uncertainty and apply it to a climate change impact assessment for future runoff and reservoir yield. The magnitude of within-GCM uncertainty for a metric like mean annual precipitation can be assessed directly from GCM output if a large enough ensemble of runs from a GCM for a given emission scenario are available. The number of GCM runs required to adequately assess uncertainty depends upon the metric of interest and the level of confidence adopted. For example, for a given level of confidence an extreme value metric will require many more runs than a mean to obtain a reliable estimate. For a more detailed discussion of this issue see Salas (1992). Currently, large ensembles of runs from each GCM and scenario are unavailable. In the CMIP3 data set most GCMs have a single run of a given scenario from which a direct assessment of within-GCM uncertainty is impossible. In terms of ensemble members, CMIP5 is an improvement over CMIP3 in that more runs of each scenario are being reported for each GCM. However, the number of runs per GCM and scenario combination in CMIP5 is still of the order of 3 to 10, rather than the hundreds of runs required for adequate estimation of within-GCM uncertainty of some metrics. The Climateprediction.net data set contains an ensemble of several thousand runs from the Hadley Centre GCM (Frame et al., 2009;Rowlands et al., 2012). However, this data set can only be used to directly assess within-GCM uncertainty for the Hadley Centre GCM.
Previous assessments of the impact of within-GCM uncertainty on runoff have been limited by the lack of available GCM runs. For example, when investigating sources of uncertainty in the climate change impact on hydrology, Chen et al. (2011) were limited to 5 runs with different initial conditions from the MRI GCM. Similarly, Velázquez et al. (2013) were limited to five runs from one GCM and  Peel et al., 2010) is within 5 % of reported catchment area; b see Peel et al. (2007); c Gen: shows the mean value ± standard deviation of 100 replicates; MAP: mean annual precipitation; SDP: standard deviation of annual precipitation; MAT: mean annual temperature.
three runs from a second in their comparison of the uncertainty due to hydrologic models and within-GCM uncertainty. Prudhomme and Davies (2009a) sought to overcome the limited number of GCM runs by introducing a seasonal block-resampling technique to estimate natural climate variability via 100 bootstrap replicates of observed and GCM time series. In Prudhomme and Davies (2009b) they applied seasonal block-resampling to a 30-year baseline period and future period to assess whether climate change impacts were significantly different to baseline climate variability. However, seasonal block-resampling is unable to address interdecadal variability, as noted by Kay et al. (2009), or periods with significant trend as the bootstrap replicates will scramble any inter-decadal variability or trend. Finally, Hingray and Saïd (2014) and Lafaysse et al. (2014) adopted a stochastic approach whereby they generated 100 stochastic replicates from each of 6 statistical downscaling models for each of 11 runs from 5 GCMs (Hingray and Saïd, 2014), or 12 runs from 6 GCMs (Lafaysse et al., 2014). They used this multimodel ensemble of stochastic replicates to investigate the magnitude of within-and between-GCM uncertainty for the Durance catchment in France.
In this proof-of-concept paper, we develop an approximation of within-GCM uncertainty using non-stationary stochastic replicates of GCM monthly precipitation and temperature data that seeks to preserve any inter-decadal variability and trend. Unlike Hingray and Saïd (2014) and Lafaysse et al. (2014) whose replicates were produced by the statistical downscaling model, here we stochastically replicate the original GCM runs prior to downscaling. Estimating uncertainty in a time-series metric via stochastic modelling of a time series is standard hydrologic practice (Hipel and McLeod, 1994). A stochastic model is fit to the time series of interest and an ensemble of time-series replicates with the same stochastic properties as the original series is generated. The metric of interest is calculated for each ensemble member and the metric uncertainty is estimated from the distribution of metric values. In this paper we stochastically replicate the GCM output data, then use an ensemble of stochastic replicates as input to an off-line hydrologic model to estimate an ensemble of future runoff projections, from which we estimate the variability in mean and variance of annual runoff. Finally, the ensemble of future runoff projections is used to investigate the impact of within-and between-GCM uncertainty on future reservoir yield.
In this paper we model runoff in an off-line hydrologic model rather than adopt GCM generated runoff. Arora (2001) demonstrated the quality of GCM runoff mainly depends on the quality of GCM precipitation, with any bias in precipitation amplified in the resulting runoff. In the companion paper (McMahon et al., 2015), we assessed GCM bias in reproducing observed precipitation conditions and found substantial biases for all GCMs; thus, we would expect significant bias in runoff generated by a GCM. Furthermore, Sperna Weiland et al. (2012) found that runoff estimates from an external hydrologic model generally outperformed GCM runoff estimates. However, Sperna Weiland et al. (2012) noted that when the GCM Land Surface Scheme is specifically tuned to reproduce observed runoff and a routing scheme is added then GCM runoff becomes more acceptable. We also use the terms streamflow and runoff interchangeably and adopt depth (in millimetres) as a measure of flow rather than a volume unit.
Following this introduction, in Sect. 2 we outline the approximation methodology (ensemble empirical mode decomposition (EEMD), stochastic data generation, quantilequantile bias correction of precipitation and temperature, precipitation-evapotranspiration-runoff modelling and uncertainty in reservoir yield) and related literature. We test our stochastic within-GCM uncertainty approximation for the largest ensemble of GCM runs in the CMIP3 data set for a given GCM and scenario in Sect. 3. In Sect. 4 results of applying the methodology to output from five GCMs identified in the companion paper (McMahon et al., 2015) are presented and discussed. Conclusions from the analysis and discussion are presented in Sect. 5. Further details about the precipitation-evapotranspiration-runoff model, source code and example input and output are provided in the Supplement.

Overall methodology
The methodology to approximate within-GCM uncertainty and assess the impact of within-and between-GCM uncertainty on future runoff and reservoir yield is shown in Fig. 1. Five better performing GCMs were identified in the companion paper for use in this paper through a literature review and assessment of how well CMIP3 GCMs reproduced observed mean annual precipitation, annual temperature and average monthly precipitation and temperature at GCM grid cell scales (McMahon et al., 2015). The five GCMs identified were HadCM3, MIROCM(1), MIUB(1), MPI(1) and MRI(3), where the number in brackets refers to the run number for that GCM in the CMIP3 data set (see McMahon et al. (2015), Table 1 for full GCM names). As part of the analysis in McMahon et al. (2015) catchment average values of concurrent monthly precipitation and temperature for the 20C3M and A1B emissions scenarios were extracted from each GCM in the CMIP3 data set. The catchment average was calculated for each catchment and GCM combination by determining the proportion of catchment area associated with each GCM grid cell and performing an area weighted average of the GCM data for each month. Catchment average precipitation and temperature from the five better performing GCMs are used throughout this paper.
An ideal assessment of within-GCM uncertainty would involve analysis of hundreds of runs of a single GCM for a given scenario with each run having slightly different, but equally plausible, initial conditions. Each run in this ideal ensemble would have a different sequence of monthly values and a different overall trend. How different the monthly sequence and overall trend is from one run to the next represents the within-GCM uncertainty. In this paper we do not seek to approximate the overall trend, as this information is best provided by a GCM responding to an emissions sce- nario. Here we approximate differences in the monthly sequence around the trend by using stochastic data generation. To achieve this we de-trend the catchment average GCM data, stochastically replicate the de-trended series and add the trend to the stochastic data to form a stochastic replicate of the GCM data for the entire period of GCM record. In this way we approximate the uncertainty around the overall trend, but not the uncertainty in the trend. Therefore, the approximation presented here represents an underestimate of the true within-GCM uncertainty as the trends used are restricted to those available in GCM runs in the CMIP3 data set. This stochastic methodology is a temporary solution for approximating within-GCM uncertainty until sufficient GCM runs become available to directly estimate within-GCM uncertainty from a large ensemble of GCM runs. The procedure adopted here to approximate within-GCM uncertainty for a catchment consists of the following steps (see Fig. 1): 1. De-trend the 20C3M and A1B catchment average GCM monthly precipitation and temperature data using EEMD. EEMD also allows any low-frequency signals in the time series to be identified.
2. Generate stochastically at a monthly time step k replicates (where k is arbitrarily adopted as 100 to demonstrate the proof-of-concept) of precipitation and temperature of ∼ 250 years (∼ 150 from 20CM3 and 100 from A1B) ensuring the cross-correlations and auto-correlations in the precipitation and temperature time series are preserved and any significant low- 3. Add the appropriate trend to the time series for each replicate of monthly precipitation and temperature.
4. Bias correct both the precipitation and the temperature time series using the quantile-quantile procedure.
5. Calibrate the precipitation-evapotranspiration-runoff monthly model (PERM) for each catchment using observed precipitation, temperature and runoff data.
6. Model runoff using PERM and the bias-corrected stochastic replicates of GCM monthly precipitation and temperature.
7. Compute mean annual runoff (MAR), standard deviation of annual runoff (SDR), the lag-1 serial correlation of annual runoff (lag-1) and hypothetical reservoir yield for each replicate.
8. Estimate the within-and between-GCM uncertainty in MAR, SDR and lag-1 and hypothetical reservoir yield based on the 100 replicates.
An advantage of this methodology over a bootstrap-based methodology, like Prudhomme and Davies (2009a), is that the entire period of the GCM run is replicated, which can preserve any inter-decadal variability and trend in the replicates. The replicates can be used to drive a hydrologic model for the entire period (∼ 250 years), rather than for short (∼ 30 years) separate periods, since the overall trend has been preserved. Therefore, hydrologic model stores will be representative of prior conditions at the beginning of any future period of runoff impact assessment.

De-trend GCM data
GCM projections of precipitation and temperature are nonstationary in terms of mean and existing stochastic data generation techniques generally deal with stationary data. In order to apply existing stochastic methods we de-trend the GCM monthly precipitation and temperature data using EEMD. The original empirical mode decomposition (EMD) algorithm, introduced by Huang et al. (1998), is an adaptive spectral analysis technique that is robust when applied to nonlinear and non-stationary data. EMD decomposes a time series into a set of intrinsic mode functions (IMFs) and a residual. Each IMF is a zero-mean fluctuation in which the frequency and amplitude may vary within a given IMF. Subsequent IMFs represent progressively lower frequency fluctuations. The EMD residual captures any trend in a time series which may be an unresolved low-frequency fluctuation with an average period longer than the period of record or a linear or non-linear trend. The nature of the EMD residual is not assumed prior to running the algorithm, rather it is a datadriven output. More recently, Wu and Huang (2009) proposed ensemble EMD (EEMD), a noise assisted data analysis procedure as an improvement over the original EMD. In EEMD, an ensemble of EMD trials is obtained by adding white noise of finite amplitude to the time series prior to each EMD run. The IMFs and residual from each trial are grouped by IMF order into ensembles and the average of each IMF group and the average residual yield the EEMD result. Because the white noise is different for each EMD trial, during averaging the noise cancels out as the ensemble size increases. The purpose of the noise is to change the ordering of local maxima and minima within the time series, thus generating a different EMD outcome in each trial. Details are given in Wu and Huang (2009) and an application to the Southern Oscillation Index is presented by Peel et al. (2011b) and to Australian monthly rainfall and temperature by .
For this analysis the relevant features of EEMD are the residual, which represents the time-series trend, and any lowfrequency signal in the GCM data. Some GCMs reproduce features of the El Niño-Southern Oscillation (ENSO) and associated low-frequency variability (van Oldenborgh et al., 2005). For GCMs with an ENSO signal in precipitation we would like to maintain this information in the stochastic replicates. To identify low-frequency signals in GCM data we follow Wu and Huang (2004) and compare each set of EEMD results against a white noise model. Low-frequency IMFs (average period > 2 years) with more variance than expected from a white noise model are considered a lowfrequency signal. The white noise model is an ensemble of 200 EMD results from a white noise series of the same length and variance as the GCM series.
EEMD was applied to 20C3M and A1B precipitation and temperature data and the residual (trend) identified. For temperature data all IMFs are summed together to form a detrended time series ready for stochastic replication. For precipitation data, where a low-frequency signal is not present, all IMFs are summed together to form a de-trended time series ready for stochastic replication. Where a low-frequency precipitation signal is identified, all IMFs with an average period ≤ 2 years are summed to form a high-frequency component and all IMFs with an average period > 2 years are summed to form a low-frequency component.
In this EEMD analysis we use a rational spline EMD (Pegram et al., 2008) with a tension parameter = 0.5 and a reflective spline end condition (Peel et al., 2009). Each EEMD analysis has 200 ensemble members, the standard deviation of white noise added to the series represents 0.4 of the original series standard deviation and non-orthogonal IMFs, IMF pairs with Orthogonality Index > 0.1, are automatically combined following Peel et al. (2011a).

Stochastic data generation
In this step we approximate uncertainty around the GCM trend by generating stochastic replicates of de-trended GCM catchment average time series of concurrent monthly precipitation and temperature. In order to preserve any crosscorrelation between the precipitation and temperature series and their auto-correlations, the Matalas (1967) multi-site stochastic data generation procedure was adopted. In order to preserve any low-frequency precipitation information, the generation procedure also needs to be able to simulate both high-and low-frequency time series. To achieve this we adapt the method of McMahon et al. (2008) who used EMD to decompose 6-month precipitation data into intra-and interdecadal components, replicated each component separately, and then combined the component replicates to form the 6month precipitation replicate. In this way their stochastic replicates were able to reproduce observed multi-year dry periods. Replicating intra-and inter-decadal components separately was possible in McMahon et al. (2008) as IMFs from EMD, and EEMD, are orthogonal to each other. In this paper we use EEMD to identify any low-frequency component (> 2 years) in the precipitation and utilise the orthogonal nature of EEMD IMFs to replicate the high-and low-frequency components separately before combining the generated component replicates and the trend to form the overall replicate.
The first step in the data generation process is to remove the trends (one for precipitation and one for temperature) identified through EEMD analysis in Sect. 2.2 from the monthly precipitation and temperature time series. If the GCM precipitation does not contain a low-frequency component then there are two separate time series to replicate concurrently: (1) the de-trended temperature (sum of EEMD IMFs), and (2) the de-trended precipitation (sum of EEMD IMFs). If GCM precipitation does contain a low-frequency component then the de-trended precipitation is divided into a high-frequency component (sum of EEMD IMFs with average period ≤ 2 years) and a low-frequency component (sum of EEMD IMFs with average period > 2 years), resulting in three time series to replicate concurrently. Next, for each calendar month, these time series are standardised to zero mean and unit variance. Data generation then takes place and the resulting standardised values are rescaled by the calendar monthly means and variances. Finally, the respective trends are added to the rescaled precipitation and temperature data to form the final replicates.
For sites without a low-frequency precipitation component, the following auto-regressive lag-1 (AR1) model is appropriate: where [B] 2×2 are 2 × 2 matrices of constant coefficients to preserve the cross-correlations between the standardised monthly P t and T t and their auto-correlations, [ε t ] is a matrix of random skewed variates with a mean = 0 and variance = 1.
To take into account the skewness in a time series, ε t is defined by the Wilson-Hilferty transformation (Wilson and Hilferty, 1931) and replaced in Eq. (1) by where ε t is a random skewed variate (zero mean and unit variance) to account for the skewness in the standardised data defined by the coefficient of skewness γ , ξ is a random normal variate with zero mean and unit variance. The matrices [A] and [B] are determined from (Matalas, 1967) [   (Hipel and McLeod, 1994). The elements b ij of [B] are obtained from the following recursive relationships: where c ij is the element of the matrix [B] in Eq. (4). The remaining element in the first column of [B] is given by The second diagonal element is obtained from Once matrices [A] and [B] are determined, 100 replicates of standardised skewed values are generated using Eqs. (1) and (2), and then rescaled by the standard deviation and the mean of the monthly calendar values and trend added to obtain the stochastic replicates of monthly precipitation and temperature data. In general, the generated monthly data will Hydrol. Earth Syst. Sci., 19, 1615-1639, 2015 www.hydrol-earth-syst-sci.net/19/1615/2015/ not preserve the annual characteristics especially for precipitation as it is highly variable. The generated monthly temperature data were found to preserve the annual characteristics, while the generated precipitation did not. Hence, the generated monthly precipitation data were nested in an annual model (Srikanthan, 2004). The details of the nesting are described later.
For sites with a low-frequency precipitation component, an AR(2) model is used to incorporate the low-frequency component. A general multi-site AR(2) model takes the following form: where , and the elements of [A] 3×3 and [B] 3×3 are 3 × 3 matrices of constant coefficients to preserve the cross-correlation between P H , P L and T at time t and their auto-correlations. Due to problems with inverting matrices that are not positive semi-definite and only the low-frequency precipitation is AR(2), a contemporaneous form of the model in Eqs. (12) and (13) is used (Hipel and McLeod, 1994): where a 11 is the AR(1) parameter of the high-frequency precipitation IMFs, a 22 , b 22 are the AR(2) parameters of the low-frequency precipitation IMFs, and a 33 is the AR(1) parameter of the temperature IMFs. Matrix [C] is determined from the following equation (Salas et al., 1980) where    (14). One hundred replicates of standardised variates are generated from Eq. (11) then the skewness is incorporated using Eq.
(2). The mean and standard deviation are reintroduced and, finally, the trends added to obtain the 100 stochastic replications of monthly precipitation and temperature. As mentioned above, to ensure the generated monthly precipitation data preserved the annual characteristics, the generated monthly precipitation data were nested in an annual AR(1) model.
where Y i is the adjusted annual precipitation for year i, µ the mean of the observed annual data, σ the standard deviation of the observed annual data, ρ the lag-1 auto-correlation coefficient of the observed annual data, Y ι the annual precipitation for year i obtained by aggregating the generated monthly data, µ g the mean of the generated annual data and σ g the standard deviation of the generated annual data. The generated monthly data are then multiplied by the ratio Y i Y ι . The stochastic model was tested by applying the above procedure to monthly precipitation and temperature data for 20CM3 from the MIROCM GCM after the data were subjected to EEMD analysis. Table 1 summarises the performance of the stochastic procedure to replicate annual data for six catchments covering a range of climate types worldwide. Five of the catchments were modelled by an AR(1) process, whereas station 6304 required an AR(2) model because it exhibited a low-frequency precipitation component. Table 2 summarises the performance of the stochastic procedure to replicate monthly data for station 6304. Overall, the stochastic model performed satisfactorily at the monthly and annual timescales. As a general rule one would expect the value of the input parameters (GCM in this study) to be within ±2 × standard deviation of the mean of the generated values. This is achieved for all variables and catchments except in two cases: (1) annual lag-1 auto-correlation in catchment 6304, where the low-frequency precipitation IMFs have high auto-correlation, which we assume bias the standardised variates and, therefore, the generated series; and (2) the standard deviation of January precipitation in catchment 6304. There was some variation between the generated and historical coefficient of skewness (results not shown) but in terms of the level of modelling required for this project, these differences are acceptable. The monthly precipitation and temperature values were satisfactorily generated as represented by station 6304 in Table 2, which was the most difficult catchment to replicate due to the high-and low-frequency precipitation components. We conclude from this assessment that the AR(1) and AR(2) stochastic models are able to preserve the monthly and annual precipitation and temperature characteristics satisfactorily for the purposes of this study.

Quantile-quantile bias correction of P and T
Prior to using GCM, or stochastic replicates of GCM, data in a climate change impact assessment, any bias between GCM and observed conditions needs to be corrected. The extent of bias in the GCM precipitation and temperature data is reported in the companion paper (McMahon et al., 2015). For example, the MAP data for MIUB(3) compared with CRU MAP data at the GCM grid scale exhibit a slope of 0.69 on logarithmic scales, which indicates the GCM overestimates low MAP and underestimates high MAP. Mean annual temperatures are much less biased and require only a small amount of bias correction. Ehret et al. (2012) presented a detailed review of bias correction and discusses the associated assumptions and implications of applying bias correction to GCM or regional climate model data. Many procedures are available for bias correction, with techniques falling into two categories: dy-namical downscaling and statistical downscaling. Dynamical downscaling procedures are sophisticated and resource intensive (Tisseuil et al., 2010) and are impractical for applying to globally distributed catchments and a range of GCMs as proposed in this study. In keeping with the proof-of-concept nature of this paper we adopt a simple empirical-statistical downscaling and error correction approach that is appropriate for bias correcting catchment average monthly (not daily) GCM outputs for input into a lumped (not spatially distributed) hydrologic model. We did not adopt the delta change method, also known as simply daily scaling (Chiew, 2010), where the observed series is scaled by the relative difference between future and baseline conditions, as delta change would not make full use of the re-ordering of precipitation and temperature events provided by the stochastic replicates. Rather, we adopted quantile-quantile or quantile mapping as discussed in Themeßl et al. (2012) and Bárdossy and Pegram (2011). The basis of the quantile-quantile bias correction is a comparison of the empirical cumulative distribution functions (ECDF) of the observed data and the GCM data for a common period. Here the common period is the observed catchment record and the concurrent period of GCM data from the 20C3M scenario. The difference between observed and GCM ECDFs for a given value provides the bias correction. Here we also adopt the frequency adaptation method discussed in Themeßl et al. (2012) for when the GCM series has a higher frequency of zero values than the observed series. The issue of new extremes, values outside the range of the GCM and observed data during the period in which the bias correction is established, was also investigated by Themeßl et al. (2012). We adopt option QMv1a of Themeßl et al. (2012), which takes the bias correction at the highest (lowest) quantile and applies that correction to all new upper (lower) extremes. In our analysis we establish and apply a bias correction for each calendar month (12 corrections in all), rather than a single correction for the whole of record at each catchment.
An assumption of using this bias correction is that the correction applies into the future under different conditions. This assumption is supported by Teutschbein and Seibert (2013) who found the quantile-quantile method performed best out of six alternate bias corrections in differential split sample tests for non-stationary conditions. The quantile-quantile bias correction is applied to the precipitation and temperature stochastic replicates (100 replicates of ∼ 250 years covering the 20CM3 and A1B scenarios) for each GCM run (5) at 17 global catchments. These 8500 (100 × 5 × 17) sets of monthly precipitation and temperature are used as input to our hydrologic model PERM to estimate monthly runoff.

Monthly precipitation-evapotranspiration-runoff model
In order to convert GCM monthly precipitation and temperature into runoff, the PERM model was developed specifically to meet the requirements for hydrologic modelling in this project. PERM is a simple lumped, not spatially distributed, conceptual precipitation-runoff model run on a monthly time step with 5 parameters to be optimised. The time step was dictated by the availability of monthly streamflow data and concurrent precipitation and temperature data. Further details about the precipitation-evapotranspirationrunoff model, source code and example input and output are provided in the Supplement.

Model structure
The structure of PERM is shown in Fig. 2 with the parameters to be calibrated highlighted in bold. As observed in Fig. 2 monthly precipitation is either added to the interception store (if the monthly mean daily temperature is > 0 • C) or accumulated in a snowpack (if the monthly mean daily temperature is ≤ 0 • C). The contents of the interception store are reduced by evaporation. Excess precipitation is subject to an infiltration function in which a surface runoff component (designated as PAreaF) is dependent on the contents of the soil moisture store. When precipitation is accumulated as snow, there is no evaporation for that month from the snowpack or evapotranspiration from the soil moisture store. The snowpack continues to accumulate as long as the monthly mean daily air temperature is ≤ 0 • C. The snowpack begins to melt when the monthly mean daily temperature is > 0 • C. Snowmelt is partitioned into two components, a runoff and soil moisture infiltration component based on the parameter Melt. When the maximum capacity of the soil moisture store is exceeded, saturation excess runoff (SMF) occurs. Evapotranspiration from the soil moisture store is estimated either as a linear function of the available soil moisture or as a linear function of monthly mean daily temperature. The algorithms representing these soil moisture or energy limiting conditions are given in Fig. 2. Baseflow from the soil moisture store is simulated as a linear recession of the water content in the store. Two other hydrologic processes -impervious area runoff and deep recharge -were considered for inclusion in PERM. The inclusion of impervious area was considered unnecessary. With respect to deep seepage, the reviews of Petheram et al. (2002), Scanlon et al. (2006) and Crosbie et al. (2010) suggest the maximum effect could be, on average, equivalent to 5 % of the long-term average annual precipitation. From the results of these reviews and taking into account the model time step, the available data and the fact that the parameters in PERM are calibrated, it was concluded that incorporating deep seepage would yield little benefit to the modelling exercise.

Model calibration
PERM is run on a monthly time step and calibrated against observed annual runoff. Details of the calibration are set out in the Supplement. In summary, an objective function, defined as the sum of squared differences between the estimated and observed annual runoff, was minimised with penalties applied to the objective function to ensure the calibrated model approximately reproduced the mean and coefficient of variation of observed annual runoff. An automatic pattern search optimisation method was used to calibrate the model (Hooke and Jeeves, 1961;Monro, 1971) with 10 different parameter sets used as starting points to increase the likelihood of finding the global optimum of parameter values. A K-fold cross-validation method (where K = 3) described by Efron and Tibshirani (1993) was used to evaluate calibrated model performance. PERM was calibrated for 699 catchments worldwide using the observed monthly precipitation, temperature and runoff data described in Peel et al. (2010).

Model performance and catchment selection
An objective of our study is to examine the within-GCM uncertainty in runoff estimated from GCM projections of precipitation and temperature. To examine this uncertainty we need to minimise any uncertainty in future runoff due to poor hydrologic model calibration. Therefore, a sub-set of the 699 catchments was selected for further analysis that exhibited minimum error as a result of the calibration process. Several criteria were used to assess the adequacy of the PERM calibration for selecting the catchments. These criteria included the following: the annual Nash-Sutcliffe efficiency (NSE) (Nash and Sutcliffe, 1970), between observed and modelled runoffs, was > 0.8, NSE values based on 3fold independent testing were > 0.60, monthly NSE values were > 0.6, and the mean and the coefficient of variation of annual runoff were estimated to be within ±5 and ±10 %, respectively, of the observed values. From a practical point of view, catchments less than 1000 km 2 were excluded as were several that were spatially very close. Using these selection criteria, 17 catchments from the initial 699 data set were selected for later analyses. Figure 3 shows the location of all 699 catchments and the sub-set of 17 catchments used in later analyses, while Table 3 provides information about the selected catchments.
Details of the modelling performance of PERM for the 699 catchments are presented in the Supplement. For the 17 selected catchments the difference between the average modelled and observed MAR is −0.2 % and coefficient of variation of annual runoff is −4.4 %. At the annual time step, the average NSE and R 2 between modelled and observed runoff are both 0.88, and the monthly NSE is 0.72. PERM is well calibrated for these 17 catchments and, therefore, uncertainty in runoff due to poor model calibration is minimised using these catchments.
A key assumption of using PERM, or any hydrologic model, to model future runoff is that the calibrated parameters are appropriate for the future climatic conditions. Where future climatic conditions are similar to the observed calibration period, then this assumption is likely to hold. If climatic conditions differ from the calibration period, then there is no evidence to support this assumption. However, in terms of the analysis conducted in the next section this assumption is a pragmatic one that may well affect the bias of future runoffs but should have less impact on the range of uncertainty.

Uncertainty in reservoir yield
The 17 catchments modelled by PERM are unregulated catchments and do not have an existing reservoir on which to base our analysis. Therefore, we need to assume a hypothetical reservoir for each catchment. Many procedures exist to estimate reservoir yield from a hypothetical storage (see McMahon and Adeloye, 2005). For the purposes of where D is the annual yield or draft (mm yr −1 ), µ and σ are the mean and the standard deviation, respectively, of annual runoff (mm yr −1 ) into the reservoir storage, τ is the hypothetical storage capacity specified as a ratio of mean annual runoff, γ is the coefficient of skewness of annual runoff, ρ is the lag-1 serial correlation of annual runoff, and z p is the standardised normal variate (zero mean and unit variance) at p the probability of failure. In our analysis, we adopted 95 % reliability of supply; thus, (1 − p) = 0.05. In our analysis we specify two hypothetical storage sizes, τ = 1 and τ = 3, which are storage capacities equal to 1 and 3 times the mean annual runoff. The G-DG procedure assumes there is no net evaporation loss from the storage (see  for a detailed description of the procedure), which in this analysis is not considered critical as we are mainly considering relative changes in yield. In this analysis the uncertainty in reservoir yield is estimated via the G-DG procedure using annual runoff parameters (µ, σ , γ , and ρ) estimated from the monthly runoff time series generated by PERM (see steps 7 and 8 of Fig. 1).

Testing the stochastic within-GCM uncertainty approximation
Testing our stochastic approximation of within-GCM uncertainty requires multiple runs from a single GCM for a given scenario from which to estimate within-GCM uncertainty and compare against our stochastic results. In the CMIP3 data set the Community Climate System Model (CCSM) GCM has the most runs (seven) for the 20C3M and A1B scenarios. In this section we test the ability of our stochastic methodology to approximate within-GCM uncertainty for the CCSM GCM using the seven available runs for the period 1870-2100 (20C3M and A1B emissions scenarios). A comparison of within-GCM uncertainty based on seven runs from the CCSM GCM and the stochastic approximation of within-GCM uncertainty for (a) annual precipitation and (b) annual temperature for the Herbert River at Gleneagle, Australia, is shown in Fig. 4. The CCSM runs and stochastic replicates presented in Fig. 4 are not bias corrected. In each plot the maximum, median and minimum annual value for a given year are shown for the seven CCSM runs and are compared with the maximum, median and minimum of the 700 (7 × 100) stochastic repli- cates of the CCSM runs for annual precipitation and temperature. For both precipitation and temperature the median of the 700 stochastic replicates overlies the median of the seven CCSM runs, and the difference between the maximum and minimum lines around the median for the two data sets is totally consistent considering there are only seven CCSM runs and 700 stochastic replicates. A comparison of the standard deviation of all annual values calculated for the seven CCSM runs (precipitation = 110 mm, temperature = 1.21 • C) and the 700 stochastic replicates (precipitation = 111 mm, temperature = 1.20 • C) confirms the stochastic replicates are replicating the CCSM GCM runs well in terms of overall trend and variability around the trend. These results confirm the credibility of the stochastic methodology for approximating the within-GCM uncertainty when limited GCM runs are available.

Results and discussion
In this section we present and discuss results from the methodology described in the previous section to approximate within-GCM uncertainty of precipitation and temperature from five GCMs and assess the consequent impact of these uncertainties on estimated runoff and reservoir yield at 17 catchments for two 30-year periods -1965-1994 (20C3M emissions scenario) and 2015-2044 (A1B emissions scenario).
To assist interpretation of within-and between-GCM uncertainty results for 17 catchments and 5 GCMs subsequently presented in tables and figures, we present results for an example catchment, the Herbert River at Gleneagle in Australia, in Fig. 5. The box plots of MAP (Fig. 5a) and MAT (Fig. 5b) are presented for two 30-year periods for each GCM. These box plots represent our approximation of within-GCM uncertainty of MAP and MAT. The box represents the inter-quartile range of MAP (MAT) from the 100 bias-corrected stochastic replicates of GCM precipitation (temperature). The median MAP (MAT) is represented by the bar across the box and the box-plot whiskers represent the maximum and minimum MAP (MAT) from the 100 replicates. The range of within-GCM uncertainty of MAP (Fig. 5a) is similar for all GCMs except MIROCM(1), where the inter-quartile and maximum-minimum range are approximately 50 % larger. The range of within-GCM uncertainty of MAT (Fig. 5b) is similar for all GCMs.
The box plots in Fig. 5 can also be used to assess between-GCM uncertainty through differences between GCMs in the range of within-GCM uncertainty and differences in the direction of change between 30-year-period box plots. All GCMs have an increasing trend in MAP over time for this catchment except HadCM3 (Fig. 5a), whereas all GCMs show a similar increasing trend in MAT over time (Fig. 5b).
Also shown in Fig. 5 is a Raw symbol plotted next to each box plot. These MAP and MAT values are calculated from bias-corrected original CMIP3 GCM runs and are the only values of MAP and MAT available for this combination of catchment, GCM and scenario if stochastic replication is not used. In a traditional climate change impact assessment, without stochastic replication, the Raw values are all that are available for analysis and the magnitude of uncertainty associated with them is unknown. Figure 5 shows that the range of within-GCM uncertainty associated with Raw values of MAT is smaller than for MAP. Figure 5 can also be used to check whether our stochastic methodology is performing well at this catchment. Our stochastic methodology generates statistically similar replicates of each 20C3M and A1B GCM run from which we calculate MAP and MAT over two 30-year periods to obtain our box plots. If our methodology is performing well we would expect the Raw values from the original GCM runs to fall within our box-plot range, which they do in all cases. It should be noted that the true within-GCM uncertainty range for MAP and MAT will be larger than what is shown by our box plots since we have only replicated the uncertainty around the GCM trend and not the uncertainty in the trend itself.
In Table 4 within-GCM uncertainty results are presented for the five GCMs over the period 1965-1994 at the 17 catchments for six variables -MAP, SDP, MAT, MAR, SDR and lag-1. Here the six variables have been calculated for each stochastic replicate and the results are presented as the mean ± the standard deviation of the 100 replicate estimates. The mean values in Table 4 show the range of hydroclimatic conditions represented by the 17 catchments, while the standard deviation around each mean represents our approximation of within-GCM uncertainty of that variable. The mean values differ between the five GCMs for a given catchment for at least two reasons: (1) the stochastic variability in the mean value of a sample of 100 replicates, and (2) each GCM has a different trend for each catchment during this period.
Although Table 4 provides absolute values of within-GCM uncertainty for each combination of catchment, GCM and variable, it is difficult to draw conclusions from this table. Therefore, in Tables 5, 6 and 8 we express within-GCM uncertainty in relative form as the standard deviation of the 100 replicate estimates as a percentage of the mean of the 100 replicate estimates. If the 100 replicate values are normally distributed, then approximately 95 % of the values will be within ±2 standard deviations of the mean. The assumption of normality was tested for six variables -MAP, SDP, MAT, MAR, SDR and reservoir yield for a reservoir equal to 3 × MAR -estimated from 100 replicates of HadCM3 at each of the 17 catchments for the period 1965-1994. Normality was assessed using the Anderson-Darling normality test (Anderson and Darling, 1954). Of the 102 normality tests (17 catchments × 6 variables) 11 (10.8 %) were not normally distributed at the 5 % level of significance (not shown), which is more than expected from random chance. The distribution of non-normal results was MAP(2), SDP(0), MAT(1), MAR(4), SDR(2) and reservoir yield(2). Based on our analysis of HadCM3 replicates, an expectation that roughly 95 % of MAP, SDP and MAT values will be within ±2 standard deviations of the mean appears reasonable. Whereas for MAR, SDR and reservoir yield this expectation is less justified and within-GCM uncertainty is less likely to be symmetrically distributed around the mean.

Annual precipitation and temperature
In this sub-section we present and discuss within-and between-GCM uncertainty results for annual precipitation (MAP and SDP) and temperature (MAT). In Table 5 a summary is presented of the within-GCM uncertainty results shown in Table 4. The uncertainty results in Table 5 are in relative form (standard deviation as a percentage of the mean), except for lag-1 where the standard deviation is used, and are the average uncertainty across the 17 catchments  0.14 ± 0.17 0.29 ± 0.17 0.00 ± 0.18 0.01 ± 0.16 0.07 ± 0.18 * Mean value ± standard deviation based on 100 replicates. Table 5. Relative within-GCM uncertainty of mean and standard deviation of annual precipitation, mean annual temperature and mean, standard deviation and lag-1 serial correlation of annual runoff. Relative uncertainty is the standard deviation of the 100 replicate estimates as a percentage of the mean replicate estimate for each GCM during the period 1965-1994 (20C3M). The average of the 17 catchment relative uncertainty values is presented, except for lag-1 annual runoff which is the average of the 17 standard deviations. for each GCM. Averaging relative uncertainty values across the catchments allows differences in within-GCM uncertainty between GCMs to be examined and the average uncertainty across all GCMs for a given variable of interest to be estimated. For MAP within-GCM uncertainty varies between GCMs from 3.4 to 4.6 % and the average across the five GCMs is 4.1 %. Given a normal distribution of MAP values across the 100 replicates this translates into 95 % of MAP values being within ±7 to ±9 % of the replicate mean. Although the average within-GCM uncertainty for SDP (14.3 %) is 3-4 times higher than for MAP (4.1 %), the difference between GCMs is smaller for SDP (13.9-14.6 %). Within-GCM uncertainty of inter-annual variability of pre-cipitation is high with approximately 95 % of SDP values being within ±28 to ±29 % of the mean SDP. In contrast, within-GCM uncertainty for MAT is very small (1 % with 95 % within ±2 % of mean MAT) and is very consistent between GCMs. Across the five GCMs, MIUB(1) has the least within-GCM uncertainty for MAP and SDP and the highest for MAT, while MIROCM(1) has the highest for MAP and SDP.
A similar set of results to those in Table 5 are presented in Table 6 for the 30-year period 2015-2044 (A1B). On average across the five GCMs, within-GCM uncertainty of MAP, SDP and MAT are similar between the two periods. However, differences in uncertainty between GCMs are ap- Table 6. Relative within-GCM uncertainty of mean and standard deviation of annual precipitation, mean annual temperature and mean, standard deviation and lag-1 serial correlation of annual runoff. Relative uncertainty is the standard deviation of the 100 replicate estimates as a percentage of the mean replicate estimate for each GCM during the period 2015-2044 (A1B). The average of the 17 catchment relative uncertainty values is presented, except for lag-1 annual runoff which is the average of the 17 standard deviations.

HadCM3
MIROCM (1) MIUB (1) MPI (1) MRI(3) Figure 7. Within-GCM uncertainty (mm yr −1 ) in standard deviation of annual precipitation versus the standard deviation of annual precipitation based on 100 replicates of monthly precipitation (1965-1994, 20C3M) for five GCMs. parent across the two periods. For example, for MAP the maximum and minimum values in Tables 5 and 6 are from MIROCM(1) (4.6 and 4.9 %) and MIUB(1) (3.4 and 3.4 %), respectively. It should be noted that the uncertainties in Table 6 are for the projected values of precipitation, temperature and runoff in 2015-2044, and not the uncertainties in their changes between the earlier and the later periods. Often, regional climate change projections present uncertainties in the projected changes in variables. However, here we present the uncertainties in the projected values, as these are important for the projected runoff and reservoir yield. Within-GCM uncertainty results for MAP, SDP and MAT from Table 4 are also summarised in Figs. 6 to 8, where results from the 17 catchments are plotted for each GCM. Figure 6 shows that within-GCM uncertainty in MAP varies from below 20 mm yr −1 to more than 90 mm yr −1 . For each GCM there is a weak positive relationship between uncertainty and MAP. The strongest relationship is for MIUB(1), which also has the lowest uncertainty. In Fig. 7, within-GCM uncertainty results for SDP are in contrast to the MAP results of Fig. 6. Although the range in uncertainty of SDP is only slightly less than for MAP, ranging from approximately 10 to above 70 mm yr −1 , there is a very strong positive relationship between uncertainty in SDP and SDP for each GCM shown. The reason for the difference in relationship strengths in Figs. 6 and 7 is due to the uncertainty in MAP being strongly positively related to SDP (not shown). This is to be expected as higher inter-annual variability, represented here by SDP, increases the uncertainty in a 30-year estimate of MAP. Thus, in Fig. 6 the weak relationship between MAP and the uncertainty in MAP is due to a combination of stochastic variability and the SDP at each catchment. Whereas the strong relationship observed in Fig. 7 between SDP and uncertainty in SDP is solely a function of stochastic variability. Within-GCM uncertainty in mean annual temperature is small, varying from less than 0.05 to 0.18 • C, and decreases with increasing MAT for all GCMs, although the relationships between uncertainty and mean annual temperature are weak (Fig. 8).

Annual runoff
In this sub-section we present and discuss within-and between-GCM uncertainty results for annual runoff (MAR, SDR and lag-1). For each GCM and catchment, 100 biascorrected stochastic replicates of monthly P and T were used as input to PERM and 100 series of monthly runoff generated. In Fig. 9 box plots of MAR (Fig. 9a) and SDR (Fig. 9b) are presented for the Herbert River at Gleneagle in Australia as an example of the runoff results for 17 catchments and 5 GCMs. For each GCM a box plot of the 100 values calculated from PERM is presented for the two 30-year periods. Consistent with results for MAP shown in Fig. 5a the range of within-GCM uncertainty of MAR is similar for all GCMs except MIROCM(1), where the inter-quartile and maximumminimum range are approximately 50 % larger. For MAP all GCMs, except HadCM3, had an increasing trend in MAP over time (Fig. 5a). However, trends in MAP are not consistently transferred to MAR. As expected HadCM3 has a decreasing trend in MAR and MIROCM(1) shows an increasing trend in MAR. However, MIUB(1), MPI(1) and MRI (3) show little change in MAR between periods. In Fig. 9b  range of within-GCM uncertainty of SDR is similar across all the GCMs, except MIROCM(1) which again has the highest inter-quartile and maximum-minimum range. The pattern of trend in SDR between the two periods is complex. MIROCM(1) is the only GCM to have an increase in median SDR and inter-quartile range over time, while HadCM3, MIROCM(1) and MRI(3) all have an increase in maximumminimum range over time. Also shown in Fig. 9 are the Raw values of MAR and SDR calculated from PERM runs of biascorrected original GCM data, which are the only values of MAR and SDR available for this combination of catchment, GCM and scenario if stochastic replication is not used. As was the case for MAP and MAT, the Raw MAR and SDR values for each GCM and 30-year period fall within the box-plot range, indicating that our stochastic replication methodology is performing satisfactorily. Again, the true within-GCM uncertainty range for MAR and SDR is expected to be larger than what is shown by our box plots since we only replicated the noise around the GCM trend and not the GCM trend itself. Within-GCM uncertainty results for MAR and SDR averaged across the 17 catchments for each of the 5 GCMs and expressed in relative form (standard deviation as a percentage of the mean) are shown in Table 5 for the 30-year period 1965-1994 (20C3M). Also shown in Table 5 are the standard deviation and the lag-1 serial correlation of runoff. In Table 5 within-GCM uncertainty of MAR varies between GCMs from 8.1 to 10.9 % and the average across the GCMs is 9.7 %. Although the 100 MAR values may not be normally distributed we would expect roughly 95 % of the MAR values to be within ±16 to ±22 % of the replicate mean MAR. The average within-GCM uncertainty of MAR (9.7 %) is over double that of MAP (4.1 %), which demonstrates how uncertainty in precipitation is magnified in runoff through the precipitation-runoff relationship. In Table 5 the average within-GCM uncertainty for SDR (17.4 %) is approx-  1965-1994 (20C3M) and 2015-2044 (A1B) for five GCMs. Each box plot is based on 100 quantile-quantile bias-corrected stochastic replicates of GCM data that have been input to the PERM model of catchment 6058 -Herbert River at Gleneagle (Australia). The box represents the inter-quartile range and the whiskers extend to the maximum and minimum values. The Raw value next to each box plot represents the PERM output when using the bias-corrected GCM runs that the stochastic replicates are based on.
imately 80 % higher than for MAR (9.7 %) and the difference between GCMs is smaller for SDR (16.4-18.4 %) than for MAR. Within-GCM uncertainty of inter-annual variability of runoff is high with approximately 95 % of SDR values being within ±33 to ±37 % of the mean SDR. Although the within-GCM uncertainty of SDR is high (17.4 %), it is only ∼ 20 % higher than the uncertainty for SDP (14.3 %). In Table 5 there is little difference between GCMs in within-GCM uncertainty of lag-1 serial correlation with standard deviation values varying from 0.17 to 0.18 and the average across the GCMs is 0.18.
Within-GCM uncertainty of MAR, SDR and lag-1 serial correlation for the 30 year period 2015-2044 (A1B) in Table 6 is similar to the results shown in Table 5. However, differences in uncertainty between GCMs are apparent across the two periods. For example, for MAR the minimum values in Tables 5 and 6
the second highest MAR in Table 5 and the maximum value in Table 6 are from HadCM3 (10.8 and 13.0 %). Within-GCM uncertainty results for MAR and SDR from the 17 catchments and 5 GCMs in Table 4 are summarised in Figs. 10 and 11. Figure 10 shows the within-GCM uncertainty in MAR varies from below 10 mm yr −1 to more than 80 mm yr −1 . For each GCM there is a positive relationship between uncertainty and MAR. The strongest relationship is for MIUB(1), which also has the lowest uncertainty. In Fig. 11, the within-GCM uncertainty results for SDR are in contrast to the MAR results of Fig. 10. Although the range in uncertainty of SDR is slightly less than for MAR, ranging from under 10 to above 60 mm yr −1 , there is a much stronger positive relationship between the uncertainty in SDR and SDR for each GCM shown. Although the uncertainty relationships for precipitation (Figs. 6 and 7) and runoff (Figs. 10 and 11) are broadly similar, modelling runoff through PERM has modified the uncertainty relationships of precipitation relative to runoff. The relationships in Fig. 10 for MAR are stronger than those for MAP in Fig. 6, while the relationships for SDR in Fig. 11 are weaker than those for SDP in Fig. 7. The within-GCM uncertainty of lag-1 serial correlation of annual runoff is approximately constant at 0.18 and shows little difference between GCMs (not shown).

Reservoir yield
In this sub-section we present and discuss within-and between-GCM uncertainty results for reservoir yield. The impact of within-GCM uncertainty on reservoir yield is shown in Fig. 12 for the Herbert River at Gleneagle. The Gould-Dincer Gamma method was used to estimate average annual yield from a hypothetical reservoir of capacity equal to 3 × MAR with 95 % reliability of draft using annual runoff statistics from PERM modelled runoff. Each box plot is based on 100 values of average annual reser- Uncertainty in SDR (mm)
voir yield for the two 30-year periods estimated from PERM runs using stochastic replicates of precipitation and temperature for each GCM. In Fig. 12 the minimum average annual yield is zero in two cases -HadCM3 (2015-2044) and MIROCM(1) . In these two cases the Gould-Dincer Gamma method returned a physically impossible negative draft, which indicates that a positive draft cannot be supplied with 95 % reliability from the hypothetical reservoir for at least one replicate. For a given GCM, an increasing trend in average reservoir yield between periods is due to either an increasing trend in MAR (Fig. 9a) and or a decreasing trend in SDR (Fig. 9b). The within-GCM uncertainty of average annual yield for MIROCM(1) is 70 % larger than for the other GCMs for this catchment, which is consistent with the MAP, MAR and SDR results. The Raw values of average reservoir yield again fall within the box-plot range, indicating that our stochastic replication methodology is performing satisfactorily. Table 7 lists reservoir yield results for the 17 catchments, 5 GCMs and two hypothetical reservoir capacities (1 × MAR and 3 × MAR) for the 30-year period 2015-2044 (A1B). The average reservoir yield is the average of the 100 PERM runs and the uncertainty is the standard deviation of the 100 PERM runs. Differences between GCMs in average reservoir yield at a given catchment largely reflect differences in MAR and SDR trends during this period. If the Gould-Dincer Gamma method returned a physically impossible negative draft, the yield for that run was set to zero. In Table 7 if more than half the yields (> 50) were set to zero, results for that GCM and catchment combination were not reported (N/R).
The yield uncertainties in Table 7 due to within-GCM uncertainty are expressed as a percentage of the mean yield and averaged across the 17 catchments in Table 8. For the larger storage, τ = 3, the average uncertainty across the 5 GCMs is Table 7. Average and within-GCM uncertainty of reservoir yield (mm yr −1 ) for 17 catchments using the Gould-Dincer Gamma reservoir storage model for two reservoir sizes (τ = 1 × MAR and τ = 3 × MAR) and 95 % reliability of draft. Average and uncertainty (standard deviation) are calculated from 100 bias-corrected replicates of precipitation and temperature passed through the PERM model for each GCM over the period 2015-2044 (A1B). GCM Raw 1965Raw -1994Raw 1965Raw -1994 Raw 2015  Reservoir yield is estimated using the Gould-Dincer Gamma reservoir storage model for reservoir size τ = 3 × MAR and 95 % reliability of draft. Runoff metrics for the Gould-Dincer Gamma method are estimated from 100 PERM runs of quantile-quantile biascorrected stochastic replicates of GCM data for catchment 6058 -Herbert River at Gleneagle (Australia). The box represents the interquartile range and the whiskers extend to the maximum and minimum values. The Raw value next to each box plot represents the Gould-Dincer Gamma output when using the bias-corrected GCM runs that the stochastic replicates are based on.
11.9 %, which is approximately half the average uncertainty (25.1 %) for the smaller storage (τ = 1). The GCM with the highest uncertainty in reservoir yield is MIROCM(1), which is consistent with the high uncertainty in MAP, MAR and SDR for this GCM. MIUB(1) is the only GCM to have below average uncertainty for both reservoir sizes. The uncertainty data in Table 7 for τ = 3 are plotted in Fig. 13 against mean reservoir yield. Uncertainty of reservoir yield is only weakly positively related to reservoir yield. For a given reservoir yield MIROCM(1) and HadCM3 generally have higher uncertainty than the other GCMs. The range of uncertainty is approximately 15 mm yr −1 for low-yield reservoirs (100 mm yr −1 ) through 40 mm yr −1 for reservoirs that yield 1000 mm yr −1 .
The main driver of uncertainty in reservoir yield is the variability of annual reservoir inflows. Figure 14 shows the relationship between uncertainty in reservoir yield, for τ = 3, against the variability of reservoir inflows expressed as the standard deviation of annual runoff for the five GCMs over the period 2015-2044 (A1B). The relationship between uncertainty in reservoir yield and annual runoff variability is strongly positive with all GCMs having an R 2 ≥ 0.78, except for MIROCM(1). Uncertainty in yield is driven more strongly by inflow variability than by inflow mean (not shown, but very similar to Fig. 13 as reservoir yield and mean annual runoff are highly correlated).

Conclusions and implications
Climate change impact assessments for future hydrology are subject to significant uncertainties. The contribution of within-GCM uncertainty to total uncertainty has not been well quantified due to the limited number of GCM runs available for each GCM and scenario combination. In this paper we developed a methodology to approximate within-GCM uncertainty of precipitation and temperature projections using non-stationary stochastic data generation. Our methodology is a contribution toward quantifying within-GCM uncertainty and provides an objective approach for communicating the uncertainty in climate change impact assessments in a quantitative manner. In a proof-of-concept application of our procedure we estimated the impact of within-GCM uncertainty on annual runoff and reservoir yield, which can inform water resources engineers and management decision makers about the uncertainty in climate change impacts in the shortto medium-term planning horizon. For the research community our stochastic data generation methodology provides a way to assess within-GCM uncertainty on a temporary basis until the number of GCM runs for a given GCM and scenario combination becomes adequate to estimate within-GCM uncertainty directly from GCM runs. In our proof-of-concept application, we de-trended GCM projections of monthly precipitation and temperature from five better performing CMIP3 GCMs (HadCM3, MIROCM, MIUB MPI and MRI) identified in the companion paper (McMahon et al., 2015). We stochastically replicated the detrended series 100 times, combined the replicates with their respective trends and applied a bias-correction to the replicates. Within-GCM uncertainty of precipitation and temperature were assessed using the stochastic replicates from each GCM for two periods: (1)  (2) 2015-2044 (A1B scenario) at 17 catchments distributed around the world. At each catchment within-GCM uncertainty was estimated as the standard deviation of the replicate values divided by the mean replicate value. The uncertainty value for a given GCM was taken as the average of the 17 catchment values for that GCM. Within-GCM uncertainty of mean annual precipitation varied from 3.4 to 4.9 % between GCMs over the two periods and averaged approximately 4.1 % across the five GCMs. For the standard deviation of annual precipitation the average within-GCM uncertainty (14.3 %) was 3-4 times larger than for mean annual precipitation, while within-GCM uncertainty of mean annual temperature was smaller (1 %). The stochastic replicates were input to a calibrated hydrologic model (PERM) to estimate future projections of annual runoff. The impact of within-GCM uncertainty on mean annual runoff varied from 8.0 to 13.0 % between GCMs over the two periods and averaged approximately 9.9 % across the five GCMs. The uncertainty in the standard deviation of annual runoff varied from 16.0 to 20.1 % between GCMs and averaged approximately 17.5 % across the five GCMs. The within-GCM uncertainty in precipitation and temperature is amplified in the runoff through the precipitation-runoff relationship. Summary statistics for the two periods were estimated from each annual runoff series (100 per catchment) and used in the Gould-Dincer Gamma method to estimate reservoir yield from two hypothetical reservoir capacities (1 × and 3 × mean annual runoff) for 95 % reliability of supply. For the period 2015-2044, the uncertainty in reservoir yield due to within-GCM uncertainty varied from 18.6 % (9.3 %) to 33.2 % (14.3 %) for the 1 × (3 ×) mean annual runoff capacity reservoir and averaged approximately 25 % (12 %) across the five GCMs. The main driver of uncertainty in reservoir yield was the variability of annual runoff inflows.
In this analysis between-GCM uncertainty was limited to small differences in within-GCM uncertainty for a given variable and differences in trend between the two 30-year periods analysed. The reason why differences between GCMs are not larger here is due to the application of bias correction. The quantile-quantile bias correction forces the mean and variance of the GCM precipitation and temperature data over the observed period to match the observed mean and variance. Thus, a significant source of between-GCM uncertainty, their bias in mean and variance, has been removed.
A significant implication of our results is that within-GCM uncertainty is important when interpreting climate change impact assessments. Although the variables calculated from the stochastic replicates and hydrologic modelling of the replicates are not strictly normally distributed, a rough guide to the magnitude of within-GCM uncertainty is to double the values reported above (±2 × standard deviation/mean). Thus, for the five GCMs analysed here during the period 2015-2044 the within-GCM uncertainty around a value of mean annual precipitation is approximately ±7 to ±10 %. For the standard deviation of annual precipitation the uncertainty is approximately ±27 to ±29 %, while for mean annual temperature the uncertainty is approximately ±1.2 to ±1.6 %. Compared to precipitation, runoff uncertainties are larger with approximate uncertainties of mean and standard deviation of annual runoff being ±16 to ±26 % and ±32 to ±40 %, respectively. The uncertainty around reservoir yield for a 1 × mean annual runoff reservoir is approximately ±37 to ±66 % and ±18 to ±28 % for a 3 × mean annual runoff reservoir.
The amplification of precipitation and temperature within-GCM uncertainty in runoff and reservoir yield has significant implications for interpreting climate change impact assessments of these variables. For example, in Fig. 5 (MAP, MAT), Fig. 9 (MAR, SDR) and Fig. 12 (3 × MAR yield) we presented within-GCM uncertainty box plots for two periods for each GCM and the Raw value associated with each box plot that would be the only value available in a traditional climate change impact assessment for the Herbert River at Gleneagle. For MAT (Fig. 5b), low within-GCM uncertainty allows conclusions based on analysis of the Raw values to be consistent with conclusions drawn from the box plots -the Raw values indicate MAT is projected to increase by ∼ 5 % between the two periods and the box plots do not overlap between the two periods for any of the GCMs. However, for variables where within-GCM uncertainty is higher, conclusions drawn from a traditional climate change impact assessment would be misleading. For example, in Fig. 5a MIUB(1) shows the largest increase in Raw MAP between the two periods (14.7 %), yet the box plots for this GCM clearly overlap and the increase in median MAP is only 4.3 %. Similarly in Fig. 9a MIUB(1) shows the largest increase in Raw MAR between the two periods (34.1 %), yet the box plots for this GCM overlap and the increase in median MAR is only 0.3 %. Finally, in Fig. 12 MIUB(1) shows the largest increase in Raw reservoir yield between the two periods (41.3 %), yet the box plots for this GCM overlap and the increase in median yield is only 0.6 %. A traditional, without stochastic replication, climate change impact assessment reporting future increases in MAP, MAR and reservoir yield of 14.7, 34.1 and 41.3 %, respectively, would initially seem significant, yet our approximation of within-GCM uncertainty suggests these increases could be solely due to within-GCM uncertainty.
Finally, we expect our results are an underestimate of the true within-GCM uncertainty due to our stochastic method only approximating the uncertainty around the overall GCM trend and not the uncertainty in the GCM trend itself. To obtain a true estimate of within-GCM uncertainty requires analysis of many (≥ 100) GCM runs of a given scenario. Until considerably more GCM runs of a scenario become available the methodology presented here provides an interim objective technique for estimating the influence of within-GCM uncertainty on climate change impact assessments that is suitable for existing, or future, GCM scenario runs. Climate change impact assessments based on projections that do not take into account within-GCM uncertainty risk providing water resources management decision makers with a sense of certainty that is unjustified.
The Supplement related to this article is available online at doi:10.5194/hess-19-1615-2015-supplement.