Sensitivity analysis of runoff modeling to statistical downscaling models in the western Mediterranean

Abstract. This paper analyzes the sensitivity of a hydrological model to different methods to statistically downscale climate precipitation and temperature over four western Mediterranean basins illustrative of different hydro-meteorological situations. The comparison was conducted over a common 20-year period (1986n2005) to capture different climatic conditions in the basins. The daily GR4j conceptual model was used to simulate streamflow that was eventually evaluated at a 10-day time step. Cross-validation showed that this model is able to correctly reproduce runoff in both dry and wet years when high-resolution observed climate forcings are used as inputs. These simulations can thus be used as a benchmark to test the ability of different statistically downscaled data sets to reproduce various aspects of the hydrograph. Three different statistical downscaling models were tested: an analog method (ANALOG), a stochastic weather generator (SWG) and the cumulative distribution function–transform approach (CDFt). We used the models to downscale precipitation and temperature data from NCEP/NCAR reanalyses as well as outputs from two general circulation models (GCMs) (CNRM-CM5 and IPSL-CM5A-MR) over the reference period. We then analyzed the sensitivity of the hydrological model to the various downscaled data via five hydrological indicators representing the main features of the hydrograph. Our results confirm that using high-resolution downscaled climate values leads to a major improvement in runoff simulations in comparison to the use of low-resolution raw inputs from reanalyses or climate models. The results also demonstrate that the ANALOG and CDFt methods generally perform much better than SWG in reproducing mean seasonal streamflow, interannual runoff volumes as well as low/high flow distribution. More generally, our approach provides a guideline to help choose the appropriate statistical downscaling models to be used in climate change impact studies to minimize the range of uncertainty associated with such downscaling methods.


Introduction
Climate change impact studies (CCIS) focusing on water resources have become a hot topic in the last decade.However, such studies need reliable climate simulations to drive hydrological models efficiently.General circulation models (GCMs) have demonstrated significant skills in simulating climate variables at continental and hemispherical scales but are inherently incapable of representing the local sub-gridscale features and dynamics required for regional impact analyses.For most hydrologically relevant variables (precipitation, temperature, wind speed, humidity, etc.), GCMs currently do not provide reliable information at scales that are appropriate for impact studies (e.g., Maraun et al., 2010).The mismatch between the spatial resolution of the GCM outputs and that of the data required for hydrological models is a major obstacle (e.g., Fowler et al., 2007).Some post-processing is thus required to improve these large-scale models for impact studies and downscaling methods have been developed to meet this requirement.
Downscaling methods can be dynamical or statistical, both approaches being driven by GCMs or reanalysis data.Dynamical downscaling methods correspond to the so-called regional climate models (RCMs), aiming at generating detailed B. Grouillet et al.: Sensitivity analysis of runoff modeling to statistical downscaling models regional and local information (from a few dozen km down to a few km) from low-resolution simulations (generally with a horizontal resolution ranging from 100 to 300 km) by simulating high-resolution physical processes consistent with the required large-scale dynamics.Easier and less costly to implement as compared to dynamical downscaling techniques, statistical downscaling models (SDMs) are also used in anticipated hydrologic impact studies under climate change scenarios (for a review, see, e.g., Fowler et al., 2007).SDMs rely on determining statistical relationships between largeand local-scale variables and do not try to solve the physical equations that model atmospheric dynamics.Due to their statistical formulation, they generally have a low computational cost and provide simulations relatively rapidly.SDMs are based on a static relationship; i.e., the mathematical formulation of the relation between predictands (i.e., the localscale variable to be simulated) and predictors (i.e., the largescale information or data used as inputs in the SDMs) has to be valid not only for the current climate on which the relationship is calibrated, but also for future climates, for example.Most state-of-the-art SDMs belong to one of the four following families (Vaittinada Ayar et al., 2015): transfer functions, weather typing, methods based on stochastic weather generators and model output statistics (MOS) models, which generally work on cumulative distribution functions (CDFs).Many studies demonstrated that caution is required when interpreting the results of climate change impact studies based on only one downscaling model (e.g., Chen et al., 2011).It is thus recommended to use more than one SDM to account for the uncertainty of the downscaling (e.g., Chen et al., 2012).However, uncertainty can be very high due to the inability of some SDMs to realistically reproduce the local climate, and this can be critical when the aim is to produce accurate inputs for hydrological models at the basin scale in the context of CCIS.On the other hand, a sensitivity analysis of hydrological modeling to different downscaling methods can produce an indicator to assess the quality of downscaled climate forcings via their ability to generate reasonable simulations of discharge from hydrological modeling.This analysis can also help to quantify the impact of the error in a runoff simulation that stems from SDMs.
Several works have already attempted to compare climate simulations, downscaled or not, from a hydrological point of view.Although these studies revealed significant differences between SDMs in hydrological responses including seasonal variability of runoff (e.g., Dibike and Coulibaly, 2005;Prudhomme and Davies, 2009;Chen et al., 2012;Teng et al., 2012), interannual discharge dynamics (e.g., Wood et al., 2004;Salathé, 2005), or the distribution of extreme events (e.g., Diaz-Nieto and Wilby, 2005), they were not able to clearly conclude on how to choose one method over another.Difficulties in selecting among different SDMs may arise from the choice of relevant criteria.While some may be appropriate from the statistical or climatological point of view, these criteria may not adequately highlight the differ-ences between the methods with respect to the hydrological responses.As a result, the aforementioned studies generally suggest an ensemble approach including several methods to offer a range of downscaling uncertainty when studying climate change impact on runoff.However, this uncertainty range can be reduced to a minimum if inappropriate statistical downscaling methods are excluded from the ensemble approach.
Our analysis of the literature revealed that no consensus has emerged on the best downscaling techniques among the state-of-the-art SDMs in the context of CCIS on runoff.This calls for an original protocol to assess the strengths and weaknesses of the different SDMs in providing accurate hydrological simulations according to different insights.Indeed, assessing water resource availability for different uses requires accounting for different aspects of the hydrograph including interannual runoff volumes, mean seasonal streamflow, and low/high flow distribution.First, hydrologists need to correctly reproduce the interannual water balance in order to evaluate changes in the storage capacity of the hydrosystems, for instance.Second, analysis of the interannual variability of flows makes it possible to test the ability of the climate simulations to reproduce the occurrence of dry and wet years, as well as the frequency and intensity of change.Third, surface water resources can be evaluated through a seasonal analysis so as to focus on intra-annual high and low flow events.While high flows are particularly important, e.g., when the focus is on flood risk, low flows are generally studied in connection with the water needed for agriculture and tourism, as in these cases, there is generally an increase in water demand when flows are low (see, e.g., Fabre et al., 2015;Grouillet et al., 2015).Consequently, assessing water availability means focusing on low flows, which generally occur during peak water demand.
Water resource issues are particularly important in the Mediterranean region, which has been identified as a hotspot of climate change (Giorgi, 2006).The western Mediterranean basins are of particular interest since they are characterized by complex and varying hydro-climatic conditions due to the contrasted influences of the Atlantic Ocean and the Mediterranean Sea, and of mountain ranges.These contrasted conditions offer an opportunity to account for the uncertainty linked to the differences in spatial and temporal patterns that may arise from one downscaling technique to another.
The aim of this study is to propose a method to analyze the sensitivity of hydrological responses to different methods used to statistically downscale climate values by means of criteria that are commonly used in CCIS to assess the impact on water resources: volume of water flow, interannual and seasonal variability of runoff, and distribution of extreme events, including high and low flows.We compare statistical downscaling methods via a guideline aimed at providing an overview of their capabilities to reproduce the main features of the hydrograph in view of their use in CCIS.The rest of this article is organized as follows.In Sect. 2 we describe the basins in the western Mediterranean and a hydro-climatic analysis based on the available data.In Sect.3, we provide an overview of downscaling models and of the steps involved in hydrological modeling.In Sect.4, we summarize the results for each hydrological indicator, and in Sect. 5 we discuss these results and provide a short conclusion.

Four catchments in the western Mediterranean
Four catchments were chosen to account for the variety of hydro-climatic conditions in the western Mediterranean region (Fig. 1): the Herault basin at Laroque (910 km 2 , France), the Segre basin at Seo de Urgel (1265 km 2 , Spain), the Irati basin at Liedena (1588 km 2 , Spain) and the Loukkos basin at Makhazine (1808 km 2 , Morocco).These basins were also chosen because they are located upstream from storage dams and in areas in which withdrawals are negligible (Ruelland et al., 2015), so their streamflow regime can be considered to be natural.For brevity's sake, the basins are referred to as Herault, Segre, Irati and Loukkos.
The Herault basin, from 165 to 1565 m a.s.l., comprises two-thirds karstified limestone favoring delayed and sometimes sudden restitution and one-third of basement rocks with low groundwater reserves favoring surface runoff.The mountainous basin of Segre, located upstream from the Ebro basin in northern Spain from 670 to 2830 m a.s.l., is characterized by basement rocks (granite and quartzite) and a rugged topography that favors runoff.The Irati basin, from 407 to 2017 m a.s.l., is located upstream from the Ebro basin.This mountainous catchment, composed mainly of limestone and conglomerate, is characterized by a high upstreamdownstream topographic gradient favoring a rapid hydrological response.The Loukkos basin, from 55 to 1668 m a.s.l., is characterized by sandstone and marl successions favoring surface runoff.

Hydro-climatic data
Preliminary studies (Tramblay et al., 2013;Fabre et al., 2015;Ruelland et al., 2015) provided daily hydro-climatic data (precipitation, temperature and streamflow) over a common 20-year period (1986-2005), thus making it possible to compare the basins.Climate data for the Herault basin were extracted from the SAFRAN 8 × 8 km meteorological analysis system (Vidal et al., 2010) and observed runoff was provided by the French ministry of ecology and sustainable development from their Banque Hydro database (MEDDE, 2010).As mentioned by Vidal et al. (2010), SAFRAN is a gauge-based analysis system using the optimal interpolation (OI) method described by Grandin (1965).This method has been found to outperform other objective techniques for precipitation notably studied in France over the Cévennes area, a region with very high spatial and temporal variability (Creutin and Obled, 1982).Climate data for the Segre and Irati basins were obtained by interpolating daily precipitation and temperature measurements on an 8 × 8 km grid with the inverse distance weighted (IDW) method (Shepard, 1968).This method is particularly efficient for gauge-based analyses of global daily precipitation (Chen et al., 2008).The precipitation and temperature data were extracted based on numerous stations available at the Ebro basin scale (Dezetter et al., 2014), of which around 19 and 6 precipitation stations, and 10 and 3 temperature stations, concern the Irati and Segre catchments, respectively.Elevation effects on temperature distribution were taken into account using a digital elevation model and a lapse rate of −6.65 • C / 1000 m estimated from the data.Daily streamflow data were provided by the center of studies and experiments on hydraulic systems (CEDEX, 2012).In the Loukkos basin, precipitation data were interpolated on a 8 × 8 km grid based on 11 stations using the IDW method.Since daily temperature data were only available from a station located at the basin outlet, a universal lapse rate of −6.5 • C / 1000 m was used for temperature interpolation.Hydro-climatic data including daily streamflow were provided by the Moroccan Département de Planification des Ressources en Eau (DPRE).Due to the lack of additional data such as wind and humidity in the Moroccan basin, a simple formula relying on solar radiation and temperature was chosen (Oudin et al., 2005) to assess daily potential evapotranspiration (PE) in each basin.
The atmospheric variables used for the calibration of the SDMs as predictors were selected from the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) daily reanalysis data (Kalnay et al., 1996) with a 2.5 • spatial resolution, from 1 January 1976 to 31 December 2005.The variables covered the region [−15 • E, 42.5 • E] × [27.5 • N, 50 • N] encircling the Mediterranean Sea as defined in Vrac and Yiou (2010) and corresponding to 240 grid cells.For the temperature models, five predictors were used: the temperature at 2 m (T2), the sea level pressure (SLP), as well as the geopotential height and the zonal and meridional wind components at 850 hPa (respectively, Z850, U850 and V850).For precipitation models, the same five predictors were used, and the dew point temperature at 2 m (D2) was added.The predictors and the pre-processing of those predictors according to the SDM and the predictands are summarized in Table 1.Calibration was performed over the usual four seasons in the Northern Hemisphere.The calibrated SDMs were forced with three different data sets: NCEP reanalysis data over the 1976-2005 calibration period and with the IPSL-CM5A-MR (from the French Institut Pierre Simon Laplace, IPSL Climate Modelling Centre, Dufresne et al., 2013) and CNRM-CM5 (from the French National Centre for Meteorological Research, CNRM, Voldoire et al., 2013) GCMs, regridded at a 2.5 • spatial resolution, over the GCM historical (or CTRL) period (i.e., 1986-2005).The regridding was done through a bilinear interpolation in order to have the GCMs and NCEP data at the same resolution.This is a requirement in order to use GCMs as predictors in the different SDMs calibrated from NCEP at a 2.5 • resolution.Over the mid-latitudes, 2.5 • correspond approximately to 250 km.The Herault, Segre and Loukkos basins are included in a single GCM grid cell.The Irati basin straddles two grid cells, split equally.Also, the basins are not on the edge of the GCM grid and therefore are not subject to border effects in interpolation.

Hydro-climatic analysis
The four basins are characterized by a more or less pronounced Mediterranean climate with low precipitation in summer and more abundant precipitation in winter (see Fig. 1).Mean annual precipitation decreases from north to south, from 1397 mm in the Herault basin to 935 mm in the Loukkos basin.Mean annual precipitation in the Segre basin (813 mm) is low compared to neighboring basins because of the rain shadow effect of the mountains surrounding the basin, which often stops precipitation from the Atlantic (west) as well as from the Mediterranean Sea (east).Summer is hot and dry, especially in the Loukkos basin, which causes severe low flows during this season.In contrast, winter is milder and wetter.In the Herault and Irati basins, the precipitation peaks in spring and fall are produced by precipitation events whose intensity can vary greatly over short periods.The spring and fall streamflows are strongly influenced by these precipitation events as well as by snowmelt in spring in the mountainous basins (mostly in the Segre and Irati basins).
No significant trends in interannual variations in precipitation and streamflow were observed in the four basins over the period 1986-2005.Nevertheless, mean precipitation during the first 10 years of the study period was 4 to 19 % higher than during the last 10 years, except in the Segre basin (−3 %).Furthermore, the analysis of the precipitation indices (Eq. 1) showed that the wet and dry years observed in the four basins occurred at the same time in nearly half the years (grey lines in Fig. 2).Mean annual temperature remained almost constant during the 1986-2005 period and the temperature indices (Eq.2) were the same in the four basins in two-thirds of the years (Fig. 2).
I P = (P y − P y ) / σ P (1) where P y is the annual precipitation for the year y, P y is the mean of the annual precipitation, and σ P is the standard deviation of the annual precipitation.T y is the annual temperature for the year y, T y is the mean of the annual temperature, and σ T is the standard deviation of the annual temperature.
Table 1.Selected predictors according to the SDM and the predictand.These variables are the dew point at 2 m (D2), the temperature at 2 m (T2), the sea level pressure (SLP), the relative humidity, the zonal and meridional wind components, the geopotential height at the 850 hPa pressure level (R850, U850, V850 and Z850) and the large-scale precipitation (PR).The pre-processing (PC) of the predictors depends on the SDM. 3 Models and evaluation procedures

Statistical downscaling models
Based on the preliminary climatological study of Vaittinada Ayar et al. (2015), three downscaling methods were retained according to their ability to reproduce commonly used climatic patterns on the E-OBS (Haylock et al., 2008) grid scale (0.44 • or approximatively 50 km spatial resolution).These SDMs were thus used to provide the climate data, i.e., precipitation and temperature, used as inputs for the hydrological model at the basin scale.For each variable, three models were calibrated and applied: analogs of atmospheric circulation patterns (ANA), the cumulative distribution function-transform approach (CDFt) and a stochastic weather generator (SWG).The analog method and the stochastic weather generator are both calibrated and run on a seasonal basis, using the usual four seasons of the Northern Hemisphere, whereas the CDFt approach is run on a monthly basis.For ANALOG and SWG, the calibration was performed on NCEP reanalysis.Conversely, for CDFt, coming from the family of the bias correction (BC) techniques, the calibration was performed directly on the GCM to downscale.Although CDFt is derived from the quantile-mapping technique, none of the three SDMs is bias corrected.Those three models (i.e., CDFt included) all have the particularity of providing high-resolution precipitation and temperature simulations (constrained by large-scale reanalysis or GCM data) and therefore all belong to the family of the statistical downscaling methods.For all three models, calibration was done over 1976-2005 (except for Loukkos, on which data availability limited the calibration to 1986-2005).Their assessment when applied to NCEP reanalysis and GCM data was performed according to a common 20-year 1986-2005 evaluation period.Sections 3.1.1 to 3.1.3describe the different models.

The analog model
The analog model used here is based on the approach of Yiou et al. (2013) and applied to the fields of anomalies over the Mediterranean region [−15 • E, 42.5 as defined in Sect.2.2.For any given day to be downscaled in the validation period, it consists in determining the day in the calibration period with the closest large-scale atmospheric situation X ANA .More precisely, for a given day, the analog is taken from the 15 days before and after this date in the calibration data set.Note that the days in the same year are excluded.Therefore, this prevents the analog day from being too close (in time) to the day to be downscaled.The closest large-scale atmospheric situation X ANA is determined by minimizing a distance metric (here the Euclidian distance) between the large-scale situation (X d ) of the day to be downscaled and the large-scale situation (X c ) of all the days in the calibration period.More technically, this can be written as where argmin(f ) is the function returning the minimum value of a function f, here computed over all the X c situations of the calibration period.The daily large-scale atmospheric situations correspond to the daily fields of anomalies of the predictors.Those anomalies were calculated with respect to the seasonal cycle, as is classically done in analog techniques; see, e.g., Yiou et al. (2013), and references therein.X d , the large-scale situation of the day to be downscaled, corresponds to the fields of anomalies of all the pre-dictors of that day.X c corresponds to any large-scale situation (defined in the same way) in the calibration period.Hereafter this model is referred to as ANA.

The CDFt model
The cumulative distribution function-transform (CDFt) method was originally developed by Michelangeli et al. (2009) to downscale wind velocity and was later applied to temperature and precipitation in, for example, Vrac et al. (2012) and Vigaud et al. (2013).The CDFt model is a quantile-mapping-based approach, which consists in relating the local-scale cumulative distribution function (CDF) of the variable of interest to the large-scale CDF (here from NCEP or GCMs) of the same variable.Let F Gc (x) and F Oc (x) define the CDFs of the variable of interest, respectively, from a GCM (subscript G) and from a local-scale observation-based data set (subscript O) over the calibration period (subscript c), and F Gv (x) and F Ov (x) the CDFs over the validation period (subscript v).First, CDFt estimates F Ov (x) as with × in the range of the physical variable of interest.Then, a quantile mapping between F Gv and F Ov is performed to retrieve the physical variable of interest at the local scale.
All the technical details on Eq. ( 4) and subsequent quantile mapping can be found in Vrac et al. (2012).Note that, for this method, only the variable of interest (i.e., precipitation or temperature) at a large scale is used as a predictor.In contrast to ANALOG and SWG, the CDFt approach comes from the family of the bias correction (BC) techniques.In that sense, CDFt does not need NCEP reanalyses for its calibration, but is directly calibrated to link GCM simulations and high-resolution data (through their CDF).Note that CDFt is used here as a downscaling technique and not a BC, since it is applied here to downscale (i.e., to go from large scale to high resolution) temperature and precipitation time series.

The stochastic weather generator model
The stochastic weather generator (SWG) model used in this study is based on conditional probability distribution functions in a vector generalized linear model (VGLM) framework, as in Chandler and Wheater (2002).This means that the distribution family is fixed and the distribution parameters are estimated as functions of the selected predictors.
Modeling precipitation is usually divided into two steps: first the occurrence and second the intensity.The modeling of intensity has been introduced in previous sections.The rain occurrence at a given location is modeled as a binomial distribution B(1,p) using a logistic regression (LR, e.g., Buishand et al., 2004;Fealy and Sweeney, 2007).Let p i be the probability of rainfall on day i conditional on an Nlength predictor (or covariate) vector X i = (X i1 , . .., X iN ) as defined in the previous section. of occurrence p i is formulated through a LR as where (p 0 , . .., p N ) is the vector of coefficients to be estimated.The LR is only used for SWG.The analog and CDFt models directly provide zeros or positive precipitation values.
Temperature is expected to follow a Gaussian distribution and rain intensity a Gamma distribution.The mean µ and the standard deviation σ of the Gaussian distributions and the shape α and the rate β of the Gamma distributions are estimated as functions of the large-scale predictors.The parameters σ , α and β at day i are computed with a common formulation, illustrated here for the α parameter: with (α j ) j =0,...,N the regression coefficients to be estimated, N the number of predictors, and X i,j the j th daily large-scale predictor for day i.Note that Eq. ( 7) models the logarithm of the parameter of interest to ensure that the parameter obtained (σ , α or α or β) is positive.The parameter µ is formulated in the same way but without the positivity (i.e., log) constraint: As in Vaittinada Ayar et al. ( 2015), the predictors used for this model are the two first principal components (PCs) calculated from a principal component analysis (PCA, Barnston and Livezey, 1987) applied separately to each variable.

Hydrological model
The GR4j lumped conceptual model (Perrin et al., 2003) was chosen to simulate the seasonal and interannual variations in runoff at a daily time step (see Fig. 3).Many studies have demonstrated the ability of the model to perform well under a wide range of hydro-climatic conditions (e.g., Perrin et al., 2003;Vaze et al., 2010;Coron et al., 2012) and notably in the Mediterranean region (e.g., Tramblay et al., 2013;Fabre et al., 2015;Ruelland et al., 2015).This model relies on precipitation (P) and potential evapotranspiration (PE) and is based on a production function that determines the effective precipitation (the fraction of the precipitation involved in runoff) that supplies the production reservoir and on a routing function based on a unit hydrograph.According to the available data (cf.Sect.2.2), a simple formula relying on solar radiation and temperature (cf.Eq. 9) was chosen (Oudin et al., 2005) to assess daily potential evapotranspiration (PE).
where R e is the extraterrestrial solar radiation (MJ m −2 d −1 ) given by the Julian day and the latitude, λ net latent heat flux (2.45 MJ kg −1 ), ρ water density (kg m −3 ) and T the mean air temperature at a 2 m height ( • C).Four parameters are used in the GR4j basic version: the maximum capacity of the soil moisture accounting store ×1, B. Grouillet et al.: Sensitivity analysis of runoff modeling to statistical downscaling models a groundwater exchange coefficient ×2, the maximum capacity of routing storage ×3, and a time base for unit hydrographs ×4.A three-parameter snow module based on catchment-average areal temperature (Ruelland et al., 2011(Ruelland et al., , 2014) ) was activated to account for the contribution of snow to runoff from the catchments.Below a temperature threshold ×5, a fraction ×6 of precipitation is considered to be snowfall; this fraction feeds the snow reservoir.Above the threshold ×, a fraction ×7, weighted by the difference between the daily temperature and the threshold ×5, is taken from the snow reservoir to represent snowmelt runoff.

Optimization of hydrological simulations
The model parameters were calibrated and the simulation performances were analyzed by comparing simulated and observed streamflow at a 10-day time step (averaged from daily streamflow outputs) in a multi-objective framework.This time step was retained because it constitutes an interesting compromise for CCIS on water resources, between a daily time step useful for representing small runoff effects and a monthly time step too coarse to capture hydrological variability.The following objectives were considered: (i) the overall agreement of the shape of the hydrograph via the Nash-Sutcliffe efficiency (NSE) metric (Nash and Sutcliffe, 1970); (ii) the agreement of the low flows via a modified, log version of the NSE criterion; and (iii) the agreement of the runoff volume via the cumulated volume error (VE C ) and the mean annual volume error (VE M ).
where Q t obs and Q t sim are, respectively, the observed and simulated discharges for the time step t, N is the number of time steps for which observations are available, Q y obs and Q y sim are the observed and simulated volumes for year y, and N years is the number of years in the simulation period.
The NSE criterion is as well-known form of the normalized least squares objective function.Perfect agreement between the observed and simulated values yields an efficiency of 1, whilst a negative efficiency represents a lack of agreement worse than if the simulated values were replaced with the observed mean values.The optimal value of the VE C and VE M criteria is zero.The latter criterion express the relative difference between observed and simulated values.This multi-objective calibration problem was transformed into a single-objective optimization problem by defining a scalar objective function F obj that aggregates the different objective functions: Calibration was performed in a 7-D parameter space by searching for the minimum value of F obj .To achieve this high-dimensional optimization efficiently, the shuffle complex evolution (SCE) algorithm was used (Duan et al., 1992).

Cross-calibration and validation of hydrological model
To test the performance of the hydrological model in contrasted conditions, the calibration-validation periods were sub-divided using a differential split-sample testing (DSST) scheme (Klemeš, 1986).Thus, two sub-periods of 10 years each divided according to the median annual precipitation for the period were used either for calibration or for validation.These two sub-periods define dry and wet year periods.
For the cross-calibration-validation process, three calibration-validation periods (for the whole period, for dry years, and for wet years) were used to test the performance of the hydrological model in contrasted conditions.A 2-year warm-up period was included at the beginning of each period to attenuate the effect of the initialization of storage.In addition, hydrological years starting in a typical low-flow period in the Mediterranean region (from September to August) were used in the modeling process to minimize the boundary limits of the model reservoir.The quality of the simulations was then assessed by comparing the "optimal" parameter set for each calibration period.For each basin, three simulations based on the three sets of parameters were compared (see Fig. 4).The four criteria employed for the multi-objective function (NSE, NSE log , VE C and VE M ) were used to assess the quality of the simulations.F obj is optimal at 0, and considered satisfactory below 1.
The hydrographs in Fig. 4a illustrate the ability of the model to correctly simulate runoff in the basins, according to the parameter sets used for the calibration periods "whole period", "dry years" and "wet years".All F obj values were below 1, underlining the quality of the simulations.Whatever the calibration period (whole period, dry years or wet years), the objective function F obj did not vary more than 0.1 over the validation period (except for the Segre basin in the wet year validation period).This shows the stability of the simulations when the model is calibrated under contrasted hydro-climatic conditions.The lower quality of the simulations for the Segre basin may be attributed to (i) complex snowmelt processes that are not well represented by the hydrological model; (ii) insufficient quality of data inputs due to the limited number of precipitation and temperature gauges (e.g., only 2 precipitation gauges in a total of 6 stations are included within the Segre basin, while 10 stations are included within the Irati basin); and (iii) the very particular hydro-climatic context characterized by a mountainous climatic barrier, which limits Atlantic influence and reduces the quantity of solid and liquid precipitation supplying the streamflow inside the basin.Although the hydrological simulations were less efficient in this basin than in the others, we found them sufficiently correct to provide an additional basin for the inter-comparison of the SDMs through a regional analysis in different hydro-climatic contexts.
Figure 4b shows that the parameter sets are quite stable whatever the calibration period used for the basins.However, the model parameters were normalized with respect to the lower and upper limits of the parameters obtained.As a result, the more the bounds are widened, the less the normalized parameters are able to account for the differences between the calibration periods.Nonetheless, the relative stability of the normalized parameters underlines the robustness of the model under contrasted climatic conditions.However, in the Segre basin, differences in the GR4j native parameters reflect the difficulty in correctly simulating runoff in this basin including NSE values of around 0.7.Snow module parameters (×5, ×6 and ×7) in the Herault and Loukkos basins are less stable, but the contribution of snowfall in these basins is rather small.Finally, the low drift of the parameters and the relatively homogeneous simulations obtained whatever the calibration period led us to retain the parameter set from the whole period to simulate streamflow under the various climate data sets.To facilitate interpretation and to limit biases in hydrological modeling, the simulated streamflow produced with the best parameter set for the "whole period" calibration period was used as a benchmark (instead of the observed data) for the comparison between the climate data sets in the following steps.

Comparing downscaling methods from the point of view of water resources
Based on the preliminary calibration of the hydrological model, runoff simulations forced by statistically downscaled climate simulations were compared using hydrological indicators that reflect the main issues of impact studies on water resources.Figure 5 illustrates the different steps of this approach.
First, three low-resolution climate data sets (NCEP, CNRM and IPSL) were downscaled using three different statistical methods (ANALOG, CDFt and SWG) to produce new high-resolution hydro-climatic data sets (P and T).Daily PE time series were calculated using the same formula (Oudin et al., 2005) as that used to estimate PE from observed temperature.
After preliminary calibration over the whole reference period under observation-based climate inputs, the hydrological model was then forced with the nine sets of downscaled hydro-climatic data (high resolution) and the three raw data sets (low resolution) to produce an ensemble of 12 runoff simulations.These simulations were compared to a reference runoff simulation (REF) corresponding to the model outputs over the whole reference period calibrated with observationbased climate inputs.This comparison relies on hydrological indicators that are relevant to the water resource challenges according to four complementary aspects of the hydrograph: volume of the water flow, interannual and seasonal variability of runoff, and streamflow distribution.The water flow volume was assessed according to the cumulated volume error (VE C ; see Eq. 12).Interannual variability was as-  sessed according to a root mean square error applied to the sorted annual flows.This criterion was then normalized by dividing the RMSE value by the mean of annual observed discharge.Choosing a normalized root mean square error criterion (NRMSE, Eq .15)applied to this distribution gets round the non-synchronicity of the simulations.Note that applying the NRMSE criterion to sorted flows may favor high flows.Seasonal variability was assessed using a NSE criterion (Eq.10) applied to the mean 10-day discharge series.The last comparison criterion was based on the flow duration profile, divided between high and low flows.High flows correspond to daily flows exceeding the 95th percentile (> Q95); i.e., the 5 % highest daily flows or flows exceeded 5 % of the time.Low flows correspond to daily flows not exceeding the 80th percentile (< Q80); i.e., the 80 % lowest daily flows or flows exceeded 20 % of the time.This value was deliberately chosen to cover a wide range of flows to enable a meaningful distinction between simulations while correctly representing low flows.Both high and low flows were evaluated using a NSE criterion applied to the high and low flow time series.
where X obs is observed values and X sim is simulated values at time/place i. X obs is the mean of observed values.The 12 runoff simulations were compared via these five hydrological indicators.Finally, the downscaling methods (from the runoff simulations forced by the downscaled climate time series) were ranked using the same indicators.The median of the related criterion (VE C , NRMSE INT , NSE SEAS , NSE HF or NSE LF ) in the four study areas made it possible to rank the downscaling methods according to their respective performances in a given configuration: "climate dataindicator".Next, the simulations were combined by computing the median of the criteria values of the four basins, and the three climate data sets to make it possible to rank them.Finally, an additional criterion (Eq.16) was used to aggregate the different goodness-of-fit criteria to provide an overview of the performance of the different downscaling models driven by distinct climate data sets.The lower the aggregation criterion, the better the ranking.
For the remainder of this paper, REF refers to the simulated runoff with the parameters calibrated over the whole period based on the observed climate data.RAW refers to the simulations with raw low-resolution climate data from NCEP/NCAR reanalysis or GCM outputs over the reference period.ANA, CDFt and SWG refer to the simulations based on climate data downscaled via the ANALOG, CDFt and SWG methods, respectively.

Water volumes
Water volumes were assessed through the cumulative volume error, i.e., the error in the percentage of the cumulated volume of water flow over the whole period (Table 2).ANALOG-based simulations generally reproduced water volumes better than the other simulations.Nevertheless, differences appeared depending on the input data used (NCEP, CNRM or IPSL) and on the basin concerned (Fig. 6).Except in the Loukkos basin and for CNRM in the Herault and Segre basins, RAW-based simulations were always improved by downscaling.CDFt-based simulations were slightly better than ANALOG-based simulations in reproducing cumulated volume of water with VE C absolute values averaged between the four basins, with 12 % for CDFt and with 14 % for ANA-LOG.In addition, the results of ANALOG-based simulations were more constant without outlier criterion values.Criterion values can be considered to be outliers when VE C is greater than 50 %, which may be seen as an unacceptable error.In the Loukkos basin, simulations provided many outliers with both SWG and CDFt.The CDFt method improved the results according to the VE C criterion better than the other models.SWG-based simulations ranked first for both criteria with NCEP as inputs, but performed poorly with GCMs.

Interannual variability of streamflow
The ability to reproduce interannual runoff variability was assessed through a root mean square error (NRMSE INT ) criterion applied to the sorted time series of annual discharge and normalized by dividing RMSE by the mean annual discharge of the reference (see Fig. 7).In other words, for each basin, the downscaling method and input data, and the annual discharge values were sorted from the highest value to the lowest one to generate new decreasing time series on which the NRMSE criterion was calculated with respect to the sorted reference time series.The results show that the interannual variability of runoff is correctly reproduced by the simulations based on most of the downscaled climate data sets, particularly ANALOG-and CDFt-based simulations in which NRMSE values rarely reached more than 30 %.On the whole, RAW-based simulations were improved by downscaling, especially when driven by NCEP and IPSL, except for SWG-based simulations driven by GCMs (Fig. 7).Indeed, when driven by NCEP, the SWG method reproduced interannual variability better than the other methods for three of the four basins, but produced poor results with GCMs, in which case ANALOG-and CDFt-based simulations generally performed better.

Seasonal variability of streamflow
Seasonal variability was assessed using a NSE criterion (Eq.10) applied to the mean 10-day discharge series.In most cases, the downscaling methods improved the reproduction of the seasonal variability of streamflow compared to the low-resolution raw data sets (see Fig. 8).This was particularly true of NCEP reanalyses, for which downscaled inputs considerably improved the simulation of the seasonal dynamics more realistically than with RAW-based simulations.Although the ANALOG method did not systematically match the best NSE values, on the whole, the method reproduced the seasonal variability better than CDF-t and SWG.The CDFt method performed particularly well with GCMs as inputs, but proved to be unsuitable with NCEP under the particular hydro-climatic conditions that prevail in the Segre basin.Except with NCEP, SWG-based simulations repro- duced poorly the seasonal variability of runoff, due notably to systematic overestimation of high-flow events.

Streamflow distribution: high and low flows
Streamflow distribution was divided between high flows, i.e., the 5 % highest daily flows, and low flows, i.e., the 80 % lowest daily flows.Both were evaluated using a NSE criterion applied to the high and low flow time series.On the whole, the downscaling methods improved the reproduction of the distribution of sorted high flows (Fig. 9a).However, it should be noted that the downscaled simulations with CNRM data deteriorated raw data in the Segre basin.Results showed that ANALOG generally reproduced the 5 % highest flows best; the NSE values were quite stable and never below 0.47.The CDFt-based simulation results were very close to those obtained with ANALOG, with equivalent scores when NCEP or GCM data were used as inputs.Nevertheless, it should be noted that ANA and CDFt reproduced less accurately high flows in the Segre basin than in the other basins.This can be explained by a lower efficiency of the hydrological model in this area as shown in Sect.3.2.3,thus leading to a reference simulated streamflow more uncertain than in the other basins.The SWG method reproduced high flows well with NCEP data as inputs, but not with GCM data.
Figure 9b shows the distribution of sorted low flows and the associated NSE criterion.Moreover, applying a NSE criterion to the sorted low flows tended to emphasize the differences between the simulations and thus made it easy to distinguish simulations that reproduced low flows poorly.The downscaling methods improved the representation of the 80 % lowest flows in all basins, except for the SWG method with GCM data used as inputs.In general, the best results were obtained from ANALOG-based simulations, with NSE values always above 0.81.The CDFt-based simulations performed significantly better when forced with GCMs than with NCEP.The SWG-based simulations were unable to reproduce low flows when GCM data were used as inputs.

Discussion and conclusions
The aim of this study was to test the ability of different statistical downscaling climate models to provide accurate hydrological simulations for use in climate change impact studies (CCIS) on water resources.To get round the constraints represented by the inherent characteristics of each climate model, we compared three statistical downscaling methods applied to three low-resolution raw data sets: NCEP/NCAR reanalysis data and two GCM data (CNRM and IPSL).The  To account for uncertainty related to the spatial variability of the downscaled climate simulations, this approach was applied over four western Mediterranean basins of similar size but that represent a wide range of hydro-meteorological situations.
The proposed sensitivity analysis enabled us to identify the strengths and weaknesses of different statistical downscaling methods with respect to the sensitivity of runoff simula-tions to low-resolution and high-resolution downscaled climate data sets (see Fig. 10).Our study revealed the performances that could be expected from downscaling techniques applied to large-scale data sets to provide acceptable hydrological simulations.To complement the usual calibrationvalidation exercises conducted by climatologists for assessing the suitability of SDMs based on predictors and to reanalyze grids (see, e.g., Vaittinada Ayar et al., 2015), we focused on a validation protocol directly based on streamflow, thus allowing the combined impacts of the downscaled precipitation and temperature inputs to be considered through the hydrological response.
On the whole, the ANALOG-based simulations performed well in all the situations tested, whatever the large-scale climate data set used as inputs (NCEP or GCMs), notably in reproducing interannual and seasonal runoff and low flows.ANALOG-based simulations were closely followed by CDFt-based simulations, notably when GCM outputs were used, but with a lower variability of scores than with ANALOG.On the contrary, the results clearly showed that the SWG method should not be used "as is" in climate change impact studies on water resources.Indeed, although the SWG-based simulations were satisfactory when based on the NCEP large-scale climate data set, they significantly underperformed when based on GCM outputs.Biases of the GCM data with respect to the NCEP/NCAR reanalyses may explain the poor performances of the SWG method.As SWG is calibrated with "perfect" predictors from reanalyses, its application to biased GCM predictors led to unsatisfactory SWG-based hydrological simulations.To make the SWG method more applicable in climate change impact studies on runoff, one solution could be to correct the GCM predictors with respect to reanalyses, as done for example by Colette et al. (2012) before performing a dynamical downscaling.
Although the ANALOG method appeared to be the best SDM in this study, it may suffer from certain limitations when used in a climate change context, notably when downscaling GCM projections over the 21st century.One main limitation is that ANALOG is not able to provide suitable simulations for the extreme events if such events increase in intensity in the future (see, e.g., Teng et al., 2012).Indeed, by construction, as ANALOG works by resampling the calibration set, it never supplies downscaled values beyond the range of the calibration reference data set.
On the other hand, although CDFt-based simulations were less consistent than ANALOG simulations, they were more sensitive to climate forcing and also more sensitive to the chosen indicators.The CDFt method was particularly appropriate when we focused on the cumulated volume, seasonal variability and high flows.In addition, it should be noted that the CDFt method is the most parsimonious technique since it generally needs only one variable as a predictor.This could obviously be considered an advantage since the complexity of CDFt is very low.However, this low level of complexity could mean that some climate information needed to drive the CDFt more efficiently will be missing.In that sense, one possible improvement could consist in incorporating additional covariates in CDFt, as done by Kallache et al. (2011).Nevertheless, the approach including those additional predictors means that this conditional CDFt has to be calibrated on reanalyses or, at a minimum, on the outputs of a climate model of which the day-to-day evolution of large-scale weather states matches that of the real world.This could be a limitation, since additional biases may appear with those constraints.
The next step will be exploring the potential impact of climate change on the runoff in the basins studied here.To this end, an ensemble approach will be proposed based on the construction of high-resolution climate scenarios using different climate models, gas emission scenarios, and downscaling techniques.In view of the acceptable hydrological simulations obtained with the ANALOG and CDFt methods, it may be useful to develop high-resolution climate forcings downscaled with these two methods in order to account for the uncertainty of the downscaling, as recommended by some authors (e.g., Chen et al., 2011Chen et al., , 2012) ) for applications in climate change impact studies.Our study also showed the benefits of evaluating the relevance of SDMs in a given hydro-climatic context using a suitable validation protocol.Indeed, selecting unsuitable downscaling methods, such as SWG with GCM outputs, can expand the range of uncertainty linked to the range of SDMs.
Furthermore, our study showed that hydrological responses were sensitive to the climate data sets used as inputs.Indeed, despite the significant contribution of the downscaling methods, hydrological simulations are better from reanalysis data than from GCM data.This demonstrates the limits of GCMs in reproducing current climatic conditions and therefore the associated hydrological responses.This point raises the question about the use of GCM, and thus about the need to correct them afterwards for the evaluation of future hydrological impact in CCIS.Finally, although it is commonly acknowledged that the uncertainty resulting from climate modeling (GCMs, gas emission scenarios and downscaling methods) is highest in a context of climate change (e.g., Wilby and Harris, 2006;Arnell, 2011;Teng et al., 2012), it should be noted that the uncertainty stemming from hydrological modeling may also be high.Several authors (e.g., Benke et al., 2008;Brigode et al., 2013;Hublart et al., 2015;Ruelland et al., 2015) showed that the choice of the hydrological model (structural uncertainty) and its parameterization (parameter uncertainty) could cause significant variability in runoff simulations.Consequently, further analyses of the applicability of the model parameters in a nonstationary context and with different calibration criteria are needed before the model is used in future climate conditions.
Similarly, the different sources of uncertainties and their propagation in the hydrological projections need to be evaluated.To this end, a standard ensemble approach based on various climatic, downscaling and hydrological models may not be sufficient, since using many models without prior validation of their efficiency can lead to very large uncertainty bounds due to the poor quality of some models in the ensemble framework.Minimizing uncertainty thus requires selecting models that perform reasonably well over the reference period in the context of the current climate.Although this cannot guarantee the quality of the models for future conditions, we believe it is an essential step to provide more reliable and relevant hydrological projections.

Figure 1 .
Figure 1.Study catchments (Herault, Segre, Irati and Loukkos) in the western Mediterranean region with their topography and mean seasonal variability in precipitation (P ) and discharge (Q) for the period 1986-2005.

Figure 2 .
Figure 2. Precipitation (I P = (P y − P y ) / σ P ) and temperature (I T = (T y − T y ) / σ T ) indices applied to the four basins over the 1986-2005 period.The grey lines highlight years when the signs of the indices are the same for the four basins.P y is the annual precipitation for the year y, P y is the mean of the annual precipitation and σ P is the standard deviation of the annual precipitation.T y is the annual temperature for the year y, T y is the mean of the annual temperature and σ T is the standard deviation of the annual temperature.

Figure 4 .
Figure 4. Cross-calibration/validation of the hydrological model.(a) Seasonal representation (from September to August) of simulated and observed runoff during the whole period (WHO, first row), dry years (DRY, second row) and wet years (WET, third row) according to parameter sets optimized, respectively, for the whole period (in grey), dry years (red) and wet years (yellow).F obj (F obj = (1 − NSE) + (1 − NSE log ) + |VE C | + VE M ) is computed on daily series.F obj is optimal at 0, but is considered satisfactory below 1.(b) Normalized model parameters obtained over the three calibration periods.

Figure 5 .
Figure 5. Flowchart illustrating the method used to compare the three downscaling methods through a hydrological sensitivity analysis.

Figure 6 .
Figure 6.Comparison of the downscaling methods according to the cumulative volume error (VE C ) used as a criterion to compare the downscaling methods applied to NCEP, CNRM and IPSL climate inputs in the four basins.The smaller the absolute value of the criterion, the better the simulation.

Figure 7 .
Figure 7.Comparison of the sorted annual discharge simulated using REF data, RAW (NCEP or GCM) data, and the three downscaling methods (applied to NCEP, CNRM and IPSL) for each basin.The NRMSE values above each panel represent a root mean square error applied to the sorted time series of annual discharge normalized by dividing RMSE by the mean annual discharge of the reference time series.The best values are in bold.

Figure 8 .
Figure 8.Comparison of seasonal variations in streamflow simulated using REF data, RAW (NCEP or GCM) data, and the three downscaling methods (applied to NCEP, CNRM and IPSL) for each basin.The NSE values for the mean 10-day discharge between REF and the simulation concerned are given above each panel.The best values are in bold.

Figure 9 .Figure 10 .
Figure 9.Comparison of (a) the 5 % daily high flows and (b) the 80 % daily low flows simulated with REF data, RAW (NCEP or GCM) data, and the three downscaling methods (applied to NCEP, CNRM and IPSL) for each basin.The NSE values calculated on the 5 % high and the 80 % low flows are indicated on the right in each panel.NSE values higher than 0.5 for high flows and 0.8 for low flows are in bold.

Table 2 .
Cumulative volume error (VE C ) between hydrological simulations based on downscaled or raw climate data (ANA, CDFt, SWG, RAW) and the reference (REF).Values are expressed in % of difference in the total volume of water flowed during the period.