Reliability of lumped hydrological modeling in a semi-arid mountainous catchment facing water-use changes

. This paper explores the reliability of a hydrological modeling framework in a mesoscale (1515 km 2 ) catchment of the dry Andes (30 ◦ S) where irrigation water use and snow sublimation represent a signiﬁcant part of the annual water balance. To this end, a 20-year simulation period en-compassing a wide range of climate and water-use conditions was selected to evaluate three types of integrated models referred to as A, B and C. These models share the same runoff generation and routing module but differ in their approach to snowmelt modeling and irrigation water use. Model A relies on a simple degree-day approach to estimate snowmelt rates and assumes that irrigation impacts can be neglected at the catchment scale. Model B ignores irrigation impacts just as Model A but uses an enhanced degree-day approach to account for the effects of net radiation and sublimation on melt rates. Model C relies on the same snowmelt routine as Model B but incorporates irrigation impacts on natural streamﬂow using a conceptual irrigation module. Overall, the reliability of probabilistic streamﬂow predictions was greatly with Model C, resulting in narrow uncertainty bands and reduced structural errors, notably during dry years. This model-based analysis also stressed the importance of considering sublimation in empirical snowmelt models used in the subtropics, and provided evidence that water abstractions from the unregulated river are impacting on the hydrological response of the system. This work also highlighted areas


Introduction
Mountains act as natural water towers in many semi-arid regions.Glaciers and seasonal snowpack in the uplands serve as reservoirs, accumulating water during the winter and sustaining streams and aquifers during the spring and summer.This reduces streamflow variability in the lowlands and provides local communities with the opportunity to develop agricultural systems based on regular water supplies.Irrigation often represents a large part of crop water use in these areas due to the dry conditions that prevail during the growing season (Siebert and Döll, 2010).
This makes such systems highly vulnerable to projected changes in climate conditions, for at least two reasons.First, warmer temperatures will reduce the fraction of precipitation falling as snow and tend to accelerate snowmelt, leading to earlier and reduced spring peak flows and increased winter flows (Adam et al., 2009;Sproles et al., 2013).Reduced summer and fall flows could in turn significantly impact water availability for irrigation purposes.Second, higher temperatures in the valleys will affect the timing of pheno-Published by Copernicus Publications on behalf of the European Geosciences Union.P. Hublart et al.: Reliability of lumped hydrological modeling logical events (Cleland et al., 2007), which drive the seasonal pattern of crop water needs.Some perennial crops like grapevines are already showing a tendency toward earlier budburst events and shortened growth intervals in many regions of the world (Jones et al., 2005;Duchêne and Schneider, 2005).Vineyards located in semi-arid mountainous areas are particularly exposed, owing to high diurnal temperature variations and overall sub-optimal growing temperatures (Caffarra and Eccel, 2011).It has also been noted that elevated temperatures may adversely affect the ability to meet chilling requirements during the crop dormancy (Webb et al., 2007).
Thus, the future of agricultural systems in snowdominated, semi-arid catchments relies on our ability to anticipate the complex relationships between climate conditions, snowmelt timing, water availability and crop water use.

Advantages and limitations of current conceptual precipitation-runoff models
To understand and forecast the response of hydrological systems, hydrologists often rely on numerical catchment models known as "conceptual precipitation-runoff models".Precipitation inputs are processed into runoff through a number of inter-connected water stores representing different aspects of the system's behavior (e.g., slow vs. fast responses, surfacewater vs. groundwater compartments).In general, relatively simple structures are used, in which typically fewer than 10 parameters require calibration against physically observable responses (e.g., streamflow data) (Wagener et al., 2001).Such models also have low data and computer requirements, making them especially attractive in data-scarce areas such as remote mountainous catchments.As a result, they are being increasingly used to evaluate the potential impacts of land-use and/or climate changes on the capacity to meet agricultural water demands (e.g., Merritt et al., 2004;Collet et al., 2015;Fabre et al., 2015a).
The conclusions drawn from these models, however, are naturally bounded by a range of uncertainty arising from multiple sources of error and approximations.This includes the impacts of input data errors, numerical approximations, structural inadequacies and model non-uniqueness.Parameter instability under changing climate and/or anthropogenic conditions represents an additional source of uncertainty that may be difficult to distinguish from parameter equifinality in the absence of uncertainty analysis (Seibert and McDonnell, 2010;Brigode et al., 2013).Such limitations remain largely overlooked in many impact studies.Instead, it is often assumed that the uncertainty associated with climate and/or water-use scenarios greatly outweighs that arising from the modeling process itself.From a water management perspective, however, the added value of precipitation-runoff models lies not simply in their ability to provide accurate streamflow predictions, but also in the systematic examination of the un-certainty surrounding these predictions and the ultimate decision being addressed (Ajami et al., 2008).
One of the most effective means of providing such information is through the use of Bayesian inference methods.Notwithstanding important issues in how best to handle epistemic uncertainties, and whether probability theory is the right tool to use (Beven et al., 2011;Montanari, 2011), formal Bayesian approaches offer the opportunity to test the reliability of model predictions through a series of posterior diagnostics.This, in turn, provides a meaningful way to discuss the relative merits of competing model structures or different versions of the same model.Very often, structural inadequacies can be partially alleviated by comparing alternative representations of the processes at work.This paper addresses two specific issues pertaining to the use of conceptual models in semi-arid catchments where the effects of irrigation water use and snow sublimation cannot be dismissed a priori.

Potential impacts of water abstraction and irrigation water use
The first issue deals with water abstraction for irrigation, which has many potential impacts on hydrological processes, including changes in groundwater recharge (Scanlon et al., 2006) and low-flow characteristics (Yang et al., 2010).In arid and semi-arid catchments, these impacts may be hard to quantify because a high degree of temporal and spatial variability in climate conditions often masks anthropogenic trends (Kim et al., 2007).During low-flow and drought periods, however, a much greater proportion of natural flow may be abstracted, leading to amplified impacts (in relative terms) on the flow regime.The poor performance of most conceptual models during these critical periods is a well-recognized issue in the hydrological research community and many studies have formulated different approaches towards improving low-flow simulations (e.g., Smith et al., 2010;Staudinger et al., 2011;Pushpalatha et al., 2011).However, most of these studies have been concerned mainly with undisturbed river systems.The impacts of river damming and regulation have also been studied extensively, but there is a surprising dearth of work regarding the effects of water abstraction from unregulated streams.
A common approach to remove such effects in model building and evaluation is to rely on "naturalized" streamflow data (e.g., Ashagrie et al., 2006).This requires detailed information on surface water or groundwater withdrawals and irrigation water use, which is rarely available.In practice, the sum of all water access entitlements is often taken as an upper bound for the actual water consumption at the catchment scale, and added back to observed streamflow data before calibrating a given model.However, farmers may not withdraw their full entitlement all year long and a significant part of water withdrawals actually returns to the river system within a few days or weeks due to conveyance and field losses.In theory, ignoring these return flows would lead to overestimation of natural streamflow.But in reality, it can be very difficult to disentangle the relative influence of epistemic errors in streamflow estimates (rating curve errors, unknown return flows) and input data (precipitation, temperature, potential evapotranspiration).Therefore, for a proper assessment of model reliability, streamflow naturalization should be considered an integral part of the modeling process and explicitly recognized as an additional source of imprecision in streamflow predictions (Hughes and Mantel, 2010;Hublart et al., 2015a).

Potential impacts of sublimation losses
The second issue addressed by this paper concerns the means by which snowmelt inputs are obtained in snowdominated, semi-arid catchments.Many studies rely on empirical degree-day approaches, in which air temperature is taken as a reasonable proxy for the energy available for melt (Ohmura, 2001).Melt rates are assumed to be linearly related to air temperature by a constant of proportionality known as the "melt factor", which can vary on a seasonal basis (Hock, 2003).Enhanced degree-day methods are sometimes implemented to include the effects of additional variables such as solar radiation or wind speed.However, by focusing exclusively on melt rates, such approaches can prove highly misleading where sublimation losses represent a large part of ablation rates.This is generally the case in semi-arid areas located around 30 • S and 30 • N.
Sublimation rates in the subtropics are expected to be high as a result of very low relative humidity and intense solar radiation during most of the year.In the dry Andes, for instance, Gascoin et al. (2013) found that sublimation losses represented more than 70 % of the total ablation simulated by a physically based model in the instrumented site of Pascua-Lama (1043 km 2 , 2600-5630 m a.s.l.).Similar results were also obtained by experimental studies conducted on small glaciers of the same region (MacDonell et al., 2013).In the Northern Hemisphere, Schulz and de Jong (2004) attributed up to 44 % of annual snow ablation to sublimation in a 140 km 2 catchment of the High Atlas range (2000-4000 m a.s.l.).It is becoming increasingly recognized that failure to account for sublimation losses in commonly used temperature-index methods can impair model performance, distort parameter identification and question the reliability of snowmelt estimates under higher temperatures (e.g., Boudhar et al., 2009;Ayala et al., 2015).

Objectives
Ideally, the incorporation of new processes into a given model structure should be achieved using the same level of mathematical abstraction and process representation as in the original model.Blöschl and Montanari (2010) insisted that "a better understanding of the hydrological processes should not necessarily translate into more complex models used in im-pact studies".Indeed, maintaining low-dimensional, holistic modeling approaches is essential to constrain parameter uncertainty and help the modelers focus on understanding the main drivers of hydrological change.
This paper investigates one possible way of integrating the effects of irrigation water use and snow sublimation into a parsimonious, catchment-scale modeling framework.Such processes are typically not accounted for in currently available precipitation-runoff models.Particular attention is paid to the representation of changes in irrigated areas and crop varieties over time.The method is tested in a snowmelt-fed catchment of the Coquimbo region (Chile), which is currently facing one of the worst droughts in its recorded history (Salinas et al., 2015).In the same catchment, Hublart et al. (2015a) attempted to reduce structural uncertainty in a non-probabilistic way by comparing 72 alternative models derived from the same modular framework.However, the potential effects of irrigation and sublimation were not included in this multiple-hypothesis framework, thereby limiting its ability to cope with climate and anthropogenic changes.Hublart et al. (2015b) provided a first attempt to incorporate these two processes in a precipitation-runoff model, but several important aspects, such as the quantification of model uncertainty and the quality of snowmelt simulations, remained outside the scope of their study.Compared to this previous paper, the present study makes use of (1) extended calibration and validation periods to encompass a wider range of climate and water-use conditions, (2) formal Bayesian theory to quantify predictive uncertainty in a probabilistic way, and (3) remotely sensed snow-cover data to evaluate the internal consistency of the snow module.
2 Study area and data 2.1 General setting

Physical landscape
The Claro River catchment is a semi-arid, mountainous catchment located in northern-central Chile (30 • S).It drains an area of about 1515 km 2 characterized by a series of granitic mountain blocks interspersed with steep-sided valleys.Elevations range from 820 m a.s.l. at the catchment outlet in Rivadavia to approximately 5500 m a.s.l.near the border with Argentina (Fig. 1a).Above 3000 m a.s.l., repeated glaciations and the continuous action of frost and thaw throughout the year have caused an intense shattering of the exposed rocks, leaving a landscape of bare rock and screes almost devoid of soil.The valley-fill material consists of mostly unconsolidated glaciofluvial and alluvial sediments mantled by generally thin soils (< 1 m) of sandy to sandy-loam texture.Natural vegetation outside the valleys is extremely sparse and composed mainly of subshrubs (e.g., Adesmia echinus) and cushion plants (e.g., Laretia acaulis) with very low transpiration rates (Squeo et al., 1993;Kalthoff P. Hublart et al.: Reliability of lumped hydrological modeling  et al., 2006).In the lower part of the catchment, vineyards and orchards cover most of the valley floors and lower hill slopes, where they benefit from a unique combination of clear skies, high diurnal temperature variations and overall dry conditions during the growing season.The Claro River originates from a number of small, snowmelt-fed tributaries flowing either permanently or seasonally in the mountains.

Climate
Most of the annual precipitation falls as snow during typically two or three winter storms (Favier et al., 2009), when the South Pacific High reaches its northernmost position (June-August).Mean annual precipitation ranges from approximately 100 mm at the catchment outlet (Rivadavia) to 670 mm in the High Cordillera (Bourgin et al., 2012).The annual snow-cover duration estimated from MODIS snowcovered area (SCA) data (see Sect. 2.2) ranges from less than 20-40 days at low elevations (< 2000 m a.s.l.) to about 160-180 days at high elevations (> 4000 m a.s.l.), where sublima-tion is expected to be the dominant cause of ablation (Gascoin et al., 2013;MacDonell et al., 2013).In the dry Andes, net shortwave radiation represents the dominant source of energy available for melt and sublimation (Pelliciotti et al., 2008).
At the inter-annual timescale, the El Niño-Southern Oscillation (ENSO) represents the largest source of climate variability (Montecinos and Aceituno, 2003).Anomalously wet (dry) years in the region are generally associated with warm (cold) El Niño (La Niña) episodes and a simultaneous weakening (strengthening) of the South Pacific High.It is worth noting, however, that some very wet years in the catchment can also coincide with neutral to weak La Niña conditions, as in 2000-2001, while several years of below-normal precipitation may not exhibit clear La Niña characteristics (Verbist et al., 2010).These anomalies may be due to other modes of climate variability affecting the Pacific basin on longer timescales.The Interdecadal Pacific Oscillation (IPO), in particular, has been shown to modulate ENSO's influence ac-cording to cycles of 15 to 30 years (Schulz et al., 2011).Figure 1c shows a sustained decrease in mean annual streamflow since the mid-1990s, which could be associated with a shift in the IPO phase around 1998.

Agricultural activity
Grape growing is by far the main agricultural activity in the catchment.All grapes are grown to be exported as earlyseason table grapes or processed into a brandy-like national drink known as pisco.Reliable water supplies are critical to satisfy crop water needs in the summer, since precipitation events occur mostly at high elevations or outside the growing season.Irrigation water is diverted at multiple locations along the river's course and conveyed to the fields through a complex network of open, mostly unlined canals.The amount of water diverted from the river depends on both historical water rights and current water availability.Table varieties are mostly drip-irrigated, while pisco varieties remain largely furrow-irrigated.
Irrigated areas in the Claro River catchment have increased by about 200 % between 1985 and 2005 (Fig. 1b).This expansion has been limited by both water and agricultural land availability, and irrigated areas currently represent less than 5 % of the total catchment area.A rough estimate of the effects of increased irrigated areas on mean annual streamflow can be obtained by looking at the difference in discharge measured at Rivadavia (downstream from cultivated areas) and that measured at Cochiguaz and Alcohuaz (upstream from cultivated areas) (Fig. 1c).Note that transmission losses caused by infiltration through the riverbed may also reduce streamflow at downstream points, especially during dry periods when the depth of water tables is low.

Hydro-climate data
Precipitation and temperature data were interpolated from, respectively, 12 and 8 stations to a 5 × 5 km grid using an inverse distance squared weighting (Ruelland et al., 2014).Orographic effects on precipitation were considered using the approach described in Valéry et al. (2010a) with a correction factor of 6.5 × 10 −4 m −1 (determined by sensitivity analysis), resulting in a gradient of around 0.4 m water equivalent per kilometer.For temperature, a constant lapse rate of −6.0 • C km −1 was estimated from the observed data.Daily streamflow data were extracted from the Chilean "Dirección General de Aguas" database.
In addition, remotely sensed data from the MODerate resolution Imaging Spectroradiometer (MODIS) sensor were introduced to estimate the seasonal patterns of fractional snow-covered areas (F SCA ) over a 12-year period (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011).Daily snow-cover products retrieved from NASA's Terra (MOD10A1) and Aqua (MYD10A1) satellites were combined into a single, composite 500 m resolution product to reduce the effect of swath gaps and cloud obscuration.The remaining data voids due to cloud cover or missing data were subsequently filled using a linear temporal interpolation method, where a pixel was classified as snow/land depending on the closest previous/next observation of snow/land.

Agricultural data
Two different grapevine varieties were selected to represent phenological patterns in the valleys, namely, Flame Seedless (for table grapes) and Moscatel Rosada (for pisco grapes).Phenological observations for these two varieties were carried out over a 10-year period (2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012) at the Instituto de Investigaciones Agropecuarias (INIA), located a few kilometers downstream from the catchment outlet.Grapevines were trained using an overhead trellis system and fully irrigated during the whole growing season.The experiment kept track of three major events: budburst (BB), full bloom (FB) and the beginning of harvest (HV).Budburst was defined as the moment when the first leaf tips become visible and full bloom as the moment when 80 % of the flower caps are off.The beginning of harvest depends on the intended use of the grapes.Table varieties require lower sugar contents (∼ 16 • Brix, i.e., 160 g of sucrose per liter) than those dedicated to the production of pisco (22 • Brix), which are generally harvested a few months later (Ibacache, 2008).
A database of water access entitlements was used to estimate the total volume of water licensed for abstraction in the catchment.This included a time series of monthly restrictions to these entitlements issued by the Dirección General de Aguas during prolonged dry periods.

Modeling framework
In this paper we developed and compared three different models.These differed in their approach to snowmelt and irrigation modeling.The first one, referred to as "Model A", used a simple degree-day approach to estimate snowmelt rates while neglecting the effects of irrigation water use (IWU) at the catchment scale.The second one, referred to as "Model B", ignored IWU effects just as Model A but relied on an enhanced degree-day approach to account for the effects of net radiation and sublimation on melt rates.The third one, referred to as "Model C", used the same snowmelt routine as Model B and incorporated IWU effects on natural streamflow using a conceptual irrigation module.
Figure 2 shows a block diagram of this modeling framework.The blue blocks refer to the hydrological part of the framework shared by the three models (see Sect. 3.1.2 and 3.1.3).The green blocks relate to the estimation of irrigation water requirements (IWR) used only by Model C.This involves several phenological models to cap- ture the main dynamics of crop water needs over each growing season (Sect.3.1.4)and a moisture-accounting store representing the valley soils (Sect.3.1.3).Net irrigation water use at the catchment scale is computed as a function of IWR, irrigated areas and water availability (i.e., natural streamflow) (Sect.3.1.3).The whole modeling chain is fed by precipitation and temperature data.
We also stress that smoothing functions were used throughout this framework to remove all threshold nonlinearities from the models' equations (insofar as possible), as recommended by several authors (e.g., Fenicia et al., 2011).These smoothing functions will not be shown in the following sections for the sake of clarity.

Simplifying assumptions
The modeling framework described in Fig. 2 relies on three important assumptions regarding the representation of IWU and IWR at the catchment scale.
1. First, IWU refers to the amount of water lost by evapotranspiration from the cropped fields and the riparian vegetation that thrives along the irrigation canals.
It should not be confused with the actual surface-water withdrawals (SWW) that vary on a weekly or monthly basis depending on historical water rights and planning/management decisions.SWW include IWU but also non-consumptive losses caused by canal seepage and deep percolation in the fields.Unfortunately, the impact of SWW on the catchment behavior is difficult to estimate because reliable information on these additional losses and the proportion of abstracted flows coming back to the system is lacking.In this study, all return flows were assumed to come back to the river within each 10-day time step.A similar assumption can be found in Kiptala et al. (2014).
2. Second, IWR refer to the amount of water needed to satisfy crop evapotranspiration under optimal conditions.In practice, this quantity depends on the irrigation technique used by the farmers.In furrow-irrigated fields, IWR would be expected to bring the soil moisture to saturation (or field capacity) and thereby satisfy crop water needs during several days.In drip-irrigated fields, irrigation is required to compensate for the difference between the amount of water stored in the soil and crop water needs.In this study, we assumed that both irrigation techniques lead to the same water requirements over a sufficiently long time interval.
3. Third, the two varieties (Flame Seedless, Moscatel Rosada) selected to represent phenological patterns in the valleys are at best a rough approximation of the real crop diversity in this catchment.In reality, phenological dates for each type of grape (pisco or table grapes) can spread over several days or weeks depending on the variety involved.For instance, pisco producers report differences of between 1 and 2 weeks between the various varieties used for pisco (Ibacache et al., 2010).
Taking heed of these underlying assumptions, all models (A, B and C) were run at a daily time step but evaluated using a 10-day time step.This 10-day interval was also more consistent with the temperature-index approach used to estimate snowmelt rates (Hock, 2003) (Sect.3.1.2).

Snow accumulation and ablation modules
The snow accumulation and ablation (SAA) modules developed in this study borrow much of their philosophy and equations from the Cemaneige model (Valéry et al., 2014).
The catchment was divided into five elevation zones (EZs) of equal area, within which separate modules operated simultaneously based on the same set of parameters.At each time step t, precipitation was partitioned into rain and snow by assuming a linear transition from snow to rain across a fixed temperature range defined as 'Hôte et al., 2005).The amount of water contained in the snowpack, or snow water equivalent (SWE, in mm), was then updated as As in the original Cemaneige model, an antecedent temperature-index approach was used to keep track of the snowpack temperature (T S , in • C) and determine when the pack was ready to melt: where T A ( • C) is the mean air temperature within the elevation zone and θ S is a parameter quantifying the sensitivity of the snowpack temperature to T A .As such, θ S is expected to be higher in regions characterized by thick snowpacks (see also Sect.4.2.1).A similar representation can be found in other hydrological models, including enhanced versions of SWAT (Fontaine et al., 2002) and SRM (Harshburger et al., 2010).Melt rates (mm day −1 ) were computed as follows: for Models B and C , (4) where MF (mm • C −1 day −1 ) is the melt factor, T thr is the temperature threshold at which snowmelt begins (usually set at 0 • C), λ f is the latent heat of fusion (∼ 0.34 MJ kg −1 at 0 • C), ρ is the density of water (∼ 1000 kg m −3 ), R SW and R LW (MJ m −2 day −1 ) are the net shortwave and longwave radiations, respectively (more details are given in the Appendix), C T is the specific heat of snow (∼ 0.0021 MJ kg −1 at 0 • C), F SCA is the fractional snow-covered area and V min is a parameter accounting for the effects of low SWE levels on melt rates.Y N represents the energy available from net radiation and changes in the snowpack heat storage.The F SCA function can be thought of as a basic depletion curve representing the influence of snow distribution within each zone.As a first approximation, it was assumed to increase linearly with SWE until a threshold SWE max was reached, above which the whole elevation zone was assumed to be covered by snow.Following Valéry et al. (2014), the value of SWE max was fixed at 90 % of the mean annual snowfall observed within each elevation zone separately.Similarly, the value of V min was fixed at 0.1 as in the original Cemaneige model (Valéry et al., 2010b) to ensure that melt still occurred when F SCA was close to zero.
For Models B and C, sublimation losses (mm day −1 ) were estimated as follows: where λ s is the latent heat of sublimation (∼ 2.84 MJ kg −1 at 0 • C).Note that when T A ≥ T thr and T S < 0 • C, all the energy available at the snow surface was used to warm the snowpack.The SAA module of Model A is equivalent to the Cemaneige model (Valéry et al., 2014), whereas that of Models B and C corresponds to an enhanced version of this model in which sublimation and net radiation are considered explicitly.However, both of these modules rely on the same calibrated parameters.
P. Hublart et al.: Reliability of lumped hydrological modeling

Runoff production and routing modules
Spatially averaged rainfall and snowmelt estimates were combined into a single "precipitation" term that was used as input to the lumped GR4J model (Perrin et al., 2003).Potential evapotranspiration (PE) was first determined for each grid cell using the temperature-based formula proposed by Oudin et al. (2005): where N C is the number of grid cells, N Z is the number of elevations zones (Z), λ v is the latent heat of vaporization (∼ 2.46 MJ kg −1 ) and PE Oudin, C (mm) is given by Eq. ( 11).Note that PE GR4J accounts for evapotranspiration from soils, natural vegetation and crops only insofar as it relates to precipitation or meltwater.It is not supposed to include evapotranspiration from cultivated areas caused by irrigation water use.Thus, the GR4J model simulates only those hydrological processes that relate to the "natural" catchment behavior.Incorporation of IWU into the modeling framework does not modify the structure and governing equations of the GR4J model, but only the estimates of natural streamflow.This choice can be justified by the fact that the cultivated areas concentrate mainly in the lower part of the catchment and represent only a small portion of the total area (Fig. 1).The GR4J model was chosen for its simplicity and parsimony.Basically, the precipitation-runoff process is broken down into two components: a runoff generation module computes the amount of water available for runoff, i.e., "effective precipitation", while a routing module subsequently routes this quantity to the catchment outlet.In the first module, a soil-moisture accounting (SMA) store is used to partition the incoming rainfall and/or snowmelt into storage, evapotranspiration and excess precipitation.At each time step, a fraction of the SMA store is also computed to represent soil drainage and added to excess precipitation to form the effective precipitation.The second module splits this quantity between two different pathways with respect to a constant ratio: 10 % passes as direct runoff through a quick flow routing path based on a unique unit hydrograph, whereas 90 % passes as delayed runoff through a slow flow routing path composed of a unit hydrograph and an additional routing store.Outputs from both pathways are finally added up to simulate natural streamflow at the catchment outlet.This model relies on four calibrated parameters (X1, X2, X3 and X4) that are described in Table 1.

Irrigation water-use module (Model C)
In Model C, irrigation water requirements per unit area (IWR, in mm day −1 ) were estimated for each crop variety i using a simple soil-water balance approach: where ETM (mm day −1 ) refers to crop evapotranspiration under optimal conditions and SWC (mm) to the average soil-water content in the root zone.P Valley (mm day −1 ), ET 0 (mm day −1 ) and T A, V ( • C) are, respectively, the areal effective precipitation, reference evapotranspiration and air temperature in the valleys, and K C is a coefficient depending on crop growth stages.A realistic estimate of ET 0 was provided by using a modified version of Oudin's formula (Eq.11).In Oudin et al. (2005), the values of K 1 and K 2 were chosen as those giving the best streamflow simulations for different hydrological models applied to a large number of catchments.In this study, the FAO Penman-Monteith equation for a reference grass was used as a basis to re-calibrate these parameters at different locations across the valleys.This modification was required since the Penman-Monteith equation, which was more suited to estimating crop water needs, could not be used over the whole study period due to limited data availability (wind speed, relative humidity, solar radiation).Interpolated K C curves were constructed for each crop variety using a series of phenological models to simulate the annual dates of budburst, full bloom, harvest and leaf fall (see Sect. 3.1.5).The value of K C at each of these dates (K C, BB , K C, FB , K C, HV and K C, LF ) was determined from the literature (Villagra et al., 2014) and interviews with local grape growers.Net irrigation water use in the catchment (IWU, in m 3 s −1 ) was computed as a function of IWR, irrigated areas and surface-water availability: where Q nat (m 3 s −1 ) is the natural streamflow simulated by the GR4J model, is a conversion factor and A i (ha) is the irrigated area for crop variety i, which varies on a yearly basis as shown in Fig. 1b.Q min (m 3 s −1 ) is a minimum discharge below which no withdrawal is allowed.This parameter was fixed at 0.25 m 3 s proportion to its irrigated area: where AIW i (mm) is the amount of water allocated to crop variety i and A tot (ha) is the sum of all irrigated areas.Finally, the average soil water content in the root zone was updated as

Phenological modeling (Model C)
To construct the K C curves, the growing season was split into five phenophases: endodormancy, ecodormancy, flow- ering, ripening and senescence.For each grapevine variety, different process-based models were applied to predict the start and end dates of each phenophase (Fig. 3).
A simplified version of the UniChill model (Chuine, 2000) was chosen to simulate the annual dates of budburst (t BB ).This model covers the periods of endodormancy, when growth inhibition is due to internal physiological factors, and ecodormancy (or quiescence), when buds remain dormant because of inadequate environmental conditions.To emerge from endodormancy, grapevines usually require an extended period of low temperatures, which is represented in the model as an accumulation of "chilling" rates R CH : where T A, V is the average daily temperature in the valley and t 0 , a, b and C BB are fitted parameters described in Table 1.δ is a scaling factor set at 0.5 to ensure that the optimal chilling rate (when T A, V = b) has a value of 1 (Caffarra and Eccel, 2010).A sensitivity analysis (not shown here for brevity's sake) was performed to determine the optimal value for t 0 , i.e., the starting date of the endodormancy period (see Table 1).Likewise, from dormancy release to budburst an extended period of high temperatures is generally required (ecodormancy).This process is represented as an accumulation of "forcing" rates R BB : where c, d and F BB are fitted parameters.To prevent overparameterization, the values of c and d were fixed at −0.25 and 15 • C based on information available in the literature (Caffarra and Eccel, 2010;Fila et al., 2012).The sigmoid function of Eq. ( 21) describes the temperature dependence of growth rates in a more realistic way than usual approaches based on growing degree-days.
The four-parameter model developed by Wang and Engel (1998) (hereafter referred to as WE) was selected to simulate the annual dates of full bloom (t FB ) and harvest (t HV ): Eventually, the post-harvest period was modeled as a constant number of days (N LF ) between t HV and the end of leaf fall (t LF ).The value of N LF was obtained from interviews with local grape growers for each variety (see Table 1).

Model evaluation
The phenological and hydrological models were evaluated separately using different methods and/or objective functions.Models A and B have the same number of calibrated hydrological parameters (i.e., six parameters).
The models were evaluated using either (1) simulations obtained with a single, "optimal" parameter set, or (2) probabilistic predictions obtained by sampling the posterior distributions of the parameters.In the first case, model efficiency and internal consistency were assessed.In the second case, predictive uncertainty bands were derived and scrutinized in terms of reliability and sharpness.

Model efficiency and internal consistency
Model efficiency measures the ability to fit the observed behavior of the system with regard to specific criteria.In this study, the Shuffle Complex Evolution (SCE) algorithm (Duan et al., 1993) was used to maximize the following criterion: where KGE and KGE inv refer to the Kling-Gupta efficiency (Gupta et al., 2009) computed from discharge (Q) and in-verse discharge (1/Q) values, respectively.This composite criterion was chosen to emphasize high and low flows equally (Pushpalatha et al., 2012;Nicolle et al., 2014).Internal consistency can be defined as the ability to reproduce the dynamics of internal catchment states without conditioning the model parameters on additional data.Here, this analysis was limited to the Snow Accumulation and Ablation module to evaluate its ability to reproduce the seasonal pattern of snow storage and release within each elevation zone.This was achieved through visual inspection of model-based and MODIS-derived F SCA time series and based on the snow error criterion defined in Hublart et al. (2015a).

Model predictive uncertainty
The Differential Evolution Adaptive Metropolis (DREAM) algorithm (Vrugt et al., 2009) was chosen to approximate the posterior distributions of model parameters and obtain probabilistic streamflow predictions.This required a statistical model of the differences between observed and simulated flows (i.e., residual errors).We used the generalized likelihood (GL) function introduced by Schoups and Vrugt (2010), which describes correlated, heteroscedastic and non-Gaussian errors based on a number of parameters given in Table 1.Uniform priors were assumed to reflect the lack of information on model parameters in this catchment.After a maximum of 30 000 iterations, the quantitative diagnostic of Gelman and Rubin (1992) was used to determine when the chains had converged to the stationary posterior distribution.
The reliability of the predictive distributions was first assessed by checking for the ability of various p confidence intervals (with p = 0.05 to 0.95) to bracket the adequate percentage of streamflow observations (hereafter called POCI for percentage of observations within the p confidence interval): where n is the total number of observations, Limit Upper (p) and Limit Lower (p) are the upper and lower boundary values of the p confidence interval and N indicates the number of observations enclosed within these boundaries.When plotted as a function of p, the POCI points should fall along the diagonal 1 : 1 line.The predictive distributions were also verified using the probability integral transform (PIT) values of streamflow observations, defined as (e.g., Thyer et al., 2009;Wang et al., 2009;Engeland et al., 2010) where F t is the empirical cumulative distribution function (CDF) of streamflow predictions at time t.For ideal predictions (i.e., based on correct statistical assumptions regarding model errors), the π t values are expected to be uniformly distributed between 0 and 1.More details on the correct use and Finally, the sharpness (or "resolution") of the predictive distributions was measured using the average relative interval length (ARIL) criterion proposed by Jin et al. (2010), which should be as small as possible for any p between 0 and 100 %: Each of these posterior diagnostics (POCI, PIT and ARIL) was performed separately for all streamflow observations and three distinct regions of the observed flow duration curve, namely, high flows (20 % exceedance probability), mid flows (20 to 80 % exceedance probability) and low flows (20 % exceedance probability).

Phenological modeling
The phenological models used in Model C were calibrated by minimizing the root-mean-square error (RMSE) between simulated and observed phenological dates over the whole data set (2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013).This was achieved using the SCE algorithm with the same number of complexes for all models and crop varieties.Given the small number of available observations, a leave-one-out cross-validation technique was chosen to assess the robustness of each model.Additional metrics such as the Nash-Sutcliffe efficiency (NSE) and the mean difference between observed and predicted dates (i.e., model bias) were also used in validation to characterize the modeling errors.On the whole, eight parameters required calibration for each variety (Table 1).

Phenological simulations
Figure 5 and Tables 2 and 3 show the results obtained for both grapevine varieties with the three phenological models.On the whole, approximately 76 % of the differences between observed and predicted phenological dates fell within the range of ±5 days during calibration (Fig. 5).Moreover, mean absolute errors did not exceed 6.4 days in any case.Such errors can be considered acceptable with regard to the 10-day time step chosen to evaluate the hydrological models.The best results were obtained for Flame Seedless with the budburst (BB) model and for Moscatel Rosada with the full bloom (FB) and harvest (HV) models.RMSE values ranged from 3.0 to 6.1 days in calibration and from 5.4 to 7.9 days in validation, indicating a moderate loss of performance (Table 2).In general, bias values remained close to zero, except for Moscatel Rosada with the HV model.NSE values were positive for all varieties and models in calibration but decreased sharply in validation, with only two values above 0.50 and one negative value for Flame Seedless with the FB model.However, very low to negative NSE values are not uncommon in phenological modeling, when only a few observations (< 10 years) collected from a single site are used to calibrate the models (e.g., Parker et al., 2013).The optimized parameter values displayed in Table 3 are discussed in Sect.5.4.

Model efficiency and internal consistency
Table 4 shows the results obtained from the calibration and validation of Models A, B and C. Clearly, Model C was found to perform better than Models A and B with respect to the objective function given by Eq. ( 25).This higher performance was mostly the result of improved low-flow simulations (KGE inv ).Table 5 shows that simulated sublimation rates and contribution to snow ablation remained approximately the same when IWU was introduced in the model equations.Estimated mean annual sublimation rates at high elevations (EZ nos. 4 and 5) were consistent with those found by other studies, including experimental studies conducted on small glaciers of the region (MacDonell et al., 2013).
The internal consistency of the SAA module was verified over an independent validation period (2000-2011) using the parameters (θ S , MF) calibrated with each model from 1985 to 1995.The snow errors displayed in Table 4 vary from 2 % in the first elevation zone (EZ no. 1) to 11-17 % in the last one (EZ no.5).Such errors were very encouraging, as they were comparable to those obtained by Hublart et al. (2015) in the Hydrol.Earth Syst.Sci., 20, 3691-3717, 2016 www.hydrol-earth-syst-sci.net/20/3691/2016/  same catchment with less parsimonious (and less realistic) snowmelt models.The impact of considering net radiation and sublimation in the model equations, however, was only evident for EZs no. 4 and 5, where a moderate drop in the snow error was observed.Model A even performed slightly better than Model B with respect to the F obj function.
Figure 6 provides a visual comparison of simulated and observed fractional snow-covered areas (F SCA ) during this validation period for Model C. On the whole, it can be seen that the SAA model did not accumulate snow from one year to another, which was consistent with the observed interannual pattern of snow cover in the catchment.However, there were important discrepancies between the lower and upper elevation zones.In the lower zones (EZ nos. 2 and 3), the model did fairly well during several years of the period (e.g., 2001, 2004, 2009 and 2010), but also underestimated the annual snow-cover duration (SCD) during several other years (e.g., 2002, 2003 and 2007).In the upper zones (EZ nos. 4 and 5), the model generally failed to reproduce the observed variations in F SCA despite improved estimates of the annual SCD.In EZ no. 5, there was also a tendency to overestimate the SCD during the last 3-4 years of the period.

Model predictive uncertainty
Between 10 000 and 13 000 model evaluations were required to reach convergence to a limiting distribution depending on the model used.In each case, the last 5000 samples generated with DREAM were used to compute the posterior diagnostics Flame Seedless 0.11 11.5 27.4 21.2 22.0 55.5 30.2 28.9 Moscatel Rosada 0.57 11.3 10.8 41.8 20.2 49.9 32.9 31.3 Table 4. Goodness-of-fit (calibration) and predicting performance (validation) of the hydrological models.
Calibration (1985( -1995( ) Validation (1985( -1995) ) Snow errors (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011) Model Figure 7 provides a range of formal tests of the statistical assumptions made to describe model residuals in the case of Model C. The density plot of Fig. 7a confirms that model residuals were broadly symmetric and kurtotic, although kurtosis appears to be slightly overestimated.Heteroscedasticity (Fig. 7c) was largely removed by the variance model of the GL function.However, Fig. 7b shows that the assumption of independence was not fully respected, as residuals remained slightly correlated (0.35) at a lag of 1 and at some greater lags, indicating potential storage errors in the model structure.
Figure 8 displays the scatter plots and posterior histograms of hydrological parameters for Models A and C. The results obtained with Model B are not shown here as they were generally close to those of Model C. As can be seen, differences between the structures of Models A and C had no particular effect on parameter identifiability.All parameters appeared to be relatively well-defined with approximately Gaussian distributions, although the values of θ S , MF and X3 occupied a wider range of their prior intervals with Model A than with Models B and C. Introducing sublimation and net radiation into the SAA module reduced the correlation between θ S and MF observed with Model A but simultaneously increased the interaction of θ S with X3 and X4.Likewise, additional checks performed with Models B and C showed that the incorporation of irrigation water use in Model C led to a strong correlation between X2 and X3, which questions the internal consistency of the runoff production and routing module when increasing the model complexity.
Figure 9 shows the posterior diagnostics used to evaluate the reliability (PIT, POCI) and resolution (ARIL) of forecast distributions for Models B and C. At first sight, the PIT values obtained with all streamflow observations appear to be distributed quite uniformly during both simulation periods.Small departures from the diagonal line and the 5 % Kolmogorov confidence bands indicate a tendency to underpredict the observed data, but this applies to both models, especially in validation.On the contrary, significant differences between the two models become obvious when looking at specific portions of the observed flow duration curve.At low flows, the PIT values obtained with Model B revealed a significant overprediction bias during both calibration and validation periods.While it did not affect the percentage of observations covered by the confidence intervals (as POCI values remained close to the diagonal line), this systematic bias resulted in very high ARIL values (exceeding 1.5 in calibration and 3 in validation with the 95 % confidence intervals).By contrast, Model C slightly overestimated predictive uncertainty in calibration but led to highly reliable lowflow predictions in validation, as evidenced by the PIT and POCI plots.This resulted in relatively low ARIL values (< 1).At mid flows, the two models exhibited a similar behavior characterized by a systematic underprediction bias, underestimated POCI values and relatively low ARIL values (< 1).At high flows, the PIT values were well within the Kolmogorov confidence bands for both models, although there was still a tendency to underpredict the observed data.In validation, this underprediction bias translated into an excessively low number of observations enclosed within any p confidence interval for p > 70 %.
Figure 10 shows the uncertainty bands obtained with Models B and C during the two simulation periods.The dark blue region represents the uncertainty in streamflow predictions associated with the posterior parameter distributions, while the light blue region represents the total uncertainty arising from parameter, model structure and input errors simultane-Table 5. Sublimation rates and contribution to snow ablation over the period 2000-2011.These results are for Models B and C and for the five elevation zones (EZs).ously.Some portions of the observed hydrograph have been enlarged to highlight key differences between the two models.In general, uncertainty bands should be wide enough to include the expected percentage of streamflow observations (here, 95 %), but not so wide that the representation of the observed hydrograph becomes meaningless.From this perspective, the main differences between Models B and C were observed for summer flows, i.e., during the irrigation season.

Mean annual sublimation rates (mm day
Model B results in large uncertainty bands that are able to capture most of the observations but that fail to reproduce the seasonal pattern of streamflow during dry years (e.g., 1989-1990, 1994-1995, 1996-1997, 1997-1998, 1999-2000).In this case, structural and input errors represent the dominant sources of uncertainty.By contrast, the width of the prediction limits obtained with Model C tends to decrease as the magnitude of the predicted streamflow decreases.In this  case, parameter uncertainty accounts for most of the predictive uncertainty during summer.However, winter and early summer flows are often underpredicted by both models.This last point is further discussed in Sect.5.3.

Snow accumulation and ablation
The "optimal" cold-content factor (θ S ) was very close to 1 with all models (Fig. 7), indicating a relative insensitivity of the snowpack temperature to changes in air temperature.This finding seems a contradiction of the idea that shallow snowpacks such as those observed in the region should have a low thermal inertia.By comparison, Stehr et al. (2009)  obtained a value of zero for θ S after calibrating the SWAT model in a snowmelt-fed catchment of the more humid central Chile (38 • S).One possible explanation for this apparent contradiction is that mean daily temperatures in northerncentral Chile are rarely negative at low and mid elevations (< 4000 m a.s.l.).A high value of θ S was therefore required to preserve the seasonality of melting during the spring and summer months, despite low snow depths and frequently positive air temperatures throughout the winter.In EZs no. 3 and 4, this model requirement may be due to the impact of latent heat fluxes on the snowpack cold content.During the winter, almost all the energy available from net radiation and sensible heat transfers is consumed by sublimation.This maintains the snowpack temperature slightly below 0 • C and effectively delays snowmelt until the mean daily air temperature stabilizes above 0 • C for a sufficiently long period of time.Another possible explanation is that a high value of θ S implicitly accounts for the effect of night-time freezing, which further delays snowmelt despite warm day-time temperatures.At high elevations (> 4000 m a.s.l., i.e., EZ no.5), where observed air temperatures are mostly negative, we note that a constant lapse rate of 6.0 • C km −1 , as applied in this study for all elevation zones, was also likely to overestimate temperature inputs.Lapse rates at these elevations are generally much greater than that, being in fact closer to the dry adiabatic lapse rate.Again, this would be expected to generate high values of θ S to compensate for temperature overestimation.
The main drawback of this approach (i.e., using air temperature as a proxy for the snowpack cold content) is that it remains largely implicit and only indirectly connected to the amount of water lost by sublimation in the model (i.e., the outcome of Eq. 10 has no effect on Eq. 2).This does not mean, however, that a physically oriented interpretation cannot be sought a posteriori to check for the model realism.Alternative approaches can also be used to account for the delay in meltwater production at the start of the ablation season.In general, these will involve an additional store representing the water-holding capacity of the snowpack (Schaefli and Huss, 2011).Although further research would be required to compare the relative merits of each approach, the representation chosen in this study may be more suited to catchments with shallow snowpacks and significant sublimation.
The "optimal" melt factor (MF) was significantly higher with Model A than with Models B and C (Fig. 7).This was not surprising since, in the case of Models B and C, the effects of net radiation were explicitly considered and the melt factor was meant to parameterize only the contribution of turbulent energy fluxes.Such a "restricted" melt factor is expected to increase with increasing wind speed and/or relative humidity, as shown by Brubaker et al. (1996).The relatively low values (∼ 2 mm • C −1 day −1 ) obtained here were therefore consistent with the overall dry conditions of the study area.However, we found little evidence of improved model performance and internal consistency when a restricted melt factor was used and net radiation and sublimation were introduced in the model equations (see Table 4).This lack of sensitivity may be due to other sources of uncertainty, in particular regarding the choice of an adequate snow depletion curve to estimate fractional snow-covered areas (Eq.6).
While most snowmelt routines used in conceptual catchment models assume either entirely snow-free or entirely snow-covered elevation zones, accounting for the proportion of each zone over which snow extends can be critical where mean snow depths are known to be small.As a first approximation, we relied on a linear relationship between SWE and F SCA that did not account for wind redistribution effects or differences in radiation receipt caused by slopes of different aspects.In the dry Andes, wind-induced redistribution has been shown to significantly increase the spatial variability in snow depth, hence reducing the total snow-cover area during winter (Gascoin et al., 2013;Ayala et al., 2014).For a proper assessment of predictive uncertainty, a multi-criteria likelihood function accounting for the differences between several types of simulated and observed responses (typically, fractional snow-covered areas and stream flows) should be used (e.g., Koskela et al., 2012).This is the subject of ongoing research.

Runoff generation and routing
Figures 9 and 10 revealed a clear underprediction bias in the simulation of winter and early spring flows during several water years.Further details on these systematic deficiencies are provided by Fig. 11, which focuses on a specific El Niño event (2002)(2003).From May to September 2002, the observed winter flow increased rapidly from 0.15 to 0.5 mm day −1 (Fig. 11a) in response to intense rainfall events (Fig. 11b) and gradual snowmelt (Fig. 11c).Most of this precipitation, however, served to refill the soil-moisture accounting (SMA) store of the model, which, after 3 years of intense La Niña-related drought (1999)(2000)(2001)(2002), was only 15 % of capacity (Fig. 11d).As a result, effective precipitation did not exceed 0.5 mm day −1 during this 5-month period (Fig. 11e), of which only 10 %, i.e., less than 0.05 mm day −1 , were processed through the quick flow routing path (Fig. 11f).The remaining 0.45 mm day −1 were added to the routing store, whose water level was also very low in May 2012.The overall quantity routed by both pathways was therefore largely in- sufficient to match the actual streamflow.A similar sequence was observed for all water years characterized by the same failures in streamflow predictions, shedding light on two critical sources of uncertainty related to structural deficiencies and input data errors.

Structural deficiencies
One possible source of model inadequacy lies in the representation of runoff production by a single SMA store, which lumps together quite distinct landscape units.In the mountains, most of the land cover is dominated by barren to sparsely vegetated exposed rocks, boulders and rubble.The topography is steep, with slopes as large as 30 • and very poor soil development above the mountain front zone.By contrast, the valley bottoms appear as relatively flat areas largely covered by vegetation.Alluvial fans are also found along the mountain foothills, acting as hydrologic buffers between the mountain blocks and the valleys.
Another potential source of structural uncertainty relates to the type of precipitation entering the SMA store.Snowmelt typically occurs at a much lower and more consistent rate than rainfall, and much of the meltwater is expected to soak into the ground.Rain, while not a dominant feature of semi-arid Andean catchments, can exert a significant influence on winter flows even during dry years.While snowmelt events occur mainly in the uplands, most rainfall events take place in the valley bottoms, i.e., much closer to the catchment outlet and generally not very far from the saturated riparian zone.In most precipitation-runoff models, however, rainfall and snowmelt inputs are treated as the same kind of "water" and processed through the same model paths.More research is needed to determine whether different types of precipitation inputs, which would be expected to involve different modes of runoff generation, should translate into different model representations.Investigating such hypotheses was far beyond the scope of this study.

Impacts of input data errors
Relatively high values were obtained for X1 (> 1000 mm) and X2 (∼ 4-5 mm), which was somewhat surprising given our understanding of storage capacities and water fluxes in the Claro River catchment.The X2 parameter, in particular, is used to represent groundwater exchanges with the underlying aquifer and/or neighboring catchments.shown from the analysis of 1040 French catchments that alluvial aquifers are more likely to be associated with negative values of X2, whereas crystalline bedrocks tend to correlate with values centered on zero (−5 < X2 < 5).Over the long term, however, the value of X2 is expected to be zero if the catchment is a closed system.
In this catchment, the valley-fill aquifers that compose most of the groundwater flow system are bounded by large mountain blocks of granitic origin, which drastically limits inter-catchment flow paths.Groundwater in the bedrock is typically found in fractures or joints, with a low storage capacity, and soils are, on the whole, poorly developed.As a result, low values of X1 and negative values of X2 would have seemed more "realistic".Note that the autocorrelation structure of model residuals shown in Fig. 7 was also indicative of substantial storage errors in the hydrological model.This lack of physical realism suggests that other factors may be at play.Both of these parameters, indeed, are known to interact strongly with precipitation and evapotranspiration input errors (e.g., Andréassian et al., 2004;Oudin et al., 2006;Thyer et al., 2009).The capacity of the SMA store tends to increase in the presence of random precipitation errors or if precipitation is systematically overestimated (Oudin et al., 2006).Likewise, an excessively high value of X2 might indicate that potential evapotranspiration is overestimated and/or precipitation underestimated.
As in many mountainous catchments, some precipitation events occurring at high elevations may not be captured by the gauging network (< 3200 m a.s.l.) used to interpolate precipitation across the catchment.These occasional errors naturally add to systematic volume errors caused by wind, wetting and evaporation losses at the gauge level, leading to an overall underestimation of precipitation at the catchment scale.However, a large uncertainty also surrounds the estimation of elevation effects on precipitation.Mean annual precipitation was assumed to increase by ∼ 0.4 m km −1 (in water equivalent) (Sect.2.2.1), yet in the absence of reliable precipitation data above 3200 m a.s.l., it is unclear whether this gradient underestimated or overestimated precipitation enhancement.In general, it is unlikely that a constant value would represent orographic effects correctly at all elevations and over the whole simulation period.Precipitation enhancement in the Andes can vary considerably on a year-to-year basis or from one event to another (Falvey and Garreaud, 2007), leading to time-varying errors in the estimation of precipitation inputs.From Fig. 6 we hypothesize that precipitation was on the whole underestimated, and only occasionally overestimated.Overestimation of potential evapotranspiration is also a plausible hypothesis for Models B and C owing to possible interactions with the estimation of sublimation rates and irrigation water use (Fig. 7).

Phenological modeling
In contrast to lumped catchment models, the phenological models used in this study allow for a direct interpretation of parameter values through comparison with existing experimental studies.This provides a second level of model validation.
The values obtained for T opt (i.e., the optimal forcing temperature) with the full bloom and harvest models (Table 3) were generally close to the range of optimal photosynthetic temperatures reported in the literature, i.e., typically 20-30 • C (García de Cortázar-Atauri et al., 2010).By contrast, relatively high values (around 11-12 • C) were found for parameter b (i.e., the optimal chilling temperature) compared to those reported by previous modeling and experimental (e.g., Fila et al., 2012) studies.Moreover, the values obtained for parameter a, which determines the range of acceptable chilling temperatures around the optimum b, imply that temperatures around 13-16 • C were still effective as chilling temperatures.Caffarra and Eccel (2010) and Fila et al. (2014) also found large effective chilling intervals with similar budburst models but different grapevine varieties, which they explained in different ways.In our case, this outcome was most likely related to the use of mean daily temperatures as inputs to the budburst model.Very high diurnal variations (∼ 20 • C) can be observed at the INIA experimental site, where a mean temperature of 11-12 • C actually reflects temperatures close to 0 • C during several hours of the day.The critical states of chilling (C BB ) obtained for both varieties indicate that between 11 and 27 days at 11-12 • C were required to break endodormancy.Assuming that winter temperatures remained close to zero during at least 5 h per day, these results are fully consistent with the fact that most grapevine varieties typically require between 50 and 400 h at temperatures below 7 • C to achieve budburst (Fila et al., 2012).However, given the limited number of years with available observations and the absence of direct evidence for the release of endodormancy, possible trade-offs between the chilling (a, b, C BB ) and forcing (F BB ) parameters during the optimization process cannot be dismissed a priori.Thus, while the phenological models can be considered reliable under the conditions observed over 1985-2005, their results should be treated very carefully when dealing with potential impacts of higher temperatures.
Figure 12.Comparison of net surface-water withdrawals (SWW) and irrigation water use (IWU) at the catchment scale: SWW were obtained by considering monthly restrictions to water access entitlements provided by the Chilean authorities, a conveyance efficiency of 0.6 and a field application efficiency of 0.6 for pisco varieties and 0.9 for table varieties; IWU was obtained from model simulations.

Irrigation water-use modeling
While no ground data were available to verify our estimates of irrigation water use, a comparison was made with net surface-water withdrawals (SWW) estimated from the water access entitlements database (Fig. 12).Not surprisingly, this comparison revealed large discrepancies between these two quantities, especially from 1985 to 1990, which could explain the poor performance of all models in water years 1985-1986and 1986-1987 (Fig. 10) (Fig. 10).It is worth noting, however, that SWW data reflect more a level of water availability in the catchment than the actual water consumption in the vineyards.These data may also indicate sudden changes in the management of water resources at the regional scale that do not necessarily affect irrigation requirements at the local scale.Overall, the actual water use in the catchment is likely to be somewhere between simulated IWU and net SWW estimates.Incorporating IWU simulations into conceptual catchment models can help reduce the uncertainty associated with low-flow simulations, yet it is by no means a substitute for accurate measurement of water withdrawals.
The relative stability of simulated IWU from year to year is perhaps more surprising given the complexity of the phenological models used.However, this stability could not be taken for granted before running the models (it can only be observed a posteriori).Using phenological models also has considerable advantages in terms of model robustness under climate-and/or human-induced changes, which are further discussed in Sect.6.

Conclusion and prospects
Hydrological processes are often poorly defined at the catchment scale due to the limited number of observations at hand and the integral (low-dimensional) nature of these signals (e.g., streamflow).This makes it relatively easy to over-fit the data by adding new hypotheses to our models, leading to a low degree of falsifiability from a Popperian perspective.Therefore the incorporation of new processes into a given model structure should be achieved using as few additional parameters as possible and the same level of mathematical abstraction as in the original model (as stated in Sect.1.4).Ultimately, it is also necessary to show that this increase in model complexity improves hydrological simulations without increasing predictive uncertainty.
In the present paper, sublimation losses were incorporated by assuming that the snowpack can either melt or sublimate.This modeling choice may seem to oversimplify the physics of snowpacks, yet it allows for the same level of process representation as in commonly used empirical melt models.On the whole, this modification helped to reduce errors in the simulation of snow-cover dynamics at high elevations without increasing the number of snow-related parameters.However, more research is needed to determine the exact interaction between snow sublimation and melt in the model.Compared to sublimation losses, the introduction of irrigation water use (IWU) increased the overall number of parameters.Yet this increase in complexity came with additional data (observed phenological dates) to reduce the number of degrees of freedom.The reliability of probabilistic streamflow predictions was greatly improved when IWU was explicitly considered, resulting in relatively narrow uncertainty bands and reduced structural errors.As such, this model modification appears to be supported by the available data.Incidentally, this approach also provided evidence that water abstractions from the unregulated Claro River are impacting on the hydrological response of the system.
One of the main advantages of incorporating IWU is that it provides an estimate of natural streamflow that can be used to assess the system's capacity to meet increasing irrigation needs (e.g., Fabre et al., 2015b).To our knowledge, most of the other approaches used to "naturalize" influenced streamflow in agricultural catchments do not account for the impacts of climate variability on crop water use.Instead, the sum of all historical water rights is usually taken as an upper bound for the actual water consumption and added back to observed streamflow before calibrating the model.This makes it difficult to use conceptual catchment models in climate change impact studies, since changes in temperature patterns are expected to affect both the timing and volume of irrigation water use.Depending on their magnitude, seasonal shifts in the timing of snowmelt runoff and phenological events could result in either additive or countervailing effects.Earlier peak flows, for instance, could lead to an increase in water supply at a time when it is not required, P. Hublart et al.: Reliability of lumped hydrological modeling or simply compensate for a similar shift in crop phenology.A new generation of low-dimensional modeling approaches is required to better understand how these processes interact and evaluate the possibility of selecting the most suitable varieties and irrigation strategies for a given hydro-climatic context (Duchêne et al., 2010;Palliotti et al., 2014).In this paper, the use of phenological models based on functions that integrate both the negative and positive effects of higher temperatures on crop development is suggested as a parsimonious way to improve model robustness in the future.
However, critical challenges remain to be addressed before the model can be used for such prospective studies.In particular, more research is needed to better separate the effects of rural land-use change from other sources of variability and uncertainty in conceptual catchment models (McIntyre et al., 2014).Future work will focus on improving the estimation of fractional snow-covered areas and the sensitivity of runoff generation components to intense rainfall and protracted droughts.Results also highlight the need for a better representation of surface water-groundwater interactions in the routing module.Given the difficulty in estimating precipitation in the dry Andes, isotope-based studies could considerably help to quantify the relative contributions of snowmelt, rainfall, groundwater and glacierized areas to streamflow (Ohlanders et al., 2013).Such understanding is critical to discriminate between several sources of errors and improve model reliability for use in impact and adaptation studies.

Data availability
Hydro-climatic data (precipitation, temperature, streamflow, monthly restrictions to water assess entitlements) are freely available on request from the website of the Chilean Direccion General de Aguas (2016) (http://www.dga.cl).MODIS data are freely available through the NASA Distributed Active Archive Center (DAAC) at the National Snow and Ice Data Center (NSIDC, 2016) (https://nsidc.org/data/modis/).

Figure 1 .
Figure 1.The Claro River catchment, Chile (30 • S): (a) topography and current location of irrigated areas, (b) evolution of irrigated areas since 1985 (interpolated from local cadastral surveys) for both types of grapes, and (c) potential effects of increased irrigation water use on mean annual hydrographs since the mid-1990s.These effects were estimated from the difference between streamflow measured at the outlet in Rivadavia (in black) and that measured at Cochiguaz and Alcohuaz (in red), which remains largely unaltered.

Figure 2 .
Figure 2. Block diagram of the lumped modeling framework developed in this study.The blue blocks refer to the hydrological part of the framework (used by Models A, B and C), while the green blocks relate to the estimation of irrigation water requirements and irrigation water use (used only by Model C).The simulated outputs and observed data used for calibration/validation are indicated in orange.A satisfaction rate can also be computed based on the ratio between water availability and irrigation requirements.

Figure 3 .
Figure 3. Crop growth and water requirement modeling framework: (a) partitioning of the growing season into five phenophases and parameterization of each phenophase, (b) functions used to express the accumulated chilling and forcing rates over each phenophase, and (c) translation of the simulated dates of budburst, full bloom and harvest into an interpolated K C curve for use in the IWU model.Model parameters are indicated in italic and colored in red.Note that parameters t 0 , c, d, T min , T max , K C, BB , K C, FB , K C, HV and K C, LF were fixed beforehand to avoid over-parameterization.
19)Hydrol.EarthSyst.Sci., 20, 3691-3717, 2016   www.hydrol-earth-syst-sci.net/20/3691/2016/ P. Hublart et al.: Reliability of lumped hydrological modeling 3701 with α = log (2) /log (T max − T min ) / T opt − T min , (20)where F FB , F HV and T opt ( • C) were calibrated separately for each variety.Note that T opt also varies with the phenophase under study (flowering or ripening).Compared to other flowering and harvest models based on forcing rates, this one has the major advantage of also accounting for the inhibiting effect of extreme temperatures on photosynthesis.As leaf growth typically ceases at temperatures below 0-5 • C (Hendrickson et al., 2004) and above 35-40 • C (Greer and Weedon, 2013), parameters T min and T max were fixed beforehand at 0 and 40 • C, respectively (García de Cortázar-Atauri et al., 2010).

P
.Hublart et al.:  Reliability of lumped hydrological modeling interpretation of PIT plots, including the use of Kolmogorov significance bands as a test of uniformity, can be found inLaio and Tamea (2007) (see also Fig.4).

Figure 4 .
Figure 4. Possible interpretations of PIT plots (modified from Laio and Tamea, 2007).The diagonal line (in black) represents the ideal case.

Figure 5 .
Figure 5. Observed vs. predicted dates of budburst, full bloom and harvest for Flame Seedless and Moscatel Rosada at the INIA experimental site.The dates are expressed in number of days since 1 June.The minimum, maximum and mean absolute errors (in days) are given for each variety and stage of growth (the values between parentheses relate to the validation step, while the values in front of the parentheses relate to the calibration step).The upper and lower blue lines indicate delays of ±5 days between observed and predicted dates, respectively.

Figure 6 .
Figure 6.Comparison of simulated (i.e., Model C, accounting for sublimation) and observed (i.e., MODIS-based) fractional snow-covered areas (validation period).The graduations on the x axis indicate 1 January of each year.

Figure 7 .
Figure 7. Formal checks of the statistical assumptions used to describe model residuals.Application to Model C (simulated for the validation period with the inferred maximum likelihood parameter set): (a) assumed and actual pdf; (b) partial autocorrelation; and (c) heteroscedasticity of standardized residuals.

Figure 8 .
Figure 8. Two-dimensional scatter plots of the posterior parameter samples obtained with Models A and C. The numbers in italic at the center of each cell indicate correlation coefficients.The histograms in orange represent the marginal posterior distributions of parameters with superimposed kernel density estimates.The scatter plots and histograms of Model B were not included here for brevity's sake, as they were very close to those of Model C.

Figure 9 .
Figure 9. Posterior diagnostics used to evaluate the reliability (PIT, POCI) and resolution (ARIL) of the forecast distributions obtained with Model B (in blue) and Model C (in red).

Figure 10 .
Figure 10.Predictive uncertainty bands obtained for Models B and C with the DREAM algorithm and GL function.The dark blue region represents the 95 % confidence intervals associated with parameter uncertainty, whereas the light blue region represents the 95 % confidence intervals associated with parameter, model structure and input errors.The tick values on the x axis indicate 1 January of each year.

Figure 11 .
Figure 11.Internal state variables and fluxes obtained with Model C during the 2002-2003 El Niño event (using the best-performing parameter set obtained by calibration against the F obj function).
the latent heat of vaporization (∼ 2.46 MJ kg −1 ) and K 1 (5 • C) and K 2 (100 • C) are fitted parameters (see Sect. 3.1.4for further details).Spatially averaged PE inputs to the GR4J model (i.e., PE GR4J ) were obtained after subtracting the energy consumed by melting and sublimation:

Table 1 .
Initial range or value of each model parameter.The third column provides explanations of the meaning of the parameters and their units (in parentheses).The fourth column indicates whether parameters are calibrated or fixed beforehand.
−1 based on historical low-flow records.Simulated (influenced) discharge at the catchment outlet was computed from the difference between Q nat and IWU at each time step.When IWR could not be entirely satisfied, irrigation water was allocated to each crop variety i in Hydrol.EarthSyst.Sci., 20, 3691-3717, 2016www.hydrol-earth-syst-sci.net/20/3691/2016/ * For more details on the GL function, see Schoups and Vrugt 2010).

Table 3 .
Calibrated parameter values of the phenological models.
Positive values indicate a net water gain at the catchment scale, whereas negative values relate to a net water loss.Le Moine et al. (2007) have P. Hublart et al.: Reliability of lumped hydrological modeling