Flood and drought hydrologic monitoring: the role of model parameter uncertainty

Land surface modeling, in conjunction with numerical weather forecasting and satellite remote sensing, is playing an increasing role in global monitoring and prediction of extreme hydrologic events (i.e., floods and droughts). However, uncertainties in the meteorological forcings, model structure, and parameter identifiability limit the reliability of model predictions. This study focuses on the latter by assessing two potential weaknesses that emerge due to limitations in our global runoff observations: (1) the limits of identifying model parameters at coarser timescales than those at which the extreme events occur, and (2) the negative impacts of not properly accounting for model parameter equifinality in the predictions of extreme events. To address these challenges, petascale parallel computing is used to perform the first global-scale, 10 000 member ensemble-based evaluation of plausible model parameters using the VIC (Variable Infiltration Capacity) land surface model, aiming to characterize the impact of parameter identifiability on the uncertainty in flood and drought predictions. Additionally, VIC’s global-scale parametric sensitivities are assessed at the annual, monthly, and daily timescales to determine whether coarse-timescale observations can properly constrain extreme events. Global and climate type results indicate that parameter uncertainty remains an important concern for predicting extreme events even after applying monthly and annual constraints to the ensemble, suggesting a need for improved prior distributions of the model parameters as well as improved observations. This study contributes a comprehensive evaluation of land surface modeling for global flood and drought monitoring and suggests paths forward to overcome the challenges posed by parameter uncertainty.


Introduction
Droughts and floods can have devastating consequences on ecosystems, food supply, and economies (Easterling et al., 2000). Providing real-time information and predictions to decision makers can be a valuable tool to mitigate their effects. This is an especially challenging task over data-sparse regions, where unreliable monitoring networks and generally low institutional capacity limits the spread of timely information (Sheffield et al., 2013). State-of-the-art land surface models, in conjunction with numerical weather forecasting and satellite remote sensing, pose a plausible solution to supplement local observation networks. Given the accessibility of these data sources, multiple systems have arisen over the past decade that aim to provide continental and global monitoring and predictions of the hydrologic cycle (Sheffield et al., 2013;Vogt et al., 2011;Svoboda et al., 2002;Verdin et al., 2005). The land surface model component of a monitoring system is useful to understand the impact of flood and drought on the energy, carbon, and hydrologic cycles. This is possible with the current generation of LSMs that include the main physical, biological, and chemical processes at the land surface (Niu et al., 2011). The increasing complexity and sophistication of land surface models can provide a more complete assessment of the state of the land surface but also requires an increase in the number of process parameterizations and model parameters. In the past, parameter estimation in land surface models consisted of using look-up tables to assign model parameters based on similarity between sites as a function of soil and vegetation. However, sensitivity analysis of macroscale land surface models suggests that this is overly simplistic and can lead to significant uncertainty (Rosero et al., 2010;Hou et al., 2012).
Parameter calibration, a common practice in hydrology, can help reduce model bias, understand model deficiencies, and increase the model's reliability (Harding et al., 2015;Cibin et al., 2010;Döll et al., 2003;Sheffield et al., 2013). However, optimizing model performance to a limited set of observations does not ensure the model is getting the right answer for the right reasons (Kirchner, 2006). Instead, there tend to be multiple parameter sets that satisfy the observations; in hydrology this is known as model parameter equifinality (Beven, 2006). Although the performance might be similar for a given calibration metric, the results can vary significantly when comparing other metrics, timescales, or variables (Gupta et al., 2008;Herman et al., 2013;Wagener and Gupta, 2005;Reusser and Zehe, 2011;Reusser et al., 2009;Clark and Vrugt, 2006).
The model equifinality hypothesis is especially relevant in global land surface modeling where the sparsity of observations in space and time and the increasing number of model parameters leads to heavily underconstrained parameter estimation. In this study, we use an ensemble of behavioral parameter sets to capture the spread in simulated energy and water cycles. This improves model evaluation by enabling a comprehensive assessment of the model parameter and model structure deficiencies (Pappenberger and Beven, 2006). A growing number of hydrologic monitoring systems already include the impact of uncertainty in meteorological forcing (Cloke and Pappenberger, 2009); this should be extended to include model parameter uncertainty.
Given the significant number of model parameters in existing global land surface models, carefully designed sensitivity analysis can help minimize the number of uncertain parameters that must be explored for effective model evaluations while reducing computational demands. Up to now, there have only been a limited number of sensitivity analyses of macroscale land surface models. These studies have shown that parameter sensitivity varies with climate, soil, and vegetation properties (Liang and Guo, 2003;Rosero et al., 2010). In the hydrologic cycle, evidence suggests that the runoff partitioning (i.e., between baseflow and surface runoff) plays a dominant role in daily flow estimates over a number of climates (Demaria et al., 2007). The baseflow generation model parameters can also play an important role in the seasonality of the land surface fluxes (Hou et al., 2012). However, questions remain regarding the applicability of these studies globally, suggesting the need for similar analyses over all global land area.
In this study, we accomplish this goal by performing a comprehensive sensitivity analysis of the global VIC (Variable Infiltration Capacity, Liang et al., 1996) macroscale land surface model. A Latin hypercube sample of 10 000 parameter sets is used to run the model from 1948 to 2010 per 1.0 • land grid cell over the globe. The GRDC (Global Runoff Data Centre) monthly climatology of gridded runoff obser-vations (Fekete et al., 2002) is used to isolate the behavioral parameter sets. The constrained ensemble is then used to understand: first, the consequence of identifying model parameters at coarser timescales than those at which the extreme events occur, second, the impact of not properly accounting for model parameter equifinality in the estimates of extreme events, and third, the model parameters that control the hydrologic processes at the annual, monthly, and daily timescales. Finally, the results are used to propose paths to provide reliable uncertainty estimates and suggest processes and parameters that require improved observations and parameterizations.

Meteorology: Princeton global forcing data set
The meteorological forcing data set consists of 3 hourly, 1.0 • resolution fields of near-surface meteorology for global land areas for 1948-2010 (PGF; Sheffield et al., 2006). The data set merges data from the NCEP-NCAR reanalysis (National Center for Environmental Prediction and National Center for Atmospheric Research; Kalnay et al., 1996) with the GPCP (Global Precipitation Climatology Project; Adler et al., 2003) and TMPA (TRMM multi-satellite precipitation analysis; Huffman et al., 2007) observation-based data sets of precipitation, temperature from CRU (Climatic Research Unit;New et al., 2000;Harris et al., 2013), and radiation from SRB (surface radiation budget; Stackhouse et al., 2004). For the simulations, we use precipitation, temperature, pressure, downward shortwave and longwave radiation, specific humidity, and wind speed.

Land data
The default model soil and vegetation parameters are the same as those described in Sheffield and Wood (2007). The global soil texture comes from the 5 min FAO-UNESCO (Food and Agricultural Organization-United Nations Educational, Scientific, and Cultural Organization) digital soil map of the world and the World Inventory of Soil Emission Potentials (WISE) pedon database (Batjes, 1995). Land cover information is given by the University of Maryland land cover type data set (Defries et al., 2000). The parameters for each land cover type are assigned using the sources described in Nijssen et al. (2001). The monthly climatology of leaf area index is based on Myneni et al. (1997). The baseline parameters for the land surface model come from these data sets.

Gridded runoff observations: GRDC climatology
The observations of global gridded runoff come from the GRDC global runoff climatology (Fekete et al., 2002). The data set provides the interstation observations at 663 stream gauges. To minimize river routing uncertainty, stream gauges are only used when the interstation area between two gauges is below 1 million km 2 and less than 10 % of the grid cells have a travel time to the gauge above 10 days (assuming a fixed flow velocity of 1 m s −1 ). The gridded estimates are obtained by spatially disaggregating the observed interstation area runoff using the VIC model ensemble. Following the work of Fekete et al. (2002), we assume that the simulations of the land surface model provide the true spatial heterogeneity at the monthly scale. The observed monthly climatology is then used to bias-correct each cell's ensemble mean of simulated monthly flow. Uncertainty in the observed monthly flow is assumed to be negligible relative to the impact of parameter uncertainty. Further details on the model ensemble will be given in Sect. 3.2.

Köppen-Geiger climate
The Köppen-Geiger climate classification is used to assess how model parameter sensitivity varies across climates. This data set divides the world into five different climates based on five vegetation groups. The second and third categories consider precipitation and air temperature. The most recent version of this data set was updated in 2006 using the CRU (Climatic Research Unit) and GPCC (Global Precipitation Climatology Centre) data sets. These updates make the data set suitable for the second half of the 20th century (Kottek et al., 2006). In this study only the five general climate groups are used: tropical, arid, temperate, continental, and polar.

VIC: land surface hydrologic model
The macroscale VIC land surface hydrologic model (Liang et al., 1996) simulates the land surface hydrologic and energy cycles. The model's sub-grid heterogeneity is parameterized using the variable infiltration capacity curve and tiling of land cover classes. Baseflow is modeled as a nonlinear recession from the lowest soil layer (Dumenil and Todini, 1992) and evapotranspiration is calculated using Penman-Monteith (Monteith, 1964). The subsurface is discretized into multiple soil layers; gravity drainage models the movement of moisture between the soil layers. The model captures cold land processes through snow pack storage, frozen soils, and subgrid distribution of snow based on elevation banding. For further details, see Sheffield and Wood (2007).

Model parameter uncertainty: Latin hypercube sample
Samples of the model parameter space are obtained using a Latin hypercube sample (LHS) of size 10 000. LHS is used due to its strength to properly sample the parameters by dividing the parameter space into regions of equal probability (McKay et al., 1979). Since this study focuses on the hydro-logic cycle, we focus on sampling parameters that contribute to runoff generation. Seven of the nine chosen parameters come from Troy et al. (2008). A multiplier of the tabular minimum stomatal resistance values is added due to its potential impact on the partitioning of runoff and evaporation. Table 1 shows each parameter's name, description, units, and range. Each parameter is drawn from a uniform distribution; parameters that cover 2 or more orders of magnitude are sampled in log 10 space. For each LHS parameter set, the model is run at a 3-hour time step between January 1948 and December 2010 with a 10-year spin up period. Parameter values are assumed to be uncorrelated in space. The 10 000 ensemble members are run for all 1.0 • land grid cells over the globe excluding Greenland and Antarctica (15 836 grid cells in total). To assess how well the model can reproduce observed runoff, a set of annual and monthly thresholds are used to obtain each grid cell's behavioral parameter sets. The 10 000 LHS ensemble is constrained using the 1.0 • observed gridded runoff. The relative error of the simulated annual runoff is used as a first constraint. For each grid cell, all parameter sets that lead to a relative error in annual mean runoff above 10 % are discarded. This threshold is set relatively high due to measurement uncertainties in the observation data set and the spatial disaggregation method described in Sect. 2.3. The second constraint attempts to find all ensemble members that also follow the observations' seasonality. The simulated and observed monthly runoff climatologies are normalized (to remove remaining annual biases) and the Pearson correlation between the observations and simulations is computed. The correlation threshold is set to 0.75. This threshold is set relatively low due to incomplete accounting of the effects of river routing in the observations and simulations. Ensemble members satisfying both the annual and monthly constraints are deemed behavioral, and the posterior distributions of behavioral parameter values are used to assess parameter sensitivity.

Model parameter sensitivity
Quantifying the role of each model parameter at different timescales can help discern the parameters (and processes) that can be constrained using coarse timescale observations (e.g., annual and monthly flows). It can also inform us about which parameters play an important role at finer timescales (e.g., daily flows) and are minimally impacted by coarse timescale constraints.

Parameter space reduction: annual and monthly flows
Beyond quantifying how many parameter sets of the 10 000 member ensemble satisfy the monthly and annual constraints, we aim to understand how the reduction in bias and increase in monthly skill is related to a location's climate. To accomplish this goal, the annual flows are analyzed by deter- mining the change in runoff ensemble mean after applying the constraints. Furthermore, since the monthly constraint attempts to improve the simulation's unbiased seasonality, it effectively aims to capture the temporal smoothness of the observed climatology. This effect is quantified by analyzing the change in the 1-month lag autocorrelation. Our computation of parameter sensitivities after applying the annual and monthly constraints -summarized in Fig. 1 -follows the work of Fenwick et al. (2014). For each grid cell, the area between each parameter's prior cumulative distribution function and the posterior cumulative distribution function is computed as where x l and x u are the lower and upper bounds of the parameter in question, which are normalized to [0, 1] to improve interpretability of the result. The integrals are computed numerically using the trapezoid rule with x = 0.01. The calculated area serves as a robust sensitivity metric indicating the change in the distribution of each parameter caused by applying the performance constraints. Because the prior parameter distributions in this study are uniform, the maximum value of this metric is 0.5 (i.e., if only a single ensemble member satisfies the performance constraints and remains in the posterior distribution). This "CDF distance" sensitivity method bridges the classical regional sensitivity analysis framework (Spear and Hornberger, 1980) and the Delta moment-independent measure (Plischke et al., 2013;Borgonovo, 2007). Regional sensitivity analysis employs the maximum difference between cumulative distributions as a sensitivity measure. The Delta moment-independent measure (Plischke et al., 2013;Borgonovo, 2007) uses the area between prior and posterior PDFs rather than CDFs. We compute two CDF distances: first, between the original uniform distribution and the posterior after applying the annual constraint (below 10 % absolute error), and second, between the posterior after the annual constraint and the posterior after applying the additional monthly constraint (r > 0.75). The advantages of the CDF distance method for this study are (1) it does not require special statistical sampling and will work for the given data, and (2) it ties parameter sensitivity to a model performance threshold to identify parameters responsible for a particular outcome rather than overall changes in the output.

Parameter uncertainty: daily flows
Reducing the annual and monthly model parameter uncertainty using the GRDC monthly climatology does not ensure a similar reduction in the uncertainty of daily flows. This is especially relevant to drought and flood monitoring systems that attempt to capture the sub-monthly hydrologic extremes over data-sparse regions. If the most sensitive parameters at the daily scale are also the most sensitive parameters at the annual and monthly timescales, then there should be a substantial decrease in uncertainty. However, if the parameter sensitivity at different timescales is orthogonal, then the reduction in uncertainty at the daily scale will be negligible. To address this question, for each grid cell, the daily flow duration curves of the full ensemble (10 000 members) and behavioral parameter sets are calculated. The changes in the spread at different sections (low, median, and high flows) of the flow duration curve are analyzed.
Given that uncertainty will persist in the daily flows after applying the constraints, the question remains about which parameters control the remaining ensemble spread and need to be more heavily constrained. This is done by analyzing the spread in daily flow extremes on both sides of the distribution (1st and 99th percentiles) for the strictest annual and monthly constraints (relative error below 10 % and monthly correlation above 0.75). For each percentile, the Spearman rank correlation between all behavioral parameters and their associated flow is computed. The Spearman correlation was chosen here because (1) observations of daily flows are not available, so behavioral parameters cannot be identified as with the CDF distance measure described in Sect. 3.2.1, and (2) in general the relationship between parameter values and daily extreme flows will be nonlinear. The Spearman correlation provides a metric describing how a given parameter  Figure 1. Steps used to build and constrain the 10 000 Latin hypercube VIC ensemble. The CDF distance is calculated for each VIC parameter after applying the annual error constraint and again after applying the monthly correlation constraint. controls the spread in daily flows, which may have been underconstrained by the annual and monthly performance requirements imposed in the previous step. This is done for each of the nine parameters.

VIC Latin hypercube sample and behavioral parameter space reduction
For each land 1.0 • grid cell (15 836) the VIC (Variable Infiltration Capacity) land surface model is run between 1948 and 2010 at a 3-hour temporal resolution for 10 000 parameter sets obtained from a Latin hypercube sample. This was possible due to the Blue Waters supercomputer (http:// www.ncsa.illinois.edu/enabling/bluewaters); the simulations required more than 2 million computing hours (> 200 years) and resulted in an output of over 1.5 petabytes. The data were then summarized into daily, monthly, and yearly data sets. Each grid cell's 10 000 LHS ensemble VIC simulations are constrained using the observed gridded runoff fields described in Sect. 2.3. Figure 2 shows global maps of the fraction of parameter sets that fulfill each error criterion. In the Northern Hemisphere, a considerable number of grid cells have a large fraction of ensemble members that are below 10 and 20 % relative error, suggesting a small annual bias in the input meteorological forcing and a diminished sensitivity to the parameters that impact the annual mean runoff. In many places, there is a sharp decrease in performance when constraining the ensemble with the normalized monthly climatology. This can most likely be attributed to the role that the parameter space plays in controlling runoff partitioning and the challenges when attempting to spatially disaggregate point runoff observations. However, the most prominent feature is the lack of runoff observations (grey areas) and behavioral parameter sets (pink areas) over arid regions and countries with limited adaptation capacity throughout the globe. Figure 3 further summarizes these results as a function of climate classification. Although most of the regions with observations meet the annual constraints (10 and 20 % relative error), there are distinct differences between climates. Tropical and dry climates see the smallest proportion of behavioral parameter sets while continental, polar, and temperate regions experience the largest. The number of behavioral parameter sets decreases even further for all climate types when applying the monthly constraint (Pearson correlation between the simulated and observed normalized monthly climatology). In the case of arid regions, the number of acceptable parameter sets is significantly smaller, especially for the North American Mountain West, the Sahel, and most of Australia. Figure 3 also shows how the change in behavioral parameter sets affects the climate-averaged runoff ensemble mean and 1-month lag autocorrelation. The first annual constraint (20 %relative error) leads to a decrease in annual runoff (increase in evaporation) in tropical, dry, temperate, and continental climates; there is an increase in annual runoff in polar climates. The changes in annual flows are negligible when applying the monthly constraints (explained by the normalization of the monthly runoff). The 1-month lag correlation is used as a smoothness metric to assess the impact of the chosen constraints on the simulated seasonality; a higher autocorrelation indicates smoother monthly flows. In all cases, the constraints increase smoothness. As expected, the largest changes occur when using the Pearson correlation as a constraint (increase in accuracy of seasonality of monthly runoff).
In the context of drought and flood monitoring, these results may have key implications. These include: (1) the large fraction of landmass without observations limits our abil-   ity to constrain the model parameter space over the globe; (2) a limited number of behavioral parameter sets over arid and regions with limited adaptation capacity -focus areas for monitoring systems -suggests considerable limitations in monitoring systems as well as the potential for significant model structural errors; (3) regions with a high fraction of behavioral parameter sets will be susceptible to the impact of model parameter equifinality.

Parameter space reduction: annual and monthly flows
We formalize the sensitivity analysis by examining the cumulative distribution function (CDF) distance between each parameter's prior and posterior distributions. Figure 4 shows the global maps of the CDF distance metric for each parameter after applying the annual and monthly constraints. The color scale of Fig. 4 ranges from 0.0, where the prior and posterior distributions match exactly, to 0.5, the maximum possible value of the CDF distance metric when the posterior distribution contains only a single ensemble member. In general, B, Ds max , Exp, and C Rsmin are the most sensitive parameters to the annual constraint (left panel). However, the sensitivity of C Rsmin dominates the other parameters. Since C Rsmin constrains the maximum transpiration rate in the model, these results suggest that the partitioning of evaporation and runoff dominates the model performance at the annual scale. Simi-  larly, Fig. 5 shows the mean CDF distance metric within each climate classification, with the interquartile range denoted by error bars. For the annual constraint, the sensitivities of B, Ds max , and Exp are highest in regions with less defined seasonal cycles (e.g., tropical). As will be discussed in the next

N. W. Chaney et al.: Flood and drought hydrologic monitoring
section, this can likely be attributed to these parameters playing a distinct role in runoff seasonality. When applying the monthly constraint, the sensitivity of most parameters changes. In Figs. 4 and 5, the negligible sensitivity of C Rsmin suggests that although it plays a fundamental role in ensuring the annual runoff ratio, it does not play an important role in the seasonality; the same applies to Exp. Instead, the most sensitive parameters are B and Ds max since they control the partitioning of runoff into baseflow and surface runoff. As shown in Fig. 5, this is especially true over regions with a characteristic seasonal cycle (e.g., continental climates). Regions that lack a distinct seasonality (e.g., tropical climates) are only sensitive to these parameters at annual timescales. When there exists a strong seasonality in runoff, these parameters can impact the seasonality at the monthly timescale. However, a weaker seasonality leads these parameters to act at an annual scale by controlling the soil water storage and therefore the partitioning of annual evaporation and runoff.
The contrast of the annual and monthly results brings to light the role that timescales can have on the sensitivity of model parameters (and, by extension, processes). The results suggest that the annual scale constraint does not play a large role in the partitioning of monthly baseflow and surface runoff. As will be discussed in the following section, these timescale-dependent changes in parameter sensitivity can have large implications on the ability to simulate daily flows without daily observations to further constrain the ensemble.

Parameter uncertainty: daily flows
The annual and monthly performance constraints allow us to explore the role of the remaining parameter uncertainty on daily runoff estimates. The runoff percentiles are calculated for each ensemble member of each grid cell. Figure 6 shows the climate-averaged spread of the flow duration curves of the 10 000 ensemble members and the most heavily constrained ensemble (annual and monthly). The change in spread provides insight into how constraining (or tuning) at coarser timescales can reduce uncertainty at the daily scale.
As expected from Fig. 3, the annual and monthly constraints lead to a reduction in the daily mean runoff for all climates (except polar). However, the constraints' ability to tighten the ensemble spread varies significantly among climates. The most substantial decrease occurs over continental and polar climates even though these regions experience the lowest decrease in the number of parameter sets (see Fig. 3). This decrease is most likely connected to the results from the monthly sensitivity analysis (see Sect. 4.2.1): over regions that have a distinct seasonal cycle, the monthly climatology is able to heavily constrain the B and Ds max parameters; this then helps constrain runoff at daily timescales. This also explains the small decrease in spread over tropical climates seen in Fig. 6; since the monthly constraints are Figure 6. Climate-averaged ensemble spread in the daily flow duration curve. The spread in flow duration curve is calculated for all 10 000 ensemble members. The blue shading shows the spread of the entire ensemble while the red shading shows the spread for parameter sets that have an annual mean runoff within 10 % of the observed runoff and normalized monthly runoff correlation above or equal to 0.75. not able to constrain the B and Ds max parameters, their uncertainty drives the runoff at daily timescales. While predictions in tropical climates are not well constrained with this approach, the results are encouraging for monitoring the hydrologic cycle with properly-constrained land surface models in continental and polar climates. Figure 6 also illustrates differences in the tightening of the flow duration curve spread at different percentiles. For example, in continental climates the percentiles close to the center experience a substantial decrease in spread; the change in the ensemble spread of the tails (hydrologic extremes) is less significant. This result holds to a varying degree for all climates. The most likely physical explanation is that the annual and monthly constraints focus on the percentiles that produce most of the runoff; this leads to a minimal impact on low flows and a reduced impact on high flows. The nonnegligible role that high flows play in runoff production helps explain the larger decrease in spread when compared to low flows.
Given that considerable uncertainty remains in the daily flows after applying the annual and monthly constraints, we aim to understand which parameters (and, by extension, processes) control the spread. Figure 7 shows the global Spearman correlations between the daily flow extremes (1st and 99th percentile, in the left and right panels, respectively) and the behavioral parameters. Red indicates a negative correlation, blue indicates a positive correlation, and white indicates no observed correlation. The results in Fig. 7 suggest that B, Ds max , Exp, and C Rsmin control the daily flow extremes, evidenced by a mix of strong positive and negative correlations. The negative correlation between the B parameter and low flows occurs because a decrease in B leads to an increase in infiltration. This results in a dampened response and an increase in available storage for low flow periods; the opposite is true for high flows. The negative correlation between low flows and Ds max occurs because a decrease in Ds max de- Figure 7. Global maps of the Spearman correlation between the simulated extreme daily flows (1st and 99th percentile) and the corresponding VIC parameter. The correlations are calculated using the ensemble members that fulfill the strongest error criteria (relative error below 10 % and monthly correlation above 0.75).
lays the release of water from storage, allowing for a thicker recession curve and higher low flows. Finally, the positive correlation between C Rsmin and high flows is because an increase in C Rsmin leads to a decrease in evaporation; an increase in storage leads to an increase in baseflow and surface runoff (increase in soil saturation). By controlling how quickly the hydraulic conductivity decreases as a function of soil moisture, Exp controls water movement between soil layers during dry-down periods. This parameter is negatively correlated with low flows since it controls the supply to the lowest soil layer where baseflow is created.

Global flood and drought monitoring: ensemble simulations
The results from this study are relevant to drought and flood monitoring systems that rely on land surface models to monitor and predict hydrologic extremes at daily timescales (Sheffield et al., 2013;Xia et al., 2012). When the land surface model parameters are not tuned, significant uncertainties exist in the estimated runoff. This is especially true over data-sparse regions where the prior estimates of the model parameters are inadequate. Furthermore, when the parameters are tuned, a scale mismatch (space and time) between the observations and the intended application leads to limited improvement. As shown in Sect. 4.2.2, although using annual and monthly observations does constrain the daily estimates near the median, considerable uncertainties remain in the simulated hydrologic extremes (low and high flows) over all Köppen-Geiger climates. One obvious path forward is to use daily streamflow observations to further constrain the land surface model. This solution is practical over dense stream gauge networks but presents considerable challenges over data-sparse regions and ungauged basins. A plausible solution is to use a more sophisticated technique to spatially disaggregate streamflow observations (e.g., Pan and Wood, 2013) to obtain daily gridded runoff fields. However, these methods will continue to struggle over sparse networks (e.g., Congo basin), areas that are heavily managed (e.g., southeast USA), and basins that experience substantial reinfiltration and stream evaporation (e.g., Colorado basin). Another option would be to use satellite-based altimetry measurements (e.g., SWOT;Durand et al., 2014). These observations could be combined with the spatially disaggregated runoff fields to provide the observed daily estimates of gridded runoff.
In any case, even if high-quality daily runoff observations existed over the globe, a non-negligible spread will remain after applying the constraints due to the effects of model parameter equifinality. For this reason, we suggest that flood and drought monitoring systems that aim to capture hydrologic extremes move towards model parameter ensemble frameworks to provide not only predictions but also uncertainty estimates. To make this feasible for operational use, further work will need to determine how to cluster the behavioral parameter sets to below 100 per grid cell to minimize the increase in computation and storage requirements.

Model parameters: improve prior distributions
A common practice when tuning land surface model parameters at continental scales (e.g. Troy et al., 2008) is to use the same prior distribution for each model parameter at each modeled grid cell or catchment; this uniform distribution is usually set to cover the entire span of physically plausible parameter values. This approach is one of the main drivers of the large spread in flow duration curves shown in Figure 6. Given the need to rely on monthly and annual observations to constrain the model parameter uncertainty, local prior distributions should be informed by spatial land surface characteristics to constrain the initial ensemble spread and the flow duration curves. Spatially distributed information could also be used to refine the distribution family and shape of the priors in addition to their ranges.
One option would be to use the uncertainty estimates available in remote sensing and in-situ data sets to define the local prior distributions. An example of this framework would be to use the Gridded Soil Survey Geographic (gSSURGO) continental soil data set (Soil Survey Staff, 2014) that provides detailed three-dimensional texture and hydraulic soil properties (and uncertainties) over the contiguous United States (CONUS). This would be simple to test for soil parameters that are used in land surface models and are generally reported in soil data sets (i.e., porosity). However, for parameters that are model-specific (e.g., Ds max and B in the VIC model), derived functional relationships will need to relate the model parameters to the observed parameters to assemble reliable prior distributions. However, as long as the uncertainties in the functional relationship (e.g., linear regression) inform the derived local prior distribution, the benefits should outweigh additional uncertainties.
A similar option would be to estimate model parameter prior distributions using local information (parameter covariates). The procedure used in this study (Latin hypercube sample) could be used over catchments with rich databases to constrain the uniform parameter values using available high spatial and temporal resolution observed data. The resulting behavioral parameter sets could then be related to the local information using machine-learning algorithms (e.g., random forests; Liaw and Wiener, 2002) to provide catchment specific prior distributions. In theory, available or upcoming high-resolution global data sets could then provide the covariates to estimate a parameter's prior distribution at each catchment or grid cell. These data sets could include HydroSHEDS DEM (Lehner et al., 2008), MODIS-derived products (e.g., NDVI, albedo, and land cover type), TMPA satellite precipitation (Huffman et al., 2007), and the upcoming GlobalSoilMap (Arrouays et al., 2014), among others. Although the challenges in parameter regionalization (Hrachowitz et al., 2013) in-catchment hydrology will also most likely apply to macroscale land surface models, we view it as a path that should be explored.

Model structure: next generation land surface modeling
Ultimately, more sophisticated parameter estimation techniques cannot fix model structure deficiencies. As the results of Sect. 4.2.1 indicate, if the observed flow is not contained in the constrained ensemble then the problem can be traced to model structure deficiencies (assuming error free observations and input meteorology). This problem is apparent over arid regions (see Fig. 3), arguably one of the main regions of focus for drought and flood monitoring systems. A lack of irrigation, reservoirs, river evaporation and reinfiltration, and groundwater in this version of VIC are most likely the drivers of model deficiency. Furthermore, parameterizations that play an important role in watershed dynamics and are highly sensitive to their parameter values (e.g., B in the variable infiltration curve) should be replaced with updated schemes that can effectively use available local highresolution information (e.g., topography, soils, geology, and land cover) to more accurately represent the local physical processes while reducing reliance on parameter estimation. The improved macroscale parameterizations to address these process deficiencies should capitalize on the increase in computation resources and available high-resolution land data and meteorological data to more explicitly model the fine scale hydrologic processes (Wood et al., 2011;Bierkens et al., 2014). This effort could provide solutions to improve the prediction of hydrologic extremes over the globe by including: (1) detailed hydrodynamic modeling to account for flash floods, irrigation, reservoirs, and urban flooding; (2) integrated river modeling to enable river evaporation and reinfiltration; (3) improved runoff generation processes. Although the addition of these processes will likely lead to additional parameter complexity and uncertainty, it is seen as a necessary next step to improve the reliability and utility of global drought and flood monitoring systems. The Variable Infiltration Capacity model (VIC) has been run globally at a 1.0 • spatial resolution between 1948 and 2010 using 10 000 parameter sets from a Latin hypercube sample to assess the role of parameter uncertainty in flood and drought monitoring. The 10 000 member ensemble is constrained using a spatially disaggregated version of the GRDC runoff climatology at annual and monthly timescales. A multi-timescale sensitivity analysis is then used to determine the role of each of the model's parameters and the overall model performance. The results vary according to Köppen-Geiger climate. While in arid and tropical regions few parameter sets fulfill the constraints, polar and continental climates maintain a large number of behavioral parameter sets. The annual constraints focus on reducing the annual bias by changing the annual evaporation; the monthly constraints alter the monthly autocorrelation of flow by partitioning the runoff into baseflow and surface runoff. The parameters that control the monthly runoff autocorrelation also play an important role at the daily timescale. For this reason, regions that have a distinct seasonality (continental and polar) see the largest decrease in the spread of their representative daily flow duration curves. These results illustrate the challenges in using current land surface models for global drought and flood monitoring. However, they also indicate a path forward which involves adopting ensemble frameworks to account for model parameter uncertainty, designing and implementing improved observation networks to better constrain land surface models, providing improved local prior distributions via emerging high-resolution land data, and improving model structure to better account for the processes that dominate the hydrology over regions prone to droughts and floods.