A High-Resolution Dataset of Water Fluxes and States for Germany Accounting for Parametric Uncertainty

Long term, high-resolution data about hydrologic fluxes and states are needed for many hydrological applications. Because continuous large-scale observations of such variables are not feasible, hydrologic or land surface models are applied to derive them. This study aims to analyze and provide a consistent high-resolution dataset of land surface variables over Germany, accounting for uncertainties caused by equifinal model parameters. The mesoscale Hydrological Model (mHM) is employed to derive an ensemble (100 members) of evapotranspiration, groundwater recharge, soil moisture and generated runoff at high 5 spatial and temporal resolutions (4 km and daily, respectively) for the period 1951-2010. The model is cross-evaluated against the observed runoff in 222 catchments, which are not used for model calibration. The mean (standard deviation) of the ensemble median NSE estimated for these catchments is 0.68 (0.09) for daily discharge simulations. The modeled evapotranspiration and soil moisture reasonably represent the observations from eddy covariance stations. Our analysis indicates the lowest parametric uncertainty for evapotranspiration, and the largest is observed for groundwater recharge. The uncertainty of the hydrologic 10 variables varies over the course of a year, with the exception of evapotranspiration, which remains almost constant. This study emphasizes the role of accounting for the parametric uncertainty in model-derived hydrological datasets.


Introduction
Consistent, long-term data of meteorological and hydrological variables at a high spatial resolution are needed for many applications, including (i) impact assessment studies, such as for drought, flood, or climate change analysis (Sheffield and Wood, 2007;Huang et al., 2010;Samaniego et al., 2013;Kumar et al., 2016;Zink et al., 2016), and (ii) studies that need spatially and temporally continuous, observation-based datasets, e.g., for downscaling or disaggregating climate model outputs (Wood et al., 2004;Thober et al., 2014) or for establishing Ensemble Streamflow Prediction (Day, 1985) and reverse Ensemble Streamflow Prediction approaches (Wood and Lettenmaier, 2008).
Continuous observations of hydrologic fluxes and states are economically and logistically not feasible on regional to national scales (Vereecken et al., 2008).In situ soil moisture observations, for example, are scarcely available.These point-scale observations are representative for a small control volume of a few cubic centimeters.Evapotranspiration measurements at eddy covariance stations have footprints of tens to hundreds of meters but they are available at less than 1000 stations worldwide (FLUXNET, 2007).
Alternatives include remote sensing or reanalysis products such as NCEP-CFSR (Saha et al., 2010) or ERA-INTERIM (Dee et al., 2011).Hydrologic products derived from remote sensing are broadly available, but they do not consider the conservation of mass, i.e., the closure of the water balance.Moreover, these products are not spatially and temporally continuous due to reliance on cloud-free conditions (Mu et al., 2007;Liu et al., 2012).Reanalysis products, in contrast, provide continuous data but they have coarse spatial resolutions of at most 1/4 • (Dee et al., 2016), which is not suitable for regional-scale applications.
Hydrologic models driven by ground-based meteorological observations are the prime alternative to derive spatially and temporally consistent water fluxes and states at large spatial domains.For example, Maurer et al. (2002); Zhu and Lettenmaier (2007); Livneh et al. (2013); and Zhang et al. (2014) provided model-based datasets on a national scale.These data are based on the Variable Infiltration Capacity (VIC) model (Liang et al., 1994) and have, at most, a spatial resolution of 1/16 • and cover the contiguous United States, Mexico, China, and parts of Canada.Livneh et al. (2015); Newman et al. (2015a); and Newman et al. (2015b) provide data on the same domain with a focus on meteorological data.A set of four models was used in the NLDAS project to assess the water balance components over the contiguous United States (Mitchell, 2004;Xia et al., 2012a, b).Studies by Nijssen et al. (2001); Fan and van den Dool (2004); Berg et al. (2005); and Sheffield et al. (2006) focus on the global domain.The spatial resolution of these global datasets is at most 1/2 • , and many of these studies focus on meteorological forcings rather than hydrologic variables.
The resolution of the abovementioned model-derived datasets are coarse according to Wood et al. (2011), who stated a need for higher-resolution data and models for purpose of, e.g., flood and drought forecasting.Moreover, Bierkens et al. (2015) stated that water resources or river basin managers will favor highly resolved data at resolutions of 1-5 km.
The application of observational-derived model products, however, also has some limitations.First, due to a limited amount of observed variables modeling approaches, such as the estimation of potential evapotranspiration (E p ), have to be adopted to the available data.As a result, temperaturebased E p methods may be preferred to more physically based approaches (e.g., radiation based).Second, the interpolation of point observations induces uncertainties depending on the applied interpolation method.Further, small-scale, convective precipitation events may not be caught by gauging networks and lead to an underestimation in precipitation.
Furthermore, hydrological models are subject to different sources of uncertainty, i.e., input, model structural, and parametric uncertainty (Beven, 1993).All of the aforementioned uncertainties propagate to the model results and can superpose each other (Zappa et al., 2011).The overall uncertainty of hydrological models is therefore summarized as predictive uncertainty.Uncertainties are often not considered when deriving hydrological or hydro-meteorological datasets (e.g., Huang et al., 2010;Livneh et al., 2013;Zhang et al., 2014).As a result, predictive uncertainties are often not addressed but may have substantial implications on subsequent studies, as shown by Samaniego et al. (2013).Herein, we will focus on the predictive uncertainties caused by equifinal parameter sets.
The specification of model parameters, which are valid beyond catchment boundaries poses another challenge in the application of hydrologic models over large domains.Large-scale hydrologic model studies apply either parameters originating from a single catchment (Henriksen et al., 2003), filter behavioral parameters from predefined sets (Perrin et al., 2008;Hartmann et al., 2015), extrapolate or regionalize parameters or hydrological variables from observed to unknown locations (Zhu and Lettenmaier, 2007;Troy et al., 2008;Xia et al., 2012b;Zhang et al., 2014), or use an uncalibrated model (Mitchell, 2004;Hostetler and Alder, 2016).A methodology considering the calibration in individual basins for creating a set of regionalized parameters, which are later on filtered for behavioral solutions in all considered basins, could be an alternative approach.Such an approach combines all of the aforementioned strategies.
The aim of this study is to derive a model based, consistent set of national-scale hydrological data for Germany within the period 1951-2010.We address the need for highly resolved data by conducting observation-driven hydrological simulations at a spatial resolution of 4 km × 4 km (1/25 • ).Daily fields of evapotranspiration, soil moisture, groundwater recharge, and grid-cell-generated runoff as well as precipitation, temperatures, and potential evapotranspiration are made freely available.To our knowledge, such a consistent and long-term dataset for Germany has not been freely available until now.The dataset accounts for predictive uncertainties by considering a set of equifinal parameters.An parameter estimation approach for deriving a set of 100 parameters on the national scale is developed.We further aim to assess and evaluate the spatiotemporal distribution of the simulated hydrological states and fluxes as well as their uncertainties using multiple validation variables at different scales.Finally, the parametric uncertainties are analyzed regarding their explanatory variables for the simulated fluxes and their propagation between different model compartments.

Study domain and datasets
The study is conducted on the territory of Germany, which covers an area of approximately 357 000 km 2 (Fig. 1).The region, located in central Europe, is mainly characterized by a humid climate but nonetheless has north-to-south and east-to-west climatic gradients.The topography varies from low-altitude, flat areas in the north (North German Plain) over mid-altitude mountains in central Germany (Central Uplands) to the high-altitude Alpine foothills and the Alps in the south.Whereas the northwestern part of Germany is still under maritime influence, the eastern part has a more continental climate that is characterized by colder winters and less precipitation.
The assessment of water fluxes and states is restricted to the national borders of Germany because meteorological data and land-surface characteristics are available in this domain.Thus, only basins entirely covered by German territory are used to derive parameters for the hydrological model.These seven major basins are depicted in Fig. 1.These basins represent the topographic and hydro-climatic gradient within Germany (see Table 1).They range in size from 6000 to 48 000 km 2 and are characterized by mean elevations ranging from 60 m a.s.l.(Ems basin) to 560 m a.s.l.(Danube basin).All basins have a comparable degree of urbanization ranging between 6 and 10 %.A remarkably low amount of forest is observed in the Ems basin, where agriculture and pasture are the dominant land use.
Due to different climatic regimes the average streamflow of the seven basins ranges from 161 to 469 mm a −1 .The low-lying Ems reaches a remarkably high discharge due to maritime influence, whereas the Saale River is characterized by the lowest streamflow.The runoff coefficient of the Saale differs significantly from the other basins, which originates from the high degree of anthropogenic influence within this basin; 3 of the 10 largest dams in Germany are located there (Bleiloch -215 million m 3 , the Hohenwarte -182 million m 3 , and the Rappbode reservoir -109 million m 3 ).Furthermore, open-pit mining has a large influence on the water budget of this basin.

Land surface properties
The land-surface characteristics required by the hydrologic model include a 50 m digital elevation model (DEM) acquired from the Federal Agency for Cartography and Geodesy (Federal Agency for Cartography and Geodesy , BKG), a digitized soil map at a scale of 1 : 1 000 000 (Federal Institute for Geosciences and Natural Resources , BGR), and a hydrogeological map at a scale of 1 : 200 000 (Federal Institute for Geosciences and Natural Resources , BGR).The soil map contains information on soil textural properties, such as the sand and clay contents of different soil horizons.The soils are classified into 72 soil types and have an average depth of 1.8 m.The hydrogeological map comprises 23 classes and gives information about saturated hydraulic conductivities and karstic areas.Based on the DEM, additional information, such as the slope, aspect, flow direction, and flow accumulation, are inferred.Land cover information is derived from CORINE land cover scenes of the years 1990, 2000, and 2006 (European Environmental Agency , EEA).The period prior to 1990 is assumed to be static and is represented by the scene of 1990.All datasets are remapped to a common spatial resolution of 100 m × 100 m using a nearest neighbor approach.
The location and shape of the major basins (Fig. 1) are derived via an automated delineation, which is based on gauging station and terrain information (flow accumulation and flow direction).Streamflow data are provided by the European Water Archive (EWA) and the Global Runoff Data Centre (GRDC).The results of the delineation are approved via comparison with the CCM River and Catchment Database (European Commission -Joint Research center , JRC; Vogt et al., 2007).In addition to the seven major basins (as described above), the model is set up in 222 additional, smaller basins to cross-validate the model performance.

Meteorological forcings
The hydrologic model is forced with daily fields of precipitation and minimum, maximum, and average temperature.They are derived from local observations operated by the national weather service (Deutscher Wetterdienst, DWD).The station network comprises, on average, 3800 rain gauges and 570 climate stations per year (period: 1951-2010), which have an average minimum distance of 6 and 14 km between neighboring stations, respectively.
These local observations are interpolated on a regular grid of 4 km × 4 km using external drift Kriging.The terrain elevation (DEM) is used as the external drift, and the Kriging weights are based on a theoretical variogram.The variogram is estimated for all of Germany by fitting to an empirical variogram (see Appendix A1).To avoid discontinuities in the interpolated meteorological forcings and consecutively in the hydrologic simulation, an estimation of multiple variograms for different climatic zones or distinct mor-  (Hargreaves and Samani, 1985), using interpolated temperatures (average, minimum, and maximum).
The interpolation of the precipitation is evaluated with gridded precipitation data (REGNIE) provided by the German Meteorological Service (Deutscher Wetterdienst, DWD; Rauthe et al., 2013).The REGNIE data are based on the same observations and have a spatial resolution of 1 km.They are derived by applying a multiple linear regression approach, which accounts for daily atmospheric conditions and terrain properties, such as elevation, slope, and aspect (Rauthe et al., 2013).After remapping the REGNIE data to the aforementioned 4 km × 4 km grid by bilinear interpolation, a satisfactory correspondence between the interpolation and the REG-NIE precipitation data is found (see Samaniego et al., 2013).The spatially averaged bias of the daily fields is 0 with a standard deviation of 0.11 mm d −1 within the period 1951-2010.
3 Methodology 3.1 The mesoscale Hydrological Model mHM mHM (www.ufz.de/mhm) is a distributed hydrologic model that accounts for the following main processes: snow accumulation and melting, evapotranspiration, canopy interception, soil water infiltration and storage, percolation, and runoff generation.These processes are conceptualized as water fluxes between internal model states similar to existing models, such as HBV (Bergrström, 1976) or VIC (Liang et al., 1994).Snow accumulation and melting processes are based on the improved degree-day method, which accounts for increased snow melting during intense rainfall events (Hundecha and Bárdossy, 2004).A three-layer discretization is used to account for the processes that represent the rootzone soil moisture dynamics.The two upper layers end in 0.05 and 0.25 m, and the lowest layer is spatially variable in depth depending on the soil map.On average, the lowest layer is 1.8 m deep in Germany.The evapotranspiration from soil layers is estimated as a fraction of the potential evapotranspiration depending on the soil moisture stress and the fraction of vegetation roots present in each layer.The runoff generation in mHM is formalized as the sum of the direct runoff, slow and fast interflow, and baseflow components.The runoff generated at every grid cell is routed to the outlet using the Muskingum-Cunge algorithm.For a detailed model description, interested readers may refer to Samaniego et al. (2010) and Kumar et al. (2013b).To date the model has been successfully applied to various river basins across Europe (including Germany), the USA (Kumar et al., 2010;Samaniego et al., 2013;Kumar et al., 2013a;Thober et al., 2015;Rakovec et al., 2016;Zink et al., 2016), and worldwide (Samaniego et al., 2016).
A feature that is unique to mHM is its technique for estimating effective model parameters: Multiscale Parameter Regionalization; Samaniego et al., 2010;Kumar et al., 2013b).Its basic concept is to estimate parameters (e.g., soil porosity) based on physiographic properties (e.g., sand and clay content) and transfer functions (e.g., pedotransfer functions).These transfer functions depend on transfer or global parameters (e.g., factors of the pedotransfer functions) that are time invariant and location independent.For the domain of Germany, 68 global parameters were purpose to an automated calibration (described in Sect.3.2).An overview of the global parameters and the resulting effective model parameters can be found in the Supplement.
This regionalization of model parameters is conducted at the high-resolution land surface property input, e.g., 100 m × 100 m.In a second step these parameters are subsequently upscaled to the user-specified resolution of Hydrol.Earth Syst.Sci., 21, 1769Sci., 21, -1790Sci., 21, , 2017 www.hydrol-earth-syst-sci.net/21/1769/2017/ the hydrologic simulations, e.g., 4 km × 4 km, by applying parameter-specific upscaling rules (Samaniego et al., 2010).This procedure yields in effective parameter values (e.g., soil porosity), which are used for the simulation of hydrological processes (e.g., soil water retention).Thus, the effective parameters account for the sub-grid variabilities of land surface properties, such as terrain or soil information.

Derivation of representative parameter sets
One of the goals of this study is to derive consistent model parameters to perform nationwide simulations of water fluxes and states.A two-step parameter selection procedure was used for this purpose.In the first step, we estimate 100 sets of global parameters via calibration in each of the seven inner German river basins (Fig. 1) independently.
In the next step, we transfer these calibrated parameter sets to the remaining basins.The parameter sets exceeding a Nash-Sutcliffe model efficiency of 0.65 (NSE ≥ 0.65) in all seven basins during the evaluation period  are retained.This parameter selection procedure ensures that the resulting ensemble parameter sets do not exhibit spatial discontinuities at basin boundaries.
The calibration is performed using the dynamically dimensioned search (DDS) algorithm (Tolson and Shoemaker, 2007).The objective function for calibration consists of an equally weighted power-law function for the NSE (Nash and Sutcliffe, 1970) of the streamflow and the logarithm of the streamflow to consider high and low flows within the objective function.A compromise programming technique (Duckstein, 1984) using a power law with an exponent p = 6 is used to estimate the multi-objective function ( ).This technique ensures equal improvement of the different measures φ i during a multi-objective calibration.The overall objective function is given as with where w i is the weight (w 1 = w 2 = 0.5) for a particular measure φ i , Q t and Q t denote the modeled and observed streamflow at a time step t, respectively.Q is the mean of the observed streamflow over all time steps T .
A period of 5 years from 2000 to 2004 is chosen for model calibration.This time period reflects various hydrologic conditions ranging from a high-impact flood event in central Europe in August 2002 to a significant drought event in 2003.The remaining 35 years of available data  are used for model evaluation.All simulations are conducted with a 5-year spin-up period to abrogate the influence of initial conditions.
In total, 100 independent calibration runs are performed for each of the seven basins (Fig. 1).Using 2000 model iterations per calibration run led to a large number of model evaluations per basin (200 000).Finally, 100 of the 700 parameters sets are retained to derive nationwide ensemble simulations of water fluxes and states at a daily resolution.

Validation data
In addition to streamflow in the seven major German river basins, the model performance is evaluated against streamflow in 222 additional basins and complementary datasets including evapotranspiration, soil moisture, and groundwater recharge.The cross-validation of ensemble parameter sets in basins that have not been used for parameter inference should prove the ability of the model to satisfactorily estimate streamflow in various regions of Germany with differing hydrologic characteristics.
The basins for cross-validation are distributed all over Germany and range in size from 100 to 8500 km 2 .A detailed characterization of these basins is given in Table S3 in the Supplement.A subset of these basins contains subbasins of seven major basins.The simulation time period is adopted for the available streamflow observations but is at least 10 years.The mean simulation time period of all 222 basins is 42 years.The streamflow estimation in these basins is evaluated using the ensemble median NSE, and its uncertainty is characterized by the range between the 5th and 95th percentiles of NSEs of the ensemble simulation.
Local evapotranspiration observations are available at seven eddy covariance towers located in Germany (Fig. 1, www.europe-fluxdata.eu).Carbon and water fluxes, as well as all components of the energy balance, latent heat (or evapotranspiration E a ), sensible heat H , ground heat flux G, and net radiation R n , are measured at the towers.The energy balance is, however, often not closed at the towers (Foken, 2008;Leuning et al., 2012) so that the observed fluxes usually underestimate the real values, which needs to be corrected before comparison with a model conserving the water balance.We apply a correction to the observed fluxes similar to Kessomkiat et al. (2013).The corrected evapotranspiration values at the eddy sites are compared with the corresponding model estimates based on the root mean squared error (RMSE), the Pearson correlation coefficient (ρ), and the bias.
Additionally, soil moisture observations, undertaken at eddy covariance stations, are used to evaluate modeled soil moisture.Soil moisture is measured using Time-Domain Reflectometer (TDR) or Frequency-Domain Reflectometer (FDR) sensors, which have a control volume of a few cubic centimeters.This is much smaller than the model resolution of 100 m × 100 m.A direct comparison between observed and simulated soil moisture may therefore be mislead-ing due to differences in spatial representativeness and sampling depth.Here we aim to analyze the temporal dynamics of soil moisture by normalizing the respective soil moisture time series (Koster et al., 2009).The anomalies are calculated as where µ is the mean and σ is the standard deviation of the entire soil moisture (SM) time series at a daily resolution.It is not possible to use deseasonalized values (normalization with monthly values) because the time periods of the available observations are too short (≈ 6 years).The modeled soil moisture is defined herein as the fraction of porosity, i.e., the soil water content divided by porosity.
The mHM simulation for comparing the observations at the location of the eddy covariance stations is conducted with deactivated lateral processes on a single grid cell.The model resolution (100 m × 100 m) is adapted to the size of the footprint of the energy flux measurements, which is typically several tens to hundreds of meters.Rather than downscaling the model results, the hydrologic processes are modeled at the resolution of the observations.The transferability of mHM across scales is presented in Samaniego et al. (2010) and Kumar et al. (2013b).
The model is evaluated with spatially distributed data, i.e., evapotranspiration and groundwater recharge, additionally to the evaluation of the model at the point or local scale.A remote-sensing-based dataset is used for evaluating the monthly modeled evapotranspiration between 2001 and 2010.For this purpose we used the gridded evapotranspiration (E a ) dataset based on the Moderate Resolution Imaging Spectroradiometer (MODIS), which was acquired from the Numerical Terradynamic Simulation Group at the University of Montana (Mu et al., 2007(Mu et al., , 2011)).The spatial resolution is approximately 5 km × 5 km (0.05 • ), which is close to the model resolution of 4 km × 4 km.The evapotranspiration estimates are based on the Penman-Monteith energy balance equation using global daily temperature, actual vapor deficit, incoming solar radiation as well as remotely sensed leaf area index, fraction of photosynthetic active radiation, albedo, and land cover characteristics.The meteorological variables are based on the reanalysis product from the Global Modeling and Assimilation Office, whereas vegetation products are derived from MODIS.Interested readers may refer to Mu et al. (2007Mu et al. ( , 2011) ) for a detailed description of the MODIS E a product.
As a second spatial dataset, we utilize a long-term estimate of annual recharge over Germany .Due to the lack of observations, the estimated recharge from the Hydrologic Atlas of Germany (Federal Ministry for the Environment Nature Conservation Building and Nuclear Safety, 2003) is taken here as a reference.This recharge estimate is obtained using a multiple regression model accounting for long-term-estimated generated runoff, depth of the ground-water table, and regionalized baseflow indices (Neumann and Wycisk, 2003).The regionalized baseflow indices are estimated with a linear regression based on the ratio between direct runoff and total runoff as well as terrain properties, such as slope and land cover among others.Due to the various assumptions and mathematical fittings behind this recharge estimate, it is taken as an indication for model evaluation rather than an evidence.The gridded recharge estimate is available at a 1 km × 1 km spatial resolution, which is remapped to a 4 km × 4 km resolution using bilinear interpolation to be comparable to the model estimates.

Uncertainty of ensemble model simulations
The uncertainty of the modeled evapotranspiration, groundwater recharge, grid-cell-generated runoff, and soil moisture is assessed by two different criteria.First, the spatially distributed uncertainties are presented as maps showing the coefficient of variation c v , which is defined as in which µ is the mean and σ the standard deviation of the ensemble simulations.A large c v describes a large variation in the modeled flux or state normalized with µ. µ and σ are derived from the 100 ensemble realizations of the hydrologic model mHM on every grid cell.The variances within the ensemble simulation are caused by predictive uncertainties.These uncertainties stem from the parametric uncertainty itself and from the transfer of parameters to locations that have not been used for model calibration.In the following, the variations of the ensemble simulations are denoted as uncertainty.
Second, to assess the temporal variation of the uncertainty throughout a year, the range and normalized range of the respective flux or state are considered.The range is defined as the difference between the 5th (p 5 ) and 95th (p 95 ) percentiles of the ensemble simulation, whereas the normalized range is defined as where p 50 (x) denotes the median value of the ensemble simulation (50th percentile).The 5th and 95th percentiles are chosen to exclude potential outliers from the analysis.

Results and discussion
The model simulations are evaluated against multiple variables available at different spatial and temporal resolutions.recharge map.mHM simulations are carried out at an hourly timescale at two spatial resolutions, i.e., 100 m × 100 m at the eddy covariance stations and 4 km × 4 km at the basin level and for the nationwide ensemble simulations.Finally, an analysis of the model runs for the nationwide water fluxes and states, including grid-cell-generated runoff (Q G ), evapotranspiration (E a ), groundwater recharge (R), and soil moisture (SM), is presented.The focus here is to provide a comprehensive overview of regional-scale water fluxes and states over Germany and analyze the uncertainty in modeled variables due to an ensemble of model parameters.The uncertainties are investigated with respect to their temporal and spatial distributions and their triggering sources.Finally, the interaction of uncertainties through the different model states and fluxes is analyzed.

Streamflow evaluation in major German river basins
In this section we present the evaluation of mHM simulated streamflow with observations in terms of NSEs at daily and monthly timescale for a validation  and a calibration (2000)(2001)(2002)(2003)(2004) period.Additionally, we show the hydrographs resulting from the ensemble parameter sets in comparison with observed streamflow.
The daily streamflow dynamics in the major German basins is satisfactorily captured by the model revealing a mean NSE of 0.89 and 0.84 using the on-site calibrated parameters in the calibration and validation periods, respectively (white boxes in Fig. 2a and b).The model performance is lower during the validation period in comparison to the calibration period.Such a deterioration of model per-formance, which is common to other hydrological model applications, is caused by differences in hydro-meteorological regimes between the calibration and validation periods (Merz and Blöschl, 2004;Merz et al., 2011) and constraining (overfitting) of the parameters to compensate for errors in the model structure (Clark and Vrugt, 2006).Using the on-site calibrated parameter sets, the model exhibited improved performance for monthly streamflow simulations with an average median NSE of 0.97 and 0.92 during the calibration and validation period, respectively (white boxes in Fig. 2c and d).
The ensemble parameter sets, which are depicted as the gray boxes in Fig. 2, also reveal appropriate model performance.The median NSE corresponding to the ensemble parameter sets is 0.80 for daily streamflow in the validation period averaged across the seven basins.The median NSE of the ensemble parameters drops by approximately 6 % compared to that of the on-site estimated parameters.This loss is reasonable considering that the ensemble parameter sets are a compromise solution, which should perform well across all seven basins (see Sect. 3.2).The performance loss can be attributed to changes in the specific basin climatic and land-surface conditions including terrain, soil, and vegetation properties.
Changes in the predictive uncertainty corresponding to on-site and ensemble parameter sets are assessed using the range of model performance.The spread of NSEs for the monthly streamflow is considerably narrower compared to the daily flows (Fig. 2).The high temporal variability of the daily streamflow is smoothed when averaged over a longer (monthly) timescale leading to an overall better correspondence between observed and simulated flows.The ranges of NSEs corresponding to the 100 on-site and ensemble parameter sets are comparable across the investigated basins with exception of the Main and Danube basins.In these two basins the ensemble parameter sets provided a relatively larger range of NSEs.The relatively higher spread in the NSE in those basins is likely to stem from the fact that different basins are sensitive to different parameters.For example, the Ems basin, located in the maritimeinfluenced north, is not as sensitive to snow parameters as the alpine-influenced Danube basin.Consequently, parameters that originate from the Ems basin potentially deteriorate ensemble predictions in the Danube basin.A simultaneous calibration of multiple, distinct basins would be beneficial for deriving hydrological fluxes and states at national or continental scales.
Examples of the modeled streamflow time series are given in Fig. 3.In general, the model is able to adequately capture the discharge dynamics across the investigated basins.A relatively lower model skill in capturing the discharge dynamics in the Saale basin can be attributed to heavy human interactions.The highly regulated streamflow in the headwaters of the Saale (see Sect. 2) is difficult to capture and thus leads to lower performance because mHM includes no reservoir operation.The main discharge mechanisms of Saale are considered to be adequately captured because the median NSEs are exceeding 0.85 and 0.7 at the monthly and daily resolutions for the ensemble parameter sets, respectively (Fig. 2).
Interestingly, this basin shows equal or higher performance for the ensemble parameter sets compared to the onsite parameter sets in the evaluation period.A similar behav-ior can be observed for the Weser basin.We conclude that streamflow simulations in some basins improve by gaining knowledge from remote locations.
The Mulde basin has a tendency to underestimate peak flows (Fig. 3).This could be attributed to the precipitation product.The headwaters of the Mulde basin are located in the Ore mountains at the border between Germany and the Czech Republic (Fig. 1).In addition to a sparse network of rain gauges in these mountainous area, a lack of information on meteorological variables from the neighboring country (i.e., the Czech Republic) leads to an underestimation of precipitation in the interpolation process, especially for orographicdriven events.The model performance for the Mulde is comparably superior to those found by other studies, such as Fleischbein et al. (2006) or Huang et al. (2010).
The results presented in this section show that the method for determining ensemble parameter sets (Sect.(2010).A further investigation of the applicability of the ensemble parameter sets on additional, smaller basins is shown in the following section.

Streamflow evaluation at non-calibrated basins
Following Klemeš (1986) (Schreiber, 1904;Ol'dekop, 1911;Budyko, 1974).cal location.The streamflow data of these proxy locations have not been used during the model calibration.This crossvalidation test focuses on evaluating the model performance against streamflow simulations along a diverse range of climatic and land-surface conditions.The evaluations shown in Fig. 4 indicate a satisfactory agreement between simulations and observations.The daily streamflow simulations (Fig. 4a,  b) reveal a median NSE value of at least 0.5 across the investigated basins based on the ensemble parameter sets.The overall average NSE value is 0.68.As expected, the model exhibits better skill in capturing monthly streamflow dynamics, with an ensemble median NSE averaged across all basins of approximately 0.81 (Fig. 4d, e).Furthermore, the ensemble median NSE exceeded a value of 0.75 in more than 20 % of the basins for the daily flows and 80 % for the monthly flows.The spatial variability of the median NSE across the investigated basins is low with a standard deviation of approximately 0.09 for both daily and monthly flows.
To illustrate different climatic regimes of the 222 basins, we make use of the dryness index E p /P (Budyko, 1974).Various studies describe the relationship between the dryness and evaporative index E a /P (Schreiber, 1904;Ol'dekop, 1911;Budyko, 1974;Gerrits et al., 2009) and span an uncertainty band around Budyko's curve.The model performance of the 222 basins is plotted in panels (a) and (d) of Fig. 4 using these indexes.It separates the basins into energy-(E p /P < 1) and water-limited conditions (E p /P > 1).The simulated evapotranspiration E a is used to derive the Budyko plot to identify potential errors in the water balance closure (Fig. 4a, d).All basins under investigation lie perfectly within the uncertainty ranges of the reported theoretical curves.Please note that energy-limited basins are closer to the lower uncertainty line of the reported curves, whereas water-limited basins tend to the upper curve.In consequence basins with energy limitation tend to underrepresent the original Budyko curve and develop to overrepresentation for water-limited locations.In conclusion, the water balances of those basins are well closed, with a mean closure error of 1 % for the median simulation.The performance is comparable for basins in different climatic regimes.Such behavior is not obvious as studies such as Newman et al. (2015b) and Xia et al. (2012a) found a significant dependency on the climatic regime.However, a tendency to perform better in large basins is observed.A similar conclusion was drawn by McMillan et al. (2016).
land-surface characteristics confirms the validity of the derived ensemble parameters for the national scale.In contrast, Newman et al. (2015b) and McMillan et al. (2016) observed significant dependencies between model performance and basin characteristics, such as aridity or basin area.
The uncertainty for the individual basins caused by the ensemble parameter sets is expressed as the range between the 5th and 95th percentiles of the NSE (Fig. 4c, f).Substantial performance differences occur in 70 % (45 %) of the basins exceeding a range of 0.1 NSE for the daily (monthly) flow simulations.A geographical dependency of the uncertainty cannot be found as no spatial clustering is observed.Whereas daily flows show almost no relation between median NSEs and the uncertainty range, i.e., worse performing basins reveal high uncertainties, the monthly NSEs show less uncertainty if the corresponding model performance is high.
The evaluation of the ensemble parameter sets presented in this section supports the hypothesis that the ensemble parameter sets are valid on the national scale.Studies such as Perrin et al. ( 2008 2016); and Hostetler and Alder (2016) validate their models based on streamflow over a large sample of basins and observed similar or lower NSEs.In the following section, evapotranspiration, soil moisture, and groundwater recharge estimates are evaluated.

Evapotranspiration and soil moisture evaluation at eddy covariance stations
The ensemble model simulations are further evaluated with the evapotranspiration (E a ) and SM observed at seven eddy covariance stations (Fig. 1) to assess the model's ability to represent other fluxes and states next to streamflow.The en-semble median of the daily sum of evapotranspiration is plotted against the corresponding observations in Fig. 5, and the resulting error metrics are summarized in Table 2.
The scatter plots shown in Fig. 5 indicate no systematic over-or underestimation of the observed evapotranspiration.The highest deviation in terms of RMSE is observed during summer, when the highest fluxes occur, and the lowest during winter, in which the contribution of E a is lowest among all seasons.The average bias estimated across all stations during spring is 0.34 mm d −1 , whereas it is 0.08, 0.04, and 0.04 mm d −1 for winter, summer, and autumn, respectively.The slight overestimation of the modeled E a during spring is likely caused by the lack of a dynamic vegetation growth module in mHM.Thus, the onset of the vegetation period may not be captured adequately by the model.With respect to the vegetation class, the stations E1 and E6 covered by crops have the largest errors, with E a RMSEs of 19.4 and 15.4 mm month −1 for monthly evapotranspiration, respectively (Table 2).These errors arise because of the high impact of human interactions on croplands, e.g., due to seeding, harvesting, or irrigation, compared to other vegetation classes.Additionally, the land cover class cropland is not explicitly represented within the model; rather, it is generalized within a mixed land cover class, representing all land cover types different from sealed and forest.Varying goodness of fit for different land covers and seasons for evapotranspiration at eddy flux towers were found for the four land surface models used in NLDAS (Xia et al., 2015) and thus are not uniquely observed for mHM.
In general, errors of local evapotranspiration estimates can be attributed to limitations of the Hargreaves-Samani approach for estimating the potential evapotranspiration.This approach may be inappropriate for local weather conditions.2).The four stations are chosen because they represent the two major mHM land cover classes (forest and mixed) and have 3 consecutive years of data without significant data gaps.Further the four station are spread over the three regions where eddy covariance observations are available.The solid dark gray line depicts the median model results and the light gray band depicts the range between the 5th and 95th percentile of the 100 ensemble simulations.
Because this method approximates the net radiation based on the minimum and maximum daily temperatures, local phenomena such as short-term cloudiness, e.g., due to convective precipitation cells, are not accounted for.This effect is especially high in summer, which causes the lowest correlations between observations and simulations during this period.Unfortunately, only temperature-based methods are supported by the available input data.Please notice that the observational error caused by the energy balance closure gap is, on average, 33 % for the herein considered stations before applying the abovementioned mathematical corrections.
In terms of temporal dynamics, the model is able to capture the observed evapotranspiration quite well across the different eddy covariance sites, as exemplarily shown in the upper panel of Fig. 6.The model is able to adequately represent the observed monthly dynamics with an average correlation of approximately 0.93 (Table 2).The correlation between the observed and the simulated daily evapotranspiration is at least 0.77, with the exception of the cropland site E1.
The lower panel of Fig. 6 shows the performance of mHM in representing the daily soil moisture anomalies, which are generally in good correspondence with observations.The temporal dynamics of observed soil moisture anomalies during the wetting and drying phases are well captured by the model.The resulting correlation shown in Table 2 at different eddy stations ranges between 0.53 and 0.93.These correlations were similar to those of other studies, such as Cai et al. (2014).The lowest values are observed at cropland sites, which is due to the abovementioned human interaction and land cover class representativeness.The amplitude of the observed soil moisture anomalies is adequately captured by the model.Still, some peaks are not reproduced satisfactorily, which could be due to the non-representativeness of the 100 m × 100 m model grid cell for TDR/FDR soil moisture measurements.Thus, the simulated soil moisture is smoother compared to the observation because it represents the effective soil moisture of the entire grid cell.

Evaluation with spatially distributed data
In this section, we present results of the model skill in representing gridded fluxes over the entire German domain.The first comparison is conducted for the assessment of reproducing the monthly fields of modeled E a against the remotely sensed MODIS product.The results are summarized in Fig. 7 in terms of three key metrics: relative bias, correlation, and RMSE.The analysis is conducted using the ensemble mean of E a from the 100 model simulations.The modeled E a is able to adequately capture the spatiotemporal features of the MODIS derived product with the majority of grid cells (74 %) having a relative absolute bias of less than 10 %.Notable differences among these two evapotranspiration datasets are appearing in lowland areas along the Danube River basin in southern Germany, where the modeled E a exhibited a dry bias compared to MODIS.An opposite trend of positive bias in modeled E a is observed for grid cells lying  along the coastal region in northern Germany.The temporal correspondence between both evapotranspiration datasets is also remarkably high with an average Pearson correlation coefficient of 0.96 (standard deviation 0.02).Notably, both evaporation datasets exhibit pronounced seasonal variability leading to a high temporal correspondence between them.
The second assessment evaluates the modeled groundwater recharge with long-term annual values from the Hydrologic Atlas of Germany (HAD) (Federal Ministry for the Environment Nature Conservation Building and Nuclear Safety, 2003).mHM's long-term recharge estimate implicitly represents the baseflow component of the total runoff based on the assumption that the underground basin is closed and that there are no external losses (e.g., irrigation or pumping).Consequently, this analysis serves as a proxy for assessing the model skill for partitioning the total runoff into interflow and baseflow.The comparison of the spatial pattern of the recharge shows good accordance between the two maps with a correlation coefficient of approximately 0.8 (Fig. 8).The spatial pattern of the recharge follows the known climatology of Germany with high recharge rates being observed in areas with high precipitation amounts (e.g., Alps -region 11 in Fig. 10).
There are some significant differences between the modeled and HAD groundwater recharge, particularly at cells characterized by urbanization (i.e., Munich, Hamburg, Berlin, and the metropolises of Ruhrgebiet in the northwest).The model tends to underestimate the HAD recharges, with differences as high as approximately 200 mm a −1 .Notably, the herein used version of mHM treats sealed areas as almost impermeable, which is unrealistic.This issue has been resolved in recent mHM versions (5.0 and higher).In general, the HAD estimate of recharge is, on average, 31 mm a −1 higher compared to the ensemble mean simulation.This mis- match arises from the differences in potential evapotranspiration (E p ), which were used for both estimates.The E p estimates used for the HAD (Federal Ministry for the Environment Nature Conservation Building and Nuclear Safety, 2003) are lower than those used for mHM simulations and result in higher water amounts remaining in the underground.Besides these mismatches, the spatial pattern of the modeled groundwater recharge compares well with the HAD estimates (Fig. 8).

Spatial patterns of ensemble means and uncertainties
The estimated evapotranspiration (E a ) and grid-cellgenerated runoff (Q G ), as well as their uncertainty, which is expressed as the coefficient of variation of the ensemble simulations, are presented in Fig. 9.In addition to these simulation results, Fig. 9 shows the mean annual precipitation, dryness index, and land surface properties, i.e., porosity and dominating land cover type.Thus, Fig. 9 is used to analyze the spatial patterns of uncertainty and their main causes.
The high precipitation amounts above 1000 mm a −1 in panel (a) correspond to mountainous areas in Germany.The driest region is located in the northeastern part of Germany.This is, on the one hand, due to its distance to the sea (continental climate) and, on the other hand, due to the Central Uplands in the western and central part of Germany.These mountains, especially the Harz mountains (central Germany), capture most of the precipitation events brought from the west.The low amounts of precipitation in the east lead to lower amounts of evapotranspiration (Fig. 9b) and grid-cell-generated runoff (Fig. 9c) in this region compared to the rest of Germany.Thus, the northeastern part of Germany is characterized by high dryness indexes of 1.2 and above.The uppermost dryness indexes up to 1.4 are located in the lee of the Harz mountains.The average dryness index in Germany is 0.98.Another region characterized by high dryness indexes is the Upper Rhine Valley, which is known have a locally warmer climate compared to its neighboring regions.Mountainous regions are characterized by stronger energy limitation due to high precipitation amounts, which results in dryness indexes lower than 0.65.
The spatial distribution of the uncertainty, i.e., the coefficients of variation (see Sect. 3.4), of the grid-cell-generated runoff (Fig. 9g) is mainly governed by the dryness index (Fig. 9d).The Spearman rank correlation between both variables is 0.92.The uncertainty patterns of evapotranspiration (Fig. 9f) have a closer relation to soil textural properties, i.e., porosity (Figure 9e), with a Spearman rank coefficient of 0.58 as compared to the dryness index (rank correlation = 0.28).Locations of high uncertainty in E a , e.g., northern Germany, correspond to regions of high porosities.Within this region, soils are dominated by sand and are highly conductive, which results in low water holding capacities.The modeled evapotranspiration is highly dependent on the soil parameterization because soil water is the main source of evaporative water.In contrast, the uncertainty patterns of grid-cell-generated runoff, e.g., (Q G ) in the northeastern part of Germany and the Upper Rhine Valley, correspond to high values in the dryness index in those regions.
In conclusion, the spatial distribution of the uncertainty in evapotranspiration is influenced by the parameterization of the soil, whereas the runoff uncertainty pattern is dominated by the dryness index.The patterns appearing in the evapotranspiration and grid-cell-generated runoff at the location of big cities (orange areas in panel (h) of Fig. 9) are caused by the abovementioned old representation of sealed areas for mHM versions prior to 5.0.

Spatiotemporal distribution of uncertainties
This section focuses on the spatiotemporal differences of uncertainties caused by the 100 ensemble parameter sets.Figure 10 shows the climatological dynamics and the normalized ranges (see Sect. 3.4) of the respective variables, i.e., evapotranspiration (E a ), SM, groundwater recharge (R), and grid-cell-generated runoff (Q G ).The rows refer to different environmental zones in Germany (Federal Environmental Agency, 2005), which are depicted in the upper right corner of Fig. 10.For comprehensibility only a selection of five environmental zones is depicted therein, representing the region of high dryness indexes in the north (zone 2), central Germany including Central Uplands (zones 4 and 9), the foothills of the Alps (zone 10), and the Alps (zone 11).
The magnitude of the evapotranspiration uncertainty, i.e., the uncertainty range, is lowest among the four variables.Evapotranspiration is estimated by scaling the potential evapotranspiration with the water availability in several reservoirs, i.e., the interception storage, the surface ponds in sealed areas, and the soil moisture.Notably, most of the areas in Germany are characterized by humid and continental climate where the E a is constrained by available energy.
The evapotranspiration is thus mainly driven by the potential evapotranspiration.A relatively large uncertainty in soil moisture does not directly propagate to evapotranspiration uncertainty.The highest uncertainties are observed for the groundwater recharge.This model's internal variable is neither closely related to the model input as E a nor indirectly constrained by calibration as the generated runoff.In consequence, its uncertainty is highest among the four variables.
The evapotranspiration uncertainty shows almost no dynamics during the course of the year.In contrast, the uncertainty in recharge and runoff changes significantly during the course of the year.Whereas the dynamics of the groundwater recharge and its uncertainty are positively correlated, the correlation for soil moisture and its uncertainty is negative.Thus, the recharge uncertainty is the lowest for low recharge values, which occur in summer when the subsurface reservoirs are comparably dry.The low amplitude of the soil moisture uncertainty is reasoned in the high persistence of soil moisture.Regions of high porosity and low dryness indexes in northern Germany have more distinct dynamics compared to southern locations.The uncertainty of the generated runoff is a composite of the dynamics of soil moisture and recharge and thus shows the distribution of water among the model's internal reservoirs.

Summary and conclusion
In this study, we present the derivation and evaluation of a high-resolution (4 km × 4 km) dataset of hydrologic and meteorological fluxes and states for Germany covering the period 1951-2010, which is freely available.The dataset incorporates 100 spatially consistent ensemble simulations, which are analyzed regarding their uncertainty caused by the parameter estimation.The parameter sets of the ensemble simulations are determined by a two-step parameter selection method.The model is calibrated in seven basins, and the parameter sets are filtered based on the cross-validation results in all of the basins.Thus, the uncertainty is composed of the uncertainty in parameter estimation and the uncertainty stemming from transferring these parameters to remote locations.The ensemble simulations are evaluated with streamflow, evapotranspiration and soil moisture observations, and recharge data.
A comparable study by Newman et al. (2015a) focuses on the provision of a 100 member ensemble dataset, which is focusing on meteorological variables for major parts of North America.Similar to the study presented herein they evaluate the data in a large sample of basins, i.e., 671.We, however, conclude that 100 realizations is an appropriate sample size for an uncertainty assessment study.
The evaluation regarding streamflow at 222 additional basins revealed a median NSE of 0.68.Thus, the 100 ensemble parameter sets are considered to be representative for Germany.The evaluation with evapotranspiration from eddy covariance stations showed deficiencies in mHM.Especially in spring, deviations of the modeled and observed The second part of the study focuses on the uncertainty of the simulated hydrological fluxes and states due to uncertainties in parameter estimation.It is shown that uncertainty varies in time, location, and magnitude between hydrological variables.Among all of the variables, the uncertainty was the lowest for evapotranspiration and the highest for groundwater recharge.The spatial distribution of runoff uncertainty is closely related to the spatial distribution of the dryness index.In contrast, the uncertainty patterns of evapotranspiration estimates are mostly connected to soil properties.In general, the highest uncertainties occur in the northeastern part of Germany, which is characterized by low precipitation amounts and high soil porosities.The temporal variation of uncertainties is almost constant for evapotranspiration, medium for grid-cell-generated runoff and soil moisture, and high for groundwater recharge and depends on geographical location.
Based on these results we suggest incorporating additional data, e.g., in situ soil moisture or satellite observations, into the calibration procedure to better constrain the model's internal states.The results of this study emphasize the importance of the considering parametric uncertainty for historical analysis, nowcasting, and forecasting in hydrology.
Appendix A: Interpolation of meteorological data

A1 Variogram estimation
The variogram for the German domain is estimated based on two different approaches.In the first approach regionalized variograms for rectangular sub-domains (blocks) were estimated (Fig. A1).The interpolation of meteorological variables based on these regionalized variograms, however, lead to discontinuous fields of these meteorological variables.This result contradicted the aim of deriving seamless fields of hydro-meteorological fluxes and states for entire Germany.As a result, continuous meteorological interpolations have been the prerequisite for the next approach.In the second approach, a compromise variogram for entire Germany is estimated by considering all available data from all meteorological stations, e.g., approximately 5700 stations for precipitation, for the estimation of an empirical variogram.An exponential, theoretical variogram is fitted to this empirical variogram.The fitted variogram curves of both methods are presented exemplarily for precipitation in Fig. A1.The empirical variogram is well represented by the theoretical variogram with a RMSE of 0.02.The consecutive estimation of meteorological fields is based on the second approach using a compromise variogram for Germany.

A2 Interpolation error
The interpolation error was assessed by a leave-one-out strategy, i.e., the Jackknife method.This cross-validation informs about the ability of the external drift Kriging to estimate meteorological variables at locations where observations are available.The algorithm is as follows: 1. exclude one station from the set of observations; 2. estimate the meteorological time series at this location using external drift Kriging; 3. compare the interpolated time series with the observation and assess the interpolation error at each station; 4. interpolate the Jackknife-error estimates over the domain of Germany, using ordinary Kriging to obtain error maps for visualization purposes.
The error at each station is characterized by the bias, relative bias, RMSE, and Pearson correlation coefficient (Fig. A2).Exemplarily we present the errors of the precipitation interpolation because this variable has the highest spatial and temporal variability among the interpolated variables (precipitation; minimum, maximum, and average temperature).The average and the standard deviation for the different errors assessments over all stations are 0.01 and 0.15 mm d −1 for the bias, 0.64 and 5.60 % for the relative bias, 0.93 and 0.03 for the Pearson correlation coefficient, and 1.75 and 0.48 mm d −1 for the RMSE.Reviewing these values the chosen interpolation approach is seen as appropriate.The analysis for identifying relations between land surface and hydro-climatic characteristics and model performance is presented in Fig. B1.This analysis does not reveal any hydrometeorological or morphological conditions, which explain different model performance in distinct basins.In conclusion, the retrieved parameter sets are representative for various climatic and physiographic conditions.

Figure 1 .
Figure 1.Study area showing the seven basins used for estimation of the ensemble parameter sets for Germany.The different colors are making the basins better distinguishable.The points E1-E7 denote eddy covariance stations, which are used for the evaluation of evapotranspiration and soil moisture.

Figure 2 .Figure 2 .
Figure 2. Model performance expressed as Nash Sutcliffe Efficiency (NSE) at daily (upper row) and monthly (lower row) resolutions for the calibration period 2000-2004 (left-hand side) and validation period 1965-1999 (right-hand side).The white box plots show the results of the on-site calibration, whereas the gray box plots are simulations using the 100 ensemble parameter sets for Germany.Please note that the y-axis starts at NSE=0.5

Figure 3 .
Figure3.Observed and modeled monthly streamflow for the seven basins, which were used for parameter inference.The figure shows 1 decade(1990)(1991)(1992)(1993)(1994)(1995)(1996)(1997)(1998)(1999) of the evaluation period.The solid dark gray line depicts the median model results and the light gray band depicts the range between the 5th and 95th percentile of the 100 ensemble simulations.
3.2) leads to satisfactory estimations of streamflow in the basins used for parameter inference.Overall, the model performance shown herein compares well to those of other studies, such asLohmann et al. (1998);Strasser and Mauser (2001);Menzel et al. (2006);Fleischbein et al. (2006); and Huang et al.

Figure 4 .
Figure 4. Budyko plot and performance maps for 100 ensemble parameter sets at 222 basins spread over Germany.The upper row depicts evaluations based on daily values (a, b, c), whereas the lower row depicts monthly streamflow evaluations (d, e, f).In the first column the basins are presented as Budyko plots (a, d), which are color-coded based on the ensemble median NSE for daily (a) and monthly (d) streamflow values.The gray band envelops different estimations of the Budyko curve(Schreiber, 1904;Ol'dekop, 1911; Budyko, 1974).A separation to energy-(E p /P < 1) and water-limited basins (E p /P > 1) can be made based on the x axis.The center column depicts the location of the 222 basins shown in the Bydyko plots using the same color code (b, e).The right column shows the range of the 5th and 95th ensemble percentiles for the NSE on daily (c) and monthly (f) basis.(a), (b), (d), and (e) share the left color bar, and (c) and (f) share the right color bar.The simulation period is adopted according to the available streamflow observations but is at least 10 years (average = 42 years).
Figure 4. Budyko plot and performance maps for 100 ensemble parameter sets at 222 basins spread over Germany.The upper row depicts evaluations based on daily values (a, b, c), whereas the lower row depicts monthly streamflow evaluations (d, e, f).In the first column the basins are presented as Budyko plots (a, d), which are color-coded based on the ensemble median NSE for daily (a) and monthly (d) streamflow values.The gray band envelops different estimations of the Budyko curve(Schreiber, 1904;Ol'dekop, 1911; Budyko, 1974).A separation to energy-(E p /P < 1) and water-limited basins (E p /P > 1) can be made based on the x axis.The center column depicts the location of the 222 basins shown in the Bydyko plots using the same color code (b, e).The right column shows the range of the 5th and 95th ensemble percentiles for the NSE on daily (c) and monthly (f) basis.(a), (b), (d), and (e) share the left color bar, and (c) and (f) share the right color bar.The simulation period is adopted according to the available streamflow observations but is at least 10 years (average = 42 years).

Figure 6 .
Figure6.Exemplary time series of observed and modeled monthly evapotranspiration and daily soil moisture anomalies at four eddy covariance stations (Fig.1, Table2).The four stations are chosen because they represent the two major mHM land cover classes (forest and mixed) and have 3 consecutive years of data without significant data gaps.Further the four station are spread over the three regions where eddy covariance observations are available.The solid dark gray line depicts the median model results and the light gray band depicts the range between the 5th and 95th percentile of the 100 ensemble simulations.

Figure 7 .
Figure 7.Comparison of monthly estimates of evapotranspiration from mHM and MODIS in the period 2001-2010.The ensemble is represented by the ensemble mean of 100 evapotranspiration estimates.The comparison is based on three metrics: (a) relative bias, (b) Pearson correlation coefficient, and (c) root mean squared error (RMSE).The respective units are given in brackets.

Figure 8 .
Figure 8.Comparison of mean annual groundwater recharge (R) modeled with (a) mHM and from (b) the Hydrologic Atlas of Germany (Federal Ministry for the Environment Nature Conservation Building and Nuclear Safety, 2003; Neumann and Wycisk, 2003).Panel (c) shows the difference (a-b) between the two datasets.The units are [mm a −1 ] for all panels.

Figure 9 .
Figure 9. Water balance variables, their coefficients of variation, and land-surface characteristics for Germany.(a) Mean annual precipitation P , (b) ensemble mean annual evapotranspiration E a , (c) grid-cell-generated runoff Q G , (d) dryness index E p /P , (e) sum of porosities (saturated soil water content) of all model layers, (f) coefficient of variation of the ensemble of annual evapotranspiration and (g) generated runoff, (h) dominating land cover class on a 4 km × 4 km grid.The mean values and coefficients of variation are based on the period 1951-2010.

Figure 10 .
Figure 10.Spatiotemporal patterns of uncertainty for five different environmental zones in Germany.The locations of the different zones are depicted on the map on the upper right.The presented hydrologic variables are evapotranspiration (E a ), soil moisture (SM), recharge (R), and grid-cell-generated runoff (Q G ).The uncertainty ranges and the ensemble median refer to the left ordinate (black and gray), whereas the normalized uncertainty range refers to the right ordinate (blue).The reference period for the climatological values is 1951-2010.

Figure A1 .
Figure A1.Panel (a) shows the empirical variogram (blue circles) and a fitted exponential variogram (red curve) for the entire domain of Germany as well as fittings for sub-domain (block) variograms (gray lines).The 52 sub-domains (blocks) are depicted in (b).

Figure A2 .
Figure A2.Evaluation of the interpolation at precipitation stations based on a leave-one-out cross-validation strategy, i.e., the Jackknife method.The performance criteria from the individual stations are interpolated to a 4 km × 4 km grid using ordinary Kriging.The panels denote different performance metrics: (a) bias, (b) relative bias, (c) Pearson correlation coefficient, and (d) root mean squared error (RMSE).

Figure B1 .
Figure B1.Relation between land surface and hydro-climatic conditions and model performance for the 222 river basins.The location of the basins is depicted in Fig. 4. The mean and standard deviation (SD) of a characteristic for the single basins are based on the morphological input data at the 100 m × 100 m resolution.

Table 1 .
Basin properties and water balance characteristics of the seven major German river basins.The geographical location of the basins is depicted in Fig.1.Abbreviations: avg -average, SD -standard deviation, min -minimum, max -maximum, P -precipitation, Qstreamflow, E a -evapotranspiration (P − Q), E p -potential evapotranspiration.

Table 2 .
Evaluation of evapotranspiration E a and soil moisture SM at seven eddy covariance stations.The evaluation is based on daily and monthly values for the available observation period.The location of the eddy stations is depicted in Fig.1.Abbreviations: RMSE -root mean squared error, ρ -Pearson correlation coefficient, E a -evapotranspiration, SM -soil moisture.
indicate room for improving the representation of vegetation dynamics within mHM.The sites covered by cropland showed the largest deviations from evapotranspiration observations because croplands are highly human-influenced (seeding, harvest, or eventually irrigation), which makes it difficult to model their dynamics at the local scale.Additionally, cropland is generalized in a mixed land cover class in mHM.Soil moisture estimations at the same locations have been in good agreement with the observed dynamics.