Interactive comment on “A Climate Data Record (CDR) for the global terrestrial water budget: 1984–2010” by Yu

Abstract. Closing the terrestrial water budget is necessary to provide
consistent estimates of budget components for understanding water resources
and changes over time. Given the lack of in situ observations of budget
components at anything but local scale, merging information from multiple
data sources (e.g., in situ observation, satellite remote sensing, land
surface model, and reanalysis) through data assimilation techniques that
optimize the estimation of fluxes is a promising approach. Conditioned on
the current limited data availability, a systematic method is developed to
optimally combine multiple available data sources for precipitation (P),
evapotranspiration (ET), runoff (R), and the total water storage change
(TWSC) at 0.5° spatial resolution globally and to obtain
water budget closure (i.e., to enforce P − ET − R − TWSC = 0) through a constrained
Kalman filter (CKF) data assimilation technique under the assumption that
the deviation from the ensemble mean of all data sources for the same budget
variable is used as a proxy of the uncertainty in individual water budget
variables. The resulting long-term (1984–2010), monthly
0.5° resolution global terrestrial water cycle Climate Data
Record (CDR) data set is developed under the auspices of the National
Aeronautics and Space Administration (NASA) Earth System Data Records
(ESDRs) program. This data set serves to bridge the gap between sparsely
gauged regions and the regions with sufficient in situ observations in
investigating the temporal and spatial variability in the terrestrial
hydrology at multiple scales. The CDR created in this study is validated
against in situ measurements like river discharge from the Global Runoff
Data Centre (GRDC) and the United States Geological Survey (USGS), and ET
from FLUXNET. The data set is shown to be reliable and can serve the
scientific community in understanding historical climate variability in
water cycle fluxes and stores, benchmarking the current climate, and
validating models.


Introduction
Quantification of the terrestrial water budget and its evolution over time at fine spatial resolutions is critical to understanding the availability and variability of Earth's terrestrial water budget and the exchanges and interactions among the terrestrial, atmospheric, and oceanic branches of the hydrosphere, and to assessing the risk of hydrological extremes such as floods and droughts at regional to global scales. Understanding the mean state and variability of the terrestrial water budget is also one of the primary goals of the World Climate Research Programme's (WCRP) Global Energy and Water EXchanges (GEWEX; Morel, 2001) project and the National Aeronautics and Space Administration (NASA) Energy and Water cycle Study (NEWS; NASA NEWS Science Integration Team, 2007). The overarching goal of GEWEX is to "reproduce and predict, by means of suitable models, the variations of the global hydrological regime, its impact on atmospheric and surface dynamics, and variations in regional hydrological processes and water resources and their response to changes in the environment, such as the increase in greenhouse gases" (http://www.gewex.org). The grand challenge of the NEWS project is "to document and enable improved, observationally based predictions of energy and water cycle consequences of Earth system variability and change" (http://www.nasa-news.org). Toward these goals, a number of Earth System Data Records (ESDRs) for the major components of the terrestrial water budget are developed under NASA's Making Earth Science Data Records for Use in Research Environments (MEaSUREs) program. While the MEaSUREs program refers to long-term, satellite-based data records as ESDRs, they are generally referred to as Climate Data Records (CDRs) following the National Research Council report, where a CDR is defined as "a time series of measurements of sufficient length, consistency, and continuity to determine climate variability and change" (National Research Council, 2004). We will refer to the data set developed and described in this paper as a CDR.
The terrestrial water budget consists of four major components: precipitation (P), evapotranspiration (ET), runoff (R), and total water storage change (TWSC), as shown in Eq. (1). The TWSC over a time interval is balanced by the difference between the incoming water flux of P and the outgoing water fluxes of ET and surface and subsurface R for a control volume from the Earth's surface to a lower bound at depth:
TWSC = P − ET − R. (1)
In situ observations are often considered as the ground "truth" to quantitatively estimate the water budget terms. However, limited network coverage, especially in data-sparse regions, has made assessing the terrestrial water budget a long-standing challenge. Presently, satellite remote sensing has become a major data source to measure the various terms because of its generally global coverage and sufficient temporal repeat times. A number of satellite-based products have been developed to estimate precipitation over the globe, including the Tropical Rainfall Measuring Mission (TRMM) Multi-satellite Precipitation Analysis (TMPA; Huffman et al., 2007, 2010), the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Cloud Classification System (PERSIANN-CCS; Hong et al., 2007), and the Climate Prediction Center MORPHing method (CMORPH; Joyce et al., 2004). For evapotranspiration, global estimates can be derived from a combination of satellite surface radiation budget (SRB), surface meteorology, and vegetation cover (Fisher et al., 2008; Mu et al., 2007; Vinukollu et al., 2011; Zhang et al., 2010, 2015).
With the NASA Gravity Recovery And Climate Experiment (GRACE) mission, launched in March 2002 (Landerer and Swenson, 2012; Tapley et al., 2004; Wahr et al., 2004), the changes in gravity detected by the GRACE satellites can be used to derive estimates of TWSC, albeit at relatively coarse scale. GRACE has been widely used to study changes in the terrestrial water storage (Rodell et al., 2009, 2011), the terrestrial water budget (Long et al., 2014a, b; Pan et al., 2012; Sahoo et al., 2011; Gao et al., 2010; Sheffield et al., 2009; Wang et al., 2014), and hydrological extremes such as droughts (Thomas et al., 2014; Famiglietti, 2014). For runoff, earlier studies estimated the global mean terrestrial runoff by simply calculating the difference between precipitation and evapotranspiration under the assumption of negligible long-term total water storage change (Berner and Berner, 1987; Baumgartner and Reichel, 1975). However, this "inferred" runoff estimation approach can only be applied to estimate the long-term mean, since water storage change cannot be neglected at short temporal scales, e.g., daily, monthly, or seasonal. Furthermore, human interaction with the storage might also play an important role. For example, reservoir filling after construction, interannual reservoir storage changes, and groundwater pumping (Rodell et al., 2009; Famiglietti, 2014; Voss et al., 2013) can significantly contribute to observed storage changes at regional scales. As an alternative, river discharge can be estimated from satellite altimetry (Birkett et al., 2002; Berry et al., 2011), for example, by the future Surface Water and Ocean Topography (SWOT) mission (Durand et al., 2010). These satellite missions provide a promising and cost-efficient way of estimating individual water budget components. However, when combined, they do not close the water budget because of errors in the individual component estimates. Sheffield et al. (2009) found that high bias in satellite precipitation, particularly in the summer, was the major factor in budget non-closure over the Mississippi River basin. Gao et al. (2010) also concluded that water budget closure over 13 large continental rivers in the US was not achieved using remote sensing data, mainly due to the biases in precipitation and ET.
In addition to space-borne satellites, our understanding of the hydrological cycle in data-scarce regions has also depended on other data sources such as land surface models (LSMs) (Trenberth et al., 2007; Trenberth and Fasullo, 2013b) and weather/climate reanalysis (Reichle et al., 2011). Offline LSM simulations can provide long-term budget estimates with closure by design (Nijssen et al., 2014; Sheffield and Wood, 2007; Trenberth et al., 2007; Oki and Kanae, 2006). Reanalysis model output provides information that can be used to estimate the water budget at basin to continental (Betts et al., 2003a, b, 2005) and global (Reichle et al., 2011; Balsamo et al., 2015) scales. These large-scale land surface and reanalysis models have pushed the global water budget inventories into a new era where sparse traditional in situ observations are supplemented.
However, different types of uncertainties exist in these sources of information, including those in the parameterizations (satellite retrieval algorithms, LSM, and reanalysis process representations), in LSM parameters such as soil and vegetation properties, in forcing data (surface radiation and meteorology), and in reanalysis data assimilation procedures. Therefore, an optimal "combination" of all data sources, including in situ and remote sensing, LSM, and reanalysis data, with their extensive spatial coverage and fine resolution, has the potential to overcome the limitation of relying on a single data source, and to offer improved accuracy, spatial and temporal coverage, and consistency in creating long-term, large-scale water budget information at fine spatial resolutions (Pan et al., 2012).
To address the non-closure problem, techniques have been developed to assess the uncertainties of each budget component and to enforce water budget closure from either multiple data sources (Pan et al., 2012) or a single source (Sahoo et al., 2011), usually at the scale of major river basins across the globe. For example, Rodell et al. (2015) recently quantified the mean annual and monthly water budgets over continents and ocean basins for the first decade of the 21st century by using data sets that combine satellite remote sensing and conventional observations. In this study, the constrained Kalman filter (CKF), which is a simplified, non-ensemble version of the constrained ensemble Kalman filter (CEnKF; Pan and Wood, 2006), is chosen to close the water balance. The CKF is a stand-alone procedure applied after a regular Kalman filter update; thus, it is ideal for closing the water balance without filtering or data assimilation.
Building on an increasingly available inventory of global water budget data sets from in situ, satellite, reanalysis, and land surface models, the study reported here has five advances over previously reported work. These are to (1) expand the use of the CKF data assimilation technique in closing the water budget from that reported by Pan et al. (2012) and Sahoo et al. (2011), (2) extend the data records back in time to 1984 (vs. 2000 in Rodell et al., 2015) and forward to 2010 (vs. previous analyses, which usually stop near the turn of the 21st century), (3) refine the spatial resolution to 0.5° for the land surface (vs. basin-scale analysis in Pan et al., 2012 and Sahoo et al., 2011, and continental and oceanic analysis in Rodell et al., 2015 and Trenberth and Fasullo, 2013a) and account for the oblateness of Earth, (4) develop a harmonized global terrestrial water cycle CDR by merging the full combination of in situ and satellite remote sensing observations, LSM simulations, and reanalysis model outputs at monthly and 0.5° spatial resolution for the period of 1984-2010 (the CDR data set includes estimates for all major terrestrial water budget variables, i.e., P, ET, R, and TWSC, with budget closure at the grid scale), and (5) validate the CDR against in situ observations not used in the development of the data set.
To the authors' knowledge, this paper presents the first attempt to estimate over multiple decades the global terrestrial water budget (Greenland and Antarctica excluded), with closure at a 0.5° grid scale using this diversity of observational data sources. The data set provides comprehensive and detailed information for water budget analyses over land and will be of particular significance in sparsely gauged or ungauged regions for understanding historical climate variability of the water cycle, and for benchmarking and validating climate models.
In developing the data set, significant challenges need to be addressed. These include the following:
1. How consistent are the different products at different spatial scales?
2. What is the best approach to assess the uncertainty of each individual product and then optimally merge them?
3. What is the spatial and temporal variability of the non-closure errors, and how can they be attributed?
Given the developed CDR, a key question is whether the merged data set is in agreement with in situ observations and thus able to capture historical hydroclimatological events (e.g., floods and droughts). Section 2 introduces the data sources and the methodology. Section 3 carries out a consistency and uncertainty analysis for the multiple input data sources and investigates the spatial variability of the non-closure errors and their attribution during the budget closure enforcement process. Budget estimates based on the closure-constrained data set are presented at global, continental, and large basin scales. Then, the CDR is validated against in situ runoff and ET in Sect. 4. Conclusions from the research and future work are discussed in Sect. 5.

Data description, analysis, and methodology
In this study, the water budget is estimated and constrained at 0.5°, monthly, for the global land area excluding Antarctica and Greenland. In addition, continental- and basin-scale budget estimates are also provided, including six continents and 32 major basins (Fig. 1) from across the world with a range of climatic regimes. Information about the input data sources (data length, original spatial and temporal resolutions, and references) is listed in Table 1. The Community Land Model (CLM) and NOAH land surface models are used for seasonal cycle analysis but are not included later in the merging and constraining algorithm because of significant disagreement between their seasonal cycles and observations, as discussed in Sect. 2.1.3 and 2.1.4. The 27-year period is divided into four consecutive subperiods (1984-1997, 1998-2002, 2003-2007, and 2008-2010) based on the data availability and overlap (Table 2). Note that the total water storage from GRACE for the initial year (2002) is excluded from the study due to missing values.
Input data sets

Precipitation
A set of precipitation products is evaluated, including the remote sensing precipitation product from Colorado State University (CSU; Bytheway and Kummerow, 2013) with uncertainty estimates, the gauge-based Global Precipitation Climatology Centre (GPCC) product (Schneider et al., 2014), the multi-source merged products of the Princeton Global Forcing data set (PGF; Sheffield et al., 2006), and the Climate Hazards Group InfraRed Precipitation with Stations (CHIRPS; Funk et al., 2014). Please refer to Supplement I for more information on these data sets.
Figures 2 and 3 show the seasonal cycles of these four precipitation products over six continents and over 12 selected representative basins distributed in different continents and climate regimes, for their overlapping period of 1998-2010. The coefficient of variation (CV), calculated as the standard deviation divided by the ensemble mean, is plotted to quantify the uncertainties among the precipitation product ensemble of PGF, CSU, GPCC, and CHIRPS. The CV is first calculated for each grid cell and then averaged over continents or basins. CSU and CHIRPS have no spatial coverage poleward of 50° N or 50° S. Therefore, only the grid cells between 50° N and 50° S are used to calculate the seasonal cycles in Fig. 2. Likewise, in Fig. 3, only PGF and GPCC are compared over those basins which are either outside or extend poleward of 50° N (or 50° S) (e.g., the Lena and Mackenzie river basins). Similar to the conclusion of Pan et al. (2012), who examined a different set of data sets, the spread among these four products is lower in the densely gauged continents of Europe and North America (and basins in those two continents such as the Danube and Mississippi), with a CV ranging from 5 to 12 and 2 to 8 %, respectively (Figs. 2 and 3), than in the sparsely gauged regions, such as the Amazon (Fig. 3). There is an "abnormally" high spread (high CV) for the Niger River basin (sparsely gauged) during the dry season because the ensemble mean of the four precipitation products is close to zero (Fig. 3). The uncertainties are also high for the Mekong River basin, where the rainfall totals are high and dominated by the monsoon season (Fig. 3). The high uncertainties in less densely gauged regions could originate from the different gauge densities used by different products or from the ways in which the data are merged and gridded. It is interesting to note that the average discrepancy between the highest estimates (CSU) and the lowest (CHIRPS) over Europe is around 15 mm month⁻¹ throughout the year (Fig. 2). This discrepancy is more prominent at basin scales; for example, the monthly mean difference between CSU and CHIRPS in densely gauged basins such as the Danube and Mississippi is around 20 mm month⁻¹ (Fig. 3). CHIRPS is a blended precipitation product (e.g., precipitation climatology, remote sensing from multiple sources, seasonal forecasts from the Climate Forecast System version 2 (CFSv2), and in situ observations), but it is dominated by gauge corrections in regions with higher gauge density such as Europe and North America, and therefore in basins such as the Danube and Mississippi. The differences among the three gauge-merged products (PGF, GPCC, and CHIRPS) may arise from the different data sources that they merge rather than from gauge observations, different numbers of gauges used, and undercatch corrections. The seasonal cycles in Figs. 2 and 3 are consistent with the climate regimes, e.g., the reversed seasonality in the Murray-Darling Basin, the high peak in South America in March, and the wet summer in low latitudes.
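The CV calculation described above (per grid cell first, then averaged over a region) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the unweighted regional mean, the population standard deviation, and the `min_mean` threshold are all assumptions introduced here. The threshold only illustrates why a near-zero ensemble mean, as in the Niger dry season, inflates the CV.

```python
import numpy as np

def grid_cv(products, min_mean=0.0):
    """Per-cell coefficient of variation across an ensemble of products.

    products: array of shape (n_products, n_cells), e.g. monthly precipitation
    from PGF, CSU, GPCC, and CHIRPS on a common grid.
    min_mean: hypothetical threshold; cells whose ensemble mean is not above
    it are returned as NaN to avoid dividing by (near) zero.
    """
    products = np.asarray(products, dtype=float)
    ens_mean = products.mean(axis=0)
    ens_std = products.std(axis=0)  # population std (ddof=0) is an assumption
    cv = np.full(ens_mean.shape, np.nan)
    np.divide(ens_std, ens_mean, out=cv, where=ens_mean > min_mean)
    return cv

def regional_cv_percent(products, min_mean=0.0):
    """Average the per-cell CV over a region (unweighted mean, in percent)."""
    return 100.0 * np.nanmean(grid_cv(products, min_mean))
```

An area-weighted mean (for example, weights proportional to the cosine of latitude) would be a natural refinement on a 0.5° grid; it is omitted here to keep the sketch short.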

Evapotranspiration
Unlike precipitation with relatively dense in situ observations, especially for developed regions, in situ based evapotranspiration estimations (from flux towers) are very sparse.
Here, we collect 10 gridded global terrestrial ET products, of which 5 are satellite derived, 2 are reanalysis products, and 3 are from land surface models. One satellite product is the Global Land Evaporation Amsterdam Model (GLEAM; Miralles et al., 2011). As part of the MEaSUREs products, the four other satellite products are derived using two algorithms, the Penman-Monteith (PM) and Priestley-Taylor (PT), cross-combined with two forcing inputs that are different from the other six ET products: the SRB-CFSR (Surface Radiation Budget-Climate Forecast System Reanalysis) and SRB-PGF. These four products are referred to as SRB-CFSR-PM, SRB-CFSR-PT, SRB-PGF-PM, and SRB-PGF-PT (Vinukollu et al., 2011). Satellite remote sensing observes Earth at fine spatial resolution with comprehensive coverage, which makes it possible to estimate the water budget in sparsely gauged regions. Therefore, five satellite ET products are merged into the CDR. The [...] and 3). At continental scales (Fig. 4), the reanalysis ET products (ERA-Interim and MERRA) generally have relatively high values for the six continents, while the LSMs generally predict lower values over Asia, Europe, and North America. Most of the satellite ET products (i.e., GLEAM, SRB-CFSR-PM, SRB-CFSR-PT, SRB-PGF-PM, and SRB-PGF-PT) lie between the reanalysis and LSMs in Asia, Europe, and North America. More striking is the relative lack of consistency among those 10 ET products for the wet tropical basins (Amazon, Congo, and Mekong). The seasonality of ET over these basins is complex because of the overall energy limitation but seasonally and spatially varying moisture limitation (Guan et al., 2015). These results imply that the 10 approaches have significant differences in their derived surface radiation budget and meteorology as well as in the parameterizations of evaporative processes (potential ET, transpiration, interception, and soil evaporation) and their interaction with phenological and environmental controls. The relatively higher consistency of the remotely sensed algorithms for these basins is in part a result of using the same (or closely similar) surface radiation budget but different meteorological forcings.

Runoff
The three LSMs are forced by the same meteorological forcing from PGF to simulate global runoff over land. The VIC simulation was calibrated over 43 well-distributed major global basins against the measured streamflow data from the Global Runoff Data Centre (GRDC; http://grdc.bafg.de), while CLM and NOAH are uncalibrated. Please refer to Supplement I for additional model information under the evapotranspiration section. Figures 6 and 7 display the seasonal cycles over the six continents and the 12 representative major river basins. NOAH shows a seasonal cycle opposite to that of VIC and CLM in Europe and North America, which include high-latitude regions (Fig. 6). Unlike VIC and NOAH, CLM shows almost no seasonal cycle in Oceania (Fig. 6). The disagreement between the LSMs can also be found at basin scales (e.g., Danube, Lena, Mackenzie, Yukon, and Murray-Darling in Fig. 7). Additionally, Fig. 8 shows the verification of the runoff from the LSMs against GRDC observations for 26 basins that have available data records longer than 3 years during 1984-2010. NOAH shows a negative runoff bias against GRDC for most of the midlatitude to high-latitude basins (Columbia, Danube, Indigirka, Kolyma, Lena, Mackenzie, Northern Dvina, Ob, Olenek, Pechora, Yenisei, and Yukon; Fig. 8). CLM performs better than NOAH over high-latitude basins, but it strongly overestimates runoff for the Danube and Don (Fig. 8). None of the LSMs capture the seasonal cycles for the Indus and Senegal basins. Nonetheless, the authors recognize that runoff estimates using a number of LSMs (e.g., Haddeland et al., 2011) can provide uncertainty estimates (i.e., spread or standard deviation among different data sources) in simulated runoff. However, CLM and NOAH runoff estimates are not merged into the CDR developed in this study in order to avoid the large biases from their uncalibrated parameters. Additional reasons for not merging CLM and NOAH are discussed in Sect. 2.1.4.

Total water storage change
The TWSC, which measures the change in total water storage during a specific unit period, is taken from the LSMs and the GRACE data. The GRACE monthly total water storage anomaly (TWSA) time series from ReLease 05 (RL05), which are anomalies relative to the 2004-2009 time-mean baseline and are processed by three centers, GeoForschungsZentrum Potsdam (GFZ), the Center for Space Research (CSR) at the University of Texas at Austin, and the Jet Propulsion Laboratory (JPL), are used to calculate the TWSC via the backward difference in Eq. (2) and the central difference in Eq. (3). Comparisons indicate that the central difference calculation (Eq. 3) is in better agreement with the VIC-inferred TWSC. Therefore, the central difference TWSC has been used.
Different parameters and solution strategies were explored and applied by these three processing centers, and the differences between the centers have generally decreased over the releases (https://grace.jpl.nasa.gov/data/choosing-a-solution/). Even though VIC only computes the water storage in the upper few meters of the soil column (depending on the calibrated storage capacity in its second and third layers), this is the most active part of the soil column. Accordingly, previous studies (e.g., Gao et al., 2010; Tang et al., 2010) found reasonable agreement between TWSC from GRACE and the VIC model. Similar results were also found in this study: TWSC from VIC and GRACE (from GFZ, CSR, and JPL) are in good agreement at both continental (Fig. 9) and basin scales (Fig. 10) except for some timing lags in the high-latitude basins of the Lena and Yukon. This lag between the GRACE-derived minimum TWSC and the VIC-inferred minimum TWSC suggests that the snowmelt (and subsequent runoff) starts earlier in VIC than observed by GRACE, or that more snowmelt ponds in wetlands or discharges into lakes, neither of which is well represented in VIC, as its snowmelt discharges more directly into rivers. In contrast, NOAH shows a reversed seasonal cycle in high-latitude continental regions such as Asia and North America, and in basins such as the Danube, Lena, Mackenzie, and Yukon, while CLM shows disagreement in the seasonal cycle in Oceania as well as in the Danube and Mississippi basins relative to GRACE observations. Not surprisingly, the spread within the three GRACE products is very small compared to the differences against VIC (Figs. 9 and 10). Sakumura et al. (2014) found that the ensemble mean (simple arithmetic mean of JPL, CSR, and GFZ) was the most effective method in reducing the noise in the gravity field solutions within the available scatter of the solutions. Therefore, the ensemble mean of the TWSC from GFZ, CSR, and JPL is taken as the best TWSC product derived from GRACE, and this is used in the later water budget analysis together with TWSC from VIC.
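The two ways of deriving TWSC from GRACE TWSA, and the three-center ensemble mean, can be sketched as below. The exact forms of Eqs. (2) and (3) are not reproduced in this text, so the standard finite-difference definitions used here (backward: TWSA(t) − TWSA(t−1); central: [TWSA(t+1) − TWSA(t−1)]/2) are an assumption, as are the function names and the choice to leave undefined end points as NaN.

```python
import numpy as np

def twsc_backward(twsa):
    """Backward difference: TWSC(t) = TWSA(t) - TWSA(t-1) (assumed form)."""
    twsa = np.asarray(twsa, dtype=float)
    out = np.full(twsa.shape, np.nan)  # first month has no predecessor
    out[1:] = twsa[1:] - twsa[:-1]
    return out

def twsc_central(twsa):
    """Central difference: TWSC(t) = (TWSA(t+1) - TWSA(t-1)) / 2 (assumed form)."""
    twsa = np.asarray(twsa, dtype=float)
    out = np.full(twsa.shape, np.nan)  # first and last months are undefined
    out[1:-1] = (twsa[2:] - twsa[:-2]) / 2.0
    return out

def grace_ensemble_twsc(gfz, csr, jpl):
    """Simple arithmetic mean of the central-difference TWSC from three centers."""
    return np.mean([twsc_central(s) for s in (gfz, csr, jpl)], axis=0)
```

The central difference averages the change over two adjacent months, which smooths month-to-month noise in the anomalies; that smoothing is one plausible reason it compares better with a model-inferred monthly storage change.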

Methods
All the data sets, as listed in Table 1, are first aggregated or disaggregated to 0.5° spatial and monthly values using bilinear interpolation; then, the errors/uncertainties of each product are assessed. Estimates for the same water budget variable are then merged following the algorithm described in Luo et al. (2007). The merged water budget estimates are further adjusted to ensure closure at every grid cell using the CKF approach of Pan et al. (2012). Then, the unconstrained and constrained water budgets are analyzed at different scales.
Figure 11 provides a flow chart of the procedure.

Uncertainty estimation and data merging technique
There is no best estimate or observation of each individual water budget component at the grid scale over the globe due to the limited spatial coverage of in situ measurements. This is especially true for evapotranspiration observations from the flux tower networks. Thus, the limited availability of gridded ground observations makes it impossible to quantify the error in each water budget component. Therefore, in this study, the deviation from the ensemble mean of all data sources for the same budget variable is used as a proxy of the uncertainty/error in individual products. The merging procedure for each budget component is a weighted averaging where the optimal merging weight w_i is given by the following equation (Luo et al., 2007; Sahoo et al., 2011):
w_i = (1/σ_i^2) / Σ_{j=1}^{n} (1/σ_j^2),
in which w_i is the merging weight for product i, σ_i^2 is the error variance of product i calculated against the ensemble mean, and n is the total number of products. Note that the sum of w_i over all products equals 1. The larger the error variance of product i, the lower its weight. The number of products merged into a single water budget estimate varies among the subperiods due to the data availability (Table 2). A "data consistency adjustment" is applied after the data merging process in order to guarantee the temporal consistency of the CDR estimated in this study. Taking precipitation as an example, first, the climatological monthly mean precipitation merged from all the available products (i.e., PGF, GPCC, CHIRPS, and CSU) for the period with complete data records (i.e., 1998-2010) and the climatological monthly mean precipitation merged from the available products (i.e., PGF, GPCC, and CHIRPS) during the incomplete data records period (i.e., 1984-1997, during which CSU is not available) are calculated, respectively. Then, the monthly climatological bias, which is the monthly mean precipitation merged from PGF, GPCC, CHIRPS, and CSU minus that merged from PGF, GPCC, and CHIRPS, is simply added to the monthly precipitation during the incomplete data records period (i.e., 1984-1997). This "data consistency" approach aims to avoid the "jump" in the merged precipitation time series in the year 1998, when CSU became available. The same procedure is then applied to adjust the data consistency for ET during 2008-2010 and TWSC during 1984-2002. We contend that this is a key step, as the temporal consistency of the CDR will impact the reproduction of historical hydrological extremes and the analysis of long-term trends for all the available water budget variables.
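The merging and consistency steps described above can be sketched for a single grid cell as follows. This is a hedged illustration rather than the authors' implementation: it assumes inverse-error-variance weights normalized to sum to 1 (consistent with the stated properties that the weights sum to 1 and decrease with error variance), takes the error variance as the mean squared deviation from the ensemble mean, and reads the consistency adjustment as a month-by-month climatological shift. All function and argument names are hypothetical.

```python
import numpy as np

def merging_weights(products, tiny=1e-12):
    """Inverse-variance weights from deviations about the ensemble mean.

    products: array (n_products, n_times) for one grid cell.
    tiny: hypothetical floor that guards against a zero error variance.
    """
    products = np.asarray(products, dtype=float)
    ens_mean = products.mean(axis=0)
    err_var = ((products - ens_mean) ** 2).mean(axis=1)
    inv = 1.0 / np.maximum(err_var, tiny)
    return inv / inv.sum()  # weights sum to 1

def merge(products, weights):
    """Weighted average of the products at every time step."""
    return np.tensordot(weights, np.asarray(products, dtype=float), axes=1)

def consistency_adjust(reduced, reduced_months, full, full_months):
    """Shift the fewer-product series by the monthly climatological bias.

    reduced / full: merged series for the incomplete and complete periods.
    reduced_months / full_months: calendar month (1-12) of each time step.
    Every calendar month is assumed to occur in both periods.
    """
    reduced = np.asarray(reduced, dtype=float)
    full = np.asarray(full, dtype=float)
    reduced_months = np.asarray(reduced_months)
    full_months = np.asarray(full_months)
    adjusted = reduced.copy()
    for m in range(1, 13):
        bias = full[full_months == m].mean() - reduced[reduced_months == m].mean()
        adjusted[reduced_months == m] += bias
    return adjusted
```

Because the bias is computed separately for each calendar month, the adjustment changes the mean seasonal cycle of the early period but leaves its year-to-year anomalies untouched.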

Enforcing water budget closure using CKF
In short, CKF redistributes the non-closure errors back onto the various water budget components according to their error levels and correlations. We define the water balance residual as r = P − ET − R − TWSC. If we write the budget com- [...] The residual term r is redistributed back onto the various water budget variables through the above equation. Mathematically, the CKF algorithm mimics assimilating a "perfect" (zero-error) observation of r = 0. Further details are presented in Pan and Wood (2006). In this study, the error of runoff is simply assumed to be 10 %, as VIC is the single source of runoff. This is highly empirical, as it is based on the authors' knowledge of and confidence in the VIC model calibration, given that there are no global grid-level (0.5° in this study) runoff observations to quantify the error. The water budget closure is done monthly, based on error estimates that vary from month to month.
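The closure step can be illustrated with a small sketch for one grid cell and one month. The update used here is the standard Kalman update for a zero-error observation of the linear constraint r = 0, with state x = [P, ET, R, TWSC] and constraint vector G = [1, −1, −1, −1]; it is written from that general principle and should be read as an assumption about the form of the equations, not as a transcription of Pan and Wood (2006). The function name and the example covariance are hypothetical.

```python
import numpy as np

# Constraint vector: r = G @ x = P - ET - R - TWSC
G = np.array([1.0, -1.0, -1.0, -1.0])

def ckf_close(x, cov):
    """Redistribute the budget residual onto the four components.

    x: [P, ET, R, TWSC] merged estimates for one cell and month.
    cov: 4x4 error covariance of those estimates (diagonal if errors
         are treated as uncorrelated).
    Returns the constrained estimates and the original residual r.
    """
    x = np.asarray(x, dtype=float)
    cov = np.asarray(cov, dtype=float)
    r = G @ x                        # non-closure error before the update
    gain = cov @ G / (G @ cov @ G)   # share of r assigned to each component
    return x - gain * r, r
```

With a diagonal covariance, each component absorbs a share of r proportional to its own error variance, so the least certain component moves the most. For example, `ckf_close([100, 60, 30, 5], np.diag([4, 2, 1, 1]))` has r = 5 and assigns half of it to P.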
Water budget merging and constraint

Data merging
All the products for the same water budget component are merged into a single estimate based on their uncertainties/errors relative to their ensemble mean, as described in Sect. 2.2.1. The values in Table 2 summarize the mean merging weights of each individual product for different periods.
Please refer to Figs. S1 to S3 in Supplement II for the spatial maps of the merging weights from different products. The global mean merging weights for precipitation are calculated over 50° N-50° S during 1984-1997 and 1998-2010. CHIRPS and CSU only cover 50° N-50° S; therefore, for those regions outside 50° N-50° S, PGF and GPCC are merged with equal weights (50 %). Before the availability of the CSU product in 1998, the average merging weights of PGF, GPCC, and CHIRPS over 50° N-50° S (land) are 29.6, 34.6, and 35.8 %. CHIRPS is closest to the ensemble mean, especially for the Amazon Basin, and therefore has a higher weight in that region (Fig. S1 in Supplement II). For the period of 1998-2010, when CSU becomes available, CHIRPS (26.5 %), GPCC (26.8 %), and CSU (26.0 %) have similar weights. Note that the weights vary with time and location. The annual mean of the merged precipitation is 767.0 mm for 1984-1997, 792.7 mm for 1998-2002, 786.7 mm for 2003-2007, and 802.9 mm for 2008-2010 (Table 3). Equivalent numbers at the monthly scale are displayed in Fig. 12 in terms of global maps. The values from Table 3 and Fig. 12 are calculated using the data consistency adjustment described in Sect. 2.2.1.
The runoff simulated from VIC is used as the "merged" terrestrial runoff at the grid scale since the gauge observations are discrete and spatially incomplete. The annual averaged runoff over the globe is 338.9 mm year⁻¹ during 1984-2010 (Table 3; see Fig. 12 for the spatial maps for the subperiods).
For the total water storage change, the uncertainty in VIC-inferred storage change and GRACE-derived storage change is simply assumed to be 5 and 10 % of their actual values due to the lack of a better source for their validation (Pan et al., 2012). Consequently, the higher merging weight from VIC (67.1 %) and lower merging weight from GRACE (32.9 %) in Table 2 (and Fig. S3 in Supplement II for the spatial maps of merging weights) are a result of the assigned error ratios (i.e., 5 and 10 %). Given the good agreement in TWSC between VIC and GRACE (Figs. 9 and 10), the impact of such a subjective error assignment is relatively small. However, for a high-latitude basin such as the Yukon, where VIC and GRACE have a relatively large discrepancy, the error is relatively high. Globally, the monthly mean of TWSC is almost zero during the four subperiods, as shown in the fourth row of Fig. 12. Nonetheless, multi-year variability due to drought and wet periods is observable. For example, the long-term drought in the central US and Canadian prairies over the 1998-2002 period shows up along with the Brazilian droughts in 1994-1995 and 2004-2005 that extended into Argentina (2004-2006). Also seen in Fig. 12 is the wetting trend over the last two decades in the Sahel since the severe mid-1980s drought, as well as the floods in Brazil in 2008.
Table 3. Annual mean water budgets (mm year⁻¹) over the globe (Greenland and Antarctica excluded) before (normal font) and after (in bold) water balance constraint, and their attributions (in italic) to non-closure error throughout subperiods.

Data assimilation to close the water budget
The last row of Fig. 12 shows the global maps of the non-closure errors for the subperiods. The long-term mean non-closure error relative to precipitation is around −9.8 % over land during 1984-2010 (Table 3). The annual mean imbalance over land ranges from −55.3 to −80.6 mm year⁻¹ during the four subperiods (Table 3). Figure 13 shows an example of the unconstrained (Fig. 13a, c) and constrained (Fig. 13b, d) water budgets for the Amazon Basin together with the imbalances (Fig. 13e) and their attribution (Fig. 13f). Over the Amazon Basin, where the total precipitation is large and the gauges are sparse, the precipitation uncertainty is higher. As a result, precipitation is the main recipient of the non-closure error attribution (Fig. 13f), receiving around 50 % of the non-closure error for each of the subperiods as well as for the complete analysis period. Because different numbers of data sources are merged into the budget during the four consecutive subperiods (the "inconsistencies"), the imbalance/non-closure error (Fig. 13e) shows neither a regular seasonal cycle nor a continuous pattern.
The annual mean water budget in terms of P, ET, R, and TWSC after the water balance constraint is 781.8, 463.9, 318.0, and 0 mm, respectively, during 1984-2010 (Table 3). Note that direct application of the CKF to enforce the water balance without other constraints may lead to a non-zero long-term TWSC and sometimes to negative runoff. Therefore, two additional "filters" are applied after the CKF. First, if the runoff is negative, we re-run the CKF and redistribute the non-closure error only onto the other three budget components. Second, for each grid cell, if the long-term mean TWSC over 1984-2010 is not zero, the monthly long-term mean TWSC is subtracted from the TWSC and added to the precipitation and evapotranspiration month by month during 1984-2010. Figure S4 in Supplement II shows the mean water budget components after the CKF water balance enforcement, complementing the components before the enforcement shown in Fig. 12. After the second filter, which we refer to as "TWSC detrending", the long-term mean of TWSC at each grid cell is zero over the entire 27 years. At regional scales, however, some places have experienced groundwater depletion, such as the US High Plains and Central Valley, western Iran, and India, starting in different years. One of the challenges is a lack of data on groundwater extractions. Therefore, from the global perspective, the authors assume the long-term TWSC to be zero over the almost three decades covered by this study and thus apply the detrending, after which the spatial variability of TWSC still exists during the four subperiods (Fig. S4). The "zero TWSC" assumption could introduce local or regional bias into the water budget estimates in regions with groundwater depletion. A more comprehensive comparison of the water budget estimation before and after the closure enforcement is listed in Table 4 at both the continental and basin scales. These water budget component values are spatially and temporally aggregated for each continent or basin over the analysis period of 1984-2010.

Hydrol. Earth Syst. Sci., 22, 241-263, 2018, www.hydrol-earth-syst-sci.net/22/241/2018/
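The closure step and the first of the two filters can be sketched for a single grid cell and month. The sketch assumes a diagonal error covariance, so the constrained update reduces to distributing the residual in proportion to each component's error variance; the function names and numbers are hypothetical and this is not the authors' implementation.

```python
# Constraint vector for P - ET - R - TWSC = 0, ordered as [P, ET, R, TWSC]
G = (1.0, -1.0, -1.0, -1.0)


def close_budget(x, var):
    """Enforce closure on x = [P, ET, R, TWSC] given error variances `var`.

    Assumes a diagonal error covariance C, so that G C G^T is a scalar and
    the update is x' = x - C G^T (G C G^T)^-1 (G x).
    Returns the closed budget, the attribution shares, and the residual.
    """
    residual = sum(g * xi for g, xi in zip(G, x))
    denom = sum(g * g * v for g, v in zip(G, var))
    closed = [xi - v * g * residual / denom for xi, v, g in zip(x, var, G)]
    shares = [g * g * v / denom for g, v in zip(G, var)]
    return closed, shares, residual


def close_budget_nonnegative_runoff(x, var):
    """Re-run the closure with runoff held fixed if it would turn negative."""
    closed, shares, residual = close_budget(x, var)
    if closed[2] < 0.0:
        # Zero runoff variance: the residual goes to P, ET, and TWSC only
        fixed_runoff_var = [var[0], var[1], 0.0, var[3]]
        closed, shares, residual = close_budget(x, fixed_runoff_var)
    return closed, shares, residual


# Hypothetical monthly values (mm month^-1) and error variances
closed, shares, residual = close_budget_nonnegative_runoff(
    [150.0, 90.0, 50.0, 5.0], [100.0, 64.0, 25.0, 9.0])

# Hypothetical case in which the first pass would yield negative runoff
closed_dry, shares_dry, _ = close_budget_nonnegative_runoff(
    [10.0, 9.0, 0.5, 2.0], [1.0, 1.0, 4.0, 1.0])
```

Under this assumption the attribution shares are simply the normalized error variances, which is why the component with the largest assumed uncertainty absorbs the largest part of the imbalance.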
The attribution of the non-closure term to each water budget variable is based on the uncertainties among different products. The results of this study are in general agreement with Pan et al. (2012), who showed that ET has a high non-closure attribution in a large portion of the 32 river basins that they analyzed. The average attribution of non-closure errors to ET over the globe is 45.4 % during 1984-2010, compared to 38.4 % for P, 4.9 % for R, and 11.2 % for TWSC (see Table 3). For most regions, ET receives the highest attribution of the non-closure error, particularly in Africa (50 % attributed to ET vs. 37 % to precipitation, 3 % to runoff, and 10 % to TWSC; see Table 4) and Oceania (46 % attributed to ET vs. 41 % to precipitation, 2 % to runoff, and 10 % to TWSC; see Table 4). Figure S5 additionally shows the global maps of the mean water budget non-closure error attribution during different subperiods. Higher attributions to precipitation occur in basins at midlatitudes to high latitudes such as the Danube (42 % to precipitation vs. 38 % to ET, 6 % to runoff, and 12 % to TWSC; see Table 4) and the Don (42 % to precipitation vs. 39 % to ET, 3 % to runoff, and 16 % to TWSC; see Table 4), where the estimation of extreme rainfall rates remains less well resolved (Huffman et al., 2007; Yong et al., 2014). High non-closure attributions to precipitation also occur in tropical basins such as the Amazon (46 % to precipitation vs. 33 % to ET, 9 % to runoff, and 12 % to TWSC; see Table 4) and the Congo (46 % to precipitation vs. 37 % to ET, 6 % to runoff, and 11 % to TWSC; see Table 4) because the precipitation is large and the gauges are scarce in these basins. The attribution to the total water storage change is generally small except for the northern regions, where snow, ice melt, and seasonal storage changes in wetlands dominate the water budgets (Fig. S5 in Supplement II). Runoff receives the smallest attribution of the imbalance among the four water budget components for most regions over the globe, which is in agreement with the conclusions of Sahoo et al. (2011). The mean attributions to each water budget component over different continents and basins for 1984-2010 are listed in Table 4 as well.

Validation of the MEaSUREs global terrestrial water budget CDR
The final CDR, which is the constrained global water budget with closure, is validated against in situ observations in terms of runoff and ET at multiple spatial scales.

Runoff verification
In situ river discharge observations are collected from three major data sources: (1) GRDC, (2) USGS, and (3) the Australian Land and Water Resources Audit project (Peel et al., 2000). Observations were collected from GRDC for a total of 32 large basins, of which 26 are used (as shown in Fig. 8) after filtering out basins with less than 3 years of data during 1984-2010. Figure 1 provides the locations of these basins. Of 362 medium-sized basins (5000 to 10 000 km²; 331 from GRDC and 31 from USGS), 165 were selected for validation. For validation over small basins, discharge data for 862 basins (1000 to 5000 km²) were collected from GRDC, USGS, and the Australian Land and Water Resources Audit project. Basins meeting one or more of the following conditions were excluded: (1) GRDC basins for which the catchment boundaries could not be reliably determined; (2) basins with large dams (reservoir capacity greater than 10 % of annual streamflow); (3) basins with urban areas greater than 2 % (using the "artificial areas" class of the map from GlobCover, version 2.3; Bontemps et al., 2011); (4) basins with irrigated areas greater than 2 % (using the Global Irrigated Area Map; http://waterdata.iwmi.org/Applications/GIAM2000/); and (5) basins with a gain or loss of forest (change in land cover) greater than 20 % of the basin area. For both the medium and small basins, those with data records shorter than 5 years were also excluded. Figure 14a displays the locations of medium and small basins. The observed discharge data were converted to runoff by dividing by the basin area upstream of the gauge location.
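The conversion from discharge to runoff described above can be sketched as follows. The input units (mean monthly discharge in m³ s⁻¹, basin area in km²) and the function name are assumptions made for illustration; the gauge archives may deliver other units.

```python
SECONDS_PER_DAY = 86_400


def discharge_to_runoff_mm(discharge_m3s, basin_area_km2, days_in_month):
    """Convert mean monthly discharge (m^3 s^-1) to runoff depth (mm month^-1)."""
    volume_m3 = discharge_m3s * SECONDS_PER_DAY * days_in_month
    area_m2 = basin_area_km2 * 1.0e6
    return volume_m3 / area_m2 * 1000.0  # depth in m converted to mm


# Hypothetical gauge: 100 m^3 s^-1 draining 5000 km^2 over a 30-day month
runoff_mm = discharge_to_runoff_mm(100.0, 5000.0, 30)
```

For this hypothetical gauge the result is about 51.8 mm month⁻¹, which is directly comparable to the gridded runoff averaged over the basin mask.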
The seasonal cycles of runoff from the CDR created in this study over the 26 large basins are compared against the GRDC observations in Fig. 8. Not surprisingly, the runoff estimated from the constrained system (grey dashed line) is not much different from the runoff estimated in the unconstrained system (which is the VIC runoff shown by the solid blue line), as we assign a small error (10 %) to the runoff component within the budget constraint algorithm. In general, VIC outperforms the other two LSMs, as VIC was calibrated over 43 major global river basins (Sheffield and Wood, 2007), although the calibration periods varied. Therefore, we believe that VIC can provide a reliable grid-scale estimate of the runoff budget. Note that the seasonal peaks from NOAH and VIC agree for the Indus Basin but precede the peak in the GRDC observations, which strangely occurs in November. Comparison to other studies for the Indus River (Bookhagen and Burbank, 2010) shows that the discharge peak occurs in the summertime, which is consistent with VIC and NOAH. Likewise, for the Senegal River, records from regional studies (Andersen et al., 2001; Stisen et al., 2008) show runoff peaks in August to September instead of April to May as in the GRDC record. In summary, we believe that our CDR provides good runoff estimates over the Amur, Danube, Mackenzie, Mekong, Mississippi, Pearl, Pechora, Yangtze, and Yenisei rivers but unsatisfactory estimates over the Congo, Lena, Murray-Darling, and Yellow rivers, in that the predicted seasonal discharge differs significantly from the observed seasonal cycle. This is due to water management not being included in the VIC model (e.g., the Murray-Darling and Yellow rivers) and to a combination of scarce data and the omission of large wetlands (e.g., the Congo and Lena basins).
A test of significance was conducted to remove those medium and small basins with non-significant correlations between GRDC runoff observations and CDR runoff records. This was done in order to remove basins such as the Indus and Senegal, which might have incorrect observational data. Figure 14 compares CDR-estimated runoff against in situ observations for 165 medium basins and 862 small basins in terms of the correlation coefficient (CC; Fig. 14a and b) and scatter plots (Fig. 14c and d) at the monthly scale. Again, the observed discharge measurements are converted to runoff using the basin area. A total of 84 out of 165 medium basins (∼ 51 %) and 625 out of 862 small basins (∼ 73 %) have CC values larger than 0.5, as shown in Fig. 14a and b. There are some medium basins with extremely low CC values (red dots in Fig. 14a) in northern Canada, where the lake/wetland influences are not modeled, and in south Africa, where the sporadic rainfall is not picked up and the model fails to replicate the quick runoff. The runoff from the CDR has CC values of 0.86 and 0.83 for the same medium and small basins, as shown in Fig. 14a and b, and has a bias ratio of 6 % for medium basins and −16 % for small basins (Fig. 14c and d). There is also a tendency for the model to underestimate runoff in the small basins in wetter regions (Fig. 14d). This scatter may be due to forcing uncertainty, model calibration, or omitted processes like water management (reservoirs, irrigation), all of which might shift the timing of the runoff peak, particularly on a monthly basis. Although the small basins were filtered in an attempt to remove those impacted by factors such as reservoirs, irrigation, and urbanization, they might still be affected by scaling issues. The CDR was computed at 0.5° grid resolution, which is approximately 50 km near the Equator. The small basins range from 1000 to 5000 km², so they cover a maximum of two grid pixels and a minimum of 0.2 of a grid pixel for the smallest basin. The basin masks were extracted at a higher spatial resolution and then aggregated onto the 0.5° grids with the fractional area of the basin in order to minimize the impact of spatial mismatch. Nonetheless, the coarser spatial resolution of the CDR still affects the comparison of the runoff estimates with small-scale basin observations. No estimate of this resolution effect has been determined, but the results shown in Fig. 14d suggest that the effect is limited to a small number of basins.
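The two statistics used in this comparison can be sketched for a single basin. The bias ratio is assumed here to be the relative difference between the estimated and observed totals, since its definition is not restated in this section; the monthly values are hypothetical.

```python
from math import sqrt


def correlation(obs, est):
    """Pearson correlation coefficient between two equally long series."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_est = sum(est) / n
    cov = sum((o - mean_obs) * (e - mean_est) for o, e in zip(obs, est))
    spread_obs = sqrt(sum((o - mean_obs) ** 2 for o in obs))
    spread_est = sqrt(sum((e - mean_est) ** 2 for e in est))
    return cov / (spread_obs * spread_est)


def bias_ratio(obs, est):
    """Relative bias of the estimate (assumed definition): (est - obs) / obs."""
    return (sum(est) - sum(obs)) / sum(obs)


# Hypothetical monthly runoff (mm month^-1) for one basin
observed = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 18.0, 33.0, 41.0]
cc = correlation(observed, estimated)
bias = bias_ratio(observed, estimated)
```

A positive bias ratio means the estimate exceeds the observations in total, so the −16 % reported for small basins corresponds to an underestimate under this definition.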

ET verification
Estimated ET is verified by two different approaches. First, it is compared against an inferred ET computed as the difference between observed precipitation and observed discharge (P − R) at the annual scale, which minimizes the effect of seasonal TWSC. This is done for the 25 large, 169 medium, and 813 small basins that have no less than 5 years of annual records. Second, it is verified against in situ observations from FLUXNET tower data (Baldocchi et al., 2001).
The precipitation used in computing the inferred ET is from GPCC, which is a gridded rain gauge analysis that merges around 67 000 gauge measurements globally (Schneider et al., 2014). The observed runoff, R, is from the same sources as used in Sect. 4.1. As shown in Fig. 15, the correlation coefficients between the MEaSUREs CDR ET and the inferred ET from observed P − R are 0.97, 0.96, and 0.76 for the large, medium, and small basins, respectively. Over some medium basins, particularly wetter ones, the MEaSUREs CDR ET does not match the observed P − R well, which we attribute to the effects of water management on our estimates of R. Essentially, if the CDR runoff, which does not reflect water management, is too large, then the estimates of ET will be too low, which is what is seen in Fig. 15b. The small basins show worse agreement with the inferred ET than the medium basins, with a 20 % bias (vs. 4 % for large basins and −4 % for medium basins in Fig. 15), which we attribute to scaling effects in estimating R.
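The inferred ET used as the reference in this comparison can be sketched as follows. The sketch assumes monthly basin-mean inputs in millimetres stored in dictionaries keyed by (year, month); the function names and the requirement of 12 complete months per year are illustrative assumptions, while the 5-year minimum follows the selection criterion stated above.

```python
def annual_totals(monthly):
    """Sum monthly values (mm) to annual totals, keeping only complete years."""
    by_year = {}
    for (year, _month), value in monthly.items():
        by_year.setdefault(year, []).append(value)
    return {year: sum(values) for year, values in by_year.items()
            if len(values) == 12}


def inferred_annual_et(p_monthly, r_monthly, min_years=5):
    """Annual ET inferred as P - R, or None if fewer than `min_years` years."""
    p_annual = annual_totals(p_monthly)
    r_annual = annual_totals(r_monthly)
    years = sorted(set(p_annual) & set(r_annual))
    if len(years) < min_years:
        return None
    return {year: p_annual[year] - r_annual[year] for year in years}


# Hypothetical basin: 80 mm month^-1 precipitation, 30 mm month^-1 runoff
p_demo = {(y, m): 80.0 for y in range(1990, 1996) for m in range(1, 13)}
r_demo = {(y, m): 30.0 for y in range(1990, 1996) for m in range(1, 13)}
et_demo = inferred_annual_et(p_demo, r_demo)
```

Working at the annual scale is what allows the storage change term to be neglected; at the monthly scale P − R would also include TWSC.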
The ET estimates from the CDR are further assessed by comparing the grid-scale estimates with observations from 47 FLUXNET towers, which measure the turbulent latent heat flux using the eddy covariance technique. These 47 flux towers were selected based on data availability (Michel et al., 2015) in terms of the meteorological variables and radiation, and the final selection represents a variety of biomes and dry/wet climate regimes. The raw data are at 3-hourly resolution, and the most complete data were recorded during the warm seasons. Therefore, the comparisons are made only over the summer (warm) seasons, filtering out years with less than 70 % data records at each tower. The 47 flux towers are located on four continents (North America, Europe, Asia, and Africa), as shown in Fig. 16a, under different land covers defined by the International Geosphere-Biosphere Programme (IGBP; Loveland et al., 2000). The tower stations are also described in Table 5. The validations against the FLUXNET observations are carried out only during the warm season, when ET is more dominant and there are fewer missing values. Among the 47 flux towers, we found that our ET estimates from the CDR agree closely with FLUXNET observations under the land cover types WSA (woody savanna; one station in Africa and one in the US) and EBF (evergreen broadleaf forest; only one station, in France; Fig. 16b). In general, our CDR ET matches the observations well, with a correlation coefficient of around 0.77 and a bias ratio of 11 %, except for some overestimations at stations that are mostly under the land cover types CRO (cropland) and ENF (evergreen needleleaf forest; Fig. 16b). The positive bias of the MEaSUREs CDR ET relative to FLUXNET observations is attributed to tower management: on rainy summer days the flux towers are usually turned off, so the tower records underestimate the actual ET on those days.
6 Discussion and future work

A well-constrained global inventory of the historical terrestrial water budget at fine resolution is essential to understanding the terrestrial hydrological cycle, its partitioning into individual components, and their variability at regional to global scales. In this study, the consistency and uncertainties of multiple hydrological data products are investigated, with precipitation found to have the highest consistency among the available products at both continental and basin scales compared to ET and TWSC. Data products from multiple sources, including in situ and satellite remote sensing observations, land surface model estimates, and reanalysis model outputs, are combined to create homogenized terrestrial water budget estimates at 0.5° spatial and monthly temporal scales for the period 1984-2010. This long-term water budget data record has both spatial and temporal consistency and is part of NASA's ESDRs program. The CDR data set was created by applying a water balance closure constraint using the CKF data assimilation technique of Pan and Wood (2006). For the individual data products, their ensemble mean is taken as the best estimate for the variable, and the spread of the products around the ensemble mean is taken as a proxy for their uncertainty. These estimates of the mean and uncertainty are important assumptions underlying the development of the data records. The CDR is validated against ground observations, i.e., GRDC, USGS, and the Australian Land and Water Resources Audit project for runoff and FLUXNET for ET, which may not seem independent of the merged and constrained CDR. However, data developed from either satellite remote sensing or models are often calibrated against "ground truth", i.e., gauge observations, which

Additionally, the developed data set allows for the documentation of the water budget at continental and basin scales, resulting in a better depiction across multiple scales. The attribution analysis of the budget imbalance (non-closure) shows that ET receives the largest adjustment in most regions, particularly in Africa and Oceania. In contrast, runoff tends to receive the lowest attribution of the non-closure error, in part because the land surface model was calibrated over 43 large global basins. TWSC receives larger adjustments in high-latitude regions, which we attribute to the impacts of snowmelt and the seasonal dynamics of wetlands and small lakes that are not well represented in the VIC LSM.
Currently, the authors are carrying out another study comparing the CDR water budget records against around 20 high-impact studies at multiple spatial scales (i.e., continental and global). This ongoing study is the first attempt to gather and compare global water budget estimates from studies as early as 1974 (i.e., Budyko, 1974) to the current study in order to provide a comprehensive overview of global water budget estimates, even though the studies focused on different periods, used different data sources, and have different global coverage (e.g., some of them exclude Antarctica, Greenland, or both). Figure S6 in Supplement II gives an example comparison with Trenberth et al. (2007; T2007 hereafter), which estimated the water budget during 1979-2000 and excluded Antarctica. The total precipitation in this study (114 × 10³ km³ year⁻¹) is quite close to that of T2007 (113 × 10³ km³ year⁻¹). By converting the water budgets into mm year⁻¹ based on the global coverage information available in each of those studies, the long-term mean precipitation is around 28 mm year⁻¹ (vs. 32 mm year⁻¹ in the CDR and 27 mm year⁻¹ from T2007), ET is around 78 mm year⁻¹ (vs. 78 mm year⁻¹ in the CDR and 77 mm year⁻¹ from T2007), and runoff is around 47 mm year⁻¹ (vs. 46 mm year⁻¹ in this study and T2007). Figure S7 further provides an example of how the CDR captured the 1998-1999 US drought in terms of the Standardized Precipitation Index (SPI) and drought extents calculated from CDR precipitation. The 6-month SPI exceeds the threshold of exceptional drought (as defined by the US Drought Monitor system; http://droughtmonitor.unl.edu/AboutUSDM/DroughtClassification.aspx) around the year 1998. The CDR developed in this study, as a time series of measurements of sufficient length, consistency, and continuity, can also be applied to determine climate variability. The major challenge for the creation of an ESDR/CDR of the terrestrial water budget (and potentially the terrestrial surface energy budget) is the lack of "ground truth" observations that can serve as reference data sets for bias correction. The sparseness of the observations in accessible data archives (e.g., GRDC for river discharge, GPCC for precipitation, and publicly accessible and quality-controlled FLUXNET data) is both a scientific and an institutional challenge. Many additional gauge locations and data records exist and could contribute to the development of improved CDRs and to our understanding of climate variability and change, but they have not been made available. Besides these operationally focused observations, the relative inaccessibility of global FLUXNET tower observations is also disappointing, although this situation has improved over the recent past. Even though there are over 650 towers in 30 regional networks covering five continents, the free fair-use subset of the La Thuile FLUXNET data set (which has been harmonized, standardized, and gap filled) contains only 154 stations, of which 47 were deemed useful for the validation presented here, based on quality assessment (e.g., closure of the energy budget) and record length. Data availability and accessibility challenges need to be at the top of the agendas of the world's major space agencies (ESA, NASA, JAXA), international data programs such as the Global Climate Observing System (GCOS), the GEWEX project of WCRP, and the Global Earth Observing System of Systems (GEOSS), and international agencies like the World Meteorological Organization. The "standard" statements and claims about "free and open access" to climate data from these programs have not resulted in improved access. If the needed improvements to CDRs are to occur, as they must to better assess the impacts of global environmental change, improved in situ data archiving and access by the scientific community are imperative for a more accurate analysis of climate variability and change.
The CDR developed in this study (the global terrestrial water budget at 0.5° and monthly resolution for 1984-2010) is currently archived on our public server, available at http://stream.princeton.edu:8080/opendap/MEaSUREs/WC_MULTISOURCES_WB_050/, and will be formally archived at the NASA Goddard Earth Science Data and Information Services Center (GES DISC) for future use by the climate and water management communities. It will advance our understanding of climate variability and trends at multiple spatial scales.
As the authors are aware, essential directions in global water and energy cycle research are towards an improved understanding of historical climate, benchmarking future climate predictions, validating models, and improving the understanding of the interactions among land, ocean, and atmosphere hydrospheres. Future work will be targeted at extending the data sets to even longer periods, and at finer resolutions, by combining upcoming new satellite missions with the analysis and predictions from more advanced modeling systems.
Competing interests. The authors declare that they have no conflict of interest.

Special issue statement. This article is part of the special issue "Observations and modeling of land surface water and energy exchanges across scales: special issue in Honor of Eric F. Wood". It is a result of the Symposium in Honor of Eric F. Wood: Observations and Modeling across Scales, Princeton, New Jersey, USA, 2-3 June 2016.

Figure 2. Seasonal cycles of precipitation from different products over the six continents for 1998-2010. CHIRPS and CSU only cover the region between 50° N and 50° S; therefore, only the grids between 50° N and 50° S are counted in the calculation of the seasonal cycle. The coefficient of variance (CV, %) is calculated as the standard deviation divided by the ensemble mean of all the products (the same for Figs. 3-9).

Figure 3. Seasonal cycles of precipitation from different products over 12 representative large basins for 1998-2010. (CHIRPS and CSU only cover the region between 50° N and 50° S. For those basins either outside or across 50° N-50° S, only PGF and GPCC are visualized.)

two reanalysis ET products are from ERA-Interim (Simmons et al., 2006) and NASA's Modern-Era Retrospective Analysis for Research and Application (MERRA; Rienecker et al., 2011). The LSM ET data sets are from the Variable Infiltration Capacity model (VIC v4.0.6), CLM v3.5, and NOAH v3.4 forced by an updated version of PGF. Please refer to Supplement I for more information. The 10 ET products show less consistency in the seasonal cycle (Figs. 4 and 5) than the precipitation data sets (Figs. 2

Figure 4. Seasonal cycles of evapotranspiration from different products over the six continents for 1984-2007 (Greenland is excluded for North America, as is true for Figs. 6 and 8).

Figure 5. Seasonal cycles of evapotranspiration from different products over 12 representative large basins for 1984-2007.

Figure 6. Seasonal cycles of runoff from different products over the six continents for 1984-2010.

Figure 7. Seasonal cycles of runoff from different products over 12 representative large basins for 1984-2010.

Figure 8. Seasonal cycles of runoff from VIC, CLM, NOAH, and MEaSUREs against GRDC runoff observations over 26 large basins for different periods according to in situ data availability.

Figure 9. Seasonal cycles of TWSC from different products over the six continents for 2003-2010. TWSC is first normalized and then the CV (%) is calculated. The same is true for Fig. 10.

Figure 10. Seasonal cycles of TWSC from different products over 12 representative large basins for 2003-2010.

Figure 11. Flowchart of the process of data preprocessing, error analysis, water balance constraint, and multi-scale water budget analysis.

Figure 12. Monthly mean (mm month⁻¹) of different water budget terms (from the top row to the bottom: precipitation, evapotranspiration, runoff, total water storage change, and imbalance) before water balance constraint for different periods (from left to right: 1984-1997, 1998-2002, 2003-2007, 2008-2010, and 1984-2010). The numbers listed on each subpanel are the monthly mean values of each merged water budget variable before water balance constraint (mm month⁻¹) during the respective periods (Greenland and Antarctica excluded). Figures S4 and S5 in Supplement II show the same but for the merged water budget variables after water balance constraint (mm month⁻¹) and for the non-closure error attributions to each water budget variable (%).

Figure 13. Unconstrained (a, c, e) and constrained (b, d) water budget estimates (mm month⁻¹) over the Amazon River basin. The top, middle, and bottom rows show the time series of the water budget in terms of fluxes (precipitation, evapotranspiration, and runoff), TWSC, and imbalance, respectively. The imbalance/non-closure error after the water budget constraint equals zero; the imbalance/non-closure attributions to each water budget variable for the different subperiods are shown in panel (f).

Figure 14. (a) Correlation coefficient (CC) between monthly GRDC runoff observations and MEaSUREs runoff estimates for 165 medium basins; panel (b) is the same as (a) but for 862 small basins; (c) monthly mean of MEaSUREs runoff estimates against GRDC runoff observations for medium basins; panel (d) is the same as (c) but for small basins.
in Supplement II, as an example, provides the interannual variability of the available water (P − ET) over the globe during the CDR period 1984-2010.

Table 1 .
The Community Land Model Y. Zhang et al.: A Climate Data Record

Table 1. Summary of the gridded data used in this study. (The study period is 1984-2010; CLM and NOAH, written in bold, are analyzed but not merged into the final water budget CDR in this study.)

Table 2. Data sources of merged water budgets with their averaged merging weights in brackets throughout different subperiods (TWSC from GRACE in 2002 is incomplete, so GRACE for 2002 is excluded; the spatial maps of merging weights over the globe can be found in Figs. S1 to S3 in Supplement II).

Table 4. Summary of annual mean water budgets (mm year⁻¹) before and after water balance constraint and their corresponding attributions (%) to the non-closure error at both continental and basin scales (Greenland is excluded for North America).

Table 5. Flux tower information list. From left to right: the station name, available data time span, latitude, longitude, and IGBP land cover type (Loveland et al., 2000).