Modeling the water budget of the Upper Blue Nile basin using the JGrass-NewAge model system and satellite data

The Upper Blue Nile basin is one of the most datascarce regions in developing countries, and hence the hydrological information required for informed decision making in water resource management is limited. The hydrological complexity of the basin, tied with the lack of hydrometeorological data, means that most hydrological studies in the region are either restricted to small subbasins where there are relatively better hydrometeorological data available, or on the whole-basin scale but at very coarse timescales and spatial resolutions. In this study we develop a methodology that can improve the state of the art by using available, but sparse, hydrometeorological data and satellite products to obtain the estimates of all the components of the hydrological cycle (precipitation, evapotranspiration, discharge, and storage). To obtain the water-budget closure, we use the JGrass-NewAge system and various remote sensing products. The satellite product SM2R-CCI is used for obtaining the rainfall inputs, SAF EUMETSAT for cloud cover fraction for proper net radiation estimation, GLEAM for comparison with NewAge-estimated evapotranspiration, and GRACE gravimetry data for comparison of the total water storage amounts available in the whole basin. Results are obtained at daily time steps for the period 1994–2009 (16 years), and they can be used as a reference for any water resource development activities in the region. The overall water-budget analysis shows that precipitation of the basin is 1360± 230 mm per year. Evapotranspiration accounts for 56 % of the annual water budget, runoff is 33 %, storage varies from −10 to +17 % of the water budget.


Introduction
Freshwater is a scarce resource in many regions of the world, and the problem continues to be aggravated by growing populations and significant increases in demand for agricultural and industrial purposes. The Nile River basin is one such region, with relatively arid climate because of high temperatures and solar radiation, which foster rapid evapotranspiration. Most of the countries within the basin, such as Egypt, Sudan, South Sudan, Kenya, and Tanzania, receive insufficient freshwater (Pimentel et al., 2004). Exceptions to this are the small areas at the Equator and the Upper Blue Nile (hereafter UBN) basin in the Ethiopian highlands, which receives up to 2000 mm of precipitation per year (Johnston and McCartney, 2010). The UBN basin is the main source of water in the region.
In Ethiopia, UBN is inhabited by 20 million people whose main livelihood is subsistence agriculture (Population Census Commission, 2008). The Ethiopian government, therefore, has started many water resource development projects, such as irrigation schemes and dams, including the Grand Ethiopia Renaissance Dam (GERD), which, upon completion, will be one of the largest dams in Africa. However, as the principal contributor (i.e., 51 % of discharge) to the main Nile Basin, UBN also supports hundreds of millions of people living downstream, and it is referred to as the "Water Tower" of northeastern Africa. Therefore UBN is a part of transboundary river, and its development and management require obtaining agreements between many national governments and also non-governmental organizations, each in-volving different policies, legal regimes, and contrasting interests. Tackling all these complexities and developing better water resource development strategies is only possible by gathering quantitative information (Hall et al., 2014). Understanding the hydrological processes of UBN, therefore, is the basis for both the transboundary negotiations about sharing the water resources and for assessing the sustainability of farming systems in the region. In fact, because of the lack of hydrometeorological data and a proper modeling framework, the recent modeling efforts conducted within the basin have evident limitations in addressing these problems. Studies in the region are limited to small basins, particularly within the Lake Tana basin where there are relatively better hydrometeorological data (Rientjes et al., 2011;Uhlenbrook et al., 2010;Tekleab et al., 2011;Wale et al., 2009;Kebede et al., 2006;Bewket and Sterk, 2005;Steenhuis et al., 2009;Conway, 1997;Mishra et al., 2004;Mishra and Hata, 2006;Teferi et al., 2010), or on the whole-basin scale, in which case, however, information on spatial variability is usually ignored Kim and Kaluarachchi, 2009;Gebremicael et al., 2013;Tekleab et al., 2011). Other studies are limited to a specific hydrological process (e.g., rainfall variability (Block and Rajagopalan, 2007;Abtew et al., 2009) and evapotranspiration (Allam et al., 2016) and statistical analysis of in situ discharge and rainfall data (Teferi et al., 2010;Taye and Willems, 2011)) or perform modeling at very low temporal resolutions (e.g., monthly; Tekleab et al., 2011). Spatially distributed information on all the components of the water budget does not exist, and basin-modeling approaches that are tailored to a single component do not provide an effective picture of the dynamics of the water resources within the basin.
To overcome data scarcity, large-scale hydrological modeling can be supported by remote sensing (RS) products, which fill the data gaps in water balance dynamics estimation (Sheffield et al., 2012). For instance, a considerable amount of research has been carried out in the last 2 decades in developing satellite rainfall estimations procedures (Hong et al., 2006;Bellerby, 2007;Huffman et al., 2007;Kummerow et al., 1998;Joyce et al., 2004;Sorooshian et al., 2000;Brocca et al., 2014). RS is also a viable option to fill the gaps for basin-scale evapotranspiration estimation. Global satellite evapotranspiration products have been available by applying energy balance and empirical models to satellite-derived surface radiation, meteorology, and vegetation characteristics, and they are recognized to have a certain degree of reliability (e.g., Fisher et al., 2008;Mu et al., 2007;Sheffield et al., 2010). Basin-scale storage estimation is the most difficult task. Fortunately, the Gravity Recovery and Climate Experiment (GRACE) (Landerer and Swenson, 2012) came to fill this gap (e.g., Han et al., 2009;Muskett and Romanovsky, 2009;Rodell et al., 2007;Syed et al., 2008;Rodell et al., 2004). Guntner (2008), Ramillien et al. (2008) and Jiang et al. (2014) reviewed the use of GRACE data and positively recommended it for large-scale water-budget mod-eling. At the moment, satellite-based retrievals of discharge are not available as operational or research products, but potentially data can be retrieved from satellite altimetry and multispectral sensors (e.g., Tarpanelli et al., 2015;Van Dijk et al., 2016). Moreover, the Surface Water Ocean Topography (SWOT, Durand et al., 2010) mission, which is expected to be launched in 2020, will provide river elevation (with an accuracy of 10 cm), slope (with an accuracy of 1 cm/1 km) and width that can be used in estimating river discharge (Paiva et al., 2015;Pavelsky et al., 2014). Notwithstanding the availability of these RS products at various (spatial and temporal) resolutions and accuracy, their use is clearly a new paradigm in water-budget closure estimations (Sheffield et al., 2009;Andrew et al., 2014;Sahoo et al., 2011;Gao et al., 2010;Wang et al., 2014).
This study is an effort to contribute to answering the quantitative issues related to the aforementioned management problems by estimating the components of the water budget of the UBN basin using a new hydrological modeling framework (see Sect. 3.1) and remote sensing data. It obtains, on relatively small spatial scales and at daily time steps, groundwater storage, evapotranspiration, and discharges in such a way as to satisfy the water-budget equation. It is also a methodological paper, in that it delineates various methodologies to overcome the data scarcity. The paper is organized as follows: firstly, descriptions of the study area are given (Sect. 2), then the methodologies for each water-budget component and the model set-up are detailed in Sect. 3. The results and discussions of each component and the water budget are presented in Sect. 4. Finally, the conclusions of the study are given (Sect. 5).

The study basin
The Upper Blue Nile river originates at Lake Tana at Bahir Dar, flowing southeast through a series of cataracts. After about 150 km, the river enters to a deep canyon and changes direction to the south. After flowing for another 120 km, the river again changes its direction to the west and northwest, towards the El Diem (Ethiopia-Sudan border). Many tributaries draining from many parts of the Ethiopian highlands join the main river along its course. The total length of the river within Ethiopia is about 1000 km.
The UBN basin represents up to 60 % of the Ethiopian highlands' contribution to the Nile River flow, which is itself 85 % of the total (Abu-Zeid and Biswas, 1996;Conway, 2000). The area of the river basin enclosed by a section at the Ethiopia-Sudan border is about 175 315 km 2 (Fig. 1), covering about 17 % of the total area of Ethiopia. The large-scale hydrological behavior of the basin is described in a series of studies (Conway, 1997(Conway, , 2000(Conway, , 2005Conway and Hulme, 1993). Specifically, its hydrological behavior is characterized by high spatiotemporal variability. Since the UBN basin gives the largest contribution to the total Nile flow, it is the Numbers inside the circles designate the river gauging stations whose names are provided in Table 4. Subbasin partitions and meteorological stations used for simulation (b). economic mainstay of downstream countries (i.e., Sudan and Egypt). Moreover, the Ethiopian highlands are highly populated and have high water demands for irrigation and domestic uses on their own.
The maps of elevation of the basin are shown in Fig. 1. The topography of UBN is very complex, with elevation ranging from 500 m in the lowlands at the Sudan border to 4160 m in the upper parts of the basin. Due to the topographic variations, the climate of the basin varies from cool (in the highlands) to hot (in the lowlands), with large variations in a limited elevation range. The hot season is from March to May, and the wet season, with lower temperatures, is from June to September, while the dry season runs from October to February. The mean annual rainfall and potential evapotranspiration of the UBN basin are estimated to be in the ranges of 1200-1600 and 1000-1800 mm yr −1 , respectively (Conway, 1997(Conway, , 2000, with high spatiotemporal variability. The annual temperature mean is 18.5 • C, with small seasonal variability.

Methodology
Water-budget simulation is essential to the estimation of both water storage and water fluxes (rate of flow) for given, appropriate control volumes and time periods. It is given by the following: where J (t) is rainfall, and ET(t) is actual evapotranspiration, Q(t) is discharge, and Q ki (t) are inflows from upstream HRUs. The index k = 1, 2, 3. . . is the control volume where the water budget is solved. In our case, the control volume is a portion of the basin (a subbasin) derived from topographic partitioning as described in Sect. 3.1.

JGrass-NewAge system set-up
UBN water budget is estimated using the JGrass-NewAge hydrological system, which is, in turn, based on the Object Modelling System framework (David et al., 2013). It is a set of modeling components, reported in Table 1, that can be connected at runtime to create various modeling solutions. Each component is presented in detail and tested against measured data in the corresponding papers cited in the Table 1. A similar study using the JGrass-NewAge system, but utilizing mostly in situ observations, has been conducted in Posina River basin in northeastern Italy . A brief description of the components used in this study are provided in the following sections. In this study, the shortwave solar radiation budget component (Sect. 3.3), the evapotranspiration component (Priestley and Taylor,Sect. 3.3), the Adige rainfall-runoff model (Sect. 3.4), and all the components illustrated in Fig. 2 are used to estimate the various hydrological flows. A necessary step for spatial hydrological modeling is the partitioning of the topographic information into an appropriate spatial scale. The SRTM 90 m × 90 m elevation data are used to generate the basin Geographic Information System (GIS) representation. The basin topographic representation in GIS, as detailed in Abera et al. (2014) and Formetta et al. (2011), is based on the Pfafstetter enumeration. The basin is subdivided in hydrologic response units Table 1. JGrass-NewAge system components and respective references. The components in bold are the ones used in this study.

Role
Component name Description

Basin partitioning GIS spatial toolbox and Horton Machine
A GIS spatial toolbox that uses DEM to extract basin, hillslopes, and channel links for NewAge-JGrass set-up (Formetta et al., 2014a;Abera et al., 2014).
(HRUs), where the model inputs (i.e., meteorological forcing data), and hydrological processes and outputs (i.e., evapotranspiration, discharge, shortwave solar radiation) are averaged (Formetta et al., 2014a). A routing scheme, which is applied to move the discharges from HRUs to the basin outlet through the channel network, is included in the Adige component.
In this study, the UBN basin is divided into 402 subbasins (HRUs of mean area of 430 ± 339 km 2 ) and channel links, as shown in Fig. 1b. This spatial partitioning may not be the finest scale possible, but it is consistent with input data resolution, including satellite products, meaning that a finer subdivision would imply uniform inputs for adjacent HRUs, and a coarser one would average out input variability. In this paper, the term HRU actually refers to the different subbasins.

Precipitation (J (t))
The spatiotemporal precipitation input term of Eq. (1), J (t), is quantified with RS-based approaches. Currently, there are several satellite rainfall estimates (SREs) available for free, and Abera et al. (2016) compared five of them with high spatial and temporal resolutions over the same basin. It was shown that SM2R-CCI (Brocca et al., 2013(Brocca et al., , 2014 is one of the best products, particularly in capturing the total rainfall volume. With regard to the quality of SM2RAIN-based products, recent studies positively assessed their accuracy on a regional Ciabatta et al., 2017) and a global (Koster et al., 2016) scale. A comparative analysis of the effects of different SREs on basin water-budget components is an interesting area of research; however, here SM2R-CCI is only used for obtaining the precipitation input. The systematic error (bias) of SM2R-CCI is removed accord-ing to the "ecdf" matching techniques by Michelangeli et al. (2009) and specialized for UBN by Abera et al. (2016) by using in situ observations. The subbasin mean precipitation is estimated by averaging all the pixels of RS-corrected data within each subbasin. In accordance with the basin partition described in Sect. 3.1, the 1994-2009 daily precipitation set is generated for each of the 402 subbasins.

Evapotranspiration (ET)
Evapotranspiration estimation is crucial for agricultural and water resource management as it is an important flux within a basin. The lack of in situ data relating to ET impedes modeling efforts and makes it probably the most difficult task in water-budget assessment. Here, ET is estimated according to the NewAge specific component. It provides estimates at any temporal and spatial resolution required by using the Priestley and Taylor (PT) formula (Priestley and Taylor, 1972), which is one of the more common models used. PT is mainly based on net radiation estimation, R n , grouping all the unknowns into the α PT coefficient, as shown in Eq. (2): where PET is Priestley-Taylor potential evapotranspiration, is the slope of the Clausius-Clapeyron relation and γ is the psychometric constant (Brutsaert, 2005). In this study, however, we need an estimate of the actual evapotranspiration, which is constrained not only by the atmospheric demands as in Eq. (2), but it uses storage information which can be obtained from the ADIGE rainfall-runoff component of JGrass-NewAge. Hence, the ET equation is modified as Figure 2. Workflow with a list of NewAge components (in white) and remote sensing data processing parts (shaded in grey, not yet included in JGrass-NewAGE and currently performed with R tools) used to derive the water budget of the UBN. It does not include the components used for the validation and verification processes.
follows : where S(t) is the subsurface storage, and S max is the maximum storage capacity for each HRU. The important unknown coefficient α PT (Pejam et al., 2006;Assouline et al., 2016) and the S max are calibrated within the rainfall-runoff model component, as explained below. The ratio S(t)/S max represents a stress coefficient which became very popular since the work of Feddes et al. (2001). In our procedure, given that S(t) is not measured, the assumption that there is null water storage difference after a long time, named Budyko's time, T B , (Budyko, 1978), is required. So, here, what is searched is a time duration (T B ) such that the water storage again assumes its initial value . Once T B is fixed, the tools for automatic calibration, provided by the Object Modelling System, produce the set of parameters in Table 4, including α PT and S max , for which discharge is well reproduced and is also S(T B ) = S(0). In this study, T B = 6 years.
In Eq.
(3), R n is the main input modulating the atmospheric demand component of ET. To this scope, the NewAge shortwave radiation budget component (SWRB; Formetta et al., 2013) is used to return a value for each sub-basin in clear sky conditions. Irradiance in clear sky conditions, however, is unsuitable for all sky conditions since surface shortwave radiation is strongly affected by cloud cover and cloud type (Arking, 1991;Kjaersgaard et al., 2009). Therefore, the clear sky SWRB estimated using NewAge-SWRB is modified by using the cloud fractional cover (CFC) satellite data set (Karlsson et al., 2013), processed and provided by the EUMETSAT Climate Monitoring Satellite Application Facility (CM SAF) project (Schulz et al., 2009). In this case, net radiation is generated only from the shortwave radiation and the cloud cover data, as in the following formulation (Kim and Hogue, 2008): where R S is the net shortwave radiation and R n is the net radiation. The daily CFC data originates from polar orbiting satellites, version CDRV001, using a daily temporal resolution and a 0.25 • spatial resolution, from 1994 to 2009 (16 years). Satellite data are processed (Karlsson et al., 2013) to obtain the mean daily CFC for each subbasin. In comparison to CFC, the effects of surface albedo on R n is minimal, particularly in highland areas with vegetation and no snow cover, such as the UBN basin. Once ET is estimated according to the methods described, it is useful to compare it with independently obtained ET estimates or data. In situ ET observations are not present for this basin, as is the case for most regions. Estimates of ET based on RS have been made available by different algorithms (Norman et al., 1995;Mu et al., 2007;Jarmain et al., 2009;Fisher et al., 2008). In this study, the Global Land Evaporation Amsterdam Methodology (GLEAM, ver-sion_v3_BETA; Miralles et al., 2011a;Martens et al., 2017), a global, satellite-based, ET data set is used. GLEAM, as well NewAge, uses the PT scheme for estimating ET. However, all inputs of the formula, in GLEAM and NewAge, are evaluated according to different strategies and RS tools. GLEAM sets α PT = 0.98 while in NewAge it has been calibrated. In GLEAM PET is additively increased by intercepted rainfall estimated according to a version of the Gash model (Gash, 1979), and multiplicatively decreased by a stress coefficient depending on five soil cover types (bare soil, snow, tall vegetation, two levels of low vegetation) and has a different expression for any one of the storages. Moreover, depending on the case, the stress coefficient is evaluated through various RS products, according to procedures which are described in the paper by Martens et al. (2017). In contrast to the NewAge approach, GLEAM also considers dynamic vegetation information to estimate the stress factor (Miralles et al., 2011a).
GLEAM is available at 0.25 • spatial resolution (28 km laterally or 800 square kilometers of area) and daily temporal resolution, and is assessed positively in different studies (Mc-Cabe et al., 2016;Miralles et al., 2011b). The most recent version of GLEAM was validated globally over 64 Fluxnet sites (Martens et al., 2017) with consistent results, letting us assume that it behaves properly also in Ethiopia. The differences between NewAGE estimation and GLEAM's one allow us to assume that the our results and their results can be seen as largely independent. For comparison with NewAge ET, we averaged GLEAM ET for each HRU polygon. Comparison of the NewAge ET with MODIS-standard ET product is also available in the Supplement of the paper.

Discharge (Q)
For discharge estimation, the ADIGE rainfall-runoff component is used. It is based on the well-known HYMOD model (Moore, 1985) as a runoff production component which also include a routing component and artificial inflowoutflow management. Detailed descriptions of HYMOD implementations in the NewAge model system are given in Formetta et al. (2011) and  and summarized in Appendix A. The main inputs for the ADIGE model are J (t) and PET(t), as estimated in the previous sections. The NewAge Hymod component is applied to each HRU, in which the basin is subdivided and the total watershed discharge is the sum of the contribution of each of the 402 HRUs routed to the outlet. The ADIGE rainfallrunoff has five calibration parameters (C max , B exp , α Hymod , R s , and R q ; see the details in Appendix A), and the calibration is performed using the particle swarm (PS) optimiza-tion method. PS is a population-based stochastic optimization technique (Kennedy and Eberhart, 1995). The objective function used to estimate the optimal value of the parameter is the Kling-Gupta efficiency (KGE, Kling et al., 2012). The KGE is preferred to the commonly used Nash-Sutcliffe efficiency (NSE, Nash and Sutcliffe, 1970) because the NSE has been criticized for its overestimation of model skill for highly seasonal variables by underestimating flow variability (Schaefli and Gupta, 2007;Gupta et al., 2009). For evaluation of the model performances, in addition to the KGE, the two other goodness-of-fit methods (percentage bias (PBIAS) and correlation coefficient) used in this study are described in Appendix B.

Total water storage change (ds / dt)
The ds / dt in Eq. (1) is the water contained in the ground, soil, snow and ice, lakes and rivers, and biomass. It is the total water storage (TWS) change, calculated as the residuals of the water-budget fluxes for each control volume. In this paper, the ds / dt estimation at daily time steps is based on the interplay of all the other components, as presented in Eq. (1). There is no way to estimate areal TWS from in situ observations. The new GRACE data (Landerer and Swenson, 2012) have a potential to estimate this component, but at very low spatial and temporal resolutions. On a large scale, however, it can still be used for constraining and validating data of the modeling solutions. Here, the performance of our modeling approach to close the water budget is assessed using the GRACE estimation on the basin scale. Monthly GRACE data are obtained from NASA's Jet Propulsion Laboratory (JPL) ftp://podaac-ftp.jpl.nasa.gov/allData/tellus/L3/ landmass/RL05. The leakage errors and scaling factor (Landerer and Swenson, 2012) that are provided with the product are applied to improve the data before the comparison is made. The total error of GRACE estimation is a combination of GRACE measurement and leakage errors (Billah et al., 2015). Based on the data of these two error types, the mean monthly error of GRACE in estimating total water storage change (TWSC) in the basin is about 8.2 mm. Since the other fluxes, for instance Q and ET, are modeled as functions of basin water storage, the good estimation of water storage by a model affects the goodness of fit of all the other fluxes as well (Döll et al., 2014).

Calibration and validation approach
The satellite precipitation data set (SM2R-CCI) is error corrected based on in situ observations. At the basin outlet (Ethiopia-Sudan border), the ADIGE rainfall-runoff component (i.e., HYMOD model) is calibrated to fit the observed discharge during the 6 years of the calibration period (1994)(1995)(1996)(1997)(1998)(1999) at daily time steps. Based on the approach described in Sect. 3.2, α PT is calibrated by imposing that S(0) = S(T B ) after T B = 6 years. The value of 6 years is arbitrary but it was found to give good agreement with GRACE data (see below), so no other values were used. The simulation for each hydrological component is then verified using available in situ or remote sensing data (Table 2), and three goodness-of-fit methods (KGE, PBIAS, r) are used as comparative indices (for detailed information please see Appendix B), as follows: -Discharge validation: discharge simulation is validated at the outlet close to the Ethiopian-Sudan border, where the model is calibrated. In addition, the simulation of NewAge at the internal links is validated at 15 discharge measurement stations, where in situ data are available. The evaluations of discharges at the internal links provide an assessment of model estimation capacity at ungauged locations.
ds / dt validation: the water storage change, ds / dt, estimated as residual of the water budget, is validated against the GRACE-based data set. To harmonize and enable comparison between the model and the GRACE TWS data, it is necessary to do both temporal and spatial filtering. Following the GRACE TWSC temporal resolution, the model ds / dt is aggregated at monthly time steps and on the whole-basin scale.

Results and discussion
The results of the study are organized as follows: firstly, we present the results for (1) precipitation, (2) evapotranspiration, (3) discharge and (4) total water storage; secondly, the JGrass-NewAGE system is used to resolve the water-budget closure at each subbasin, and the contribution of each waterbudget term is further is analyzed.

Precipitation (J )
The spatial distribution of mean, long-term, annual precipitation is presented in Fig. 3a. Generally, precipitation increases from the east (about 1000 mm yr −1 ) to the south and southwest (1800 mm yr −1 ). This spatial pattern is consistent with the results of Mellander et al. (2013) and Abtew et al. (2009). SM2R-CCI shows that the southern and southwestern parts of the basin receive higher precipitation than the eastern and northeastern parts of the highlands. The rainiest subbasins are in the southern part of the basin. For this location, the precipitation data used correspond to a mean annual rainfall of about 1900 mm yr −1 , while the mean annual precipitation reported for this region by Abtew et al. (2009) is about 2049 mm yr −1 . The latter estimation, however, is from point gauge data, while this study is based on areal data. To understand the spatial distribution of the seasonal cycle, the quarterly percentage of total annual precipitation, calculated from 1994 to 2009 in daily estimations, is presented in Fig. 3b. During the summer season (June, July, and August), while the subbasins in the north and northeast receive about 65 % of the annual precipitation (Fig. 3b), the subbasins in the south receive about 40 % of total precipitation.

Evapotranspiration (ET)
ET is estimated for each subbasin at daily time steps. In this section we mainly provide discussion about the comparison of NewAge ET and RS estimates, but further comments on ET can be found in Sect. 4.5, which show that ET is more water-limited than energy-limited. Figure 4a shows the comparisons of the ET time series from 1994 to 2002 (aggregated at daily, weekly, and monthly intervals, from top to bottom) between NewAge and GLEAM. The figure specifically refers to three selected subbasins representing different ranges of elevations and spatial locations. NewAge estimates have higher temporal variability in comparison to GLEAM. In the represented locations, GLEAM therefore accumulates a systematic growing difference in water-volume evapotranspiration, which could be not consistent with the estimated storage (see below). The agreement or disagreement between the two ET estimations vary from subbasin to subbasin (Fig. 4). The spatial distribution correlation and PBIAS between the NewAge and GLEAM ET is presented in Fig. 4b. Spatially, the correlation between JGrass-NewAGE and GLEAM is higher in the eastern and central parts of the basin, while it tends to decrease systematically towards the west (i.e., to the lowlands, see Fig. 4b). The correlation between the two ET estimations increases when passing from daily to monthly time steps. The PBIAS between the two estimates ranges from −10 to 10 %, with large numbers of subbasin being from −3 to 3 %. Spatially, the comparison shows that GLEAM overestimates ET in the western parts of the basin (border to the Sudan) and underestimates ET in the northern parts of the basin (Fig. 4b). The overall basin correlation is 0.34 ± 0.07 (daily time step), 0.51 ± 0.08 (weekly time step), and 0.57 ± 0.10 (monthly time steps). Generally, except at daily time steps, the two estimates have acceptable agreements (very low bias and acceptable correlation). In comparison with the correlation (0.48 ± 0.15) and PBIAS (14.5 ± 18.9 %) obtained between NewAge ET and MODIS ET Product (MOD16), as shown in the Supplement, the correlation and PBIAS between NewAge ET and GLEAM ET is much better.

Discharge (Q)
The optimized parameters of the Adige model, obtained using automatic calibration procedure of NewAge, are given at Table 3. At the basin outlet, the automatic calibration of the NewAge components provided very good values of the goodness-of-fit indices (KGE = 0.93, PBIAS = 2.2, r = 0.94). The performances at the outlet also remain high during the validation period, having KGE = 0.92, PBIAS = 2.4, and r = 0.93. Model performances are also evaluated within the basin at the internal-catchment outlets (Table 4) where stage measurements are available. Figure 5 shows simulated hydrographs along with the observed discharges for some locations. The results show that the performances of the NewAge simulation are a little better than the performances reported by Mengistu and Sorteberg (2012), with slightly lower PBIAS value (PBIAS = 8.2, r = 0.95). Generally, the model predicts both the high flows and low flows well, with slight underestimation of peak flows (Fig. 5a). This is likely due to the underestimation of SM2R-CCI precipitation data for high rainfall intensities (Abera et al., 2016). An additional source of error can also be caused by model inconsistency due to averaging out input data over large areas or from some inadequacy in stagedischarge curves used to obtain discharges from water levels. The slight underestimation of runoff could result from the overestimation of evapotranspiration. However, in this case, GLEAM (or MODIS) would cause larger discrepancies. Regarding the internal-site discharge simulation, we highlight some representative results. The hydrograph comparison between the NewAge simulated discharge and the observed one of the Gelgel Beles River, enclosed at the bridge near to Mandura with an area of 675 km 2 , is shown in Fig. 5b. The performance of the uncalibrated NewAge at Gelegel Beles has a correlation coefficient of 0.70, PBIAS is 11.40 % and the KGE value of 0.68 (Table 4).
For most subbasins, because of the good model performances (i.e., KGE is higher than 0.5 and PBIAS is within 20 %), the estimated discharges are deemed adequate for estimating water resource at locations where gauges are unavail- able. The model is also able to reproduce discharge across the range of scales. For instance, the model performances at the Ethiopia-Sudan border (175 315 km 2 ), Dedisa near Arjo (9981 km 2 ), main Beles (3431 km 2 ), and Temcha near Dembecha (406 km 2 ) are also acceptable. An exception is Lake Tana, where the discharge is regulated (Fig. 5 and Table 4). Model performance varies with basins and a consistent behavior with respect to basin size, climate, vegetation density and topographic complexity is not found. Indeed, there are many factors that affect the model performance, including uncertainties in input observations. Sample simulations at all the channel links of the study basin at daily time steps are provided in the Supplement .

Total water storage change
NewAge-simulated ds / dt for 16 years for each subbasin is calculated as a residual of the flux terms. The simulated ds / dt is represented and compared with the GRACEbased TWSC in Fig. 6. The storage change shows high seasonality over the basin, with positive change in summer and negative change in winter. The change varies from −100 to +120 mm month −1 . The model ds / dt, aggregated at monthly timescales and for the whole basin, is in accordance with the GRACE TWSC both in temporal pattern and amplitude. The good correlation coefficient of 0.84 and the general good performances of the ds / dt component is certainly caused also by the ability of NewAge to reproduce the other water fluxes well. Due to the possible high leakage error introduced in GRACE TWSC at high spatial resolutions (Swenson and Wahr, 2006), statistical comparison at subbasin level is not performed. However, the spatial dis-tribution of NewAge and GRACE ds / dt estimates can be found in the Supplement.

Water-budget closure
The water-budget components (J , ET, Q, and ds / dt) of 402 subbasins of the UBN are simulated for the period 1994-2009 at daily time steps. Figure 7 shows the long-term, monthly-mean, water-budget closure derived from 1994 to 2009. The 4 months (January, April, July, and October) are selected to show the four seasons (winter, spring, summer, and autumn). For all components, the mean seasonal variability is very high. Generally, the seasonal patterns of Q and ds / dt follow the J , showing the highest values in summer (i.e., July) and the lowest in winter (i.e., January). However, simulated ET shows distinct seasonal patterns with respect to the other components, the highest being during autumn (October), followed by winter (January). During the summer it is low, most likely due to high cloud cover. The variability between the subbasins is also appreciable. Generally, all water-budget components tend to increase from the east to the southwestern part of the basin, except for the summer season (July). During summer, however, the eastern part of the basin receives its highest rainfall, stores more water, and generates high runoff as well. In general the dominant budget component varies with months. For instance, in January ET is dominant while in June and July ds / dt is more dominant. After the summer season, Q and ET are the dominant fluxes. A regression analysis based on the results for all subbasins and all years shows that, on short timescales (e.g., daily or monthly), the variability in ET is not due to variability in J (R 2 = 0.01). Conversely, on the yearly timescale, 78 % of ET variance is explained by variability in J . The spatial variability of the long-term mean annual waterbudget closure is shown in Fig. 8. The spatial variability for J and Q is higher than ds / dt and ET. The higher Q and ET in the southern and southwestern part of the basin are due to higher J . Similarly, Q is lower in the eastern and northeastern part of the basin. In terms of the percentage share of the output term (Q, ET, ds / dt) of total J (Fig. 8c), ET dominates the water budget, followed by Q. It is noteworthy that the eastern subbasins with low ET still have percentage share of ET due to the low amount of J received. The long-term basin-average water-budget components show 1360 ± 230 mm of J , followed by 740 ± 87 mm of ET, 454 ± 160 mm of Q and −4 ± 63 mm of ds / dt. While the spatial variability of the water budget is high, the annual variability is rather limited. Higher annual variability is observed for J , followed by Q. 2001 and 2006 are wet years, characterized by high J and Q. Conversely, 2002 and 2009 are dry years with 1167 mm and 1215 mm per year of precipitation. Details on the two dry years (2002,2009) of the region can be read in Viste et al. (2013). Figure 10 provides long-term monthly-mean estimates of water-budget fluxes and storage. The average basin-scale budget is highly variable. The highest variability is mainly in J and ds / dt. During summer months, J , Q, and ds / dt show high magnitude. ET is not high in June, July, and August, but it is in October and December. The S(t) accumulated in the summer season feeds high ET in autumn, and causes very high drops in ds / dt (Fig. 10). The seasonal trend between J and ET is slightly out of phase, i.e., the highest energy  to evaporate water occurs during low precipitation months (March, April, and May). Due to this slight out-of-phase trend, ET is minimal and Q and ds / dt are enhanced during wet months (Fig. 10), thus revealing that ET is water-limited more than energy-limited. The same Figure also shows the complex interplay between discharges, (variation of) storages, and evapotranspiration. A first look at Figs. 4 and 5 could lead to the conclusion that overestimation of ET brings in underestimation of Q. However, Fig. 10 shows that the role of ds / dt is not negligible at all.

Conclusions
The goal of this study is to estimate the whole water budget and its spatial and temporal variability of the upper Blue Nile basin using the JGrass-NewAge hydrological system and remote sensing data. The study covered 16 years from 1994 to 2009 at a finer spatial and temporal resolution than Figure 7. Spatial distribution of long-term mean monthly water budget (January, April, July, and October) in the UBN basin. For the sake of visibility, the legend is plotted separately and on logarithmic scale, except for the storage component.
in previous studies. In order to achieve this result, we used various remote sensing products, rainfall from SM2R-CCI, cloud cover from SAF EUMETSAT CFC, evapotranspiration from GLEAM and MODIS (used for comparison), and storage change from GRACE (also used for comparison). We also used all the ground data currently available, i.e., 16 discharge time series and 35 ground-based meteorological stations. The results can be summarized as follows: -The basin-scale annual precipitation over the basin is 1360 ± 230 mm yr −1 and highly variable spatially. The southern and southwestern parts of the basin receive the highest precipitation, which tends to decrease towards the eastern parts of the basin (Fig. 3).
-Generally, the interannual variability of ET is high, and tends to be higher in autumn and lower in summer. The average basin-scale ET is about 740 ± 87 mm yr −1 and is the larger flux in water budget in the basin.
-The comparison of simulated ET with the satellite product GLEAM shows that GLEAM has lower temporal variability than our estimates. The correlation between GLEAM ET and NewAge ET increases from daily time steps to monthly time steps, and spatially it is higher in the east and central parts of the basin. Comparison with MODIS products was also performed (reported in the Supplement). MODIS actually shows an even larger departure from JGrass-NewAge results. Both satellite products, however, seem to introduce a systematic bias  Figure 8. The spatial distributions of long-term mean annual waterbudget closure: precipitation in millimeters (Fig. 3), the output terms (Q, ET, ds / dt) in millimeters (a), and the percentage share of the output term (Q, ET, ds / dt) of the total precipitation (b). which would not allow the closure of the water budget according both simulated and measured discharges.
-The NewAge ADIGE rainfall-runoff component is able to reproduce discharge very well at the outlet (KGE = 0.92). The long-term annual runoff of the UBN basin is about 454 ± 160 mm yr −1 . The verification results at the internal sites where measurements are available reveal that the model can be used for forecasting at ungauged locations with some success. and Table 4) and often greatly improve previous results.
-The NewAge storage estimations and their space-time variability are effectively verified by the basin-scale GRACE TWSC data, which show high correlation and similar amplitude.
Despite the good results obtained, it is important to note that this study is limited by the lack of in situ ET observation and low-resolution GRACE data for confirmation of storage. To this end, the results of this study would benefit from basinspecific assessments of ET and ds / dt RS products based on ground measurements, as done in Abera et al. (2016) for precipitation. We claim that the procedure we followed can be easily replicated in any other poorly gauged basin, with benefits for the hydrological knowledge of any region on Earth.
Data availability. The forcing data used for NewAge simulation, SM2R-CCI, is obtained from http://hydrology.irpi.cnr.it/people/l. brocca; the rain gauge precipitation and hydrometer discharge data were obtained from the National Meteorological Agency and the Ministry of Water and Energy of Ethiopia, respectively, and they can be requested for research. The remote sensing data used for comparison, GLEAMS ET, MODIS ET, and GRACE TWSC, are freely available and can be downloaded at http://www.gleam. eu, http://www.ntsg.umt.edu/project/mod16 and ftp://podaac-ftp. jpl.nasa.gov/allData/tellus/L3/landmass/RL05 respectively. Modeling components used for the simulations are available and documented through the Geoframe blog http://geoframe.blogspot.com. Additional data (i.e., GIS database, topographic information, input data, and additional results) and other notes regarding the paper can be found at Zenodo: https://doi.org/10.5281/zenodo.264004 .
Appendix A: Hymod model in NewAge-JGrass system The NewAge system executes one Hymod model at each HRU and routes water downslope. Detailed description of the Hymod model is provided in many studies (Moore, 1985;Van Delft et al., 2009;Boyle et al., 2001;Formetta et al., 2011). In Hymod, each HRU is supposed to be a composition of storages of capability C (L) according to distribution (Moore, 1985): where F (C) represents the cumulative probability of a certain water storage capacity (C), C max is the largest water storage capacity within each hillslope, and B exp is the degree of variability in the storage capacity. As shown in the schematic diagram (Fig. A1), the precipitation exceeding C max is sent directly to the volume available for surface runoff. If we call the precipitation volume in a time interval t, J (t) := P (t) t, then this "direct" runoff can be estimated according to the following: where C(t) defines the fraction of storages already filled at time t. The latter equation is true for any precipitation and storage level, even when the maximum storage C max is not exceeded. When precipitation does not exceed C max , runoff volume can be produced by filling some of the smaller storages. The extent to which this happens can be derived by the knowledge of the storage distribution, Eq. (A1), the initial storage C(t), and the precipitation J (t). This residual runoff is, in fact, given by the following: An analytic expression for the integral in Eq. (A3) is available, which makes the computation easier. Water in storage is made available to evapotranspiration. Water going into the runoff volume, i.e., R(t) and R H (t), is further subdivided into a surface runoff volume and subsurface storm runoff. Surface runoff, in turn, is composed by the whole of R H (t) and part of R(t), and R(t) is split according to a partition coefficient α such that the part αR(t) goes into surface runoff volume and (1 − α) into the subsurface storm runoff volume. In Hymod, α is a calibration coefficient. Finally, surface runoff volumes are routed through three linear reservoirs, and subsurface storm runoff volume is routed through a single linear reservoir. A summary of equations for the surface runoff is therefore as follows: where S 1 (L 3 ) is the storage in the first of the linear reservoirs, and k (T) is the mean residence time in each of the reservoirs. Then, the following applies: for the other two reservoirs, where S i (L) with i = 2, 3 is the storage in the two remaining surface reservoirs. Subsurface storm runoff is then modeled by the following: where S sub (L 3 ) is the storage in the subsurface storm-flow system and k sub (T ) is its mean residence time. A waterbudget equation can be written for the groundwater system as follows: where S g (t) (L 3 ) is the groundwater storage, and Q g (t) the groundwater flow which becomes surface flow at the closure of the HRU. Summarizing, Hymod subdivides each HRU into three reservoirs: a groundwater reservoir (from where evapotranspiration and groundwater flow is allowed), a subsurface storm-water reservoir, and a surface runoff reservoirs set. Partition of precipitation into the three reservoirs is obtained by a calibration coefficient, α, and the use of a probability distribution function of storages' capacity, F (c).
whereas the positive value indicates the overestimation and negative values indicate model underestimation (Moriasi et al., 2007;Gupta et al., 1999).
The PBIAS value ranges from −20 to 20 % is considered good, and values between ±20 and ±40 % and those greater than ±40 % are considered satisfactory and unsatisfactory respectively (Stehr et al., 2008). Gupta et al. (2009) to provide a diagnostically interesting decomposition of the Nash-Sutcliffe efficiency (and hence MSE), which facilitates the analysis of the relative importance of its different components (correlation, bias, and variability) in the context of hydrological modeling. Kling et al. (2012) proposed a revised version of this index. It is given by the following:

Kling-Gupta efficiency (KGE) is developed by
where ED is the Euclidian distance from the ideal point, β is the ratio between the mean simulated and mean observed flows, r is the Pearson product-moment correlation coefficient, and v is the ratio between the observed (σ o ) and modeled (σ s ) standard deviations of the time series and takes account of the relative variability (Zambrano-Bigiarini, 2013). The KGE ranges from infinity to a perfect estimation of 1, but a performance above 0.75 and 0.5 is considered to be as good and intermediate, respectively (Thiemig et al., 2013).
3. Pearson correlation coefficient (r) -please refer to Moriasi et al. (2007). The correlation coefficient is best as much as it is close to 1.