Using satellite-based evapotranspiration estimates to improve the structure of a simple conceptual rainfall–runoff model

Daily, quasi-global (50 N–S and 180W–E), satellite-based estimates of actual evapotranspiration at 0.25 spatial resolution have recently become available, generated by the Global Land Evaporation Amsterdam Model (GLEAM). We investigate the use of these data to improve the performance of a simple lumped catchmentscale hydrologic model driven by satellite-based precipitation estimates to generate streamflow simulations for a poorly gauged basin in Africa. In one approach, we use GLEAM to constrain the evapotranspiration estimates generated by the model, thereby modifying daily water balance and improving model performance. In an alternative approach, we instead change the structure of the model to improve its ability to simulate actual evapotranspiration (as estimated by GLEAM). Finally, we test whether the GLEAM product is able to further improve the performance of the structurally modified model. Results indicate that while both approaches can provide improved simulations of streamflow, the second approach also improves the simulation of actual evapotranspiration significantly, which substantiates the importance of making “diagnostic structural improvements” to hydrologic models whenever possible.


Statement of the problem
As a primary mechanism in the surface-to-atmosphere portion of the water cycle, evapotranspiration (ET) plays a crucial role in the water and energy budgets of a hydrologic system. Although there are several different methods avail-able for estimating potential ET (see Penman, 1948;Thornthwaite, 1948;Monteith, 1965;Priestley and Taylor, 1972;Hargreaves and Samani, 1985;Shuttleworth, 1992;Allen et al., 1998), or pan evaporation (e.g., data-driven approaches by Bruton et al., 2000;Sudheer et al., 2002;Jain and Roy, 2017), the estimation of actual ET is not straightforward. In practice, actual ET can be derived from model simulations, remotely sensed observations of different variables, etc. The quality of a model-derived estimate of ET depends on various sources of uncertainty (inputs, parameters, process representation, structure, etc.) inherent to the model-based scheme used, and common problems include both over-and underestimation of evaporative fluxes (Trambauer et al., 2014). Recently, methods that use satellite-based remotely sensed climatic and environmental observations have provided an alternative approach to the estimation of ET (e.g., Bastiaanssen et al., 1998;Arboleda et al., 2005).
Several studies have advocated and/or implemented the idea of using physically consistent estimates for the parameters of hydrologic models (Pokhrel et al., 2008(Pokhrel et al., , 2012Savenije, 2010;Schaefli et al., 2011;Kumar et al., 2013;Troch et al., 2015, and references therein). However, in catchment-scale modeling, it is a common practice to use parameter estimates that are calibrated by adjusting the simulated streamflows to try to match the observed data. If due care is not taken during calibration, this approach can result in conceptually unrealistic estimates for the parameters. Such a result defeats an important purpose of using conceptual/physically based models (as opposed to empirical databased models), which is to help us better understand the dynamical behavior of the system. T. Roy et al.: Evapotranspiration estimates to improve the structure of a conceptual rainfall-runoff model In principle, the potential of such models can be better realized by incorporating more information about the physical system during model development. Such information can take various forms and be incorporated in different ways. Evapotranspiration (ET) can be used to constrain model parameters that are sensitive to the ET process (Winsemius et al., 2008;van Emmerik et al., 2015). Alternatively, ET can be used as a calibration target along with streamflow within a multi-objective setting (Zhang et al., 2009). There has also been a recent drive towards structurally flexible models that are able to both better characterize the uncertainty associated with model structure and use additional information to help reduce such uncertainty (Wagener et al., 2001;Marshall et al., 2006;Clark et al., 2008Clark et al., , 2015Savenije, 2010;Schaefli et al., 2011;Fenicia et al., 2008aFenicia et al., , b, 2011Bulygina and Gupta, 2009, 2010Martinez and Gupta, 2011;Nearing, 2013;Nearing and Gupta, 2015).
A variety of satellite-based remotely sensed estimates of daily precipitation have been available for some time (e.g., Hsu et al., 1997;Joyce et al., 2004;Huffman et al., 2007;Funk et al., 2014), making it possible to consider the model-based generation of streamflow simulations for ungauged locations. Recently, satellite-based remotely sensed estimates of daily ET have become available, based on a variety of different retrieval algorithms of varying complexity (e.g., Bastiaanssen et al., 1998;Arboleda et al., 2005;Miralles et al., 2011). Worldwide evaluations suggest that satellite-based ET estimates are strongly correlated with ground-based observations made at flux towers (Demaria and Serrat-Capdevila, 2016).
For this study, we use the Global Land Evaporation Amsterdam Model (GLEAM) as the source of the satellitebased ET (SET) data. In the GLEAM algorithm, ET is computed using only a small number of satellite-based inputs, which makes it particularly beneficial for application to sparsely gauged basins. Miralles et al. (2011) have shown that GLEAM estimates of evaporation are strongly correlated (0.80) with annual cumulative evaporation estimated via eddy covariance at 43 stations, and have a very low (−5 %) average bias. The correlations at individual stations are strong (0.83) for all vegetation and climate conditions, and improve to 0.9 for monthly time series (Miralles et al., 2011 reported satisfactory statistical performance (R 2 = 0.68; root mean square difference = 64 W m −2 ; Nash-Sutcliffe efficiency = 0.62) of GLEAM when compared against data from 45 globally distributed eddy-covariance stations. Michel et al. (2016) compared Priestley-Taylor Jet Propulsion Laboratory model (PT-JPL), Moderate Resolution Imaging Spectroradiometer evaporation product (PM-MOD), Surface Energy Balance System (SEBS), and GLEAM simulations against 22 FLUXNET tower-based flux observations and found GLEAM and PT-JPL to more closely match in situ observations for the selected towers and reference period (2005)(2006)(2007). Their extended analysis over 85 towers also had a similar overall out-come. Miralles et al. (2016) compared three process-based ET methods (PM-MOD, GLEAM and PT-JPL) for surface water balance from 837 globally distributed catchments, and reported that GLEAM and PT-JPL provide more realistic estimates of ET. They found these two products to provide superior overall performance for most ecosystem and climate regimes, whereas PM-MOD tends to underestimate the flux in the tropics and subtropics.
While previous studies have used SET estimates to constrain the parameters of hydrologic models (Winsemius et al., 2008;van Emmerik et al., 2015), the recent interest in diagnostic improvements to model structure Gupta and Nearing, 2014) suggests that it would be potentially more valuable to use the ET data to actually improve the model structure when possible. This study attempts to explore this possibility in the context of using satellite-based data to drive a streamflow simulation model for a poorly gauged basin in Africa.

Objectives and scope
In this study, we explore the use of the GLEAM daily SET product (Miralles et al., 2011;Martens et al., 2016) to improve the performance of a simple lumped catchment-scale hydrologic model driven by satellite-based precipitation estimates to generate streamflow simulations for a poorly gauged basin in Africa. We first use the GLEAM product to constrain the evapotranspiration estimates generated by the model, thereby improving the daily water balance. Next, we instead change the structure of the model to make it more physically consistent and improve its ability to simulate actual evapotranspiration (as estimated by GLEAM). Finally, we test whether the use of GLEAM SET can further improve the performance of the structurally modified model, and whether there is any decline in model performance if GLEAM SET data become unavailable.
2 Study area, data and methodology

Study area
This study is carried out for the Nyangores River basin, which is a sub-basin of the Mara River basin in Kenya and Tanzania (Fig. 1). The Nyangores River basin has an aerial coverage of 697 km 2 and is located at the northeastern side of the Mara River basin (location: 33 • 88 E, 35 • 90 E, 0 • 28 S, 1 • 97 S). The perennial Nyangores River originates from the Mau Escarpment (3000 m a.s.l.) fault scarp passing through the western side of the Great Rift Valley in Kenya. It then merges with the Amala River at the Napuiyapi swamp (2932 m a.s.l.) to form the Mara River, which flows all the way to Lake Victoria at Musoma Bay, Tanzania (1130 m a.s.l.). The Mara River basin (or Nyangores River basin) has two wet seasons consequent to the yearly oscillations of the inter-tropical convergence zone (ITCZ), the Figure 1. The Mara River basin and the Nyangores River sub-basin (Roy et al., 2017a). The discharge station is located at Bomet Bridge (red dot). Meteorological stations (green dots) are located in the surrounding regions. primary wet season occurring during March to May and the secondary one during October to December. The long-term mean rainfall in the Mau Escarpment is around 1500 mm. The rainfall in the basin is influenced by factors like topography, elevation, regional influence of Lake Victoria, and seasurface temperature (SST) of the Indian Ocean (Camberlin et al., 2009;Dessu and Melesse, 2012).

Estimates of actual evapotranspiration
The source of the SET data used in this study is the Global Land Evaporation Amsterdam Model (GLEAM) Version 3.0. GLEAM comprises a set of algorithms that use remotely sensed climatic and environmental observations to estimate various components of ET. Satellite-based observations of surface net radiation and near-surface air temperature are processed via the Priestley-Taylor equation (Priestley and Taylor, 1972) to calculate potential evapotranspiration (PET), which is then converted to actual evapotranspiration (AET) by incorporating an evaporative stress factor obtained from microwave observations of vegetation optical depth (as a proxy for vegetation water content) and root-zone soil moisture (simulations). Interception loss is calculated using the Gash analytical model (Gash, 1979).
Three different versions of the GLEAM datasets are currently available, depending on the satellite observations used. The version used in this study (GLEAM_v3.0b) is based on satellite observations, is quasi-global (50 • N-S, 180 • W-E), has a spatial resolution of 0.25 • , and has a daily temporal coverage of 13 years (2003 to 2015). Figure 2 shows the annual mean of GLEAM AET (GAET) over the entire Mara River basin. As can be seen, GAET increases towards the western side of the basin. The annual average GAET varies between 900 and 1200 mm yr −1 .
We computed corresponding estimates of PET via the Hargreaves equation (HPET; Hargreaves and Samani, 1985) using temperature data collected from the stations surrounding the Mara River basin (Fig. 1). For a small number (∼ 0.6 %) of days, the lumped GAET values were found to be larger than the lumped HPET values; for these few anomalous values, HPET was replaced by GAET. Figure 3 shows the time series of HPET and GAET for the Nyangores River basin.

Estimates of precipitation
The Real Time Multi-satellite Precipitation Analysis of the NASA Tropical Rainfall Measuring Mission (TMPA-RT) combines information from multiple satellites to produce a quasi-global (50 • N-S, 180 • W-E), near-real-time (1 March 2000 to near-present) precipitation product at 0.25 • × 0.25 • spatial and 3-hourly temporal resolution (this product is the real-time version of TMPA; Huffman et al., 2007). Until it was shut down on 8 April 2015 due to fuel deficiency and battery issues in the satellite, TMPA used to include the TRMM Microwave Imager (TMI) products. TRMM was the first satellite dedicated for precipitation studies. The after-real-time TMPA product also incorporates rain gauge information wherever feasible. In this study, we aggregated the 3-hourly TMPA-RT data to daily level, resampled from the coarse resolution (0.25 • × 0.25 • ) to a resolution of 0.05 • × 0.05 • , and implemented a bias correction using the Climate Hazards Group InfraRed Precipitation with Station product (CHIRPS; Funk et al., 2014Funk et al., , 2015 and rain gauge measurements (Roy et al., 2017a, b).

Estimates of streamflow
Streamflow data were computed using the calibrated stagedischarge relationship for the Bomet Bridge discharge station (station ID: 1LA03; location: 0 • 47 23.50 S 35 • 20 47.45 E) on the Nyangores River (drainage area approximately 882 T. Roy et al.: Evapotranspiration estimates to improve the structure of a conceptual rainfall-runoff model  697 km 2 ), which is one of the two main tributaries of the Mara River. Data are available for the period 1 January 1996 to 30 June 2010, during which only about ∼ 8 % of the records are missing.

Estimates of temperature
We computed PET using the Hargreaves equation, the annual mean of which closely matched the reported PET value for the study area (WREM, 2008). The temperature data used in the Hargreaves equation were extracted from the Global Surface Summary of the Day (GSOD) product produced by the National Climatic Data Center (NCDC) in Asheville, NC. The daily temperature data include multiple observations and are available in three forms: maximum, minimum, and average.

The HYMOD2 hydrologic model
The spatially lumped HYMOD Version 1 (HYMOD1) conceptual rainfall-runoff model with five/six parameters has previously been used in several studies (Boyle et al., 2000;Vrugt et al., 2003Vrugt et al., , 2009Moradkhani et al., 2005;Duan et al., 2007;Wang et al., 2009;Razavi and Gupta, 2016). The model is driven using daily precipitation and PET data to generate daily estimates of AET (HAET: HYMODgenerated AET) and streamflow. Nonlinear vertical flow processes are controlled by a two-parameter soil moisture ac-counting module based on the rainfall excess model proposed by Moore (1985). Horizontal flows are simulated in a linear routing module that includes a Nash cascade for quick-flow routing and a linear reservoir for slow-flow routing.
In this study, we improved the ET process parameterization within the model by relying on satellite-based AET estimates provided by GLEAM. In this regard, we were careful to ensure that the structural modifications (1) do not overcomplicate the model since that defeats the whole purpose of having simpler models such as HYMOD, (2) do not require a large number of additional model parameters, (3) are more physically consistent, (4) consistently produce improved ET simulations, and finally (5) do not deteriorate the streamflow simulations. We refer to the structurally modified version of the model as HYMOD Version 2 (HYMOD2), as shown in Fig. 4. As can be seen, a new ET resistance module is added to the soil moisture accounting module of the model. A detailed description of the ET resistance module is provided in Sect. 2.4. Details of the overall model structure and process equations are presented in Appendix A.

Study approach
We conducted the investigation in two stages. The first stage consists of five steps designed to improve model performance (as assessed against data), but without making structural modifications to the model. The strategy includes using GAET to constrain simulated evapotranspiration, recalibra-tion of model parameters, and constraint adjustments. In doing so, we specifically do not directly assimilate GAET into the model so that the model's representation of the overall water balance is not compromised. Accordingly, while we are extracting information from the GLEAM product, we do so via a process of "constraining" rather than "assimilation". In the second stage, we modify the structure of the model (by capturing the physics of the underlying processes more accurately) to directly improve its ability to simulate ET (using GAET as the target). The steps followed in Stage-I are repeated so that the results of the different strategies can be compared.
Conceptually, the main difference between Stage-I and Stage-II is that, in the former, the information provided by GAET is used only to constrain the evapotranspiration fluxes of the model, whereas in the latter the information contained in GAET is used to alter the model structure. While the former provides a temporary improvement to model performance, achieved as long as GLEAM data are available, the latter is expected to provide a lasting improvement to model performance that should persist even when GLEAM data are not available. In Stage II, we further check whether the GAET product contains residual information that, not having yet been used to improve the model structure, remains useful for improving model performance via the constraining operation.
The entire study approach is summarized in Fig. 5. As can be seen, only Step-1 is different for both the stages (Stage-I and Stage-II), while the remaining four steps (Steps-2-5) are similar. Thus, in each stage, there are five steps altogether. Stage-I Step-1 is for generating benchmark simulations using the calibrated model but without any ET constraint or structural modifications. On the other hand, Stage-II Step-1 has four different cases (A-D) corresponding to different structural modifications in the ET process parameterization. Both the benchmark model from Stage-I Step-1 and the best performing model from Stage-II Step-1 are used in the following steps.
Step-2 is based on imposing the ET constraint but without recalibration.
Step-3 is based on recalibrating the model while imposing the ET constraint.
Step-4 is conceptually similar to Step-3; however, additionally, some constraint adjustments (Eqs. 1 and 2) are applied and the adjustment parameters are calibrated together with model parameters (to match the simulated and observed streamflows). Finally, in Step-5, we remove the ET constraint to see whether the performance of the new model will decline when satellite ET data become unavailable (note this is no longer the benchmark model since we recalibrated the parameters in . (1) In Eqs. (1) and (2), y represents the streamflows after the adjustment of the GAET constraint, and a and b are the param- eters of the adjustment formulations (a controls the variance and b controls the degree of nonlinearity).

ET constraining
The ET constraint was imposed by modifying the original ET equation of the model (Eq. A5) from HAET = min{PET, C} to the new form HAET = min{PET, GAET, C}. Note that this is not a structural modification to the form of the process equation; rather, GAET sets the upper limit of HAET in this case, which is more realistic than using PET directly as the upper limit.

Structural modifications
The ET process parameterization within the original model (HYMOD1) is modified in Stage-II Step-1 to improve its ability to reproduce GAET more accurately without deteriorating the streamflow simulations. Four ET equations of progressive complexity and physical basis are tested. In each case, the model parameters are recalibrated to match the simulated streamflows to the observed data. The final result is a structurally modified model called HYMOD2. More specifically, the ET equation of HYMOD1 is multiplied by a function K( q ) such that 0 ≤ K( q ) ≤ 1. This function acts as a resistance to the ET flux of the model. Four different forms for K( q ) that represent incremental increases in complexity and physical basis (Table 1) are tested. Writing the main ET equation in the general form where Y t is the AET generated by HYMOD (HAET t ), X t is the soil moisture storage (C SMA t ), and EDR t is the evaporation demand ratio computed as min{1, PET t X t }. The most general form for K t is given by where K min and K max are lower and upper limits for K, and ψ t is the ratio of actual storage to maximum storage capacity (ψ t = X t /X max ).

Calibration methodology and benchmark model calibration
Calibration of the model (and adjustment) parameters was performed using the SCE-UA algorithm (Duan et al., 1992). In all cases, the calibration runs were carried out using 10 complexes and 25 loops. Model simulated streamflows were matched against the observed streamflows in the λtransformed space to minimize the effects of skewness and reduce heteroscedasticity. The λ-transformation was applied after modifying the original equation as proposed by Box and Cox (1964) (see Appendix B), and the value of the λ parameter was calculated from the observed streamflow records. The performance criterion used was the mean squared error (MSE) of transformed flows. The model was run continuously for the 7.5-year period January 2003 to June 2010, with the first 4 years (2003 to 2006) used for calibration and the remaining 3.5 years (2007 to mid-2010) used to provide an additional assessment of model performance. Results are shown for the "calibration (4 years)", "evaluation (3.5 years)" and "total (7.5 years)" simulation periods.

Metrics used for performance evaluation
Four metrics are used in this study to assess the model performance (Table 2). These metrics measure performance in regards to overall mean squared error, bias, variability, and correlation (see Gupta et al., 2009), are computed in the transformed space where applicable (e.g., for streamflows), and are normalized to be comparable.

Results from Stage-I (ET constraints)
The performance of the HYMOD1 benchmark model, driven using TMPA-RT satellite-based precipitation and with parameters calibrated to match simulated streamflow to observed data, is reported in Table 5. The NMSE varies between 0.56 (calibration period) and 0.84 (evaluation period), where NMSE = 0.56 means that on average only about 44 % (1.0 − 0.56 = 0.44) of the variability in the flows has been explained. This is not surprising given the use of a simple lumped conceptual model driven by satellite-based estimates of precipitation for a poorly gauged basin. The flow biases are small (NBµ < 15 %), indicating that long-term water balance is approximately preserved. The calibrated values of the model parameters are reported in Appendix C. Table 3 presents a comparison of the model-generated HAET with the GAET data (for the total 7.5-year simulation period). HAET tends to be larger on average, varies over a wider range, is considerably more skewed, and is less kurtotic. Some of the reasons for this can be understood from the time-series plot and scatter plot shown in Fig. 6. The behavior of HAET tends to be more erratic and, although both HAET and GAET show seasonal patterns, the former regularly drops to zero or near zero (explained by the very simple, threshold-like, ET process representation in the model, which does not contain a resistance term). The result is that HAET and GAET are not well correlated (Fig. 6b) and have different shapes for their empirical probability distributions (Fig. 7). Even if we were to ignore the time steps when HAET drops closer to zero, HAET is strongly positively bi- Normalized bias in standard deviation (NBσ ) NBσ = SD(S)−SD(O)   ased (too large), which results from trying to satisfy the potential evapotranspiration (PET). Table 4 reports a water balance estimate WBAET of the mean annual AET for the basin, obtained by subtracting mean annual streamflow (at the discharge station) from mean annual precipitation (estimated from TMPA-RT). WBAET is similar in magnitude to GAET, and we have GAET < WBAET < HAET < HPET, indicating that the AET computed by the model tends to be a little high.
The benchmark model was constrained using the GAET estimates in Step-2, but the model parameters were not recalibrated. As can be seen in Table 5, the model performance has become significantly worse due to streamflow becoming positively biased. Given that GAET < HAET, this makes sense because imposing GAET as a constraint alters the wa- ter balance of the model. To try and fix this water balance problem, we recalibrated the parameters of the model to improve the match to observed streamflows while continuing to use GAET to constrain HAET in the model (Step-3). Although the large positive bias was reduced (Table 5) and the NMSE statistic was improved as compared to Step-2, most of the error statistics deteriorated for all three periods (calibration, evaluation, and total simulation) as compared to Step-1. Importantly, this calibration step resulted in an unrealistically large value of 17.36 m for the size of the soil moisture storage (previously a more realistic 0.76 m). This value is clearly conceptually and physically inconsistent (the realistic range is about 0 to 2 m), and while it improved calibration period performance, the lack of consistency is reflected in the sharp deterioration in performance during evaluation. Unrealistic parameter values, such as this, are indications of either severe errors in the data or structural errors in the model. Since GAET agrees well with WBAET on average, it is likely that Table 5. Streamflow error statistics for calibration, evaluation, and total simulation (in parentheses) for all five different steps in Stage-I analysis.

Metrics
Step-1 Step-2 Step-3 Step-4 Step-5 the major cause here is model structural inadequacy . We next checked (Step-4) to see whether this problem could instead be resolved by implementing empirical adjustments to GAET. We tested two empirical constraint adjustment schemes (Eqs. 1 and 2) applied to the GAET data, and calibrated the additional parameters (from these equations) along with HYMOD1 parameters. Results for both schemes were similar, but Eq. (2) provided slightly better performance for the evaluation and total simulation periods, and so we selected Eq. (2). Compared to the benchmark (Step-1), NMSE and NBµ calibration period statistics reduced from 0.56 to 0.43 and 12 to 6 %, respectively (Table 5) while ρ increased from (0.76/0.66/0.72; Cal/Eval/Tot) to (0.83/0.74/0.78). Perhaps more important, the calibrated value of parameter H is now 0.65 m, which is within the conceptually acceptable range. Finally, the model obtained in Step-4 was run without the use of GAET to see how well the model would perform if GAET data were to become unavailable. The results (Table 5) indicate that model performance does not deteriorate significantly when GAET data become unavailable and, in some cases, the performance is better than the benchmark.

Results from Stage-II (structural modifications)
Results from Stage-I confirm that GAET constraining can improve the overall performance of HYMOD. However, for operational implementation, the method requires real-time estimates of SET, which could sometimes pose a challenge for practical applications. To overcome the need for real-time data availability, a simple approach could be to establish a functional relationship between HAET and GAET from the historical records and use that relationship to adjust HAET. In our case, however, HAET and GAET did not show a sufficiently strong relationship (Fig. 6). Therefore, we instead investigated whether we could use the historical GAET data to improve the structure of the model itself.
Our previous results (Fig. 6) showed that HAET generated by HYMOD1 did not match well with GAET. This is likely because the entire soil moisture storage was exposed to the ET process. Consequently, it is common for all of the soil moisture to evaporate away during a single time step, leaving no water available for evaporation at the next time step (provided no precipitation is added), so that HAET drops to zero. This tendency can be reduced by modifying the ET process representation so that HAET more closely follows GAET.
Step-1 in the Stage-II analysis has four different cases as shown in Table 1. The first case (Case-A) is identical to the benchmark step (Step-1) in the Stage-I analysis, where the calibrated HYMOD1 is run without GAET estimates. In Case-B, K t = K 0 is applied as a constant multiplier to the ET equation (see Table 1), which acts as a constant surface resistance to ET. Calibration (of all of the model parameters) resulted in improved error statistics ( Table 6). The estimate obtained for the surface resistance was K 0 = 0.73. However, we again obtained a conceptually unrealistic value (H = 9.5 m) for the soil moisture storage parameter. In Case-C, the more complex form K t = K 0 · f (ψ t ) was used (see Table 1). This resulted in a model performance ( Table 6) comparable to that of the previous one, but with a more realistic calibrated value of the soil moisture storage (H = 0.90 m). Interestingly, the calibrated value for K 0 was 1, implying that K 0 becomes irrelevant once f (ψ t ) is introduced to the ET equation.
Finally, the most complex form K t = K min + [K max − K min ] · f (ψ t ) was introduced in Case-D. K max was a calibration parameter and K min was defined as K min = γ · K max via a second calibration parameter γ (ranging from 0 to 1) (see Table 1). Results indicate that although the calibration error statistics (Table 6) are similar to those of Case-C, the evaluation and total simulation statistics are better. The calibrated value of parameter "BE" (derived by transforming the parameter "be"; see Eq. A7) was 0.86, indicating only a mildly nonlinear relationship between ψ t and K. The minimum and maximum limits of K were close to zero and one, 0.14 (0.14) 0.07 (0.16) 0.14 (0.14) 0.14 (0.13) NBσ respectively, confirming the findings of Case-C, where K 0 became irrelevant once f (ψ t ) was introduced to the ET equation.

Final model selection from Stage-II
Step-1 In this section, we address two main questions: (a) does the structural modification of the model (to the representation of the ET process) improve ET estimation? If so, then (b) what level of complexity is adequate? Table 7 presents the streamflow and AET performance statistics for the total simulation period for the four cases. Since Case-A provides very poor error statistics for AET (e.g., NMSE = 8.93 and NBσ = 1.89), we disregard this case. Although Case-B shows the best NBσ (−0.06) statistics for streamflow, and the best NMSE (1.28) and NBµ (0.12) statistics for AET, the value obtained for the soil moisture storage capacity (H ) was unrealistic; we therefore also disregard this case. Comparison of Case-C and Case-D shows that while their streamflow and AET simulations are similar (Fig. 8), Case-D provides slightly better NMSE (0.70) and NBµ (0.13) statistics for streamflow and slightly better ρ (0.49) statistics for AET (Table 7 and Fig. 9). We therefore selected the most complex form K t = K min + [K max − K min ] · f (ψ t ) for the ET func-  tion (Case-D). The corresponding model is hereafter referred to as "HYMOD2". Note that, for practical applications, any simpler structural modification (Case-B or Case-C) can be adapted, if that proves convenient. Comparing the streamflow error statistics of Stage-I Step-4 (Table 5) and Stage-II Step-1 Case-D (Table 6), we see that they are quite similar, indicating that the ET constraining (first approach) and diagnostic structural improvement (second approach) strategies produce dynamical behaviors that are similar (as measured by the four performance metrics used). Table 8. Streamflow error statistics for calibration, evaluation, and total simulation (in parentheses) for all five different steps in Stage-II analysis.

Is further improvement possible?
The modified model (HYMOD2) selected in the previous section (Stage-II Step-1 Case-D) was next used with GAET in Step-2 to Step-5 (see Fig. 5) to address two questions: (1) could more information from GAET be incorporated (via constraining) into the model, or is the improved model structure already good enough? (2) Is the constraint adjustment on GAET (Step-4) still relevant once the model structure has been improved? When GAET was used to constrain the ET process (Step-2) in HYMOD2 without model recalibration (parameters used were from Case-D), there was significant overestimation bias evident in the simulations of streamflow (Table 8). Clearly, recalibration of the modified model was necessary to see whether the model performance could be improved any further. The recalibration of HYMOD2 improved the error statistics (Table 8); compare these results with Stage-I Step-3 results in Table 5 derived the same way for HYMOD1. While a small improvement was obtained for the soil moisture storage capacity parameter H (reduced from 17.4 to 12.8 m), this value remained conceptually inconsistent (too large). Overall, the error statistics deteriorated compared to the best results from Case-D. The HYMOD2 parameters were then calibrated along with the parameters of the GAET adjustment equation (using Eq. 2). Although the results improved (Table 8), and the value of the H parameter became conceptually realistic (0.88 m), the results were not significantly different from Case-D in Step-1. Finally, when the GAET data were made unavailable, the model performance remained stable, as also seen in Stage-I Step-5 results.
Therefore, in regards to the two questions that motivated this section, the results indicate that (1) once information from GAET was incorporated into the model as a modification to the structure, there was no further need for the use of GAET to constrain the simulation of ET (use of GAET even caused some of the results to deteriorate), and (2) im-plementation of a constraint adjustment to  in the structurally modified model (Stage-II) did not improve the overall results. Figure 10 compares the streamflow time series obtained from Stage-I Step-4 (constraining ET) and Stage-II Step-1 Case-D (selected model after structural modification) against the benchmark (Stage-I Step-1) in both actual and λ-transformed spaces. Simulations from the HYMOD2 structurally modified model follow the observations most closely, followed by the simulations from Stage-I Step-4 (ET constraining) and benchmark. Clearly, while the streamflow simulations are improved by both ET constraining and model structural modification, the latter performs the best.

Overall comparison and analysis of uncertainty
Using the best model from Stage-II (HYMOD2), we next investigate the change in simulation uncertainties for streamflow and AET due to the model structural improvement. The calibration period residual distributions (assumed stationary) were superimposed on the daily estimates of the corresponding variables for the total simulation period. Figure 11 shows the histograms of calibration period residuals for the benchmark and final (Stage-II Step-1 Case-D) steps. In both cases (AET and streamflow), the residuals become more normally distributed, with the improvement being more prominent for AET. This result is expected, since HYMOD1 in Stage-I Step-1 showed poor performance in regards to AET. Overall, the structural modification is clearly beneficial. Figure 12 shows the streamflow and AET time series along with their corresponding 90 % confidence intervals for the benchmark and the final steps. Both the streamflow and AET simulations improve as a result of the model structural modification. Although the streamflow uncertainty bounds have not narrowed significantly, the flow series is clearly less biased and tracks the recessions better. Meanwhile, the AET simulations have improved significantly: (a) the bias has been reduced, (b) the uncertainty bounds are narrower, and (c) the T. Roy et al.: Evapotranspiration estimates to improve the structure of a conceptual rainfall-runoff model 889 Figure 10. Time-series plots of streamflow for the best simulations in Stage-I and Stage-II, the benchmark simulation, and the observations. erratic behavior originally seen in the AET simulations (frequent drops to zero) has disappeared. Further, although the improvement in streamflow performance is evident from the statistics in Tables 5 and 6, the improved behavior is even more apparent in Fig. 12, where the model can be seen to track the recessions quite well.

Discussion
In this study, we have explored two different approaches for using the recently available GLEAM satellite-based AET dataset to improve the realism and performance of the HY-MOD conceptual catchment-scale hydrologic model. In the first approach, GAET is used as a constraint to the ET process equation in the model, while in the second approach, the model structure has been modified so that the ET process parameterizations become more physically consistent and realistic. We avoided making the model overly complicated in terms of its structural representations and/or having a large number of parameters, since both of these would defeat its main purpose of being a simple model. Our goal was to increase the realism within the model and improve its performance in simple manners. Furthermore, we also made sure that the improvements in some particular process simulation (e.g., AET) do not deteriorate the model's performance in simulating some other process (e.g., streamflow). Our results show that both the approaches (process constraining and structural modification) can improve the simulations of streamflow, while the latter also significantly improves the AET simulations. Clearly, the satellite-based ET datasets (GLEAM in this case) can significantly benefit the process of hydrologic modeling in poorly gauged basins.
The use of ET data as a constraint can improve streamflow forecasts, provided some additional processing steps are implemented. If the GAET data are used directly as a constraint to the ET equation, the model tends to show bias in streamflow simulations. This behavior can be attributed to the fact that once GAET is incorporated, the water balance within the model is altered, the effects of which are reflected in terms of bias in the simulated streamflows. The type of this bias, of course, is subject to change depending on the dataset. While recalibration of the model with the ET constraint improves the performance, it can result in conceptually unrealistic estimates of certain parameters (H in this case). However, the model produces conceptually realistic values of the H pa- rameter if some adjustments are made in the GAET constraints, instead of using them directly. Note that constraint adjustment is not similar to bias correction; for the latter we need the "ground truth" of actual ET. Therefore, the adjustment process is not necessarily indicative of the presence of any actual bias in GAET estimates. The adjustment factors are model parameters that correspond to the structural deficiencies within the model. They may or may not be necessary as the structure changes. As we have seen in Stage-II Step-4, the constraint adjustment became irrelevant once the structure of the model itself was improved. We also found that a simple adjustment (using a multiplicative factor) can perform equally well as a more complex alternative (power function).
Improving the model structure provides several other benefits. For example, a model that simulates ET more accurately can be a suitable candidate for real-time forecasting applications. This type of a model can also prove useful for projecting future water availability. Although ET plays a significant role in the hydrological cycle, traditionally, for conceptual models, the main focus has been drawn towards improving their streamflow simulation performance, while making the ET process overly simplified (e.g., a simple water budget). In this study, we show that by incorporating simple but physically consistent structural changes, the ET simulation performance can be improved significantly. In poorly gauged basins, the satellite-based estimates of AET provide useful information to carry out this improvement.
In this study, we tested several conceptually reasonable model structural modifications of varying levels of complexity and physical basis (Case-A to Case-D), and selected the one that provided the best simulations of both AET and streamflow. We found that relatively simple changes to the model's ET equation significantly improved the ET simulations as assessed by GAET. While our goal was to improve the AET and streamflow simulations, we were also careful to ensure that the model parameters remain conceptually realistic. We saw that using a simple multiplicative factor (parameter K) as a resistance to AET produced excellent streamflow and AET forecasts (Case-B), but resulted in an unrealistic estimate of basin storage capacity (parameter H ). In contrast, inclusion of a soil moisture dependent function, f (ψ t ), resulted in a more realistic estimate of basin storage capacity without compromising the streamflow and AET simulations. Once the model structure was appropriately modified to provide good simulations, the model simulations were robust/stable, and there was no need to impose the ET constraint (with/without constraint adjustment). The modified model structure provided significantly improved AET forecasts with much narrower uncertainty intervals (see Fig. 12), along with reduced bias in streamflow and improved tracking of the streamflow recessions.
Overall, by incorporating an additional source of external information in a sensible manner (here by structural modification), the need for calibration can be reduced (note that the model was not calibrated against GAET); see the extensive discussion by Gharari et al. (2014) and Bahremand (2016) on this topic. Nevertheless, given the simplistic nature of the hydrologic models and the large uncertainties that exist therein, some degree of calibration will generally remain important and relevant. We do not mean to imply, therefore, that calibration is not essential, because we will rarely (correctly) know everything we need to know about the system we are modeling. Instead, we should be aware of the strengths and weaknesses involved in the use of calibration and apply it carefully in such a way that useful information is gained about the underlying nature of the actual physical system. In this study, we demonstrate the need for both approaches. On the one hand, improving the model structure resulted in improved AET simulations without any need for calibration (against AET). On the other hand, the best streamflow performance was achieved when the structurally modified model was tuned via a calibration procedure.
Note that this study is based on testing the model on a single basin using a single satellite-based AET product. While not demonstrating universal applicability, the results are clearly indicative and the methodology illustrates how such data can be used to investigate potential improvements to the structures of simple catchment-scale models used for hydrologic studies in data-scarce regions. A rigorous analysis of the methodology over multiple basins is a future research need. For more detailed process-based models, the ET process parameters can be calibrated against some reliable satellite-based AET estimates (e.g., GLEAM), or the process representation itself can be improved by adapting some similar strategies that these AET products are based on.

Conclusions
In conclusion, SET data can be used to improve model performance in different ways. However, strategies that result in model structural modifications can generally be expected to provide longer lasting benefits than ones that simply update or constrain the state trajectories of the model. This is because structural modifications can both improve the initial estimates of the state at each time step and sustain these improvements into future time steps (Bulygina and Gupta, 2009, 2010Nearing and Gupta, 2015). In contrast, even though data assimilation to directly adjust state estimates can improve model performance, inadequacies in model structure will tend to cause the state estimates to drift away from their more appropriate values over time, because of which the performance will deteriorate markedly when the constraining data are not available. Of course, we have only tested a simple "constraining" strategy for assimilating ET information, and more sophisticated approaches such as the ensemble Kalman filter (EnKF) could instead be implemented. However, the efficiency of the EnKF for soil moisture retrieval has been shown to be as low as 30 % (Nearing et al., 2013a, b), and so it is not clear that more sophisticated forms of DA are justified, especially given the large uncertainties associated with both the data and the model structure for this poorly gauged catchment. We leave such investigation for future work.

Code and data availability
Data and codes (HYMOD2 in Matlab) used in this study are available on request from the corresponding author, Tirthankar Roy (royt@email.arizona.edu).

892
T. Roy et al.: Evapotranspiration estimates to improve the structure of a conceptual rainfall-runoff model

Appendix A: Original HYMOD equations
The benchmark version of the HYMOD spatially lumped conceptual rainfall-runoff model has six parameters. The model is driven by mean daily precipitation and PET data to generate daily estimates of AET and streamflow. It has two main components, a two-parameter soil moisture accounting (SMA) module based on the Moore (1985) rainfall excess concept, and a linear routing (ROUT) module with parallel quick-flow and slow-flow pathways. In the SMA module, the state variable (soil moisture storage, C) and the indicator variable (storage height, H ) are nonlinearly related via the following equation (Moore, 1985): where the maximum storage capacity (C max ) and the maximum indicator height (H max ) are related as First, the initial storage (C beg ) is calculated from the initial indicator height (H beg ) using Eq. (A1). Next, H max is subtracted from the sum of precipitation (P ) and H beg to calculate overland flow (OV) as Infiltration (I ) is then calculated by subtracting OV from P , and an intermediate indicator height (H int ) is computed by adding I to H beg , and used to calculate the intermediate storage (C int ). By subtracting C int from the sum of I and C beg we obtain the interflow (IF). Finally, the total runoff is obtained by adding together OV and IF. Finally, the HYMOD AET (called HAET) is taken to be the smaller of available water C int and PET (which is provided as input to the model): and the storage at the end of the time step is computed by subtracting AET from C int : The power coefficients in HYMOD ("BE" in Table 1 and "b" in Eqs. A1 and A2) can have values ranging from 0 to infinity. For calibration it is useful to be able to impose finite values on the feasible ranges of the parameters; therefore we applied the following transformation (Eq. A7) which converts the [0, inf) range of parameter BE to the [0, 2) range of transformed parameter "be" so that the search can be conducted on the finite range of parameter "be" (similarly for parameter "b" in Eqs. The λ-transformation on streamflows used in this study is given by the equation where Q t and T Q t represent streamflows in the actual space and the transformed space, µ Q obs is the mean of the observations in the actual space, and λ is the transformation parameter that reduces the skewness. This expression differs slightly from the form T Q t = (Q t ) λ −1 λ recommended by Box and Cox (1964), in that the flows are normalized by the mean µ Q obs before transformation, and the transformed flows all remain positive. This form works as long as the transformation parameter λ = 0, which is true in our case; if λ = 0, then one should use T Q t = ln(Q t ) as discussed by Box and Cox (1964).
Appendix C: Calibrated HYMOD (actual and modified) parameters Table C1. This table provides calibrated parameters of the actual and modified HYMOD models.