Incorporating remote sensing-based ET estimates into the Community Land Model version 4.5

Land surface models bear substantial biases in simulating surface water and energy budgets despite the continuous development and improvement of model parameterizations. To reduce model biases, Parr et al. (2015) proposed a method incorporating satellite-based evapotranspiration (ET) products into land surface models. Here we apply this bias correction method to the Community Land Model version 4.5 (CLM4.5) and test its performance over the conterminous US (CONUS). We first calibrate a relationship between the observational ET from the Global Land Evaporation Amsterdam Model (GLEAM) product and the model ET from CLM4.5, and assume that this relationship holds beyond the calibration period. During the validation or application period, a simulation using the default CLM4.5 (“CLM”) is conducted first, and its output is combined with the calibrated observational-vs.-model ET relationship to derive a corrected ET; an experiment (“CLMET”) is then conducted in which the model-generated ET is overwritten with the corrected ET. Using the observations of ET, runoff, and soil moisture content as benchmarks, we demonstrate that CLMET greatly improves the hydrological simulations over most of the CONUS, and the improvement is stronger in the eastern CONUS than the western CONUS and is strongest over the Southeast CONUS. For any specific region, the degree of the improvement depends on whether the relationship between observational and model ET remains time-invariant (a fundamental hypothesis of the Parr et al. (2015) method) and whether water is the limiting factor in places where ET is underestimated. While the bias correction method improves hydrological estimates without improving the physical parameterization of land surface models, results from this study do provide guidance for physically based model development effort.


Introduction
Land surface models are widely used tools in simulating and predicting the Earth's water and energy budgets over a wide range of spatiotemporal scales (Rodell et al., 2004; Haddeland et al., 2011; Getirana, 2014; Xia et al., 2012a, b, 2016a, b). For example, the Global Land Data Assimilation System (GLDAS) was designed to simulate the terrestrial water and energy budgets over the globe using multiple land surface models (Rodell et al., 2004), and its regional counterpart, the North America Land Data Assimilation System (NLDAS), utilizes four land surface models and focuses on the conterminous United States at a much higher resolution (Rodell et al., 2004; Xia et al., 2012a, b). Products from these two operational systems have been widely used in estimating terrestrial water storage changes (Syed et al., 2008), investigating land-atmosphere coupling strength (Spennemann and Saulo, 2015), analyzing soil moisture variability (Cheng et al., 2015), studying the impact of soil moisture on dust outbreaks (Kim and Choi, 2015), and improving the data quality of in situ soil moisture observations (Dorigo et al., 2013; Xia et al., 2015). These model-based estimates of land surface fluxes and state variables are considered an important surrogate for observations, as observational data for some components of the global water and energy cycles are scarce in many regions of the world, and lack spatial and temporal continuity where they do exist. However, land surface models are subject to large uncertainties. Haddeland et al. (2011) compared 11 models in simulating evapotranspiration (ET), and found that the global ET on the land surface ranges from 415 to 586 mm yr⁻¹ and that the runoff ranges from 290 to 457 mm yr⁻¹. Xia et al. (2012a, b, 2016a, b) documented a large disparity among the four models in NLDAS phase 2 (NLDAS-2) at both the continental and basin scales, and showed that the Mosaic and Sacramento Soil Moisture Accounting (SAC-SMA) models tend to overestimate ET whereas the Noah and Variable Infiltration Capacity (VIC) models tend to underestimate ET.
Great efforts have been made to improve model performance over the years, by enhancing both the model parameterization of land surface processes and the model input data. For instance, during the past 10 years, the Community Land Model (CLM) has been upgraded from version 2 to version 4.5 (Bonan et al., 2002; Oleson et al., 2008, 2013), accompanied by increasingly accurate and high-resolution surface datasets (Lawrence et al., 2011). Comparison with observations of runoff, evapotranspiration, and total water storage demonstrated continuous improvement of the model performance (Lawrence et al., 2011). The Noah model is another example, having been upgraded continuously since its original version in the 1980s (Mahrt et al., 1984). Recent model developments focused on vegetation canopy energy balance, the layered snowpack, frozen soil and infiltration, soil moisture-groundwater interaction and related runoff production, and vegetation phenology (Niu et al., 2011). Despite the improved understanding and parameterization of physical processes and better input data, substantial model biases remain (e.g., Parr et al., 2016; Wang et al., 2016).
Another approach to improving model simulations or predictions is through data assimilation, by merging observational data and land surface models to obtain optimal estimates for the next time step. Fusing soil moisture observations into land surface models is a typical practice in land data assimilation, and it has been reported that data assimilation of soil moisture helped in reducing model biases (Reichle and Koster, 2005; Kumar et al., 2008; Yin et al., 2015). However, data assimilation is a computationally intensive task, especially when implementing a multi-model ensemble approach. Moreover, the data assimilation approach is not applicable to future prediction. Parr et al. (2015) proposed an alternative approach to reducing model biases, and applied it to the Variable Infiltration Capacity (VIC) model over the Connecticut River Basin for both historical simulations and future projections. The Parr et al. (2015) approach assumes that the relationship between the model evapotranspiration (ET) and observational ET remains unchanged from one period to another, and hence the relationship estimated from the calibration period can be used to correct ET biases and their effects on other variables for any period, historical or future. When applied to VIC over the Connecticut River Basin, Parr et al. (2015) found that the ET bias correction approach significantly reduces systematic biases in the estimates of both historical ET and historical river flow, and qualitatively influences the projected future changes in drought and flood risks.
To establish the robustness of the Parr et al. (2015) method, it needs to be evaluated over different regions and different climate regimes based on different models. In this study, we implement the Parr et al. (2015) approach in CLM4.5 and evaluate its performance over the whole conterminous United States (CONUS). The land surface model, study area, and bias correction method are introduced in Sect. 2. The data for model calibration and validation, including datasets of ET, runoff, and soil moisture, are described in Sect. 3. Section 4 presents the calibration and validation results. Finally, the main findings are summarized and discussed in Sect. 5.

Model and methodology
Model and forcing data
CLM4.5 (Oleson et al., 2013) in its offline mode with the prescribed vegetation phenology is used in this study. The land surface datasets used in CLM4.5 were derived from different sources. The soil texture data were taken from Bonan et al. (2012), which were generated using the International Geosphere-Biosphere Programme soil data (Global Soil Data Task, 2000). Both the percentage of plant functional types (PFTs) and the leaf area index within each grid cell were derived from Moderate Resolution Imaging Spectroradiometer (MODIS) satellite data (Lawrence et al., 2011). Slope and elevation were obtained from the US Geological Survey HYDRO1K 1 km dataset (Verdin and Greenlee, 1996). Parr et al. (2016) found that CLM4.5 can realistically capture the overall spatial pattern of ET in the CONUS when the model is forced by the NLDAS-2 meteorological variables. The spatial correlation coefficients between the simulated annual ET and the FLUXNET-MTE (model tree ensemble) ET are as high as 0.93. Wang et al. (2016), using multiple atmospheric forcing datasets, also reported that CLM4.5 can reasonably reproduce the large-scale patterns of runoff and ET. In this study CLM4.5 is forced by the NLDAS-2 meteorological forcing (Xia et al., 2012a). The NLDAS-2 forcing is available from 1979 to the present at an hourly resolution on a 0.125° grid, but is aggregated to a 0.25° resolution in this study to drive CLM4.5. The CONUS is chosen as the study domain because of the high quality of the atmospheric forcing data in this region.

Methodology
The division of the CONUS into Northwest, Southwest, Northeast, and Southeast, which is based on the 40° N latitude line and the 98° W longitude line, was defined by Lohmann et al. (2004). This division was later adopted by Xia et al. (2012a) and Tian et al. (2014) when land surface models were evaluated over the CONUS. We follow this division in this study, as shown in Fig. 1a.
Although land surface models are capable of capturing the large-scale pattern of ET, significant biases were found at finer spatiotemporal scales (Parr et al., 2015, 2016; Wang et al., 2016), which propagate to influence other components of the hydrological cycle including runoff and soil moisture (Parr et al., 2015). Following Parr et al. (2015), we derive the climatology of modeled ET for each model grid cell and for each month based on a simulation during the calibration period, derive the climatology of observational ET from satellite-based ET data at the same spatiotemporal resolution during the same period, and estimate the scaling factor between observational ET and the model ET. This scaling factor, which has its unique spatial variability and seasonal cycle, is assumed to be time-invariant at the inter-annual and longer timescales. To correct the ET biases in model simulations during any period, two types of simulations are conducted sequentially. In the first type of simulation, named the CLM, we run the default CLM4.5 and save the output for three components of ET, i.e., interception loss, plant transpiration, and soil evaporation, at the PFT level for every time step. The corrected interception loss, plant transpiration, and soil evaporation are then derived by multiplying the simulated values by the ET scaling factor, and are used as the input for the second type of simulation, named CLMET. In CLMET, we re-run CLM4.5 for the same period as in the first type, but overwrite the three ET components simulated by the model with the corrected values. Since ET simulations affect the partitioning of precipitation between ET and runoff, the bias correction in ET is expected to have a direct positive impact on runoff generation and therefore soil moisture.
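The calibration and correction steps described above can be sketched as follows. This is a minimal illustration, not the CLM4.5 implementation: the function names, the (time, lat, lon) array layout, and the use of NumPy are assumptions made for the example, and the sketch works on grid-cell values rather than at the PFT level.

```python
import numpy as np

def monthly_scaling_factors(obs_et, model_et, months):
    """Return a (12, nlat, nlon) array of climatological scaling factors.

    obs_et, model_et: monthly ET with shape (ntime, nlat, nlon) covering the
    calibration period; months: length-ntime array of calendar months (1-12).
    """
    factors = np.ones((12,) + obs_et.shape[1:])
    for m in range(1, 13):
        sel = months == m
        obs_clim = obs_et[sel].mean(axis=0)      # observational climatology
        model_clim = model_et[sel].mean(axis=0)  # model climatology
        # Ratio of climatologies; cells with zero model ET keep a factor of 1
        # (an assumption of this sketch).
        np.divide(obs_clim, model_clim, out=factors[m - 1], where=model_clim != 0)
    return factors

def correct_components(components, factors, month):
    """Scale each ET component by the factor for its grid cell and month."""
    return {name: value * factors[month - 1] for name, value in components.items()}
```

The same factor is applied to all three components, so the partitioning of ET among interception loss, transpiration, and soil evaporation is left unchanged by the correction itself.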
In this study, we use 1986-1995 as the calibration period and 2000-2014 as the validation period. The simulations during the calibration period are obtained from a 16-year (1980-1995) CLM run with the first 6-year run disregarded as the spinup. Both CLM and CLMET runs during the validation period start with the initial condition of 1 January 1996 obtained from the calibration period. The time step for both CLM and CLMET runs is 1 h. Since the overwriting process in CLMET may break the water balance, the model checks whether the amount of water stored in the vegetation canopy is sufficient to sustain the interception loss and whether the surface soil water storage is sufficient to sustain soil evaporation through the model time step. If not, the interception loss (soil evaporation) rate is set to be equal to the water available in the vegetation canopy (soil) divided by the model time step. This adjustment minimizes the imbalance caused by overwriting ET components in CLMET.
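The water availability check can be illustrated with a short sketch. The function name, the units (fluxes in kg m-2 s-1, storage in kg m-2), and the numbers are assumptions chosen for the example and are not taken from the CLM4.5 source code.

```python
def limit_flux(corrected_rate, stored_water, dt_seconds):
    """Cap a corrected flux (kg m-2 s-1) by the water available in storage (kg m-2)."""
    max_rate = max(stored_water, 0.0) / dt_seconds
    return min(corrected_rate, max_rate)

DT = 3600.0  # 1 h model time step, as stated in the text

# The canopy holds 0.5 kg m-2, so a corrected interception loss of
# 2e-4 kg m-2 s-1 (0.72 kg m-2 over the step) is reduced to what the
# canopy can supply during one time step.
interception = limit_flux(2.0e-4, 0.5, DT)

# A soil evaporation rate of 1e-5 kg m-2 s-1 is well within 5 kg m-2 of
# surface soil storage, so it passes through unchanged.
soil_evap = limit_flux(1.0e-5, 5.0, DT)
```

Because the cap only ever lowers a flux, a downward correction of ET is always honored, whereas an upward correction can be cut back when storage is small.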
In this study, three statistics, bias, relative bias, and root mean square error (RMSE), are used to validate models in reproducing the spatial pattern against the reference dataset. They are defined over the $N$ grid cells of the domain, where $S_i$ ($R_i$) is the temporal average of the model-simulated (reference) value for grid cell $i$. This temporal average is calculated from $S_{i,j}$ ($R_{i,j}$), the model-simulated (reference) value at time $j$ and grid cell $i$, over the $M$ time points. RMSE is also used to validate models in reproducing time series, in which case $M$ becomes the total number of grid cells and $N$ the total number of time points.
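A conventional way to write these statistics, consistent with the symbols defined above, is sketched below. The ratio-of-sums form of the relative bias and its expression as a percentage are assumptions; a mean of per-cell ratios is another common choice.

```latex
\begin{align}
\mathrm{Bias} &= \frac{1}{N}\sum_{i=1}^{N}\left(S_i - R_i\right), \\
\mathrm{Relative\ bias} &= \frac{\sum_{i=1}^{N}\left(S_i - R_i\right)}{\sum_{i=1}^{N} R_i}\times 100\,\%, \\
\mathrm{RMSE} &= \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(S_i - R_i\right)^{2}}, \\
S_i &= \frac{1}{M}\sum_{j=1}^{M} S_{i,j}, \qquad R_i = \frac{1}{M}\sum_{j=1}^{M} R_{i,j}.
\end{align}
```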
GLEAM (the Global Land Evaporation Amsterdam Model) version 3.0a (Miralles et al., 2011; Martens et al., 2016) is used to calibrate the ET scaling factors and to validate the CLM and CLMET. As such we assume full trust in the GLEAM evaporation data with the bias correction method. GLEAM 3.0a was derived based on reanalysis net radiation and air temperature, a combination of gauge-based, reanalysis and satellite-based precipitation and satellite-based vegetation optical depth, spanning the 35-year period 1980-2014 (http://www.gleam.eu/). Potential evaporation in GLEAM 3.0a was calculated using a Priestley and Taylor equation based on surface net radiation and near-surface air temperature, and was converted to actual evaporation using the multiplicative evaporative stress factor. The dataset has been used in studying soil moisture-temperature coupling (Miralles et al., 2012), the impact of land surface on precipitation (Guillod et al., 2015), and the climate control on land surface evaporation (Miralles et al., 2014). Recent evaluations conducted at both flux tower site and global scales show that GLEAM-based ET is superior to MODIS-based and Surface Energy Balance System (SEBS) based ET products (Michel et al., 2016; Miralles et al., 2016). The spatial resolution of the GLEAM dataset is 0.25°, which is consistent with the resolution of CLM4.5 used in this study. The temporal resolution of the GLEAM dataset is daily, and the monthly aggregated ET is used to derive the scaling factors.

MODIS and FLUXNET-MTE ET
Two other gridded ET products are used for independent evaluations: MODIS ET and FLUXNET-MTE (model tree ensemble) ET. Mu et al. (2007, 2011) produced a MODIS-based global ET dataset using a revised Penman-Monteith (PM) equation. The dataset is arguably the most widely used remote sensing-based global ET product (Miralles et al., 2016). Monthly versions of the MODIS-based product at the 0.5° spatial resolution are used to validate the model with the bias correction method. The FLUXNET-MTE global ET dataset was derived from 253 FLUXNET eddy covariance towers distributed over the globe using the model tree ensemble (MTE) approach (Jung et al., 2009, 2010). The record gaps of half-hourly eddy covariance fluxes were filled first, and the complete tower-based dataset was then used to train the MTE to produce the monthly global ET dataset at the 0.5° spatial resolution. The data have been used to study the ET trend (Jung et al., 2010) and to improve canopy processes in a land surface model (Bonan et al., 2011). As FLUXNET sites over the CONUS are fairly dense, the quality of the FLUXNET-MTE dataset in our study domain is expected to be good. The MODIS dataset is available for 2000-2014, and the FLUXNET-MTE dataset is available for 1982-2011. We chose the overlap period of these two products, 2000-2011, for model validations using MODIS and the FLUXNET-MTE dataset.

Flux tower ET
ET observations (in energy units) at 16 sites from the AmeriFlux network are used to validate the model on the grid cell scale (Fig. 1b). Those sites span four sub-regions (i.e., NW, SW, NE, and SE) of the CONUS with five different vegetation types (i.e., grass, crop, evergreen needleleaf forest, mixed forest, and deciduous broadleaf forest). More details about these flux tower sites can be found in Xia et al. (2015b).
For most sites, the year of 2005 is selected for validation because data for this year have the least amount of missing records; three sites are exceptions due to data availability: 2002 for the site of Sylvania Wilderness, and 2004 for the sites of Donaldson and Walnut River.Both daily and monthly ET observations at these 16 sites are compared with model simulations.

Observation-based runoff coefficient
The runoff coefficient (the ratio of runoff to precipitation) of the Global Streamflow Characteristics Dataset (GSCD) version 1.9 (Beck et al., 2013, 2015) is used to evaluate the model performance in simulating runoff. The GSCD dataset was produced based on streamflow observations from approximately 7500 catchments over the globe. A data-driven approach was adopted to derive the gridded streamflow characteristics at the 0.125° resolution on a global scale. This dataset is relatively reliable for the grid cells within which a large number of catchment data are used. The uncertainty is low in North America, Europe, and southeastern Australia, where a large number of observations are available.

In situ soil moisture observations
The North American Soil Moisture Database (NASMD, Quiring et al., 2016) is used to evaluate the model performance in simulating soil moisture in both the surface (0-10 cm) and root-zone (0-100 cm) layers. The NASMD was initiated in 2011 to provide support for developing climate forecasting tools, calibrating land surface models, and validating satellite-derived soil moisture algorithms. A homogenization procedure has been implemented, as the measurement stations span a variety of in situ networks. In addition, a quality control (QC) algorithm was applied to the measurement records (Xia et al., 2015; Liao et al., submitted to the Journal of Hydrometeorology, 2017). The in situ observations in Alabama (AL), Illinois (IL), Mississippi (MS), Nebraska (NE), and Oklahoma (OK) from 2006 to 2010 are selected from the NASMD (Fig. 1a). A large number of stations are evenly distributed over these states and observation records during this period are relatively complete after QC. The numbers of stations in AL, IL, MS, NE, and OK are 10, 19, 14, 45, 105, and 39, respectively. Since the soil layer where measurements were taken varies with station, we linearly interpolate the volumetric soil water content to the 5 and 50 cm depths for all stations to compare them with the modeled soil moisture for the 0-10 and 0-100 cm layers.
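The depth interpolation described above can be sketched with NumPy. The station layout and values below are hypothetical, and holding the nearest measured value outside the sensor range (the default behavior of `np.interp`) is an assumption of this sketch rather than a documented choice of the study.

```python
import numpy as np

def interpolate_to_depths(sensor_depths_cm, vwc, target_depths_cm=(5.0, 50.0)):
    """Linearly interpolate volumetric water content (m3 m-3) to target depths.

    sensor_depths_cm must be increasing. Outside the sensor range, np.interp
    returns the value of the nearest sensor.
    """
    return np.interp(target_depths_cm, sensor_depths_cm, vwc)

# Hypothetical station with sensors at 10, 20, 60, and 100 cm.
depths = np.array([10.0, 20.0, 60.0, 100.0])
vwc = np.array([0.20, 0.24, 0.32, 0.36])
surface, root_zone = interpolate_to_depths(depths, vwc)
```

Here the 50 cm value falls between the 20 and 60 cm sensors, while the 5 cm value lies above the shallowest sensor and therefore takes the 10 cm reading.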

Calibration of ET scaling factor
Figure 2 shows the climatological scaling factors for each month over the CONUS based on the 1986-1995 period. The GLEAM-derived dew and the CLM-simulated dew are not consistent in some areas of the Northwest CONUS. Where this happens, the scaling factors become negative, because ET is negative for one and positive for the other. We do not scale ET when the scaling factor is negative, and those areas are masked out in Fig. 2. This treatment (scaling in some months and no scaling in other months) may introduce a seasonal bias correction effect in these areas. The model simulations generally agree better with GLEAM estimations during the warm seasons, whereas the difference between simulations and GLEAM estimations remains large during the cold seasons. The scaling factors vary greatly with region. For instance, the area-averaged scaling factors for November are 0.34, 0.58, 0.28, and 0.52 for Northwest, Southwest, Northeast, and Southeast, respectively. The overestimation is overwhelming during October, November, December, and January, whereas underestimation occurs in many areas during March, April, and May. The overestimation is especially severe over the Northeast CONUS where simulated ET is almost 5 times the GLEAM estimate in December.
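The treatment of negative scaling factors can be sketched as below. Representing "no scaling" by a factor of 1 is an assumption of this example, as are the function name and the sample values.

```python
import numpy as np

def mask_negative_factors(factors):
    """Replace negative scaling factors by 1 (no scaling) and report where."""
    factors = np.asarray(factors, dtype=float)
    unscaled = factors < 0  # observational and model ET disagree in sign
    return np.where(unscaled, 1.0, factors), unscaled

# Hypothetical factors for four grid cells in one month.
raw = np.array([0.34, -0.8, 0.58, 1.2])
applied, unscaled = mask_negative_factors(raw)
```

The boolean mask returned alongside the factors marks the cells and months in which the model ET is passed through uncorrected.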

Evaluation
We evaluate the effectiveness of the ET bias correction approach in CLM4.5 by comparing results from the CLM and CLMET with the reference dataset. The evaluation metrics examined include bias, relative bias, and root mean square error (RMSE) as described in Sect. 2.2. Since the spatial resolution of some gridded reference data is not consistent with the model resolution, we upscale the finer resolution data to match the coarser resolution data using simple arithmetic averages. For example, when the MODIS and FLUXNET-MTE

ET
Figure 3 shows the multi-year averages (2000-2014) of ET derived from GLEAM, simulated by the CLM and CLMET, and the relative bias of simulations against GLEAM. Over most of the CONUS, the CLM overestimates ET and CLMET reduces ET as well as ET biases relative to GLEAM data. The averaged relative bias in the CLM over the CONUS is 10.8 %, with a relative bias exceeding 10 % in a substantial portion of the CONUS; in CLMET, the CONUS-averaged relative bias is reduced to −0.1 %, and it is within 10 % over most of the CONUS. This improvement is more significant over the eastern CONUS than the western CONUS. Table 1 shows the statistics on the model performance with these two schemes during different seasons and in four sub-regions. The CLM overestimates the CONUS-averaged ET in all seasons except March-April-May (MAM), and the largest overestimation occurs in the Northeast CONUS during December-January-February (DJF), with a relative bias as large as 146.4 %. The underestimation in MAM is largest over the Southwest CONUS, with a relative bias of −17.9 %. CLMET substantially improves the model performance, as indicated by the various metrics. All the statistics in CLMET are superior to those in the CLM, with a few exceptions in bias or relative bias. The improvement from the CLM to CLMET is more substantial for September-October-November (SON) and DJF than MAM and June-July-August (JJA). The relative bias of 51 % (77.7 %) in the CLM is reduced to 7.8 % (18.9 %) in CLMET over the CONUS during SON (DJF). For the regional average, the improvement is greatest over the Southeast CONUS. All the positive biases in all seasons over the Southeast CONUS are substantially reduced.
To understand the differences between the CLM and CLMET, we select four months representing each of the four seasons, January, April, July, and November, to examine the relationship between the relative bias of model simulations and the scaling factor changes from the calibration period (1986-1995) to the validation period (2000-2014) in Fig. 4. The improvement from the CLM to CLMET is evident, especially in January and November (Fig. 4a and b). Although the bias is dramatically reduced in CLMET, it remains large in the Northeast CONUS in January (Fig. 4b1). In addition, the bias in CLMET appears larger in the western CONUS than the eastern CONUS (Fig. 4b). The spatial patterns of the relative biases in CLMET and the scaling factor differences between the two periods demonstrate a great degree of similarity (Fig. 4b and c), and the scatter plots between the two quantities (Fig. 4d) reflect a strong correlation. Not surprisingly, the degree to which CLMET can improve model performance in simulating ET greatly depends on how stable the scaling factor is from the calibration period to the validation period, i.e., how well the assumption of a time-invariant scaling relationship holds. Over most of the CONUS, changes in the scaling factor are within 10 % (Fig. 4d). This temporal stability of the relationship between observed ET and simulations guarantees improvements from the CLM to CLMET.
CLM and CLMET performances are also evaluated using two independent observation datasets of ET, MODIS-based and FLUXNET-MTE-based ET (Fig. 5, Tables 2 and 3). For the multi-year averaged ET, the relative bias in CLMET is smaller than that in the CLM, and the improvement is greater in the eastern CONUS than the western CONUS, as compared with either MODIS- or FLUXNET-MTE-based ET. Note that there is still a substantial overestimation in the western CONUS in CLMET compared with the MODIS ET. With the MODIS or FLUXNET-MTE ET as the reference, CLMET corrects biases in all seasons except MAM (Tables 2 and 3). Bias, relative bias, and RMSE in CL-
The analysis of time series of ET from MODIS, FLUXNET-MTE, and the two types of simulations also demonstrates improvement from the CLM to CLMET. Climatological seasonal cycles of ET over the CONUS and four sub-regions for the period 2000-2011 are shown in Fig. 6. CLMET outperforms the CLM over the CONUS, with a smaller RMSE (0.31 vs. 0.40 against MODIS, 0.19 vs. 0.25 against FLUXNET-MTE). The improvement mainly results from reduction of the overestimation existing in the CLM for SON and DJF. However, the model performance varies greatly with region. As indicated by the ET RMSE values, CLMET and the CLM perform similarly over some areas of the western CONUS, whereas CLMET improves the ET simulation over the eastern CONUS no matter which reference data are used. Figure 7 compares the temporal evolution of the simulated ET in the CLM and CLMET against MODIS and FLUXNET-MTE ET over the CONUS and four sub-regions. It is evident that the bias correction method in CLMET is very effective in reducing overestimation (positive bias) but does not work as well in correcting the underestimation (negative bias) in water-limited regimes. The difference has to do with the specific ET regime, i.e., whether ET is limited by water or energy. When an overestimated ET is overwritten with a lower value, the water on land is sufficient to support the reduced ET; in contrast, when an underestimated ET is overwritten with a higher value, the land surface model checks whether water storage in the soil layer and vegetation canopy can sustain the elevated ET and adjusts it further if necessary to conserve mass. The extent to which ET can be increased is limited by the availability of water stored in the soil layer and vegetation canopy. Therefore, in water-limited ET regimes, if ET is underestimated in the CLM, the actual ET in CLMET after the water availability check can be substantially lower than the corrected ET fed into the model, which diminishes the effect of the bias correction algorithm under such circumstances.
In addition, the ET validation is also conducted at the site scale (Figs. 8, 9, and 10). Except for the Fort Peck and Wind River Crane stations in the Northwest CONUS, the monthly mean ET from CLMET agrees better with the observed ET than that from the CLM at all stations (Fig. 8). The same statement holds for daily mean ET (Figs. 9 and 10). Generally, the CLM overestimates ET as compared with station observations, and CLMET alleviates this overestimation, which is consistent with comparisons between the modelled ET and satellite-based ET products.

Runoff
Using the runoff coefficient (the ratio of runoff to total precipitation) derived from GSCD as the benchmark, we evaluate the model performance in the CLM and CLMET in simulating runoff (Fig. 11). The CONUS-averaged runoff coefficients in the CLM and CLMET are 0.18 and 0.21, which are comparable to the GSCD-based runoff coefficient (0.22). However, the CLM underestimates runoff in most areas of the CONUS due to an overestimation of ET. CLMET alleviates the underestimation by reducing ET, thereby increasing the runoff, especially over the eastern CONUS. The relative bias of CLMET against GSCD is 1.1 %, which is much smaller than the value in the CLM (−9.2 %). Table 4 shows the regional difference in runoff simulations in the CLM and CLMET. The improvement is greater over the eastern CONUS than the western CONUS, which is consistent with the improvement of ET simulations. The most striking improvement occurs in the Southeast CONUS, with the relative bias (RMSE) reduced from −24.7 % (0.091) to −8.2 % (0.06). Because only the multi-year mean annual runoff coefficient is available for GSCD, we cannot examine the seasonal dependency of the model performance improvement.
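As an illustration of how such a comparison can be computed, the sketch below derives runoff coefficients and evaluates them against a reference. The function names and sample values are hypothetical and are not taken from the study, and the ratio-of-sums relative bias is an assumed form of the statistic.

```python
import numpy as np

def runoff_coefficient(runoff, precipitation):
    """Ratio of multi-year mean runoff to multi-year mean precipitation per cell."""
    return np.asarray(runoff, dtype=float) / np.asarray(precipitation, dtype=float)

def spatial_metrics(sim, ref):
    """Return (bias, relative bias in %, RMSE) over grid cells."""
    sim, ref = np.asarray(sim, dtype=float), np.asarray(ref, dtype=float)
    diff = sim - ref
    bias = diff.mean()
    relative_bias = 100.0 * diff.sum() / ref.sum()  # assumed ratio-of-sums form
    rmse = np.sqrt((diff ** 2).mean())
    return bias, relative_bias, rmse

# Hypothetical values for three grid cells (mm yr-1), not taken from the study.
precip = np.array([1000.0, 800.0, 500.0])
sim_rc = runoff_coefficient([200.0, 120.0, 50.0], precip)  # 0.20, 0.15, 0.10
ref_rc = np.array([0.25, 0.15, 0.10])
bias, rel_bias, rmse = spatial_metrics(sim_rc, ref_rc)
```

In this toy case the simulated coefficient is too low in one cell only, which gives a negative bias and relative bias, the sign expected when ET is overestimated.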
The increase in runoff from the CLM to CLMET is mainly due to the increase in subsurface runoff (not shown). The same values of the ET scaling factor within each grid cell are applied to three components of ET (interception loss, plant transpiration, and soil evaporation) in this study. Because interception loss accounts for a small portion of total ET, the absolute change in interception loss (decrease from the CLM to CLMET over most areas) is much smaller than that in plant transpiration and soil evaporation (not shown). As a result, throughfall does not change much from the CLM to CLMET, which leads to smaller increases in surface runoff. By contrast, plant transpiration and soil evaporation are more significantly reduced by CLMET, inducing wetter soil and therefore more subsurface runoff.

Table 4. Statistics of simulated annual runoff coefficients (ratio of runoff to total precipitation) against GSCD observations over the CONUS, Northwest (NW), Southwest (SW), Northeast (NE), and Southeast (SE) during the period 2000-2014.

Soil moisture
As analyzed in Sect. 4.2.2, reductions in all three components of ET (interception loss, plant transpiration, and soil evaporation) from the CLM to CLMET slow down moisture depletion from the soil. As a result, the water content in different soil layers increases with reduced ET. Figure 12 shows soil water at the surface and root-zone layers simulated by the CLM and CLMET, and their differences in August. From the CLM to CLMET, the changes over the CONUS show an overwhelmingly increasing signal for both surface and root-zone soil moisture. The moisture increase in the top 0-100 cm soil layer from the CLM to CLMET in the central CONUS is very evident, which may have significant implications in drought monitoring and assessment. For example, the Central Great Plains experienced a severe drought in the summer of 2012, and soil moisture derived from land surface models was used to evaluate the intensity of the drought event (Hoerling et al., 2014; Livneh and Hoerling, 2016). Unfortunately, land surface models tend to systematically overestimate drought (Milly and Dunne, 2016; Ukkola et al., 2016).
The more accurate estimates of ET and soil moisture resulting from the bias correction method in this study may prove useful for improving drought monitoring and assessment.
Due to the strong spatial heterogeneity of soil moisture and the lack of large-scale distributed data, the comparisons between observed soil moisture and modeled soil moisture from the CLM and CLMET are done based on the spatial averages across stations within each state and at the monthly scale during 2006-2010 for the top 0-10 cm and top 0-100 cm soil, respectively. The soil water increase from the CLM to CLMET is more evident during SON and DJF, which is consistent with changes in ET that also feature larger decreases during SON and DJF. The soil in the CLM shows dry biases over most of the examined states, with the exception of soil moisture in the top 10 cm layer in Alabama and Illinois, and CLMET generally alleviates these dry biases. The RMSE values against the NASMD observations in CLMET are smaller than or at least the same as the RMSE values in the CLM. An exception exists for the top 0-10 cm layer in Alabama and Illinois, where a wet bias is found in the CLM. The soil water content difference between the CLM and CLMET is larger for the 0-100 cm layer than the 0-10 cm layer, because plant transpiration, with which a large fraction of ET and therefore a large fraction of ET bias correction are associated, primarily depletes moisture from the rooting zone, which is deeper than 10 cm. As such, the improvement is more evident for the top 0-100 cm layer. For example, in Mississippi, the RMSE is reduced from 0.048 m³ m⁻³ in the CLM to 0.042 m³ m⁻³ in CLMET in the top 0-10 cm layer, and from 0.07 to 0.06 m³ m⁻³ in the top 0-100 cm layer. The improvements in Alabama, Mississippi, Nebraska, and Oklahoma are summarized in Table 5.

Summary and discussions
In this study, we implemented the online bias correction approach proposed by Parr et al. (2015) in CLM4.5, and evaluated the effectiveness of the approach in reducing model biases.
Hydrol. Earth Syst. Sci., 21, 3557-3577, 2017 www.hydrol-earth-syst-sci.net/21/3557/2017/
Qualitatively, whether the Parr et al. (2015) ET bias correction approach improves the quantification of the hydrological cycle depends on whether ET is limited by water or energy and whether ET is underestimated or overestimated. The approach works well when and where ET is not limited by water availability; in water-limited regimes, the approach is effective in correcting positive ET biases, but does not work well if ET is underestimated. Quantitatively, the degree of the model improvement derived from this bias correction algorithm is highly related to whether the fundamental assumption of the method holds, namely that the relationship between observational and model ET is time-invariant. Although the scaling factors between observations and simulations do not change much from the calibration period to the validation period over most regions in most seasons, dramatic changes do exist in some areas. Differences in the scaling factors between the calibration and validation/application periods greatly influence the effectiveness of the bias correction method, with large differences causing the approach to be less effective, leaving substantial biases in CLMET. The Northeast CONUS during winter is an example of a large bias remaining in CLMET due to greater changes in the ET scaling factor from the calibration period to the validation period.
Another factor affecting the degree of the model improvement is whether the ET scaling is applied at all. As shown in Fig. 2, we do not scale ET in some areas of the Northwest CONUS during the winter months due to the inconsistency in the ET sign (positive or negative) between GLEAM and the CLM. In these areas and season(s), ET in CLMET is not corrected at all. All three of these factors (i.e., whether the scaling factor differs significantly between calibration and validation periods, whether ET is underestimated in water-limited regimes, and whether ET scaling is applied at all) influence the effectiveness of the bias correction approach, but one or two of them may dominate for a given region/season. For example, regardless of which product is used as the reference for comparison (Figs. 3g, 5a4, and b4), the approach reduces ET biases over the eastern CONUS, where the ET scaling is applied in most places/seasons and the scaling factor shows little difference between the calibration and validation periods. In contrast, in the northern part of the Midwest, some positive biases still remain in CLMET, as the ET scaling is not applied in winter months and the scaling factor differs quite substantially between these two periods. Over some areas of the western CONUS, the bias correction approach is less effective due to the underestimation of ET under water-limited conditions and large differences in the scaling factor between the calibration and validation periods.
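One way to check the time-invariance factor discussed above is to compare the scaling factors derived from the two periods cell by cell. The sketch below is illustrative only: the function name and the 20 % tolerance are hypothetical choices for the example, not values used in this study.

```python
import numpy as np

def scaling_factor_change(factor_cal, factor_val, tol=0.2):
    """Relative change of the ET scaling factor between two periods.

    factor_cal, factor_val: arrays of the same shape holding the scaling
    factor from the calibration and validation periods. Cells where the
    factor is undefined are expected to be NaN (an assumed convention).
    tol: hypothetical tolerance on the relative change (0.2 = 20 %).
    Returns (relative_change, unstable_mask).
    """
    factor_cal = np.asarray(factor_cal, dtype=float)
    factor_val = np.asarray(factor_val, dtype=float)
    rel_change = np.full(factor_cal.shape, np.nan)
    ok = np.isfinite(factor_cal) & np.isfinite(factor_val) & (factor_cal != 0)
    rel_change[ok] = (factor_val[ok] - factor_cal[ok]) / np.abs(factor_cal[ok])
    # NaN cells compare as False, so undefined cells are never flagged
    with np.errstate(invalid="ignore"):
        unstable = np.abs(rel_change) > tol
    return rel_change, unstable
```

Cells flagged as unstable are those where, under the assumed tolerance, the correction would be expected to leave larger residual biases.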
For a given grid cell and given month, the scaling factors for all three ET components, i.e., interception loss, plant transpiration, and soil evaporation, are the same in this study, set to be the ratio of the remote sensing ET to the modeled ET. Since the GLEAM dataset contains values of the three components in addition to the total ET, we conducted additional experiments in which the scaling factor for each ET component was estimated separately, using the ratio of each ET component from the GLEAM product to the corresponding ET component from the CLM during the same calibration period. However, results based on the component-specific scaling do not show further improvement, which is likely due to the inaccurate partitioning of ET into interception loss, plant transpiration, and soil evaporation in the GLEAM product. Miralles et al. (2016) compared the ET partitioning for three widely used remote sensing-based ET products, and found that the contribution of each component to ET is dramatically different among these three products. For instance, they found that the percentage of global ET accounted for by soil evaporation ranges from 14 to 52 %, and the ranges are even larger at the regional and local scales. Because in situ measurements of the separate components of ET are very scarce, it is particularly challenging to validate the accuracy of the remote sensing-based estimates of the three ET components. These challenges led Miralles et al. (2016) to recommend against the use of any single product in partitioning ET.
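The uniform scaling described above can be sketched with NumPy as follows. This is a minimal illustration, not the CLM4.5 implementation: the function names, the array layout (years, 12 months, y, x), the use of calibration-period monthly means to form the ratio, and the choice of a factor of 1 where scaling is skipped are all assumptions made for the example.

```python
import numpy as np

def monthly_scaling_factor(et_obs, et_model):
    """Per-month, per-grid-cell ratio of observed to modeled ET.

    et_obs, et_model: arrays of shape (n_years, 12, ny, nx) holding
    monthly ET over the calibration period (assumed layout).
    Returns an array of shape (12, ny, nx). Where the two products
    disagree in sign (or either mean is zero), the factor is left at
    1.0, i.e. no scaling is applied.
    """
    obs_clim = np.asarray(et_obs, dtype=float).mean(axis=0)
    mod_clim = np.asarray(et_model, dtype=float).mean(axis=0)
    same_sign = (obs_clim * mod_clim) > 0
    factor = np.ones_like(mod_clim)
    # divide only where scaling is applicable; other cells keep 1.0
    np.divide(obs_clim, mod_clim, out=factor, where=same_sign)
    return factor

def apply_scaling(factor, month, interception, transpiration, soil_evap):
    """Scale the three ET components by the same factor for one month.

    month: zero-based calendar month index (0 = January).
    """
    f = factor[month]
    return f * interception, f * transpiration, f * soil_evap
```

A component-specific variant would call `monthly_scaling_factor` once per component with the corresponding observed and modeled fields, instead of once with total ET.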
The bias correction method evaluated in this study can effectively improve the estimates of surface fluxes and state variables in the absence of improved physical parameterizations in land surface models. It is applicable not only to historical simulations, but also to future predictions (Parr et al., 2015). It provides an alternative approach to, but would in no way replace, model improvement through better parameterization of physical processes. Development of better physical parameterizations has to be based on improved understanding of physical processes, more effective mathematical formulations, and higher-quality surface type datasets, which require a long-term commitment from the land surface modeling community. Model parameter calibration (e.g., tuning surface resistance) is another way to reduce model bias (Ren et al., 2016). However, the parameter space may contain nonphysical parameter subsets (Ray et al., 2015), which is especially an issue when model parameter tuning is used to offset unrelated model deficits. The method used in this study attempts to avoid such issues by improving the model performance without calibrating the model's physical parameters. Nevertheless, results from this study can provide useful guidance for physically based land surface model development. As can be seen from Fig. 3g, the bias correction algorithm improves ET estimation over most of the CONUS, indicating a strong potential for performance improvement that can be derived from improving the physical parameterization of ET processes in the model. Over regions where the bias correction approach does not improve the ET estimate (which are mostly places where ET is water-limited, while the model underestimates ET), parameterizations for other processes that influence soil moisture (e.g., runoff generation, groundwater interactions) are the most likely cause of model biases and should be the focus of physically based model development effort.
Data availability. The GLEAM ET data were provided by the GLEAM team at the website http://www.GLEAM.eu (GLEAM, 2014). The MODIS ET data by NTSG, University of Montana, are at the website http://www.ntsg.umt.edu/project/mod16 (NTSG, 2014). The FLUXNET-MTE ET data were provided by the Max Planck Institute for Biogeochemistry at the website https://www.bgc-jena.mpg.de/geodb/projects/Data.php (Max Planck Institute for Biogeochemistry, 2011). The GSCD runoff data were provided by the Amsterdam Critical Zone Hydrology Group at the website http://hydrology-amsterdam.nl/valorisation/GSCD.html (Amsterdam Critical Zone Hydrology Group, 2010). The original NASMD soil moisture data are available at the website http://soilmoisture.tamu.edu/ (NASMD, 2012). The quality-controlled NASMD soil moisture data can be obtained from the authors upon request. Latent heat flux measurements at the tower sites are available from AmeriFlux at http://ameriflux.lbl.gov/ (AmeriFlux network, 2016).
Author contributions. DW and GW designed the study. DW conducted model simulations and data analysis with input from GW, DP, and CF. DW and GW wrote the paper with input from YX. WL and YX contributed to data processing.
Competing interests. The authors declare that they have no conflict of interest.
Special issue statement. This article is part of the special issue "Observations and modeling of land surface water and energy exchanges across scales: special issue in Honor of Eric F. Wood". It does not belong to a conference.

Figure 1. (a) Mean annual (1980-2015) precipitation in millimeters over the conterminous USA (CONUS). NW, SW, NE, and SE represent the Northwest, Southwest, Northeast, and Southeast CONUS, respectively. The black circles represent sites of in situ soil moisture observations in Alabama, Illinois, Mississippi, Nebraska, and Oklahoma. (b) Locations of the 16 AmeriFlux stations with vegetation types.

Figure 2. Scaling factor as the ratio of the CLM simulated ET to the GLEAM ET for each month during 1986-1995. The numbers in titles are CONUS-averaged values, and the numbers within figures are area-averaged values for each of the four sub-regions (NW, SW, NE, and SE). The areas with negative scaling factors are masked out.

Figure 3. Mean annual ET from (a) GLEAM, (b) the CLM, and (c) CLMET, the relative difference between (d) CLMET and the CLM, (e) the CLM and GLEAM, (f) CLMET and GLEAM, and (g) the difference between the absolute value of (e) and the absolute value of (f) during the period 2000-2014. Numbers in titles are CONUS-averaged values.

Figure 8. Monthly mean latent heat fluxes from the CLM and CLMET and observations at 16 flux tower sites. RMSE_CLM and RMSE_CLMET represent the root mean square error against observations for the CLM and CLMET, respectively. Note that the CLM and CLMET simulations are driven with meteorological forcings at the grid cell level (as opposed to site-specific forcing).

Figure 9. Daily mean latent heat fluxes from the CLM and CLMET grids and station observations at ARM SGP Burn, Audubon Grassland, Bondville, Donaldson, Flagstaff Forest, Fort Dix, Fort Peck, and Little Prospect. RMSE_CLM and RMSE_CLMET represent the root mean square error against observations for the CLM and CLMET, respectively.

Figure 10. Daily mean latent heat fluxes from the CLM and CLMET grids and station observations at Mead Rainfed, Metolius Pine, Missouri Ozark, Morgan Forest, Sylvania Wilderness, Tonzi Ranch, Walnut River, and Wind River Crane. RMSE_CLM and RMSE_CLMET represent the root mean square error against observations for the CLM and CLMET, respectively.

Figure 11. Mean annual runoff coefficient (the ratio of runoff to total precipitation) from (a) the Global Streamflow Characteristics Dataset (GSCD), (b) the CLM, and (c) CLMET, and the relative differences between (d) the CLM and GSCD, (e) CLMET and GSCD, and (f) CLMET and the CLM during the period 2000-2014. Runoff coefficients of less than 0.02 are blanked out. Numbers in titles are CONUS-averaged values.

Table 2. Similar to Table 1, but based on comparison with MODIS-derived ET during the period 2000-2011.

Table 3. Similar to Table 1, but based on comparison with FLUXNET-MTE ET during the period 2000-2011.