Pairing FLUXNET sites to validate model representations of land-use/land-cover change

Land surface energy and water fluxes play an important role in land–atmosphere interactions, especially for the climatic feedback effects driven by land-use/landcover change (LULCC). These have long been documented in model-based studies, but the performance of land surface models in representing LULCC-induced responses has not been investigated well. In this study, measurements from proximate paired (open versus forest) flux tower sites are used to represent observed deforestation-induced changes in surface fluxes, which are compared with simulations from the Community Land Model (CLM) and the Noah MultiParameterization (Noah-MP) land model. Point-scale simulations suggest the CLM can represent the observed diurnal and seasonal changes in net radiation (Rnet) and ground heat flux (G), but difficulties remain in the energy partitioning between latent (LE) and sensible (H ) heat flux. The CLM does not capture the observed decreased daytime LE, and overestimates the increased H during summer. These deficiencies are mainly associated with models’ greater biases over forest land-cover types and the parameterization of soil evaporation. Global gridded simulations with the CLM show uncertainties in the estimation of LE and H at the grid level for regional and global simulations. Noah-MP exhibits a similar ability to simulate the surface flux changes, but with larger biases in H , G, and Rnet change during late winter and early spring, which are related to a deficiency in estimating albedo. Differences in meteorological conditions between paired sites is not a factor in these results. Attention needs to be devoted to improving the representation of surface heat flux processes in land models to increase confidence in LULCC simulations.

Abstract. Land surface energy and water fluxes play an important role in land-atmosphere interactions, especially for the climatic feedback effects driven by land-use/landcover change (LULCC). These have long been documented in model-based studies, but the performance of land surface models in representing LULCC-induced responses has not been investigated well. In this study, measurements from proximate paired (open versus forest) flux tower sites are used to represent observed deforestation-induced changes in surface fluxes, which are compared with simulations from the Community Land Model (CLM) and the Noah Multi-Parameterization (Noah-MP) land model. Point-scale simulations suggest the CLM can represent the observed diurnal and seasonal changes in net radiation (R net ) and ground heat flux (G), but difficulties remain in the energy partitioning between latent (LE) and sensible (H ) heat flux. The CLM does not capture the observed decreased daytime LE, and overestimates the increased H during summer. These deficiencies are mainly associated with models' greater biases over forest land-cover types and the parameterization of soil evaporation. Global gridded simulations with the CLM show uncertainties in the estimation of LE and H at the grid level for regional and global simulations. Noah-MP exhibits a similar ability to simulate the surface flux changes, but with larger biases in H , G, and R net change during late winter and early spring, which are related to a deficiency in estimating albedo. Differences in meteorological conditions between paired sites is not a factor in these results. Attention needs to be devoted to improving the representation of surface heat flux processes in land models to increase confidence in LULCC simulations.

Introduction
Earth system models (ESMs) have long been used to investigate the climatic impacts of land-use/land-cover change (LULCC) (cf. Pielke et al., 2011;Mahmood et al., 2014). Results from sensitivity studies largely depend on the land surface model (LSM) that is coupled to the atmospheric model within ESMs. In the context of the Land-Use and Climate, Identification of Robust Impacts (LUCID) project, Pitman et al. (2009) found disagreement among the LSMs in simulating the LULCC-induced changes in summer latent heat flux over the Northern Hemisphere. de  and Boiser et al. (2012) argued that the inter-model spread of LULCC sensitivity (especially regarding the partitioning of available energy between latent and sensible heat fluxes within the different land-cover types) highlights an urgent need for a rigorous evaluation of LSMs. From Phase 5 of the Coupled Model Intercomparison Project (CMIP5), Brovkin et al. (2013) also found different climatic responses to LULCC among the participating models, and the diverse responses are associated with different parameterizations of land surface processes among ESMs. To deal with the uncertainties in LULCC sensitivity among models, the Land Use Model Intercomparison Project (LUMIP) has been planned, with a goal to develop metrics and diagnostic protocols that quantify LSM performance and related sensitivities with respect to LULCC .
However, a paucity of useful observations has hindered the assessment of the simulated impacts of LULCC and limited the understanding of the discrepancies among models. In situ and satellite observations make it possible to quantify the impacts of LULCC on land surface variables. Satellite-derived datasets have been used to explore the albedo, evapotranspi- Table 1. Information about the variables used from FLUXNET2015. The marginal distribution sampling (MDS) filling method is based on Reichstein et al. (2005), and the ERA-Interim filling method can be found in Vuichard and Papale (2015).

Name
Gap - ration (ET), and land surface temperature changes due to historical LULCC (Boisier et al., , 2014 and the climatic effects of forest (Li et al., 2015). Meanwhile, the development of FLUXNET (Baldocchi et al., 2001) enables the study of land surface responses to different land-cover types based on paired field observations from neighboring flux towers over forest and open land (Juang et al., 2007;Lee et al., 2011;Luyssaert et al., 2014;Teuling et al., 2010;Williams et al., 2012). In terms of LSM evaluation, the paired site observations have been mainly used to simulated impacts of LULCC on land surface temperature (Chen and Dirmeyer 2016;Lejeune et al., 2016;Vanden Broucke et al., 2015). However, a more fundamental question, "whether a model can represent the observed LULCCinduced changes in surface energy fluxes well", has not been thoroughly investigated, even though we know that the turbulent fluxes are tightly associated with both energy and water exchange between the land surface and atmosphere.
In this study, we evaluate the performance of the Community Land Model (CLM) version 4.5 and the Noah Multi-Parameterization (Noah-MP) LSM in simulating the impacts of LULCC on surface energy fluxes based on observations from FLUXNET sites. The CLM and Noah-MP represent perhaps the two most readily available and widely used state-of-the-art community land models developed in the US. The CLM is chosen because, as the land component of the Community Earth System Model (CESM), it prioritizes the simulation of biogeophysical and biogeochemical processes for climate applications (Oleson et al., 2013). Much effort has gone into improving the representation of the land-atmosphere interactions among different biomes (Bonan et al., 2011), and the model itself has been used for many LULCC sensitivity studies (e.g., Dirmeyer, 2016, 2017;Schultz et al., 2016;Lejeune et al., 2017;Lawrence et al., 2012). Noah-MP has found use mainly in shorter timescale, limited area applications, such as weather and hy-drologic forecasting, and as a LSM run at very high resolution coupled to mesoscale models (e.g., WRF-Hydro, Gochis et al., 2015). It is intended to become the LSM used in global weather and seasonal forecasting applications at the National Centers for Environmental Prediction (NCEP). Its performance over varying land-cover types has direct consequences for its use in forecast models.
The rest of this paper is structured as follows. Section 2 describes the datasets used in the study and experimental design. Section 3 presents a comparison between observations and model simulations in surface latent and sensible heat flux, ground heat flux, and net radiation. Section 4 shows the uncertainties within the FLUXNET pairs and model simulations. Sections 5 and 6 include discussion and conclusions, respectively.

Observational data
We use half-hourly observations from 24 selected pairs of flux sites from the FLUXNET2015 Tier 1 dataset (http: //fluxnet.fluxdata.org/data/fluxnet2015-dataset) and 4 pairs from the AmeriFlux dataset (Baldocchi et al., 2001). These observations include meteorological forcings for the LSM, and surface flux measurements for model validation, which include latent heat flux (LE), sensible heat flux (H ), ground heat flux (G), and net radiation (R net ). All of these variables have been gap-filled (Reichstein et al., 2005;Vuichard and Papale, 2015). Table 1 shows the variable names and gapfilling algorithms used in FLUXNET2015. Because there is no directly measured humidity variable reported, which is needed as a meteorological forcing for the LSMs, relative humidity is calculated based on the reported vapor pressure in which T a is air temperature ( • C), e s is saturation vapor pressure (hPa), VPD is vapor pressure deficit (hPa), and RH is relative humidity (%). Additionally, for the turbulent flux measurements over 18 pairs, FLUXNET2015 provides "corrected" fluxes based on an energy balance closure correction factor, which is calculated for each half-hour as (R net −G)/(H +LE). More details about the data processing can be found on the FLUXNET2015 website (http://fluxnet. fluxdata.org/data/fluxnet2015-dataset/data-processing/). To simulate local land-cover change for each pair, one flux tower is located in forest (deciduous, evergreen, or mixed; broadleaf or needleleaf) and the other is in a nearby open land-cover type (grassland, cropland, or open shrub).  Table S1. The median linear distance between the paired sites is 21.6 km, and the median elevation difference is 20.0 m. Because of their proximities, the paired sites share similar atmospheric background conditions; however, they are not identical (Chen and Dirmeyer, 2016). Below we show that the differences in meteorology are usually small and not likely a dominant factor in simulated surface flux differences in most of the pairs. We consider the differences (open minus forest) in observed surface fluxes to be representative of the effects of LULCC (deforestation in this case).

Model simulations
We have run the offline version of CLM4.5 and Noah-MP at the point scale for individual sites. The forcing data, described below, include downwelling longwave radiation (W m −2 ), downwelling shortwave radiation (W m −2 ), air temperature (K), precipitation (mm s −1 ), relative humidity (%), surface pressure (Pa), and wind speed (m s −1 ) at halfhourly time steps. The plant functional type (PFT) in the CLM for each site is identified based on its reported landcover type (Table S1) with prescribed climatological satellite phenology (Lawrence and Chase, 2010). Because of the focus on biogeophysical impacts of LULCC in this study, the biogeochemistry carbon-nitrogen module has been disabled in our simulations. The initial conditions for each site are generated by cycling through available atmospheric forcings for about 40 years until soil moisture and temperature reach quasi-equilibrium.
The differences in simulated surface fluxes between the paired sites are compared against the observations, so that the performance of the CLM in representing LULCC-induced surface flux changes can be evaluated. In the single-point simulations, two types of forcing data are used for each site: (1) measurements at this site; (2) measurements at the neighboring paired site. Consequently, three types of differences in simulated surface fluxes can be calculated: (1) the difference derived from individual forcings; (2) the difference from identical "forest forcings" (both of the paired sites use the same forcings measured at the forest site); (3) the difference from identical "open forcings" (both of the paired sites use the same forcings measured at the open sites). Such an experimental design can eliminate well the influence from the uncertainties of forcing data and the difference in the atmospheric background of the paired sites.
The ultimate goal of evaluating the CLM's performance at single-point scale is to assess its ability to be used in global LULCC sensitivity simulations in both offline and coupled modes. The paired sites are close enough that they are typically located within a single grid cell of the CESM. Moreover, the sub-grid heterogeneity of the CLM allows the biogeophysical processes to be calculated at the individual PFT level (15 PFTs available), and makes it possible to output surface fluxes for individual land-cover types. The paired sites can be presented as paired PFTs within a single grid of the CESM. They then share the same atmospheric forcings, and their differences can be considered as the impacts of LULCC. It should be noted that the PFT-level calculation is independent of the percentage of individual PFTs in the grid cell. Therefore, the coverage of the PFTs in the shared grid cell does not influence the flux difference between the paired PFTs in the global simulations.
We run the CLM offline, globally driven by the CRUN-CEP forcings from 1991 to 2010 (Viovy, 2011) and present land-cover conditions  at a horizontal resolution of 0.9 • × 1.25 • . The paired PFTs are identified based on the locations and land-cover types of the FLUXNET paired sites, to ensure the single-point and global simulations are comparable. Schultz et al. (2016) found the shared-soil-column configuration for vegetated land units in the CLM caused issues with PFT-level ground heat fluxes. They propose an individual-soil-column scheme (PFTCOL) to better represent the PFT-level energy fluxes, so we also extract and examine the output for the paired PFTs from the PFTCOL model configuration. Details about the PFTCOL simulations can be found in Schultz et al. (2016). Additionally, a coupled simulation with the Community Atmosphere Model (CAM) has also been conducted. It shows very similar results to the offline simulations, because the paired PFTs in a single model grid box always share the same atmospheric forcings no matter if the CLM is run offline or coupled with the CAM. Therefore, results from the coupled simulation are not included in this study.
Furthermore, we compare the performance of the CLM with Noah-MP (Niu et al., 2011), which serves as a participant model in Land Data Assimilation Systems (LDAS, Cai et al., 2014). Single-point Noah-MP simulations are conducted in the same way as CLM simulations to ensure their comparability. The monthly leaf area index (LAI) of each site is identical to the prescribed satellite-based LAI in the corresponding CLM simulation. Table S2 shows selected options for various physical processes in Noah-MP. Information about all model simulations is summarized in Table 2.

Surface energy fluxes and their changes
First, we analyze the diurnal and seasonal cycles of surface energy fluxes and the LULCC-induced changes. The diurnal cycle analysis is primarily focused on summer (DJF for the two austral sites and JJA for the other sites). The seasonal cycle for the austral sites is shifted by 6 months to keep summer in the middle of the time series when comparing or compositing with the Northern Hemisphere sites. The results shown below are composites averaged over all open (or forest) sites or open-forest pairs. Not all sites have energy-balance corrected fluxes available; exclusion of those sites shows very similar results for uncorrected fluxes to the average over all sites (or pairs, not shown). There are also some pairs with relatively large changes in surface fluxes. Exclusion of those pairs shows very consistent patterns with the results including all sites, even though there is a slight influence on the magnitude of the changes (Fig. S1). Therefore, all sites are included in our analyses for each variable.  Figure 2c shows the difference in the diurnal cycle of LE due to LULCC (deforestation). It should be noted that there is a substantial spread among the pairs in model simulations and especially observations, indicating the diverse geographical backgrounds and specific vegetation changes of these paired sites. The observations suggest an overall lower summer daytime LE over the open land compared to forest. In spite of the considerable spread among the energy-balance corrected LE observations (Fig. 2a, b), the differences between the forest and open lands show consistent signals. However, both CLM and Noah-MP single-point simulations fail to represent the observed decreased daytime LE as a result of deforestation. The simulated LE over the open land is usually slightly greater than the forest from 10:00 to 16:00 at local time. Such a discrepancy may be attributed to the large underestimation of daytime forest LE in the models. Meanwhile, simulations by different forcings of the paired sites show robust signals, implying that the bias of the simulated LE sensitivity should not be attributed to the uncertain-  Table 1. Information about model simulations in the CLM and Noah is given in Table 2. ties of the forcing data. For the CLM global simulations, the PFTCOL case exhibits a similar diurnal pattern to the singlepoint simulations, while decreased daytime LE is found consistently only in the PFT simulations. As CLM-PFT is less physically realistic than CLM-PFTCOL from a soil hydrologic perspective, its superior performance needs further investigation.

Latent heat flux (LE)
To explore the mechanism of the LE changes within the CLM, we examine the changes in the three components of evapotranspiration, namely canopy evaporation, canopy transpiration, and ground evaporation (Fig. 3). Unfortunately, these separate components are not measured and cannot be directly validated. The CLM, PFT, and PFTCOL simulations show an agreement in decreased canopy evaporation after deforestation, with the greatest decrease during the early morning. There is also an agreement in an overall decreased canopy transpiration, but CLM simulations do not exhibit an obvious change during the morning, when greatly decreased canopy transpiration can be found in the PFT and PFTCOL simulations. The main discrepancy among model versions is found in ground evaporation, which increases after deforestation in the CLM and PFTCOL simulations. The increased ground evaporation has exceeded the decreased canopy evaporation and transpiration, resulting in slightly increased LE (Fig. 2c). Interestingly, the PFT simulations, which have known issues with PFT-level ground heat flux (Schultz et al., 2016), show decreased daytime ground evaporation. Along with decreased canopy evaporation, transpiration, and ground evaporation, the total LE decreases sharply after deforestation in the PFT simulations, which agrees better with the observations than other simulations (Fig. 2c). However, the decreased ground evaporation may be associated with a problematic soil-column scheme at sub-grid scale, which undermines the credibility of the agreement between the observations and PFT simulations. Figure 4 shows the changes in monthly LE after deforestation across the annual cycle. There is clear and consistent seasonality in the LE changes from the observations. The four types of observations show decreased LE (up to −24.0 W m −2 ) during local summer. There is little change in LE in the uncorrected observations during the winter season. However, there is significantly increased LE (up to +17.9 W m −2 ) in the energy-balance corrected observations in late winter and early spring. Neither the CLM nor Noah-MP captures the observed seasonality of LE change. As found in the change in the diurnal cycle of the LE, the PFT-COL simulations exhibit a similar pattern to the single-point simulations, while the PFT simulations show decreased LE throughout the year, with the maximum from May to August, and the best correlation (R = 0.81, P < 0.01) with observations.    Fig. 5c. Robust signals are found among the four types of observations, so results from the energy-balance corrected observations are not included hereafter, but are shown in Fig. S2. Both observations and models exhibit a clear diurnal pattern of change in H after deforestation -a small nighttime increase and a large daytime decrease. Observations show a large spread among the 28 pairs, which is much greater than that from the CLM simulations, indicating uncertainties and variability among the observed fluxes and the robustness of simulated H sensitivity to LULCC in the LSM. Compared with the observations, the CLM shows a greater H decrease, which is twice as much as in the observations. The overestimated H decrease may be related to the large positive bias in H over the forest sites (Fig. 5b). Additionally, the PFT simulations show the largest H decrease, which may be associated with the ground heat issues in the shared-soil-column scheme.

Sensible heat flux (H )
Seasonally, decreased H is found throughout the year after deforestation in both observations and models (except for the same-forest-forcing CLM simulations in winter, Fig. 6). The greatest decrease is observed during spring, when both of the single-point CLM and PFTCOL simulations show good agreement. However, CLM and Noah-MP simulations also show a large decrease during summer, which has not been observed in the FLUXNET dataset. Again, the PFT simulations show the greatest H decrease among the simulations and the largest bias compared with the observations during the warm season.   Additionally, evaporative fraction (EF), which is defined as the ratio of LE to the available energy (LE + H ), is a useful diagnostic of the surface energy balance (Gentine et al., 2011). Meanwhile, most of the correction methods to solve the imbalance issue of the surface energy budget assume that the Bowen ratios for small-and large-scale eddies are similar or even equal (Wilson et al., 2002;Foken, 2008;Zhou and Wang, 2016). Under such an assumption, EF can be independent of the energy closure issue, because EF is related to the Bowen ratio (B) as (3) Figure 7 shows the change in the diurnal (summer only) and seasonal cycle of EF due to LULCC from forest to open land. During summer, there are small changes in observed daytime EF (Fig. 7a) because of the decreases in both LE and H . However, both CLM and Noah-MP show increased daytime EF due to the decreased H and slightly increased LE after deforestation. Seasonally, the models show year-around increased EF, however, which is not observed in FLUXNET from June to September, further demonstrating the models' deficiencies in representing energy partitioning during summer.

Diurnal and seasonal cycle of ground heat flux (G)
and net radiation (R net ) Figure 8a shows the change in the diurnal cycle of G after deforestation. Both the observations and models exhibit increased G during the day and decreased G during the night. However, models overestimate the magnitude of the G change, and discrepancies also exist in the timing of maximum change. The greatest increase in G is observed during the early afternoon, while the greatest increase in simulated G occurs at noon in the CLM (single-point and PFTCOL) and during the morning in Noah-MP. Because G is strongly correlated with R net (Santanello and Friedl, 2003), we examine the timing of maximum observed G and R net during summer. There are some sites showing about a 1 h lag be- tween maximum R net and G (not shown). Therefore, the lag between simulated and observed peaks in G change can be partially attributed to the uncertainties in G measurements that are commonly estimated with heat flux plates installed at some depth (e.g., 5-10 cm) below the surface (Wang and Bou-Zeid, 2012), while the LSM simulated G is calculated at the surface. Meanwhile, the G changes (in both the diurnal and seasonal cycles) in the PFT simulations are further from the observations than the other simulations. Such disagreement further confirms the issues with the sub-grid soilcolumn scheme in the CLM, which is discussed in the following section. The changes in observed G also have a clear seasonal pattern -an increase during the warm season and a decrease during the cold season (Fig. 8b). This seasonality is captured well by the CLM simulations (especially the simulations with identical forcings for the paired sites) in both magnitude and timing, but is not evident in Noah-MP simulations. After exploring the three flux components of the surface energy balance, it is worthwhile examining the change in R net after deforestation. During summer, the observations show that R net slightly increases during the night, and decreases considerably (up to −65.7 W m −2 ) during the day, which can be attributed to the increased albedo after deforestation (Fig. 9a). Decreased daytime R net is also found in the CLM simulations, but with a slightly smaller magnitude. Seasonally, there is a good agreement between the observations and CLM simulations, showing a large R net decrease during spring and summer but a relatively small decrease during autumn and winter (Fig. 9b). The Noah-MP simulations are comparable to the CLM, but with a notable deficiency in simulating the R net change during late winter and early spring.

Uncertainties among the FLUXNET pairs
The results discussed above are based on composites averaged over all forest and open sites. It is worthwhile examining the uncertainties in surface flux changes among different paired sites. Figure 10a shows the changes in summer daytime (08:00-16:00) LE from the observations and model simulations across the 28 pairs. This time period is chosen because it is the time of the greatest differences in surface energy fluxes (Figs. 2c, 5c, 7a, 8a). The observations show decreased LE associated with deforestation over 23 pairs, among which the pairs of evergreen needleleaf forest and open shrub (nos. 16-25) exhibit consistent decreases and the pairs of deciduous broadleaf forest and crops (nos. 1-4) show the overall greatest decrease. However, both the CLM and Noah-MP show relatively weak increases over most of the pairs, which further demonstrate their deficiency in simulating LE change. Additionally, for both the CLM or Noah, the choice of forcing does not exert much influence on the simulated change in summer daytime LE.
The changes in R net over individual pairs are shown in Fig. 10b. There are 27 pairs (all except number 21) showing decreased R net after deforestation, with the greatest decreases over the pairs of evergreen needleleaf forest and grassland. Both the CLM and Noah-MP capture the observed decreases in R net well over most of the pairs.
It should be noted that pair 15 shows large LE and R net changes in Fig. 10. This pair consists of a site over valley grassland and the other site over mountain evergreen needleleaf forest with 60.29 km separation and 1186 m elevation difference. There are significantly different air temperature and downwelling longwave radiation measurements between the sites (Fig. S3). Such large differences in LE and R net here are likely associated with the distinct although proximate geographical sites. Even though the exclusion of this site does not make a significant change to the composite anal-  Table S1. The pairs are grouped based on the type of LULCC (shown as the icons in the middle). The bottom row is the average over all pairs. The Student's t test is performed on the daily (daytime average) time series for each pair. Dots indicate statistically significant changes at the 95 % confidence level. No significant test is carried out for the CLM-PFTCOL simulation (the last column), because we only have long-term averaged hourly output for each month. ysis in Sect. 3 (not shown), it may raise another question of whether the simulated sensitivity of surface energy fluxes is associated with the inconsistencies of atmospheric forcings of LSMs at the single pair level.

Uncertainties within the forcings for LSMs
Based on the composite analysis in Sect. 3, we have found that the simulated changes in surface energy fluxes with identical forcings (either from forest or open sites) are consistent with the simulations with individual forcings, demonstrating that the overall sensitivities of surface energy fluxes are robust among the choices of different forcings. In this subsection, we explore the uncertainties of the simulated surface flux changes due to the different forcings for individual pairs, especially with the focus on the roles of separation and elevation difference in the simulated sensitivity of surface energy fluxes.
Since we have simulations with identical forcings for the paired sites, the difference in surface flux changes between "forest forcings" and "open forcings" can be considered as the simulated sensitivity of surface energy fluxes to variation in the atmospheric forcings. Figure 11 shows the relationship with separation and elevation difference for individual pairs. Overall, the flux changes are not associated with the separation and elevation difference between the paired sites, further confirming the robustness of simulated signals from pairedsite simulations. Nevertheless, some "outliers" are identified.
In the CLM simulations, only pair 15 shows large differences in LE and H change. However, pairs 3, 7, and 12 also exhibit large differences in Noah-MP simulations. The uncertainties in pairs 12 and 15 may be attributed to their large elevation differences. For pair 7 in Australia, Noah-MP shows greater sensitivity of H and R net to atmospheric forcings over the evergreen broadleaf forest than grassland (not shown), leading to large differences in the surface flux changes. However, this is the only pair with evergreen broadleaf forest, and its behavior in Noah-MP needs further investigation. Even though the pair 3 sites are close with small elevation differences, we found considerably different downwelling shortwave and longwave radiation between the two sites (not shown), which may explain the uncertainties in the Noah-MP simulations.

Discussion
This study has examined simulated changes in the surface energy budget in response to local land-cover change based on paired proximate FLUXNET sites with differing land cover. Our results suggest that the CLM represents the observed changes in R net and G well, but there remain issues in simulating the energy partitioning between LE and H , which also further confirms the large uncertainties in simulated ET responses to LULCC revealed in several recent studies (e.g., Pitman et al., 2009;Boisier et al., 2012Boisier et al., , 2014de Noblet-Ducoudré et al., 2012;Vanden Broucke et al., 2015). Based on the observations, deforestation generally leads to a decrease in summer daytime R net , accompanied by decreased LE and H . On the one hand, the CLM captures the observed signal of H change, but overestimates the decrease due to its large overestimation of H over the forest. On the other hand, the model underestimates the LE over the forest, leading to an opposite signal (a slight increase) of LE change compared to the observations. Simulations in Noah-MP show similar biases. Therefore, uncertainties in current LULCC sensitivity studies may persist specifically in the representation of turbulent fluxes over forest land-cover types.
Scrutinizing the three components of ET suggests that the simulated increase in summer daytime LE is mainly attributable to a large increase in ground evaporation, which counteracts the decreased canopy evaporation and transpiration. This may raise another issue about the soil resistance parameterization in CLM4.5. Previous studies indicate that the model generates excessive ground evaporation when the canopy is sparse or absent (Swenson and Lawrence, 2014;Tang et al., 2015). If there is overestimated ground evaporation over the open land, such a bias can also contribute to the disagreement in the LULCC-induced ET changes. Swenson and Lawrence (2014) have implemented a dry surface layer for the soil resistance parameterization to solve this issue for the upcoming CLM5. An extension of the evaluation with CLM5 would be useful to examine whether the issue within the soil resistance parameterization is responsible for the uncertainties in ET changes.
Besides the uncertainties in estimating turbulent fluxes over different land-cover types, the simulations show that differences in the meteorological forcings between nearby paired sites seem to have little impact on the simulation of surface flux changes due to LULCC. Many LSMs besides the CLM employ a sub-grid tiling parameterization where multiple land surface types exist within a single grid box, each maintaining a separate set of surface balances and returning a weighted average set of fluxes to the atmosphere based on areal coverage of each surface type. In this arrangement, each land surface type within a grid box receives the same meteorological forcing from the overlying atmospheric model. It appears from our forcing-sensitivity studies that this arrangement does not significantly impact the simulation of surface flux changes associated with LULCC on the grid scale.
That said, the sub-grid comparison between different landcover types may yet be problematic due to the shared soilcolumn issue for vegetated land units in the CLM (Schultz et al., 2016). Both the single-point observations and simulations show significant differences in surface soil moisture between most of the paired sites, even though no clear drying or wetting pattern is found (Fig. S4). The differences between the paired sites suggests that the shared soil column for vegetated land in the CLM may not represent soil moisture and temperature well at the sub-grid scale, which may influence the simulations of land surface energy and water fluxes. We find an unreasonably large change in PFT-level G between forest and open land especially for the seasonal cycle in PFT simulations, while both observations and singlepoint and PFTCOL simulations show a seasonal change with a very small range (within ±3 W m −2 ). As G is calculated as the residual of the surface energy budget in the CLM (Oleson et al., 2013), this sub-grid G issue may cast even more uncertainties on the calculation of LE and H at the PFT level, as well as their aggregated values at the grid level for regional or global simulations. Therefore, caution should be taken when examining the LULCC sensitivity which involves sub-grid PFT changes.
Compared with the CLM, Noah-MP exhibits a similar ability to simulate surface flux changes, except for a deficiency in simulating H and R net changes during late winter and early spring. We have examined the daytime albedo change after deforestation, calculated from available shortwave radiation terms, from observations and model simulations during local late winter/early spring (February-April, FMA) and summer (Fig. 12). Both the CLM and Noah-MP agree with the observations during summer. However, Noah-MP does not capture the observed albedo increase over nearly half of the pairs during late winter/early spring. Greater disagreement is also found during the local winter season (DJF, not shown), suggesting a deficiency in snowmelt timing or snow albedo sensitivity to LULCC, despite improvement in the snow surface albedo simulations by implementation of the Canadian Land Surface Scheme (CLASS; Verseghy, 1991) in Noah-MP (Niu et al., 2011).
Finally, it should be recognized that the observational data are not perfect. In particular, there may be systematic biases or even trends in specific instruments that contribute to the perceived differences between paired sites (e.g., site 3). Ideally, redundant instrumentation at sites, or in this case the rotation of an extra set of instruments among nearby paired sites, could be used to identify, quantify, and account for significant systematic biases in measurements for suspicious variables. Furthermore, footprints of the flux towers may bias the comparison of surface fluxes between the open and forest sites (Baker et al., 2003;Griebel et al., 2016). In other words, the observed differences between sites can only be partially attributed to LULCC because their environmental conditions may also be different. As most of the current stud-ies used paired sites to represent LULCC, we have assumed that the paired sites share similar background atmospheric conditions, and any observed differences in surface climate conditions can be attributed to LULCC (e.g., Lejeune et al., 2017;Luyssaert et al., 2014;Teuling et al., 2010;Vanden Broucke et al., 2015). Meanwhile, model simulations with the different forcings can effectively examine the effects of the local environment of individual sites, because their footprints can also be taken by the meteorological measurements. Our results show robust signals of LULCC-induced changes in surface fluxes, implying that impacts of footprints at individual sites are probably trivial.

Conclusions
This study has evaluated the performance of two state-of-theart LSMs in simulating the LULCC-induced changes in surface energy fluxes. Observations from 28 FLUXNET pairs (open versus forest) are used to represent the observed flux changes following deforestation, which are compared with the LSM simulations forced with meteorological data from the observation sites. Diurnal and seasonal cycles of the flux changes have been investigated.
The single-point simulations in the CLM and Noah-MP show the greatest bias in simulating LE change. Significantly decreased daytime LE is observed during local summer, but is not captured by the models. The observed LE changes also exhibit an evident seasonality, which is not represented in the model. The energy partitioning between LE and H might be a common issue within the LSMs. Other studies have noted problems in the simulation of surface fluxes by LSMs, including poor performance relative to non-physical statistical models (Best et al., 2015;Haughton et al., 2016).
The sub-grid comparison from the global simulations in the CLM yields unrealistic changes in G and H when the soil column is shared among vegetated land units, even though there is a better agreement in LE change with the observations. The individual-soil-column scheme improves the representation of the PFT-level energy flux changes, but uncertainties still remain as with the point-scale simulations. Therefore, these uncertainties must be considered when interpreting global experiments of LULCC sensitivity studies with current LSMs.
Consistent aggregate performance across many paired sites suggests the problems in these LSMs may not lie primarily with parameter selection at individual sites, but with more fundamental issues of the representation of physical processes in LSMs. The simulation of LULCC may or may not have become more consistent among models since LU-CID , but consistency with observed biophysical responses appears to be lacking. LU-MIP  will be a step toward better LSM simulation of LULCC responses, and ultimately better simulations of the response of climate to LULCC. Data availability. Measurements at paired sites can be accessed at the Fluxdata website (http://fluxnet.fluxdata.org/data/ fluxnet2015-dataset). Output from the CLM and Noah-MP simulations in this study are available upon request (lchen15@gmu.edu).