Remote-sensing algorithm for surface evapotranspiration

8 Evapotranspiration (ET) plays an important role in surface-atmosphere interactions and can be 9 monitored using remote sensing data. However, surface heterogeneity, including the inhomogeneity 10 of landscapes and surface variables, significantly affects the accuracy of ET estimated from satellite 11 data. The objective of this study is to assess and reduce the uncertainties resulting from surface 12 heterogeneity in remotely sensed ET using Chinese HJ-1B satellite data, which is of 30 m spatial 13 resolution in VIS/NIR bands and 300 m spatial resolution in the TIR band. A temperature sharpening 14 and flux aggregation scheme (TSFA) was developed to obtain accurate heat fluxes from the HJ-1B 15 satellite data. The IPUS (input parameter upscaling) and TRFA (temperature resampling and flux 16 aggregation) methods were used to compare with the TSFA in this study. The three methods repre17 sent three typical schemes used to handle mixed pixels from the simplest to the most complex. IPUS 18 handles all surface variables at coarse resolution of 300 m in this study, TSFA handles them at 30 m 19 resolution, and TRFA handles them at 30m and 300m resolution, which depends on the actual spatial 20 resolution. Analyzing and comparing the three methods can help us to get a better understanding of 21 spatial scale errors in remote sensing of surface heat fluxes. In situ data collected during HiWATER22 MUSOEXE (Multi-Scale Observation Experiment on Evapotranspiration over heterogeneous land 23 surfaces of The Heihe Watershed Allied Telemetry Experimental Research) were used to validate 24 and analyze the methods. ET estimated by TSFA exhibited the best agreement with in situ observa25 tions, and the footprint validation results showed that the R, MBE, and RMSE values of the sensible 26 heat flux (H) were 0.61, 0.90 W∙m and 50.99 W∙m, respectively, and those for the latent heat flux 27 (LE) were 0.82, -20.54 W∙m and 71.24 W∙m, respectively. IPUS yielded the largest errors in ET 28 estimation. The RMSE of LE between the TSFA and IPUS methods was 51.30 W∙m, and the 29 RMSE of LE between the TSFA and TRFA methods was 16.48 W∙m. Furthermore, additional 30 analysis showed that the TSFA method can capture the sub-pixel variations of land surface temper31 ature and the influences of various landscapes within mixed pixels. 32


Introduction
Five types of methods have been developed to estimate evapotranspiration (ET) or latent heat flux (LE) via remote sensing.
1. Surface energy balance models calculate LE as a residual term.According to the partitioning of the sources and sinks of the soil-plant-atmosphere continuum (SPAC), surface energy balance models can be classified as one-source (Bastiaanssen et al., 1998;Su, 2002;Allen et al., 2007;Long and Singh, 2012a) or two-source models (Shuttleworth and Wallace, 1985;Norman et al., 1995;Xin and Liu, 2010;Zhu et al., 2013).
3. Land surface temperature-vegetation index (LST-VI) space methods assign the dry and wet edges of the LST-VI feature space as minimum and maximum ET, respectively.These methods interpolate the media, and use the Penman-Monteith or Priestley-Taylor equation to calculate the LE (Jiang andIslam, 1999, 2001;Sun et al., 2011;Long and Singh, 2012b;Yang and Shang, 2013;Fan et al., 2015;Zhang et al., 2005).
4. Priestley-Taylor models expand the range of the Priestley-Taylor coefficient in the Priestley-Taylor equation (Jiang and Islam, 2003;Jin et al., 2011) or combine the physiological force factors with the energy component of ET (Fisher et al., 2008;Yao et al., 2013).
If the operational algorithm can be described as a linear combination of inputs, or if the surface variables and landscapes are homogeneous at the pixel scale, scale error does not exist (Hu and Islam, 1997).However, it is difficult to develop linear operational models due to the complexity of mass and heat transfer processes between the atmosphere and land surface.ET estimation models have been generally developed for simple and homogeneous surface conditions.However, heterogeneity is a natural attribute of the surface of the Earth.Therefore, larger spatial-scale errors occur when these remotely sensed models are applied to calculate the regional ET using satellite data.
In previous studies, researchers have coupled high-and low-resolution satellite data and statistically quantified the inhomogeneity of mixed pixels to correct the scale error in ET estimations using (1) temperature downscaling, which converts images from a lower (coarser) to higher (finer) spatial resolution using statistical-based models with regression or stochastic relationships among parameters (Kustas et al., 2003;Norman et al., 2003;Cammalleri et al., 2013;Ha et al., 2013); (2) the correction-factor method, which uses subpixel landscapes information to determine the correction factor of scale bias (Maayar and Chen, 2006); and (3) the areaweighting method, which calculates roughness length and sensible heat flux based on subpixel landscapes (Xin et al., 2012).These correction methods mainly focus on two problems: inhomogeneity of landscapes and inhomogeneity of surface variables.
Studies have shown that different landscapes (Blyth and Harding, 1995;Moran et al., 1997;Bonan et al., 2002;Mc-Cabe and Wood, 2006) and the subpixel variations of surface variables, such as stomatal conductance (Bin and Roni, 1994), or leaf area index (Bonan et al., 1993;Maayar and Chen, 2006), can cause errors in turbulent heat flux estimations.Surface variables' inhomogeneity is rather difficult to evaluate, as the subpixel variation of surface variables can be large, even in the pure pixels.For example, generally, temperatures over land surfaces vary strongly in space and time, and it is common for the LST to vary by more than 10 K over just a few centimeters of distance or by more than 1 K in less than a minute over certain cover types (Z.L. Li et al., 2013).However, in case of mixed pixels, surface variables such as land surface temperature are commonly considered as a single value to represent the entire pixel area in ET estimation models, which results in large errors.
The focus of this study is on the effects of surface heterogeneity when estimating ET.Based on the satellite products that are currently available, three methods were used to analyze the uncertainties produced by surface heterogeneity: (1) input parameter upscaling (IPUS) does not consider the surface heterogeneities at all.It was designed to simulate the satellites that have identical spatial resolutions in both the visible near-infrared (VNIR) and thermal infrared (TIR) bands; (2) temperature resampling and flux aggregation (TRFA) does not consider the heterogeneity of LST; and (3) temperature sharpening and flux aggregation (TSFA) considers all the surface heterogeneities.These methods were designed for use with the majority of satellite data or products that have inconsistent spatial resolutions between the VNIR and TIR bands, such as the Landsat and HJ-1B satellites.
The surface variables in this paper were mainly derived from HJ-1B satellite data.The Chinese HJ-1A/B satellites were launched on 6 September 2008, and were designed for disaster and environmental monitoring, as well as other applications.The HJ-1B satellites are equipped with two charge-coupled device (CCD) cameras and one infrared scanner (IRS) with spatial resolutions of 30 and 300 m, respectively.Compared to high-temporal-resolution satellite data, such as the MODIS satellite data, or high-spatialresolution satellite data, such as the Landsat 7 or 8 satellites data, HJ-1B data have the advantage of a high spatiotemporal resolution.Since the satellites were launched, the HJ-1/CCD time series data have been widely used in China to accurately classify land cover (Zhong et al., 2014a) and monitor various environmental disasters (Wang et al., 2010).Land-based variables, such as leaf area index (LAI), land surface temperature (LST), and downward longwave radiation (L d ), have been retrieved by the HJ-1B satellites using algorithms developed by Chen et al. (2010), H. Li et al. (2010Li et al. ( , 2011)), and Yu et al. (2013), respectively.These variables lay the foundation for ET research.
Although the HJ-1B satellites provide CCD data with a high spatial resolution of 30 m, the spatial resolution of the TIR band is only 300 m.Thus, surface heterogeneity effects must be considered when estimating the heat flux.

Methodology
2.1 Temperature-sharpening method based on statistical relationships Surface thermal dynamics affect ET.The spatial resolution of TIR images is usually not as high as the spatial resolution of visible near-infrared (VNIR) bands because the energy of VNIR photons is higher than the energy of thermal photons.Thus, the inhomogeneity of TIR images would be larger than the inhomogeneity of VNIR images.Since the land surface temperature is calculated from the TIR band, the uncertainty of the variables becomes unpredictable when the inhomogeneity of TIR images is enhanced.Therefore, land surface temperature data should be derived with a high spatial resolution.
The land surface temperature can be reconstructed at the spatial resolution of the VNIR images by using a statistical temperature-sharpening strategy proposed by Kustas et al. (2003).This method assumes that the negative correlation between the normalized difference vegetation index (NDVI) and LST is invariant.The NDVI reflects vegetation growth and coverage, and the LST reflects surface thermal dynamics.The LST decreases with increasing vegetation cover.The scatter plot between the LST and NDVI values forms a feature space that is applicable at different scales when a sufficient number of pixels exist.
HJ-1B satellite images can provide vegetation and thermal information at spatial resolutions of 30 and 300 m, respectively.The 300 m resolution thermal data cannot sufficiently distinguish the surface temperatures of small targets within pixels.However, this issue can be addressed by temperature sharpening based on the functional relationship between NDVI and LST.A flowchart of temperature sharpening is shown in Fig. 1, and LST at the NDVI pixel resolution can be derived based on the following steps (Kustas et al., 2003): 1.The NDVI 30 is aggregated to 300 m NDVI (NDVI 300 ).
2. A subset of pixels is selected from the scene where the NDVI is as homogeneous as possible at a pixel resolution of 300 m based on the coefficient of variation (CV).
The CVs are calculated using the original 30 m NDVI data (NDVI 30 ) as follows: where SD and mean are the standard deviation and the average values of 10 × 10 pixels of NDVI 30 , respectively.The CVs are sorted from smallest to largest.Lower CVs correspond to more homogeneous land surface values, and a threshold should be determined to guarantee that a sufficient number of pixels is available for least squares fitting between NDVI 300 and T 300 .Therefore, the fractions of 25 % of the lowest CVs are selected from each class.
3. A least squares expression is established between NDVI 300 and T 300 using the selected pixels.
4. For each 30 m pixel within a 300 m pixel, T30 can be calculated according to Eq. ( 2) as follows: where T300 = T 300 − T300 is the deviation between the regressed temperature and the temperature that was observed by the satellite at 300 m.

Area-weighting method based on landscape information
Coarse pixels are inhomogeneous because various types of land use may be included.Using a dominant type to represent such a large landscape is irrational.The spatial resolution of LST is significantly increased by temperature sharpening in Sect.2.1.Consequently, all inputs of ET algorithms can be obtained at high spatial resolutions.Then, inhomogeneity issues can be greatly diminished by dividing the landscape into finer pixels.
Combined with a high-resolution classification map, subpixel-scale parameters can be used in the ET algorithm, which is more rational than using a dominant class type because different landscapes may require different ET algorithms.The surface energy fluxes can be averaged linearly due to the conservation of energy (Kustas et al., 2003), and a simple average that calculates the arithmetic mean over subpixels is the best choice for flux upscaling (Ershadi et al., 2013).Thus, the aggregated flux at a low resolution F (x,y) is the arithmetic mean of all the n × n subpixel fluxes that constitute the contributing flux F (x i , y j ) at coordinate (x i , y j ): (4) Because the average of the subpixels fluxes is equal to the area-weighted sum of each land-type result, the final coarse result can be derived from the area-weighted sum of each land type result within the landscape.The main steps in the area-weighting process are shown below (Xin et al., 2012): 1.The VNIR and TIR input data sets are geometrically corrected and registered.
2. The area ratios of different land cover types within each pixel of a low-spatial-resolution classification image are counted.
3. According to the fine-classification data, different parameterization schemes can be used in the ET algorithm to calculate the subpixel flux, such as the net radiation (R n ), soil heat flux (G), and sensible heat flux (H ).
4. To calculate the regional flux, the flux of the large pixel is calculated by the area-weighting method as follows: where w i is the fractional area contributing flux F i of class type i and F is the aggregated flux at the coarse resolution.The LE is computed as a residual of the surface energy balance in the TSFA (see Sect. 2.3) process, in which a high-spatial-resolution image is used to reduce the number of mixed pixels.

Pixel ET algorithm
The surface energy balance describes the energy between the land surface and atmosphere.The energy budget is commonly expressed as follows: where R n is the net radiation, G is the soil heat flux, H is the sensible heat flux, and LE is the latent heat flux absorbed by water vapor when it evaporates from the soil surface and transpires from plants through stomata.The widely used one-source energy balance model considers a homogeneous SPAC medium and ignores the inhomogeneity and structure.In this case, LE can be expressed as follows: where γ is the psychometric constant; e s and e a are the aerodynamic saturation vapor pressure and atmospheric water vapor pressure, respectively; and r a and r s are the water vapor transfer aerodynamic resistance and surface resistance, respectively.Surface resistance includes soil resistance and canopy resistance.The surface resistance is influenced by the physiological characteristics of the vegetation and the water supply of roots.Thus, it is difficult to obtain surface resistance via remote sensing, and surface resistance is highly uncertain, particularly over heterogeneous surfaces.To avoid error introduced by the uncertainty of the surface resistance, LE is computed as a residual of the surface energy balance equation.
R n is the difference between incoming and outgoing radiation and is calculated as follows: where S d is the downward shortwave radiation, α is the surface broadband albedo, ε s is the emissivity of the land surface, L d is the downward atmospheric longwave radiation, σ = 5.67 × 10 −8 W m −2 K −4 is the Stefan-Boltzmann constant, and T rad is the surface radiation temperature.
G is commonly estimated using the empirical relationship with R n .Because the canopy exerts a significant influence on G, the fractional canopy coverage FVC is used to determine the ratio of G to R n as follows: where s is 0.315 for bare soil and c is 0.05 for a full vegetation canopy (Su, 2002).H is the transfer of turbulent heat between the surface and atmosphere, which is driven by a temperature difference and controlled by resistances that depend on local atmospheric conditions and land cover properties (Kalma et al., 2008).According to gradient diffusion theory, the equation for H is as follows: where ρ is the density of the air; c p is the specific heat of the air at a constant pressure; T aero is the aerodynamic surface temperature obtained by extrapolating the logarithmic air temperature profile to the roughness length for heat transport; T a is the air temperature at a reference height; and r a is the aerodynamic resistance, which influences the heat transfer between the source of the turbulent heat flux and the reference height.Aerodynamic resistance was calculated based on the Monin-Obukhov similarity theory (MOST) using a stability correction function (Paulson, 1970;Ambast et al., 2002).The zero-plane displacement height, d, and roughness length, z 0 m , were parameterized by the schemes proposed by Choudhury and Monteith (1988).
In this approach, H must be accurately estimated.However, calculating H using Eq. ( 10) is difficult.Because remote sensing cannot obtain T aero , the value of T aero is generally replaced with the radiative surface temperature T rad , which is not always equal to T aero .The difference between these terms for homogeneous and full-coverage vegetation is approximately 1-2 • (Choudhury et al., 1986), and it can reach 10 • in sparsely vegetative areas (Kustas, 1990).The method that corrects for this discrepancy adds excess resistance r ex to r a .We used the brief method proposed by Chen (1988) to calculate r ex : r ex = 4/u * .
Figure 2 shows the flowchart for merging ET retrieval and temperature sharpening based on HJ-1B satellites.
The spatial-scale effect is generally revealed by a discrepancy between different upscaling methods.In one method, parameters are upscaled to a large scale before calculating the heat flux.In the other method, heat flux is calculated at a small scale, and the results are then upscaled.In this study, the resolution of the final output result is 300 m.To evaluate the heterogeneity-reducing effect of TSFA, two other upscaling methods, called IPUS and TRFA, were implemented (see Fig. 3).In the case of IPUS, the inputs of the energy balance model are first retrieved at 30 m resolution (see information of HJ-1B satellite data in Sect.3.2.1)and then aggregated to 300 m resolution.Subsequently, these 300 m inputs are used in the one-source energy balance model to obtain the four energy balance components at 300 m resolution.In TRFA, the LST at 300 m is first resampled to 30 m using the nearest neighbor method and the 30 m resolution inputs are used for estimating ET.The outputs of the four energy-balance components of the TRFA are obtained using the area-weighting method shown in Sect.2.2.
3 Study area and data set

Study area
Our study was conducted in the middle stream of the Heihe River basin (HRB), which is located near the city of Zhangye in the arid region of Gansu province in northwestern China (100.11-100.16• E, 39.10-39.15• N).The middle reach of the HRB is a typical desert-oasis agriculture ecosystem dominated by maize and wheat.Areas of the Gobi Desert and the alpine vegetation in the Qilian Mountains are located near the study area (see Fig. 4).The artificial oasis is highly heterogeneous, which impacts the thermal dynamics and hydraulic features.Consequently, the water use efficiency and ET are variable.The Heihe River basin has long served as a test bed for integrated watershed studies, as well as land surface and hydrological experiments.Comprehensive experiments, such as the Watershed Allied Telemetry Experimental Research (WATER) project (Li et al., 2009), and an international experiment -the Heihe Basin Field Experiment (HEIFE) as part of the World Climate Research Programme (WCRP), have been conducted in this basin.One major objective of HiWATER was to capture the strong land surface heterogeneities and associated uncertainties within a watershed (X.Li et al., 2013).

Data set
In this study, data are mainly derived from the HJ-1B satellite.We combined these data with ancillary data and in situ Multi-Scale Observation Experiment on Evapotranspiration over heterogeneous land surfaces of The Heihe Watershed Allied Telemetry Experimental Research (HiWATER-MUSOEXE) data to estimate and validate the HJ-1B land surface variables and heat fluxes.

Remote sensing data HJ-1B satellite data
The specifications of HJ-1B are shown in Table 1.The satellite has quasi-sun-synchronous orbits at an altitude of 650 km, a swath width of 700 km and a revisit period of 4 days.Combined, the revisit period of the satellites is 48 h.Because HJ-1A/B CCDs lack an onboard calibration system, cross-calibration methods were proposed to calibrate the CCD instruments (Zhang et al., 2013;Zhong et al., 2014b).The image quality of the HJ-1A/B CCD is stable, the performances of each band are balanced (Zhang et al., 2013) and the radiometric performance of the HJ-1A/B CCD sensors is similar to the performances of the Landsat-5 TM, Observer-1 (EO-1) Advanced Land Imager, and Terra ASTER.The image quality of the HJ-1A/B CCD is very similar to the image quality of Landsat-5 TM (Jiang et al., 2013).In addition, the accuracy of the TIR band's onboard calibration meets the land surface temperature retrieval requirements but not the sea surface temperature retrieval requirements (J.Li et al., 2011).The Center for Resources Satellite Data and Application (CRESDA) in China releases calibration coefficients annually on its website (http://www.cresda.com).These data are freely available from the CRESDA website (http://218.247.138.121/DSSPlatform/index.html).
We used the HJ-1B satellite data from the HRB region in 2012.Because many variable-retrieving algorithms require clear-sky conditions when calculating ET, we combined data quality information with visual interpretation to select satellite images without clouds.Considering the period of ground observations discussed in Sect.3.2.2,we obtained data for 11 days: 19 and 30 June; 8 and 27 July; 2, 15, 22, and 29 August; 2, 13, and 14 September.The HJ-1B satellite data of the HRB were preprocessed, including geometric correction, radiometric calibration, and atmospheric correction.The following surface variables are needed in Eqs. ( 1) to (10): downward shortwave radiation, downward longwave radiation, emissivity, albedo, fractional vegetation coverage (FVC), cloud mask data, meteorological data, LAI, and LST. Figure 5 illustrates a flowchart of the retrieval of these variables.
1. Surface albedo was obtained from the top of the atmosphere (TOA) reflectance by the HJ-1A/B satellite using a lookup table based on an angular bin regression relationship according to the algorithm proposed by Liang et al. (2005) and Q. Liu et al. (2011).The surface albedo and bidirectional reflectance distribution function (BRDF) of the HJ-1A/B satellite in the regression procedure were monitored using POLDER-3/PARASOL BRDF data sets, and BRDF was used to obtain the TOA reflectance in the 6S (second simulation of a satellite signal in the solar spectrum) radiation transfer mode.
Z. Q. Peng et al.: Remote sensing algorithm for surface evapotranspiration 2. The NDVI is the regression kernel of the temperaturesharpening procedure and is used to calculate the FVC.Atmospherically corrected surface reflectance values were used to calculate the NDVI as follows: where ρ nir and ρ red are the reflectances in the nearinfrared and red band, respectively, and NDVI v and NDVI s are the fully vegetated and bare soil NDVI values, respectively.As an important input for the parameterization of surface roughness length and aerodynamic resistance, the LAI was determined using the following equation (Nilson, 1971): where θ is the zenith angle, P (θ ) is the angular distribution of the canopy gap fraction, G(θ ) is the projection coefficient (0.5) and is the total foliage clumping index, which can be obtained from the GLC global clumping index database according to the land use type (He et al., 2012).
3. Land surface emissivity (LSE) is needed to calculate the R n and is extremely important for retrieving LST.In this paper, LSE was calculated using the FVC as follows (Valor and Caselles, 1996): where ε is the LSE, < dε > is an effective value of the cavity effect of emissivity, the mean dε of all vegetation species in this study is < dε > = 0.015, ε v and ε g are the vegetation and ground emissivity, respectively.
4. Land surface temperature is a single-channel parametric model for retrieving LST based on HJ-1B/IRS TIR data developed by H. Li et al. (2010) was employed to obtain the LST.This model was developed from a parametric model based on MODTRAN4 using NCEP atmospheric profile data.
5. Downward shortwave radiation was calculated in this study by applying the algorithm proposed by L. Li et al. (2010).MOD05, TOMS, aerosol and solar angle data were used to estimate the direct light flux and diffuse light flux using a lookup table that was generated via the 6S radiation transfer mode (Vermote et al., 2006).This method considered the influences of complex terrain, and a topographic correction was performed by using products of the ASTER digital elevation model (DEM).
6. Downward longwave radiation (L d ) was calculated by the algorithm proposed by Yu et al. (2013).The TOA brightness temperature of the HJ-1B thermal channel was used to substitute the atmospheric effective temperature.Effective atmospheric emissivity was parameterized as an empirical function of the water vapor content.These values were substituted for atmospheric temperature and atmospheric emissivity to estimate the value of L d .Because this L d retrieval method was only valid for clear-sky conditions, cloud masking information was used to determine clear skies.When cloud contamination existed in the image, the brightness temperature was relatively low, causing the L d to be lower than that in the cloudless images.

Ancillary data
Ancillary data were used because the bands of the satellite could not invert all of the variables needed to retrieve ET.
1. MODIS provides atmospheric water vapor data (MOD05), including a 1 km near-infrared product and a 5 km thermal-infrared product, every day.The 1 km near-infrared water vapor product was used to retrieve L d in this study.
2. For surface elevation data, we used the 30 m resolution global digital elevation model (GDEM) based on ASTER, which covers 83 • N-83 • S, to derive S d .
3. For atmosphere ozone data, a total ozone mapping spectrometer (TOMS), which was carried on an Earth Probe (EP) satellite, was used to derive S d .The TOMS-EP provided daily global atmospheric ozone data at a resolution of 1 • × 1.25 • (L.Li et al., 2010).

HiWATER experiment data set
The in situ HRB observation data were provided by Hi-WATER.From June to September 2012, HiWATER designed nested observation matrices over 30 km × 30 km and 5.5 km × 5.5 km within the middle stream oasis in Zhangye to focus on the heterogeneity of the scale effect in HiWATER-MUSOEXE.
In the larger observation matrix, four eddy covariance (EC) systems and one superstation were installed in the oasis-desert ecosystem.Each station was supplemented with an automatic meteorological station (AMS) to record meteorological and soil variables and monitor the spatial-temporal variations of ET and its associated factors (X.Li et al., 2013).
Hydrol.Earth Syst. Sci., 20, 4409-4438, 2016 www.hydrol-earth-syst-sci.net/20/4409/2016/The station information is shown in Table 2, and the distribution of the stations is shown in Fig. 4. Within the artificial oasis, an observation matrix composed of 17 EC towers and ordinary AMSs exists where the superstation was located.The land surface was heterogeneous and dominated by maize, maize intercropped with spring wheat, vegetables, orchards, and residential areas (X.Li et al., 2013).Because the EC16 and HHZ stations lacked R n and G observation data, they were excluded from this study.The ground observation data included the H and LE.Reliable methods were used to ensure the quality of the turbulent heat flux data.Before the main campaign, an intercomparison of all instruments was conducted in the Gobi Desert (Xu et al., 2013).After basic processing, including spike removal and corrections for density fluctuations (WPL correction), a four-step procedure was performed to control the quality of the EC data.In this procedure, data were rejected when (1) the sensor had been malfunctioning, (2) precipitation occurred within 1 h before or after collection, (3) the ratio of missing data was greater than 3 % in the 30 min raw record and, (4) the friction velocity was below 0.1 m s −1 at night (for more details, see S. M. Liu et al., 2011;Xu et al., 2013;Liu et al., 2016).EC outputs are available every 30 min.G was measured by using three soil heat plates at a depth of 6 cm at each site, and the surface G was calculated using the method proposed by Yang and Wang (2008) based on the soil temperature and moisture above the plates.Surface meteorological variables, such as wind speed, wind direction, relative humidity, and air pressure, were used to interpolate images using the inverse distance weighting.Re- Computing the LE as a residual variable may be a better method for energy balance closure under conditions with large LEs (small or negative Bowen ratios due to strong advection) (Kustas et al., 2012).Thus, the residual closure method was applied because the oasis effect was distinctly observed in the desert-oasis system on clear days during the summer (S.M. Liu et al., 2011).

Evaluation of surface variables
To control model inputs and analyze error sources, the coarse-resolution land surface temperature, downward shortwave radiation, downward longwave radiation, R n , and G were evaluated using in situ data.
The ground-based land surface temperature, T s , was calculated using the Stefan-Boltzmann law from the AMS measurements of the longwave radiation fluxes (Li et al., 2014) as follows: in which L ↑ and L ↓ are in situ surface upwelling and atmospheric downwelling longwave radiation, respectively, and ε s is the surface broadband emissivity, which is regarded as the pixel value of the HJ-1B at the AMS.The coefficient of determination R 2 , mean bias error (MBE) and root mean square error (RMSE) of the LST are 0.71, −0.14 K, and 3.37 K, respectively.As shown in Table 3, the accuracy of EC4 is low.The main causes of the large errors are as follows: (1) buildings and soil/vegetation are distinct materials, the LSE algorithm may not be suitable for buildings and   The R 2 , MBE, and RMSE values of the HRB L d were 0.73, 0.28 W m −2 , and 21.24 W m −2 , respectively.As seen in Table 5, the accuracies at EC3, SD, and SSW were low.The low accuracies at EC3 and SD potentially resulted from (1) high humidity, which resulted in low at-nadir brightness temperatures and low retrieved L d , or (2) instrument error, which occurred because the EC3 ground observations were always greater than those of the other stations during the same period.Although SSW was located in a desert, the ground-air temperature difference was large.The L d retrieval may have a large error because the models use surface temperature when estimating L d to approximate or substitute the near-surface temperature (Yu et al., 2013).The corrected error of our L d retrieving algorithm resulted from the ground-air temperature difference in non-vegetated areas.The inaccuracy of the SSW LST may influence the L d results.
The R 2 , MBE, and RMSE values of the HRB R n were 0.70, −9.64 W m −2 , and 42.77 W m −2 , respectively.The station-validated results of R n are shown in Table 6, which indicates that the accuracies of EC4, EC7, EC17, and SSW were relatively low.According to the sensitivity analysis of Eq. ( 8), L d and S d are highly sensitive variables when calculating R n , while the albedo, LSE, and LST are not as sensitive.Although LST was not a sensitive variable, the MBE and RMSE values of LST at EC4 reached −9.87 and 10.04 K because the land cover of EC4 was maize at 300 m resolution.However, the observation tower was located in a builtup area, which potentially caused errors when estimating R n .The accuracies of S d and L d at EC7 were low on several days, and MBE = −43.40W m −2 and RMSE = 50.50W m −2 after removing these data.EC17 was located in an orchard, and the signal that was received by the sensors at EC17 was affected Hydrol.Earth Syst.Sci., 20, 4409-4438, 2016 www.hydrol-earth-syst-sci.net/20/4409/2016/    7.At EC5, the soil temperature and moisture were the same at different depths after 19 July, which resulted in a surface G that was equal to the G at a depth of 6 cm.G below the surface was usually less than the G at the soil surface; thus, the validation results of G at EC5 indicate that G was overestimated.At SSW, the brief cloudy period decreased the observed soil surface temperature, which decreased the calculated surface G.However, the remotely sensed G did not reflect this situation.In this case, the G was overestimated because the R n was overestimated.After removing the data on 27 July, the R 2 , MBE, and RMSE values of the G at SSW were 0.17, 19.34 W m −2 , and 33.30W m −2 , respectively.

Validation of heat fluxes by TSFA
Figure 6 provides the turbulent heat flux results calculated by TSFA on 13 September 2012.The spatial distribution of the turbulent heat flux is obvious.The H values of buildings and uncultivated land, including land patches in the Gobi region, barren areas, and desert areas, were high, in addition to the LEs of water and agricultural areas in the oasis.The southern areas of the images show uncultivated barren land bordering the Qilian Mountains that resulted from snowmelt and the downward movement of water.In these areas, the groundwater levels are high and the soil moisture content is approximately 30 % based on in situ measurements at a depth of 2 cm.Therefore, the LE values of barren areas in the south are higher than the LE values of desert areas in the southeast, although both areas were classified as uncultivated land.
Studies have shown that validation methods that consider the source area are more appropriate for evaluating ET models than traditional validation methods based on a single pixel (Jia et al., 2012;Song et al., 2012).In this study, a userfriendly tool presented by Neftel et al. (2008), which is based on the Eulerian analytic flux footprint model proposed by Kormann and Meixner (2001) was used to calculate the footprints of the function parameters.The continuous footprint function was dispersed based on the relative weights of the pixels in which the source area was located.
The footprint validation results of the TSFA turbulent heat fluxes are shown in Fig. 7 and Table 8.The R 2 , MBE, and RMSE of H were 0.61, 0.90 W m −2 , and 50.99 W m −2 , respectively, and those of LE were 0.82, −20.54 W m −2 , and 71.24 W m −2 , respectively.Because LE was calculated as a residual term, it was impacted by R n , surface G, and H .The errors of all inputs may contribute to the LE, which complicates the error sources of the LE.These errors are discussed in detail in Sect.4.3.2 and 4.4.
As shown in Fig. 7, the majority of the H values are small because June, July, August, and September constitute the growing season when ET greatly cools the air.The temperature difference between the land surface and air is small, leading to a low H .The points with large H values are influenced by uncultivated land.In our study area, the Gobi region, barren area and desert area comprise the uncultivated land.The points in the scatter plot with large H values represent desert, where the H values reach approximately 300 W m −2 .Some points in the H scatter plot are less than zero due to inversion from the oasis effect or irrigation.For example, the HiWATER soil moisture data show that irri-gation occurred on 22 August 2012.Irrigation is the main source of water within the oasis and cools the land surface to temperatures below the air temperature.In addition, irrigation leads to errors in LST retrieval because it increases the atmospheric water vapor content, as discussed in Sect.4.1.The model error is further analyzed in Sect.4.4.

Comparison between TSFA, TRFA, and IPUS
To verify whether the TSFA method can simulate the heterogeneity of the land surface, the TRFA and IPUS methods were also implemented for comparison purposes.These three methods were evaluated using (1) validation of TRFA and IPUS based on in situ measurements and (2) quantitative analysis based on the spatial distribution and scatter plots of the four energy balance components.

Validation of the TRFA and IPUS heat fluxes
Table 9 provides the in situ validation results of H and LE calculated using the IPUS and TRFA methods.Compared to validation results of TSFA in Table 8, the TSFA produced a better retrieval accuracy than the TRFA, and the TRFA was better than the IPUS on all days and the MBE and RMSE values of TSFA decreased and the R 2 of TSFA increased on most days.Table 9 shows that the improvements in accuracy between TRFA and IPUS were relatively larger than those between TSFA and TRFA.Compared to the IPUS results, the TRFA results were similar to the TSFA results because subpixel landscapes and subpixel variations of most variables were considered.Thus, TRFA effectively decreased the scale error that resulted from heterogeneity because the 30 m VNIR data were fully used.However, the performance of the TRFA method is unstable.For example, on 3 and 29 August, the TRFA results were slightly worse than the IPUS results.This situation occurred because the different subpixel landscape temperatures were considered as equal to the values estimated at the 300 m resolution.Thus, when values of LST at 300 m scale have large retrieval errors, the turbulent heat flux retrieval error may be amplified by the subpixel landscapes.
Variations in landscape characteristics systematically trigger variations in surface variables.Landscape inhomogeneity can be classified using two conditions: nonlinear vegetation density variations between subpixels (e.g., different types of vegetation mixed with each other or with bare soil) and coarse pixels containing different landscapes (e.g., vegetation or bare soil mixed with buildings or water).To evaluate the effects of TSFA, stations with a typical severe heterogeneous surface at EC4, a weak heterogeneous surface at EC11, a typical pixel (called "TP" hereafter) at the boundary of the oasis and bare soil (sample 62, line 102 in the image of study area), and a uniform surface at EC15, were selected to analyze the temperature-sharpening results.
EC4 is used as an example because its land cover and subpixel variations of temperature were complicated.Table 10 compares the turbulent heat fluxes calculated using the IPUS, TRFA, and TSFA methods.Significant differences were observed between the TSFA and IPUS results and between the TRFA and IPUS results due to the heterogeneity of the surface.The LE calculated using the TSFA method was more consistent with in situ measurements than the LE calculated using the IPUS method because the MBE and RMSE decreased considerably, the R 2 increased, and the accuracy was improved by approximately 40 W m −2 .However, the LE calculated by the TRFA was more accurate than the LE calculated by the TSFA, as discussed below.
The H calculated by using the TSFA method was more accurate than the H calculated by using the TRFA and IPUS methods.The RMSE of the results from the TRFA method was relatively close to the RMSE of the results from the TSFA method because the TRFA method also considers the effects of the heterogeneity of landscapes.In addition, the H values obtained from the TRFA method were always greater than those obtained from the TSFA method.Because the TSFA turbulent heat flux results are the same as the TRFA turbulent heat flux results for buildings and water bodies in our pixel ET algorithm, the differences between the TSFA and TRFA results depend on the vegetation and bare soil.Additionally, the 300 m resolution LST is larger than the LST of the subpixels, such as pixels containing vegetation or bare soil.This relationship occurs for two reasons: (1) the coarse Table 9.In situ validation results of the turbulent heat fluxes of IPUS and TRFA.pixels contain buildings and result in a larger 300 m resolution LST, and (2) the LSTs were underestimated at EC4 (as shown in Table 3), which would underestimate the value of T300 in Eq. ( 3) and, consequently, the sharpening temperature at 30 m and H .Because the LE was calculated as a residual item in the energy balance equation, the errors of the other three energy balance components accumulate in the LE term.At EC4, the R n was overestimated by approximately 80 W m −2 , as discussed in Sect.4.1, but the scale effect of R n was not obvious (see Sect. 4.3.2),and the G was overestimated by approximately 20 W m −2 .These results decreased the accuracy of the available energy and overestimated the error by 60 W m −2 .Because the TRFA overestimates H , the underestimation of H in the TSFA would result in larger overestimation of LE than that estimated by the TRFA.Con-sequently, the LE calculated by using the TSFA method is less accurate than the LE calculated by the TRFA method.

IPUS-H (W m
Figure 8 shows that the classes and temperatures of 10 × 10 subpixels at 30 m correspond to the pixels with a resolution of 300 m at the EC tower.In the IPUS upscaling scheme, the 300 m pixels included buildings, maize, and vegetable crops at the 30 m resolution and were identified as maize.The canopy height gap between maize and vegetables was large during our study period, resulting in the overestimation of the canopy height.For additional details, see the error analysis in Sect.4.4.However, because buildings corresponded with H = 0.6R n in this study, ignoring the contributions of buildings would result in the underestimation of H . Figure 8a shows the temperature-sharpening results in the EC4 pixel on 29 August.The temperature retrieved at 300 m scale was 303.49K. Compared with the in situ measurement The land surface of EC15 was uniform and consisted of pure pixels covering maize fields.Consequently, the temperature distribution at 30 m resolution was very homogeneous, and the surface temperature variations were comprised within a range of 1.6 K. Table 11 shows the in situ validation results at EC15, for which the overall accuracy is not high due to the low LST retrieval accuracy on 8 July, which is discussed in Sect.4.4.1.For homogeneous surfaces, the gaps between IPUS, TRFA, and TSFA were not large (within 10 W m −2 ), and the accuracy did not improve (MBE and RMSE did not exhibit obvious variations).Statistically sharpening the temperature may increase the uncertainty of the model results for homogeneous surfaces; however, this influence could be omitted.
The weak heterogeneous land surface at EC11 contained barley, maize, and vegetables in a 300 m pixel resolution with a fractional area of 58 : 41 : 1 and was classified as barley at the 300 m resolution.The distributions of the classes and temperatures are shown in Fig. 8c.The pixel belongs to the first conditions of heterogeneity (nonlinear vegetation density variation between subpixels).Table 12 shows the in situ validation results of EC11.The improvements in the accuracies of H and LE by temperature resampling or sharpening were not as obvious as the improvements at EC4, which contained totally different landscapes (the other inhomogeneous condition).
Theoretically, the LE from the TSFA and TRFA at EC11 should be smaller than the IPUS LE values in the energy balance system.The height of maize (ranges from 0.3 to 2 m) was generally higher than the height of barley (ranges from 0.9 to 1.1 m) in the study area from June to August.Taller vegetation resulted in larger surface roughness and smaller aerodynamic resistance, which led to larger H values and smaller LE values, and vice versa (e.g., vegetables with a canopy height of 0.2 m).When using the TSFA and TRFA methods, patch landscapes consisting of different crops, such as maize and vegetables, were considered.Thus, the LE was smaller than the IPUS LE.On 19 June, the canopy height of maize was 0.74 m, which was lower than the canopy height of barley (1 m) and indicated that the H values that resulted from the TRFA and TSFA methods were less than H that resulted from the IPUS method.Because our validation method considered the influence of source area, the in situ turbulent heat flux validation results included the effects of neighboring pixels (i.e., on 3 August, the turbulent heat flux values of the pixel corresponding to the location of EC11 was only assigned a weight of 37 % in the source area).
The differences between the TSFA and TRFA methods were small and resulted from the LST differences between the 30 m resolution temperature-sharpening results and the LST retrieved at the 300 m resolution, but these differences were not evident at EC11.For example, on 29 August, the temperature range was 1.4 K, as shown in Fig. 8c.This tem-perature was even less than the temperature range at EC15 because the observation system at EC15 was a superstation with a 40 m tall tower that may cause large shadow effects and result in a relatively large temperature range.Hence, the temperature-sharpening effect is not obvious after aggregating the flux at the 300 m resolution under dense vegetation canopies.However, temperature sharpening can still decrease the heterogeneity that resulted from thermal dynamics.
The excess errors at EC11 was caused by the relatively low LST accuracy, with R 2 , MBE, and RMSE values of 0.42, 1.59 K, and 2.98 K, respectively.On August 29, the temperature retrieved at 300 m scale was 301.6 K, and the observed ground temperature was 300.20 K.The LST at the 300 m resolution was slightly overestimated.When the in situ temperature was substituted into the IPUS algorithm, the value of H decreased to 16.06 W m −2 and LE became 467.43 W m −2 .When the in situ temperature was substituted into the TRFA scheme, the value of H was 22.43 W m −2 and the LE was 461.58 W m −2 , which were similar to the ground observations.
Another typical pixel located at the boundary of the bare soil and the oasis with no flux measurements was used to evaluate the correction effects of landscapes and temperature sharpening.The land surface of the TP contained maize, vegetables, and bare soil at a fraction of 35 : 31 : 34.Table 13 shows that when neither the heterogeneity of the landscape nor the LST are considered, the relative error of LE reached 180 W m −2 .In addition, if only the LST heterogeneity is not considered, the LE relative error reached 48 W m −2 .This result also reveals that the influences of landscape inhomogeneity are greater than the influences of inhomogeneity on the LST in mixed pixels.

Comparison of the TRFA and IPUS methods
Using data of 13 September as an example, the spatial distributions of the four components of the energy balance calculated by the IPUS and TRFA methods are shown in Figs. 9 and 10, respectively.TSFA minus IPUS and TSFA minus TRFA, which display the spatial distributions of the scale effect, are shown in Fig. 11.Scatterplots of TSFA vs. IPUS and TRFA are shown in Fig. 12.
A comparison of Fig. 6 with Fig. 9 shows that the spatial distributions of the fluxes greatly change, except for R n .The TSFA results are synoptically smoother than the IPUS results because the land cover types and temperature distributions in mixed pixels are not considered in IPUS.For example, the boundary between the oasis and uncultivated land becomes a belt of intermediate G, H , and LE because these mixed pixels include uncultivated land and vegetation.However, mixed pixels are classified as the dominant land use type in the parameterization process of IPUS.This result overlooks the contributions of heat flux from complex land use types and overestimates or underestimates the turbulent heat flux by approximately 50 W m −2 .Since the TSFA can integrate the effects of these land areas and reveals the relative actual surface conditions, the heat flux results of TSFA vary less dramatically than those of IPUS, as shown in the figures.The results are similar in the oasis area.Based on Figs. 6 and 10, the TRFA and TSFA methods are similar.Because the TRFA method considers the subpixel landscapes that could be a significant source of error in the ET models, the difference between the TSFA and TRFA methods mainly resulted from the differences between the sharpened LST and retrieved, resampled LST of subpixels at the 30 m resolution.In addition, the bias between the TSFA and TRFA is not as obvious as the bias between the TSFA and IPUS methods, as shown in Fig. 11c-f.Furthermore, Fig. 11f shows that the LEs calculated by using the TSFA method in most oasis areas were slightly greater than the LEs calculated by using the TRFA method, which yielded values of approximately 20 W m −2 .
The quadrangular shape with a relatively unstable bias shown in Fig. 11a is caused by the L d that was calculated from the MOD05 water vapor product which exists as a quadrangular shape even after preprocessing the instrument mal-  function gap.In Fig. 11, the differences of the four energy components of the pure pixels between these three methods are within 5 W m −2 , and the mixed pixels have different ranges.
Figure 12 shows the scatter plots between the results of the TSFA method and the other two methods for all four energy balance components.Figure 12a and e show that R n does not vary much between the three methods and the scatter is centralized around the 1 : 1 line.However, regarding the spatial-scale effect, the differences in G, H , and LE calculated by using the IPUS and TSFA methods are obvious.The scatter plots disperse at the mixed pixels, and the differences between the TRFA and TSFA results are relatively smaller.When using the TSFA method, the temperature-sharpening results can be divided into results that are higher and lower than the LST retrieved at 300 m.Compared to the LST retrieved at 300 m when using the TRFA method, a higher LST is counterbalanced by a lower LST when calculating H using the TSFA.Thus, the effect of temperature heterogeneity is neutralized in this case.This observation is potentially resulted from the temperature-sharpening algorithms because they tend to overestimate the subpixel LST for cooler land-scapes and underestimate the subpixel LST for warmer areas in the image (Kustas et al., 2003).
However, LE is calculated as a residual; thus, the difference of LE is resulted from G and H .When the 300 m mixed pixels contained various types of landscapes, they were categorized as one type of landscape in the IPUS method and a single temperature value was used to evaluate the thermal dynamic effects when using the TRFA method.Pixels with highly different G, H , and LE values are mainly distributed near the mixed pixels, as shown in Fig. 11.An explanation for these deviations is provided below.
The parameterization of G and H are based on the land cover type.For example, for buildings, G = 0.4R n (Kato and Yamaguchi, 2005) (which is usually greater than the G of vegetation and bare soil deduced from Eq. 9) and H = 0.6R n , and for water, G = 0.226R n and LE = R n − G. From the land cover map shown in Fig. 4, four major classes exist in the study area: buildings with a high H , uncultivated land with a relatively high H , cropland with a relatively low H , and water with H = 0.
1.If a pixel contains cropland and buildings and is categorized as cropland, the building area within the pixel is ignored in the IPUS method.In this case, G and H are underestimated and LE is overestimated.In addition, after considering the landscapes using the TRFA method, the LE is underestimated and H is overestimated because the pixels contain buildings that are still reflected indistinctly by LST at 300 m as the detailed temperature heterogeneity cannot be represented by the TRFA method.These points are shown in green in Fig. 12.However, if the pixel is categorized as buildup, the building area within a pixel is exaggerated, which causes G and H to be overestimated and LE to be underestimated when using the IPUS method.This situation is similar to that illustrated by the green points associated with the TRFA results and is shown by the red points in Fig. 12. estimated, G and H are underestimated in the IPUS method, and vice versa.LE is also overestimated in the pixels containing water and other types of land cover (generally bare soil in our study area).These pixels are categorized as water and are shown as blue points in Fig. 12.Some of the blue LE points calculated by using the TSFA method are slightly smaller than those calculated using the TRFA method for pixels containing vegetation.At noon, the temperature of vegetation in those pixels is lower than that of water bodies.
3. In mixed pixels that contain various crops, such as maize and vegetables, LE is underestimated if the area of maize within the pixel is overestimated because the canopy height of the maize is taller than that of vegetables.This relationship results in the overestimation of H when using the IPUS and TRFA methods.In addition, G depends on the FVC values of the crops when using the IPUS method.Moreover, G depends on R n when using the TRFA method and is nearly identical to the values of G obtained using the TSFA method.
At the study area, we compared the TRFA and IPUS methods to quantify the ability of the TSFA method to simulate the heterogeneities of the land surface on 13 September (see Table 14).In pure pixels, the LE biases among the IPUS, TRFA, and TSFA methods were small.In mixed pixels, the LE bias between the TSFA and IPUS methods varied from 35.36 to 65.66 W m −2 , and the bias between the TSFA and TRFA methods varied from 4.41 to 22.53 W m −2 .More class types in mixed pixels correspond to larger biases.Table 15 shows the bias of the mixed pixels that contain buildings and bare soil between the three methods.In mixed pixels with buildings, the IPUS and TRFA methods generally underestimated LE and had large bias values compared to those of the TSFA method.In mixed pixels without buildings and bare soil, the bias between TRFA (or IPUS) and TSFA was relatively small, which indicates that the landscape and temperature inhomogeneity were accounted for by the TSFA method.The aforementioned analyses demonstrate that the TSFA method can consider the heterogeneous effects of mixed pixels.
Considering the landscapes and inhomogeneous distribution of LST, the TSFA method ensures that none of the end members (30 m pixels) are ignored or exaggerated.Thus, the distribution of LE calculated using the TSFA method is smoother and more rational than the distributions of LE calculated using the other methods.At the regional scale, the TSFA method describes the heterogeneity of the land surface more precisely.The degree of achievable estimation accuracy is discussed hereafter.

Error analysis
Since LE is calculated as a residual term in the energy balance equations, the sensitivity of H was analyzed first.Land surface variables (including LST, LAI, canopy height, and FVC) and meteorological variables including wind speed, air temperature, air pressure, and relative humidity are the major factors for H sensitive analysis.Figure 13 presents a case of sensitivity analysis results for H .In this case, LST is 303.9K, and it ranges from 298.4-309.4K with a step size of 0.5 K, LAI is set to 1.4 and it ranges from 0.14-2.66with a step size of 0.14.Canopy height is 1 m and it ranges from 0.1-1.9m with a step size of 0.1 m.Additionally, FVC = 0.5, wind speed u = 2.48 m s −1 , air temperature T a = 297.9K, air pressure = 97.2kPa, and RH = 40.29 %.In addition, the land cover is maize, and the reference H is 230.2W m −2 .
The air pressure is stable over a short period and has little effect on the ET results.Although excess resistance was calculated from the friction velocity, the meteorological data were provided by ground observations; thus, the meteorological data are relatively accurate.As shown in Fig. 13, LAI, canopy height, and LST are sensitive variables.
The parameterization of the momentum roughness length indicates that H is sensitive to LAI, with decreasing sensitivity when the LAI is greater than 1.When the LAI is less than 1, the momentum roughness length increases as the LAI increases, and turbulent exchange is enhanced.However, when the LAI is greater than 1, the plant canopy is regarded as a continuum that is not a sensitive variable.Because our study area is dominated by agriculture and the study period extended from July to September, the crops in the HRB middle stream grew quickly, thus, the LAI was usu-ally greater than 1.Thus, LST and canopy height are the main sources of error.

Errors in LST
As shown in Fig. 13, 1 K LST bias would result in 21 % error of H while H is 230.2W m −2 .However, the sensitivity of the LST is unstable and depends on the strength of the turbulence.The strength of the turbulence determines the mass and energy transport and the resistance of heat transfer, which influences the sensitivity to the LST.A weaker turbulence corresponds to a weaker LST sensitivity, and vice versa.
A sensitivity analysis of LE induced by LST was also performed.In order to exclude the influence of other factors, stations were chosen with homogeneous landscapes within coarse pixels.These results are shown in Table 16.The LE results obtained from the observed LST are consistent with the in situ observations and have less bias.LE was overestimated when the LST was underestimated, and vice versa.Because the magnitude of LE was greater than that of H , the relative error of LE was less than the relative error of H .However, 1 K of LST bias resulted in an average LE error of 30 W m −2 , which is consistent with the sensitivity analysis of H shown in Fig. 13.Specifically, 1 K of LST bias would result in an LE biases of 8.7 W m −2 (in desert, SSW) to 84.4 W m −2 (in oasis, EC8), which indicates that the sensitivity of LST is unstable.

Errors in canopy height
In this paper, canopy height was obtained from a phenophase and classification map.Thus, the accuracy of the canopy height was mainly dependent on the classification accuracy and plant growth state.Even within the same region, the canopy height of a crop can differ due to differences in seeding times and soil attributes, such as soil moisture and fertilization.
The land use type was orchard at EC17.However, in our land classification map, the land use at EC17 was other crops, which includes vegetables and orchards.Thus, it was difficult to set the canopy height.In our study area, most of the other crops were vegetables (canopy height of 0.2 m), and the height of the orchard was approximately 4 m; thus, a value of 0.2 m would overestimate the LE.The LE estimations with incorrect canopy heights and correct orchard canopy height at EC17 are shown in Table 17.The days of large LST bias were removed, and the bias between the model and ground observations decreased.The excess errors were caused by errors in the LST and land use, such as buildings and maize in the mixed pixels.
Except for the error source discussed previously, the following sources of error were unavoidable: 1.Although the remotely sensed turbulent heat flux is instantaneous, the EC data are averaged over time.Thus, the timescales do not match in the validation.2. The calibration coefficients of the HJ-1B satellite's CCD and IRS drifts because of instruments aging.
3. Geometric correction causes half-pixel bias equal to or less than the deviation of the artificially subjective interpretation.
A one-source model and simplified parameterization schemes were used in this paper to determine surface roughness lengths and heat transfer coefficients.The one-source model combines soil evaporation and plant transpiration and assumes that SPAC is a one-source continuum.This assumption is reasonable when the surface is densely covered by vegetation but relies on the accuracy of the difference between the LST and air temperature, as previously mentioned.
When a one-source model is applied to an area covered by sparse vegetation, such as semiarid or arid areas, this assumption is irrational.

Discussions
The TSFA describes the surface heterogeneity much better than the IPUS and TRFA.The IPUS aggregates the land surface variables from 30 to 300 m, which results in the loss of land surface details and leads to the scale effects.Although the TRFA uses 30 m information from VNIR bands and partially decreases the heterogeneity, it treats the pivotal variable LST as homogeneous at 300 m resolution, which results in considerable error.In summary, the advantages of the TSFA method are described as follows:  1.The temperature-sharpening algorithm in TSFA is capable of decreasing the influences of the heterogeneity of the LST, which is consistent with results from previous studies (Kustas et al., 2003;Bayala and Rivas, 2014;Mukherjee et al., 2014).As analyzed in Sect.4.3, the non-consideration of the heterogeneity of LST in mixed pixels is ill-founded and causes errors when estimating ET.
2. In the one-source energy balance model, different parameterization schemes were employed for different landscapes.In the IPUS, a single land cover is assigned to a mixed pixel, which results in a large error.However, the TSFA method is used to calculate the surface flux at 30 m and is aggregated to 300 m using the areaweighting method, which considers all of the subpixel landscapes and improves the accuracy.
Some problems exist in the temperature-sharpening algorithms.The temperature-downscaling method used in this paper caused "boxy" anomalies in parts of the sharpened temperature fields in large pixels because of the constant residual term, T300 , in Eq. ( 3) within large pixels.This situation also occurred in the temperature-sharpening algorithm proposed by Agam et al. (2007).In addition, our temperaturesharpening algorithm tends to overestimate the subpixel LST for cooler landscapes and underestimate the subpixel LST for warmer areas (Kustas et al., 2003).This inaccurate estimation causes errors that are difficult to evaluate when estimating the turbulent heat flux.For example, the small turbulent heat flux bias between TSFA and TRFA was caused by a counterbalancing effect as analyzed in Sect.4.3.1.Additional temperature-sharpening algorithms under heterogeneous surfaces should be evaluated using real data sets when applied in ET models (Ha et al., 2011).The retrieval methods of land surface variables were validated in other areas.For example, the albedo algorithm was previously applied to retrieve Global Land Surface Satellite (GLASS) products (Liang et al., 2014), the LST retrieval algorithm was validated in the Haihe River basin in northern China (H.Li et al., 2011), and the soil heat flux correction algorithm was validated in the GAME-Tibet campaign (Yang and Wang, 2008).Since the surface of the Heihe River basin is very heterogeneous, additional comparisons of our algorithm in other areas would be helpful.
In addition, to correct the discrepancy between remotely sensed radiative surface temperature and aerodynamic temperature at the source of heat transport, a brief and wellperformed parameterization scheme (under uniformly flat plant surface) of excess resistance was used to calculate the aerodynamic resistance of heat transfer (Jiao et al., 2014).
Since our study is based on mixed pixels, multiple parameterization methods should be compared to select the optimum method.
Because of the sensitive variables of the one-source energy balance model used in this paper, the accuracies of the LST and canopy height greatly influenced the turbulent heat flux.HJ-1B IRS has a single thermal channel, and the single-channel LST-retrieving algorithm may be unstable under wet atmospheric conditions (water vapor contents higher than 3 g cm −2 ) (H. Li et al., 2010), which may create a bottleneck for ET estimations by HJ-1B.The canopy height is a priori knowledge based on phenophase classifications and would influence the accuracy of the surface roughness calculation.Multi-source remote sensing data could be used to improve the accuracy of calibrations and land surface variable estimation.Active microwave and lidar data (Colin and Faivre, 2010) could be used to obtain the canopy height, which would decrease the dependence on the accuracy of the classification.
The energy balance closure has a significant influence on the evaluation of the model calculated heat flux results.In our study area, the EC energy balance closure ratio was greater than 0.75 (S.M. Liu et al., 2011).Studies have shown that uncaptured low-frequency eddies (Von Randow et al., 2008), extension of averaging time (Charuchittipan et al., 2014), and lack of an accurate accounting of heat storage terms (Meyers and Hollinger, 2004) are potential reasons for the energy imbalance and so forth.The conserving Bowen ratio and residual closure technique are often used to force energy balance.We chose the residual closure method at last because the conserving Bowen ratio method yields irrational sensible heat flux due to small or negative Bowen ratios (large LEs due to the oasis effect) in the oasis-desert system.Energy balance closure was problematic at times for turbulent flux system and tended to be associated with significant discrepancies in LE (Prueger et al., 2005).
Since a footprint model was used in the validation, the footprints' discrepancies between in situ measurements and remote sensing pixel may cause biases.For example, model validation results were calculated based on the relative weights of the footprint model and multiplied by the heat flux results of the coarse pixels in the source area from the upwind direction.However, the heat fluxes of coarse pixels included the contributions of non-overlapped subpixels within the coarse pixel.These pixels are influenced by the heterogeneity of underlying surface, and it would cause uncertainties in the validation.

Conclusion
The effects of surface heterogeneity in ET estimation have been studied here by employing the IPUS, TRFA, and TSFA methods over heterogeneous surface.Compared to the IPUS and TRFA methods, the TSFA method exhibits more consistent agreement with in situ measurements (energy balance forced by the residual closure method) based on the footprint validation results.The IPUS approach does not consider surface heterogeneity at all, which causes significant error in the heat fluxes (i.e., 186 W m −2 ).The TRFA considers heterogeneity of landscapes besides LST heterogeneity, with a heat flux error (i.e., 49 W m −2 ) that is less than that of IPUS.However, this error is non-negligible.As a sensitive variable of the ET model, canopy height is mainly determined by classification, and the application of classification at a 30 m resolution can improve the accuracy of the canopy height.Additionally, the sharpened surface temperature at a resolution of 30 m decreases the thermodynamic uncertainty caused by the land surface.The TSFA method can capture the heterogeneities of the land surface and integrate the effects of landscapes in mixed pixels that are neglected at coarse spatial resolutions.
HJ-1B satellite data have advantages because of their high spatiotemporal resolution and free access.Because the satellites are still in operation, the long-term data are promising for applications of monitoring energy budgets.

Figure 2 .
Figure 2. Flowchart of ET retrieval using the temperature-sharpening and flux aggregation method.

Figure 3 .
Figure 3. Flowchart of the three upscaling methods for retrieving evapotranspiration.

Figure 4 .
Figure 4. Study area and distribution of EC towers in HiWATER-MUSOEXE.

Figure 5 .
Figure 5. Flowchart of land surface variable retrieval.The abbreviations are defined as follows: SZA: solar zenith angle; SAA: solar azimuth angle; VZA: view zenith angle; AOD: aerosol optical depth; ABT: at-nadir brightness temperature; S d : downward shortwave radiation; USR: upward shortwave radiation; ULR: upward longwave radiation; and L d : downward longwave radiation.

Figure 6 .
Figure 6.Maps of the four energy components, (a) R n , (b) G, (c) H , and (d) LE, calculated by TSFA on 13 September 2012.

Figure 7 .
Figure 7. Scatter plot of the TSFA turbulent heat flux results.

Figure 9 .Figure 10 .
Figure 9. Maps of the four energy components, (a) R n , (b) G, (c) H , and (d) LE, calculated using the IPUS method on 13 September 2012.

Figure 11 .
Figure 11.Maps of the bias of the energy balance components calculated using the TSFA method minus the IPUS method: (a) R n , (b) G, (c) H , (d) LE; and TSFA minus TRFA: (e) H and (f) LE.

2.
Figure 12.Scatter plots between the TSFA and IPUS results: (a) R n , (b) G, (c) H , and (d) LE; and TSFA and TRFA results: (e) R n , (f) G, (g) H , and (h) LE.MBD and RMSD are the mean bias deviation and root mean square deviation between the TSFA and IPUS results, respectively.

Table 1 .
Specifications of the HJ-1B main payloads.

Table 3 .
The station validation results of land surface temperature.thesedata from the websites of the Cold and Arid Regions Science Data Center at Lanzhou (http://card.westgis.ac.cn/) or the Heihe Plan Data Management Center (http://www.heihedata.org/).Energy imbalances are common in ground flux observations.The conserving Bowen ratio (H / LE) and residual closure technique are often used to force the energy balance.

Table 4 .
The station validation results of downward shortwave radiation.

Table 5 .
The station validation results of downward longwave radiation.
The R 2 , MBE, and RMSE values of S d were 0.81, 13.80 W m −2 , and 25.35 W m −2 , respectively.The station validation results are shown in Table4.The accuracy of SSW is low.Because cloudy conditions occurred briefly on 27 July, few ground observations were obtained, and S d was significantly overestimated.After removing these data, the R 2 , MBE, and RMSE values of S d at SSW were 0.87, 10.90 W m −2 , and 21.13 W m −2 , respectively.
(2) the EC4 foundation is non-uniform and is not suitable for validation.After removing the EC4 data, the R 2 , MBE, and RMSE values of the LSTs were 0.83, 0.69 K, and 2.51 K, respectively.The LST errors of SSW and SD were large due to large errors on particular days.For example, although it was briefly cloudy above station SSW on 27 July, this area was not identified as cloudy in the cloud detection process.

Table 6 .
The station net radiation validation results.

Table 7 .
The station validation results of the soil heat flux.
The information of substrate plants may be ignored, leading to albedo retrieval errors.An albedo bias of 0.03 can lead to an R n error of approximately 20 W m −2 when the solar incoming radiation is large.As previously discussed, it was briefly cloudy on 27 July, and after removing those data, the R 2 , MBE, and RMSE values of the R n obtained at station SSW were 0.72, 8.20 W m −2 , and 37.60 W m −2 , respectively.The R 2 , MBE, and RMSE values of the G in the HRB were 0.57, 8.51 W m −2 , and 29.73 W m −2 , respectively.The station-validated G results are shown in Table

Table 8 .
In situ validation results of heat flux using the TSFA.

Table 10 .
Comparison of the turbulent heat flux results at EC4.

Table 11 .
Comparison of the turbulent heat fluxes results at EC15.

Table 12 .
Comparison of the turbulent heat flux results at EC11.

Table 13 .
Comparison of the turbulent heat flux results at TP.

Table 14 .
Comparison of the latent heat flux in pixels containing different numbers of class types.Notes: the number of class types in mixed pixels represents the number of classification types that were contained in the pixels.For example, 1 represents the pure pixels and 2 represents mixed pixels containing two land use types.MBD and RMSD are the mean bias deviation and root mean square deviation, respectively, between the TSFA results and the TRFA and IPUS results.

Table 15 .
Comparison of the latent heat fluxes of typical mixed pixels.
Figure 13.Sensitivity analysis of the surface variables for sensible heat flux.

Table 16 .
The results of the LST error analyses at the stations with homogeneous landscapes.: LST bias is calculated as the retrieved LST minus the observed LST; EC-LE is the in situ latent heat flux; LE relative error is the relative error between the retrieved and observed LST and is expressed as ((LE from retrieved LST) − (LE from observed LST))/(LE from observed LST) × 100 %, and H relative error is calculated in the same way. Notes

Table 17 .
The results of the canopy height error analyses at EC17.
Notes: LE-I relative error is the relative error between the LE from incorrect canopy height and observed LE and is expressed as ((LE from incorrect canopy height) − (EC-LE))/(EC-LE) × 100 %; LE-C relative error is the relative error between the LE from correct canopy height and observed LE and is expressed as ((LE from correct canopy height) − (EC-LE))/(EC-LE) × 100 %.