Reducing soil moisture measurement scale mismatch to improve surface energy flux estimation

At the so-called hyper-resolution scale (i.e. grid cells of 1 km) Land Surface Model (LSM) parameters are sometimes calibrated with Eddy-Covariance (EC) data and Point Scale (PS) soil moisture data. However, measurement scales of EC and PS data differ substantially. In our study, we investigated the impact of reducing the scale mismatch between surface 10 energy flux data and soil moisture data by replacing PS soil moisture data with observations derived from Cosmic-Ray Neutron Sensors (CRNS) made at larger spatial scales. Five soil-evapotranspiration parameters of the Joint UK Land Environment Simulator (JULES) were calibrated against PS and CRNS soil moisture data separately. We calibrated the model for twelve sites in the USA representing a range of climatic, soil, and vegetation conditions. The improvement in surface energy partitioning for the two calibration solutions was assessed by comparing to EC data and to a version of JULES runs with 15 default parameter values. We found that simulated surface energy partitioning did not differ substantially between the PS and CRNS calibrations, despite their differences in actual soil moisture observations. We concluded that potential differences due to distinct spatial scales represented by the PS and CRNS soil moisture sensor techniques were substantially undermined by the weak coupling between soil moisture and evapotranspiration within JULES.

LSMs have become more complex over the past decades. The first LSMs included only limited representations of soil hydrological and biophysical processes. The `second-generation' LSMs, however included the main physical processes within vegetation and in soil (Sellers, 1997;Seneviratne et al., 2010), and the later developed `third-generation' LSMs, additionally 5 included plant physiological processes like photosynthesis and carbon assimilation (Seneviratne et al., 2010). This increasing complexity has brought with it an increasing number of parameters, with values not easily defined with in-situ measurements because of scale mismatch, e.g. stomatal resistance measured at leaf level is not the same as canopy stomatal resistance needed for LSMs (Blyth et al., 1993). Soil hydraulic parameter values (e.g. soil hydraulic conductivity) are often obtained from laboratory experiments on soil cores for cubic centimetres to cubic decimetres. The soil properties and processes at this scale 10 can however differ from those at the LSM grid cell sizes, which are often as large as hundreds to tens of thousands of square kilometres (Pitman 2003). Due to the different governing processes upscaling the soil hydraulic properties from soil core scale to field scale is non-trivial (Vinnikov 1996;Crow et al., 2012).
Following discussion on how to move forward with hydrological models and LSMs in an effort to develop models that cover all land surface of the globe (Wood et al., 2011;Beven and Cloke, 2012), models are increasingly being applied at the finer 15 `hyper-resolution scale' with grid cells of about 1 km 2 . Typically, parameters are calibrated and validated at this hyperresolution against in-situ measurements from sources such as Eddy-Covariance (EC) flux towers and point-scale (PS) soil moisture sensors (e.g. Time Domain Transmissivity; TDT and Time Domain Reflectometry; TDR) (Stockli et al., 2008;Richter et al., 2004;Blyth et al., 2010;Blyth et al., 2011;Rosolem et al., 2013). Such calibration/validation data has become more widely available at hyperresolution scale (Baldocchi, 2001;Smith, 2012). However, the horizontal footprints of different 20 measurement techniques' vary from each other: EC surface energy flux data represent a downwind footprint of 100 m 2 to 1 km 2 , while in-situ soil moisture probes link to much smaller surface areas by representing a support volume (soil volume represented by sensor; Blöschl and Sivapalan, 1995) of ~4 dm 3 only (Running et al., 1999;Kurc and Small, 2007;Vivoni et al., 2008). Soil moisture can be spatially non-uniform within the EC footprint due to heterogeneity in soil properties, vegetation, and topography. How variable the soil moisture content is under certain heterogeneous conditions depends on the wetness 25 conditions; past research indicates the variability to be highest during soil wetting and drying periods. Ideally an EC footprint average soil moisture content which best (i.e. most effectively) connects to the surface exchange processes should be used within an LSM grid cell. If soil moisture is measured at only a single or few locations with limited support volume, like with PS sensors, the measured soil moisture content might be different from the effective soil moisture state that controls the surface exchange processes. This poses a potential scale mismatch issue, as depicted in Figure 1. The question which then rises is, 30 does reduced observation scale mismatch improve LSM energy partitioning estimation?
In recent years, new soil moisture measurement techniques have been developed that have, compared to in-situ soil moisture sensing techniques (TDT and TDR), a reduced scale mismatch with EC surface energy flux measurements. Improvement in wireless technology and remote data collection technology have made the development of PS soil moisture sensor networks more feasible (Cardell-Oliver, 2005;Ritsema et al., 2009;Trubilowicz, 2009;Bogena et al., 2010;Robinson et al., 2008). It is usually assumed that, due to spatial variability in soil physical characteristics, a network of PS sensor profiles is more 5 informative than a single or a couple of PS sensor profiles for studying land-surface processes and constraining model parameters at scales of >~100 m 2 (Robinson et al., 2008).
Newer soil moisture sensor techniques, for instance one which makes use of the Global Positioning System (GPS; Larson et al 2008Larson et al , 2010, and the Cosmic-Ray Neutron Sensor (CRNS; Zreda et al., 2008) have the advantage that only a single, above ground sensor is needed, which is easier to install than a PS network. 10 The CRNS (Zreda et al., 2008) is an above-ground passive sensor which utilises natural cosmic-ray neutron radiation to estimate soil moisture content. Networks of CRNS have been established in various countries (e.g. COSMOS , COSMOS-UK (Evans et al., 2016), CosmOZ (Hawdon et al., 2004)). The CRNS passively measures fast neutron intensity, which can be non-linearly related to the soil moisture content in the top 10-70 cm, from an area with a radius of about 100 to 300 m surrounding the above-ground sensor (Figure 2; Desilets and Zreda, 2013;Kohli et al., 2015). The CRNS 15 has been shown to provide similar soil moisture information as co-located in-situ sensor networks .
Unlike wireless in-situ sensor networks, which consist of a set of point measurements, both the GPS and CRNS technology provide an integrated soil moisture measurement over the entire support volume (Larson et al., 2008;Zreda et al., 2008). We chose to answer our research question using the CRNS technology because the COSMOS network provides publicly available data for multiple years at a range of sites co-located with Ameriflux/FLUXNET EC towers (ORNL-DAAC, 2015). We used 20 twelve of these sites which provided sufficient LSM forcing data, PS soil moisture data, CRNS data, and EC LE and sensible heat flux (H) data. We phrased the following hypothesis for our research question: Due to reduced scale mismatch, CRNS yields LSM surface flux estimates closer to eddy-covariance observations than in-situ soil moisture sensors.
To investigate our research question we made the LSM simulated soil moisture content match the observed PS or CRNS data 25 as closely as possible. We did this by calibrating parameters of LSM Joint UK Land Environment Simulation Model (JULES; Best et al., 2011) that importantly affect the soil moisture state in the model against the PS and CRNS data separately. We subsequently validated the results against EC observed data over the same periods. To assess the change in soil moisture and surface energy after calibration we compared the calibrated runs against a default run with parameter values computed from a widely used soil properties database. Before our modelling exercise we first compared the PS and CRNS data. The outcomes 30 of this data analysis were mainly used to see if the results from the calibration and validation yielded larger differences in surface energy flux estimation at sites where the two soil moisture observation products showed higher deviation from each other.

Calibration and Validation data: PS, CRNS, and EC data
PS soil moisture and CRNS neutron count data from twelve Ameriflux/COSMOS sites were used ( Figure 2). These twelve sites covered eight of twenty Ecoclimatic domains of the US National Ecological Observatory Network (NEON; www.neonscience.org) (Figure 2). These twelve sites hence represent a variety of climates and land cover types, but also 5 different soil types (Table 1; for full site names see Appendix 1, Table A1.1).
Hourly PS data for nine sites were obtained from the publicly available Ameriflux Level 2 data source (ORNL). Data for the three California Climate Gradients sites (DC, CS, and SO) were obtained at http://www.ess.uci.edu/~california/ (data version 3.4; Goulden et al., 2015). The number of PS profiles, the installation depths, and sensor types differed between the twelve study sites (Table A1.1 in Appendix 1). We used data from the upper soil layers (up to 30 cm) only, because the CRNS data 10 represents the contribution from the upper layers only. Quality control was applied to filter out spurious and unrealistic data points due to sensor errors. The PS data was then interpolated to the JULES soil layer on which the model was calibrated.
Hourly CRNS neutron count data were obtained from the COSMOS network website (www.cosmos.hwr.arizona.edu).
Corrections were applied as by Zreda et al. (2012). Water vapour corrections (Rosolem et al., 2013) were applied with respect to dry air (Bogena et al., 2013). Similar quality control approach used for the PS analysis was also applied to CRNS neutron 15 count data series to remove unrealistic points. Snow cover periods were also removed for the analysis. A 5-hour moving average window was applied to the observed CRNS neutron counts (following Shuttleworth et al., 2013;Rosolem et al., 2014).
We compared PS soil moisture with CRNS integrated soil moisture computed from vertically homogeneous soil moisture values obtained from the observed neutron counts using the COsmic-ray SoiI Moisture Interaction Code (COSMIC; Shuttleworth et al., 2013). 20 We used the exact same data for the soil moisture data comparison as for the calibration (providing that both PS and CRNS were available in the same period). As validation data, latent heat (LE) and sensible heat (H) flux hourly data from Ameriflux Level 2 data source was used for nine sites while for the three California Climate Gradients sites data version 3.4 (Goulden et al., 2015). Quality control was applied to the LE and H flux data to remove outliers and unrealistic data points.

JULES forcing data and initial conditions 25
JULES requires precipitation, air temperature, atmospheric pressure, wind speed, specific humidity, downward shortwave radiation, and downward longwave radiation as meteorological forcing data. Quality controlled hourly data was obtained from Ameriflux Level 2 and the three California Climate Gradients sites. At some of the sites however, certain specific forcing data was not available from Ameriflux and hence data from different sources was used (Table A1.2 of Appendix 1).
Model input data was gap-filled following Rosolem et al. (2010) because JULES requires continuous time series except for 30 precipitation where gaps were set to zero. For all sites gaps smaller than 3 hours were filled by linear interpolation, while larger gaps of up to 31 days were gap-filled using the average diurnal pattern of the preceding and following 15 days. In addition, Hydrol. Earth Syst. Sci. Discuss., doi:10.5194/hess-2016-558, 2016 Manuscript under review for journal Hydrol. Earth Syst. Sci. Published: 8 November 2016 c Author(s) 2016. CC-BY 3.0 License. some remaining gaps in Downward Shortwave Radiation and Downward Longwave Radiation at Wind River (WR) were filled using linear least squares relationships with NLDAS-2 data (http://disc.sci.gsfc.nasa.gov/uui/datasets?keywords=NLDAS). At Tonzi Ranch (TR) NLDAS data were used to gap fill air temperature. Gap filling at Santa Rita Creosote was done with data from the nearby Sahuarita site followed by the gap-filling procedure described above.

Soil moisture data comparison methodology 5
To compare PS and CRNS soil moisture data we computed the Mean Squared Deviation (MSD) and its decomposition (Gupta et al., 2009) into: (1) The squared difference between the means (structural bias) (2) The squared difference between the standard deviations (indicates different seasonality) (3) A term relating to the coefficient of correlation (r; indicates different dynamics): 2·σps·σcrns·(1-r) 10 We however scaled the relative contributions of the three MSD components to the Root Mean Squared Deviation (RMSD) instead of MSD, to keep units of m 3 m 3 . We then ranked the sites from lowest to highest RMSD. According to our hypothesis, we would expect to see a larger difference in simulated surface energy fluxes after calibration when the two soil moisture time series differ most. The CRNS soil moisture values computed with COSMIC represent the integrated signal computed from vertically homogeneous profiles, whereas in reality heterogeneous profiles are common across different sites and conditions. 15

Joint UK Land Environment Simulator (JULES)
We used the Joint UK Land Environment Simulator (JULES; Best et al., 2011;Clark et al., 2011) in this study. JULES can be coupled as lower boundary condition to the UK Met Office Unified Model (Cullen, 1993). Within JULES choices can be made (e.g. canopy radiation model type) and certain modules (e.g. vegetation dynamics) can be switched on or off to operate at 20 different levels of complexity. In addition, we chose the UK Variable resolution configuration (UKV) because it is the standard setting when JULES is run coupled with the UK Met Office Unified Model. The UKV land grid cell size is 1 km by 1 km.
However, our study focused on JULES standalone simulations at the 12 grid points located at the sites investigated. The UKV setting employs the multi-layer canopy radiation module with surface heat capacity and snow beneath the canopy, the single canopy layer `big leaf' approach for leaf-level photosynthesis (which computes radiation absorption with Beer's law). Soil 25 heat conductivity was calculated using the approach of Dharssi et al. (2009). We used the default JULES-UKV soil layering with shape parameter n (-), α (m -1 ) representing the inverse of the water entry pressure, θres (or smres; m 3 m -3 ) is the empirical residual soil moisture content (without physical meaning), and θsat (or smsat; m 3 m -3 ) is the saturated soil moisture content. In JULES parameter n is defined as b = 1/(n-1) and sathh=α -1 (m).
The Mualem equation computes the unsaturated hydraulic conductivity K: 5 where Ksat (or satcon; mm·s -1 ) is the saturated hydraulic water conductivity.
The values of the Mualem-Van Genuchten parameters need to be defined by the user for each grid cell/point based on soil characteristics.
In JULES soil moisture directly interacts with transpiration (through root water uptake) and bare soil evaporation as described 10 hereafter (see also Figure A1.1 of Appendix 1). JULES first computes the potential photosynthesis, which is a function of three limiting factors: Rubisco limitation, radiation limitation, and photosynthetic product transport limitation. The potential photosynthesis is multiplied with a soil moisture reduction factor to obtain the actual photosynthesis. To obtain this soil moisture reduction factor the model first computes a limiting factor for each layer: where θi is the unfrozen soil moisture content in layer i, θcrit is the critical point soil moisture content below which soil moisture is limiting the root water uptake (matrix water potential -330 cm in JULES), and θwilt is the wilting point soil moisture content below which no root water uptake occurs (-15000 cm in JULES). These reduction factors are multiplied with the root density in the layer. These weighted reduction factors are then summed to obtain the root zone soil moisture reduction factor. From the actual photosynthesis the plant stomatal conductance is computed. Separately the bare soil surface conductance, which is 20 a function of the soil moisture content in the upper soil layer and the critical soil moisture, is computed. The surface conductance is then computed as a function of the stomatal conductance and the bare soil surface conductance.
The potential evapotranspiration is also calculated separately. This variable is multiplied with the saturated land fraction to compute the free water evaporation (e.g. lake and canopy evaporation). The rest of the potential evapotranspiration is multiplied with the surface conductance to obtain the bare soil evaporation + plant transpiration. Together these fluxes are the 25 actual evapotranspiration (water) or latent heat flux (energy). The amount of water drawn from the top soil layer through bare soil evaporation depends on the bare soil surface conductance. The distribution of the root water uptake between the layers depends on the weighted soil moisture limitation factor for each layer. The water extraction from the soil in its turn directly affects the soil moisture content in the different layers at the start of the next time step. These soil moisture contents then affect the soil moisture redistribution, surface runoff, and deep drainage. 30 Hydrol. Earth Syst. Sci. Discuss., doi: 10.5194/hess-2016-558, 2016 Manuscript under review for journal Hydrol. Earth Syst. Sci. Published: 8 November 2016 c Author(s) 2016. CC-BY 3.0 License. JULES-UKV also requires a number of initial conditions: the amount of unfrozen water and snow stored on the surface (on canopy and on soil surface; set to zero in this study), snow properties (set to JULES-UKV defaults), the surface temperature (set to the air temperature of the hour before the first simulation time step), the soil temperature of each layer (set to the soil temperature from Ameriflux data the hour before initial time step) the soil water content in each layer (set to the soil water content from the PS observed moisture content of the hour before the first simulation time step and applied homogeneously 5 throughout the profile). Soil moisture was spun up by running a maximum of five cycles and stopped when soil moisture convergence was lower than or equal to 10% compared to the previous cycle.

Calibration approach
At each site we calibrated JULES against observed PS observed soil moisture and against CRNS observed neutron counts respectively. We chose to calibrate simulated neutron counts against CRNS observed neutron counts because it allowed us to 10 take into consideration vertical heterogeneity in modelled soil moisture by computing neutron counts from modelled soil moisture profiles using COSMIC. We computed the Root Mean Squared Error (RMSE) between simulated and observed hourly time series. To better match the observed soil moisture/neutron count time series, we calibrated five JULES parameters that influence the model soil moisture state ( Figure 3). These included three Mualem-Van Genuchten parameters: b, sathh, and satcon and soil moisture limitation factor β parameters smcrit and smwilt. We chose these parameters because they are, in 15 theory, directly linked to the movement of moisture in the soil and to the effects of soil moisture on transpiration in JULES.
To assess the effects of calibration on soil moisture and surface energy flux simulation we compared the calibrated solutions against a default run at each site. The parameter values for the default case were derived from soil properties (percentages clay, loam, and organic matter) reported by the Harmonised World Soil Database (HWSD; FAO, 2009) for each of the twelve sites.
These properties were used in the Wösten Pedotransfer function (Wösten PTF;Wösten et al., 1997) to obtain values for b, 20 sathh, and satcon. Parameter values for smwilt and smcrit were subsequently obtained with the Van Genuchten formula.
The parameter calibration ranges were the same for all sites (Table 2). They were constructed by computing the minimum and maximum parameter values for the entire soil texture triangle (based on Wösten PTF), while also taking into consideration three organic matter contents (yielding three triangles), but excluding clay percentages above 70% to avoid extreme values for parameter b especially. The range for smcrit was set to 10-90 % of the saturated soil moisture content. The saturated soil 25 moisture content was computed as a function of the dry soil bulk density (bd): smsat =1bd / 2.65 (Brady and Weil, 1996).
Soil bulk density values obtained from the COSMOS network were used. This thus yielded different smcrit ranges in terms of soil moisture content (m 3 m -3 ) for different sites. The residual soil moisture content (defined implicitly in JULES) was set to zero because the Wösten PFT does not consider it.
JULES' remaining two ancillary parameters; the soil heat conductivity and soil heat capacity, were for each site computed as which 86 % of plant roots are present) and the canopy height, at sites where more specific information was available from Ameriflux/COSMOS site information or from site specific literature.
We calibrated JULES using the BORG Multi-Objective Algorithm (BORG-MOEA or BORG; Hadka and Reed, 2013). This calibration tool was designed for multi-objective problems but also works for single-objective calibration. BORG employs multiple optimisation algorithms simultaneously to obtain convergence while also keeping the searched parameter space wide. 5 The algorithm measures progress with the epsilon-progress technique, which uses the Objective Function (OF) space divided in boxes with sides of size epsilon. Epsilon is a user defined value for each OF (we used epsilon values of 0.001 m 3 m -3 for PS and 1 cph for CRNS). Only if a new solution resides inside a box with a better OF value, BORG considers it is progress. If no progress was obtained after 200 runs, the algorithm had stagnated. In this case the BORG algorithm triggers a restart, which consists of (among other techniques) changing population size to maintain a diverse population and to escape local optima. 10 We used a maximum number of 3,000 runs and an initial population size of 100 runs.

Validation approach
As main validation metric we chose RMSE between the observed and simulated Evaporative Fraction EF = LE / (LE + H).
The EF tells how the net surface radiation is partitioned between all LE and H fluxes. Additionally, we computed the RMSE between observed and simulated LE. Because data quality issues with EC data are often observed during at night time (Goulden 15 et al., 1996;Aubinet et al., 2010), we computed these metrics over day time hours only. We defined day time as downward shortwave radiation > 20 Wm -2 . To avoid extreme RMSE-EF values, we used hours with both observed and simulated H and LE values ≥ 1 Wm -2 only. Otherwise a few hours with small LE or H values would have dominated the RMSE values, while this would probably have been due to forcing or EC data inconsistencies and would not relate to soil moisture temporal variability. 20

Time series analyses
In Figure 4 comparison between PS and CRNS soil moisture time series shows that the seasonal trends of the two soil moisture observation products were similar. The two soil moisture products however also differed from each other, in distinct ways at 25 different sites. At UM PS was peakier (peak height up to 0.10 m 3 ·m -3 compared to <0.05 m 3 ·m -3 ). At DC PS dry down was more gradual compared to CRNS and reached soil moisture contents at least 0.01 m 3 m 3 above PS SM. At SO PS dried down quicker and was then systematically drier than CRNS. At KE PS was peakier (peak height up to 0.20 m 3 ·m -3 compared to <0.10 m 3 ·m -3 for CRNS) and systematically wetter. ME was characterised by a slower dry down during summer for PS compared to CRNS. At SR PS was systematically higher (~0.25 m 3 ·m -3 ) than CRNS. At CS PS SM dried down quicker and 30 Hydrol. Earth Syst. Sci. Discuss., doi:10.5194/hess-2016-558, 2016 Manuscript under review for journal Hydrol. Earth Syst. Sci. Published: 8 November 2016 c Author(s) 2016. CC-BY 3.0 License. then stayed at a systematically wetter soil moisture content (0.1 m 3 m -3 higher) than CRNS soil moisture. At MM the seasonality was the main difference, with PS being up to 0.1 m 3 m -3 drier in summer and 0.05 m 3 m -3 wetter during winter. At TR PS was peakier and during wet periods (winter) systematically higher (up to 0.2m m 3 m 3 ) than CRNS. At AR PS was mostly wetter.
Also at WR PS was systematically wetter (up to 0.10 m 3 ·m -3 ). During spring PS dried down slower than CRNS. At MO PS was wetter as well (up to 0.20 m 3 ·m -3 ) but also showed a higher seasonality (up to 1.5 times larger amplitude) than CRNS. 5 PS SM observations were systematically higher than CRNS SM observations at eight of twelve sites. In addition, the CRNS data showed high frequency variation, especially clear at SO, MM, TR, WR, and MO. This was an effect of inherent randomness in neutron radiation reaching the CRNS sensor element. This effect was more pronounced for lower neutron intensity, when there was relatively more hydrogen prevalence in the surroundings of the CRNS.

Similarity metrics 10
The differences seen in Figure 4 are also summarised in Figure 5, which shows a gradual site to site increase in RMSD between observed PS and CRNS and soil moisture data series (RMSD-SMobs). However, between WR and MO there is a sudden ~60% increase in RMSD-SMobs. The systematic bias mentioned in the soil moisture time series analysis for SR reflects here in the large (~90%) contribution from the difference in mean soil moisture. The difference in seasonality mentioned in the time series analyses for MM reflects in the large contribution (~50%) from seasonality (i.e. standard deviation). The higher soil moisture 15 content for PS at TR during winter shows up in the relatively large contribution (~60%) from seasonality. The slower dry down at DC is reflected by the high contribution (~70%) from dynamics (coefficient of correlation). Overall, bias contributed to 50% or more of the total error at seven out of twelve sites.
Scatter plots of RMSD-SMobs against mean and standard deviation of observed PS and CRNS soil moisture content (not shown) revealed that the difference between PS and CRNS soil moisture content is larger at wetter sites. A similar plot, with RMSD-20 SMobs against mean and standard deviation of precipitation (also not shown) showed a weaker correlation. No correlation was found between MSE-SMobs decomposition and wetness conditions. Dominant vegetation type seemed not to have an effect on the similarity between the two soil moisture data products: forested sites included those with relatively small RMSD-SMobs (UM, SO, ME) and those with relatively high RMSD-SMobs (MM, WR, MO). The same holds for grass sites (KE, TR, AR).
Only bare/shrub covered sites (DC, SR, CS) were all below the sites' average RMSD-SMobs of 0.049 m 3 m -3 . These four sites 25 were however also relatively dry. No correlation between dominant land cover and decomposition was found. The grassy sites had the most even distributions between the three decomposition terms. Soil type and soil bulk density were also investigated for correlations with RMSD-SMobs, but no trends were discovered.
Larger differences between PS and CRNS soil moisture could be expected at sites with more heterogeneous soil or vegetation.
Static satellite photos of the sites from the COSMOS website did not indicate systematically more heterogeneous conditions 30 at the sites where PS and CRNS soil moisture differed more. Site info (e.g. topography, presence of rocks) from COSMOS and Ameriflux did also not clearly show more horizontally heterogeneous soil properties for those sites. Concluding, the differences between the two soil moisture products could not be clearly related to physical site characteristics besides the mean soil wetness.

Discussion of data analyses results
The fact that the soil moisture time series of PS and CRNS differed from each other in various ways could be related to a number of issues. First, PS sensor types and numbers of sensors differed between the sites ( Figure A1 of Appendix A). 5 Secondly, the installation methods may also have been different between the sites. Thirdly, the exact installation locations of the PS sensors may in certain cases have been for instance next to a macropore, or near roots, while at other sites they were coincidently installed in a homogeneous soil patch.
The presence of hydrogen pools other than soil moisture (e.g. biomass, intercepted water, and water in litter layer) also affect the observed CRNS neutron count. Because different hydrogen pools are more present at certain sites than at others, the 10 uncertainty on neutron count observations varies between sites. The results did however not show effects of land cover and soil properties on the similarity between the two soil moisture products. Another factor is that, the quality of the calibration of COSMIC could possibly be different for different sites. Finally, at multiple sites, the PS and CRNS soil moisture time series were quite similar. This could be expected at rather homogeneous sites. Moreover, Kohli et al. (2015) suggested the CRNS footprint to be around 150-200 m instead of 300 m as reported by Desilets and Zreda (2013). In that case the differences 15 between the two soil moisture observation techniques could be smaller than initially thought.

Limitations of the data comparison methodology
We derived vertically constant CRNS soil moisture values from observed neutron counts with COSMIC. This method contains inherent uncertainty because in reality soil moisture is often not vertically homogeneous. Comparing these derived CRNS soil moisture values with PS soil moisture data integrated to certain layers (0-10cm and 10-35cm) is therefore a deviation from 20 field conditions. PS installation depths differed between sites and CRNS measurement depth varied between sites and over time. The comparison might therefore have been `more valid' at some sites than at others.
The different lengths (between sites) of the time series used can have affected the metrics and sites' ranking. However, we used at least one year of data at all sites.
For these reasons these data analyses should be seen as a first simple comparison of the two data types only. The outcomes of 25 the calibration (against PS and CRNS) and validation provide insight in the effects of the differences between the two soil moisture products on JULES' surface energy flux partitioning and latent heat flux simulation.

Calibration
Calibration reduced the objective function (RMSE-SM) values in all cases ( Figure 6). However, the degree to which this happened differed between sites and the two calibration strategies (PS/CRNS), with decreases of 21% (AR-PS) to 93% . While the errors of the default runs existed mostly of systematic bias, after calibration the difference in dynamics was Hydrol. Earth Syst. Sci. Discuss., doi:10.5194/hess-2016-558, 2016 Manuscript under review for journal Hydrol. Earth Syst. Sci. the largest source of uncertainty and in 16 out of 24 cases this contribution actually increased in absolute terms. This happened because peaks and valleys became less extreme after calibration. Higher peaks yield higher correlation (with scatter plot slope > 1) because the relatively higher peaks within a time series differ more clearly from smaller (noise) peaks (e.g. SR-PS, WR in Figure 7). When the peaks are then reduced by calibration, the high peaks do not stand out as much anymore and the correlation decreases. 5 The RMSE values reduced relatively more for the PS calibration than for the CRNS calibration. The calibration method could possibly explain this. CRNS calibration was against observed neutron counts, while PS calibration was against observed soil moisture contents. Because neutron counts have an inverse relationship with soil moisture content, the PS calibration was possibly governed by avoiding larger errors occurring during a few brief soil moisture peaks. While focussing on getting the fitting for those peaks right, the PS calibration neglected the smaller errors during dry periods. This would then result in 10 relatively smaller decrease in RMSE values than for the CRNS calibrations because those were fitted with heavier weights to the drier periods. Figure 6 also shows that the relative improvement was not systematically lower or higher for sites with higher similarity between the two observed soil moisture time series (actually the largest improvement was for the CRNS calibration at UM).
The quality of the default runs was hence maybe simply determined by the quality of the chosen default parameter values. 15 Figure 7 shows soil moisture time series for four selected sites: UM was chosen because PS and CRNS soil moisture were most similar there, SR was a site with moderate difference, and at WR and MO PS and CRNS soil moisture were most different.
The soil moisture time series approached the observations better in all cases, especially at UM and WR, where the default runs overestimated soil moisture contents. The simulated soil moisture dynamics at WR approached those of PS and CRNS observed soil moisture closely, while these differed from each other substantially. 20

Validation
While calibration errors decreased, the Evaporative Fraction estimation improved for eleven out of twenty-four calibrations ( Figure 8a) and we observed similar results for latent heat flux ( Figure A2.1 of Appendix 2). This basically means that an improvement in simulated soil moisture did not necessarily lead to better estimation of surface energy fluxes.
In fact, the magnitude of those improvements (EF: 3% to 30%; LE: 1% to 37%) were comparably to the deterioration from 25 other calibration cases (EF: 2% to 28%; LE: 1% to 29%). The results also showed no systematically greater improvement across all sites for either of the two calibration strategies. Compared to the default case, CRNS calibration yielded lower RMSE-EF at four sites while PS calibration yielded better EF at eight sites. In Figure 8b 10% change in EF estimation was chosen to distinguish substantial change from non-substantial change, derived from the approximate error in EC sensible heat flux data (Finkelstein and Sims, 2001). It shows that in just three cases for PS calibration and three cases for CRNS calibration 30 the improvement was actually substantial. At WR and CS, PS calibration improved EF most, while at these same sites CRNS calibration did not yield improvement. Finally, larger differences between the two observed soil moisture time series did yield better performance during these periods compared to periods with SMroot zone>smcrit, after both PS and CRNS calibration.
We explored trends between relative improvement in surface energy flux estimation and soil wetness, precipitation, vegetation type, vegetation height, and soil characteristics. No clear trends were discovered. The only feature that could be distinguished was that for the two sites with a bare soil tile (DC and SR) PS and CRNS calibration improved EF and LE. Rooting depth did also not explain relative improvement in surface energy flux estimation. 10 In Figure

Were calibrated parameter values physically feasible?
If parameter values obtained after calibration were not physically feasible (e.g. representing a sandy soil while there was a clay 20 soil) then, if model structure is assumed to represent biophysical processes sufficiently well, that could yield undesirable results. We investigated whether the parameter values obtained were `realistic' by seeing if they were substantially different from the soil conditions reported by HWSD (used to obtain default values), COSMOS, and Ameriflux. We analysed the calibrated parameter values from the single-objective calibrations for each site with the parallel coordinate plots in Figure A2 Although the parameter calibration range of sathh was rather wide compared to the range of soil types from HWSD at the 30 twelve sites, the values were on, or very near to the boundaries at all sites for the default runs and in thirteen cases after the calibrations. The saturated hydraulic conductivity was on the edges for 22 runs. It should be noted however that, especially for Hydrol. Earth Syst. Sci. Discuss., doi:10.5194/hess-2016-558, 2016 Manuscript under review for journal Hydrol. Earth Syst. Sci. Published: 8 November 2016 c Author(s) 2016. CC-BY 3.0 License. the saturated hydraulic conductivity, the parameter calibration range was non-linear, with a factor 1000 difference between the upper and lower boundaries.
COSMOS/Ameriflux soil information (Table 1) for UM suggests sandier conditions, which would yield higher saturation hydraulic conductivity, which was indeed obtained after PS calibration. At SO calibration yielded higher values for the saturation hydraulic conductivity and a lower value for b, which is in line with the thick organic layer occurring there. At the 5 KE site, the soil was reported to be coarser (and also containing stones) than actually reported by the HWSD. This would mean higher saturation hydraulic conductivity, but a higher values was not obtained after calibration. At SR and MM the soil was reported to be of finer texture than the data from HWSD. Lower satcon values than the defaults were however not obtained.
At TR a clay hardpan was found at 30-40 cm, which would impede drainage and calibration could hence be expected to yield lower satcon. This did however not happen after our calibrations. AR and WR were reported to be sandy, in contrast with the 10 loam soils reported by HWSD. Higher satcon values were found after CRNS calibration at AR and after both calibrations at WR.
We could think of a range of reasons why apparently non-physically realistic parameter values were found, for instance:  Horizontal and vertical soil heterogeneity (e.g. macro-pores, soil layering in organic matter and particle sizes, stones) was not (properly) accounted for in the model 15  The vertical discretisation of the soil (layers of 10, 25, 65, and 200 cm) is not suitable for solving Richards' Equation (Beven and Germann, 2013)  Richards' Equation applicability at larger horizontal scales is questionable for multiple reasons (Beven and Germann, 2013).  JULES' bottom boundary condition (free drainage) may not have been suitable in all cases. We used Figure 3 from 20 Fan et al. (2013) to get an idea of the water table depth. This did not indicate the presence of shallow groundwater tables (< 5m), which would have made free drainage unsuitable as bottom boundary condition. However, the resolution of the used map was rather coarse for our purpose. Moreover, other conditions, such as perched groundwater tables and shallow hard rock could be present and impede free drainage.  The wilting point and critical point soil moisture contents were computed as a function of soil properties only; 25 pressure heads of -16000 cm and -300 cm respectively were used regardless of vegetation type. More complex assumptions are however made in more complex models. For instance, a common soil water stress function, similar to the one used in JULES, is the Feddes function (Feddes et al., 1978). This function can take into account two different critical soil moisture content values; one for high potential transpiration and one for low potential transpiration. Moreover, the critical soil moisture pressure (-300 cm in JULES) can take different values for 30 different vegetation types. Taylor and Ashcroft (1972) for instance reported values between -150 cm and -15000 cm for crops and grass. The results presented in Section 4.3 indicated these two parameters might govern the soil wetness importantly in JULES.  Physically realistic parameters often provide unrealistic results (Gupta et al., 1998).

Two-objective calibration against soil moisture and latent heat flux 35
The single-objective calibration against PS and CRNS soil moisture did in thirteen of twenty-four cases not yield better surface energy fluxes and in only six cases was improvement substantial. This was not completely surprising because single-objective calibration is often insufficient to constrain parameters to simulate different states and fluxes well (Gupta et al., 1998). In addition, Vereecken et al. (2015) argued that calibrating soil hydraulic parameters against soil moisture only does not guarantee Hydrol. Earth Syst. Sci. Discuss., doi:10.5194/hess-2016-558, 2016 Manuscript under review for journal Hydrol. Earth Syst. Sci. Published: 8 November 2016 c Author(s) 2016. CC-BY 3.0 License. better surface energy fluxes. To find out whether it was actually feasible to expect better surface energy flux simulation when soil moisture was improved we performed calibrations where we optimised the model for two objectives simultaneously. We employed the BORG algorithm to simultaneously optimise daytime latent leat flux RMSE and soil moisture RMSE (using PS and CRNS separately). Both optimisation experiments are defined as PS/LE and CRNS/LE.
The results from the PS/LE and CRNS/LE calibrations are shown in Figure 10 and Figure 11 respectively. From the results 5 we learned that for five of the 12 PS calibrations and four of the 12 CRNS calibrations a substantially better LE estimation than obtained with the single-objective calibration could have been obtained while maintaining a similar soil moisture error (black points, representing all two-objective calibration runs, were present right below the single objective calibration solutions).
The shapes of the Pareto fronts did not suggest automatic improvement in LE estimation with improved soil moisture 10 estimation except in a few cases (e.g. DC in higher RMSE-SM range and SO). To have obtained such automatic improvement, the lower edges of the point clouds should have shown positive correlations.
We plotted the compromise solutions (the runs which did relatively well for both RMSE-SM/N and RMSE-LE) on top (green triangles) to see if a two-objective calibration would have led to improved soil moisture and latent heat flux. This was the case for all PS/LE compromise solutions except at TR and for all CRNS/LE compromise solutions except at UM. This suggests 15 that two-objective calibrations could have yielded improved soil moisture and latent heat flux. This is also clear from Figure   13, where the relative improvements for the compromise solutions with respect to the default solutions are shown.
The large differences in EF/LE simulation between the PS and CRNS single-objective calibrations at WR and MO seem, based on Figure 10 and Figure 11, mostly a coincidence of which best solution was chosen. At WR a CRNS-calibration solution with the same neutron count/soil moisture RMSE but better RMSE-LE could have been obtained, yielding LE performance similar 20 to the PS-calibration. At MO this was true for the PS calibration. At other sites where we saw differences between PS and CRNS (e.g. MM), the Pareto fronts of Figure 10 and Figure 11 suggest these differences could as well have been quite limited. Our results (Figure 10 and Figure 11; especially the rather horizontal lower edges of the point clouds in those figures), suggest that the coupling between soil moisture and latent heat flux was generally quite weak in JULES, even at sites where such coupling is expected to be relatively strong (e.g. DC, SR, KE, AR). Moreover, differences between surface energy flux simulation of PS and CRNS calibrations were minimal.

To which parameters were soil moisture and latent heat flux most sensitive? 30
We investigated if a change in some parameters had a relatively small effect on soil moisture, while at the same time having a large effect on latent heat flux. In such case calibration could yield better soil moisture, but the parameter value might be inappropriate for latent heat flux. The inverse (influential on SM but not on LE) could also occur. We explored this by Hydrol. Earth Syst. Sci. Discuss., doi:10.5194/hess-2016-558, 2016 Manuscript under review for journal Hydrol. Earth Syst. Sci. Published: 8 November 2016 c Author(s) 2016. CC-BY 3.0 License.
performing a sensitivity analysis with Morris' method (Morris, 1991) as implemented in the SAFE Toolbox (Pianosi et al., 2015) on the exact same parameter value ranges as used during the calibration. We computed the sensitivity indices (mean and standard deviations of the elementary effects) on the RMSE values (i.e. our Objective Functions; OFs) of simulated vs.
observed PS soil moisture, simulated vs. observed CRNS neutron counts, and simulated vs. observed latent heat flux (day time only). 5 The results (shown for four sites in Figure A2.2 of Appendix 2) were consistent across most sites: all three OFs were most sensitive to changes in parameter b and least sensitive to the wilting point multiplier. Finch and Haria (2006) also found JULES parameter b to be most influential on soil moisture and latent heat, at a UK chalk site. The critical point soil moisture content was influential with respect to soil moisture / neutron counts but had at all sites less effect on latent heat flux. The lack of effect from the wilting point soil moisture content can probably be attributed to the use of the multiplier, despite being a common 10 approach (Prihodko et al., 2008;; a certain multiplier value could be good in combination with a certain value for the critical point but not for a different critical point value.

Could JULES model structure explain the limited improvement in surface energy flux estimation?
We found multiple causes for the lack of improvement in surface energy fluxes after calibration against soil moisture data and the quite weak coupling between JULES soil moisture and latent heat flux seen from the single and two-objective calibrations 15 (PS-SM/LE and CRNS-N/LE). One cause was that even when root zone soil moisture changed considerably (not shown), soil moisture stress changed little (self-adjusting behaviour, Figure 13), and hence there was no considerable change in surface energy partitioning. We found this to be the main reason for KE, MO-CRNS, and TR-CRNS.
AR-CRNS, ME-CRNS, SR-PS, and SO-PS however did yield change in soil moisture stress (Figure 13), but other factors limited improvement in EF/LE. AR-CRNS, SR-PS, and SO-PS did yield different EF and LE timeseries, but because 20 performance deteriorated during certain periods but improved during other periods, overall performance did not change. For instance, the CRNS calibration at AR yielded better EF and LE during wet periods with high LE, while during subsequent periods of drying the estimation was worse. EF and LE estimation after PS calibration at SR was better during dry periods while it was worse during wetter periods (monsoon). PS calibration at SO improved EF and LE during dry periods after summer while the simulation was worse in spring. Soil moisture stress at ME was relatively limited (beta mostly above 0.6) and during 25 periods of soil moisture stress, the latent heat flux was dominated by different stress factors.
At UM, CS, MM, WR, and MO, EF and LE were substantially different between single-objective PS and CRNS calibrations. Figure 13 shows that in these cases the soil moisture stress was very different. At UM, CS, and MO, root zone soil moisture was similar (not shown) between the PS and CRNS calibration solutions, causing different stress due to different wilting point and critical point soil moisture, while at MM and MO, the root zone soil moisture actually changed considerably. 30 These results indicate the relatively large effect of the wilting point and critical point soil moisture parameters on the calibration results. The self-adjusting behaviour can yield similar soil moisture stress for different root zone soil moisture, while in other cases the wilting point and critical point values can be such that simulated soil moisture is close to the observations but root Hydrol. Earth Syst. Sci. Discuss., doi:10.5194/hess-2016-558, 2016 Manuscript under review for journal Hydrol. Earth Syst. Sci. Published: 8 November 2016 c Author(s) 2016. CC-BY 3.0 License. zone soil moisture stress is different. With respect to the two-objective calibrations the self-adjusting behaviour contributed to the possibility of having highly different soil moisture but similar latent heat flux performance.

Discussion
Our results suggest a relatively weak coupling between soil moisture and evapotranspiration in JULES. In combination with the self-adjusting behaviour of wilting point and critical point soil moisture values this suggests that how soil moisture 5 observations are used to calibrate JULES should be carefully considered beforehand. In the Land Surface Modeling, community it is known that the absolute value of soil moisture content has limited information content for the model (Dirmeyer et al., 2000;Koster et al., 2009). However, we found the wilting and critical point soil moisture parameters not only to move up or down after calibration, but also to move closer when the standard deviation in simulated root zone soil moisture content decreased. Our findings might also have implications for data assimilation if certain critical and wilting point soil moisture 10 parameter values are chosen that do not match the magnitude of the assimilated soil moisture.
Finally, the potential differences in terms of contribution to model performance observed between the PS and CRNS sensors are undermined by the weak coupling in the current model structure of JULES. Namely, even at sites where the two observed soil moisture time series differed most, surface energy partitioning simulation did not differ substantially between PS and CRNS calibration. As mentioned in Section 2.3 the calibration approach differed for the two soil moisture products; we 15 calibrated against PS soil moisture and CRNS neutron counts. We chose to calibrate against neutron counts because it is the way in which CRNS data could be used in the context of LSM when the COSMIC operator is used. If we had calibrated against representative CRNS soil moisture values, we might have found worse fits, more similar to those of the PS calibrations.
Previous research has indicated that soil moisture alone is insufficient to estimate soil hydraulic parameters (Vereecken et al., 2008). Vereecken et al. (2015) commented that even improved land surface models in combination with better soil moisture 20 observations do not necessarily yield correct latent and sensible heat flux estimation. Our findings support this.

Conclusions
We investigated whether reducing the spatial scale mismatch between the surface energy flux data and soil moisture data could be reduced through the use of Cosmic-Ray Neutron Sensors (CRNS). Five soil-evapotranspiration parameters of LSM JULES were calibrated against Point Scale (PS) and CRNS soil moisture data separately, for twelve sites with different climate, land 25 cover, and soil properties. Next, at each site, the improvement in surface energy partitioning and latent heat fluxes for the two calibration solutions was assessed by comparing the fit with EC data and a version of LSM JULES runs with default parameter values based on a widely used soil database. Before the calibrations we compared the observed soil moisture data from the two sensor types. These analyses showed the differences between PS and CRNS soil moisture happen in different ways at the investigated sites. While at certain sites there were mainly systematic biases, at other sites the seasonality was more different , 30 Hydrol. Earth Syst. Sci. Discuss., doi:10.5194/hess-2016-558, 2016 Manuscript under review for journal Hydrol. Earth Syst. Sci. while at other sites different time series dynamics were the main cause of differences between the two soil moisture observations. We found the difference between the two soil moisture products to be larger at wetter sites.
The calibration of JULES parameters did not necessarily yield an improvement in model surface energy partitioning and latent heat flux simulation. Single-objective (soil moisture) and multi-objective calibrations (soil moisture and latent heat flux) did not yield substantial differences between PS sensor and CRNS calibrations in simulating latent heat flux at eleven of the twelve 5 sites. Moreover, at sites where the observed soil moisture time series from the two observation techniques diverged more, the differences between the resulting surface energy fluxes were not larger. The assumed benefits of a larger footprint of the Cosmic-Ray Neutron Sensor were not clear to JULES due to its weak coupling behaviour observed in this study. We found a few causes of the lack of improvement: (1) the wilting point and critical point soil moisture moved up or down in a similar way as the root zone soil moisture did, yielding similar transpiration for different soil moisture conditions; (2) calibration 10 against soil moisture improved surface energy fluxes during certain periods, but deteriorated surface energy fluxes during other periods, yielding similar overall performance; (3) soil moisture was not the main surface energy partitioning controlling factor.
Because our findings indicate that the coupling between model soil moisture and evapotranspiration in JULES is somewhat weak, we recommend to improve the representation of the related land surface processes at sub-kilometre scale; the spatial scale which is consistent with Cosmic-Ray Neutron Sensor and Eddy Covariance flux data. 15     Table 2 for detailed description of parameters.

Site
Lw,in Rnet