Evaluation of a cosmic-ray neutron sensor network for improved land surface model prediction

Abstract. In situ soil moisture sensors provide highly accurate but very local soil moisture measurements, while remotely sensed soil moisture is strongly affected by vegetation and surface roughness. In contrast, cosmic-ray neutron sensors (CRNSs) allow highly accurate soil moisture estimation on the field scale which could be valuable to improve land surface model predictions. In this study, the potential of a network of CRNSs installed in the 2354 km2 Rur catchment (Germany) for estimating soil hydraulic parameters and improving soil moisture states was tested. Data measured by the CRNSs were assimilated with the local ensemble transform Kalman filter in the Community Land Model version 4.5. Data of four, eight and nine CRNSs were assimilated for the years 2011 and 2012 (with and without soil hydraulic parameter estimation), followed by a verification year 2013 without data assimilation. This was done using (i) a regional high-resolution soil map, (ii) the FAO soil map and (iii) an erroneous, biased soil map as input information for the simulations. For the regional soil map, soil moisture characterization was only improved in the assimilation period but not in the verification period. For the FAO soil map and the biased soil map, soil moisture predictions improved strongly to a root mean square error of 0.03 cm3 cm−3 for the assimilation period and 0.05 cm3 cm−3 for the evaluation period. Improvements were limited by the measurement error of CRNSs (0.03 cm3 cm−3). The positive results obtained with data assimilation of nine CRNSs were confirmed by the jackknife experiments with four and eight CRNSs used for assimilation. The results demonstrate that assimilated data of a CRNS network can improve the characterization of soil moisture content on the catchment scale by updating spatially distributed soil hydraulic parameters of a land surface model.


Introduction
Soil water content (SWC) is a key variable of land surface hydrology and has a strong control on the partitioning of net radiation between latent and sensible heat flux (Brutsaert, 2005). Knowledge of SWC is relevant for the assessment of plant water stress and agricultural production, as well as runoff generation as a response to precipitation events Robinson et al., 2008). In atmospheric circulation models, SWC is important as a lower boundary condition, while it is calculated as a state variable in land surface models. Coupling of atmospheric circulation models and land surface models allows for the quantification of the role of soil moisture on atmospheric processes such as soil moisture-precipitation feedbacks (Koster et al., 2004;Eltahir, 1998) and summer climate variability and drought (Seneviratne et al., 2006;Oglesby and Erickson, 1989). It is therefore important to improve the modelling and prediction of SWC. Data assimilation of soil moisture provides a way to improve imperfect land surface model predictions. Here, soil moisture measurements are used to update model predictions by optimally considering the uncertainty of model initial conditions, model parameters and model forcings. However, there is a lack of high-quality soil moisture data . Soil moisture measured by spaceborne remote sensing technologies provides information over large areas but is strongly affected by vegetation and surface roughness (e.g. Temimi et al., 2014). Therefore, in this paper an alternative source for soil moisture information is explored which can measure soil moisture more accurately under dense vegetation (Bogena et al., 2013). Cosmic-ray neutron sensors (CRNSs) measure fast neutron intensity on an intermediate scale of ∼ 15 ha (Kohli et al., 2015;Zreda et al., 2008), which is the desired application scale of land surface models (Ajami et al., 2014;Chen et al., 2007;Shrestha et al., 2014). Fast neutrons originate from collisions of secondary cosmic particles from outer space with terrestrial atoms. Fast neutrons in turn are moderated most effectively by hydrogen because the mass of a neutron is similar to that of a nucleus of the hydrogen atom. Therefore, the corresponding fast neutron intensity measured by CRNSs strongly depends on the amount of hydrogen within the CRNS footprint, allowing for a continuous non-invasive soil moisture estimate on the field scale. The spatial extent of this measurement is desirable as it matches with the desired grid cell size of a high-resolution land surface model (Crow et al., 2012), and small scale heterogeneities are averaged over a larger area Kohli et al., 2015). Vertical measurement depth ranges from a maximum of ∼ 70 cm under completely dry conditions and decreases to roughly ∼ 12 cm under wet conditions (e.g. 40 vol. % soil moisture) (Kohli et al., 2015;Franz et al., 2012). Worldwide several CRNS networks exist, such as the North American COSMOS network , the German CRNS network  installed in the context of the TERENO infrastructure measure (Zacharias et al., 2011), the Australian COSMoZ network (Hawdon et al., 2014) and the British COSMOS-UK (Evans et al., 2016).
In this work, fast neutron intensity data measured by CRNSs are assimilated in a land surface model, to evaluate the impact those data can have on improving soil moisture characterization and land surface model predictions. The ensemble Kalman filtering (EnKF) is one of the most commonly applied data assimilation methods (Evensen, 1994;Burgers et al., 1998). The EnKF is much less CPU-intensive compared to alternative methods such as the particle filter (e.g. Montzka et al., 2011), because for high-dimensional problems the EnKF requires a much smaller ensemble size to achieve reasonably good predictions. The ensemble Kalman filter (Reichle et al., 2002a;Dunne and Entekhabi, 2005;Crow, 2003;De Lannoy and Reichle, 2016), variants such as the extended Kalman filter (Draper et al., 2009;Reichle et al., 2002b) and the local ensemble transform Kalman filter (Han et al., 2013(Han et al., , 2015 were applied for updating soil moisture states in land surface models. Reichle et al. (2002a) performed a synthetic experiment using L-band microwave observations of the Southern Great Plains Hydrology Experiment (Jackson et al., 1999) to analyse the effect of ensemble size and forecast errors. Dunne and Entekhabi (2005) showed that an ensemble Kalman smoother approach, where data from multiple time steps were assimilated to update current and past states, can yield a reduced prediction error compared to a pure filtering approach. More recently, state updates with the EnKF were tested for the Soil Moisture Ocean Salinity (SMOS, Kerr et al., 2012) mission. De Lannoy and Reichle (2016) assimilated SMOS temperature brightness and soil moisture retrievals into a land surface model with large improvements in surface soil moisture. However, localized error patterns were not captured well enough, and locally optimized EnKF error parameters would improve prediction results further.
More recent work addressed joint updating of model states and parameters in hydrologic and land surface models with data assimilation methods. Joint state-parameter estimation with EnKF is possible by an augmented state vector approach (Chen and Zhang, 2006), a dual approach (Moradkhani et al., 2005) or an approach with an additional external optimization loop (Vrugt et al., 2005). In the augmented state vector approach, parameters are included in the state vector and are updated via cross-covariances between states and parameters. The cross-covariances are estimated from the ensemble. In the dual approach, first parameters are updated by data assimilation, and the assimilation step is repeated with the updated parameters to also update the states by data assimilation. In the approach with an external optimization loop the parameters are not updated by EnKF, but in an external optimization loop. Pauwels et al. (2009) were one of the first to optimize soil hydraulic parameters of a land surface model by data assimilation, assimilating synthetic aperture radar data. Lee (2014) used synthetic aperture radar soil moisture data to estimate soil hydraulic properties at the Tibetan plateau using the EnKF and a soil-vegetationatmosphere transfer model. Bateni and Entekhabi (2012) assimilated land surface temperature with an ensemble Kalman smoother and achieved a better estimate of the partitioning of energy between sensible and latent heat fluxes. Han et al. (2014) updated soil hydraulic parameters of the Community Land Model version 4.5 (CLM) by assimilation of synthetic brightness temperature data with the local ensemble transform Kalman filter (LETKF) (Hunt et al., 2007) and showed the potential of this approach for improving land surface states and fluxes like evapotranspiration. Shi et al. (2014) used the ensemble Kalman filter for a synthetic multivariate data assimilation problem with a land surface model and then applied it to real data (Shi et al., 2015). Both cases illustrate that parameters from different compartments can be updated successfully by multivariate data assimilation. Kurtz et al. (2016) developed a particular CPU-efficient data assimilation framework for the coupled land surface-subsurface model TerrSysMP (Shrestha et al., 2014). They successfully updated 2 × 10 7 states and parameters in a synthetic experiment. Whereas these studies were made with land surface models, in soil hydrological applications recently data assimilation was also used to estimate soil hydraulic parameters. Early work was by Margulis (2011, 2013) in the context of real-time control of wastewater reuse in irrigation and also showed the potential of EnKF in soil hydrol-ogy. Montzka et al. (2011Montzka et al. ( , 2013 explored the role of the particle filter for handling non-Gaussianity in soil hydrology data assimilation. They showed that the nonlinear character of the soil moisture retention characteristic is critical for joint state-parameter estimation in data assimilation systems and showed that the particle filter is an interesting alternative for soil hydraulic parameter estimation for 1-D problems. Erdal et al. (2014) investigated the role of bias in the conceptual soil model and explored bias-aware EnKF as a way to deal with it. They argued that the exact location of soil layers is often not known and that this can severely deteriorate the performance of EnKF. Song et al. (2014) worked on a modified iterative EnKF-based filter to handle the non-linearity and non-Gaussianity of data assimilation for the vadose zone. They proposed a modified procedure which avoids the high CPU need of a fully iterative method, but which still gives stable results. Erdal et al. (2015) also focussed on handling of strong non-Gaussianity of the state variable in EnKF under very dry conditions. They showed that classical EnKF fails under such conditions and proposed two alternative strategies, both involving transformation of state variables, which also performed favourably under very dry conditions with strongly skewed pressure distributions. All these studies on joint state-parameter estimation showed in general that estimation of soil hydraulic or land surface parameters improves model predictions (strongly), but can be unstable for strongly non-Gaussian distributions and nonlinear problems. For a further literature review on data assimilation in the context of hydrological and land surface models, we refer to Reichle (2008) and Montzka et al. (2012). Shuttleworth et al. (2013) developed the Cosmic Ray Soil Moisture Interaction Code (COSMIC), which is a forward operator to be applied for assimilating neutron intensity observations from CRNS. The COSMIC code was evaluated for several sites Rosolem et al., 2014). The COSMIC operator was successfully implemented in the Data Assimilation Research Testbed (Rosolem et al., 2014) to allow for state updating by the ensemble adjustment Kalman filter (Anderson, 2001). The surface soil moisture information was propagated into greater soil depth than only the measurement depth using COSMIC in combination with data assimilation (Rosolem et al., 2014). The COSMIC operator was implemented in a python interface that couples the land surface model CLM and the LETKF for joint state-parameter updating (Han et al., 2015). Neutron counts measured by CRNSs have been used in data assimilation studies to update model states (Han et al., 2015;Rosolem et al., 2014). Soil hydraulic parameters were also updated by assimilation of neutron counts in one synthetic study (Han et al., 2016), showing its feasibility. CRNSs were also used for inverse estimation of soil hydraulic parameters of the HYDRUS-1D model (Villarreyes et al., 2014).
This work further explores the value of measured neutron intensity by CRNSs to improve modelling of terrestrial sys-tems on the catchment scale (Simmer et al., 2015) using a land surface model. The main novelties are as follows: i. Data from a network of nine CRNSs were assimilated in the CLM with an evaluation of the information gain by this assimilation at the catchment scale. Until now, evaluations with CRNSs were made for a single location, but not for a complete network of CRNSs. A very important question is whether CRNSs can also improve the soil moisture characterization on the catchment scale. The high variability of soil moisture at a short distance could potentially limit the CRNS measurement value and make updating of soil moisture contents further away from the sensor meaningless. Conversely, soil moisture, soil maps and atmospheric forcings show spatial correlations over larger distances (Kirkpatrick et al., 2014;Korres et al., 2015), which suggests that CRNS measurements potentially carry important information to update soil moisture contents for larger regions (e.g. Han et al., 2012). If it is found that CRNS networks with a density such as that in this study (nine stations per 2354 km 2 ) can improve soil moisture content characterization on the catchment scale, this is of high relevance and importance for agricultural applications, flood prediction and protection, and regional weather prediction (Whan et al., 2015;Koster et al., 2004;Seneviratne et al., 2010). The main research question addressed in this paper is therefore whether a CRNS network of the density as in this study can improve large-scale soil moisture characterization.
ii. Soil hydraulic parameters are updated in this study together with the soil moisture states in a real-world case study. The study in this paper also allows some evaluation of the updated large-scale soil hydraulic parameters.
2 Materials and methods

Site description and measurements
The model domain, the Rur catchment (2354 km 2 ), is situated in western Germany and illustrated in Fig. 1. The altitude varies between 15 m a.s.l. in the flat northern part and 690 m a.s.l. in the hilly southern part. Precipitation, evapotranspiration and land use follow the topography. The dominant land use types are agriculture (mainly in the north), grassland, and coniferous and deciduous forest. Annual precipitation ranges between less than 600 mm in the north and 1200 mm in the hilly south (Montzka et al., 2008). Annual potential evapotranspiration varies between 500 mm in the south and 700 mm in the north (Bogena et al., 2005). The Rur catchment CRNS network comprises nine CRNSs (CRS1000, HydroInnova LLC, 2009) which were installed  Figure 1. Map of the Rur catchment and locations of the nine cosmic-ray neutron sensors. The hilly south of the catchment is prone to more rainfall, lower average temperatures and less potential evapotranspiration than the north of the catchment. in 2011 and 2012 . Climate and soil texture of the CRNS sites can be found in Table 1.
The CRNSs were calibrated in the field using gravimetric soil samples. At each site, 18 soil samples were taken along 3 circles with distances of 25, 75 and 175 m from the CRNS, and 6 samples were evenly distributed along each circle. Each sample was extracted with a 50.8 × 300 mm round HUMAX soil corer (Martin Burch AG, Switzerland). The samples were split into 6 sub-samples with 5 cm length each and oven dried at 105 • C for 48 h to measure dry soil bulk density and soil moisture. Lattice water, hydrogen from organic and an-organic sources, was determined for each site using a heat conductivity detector (Ray, 1954). Soil bulk density, soil moisture, lattice water and 12 h averaged measured neutron intensity were used to determine calibration parameters specific to each CRNS and the COSMIC operator. This represents a compromise between the measurement noise (which follows a Poisson distribution) and the assumed variation of environmental variables over the averaging time window (Iwema et al., 2015).

Community Land Model and parameterization
The CLM was the land surface model of choice for simulating water and energy exchange between the land surface and the atmosphere (Oleson et al., 2013). Some of the key processes which are modelled by CLM are radiative transfer in the canopy space, interception of precipitation by the vegetation and evaporation from intercepted water, water uptake by vegetation and transpiration, soil evaporation, and photosynthesis, as well as water and energy flow in the subsurface. SWC in CLM is influenced by precipitation, infiltration into the soil, water uptake by vegetation, surface evaporation, and surface and subsurface runoff. To limit the scope and complexity of this study, CLM was run using satellite phenology, e.g. prescribed leaf area index data and the biogeochemical module turned off. The biogeochemical module allows CLM to model the vegetation development dynamically, but it requires a large spin-up of 1000 years, and little additional gain is expected for this study from these additionally modelled processes.
Vertical water flow in soils is modelled by the 1-D Richards equation. Soil hydraulic parameters are determined from sand and clay content using pedotransfer functions for the mineral soil fraction (Clapp and Hornberger, 1978;Cosby et al., 1984) and organic matter content for the organic soil fraction (Lawrence and Slater, 2008).
The joint state-parameter estimation used in this study updates soil texture and organic matter in CLM. Hence, param-eter estimates directly determine soil hydraulic properties in CLM. The following equations describe how soil texture and organic matter define the soil hydraulic properties in CLM such as porosity, hydraulic conductivity, the empirical exponent B and soil matric potential. Hydraulic conductivity (k(z i ), mm s −1 ) at the depth z between two layers (i and i + 1) is a function of soil moisture (θ, m 3 m −3 in layers i and i + 1), saturated hydraulic conductivity (k sat (z i ), mm s −1 ), saturated soil moisture (θ sat , m 3 m −3 ) and the empirical exponent B (Oleson et al., 2013): where φ ice is the ice impedance factor. The ice impedance factor was implemented to simplify an increased tortuosity of water flow in a partly frozen pore space. It is calculated with φ ice = 10 − F ice using the resistance factor = 6 and the frozen fraction of soil porosity F ice = θ ice /θ sat,i . Soil hydraulic properties are calculated separately for the mineral (min) and organic matter (om) soil components. Total porosity θ sat,i is calculated using the fraction of organic matter (f om,i ) with the following: where the organic matter porosity is θ sat,om = 0.9 and sand content in percentage (%) determines the mineral soil porosity θ sat,min as follows: Analogous, the exponent B is calculated with where B om = 2.7 is the organic exponent and the mineral exponent B min,i is determined by clay content in percentage (%) with the following: Saturated hydraulic conductivity is calculated for a connected and an unconnected fraction of the grid cell with the following: where f perc,i is the fraction of a grid cell where water flows with saturated hydraulic conductivity of the organic matter (k sat,om (z i ) in mm s −1 ) through the organic material only, the so-called connected flow pathway. The saturated hydraulic conductivity of the unconnected part (k sat,uncon (z i ), mm s −1 ) depends on organic and saturated mineral soil hydraulic conductivity: where saturated hydraulic conductivity for mineral soil is calculated from the grid cell sand content as follows: The fraction f perc is calculated with the following: Soil matric potential (mm) is defined as a function of saturated soil matric potential (mm) with the following: where saturated organic matter matric potential is ψ sat,om = −10.3 mm and saturated mineral soil matric potential is calculated from sand content as follows:

Cosmic-ray forward model
SWC retrievals were calculated from neutron intensity observations with COSMIC (Shuttleworth et al., 2013) following calibration results and the procedure of Baatz et al. (2014). COSMIC parameterizes neutron transport within the soil subsurface and was calibrated against the more complex Monte Carlo Neutron Particle model MCNPx (Pelowitz, 2005). COSMIC needs considerably less CPU time than the MCNPx model. The code was tested at multiple sites for soil moisture determination Rosolem et al., 2014) and analysed in detail by Rosolem et al. (2014). COSMIC assumes that a number of high-energy neutrons enter the soil. In the soil, the number of high-energy neutrons is reduced by interactions within the soil, leading to generation of fast neutrons in each soil layer. Before resurfacing, the number of fast neutrons is reduced again by their interaction with nuclei of elements within soil (Shuttleworth et al., 2013). The number of neutrons, N CRP , that reaches the CRNS can be summarized in a single integral as follows: where N COSMIC is an empirical coefficient that is CRNS-specific and needs to be estimated by calibration, R. Baatz et al.: Evaluation of a cosmic-ray neutron sensor network for land surface model prediction A(z) is the integrated average attenuation of fast neutrons, α = 0.404 − 0.101 × ρ S is the site-specific empirical coefficient for the creation of fast neutrons by soil, ρ S is the dry soil bulk density (g cm −3 ), ρ w is the total soil water density (g cm −3 ), and m s and m w are the masses of soil and water, respectively, per area (g cm −2 ) L 1 = 162.0 g cm −2 and L 2 = 129.1 g cm −2 are scattering lengths for fast neutrons in solids and water, respectively, that were estimated using the MCNPx code (Shuttleworth et al., 2013). The integrated average attenuation of fast neutrons A(z) can be found numerically by solving where γ is the angle along a vertical line below the CRNS detector to the element that contributes to the attenuation of fast neutrons, and L 3 = −31.65 + 99.29 × ρ S and L 4 = 3.16 g cm −2 are the scattering lengths for fast neutrons in soil and water, respectively, determined using the MCNPx code (Shuttleworth et al., 2013). The COSMIC operator is discretized into 300 layers of 1 cm thickness up to a depth of 3 m. For each CLM grid cell in the model domain, simulated SWC in all CLM layers is used to generate a weighted SWC retrieval using the COSMIC code. Simulated SWC is handed from the CLM simulation history files to the COSMIC operator. Given the vertical SWC distribution of the individual CLM grid cell, COSMIC internally calculates the contribution of each layer to the simulated neutron intensity signal at the soil surface in COSMIC. In this study, the contribution of each CLM soil layer was used to calculate the weighted CLM SWC retrieval corresponding to the vertical distribution of simulated SWC in each grid cell. Measured neutron intensity of CRNS was used to inversely determine a CRNS SWC retrieval, as by Baatz et al. (2014) assuming a homogeneous vertical SWC distribution. Then, the weighted CLM SWC retrieval is used in the data assimilation scheme to relate the CRNS SWC retrieval to the model state. Alternatively, neutron flux data could be assimilated directly within the catchment. This would require calibration data throughout the catchment, which is only feasible using spatially distributed data sets (e.g. Avery et al., 2016). However, high stands of biomass are a major factor for calibration in the Rur catchment , and estimates of biomass come along with high uncertainties. To circumvent the introduction of these additional uncertainties, SWC retrievals are assimilated in this study. Changes in onsite biomass were assumed to be negligible.

Data assimilation
To further expand the work of Han et al. (2016), this study uses the LETKF (Hunt et al., 2007) to assimilate SWC retrievals by CRNSs into the land surface model CLM. Updates were calculated either for SWC states or jointly for SWC states and soil parameters, depending on the experiment setup. For state updates only, the LETKF was used as proposed by Hunt et al. (2007). Calculations were made for an ensemble of model simulations which differed depending on variations in model forcings and input parameters. The states of the different ensemble members are indicated by x f i where i = 1, . . . , N and N is the number of ensemble members; "f" marks the model prediction or forecast before the update. The individual state vectors x f i contain the CLM-simulated SWC of the 10 soil layers and the vertically weighted SWC retrieval obtained with the COSMIC operator. For each grid cell, a matrix X f can be constructed which contains the deviations of the simulated states with respect to the ensemble mean x f : In the case of joint state-parameter updates, a state augmentation approach was followed (Hendricks Franssen and Kinzelbach, 2008;Han et al., 2014). In this case, the augmented model state matrix X f is constructed from the simulated SWC of the 10 soil layers, weighted SWC, and the grid cell's sand, clay and organic matter content.
In order to relate the measured neutron intensity with the simulated SWC of CLM, the observation operator H (COS-MIC) is applied on the measured neutron intensity in order to obtain the expected weighted SWC retrieval at each of the observation locations for each of the stochastic realizations: The ensemble realizations of the modelled SWC retrievals at the measurement locations y f 1 to y f N with respect to the ensemble mean y f are stored in the matrix Y f : The observation error correlation was reduced in space by the factor f red using the spherical model: where d is the distance to the observation and d max = 40 km is the maximum observation correlation length, about half the size of the catchment. Only SWC retrievals within the maximum observation correlation length were used for assimilation. This leads to a "localized" size of Y f and the observation error covariance matrix R. The intermediate covariance matrix P a (also called analysis error covariance matrix) is calculated according to the following: In addition, the mean weight vector w a is obtained as follows: where y 0 contains the CRNS SWC retrievals at the measurement locations. In the ensemble space, a perturbation matrix W a is calculated from the symmetric square root of P a : The final analysis X a is obtained from the following: A more detailed description of the LETKF can be found in Hunt et al., (2007), and details on the implementation of the LETKF in combination with CLM are given by Han et al. (2015).
3 Model and experiment setup

Model setup
In this study, discretization and parameterization of the hydrological catchment was done on the basis of highresolution data. The model of the Rur catchment was spatially discretized by rectangular grid cells of 0.008 • size (∼ 750 m). The model time step was set to hourly. Land cover was assumed to consist of vegetated land units only, and a single plant functional type (PFT) for each grid cell was defined. The plant functional types were derived from a remotely sensed land use map using RapidEye and ASTER data with 15 m resolution (Waldhoff, 2012). Contents of sand, clay and organic matter were derived from the highresolution regional soil map BK50 (Geologischer Dienst Nordrhein-Westfalen, 2009). Alternative simulations were also performed with the FAO soil map of the global Harmonized World Soil Database (FAO, 2012) and with a biased soil texture with a fixed sand content of 80 % and clay content of 10 % (S80 soil map). Average sand and clay content are 22.5 and 21.4 % for the BK50 soil map and 39 and 22 % for the FAO soil map, respectively. The FAO soil map and the biased soil map represent large error with respect to the soil properties of the BK50 soil map. The FAO soil map and S80 soil map simulations allow the evaluation of the joint stateparameter estimation approach because, given the expected bias, we can evaluate to what extent the soil properties are modified by the data assimilation. This is important because in many regions across the Earth a high-resolution soil map is not available. Land surface models are applied for those regions, for example in the context of global simulations, and hence might be strongly affected by the error in soil properties. Maximum saturated fraction, a surface parameter which is used for runoff generation, was calculated from a 10 m digital elevation model (scilands GmbH, 2010). Leaf area index data were derived from monthly-averaged Moderate Resolution Imaging Spectrometer data (MODIS). CLM was forced with hourly atmospheric data from the COSMO_DE reanalysis data set for the years 2010 to 2013 from the German Weather Service (Deutscher Wetterdienst, DWD). The data were downscaled from a resolution of 2.8 km × 2.8 km to the CLM resolution using linear interpolation based on Delaunay triangulation. Forcing data include precipitation, incident solar and longwave radiation, air temperature, air pressure, wind speed, and relative humidity at the lowest atmospheric level.

Model ensemble
Uncertainty was introduced into the regional CLM model by perturbed soil parameters and external model forcings. Contents of sand, clay and organic matter were perturbed with spatially correlated noise from a uniform sampling distribution with mean zero and standard deviations 10 and 30 % (Han et al., 2015). Soil texture perturbation considers that in CLM a single set of pedotransfer functions is assumed to be valid throughout the globe while pedotransfer functions are usually specific to regions (e.g. Patil and Singh, 2016). In other words, the perturbation of soil texture also covers the uncertainty in the pedotransfer function itself. By perturbing texture, soil parameters are also perturbed through the pedotransfer functions used in CLM as specified in Sect. 2.2. Precipitation (σ = 0.5 or 1.0; lognormal distribution) and shortwave radiation (σ = 0.3; lognormal distribution) were perturbed with multiplicative noise with mean equal to 1. Longwave radiation (σ = 20 W m −2 ) and air temperature (σ = 1 K) were perturbed with additive noise. The forcing perturbations were imposed with correlations in space (5 km) using a fast Fourier transform. Correlation in time was introduced with an AR(1) model with autoregressive parameter 0.33. These correlations and standard deviations were chosen based on previous data assimilation experiments (Reichle et al., 2010;Kumar et al., 2012;De Lannoy et al., 2012;Han et al., 2015). In this work, only results for precipitation perturbation with σ = 0.5 will be shown, as results for σ = 1.0 were similar. An ensemble size of 95 realizations was used in the simulations. Based on previous work , the SWC retrieval uncertainty for CRNS was estimated to be 0.03 cm 3 cm −3 while fluctuations in the measurement standard deviation, related to the nonlinear relation between observed neutron intensity and SWC, were assumed to be negligible. Table 2. Overview of simulation scenarios: open loop (OL- * ) with variation in the soil maps BK50, FAO and S80, data assimilation run with state update (Stt) or joint state and parameter update (PAR) with variation in the soil map perturbation (−10 and −30), and jackknife evaluation runs (jk8-S80-1 to 9, jk8-BK50-1 to 9 and jk4-S80-A to C).

Experiment set-up
All simulation experiments in this study used initial conditions from a single 5-year spin-up run in which a single forcing data set of the year 2010 was repeatedly used as atmospheric input. The soil moisture regime became stable after the 5-year spin-up period, and additional spin-up simulations would not affect soil moisture in the consecutive years. After this 5-year spin-up, soil parameters and forcing data of the consecutive years were perturbed. From 1 January 2011 onwards, CLM was propagated forward with an ensemble of 95 realizations. On 20 March 2011, the first SWC retrieval was assimilated, and assimilation of SWC retrievals continued until 31 December 2012. In the data assimilation period, soil properties were estimated at every time step when observations were made available. For the year 2013, the model was propagated forward without data assimilation but with an ensemble of 95 realizations. The year 2013 was used exclusively as the evaluation period for data assimilation experiments.
In total, 31 simulation experiments were carried out using different setups ( Table 2). The present setups are intended to cover three different initial soil maps, three different sizes of a CRNS network and two different parameter perturbations. Three open loop simulations were run without data assimilation and soil parameter perturbation of 30 % for the BK50 soil map (OL-BK50), the FAO soil map (OL-FAO) and the S80 soil map (OL-S80). These simulations are referred to as reference runs for the respective soil map. Simulation results of data assimilation runs were compared to the reference runs for quantification of data assimilation benefits. Simulations were done with joint state-parameter estimation (PAR- * ), two for the BK50 soil map (PAR-BK50- * ), one for the FAO soil map (PAR-FAO-30), and two for the S80 soil map (PAR-S80- * ). Soil texture was perturbed by 10 or 30 % as indicated by the experiment name (Table 2). Two simulations were done with state updates only for the BK50 soil map (Stt-BK50) and the S80 soil map (Stt-BK50). These 10 simulations form the basic set of experiments.
Besides the data assimilation experiments, a larger number of jackknifing simulations were also conducted to evaluate the impact of the CRNS data assimilation on SWC at unobserved locations in the model domain. In nine jackknife experiments, data from eight CRNS locations were assimilated (jk8- * simulations) and data of the one remaining CRNSs were not assimilated but kept for evaluation. In addition, three simulations were conducted where data of four CRNSs were assimilated (jk4- * simulations), and data of the five remaining CRNSs were used for evaluation. These three simulations represent a CRNS network with much less than the existing nine CRNSs. At the evaluation locations, simulated SWC (which is affected by the assimilation of the other eight probes) was compared to CRNS SWC retrievals. For jackknife simulations, the perturbation of soil texture was set to 30 %. States and parameters at these sites were jointly updated, and simulations were made using either the BK50 or the S80 soil maps as initial parameterization. Therefore, a total of 21 jackknife simulations were performed.
Simulation results were evaluated with the root mean square error (E RMS ): where n is the total number of time steps, θ t,CLM is SWC simulated by CLM at time step t and θ t,CRNS is the CRNS SWC retrieval at time step t. In case SWC was assimilated at the corresponding time step, θ t,CLM is SWC prior to assimilation. In case the E RMS is estimated at a single point in time over all CRNSs available, the number of time steps n can be replaced by the number of CRNSs available. The second evaluation measurement in this study is the bias which is, in contrast to the E RMS , a measure for systematic deviation:   (Table 3). Joint state-parameter updating resulted in similar E RMS values for all three initial soil maps: the BK50, FAO and S80 soil maps (each 0.03 cm 3 cm 3 ). State updates (Stt-S80) improved E RMS and bias for the S80 soil map (E RMS = 0.06 cm 3 cm −3 for assimilation period) but much less compared to the joint state-parameter updates (PAR-S80-30). The E RMS and bias for simulations with 10 and 30 % perturbation of soil texture only showed very small differences (smaller than 0.01 cm 3 cm −3 ). The temporal course of simulated soil moisture in 2011 at the two sites, Merzenhausen and Gevenich, is shown in In the data assimilation run shown (PAR-S80-30), modelled SWC was immediately affected at both sites, Merzenhausen and Gevenich, as soon as data at Merzenhausen were assimilated. By July, simulated SWC with the biased soil map and data assimilation (PAR-S80-30) was already close to the CRNS SWC retrieval at the Gevenich site (Fig. 2). This demonstrates the beneficial impact of data availability for assimilation at one site and the information brought into space by the data assimilation scheme. Figure 2 also shows that the BK50 open loop run was close to the observed SWC at both sites, even without data assimilation. Figure 3 shows the temporal course of SWC from January 2011 to December 2013 at Heinsberg and Wildenrath. Assimilation and evaluation results are shown for the open loop (OL-S80 and OL-FAO) simulations, only state updates (Stt-S80), joint state-parameter updates (PAR-S80-30) and CRNS SWC retrievals. At Heinsberg, results show that simulated SWC with assimilation was closer to the CRNS when both states and parameters were updated (PAR-S80- Table 3. Root mean square error (E RMS , cm 3 cm −3 ) and mean absolute bias (cm 3 cm −3 ) for open loop simulations (OL- * ), data assimilation with state updates (Stt- * ) and joint state-parameter updates (PAR- * ) for both the assimilation period (2011 and 2012) and the evaluation period (2013). Error and bias was averaged over all sites with observations. Site-specific errors and biases are provided in Appendix A1 to A4. The best cases are marked in bold.

30) than if only states were updated (Stt-S80
). This is the case for both the assimilation and the evaluation periods. At the beginning of the evaluation period (first few days of 2013), the Stt-S80 simulation shows an increase in bias between modelled SWC and CRNS. The bias of Stt-S80 remained throughout the evaluation period. In contrast, if parameters were previously updated (PAR-S80-30), modelled SWC was close to the CRNS during the evaluation period.
Open loop SWC modelled with the FAO soil map is lower than the CRNS SWC retrievals at Heinsberg and higher than CRNS SWC retrievals at Wildenrath. At Wildenrath, results of the OL-S80 run suggest that the initial sand content of the biased soil map is closer to the optimal sand content than the sand content of the FAO soil map. Consequently, the OL-FAO bias was −0.05 and 0.05 cm 3 cm −3 for Heinsberg and Wildenrath, respectively (Tables A1 and A4 in Appendix). At both sites, absolute bias was reduced with joint stateparameter updates to equal or less than 0.01 cm 3 cm −3 (S80 and FAO soil map). The reduced bias is also well reflected in the temporal course of modelled SWC with joint stateparameter updates (PAR-S80-30). It is interesting to notice that the error values for the verification period are very similar if soil hydraulic parameters were estimated in the assimilation period, independent of the initial soil map (Table 3). E RMS values for the 2013 simulations with state updates only  show that in the evaluation period the improvements by state updates (without parameter updates) were small (reduction by 0.02 and 0.00 cm 3 cm −3 for S80 and BK50, respectively) compared to the improvements obtained by joint state-parameter updates (reduction by 0.07 cm 3 cm −3 for S80). This illustrates the benefits of joint state-parameter updates compared to state updates only, and that soil moisture states are strongly determined by soil hydraulic parameters. The case of only state updates also illustrates that the improved characterization of soil moisture states in the assimilation period results in improved initial states for the verification period (Table 3) but in the verification period these improvements lose their influence quickly over time (Fig. 3). Figure 4 shows the temporal evolution of the hourly E RMS calculated for all nine CRNSs. E RMS was highest for the S80 open loop run and lowest for the PAR-S80-30 simulation. The FAO soil map resulted in errors mostly between 0.05 and 0.1 cm 3 cm −3 , which are lower than the S80 soil map but not as good as simulation results with joint stateparameter updates (PAR-S80-30) or with the BK50 soil map (OL-BK50). State updates did not improve modelled SWC as much as joint state-parameter updates. For most of the time, the E RMS of the Stt-S80 run is larger than the E RMS of the OL-BK50 run. During the evaluation period, the open loop run with the FAO soil map (OL-FAO) also performs better than the Stt-S80 run. In contrast, joint state-parameter updates to the S80 soil map improved the E RMS most of the time compared to open loop simulations (OL-BK50, OL-FAO and OL-S80). As shown in Fig. 4, the PAR-S80-30 simulation performed best out of the four simulations during the assimilation period 2011-2012. During the evaluation period 2013, OL-BK50 and PAR-S80-30 performed equally well, except in summer 2013 when the PAR-S80-30 simula- Table 4. Root mean square error (E RMS , cm 3 cm −3 ) and mean absolute bias (cm 3 cm −3 ) for open loop (OL- * ), jackknife simulations with eight CRNSs (simulations jk8-S80-1 to 9 were averaged) and with four CRNSs (simulations jk4-S80-A to C). Results were averaged over the omitted sites only. Data at omitted sites were not assimilated, while at the other sites data were assimilated. At sites where data were assimilated, E RMS and bias were equal to the values found in simulation PAR-S80-30. Site-specific errors and biases are provided in the Appendix A1 to A4. The best cases are marked in bold.

Jackknife simulations
The jackknife simulations investigated the impact of CRNS data on improving simulated SWC at locations beyond the CRNS stations. Spatial improvements are possible by spatial correlation structures of atmospheric forcings, soil hydraulic parameters and soil moisture which are taken into account by the local ensemble transform Kalman filter. The error and bias shown in Table 4 refer to jackknife simulations with the BK50 and the S80 soil map. On average, over the three runs where only data of four CRNSs were assimilated (jk4-S80- * ), the E RMS was 0.07 m 3 m −3 , which is much lower than the E RMS for the open loop run (0.12 m 3 m −3 ) and only a bit higher than the case where eight CRNSs were assimilated (E RMS = 0.06 m 3 m −3 for jk8-S80- * ). The improved simulation results were also due to the bias reduction from 0.11 to 0.05 m 3 m −3 in the case of four and 0.04 m 3 m −3 in the case of eight assimilated CRNSs. However, for the BK50 soil map where E RMS (0.04 m 3 m −3 ) and bias (0.02 m 3 m −3 ) of the open loop run were already good, the jackknife simulations led to slightly higher E RMS (0.05 m 3 m −3 ) and bias (0.04 m 3 m −3 ). More detailed site statistics (Tables A1 to A4 of the Appendix) demonstrate that all jackknife simulations with the S80 soil map resulted in an improved E RMS at the omitted locations compared to the open loop simulation, except for Wildenrath. At sites with large open loop E RMS , the assimilation could reduce the E RMS by 50 % or more. The jackknife simulations illustrate that a network of CRNSs can improve modelled SWC if the soil map information is not sufficient. This suggests that assimilation of CRNS data is particularly useful for regions with little information on subsurface parameters. A trade-off can be expected between the initial uncertainty on soil moisture and parameters, and the density of a CRNS network. In the case of a large uncertainty, like in regions with limited information about soils or a strongly biased soil map (e.g. FAO or S80 soil map) and a low density of meteorological stations, a sparse network of probes can already be helpful for improving soil moisture characterization. The results of the real-world jackknife experiments demonstrated that four CRNSs are already beneficial, but it is desirable to have more CRNSs for improved parameter estimates. The results also suggest that the additional information gain for an extra CRNS is reduced for a denser network, because the soil moisture characterization did not improve so much more when eight instead of four CRNSs were used for assimilation. However, in regions with a high density of meteorological stations and a high-resolution soil map, it can be expected that a denser CRNS network than that in this study is needed to further lower the error of soil moisture characterization. Further potentially synthetic experiments in other regions with networks of CRNSs are needed to obtain more quantitative information about this.

Temporal evolution of parameter estimates and parameter uncertainty
The temporal evolution of sand content estimates during the assimilation period for the nine sites with CRNSs is shown in Fig. 5 for PAR-S80-30, PAR-S80-10, PAR-BK50-30, PAR-BK50-10, jk8-S80- * and jk8-BK50- * . Time series start on 20 March 2011, the date of the first assimilated CRNS SWC retrieval at Wüstebach. At Wüstebach and sites within the influence sphere of Wüstebach (Aachen, Kall and Rollesbroich), sand content estimates were updated from 20 March 2011 onwards. Because of the localization, all other sites show a first update in sand content in May 2012 when Rollesbroich and Merzenhausen start operating, and their data were assimilated. During the data assimilation period with joint state-parameter updates, all sites show variability in sand content estimates over time with differences in magnitude. Values and spread in sand content estimates amongst the experiments is smaller at the sites Merzen-hausen, Gevenich, RurAue, Heinsberg and Wildenrath, compared to the sites Wüstebach, Aachen and Rollesbroich where spread is considerably larger. At the sites Merzenhausen, Kall, Gevenich, RurAue and Heinsberg, sand content estimates of the jackknife simulations were close to the sand content of the other data assimilation experiments with joint state-parameter estimation. A comparison of parameter estimates at the end of the assimilation period indicates that initial soil parameterization has a limited effect on the resulting parameter estimates. Parameter estimates of jk8-BK50- * and jk8-S80- * are close together at the end of the assimilation period.
Estimates of the soil hydraulic parameter B and saturated hydraulic conductivity are shown in Figs. 6 and 7 for PAR-S80-30, PAR-S80-10, PAR-BK50-30, PAR-BK50-10, jk8-S80- * and jk8-BK50- * . Updates of soil hydraulic parameters start in March and May 2011 with the assimilation of CRNS SWC retrievals, depending on the location. The B parameter estimates increase for all simulations. Throughout the whole assimilation period the empirical B parameter varies considerably within short time intervals. The total range of the B parameter estimates is between 2.7 and 14 at all sites. At the sites Merzenhausen, Kall, Aachen, Gevenich and Rollesbroich, B generally ranges between 6 and 10. At Wüstebach, Heinsberg and RurAue, estimates of B range most of the time between 8 and 12, and at Wildenrath, B is below 8. Initial saturated hydraulic conductivity k sat is rather high (k sat > 0.015 mm s −1 ) in the case of high sand content, i.e. for the S80 soil map, and rather low (k sat < 0.005 mm s −1 ) in the case of low sand content, i.e. for the BK50 soil map. In the case of the S80 soil map, at all sites except Wildenrath, high initial k sat estimates decrease quickly through joint stateparameter updates to values below 0.01 mm s −1 . The initial spread in k sat estimates amongst the simulation scenarios decreases at most sites. At Wüstebach, Merzenhausen, Aachen, Gevenich, RurAue and Heinsberg, the spread is rather small, particularly at the end of the assimilation period, while at Wildenrath k sat ranges from 0.005 to 0.015 for individual experiments at the end of the assimilation period.
Temporally unstable parameter estimates imply that there may be multiple or seasonal optimal parameter values. This is also supported by the findings of the temporal behaviour of site-average E RMS (Fig. 4), e.g. during the evaluation period when, in the dry summer of 2013, the E RMS peaks for the PAR-S80-30 simulation. In this context, it is important to mention that many possible error sources were not subject to calibration in this study but could be crucial for an even better modelled soil moisture and more reliable soil parameter estimation. In this study we only considered uncertainty of soil parameters, but vegetation parameters are also uncertain. Also, a number of other CLM-specific hydrologic parameters (e.g. decay factor for subsurface runoff and maximum subsurface drainage) strongly influence state variables in CLM and hence show potential for optimization . Considering this uncertainty from multiple param- eters could give a better parameter-uncertainty characterization . Precipitation is also an important forcing for hydrologic modelling. For this study, precipitation data from the COSMO_DE re-analysis were used. A product which optimally combines precipitation estimates from radar and gauge measurements is expected to give better precipitation estimates than the reanalysis. This could improve the soil moisture characterization and also potentially lead to better parameter estimates. Further improvements and constraining of parameter uncertainty is also possible using multivariate data assimilation with observations such as latent heat flux (e.g. Shi et al., 2014). Also, other error sources related to the model structure play a significant role. These options should be subject to future investigations.

Latent heat flux
Latent heat flux, or evapotranspiration (ET), is another important diagnostic variable of land surface models (e.g. Best et al., 2015) and is of importance for atmospheric models.
Results of the data assimilation experiments showed that soil texture updates altered soil moisture states significantly. In Fig. 8 it is shown that joint state-parameter estimation also altered ET during the evaluation period. Figure 8 shows The differences can be linked to the drier soil conditions for OL-S80 compared to OL-BK50 simulation results. The differences in ET between the runs with and without parameter updates were larger for the S80 soil map than for the BK50 soil map. For PAR-S80-10, ET increased by up to 40 mm yr −1 in the northern part of the catchment through data assimilation while the change in ET from OL-BK50 to PAR-BK50-10 is rather small. This is linked to the comparatively larger updates made to soil hydraulic parameters. Additionally, the impact of soil parameter estimates on ET is different in the north of the catchment compared to the south. While ET in the north of the catchment was impacted by the estimated soil properties during the evaluation period in 2013 for PAR-S80-10, ET in the south was not impacted as much by estimated soil properties. This is related to the fact that in the north ET is moisture-limited in summer, whereas in the south this is not moisture-limited but energy-limited. Therefore, ET in the north is sensitive to variations in soil hydraulic parameter values, whereas in the south this is not the case. In the south, ET is sensitive to model forcings such as incoming shortwave radiation. Nearing et al. (2016) came to the conclusion that soil parameter uncertainty dominates soil moisture uncertainty and forcing uncertainty dominates ET uncertainty. Our findings in the southern part of the catchment support their conclusion, but in the northern part of the catchment soil parameter uncertainty strongly affects ET. Hence, particularly in the northern part of the catchment, further observations such as ET measurements are desirable for further improving the land surface model. These additional observations could be used for future land surface model benchmarking (Best et al., 2015) or for more constrained parameter estimates (Shi et al., 2015).

Conclusions and outlook
This real-world case study on assimilating CRNS SWC retrievals into a land surface model shows the potential of CRNS networks to improve subsurface parameterization in regional land surface models, especially if prior information on soil properties is limited. CRNS SWC retrievals were assimilated into the land surface model CLM version 4.5 using the LETKF. SWC and subsurface parameters were updated with the LETKF at unmonitored locations in the catchment considering model and observation uncertainties. Joint stateparameter estimates improved soil moisture estimates during the assimilation and during the evaluation period. The error and bias for the soil moisture characterization was strongly reduced for simulations initialized with a biased soil map and similarly well if initialized with the FAO soil map. Simulations initialized with a biased or global soil map approached similar error statistics with joint state-parameter updates to the ones obtained when the regional soil map was used as input to the simulations. Error values in simulations with the regional soil map were not improved during the evaluation period, because open loop simulation results were already close to the observations. The beneficial results of joint stateparameter updates were confirmed by additional jackknife experiments with eight and four CRNSs for assimilation. In many areas of the world, only global soil maps (e.g. the FAO soil map) are available but there are no detailed highresolution regional soil maps. This study has shown that in these areas a more advanced subsurface characterization is possible using CRNS measurements and the data assimilation framework presented in this study.
For now, neutron intensity observations by CRNSs were not assimilated directly. In future studies it would be desirable to use the COSMIC operator for assimilating neutron intensity observations directly. However, in this case the impact of biomass on the CRNS measurement signal would have to be taken into account. Therefore, it is desirable to further develop the COSMIC operator to include the impact of biomass on neutron intensities. Using the biogeochemical module of CLM would then allow for modelling of local vegetation states as input for the measurement operator. Remotely sensed vegetation states are another option to characterize vegetation states as input for the measurement op-erator. Both methods require additional field measurements for the verification of vegetation state estimates. The further extension of the data assimilation framework would also enable the estimation of additional land surface parameters. In addition, the impact of other subsurface parameters, such as subsurface drainage parameters and the surface drainage decay factor, on SWC states and radiative surface fluxes has already been shown . Estimation of these parameters is desirable because of the inherent uncertainty of these globally tuned parameters. However, estimation of soil texture and organic matter content was demonstrated to already be beneficial for improved SWC modelling. Hence, this study represents a way forward towards the integration of CRNS information in the calibration or real-time updating of land surface models.
Data availability. Most data presented in this study are freely available via the TERENO data portal TEODOOR (http://teodoor.icg. kfa-juelich.de/). Atmospheric data were licensed by the German Weather Service (Deutscher Wetterdienst, DWD), and the BK50 soil map was licensed by the Geologischer Dienst Nordrhein-Westfalen.
Appendix A