State and parameter estimation of two land surface models using the ensemble Kalman filter and the particle filter

Abstract. Land surface models (LSMs) use a large cohort of parameters and state variables to simulate the water and energy balance at the soil–atmosphere interface. Many of these model parameters cannot be measured directly in the field, and require calibration against measured fluxes of carbon dioxide, sensible and/or latent heat, and/or observations of the thermal and/or moisture state of the soil. Here, we evaluate the usefulness and applicability of four different data assimilation methods for joint parameter and state estimation of the Variable Infiltration Capacity Model (VIC-3L) and the Community Land Model (CLM) using a 5-month calibration (assimilation) period (March–July 2012) of areal-averaged SPADE soil moisture measurements at 5, 20, and 50 cm depths in the Rollesbroich experimental test site in the Eifel mountain range in western Germany. We used the EnKF with state augmentation or dual estimation, respectively, and the residual resampling PF with a simple, statistically deficient, or more sophisticated, MCMC-based parameter resampling method. The performance of the calibrated LSM models was investigated using SPADE water content measurements of a 5-month evaluation period (August–December 2012). As expected, all DA methods enhance the ability of the VIC and CLM models to describe spatiotemporal patterns of moisture storage within the vadose zone of the Rollesbroich site, particularly if the maximum baseflow velocity (VIC) or fractions of sand, clay, and organic matter of each layer (CLM) are estimated jointly with the model states of each soil layer. The differences between the soil moisture simulations of VIC-3L and CLM are much larger than the discrepancies among the four data assimilation methods. The EnKF with state augmentation or dual estimation yields the best performance of VIC-3L and CLM during the calibration and evaluation period, yet results are in close agreement with the PF using MCMC resampling. Overall, CLM demonstrated the best performance for the Rollesbroich site. The large systematic underestimation of water storage at 50 cm depth by VIC-3L during the first few months of the evaluation period questions, in part, the validity of its fixed water table depth at the bottom of the modeled soil domain.

Abstract. Land surface models (LSMs) use a large cohort of parameters and state variables to simulate the water and energy balance at the soil-atmosphere interface. Many of these model parameters cannot be measured directly in the field, and require calibration against measured fluxes of carbon dioxide, sensible and/or latent heat, and/or observations of the thermal and/or moisture state of the soil. Here, we evaluate the usefulness and applicability of four different data assimilation methods for joint parameter and state estimation of the Variable Infiltration Capacity Model (VIC-3L) and the Community Land Model (CLM) using a 5-month calibration (assimilation) period (March-July 2012) of arealaveraged SPADE soil moisture measurements at 5, 20, and 50 cm depths in the Rollesbroich experimental test site in the Eifel mountain range in western Germany. We used the EnKF with state augmentation or dual estimation, respectively, and the residual resampling PF with a simple, statistically deficient, or more sophisticated, MCMC-based parameter resampling method. The performance of the "calibrated" LSM models was investigated using SPADE water content measurements of a 5-month evaluation period (August-December 2012). As expected, all DA methods enhance the ability of the VIC and CLM models to describe spatiotemporal patterns of moisture storage within the vadose zone of the Rollesbroich site, particularly if the maximum baseflow velocity (VIC) or fractions of sand, clay, and organic matter of each layer (CLM) are estimated jointly with the model states of each soil layer. The differences between the soil moisture simulations of VIC-3L and CLM are much larger than the discrepancies among the four data assimilation methods. The EnKF with state augmentation or dual estimation yields the best performance of VIC-3L and CLM during the calibration and evaluation period, yet results are in close agreement with the PF using MCMC resampling. Overall, CLM demonstrated the best performance for the Rollesbroich site. The large systematic underestimation of water storage at 50 cm depth by VIC-3L during the first few months of the evaluation period questions, in part, the validity of its fixed water table depth at the bottom of the modeled soil domain.

Introduction and scope
Land surface models (LSMs) are used widely to simulate and predict the exchanges of momentum, energy, and mass between the terrestrial biosphere and overlying atmosphere at local, regional, and global scales. These models also play a key role in assessing impacts of environmental changes (climate, land use, and land cover) on energy, water, and biogeochemical fluxes (e.g., CO 2 , CH 4 , N 2 O) at the soil-atmosphere interface, simplify analysis of causeeffect relationships among the myriad of processes that govern land-atmosphere interactions and feedbacks, and emulate spatiotemporal variations in climate through greenhouse gas exchanges, carbon-nitrogen feedbacks, soil moistureprecipitation, and soil moisture-temperature coupling. LSMs use relatively simple mathematical equations to conceptual-Published by Copernicus Publications on behalf of the European Geosciences Union.
ize and aggregate the complex, spatially distributed, and interrelated (bio)physical, chemical, and ecological processes that govern the exchange of mass, energy, and momentum between the land surface and the atmosphere. This approach simplifies considerably the topology of the land surface system, and reduces to much lower dimensions its state and parameter space. The consequence of this process aggregation and simplification is however that the LSM parameters often do not represent directly measurable entities, and instead must be estimated via calibration by fitting the model against measured data records of soil moisture, soil temperature, and/or CO 2 , water vapor, and/or energy fluxes across a range of biomes and timescales. These measurements are of crucial importance to quantify properly LSM parameter and predictive uncertainty, and to identify poorly represented or missing processes (Williams et al., 2009;Bonan, 2008).
Many of the parameters of a LSM are model dependent and therefore difficult to transfer between different land surface schemes. Nevertheless, all LSMs use soil hydraulic, vegetation, and thermal parameters to describe heat transport, water flow, and root water uptake (canopy transpiration) in the variably saturated soil domain, and share a reflection coefficient (also known as surface albedo) to calculate the reflected shortwave radiation. Two main approaches exist to determine the hydraulic and thermal properties of the considered soil domain. Some LSMs such as the Community Land Model (CLM) use basic soil data (soil texture and organic matter fraction) to estimate hydraulic and thermal parameters via pedotransfer functions (Oleson et al., 2013;Han et al., 2014;Vereecken et al., 2016). Other land surface schemes such as the Variable Infiltration Capacity Model (VIC) (Liang et al., 1994;Gao et al., 2010) expect users to specify values for the hydraulic and thermal parameters. Pedotransfer functions are particularly useful in largescale application of CLM as they simplify tremendously soil hydraulic characterization. Nevertheless, soil hydraulic parameter values derived from pedotransfer functions are subject to considerable uncertainty, and might therefore not accurately describe soil water movement and storage, particularly at larger spatial scales. What is more, (measurement) errors of the atmospheric forcing (e.g., wind speed, temperature, radiation, vapor pressure deficit, and precipitation) and errors in the auxiliary model input (e.g., topographic properties, vegetation characteristics) further enhance LSM prediction uncertainty.
In the past decades, many different search and optimization methods have been developed for automatic calibration of dynamic system models. Of these, Bayesian methods have found widespread application and use in Earth systems modeling due to their innate ability to treat, at least in principle, model input (forcing), output (forecast), parameter, and structural errors. The Bayesian approach relaxes the assumption of a single optimum parameter value in favor of a posterior parameter and forecast distribution which summarizes the coordinated impact of different uncertainties on the mod-eling results. Yet, general-purpose methods such as DREAM (Vrugt et al., , 2009Vrugt, 2016) require a relatively large number of LSM evaluations to estimate parameter and forecast uncertainty. This can pose significant computational challenges for CPU-intensive and parameter-rich LSMs, and complicates treatment of input data uncertainty via latent variables (e.g., Vrugt et al., 2008).
Data assimilation offers an attractive alternative as a general framework to account for LSM parameters, input, output, and other sources of uncertainty to take advantage of all available ground-based, airborne, or spaceborne observations to improve the compliance between numerical models and corresponding data. This approach enables joint estimation of model state variables and parameters and simplifies treatment of forcing data errors (Liu and Gupta, 2007). Many different studies published in the hydrologic literature have demonstrated the benefits of parameter estimation in the context of data assimilation for soil moisture characterization (e.g., Montzka et al., 2011;Lee et al., 2014), rainfall-runoff (e.g., Moradkhani et al., 2005b;Vrugt et al., 2005a) and land surface modeling (e.g., Pauwels et al., 2009).
Data assimilation methods merge uncertain observations with predictions (output) of imperfect models to optimally estimate the state of a dynamical system. The prototype of this method, the Kalman filter (KF), was developed in the 1960s by Rudy Kalman for optimal control of linear dynamical systems (Kalman, 1960). The KF is a maximum likelihood estimator of the dynamic state of the system if the model error and measurement error distributions are (multivariate) normal. For nonlinear dynamical models this Gaussian assumption is not generally valid, and the KF will not give a maximum likelihood state estimate. The ensemble Kalman filter, or EnKF, is a stochastic generalization of the KF to nonlinear system models, in which the evolution of the model error covariance matrix is derived from a finite set of state realizations (Evensen, 1994). The use of this Monte Carlo ensemble not only makes possible state estimation for complex system models, but also enables the explicit treatment of different sources of modeling error. Two decades on from its inception, the EnKF has received operational status in real-time weather, tsunami, and flood prediction systems (amongst others) due to its proven ability to enhance a model's forecast skill and characterize accurately prediction uncertainty.
State estimation via the EnKF advances significantly the capabilities of hydrologic and land surface models to predict spatiotemporal dynamics of water movement and storage in soils, lakes, and reservoirs, and fluxes of mass, energy, and momentum between the soil and the atmosphere. The predictive skill of these models is, however determined in large part by their parameterization. This has led hydrologists and hydrometeorologists to develop data assimilation approaches that permit the simultaneous inference of model state variables and parameter values. The power and usefulness of such joint state and parameter estimation methods have been Hydrol. Earth Syst. Sci., 21,2017 www.hydrol-earth-syst-sci.net/21/4927/2017/ investigated by many different authors in the water resources literature. Most of these publications use synthetic (or twin) experiments with assimilation of artificially generated data. Examples include studies with simulated measurements of the groundwater table depth or hydraulic head (Franssen and Kinzelbach, 2008;Bailey and Baù, 2012;Kurtz et al., 2014;Shi et al., 2014;Song et al., 2014;Tang et al., 2015), discharge/streamflow (Bailey and Baù, 2012;Moradkhani et al., 2012;Vrugt et al., 2013;Rasmussen et al., 2015), groundwater temperature (Kurtz et al., 2014), soil moisture (Wu and Margulis, 2011;Plaza et al., 2012;Erdal et al., 2014;Shi et al., 2014;Song et al., 2014;Pasetto et al., 2015), brightness temperature from passive remote sensing (Montzka et al., 2013;Han et al., 2014), and contaminant concentration (Gharamti et al., 2013). These studies use a variety of different methods for joint parameter and state estimation, among which the EnKF (Franssen and Kinzelbach, 2008;Wu et al., 2011;Gharamti et al., 2013;Erdal et al., 2014;Kurtz et al., 2014;Shi et al., 2014;Pasetto et al., 2015), the iterative EnKF (Song et al., 2014), the extended KF (Pauwels et al., 2009), the local ensemble transform KF (Han et al., 2014), the ensemble transform KF (Rasmussen et al., 2015), and the normal score EnKF (Tang et al., 2015). The overarching conclusion from the body of synthetic experiments is that the joint estimation of parameters and state variables via data assimilation enhances significantly the predictive capabilities of hydrologic and land surface models. This finding is corroborated by results for real-world assimilation studies documented in a rapidly growing list of publications and involving model structural inadequacies, measurement errors of the atmospheric forcing variables and calibration (assimilation) data, inadequate characterization of the lower boundary condition (aquifer), and uncertainty of other, auxiliary, model inputs. This includes assimilation of measurements of the electrical conductivity (Wu and Margulis, 2013), hydraulic head in wells (Kurtz et al., 2014;L. Shi et al., 2015), groundwater temperature (Kurtz et al., 2014), streamflow and discharge Y. Shi et al., 2015), active remote sensing (Pauwels et al., 2009), passive brightness temperature (Qin et al., 2009), soil moisture from lysimeters (Lue et al., 2011;Wu and Margulis, 2013;Erdal et al., 2014;L. Shi et al., 2015), land surface temperature (Bateni and Entekhabi, 2012), and sensible and latent heat fluxes (Y.  using methods such as the PF (Qin et al., 2009), PMCMC , EnKF (Bateni and Entekhabi, 2012;Wu and Margulis, 2013;Erdal et al., 2014;Kurtz et al., 2014;Y. Shi et al., 2015), and the extended KF (Pauwels et al., 2009;Lue et al., 2011). Despite this growing body of applications, relatively few studies (e.g., Lue et al., 2011;Y. Shi et al., 2015) have focused on an accurate characterization of soil moisture dynamics simulated by LSMs. This is particularly surprising, as root zone moisture storage modulates spatiotemporal variations in climate and weather, and governs the production and health sta-tus of crops and the organization of natural ecosystems and biodiversity (Vereecken et al., 2008).
In this paper, we evaluate the usefulness and applicability of four different data assimilation methods for joint parameter and state estimation of VIC-3L and CLM using a 5month calibration (assimilation) period of soil moisture measurements at 5, 20, and 50 cm depths in the Rollesbroich experimental test site in the Eifel mountain range in western Germany. This grassland site is part of the TERENO network of observatories and has been extensively monitored since 2011 to catalog long-term ecological, social, and economic impacts of global change at regional level. We used the EnKF with state augmentation (Chen and Zhang, 2006) or dual estimation (Moradkhani et al., 2005b), respectively, and the residual resampling PF (Douc et al., 2005) with a simple, statistically deficient (Moradkhani et al., 2005a), or more sophisticated, MCMC-based (Vrugt et al., 2013) parameter resampling method. The "calibrated" LSM models were tested using SPADE water content measurements from a 5-month evaluation period. To the best of our knowledge, this is only the second study after Chen et al. (2015) that compares sequential data assimilation methods for joint parameter and state estimation of a LSM. Related work by DeChant and  and Dumedah and Coulibaly (2013) consider application to the rainfall-runoff transformation of a watershed.
The three main objectives of our study may be summarized as follows: (1) to evaluate the usefulness and applicability of joint parameter and state estimation for soil moisture characterization with LSMs, (2) to compare the performance of four commonly used parameter and state estimation methods in their ability to predict soil moisture dynamics at different depths in the Rollesbroich experimental test site, and (3) to compare, contrast, and juxtapose the soil moisture simulations and predictions of CLM and VIC.
The remainder of this paper is organized as follows. Section 2 discusses briefly VIC-3L and CLM, which are used as our LSMs to characterize soil moisture dynamics of the Rollesbroich experimental site in Germany. In this section, we contrast the numerical approaches, boundary conditions, and spatial discretization (soil layers) that are used by VIC-3L and CLM to describe water flow and storage in the modeled soil domain, and are particularly concerned with selection of their calibration parameters. Section 3 then reviews the basic concepts and theory of the four different data assimilation algorithms used herein. This is followed in Sect. 4 with a detailed discussion of the Rollesbroich experimental site, and the numerical implementation and setup of each data assimilation method. Section 5 introduces the results of the different parameter and state estimation methods and two LSMs, and Sect. 6 discusses the main findings of our numerical experiments and assimilation studies. Finally, this paper concludes in Sect. 7 with a summary of our main findings.
LSMs simulate terrestrial biosphere fluxes of matter and energy via numerical solution of the water, energy, and carbon balance of the land surface. This includes hydrologic processes such as soil evaporation, infiltration, surface runoff, canopy interception and transpiration, aquifer discharge, groundwater recharge, and precipitation (Schaake et al., 1996), and energy fluxes such as latent and sensible heat from soil, snow, surface water, and vegetated surfaces (Bertoldi, 2004). Their respective equations contain parameters whose values depend on global or regional distributions of vegetation and soil properties (Milly and Shmakin, 2002).
The Rollesbroich site investigated herein covers an area of about 270 000 m 2 with grassland vegetation that is dominated by perennial ryegrass (Loliumperenne) and smooth meadow grass (Poapratensis). The limited size of our site and its rather uniform vegetation and topography justify treatment of our land surface domain as a single grid cell in a LSM with apparent parameters that characterize the mass and energy exchange between the soil and atmosphere. This assumption of homogeneity is computationally convenient and also simplifies somewhat our subsequent mathematical notation. We conveniently write the LSM as a (nonlinear) regression function, M(·), which returns a m × n matrix Y with the simulated (predicted) values of m different variables (e.g., soil moisture content, latent and sensible heat fluxes) at discrete times, t ∈ {1, . . ., n}, as follows: where α = {α 1 , . . ., α d } is the d-vector of model parameters, x 0 signifies the k × 1-vector with measured (inferred) values of the state variables of the land surface model for the domain at the start of simulation, B denotes the l × n control matrix with temporal measurements of l forcing variables (e.g., air temperature, precipitation, vapor pressure deficit, wind speed, and shortwave and longwave radiation), U represents a list with auxiliary constants, variables, or properties (e.g., plant functional type, land cover, soil texture, and other variables/constants) deemed necessary to simulate the water and energy balance of the land surface domain of interest, and Y = y 1 1:n , . . ., y m 1:n T , where T denotes the transpose. Without loss of generality, we restrict the model parameters to a closed space, A, equivalent to some d-dimensional hypercube, α ∈ A ∈ R d , called the feasible parameter space.
The assumption of homogeneity simplifies considerably the model definition in Eq. (1). Yet, this lumped topology might not characterize adequately real-world soil land surface systems that exhibit considerable spatial variations in soils, vegetation, and land properties. Such systems might necessitate distributed application of Eq. (1) via spatial discretization of the considered land surface domain into different grid cells. This discretization should honor spatial variations in vegetation and soil properties, and could account for small-scale (within-grid-cell) variability. Nevertheless, in our present application of LSM we the treat the Rollesbroich site as a single grid cell with grassland vegetation and homogeneous, but layered, soil (details to follow).
We now discuss briefly two different land surface schemes, VIC-3L and CLM, which are used to describe temporal variations in soil water storage at different depths in the Rollesbroich experimental site in Germany.

The Variable Infiltration Capacity (VIC) model
The VIC model is a macro-scale semi-distributed hydrological model which solves for the water and energy balance of each grid cell using explicit consideration of within-grid-cell vegetation variations. Accordingly, each grid cell is divided into land cover tiles (Liang et al., 1994(Liang et al., , 1996Cherkauer and Lettenmaier, 1999) and assumes constant values of the soil properties (e.g., soil texture, hydraulic conductivity, thermal conductivity). The total evapotranspiration, sensible heat flux, effective land surface temperature, and runoff are then obtained for each grid cell by summing over all the land cover tiles (vegetation types and bare soil) weighted by their respective fractional coverage (Gao et al., 2010). The VIC model can either be executed in a water balance mode or a water-and-energy balance mode. In this paper, we assume the latter and use a 70 cm deep soil composed of a 10 cm surface layer followed by middle and bottom layers of 20 and 40 cm, respectively. The relatively thin surface layer is used to capture rapid fluctuations in soil moisture due to rainfall and bare soil evaporation, and the deepest and thickest layer summarizes seasonal water content dynamics and baseflow. We use herein VIC-3L and force the model with atmospheric boundary conditions (e.g., precipitation, wind speed, air temperature, longwave and shortwave radiation, and relative humidity) for the Rollesbroich experimental site in Germany. In the absence of detailed information about the hydraulic properties of the considered soil domain, we treat each layer's saturated hydraulic conductivity, log 10 k s (log 10 (m s −1 )), and the exponent of the Brooks-Corey drainage equation, β (-), as calibration parameters. What is more, we also include the infiltration shape parameter, b (-), and the maximum baseflow velocity, D m (mm day −1 ), as calibration parameters. Thus, this involves estimation of d = 8 parameters in VIC-3L for the Rollesbroich site. Appendix A summarizes the soil module of VIC-3L, including a brief description of the main processes and model parameters.

The Community Land Model (CLM)
CLM is the land model for the Community Earth System Model (CESM) (Oleson et al., 2013), and is made up of multiple different building blocks, or modules, which resolve processes related to land biogeophysics, the hydrological cycle, biogeochemistry, and dynamic vegetation composition, structure, and phenology. The model recognizes explicitly surface heterogeneity by dividing each individual grid cell In our application of CLM to the Rollesbroich experimental site in Germany, we calculate soil temperature for 15 different soil layers, and simulate hydrological states and fluxes for the top 10 soil layers only. Appendix B presents a brief description of the soil module of CLM, and discusses the main parameters used. CLM is forced with atmospheric conditions (e.g., precipitation, vapor pressure deficit, wind speed, incoming shortwave and longwave radiation) using values for the model parameters and initial states, and land surface data and other physical constants and/or variables as auxiliary input. The soil hydraulic (e.g., saturated hydraulic conductivity) and thermal parameters of CLM are derived from built-in pedotransfer functions (see Eqs. B1-B4 of Appendix B) using as inputs the auxiliary list U with sand, clay, and organic matter fractions of each individual soil layer. We treat these auxiliary soil variables as unknown parameters in the present application of CLM. Thus, this involves d = 30 parameters in CLM for the Rollesbroich site.

Main differences of VIC-3L and CLM
Before we proceed, we first summarize the main differences of VIC-3L and CLM in their calculations of the water and energy balance of the land surface. In the first place, VIC-3L treats the vadose zone as a multi-layer bucket with variable infiltration capacity, whereas CLM uses a more physicsbased description of soil water movement, storage, and associated hydrological fluxes (e.g., root water uptake) by numerical solution of a modified form of Richards' equation (Zeng and Decker, 2009). A bucket model is computationally convenient, but sacrifices important detail regarding the vertical distribution of soil water storage. The latter is a prerequisite for characterizing accurately processes such as infiltration, redistribution, root-water uptake, drainage, and groundwater recharge. We refer the interested reader to Romano et al. (2011) for a detailed comparison of bucket type and Richards-based vadose zone flow models.
Second, VIC-3L treats the saturated and variably saturated soil domain as two separate, lumped, control volumes which are decoupled from the underlying groundwater reservoir. In other words, a fixed lower boundary condition is imposed. CLM, by contrast, simulates interactions between the modeled soil domain and an unconfined aquifer. The resulting water table variations of the aquifer affect soil water movement in the unsaturated zone via a variable recharge flux. In our application of CLM, this recharge flux emanates at the bottom of the tenth soil layer. The calculation of this recharge flux may be best explained via the use of a virtual soil layer, say layer 11, whose depth extends from the bottom of layer 10 to the groundwater table. If we assume hydrostatic conditions in layer 11, then we can calculate the recharge flux from layer 10 using Eq. (B9) in Appendix B. This recharge flux then changes the depth of the water table according to Eq. (B11). This equation also takes into consideration drainage from the water table due to topographic gradients. If the groundwater table is within the upper 10 soil layers, a drainage flux emanates from the uppermost saturated layer according to Eq. (B10).
Third, VIC-3L expects the user to specify values for the soil hydraulic (e.g., saturated hydraulic conductivity), thermal, and baseflow parameters of the first, second, and third layers of each grid cell, respectively, whereas CLM derives their counterparts (e.g., hydraulic conductivity at saturation, matric head at saturation, Clapp-Hornberger exponent B, and soil thermal conductivity) for each of the 15 soil layers using built-in pedotransfer functions.
Finally, VIC-3L allows the user to determine freely the number and thickness of the soil layers in the bucket model (the default is three layers), whereas CLM assumes a fixed thickness of each soil layer.

Selection of calibration parameters
LSMs contain a large number of parameters whose values can be adjusted by fitting model output to observed data. Yet, only a few of those parameters will affect noticeably model performance. Various authors have investigated the parameter sensitivity of VIC-3L via Monte Carlo simulation, generalized likelihood uncertainty estimation (GLUE), or model calibration methods (Demaria et al., 2007;Xie et al., 2007;Troy et al., 2008). These studies demonstrated a strong dependency of parameter sensitivity on climatic conditions. Table 1 lists VIC-3L and CLM parameters that have been selected for calibration via data assimilation, and reports their units, feasible ranges, perturbation, and spatial configuration. To honor prior information (e.g., soil textural data), we do not draw the model parameters from their feasible ranges, but rather sample their initial values around some best-guess VIC-3L and CLM parameterization using the normal and uniform distributions listed under the header "Perturbation". Table 1. Description of the soil parameters of VIC-3L and CLM that are subject to inference with the different data assimilation methods using the 5-month soil moisture calibration data period of the Rollesbroich site. We list the symbol, unit, feasible range, perturbation, and domain of application of each parameter of VIC-3L and CLM. The column with the header "Perturbation" lists the statistical distributions that are used to create the initial parameter ensemble for each data assimilation algorithm. The notation N(a, b) signifies the univariate normal distribution with mean a and standard deviation b, whereas U (a, b) denotes the univariate uniform distribution between a and b. These perturbation distributions are centered on the best-guess parameter values of VIC-3L and CLM (see Sect. 4.2) and define together the prior parameter distribution. This prior distribution honors textural measurements of each soil layer and its dispersion is in agreement with previously published studies.

Model
Parameter This makes up the prior parameter distribution and is further explained in Sect. 4.2. Appendices A (VIC-3L) and B (CLM) summarize the main variables, processes, and equations which are used by both models to describe the storage and vertical and/or horizontal movement of water in the variably saturated soil domain of the Rollesbroich site. These two appendices help to better understand the role of the different calibration parameters of Table 1, and will be most beneficial to readers who are rather unfamiliar with both models. Note that CLM estimates the hydraulic and thermal parameters of each soil layer from built-in pedotransfer functions (Oleson et al., 2013;Han et al., 2014) using as input the sand, clay, and organic matter fractions of each soil layer.

Data assimilation methods
Data assimilation methods merge uncertain observations with predictions (output) of imperfect models to optimally estimate the state and/or parameters of a dynamical system. This includes the use of four-dimensional variational data assimilation (4D-Var), EnKF, PF, and related assimilation schemes. These methods have been applied successfully to a large number of different fields for model-data fusion in the atmospheric, oceanic, biogeochemical, and hydrological sciences. We now briefly discuss the theory of four different data assimilation methods which are used herein with VIC-3L and CLM to characterize spatiotemporal soil moisture dynamics at our experimental site.

EnKF
The EnKF was proposed by Evensen (1994) as a generalization of the Kalman filter to nonlinear system models with many state variables. This method uses a Monte Carlo approach to generate an ensemble of different model trajectories from which the time evolution of the probability density of the model states and related error covariances are estimated. The EnKF uses a state-space implementation of the dynamic system model of Eq. (1) and implements the following steps (Burgers et al., 1998): where x i− t is the k × 1-vector of predicted values of the state variables of the ith ensemble member, i = {1, . . ., N }, b i t−1 signifies the corresponding vector (or matrix) of measured values of the forcing variables, w t denotes a k × 1 process noise vector that accounts for structural imperfections of the LSM, and t denotes time. In our specific implementation, the state vector is made up of areal-averaged soil moisture content values at three different measurement depths. What is more, the observed precipitation forcing was perturbed with a member-dependent vector of measurement errors. From the ensemble of N state vectors, we can calculate the k ×k background error covariance matrix, C, using where x t denotes the k×1 vector with ensemble mean values of the states at time t. The m × 1 vector of measured soil moisture data at time t can be written for each individual ensemble member as follows: where v i t signifies a m × 1 vector of measurement errors drawn randomly from a m-variate normal distribution, N m (0, R) with zero-mean and m × m observation error covariance matrix R. We assume the soil moisture measurement errors at each depth to have a fixed and common variance σ 2 , and to be uncorrelated in space and time. Thus we can write R = σ 2 I m , where I m signifies the m × m identity matrix with zeros everywhere except on the main diagonal which stores values of σ 2 .
We can now update the predicted state values of each ensemble member as follows: where x i t denotes the k × 1 vector with updated estimates of the state variables (also called analysis state), K is a k × m matrix called the Kalman gain, and the m × k matrix H signifies the measurement operator which maps the model output to the measurement space. It is linear for EnKF. In our present application, we observe directly the soil moisture content of respective measurement depth, and thus the matrix H is made up of values of zero and unity. The Kalman gain is computed as follows: where the symbol T denotes the transpose. The updated values of the states now enter Eq.
(2) and are used to predict the soil moisture values at the next observation time, t = t + 1, and so forth. In some cases it might be appropriate to estimate the model parameters along with the state variables. This requires a slight modification to the state-space formulation of Eq. (2) as the d-vector of parameter values, α, must now vary among the N ensemble members to facilitate parameter estimation from the measured data. Three different approaches have been published in the literature for joint estimation of model states and parameters in the EnKF. This includes, state augmentation, dual and outer estimation. The first two approaches assume the LSM parameters to be timevariant, and infer their values sequentially along with the model states. The third approach assumes the parameters to be time-invariant, and estimates their posterior distribution in a loop outside the EnKF by maximizing the marginal likelihood of the N state trajectories (Vrugt et al., 2005a(Vrugt et al., , 2013. We will consider herein only the first two approaches, that is, state augmentation and dual estimation, as these two methods are most CPU-efficient.

State augmentation
In state augmentation, the k × 1 vector of state variables, x t , the model error covariance matrix C, the measurement operator H, and the Kalman gain K consist of two separate blocks (Franssen and Kinzelbach, 2008): where the subscripts x and α refer to the model states and parameters, respectively. The state vector, x * , now consists of k + d elements, the model error covariance matrix C * is made up of four smaller matrices, C xx , C T αx , C αx , and C αα , and the measurement operator H * includes H x and additional values of zero. The Kalman gain matrix K is now given by This results in the following equation for the updated states and parameter values:

Dual estimation
In the dual estimation approach, the state variables and model parameters are stored in two separate vectors and updated using their own individual steps (Moradkhani et al., 2005b).
The parameter values of each ensemble member are first updated according to Then, the updated parameter values are used with Eq.
(2) to predict, for the second time, the state variables at time t, after which their values are updated via Eq. (5). This approach necessitates running the LSM twice for the time period between two successive measurements, thereby doubling the required CPU time of each ensemble member for this dual estimation method compared to the state augmentation approach. The EnKF suffers from filter inbreeding; that is, the ensemble spread degrades after several data assimilation steps. In extreme cases, the covariance matrix C of the state ensemble is so small that the measurements receive a negligible weight via Eq. (6) and do not affect much the state trajectories of the individual ensemble members. This reflects a situation similar to model calibration in which state variable errors are ignored and all uncertainty in the input-output representation of the model is attributed to the parameters. Filter inbreeding is aggravated by the use of a relatively low number of ensemble members (small N), which results in spurious correlations among state variables and/or parameters, and underestimation of the spread of the ensemble. Other reasons for an insufficient ensemble spread are model structural errors and the use of an underdispersed prior parameter distribution or overly small variance of the measurement errors of the forcing variables. Ensemble inflation methods are an effective way of ameliorating filter inbreeding (Anderson, 2007;Whitaker and Hamill, 2012). We apply the inflation algorithm of Whitaker and Hamill (2012) to the d parameter values of each ensemble member as follows: where α j,t signifies the analysis mean (after update) of the j th parameter at time t, the scalars V j and W j denote the prior (before update) and analysis standard deviation of the j th parameter (derived from ensemble), and j = {1, . . ., d}. This method promotes a parameter spread that is in agreement with the width of the prior parameter distribution, and is particularly important to avoid a strong underestimation of ensemble variance and associated filter inbreeding in applications with relatively small ensemble sizes. As the spread is kept artificially constant, it cannot be assessed properly how data assimilation affects reduction of prediction uncertainty. In addition, it is important that the initial ensemble spread is adequate. This is a drawback of the applied inflation.

Residual resampling particle filter (RRPF) and parameter estimation
The PF was first suggested in the research area of object recognition, robotics and target tracking (Gordon et al., 1993) and was introduced to hydrology by Moradkhani et al. (2005b). The PF differs from the EnKF in that it describes the evolving probability density function (PDF) of the LSM state variables by a set of N random samples, also called particles. Each particle carries a non-zero weight which determines its underlying probability, and these weights are updated as soon as a new datum (observation) becomes available. Before we proceed with a brief theoretical description of the PF we must first explicate our notation. We denote with symbol X 1:t the collection of simulated values of the LSM state variables between the first observation at t = 1 and the present datum, t, hence X 1:t = [x 1 , . . ., x t ] is a k × t matrix with the LSM states at each measurement time stored as a column vector. The corresponding observations are stored in the m × t matrix, Y 1:t = y 1 , . . ., y t . Finally, we use braces, {·}, to denote our Monte Carlo ensemble of N particle trajectories, X 1:N 1:t , and thus X 1:N t is a k × N matrix with sampled values of the LSM state variables at time t. The subsequent description of the PF follows closely the description of Vrugt et al. (2013). Interested readers are referred to this publication for further details.
If we assume the parameters to be known, then we can write the evolving posterior distribution, p α (X 1:t | Y 1:t ), for the state-space formulation of Eq. (2) as follows: where p α X 1:t−1 | Y 1:t−1 denotes the prior state distribution, M α (x t |x t−1 ) signifies the transition probability density of the state variables (= Eq. 2), L α ( y t |x t ) is the likelihood function, and p α ( y t | Y 1:t−1 ) represents a normalization constant which ensures that the posterior state distribution integrates to unity. Equation (14) follows directly from Bayes' law (see Appendix A of Vrugt et al., 2013), and does not use at once the data up to time t to estimate p α X 1:t | Y 1:t but rather estimates the evolving system state recursively over time using some mathematical model and new incoming measurements. If we integrate out the state trajectory X 1:t−1 from Eq. (14) then we can derive an expression for the marginal PDF of the state variables, p α x t | Y 1:t , at time t: which is also referred to as the update step of the optimal filter (conditional independence of measurements). The state prediction step is equivalent to the Chapman-Kolmogorov equation: where signifies the feasible state space. We conveniently assume herein, a Gaussian likelihood function: where R is the m × m measurement error covariance matrix, |·| signifies the determinant operator, and m denotes the length of the observation vector, y t , at time t.
The PF makes use of the following identity of Eq. (14) to approximate the evolving state PDF: This recursion implies that we can use reuse the particles (samples) at t − 1 that define the prior distribution, p α X 1:t−1 | Y 1:t−1 , to approximate the posterior state PDF, Hydrol. Earth Syst. Sci., 21, 4927-4958, 2017 www.hydrol-earth-syst-sci.net/21/4927/2017/ p α X 1:t | Y 1:t , at the next observation time. Yet, such recycling poses a problem; that is, we cannot sample directly from p α X 1:t | Y 1:t as we do not know its multivariate distribution. We therefore resort to an easy-to-sample-from importance density, q α · x t−1 , y t , and draw x 1:N t taking into consideration the current observation, y t , and previous state samples, x 1:N t−1 . We then calculate the unnormalized importance weight of the ith particle, W i t , as follows: where w t (X i 1:t ) signifies the incremental importance weight: and W i t = W i t / N i=1 W i t denotes the normalized importance weights, which vary between 0 and 1.
Before we can implement the PF in practice, we need to specify the importance density, q α · x 1:N t−1 , y t , for t = {2, . . ., n}. We follow Gordon et al. (1993) and set q α x t |x t−1 , y t = M α (x t |x t−1 ), which results in the following equation for the incremental particle weights: This approach gives satisfactory results if the transition density or model operator, M α (x t |x t−1 ), adequately describes the observed system dynamics, and/or the observations, Y 1:t , are not too informative. Otherwise, the repeated application of Eq. (19) causes particle impoverishment in which the sampled particle trajectories drift away from the actual posterior state distribution and receive a negligible importance weight. This ensemble degeneracy (e.g., Carpenter et al., 1999) deteriorates PF performance and results in a poor computational efficiency of the filter as much of the CPU time is devoted to carrying forward particle trajectories whose contribution to p α X 1:t | Y 1:t for t > 1 is virtually zero.
To combat particle degeneracy, we monitor the effective sample size (ESS) after assimilation of each new observation: If the ESS is smaller than some default threshold, say N/2, then the particle ensemble is said to be degenerating. Several methods have been developed in the statistical literature to rejuvenate the particle ensemble. Gordon et al. (1993) introduced sequential importance resampling (SIR), where N particles are drawn from the ensemble using selection probabilities equal to their normalized importance weights. This step replaces samples with low importance weights with exact copies of the most promising particles, and produces a resampled set of N particles with equal weights of 1/N. In our application of the PF we implement residual resampling (RR) developed by Liu and Chen (1998). This method has an important advantage over SIR in that it produces a resampled set of particles with more diverse weights (Weerts and Serafy, 2006). First, we compute a selection probability, p x i t , of each individual particle as follows: where the · operator rounds down to the nearest integer, Then, the M particles with largest normalized importance weights are retained, and the remaining N − M spots are filled by drawing from the M retained particles using their selection probabilities from Eq. (23). The resulting filter is referred to as RRPF.
In the present application of the RRPF, we not only estimate the LSM states but also jointly infer the values of the model parameters. We use state augmentation and add the model parameters to the vector of LSM state variables. Yet, this approach requires definition of an importance density for the parameters to avoid parameter impoverishment after several successive assimilation steps. This has been demonstrated numerically by Plaza et al. (2012) using a series of data assimilation experiments. In principle, we could corrupt the posterior parameter distribution using the ensemble inflation method of Whitaker and Hamill (2012) detailed in Eq. (13). This approach was used by Qin et al. (2009) to avoid degeneracy of the parameter values. Instead, we use the approach described by Plaza et al. (2012) and perturb the parameter values of the resampled particles using draws from a zero-mean d-variate Gaussian distribution with diagonal covariance matrix. This d × d matrix has zero entries everywhere (uncorrelated dimensions) except on the main diagonal which stores values of s 2 Var α 1:N 0,j , where s is a scaling factor, Var α 1:N 0,j signifies the prior variance of the j th parameter (at t = 0), and j = {1, . . ., d}. This is an adaptation of the method introduced by Moradkhani et al. (2005a) and uses the prior variance of the parameters rather than their variance at the previous measurement time, t − 1. Yet, in the absence of a formal guidelines on the choice of s, this perturbation approach suffers from a lack of adequate statistical underpinning (Vrugt et a., 2013;Yan et al., 2015). In our present application, we set s = 0.1, and evaluate the RRPF performance for the VIC-3L model using other values for this scaling factor as well.

Particle Markov chain Monte Carlo (PMCMC) simulation
The RR procedure produces a sample with more evenly distributed weights, but many of the particles are exact copies of one another. To enhance sample diversity, we therefore evaluate another resampling step using Markov chain Monte Carlo (MCMC) simulation. We follow herein the MCMC resampling method of Vrugt et al. (2013) and create candidate particles after RR using a discrete proposal distribution with state and parameter jumps equal to a multiple of the difference of two or more pairs of resampled particles. Each candidate particle is then re-evaluated between t − 1 and t by the LSM model, and the Metropolis acceptance probability is used to determine whether to replace the "old" particle or not. This combined PF and MCMC methodology is also referred to as PMCMC. Interested readers are referred to Vrugt et al. (2013) for a detailed description of this method.

Important Differences of EnKF and PF
Before we proceed with application of the EnKF-AUG, EnKF-DUAL, RRPF and PMCMC data assimilation methods, we reminisce about the key differences of the EnKF and PF. These differences are often overlooked and misunderstood but of crucial importance to help understand the two filters, and analyse and interpret our findings (see Vrugt et al., 2013). Most critically, the EnKF uses the measured values of the state variables (via measurement operator, if appropriate) to correct (update) the forecasted states of each ensemble member. The state PDF at each time is approximated by a weighted average of the distributions of the measured and forecast states. The PF on the other hand does not use a state analysis step, but rather assigns a likelihood to each particle. This likelihood is a dimensionless scalar which measures in a probabilistic sense the distance between the measured and forecasted state variables. The state PDF at each time is then constructed via the likelihoods (normalized importance weights) of the particles. Resampling is required to rejuvenate the ensemble, but this step is rather inefficient compared to the state analysis step of the EnKF as the measured states are only used indirectly in the PF via calculation of the likelihood. What is more, a single resampling step in RRPF or PMCMC does not guarantee a good approximation of the actual state PDF, as the particles' forecasted states may be systematically biased. Consequently, the PF may need a very large ensemble and/or many resampling steps to characterize properly the state PDF. By contrast, the state analysis step of the EnKF resurrects rapidly a biased ensemble by migrating the members' forecasted states in closer vicinity of their measured values. This crucial difference between the EnKF and PF is the result of their dichotomous design, as is also evident from our mathematical notation. The EnKF estimates separately at each time the state PDF via Eq. (5), whereas the PF is designed to estimate the posterior distribution of the entire state trajectory via the recursion of Eq. (18). This latter task is much more difficult in practice, and requires use of the laws of probability to ensure that each particles' state trajectory constitutes a plausible realization from the transition density, M x i t x i t−1 , juxtaposed by the distribution of the model errors. This latter requirement of plausibility renders impossible the use of an analysis step in the PF (such as EnKF), as the resulting state updates may violate the statistics of the transition density and model error distribution and jeopardize the realism of each particle's state trajectory. Therefore, the PF requires a proper resampling method that takes into explicit account the statistical properties of the state transition density and model error distribution to replace bad particles and ensure an exact characterization of the evolving state PDF.
4 Case study

The Rollesbroich experimental site
We apply the four data assimilation approaches to characterize soil moisture dynamics of the 27 ha Rollesbroich experimental test site (50 • 37 27 N, 6 • 18 17 E) in Germany. This site is located in the Eifel hills and ranges in elevation between 474 and 518 m with mean slope of 1.63 • . The watershed constitutes a sub-basin of the TERENO Rur experimental catchment (Bogena et al., 2010;Qu et al., 2014) and consists of grassland with a soil texture that is predominantly silty loam. The mean annual air temperature and precipitation are 7.7 • C and 1033 mm, respectively. An eddy covariance tower (50 • 37 19 N, 6 • 18 15 E, elevation 514.7 m) and a soil moisture and soil temperature sensor network (with measurements at 5, 20, and 50 cm depths) have been installed (amongst others) at the Rollesbroich site. Water content data are measured at 41 different locations (see Fig. 1) using SPADE soil moisture probes (sceme.de GmbH i.G., Horn-Bad Meinberg, Germany) (Hübner et al., 2009) installed at 5, 20, and 50 cm depths along a vertical profile. The SPADE probe is a ring oscillator and the frequency of the oscillator is a function of the dielectric permittivity of the surrounding medium, which depends strongly on local soil water content because of the high relative permittivity of water (≈ 80) as compared to mineral soil solids (≈ 2-9), and air (≈ 1). The SPADE probe was calibrated following the procedure outlined in (Qu et al., 2014). The soil moisture measurements are subject to several sources of error. This includes an inadequate contact of the sensors with the surrounding soil, and structural imperfections of the equations which relate the sensor response to the dielectric permittivity, and this permittivity to soil moisture.
The atmospheric LSM forcing data in this study were measured at the eddy covariance tower and include hourly measurements of air temperature, air pressure, relative humidity, wind speed, and incoming shortwave and longwave radia- tion. Precipitation was measured by a tipping bucket located in close proximity of the eddy covariance station. Soil texture was determined using 273 soil samples, taken from three different depths, ranging between 5 and 11, 11 and 35, and 35 to 65 cm. The sample locations coincided exactly with the location of the SoilNet sensors. The soil textural composition, organic carbon content, and bulk density were determined for each sample using standard laboratory experiments. These values were averaged to obtain mean values for the listed depths. Soil hydraulic parameters were then estimated for each of these three measurement depths from pedotransfer functions using as input data the basic soil measurements.
In this work, we conveniently assume the soil land surface domain of the Rollesbroich site to be homogeneous and char-acterized by areal-average values of soil moisture content at 5, 20, and 50 cm depths. In other words, we consider only vertical variations in soil water storage. Common LSM data assimilation experiments published in the literature usually involve application to much larger spatial scales, especially when remote sensing data are used. Hence, it is important to evaluate the LSM performance for a site where heterogeneities are neglected. Qu et al. (2014) investigated the geostatistical properties of the soils of the Rollesbroich test site. This work demonstrated a rather small spatial variability of the soil texture. This does not suggest however that we can ignore spatial variations in the measured soil moisture values. Indeed, the standard deviations of soil moisture vary between 0.04 and 0.07 cm 3 cm −3 depending on the actual soil layer. This spatial heterogeneity of the soil moisture data documents variability in the soil hydraulic properties and complicates the application and upscaling of LSMs.

Numerical experiments
A total of N = 100 ensemble members (particles) were used in all our data assimilation experiments. The period from 1 January 2011 to 29 February 2012 was used to spin up VIC-3L and CLM using measured hourly forcing data. The subsequent period between 1 March 2012 and 31 July 2012 served as our "calibration period" during which the daily soil moisture observations at the three measurement depths were used to update the LSM state variables and possibly also its parameter values. The following 5 months from 1 August 2012 to 31 December 2012 were used as an independent evaluation period. During this last period, we did not update the states and set the parameters to their "optimized" values derived from the calibration period. Soil moisture assimilation was initiated in March 2012 as the SPADE water content sensors were deemed unreliable (at least in February) in the preceding winter season due to soil freezing. We terminated our numerical experiments at the end of December 2012, as a large number of sensors seemed to be malfunctioning in subsequent readings, which could impact too much the mean soil moisture values.
Soil moisture contents measured at 5, 20, and 50 cm depths were assimilated jointly. The three (default) soil layers in VIC-3L (0-10, 10-30, and 30-70 cm) were synchronized to match the three measurement depths. Soil parameters were defined separately for all individual layers, measured or not. In CLM, we used 10 (default) soil layers with increasing thickness downwards (see Table 2). The 5, 20, and 50 cm measurement depths correspond to the third, fifth, and sixth layers in CLM. Spatial relationships (covariance matrices) between the soil parameters of the measured layers and their values of the unmeasured layers were used in the EnKF to update the parameterization of layers 1, 2, 4, 7, 8, 9, and 10. A slightly different approach was followed in RRPF and PM-CMC, in which the soil parameters of the unmeasured moisture layers in CLM were updated to their weighted-average values of the resampled particles using the vector of normalized importance weights. The measurement errors of the soil moisture observations are assumed to be zero-mean Gaussian with standard deviation, σ = 0.02 m 3 m −3 . This results in R = 4 × 10 −4 I m in Eqs. (4) and (17), respectively. We admit that 0.02 m 3 m −3 is clearly larger than the uncertainty of the mean soil moisture content averaged over the 41 values. A larger observation error alleviates potential problems with filter inbreeding. Also, we account crudely for errors in LSM model formulation via parameter uncertainty and the use of a stochastic description of the precipitation record of the Rollesbroich site (discussed next). In other words, the k × 1 process noise vector, w t , in Eq.
(2) consists of zeros. However, we agree that it can be expected that we have other model structural errors, for example in relation to the representation of photosynthesis.
The hyetograph of each ensemble member is derived by multiplying the measured hourly precipitation rates of the tipping bucket by multipliers drawn from a unit-mean normal distribution with a standard deviation of 0.10. This is equivalent to a heteroscedastic error of 10 % of the observed precipitation (Hodgkinson et al., 2004). Forcing variables which govern evapotranspiration (incoming shortwave and longwave radiation, air temperature, relative humidity, and wind speed) were not corrupted.
The initial values of VIC-3L and CLM parameters are sampled at random using a simple two-step procedure. This approach honours soil textural data and is consistent with related results published in the literature. First, we draw N times from each marginal distribution listed in Table 1 under the column "perturbation". These distributions originate from Han et al. (2014) for CLM, and Demaria et al. (2007) and Troy et al. (2008) in the case of VIC-3L. This results in a N × d matrix of perturbations for VIC-3L and CLM, respectively. We then create the initial N × d parameter ensemble of VIC-3L and CLM by adding each perturbation matrix to a deterministic vector of "best-guess" parameter values for each model. This initial parameter ensemble is the same for all the assimilation methods. For CLM, this best-guess vec-tor is simply equivalent to the areal-averaged sand, clay, and organic matter fraction of each of the 10 soil layers, respectively. In the case of VIC-3L, we guess that β = 15 (all layers), b = 0.2, and D m = 13 (mm d −1 ), and derive the value of log 10 k s (log 10 (m s −1 )) of all three soil layers from the measured mean areal sand fraction at each of those depths. The best-guess parameter values of VIC-3L and CLM and their respective marginal distributions are jointly also referred to hereafter as prior parameter distribution. We want to compare EnKF and PF starting from the same prior distribution in order to make a more meaningful comparison. EnKF assumes a Gaussian distribution, but the PF not. We believe that assuming an initial uniform distribution is a neutral assumption good for comparing EnKF and PF.
One may debate our best-guess parameter values of VIC-3L and CLM and their respective marginal distributions. Nevertheless, the prior parameter distribution used herein introduces more than sufficient dispersion into the best-guess parameter values to rapidly overcome a possibly deficient initial model parameterization. Note that the prior uncertainty of the two texture parameters (sand and clay fraction) in CLM is much larger than their spread derived from the texture measurements of each soil layer. This inflation of the prior distribution is done purposely to account indirectly for the epistemic uncertainty of the pedotransfer functions that are used to predict the soil hydraulic parameters. Indeed, the prior parameter uncertainty of the sand and clay fraction should be large enough to guarantee a sufficient soil moisture spread of the ensemble, which is of crucial importance for an adequate performance of the different data assimilation methods. Figure 2 shows the measured records of daily precipitation and daily air temperature for the 10-month measurement period used herein. The measurement period is rather wet, with several intensive precipitation events during the summer. For example, notice the event on 27 July 2012 in which 31 mm of precipitation fell in just 1 h. Our experience suggests that such extreme rainfall events corrupt the parameter estimates, in large part due to an inadequate description and/or characterization of surface runoff. What is more, the correlation between the hydraulic parameters of the different layers of our soil domain and the moisture state deteriorates rapidly close to saturation. Therefore, on days with rainfall in excess of 20 mm we resort to state estimation only, and proceed with this the next 2 consecutive days to give VIC-3 and CLM sufficient opportunity to remove, via deficient surface transport or state updating, the excess water. On the third day after each 20 mm + precipitation event, we resume joint LSM state and parameter estimation.
To evaluate joint state-parameter estimation algorithms for the two LSMs and the four different data assimilation algorithms, we carried out the following three numerical experiments for VIC-3L and CLM (see also 2. State updating with EnKF. The soil moisture state variables were updated during the 5-month calibration period using the SPADE moisture content measurements. In theory, soil moisture assimilation should improve our estimates of the initial states of the evaluation period. We posit that this enhanced state-value characterization should improve the accuracy of the LSM simulated (predicted) soil moisture values during the first few days/weeks of the evaluation period, after which the model performance deteriorates rapidly over time in the absence of recursive state adjustments.
3. Joint state-parameter estimation using RRPF, PMCMC, and EnKF with state augmentation and dual estimation. The soil moisture state variables and model parameters are estimated during the 5-month calibration period using the SPADE soil moisture measurements. The parameter values and state variables at the end of the calibration data period are used for the evaluation period.

Summary statistics
We used the Nash-Sutcliffe model efficiency (NSE) and the root mean square error (RMSE) to evaluate the quality-of-fit of VIC-3L and CLM predicted (simulated) soil moisture values during the calibration (assimilation) and evaluation pe- riods. These two metrics are computed separately for the 5, 20, and 50 cm measurement depths as follows: where y i,t and y i,t denote the measured and ensemble mean predicted soil moisture contents at time t, the subscript i constitutes an index for measurement depth, i = {1, . . ., 3}, and t = {1, . . ., n}. The 3 × 1 vector of ensemble mean predicted moisture contents, y t , is simply equivalent to the mean of VIC-3L or CLM forecasted state variables at these respective measurement depths. Larger values of the NSE and smaller values of the RMSE are preferred as they indicate a better LSM performance. In the absence of reliable information about the soil hydraulic properties of the different layers, the soil moisture observations were the only data available to evaluate the results of VIC-3L and CLM and each data assimilation method.

Results
In this section we present the results of our numerical experiments. We first discuss our findings for VIC-3L followed by the results of CLM. Sect. 6 proceeds with a discussion of the main findings.

VIC-3L
Figure 3 displays the observed (blue dots) and VIC-3L predicted soil moisture values (solid lines) at (a) 5, (b) 20, and (c) 50 cm depths using PMCMC (black), RRPF (red), EnKF-AUG (green), and EnKF-DUAL (cyan). As the Rollesbroich test site experiences a yearly average precipitation of more than about 1000 mm it is not surprise that the upper soil layer at 5 cm is rather wet with volumetric soil moisture contents that vary dynamically between 0.3 and 0.5 cm 3 cm −3 in response to atmospheric forcing. This is especially true during the summer months (weeks 12-22) and explained by a rapid succession of rainfall and drying events. The larger porosity values of the surface layer explain the relatively high soil moisture contents of the 5 cm measurement depth. The storage time series of the deeper soil layers at 20 and 50 cm depths exhibit a rather negligible temporal variation with soil moisture values that range between 0.3 and 0.4 cm 3 cm −3 and show a damped and lagged response to rainfall. Note that the soil water storage of the deepest layer increases steadily during the year. This implies a drainage flux from the top soil to the aquifer (and drainage channels). The different data assimilation methods demonstrate a rather similar performance with VIC-3L predicted moisture contents that track reasonably well the three different layers. Note however that RRPF does not reproduce well the measured data at 50 cm depth in the period from March (week 1) to June (week 17). This might be caused by filter inbreeding of the states, and will be discussed later (see also Fig. 9b). Nevertheless, RRPF recovers the observed soil moisture data in week 18. Although difficult to see, the EnKF produces the best results at 50 cm depth (state augmentation and dual estimation). Table 4 summarizes the NSE and RMSE values of PM-CMC, RRPF, EnKF-DUAL, and EnKF-AUG for the calibration (assimilation) period. We also list the performance of VIC-3L without data assimilation (OpenLoop) using the mean soil moisture time series of many different realizations of the prior parameter distribution, and include RMSE and NSE values of the EnKF for state estimation only (noParamUpdate) using VIC-3L parameterizations drawn randomly from its prior parameter distribution. The open loop deviates most from the measured values, with RMSE values of 0.036, 0.037, and 0.129 cm 3 cm −3 for the 5, 20, and 50 cm measurement depths. The different data assimilation methods improve significantly the quality-of-fit of VIC-3L compared to the open-loop run. EnKF-AUG and EnKF-DUAL exhibit an almost identical performance, with similar NSE and RMSE values. The particle filters RRPF and PM-CMC demonstrate comparable results for the 5 and 20 cm depths, but exhibit somewhat inferior performance compared to EnKF-AUG and EnKF-DUAL for the 50 cm layer. The table confirms our previous finding that the PF exhibits difficulties in tracking the soil moisture data of the deepest measurement layer. Indeed, the RMSE value of 0.088 of the PF for this layer is much larger than its counterparts of 0.021, 0.014, and 0.016 derived from PMCMC, EnKF-AUG, and EnKF-DUAL, respectively. Perhaps surprisingly, the best performance of VIC-3L is obtained for state estimation only (noParamUpdate) using model parameterizations drawn randomly from the prior parameter distribution. We posit that the nonlinear relationship between states and parameters may introduce inconsistencies in PMCMC, RRPF, EnKF-AUG, and EnKF-DUAL which jointly estimate VIC-3L states and parameters. Overall, the EnKF gives somewhat better results than the PF, particularly for the deepest mea-  surement layer, and PMCMC exhibits a better performance than RRPF. Figure 4 presents trace plots of VIC-3L parameters during the 5-month calibration period using the PMCMC (black), PF (red), EnKF-AUG (green), and EnKF-DUAL (cyan) data assimilation methods. We display the ensemble mean saturated hydraulic conductivity (log 10 k s in m s −1 ) at (a) 5 cm, (b) 20 cm, and (c) 50 cm depths, (d) b, β at (e) 5 cm, (f) 20 cm, and (g) 50 cm depths, and (h) the maximum baseflow velocity D m in mm day −1 . In general, the different data assimilation methods result in somewhat similar trajectories of the ensemble mean parameter values during the calibration period. In particular, the parameter trace plots of EnKF-AUG and EnKF-DUAL appear almost identical, with the exception of parameter b and β at 50 cm depth. Note that the parameters of the surface layer exhibit the most dynamics in response to atmospheric forcing. PMCMC exhibits significant temporal dynamics. This is not surprising, and a consequence of the MCMC resampling step that is used to rejuvenate the parameter samples (e.g., Vrugt et al., 2013). In the first place, the DREAM-type proposal distribution that is used to create candidate particles allows for relatively large moves in the parameter space. Second, only a small LSM trajectory between two successive soil moisture observations is used to determine the acceptance probability of each candidate particle. With such a short (re)-simulation period, insensitive parameters are allowed to transition to very different values, as they do not affect the model output between the two observations and thus the likelihood of a candidate particle. Altogether, this also contributes to a stronger dependency of PM-CMC on the initial parameter ensemble. This collection of parameter vectors is drawn randomly from the prior parameter distribution and differs per trial depending on the random seed. The use of a larger historical simulation period (going back further in time) would better constrain VIC-3L parameters but also increase significantly the computational burden of resampling. Nonetheless, the ensemble mean VIC-3L parameter values of the different data assimilation methods are remarkably similar at the end of the calibration period, af-ter assimilating the soil moisture observations of week 22. The exception to this is parameter b, whose trajectories differ most, with values at the end of the calibration period that range between values of 0.11 for RRPF and 0.25 for EnKF-DUAL. Finally, parameter D m converges systematically to values of 1-2 mm day −1 but at a different rate for the data assimilation methods. The EnKF-AUG, EnKF-DUAL, and PMCMC methods need just a few soil moisture observations to determine the value of D m , whereas RRPF converges at a much slower pace. This might explain the rather inferior performance of RRPF for the 50 cm measurement depth during a substantial part of the assimilation period.
To provide a better understanding of the ensemble spread of VIC-3L parameters, please consider Fig. 5, which presents trace plots of the sampled log 10 k s (left column) and β (right column) values at the 20 cm measurement depth for the N = 100 members. Results are presented in order of (a-b) PM-CMC (grey), (c-d) RRPF (red), (e-f) EnKF-AUG (green), and (g-h) EnKF-DUAL (cyan), and the ensemble mean is indicated with the solid black line. The ensemble members cover a relatively large part of the prior distribution of both parameters, with the exception of RRPF, which seems to underestimate the actual uncertainty of log 10 k s and β. This is an artefact of small s, which discourages large parameter adjustments. Nevertheless, note that the ensemble mean of the parameters is rather unaffected by assimilation of the soil moisture data, except for the small increase in log 10 k s and β in late April due to increased precipitation in the following months (see also Fig. 2). Figure 6 displays VIC-3L simulated soil moisture time series for the independent 5-month evaluation period at (a) 5, (b) 20, and (c) 50 cm depths using initial states and parameter values derived from PMCMC (black), PF (red), EnKF-AUG (green), and EnKF-DUAL (cyan). The observed soil moisture values are separately indicated with the solid blue dots. The water content simulations of VIC-3L are hardly distinguishable, except for the deepest soil layer at 50 cm depth. Apparently, it does not matter which data assimilation method is used to estimate VIC-3L parameter values and ini-  tial states of the evaluation period. VIC-3L tracks very well the soil moisture data at 20 cm depth, but does not do a particularly good job of describing water content dynamics at 5 and 50 cm depths. In particular, the model systematically underestimates the observed storage of the bottom soil layer between weeks 25 and 36. This might be a consequence of the use of a fixed lower boundary condition (no connection with the underlying aquifer) and/or the relatively simple baseflow parameterization. Although not further shown herein, a separate VIC-3L run using state estimation only (noParamUpdate) produces similar results after a few days to an openloop simulation. We summarize in Table 5 the NSE and RMSE values of PMCMC, RRPF, EnKF-DUAL, and EnKF-AUG during the 5-month evaluation period. We also list the performance of VIC-3L without data assimilation (OpenLoop) using the mean soil moisture time series of many different realizations of the prior parameter distribution, and include RMSE and NSE values of the EnKF for state estimation only (noParamUpdate) using VIC-3L parameterizations drawn randomly from its prior parameter distribution. In general, the RMSE values of the evaluation period are much higher than their counterparts of the assimilation period, and noParamUpdate produces RMSE values similar to that of an open-loop simulation. VIC-3L parameter estimation is pro-ductive, as it substantially reduces the RMSE values of 20 and 50 cm measurement depths compared to a model run with state estimation only (noParamUpdate) and parameters drawn randomly from their prior distribution. More specifically, PMCMC, RRPF, EnKF-AUG, and EnKF-DUAL show a RMSE improvement of about 54 and 42 % for the second and third measurement depths compared to OpenLoop and noParamUpdate. The NSE values of VIC-3L for the 50 cm depth are negative for all six methods, conclusively demonstrating an inferior performance of the model for this soil layer.
We now investigate in more detail the effect of MCMC resampling with the PF as Fig. 4 has demonstrated that PM-CMC produces rather dynamic trajectories of the sampled parameter values. Nevertheless, the parameters converge to stable values at the end of the assimilation period. This suggests that the choice of the length of the calibration period is crucially important in determining the performance of PM-CMC during the evaluation period. To investigate this in more detail we use 11 June, 30 June, 20 July, and 31 July 2012 as end dates of the PMCMC calibration period and verify VIC-3L performance for the same 5-month evaluation period. The different end dates are conveniently referred to as PMCMC_0611, PMCMC_0630, PMCMC_0720, and PM-CMC_0731 in Fig. 7. The simulated soil moisture trajectories of PMCMC_0630, PMCMC_0720, and PMCMC_0731 are in excellent agreement, but deviate from PMCMC_0611. Thus, a 4-month calibration period would have led to the same results of PMCMC.
The effect of initial uncertainties on the performance of EnKF with the ensemble inflation method is also tested with the VIC-3L model. Table 6 compares the RMSE values of EnKF-AUG and EnKF-DUAL for the calibration and evaluation period using heteroscedastic precipitation data errors equivalent to 10 % (default) and 20 % of their measured hourly rates plotted in Fig. 2. We list separate RMSE values for each soil moisture measurement depth. In short, the results are equivalent for both EnKF implementations.
Next, we evaluate the effect of the choice of the scaling factor s in RRPF on VIC-3L output. This scalar plays a crucial role in the resampling of the parameters in the PF. If s is taken too large, the resampling step will introduce parameter drift and corrupt the approximation of p X 1:t | Y 1:t and p x t | Y 1:t . By contrast, if s is too small, then the resampled parameters exhibit insufficient dispersion and underestimate the actual parameter uncertainty. In the absence of theoretical convergence proofs and clear guidelines on the selection of s, the RRPF cannot estimate exactly the posterior state and parameter PDF (Vrugt et al., 2013;Yan et al., 2015). Previous applications of RRPF have suggested a value of s = 0.01 (DeChant and Plaza et al., 2012), but thus far we have used s = 0.1 to avoid sample impoverishment. Table 7 lists RMSE values of VIC-3L for the 5, 20, and 50 cm measurement depths for the calibra- Table 5. Evaluation period: values of the NSE and RMSE summary statistics of the quality of fit of VIC-3L for the Rollesbroich soil moisture observations at 5, 20, and 50 cm depths using the calibrated parameter values and initial states derived from the PMCMC, RRPF, EnKF-AUG and EnKF-DUAL data assimilation methods. For completeness, we also list the performance of the EnKF using state estimation only (noParamUpdate) using VIC-3L parameter values drawn randomly from the prior parameter distribution, and the performance of an open-loop run of VIC-3L (OpenLoop) using the mean simulation of many different VIC-3L parameterizations drawn randomly from the prior parameter distribution.  Table 6. RMSE values of VIC-3L for the Rollesbroich soil moisture measurements at 5, 20, and 50 cm depths using the EnKF with state AUGmentation or DUAL estimation during the calibration period. We also summarize the subsequent performance of the VIC-3L model using the calibrated parameter values and initial states derived from AUG and DUAL. The subscripts 10 and 20 % signify the standard deviations of the measurement errors that are used to corrupt the hourly precipitation data.
Period Soil depth EnKF-AUG_10 % EnKF-AUG_20 % EnKF-DUAL_10 % EnKF-DUAL_20 %  Table 7. RMSE values of VIC-3L for the Rollesbroich soil moisture observations at 5, 20, and 50 cm depths using data assimilation with RRPF during the calibration period. We also summarize the subsequent performance of the VIC-3L model using the calibrated parameter values and initial states derived from RRPF. The subscripts 0.01, 0.1, and 0.5 signify the value of the scaling factor s of the multivariate normal distribution that is used to perturb the parameter values (importance density). tion and evaluation period using RRPF with s = 0.01, s = 0.1, and s = 0.5, respectively. These three runs are coined RRPF_0.01, RRPF_0.1 and RRPF_0.5, respectively. These results demonstrate that a value of s = 0.5 significantly enhances the performance of RRPF during the calibration period. The RMSE values are reduced from 0.025, 0.012, and 0.113 to 0.015, 0.007, and 0.037 for the 5, 20, and 50 cm measurement depths. RRPF_0.5 also shows substantial improvements over RRPF_0.01 during the evaluation period. This improvement is most apparent for the 20 and 50 cm soil depths with RMSE values that have decreased from 0.025 and 0.119 to 0.020 and 0.071, respectively. These results are on par with our default setting of s = 0.1 in RRPF. These findings provide evidence for our claim that the scaling factor s plays a crucial role in RRPF. What is more, it provides support for our conclusion in Fig. 5 that RRPF underestimates the actual uncertainty of log 10 k s and β. Larger values of s will increase the parameter spread, which in turn will enhance the uncertainty among the particles' forecasted states. This makes it easier for RRPF to track the observed soil moisture data during the calibration period. Figure 8 displays trace plots of the sampled N = 100 trajectories of the saturated hydraulic conductivity (log 10 k s in m s −1 ) at 50 cm depth (left column) and parameter β (right column) of VIC-3L during the 5-month assimilation period using (a-b) RRPF_0.01, (c-d) RRPF_0.1, and (ef) RRPF_0.5. As expected, larger values of s increase the spread of the sampled values of VIC-3L parameters, as evidenced by an increasingly larger particle coverage of the prior parameter distribution. This larger spread of the particles' parameter values also enhances the ability of RRPF to  Fig. 8e), but also many of the resampled parameter values might be deemed nonbehavioral, enhancing considerably the chances of particle degeneration. To demonstrate this more explicitly, Fig. 9 shows trace plots of VIC-3L predicted soil moisture contents of the N = 100 particles at 50 cm depth using (a) RRPF_0.01, (b) RRPF_0.1, and (c) RRPF_0.5. The RRPF is excessively optimistic for s = 0.01, with a negligible uncertainty in the predicted soil moisture values between weeks 2 and 14. Note that in weeks 2-4 the ensemble has collapsed to a deterministic simulation (appears as a single line). A similar result is observed for RRPF_0.1 but with enhanced uncertainty in soil moisture values for the second part of the calibration period. In PF_0.1 particle degeneration from March to June explains its bad performance from March to June in Fig. 3. The use of s = 0.5 enhances considerably the spread of VIC-3L soil moisture predictions. Yet, the ensemble spread has become quite large from week 15 onwards. For these reasons, we are satisfied with our value of s = 0.1 in RRPF, although this decision is subjective and would require much testing via trial-and-error. This has stimulated Vrugt et al. (2013) to introduce a parameter resampling method which is properly rooted in statistical theory and uses laws of probability to rejuvenate the ensemble. Figure 10 shows the observed (blue dots) and ensemble mean predicted soil moisture values by CLM (solid lines) at (a) 5, (b) 20, and (c) 50 cm depths during the assimilation period using PMCMC (black), PF (red), EnKF-AUG (green), and EnKF-DUAL (cyan). The most important results are as follows. First, the ensemble mean soil moisture time series of CLM exhibit a larger spread than VIC-3L depicted previously in Fig. 3. Second, the EnKF-AUG and EnKF-DUAL exhibit a superior performance with ensemble mean CLM simulations that track closely the observed soil moisture observations at each depth. Third, the moisture time series (and data) demonstrate most dynamics at the 5 cm depth in response to the variable atmospheric boundary conditions. Fourth, the worst performance is observed for RRPF, as evidenced by systematic deviations of this filter's soil moisture predictions with the observed data between weeks 3-6 and 18-21 for the 5 cm depth, weeks 1-14 and weeks 18-21 for the 20 cm depth, and weeks 1-15 and 19-22 for the 50 cm measurement depth. Fourth, the initial soil moisture values of CLM at 50 cm depth appear positively biased with a distance of approximately 0.05 cm 3 cm −3 to the areal-mean value of the soil water contents measured by the SPADE sensors on 1 March 2012 (first day of week 1). A smaller bias of 0.03 cm 3 cm −3 is observed at the 20 cm depth. The ENKF-AUG and EnKF-DUAL methods need a few days to recover from this erroneous initialization. For PMCMC_0611, PMCMC_0630, and PMCMC_0720, the soil moisture state on 1 August 2012, the first day of the 5-month evaluation period, was derived from VIC-3L simulation using the analysis state and parameter values of the last day of the assimilation period. Table 8 lists the NSE and RMSE values of PMCMC, RRPF, EnKF-DUAL, and EnKF-AUG for the CLM calibration (assimilation) period. We also list the performance of CLM without data assimilation (OpenLoop) using the mean soil moisture time series of many different realizations of the prior parameter distribution, and list in column with header "noParamUpdate" the RMSE and NSE values of the EnKF using state estimation only with CLM parameterizations drawn randomly from the prior parameter distribution. These results demonstrate that soil moisture assimilation enhances considerably the ability of CLM to predict the observed data. Compared to open-loop CLM simulation, the RMSE is reduced from 0.051, 0.031, and 0.069 to values of about 0.020, 0.012, and 0.016 (average) for the different data assimilation methods, respectively. Yet, the RMSE and NSE values of a CLM run with state estimation only (noParamUpdate) appear as good as those derived from joint parameter and state estimation using PMCMC, RRPF, EnkF-AUG, and EnKF-DUAL. Overall, the best performance is observed for EnKF-AUG and EnKF-DUAL, followed by PMCMC and RRPF.    We proceed in Fig. 11 with trace plots of the N = 100 sampled trajectories of the saturated hydraulic conductivity (log 10 k s in m s −1 ) at 50 cm depth (left column) and soil hydraulic parameter B at 50 cm depth (right column) during the 5-month assimilation period using (a-b) PMCMC, (cd) RRPF, (e-f) EnKF-AUG, and (g-h) EnKF-DUAL. The evolution of the ensemble mean log 10 k s and B values is separately indicated with the solid black line. The largest spread of the ensemble members is observed for EnKF-AUG and EnKF-DUAL and explained by the inflation method of Eq. (13) which inherits and sustains the prior parameter uncertainty. The RRPF sampled trajectories of log 10 k s and B exhibit a rather small uncertainty, with PDFs of these two pa-rameters that appear well defined at all measurement times. This might explain the inferior performance of RRPF as detailed previously in Table 8. Overall, the two CLM parameters do not exhibit large temporal changes and converge to stable values in the last few weeks of the calibration period. Figure 12 displays the observed (blue dots) and ensemble mean predicted soil moisture values by CLM (solid lines) at (a) 5, (b) 20, and (c) 50 cm depths during the evaluation period using PMCMC (black), PF (red), EnKF-AUG (green), and EnKF-DUAL (cyan). The soil moisture time series of the different data assimilation methods appear rather similar with largest differences observed at the 50 cm depth. In general, the PMCMC, RRPF, EnKF-AUG, and EnKF-DUAL methods do not do a particularly good job of tracking the soil moisture observations of the top soil layer. Indeed, the CLM soil moisture predictions derived from the different data assimilations are systematically biased, either underestimating (weeks 35-41 and 43-44) or overestimating (weeks 24-31 and 42) the observed soil moisture data during large parts of the evaluation data set. CLM tracks much better the soil moisture data of the 20 and 50 cm depths.

CLM
Finally, Table 9 presents the NSE and RMSE values of PMCMC, RRPF, EnKF-AUG and EnKF-DUAL during the 5-month evaluation period. We also list the performance of VIC-3L without data assimilation (OpenLoop) using the mean soil moisture time series derived from many different realizations of the prior parameter distribution, and display NSE and RMSE values of the EnFK using state estimation only (noParamUpdate) with CLM parameterizations drawn randomly from the prior parameter distribution. The results of this Table are in agreement with our findings for VIC-3L. Indeed, the RMSE values of the evaluation period are much higher than their counterparts of the assimilation period. This is particularly evident for the 5 cm measurement depth where RMSE values have increased from 0.017-0.027 to 0.054-0.058. The deeper measurement depths do not appear to be as much affected, consistent with our findings from Fig. 12. The results also highlight the importance of joint CLM parameter and state estimation as state estimation alone (column Figure 11. Sampled trajectories of the N = 100 ensemble members of the saturated hydraulic conductivity (log 10 k s in m s −1 ) at 50 cm depth (a, c, e, g) and soil hydraulic parameter B at 50 cm depth (b, d, f, h) of CLM during the 5-month assimilation period of weeks 1 to 22 using (a-b) PMCMC (grey) (c-d) RRPF (red), (e-f) EnKF-AUG (green), and (g-h) EnKF-DUAL (cyan). The solid black line signifies the evolution of the ensemble mean values of log 10 k s and B. Please note that log 10 k s (in log 10 (m s −1 )) and parameter B are derived from the sand, clay, and organic matter fractions of each soil layer, which are estimated during the assimilation period. noParamUpdate) results in significantly larger RMSE values during the evaluation period. This is most evident for the 50 cm measurement depth, where the RMSE value of 0.050 of noParamUpdate is much larger than its value of 0.016-0.025 derived from PMCMC, RRPF, EnKF-AUG and EnKF-DUAL. Altogether, RRPF achieves the worst performance of all four parameter-state estimation methods during the evaluation period. PMCMC, EnKF-AUG and EnKF-DUAL provide rather similar RMSE and NSE values.

Discussion
In this study, we have evaluated the usefulness and applicability of four different data assimilation methods for joint parameter and state estimation of the VIC-3L and CLM land surface models using a 5-month calibration (assimilation) data set of distributed SPADE soil moisture measurements at the 5, 20, and 50 cm depths in the Rollesbroich test site in the Eifel mountain range in western Germany. We used the EnKF with state augmentation or dual estimation, respectively, and the PF with a simple, statistically deficient, or more sophisticated, MCMC-based parameter resampling method. The "calibrated" LSM models were tested using water content data from a 5-month evaluation period. The uniqueness of the present work resides in the application of these four joint or dual parameter and state estimation methods to real-world data.
Our results demonstrated that joint inference of VIC-3L and CLM soil parameters improved considerably soil moisture characterization during the evaluation period compared to the mean water content predictions of an open-loop run derived via averaging of simulations of many different realizations drawn randomly from the prior parameter distribution. This is particularly true for CLM, the two deeper soil layers, and the EnKF-AUG and EnKF-DUAL methods (but followed closely by PMCMC). Despite this improvement in model performance over an open-loop simulation, VIC-3L and CLM do not adequately characterize soil moisture dynamics of the top layer (5 cm measurement depth) during the evaluation period (RMSE values of about 0.05 cm 3 cm −3 ). We posit that these two models do not characterize adequately processes such as water infiltration, soil evaporation, and/or root water uptake (transpiration), which govern rapid variations in soil moisture storage in the top soil. VIC-3L Hydrol. Earth Syst. Sci., 21, 4927-4958, 2017 www.hydrol-earth-syst-sci.net/21/4927/2017/  also appeared deficient at 50 cm depth during the evaluation period, with RMSE values of about 0.07 cm 3 cm −3 , which are much larger than their counterparts of approximately 0.02 cm 3 cm −3 derived from CLM. These results favor the use of CLM, which uses a more physics-based description of soil water movement, storage, and associated hydrological fluxes at the Rollesbroich site.
The improvement in quality-of-fit of the VIC-3L and CLM models compared to an open-loop run does not necessarily imply that the estimated parameter values of VIC-3L and CLM characterize better the hydraulic properties and maximum baseflow velocity of the soils of the Rollesbroich experimental test site. Assimilation studies with synthetically generated data help to ascertain whether the model parameters converge properly to their "true" values, yet this is difficult to confirm with real-world measurements. State estimation will, without doubt, help reduce the impact of epistemic errors and systematic biases of LSM input and forcing data on parameter inference during the assimilation period (e.g., Vrugt et al., 2005b). But the calibrated parameter values derived with state estimation do not necessarily guarantee a consistent and adequate model performance during an independent evaluation period without state estimation. Indeed, without assimilation the simulated states may diverge from their "measured" values and deteriorate model performance in an evaluation period. This begs the question of which parameter values we should use to predict future system behavior outside an assimilation period. Should we use parameter estimates derived with state estimation or should we use their values derived via batch calibration (optimization) without recursive adjustments to the state variables? This dilemma is illustrated further in Vrugt et al. (2005b) by modeling of a subsurface tracer test using data from Yucca Mountain, Nevada, USA. We conclude that the enhanced performance of VIC-3L and CLM during the evaluation period compared to our open-loop simulation is due to improved estimates of the initial states and the soil parameters.
In our implementation of the EnKF and PF, VIC-3L and CLM parameters were assumed to be time-variant and their values updated jointly with the model states at each assimilation time step. The 5-month calibration period we used herein involves several large precipitation events, and as a consequence, the soil profile is rather wet. The resulting parameter estimates might therefore not be representative of dry periods with much lower moisture values of the soil profile. What is more, the assumption of spatial homogeneity might not characterize adequately the distributed soil properties of the Rollesbroich site and induce temporal variability in VIC-3L and CLM parameters. Bias in model input and measurement errors of the forcing data also contribute to the temporal fluctuations of the estimated parameter values. These temporal parameter variations are meaningful in some cases as they can help diagnose structural model inadequacies and/or biases in model input and forcing data. Kurtz et al. (2012) successfully estimated a temporally variant parameter with the EnKF, but these authors concluded that the algorithm needs a considerable spin-up period to "warm-up" to new parameter values. Vrugt et al. (2013) found considerable temporal non-stationarity in the parameters estimated by PMCMC as a result of the small time period used to calculate the acceptance probability of candidate particles. This finding is in agreement with the results of PMCMC in our paper. Of course, we could have assumed time-invariant parameters via a method such as SODA, but this would have enhanced significantly computational requirements. Fortunately, parameters estimated via our implementation of the EnKF exhibit asymptotic properties during the assimilation period (e.g., see Y. . This is particularly true for highly sensitive parameters. An example of this was parameter D m of VIC-3L which quickly converged to values of around 1-2 mm after assimilating just a handful of soil moisture observations. It is difficult to assess whether the inferred VIC-3L and CLM parameter values will do a good job of predicting soil moisture dynamics at the different measurement depths during a much longer evaluation period with wet and dry conditions. As the estimated parameters represent apparent properties of the Rollesbroich site, one may expect their calibrated values not to change too much over time. We would need additional soil moisture data and/or other types of measurements to corroborate this. Nevertheless, the apparent parameter values derived herein improve characterization of soil moisture dynamics at the Rollesbroich site compared to a separate state estimation run with VIC-3L and CLM using parameters drawn randomly from the prior distribution, or open-loop simulation using the ensemble mean model output of a large cohort of parameter vectors drawn randomly from the prior parameter distribution (initial parameter ensemble).
The different data assimilation methods (EnKF-AUG, EnKF-DUAL, RRPF, and PMCMC) led to a rather similar performance of VIC-3L during the calibration and evaluation periods. The only exception to this was the anomalous RMSE value of RRPF at the 50 cm measurement depth during the calibration period. This was explained by the slow convergence of the maximum baseflow velocity in RRPF. Our results for VIC-3L further demonstrated that the results of EnKF-AUG and EnKF-DUAL were equivalent for 10 and 20 % rainfall errors. Moreover, the use of a larger value of the scaling s in RRPF reduced considerably the RMSE values of VIC-3L in the calibration data period, particularly at the 50 cm measurement depth, whereas model performance was hardly improved during the evaluation period.
For CLM, larger differences were observed in the performance of the different data assimilation methods. This larger disparity among the methods is explained by the considerably larger number of soil layers (10) used by CLM. This increased significantly the dimensionality of the parameter estimation problem. The overall best results at the 5, 20, and 50 cm measurement depths were observed for EnKF-AUG and EnKF-DUAL, with RMSE values that were somewhat smaller than their counterparts derived from PMCMC. This was true for both the calibration and evaluation periods. The RRPF exhibited the worst performance, in part determined by the use of a relatively small ensemble of N = 100 particles. The superiority of the EnKF-AUG and EnKF-DUAL methods for CLM is consistent with our expectations articulated previously in Sect. 3.1. The analysis step of the EnKF makes it much easier for EnKF-AUG and EnKF-DUAL to track the measured soil moisture dynamics, thereby promoting convergence in high-dimensional state-parameter spaces. PF-based methods, by contrast, deteriorate in robustness and efficiency with larger dimensionality of the state-parameter space as they lack a state-analysis step and approximate the transient state-parameter PDF via the particles' likelihoods. This likelihood is only a low-dimensional summary statistic of the distance between the forecasted and measured values of the states. Resampling with MCMC via the likelihood thus becomes increasingly more difficult in highdimensional state-parameter spaces. For CLM, the PMCMC method still achieves comparable results to EnKF-AUG and EnKF-DUAL as the dimensionality of the state-parameter PDF of this model is only somewhat larger than its counterpart of VIC-3L. Of course, the use of a larger ensemble size makes it easier to characterize the transient stateparameter PDF, but at the expense of a significantly increased CPU cost. For PMCMC, multiple different MCMC resampling steps can also enhance significantly the particle ensemble by allowing each particle trajectory to improve its likelihood. Yet, this deteriorates significantly the efficiency of implementation as each candidate particle requires a separate model evaluation of VIC-3L or CLM to determine its likelihood. Thus, for LSMs with relatively few state variables and model parameters, we expect the EnKF and PF methods to achieve a comparable performance. For larger-dimensional state-parameter spaces we would recommend EnKF-AUG and EnKF-DUAL, unless one can afford a very large number of particles.
Finally, our results demonstrated that the differences between the soil moisture simulations of VIC-3L and CLM are much larger than the discrepancies between the four data assimilation methods. Overall, CLM performed better than VIC-3L, especially at 50 cm measurement depth. Of course, we cannot generalize this finding to other sites, but VIC-3L's rather poor characterization of soil moisture dynamics at 50 cm depth (systematic underestimation during the first

Conclusions
In this study, we have evaluated the usefulness and applicability of four different data assimilation methods for joint parameter and state estimation of the Variable Infiltration Capacity Model (VIC-3L) and the Community Land Model (CLM) using a 5-month calibration (assimilation) period (March-July 2012) of areal-averaged SPADE soil moisture measurements at 5, 20, and 50 cm depths in the Rollesbroich experimental test site in the Eifel mountain range in western Germany. This watershed is part of TERENO observatories and has been extensively monitored since 2011 to catalog long-term ecological, social, and economic impacts of global change at regional level. We used the EnKF with state augmentation or dual estimation, respectively, and the PF with a simple, statistically deficient, or more sophisticated, MCMCbased parameter resampling method. The "calibrated" LSM models were tested using SPADE water content measurements from a 5-month evaluation period (August-December 2012). The performance of the four different state and parameter estimation methods appeared rather similar during the evaluation period, with a slightly better performance of the augmentation and dual estimation methods, but followed closely by PMCMC and then RRPF. The differences between the soil moisture simulations of VIC-3L and CLM are much larger than the discrepancies between the four data assimilation methods. Overall, the best performance was observed for CLM. The large systematic underestimation of water storage at 50 cm depth by VIC-3L during the first few months of the evaluation period questions, in part, the validity of its fixed lower boundary condition at the bottom of the modeled soil domain. This approach ignores the movement of water into and out of the groundwater reservoir of the Rollesbroich site. CLM simulates interactions of the modeled soil domain with the Rollesbroich aquifer via the use of a variable water depth at the lower boundary.
Appendix A: Parameterization of the VIC-3L model The integrated water balance in VIC-3L can be written as follows: where S (L) is storage, t (T) denotes time, ∂S/∂t (LT −1 ) signifies the change in water storage, and P , T , E, Q d , and Q b (LT −1 ) represent fluxes of precipitation, canopy transpiration, soil evaporation, direct runoff, and baseflow, respectively. Bare soil evaporation, E, is calculated using the equation of Francini and Pacciani (1991). The canopy transpiration flux, T , is equivalent to the total uptake of water by plant roots in our soil profile and is estimated following Blondin (1991) and Ducoudre et al. (1993) using the bulk equation of Monteith (1963). In this "single-leaf" approach, the canopy resistance is assumed to be a function of the minimum canopy resistance and environmental variables (factors) such as photosynthetically active radiation, ambient temperature, vapor pressure deficit, and soil moisture content. We refer to Wigmosta et al. (1994) for a detailed discussion of these four limiting variables, including their mathematical description and the parameterization used therein. When it rains the leaves become covered with a thin film of water and the transpiration flux is suppressed temporarily until the intercepted water has evaporated at the potential rate derived from the Penman-Monteith equation (Shuttleworth, 2007).
To calculate foliage storage the maximum canopy water storage is set to a multiple of 0.2 of the leaf area index (Dickinson, 1984). Direct runoff, Q d , reduces the amount of rainfall that can infiltrate into the top soil during wet conditions, and is calculated using (Liang et al., 1996) Q d (A2) where the triples {θ 1 , φ 1 , z 1 } and {θ 2 , φ 2 , z 2 } signify the volumetric moisture content (L 3 L −3 ), porosity (-), and depth (L) of the top layer of the soil and the next or second layer immediately below it, respectively, I 0 (L) and I max (L) denote the actual and maximum moisture capacity of the soil, respectively, t (L) signifies the integration time step that is used to solve Eq. (A1) numerically, and b (-) is an unknown shape parameter that measures the spatial variability of the soil moisture capacity. Note that the integration time step, t, is often missing from Eq. (A2) in VIC manuals or literature publications. This is consistent if rainfall, P , is expressed in units of depth, say mm, but invalid in conjunction with Eq. (A1), which requires as input precipitation rates. If the integration time step is set equivalent to the time unit of the measured precipitation rates, then t = 1. This approach however can introduce large numerical errors, particularly if the soil is close to saturation. The dimensionless parameter b is usually determined via calibration by fitting VIC-3L against a historical record of soil moisture observations and/or flux data. The direct runoff in Eq. (A2) is not only a function of the water saturation of the first layer, but also depends on the moisture content of the second underlying soil layer. To be able to track adequately the large storage variations of the top soil observed in experimental data, the first layer of VIC-3L must be taken rather small. Consequently, this top layer will saturate quickly in response to rainfall as it exhibits a rather negligible water holding capacity. Hence, VIC-3L uses the available storage of the first and second layer to determine the excess precipitation, which is set equivalent to Q d . If the rainfall depth exceeds the available moisture capacity of the soil, (I max − I 0 ), then the excess precipitation is removed via surface runoff. Otherwise, if P t ≤ (I max − I 0 ), then a large fraction of the rainfall will infiltrate depending on the soil's available storage and the spatial variability of the moisture capacity within the grid cell. The values of I 0 and I max are estimated from (Zhao, 1992): where A s (-) is the areal fraction of the grid cell that is saturated (infiltration capacity equal to I max ): .

(A5)
The baseflow, Q b , originates from the bottom (third) soil layer and is calculated using the formulation of the Arno model (Franchini and Pacciani, 1991): where D m (LT −1 ) is the maximum baseflow velocity, and D S and W S are dimensionless fractions of D m and the porosity of the third layer, φ 3 , respectively. The baseflow flux is linearly dependent on the water content of the third layer if θ 3 ≤ W s φ 3 , and increases nonlinearly with water storage of the third layer if θ 3 ≥ W s φ 3 . Now that we have discussed the different fluxes from the soil domain simulated by VIC-3L, we can now write differential equations of the moisture dynamics in the individual Hydrol. Earth Syst. Sci., 21, 4927-4958, 2017 www.hydrol-earth-syst-sci.net/21/4927/2017/ soil layers (see also Liang et al., 1996).
∂θ 1 ∂t z 1 = P + Q 1,2 − Q d − R 1 − E, ∂θ 2 ∂t z 2 = Q 2,3 − Q 1,2 − R 2 , (A7) where Q i,i+1 (LT −1 ) is the vertical flux of water between two adjacent soil layers i and i + 1, R i (LT −1 ) signifies the root water uptake of the ith layer, and i = {1, 2, 3}. Downward fluxes are negative to be consistent with convention used in soil hydrology. The canopy transpiration flux is equal to the total water uptake by the plant roots, thus T = R 1 + R 2 + R 3 . All three soil layers contain roots and thus contribute to transpiration in our application of VIC-3L to the Rollesbroich site. The vertical flux of water between two adjacent soil layers is assumed to be equivalent to the hydraulic conductivity of the upper layer. VIC-3L computes the hydraulic conductivity of each soil layer using the formulation of Brooks and Corey (1964): where k s,i (LT −1 ) and θ r,i (L 3 L −3 ) signify the saturated hydraulic conductivity and the residual volumetric moisture content of the ith soil layer, respectively. The minus sign on the right-hand side of Eq. (A8) matches the direction of the flux. The dimensionless exponent β i should be larger than 3.0. The use of three soil layers by VIC-3L makes it difficult to describe accurately the vertical moisture distribution in the vadose zone. Indeed, VIC-3L cannot distinguish between saturated and partially saturated areas in a given soil layer. As a consequence, the baseflow flux, Q b , is made up of water from the unsaturated zone and the groundwater (Liang et al., 1996(Liang et al., , 2003. Liang et al. (2003) developed a new parameterization, which considers explicitly effects of surface and groundwater interactions on soil moisture, transpiration, soil evaporation, runoff, and recharge. This parameterization, coined VIC-ground, enhanced considerably water storage in the lower soil layer compared to VIC-3L.

Appendix B: Parameterization of CLM
This Appendix summarizes the main equations of CLM which are used to describe variably saturated water flow in the soil domain of our experimental catchment. The model uses a water balance formulation similar to Eq. (A1) of Appendix A to simulate moisture storage and movement in the soil of each grid cell of the application domain of interest. Yet, CLM includes a more exhaustive description of all the different processes that determine the water storage of the land surface. This includes canopy water, surface water, snow water, soil water, soil ice, and water stored in the unconfined aquifer. In addition to surface and subsurface runoff, CLM also considers runoff from glaciers, wetlands, and lakes.
Fluxes, F (ML −2 T −1 ), of ground evaporation, interception evaporation, and vegetation transpiration are calculated by CLM using the following general expression (Schwinger et al., 2010;Oleson et al., 2013): where ρ a (ML −3 ) is the density of air, r a (TL −1 ) signifies the aerodynamic resistance, q (MM −1 ) is the specific humidity of the soil pores (for soil evaporation) or canopy (for vegetation transpiration and interception evaporation) or the saturated specific humidity of snow or surface water, and q a (MM −1 ) denotes the specific humidity at atmospheric level if ground evaporation is calculated or the saturated specific humidity within the canopy if canopy evapotranspiration is calculated. The values of r a , q, and q a are based on Monin-Obukhov similarity theory (Schwinger et al., 2010;Oleson et al., 2013). We use 10 soil layers (see Table 2) in CLM to solve for the vertical storage and movement of water. Whenever the index i is used we mean "for all i ∈ {1, . . ., 10}". The saturated hydraulic conductivity, k s,i (LT −1 ), saturated volumetric moisture content, θ s,i (L 3 L −3 ), thermal conductivity, λ i (WL −1 K −1 ), soil matric head at saturation, ψ s,i (L), and Clapp-Hornberger exponent, B i (-), of each soil layer are derived from built-in pedotransfer functions. These functions use as inputs textural data (Clapp and Hornberger, 1978;Cosby et al., 1984) and/or the organic matter fraction (Lawrence and Slater, 2008) of each soil layer as follows: where f sd,i , f cl,i , and f om,i signify the fractions of sand, clay, and organic matter, respectively, f p,i (-) denotes the fraction of connected organic matter, and k s,om (mm s −1 ) is the saturated hydraulic conductivity of organic soils. If the organic matter fraction, f om,i , is smaller than 0.5, then f p,i = 0; otherwise, f p,i = 0.5f om,i (f om,i − 0.5) −0.139 .
Vertical flow in the unsaturated zone is governed by rainfall infiltration, surface and subsurface runoff, root water uptake (canopy transpiration), and groundwater interactions. A modified Richards equation is used to predict water storage and movement in the variably saturated soils of the Rollesbroich site: where θ i (L 3 L −3 ), ψ i (L), k i (LT −1 ), z i (L), and ψ e,i (L) denote the volumetric water content, matric head, hydraulic conductivity, depth, and equilibrium matric head of the ith soil layer, C i = ψ e,i + z i , and R i (T −1 ) is the loss of water via root water uptake (canopy transpiration). Note that Eq. (B6) omits conveniently the evaporation flux from the first (top) layer. The hydraulic conductivity, k i , of each layer depends on its moisture content, saturated hydraulic conductivity, and exponent B, and these values of the adjacent soil layer immediately below, with the exception of the bottom layer (Oleson et al., 2013;Han et al., 2014). The use of the constant C i in Eq. (B6) allows CLM to simulate matric head variations below the water table. This modification maintains a hydrostatic equilibrium soil moisture distribution, and fixes a critical deficiency of the θ -based formulation of Richards' equation (Zeng and Decker, 2009;Oleson et al., 2013). The matrix head, ψ i , and the equilibrium matrix head, ψ e,i , of each soil layer are computed as follows: and ψ e,i = ψ s,i θ e,i θ s,i with θ e,i = θ s,i ψ s,i + z ∇ − z i ψ s,i where z ∇ (L) is the depth of the water table.
The bottom boundary condition of Eq. (B6) depends on the depth of the water table. This depth, z ∇ , is calculated following Niu et al. (2007) and assumes the presence of an unconfined aquifer below the soil column. If the water table is within the modeled soil column (top 10 layers), then a constant water storage is assumed in the unconfined aquifer (the soil column is saturated with water below the water table) and a zero-flux bottom boundary condition is used. Recharge, q rec (LT −1 ), to the unconfined aquifer is calculated as follows: where k wt (LT −1 ), ψ wt (L), and z wt (L) signify the hydraulic conductivity, matric head, and depth of the layer that contains the groundwater table. Drainage, q drain (ML −2 T −1 ), from the aquifer is calculated via a simple TOPMODEL-based (SIM-TOP) scheme (Niu et al., 2005) using q drain = 10 sin (ε) exp(−2.5z ∇ ), where ε (Rad) signifies the mean topographic slope of the respective grid cell. The change in the water table depth is then given by where S y (-) denotes the specific yield which depends on the properties of the soil.