Effect of unrepresented model errors on estimated soil hydraulic material properties

Unrepresented model errors influence the estimation of effective soil hydraulic material properties. As the required model complexity for a consistent description of the measurement data is application dependent and unknown a priori, we implemented a structural error analysis based on the inversion of increasingly complex models. We show that the method can indicate unrepresented model errors and quantify their effects on the resulting material properties. To this end, a complicated 2-D subsurface architecture (ASSESS) was forced with a fluctuating groundwater table while time domain reflectometry (TDR) and hydraulic potential measurement devices monitored the hydraulic state. In this work, we analyze the quantitative effect of unrepresented (i) sensor position uncertainty, (ii) small scale-heterogeneity, and (iii) 2-D flow phenomena on estimated soil hydraulic material properties with a 1-D and a 2-D study. The results of these studies demonstrate three main points: (i) the fewer sensors are available per material, the larger is the effect of unrepresented model errors on the resulting material properties. (ii) The 1-D study yields biased parameters due to unrepresented lateral flow. (iii) Representing and estimating sensor positions as well as small-scale heterogeneity decreased the mean absolute error of the volumetric water content data by more than a factor of 2 to 0.004.


Introduction
Soil hydraulic material properties are essential to advance quantitative understanding of soil water dynamics.Despite decades of research, direct identification of these properties is time-consuming and near to impossible at larger scales.Therefore, indirect identification methods, such as inversion (Hopmans et al., 2002;Vrugt et al., 2008a), have been successfully applied to evaluate experiments starting from lab-scale (e.g., Parker et al., 1985;Van Dam et al., 1994;Šimůnek et al., 1998;Schneider et al., 2006) up to field-scale studies (e.g., Wollschläger et al., 2009;Huisman et al., 2010).Due to the multiscale heterogeneity of the soil hydraulic material properties (Nielsen et al., 1973;Gelhar, 1986;Cushman, 1990;Vogel and Roth, 2003), effective material properties have to be identified directly at the scale of interest.Yet, most studies focus on 1-D subsurface architectures with homogeneous layers, e.g., Abbaspour et al. (2000); Ritter et al. (2003); Mertens et al. (2006); Wöhling et al. (2008); Wollschläger et al. (2009).Only a few studies, e.g., Abbasi et al. (2004); Palla et al. (2009); Huisman et al. (2010), estimate material properties of effectively 2-D subsurface architectures.Abbasi et al. (2004) conducted an irrigation experiment to estimate soil hydraulic and solute transport properties for a 2-D furrow architecture.Palla et al. (2009) estimated effective soil hydraulic properties for a 2-D layered coarse-grained green roof based on hydrographs.Huisman et al. (2010) estimated soil hydraulic properties of a homogeneous dike exploiting flat wire time domain reflectometry (TDR) and electrical resistance tomography (ERT) data Published by Copernicus Publications on behalf of the European Geosciences Union.S. Jaumann and K. Roth: Effect of unrepresented model errors recorded during a fluctuating groundwater table experiment.With increasing computational power in recent years, 1-D subsurface architectures were analyzed with ensemble-based parameter estimation methods reaching from Markov chain Monte Carlo (MCMC; e.g., Vrugt et al., 2008b;Scharnagl et al., 2011;Wöhling and Vrugt, 2011) and data assimilation (e.g., Wu and Margulis, 2011;Li and Ren, 2011;Erdal et al., 2014) to data-driven modeling (e.g.,Over et al., 2015).
Most of these studies describe the given data with models chosen upfront with restricted complexity and a minimum number of parameters.If the models are too simple, critical uncertainties and processes may be neglected, leading to suboptimal results.If the models are too complex, the resulting material properties are likely to be application dependent.In general, the required model complexity is unknown a priori (Vereecken et al., 2015).Quantitative learning about complicated systems is an iterative process (Gupta et al., 2008;Box et al., 2015).It starts from the current understanding of the system that is represented with a model (Clark et al., 2011;Gupta et al., 2012).The optimal experimental design is then based on the model and the resulting data are, figuratively speaking, the answer of reality to the questions asked through the experiment.Disagreement between the model and the data reveals incorrect understanding of the system.Consequently, the concepts, decisions, and hypotheses integrated into the model (including evaluation procedures of the data) and the data themselves are revised.If the model predicts the data accurately and precisely enough, the research objectives are expanded, such that the data cover a larger part of the state space.Ultimately, this iterative procedure leads to data covering the whole state space and a statistical model-data mismatch corresponding to the data error model.In general, such data are not available and the application merely requires a limited accuracy and precision.Hence, the crux is to determine the sufficient complexity of both the model and the data for the required accuracy and precision.
This problem can be quantified with a Bayesian total error analysis (BATEA; Kavetski et al., 2002Kavetski et al., , 2006) ) investigating the total uncertainty space which includes uncertainty in the observed input and responses as well as uncertainty in the model hypothesis.However, this analysis is computationally intensive if the number of uncertainties is large and required models may not be available, e.g., for hysteresis.For instance, Bauser et al. (2016) categorized the uncertainties a priori and estimated the most important ones along with effective material properties using an ensemble Kalman filter (EnKF) aiming for a consistent representation of reality.The temporal fluctuation of the estimated hydraulic parameters was used to identify a situation in which the representation of the dynamics is inconsistent.Hence, measurement data acquired during this period of time were merely used for state estimation and excluded from parameter estimation to prevent the incorporation of uncertainties in the dynamics into the estimated parameters.
Table 1.The grain size distribution in percent by weight displays the different granularity of the materials A, B, and C of ASSESS (G.Schukraft, personal communication, Institute of Geography, Heidelberg University, 2010).Whereas the composition of the materials B and C is similar, material A features a higher percentage of fine sand.Since the mechanical wet analysis is time-consuming and laborious, only material B was sampled twice.Thus, 80 g out of approximately 400 Mg were sampled.Due to rounding, the sum of the values is not always 100.In this work, we change the perspective and associate the model with our quantitative understanding of reality that is tested against the given measurement data.To analyze the required model complexity, we prescribe temporally constant material properties, calculate the maximum likelihood of increasingly complex models and analyze the corresponding structural model-data mismatch.We show that this structural error analysis indicates limitations of these models and quantifies the effect of the respective unrepresented model errors on the inversely estimated material properties.Specifically, we analyze measurement data acquired at the test site (AS-SESS) while it as forced with a fluctuating groundwater table which ensures a high dynamical range of the hydraulic state.We set up a basic representation accounting for uncertainties of the hydraulic material properties and the forcing.Following an uncertainty analysis, we additionally estimate the sensor position and small-scale heterogeneity.These increasingly complex models are applied to (i) three 1-D profiles in ASSESS with an increasing number of sensors per material and (ii) the full 2-D profile to additionally analyze the implications of the restriction to a 1-D subsurface architecture and to few sensors per material.

ASSESS
The approximately 2 m × 20 m × 4 m large test site (AS-SESS; Fig. 1) is located near Heidelberg, Germany, and consists of three different kinds of sand (materials A, B, and C) which are arranged in an effective 2-D subsurface architecture (Fig. 2).The grain size distributions of these materials are presented in Table 1.A geotextile separates the sand from an approximately 0.1 m thick gravel layer below, which ensures a rapid water pressure distribution and connects a groundwater well with the rest of the test site.Below this gravel layer, a concrete layer confines the site.As the test site is built into a former fodder silo, a concrete L element serves as additional wall.In order to stabilize the material during the construction, it was compacted.In addition to the compaction interfaces shown in Fig. 2, ground-penetrating radar (GPR) measurements indicate even more compaction interfaces (Klenk et al., 2015, Figs. 1b and 6).
The test site is equipped with a weather station, a tensiometer (UMS T4-191), and 32 soil temperature and TDR sensors.Each TDR sensor has three cylindrical rods (length: 0.20 m, diameter: 0.004 m) which are separated by 0.03 m.They are operated by a Campbell Scientific TDR100.

Representation
For representing the soil water dynamics in ASSESS during the experiment, we follow Bauser et al. (2016) and define the representation of a system as a set consisting of dynamics (mathematical description), subscale physics (material properties), forcing (superscale physics), and states.The representation of the hydraulic system also comprises its implementation.In order to separate the more general theoretical considerations from the application-dependent details, these are not directly given in this section but are gathered in Appendix A1.

Dynamics
The Richards equation (Richards, 1931) is the standard model to describe soil water dynamics: with the time t (s), volumetric water content θ (−), matric head h m (m), unit vector in z direction e z indicating the direction of gravity, soil water characteristic θ (h m ), and hydraulic conductivity function K(θ ).The material properties θ(h m ) and K(θ ) are required to solve this partial differential equation.Generally, these material properties are non-linear and vary over many orders of magnitude.

Subscale physics
We choose the Brooks-Corey parameterization (Brooks and Corey, 1966) for the soil water characteristic θ (h m ), since it has been found to describe the materials in ASSESS well (Dagenbach et al., 2013).This parameterization has four parameters: a scaling parameter h 0 (m) related to the air entry pressure (h 0 < 0 m), the saturated water content θ s (−), the residual water content θ r (−), and a shape parameter λ (−) related to the pore size distribution (λ > 0).In general, θ (h m ) shows hysteretic behavior (Topp and Miller, 1966).
Neglecting hysteresis, the parameterization may be inverted for θ r ≤ θ ≤ θ s .This leads to Inserting the Brooks-Corey parameterization into the hydraulic conductivity model of Mualem (1976) yields the parameterization for the hydraulic conductivity function, where K s (m s −1 ) is the saturated hydraulic conductivity and τ (−) a heuristic tortuosity factor.Small-scale heterogeneities, i.e., the texture of the porous medium, can be represented with Miller scaling if the pore spaces at any two points are assumed geometrically similar (Miller and Miller, 1956).Scaling the macroscopic reference state h * m (θ ), K * (θ ) with a local ratio of characteristic lengths ξ (−), leads to locally scaled material functions (Roth, 1995):

Forcing
The hydraulic state was forced with a fluctuating groundwater table by pumping water in or out of a groundwater well.The experiment was arranged in three different phases: (i) initial drainage phase, (ii) multistep imbibition phase, and (iii) multistep drainage phase.The detailed forcing is presented in Table 2. Throughout the forcing, equilibration steps were included in between, such that the relaxation of the capillary fringe happened within the measurement volume of the TDR sensors where possible.We neglect evaporation in the following, since the experiment took place at the end of November and the weather was cloudy with 2-7 • C air temperature.The last precipitation was measured approximately 10 days before the experiment.

State
The hydraulic state was monitored in particular with hydraulic potential and water content measurements during the The gravel layer at the bottom of the site ensures a rapid water pressure distribution over the site.An L element (black polygon, at 0.4 m) and compaction interfaces (white lines) were introduced during the construction.Note the different scales on the horizontal and vertical axes.experiment.The hydraulic potential was assessed via the position of the fluctuating groundwater table.This position was measured (i) manually in the groundwater well and (ii) automatically with the tensiometer (Fig. 3).The gradient between the hydraulic potential in the groundwater well and the hydraulic potential in the test site drives the water flux.The largest part of this gradient equilibrates approximately within 5 min.Afterwards, the position of the groundwater table still changes due to the long-term equilibration of the hydraulic state.
The water content data are based on measured TDR traces which yield the relative permittivity of the soil ε b (Sect.A1.3).This permittivity is converted to water content θ using the Complex Refractive Index Model (CRIM; Birchak et al., 1974): with the geometry parameter α = 0.5.In order to apply the CRIM, the porosity φ, the relative permittivity of water ε w , the relative permittivity of air ε a , and the relative permittivity of the soil matrix ε s have to be known.The relative permittivity of air ε a was set to 1.0.Assuming that the sand matrix consists mainly of quartz (SiO 2 ) grains, the relative permittivity of the soil matrix ε s was set to 5.0 (Carmichael, 1989).Core samples of the materials A, B, and C yielded the porosities 0.41, 0.36, and 0.38, respectively.These values will be assumed for the saturated water content θ s of the respective materials in the remainder of this paper.Following Kaatze (1989), we parameterize the dependency of the relative permittivity of water ε w on the soil temperature T ( • C) with and use soil temperature measurements near each TDR sensor to determine the according ε w .
The evaluated water content data of those TDR sensors that were desaturated during the experiment are displayed in Fig. 4. The data show that the experiment exhibits complicated flow phenomena.The measured water content in- show a high variability up to and beyond the validity limits of the Richards equation due to the fluctuating groundwater table (Fig. 3).Hence, in order to avoid effects related to entrapped air and two-phase flow phenomena, we neglect all data with a volumetric air content smaller than 0.1 (all values above the dashed horizontal lines) based on measured porosities from core samples.The colored solid lines show the results of the Miller and position setup of the 2-D study (Sect.3.2).The data measured before 12:50 UTC are only used for the initial state estimation (Sect.A1.6).
creases quickly during the imbibition steps as the groundwater table reaches the TDR sensor because of the narrow transition zone of sandy materials during imbibition (Dagenbach et al., 2013;Klenk et al., 2015) and the small measurement volume of the TDR sensors (Robinson et al., 2003).During the equilibration phases, for example, after the last drainage phase (19:15 UTC), the measured water content in the unsaturated material either decreases (e.g., sensor 27) or increases (e.g., sensor 2), depending on the hydraulic state at this position with respect to static hydraulic equilibrium.This effect is used in the following evaluation (Sect.3.1.3).
We attribute the spread of the water content during saturation mainly to small-scale heterogeneity and quasi-saturation due to entrapped air (Christiansen, 1944).In order to avoid effects related to entrapped air and also two-phase flow, all TDR data with an air content below 0.1 (Faybishenko, 1995) are neglected subsequently.

Structural error analysis
As outlined in Sect. 1, the structural error analysis rests on a basic representation and a general assessment of its represen-Table 2. During the experiment, ASSESS was forced with a fluctuating groundwater table.Therefore, 17.8 m 3 of water were pumped in and 14.7 m 3 were pumped out of the groundwater well.For the calculation of the according flux and equivalent height of the water column h eq , the surface area of ASSESS was approximated with 80 m 2 .All times are given in UTC.This allows to set up a number of distinct representations with increasing complexity.Using inversion to estimate optimal parameters for each of these representations facilitates analyzing (i) the resulting residuals to improve the representations and (ii) the effect of unrepresented model errors on the resulting material properties.

Phase
Preparing the tools for the method, we start this section with the Levenberg-Marquardt algorithm (Sect.2.3.1) and discuss the assessment of the representation errors (Sect.2.3.2) as well as the analysis of the resulting residuals (Sect.2.3.3)afterwards.

Levenberg-Marquardt
We employ the Levenberg-Marquardt algorithm for parameter estimation.Our implementation is based on Moré (1978), Press (2007), and Transtrum and Sethna (2012) together with some further modifications.
Assuming (i) M data points m µ (1, . .., M) measured at position x µ featuring a white Gaussian measurement error with standard deviation σ µ and (ii) a model f with P parameters p π (1, . .., P ), the χ 2 cost function is defined as This cost function assumes statistically independent residuals r µ that are normally distributed with zero mean and standard deviations σ µ (perfect model assumption).These residuals can be expanded as with the Jacobi matrix J µπ = ∂r µ /∂p π .The Jacobi matrix is assembled numerically with the finite differences method.Following Press ( 2007), the Hessian is approximated (H ≈ J J), assuming that the second term in the derivative cancels out as f (x µ , p) → m µ with an increasing number of iterations.Hence, for the Gauss-Newton algorithm, it follows that Since J J does not always have full rank, the inversion may be ill-conditioned, leading to uncontrolled large steps.One possibility to cope with this issue is to regularize J J by adding a diagonal damping matrix D D.
We follow Transtrum and Sethna (2012) and choose this damping matrix, such that the diagonal entry for p π contains the corresponding maximal diagonal entry of J J from all previous iterations if this value is larger than a predefined minimal value (1.0) which is used otherwise.The resulting damping matrix is scaled with a parameter λ which tunes both the amount of regularization and the step size of the parameter update.
Finally, the parameter update δp is calculated via where the linear problem is solved with a singular value decomposition (SVD).If the condition number of the sensitivity matrix S = J J+λ•D D is larger than a threshold (10 12 ), the linear problem is solved approximately with the conjugate gradient algorithm by choosing the maximal number of iterations smaller than the number of parameters P .The proposed parameters at iteration i are given as The convergence path of the Levenberg-Marquardt algorithm is influenced by both the size of the scaling parameter λ initial and the choice how to adapt λ after each iteration.In this work, we choose λ initial = 5.0 and apply the delayed gratification strategy proposed by Transtrum and Sethna (2012).
According to this strategy, λ is decreased by a previously chosen factor (2.0) if the parameter update is successful and increased by a larger factor (3.0) if the update is not successful.
The described gradient-based algorithm heuristically balances performance and stability.Expanding the stability measures, we introduce a damping vector d with entries ∈ (0, 1] to decrease the correction of particular parameters via where denotes the element-wise Hadamard product.Generally, the entries of the damping vector are set to 1.In order to delay the improvement for parameters which represent additional model components, we choose the according entries to be less than 1.We use this approach in particular to estimate sensor positions and Miller scaling factors along with effective soil hydraulic properties (Sect.A1.4).First, these parameters are initialized to neutral values: the modeled sensor positions are initialized to the measured sensor positions and the Miller scaling factors to 1.0.Subsequently, the damping vector for the associated parameters is set to 0.1, reducing the applied correction of these parameters to 10 % of the proposed correction by the Levenberg-Marquardt algorithm.Hence, the main focus of the algorithm is to estimate consistent effective soil hydraulic properties, whereas the sensor positions and Miller scaling factors are adjusted more gradually.

Assessment of representation errors
By applying the χ 2 cost function (Eq.7), it is implicitly assumed that the model is perfect aside from white Gaussian noise.This corresponds to complete quantitative understanding of reality and a Gaussian error model for the measurement data.Structural model-data mismatch indicates that this assumption is invalid.In our case, a Bayesian analysis of the total uncertainty space is not feasible, primarily due to a lack of models, e.g., for hysteresis.Hence, we have to ne-glect such representation errors and trust that the structural model-data mismatch will reveal any inadequacy.Table 3 gives an overview over the treatment of the representation errors considered in this work.The contribution of representation errors, which could not be quantified or excluded from the measurement data a priori, is parameterized and explicitly estimated.The remaining structural model-data mismatch or deviation from the prior for the parameters hints at representation errors which should be corrected in the subsequent iteration of the analysis.
The structural error analysis and the assessment of uncertainties result from iterative evaluations.To illustrate the method, we present an iteration where the orientation of AS-SESS was not yet compensated by rotating the geometry and the gravitation vector (Sect.A1.2).Considering the structural error analysis, we parameterized and estimated uncertain components in the representation.Hence, not only the Mualem-Brooks-Corey parameters, an offset to the Dirichlet boundary condition (Sect.A1.5) and the saturated hydraulic conductivity of the gravel layer, but also the position of the TDR sensors were estimated (Sect.A1.4).The results presented in Fig. 5 show that the estimated TDR positions display a consistent deviation from the positions, which were measured relative to the site's walls, as they compensate for the orientation of ASSESS.Thus, the position of most TDR sensors on the right is estimated to be higher and the position of most TDR sensors on the left is estimated to be lower than that of the measured ones.By estimating the TDR sensor position, we also incorporated other representation errors into the resulting parameters, such as small-scale heterogeneities and eventually a non-represented evaporation front mostly affecting the estimated position of the upper TDR sensors (3, 11, 18, and 25).Hence, this analysis (i) demonstrates the difficulty to separate representation errors and (ii) is able to identify representation errors which have to be improved subsequently.

Residual analysis
A visual analysis of the standardized residual increases the intuitive understanding of the model-data mismatch (e.g., Legates and McCabe, 1999;Ritter and Muñoz-Carpena, 2013).We analyze the standardized residual in two ways: (i) the visualization over time highlights the temporal development of the structural model-data mismatch.(ii) The visualization over theoretical quantiles corresponding to a Gaussian distribution with the standard deviation of the measurement data facilitates the comparison of the standardized residual distribution to the expected Gaussian distribution of the measurement data.Hence, if the perfect model assumption is true, the probability plot will show a straight line with slope 1.Yet, probability plots often show a characteristic S shape (e.g., Fig. 7f): the slope of less than 1 for small residuals indicates that these residuals are smaller than expected for a Gaussian distribution with the standard deviation of the measurements.The slope of greater than 1 for large residuals shows that these residuals are larger than expected for the presumed Gaussian distribution.Since in this work the theoretical quantiles are based on a Gaussian distribution, the S shape generally indicates non-Gaussian distributions.
In addition to the visual analysis of the standardized residual, statistical measures help to quantify the model-data mismatch.As a single measure might be misleading (Legates and McCabe, 1999), we calculate the root mean square error (e RMS ) and the mean absolute error (e MA ).

Setup
The setup of the parameter estimation is explained in Fig. 6.For each of the three materials, we estimate the Mualem-Brooks-Corey parameters h 0 , λ, K s , τ , and θ r (Sect.2.2.2).The saturated water content θ s is assumed to be equal to an estimate for the porosity φ based on core samples (Sect.2.2.4).In order to avoid parameter bias due to representation errors, we (i) neglect measurement values with volumetric air content smaller 0.1 (Sect.2.2.4), (ii) estimate a constant offset to the Dirichlet boundary condition (Sect.A1.5) and the saturated hydraulic conductivity of the gravel layer, and (iii) developed a method to estimate the initial water content distribution based on TDR measurement data (Sect.A1.6), because a spin-up phase would increase the computation time by up to a factor of 17.The details concerning the implementation of the TDR sensors and the small-scale heterogeneity with Miller scaling factors at the position of the TDR sensors are explained in Sect.A1.4.
In order to analyze the effect of the uncertainty of the sensor position, small-scale heterogeneity, and lateral flow on the estimated material properties along the lines presented in Sect.2.3, we implemented a 1-D and a 2-D study with four different setups.
i.In the basic setup, we estimated the hydraulic material properties, an offset to the Dirichlet boundary condition, and the saturated hydraulic conductivity of the gravel layer.
ii.With the position setup, we estimated the sensor positions in addition to the parameters in the basic setup.
iii.For the Miller setup, we estimated one Miller scaling factor for each TDR sensor in addition to the parameters in the basic setup.
iv.Finally, in the Miller and position setup, we estimated both the sensor positions and one Miller scaling factor for each TDR sensor in addition to the parameters in the basic setup.

1-D study
In order to investigate the extent to which the experiment at ASSESS can be described with a 1-D model, we set up three different cases with an increasing number of TDR sensors per material (Table 4): case I includes the measurement data of sensor 1 in material C and sensor 2 in material A, and thus comprises one sensor per material.Case II includes two sensors per material: sensors 10 and 11 in material C and sensors 12 and 13 in material B. Case III includes three sensors per material: sensors 25, 26, 27 in material A and sensors 28, 29, 30 in material B. Note (i) that the cases are located at different positions in ASSESS (Fig. 2) and (ii) that since the hydraulic potential is not measured in the domain covered with these 1-D studies, the respective inversions are only based on the TDR water content measurements.
As described above, the analysis is organized in four different setups (basic, position, Miller, and Miller and position).The basic setup is adjusted for the 1-D studies, such that not only the material functions of the materials with sensors but also the saturated conductivity of the third material (material A in case II and material C in case III) are estimated for case II and case III.cordingly.Further details concerning the implementation of the 1-D study are given in Sect.A2.1.
For each of the different setups, we ran an ensemble of 20 inversions, starting from Latin hypercube sampled initial parameter sets in order to analyze the convergence behavior.The sampling algorithm was implemented with the help of the pyDOE package (https://github.com/tisimst/pyDOE).For each setup, we only analyze the ensemble member with minimal χ 2 in the subsequent discussion (Sect.3.1).

2-D study
In this study, we expand the investigated domain to two dimensions and analyze the performance of the improved representation.To this end, we set up four different setups (basic, position, Miller, and Miller and position) as described above.Since the positions of both the tensiometer and the groundwater well are in the modeled domain, we use the hydraulic potential measurement data as well as the TDR

Results and discussion
In order to improve the quantitative understanding of the hydraulic behavior of ASSESS (Sect.2.1), we evaluate a basic representation (Sect.2.2) with a structural error analysis (Sect.2.3) that is implemented as outlined in Sect.2.4.The discussion of the results is done separately for the 1-D study (Sect.3.1) and the 2-D study (Sect.3.2).

Objectivity of the measurement data
The standardized residual for each case is presented in Fig. 7, combining the resulting data of all applied TDR sensors.Investigating them for case I, it is striking that all setups describe the data qualitatively equally well.Since the estimation of the material properties is only based on one sensor per material in this case, the parameterization offers enough freedom to describe the data.Hence, it also accommodates unrepresented model errors, such as the sensor position and small-scale heterogeneities.Therefore, additional representation and estimation of TDR sensor positions or Miller scaling factors do not lead to further improvement.The largest residuals occur during highly transient phases.Compared to the data, the simulated imbibition phase is too slow for sensor 1 and too fast for sensor 2. Also the simulated drainage phase is too slow for sensor 1 and drainage behavior of sensor 2 is consistently wrong.This structural model-data mismatch hints at unrepresented model errors due to the restriction to a 1-D domain, which is further discussed in Sect.3.1.3.Still, the residuals of all setups are smaller than 5 standard deviations, which translates to a volumetric water content of 0.035.The large residuals are not random and preferably occur in transient phases.We attribute them to missing processes in the dynamics or to biased parameters.As the curves in the probability plot are basically centered at the origin, a significant constant bias in the residuum can be excluded.The according statistical measures are given in Table 5.
The e MA of the basic setup increases in case II, because there are two sensors per material and the effective material parameterization can not completely compensate for the small-scale heterogeneity at the position of both sensors simultaneously.Consequently, representing the smallscale heterogeneity improves the description of the data.As before, the largest residuals occur during highly transient phases, especially during the drainage phase.Except for two outliers, the residuals stay smaller than 5 standard deviations here as well.Considering three sensors per material in case III, the e MA increases even further in the basic setup.Consequently, representing small-scale heterogeneities and uncertainties in the sensor position in the Miller and position setup improves the e MA by more than a factor of 2.

Separation of uncertain model components
Comparing the resulting material properties of the evaluated ensemble members for the different cases and setups (Fig. 8), we notice that the resulting soil water characteristic functions are shifted within each material.During static phases, and if only few measurement sensors are available, the parameters for the estimated uncertain model components (Sect.2.4) can be correlated.However, during transient phases and if a larger number of measurement sensors are available, the distinct properties of these uncertain model components are more clearly pronounced (Fig. 9 and Sect.3.2.3).
We also ran the inversions without estimating the offset to the Dirichlet boundary condition (Sect.A1.5), which are not shown here.Besides destabilizing the convergence of the Levenberg-Marquardt algorithm, this fully transfers the uncertainty in the boundary condition to the sensor position.Hence, setups that estimate the sensor position clearly outperform the others.Additionally, this does not remove the shift of the soil water characteristics.

Lateral flow
The three cases cover the three materials at different locations in ASSESS and are based on distinct data with respect to both quantity and data range.This is most evident for material A which is located at the bottom of ASSESS and nearly saturated in case I, whereas it is at the top and rather dry in case III (colored dots in Fig. 2).To illustrate that this leads to a different sensitivity on the unrepresented model errors, we highlight one example which is most pronounced during the final equilibration phase.In case III, the water content at the position of TDR sensors 25, 26, and 27 is higher than that in static hydraulic equilibrium, leading to a drainage flux and a decrease in water content (Fig. 4).However, in case I, at the position of TDR sensor 2, the water content increases as the sensor monitors the relaxation of the capillary fringe.Due to the different hydraulic properties of the materials in ASSESS, this relaxation also includes unrepresented lateral flow.
In order to minimize the structural model-data mismatch during this equilibration phase, the parameter estimation algorithm increases the hydraulic conductivity to compensate for the non-represented lateral flow with additional vertical flow from above the sensor.Hence, the hydraulic conductivity of case I is larger than the hydraulic conductivity for both case III and the 2-D study, which is discussed in Sect.3.2.4.
The measurement data of material B used in the inversions of case II and case III do not emphasize the relaxation of the capillary fringe strongly.Hence, we expect that the effect of the unrepresented lateral flow is not as significant as for material A, leading to relatively congruent resulting material functions.This expectation is confirmed by the results, except for those setups of case II, in which no Miller scaling factor was estimated.These setups show a larger curvature of the soil water characteristic and of the hydraulic conductivity function which is explained in Sect.3.2.4 in more detail.Additionally, we can identify the previously discussed shift of the soil water characteristic (Sect.3.1.2).
Similarly as for material B, the inversions for material C are not strongly influenced by the relaxation of capillary fringe.The large uncertainty in the saturated hydraulic conductivity reflects the low sensitivity of the measurement data on this parameter due to the lack of measurements influenced by the saturated material C.

Quality of the initial state material functions
The curvature of the soil water characteristic for the inversion results is reasonably close the initial state material functions (Sect.A1.6), although the initial parameter sets for the 1-D inversions were obtained with Latin hypercube sampling.This allows to use the initial state material functions to initialize gradient-based inversion methods.The estimate of the initial state material function for material C deviates strongest from the inversion results compared to the other two materials, since in material C only few sensors are available to assess the form of the capillary fringe.Naturally, the better the available number of TDR sensors is spread over the water content range, the better the fit of the initial state parameters gets.Iteratively restarting the inversion using the previous inversion results as initial state material functions is likely to improve the representation.Since K s and τ are not estimated along with the initial water content distribution but prescribed a priori, the hydraulic conductivity functions associated with the initial state show large deviations from the inversion results.

Objectivity of the measurement data
For the 2-D study, the number of sensors is comparable to the number of hydraulic material parameters.Therefore, estimating sensor positions and Miller scaling factors increases the total number of parameters and thus the computational cost considerably (basic: 17,position: 41,Miller: 41,Miller and position: 65).The total number of analyzed TDR sensors increased to 25, corresponding to 5, 12, and 8 TDR sensors for the materials A, B, and C, respectively.In the 1-D study, the residuals increased considerably during transient phases reaching up to 5 standard deviations in the Miller and position setup (except for three outliers).Due to the larger number of considered TDR sensors in the 2-D study, the measurement data cover more architectural situations and thus more complicated flow phenomena.In particular, there are more transient phases observed than in the 1-D studies.Therefore, we expect that (i) the resulting parameters are more objective (not shown, however), (ii) the standardized residuals at least in the basic setup increase, and (iii) estimating sensor positions and Miller scaling factors improves the description of the TDR data significantly.The standardized residuals confirm the last two expectations (Fig. 10).However, similar to the 1-D study, even the residuals of the Miller and position setup still reach more than 5 standard deviations for the 2-D representation.
In order to understand this deviation in more detail, we investigate the remaining structural model-data mismatch during the final drainage and equilibration phases between 30 and 40 h.The largest residuals occurring during the drainage phase around 30 h come from TDR sensors 6, 9, 13, and 17.We identified that these sensors are located close to a compaction interface (Sect.A1.6).Hence, the large residuals indicate that this horizontal compaction layer is not correctly represented with a point-scale representation of the smallscale heterogeneity.
The largest residuals during the final equilibration phase between 30-40 h come from TDR sensors 2 and 22 close to the capillary fringe.We attribute them to unrepresented processes in the dynamics, such as hysteresis or 3-D flow (Sect.3.2.2).
Due to the persisting large residuals during transient phases, the probability plot (Fig. 10b) displays a characteristic S-shaped curve for the TDR data (Sect.2.3.3).The large residuals during transient phases are evidently different from the small residuals during static phases.This is corroborated by a linear fit based on the residuals within [−2, 2] theoretical quantiles.For both the Miller and the Miller and position setups, the fits yield a slope of less than 1, indicating that distribution of the small residuals is more narrow than a Gaussian with a standard deviation of 0.007.This standard deviation is a measure that includes both precision and accuracy.We calculated the precision of the evaluated measurement data with a cubic spline fit yielding a precision of Water content [-] Material properties (h 0 , λ) TDR sensor position Small-scale heterogeneity Groundwater table Figure 9.The estimation of uncertain model components can lead to correlated estimated parameters, e.g., as an incorrect position of the groundwater table (z 0 ) can be compensated by changing h 0 and λ during static phases.During transient phases, however, the components have distinct effects, e.g., as λ also changes the conductivity function.Hence, the ability of the parameter estimation algorithm to separate these uncertainties depends on the available measurement data.Also, the more sensors are available, the fewer uncertain model components can be compensated simultaneously by the parameterization.
0.001, 0.007, and 0.006 m for the water content, tensiometer, and manual groundwater position data, respectively.With complete quantitative understanding (Sect.2.3), the standard deviation of the residuals would correspond to this precision.Lacking ground truth, the accuracy of the measurement data is unknown a priori and may depend on the hydraulic state.
In this study, its estimated contribution dominates the size of the standard deviations.Our results show that the model can represent static phases better than highly transient phases and that the accuracy of the measurement data is higher than that estimated a priori.The statistical measures for the water content data given in Table 6 reveal that the e MA of the basic setup merely increases by less than a factor of 2 compared to case III of the 1-D study.Estimating sensor positions and Miller scaling factors improves the description of the TDR measurement data by more than a factor of 2 leading to a e MA of 0.004.

Hydraulic potential
The description of the hydraulic potential data only improves in those setups in which the sensor position is estimated (Fig. 10 and Table 6).Also, the temporal structure of the model-mismatch does not change significantly with the different setups.The data show a gradient of the hydraulic pressure between the tensiometer and the groundwater well during the forcing phases (Fig. 3).Considering symmetry, we also assume this gradient of the hydraulic potential in the neglected third dimension.Hence, the forcing via the groundwater well leads to a 3-D water flux during the experiment.This makes a correct representation of the groundwater table impossible in 2-D.Consequently, the simulation should Therefore, the simulated hydraulic pressure during the imbibition is larger than the measured one which leads to negative residuals.As expected, this behavior reverses during drainage phases.

Separation of uncertain model components
The 2-D study is based on a larger number of water content measurements, additional hydraulic potential measurements, and more complicated flow phenomena compared to the previously discussed 1-D study (Sect.3.1).This improves the ability of the Levenberg-Marquardt algorithm to separate uncertain model components (Sect.3.1.2)and decreases the shift in the soil water characteristics of the different setups compared to the 1-D study (Fig. 11).

Effect of unrepresented model errors
Each setup starts from the same initial material functions (Sect.A1.6).Therefore, the difference between the resulting material properties of the setups (Fig. 11) is a direct consequence of the representation of uncertainties in the sensor position and small-scale heterogeneities.
To investigate this, consider the initial state estimation for material B shown in Fig. A2.The measurement data of sensors 5, 12, and 29, which are approximately 0.6 m above groundwater table, deviate from the estimated function considerably.In order to cope with this deviation, the leastsquares fit for the initial state draws the estimated soil water characteristic to higher water contents.Due to the rigidity of the Brooks-Corey parameterization, this causes an overestimation of the water content at the positions of the sensors S. Jaumann and K. Roth: Effect of unrepresented model errors  basic, position, Miller, and Miller and position).Same as for the 1-D study, the standard deviation for the TDR measurement data is chosen as 0.007.We choose the standard deviation for both the manual measurements in the groundwater well and the tensiometer measurement data as 0.025 m.The representation of uncertainties with respect to the sensor positions and small-scale heterogeneities improves the description of the TDR data quantitatively.The decreasing slope of a linear fit (thin lines in the probability plots), which is based on the standardized residuals within [−2, 2] theoretical quantiles, also indicates this improvement.The structural model-data mismatch for the hydraulic potential data is mainly due to (i) uncertainties concerning the position of the tensiometer and (ii) unrepresented 3-D flow phenomena.
Table 7.We present the effective hydraulic material parameters obtained with the Miller and position setup of the 2-D study.The formal standard deviations of the parameter estimation are given with the understanding that these are specific to the applied algorithm and will change for different algorithm parameters.The estimations for the saturated hydraulic conductivity of the gravel layer and for the offset to the Dirichlet boundary condition are 10 −0.728 ± 0.006 m s −1 and −0.034 ± 0.001 m, respectively.18).If the uncertainty in sensor position and small-scale heterogeneities are represented in the model, the outlying measurement data can be described without altering the effective material properties.
It is worth noting that although the uncertainty of the measured grain size distribution (Table 1) is large, the resulting material properties confirm the measurements, in that material A is the finest and the properties of materials B and C are similar.Our final best estimates for the effective hydraulic material properties for the Miller and position setup are given in Table 7.  (basic, position, Miller, and Miller and position).The plot range is adjusted to the available water content range for each material.The height of the histogram bars denotes the number of available water content measurements and is normalized over all figures in this work.Since the inversions for all setups are initialized with the material functions resulting from the initial state estimation (Sect.A1.6), the difference between the results is directly linked to the estimation of sensor positions and small-scale heterogeneities.For direct comparison, the results of the 1-D study are also shown.

Summary and conclusions
We applied a structural error analysis on a representation of the effectively 2-D architecture ASSESS.This representation includes TDR and hydraulic potential measurement data which were acquired during a fluctuating groundwater table experiment.Based on the assumption that structural modeldata mismatch indicates incomplete quantitative understanding of reality, we implemented a 1-D and a 2-D study organized in different setups with increasingly complex models.We started with the estimation of effective hydraulic material properties and we added the estimation of sensor positions, small-scale heterogeneity, or both.It was demonstrated that the structural error analysis can indicate significant unrepresented model errors, such as the slope of the ASSESS test site.
We showed that estimated material properties resulting from a 1-D study are biased due to unrepresented lateral flow.Analyzing representations with increasing data quantity, it was also found that the fewer sensors are available per material, the stronger is the influence of the unrepresented model errors on the estimated material properties.We illustrated that the more complicated flow phenomena are represented, the better uncertain model components can be separated by the parameter estimation algorithm leading to more reliable material properties.Generally, representing sensor position uncertainty and small-scale heterogeneity improved the description of the water content data quantitatively in setups with many sensors.Yet, the residuals of the water content data still reach more than 5 standard deviations during transient phases (Fig. 10).We attribute this to remaining representation errors in the dynamics, forcing, and compaction interfaces.
In order to minimize the error in the initial state, we developed a method to estimate the initial water content distribution based on TDR measurements and an approximation of hydraulic head which additionally yields an approximation of the soil water characteristic.We found that this approximation is reasonably close to inversion results and that the according parameters can be used as initial parameters for gradient-based optimization.Since all the inversions of the 2-D study are initialized with these parameters, the comparison of the results directly displays the quantitative effect of the according unrepresented model errors on the estimated material properties.
Since the three approaches ((i) initial state estimation, (ii) 1-D inversion, and (iii) 2-D inversion) allow to estimate effective hydraulic material parameters, we finally discuss their levels of improving the quantitative understanding of soil water dynamics.
The initial state estimation requires at least three water content measurements per material over the full water content range and the position of the groundwater table to estimate the parameters for soil water characteristic for one specific equilibrated hydraulic state.Lacking direct measurements of the unsaturated hydraulic conductivity, the method cannot estimate the remaining parameters K s and τ required to model soil water dynamics.Additionally, it is highly susceptible to uncertainties related to the sensor position and small-scale heterogeneities.Yet, the method is fast (a few seconds on a local machine) and suitable for providing initial parameters for gradient-based inversion methods.
The 1-D inversions are comparably fast (several minutes up to several hours on a local machine) and can represent transient states.In contrast to the initial state estimation, 1-D inversions can estimate all parameters of the material functions.However, more complicated flow phenomena including lateral flow can not be represented.This leads to biased parameters.
The unique characteristics of the 2-D inversions (days on a cluster with same number of cores as parameters) is the ability to represent lateral flow phenomena which are typically monitored with a high number of sensors.Hence, the consistency of the representation is implicitly checked.Therefore, we expect that of the three approaches discussed, this one yields the most reliable material properties.Still, unrepresented model errors including 3-D flow phenomena influence the results.sured water content.For each material, we then fit the parameters h 0 , λ, and θ r of the Brooks-Corey parameterization to the approximated matric potential and the measured water content (Fig. A2).The saturated water content θ s is assumed to be known from core samples.This yields an approximation for the initial water content distribution between the TDR sensors.With the resulting parameter values for each material, the subsurface material distribution, and the position of the groundwater table, we can calculate an estimation of the initial water content distribution in ASSESS (Fig. A3).
As the parameters for the Brooks-Corey parameterization are derived from static measurement data, we may use them as initial parameter values for computationally expensive gradient-based inversions of dynamic measurement data (Sect.2.4.2).The missing initial parameter values τ = 0.5 and K s = 8.3 • 10 m s −1 are taken from Carsel and Parrish (1988).We refer to these parameter sets as initial state material functions in this work.
In particular due to (i) a limited number of TDR sensors, (ii) missing hydraulic potential measurements at the position of the TDR sensors, and (iii) spatial small-scale heterogeneity present in the materials, structural deviations between the estimation and the measurements occur, which indicate limitations of describing ASSESS with effective soil hydraulic material properties.
The water content measured by TDR sensors 5, 12, and 29 deviate structurally from the estimation of the initial state for material B (Fig. A2).Klenk et al. (2015, Figs. 1b and 6) presented GPR measurements which indicate that at least TDR sensors 6, 9, 13, 17, and 22 are closely below a compaction interface.However, the position of this interface was not measured during the construction process.Thus, these TDR sensors are monitoring a compacted pore structure.In contrast, TDR sensors 5, 12, and 29 are situated in rather undisturbed areas.Hence, as most of the TDR sensors are influenced by the compaction interfaces, the analysis of this measurement data is likely to underestimate the effective wa-ter content leading to a biased soil water characteristic for material B. This is a typical situation encountered with pointlike sensors in heterogeneous media.

A2.1 1-D study
The forward simulations were calculated with a grid resolution of 0.005 m and 10 −8 as a limit of the Newton solver (Sect.A1.1).Following Jaumann (2012), the standard deviation of the TDR measurements is assumed as 0.007.We use the manually measured groundwater table data as a Dirichlet boundary condition.Uncertainties concerning the position of the sensors and the subsurface material interfaces directly translate to uncertainties in the boundary condition (Sect.A1.5).Accounting for the orientation of AS-SESS (Sect.A1.2), we add a constant offset to the Dirichlet boundary condition for each case (case I: −0.02 m, case II: −0.05 m, case III: −0.12 m).In order to minimize the input error, we also estimate this offset in the inversion.If TDR sensor positions are estimated, these are initialized to the measured position.Similarly, the Miller scaling factors are initialized to 1.0.

A2.2 2-D study
The 2-D simulations in this work are calculated with a grid resolution of 0.2 m × 0.02 m.The limit of the Newton solver is set to 10 −8 (Sect.A1.1).Like for the 1-D studies, we choose 0.007 as the standard deviation of the TDR measurements.The standard deviation of the tensiometer (0.025 m) is assessed from the accuracy (±5 hPa) as specified by the manufacturer.In order to transfer the given uniform distribution with range ±5 hPa ≈ ±0.05 m to a Gaussian distribution, we associate this range with the 2σ interval of a Gaussian (5 to 95 %).This leads to an approximate standard deviation of (0.05 m • 2)/4 = 0.025 m.Lacking an independent estimate for the accuracy of the manual groundwater table position measurement, we employ the accuracy of material interfaces in ASSESS (Sect.A1.5).Same as for the tensiometer, this leads to a standard deviation of 0.025 m.Some TDR sensors are located close to or even below the groundwater table.Therefore, the position and the Miller scaling factor could not be estimated for TDR sensors.Hence, no position was estimated for sensors 7, 8, 14, 15, 16, 23, 24, 31, and 32 and no Miller scaling factor was estimated for sensors 8, 14, 15, 16, 17, 23, 24, 31, and 32.

Figure 1 .
Figure 1.View of ASSESS site with tensiometer access tube, weather station, and groundwater well along the left boundary.The jump in color reveals different sands that crop out at the surface (figure adapted from Jaumann, 2012).

Figure 3 .
Figure3.The position of the groundwater table was measured manually in the groundwater well and automatically with the tensiometer (Fig.2) during three different phases (initial drainage, multistep imbibition, and multistep drainage -separated by the vertical black lines in the figure) of the experiment.Note that the discrete measurement steps reflect the resolution of the tensiometer.

Figure 4 .
Figure 4.The measured water content data for the three different phases (initial drainage, multistep imbibition, and multistep drainageseparated by the solid vertical black lines in the figure)show a high variability up to and beyond the validity limits of the Richards equation due to the fluctuating groundwater table (Fig.3).Hence, in order to avoid effects related to entrapped air and two-phase flow phenomena, we neglect all data with a volumetric air content smaller than 0.1 (all values above the dashed horizontal lines) based on measured porosities from core samples.The colored solid lines show the results of the Miller and position setup of the 2-D study (Sect.3.2).The data measured before 12:50 UTC are only used for the initial state estimation (Sect.A1.6).
Some of those representation errors are parameterized and included in the parameter estimation process.

Figure 5 .
Figure5.The subsurface architecture of ASSESS (Fig.2) is shown with a comparison of measured and estimated TDR sensor positions based on a first evaluation of the hydraulic measurement data.The consistent deviation of the estimated TDR sensor positions reveals an unrepresented model error: the orientation of ASSESS (Sect.A1.2).

Figure 7 .
Figure7.For the 1-D study, the standardized residuals of the best ensemble member are visualized over time (a, c, e) and over the theoretical quantiles of a Gaussian with the estimated standard deviation of the TDR measurements (0.007; b, d, f).The cases are analyzed with four setups: basic, position, Miller, and Miller and position.The more sensors per material are used in the inversion, the worse the representation of the basic setup gets.In this case, representing uncertainties with respect to the sensor position and small-scale heterogeneities improves the representation substantially.The decreasing slope of a linear fit (thin lines in the probability plots), which is based on the standardized residuals within [−2, 2] theoretical quantiles, also indicates this improvement.

Figure 8 .
Figure8.The estimated material functions of the best ensemble member are shown for each of the three cases (case I, case II, and case III) and the four setups of the 1-D study.Additionally, we present the material functions resulting from the initial state estimation (Sect.A1.6).The plot range is adjusted to the available water content range for all inversion results.The number of water content measurements within intervals of 0.05 is indicated with histogram bars for each case.The height of these bars is normalized over all figures in this work.The main message of this figure is that unrepresented model errors may lead to biased hydraulic parameters.In particular, this can be seen by comparing the hydraulic conductivity K of material A for cases I and III.

Figure 10 .
Figure10.The standardized residuals of the 2-D study are visualized over time (a, c, e) and in a probability plot (b, d, f) for all TDR and hydraulic potential sensors.The color associates the results with the four setups of the study(basic, position, Miller, and Miller and position).Same as for the 1-D study, the standard deviation for the TDR measurement data is chosen as 0.007.We choose the standard deviation for both the manual measurements in the groundwater well and the tensiometer measurement data as 0.025 m.The representation of uncertainties with respect to the sensor positions and small-scale heterogeneities improves the description of the TDR data quantitatively.The decreasing slope of a linear fit (thin lines in the probability plots), which is based on the standardized residuals within [−2, 2] theoretical quantiles, also indicates this improvement.The structural model-data mismatch for the hydraulic potential data is mainly due to (i) uncertainties concerning the position of the tensiometer and (ii) unrepresented 3-D flow phenomena.

Figure 11 .
Figure11.We show the resulting material functions for all three materials involved in the 2-D study which is analyzed with the four setups(basic, position, Miller, and Miller and position).The plot range is adjusted to the available water content range for each material.The height of the histogram bars denotes the number of available water content measurements and is normalized over all figures in this work.Since the inversions for all setups are initialized with the material functions resulting from the initial state estimation (Sect.A1.6), the difference between the results is directly linked to the estimation of sensor positions and small-scale heterogeneities.For direct comparison, the results of the 1-D study are also shown.

Figure A1 .
Figure A1.The evaluation of a TDR trace is based on the detection of the inflection points caused by the probe head and the end of the rod.This is done automatically after calculating of the first temporal derivative of the trace.Parabolas are fitted to the maxima of the temporal derivative to increase the precision of the evaluated signal travel time.

Figure A2 .Figure A3 .
Figure A2.We use the Brooks-Corey parameterization to estimate the initial water content distribution between the TDR sensors.Assuming hydraulic equilibrium, we approximate the matric potential h m with the negative distance to the groundwater table position z 0 : h m ≈ −(z − z 0 ).For each material, we then use the approximated matric potential at the position of the TDR sensors and the corresponding water content measurement data to fit the Brooks-Corey parameters.Each dot depicts the mean of 15 subsequent data points measured in the 4 h preceding the experiment.The according standard deviations are all smaller than 0.002, which indicates (i) that the hydraulic system is relatively equilibrated at the beginning of the experiment and (ii) that the deviations from the estimation are statistically significant.

Table 3 .
This overview includes specification whether the considered model error is represented and explicitly estimated within the scope of this study.
The available hydraulic potential h wt is measured at the bottom of the groundwater well x λ and at the position of the tensiometer x τ .The data set, which is measured in the groundwater well, is split according to the measurement times: the data measured during the equilibration phases t enter the Levenberg-Marquardt algorithm (Sect.2.3.1)directly, whereas the data measured during the forcing phases t ϕ are only used as a boundary condition for the Richards equation (Sect.2.2.1).The bulk relative permittivity ε b (x µ , t ν ) and the bulk soil temperature T b (x µ , t ν ) are measured at the position of the TDR sensors x µ at times t The other setups remain ac-Hydrol.Earth Syst.Sci., 21, 4301-4322, 2017www.hydrol-earth-syst-sci.net/21/4301/2017/ ν .Additionally using the porosity φ(x µ ), the bulk permittivity is transferred to water content (Sect.2.2.4).The water content data enter the initial state estimation (Sect.A1.6) yielding an initial water content distribution and optional initial parameter values for the Levenberg-Marquardt algorithm.The water content data are also directly used in the Levenberg-Marquardt algorithm.Dashed grey arrows represent one-time preparation steps, whereas solid orange arrows represent the iterative steps of the Levenberg-Marquardt algorithm yielding the final material parameters p final .

Table 4 .
The 1-D study comprises three different cases which investigate the three materials with an increasing number of TDR sensors per material at different locations in ASSESS (Fig.2).Note that each material is covered twice.Thus, the position setup is adjusted such that both the positions of TDR sensors and the tensiometer are estimated.All inversions for the 2-D study are initialized with the initial state material functions (Sect.A1.6) in order to bring out the quantitative effect of the different representations on the resulting material properties.Further details concerning the implementation of the 2-D study are given in Sect.A2.2.

Table 5 .
In order to analyze the results of the 1-D study, the performance of the best ensemble members for each case and for each setup are benchmarked with statistical measures.With increasing numbers of included TDR sensors per material, the statistical measures for the basic setup indicate a worse description of the measurement data.However, estimating the position and the Miller scaling factor for each TDR sensor improves the description of the measurement data significantly according to the statistical measures.

Table 6 .
For each setup of the 2-D study, the results are benchmarked with statistical measures.Similar to the 1-D study, estimating the sensor position and the Miller scaling factors improves the statistical measures related to the water content significantly.The statistical measures for the position of the groundwater table including both the tensiometer and the groundwater well data improve only for setups in which the sensor positions are estimated.position of the groundwater table in the well during imbibition phases and a lower groundwater table during the drainage phases.This expectation is confirmed by the standardized residuals shown in Fig.10.Thus, the structural model-data mismatch of the tensiometer data indicates that employing the groundwater table as a Dirichlet boundary condition overestimates the forcing in the simulation.