Joint inference of groundwater-recharge and hydraulic-conductivity fields from head data using the Ensemble-Kalman filter

Introduction Conclusions References


Introduction
Regional groundwater flow depends on spatially variable properties of the subsurface, notably the hydraulic conductivity field, and boundary conditions such as groundwater recharge.In practical groundwater-modeling applications, parameters of both aquifer properties and boundary conditions are estimated from measurements of hydraulic heads at a limited number of observation locations (e.g., Hill and Tiedeman, 2007).While many theoretical studies on parameter estimation in aquifers have concentrated on the assessment of the spatially variable hydraulic-conductivity field, also groundwater recharge is known to be highly variable in both time and space (e.g., de Vries and Simmers, 2002).
Among the different techniques of estimating recharge reviewed by Scanlon et al. (2002), we consider here numerical approaches in which measured time series of hydraulic head are used to estimate groundwater recharge.The key question to be addressed in the present study is under which conditions it is possible to infer both the recharge field (a space-time function) and the spatial distribution of hydraulic conductivity from the same data set of hydraulic-head measurements.
In engineering practice, the model domain is typically subdivided into a small number of zones with given geometry, and uniform values of the material properties are assigned to each zone.Likewise, the land surface is subdivided into zones with uniform recharge values, reflecting land use, soil types, and local climate variability.As an alternative, parameter values may be estimated at a limited number of points and interpolated in between (e.g., Doherty, 2003).By construction, these approaches can only determine spatial structures of the parameter fields meeting the prescribed shapes.A particular difficulty of this approach is that the variability within the given zones may be bigger than between the zones, while the internal variability is completely neglected in the parameter estimation.
The estimation of hydraulic conductivity as a continuous field has been intensively investigated in the past (see the reviews of Sanchez-Vila et al., 2006;Vrugt et al., 2008;and recently Zhou et al., 2014).In these approaches discretization of the domain leads to a formal number of parameters to be estimated that is identical to the number of cells or grid points.Typical 2-D applications result in O (10 4 ) parameters, whereas 3-D numerical domains may easily be made of O (10 6 ) cells.As the number of measurement points is by orders of magnitude smaller, this inverse problem is inherently ill-posed without additional constraints.Some authors Published by Copernicus Publications on behalf of the European Geosciences Union.D. Erdal and O. A. Cirpka: Joint inference of recharge and conductivity therefore rely on flexible sets of shapes, such as polynomial trends or Voronoi polygons (e.g., Tsai et al., 2003a, b) rather than estimating O (10 4 -10 6 ) parameter values.In standard geophysical inversion, Tikhonov regularization is the common approach to estimate distributed parameter fields from a limited set of measurements.Here, the parameters are assumed to be continuous spatial functions, but large gradients, curvatures, or deviations from prior values are penalized (applications to subsurface hydrology are given by, amongst others, Doherty and Johnston, 2003;Tonkin and Doherty, 2005;Doherty and Skahill, 2006).In subsurface hydrology, however, the geostatistical framework is more common.Kitanidis (1997) and independently Maurer et al. (1998) showed that the two approaches are mathematically equivalent to each other.
In geostatistical inversion, the parameter field to be estimated is assumed to be an autocorrelated random space function.This prior knowledge is used in Bayesian inference, where the statistical distribution of the parameters is conditioned on the measurements of dependent quantities, such as hydraulic heads.A variety of schemes target a single smooth spatial distribution approximating the conditional mean of the parameter field using Gauss-Newton-or conjugate-gradient-type estimation schemes (e.g., Yeh and Yoon, 1981;Kitanidis and Lane, 1985;Zou et al., 1993;Li and Elsworth, 1995;Kitanidis, 1995;Yeh et al., 1996;Aschenbrenner and Ostin, 1995;McLaughlin and Townley, 1996;Spedicato and Huang, 1997;Loke and Dahlin, 2002;Dai and Samper, 2004;Dai et al., 2010).These methods can be extended to the generation of multiple conditional realizations by the method of smallest modification (e.g., RamaRao et al., 1995;Gómez-Hernández et al., 1997).However, the computational costs to obtain a single conditional realization is identical to those of the smooth best estimate.Also, the Gauss-Newton method requires the evaluation of the sensitivity of each measurement with respect to all parameter values, involving the solution to as many adjoint problems as there are measurements, which may become unbearable in the case of many measurements, such as those obtained from transient processes.In the context of the present study it may be noteworthy that many geostatistical approaches have focused on the exclusive estimation of hydraulic conductivity; some include storativity (e.g., Gómez-Hernández et al., 1997;Kuhlman et al., 2008;Li et al., 2007), but most assume that the boundary conditions are deterministic.An exception is the study of Hendricks Franssen et al. (2004), who used the geostatistical approach of sequential self-calibration to jointly estimate the fields of hydraulic conductivity and groundwater recharge from head measurements.The authors considered the problem of a well-capture zone, in which they estimated hydraulic conductivity as a continuously varying spatial field, whereas recharge was parameterized by zones with uniform values.
In groundwater hydrology, sequential data assimilation and Kalman filter methods have long been used (e.g., Fer-raresi et al., 1996;Eppstein and Dougherty, 1996;Hantush and Mariño, 1997).Particularly, and increasingly, popular is the ensemble Kalman filter (EnKF) (Evensen, 1994) or versions thereof.Although the EnKF was primarily constructed to update model-state variables, in subsurface hydrology it is commonly used to estimate hydraulic conductivity.For this purpose Hendricks Franssen and Kinzelbach (2008), Drécourt et al. (2006), Tong et al. (2010Tong et al. ( , 2011)), Xu et al. (2013a, b), and Panzeri et al. (2015), among others, showed that the use of head observations in an EnKF framework can help improve the conductivity estimates, while Crestani et al. (2013) and Tong et al. (2013), among others, considered tracer tests for the same purpose.Most parameter estimations used 2-D models, as these are conceptually simpler, faster, and easier to constrain and display.However, EnKF has also successfully been applied to infer 3-D hydraulic-conductivity fields (e.g., Camporese et al., 2011;Schöniger et al., 2012).
An important step in setting up an EnKF to estimate parameters is the choice of initial ensemble.This choice is the most straightforward way of allowing prior informationsuch as ideas about correlation lengths, mean values, or spatial pattern -to influence the filter process.From a technical point of view, the issue of initial sampling is how to represent the prior knowledge in an ensemble that is as small as possible, by, for example, adding ensemble subspace restriction and requirements on the sampling (e.g., Evensen, 2004;Oliver and Chen, 2008).From a practical point of view, especially in subsurface modeling, the issue is that our prior knowledge of the parameters, their mean values, deterministic trends, and spatial correlation structure is often limited.This may be seen as a more severe problem than choosing a sufficiently large ensemble size to actually capture the assumed variability by the ensemble.To overcome the limited knowledge about true parameters values, the use of synthetic test cases for methods testing and evaluation is very common in subsurface hydrology (e.g., Schlüter et al., 2012;Schelle et al., 2013).Here, the prior knowledge is only limited to what the modeler considers a reasonable assumption, and it is not uncommon in the groundwater-EnKF context that the synthetic true parameter field is a single realization generated the same way as the initial ensemble (e.g., Huang et al., 2008;Tong et al., 2011Tong et al., , 2013;;Vogt et al., 2012;Panzeri et al., 2014;Zhou et al., 2014).Hence, perfect knowledge about the statistics of the estimated parameters is implicitly assumed, which is a highly unrealistic assumption.The impact of the prior assumptions in groundwater modeling was considered, for example, by Li et al. (2012), who concluded that it was possible to estimate reasonable log-conductivity fields using the EnKF despite wrong priors, although the result was worse than when using correct information, and by Camporese et al. (2011Camporese et al. ( , 2015)), who showed that it is possible to use the EnKF to correct a biased prior mean and partly correct a wrong prior variance, but not erroneous prior correlation lengths.
In this work we study the impact of the prior knowledge when jointly estimating conductivity and recharge from hydraulic-head data only.We use an EnKF setup in which the initial ensemble is drawn using different assumptions of the spatial pattern of the parameters.Section 2 discusses why the conductivity and the recharge are so difficult to estimate jointly if only pressure-head data are available.Section 3 explains the ensemble Kalman filter and the synthetic example used throughout this paper, while results and discussions are found in Sect. 4. We end with conclusions in Sect. 5.

Theory
In regional-scale groundwater-flow problems, we typically rely on the validity of the Dupuit assumption, stating that variations in hydraulic head and groundwater velocity are restricted to the horizontal directions.Under this condition, the depth-averaged, two-dimensional groundwater-flow equation for a phreatic aquifer reads as subject to initial and lateral boundary conditions.S y (x) [-] is the specific-yield field, which is the drainage-effective porosity of the formation, K(x) [L T −1 ] denotes the depthaveraged hydraulic-conductivity field, R(x, t) [L T −1 ] is the field of groundwater recharge, z 0 (x) [L] denotes the geodetic height of the aquifer bottom, h(x, t) [L] is the hydraulic-head field to be simulated, t [T] is time, and x [L] is the vector of horizontal spatial coordinates.The term K(h − z 0 ) may be interpreted as a transmissivity field T (x, t) [L 2 T −1 ], varying in space and time.We now consider a confined surrogate aquifer with an assumed transmissivity field T ass (x) [L 2 T −1 ] that differs from the true one (e.g., an incorrectly estimated transmissivity field).The logarithm of the scaling factor between the two transmissivities is denoted f (x, t) [-] Substituting Eq. (2) into Eq.(1) yields Applying the chain rule of differentiation to the divergence in Eq. (3), applying the product rule of differentiation to ∇ exp(f ), and dividing by exp(f ) results in subject to the same initial and lateral boundary conditions as above.In Eq. ( 5), S app (x, t) [-] and R app (x, t) [L T −1 ] are apparent specific-yield and groundwater-recharge fields.Equation (5) results in exactly the same hydraulic-head distribution as the original groundwater-flow Eq. ( 1), even though the transmissivity field is different.Note that exp(−f ) is positive, so the apparent specific yield S app remains positive, whereas no sign restrictions apply to ∇f • ∇h, resulting in both positive and negative R app values.In the case of a phreatic aquifer, the true transmissivity varies with hydraulic head, so the apparent parameters change with time.If the water-filled thickness of the true aquifer does not change with time, which is the case for confined aquifers, the apparent fields are time-invariant.The derivation given above exemplifies that the same hydraulic-head field can be obtained with different hydraulicconductivity fields by modifying recharge and, in the case of transient flow, the specific yield.It is noteworthy that the apparent recharge depends on the gradient of the original transmissivity field.Hence, a large -positive or negative -apparent recharge is expected at locations where the transmissivity changes drastically.Though we have shown that modifications of recharge and specific yield can always replace the conductivity, the opposite case is not guaranteed, because the conductivity has clear physical limitations: notably it cannot be negative.
The fact that conductivity variation can be exchanged by recharge and specific-yield variations renders the joint estimation of hydraulic conductivity, recharge (and specific yield) an inherently ill-posed problem even when the hydraulic-head field is known at every point in the domain (and every time point).
We may illustrate the problem by the example of an unconfined aquifer at steady state, shown in Fig. 1.The original simulation (left column in Fig. 1) exhibits a square-shaped inclusion of low permeability in an otherwise uniform highpermeability field (first row; 2 orders of magnitude difference in K) and a constant low recharge rate (second row).As boundaries, we employ a significant head drop from west (50 m) to east (8 m) and no flow boundaries on the north and south sides.The resulting head field is shown in the third row of Fig. 1, and the corresponding field of Darcy velocities in the fourth row of Fig. 1.
If the inclusion is removed, and the recharge remains the same, the system shows a perfectly homogeneous behavior (middle column of Fig. 1).The third column in Fig. 1, on the other hand, shows exactly the same hydraulic-head field as the original simulation, but the permeability field is uniform, whereas the recharge field shows strong fluctuation.From Fig. 1 we can note that, in accordance with Eq. ( 4), the strong positive and negative recharge rates are introduced at the interface of the removed inclusion.Also, while the head fields of the original and surrogate models are identical, the velocity fields are quite different, because the conductivities are different.The latter implies that transport would be strongly different between the two cases.It becomes also clear that, without additional constraints, a unique joint estimation of both recharge and conductivity fields is strictly impossible.
In classical model calibration, the ambiguity between transmissivity and groundwater recharge may cause prob- lems of ill posedness, but assuming known zones with blockwise uniform parameter values restricts the solution of the inverse problem.As example, the strong positive and negative recharge values of the surrogate model in Fig. 1 would most likely not be obtained in standard model calibration because the recharge zones would hardly be chosen as embedded rectangular frames.In shape-free inversion, using either Tikhonov regularization or geostatistical methods, by contrast, the solution space is much less restricted, and chances that unresolved transmissivity variations are traded for recharge fluctuations are in principle fairly high.The question thus arises under which conditions the estimated fields are reasonable despite the ambiguity of aquifer properties and boundary conditions.

Ensemble Kalman filter
In the following we briefly repeat the basic assumptions of deriving the EnKF within a Bayesian framework.While it is possible to have a much more pragmatic view on EnKF as an extended least-square estimator, we believe that the transparency of the Bayesian framework with respect to the underlying assumptions is beneficial.In particular, the Bayesian framework explains the choice of the initial ensemble as prior knowledge and the conceptual importance of the prior knowledge in the estimation procedure, while a frequentist's point of view is in contrast to making use of prior knowledge altogether.For further transparency, we first explain the extended Kalman filter (see similar derivations by Evensen, 2009).
We denote the vector of all parameters (recharge values and log-hydraulic conductivities of all cells) .Prior to considering measurements, they are assumed to be random functions following a multi-Gaussian distribution, which is fully characterized by the prior mean µ and covariance matrix P .Throughout this paper, the prior values are denoted by a single prime, and the posterior by a double prime.If we assume that the covariance function P (v) is stationary with the distance vector v and known structural parameters (variance, correlation lengths, rotation angles), the element (i, j ) of the covariance matrix P is P (|x i − x j |).The full matrix is constructed by all grid points.
The vector of simulated hydraulic heads h t at time level t depends on the heads h t−1 at the previous time level and on the parameters .Because the old heads h t−1 depend on , they are random variables, too.In the combination of data assimilation and parameter estimation applied here, the vector of all simulates states (the heads h t in all cells) and the vector of all parameters are concatenated to a single vector x t of states and parameters, assumed to be random multi-Gaussian functions with unconditional mean µ x and covariance matrix P xx , in which the prior statistics of h t are obtained by linearized uncertainty propagation of the statistics of h t−1 and .
For convenience, we denote running the model and simulating the observations (which is here just picking the heads at the observation locations) as f t (h t−1 , x t ).It should be noted that f here, hence, denotes both the forward model and the observation operator.This model outcome is contrasted to the measurements of heads at time level t, here denoted y t .The true (unknown) heads at the measurement locations are considered to be a vector of random variables with a multi-Gaussian distribution, characterized by the measurement vector y t as the mean and the covariance matrix R, reflecting measurement error.
Since we assume multi-Gaussian distributions, finding the best conditional estimate µ x of the entire head field at the new time level and the parameters by application of Bayes' theorem results in minimizing the following objective function W (x t ): which is done by setting the derivative of W (x) to zero (e.g., Evensen, 2009).In the linearized version, f t (h t−1 , x t ) is linearized about the prior mean µ x t , and the linearized conditional covariance matrix P x t x t of x t is obtained by inverting the Hessian of W (x t ), using the same linearization.Kalman filtering is based on these approximations.Here, the data are successively accounted for, considering one time level after the other.Then, the posterior mean µ x t and covariance matrix P x t x t of time level t are propagated to the next time level t + 1 to obtain the corresponding prior mean and covariance matrix.
By applying rules of matrix identities, it can be shown that linearization about the prior mean µ x t leads to the following expression for the conditional mean and covariance matrix: in which P y t x t = J P x t x t is the cross-covariance matrix between y t and x t , P x t y t = P T y t x t , and P y t y t = J P x t x t J T is the propagated covariance matrix of y t , expressing the uncertainty of y t caused by the uncertainty of x t .J denotes the sensitivity matrix of f t with respect to x t , derived about the prior mean.
The scheme described so far is known as extended Kalman filter.It relies on linearization about the prior mean and has the disadvantages that the full sensitivity matrix J must be evaluated, which can be computationally very costly.Also, even slight nonlinearities in f t (h t−1 , x t ) imply that the propagated covariance matrices are not correct.
A popular alternative to the original Kalman filter is the EnKF (Evensen, 1994), in which the linearization is performed about an entire ensemble of state and parameter values, and no sensitivity matrices are computed.The prior statistics are given by in which n is the number of ensemble members, the superscript (i) denotes the ith member, and a ⊗ b is the tensor product of vectors a and b.Upon initialization, the original ensemble members x (i) 0 are drawn from the unconditioned multi-Gaussian distribution of x, whereas the updating of the individual ensemble members follows the procedure outlined above: in which ε (i) is a vector of random observation noise drawn from a multi-Gaussian distribution with zero mean and covariance matrix R. The factor b is the so-called damping parameter (e.g., Hendricks Franssen and Kinzelbach, 2008), which serves to slow down the update of states and parameters.It is an ad hoc tuning parameter that is primarily required for small ensemble sizes; few guidelines exist on how to select it.In this work, the damping is set to 0.6 for the updates of the head values and 0.05 for the parameter update, though, since the ensemble size is large and there are many temporal observations (see below), the choice is not crucial in any sense.For a more in-depth description of the filter algorithm, the interested reader can consult Evensen (2003) or Burgers et al. (1998) for general filter details or Erdal et al. (2014) and Erdal (2014) for in-depth details on the actual implementation used in this study.
It should be noted that the ensemble Kalman filter still relies on the same assumptions as the original Kalman filter.Notably, the combined vector of states, parameters, and observations is assumed to be a multi-Gaussian random variable, which means that x t is multi-Gaussian, the model f t depends linearly on x t , and the measurement error is multi-Gaussian, too.These conditions are not strictly met, so the EnKF solution is only a linearized estimate.However, in contrast to the extended Kalman filter, in EnKF the linearization is performed by considering an entire ensemble rather than by taking derivatives at a single point (e.g., Nowak, 2009).The large ensemble sizes used in this work as well as the repeated application over many time steps alleviates the effects of nonlinearity to some extent, by allowing a generous use of the dampening factor.Hence the filter is slowed down and the possible erroneous updates resulting from the linearization have a less strong effect on the update.Further, the model considered is only weakly nonlinear, so in total the effects of the linearlizations are likely small compared to other sources of errors (e.g., prior uncertainties, as discussed later).For a detailed discussion of the linearization operated by the ensemble Kalman filter applied to groundwater models, see Crestani et al. (2013).
An important constraint is that the scheme, like any other Bayesian method, depends on the choice of the unconditional mean and covariance structure of the parameters .It is important to keep in mind that our application (estimating spatial patterns of both hydraulic conductivity and recharge from hydraulic-head data) is based on, at least partially, ambiguous data, as outlined in Sect. 2. Bayesian parameter-estimation schemes are well posed even in the presence of non-informative or ambiguous data due to the prior information: in the case of non-informative data, the likelihood of the data shows no dependence on the parameters, and the posterior falls back to the prior.Thus, while the updating procedure leads to modifications of the parameters, the original prior knowledge carries over.Spatial patterns that are in contradiction to the prior knowledge cannot be recovered by the scheme.This would of course be different if the observations were in strong contradiction to the prior.If so, we could see a departure from the prior, both in terms of absolute values and in terms of structure.This point will be discussed in more detail in Sect. 5.In our application, however, contains parameters describing both aquifer properties and boundary conditions, and, as we have shown above, the effects of these two types of parameters on the measured heads can be similar.Hence, the data can be non-unique with respect to the parameters, and the prior knowledge may determine which patterns of conductivity and which patterns of recharge can be jointly inferred by the scheme.If the prior knowledge is erroneous, the estimated fields may also be erroneous.

Setup of a synthetic experiment
For testing the possibilities and limitations in jointly estimating conductivity and recharge, we have set up a synthetic 2-D example of transient flow in an unconfined aquifer.The model setup is shown in Fig. 2 and consists of spatially variable recharge with a temporal seasonal trend, spatially variable conductivity, a temporally variable southern boundary corresponding to a river, and five pumping wells.The actual recharge is calculated by multiplying the trend parameter with the shown recharge field.More technical details about the setup are found in Table 1.Observations of groundwater heads are taken daily at 45 observation wells spread throughout the domain during a 1-year simulation and assuming an observation error of 1 cm.The recharge and logconductivity fields are both sampled as random fields with anisotropic, exponential covariance functions and strong rotation of the principal directions of anisotropy (Table 2).It should be noted that in the current example the reference conductivity and the reference recharge fields are generated as fields that are uncorrelated to each other.This could, for example, represent a scenario in which the recharge is primarily controlled by variable land use and vegetation while the conductivity is a material property that varies spatially but is constant over time.
For the estimation of the recharge and conductivity fields, we apply the ensemble Kalman filter using an ensemble of 2000 members.As this work aims at exploring which prior knowledge is required for the estimation process, three different cases of prior knowledge are considered.In the first, the initial ensemble members are drawn from the same (hence correct) distribution as the reference (true) field.The second case is identical to the first apart from the rotation 500 * µ is the mean; σ is the variance; α is the rotation angle; and l x and l y are the correlation lengths in x and y direction, respectively.angle of the anisotropy being randomly chosen for each ensemble member.In the third case, the rotation angle is fixed but wrong.Here, the recharge is sampled using the rotation angle and correlation lengths of the true conductivity field and vise versa, creating a rather problematic initial ensemble.A plot of the three correlation structures can be found in the bottom of Fig. 3 in Sect. 4 where the three initial ensembles are called the "good", "random", and "wrong" ones.Please note that the correlation plot for the random initial is only meant as an illustration of the fact that each ensemble has a unique rotation angle and does not show the actual angles considered.
The goodness of the resulting fields are judged in two ways.First, the ensemble mean of the fields are visually compared to the reference fields and subjectively judged to be similar or not.Second, the normalized root mean square error of the simulated heads in the 45 observation wells is computed by: where n t is the number of temporal observations between t 1 and t 2 , n nodes the number of nodes considered, h(i, t) is the ensemble mean head observation at position i and time t, h true is the corresponding true value, and σ h is a standard deviation used to normalize the error.As this is a virtual experiment, we can calculate the NRMSE both using the observation locations as nodes and using all nodes in the model.
The former corresponds to what can be done with real experimental data and is denoted OLE (observation location error), while the latter only works for a virtual experiment and is denoted TE (total error).For OLE, the normalizing standard deviation σ h (i, t) is the measurement uncertainty of hydraulic-head observations, hence here a fixed value decided prior to the EnKF simulations, while for TE this corresponds to the conditional ensemble standard deviation, variable in both space and time.Due to the choice of the normalizing standard deviation, the total error does not give a direct indication of the goodness of the ensemble mean solution but rather indicates how well the predicted heads fit the ensemble standard deviation.Therefore, a second version of the total error is also considered, where the normalizing standard deviation is replaced by normalizing the error with the mean of the two head values (i.e., σ h (i, t) = 0.5 • (h true (i, t) + h(i, t))).The two versions of the total error are abbreviated as TE-1 and TE-2.The use of NRMSE gives a quantitative metric of judging the actual performance of the estimated model.We assimilate head observations from day 50 to day 300, while the remaining 65 days of the 1-year data are used to test the model's predictive capabilities.This results in an assimila-  15) for three setups of prior knowledge (good, random, wrong) to estimate recharge alone (R), to estimate conductivity alone (K), and to jointly estimate conductivity and recharge (R and K).OLE is observation location error, and TE is total error.tion error for judging how well the assimilation went and a prediction error for judging the model's predictive powers.It should be noted that, to properly asses the predictive power of the model in a scenario different to the one used for the assimilation, one of the four wells shown in Fig. 2 only starts pumping at day 301.
We have combined the three different prior distributions with three different estimation problems, namely the estimation of (a) recharge alone, (b) hydraulic conductivity alone, and (c) recharge and hydraulic conductivity together, leading to a total of nine different scenarios.In the stand-alone scenarios, all other parameters and settings are assumed known and are set to their true values.As can be seen from Fig. 2, the recharge shows not only a strong spatial pattern but also a temporal trend.In the estimations shown below, this temporal trend is assumed known.

Stand-alone estimation of recharge or conductivity
The simplest of the estimation problems presented in this study is the stand-alone estimation of recharge, since the hydraulic heads depend linearly on recharge.This is reflected in Fig. 3, showing the ensemble mean of the estimated recharge fields.As expected, the best results are achieved with the best initial estimate (second column).However, the estimates using the covariance functions with the random and wrong orientations of anisotropy are also acceptable, with patterns similar to those of the true recharge field.Table 3 quantitatively confirms these qualitative findings by low values of the normalized root mean square error of predicted heads.From the last column in Fig. 3 we see that, although the filter manages to produce a reasonable ensemble mean of the recharge field, the similarity with the covariance function used to create the initial ensemble is still very prominent.This is especially so if one starts considering individual ensemble members (not shown), and it demonstrates how sensitive the EnKF method is to the initial guess, even in this linear problem.
It is important to keep in mind that the ensemble size is large, so the plots of the ensemble means shown in Fig. 3 are smoothed.It is not expected that the smooth ensemble estimate exhibits the same extreme values as those seen in the true parameter distribution, whereas individual ensemble members should show the same variability as the (unknown) reference field.
In comparison to estimating the recharge fields, the estimation of conductivity fields alone is more complicated.Here, the nonlinearities of Eq. ( 1) affect the estimation.More importantly, the orientation of the anisotropy of heterogeneity plays a vital role in the behavior of groundwater flow.This is also seen in the final estimates of the conductivity fields, shown in Fig. 4, where the only reasonable result is achieved if the right pattern is assumed in the prior knowledge (second column) or if the prior pattern is random (third column).The reasonable performance of the prior distribution with diffuse knowledge about the anisotropy orientation may be explained by the large initial ensemble containing some members with reasonable patterns and decent behavior.In the case that the orientation of anisotropy is assumed erroneously in the prior knowledge (fourth column), the filter completely fails to produce any result similar to the truth.This finding does not depend on the ensemble size.The prediction errors listed in Table 3 clearly confirm the visual impression.The result shows similarities with the results of Bailey and Baù (2012) and Camporese et al. (2011), who both managed to correct a wrong prior mean and variance of conductivity fields (here corresponding to the good and the random priors), but not the correlation lengths (here corresponding to having a wrong prior).The prediction errors, listed in Table 3, emphasize that estimating recharge leads to smaller errors in predicting heads than the estimation of the hydraulic-conductivity field.This could indicate that improvements of the estimated conductivities are more important for lowering the prediction error, which would follow the findings of Hendricks Franssen et al. (2004).As pointed out above, the higher errors when estimating conductivities are likely related to the head value in a cell depending not only on the conductivity of that cell but also on the macroscopic anisotropy of hydraulic conductivity in the entire aquifer.

Joint estimation of recharge and conductivity
As derived in Sect.2, joint estimation of recharge and conductivity fields is impossible without prior knowledge about either of the two quantities.In Bayesian inversion methods, however, prior knowledge is assumed anyway.In the EnKF method, the prior information is conveyed by the initial ensemble drawn from the prior distribution.By this, the jointly estimated recharge and conductivity fields are unique and reproducible in a statistical sense.The remaining question is whether these estimates also resemble the true fields and whether they are good for prediction purposes.
Figure 5 shows the results of the joint estimation using the three different initial ensembles and Fig. 6 shows the corresponding spatial distributions of the estimation variance.If the initial ensemble is good -that is, the reference fields are drawn from the same statistical distribution as the initial ensemble -it is possible to estimate both conductivity and recharge with reasonable precision, given the number and accuracy of observations (second column).When the initial ensemble is poor, however, the result is rather poor for the recharge and more blurry for the conductivity (third column), or we infer fields that look good but are wrong (last column).
As shown theoretically in Sect.2, it is always possible to compensate a missing or wrong conductivity field with a wrong recharge field.An effect of this compensation is also  clearly seen in the last column of Fig. 5: even after 250 days of data assimilation, the estimated recharge shows remarkable similarity with the reference conductivity field.The long assimilation time is important, since, if there had been no compensation, the estimated fields would not retain their erroneous structures for so many filter updates.This shows that the issue of trading one quantity for the other is not only a theoretical issue but also relevant in practice.It should be noted here that the cause of the original poor estimations is not the compensation mechanism described in this paper but the false prior sampling.However, the compensation mechanism sustains the poor estimates when the observations are, as in this work, non-unique.
The lacking ability of the random and wrong initial ensemble estimates with respect to predicting heads under conditions not encountered in the calibration period is documented as OLE and TE-2 in Table 3, where the prediction errors caused by the poorly estimated fields are often an order of magnitude larger then those resulting from a really good estimation.It is interesting to note that the error at the observation locations obtained throughout the assimilation, shown in Table 4, are not a good indicator for the predictive capabilities of the various models, as quantified by the prediction errors listed in Table 3.Although there are differences in the assimilation error, both within and between the different estimation setups, it would be difficult to predict any model performance from these errors.That the joint estimation is performing much better with the good prior compared to the poorer ones is only obvious if the full table is available.
When comparing the errors in Tables 3 and 4, it is easily detectable that the errors, especially at the observation locations, are much smaller during the assimilation than during the prediction.This marks the fact that we have not estimated exactly the true fields, and as soon as we stop assimilating head data into the system, the models may start to deviate.Depending on the goodness of the estimated fields, the models will deviate more or less, but all estimation setups show an increase in error at the observation location during prediction.It should also be noted that, during a resimulation using a different initialization for the randomness in the EnKF, the values of the observation location errors during prediction were notably different for most cases in which conductivity fields were estimated.This indicates that the estimated conductivity fields and, hence, the head predictions are influenced by the randomness in the EnKF.Therefore, the exact error values presented here should be used with care.For the comparative conclusions and discussions in this paper, however, the EnKF simulations are considered stable enough, as the errors are always on the same order and no differences are directly visible on the final mean estimated fields.
Interesting to note in Table 3 is that the highest errors, both at the observation location and for the total errors, are found when only conductivity fields are estimated using the wrong prior.Hence, using the wrong priors while jointly estimating recharge and conductivity gives a better prediction of the heads.As is obvious from Figs. 4 and 5, both estimated conductivity fields are similarly poor; hence, this quantitatively confirms the existence of the aliasing problem: jointly wrong estimated fields can compensate each other to simulate more correct head fields.
The values of the total normalized errors (TE-1), also listed in Table 3, must be interpreted in light of the standard deviation of estimation used for normalization.A value of unity would indicate that the mismatch between predicted and true hydraulic heads follows exactly the predicted uncertainty of the hydraulic-head estimation.A value significantly smaller than unity indicates that the conditional ensemble is too wide, whereas a value significantly larger than unity points to an erroneous estimate with erroneously small error bounds.As can be seen from Table 3, both the true and random priors lead to a combination of head mismatch and associated uncertainty close to unity, slightly overestimating the uncertainty, whereas the wrong prior leads not only to wrong patterns of the hydraulic conductivity fields but also to erroneous head predictions with too-small prediction uncertainty.As can also be seen in Fig. 6, the conductivity field estimation with the wrong prior shows a much smaller variance than using the other priors.This likely leads to an underprediction of the variance in the head fields and, hence, the high value observed for TE-1 during the prediction (Table 3).
The issue of low errors in the assimilation period is further illustrated with an example of two observations wells in Fig. 7, from which it is clearly seen that all approaches show a good fit during assimilation but that the heads using the wrong prior deviate in the prediction.From a practical point of view this highlights the importance of having relevant val- idation data to test the predictive power of a model when the parameters are inferred using sequential data assimilation.
Like in the scenarios in which only recharge or only conductivity were estimated, the mean joint estimate lacks the extreme values of the reference fields.As discussed above, such behavior is expected for the smooth best estimate even in cases where the scheme works perfectly fine.Individual ensemble members show significantly stronger variability, as can be seen also from the maps of the estimation variance in Fig. 6.We consider the results from the good initial ensemble as good, since they capture the main patterns of the parameter fields well and have, overall, reasonable absolute parameter values.For purposes of transport predictions, we would recommend using the entire ensemble rather than the ensemble mean.In the case of the estimates using the wrong prior knowledge, in particular where the orientation of anisotropy is chosen randomly, the fluctuations cannot be aligned well in the right direction, and averaging over features oriented in all directions leads to particularly smooth estimates of the mean.

Conclusions
In the present study we have shown that it is possible to jointly estimate reasonable fields of hydraulic conductivity (or its logarithm) and recharge as spatially fluctuating fields from pure head observations provided that the statistics of the true fields are fairly well understood.Starting with wrong assumptions about conductivity and recharge patterns can lead to aliasing, in which undetected features of hydraulic conductivity are traded for erroneous fluctuations in recharge.
In real-case applications, the prerequisite of a good prior can pose a severe problem because the true spatial patterns may be widely unknown.From a more technical point of view it may be noteworthy that a rather common way of setting up a synthetic groundwater-EnKF test is to generate a large ensemble of realizations and use one of them as the truth and the rest as the initial ensemble.By this it is guaranteed that the statistics of the initial ensemble are perfect, and, as shown here, a good result can be expected.Unfortunately, in real-world applications the geostatistics of (log-)hydraulic conductivity are typically quite uncertain, so the good performance of a scheme, involving both the measurement strategy and the inverse method, in an overly optimistic test case regarding prior knowledge may not be transferable.We thus highly recommend designing realistic test cases that include potential bias in prior knowledge.
In the present work, we only used head data for data assimilation and parameter estimation.As shown in Sect.2, however, conductivity and recharge are not simultaneously identifiable if considered as parameters that can vary unrestricted in space and time: even if hydraulic heads were observed everywhere at all times, exactly the same head field could be achieved with different combinations of conductivity and recharge fields.Therefore, the joint estimation is impossible without prior information about the parameter fields, implying that wrong prior information cannot be corrected by the head data.Other types of observations could, of course, also be considered.Ideally we would have (plenty of) observations of subsurface fluxes or of conductivity.In this case, the total data set would become highly informative and the prior would be significantly less important.However, this is not realistic for applications in subsurface hydrology.Fluxes cannot be measured as such, and conductivity measurements are, if existing and trusted, very local.Further observations could be obtained from tracer tests, which are time consuming, or age tracer data, which may be costly and require very long simulation times.Head observations are, in this respect, common and trustworthy measurement.Hence, our example can be considered rather realistic for a real-world scenario of estimating aquifer parameters.
In real-world applications, vague guesses of the hydraulic conductivity distribution may exist from drilling logs, slug tests, and pumping tests (e.g., Dietrich et al., 2008;Lessoff et al., 2010).All of these tests are independent of recharge so that making use of this information may alleviate the problem of non-uniqueness outlined in this paper to some extent.Vague guesses of K can find their way into parameter estimation either by means of an improved prior of K or by explicitly accounting for the additional measurement types in the EnKF procedure, including the full observation operator.For recharge, the patterns should in principle reflect land use and soil types, which are accessible information.Further, spatially variable recharge may also be constrained by the use of remote-sensing information (e.g., Brunner et al., 2006;Hendricks Franssen et al., 2008).These type of data could be either used as direct observations in the assimilation (if we trust them) or considered as prior information and used to condition the initial ensemble (Sun et al., 2009;Panzeri et al., 2013).The latter could also be seen as a way of discarding initial samples that contain unfeasible conductivity-recharge combinations.This would create a much more appropriate initial ensemble.Hence, as shown in this work, the filter would have an increased chance of successfully estimating the parameters when the prior is good.The idea of improving the initial ensemble can also be related to the popular method of multiple-point geostatistics.Here, training images which should represent relevant spatial correlation patterns have been used to condition conductivity fields (see Okabe and Blunt, 2004;Hu and Chugunova, 2008).The combination of assimilating head data and the use of training images to condition the ensembles has also been tested, with promising results (Li et al., 2013).The combination of these approaches could prove a possible way to achieve a more correct prior sample and, hence, to improve the performance of the joint estimation of conductivity and recharge fields by lowering the risk of conductivity-to-recharge aliasing due to wrong prior knowledge.
In the presented work, we consider a rather standard formulation of the ensemble Kalman filter without iterations, smoothing, and many ad hoc features.For the joint estimation of recharge and conductivity, an iterative approach such as the dual-state filter (El Gharamti et al., 2013), locally iterative filters (Hendricks Franssen and Kinzelbach, 2008), or fully iterative filters or smoothers (Sakov et al., 2012;Bocquet and Sakov, 2013) could be considered.The advantage would be a separation between the update of the recharge and the update of the conductivity.This could, potentially, reduce the risk for conductivity-to-recharge aliasing.The iterative approaches have been reported to have improved performance and physical consistency, but they tend to come with longer simulation times.

Figure 1 .
Figure 1.Illustrative example of replacing a heterogeneous conductivity field (left column panels) with a homogeneous conductivity and an effective recharge (right column panels).Please note the different scale on the third recharge plot.

Figure 2 .
Figure 2. Setup of the synthetic test case used for the parameter field estimations.

Figure 3 .
Figure 3. Estimation of stand-alone recharge.Upper panels show the final ensemble mean after all assimilation steps, and lower plots the covariance function used to generate the initial ensemble.Please note that the random covariance functions imply drawing the rotation angle from a uniform distribution between 0 and 2π, whereas only a few illustrative examples are shown.

Figure 4 .
Figure 4. Estimation of stand-alone conductivity.Upper panels show the final ensemble mean after all assimilation steps, and lower plots the covariance function used to generate the initial ensemble.Please note that only a few illustrative examples of the random orientation angle of anisotropy are shown.

Figure 5 .
Figure 5. Joint estimation of recharge (top row panels) and conductivity (middle row panels).Shown is the final ensemble mean after all assimilation steps and the covariance functions used to generate the initial ensembles (bottom row panels).

Figure 6 .
Figure 6.Joint estimation of recharge (top row panels) and conductivity (bottom row panels).Shown is the final ensemble variance after all assimilation steps.

Figure 7 .
Figure 7. Two head observations plotted over time for the joint estimation of recharge and conductivity.Shown is the ensemble mean.Assimilation is performed from day 50 to day 300, while the remaining days are considered for prediction.

Cirpka: Joint inference of recharge and conductivityTable 1 .
Pumping rates and general model setup * .Pumps are numbered as in Fig.2; z 0 and poro are the homogeneous bedrock elevation and porosity, respectively.

Table 3 .
Normalized root mean square error for the prediction period * .

Table 4 .
Normalized root mean square error for the assimilation period.