Testing alternative uses of electromagnetic data to reduce 1 the prediction error of groundwater models 2

8 Despite that geophysics is being used increasingly, it is often unclear how and when the integration 9 of geophysical data and models can best improve the construction and predictive capability of 10 groundwater models. This paper uses a newly developed HYdrogeophysical TEst-Bench (HYTEB) 11 which is a collection of geological, groundwater and geophysical modeling and inversion software 12 to demonstrate alternative uses of electromagnetic (EM) data for groundwater modeling in a 13 hydrogeological environment consisting of various types of glacial deposits with typical hydraulic 14 conductivities and electrical resistivities covering impermeable bedrock with low resistivity (clay). 15 The synthetic three dimensional reference system is designed so there is a perfect relationship 16 between hydraulic conductivity and electrical resistivity. For this system it is investigated to what 17 extent groundwater model calibration and, often more importantly, model predictions can be 18 improved by including in the calibration process electrical resistivity estimates obtained from TEM 19 data. In all calibration cases, the hydraulic conductivity field is highly parameterized and the 20 estimation is stabilized by (in most cases) geophysics-based regularization. 21 For the studied system and inversion approaches it is found that that resistivities estimated by 22 sequential hydrogeophysical inversion (SHI) or joint hydrogeophysical inversion (JHI) should be 23 used with caution as estimators of hydraulic conductivity or as regularization means for subsequent 24 hydrological inversion. The limited groundwater model improvement obtained by using the 25 geophysical data probably mainly arises from the way these data are used here: the alternative 26 inversion approaches propagate geophysical estimation errors into the hydrologic model 27 parameters. It was expected that JHI would compensate for this, but the hydrologic data were 28


Using hydrologic models for decision support
Groundwater models are commonly constructed to support decision-makers in managing groundwater resources.The model can, for example, be used to predict the impact of changes in groundwater pumping on hydraulic head and wellhead protection areas or to predict the fate and transport of groundwater pollution.In general terms, process models are used to base predictions of interest on all of the knowledge that we have about the physical/chemical system and the driving key processes.In this paper we will focus on 3-Published by Copernicus Publications on behalf of the European Geosciences Union.

N. K. Christensen et al.: Testing alternative uses of electromagnetic data
D models typically used for decision support on large spatial scales (from tens to thousands of square kilometers) with a heterogeneous and possibly complex geology.Deterministic groundwater modeling is generally used in such cases because the model simulation time is too long to make it feasible to use stochastic modeling.We will therefore mainly focus on deterministic groundwater modeling in the following.
A deterministic groundwater model is based on a conceptual model that encapsulates prior knowledge of important physical and chemical conditions and processes of the complex real world system.The conceptual model is translated into a numerical groundwater model whereby its reasonableness can be tested by comparing forward simulations with field observations.If the conceptual model appears reasonable, the groundwater model is calibrated by adjusting model parameters until simulated values fit corresponding field observations sufficiently well.The calibrated model is subsequently used to make predictions (Reilly, 2001;Reilly and Harbaugh, 2004).However, the prediction will be uncertain for various reasons, of which we will emphasize three.(i) Model calibration is done by fitting uncertain data.The calibrated parameters will therefore also be uncertain and this uncertainty is propagated to the model predictions (Hill, 1998;Moore and Doherty, 2006;Tonkin et al., 2007).A model's predictive uncertainty will only be reduced by calibration if the information content of the calibration data set constrains the parameter values that significantly influence the prediction (Harvey and Gorelick, 1995;Feyen et al., 2003;Franssen et al., 2003).Thus this source of uncertainty can only be reduced by collecting more or more accurate data of type(s) and location(s) that constrain parameter values important to the prediction.The data will typically be hydrologic or hydraulic, but they can also be geophysical.(ii) Because of scarcity and lack of sensitivity of data, there will always be small-scale heterogeneity that cannot be resolved.A groundwater model will therefore always contain small-scale structural errors, which may not cause bias in predictions but may still cause large prediction uncertainty (Cooley, 2004;Cooley and Christensen, 2006;Refsgaard et al., 2012).(iii) A model is also prone to possessing largescale structural errors that can cause significant bias and uncertainty of estimated parameters and simulated predictions (Doherty and Welter, 2010;Doherty and Christensen, 2011;Refsgaard et al., 2012).This bias and uncertainty can be reduced by collecting data that resolve the large-scale structures of the studied hydrogeological system, which can then be accurately represented in the model.These can, for example, be spatially dense geophysical data sets.
Model errors will lead to errors and uncertainties in predictions of interest.One of the key questions to address in creating models for decision support is the following: which additional data are most likely to improve key predictions?The types of data available for use in hydrologic analysis are increasingly diverse, including physical, chemical, isotopic, and geophysical data.In light of this complexity, it can be very difficult to compare the likely contributions of diverse data to model-based decision support.

Informing hydrologic models with geophysics
Over the last 3 decades, noninvasive geophysical methods have been used increasingly to construct groundwater models (Hubbard and Rubin, 2000;Vereecken et al., 2004).This is particularly true for data collected by airborne electromagnetic methods (AEM) because they can be collected quickly, densely, and at a relatively low cost for the very large spatial coverage (Steuer et al., 2008;Viezzoli et al., 2010b;Abraham et al., 2012;Faneca Sànchez et al., 2012;Refsgaard et al., 2014;Munday et al., 2015).Large-scale AEM (or ground-based EM) investigations have been used to delineate aquifers, aquitards, and buried valleys or other structures containing aquifers (Auken et al., 2003;Sandersen and Jørgensen, 2003;Jørgensen et al., 2003;Abraham et al., 2012;Oldenborger et al., 2013), to assess aquifer vulnerability (Refsgaard et al., 2014;Foged et al., 2014), to map saltwater intrusion (Fitterman and Deszcz-Pan, 1998;Viezzoli et al., 2010b;Lawrie et al., 2012;Herckenrath et al., 2013b), and to map freshwater resources (Steuer et al., 2008;Faneca Sànchez et al., 2012;Munday et al., 2015).The main drawbacks of electromagnetic (EM) data are (1) ambiguity in relating electrical properties to hydraulic properties, and (2) reduced lateral and vertical resolution with depth.The former effect can limit the quantitative use of geophysical data for parameterizing groundwater models.The latter effect makes identification of deep structures difficult (Danielsen et al., 2003;Auken et al., 2008), which will have different influences on predictions that are dominated by shallower or deeper flow paths.
Geophysical data must be related to properties or states of hydrologic relevance to use them in constructing hydrologic models.Whether the geophysical data are used to define hydrostratigraphic units or subregions or to parameterize the model, the data are often inverted.The way in which hydrologic and geophysical data are inverted and integrated can impact the extraction of information from geophysical data (Dam and Christensen, 2003;Day-Lewis, 2005;Moysey et al., 2005;Linde et al., 2006;Singha and Gorelick, 2006;Singha and Moysey, 2006;Hinnell et al., 2010).
The simplest inversion approach is sequential hydrogeophysical inversion (SHI).In the first step of this approach, the geophysical data are inverted independently of the hydrologic data or model.In the second step, the inverted geophysical properties are used to zonate or directly parameterize the hydrologic model (Hubbard et al., 1999;Dam and Christensen, 2003;Seifert et al., 2007;Koch et al., 2009;Di Maio et al., 2013;Marker et al., 2015).This is based on the assumption that the geophysical responses are sensitive to some of the same structures and property distributions that the hydrologic data are sensitive to.Using the SHI approach has built-in challenges.In the first step, the geophysical inversions are typically stabilized by using regularization and smoothing constraints that do not reflect real physical conditions (Day-Lewis, 2005;Linde et al., 2006;Singha and Gorelick, 2006;Singha and Moysey, 2006).Therefore one must be cautious when using such geophysical property estimates to infer hydraulic zones or property estimates to be used in the second step of the SHI (Day-Lewis, 2005;Slater, 2007;Hinnell et al., 2010).Furthermore, with the SHI approach used in the following, the geophysical models cannot be easily updated to conform to the hydrologic observations.Such updating is allowed by the SHI approach of Dam and Christensen (2003).
Two inversion alternatives to SHI are coupled hydrogeophysical inversion (CHI) and joint hydrogeophysical inversion (JHI) (Hinnell et al., 2010).For both alternatives, the hydrologic and geophysical data sets are inverted simultaneously.In CHI, the simulated response of one model (e.g., the geophysical model) is used to constrain the other model (e.g., the hydrologic model).CHI has been applied successfully for reducing parameter uncertainty by using ground penetrating radar and electrical resistivity tomography data in hydraulic models (Kowalsky et al., 2005;Hinnell et al., 2010).In JHI, the hydrologic and geophysical models are coupled directly through some of their parameters using assumed relationships among the geophysical and/or hydrologic parameters (Hyndman et al., 1994).For EM data, JHI is typically done using a relationship between hydraulic conductivity and electrical resistivity inspired by Archie's law (Archie, 1942;Revil and Cathles, 1999;Purvance and Andricevic, 2000;Slater, 2007).
Applications of sequential and joint versions of hydrologic and geophysical data using a petrophysical relationship in groundwater modeling have been demonstrated by Herckenrath et al. (2013a) and Vilhelmsen et al. (2014).Herckenrath et al. (2013a) were comparing a SHI with JHI for a largescale groundwater model using ground-based EM data.Vilhelmsen et al. (2014) were demonstrating a method for joint inversion of aquifer test data, magnetic resonance sounding data, and ground-based electromagnetic data.For synthetic benchmarking, both of these studies were using a simple model with a few layers with constant parameter values, and they were evaluating the model performance by the means of improved model parameter values and reduction of parameter uncertainty.However, as concluded by, e.g., Zhou et al. (2014), the goal of groundwater model calibration is not just parameter identification, but also to increase the model's prediction capability.
Petrophysical relationships between hydraulic conductivity and electrical resistivity are difficult to establish, because such translation tends to be site, scale and facies specific (Hyndman and Tronicke, 2005;Slater, 2007).Therefore methods have been developed that do not rely on a petrophysical relationship between a property of the geophysical model and the property of the hydrological model.For ex-ample, Marker et al. (2015) developed a deterministic SHI approach where spatially dense AEM data are first inverted, then combined with scarce lithological data from boreholes to form a clay fraction model for the sediments of the subsurface (Foged et al., 2014); subsequently, cluster analysis of electrical resistivity and clay-fraction values is used to generate one or more structural model realizations for the subsurface (Foged et al., 2014); finally, the structural model(s) are used in the groundwater model that is calibrated against hydrological data.He et al. (2014He et al. ( , 2015) ) developed a similar methodology where the transition probability method is used to generate the structural model realizations from borehole data and AEM estimated resistivities.The methodologies of Marker et al. (2015) and He et al. (2015) were both used by them on a large spatial scale.However, the number of calibrated parameters was kept small by assuming hydraulic conductivity to be constant within an entire deposit or structure.The methodologies are thus stochastic in terms of generating structure but deterministic in terms of representing and estimating property fields.
Full stochastic approaches have been developed.For example, Ruggeri et al. (2013) developed a Bayesian simulation approach to estimate the hydraulic conductivity field from measurements of geophysical parameters.So far the method has only been used for estimation along short profile lines (Ruggeri et al., 2013(Ruggeri et al., , 2014)), and the Bayesian scheme has not yet been extended to also involve hydrological data and groundwater modeling.Furthermore, the methodology of Ruggeri et al. (2013) requires the existence of a petrophysical relationship between electrical resistivity and hydraulic conductivity; the relationship does not need to be known, in which case a data set of collocated measurements of electrical resistivity and hydraulic conductivity is required.An alternative Bayesian approach was presented by Hermans et al. (2015), who integrate multiple and multiscale data types to falsify alternative structural models using a stochastic search method.The methodology was demonstrated for a field case using hydraulic head data and electrical resistivity tomography to constrain a hydrological model.Stochastic inversion approaches have been developed that invert electrical resistivity (ER) and hydrological data.Irving and Singha (2010), for example, describe a Markov chain Monte Carlo (McMC) method to estimate a binary hydraulic conductivity field from ER and tracer test data.As explained by Irving and Singha (2010, p. 13), their McMC approach is computationally expensive.Their demonstration examples were therefore 2-D and small in scale, and application to field data has not yet been attempted (Irving and Singha, 2010, p. 13).
In the following we therefore keep focus on use of geophysics in connection with deterministic groundwater modeling.We particularly focus on application and comparison of SHI and JHI when used in connection with groundwater investigation of large domains with 3-D heterogenous hydrogeological and geophysical systems.

Testing the worth of using geophysics
It is intuitively clear that spatially dense geophysics can offer valuable information for improved groundwater modeling for decision-making.However, many important questions can be raised.For example: how beneficial is it to collect and use EM data in groundwater modeling for a complex 3-D hydrogeological system?How much can be gained by using for example JHI instead of SHI for model calibration?Are both inversion approaches prone to leading to biased parameter estimates or model predictions?And what model prediction types will benefit from using EM data in connection with the model development and calibration?The answers to these questions will to some (large?)extent depend on the actual hydrogeological setting as well as on what types of prediction are going to be made by the groundwater model.
To provide such answers we have developed a crossdisciplinary, flexible platform to examine the worth of geophysical data for improvement of groundwater model predictions in potentially complex environments.The platform can be used to build synthetic experiments that have similarity with the actual hydrogeological and geophysical systems to be investigated, the types of data to potentially be collected, and the types of models to potentially be used.The flexibility of the platform allows easy investigation of the data worth when using alternative data sampling and alternative modeling or inversion strategies.Because of the supposed similarity between the synthetic and the actual systems, the conclusions from the synthetic study can be transferred to actual investigation.The platform is called HYTEB, which is an abbreviation of HYdrogeophysical TEst-Bench.HYTEB builds on a merge of software from different disciplines such as stochastic hydrogeological modeling, groundwater modeling, geophysical modeling, and advanced highly parameterized inversion using SHI, CHI or JHI.HYTEB can also support examination of use of geophysics in a stochastic groundwater modeling context (which will be demonstrated in a manuscript in preparation).

Objectives
The paper has the following objectives.First, it will present the important elements and steps in the use of HYTEB.Since HYTEB and its use are interdisciplinary, the presentation and the following case study introduce geophysicists to the methods, challenges, and purposes of groundwater modeling, and groundwater modelers to some of the challenges of using mainly electromagnetic data for groundwater model calibration purposes.Second, HYTEB is used to examine the worth of adding a ground-based time-domain electromagnetic data set to a hydrological data set when making a groundwater model for a glacial landscape of a kind that is typical of parts of northern Europe and North America.It is investigated whether the worth of adding the geophysical data depends on the type of groundwater model prediction as well as on whether the geophysical and hydrological data are inverted sequentially or jointly.Section 2 of this paper describes the elements of HYTEB and how they are used, Sect. 3 describes the case study, Sect. 4 presents the results, while Sect. 5 makes a summary and draws conclusions.
2 The elements and concept of HYTEB (HYdrogeophysical TEst-Bench) Our primary objective in developing HYTEB is to provide a synthetic environment that allows users to determine the value of geophysical data and, furthermore, to investigate how best to use those data to develop groundwater models and to reduce their prediction errors.We suggest that this can best be investigated by using a synthetic case study for which the "generated synthetic", in the following termed "reference", hydrologic and geophysical systems are known and the influences of different sources of error can be investigated.We use physical and geophysical forward simulators to generate measurements that would be collected from the reference systems in the absence of noise.We then examine the influence of measurement error and other sources of error on model predictions of interest.By repeating this for different synthetic system realizations (i.e., for different reference systems) and for different data sets it becomes possible to statistically quantify the worth of the various data for improving the predictions of interest.The workflow of HYTEB is shown in Fig. 1.The procedure is divided into six steps, which will be described separately and briefly in the following subsections.

Step 1 -Generation of geological realization
The first step is to generate a synthetic realization of the type of geological system under study.The generation can be made conditional on lithological data from boreholes.The borehole data can be imaginary, a real data set, or a combination of data, hydrogeologic structure, and geostatistics.Figure 1, step 1, displays an example of a generated system consisting of categorical geological deposits on a plain as well as in a valley buried under a part of the plain.The deposits are underlain by impermeable bedrock (not shown).Such categorical geological settings can, for example, be generated using T-PROGS (Carle, 1999) or BlockSIS (Deutsch, 2006).The spatial discretization used for the geological realization also defines the spatial discretization of the numerical model used to simulate groundwater flow or any other process model that a user decides to integrate into HYTEB.

Step 2 -Generation of reference groundwater system, data set, and predictions
Using the same spatial discretization as in step 1, the second step is to define the boundary conditions and the hydraulic and solute transport property values for the generated geo- logical system.The hydraulic and solute transport properties can include, for example, hydraulic conductivity, specific storage, and effective porosity.For categorical deposits (as in Fig. 1) the value of each type of property will typically vary among categories as well as within each category.Such variation can be simulated as categorical random fields by using, e.g., SGSIM (Deutsch and Journel, 1998) or FIELDGEN (Doherty, 2010).The generated realization of boundary and property values is used in a numerical simulator of groundwater flow and solute transport to simulate a set of state variables to be used in step 5 as hydrologic observations used for model calibration; random error is typically added to these observation data to represent all sources of noise that corrupt real observations.The numerical simulator is also used to simulate a set of predictions that are considered of particular interest to the study.We have implemented MODFLOW-2000 (Harbaugh et al., 2000) as the numerical simulator of groundwater flow and MODPATH (Pollock, 1994) to simulate solute transport by particle tracking.
In the following, the numerical simulators using the boundary conditions and property values that represent the system realization are called "the reference groundwater system", and the predictions (for example forecasts) simulated for this system are called "reference predictions".

Step 3 -Generation of a reference geophysical system and data set
The third step is to define the property values of the geophysical system corresponding to the geological realization generated in step 1.Like the hydraulic properties, the geophysical properties can be considered and simulated as categorical random fields.A geophysical property of relevance can, for example, be the electrical resistivity of the spatially variable geological deposits.For some geological systems, it is found or assumed that there is a correlation between electrical resistivity and hydraulic conductivity.In this case, the hydraulic and geophysical property fields must be generated to be dependent.Various empirical petrophysical relationships between hydraulic conductivity and electrical resistivity have been proposed (Slater, 2007).Both positive and negative relationships have been reported, and there can be significant uncertainty in the relationship (e.g., Mazáč et al., 1985).It is common to use a linear log-log relationship, which is given some theoretical support by Purvance and Andricevic (2000).
Having defined the property values of the geophysical reference system, the geophysical instrument responses are simulated to produce a noise-free geophysical data set that can be corrupted by adding method-specific and random error.Ideally a 3-D code should be used.Codes for 3-D compu-tation of TEM responses have been developed (e.g., Árnason, 1999), but the computation is impractical and burdensome.As a practical alternative we suggest simulating TEM responses by a 1-D code, where the 1-D geophysical model is created from the reference system by pseudo-3-D sampling, that is by taking the logarithmic average of the cells within the radius of the EM footprint at a given depth.Modeling TEM in 1-D can be problematic in connection with mineral exploration, but for sedimentary environments the 1-D approach should work well (Auken et al., 2008;Viezzoli et al., 2010a).In HYTEB we use AarhusInv (Auken et al., 2014) to simulate electromagnetic instrument responses.
In the following, the geophysical simulator using the actual realization of geophysical parameter values is called the "reference geophysical system".

Step 4 -Model construction and parameterization
In this step, the synthetic data are used to constrain parameter estimation for a groundwater model of the reference groundwater system.Each property of the real groundwater and geophysical systems needs to be parameterized in the groundwater model.This step thus corresponds to the construction of a groundwater model of a real field system on the basis of the available real data.In the synthetic case, the groundwater model can be discretized exactly as the "reference groundwater system" or it can use a coarser discretization.Here we adopt the former alternative to reduce numerical discretization error.However, this effect could be examined if it were of interest to a particular study.
In studies of real systems, the groundwater model is often constructed to consist of zones of uniform hydraulic properties.The subdivision into zones is typically done subjectively by an expert on the basis of geological, hydrological, and geophysical data (Seifert et al., 2007;Di Maio et al., 2013).This principle can also be used to define zones of a model of the synthetic groundwater system by using the synthetic lithological data from boreholes used in step 1, the hydrological data set generated in step 2, and geophysical models estimated by inverting the geophysical data sets generated in step 3.In this case, the geophysical data must be inverted between step 3 and step 4. The inverted data are used either in step 4 to support parameterization of the groundwater model or in step 5 for groundwater model calibration.To avoid overreliance on the geophysical data, it may be argued that they should not be used in both steps 4 and 5.If the geophysical data are used in step 4, they must be inverted before inverting the hydrological data (carried out in step 5); this is an example of sequential hydro-geophysical inversion (SHI).
An alternative parameterization approach uses the concept of pilot points (Certes and De Marsily, 1991) to parameterize the property fields and to let the data determine the variation of the model property fields (e.g., Doherty, 2003).Pilot point approaches result in a smooth property variation within the model domain (Doherty, 2003) rather than sharp zonal pa-rameter fields.Pilot points can be used in combination with zones, e.g., to represent property variation within categorical deposits.
HYTEB allows any type of parameterization, for example zones, pilot points, or combinations hereof.
It is emphasized that in the following we use the term "groundwater model" for a simulator that is set up, parameterized, and calibrated to make "model predictions" of states occurring in the reference groundwater system.States occurring in (i.e., simulated for) the reference groundwater system are here termed "reference predictions".The objective of model calibration is to make the model predictions as similar as possible to the reference predictions.

Step 5 -Calibrate the model(s)
The fifth step is to calibrate the groundwater model by using the data set produced in step 2 to estimate the model parameters.The step may also include estimation of geophysical model parameters on the basis of the data sets produced in step 3.The simultaneous estimation of the hydrologic and geophysical parameters can be done by using either the coupled (CHI) or joint (JHI) hydro-geophysical inversion approaches (Hinnell et al., 2010;Vilhelmsen et al., 2014).When the number of parameters is large compared to the number of data, the minimization can be aided by using a regularization technique (for example singular value decomposition or Tikhonov regularization); see Oliver et al. (2008) for an overview.

Step 6 -Simulate model predictions, then repeat steps 1-6
After successful calibration, the groundwater model is used to make model predictions equivalent to the reference predictions of step 2. (A prediction is a state variable different from the calibration data, for example a forecast.)For each prediction, this produces one value computed by a calibrated model that can be compared with the equivalent reference value.It is not possible to make a meaningful inference about a model's ability to make a specific prediction from just one experiment.To test the reproducibility of the experiment, steps 1 to 6 need to be repeated a number of times.Each repetition involves generation of a new realization of the geological system and the corresponding reference groundwater and geophysical systems, new data sets (i.e., new reference systems), model calibration, and predictions.The number of repetitions should be sufficient to provide a basis for making a consistent statistical inference on the model prediction results.

Step 7 -Evaluate model prediction results
When steps 1 to 6 have been completed, an ensemble of pairs of model prediction and equivalent reference prediction is plotted to evaluate the model performance.As discussed by Doherty and Christensen (2011) 2011) for further explanation).
Ultimately, calibrated models are used to make predictions of interest.These predictions are generally in the future and may describe the response of the system to alternative management actions.The calibrated model, or model ensemble, can be used to predict future hydrologic responses to nearterm actions, thereby providing information critical to informed decision-making.Increasingly, these decisions consider both the accuracy (bias) and the uncertainty of model predictions in a probabilistic framework (Freeze et al., 1990;Feyen and Gorelick, 2005;Nowak et al., 2012).

Demonstration model
We demonstrate the use of HYTEB through a synthetic case focusing on making three types of model predictions that are commonly useful for groundwater management: (i) hydraulic head; (ii) head recovery and change of groundwater discharge related to abandoning pumping from a well; and (iii) the recharge area and the average age of groundwater pumped from that well.The synthetic demonstration model used here is, to a large degree, inspired by the model of Doherty and Christensen (2011).The hydrogeological setting of the model domain is typical for large areas of northern Europe and North America: a glacially formed landscape with a buried tunnel valley eroded into impermeable bedrock (fat clay) with very low electrical resistivity (Wright, 1973;Piotrowski, 1994;Clayton et al., 1999;Jørgensen and Sandersen, 2006).The case is designed to have a perfect relationship between hydraulic conductivity and electrical resistivity.This is chosen to make a best possible case for resolving change of lithology and change of hydraulic conductivity inferred from of electrical resistivity.The deposits above the bedrock are glacial of different types.For the sake of clarity, the synthetic model will be described in the section below, and the exceptions and changes from the setup of Doherty and Christensen (2011) will be highlighted.Each HYTEB step will be presented in order following Fig. 1.

Generation of geological system realizations (Step 1)
The domain is rectangular, 7 km north-south (N-S) and 5 km east-west (E-W).It is capped by 50 m of glacial sediments deposited as gently N-S elongated layered structures composed of sand, silt or clayey till.The bedrock consists of  (Carle, 1999).The proportions and mean lengths for the different categories of sediments are provided in Table 1.The bedding is represented as a maximally disordered system using "maximum entropy" transition frequencies (Carle, 1999).
A total of 1000 geologic system realizations were generated.These categorical realizations were all conditioned on the same stratigraphy for the 35 boreholes, but are otherwise independent.Figure 3 shows one of these realizations.
Table 1.Geostatistical parameters for stochastic hydraulic field employed by the hydraulic reference model.K is for hydraulic conductivity (ms −1 ), R is for recharge (ms −1 ) to the groundwater model, and phi for porosity.µ is mean value to the log 10 of K, a is range for small-scale variability, and σ 2 the sill.The semivariograms are exponential.Within each category of sediment, the hydraulic conductivity varies as a horizontally correlated random field.The same is the case for porosity and recharge.The random fields were generated by FIELDGEN (Doherty, 2010) using the sequential Gaussian simulation method (Deutsch and Journel, 1998) with the geostatistical parameters given in Table 1.

Hydrological data set
All 35 boreholes have been constructed as monitoring wells; each well screens the deepest 10 m (deepest cell) of sand registered in the borehole (Table 2; Fig. 2).For each realization, groundwater flow was simulated as confined using MODFLOW-2000 (Harbaugh et al., 2000).The corresponding set of values for the hydrological observations, consisting of hydraulic head in the 35 wells and the river discharge, were extracted from the MODFLOW-2000 output.Independent Gaussian error with zero mean and 0.1 m standard deviation was added to the true head values to produce the head observations.Gaussian error with zero mean and a standard deviation corresponding to 10 % of the true river discharge was added to the discharge to produce the stream flow observation used for model calibration.

Reference predictions
Collecting and using new geophysical data is likely to constrain some groundwater model parameters more than others.Different predictions of interest will have different sensitivities to different model parameters.As a result, the addition of geophysical data for groundwater model calibration is likely affect the error of model predictions differently.To illustrate this, we present seven types of predictions of interest in Table 3.
Prediction types 1 to 3 relate to steady-state flow conditions with groundwater being pumped from the deep well in the buried valley.This is the same situation for which the hydrological data set was generated.Type 1 concerns head prediction at ten locations (Fig. 2 and Table 4).Type 2 is the size of the recharge area of the pumping well.Type 3 is the average age of the groundwater pumped from the well.
Prediction types 4 to 7 relate to a new steady-state long after pumping from the well has been stopped.Type 4 is head recovery at the ten locations given in Fig. 2 and Table 4. Type 5 is the travel time of a particle flowing with the groundwater from the location where it enters the system at the northern domain boundary (x = 2500, y = 6975.5,z = 0) until it exits the system either into the lake (at the southern boundary) or into the stream.Type 6 is the relative location of the exit point of that particle defined as the Euclidean distance between the reference and the model predicted endpoint in a 3-D space.Type 7 is groundwater discharge into the stream.
The prediction types 1, 4 and 7 were simulated by MODFLOW-2000 (Harbaugh et al., 2000).The other prediction types were simulated by forward particle tracking using MODPATH version 5 (Pollock, 1994) and MODFLOW-2000 results.Types 5 and 6 were simulated by tracking a single particle with MODPATH.Types 2 and 3 were simulated by placing particles in a horizontally uniform 25 m grid at the surface (i.e., releasing one particle at the surface at the center of each model cell) and tracking them forward in time until they reached either the river, the southern boundary, or the pumping well.Each particle represents an area of 25 × 25 m 2 .The number of particles ending in the pumping well thus defines the well's recharge area.The average groundwater age is computed as the weighted average of the travel time for all of the particles captured by the well.The weight for a particle is calculated as the recharge rate (in m 3 s −1 ) from the 25 × 25 m 2 surface area represented by the particle divided by the pumping rate.This sum of all weights adds to one because water only enters the model through the uppermost layer.

Reference geophysical system and data -Step 3
In the demonstration example, the geophysical property of interest is electrical resistivity of the subsurface.For simplicity it is assumed that there is a perfect relationship between hydraulic conductivity and electrical resistivity.The relationship is of the form where K is the hydraulic conductivity (m s −1 ) derived from resistivity, ρ is the electrical resistivity (ohm-m), and β 1 = log 10 1e −12 and β 2 = log 10 (4) are empirical shape factors that are constant within the model domain.The shape factor values reflect conditions where, for example, clay has low electrical resistivity and also low hydraulic conductivity, and sand has high electrical resistivity and high hydraulic conductivity.Equation (1) was used to compute the resistivity within each cell of the geological system from the corresponding cell hydraulic conductivity.Using a perfect relationship to generate resistivity from hydraulic conductivity must be characterized as the ideal case because in this case electrical resistivity data can be expected to provide maximal information about hydraulic conductivity.In practice, when possible, estimation of hydraulic conductivity from electrical resistivity is usually based on a site-specific noisy linear log-log relationship (see, e.g., Mazáč et al., 1985;Revil and Cathles, 1999;Purvance and Andricevic, 2000;Slater, 2007), which has been found to be a positive relationship in some cases (Urish, 1981;Frohlich and Kelly, 1985), and a negative relationship in other cases (Worthington, 1975;Heigold et al., 1979;Biella et al., 1983).

Geophysical data set
It is assumed that measurements of the geophysical system are conducted at 77 uniformly distributed locations within the domain (Fig. 2) using a ground-based time-domain electromagnetic system (TEM).It is assumed that the TEM system uses a receiver loop centered inside a 40 × 40 m 2 square transmitter loop.Measurements are gathered from about 10 to 10 ms using a steady current of 20 Ampere, which gives a magnetic moment of 32 000 Am 2 that, for the studied environment, would provide a penetration depth of around 250 m (Danielsen et al., 2003).For this system the electromagnetic field is propagating downwards and outwards like smoke rings increasing with depth at an angle of approximately 30 • (West and Macnae, 1991).In other words, the sounding loses resolution with depth because of its increasing footprint.In the following, we use the AarhusInv 1-D simulation code (previously called em1Dinv; Auken et al., 2014) to simulate the geophysical responses.To mimic the loss of resolution with layer depth, we use the logarithmic average resistivity of all model cells inside the radius of the footprint at a given depth.To obtain the geophysical data set, the simulated data were contaminated with noise consisting of (i) Gaussian 3 % noise contribution and (ii) "background" contribution with a value of 3 nV m −2 according to the noise model suggested by Auken et al. (2008).The noise-perturbed data were subsequently processed as field data (Auken et al., 2009).

Model construction and parameterization (Step 4)
The groundwater model uses the true boundary conditions except that recharge is to be estimated together with hydraulic conductivity.Because the reference groundwater and geophysical systems were generated with correlation between hydraulic conductivity and electrical resistivity, the hydraulic conductivity is parameterized by placing pilot points in each of the 20 layers at the locations where a geophysical sounding has been made.However, pilot points are excluded at depths of the impermeable bedrock.The number of pilot points used for hydraulic conductivity therefore totals 550 (Fig. 2).Kriging is used for spatial interpolation (here using the correct correlation lengths) from the pilot points to the model grid.This kind of parameterization creates a smooth transition in hydraulic conductivity that may seem problematic to use in the current case where there are "categorical" (lithological) shifts in the reference fields.However, because the property contrasts between categories are so large and the geophysical data and the pilot points so many, it is expected that the categorical shifts in property value can be fairly well resolved by the used interpolation.
Recharge is parameterized by assuming a linear log-log relationship between recharge and hydraulic conductivity of the uppermost layer.The two shape factors of the log-log relationship are chosen as parameters to be estimated; they are assumed to be constant within the model domain.The total number of parameters for estimating recharge from hydraulic conductivity is thus two.
Electrical resistivity is sensitive to porosity, but that is not incorporated in the relationship within the present work.Therefore porosity cannot be estimated from the hydrological and geophysical data available here, we always use the reference porosity field for making model predictions.A geophysical model is set up for every location of the 77 TEM soundings.Each geophysical model is parameterized to have a fixed number of layers equal to 1 plus the number of groundwater model layers above the bedrock; the layers above the bedrock all have a fixed 10 m thickness, while the bedrock is assumed to be of infinite thickness.The estimated parameters of the model are the resistivity within each model layer.The total number of parameters for the 77 geophysical models is thus 627.The model responses were simulated using AarhusInv neglecting lateral heterogeneity.In other words, the inverse model is 1-D, following the state of practice (Viezzoli et al., 2010a;Auken et al., 2014).

Model calibration by inversion (Step 5)
Calibrations of geophysical and groundwater models are conducted independently.However, for our demonstration problem, we want to explore the amount of "hydraulic" information contained within the geophysical data set.We will do this by applying three different calibration methods.

Three calibration methods
Method 1 estimates groundwater model parameters on the basis of hydrologic data only (HI).This estimation involves constrained minimization of the misfit between modelsimulated responses and the equivalent observation data.This misfit is quantified by the measurement objective function where h obs and h sim are observed and corresponding simulated hydraulic heads; r obs and r sim are observed and corresponding simulated river discharge; σ h and σ r are the noise levels (standard deviations) for the head and discharge data, respectively; n h and n r are the number of head, discharge observations, respectively.However, Eq. ( 2) cannot be minimized uniquely because the number of groundwater model parameters ( 552) is larger than the number of measurements (36).Method 1 therefore relies on minimization of the regularized objective function where φ t is the total objective function, φ m is the measurement objective function given by Eq. ( 2), µ is a weight factor, and φ r is a Tikhonov regularization term.Here, φ r is defined as preferred difference regularization, where the preferred difference between neighboring parameter values is set to zero.The regularization weight factor, µ, is iteratively calculated, based on a linearized model approximation, during each optimization iteration making φ m equal to a user specified target value (for details, see Doherty, 2010).In this case, for φ m defined by Eq. ( 2), the target value is set to 2 (indicating that the fitted data residuals correspond to the data noise levels).
Method 2 is joint hydrogeophysical inversion (JHI).For JHI, the hydrological model and the geophysical models are set up separately, but hydrological and geophysical parameters (hydraulic conductivity and electrical resistivity at the pilot points) are estimated simultaneously by minimization of a joint objective function where the regularization term uses an assumed relationship between electrical resistivity and hydrological conductivity (Herckenrath et al., 2013a).The minimized objective function is of the same form as Eq. ( 3), but the measurement and regularization terms are different.For Method 2 the measurement objective function is defined as where n TEM are of TEM observations, respectively.The first two terms on the right hand side of Eq. ( 4) are identical to the terms in Eq. ( 2).The values of V obs and V sim are observed and corresponding simulated decay data from TEM.Finally, σ tem is the noise level for the TEM data.Each of the three terms on the right hand side of Eq. ( 4) is divided by the number of respective measurements to promote a balanced weight among the three data sets.The regularized objective term for the joint approach is also preferred differences, now defined as r,joint = µ • n kpar i=1 log 10 K joint,i − log 10 K mf,i 2 . (5) In Eq. ( 5), K mf,i is the estimate of the hydraulic conductivity at the ith pilot point of the groundwater model; K joint,i is also an estimate of hydraulic conductivity but inferred from the estimated electrical resistivity at the same depth and location by using Eq. ( 1).In this case, the target value of φ m,joint is set equal to 3. Method 3 is sequential parameter estimation (SHI) modified from by Dam and Christensen (2003).First, the geophysical model parameters (electrical resistivities) are estimated on the basis of the geophysical data.Secondly, the groundwater model parameters are estimated on basis of the hydrologic data as well as the resistivity estimates that are used as regularizing prior information on the hydraulic conductivity.In the first step, the geophysical inversion is done as "smooth model" inversion (Constable et al., 1987).This means that each geophysical model has fixed 10 m layer thicknesses, while the resistivity within the layers is estimated.The 77 1-D models are inverted independently using AarhusInv (Auken et al., 2014), but vertical constraints were used to stabilize the inversion of each 1-D model (Constable et al., 1987).In the second step, the estimated electrical resistivities are used to constrain the subsequent hydrologic inversion, which is carried out as minimization of Eq. ( 3) where the measurement objective function φ m is defined by

N. K. Christensen et al.: Testing alternative uses of electromagnetic data
Eq. ( 2), while the preferred difference regularization term is defined by As in Eq. ( 5), K mf,i is the hydraulic conductivity at the ith pilot point of the groundwater model; K seq,i is the hydraulic conductivity at the pilot point calculated from the corresponding resistivity, estimated in the first step of Method 3, by using Eq. ( 1).In Eq. ( 6), ω i is a weight that can be varied between the terms of the summation in Eq. ( 6).The results presented in the following were obtained by using a weight of 1.0 for all preferred differences.We also tried using weights determined as is the variance of the log-resistivity estimate used to compute log 10 K seq,i .Changing the weights did not change the estimation results much; some prediction errors did decrease, others increased, but we found no general improvement by using ω i = (β 2 V (ρ i )) −1 instead of the simple choice ω i = 1.For the studied case the problem is that the hydrological data have low sensitivity to the hydraulic conductivity distribution within the deepest part of the buried valley, which is also where the resistivity estimates are most uncertain.Therefore, weighting in Eq. ( 6) will have very little influence on the hydrological inversion result for the deepest part of the valley; this mainly depends on the choice of initial parameter values.Similarity is seen by a comparison of JHI-H and JHI-G results in Fig. 5. Choice of weights can be important in other cases, for example, that of Beaujean et al. (2014).For method 3, the target value of φ m is set equal to 2. For all three methods, the objective function is minimized iteratively by the modified Gauss-Newton method.This involves recalculation of the sensitivity matrix for each iteration, which is time consuming due to the large number of model parameters.

Initial parameter values
We did the following to investigate how much the choice of initial parameter values influences the parameter estimates obtained by the three inversion approaches.
For method 1 (HI), we ran two inversions.In the first run, termed HI-T, we used the reference (true) hydraulic conductivity values at each pilot point as initial values.We acknowledge that this is not a realistic occurrence, but it is done as a control to show the supposedly best possible outcome of HI.In the second run, termed HI-H, we assumed a homogeneous initial hydraulic conductivity field with K equal to 1 × 10 −6 m s −1 , which is equal to the true mean value of silt.
For method 2 (JHI), we ran three inversions.In the first run, termed JHI-T, we used the reference (true) parameter values for hydraulic conductivity and electrical resistivity at the pilot points.As above this is done to show the supposedly best possible outcome of JHI.In the second run, termed JHI-H, we used a constant hydraulic conductivity of 1×10 −6 m s −1 and a constant electrical resistivity of 40 ohmm at the pilot points.In the third run, termed JHI-G, we first ran independent geophysical inversions (one for each sounding location) using a homogeneous half space of 40 ohm-m as the starting model.The resulting estimates of electrical resistivity were subsequently used as initial parameter values for JHI-G at the resistivity pilot points, and they were used together with relation (1) to produce the JHI-G initial values of hydraulic conductivity at the hydraulic conductivity pilot points.
For method 3 (SHI), we present results from only one inversion sequence, termed SHI-G.First we ran the independent geophysical inversions using a homogeneous half space of 40 ohm-m as the initial model.Subsequently we used the estimated resistivities together with relation (1) to produce the initial values for hydraulic conductivity at the pilot points that were used for the hydrologic inversion carried out in step two of SHI-G.(As for JHI, we also tried using an initially homogeneous hydraulic conductivity field for SHI; this gave a more blurred estimated hydraulic conductivity field than what is presented later.So, like is shown later for JHI, the result of SHI was found to depend on the choice of initial parameter values.)

Inversion software
The objective functions were minimized using BeoPEST, a version of PEST (Doherty, 2010) that allows the inversion to run in parallel using multiple cores and computers.We used a new version of BeoPEST modified by John Doherty particularly for our purpose to do gradient-based minimization involving several models with each of their parameters; thus, the modified BeoPEST exploits the fact that different parts of the sensitivity matrix can be calculated by running just one of the models.However, for method 3, the geophysical data were inverted using AarhusInv (Auken et al., 2014).

Picking 10 realizations
For this demonstration, the computational burden would be overwhelming if the entire HYTEB analysis was to be carried out for each of the 1000 system realizations.We therefore sought a way to reduce the number of models to just 10 that would maintain a representative diversity of models.The strategy we used to down sample from 1000 realizations to 10 was as follows.
We first decided to group the models based on the predictions of interest.It would be reasonable to group models based on other characteristics, such as underlying conceptual model, or zonation, or imposed boundary conditions.However, we contend that for both practical and scientific applications, it is more often the predictions of models that are of primary interest than the structure or parameterization of the models.We began by creating an ensemble from the 25 predictions of interest listed in Table 3 over all 1000 realizations.We then used k-means clustering to group the prediction sets into 10 clusters within this prediction space.Because the units of the predictions varied, all predictions were whitened, or normalized, before clustering.For stability, we ran 1000 repetitions of the clustering to minimize the effects of initial cluster selection.Once the clusters were defined, we identified the prediction set that was closest to the centroids.This resulted in ten models that broadly represent the range of model behaviors, including both the range of each prediction and the correlations among predictions.

Model calibration
Figure 4 shows the measurement objective function value, m , obtained for the various groundwater model calibration cases and for the 10 different system realizations.It also shows the separate terms of the objective function.We aimed at using weights that would make each term contribute by a value of approximately 1.0.For HI and SHI there are two terms, quantifying fit to head data and fit to the flux measurement, respectively; the results in Fig. 4 show that the head data are fitted as intended, while the flux measurement is fit-ted more closely than intended.This fitting picture is also seen for JHI.JHI tends to produce a better fit to the hydrologic data than HI and SHI.
For JHI the objective function (4) has a third term quantifying the fit to decay data of the TEM measurements.Figure 4 indicates that the actually used weighting for JHI ended by producing a slightly better fit to the hydrologic data than to the TEM data.It also shows that for JHI the fit to the hydrologic data is not strongly dependent on the choice of initial parameter values; JHI-T for example did not always produce better fits than JHI-G or JHI-H.That JHI-T, JHI-G, and JHI-H lead to different fits (and different parameter estimates) shows that the JHI minimization problem is nonunique.
For two realizations HI-T produced much a worse fit to the hydrologic data than HI-H (Fig. 4): the HI-T minimization got stuck at a local minimum where a parameter adjustment improving the fit to head deteriorated the fit to the flux measurement.We did not investigate whether PEST parameters could have been set differently to overcome this problem.

Estimated hydraulic conductivity fields
Figure 5 shows the reference hydraulic conductivity fields of the uppermost six layers and a representative cross section for one of the 10 chosen system realizations.It also shows  the corresponding estimated hydraulic conductivity fields obtained by six different inversion runs.The figure can thus be used to visually compare the estimated hydraulic conductivity fields and to judge whether they resolve the structures of the reference system.Figure 6 shows corresponding pilot-point-by-pilot-point scatterplots of reference versus es-timated hydraulic conductivity.Except when noted specifically, the results in Figs. 5 and 6 for this realization are typical for all 10 chosen system realizations.
The second and third rows of Fig. 5 show results for the two hydrologic inversion (HI) runs.Inversion HI-T, which used true (reference) parameter values as initial values, produces very blurred hydraulic conductivity fields.This is caused by the used Tikhonov regularization constraint that guides the inversion to estimate a field as smoothly as possible while still fitting the calibration data.The estimated field for layer 1 has some structural similarity to the reference field, but the estimated values vary much less than the reference values.Similar results are seen for layers 2 to 5, while structure has disappeared from the deeper layers representing the deposits in the buried valley.Similar results were achieved for three other realizations.For the remaining six realizations HI-T produced very blurred hydraulic conductivity fields for all model layers, having essentially no resemblance to the structure of the reference fields.The third row of Fig. 5 illustrates that for inversion HI-H, which used homogeneous initial hydraulic conductivity fields, there is almost no structural similarity between the estimated and reference hydraulic conductivity fields, and for most layers the estimated field appears to be almost homogeneous.However, the cross sections show that the structure with high hydraulic conductivity in the bottom of the buried valley is resolved to some degree by both HI-T and HI-H. Figure 6 shows that both HI-T and HI-H underestimate hydraulic conductivities for high-permeability deposits (sand and gravel) but overesti-mate them for low-permeability deposits (silt and clay).For HI-H, the range of estimated conductivities is the same for high-permeability and low-permeability deposits.For HI-T, there is a small difference between the two ranges -they are slightly shifted in the correct directions compared to HI-H.
The fourth row of Fig. 5 shows hydraulic conductivity fields estimated by the sequential geophysical approach (SHI-G).For the upper layers, the true (reference) structures can be recognized, but the resolution decreases with depth.The cross section shows that the true structures of the upper five layers can be identified to some degree from the estimated fields.Because of loss of resolution, the structures cannot be identified inside the buried valley.Figure 6 shows that for low-permeability deposits, the range of estimated log-hydraulic conductivities is twice as large as the reference range of values, and the horizontal scatter around the identity line is considerable.For high-permeability deposits, the range of estimated values is much larger than the range of reference values, and the estimated values tend to be orders of magnitude too small (Fig. 6).This happens because the resistivities estimated from the TEM data in the first step of the SHI scheme often turn out to be too small if the resistivity at depth is high.This is a well-known result from the fact that the sensitivity of TEM data with respect to layers of high resistivity reduces with depth, which causes problems of equivalence for the geophysical models.(This has been demonstrated and discussed by Auken et al. (2008) for a similar type of geological system.)When resistivity estimates that are too small are used to regularize the second hydrologic inversion step of the SHI scheme, the hydraulic conductivity estimates are likely to be too small as well.Similarly, hydraulic conductivity estimates are too high in some high-resistivity parts of the shallow layers (Fig. 6) because the resistivity estimated from TEM tends to be too high due to low sensitivity of the TEM data.For the studied system, this shows that resistivities estimated by independent TEM data inversion must be used with caution as estimators of hydraulic conductivity or as regularization means for subsequent hydrological inversion.In this case, the absolute relationship between hydraulic conductivity and reference electrical resistivity led to an over-reliance on the use of inferred resistivities to populate the model's hydraulic conductivity field.
The last three rows of Fig. 5 show hydraulic conductivity fields estimated by the three joint hydrogeophysical inversion runs (JHI-T, JHI-H and JHI-G), respectively.JHI-T, which used true (reference) parameter values as initial values, resolves the true structures of the upper five layers well, while the estimated field of layer six is blurred; the cross section shows that the true structures within the buried valley are also resolved to some degree.Figure 6 shows that estimated versus reference hydraulic conductivity values plot nicely along the identity line for JHI-T.The resolution of structures (Fig. 5) and the quality of the K estimates (Fig. 6) deteriorate for JHI-H and JHI-G, both of which use less informative initial parameter values.Figure 5 visually indicates that JHI-G resolves structures better than JHI-H.Figure 6 shows that estimated hydraulic conductivity for sand and gravel tends to be much too small for both JHI-G and JHI-H (the explanation of which is similar to that given for SHI above), and that particularly JHI-H cannot resolve variations in hydraulic conductivity within the buried valley: the estimated values vary only within roughly an order of magnitude, whereas the reference values vary within 5 orders of magnitude.

Prediction results
For each of the ten chosen geological realizations, each of the six calibrated groundwater models was used to make the model predictions described in Sect.3.2.2. Figure 7 shows five examples of scatterplots of reference predictions versus calibrated model predictions; each plot shows ten points, each of which corresponds to a particular geological realization selected by the clustering.Each plot also gives the mean error of the prediction (ME) calculated from the ten model predictions.The five predictions represented in Fig. 7 are head in the capping layer at location 1, head recovery at location 1, head recovery within the deepest part of the buried valley at location 8 near the pumping well (Fig. 2), groundwater discharge to the river after pumping has stopped, and recharge area of the pumping well.
Figure 8 shows the mean absolute relative error (MARE) for the 25 model predictions made by models calibrated with six inversion approaches.The relative error magnitudes are calculated as the absolute value of the difference between the reference and model predicted value for each prediction of interest averaged over the ten geological realizations considered.The prediction results are discussed individually below.

Head prediction
All calibrated groundwater models appear to be fairly good predictors of hydraulic head.Nearly unbiased head prediction is exemplified by the plots in the first column of Fig. 7 for which the points scatter around the identity line.This indicates that all calibrated models make unbiased prediction of hydraulic head at location 1.However, the scatter around the identity line appears to be larger for HI calibrated models than for JHI calibrated models.This indicates that the use of geophysical data in JHI reduces the uncertainty of this head prediction as compared to the HI calibrated models.The scatterplots for the other head predictions are similar to those shown for location 1.
Figure 8 shows that for all head predictions except at location 2, the use of geophysical data with SHI-G, JHI-H and JHI-G reduces the prediction error when compared to the HIbased predictions.It also shows that the relative error magnitude is smaller for head predictions than for most other prediction types.Only change in discharge prediction has a relative error magnitude comparable to the head predictions.The small relative head prediction errors are likely due to the fact that this type of prediction is similar to the head data used for model calibration.Only the location differs between prediction and calibration heads.

Head recovery prediction
Head recovery due to cessation of pumping is a type of prediction that turns out to be biased for all calibrated models.This is exemplified by the results shown in the second and third columns of Fig. 7.The two plots at the top of the second column indicate that head recovery at location 1 tends to be underpredicted by the models calibrated by purely hydrologic inversion (HI-T and HI-H).The third plot in this column indicates that SHI-G slightly reduces the bias seen in the HI-based model.Finally, the last three plots in the sec-ond column of Fig. 7 show that all the models calibrated by JHI appear to be better predictors for this head recovery than the HI-and SHI-G-based models.The quality of this model prediction appears to be unaffected by the choice of initial parameter values used for JHI.However, for JHI the points tend to scatter around a line with an intercept less than zero and a slope larger than unity.The former indicates consistent bias in the prediction, probably due to consistent errors in null space parameter components omitted from the parameterized groundwater model; the latter probably indicates parameter surrogacy incurred through model calibration (see Sect. 2.7).The appearances of scatterplots for head recovery at locations 2 to 7 are similar to that for recovery at location 1 (Fig. 7).
The second plot in the third column of Fig. 7 indicate that head recovery at location 8 within the deeper part of the buried valley is predicted fairly well for nine out of ten geological realizations when the model is calibrated by hydrologic inversion (HI-H); however, the nine points tend to fall slightly below the identity line, while the tenth point falls far above the identity line.Generally, the plots indicate a consistent underprediction of head using HI-based inversion.The remaining plots in the third column show that recovery prediction at location 8 turns out to be too large for the models calibrated with geophysical data (JHI) or by using geophysics-based regularization (SHI).
Figure 8 shows that for recovery predictions 1 to 7, the use of geophysical data with SHI-G, JHI-H and JHI-G reduces the prediction error when compared to the HI-based predictions.For recovery 1, this is confirmed by the scatterplots in column two of Fig. 7. On the contrary, for recovery prediction 8, located within the buried valley, both Figs.7 and 8 show that including the geophysics in the groundwater modeling with either SHI-G, JHI-H or JHI-G tends to increase the prediction error as compared to HI-H and HI-T.Depending on the choice of initial parameter values, a similar result is seen for recovery predictions 9 and 10. (An explanation for this predictive degradation is given above, in the end of the first paragraph of this subsection.)It is finally noted that recovery prediction 2 benefits from use of geophysical data, while head prediction at the same location does not, and that the relative error magnitude is larger for recovery predictions than for head predictions.This is likely because head recovery depends on a different stress situation than that represented by the head calibration data.

Discharge prediction
The scatterplots in the fourth column of Fig. 7 indicate that discharge to the river without pumping is overpredicted except for the HI-T and JHI-T based models.Furthermore, this is a type of model prediction that is not improved by including geophysical data by the SHI or JHI inversions used here (compare for example the HI-H plot with the JHI-G plot).This is confirmed by the relative error magnitudes for discharge shown in Fig. 8.

Recharge area and other particle tracking predictions
The plots in the fifth column of Fig. 7 are for the recharge area prediction.Except for JHI-T and JHI-G, the points in all plots appear to fall along an almost vertical line; the scatter along the vertical axis is much longer than the scatter along the horizontal axis, indicating that all of these models are a poor, highly biased predictor of the pumping well's recharge area.Including TEM data in the model calibration only improves this model prediction for JHI-T and JHI-G.Further analysis shows that at least part of the reason for the poor prediction is that the estimated areal average recharge for the model domain in all cases is too low.Lower estimated recharge rates requires a larger predicted recharge area to balance the rate of water pumped from the pumping well.
For the JHI-T models, the estimated areal recharge amounts to about two-thirds of the actual average recharge.For the JHI-H models the estimated recharge tends to be less than half (for one model realization as low as one-third) of the actual area.The estimated areal recharge for the other models is between the JHI-T and JHI-H estimates.It should be mentioned that all calibrated models sufficiently fit the river discharge measurement; the underestimated recharge means that the simulated discharge to the lake turns out to be too small (typically less than half of the actual discharge to the lake; for one calibrated model there is almost no simulated lake discharge).
It is finally mentioned that the scatterplots look similar to those in column 5 of Fig. 7 for the prediction of the average age of groundwater pumped from the well and for the prediction of particle travel time.The explanation for these poor predictive performances must be because the calibrated hydraulic conductivity field is too smooth (and the hydraulic connectivity therefore exaggerated) in the calibrated models.Figure 8 shows that use of TEM data does not improve the model performance with respect to prediction of groundwater age and particle travel time.
It is intuitively clear that geophysics can offer valuable information for improved groundwater modeling, but for an actual investigation it is often unclear how, at what cost, and to what extent modeling can be improved by adding geophysical data.This paper presents a newly developed platform that allows for such an application-and method-specific examination of the potential value of using geophysical data and models to develop a groundwater model and improve its predictive power.We call the platform HYdrogeophysical TEst-Bench (HYTEB).HYTEB allows for treatment of hydrologic and geophysical data and inversion approaches.It can be used to examine the combined use of hydrologic and geophysical data, including model parameterization, inversion, and the use of multiple geophysical or other data types.It can also be used to discover potential errors that can be introduced through petrophysical models used to correlate geophysical and hydrologic parameters.We use HYTEB to work with rather complex, fairly realistic but synthetic systems.In this work we strive at (and recommend) balancing complexity with the advantage of knowing the "true" system or condition to assess model/data performance, and at avoiding to overextend the likely value of data or models beyond the tested conditions.
Our recommended way of using HYTEB is demonstrated by synthesizing a hydrogeological environment that is typical to parts of northern Europe and North America, consisting of various types of glacial deposits covering low-permeability (in practice impermeable) bedrock of Tertiary clay, which has a surface with the form of a plateau with a deep valley buried by the glacial deposits.HYTEB is used to investigate to what extent groundwater model calibration and, often more importantly, model predictions can be improved for this kind of setting by including in the calibration process electrical resistivity estimates obtained from TEM data in two different ways: by using either sequential hydrogeophysical inversion (SHI) or joint hydrogeophysical inversion (JHI).For simplicity we assumed that the resistivity correlates with hydraulic conductivity and that the relationship is constant and known.The results are compared to those obtained by a groundwater model calibrated by purely hydrologic inversion (HI).
The calibrated groundwater models are parameterized by many pilot points that should allow a reasonable resolution of the hydraulic and geophysical property fields at depths where the properties are resolved by the data.Using PEST (Doherty, 2010), Tikhonov or geophysical regularization is used to stabilize the HI, SHI, and JHI inversion problems.In this case, JHI tends to produce the best fit to the data, while SHI and HI produce comparable fits.
For HI, the estimated hydraulic conductivity field turns out to be very smooth in the top layers and almost homogeneous in the deeper layers, which is expected for this type of (Tikhonov) regularization.For SHI and JHI, the estimated hydraulic conductivity field resolves much of the true struc-tures in the shallow layers, while less or, in the deeper part, no structure is resolved inside the buried valley.However, the estimated hydraulic conductivities are orders of magnitude wrong in some parts of the model.This occurs because the resistivities estimated for the geophysical models either in the first step of the SHI scheme or during the JHI scheme can turn out to be very erroneous when the sensitivity of the TEM data with respect to resistivity is low.Such uncertain, potentially very erroneous, resistivity estimates should be discarded (or filtered out) from the SHI or JHI.By not doing this, we showed that resistivities estimated by SHI or JHI must be used with caution as estimators of hydraulic conductivity or as regularization means for subsequent hydrological inversion; the use of the absolute relationship between hydraulic conductivity and electrical resistivity led to an overreliance on the use of inferred resistivities to populate the model's hydraulic conductivity field.
With respect to reducing model prediction error, it depends on the type of prediction whether it has value to include geophysics in the model calibration.It was found that all models are good predictors of hydraulic head.However, head prediction errors tend to be reduced for models calibrated by SHI or JHI as compared to models calibrated by HI.
When the stress situation is changed from that of the hydrologic calibration data, then all models make biased predictions of head change.Use of geophysical data or models (with JHI or SHI) reduces error and bias of head prediction at shallow depth but not in the deep part of the buried valley near the pumping well (where the stress field changes the most).Analyzing the prediction results by the method described by Doherty and Christensen (2011) indicates that geophysics helps to reduce parameter null space as well as parameter surrogacy for parameters determining the shallow part of the hydraulic conductivity field.In hindsight, this is obvious since the TEM method better resolves the shallow variations in glacial deposits' resistivity than the variations inside the deep buried valley.
For model prediction of change in discharge to the stream, there is no improvement in using geophysics.HI-based prediction results are comparable to SHI-and JHI-based results.
All models are very poor predictors of the pumping well's recharge area and groundwater age.The reason for this is that distributed recharge is here estimated during the model calibration together with distributed hydraulic conductivity.Recharge is parameterized by assuming a linear log-log relationship between recharge and hydraulic conductivity of the upper model layer; two shape factors of the relationship are treated as parameters that are calibrated together with the pilot point parameters for hydraulic conductivity and (for JHI) resistivity.It was assumed that the shape factors could be estimated because stream discharge data were included in the calibration data set.All models fit these data, but the estimated areal recharge turned out to be two-thirds or less of the actual areal recharge.The predicted recharge area of the pumping well and the predicted age of the pumped water therefore turn out to be much too large.So another important insight from this study is that recharge should be parameterized and estimated in a different way than was done in the demonstration example.Alternatively HYTEB could be used to consider adding other types of data to better constrain recharge rates.

Figure 1 .
Figure 1.Workflow of the HYTEB.Each numbered dashed box marks a major step in the workflow.In parts 1 and 5 the red, yellow, blue and green colors indicate different categories (types) of geological deposits; color variation within each category (in part 6) indicates variation in hydraulic conductivity.

Figure 2 .
Figure 2. A map of locations of boreholes, a pumping well, geophysical data, pilot points, predictions of interest and location of a geological cross section.(The positions of the pilot points and geophysical measurements are coincident.)

Figure 3 .
Figure 3. Hydraulic conductivity field for one of the model realizations.(Red shades are for gravel, yellow for sand, green for silt, and cyan/blue for clay.)

Figure 4 .
Figure 4. Measurement objective function value obtained for the various groundwater model calibration cases and for the 10 different system realizations.The two dashed lines in the top plot indicate the target value for the various model calibrations: the upper dashed line is the target value for the JHI, and the lower dashed line is the target value for HI and SHI.The dashed line in the three lower plots similarly marks the respective target value.

Figure 5 .
Figure 5. Reference and estimated hydraulic conductivity fields for model realization number 189: (a) shows the fields for layers 1 to 6; (b) shows the field along an east-west cross section in the middle of the domain.

Figure 6 .
Figure 6.Pilot-point-by-pilot-point scatterplot of reference versus estimated hydraulic conductivity for the six inversion runs.Black dots are estimated parameter values from the capping part of the model, while the red dots are estimated parameter values within the buried valley.

Figure 7 .
Figure 7. Scatterplots of calibrated model prediction versus reference prediction for five predictions (explained in the body text).Each plot shows results from ten system realizations.

Figure 8 .
Figure 8. Mean absolute relative prediction error calculated from the ten geological realization results.The symbol type indicates the inversion approach and the symbol color indicates the initial parameter values used when calibrating the groundwater model.Red labels at x axis highlight prediction errors that are reduced by using TEM data and TEM models for groundwater model calibration.
, if the plotted data do not scatter around the identity line, it indicates bias in the model prediction.If the intercept of a regression line through the scatter of points deviates from zero, it indicates consistent bias in the prediction due to consistent errors in null space parameter components omitted from the parameterized groundwater model; if the slope of the regression line deviates from unity, it indicates parameter surrogacy incurred through model calibration (seeDoherty and Christensen (

Table 2 .
Location and screened layer of boreholes with head measurements for model calibration.

Table 3 .
Different types of model predictions with and without a pumping well.

Table 4 .
Head and head recovery prediction points and screened layer.