Performance evaluation of groundwater model hydrostratigraphy from airborne electromagnetic data and lithological borehole logs

. Large-scale hydrological models are important de-cision support tools in water resources management. The largest source of uncertainty in such models is the hydrostratigraphic model. Geometry and conﬁguration of hydrogeological units are often poorly determined from hydrogeological data alone. Due to sparse sampling in space, lithological borehole logs may overlook structures that are important for groundwater ﬂow at larger scales. Good spatial coverage along with high spatial resolution makes airborne electromagnetic (AEM) data valuable for the structural input to large-scale groundwater models. We present a novel method to automatically integrate large AEM data sets and lithological information into large-scale hydrological models. Clay-fraction maps are produced by translating geophysical resistivity into clay-fraction values using lithological borehole information. Voxel models of electrical resistivity and clay fraction are classiﬁed into hydrostratigraphic zones using k means clustering. Hydraulic conductivity values of the zones are estimated by hydrological calibration using hydraulic head and stream discharge observations. The method is applied to a Danish case study. Benchmarking hydrological performance by comparison of performance statistics from comparable hydrological models, the cluster model performed competitively. Calibrations of 11 hydrostratigraphic cluster models with 1–11 hydraulic conductivity zones showed improved hydrological performance with an increasing number of clusters. Beyond the 5-cluster model hydrological performance did not improve. Due to reproducibility and pos-sibility of method standardization and automation, we believe that hydrostratigraphic model generation with the proposed method has important prospects for groundwater models used in water resources management.


Introduction
Large-scale distributed hydrological and groundwater models are used extensively for water resources management and research. We use large scale to refer to models in the scale of 100 to 1000 km 2 or larger. Examples are water resources management in water-scarce regions (Gräbe et al., 2012;Laronne Ben-Itzhak and Gvirtzman, 2005), groundwater depletion (Scanlon et al., 2012), contamination (Li and Merchant, 2013;Mukherjee et al., 2007), agricultural impacts on hydrogeological systems (Rossman and Zlotnik, 2013), and well-capture zone delineation (Moutsopoulos et al., 2007;Selle et al., 2013).
Such models are typically distributed, highly parameterized, and depend on data availability to sufficiently represent the modeled systems. Model parameterization includes, for example, the saturated and unsaturated zone hydraulic properties, land use distribution and properties, and stream bed configuration and properties. Hydrological forcing data such as precipitation and temperature are also required. Parameters are estimated through calibration, which requires hydrological observation data commonly in the form of groundwater hydraulic heads and stream discharges. Calibration data should be temporally and spatially representative for the modeled system, and so should validation data sets.
One of the main challenges in modeling large-scale hydrogeological systems is data scarcity (Refsgaard et al., 2010;Zhou et al., 2014). Uncertainty inherent in distributed hydrological models is well known (Beven, 1989). Incorrect system representation due to lack of data contributes to this uncertainty, but the most important source of uncertainty in distributed groundwater models is incorrect representation of geological structures Seifert et al., 2012;Zhou et al., 2014). In this paper, we refer to a 3-D subsurface model that delineates the structure of the hydraulic conductivity (K) field as a hydrostratigraphic model.
Lithological borehole logs are the fundamental data source for constructing hydrostratigraphic models. The modeling process is often cognitive, but two-point geostatistical (He et al., 2013;Strebelle, 2002) and multiple-point statistical (e.g. Park et al., 2013) methods are also used. Geostatistical methods have the advantage of uncertainty estimation. Spatially inconsistent sampling pattern and scarcity make lithological borehole logs alone insufficient to capture local-scale geological structures relevant for simulation of groundwater flow and contaminant transport. Cognitive methods have the advantage of using information from geological maps to assist interpretation of larger scale geological features.
Airborne electromagnetic (AEM) data are unique with respect to good spatial coverage and high resolution. AEM is the only technique that can provide subsurface information with a resolution down to ∼ 25 m in the horizontal and ∼ 5 m in the vertical at regional scales . Geological structures and heterogeneity, which spatially scarce borehole lithology data may overlook, are well resolved in AEM data. Geophysical data and especially AEM data are commonly used to support lithological borehole information in geological mapping and modeling (Bosch et al., 2009;Burschil et al., 2012;Høyer et al., 2011;Jørgensen et al., 2010;Jørgensen et al., 2013;Steinmetz et al., 2014). Furthermore, multiple-point statistical methods are applied to invert geophysical data, where a priori geological information is incorporated through training images (e.g. Caers and Hoffman, 2006;Lange et al., 2012;Lochbuhler et al., 2015). Although uncertainty of the estimated structures is available from the inversion, multiple-point statistical methods are applied at scales smaller than large-scale hydrological models. He et al. (2014) used transition probabilities (two-point statistics) to integrate AEM data with borehole lithological data.
Current practice for cognitive hydrostratigraphic and geological model generation faces a number of challenges: structures that control groundwater flow may be overlooked in the manual 3-D modeling process; geological models are subjective, and different geological models may result in very different hydrological predictions; structural uncertainty inherent in the model building process cannot be quantified. Currently there is no standardized way of integrating highresolution AEM into hydrogeological models. Sequential, joint and coupled hydrogeophysical inversion methods, as defined by Ferré et al. (2009), have been developed and used extensively in hydrological and groundwater research. In sequential inversion, hydrological and geophysical models and inversions are set up and performed separately (e.g. Binley et al., 2001;Kemna et al., 2002). In joint inversion, hydrological and geophysical models are set up separately but hydrological and geophysical parameters are estimated simultaneously through a joint objective function (e.g. Hyndman and Gorelick, 1996;Hyndman et al., 1994;Linde et al., 2006;Vilhelmsen et al., 2014). In coupled inversion only one model is set up, the hydrological and the geophysical data are evaluated by comparison to translated simulated hydrological states (e.g. Hinnell et al., 2010;Kowalsky et al., 2005). The methods have been applied to capture hydrological processes or estimate aquifer properties and structures from geophysical data. Hydrogeophysical inversion addresses hydrogeological property estimation or delineation of hydrogeological structures. In the context of large-scale groundwater models studies, Dam and Christensen (2003) and Herckenrath et al. (2013) translated between hydraulic conductivity and electrical resistivity to estimate hydraulic conductivity parameters of the subsurface in a joint hydrogeophysical inversion framework. Petrophysical relationships, however, are uncertain, partly because of unknown physical relationship between geophysical and hydrological parameter space. The relationship may vary within and/or between field sites depending on given conditions and cannot be determined a priori. For electrical resistivity versus hydraulic conductivity, relationships suggesting both positive and negative correlation have been found (Purvance and Andricevic, 2000). Herckenrath et al. (2013) concluded that sequential hydrogeophysical inversion was preferred over joint hydrogeophysical inversion due to the uncertainty associated with the petrophysical relationship. Structural inversions are often performed as purely geophysical inversions, where subsurface structures (that mimic geological or hydrogeological features) are favored during inversion by choosing appropriate regularization terms. An example is the layered and laterally constrained inversion developed by Auken and Christiansen (2004), which respects vertically sharp and laterally smooth boundaries found in sedimentary geology. Joint geophysical inversions have been used extensively to delineate subsurface hydrogeological structures under the assumption that multiple geophysical data sets carry information about the same structural features of the subsurface (Christiansen et al., 2007;Gallardo, 2003;Haber and Oldenburg, 1997) but examples of successful joint hydrogeophysical inversion at larger scales are rare.
As a response to lack of global petrophysical relationships, clustering algorithms as an extension to structural inversion methods have been applied in geophysics (Bedrosian et al., 2007;Doetsch et al., 2010). Fuzzy c-means and kmeans clustering algorithms have been used with sequential inversion schemes (Paasche et al., 2006;Triantafilis and (Di Giuseppe et al., 2014;Paasche and Tronicke, 2007). These studies have focused on the structural information contained in geophysical information, and hydrogeological or geological parameters of the subsurface are assumed uniform within the delineated zones. This approach corresponds well with the common practice in groundwater modeling where degrees of freedom of the subsurface are reduced by zoning the subsurface.
We present an objective and semi-automatic method to model large-scale hydrostratigraphy from geophysical resistivity and lithological data. The method is a novel sequential hydrogeophysical inversion for integration of AEM data into the hydrological modeling process. Hydrostratigraphic structures and parameters are determined sequentially by geophysical/lithological and hydrological data, respectively.
As shown in Fig. 1, the 3-D subsurface zonation is completed in two parts: (1) a hydrostratigraphic cluster modeling part, and (2) a hydrological modeling part. In part 1 the hydrostratigraphic structures are delineated (see Fig. 2c) through k-means cluster analysis on resistivity data (see Fig. 2a) and clay-fraction values (see Fig. 2b). To obtain clay-fraction values, resistivity data are translated into clayfraction values by inverting for the parameters of a spatially variable translator function (this is the petrophysical relationship) . The cluster analysis is performed on the principal components of normalized resistivity data and clay-fraction values. In part 2 the K of each zone in the hydrostratigraphic cluster model is estimated in a hydrological model calibration using observations of hydraulic head and stream discharge. The zones identified in the cluster analysis are assumed to have uniform hydrogeological properties, and thus form the hydrostratigraphic model.
The method is applied to a Danish case study, for which details and results are presented in the following sections.

Study area
The Norsminde study area is located on the eastern coast of Jutland, Denmark, and covers a land surface area of 154 km 2 . Figure 3 shows a map of the area delineating the study area boundary, streams, and hydrological data. An overview of the geophysical and lithological data can be found in Foged et al. (2014). Within 5-7 km from the sea, the land is flat and rises only to 5-10 m a.s.l. (above sea level). Further to the west, the land ascends into an up-folded end moraine at elevations between 50 and 100 m a.s.l. The town of Odder with approximately 20 000 inhabitants is located at the edge of the flat terrain in the middle of the model domain.
Palaeogene, Neogene, and Quaternary deposits characterize the area. The Palaeogene deposits are thick clays, and define the lower geological boundary. Neogene marine clays  interbedded with alluvial sands overlay the Palaeogene deposits in the elevated northern and western parts of the model domain. Quaternary deposits are glacial meltwater sediments and tills found throughout the domain. The west-east striking Boulstrup tunnel valley (2 km by 14 km) incises the Palaeogene clay in the south (Jørgensen and Sandersen, 2006). The unconsolidated fill materials are meltwater sand and gravel, clay tills, and water-laid silt/clay. Groundwater is abstracted for the drinking water supply, mainly from tunnel valley deposits and the elevated southwestern part of the domain. The groundwater resource is abstracted from 66 abstraction wells, with a total production of 18 000-26 000 m 3 yr −1 , excluding smaller private wells. Maximum annual abstraction from one well is 12 400 m 3 yr −1 . Actual pumping variation among the 66 wells and inter-annual variation of pumping rates are unknown. Abstraction is planned locally by water works and only information about permissible annual rates has been obtained for this study.
Groundwater hydraulic heads are available from 132 wells at various depths; see Fig. 3 for the spatial distribution. Hydraulic head data are collected from the Danish national geological and hydrological database Jupiter (GEUS, n.d.).
Average annual precipitation is 840 mm yr −1 for the years 1990-2011. Most of the area is tile drained. The catchment is drained by a network of 24 streams; the main stream is gauged at the three stations 270035, 270002, and 270003 (see Fig. 3). Streams vary from ditch-like channels to meter wide streams. Low and high flows, respectively, are on the order of 0.05-0.5 and 0.5-5 m 3 s −1 . Daily stream discharge data are available from three gauging stations. Discharges are calculated from mean daily water   translated with QH curves, which are available from approximately monthly discharge measurements. Time-domain electromagnetic (EM) data collected through ground and airborne surveys are available for most of the study area. The AEM survey covers 2000 line kilometers, equivalent to 106 770 1-D models and was carried out with the SkyTEM 101 system . Lithological information is available at approximately 700 boreholes. The borehole descriptions are from the Dan-ish Jupiter database (GEUS, n.d.) and the level of detail and quality varies from detailed lithological description at 1 m intervals to more simple sand, clay, till descriptions at layer interfaces. A thorough description of EM data collection and processing and lithological borehole information can be found in Foged et al. (2014).

Hydrostratigraphic model
Geophysical and lithological data are used to zone the subsurface. Geophysical data consist of resistivity values determined from the inversion of airborne and ground-based electromagnetic data. Lithological information is represented in clay-fraction values determined through inversion within the clay-fraction concept (CF concept). Zonation is performed in 3-D.
The CF concept is formulated as a least-squares inversion problem to determine the parameters of a petrophysical relationship (in the inversion this is the forward model) that translates geophysical resistivities into clay-fraction values. The concept is described in detail in Foged et al. (2014) and Christiansen et al. (2014), and only a brief introduction is given here. The inversion minimizes the difference between observed clay fraction as determined from borehole lithological logs (in the inversion this is the data) and translated clay fraction as determined from geophysical resistivity values (in the inversion this is the forward data). Clay fraction expresses relative accumulated thickness of clay material over an interval. In this context clay refers to material described as clay in lithological logs, and not clay minerals. Clay definitions in-Hydrol. Earth Syst. Sci., 19, 3875-3890, 2015 www.hydrol-earth-syst-sci.net/19/3875/2015/ clude, among others, clay till, marl clay, mica clay, and silty clay. In the CF inversion, the translator function is a heuristic two-parameter function defined on a regular 3-D grid that is constrained vertically and horizontally. Discretization is 1000 m in the horizontal and 4 m in the vertical. The translator function is a scaled inverse error function (see Eq. (1) and Fig. 4).
m low and m up are the model parameters of the translator function, W (ρ), that translates resistivity, ρ, into clay fraction. K scales the error function so that W (ρ) equal 0.025 and 0.975 for resistivity values equal to m low and m up , respectively (see Fig. 4). The parameters of the translator function vary throughout the 3-D grid. The objective function, with a data misfit term and vertical and horizontal regularization term, is minimized iteratively. The regularization constraint is a measure of weighted-squared difference between m low and m up at neighboring grid nodes, where the weighting is the regularization constraint. The final parameters of the translator function translate geophysical resistivity values into CF values. An experimental semi-variogram is estimated from the simulated CF values, and 2-D block kriging is used to obtain a 3-D CF model. The resolution difference between lithological borehole data and AEM data is discussed in . Delineation of subsurface structures is performed as a kmeans cluster analysis on geophysical resistivities and clayfraction values. Information contained in clay-fraction values is to some extent duplicated in the geophysical resistivity values. Heterogeneity captured in the resistivity data, however, is simplified in the translation to clay fraction; for example, till and Palaeogene clay have, respectively, medium and low resistivity values, while the clay fraction for both materials is 1.
k-means clustering is a well-known cluster analysis that finds groups in multivariate data based on a measure of similarity between cluster members (Wu, 2012). Similarity is defined as the minimum of squared Euclidean distances between each cluster member and cluster centroid, summed over all cluster members. The number of clusters that the data are divided into is defined by the user. We use the k-means analysis implementation in MATLAB R2013a, which uses a two-phase search, batch, and sequential, to minimize the risk of reaching a local minimum.
Because clay-fraction values are correlated with geophysical resistivities, k-means clustering is performed on principal components (PCs) of the original variables. Principal components analysis (PCA) is an orthogonal transformation based on data variances (Hotelling, 1933). PCA thus finds uncorrelated linear combinations of original data while obtaining maximum variance of the linear combinations (Härdle and Simar, 2012). The uncorrelated PCs are a useful representation of the original variables as input to a k-means cluster analysis. Original variables must be weighted and scaled prior to PCA, as PCA is scale sensitive, and the lack of explicit physical meaning of the PCs makes weighting difficult. Clay-fraction values are unchanged as they range between 0 and 1. The normalized resistivity values are calculated as ρ norm = log ρ−log ρ min log ρ max −log ρ min . Where ρ min and ρ max is minimum and maximum resistivity values, respectively.
Eleven hydrostratigraphic cluster models consisting of 1-11 zones are set up and calibrated.

Hydrological model
Hydrological data are used to parameterize the structures of the hydrostratigraphic model. Stream discharges and groundwater hydraulic heads are used as observation data in the hydrological calibration.
The hydrological model is set up using MIKE SHE (Abbott et al., 1986;Graham and Butts, 2005), which is a physically based hydrological model code simulating evapotranspiration, the unsaturated zone, overland flow, and saturated flow, while stream discharge is simulated by coupling with the MIKE 11 routing model code.

Hydrological model parameterization
The model has a horizontal discretization of 100 m × 100 m, and a vertical discretization of 5 m following topography. The uppermost layer is 10 m thick for numerical stability, which is not expected to negatively impact river discharge as this is largely controlled by drainage. Because the model represents a catchment, all land boundaries are defined as no-flow boundary conditions following topographical highs. Constant head boundary conditions are defined for sea boundaries, and the model domain extends 500 m into the sea. Model grid cells 10 m below the Palaeogene clay surface have been de-activated, due to the computational burden.
The unsaturated zone and evapotranspiration (ET) are modeled using the two-layer water balance method developed to represent recharge and ET to/from the groundwater in shallow aquifer systems (Yan and Smith, 1994). The reference evapotranspiration is calculated using Makkink's formula (Makkink, 1957). Soil water characteristics of the five soil types and the associated 250 m grid product are developed and described by Borgesen and Schaap (2005) and Greve et al. (2007), respectively. Land use data are obtained from the DK-model2009, for which root-depth-dependent vegetation types were developed .
Stream discharge is routed using the kinematic wave equation. The stream network is modified from the DK-model2009  by adding additional calculation points and cross sections. Groundwater interaction with streams is simulated using a conductance parameter between aquifer and stream. Overland flow is simulated using the Saint-Venant equations (DHI, 2012, 267-281). Manning number and overland storage depth is 5 m 1/3 s −1 and 10 mm, respectively. Drainage parameters, drain time constant (s −1 ) and drain depth (m) are uniform in space and time. Parameterization of spatial variable drain time constant relies on direct drainage flow measurements, and Hansen et al. (2013) found little variability in the estimated time constants and no justification for a spatial variability judging from eight hydrological performance criteria. Drain depth is 1 m below terrain.
Saturated flow is modeled as anisotropic Darcy flow, xy to z anisotropy being restricted to the orientation of the computational model grid (DHI, 2012). A vertical anisotropy of 1/10 is assumed. The saturated zone is parameterized with the cluster models. The lower boundary of the saturated zone is defined by the surface of the Palaeogene clay, available in 100 m grid, and has a fixed horizontal K of 10 −10 m s −1 . Specific yield and specific storage are fixed at 0.15 and 5 × 10 −5 m −1 for the entire domain.

Hydrological model calibration
Forward models are run from 1990 to 2003; the years 1990-1994 serve as warm up period (this was found sufficient to obtain stable conditions); the calibration period is from 2000 to 2003 and the validation period is from 1995 to 1999.
Composite-scaled sensitivities (Hill and Tiedeman, 2007) were calculated based on local sensitivity analyses. these sensitivities. The lower panel in Fig. 5 shows subsurface parameters for the 5-cluster model. The following parameters are a part of the model calibration: -The root-depth scaling factor, which was found sensitive (see Fig. 5, top panel). Because root-depth values vary inter-annually and between crop types, root-depth sensitivity was determined by a root-depth scaling factor, which scales all root-depth values.
-The drain time constant. Especially considering discharge observations, the model shows sensitivity towards this parameter. Stream hydrograph peaks are controlled by the drainage time constant (Stisen et al., 2011;Vazquez et al., 2008).
-The river leakage coefficient.
-The horizontal hydraulic conductivities of all zones of the 11 hydrostratigraphic cluster models. Figure 5 shows sensitivity to K of the zones of the 5-cluster model. K of the zones is unknown; hence all K values have been calibrated. Vertical K values are tied to horizontal K with an anisotropy factor of 10. Initial horizontal K values are 10 −4 , 10 −6 , or 10 −8 m s −1 depending on the mean clay-fraction value of a zone. Calibration is performed using the Marquardt-Levenberg local search optimization implemented in the parameter estimation software, PEST (Doherty, 2005). Observations are 632 hydraulic heads from 132 well filters and daily stream discharge time series from three gauging stations (see Fig. 3). Observation variances are estimated, and, in the absence of information, observation errors were assumed to be uncorrelated. Objective functions for head and discharge have been scaled to balance contributions to the total objective function.
The aggregated objective function, , shown in Eq.
(2) is the sum of the scaled objective function for head and discharge. The subjective weight, w s , was determined through trial and error by starting numerous calibration runs; w s was chosen to be 0.8.
Hydraulic head observation errors are determined according to the guidelines following Henriksen et al. (2003). They suggest an error budget approach that accounts for contributions from (1) the measurement (e.g. with dip meter), (2) inaccuracy in vertical referencing of wells, (3) interpolation between computational nodes to observation well location, and (4) heterogeneity that is not represented in the lumped computational grid. The total error expresses the expected uncertainty between observation and corresponding simulation. The approach for estimating these uncertainties can be found in Appendix A. Total errors amount to 0.95, 1.4, and 2.2 m. Uncertainty of stream discharges is mainly due to translation from water stages to discharge (daily mean discharges). Uncertainties originate from infrequent calibration of rating curve, ice forming on streams, and especially stream bank vegetation (Raaschou, 1991). Errors can be as large as 50 %. Blicher (1991) estimated errors of 5 and 10 % on the water stage measurement and rating curve, respectively. In cases of very low streamflows (1 L s −1 ), Christensen et al. (1998) assigned a standard deviation of 200 % while flow of 50 and 5-10 L s −1 are assigned standard deviations of 5 and 25 %, respectively. We have assigned an error of 20 % to all stream discharge observations.

Results and discussion
First, we show results for the hydrological performance of 11 hydrostratigraphic cluster models consisting of 1-11 zones. Second, details of the cluster analysis for the case of a 5-cluster hydrostratigraphy are shown. Finally, the cluster model hydrological performance is benchmarked with comparable hydrological models.   Figure 6 shows the weighted root mean square error (RMSE) of model performances for a hydrostratigraphic cluster model consisting of 1 to 11 zones, head and discharge, respectively, is shown in Fig. 6a and b. The 1-cluster model is a homogeneous representation of the subsurface resulting in a uniform K field. The 1-cluster model represents a situation where we have no information about the subsurface. Increasing the number of clusters to represent the subsurface successively adds more information from geophysical and lithological data to the calibration problem. The weights used to calculate weighted RMSE are the same weights as used in Eq.

Calibration and validation of hydrological model
(2). Head and discharge contribute by approximately twothirds and one-third of the total objective function. From the 1-cluster to the 2-cluster model, weighted RMSE for discharge is reduced by more than a factor 2. No significant improvement of the fit to discharge data is observed for more than 2 clusters. Fit to head data improve almost by a factor of 2 from the 1-cluster to the 2-cluster model. Improvement of the fit to head data continues up to the 5-cluster representation of the subsurface. Improvements are a factor of 3 from the 1-cluster to the 5-cluster model. Beyond the 5-cluster model, the fit to head observations stagnates. The 7-cluster and 9-cluster hydrostratigraphic models perform worse than the 3-cluster model. The 8-, 10-, and 11-cluster models obtain an equally good or better fit to head data compared to the 5-cluster model.
The blue lines in Fig. 6 illustrate mean standard deviation on log(K) values of the cluster models based on the postcalibration standard deviation of log(K) for each K zone. Beyond the 4-and 5-cluster models, the precision of the estimated K values decrease. The mean standard deviations on log(K) for the 4-and 5-cluster models are 0.12 and 0.15. The corresponding widths of the 95 % confidence intervals are between 15 and 90 % of the estimated K value for 3 out of 4 zones and 3 out of 5 zones, respectively. Beyond the 5-cluster model, mean standard deviations on log(K) are be- tween 0.17 and 0.27, and corresponding width of the 95 % confidence intervals are largely above 100 % for all but two zones.
With the combined information from weighted RMSE values and standard deviation on log(K) we are able to address over-parameterization. The results indicate that we obtain good fit to observations without over-parameterization with a 3-to 5-cluster hydrostratigraphic model.
In this paper, we have discussed the performance of the cluster models as a measure of fit to hydraulic head and stream discharge observations. Hydrological models are typically used to predict transport, groundwater age, and capture zones, which are sensitive to geological features. It is likely that the optimal number of clusters is different for these applications. An analysis, as is presented here for head and discharge, for predictive application is more difficult because observations are often unavailable.
The hydrostratigraphic models are constructed under the assumption that subsurface structures governing groundwater flow can be captured by structural information contained in clay-fraction values (derived from lithological borehole data) and geophysical resistivity values. If this is true, an asymptotic improvement of the data fit would be expected for increasing cluster numbers. However, as shown in Fig. 6, this is not strictly the case: weighted RMSE of the 7-cluster and 9-cluster models is higher than weighted RMSE of the 3-cluster, 6-cluster, and 8-cluster models. The likely explanation is that the increasing number of clusters not only corresponds to pure cluster sub-division but also to relocation of cluster interfaces in the 3-D model space. We expect the difference in hydrological performance to be due to changes in interface configuration. It is well known that an unsupervised k-means clustering algorithm does not result in a unique solution, due to choice of initial (and unknown) cluster centroids. We have sampled the solution spaces (200 samples) of the eleven cluster models. Clustering the principal components of geophysical resistivity data and clay-fraction values into 1 to 5 clusters gives unique solutions. Clustering the principal components of geophysical resistivity data and clay-fraction values into 6 to 11 clusters results in three or more solutions. However, the non-unique solutions have different objective functions (squared Euclidean distance between points and centroids). In all cases the cluster model with the lowest objective function was chosen as the best solution. Figure 7 shows RMSE and mean errors for calibration and validation periods for all eleven cluster models. Data used to calculate the statistics are a temporally split sample from 35 wells, which have observations both in the calibration and validation period, and the discharge is for stations 270002 and 270003.
The cluster models perform similarly in the periods 2000-2003 and 1995-1999. With respect to RMSE, Fig. 7a, for head the validation period is approximately 10 % worse than the calibration period. RMSE for discharge (Fig. 7b) is lower in the validation, approximately one-third of the calibration values. Mean errors for head (Fig. 7c) are lower and higher, respectively. The hydrological models analyzed in this study generally under-simulate the average discharge. Figure 8 presents histograms of clay-fraction values and resistivity values and how the values are represented in the five clusters, which was chosen to be the optimal number. Counts are shown as percentages of the total number of pixels in the domain. The histograms in Fig. 8 show that the clay fraction attribute separates high resistivity/low clay fraction (sandy sediments) from other high-resistivity portions of Hydrol. Earth Syst. Sci., 19, 3875-3890, 2015 www.hydrol-earth-syst-sci.net/19/3875/2015/ the domain, while the resistivity attribute separates low resistivity/high clay fraction (clayey sediments) from other high clay-fraction portions. High resistivity/low clay-fraction values are represented by clusters 1, 3, and 4, and low resistivity/high clay fraction are represented by clusters 2 and 5 (see Fig. 8a). Figure 9 shows the data cloud that forms the basis of the clustering. The data cloud is binned into 300 bins in each dimension and the color of the cloud shows the binwise data density. We see that cluster boundaries appear as straight lines in the attribute space. Values with a low resistivity and corresponding high clay fraction, mainly clusters 2 and 5, populate more than half of the domain. Clay is expected to dominate this part of the domain. The results of the cluster analysis are presented with respect to geophysical resistivity and clay-fraction values, while the cluster analysis is performed on the PC of geophysical resistivity and clay-fraction values. The first PC explains the information where the two original variables, log resistivity and clay fraction, are inversely correlated. This corresponds to the situation where a clay fraction of 1 coincides with a low resistivity value, and vice versa for clay-fraction values of 0 and high resistivities. This is the information that we expect, i.e. our understanding of how geophysical resistivities relate to lithological information as represented by the translator function (Eq. 1) (defined under the assumption that variation in geophysical resistivities with respect to lithological information depends on the presence of clay materials). Thus, the first principal component is the "clay" information in the geophysical resistivities. The second PC is less straight forward to interpret. Ideally, the second PC represents the data pairs where the resistivity response is not dominated or explained by lithological clay material. This might reflect a situation where a low resistivity value -and its associated low clay-fraction value -is a result of a sandy material with a high pore-water electrical conductivity due to elevated dissolved ion concentrations. The second PC can also be a result of the CF conceptualization. Clay till, categorized as clay in the CF inversion, can have electrical resistivities up to 60 m (Jørgensen et al., 2005;Sandersen et al., 2009), which will yield a high clay fraction coinciding with a relatively high geophysical resistivity. Electromagnetic methods are sensitive to the electrical resistivity of the formation, which is commonly dominated by clay-mineral content, dissolved ions in the pore water and saturation. Groundwater quality data are available at numerous sites in the domain. Pore-water electrical conductivity (EC) values were gathered from the coast and inland following the Boulstrup tunnel valley. From the coast to 12 km inland values are stable around 50-70 mS m −1 at 28 wells with varying filter depths. Four outliers with EC ranging between 120 and 250 mS m −1 were identified at various locations and depths. No trend due to salinity from the coast was identified. In theory, variations in formation electrical resistivity that are not due to lithological changes will implicitly be taken into account by spatial variation of the translator function in the CF inversion. If there is a region in the modeled domain where the electromagnetic signal, as well as the resulting resistivity value, is affected by pore-water salinity (low resistivity value is due to salinity and not clay content) and there is available borehole information, the parameters of the translator function will adjust to obtain lower values in order to translate a low resistivity value to a low clay-fraction value. Table 1 shows RMSE and mean error (ME) for head and discharge based on the 5-cluster model. Weighted RMSE for discharge is below 1, indicating that discharge is over fitted. The standard deviation of discharge is 20 % of the observation, which is a conservative definition. As presented in the methods, section errors may vary between 5 and 50 %. The 1995-1999 hydrograph and scatter plot in Fig. 10 for the 270002 gauging station show good fit to data. Peak and low flows are fitted, but baseflow recession is generally not  Stisen et al. (2011) 3.9 1.2 500 m 3500 km 2 Mean of calibration using seven different calibration setups Seifert et al. (2012) 3.03-6.34 −1.17-0.605 200 m 465 km 2 Min and max of calibration of six different geological models He et al. (2015) 4.85 -100 m 101 km 2 Mean using borehole-based geology Madsen (2003) 1.08 0.19 -440 km 2 Balanced Pareto optimum With weighted RMSE for head of 1.63 and 1.85 the model is almost 2 SD (standard deviations) from fitting head data. Assuming head observation error estimates are correct, this indicates model deficiencies such as structural errors and/or forcing data errors. Figure 12a-b show distributed head results. Generally hydraulic head in the tunnel valley is disconnected from the elevated terrain (Fig. 12a), and groundwater overall flows towards the sea. Figure 12b shows errors (obs-sim) between observed and simulated heads for 1995-1999. The largest errors are found in the southeastern part of the domain, where discharge station 270003, with the worst fit, is located (see Fig. 10, top row panels).

Benchmarking hydrological performance
We have compared the hydrological performance of the Norsminde model based on the 5-cluster hydrostratigraphic model with similar Danish hydrological models. We have chosen Danish models due to comparability with respect to data density and quality, and hydrostratigraphy. The model performances are compared based on RMSE and ME of simulated heads; see Table 2, as these statistics are reported in the studies. The horizontal discretization of the models is 100, 200, and 500 m, and the models cover between 202 and 3500 km 2 . We can see that the 5-cluster model is comparable with the other models.

Advantages and limitations
We have presented a method for automatic generation of hydrostratigraphic models from AEM and lithological data for groundwater model applications. Other automatic methods of integrating AEM data into geological models are geostatistical methods presented by, for example, Gunnink et al. (2012), using artificial neural networks, or He et al. (2014), using transition probabilities.
The risk of misinterpretation of AEM data, due to effects of saturation, water quality, depth and material dependent resolution, and vertical shielding, are higher with an automatic approach compared to a cognitive approach, as these effects may be identified by a geologist during the modeling process. AEM data can be integrated into geological models using cognitive methods, for example, as presented by Jørgensen et al. (2013), who provide an insightful discussion of the pros and cons of automatic versus cognitive geological modeling from AEM data.
Geological knowledge, which can be incorporated into cognitive geological models (Royse, 2010;Scharling et al., 2009;Sharpe et al., 2007), cannot be included in automatically generated models. Geological knowledge may identify continuity/discontinuity of geological layers, or discriminate between materials based on stratigraphy or depositional environment. For regional-scale groundwater flow, characteri-zation of sedimentation patterns and sequences may not be relevant, but at smaller scales this information is valuable for transport modeling.
The hydrostratigraphic cluster model presented in this paper does not represent a lithological model, but has the advantage of incorporating close to all the structural information contained in the large AEM data sets in a fast and well-documented way. This is not possible in practice for cognitive methods due to spatial complexity and the large amount of AEM data. For hydrological applications hydrostratigraphic model uncertainty, and the resulting hydrological prediction uncertainty, has great value. We believe that the cluster model approach presented in this paper can be extended to address structural uncertainty and its impact on hydrological predictions. Cognitive geological model uncertainty is difficult to quantify.
The CF model is to some degree influenced by smoothing resulting from the AEM data inversion and CF inversion, and the finial kriging of CF values to a regular grid. Smoothing effects causing resistivity transition zones are inconsistent with our understanding of geological interfaces. In future studies different geophysical inversion schemes will be compared to evaluate the effect of smoothing on the final cluster model. This work will partly evaluate how the smooth transition zones impact hydrological results. We expect the geological interfaces to lie in the transition zones, but the exact location is unknown. We will address this problem by generating several cluster models that identify zonal divides at different locations in the transition zones. Hereby hydrological uncertainty as a result of the transition zones may also be assessed.

Conclusions
We have presented an automated workflow to parameterize and calibrate a large-scale hydrological model based on AEM and borehole data. The result is a competitive hydrological model that performs adequately compared to similar hydrological models. From geophysical resistivity data and clay-fraction values, we delineate hydrostratigraphic zones, whose hydrological properties are estimated in a hydrological model calibration. The method allows for semi-automatic generation of reproducible hydrostratigraphic models. Reproducibility is naturally inherent as the method is data driven and thus, to a large extent, also objective.
The number of zones in the hydrostratigraphic model must be determined as part of the cluster analysis. We have proposed that hydrological data, through hydrological calibration and validation, guide this choice. Based on fit to head and discharge observation and calibration parameter standard deviations, results indicate that the 3-and 5-cluster models give the optimal performance. Distributed groundwater models are used globally to manage groundwater resources. Today large-scale AEM data sets are acquired for mapping groundwater resources on a routine basis around the globe. There is a lack of knowledge on how to incorporate the results of these surveys into groundwater models. We believe the proposed method has the potential to solve this problem.
Quantitative estimates of the different error sources are to a large extent based on data from the Danish Jupiter database. Head measurements are typically carried out with a dip meter, and occasionally pressure transducers are used. Information about which measurement technique has been used for the individual observations is not clear from the Jupiter database. It is assumed that dip meters have been used and σ meas has been determined to be 0.05 m for all observations. Well elevations are referenced using different techniques. The elevation can be determined from a 1 : 25 000 topographic map, by leveling or by differential GPS. The inaccuracies for using topographic maps and DGPS measurements are on the order of, respectively, 1-2 m and centimeters. The Jupiter database can have information about the referencing techniques, but this information is rarely supplied. An implicit information source is the number of decimal places the elevations have in the database. Elevation information is supplied with 0, 1, or 2 decimal places. For the wells where the reference technique is available (checked for cases with topographic map and DGPS only) the decimal places reflect accuracy of the referencing technique used. From this information decimal places of 0, 1, and 2 have been associated with σ elev of 2, 1, and 0.1 m, respectively.
Errors due to interpolation depend on horizontal discretization of the hydrological model and the hydraulic gradient. Sonnenborg and Henriksen (2005, chapter 12) suggested it be estimated as σ int = 0.5 · x · J , where x is horizontal discretization and J is hydraulic gradient. The model domain has been divided into three groups for which the error from interpolation has been calculated. The three areas are geologically different: north is glacial tectonically deformed, the west has similar Miocene and glacial meltwater sediments, and the Palaeogene tunnel valley. Hydraulic gradients of the Miocene glacial west and the Palaeogene tunnel valley are between 0.001 and 0.002. The Miocene glacial area and the Palaeogene tunnel valley areas were thus considered as one with a σ int of 0.07 m. The glacial tectonic area has an estimated hydraulic gradient of 0.01 and thus associated with a σ int of 0.6 m.
Within-cell (hydrological model grid) heterogeneity affecting the hydraulic head was estimated using data from eight wells that are located within the same hydrological model grid. Temporally coinciding head observations from the period 2001 and 2002 were used. The error is evaluated as the standard deviation of a linear plane fitted through the observed heads at the eight boreholes. This has been done for three dates, which gives a mean σ hetereo of 0.53 m.
σ unknown was set to 0.5 m.