Observed groundwater head and soil moisture profiles are assimilated into an integrated hydrological model. The study uses the ensemble transform Kalman filter (ETKF) data assimilation method with the MIKE SHE hydrological model code. The method was firstly tested on synthetic data in a catchment of less complexity (the Karup catchment in Denmark), and later implemented using data from real observations in a larger and more complex catchment (the Ahlergaarde catchment in Denmark). In the Karup model, several experiments were designed with respect to different observation types, ensemble sizes and localization schemes, to investigate the assimilation performance. The results showed the necessity of using localization, especially when assimilating both groundwater head and soil moisture. The proposed scheme with both distance localization and variable localization was shown to be more robust and provide better results. Using the same assimilation scheme in the Ahlergaarde model, groundwater head and soil moisture were successfully assimilated into the model. The hydrological model with assimilation showed an overall improved performance compared to the model without assimilation.

Integrated hydrological modelling plays an important role in water resources management to develop sustainable environmental and economic schemes. Integrated models offer advantages with respect to incorporating different physically based hydrological processes and providing a consistent prediction of different hydrological variables. Hydrological data assimilation aims to utilize the information embedded in hydrological observations for improving the performance of hydrological models. Data assimilation (DA) has the advantage of exploiting both imperfect models and limited observations, considering uncertainties in both to provide a more accurate prediction.

Groundwater head and soil moisture are two key variables in hydrological modelling of the saturated and unsaturated zones respectively. Several applications of assimilating each variable individually in either groundwater models or land surface models have been reported. For example, Chen and Zhang (2006) presented an application of the ensemble Kalman filter (EnKF) to a groundwater flow model, with updating of both groundwater head and hydraulic conductivity. De Lannoy et al. (2007) applied the EnKF for soil moisture state and bias estimation in a small field using the CLM (Community Land Model). There are also a few studies with assimilation of both groundwater head and soil moisture. For example, Visser et al. (2006) used groundwater head and soil moisture data to re-calibrate the SWAP (Soil, Water, Atmosphere and Plant) model online using a simplified form of Newtonian nudging (NN), and showed superior results compared to offline calibration. Camporese et al. (2009a) used Newtonian nudging and the EnKF to assimilate synthetic observations in a coupled surface–subsurface flow model.

The use of multivariate assimilation in integrated hydrological models
provides great potential to deepen our understanding of the value of
different measurement data. Several studies of multivariate assimilation
applications in integrated hydrological models have been reported. Xie and
Zhang (2010) applied EnKF to the Soil and Water Assessment Tool (SWAT), with
updating of multiple states and parameters including runoff, soil moisture
and evapotranspiration. Camporese et al. (2009b) used EnKF in the CATHY
(CATchment HYdrology) model with coupled surface and subsurface flow, to
assimilate groundwater head and stream discharge. Rasmussen et al. (2015)
assimilated the same variables using the ensemble transform Kalman filter
(ETKF) with the MIKE SHE model. Kurtz et al. (2014) jointly assimilated
groundwater heads and groundwater temperatures with EnKF using both synthetic
and real-world models. Shi et al. (2014) employed EnKF to assimilate
multivariate hydrological states in a small catchment modelled by the
Flux-PIHM land surface model, with a focus on parameter estimation. Lee et
al. (2011) used a variational assimilation approach to assimilate streamflow
and in situ soil moisture, to correct the soil moisture profiles within the
HL-RDHM model. Ridler et al. (2014b) developed a generic DA framework that
enables coupling of hydrological models with the OpenDA library
(

Meanwhile, techniques have been developed for multivariate DA. The most straightforward approach used in integrated models is state augmentation, which is commonly applied with EnKF and its variants, with nearly no additional modifications on algorithms. The observation vector can be extended to accommodate multiple types of observations. Similarly, the state vector can be augmented to include all relevant state variables, and possibly model parameters. The covariance matrix is thereby expanded to a block matrix where each block presents the cross-covariance between variables in the state vector (Montzka et al., 2012). A potential challenge in this respect is that implementing EnKF techniques like localization no longer becomes straightforward. Commonly used localization techniques usually belong to covariance localization (Hamill et al., 2001) or local analysis (Anderson, 2003). When updating a single state variable with corresponding measurements, distance localization is usually used to reduce the impact of long distance sampling errors in the forecast error covariance due to a limited ensemble size. When there is more than one state variable, the degree of localization for each variable needs to be appropriately specified. Another incidental fact in multivariate DA is that the spurious correlation across variables is usually more pronounced, leading to deterioration of the model updating. To overcome this problem, Kang et al. (2011) successfully introduced “variable localization” in addition to distance localization and tested this with the local ensemble transform Kalman filter (LETKF) in a carbon cycle model.

In this study, we systematically investigate the performance of a filter assimilating soil moisture and groundwater head, with respect to the assimilated variable type, localization scheme and ensemble size. The assimilation method is based on the ETKF (Bishop et al., 2001), distance localization using local analysis (Sakov and Bertino, 2010), and variable localization (Kang et al., 2011). The approach is first tested on a catchment of less complexity (the Karup catchment in Denmark) and using synthetically generated data, and later implemented in a larger and more complex catchment (the Ahlergaarde catchment in Denmark) using real data. From the methodology point of view, the novelty of this study is the use of advanced multivariate assimilation methodologies in combination with the application of different localization schemes. From the application point of view, the novelty of this study is to investigate the value of assimilated variables and their impact on other processes through integrated hydrological modelling in a complex catchment using real data.

The paper is organized as follows: the two study areas and the hydrological modelling processes are introduced in Sect. 2; the detailed assimilation methodology is described in Sect. 3; Sect. 4 presents the experimental settings and the assimilation results based on the Karup catchment; Sect. 5 presents the real observations, experimental settings and the results based on the Ahlergaarde catchment; and finally general discussions and conclusions are given in Sect. 6.

Two study areas in Denmark are used in this study. The 440 km

Left: Karup catchment; right: Ahlergaarde catchment. “Obs Q”, “Obs Head” and “Obs SM” represent discharge, groundwater head and soil moisture observations respectively used for assimilation.

The Karup catchment is a well-studied catchment in terms of model
parameterization and model calibration (Refsgaard, 1997; Madsen, 2003; Zhang
et al., 2015). A relatively simple model with a fast computation time was
developed for this catchment to test and verify various DA methods. The
Ahlergaarde catchment is the research catchment of the Danish Hydrological
Observatory (

The MIKE SHE hydrological modelling system is used for developing models for the above two catchments. As a physically based distributed hydrological model, MIKE SHE simulates the major processes in the water cycle, including evapotranspiration, overland flow, unsaturated flow, groundwater flow, river flow and the interactions between them. MIKE SHE also has the flexibility of modelling each process at given spatial and temporal resolutions with different complexity. The complexity can be chosen according to the model purpose and data availability (Graham and Butts, 2005).

In the Karup catchment, the modelling is based on the following process
descriptions: 2-D groundwater flow is assumed and modelled by one
computational layer in the saturated zone, drain flow (pipes/ditches) is
described by a simple conceptual relationship and occurs when the groundwater
table exceeds the drain level, 1-D unsaturated flow is assumed and based on a
simplified gravity-based flow equation, 1-D channel flow is assumed and based
on kinematic routing, 2-D overland flow routing is based on the diffusive
wave approximation of the Saint-Venant equations, and evapotranspiration is
described, including interception, soil evaporation and transpiration by
vegetation (DHI, 2015). The numerical discretization in the horizontal plane
is a

For the Ahlergaarde catchment, the same model components are included as for
the Karup catchment. For computational efficiency, and due to the fact that
the exact irrigation information in terms of both location and amount is not
known, the irrigation module is not activated in the model. The modelling
approaches are the same as for Karup, except that 3-D groundwater flow is
considered with six numerical layers defined according to geological
stratigraphy. Another main difference is that the model uses a smaller grid
size (

The finer model resolution and increased complexity for the Ahlergaarde
catchment increase the simulation time significantly. For example, the
average model time step in the groundwater zone decreases from 7.5

Differences in model resolution and computation time between the two catchments. SZ refers to the saturated zone and UZ to the unsaturated zone; the term “No. of” means “Number of”.

Calibrated and perturbed parameters for the Ahlergaarde catchment. “Value” represents the estimated value, and “Lower” and “Upper” represent the 5 and 95 % confidence intervals respectively. Parameters 1–6 are assumed to be lognormal distributed. Parameters 7–13 are assumed to be normal distributed.

For both catchments, the model parameterization is kept relatively simple yet able to represent the overall spatial patterns of key model parameters. When specifying the parameter values for each property class (e.g. geological unit, vegetation type and soil type), most of the parameters cannot be estimated empirically or directly inferred from data. Thus model calibration is usually required using an optimization algorithm like AUTOCAL (Madsen, 2003) or PEST (Doherty, 2010).

For the Karup model, the most sensitive parameters describing the hydraulic properties of the river, unsaturated zone, saturated zone, and river–aquifer interaction are calibrated using AUTOCAL (Zhang et al., 2015). As calibration data we use 35 biweekly groundwater head observations and daily observations of stream discharge for a 6-year period (1969–1974) (Fig. 1).

The Ahlergaarde model is calibrated using PEST version 11.8 (Doherty, 2010). The data used in the calibration are groundwater head observations (466 in total) scattered over the catchment (not shown in Fig. 1) and river discharge observations from the period of 2006–2009. In most of the groundwater wells only one observation is available for the entire calibration period and only a few wells have time series. Discharge data comprise time series of daily values from five stations (Fig. 1). Similar to the Karup catchment, the most sensitive parameters (7 parameters) are selected for calibration, with 13 parameters tied to those 7 parameters. The calibrated values for those 7 parameters are listed in Table 2 (first 7 parameters) together with the confidence intervals obtained from the inversion process. The remaining parameters in Table 2 are not included for calibration, but are only selected for perturbation, with a detailed explanation given in Sect. 5. The original calibrated model uses a simplified two-layer approach to simulate unsaturated flow and evapotranspiration, where the average soil moisture is calculated for the root zone and the layer below the root zone. In order to assimilate in situ soil moisture data at different depths, the gravity flow module is used as a replacement for the two-layer approach in the unsaturated zone. By doing so, soil moisture can be calculated at different depths. The overall modelling performance in terms of water balance and discharge dynamics becomes marginally reduced compared to the original calibration results.

The assimilation algorithm used in this study is the ETKF, which is a popular variation of the EnKF (Evensen, 2003). Similar to the EnKF, the ETKF is a Monte Carlo implementation of the Kalman filter, which approximates the posterior probability distribution conditioned on a series of observations, and is able to deal with non-linear models. In comparison to the EnKF, the ETKF is a deterministic filter, as it does not require additional observation perturbations. The ETKF was originally introduced by Bishop et al. (2001) and later modified to be unbiased (Wang et al., 2004). As an ensemble-based deterministic filter, it has the advantage of calculating the forecast error covariance efficiently. It is also computationally faster than the ensemble square root filter (EnSRF) (Whitaker and Hamill, 2002).

To develop the DA algorithm, a state–space formulation is needed:

At time

The forecast state distribution can be estimated by a finite number

The forecast error covariance can be written as

In practise,

The MIKE SHE model is coupled with a generic DA library that handles the time propagation and update of the model ensemble based on the ETKF (Ridler et al., 2014b).

In ensemble-based Kalman filter systems, the forecast state and its associated uncertainty are represented by a limited ensemble of realizations. The undersampling can lead to filter inbreeding and spurious correlations in the error covariance matrix, which potentially can lead to filter divergence. Localization is a commonly used technique when applying ensemble-based Kalman filters to overcome this problem. By artificially reducing the impacted spatial domain of observations, the spurious correlation between two remote locations can be avoided. For each element in the state vector, local analysis (LA, Sakov and Bertino, 2010) is used to approximate the state error covariance within the local window. The ensemble anomalies outside this local window will be unchanged during the filter updates. However, LA is usually applied to a single state variable for which certain spatial correlations exist. When the state vector contains two or more variables, specifying the localization degree for each variable is not straightforward. More importantly, correlations between variables are not clear, because physical distances between variables may not exist. Similar to the approach by Kang et al. (2011), we introduced different variable localization schemes based on whether the correction of one variable can impact the update of other variables. In this section, the distance localization will be introduced first, followed by the variable localization.

We formulate the distance-localized ETKF equations with similar notations as
in Sakov and Bertino (2010). A variable with an upper accent “

Therefore, to update the

For groundwater heads, in both catchments, the LA taper function is chosen to
have a radius of 5

Sketch of the localization scheme for soil moisture at a site where
soil moisture is measured at 0–5 and 20–25

Variable localization is an option when assimilating both groundwater head and soil moisture. Variable localization determines whether the information from one variable can be used to update the other. When variable localization is off, no matter the available observation type (groundwater head, soil moisture or both), all observation data are used to update the ensemble mean (Eq. 11) and anomaly (Eq. 13) for both variables. Therefore the correlation between the variables is kept during the assimilation. In addition, if distance localization is applied, the correlation exists in localized domains between variables. When variable localization is applied, each observation type will only be used to update its own type of state variable. Other variables in the state vector will be unchanged during update. If distance localization is applied, state updates are spatially localized within its own type of variable.

Practically, the variable localization can be done by slight modifications to
Eqs. (11)–(15). The taper function is extended to have an “if/else”
statement prior to the existing distance-based taper function, depending on
whether variable localization is chosen or not. Here we explain the process
of updating one element when variable localization is applied. When looping
over the

Observed and simulated water table at well 12 (top panel) and hydrograph at station 20.05 (bottom panel) in the Karup catchment.

In the Karup catchment experiment, the calibrated model described in
Sect. 2.3 is used as the deterministic model. The calibrated model has
relatively good performance in reproducing the observations, with an averaged
root mean square error (RMSE) of around 1

The ensemble is generated by adding an appropriate model error to the deterministic model. Similarly, given the predefined model error, a single random model realization is generated to be the “true” model. Note that the “true” model here is only an assumption of reality. The model error is defined by perturbing both model forcing (precipitation and potential evapotranspiration) and selected model parameters (Zhang et al., 2015). The ensemble runs freely from 1 December 1969 to 1 January 1973 as a warm-up period. During the warm-up period, each ensemble member starts with the same initial condition but has different model trajectories because of different forcing and parameter values. It is important to generate an ensemble with a realistically large spread, so that the model uncertainty can be fully represented by the ensemble.

The synthetic observations to be assimilated are generated from the “true”
model. Given the true realization, by adding measurement errors to observed
model variables at a given time and location, a set of synthetic observations
can be produced. Both groundwater head and soil moisture (depths of 5 and
25

After the warm-up period, the synthetic observations are assimilated over a 1-year period from 1 January 1973 to 1 January 1974. Given the fact that the “true” model is known, the deterministic model can be seen as an imperfect model. With the purpose of combining the imperfect model and the synthetic observations, different experiments are carried out to investigate under which conditions the assimilation results are most similar to the “true” model. These experiments are designed using different observation variables, localization schemes and ensemble sizes. The assimilation performance can be assessed by taking the RMSE between the model simulation and the true state for selected variables over the entire domain at all available time steps. As soil moisture measurements are depth-dependent, RMSE is calculated for each depth (each layer). Here we not only show the results from 5 and 25 cm depths where observations are assimilated, but also at 50 cm depth. In addition, other hydrological responses in the form of evapotranspiration and discharge are evaluated.

Spatially and temporally averaged RMSE of groundwater head and soil moisture at different depths for each univariate assimilation experiment in the Karup catchment. The left axis represents soil moisture and the right axis head.

When a single variable is assimilated (groundwater head or soil moisture),
the state vector only consists of the corresponding observed variable at all
model grid cells. Therefore, the remaining variables will not be changed
directly from the filter. However, as both the groundwater component and
unsaturated zone are fully coupled with surface water and other model
components, the whole model state will be affected from updating a single
variable. Different experiments are carried out using an ensemble size of
60:

NoDA: deterministic model without DA;

DA_H: assimilating head without localization;

DA_HLoc: assimilating head with a horizontal localization radius of
5

DA_SM5: assimilating soil moisture at 5 cm depth without localization;

DA_SM5Loc: assimilating soil moisture at 5 cm depth with localization of a 5 km spatial radius within 1 m depth;

DA_SM5LocSmall: assimilating soil moisture at 5 cm depth with localization of a 3 km spatial radius within 50 cm depth;

DA_SMBoth: assimilating soil moisture at both 5 and 25 cm depths without localization;

DA_SMBothLoc: assimilating soil moisture at both 5 and 25 cm depths with a 5 km spatial radius within 1 m depth.

As the experiment names indicate, H stands for groundwater head and SM stands for soil moisture. Loc indicates that localization is added to the experiment.

Results from the DA experiments are shown in Fig. 4. When head is assimilated
(DA_H), the RMSE for head improves significantly from 0.21 to
0.08

When soil moisture at 5 cm depth is assimilated alone without localization (DA_SM5), the soil moisture profile clearly improves at all three depths. However, for head the performance is almost the same as in the deterministic model. Different localization scales have been tested with assimilating soil moisture at 5 cm depth (DA_SM5Loc and DA_SM5LocSmall). The result indicates that the overall assimilation performance decreases with a smaller localization scale.

When soil moisture at both 5 and 25 cm depths is assimilated (DA_SMBoth
and DA_SMBothLoc), the performances are similar regardless of
localization. Compared to the result from DA_SM5, the soil moisture
estimate improves at 25

As we can see from Fig. 4, univariate assimilation with localization improves
the estimate of the assimilated variable albeit the results are slightly
worse compared to the experiment without localization in the case of
assimilating head or soil moisture at 5

Spatially and temporally averaged RMSE of groundwater head and soil moisture at different depths for each multivariate assimilation experiment in the Karup catchment. The left axis represents soil moisture and the right axis head.

Results from different experiments in the Karup catchment. From top to bottom, the first panel shows the average spatial RMSE of groundwater head, and the second, third and fourth panels are the average spatial RMSE of soil moisture at 5, 25 and 50 cm depths respectively. From left to right, the experiment names are indicated as the horizontal axis label from the bottom panel. For each experiment except NoDA, the results of three ensemble sizes (30, 60 and 90) are represented using different colours as shown in legends.

In this section, several experiments assimilating both groundwater head and
soil moisture are carried out with a focus to test different localization
schemes. The abbreviations D and V indicate distance localization and
variable localization respectively.

DA_HSM: assimilating both head and soil moisture (at both 5 and 25 cm depths) without localization to any variable.

DA_HSMLoc_DV: assimilating both head and soil moisture (at both 5 and 25 cm depths) with variable localization and with distance localization applied to head (same as DA_HLoc) and soil moisture (same as DA_SMBothLoc).

DA_HSMLoc_D: assimilating both head and soil moisture (at both 5 and 25 cm depths) without variable localization, but with distance localization applied to head (same as DA_HLoc) and soil moisture (same as DA_SMBothLoc).

DA_HSMLoc_V: assimilating both head and soil moisture (at both 5 and 25 cm depths) with variable localization, but without distance localization to any variable.

Results from the DA experiments are shown in Fig. 5. When neither distance
localization nor variable localization is used, all observations are used to
update the state in all grid cells for each variable (DA_HSM). In this
case the estimated correlations between groundwater head and soil moisture
are used in the update. The DA results show improved performance for soil
moisture at 5 and 25

In experiment DA_HSMLoc_DV, both distance localization and variable localization are used. Therefore, the state updates are spatially localized for each variable and the correlation between the two variables is neglected. Particularly in this case, when there is only soil moisture observation assimilated, the updates are limited to the upper 1 m soil moisture profile, while no correction is made for head. When both types of observation are assimilated, the corrections are made for each variable using its own error information. We can see from Fig. 5 that the experiment shows an overall improved result.

In experiment DA_HSMLoc_D, distance localization is applied to head and
soil moisture, but variable localization is not included. In this case,
regardless of observation type, the soil moisture is corrected within 1 m
depth together with head. The result from this experiment shows an improved
estimate for soil moisture at 5 and 25

In experiment DA_HSMLoc_V, distance localization is off and variable
localization is applied. This means that the error information from one
variable is used to update the entire domain of its own variable but does not
affect the other variable. The result indicates worse assimilation
performance for soil moisture at 50

As mentioned in Sect. 3.2, localization allows the ensemble filters to work properly with a limited ensemble size. The above experiments are based on an ensemble size of 60, which is determined by balancing both assimilation performance and computational time. Some of the experiments are repeated for ensemble sizes of 30 and 90 respectively to analyse how the assimilation performance and the choice of localization are affected by the ensemble size. The results are shown in Fig. 6.

As can be seen from Fig. 6, in the experiment assimilating head without localization (DA_H), increasing the ensemble size (from 30 to 90) slightly improves the head estimation. However, the performance difference between ensemble sizes of 60 and 90 is small. When localization is used, the performances with all ensemble sizes are very similar (DA_Hloc).

Impact of assimilation on evapotranspiration (ET) (averaged RMSE with respect to the true model of actual evapotranspiration over all 35 soil moisture observation locations during the DA period) and discharge (Nash–Sutcliffe efficiency of discharge at the catchment outlet during the DA period) for each experiment in the Karup catchment.

In the experiment assimilating soil moisture at 5 cm depth without localization (DA_SM5), increasing the ensemble size also improves the soil moisture at deeper depths. This indicates that using only an ensemble size of 30 introduces a spurious correlation between surface soil and deeper soil, which is reduced with larger ensemble sizes. An ensemble size of 30 also leads to a much worse result for groundwater head compared to ensemble sizes of 60 or 90. When localization is used (DA_SM5Loc), the assimilation performance is similar using the three ensemble sizes. Compared to the DA_SM5, there is a large improvement in groundwater head when using an ensemble size of 30.

When both soil moisture (at 5 and 25 cm depths) and head are assimilated
without localization (DA_HSM), the performance is generally improved when
increasing ensemble size. However, increasing the ensemble size to 90 still
leads to a worse performance for soil moisture at 50 cm and groundwater head
compared to the deterministic model. When localization is used
(DA_HSMLoc_DV), the soil moisture at 50

Using an integrated model where the various hydrological processes are coupled, assimilation of head and soil moisture may also affect other model variables. The effects on evapotranspiration and river discharge are examined in this section. For actual evapotranspiration, we calculated average RMSE with respect to the true model of actual evapotranspiration over all 35 soil moisture observation locations during the DA period, and for discharge the performance at the catchment outlet for the entire assimilation period is evaluated using the Nash–Sutcliffe efficiency score. The results are summarized in Table 3.

The differences in RMSE for actual evapotranspiration among all experiments are small. When H is assimilated alone (DA_H and DA_H_Loc), actual evapotranspiration is basically unchanged, while when soil moisture is assimilated, RMSE is marginally reduced compared to the deterministic model.

The performance of discharge is slightly improved by assimilating head (DA_H and DA_H_Loc). The improvement is mainly with respect to low flow, which is underestimated by the deterministic model. This is expected as the baseflow is corrected by updating groundwater levels. When soil moisture is assimilated with localization (DA_SM5Loc and DA_SMBothLoc), the discharge is also slightly better. However, when both variables are assimilated without localization (DA_HSM), the discharge is significantly worse, with unrealistic peak flows during spring. This is a result of the poorer head estimations in the entire domain. When localization is used for soil moisture and groundwater head (DA_HSMLoc_DV), discharge is improved significantly and comparable with the deterministic model. This also demonstrates the necessity of using localization to constrain the spatial updates.

For the Ahlergaarde catchment, we use the calibrated model to simulate a
20-year period from 1990 to 2010 to provide initial conditions for the
experiment used in this study. Starting from 1 January 2010, the experiment
is split into two periods: a warm-up period (1 January 2010 to
1 November 2012) and a DA period (1 January 2012 to 31 December 2013).
Grid-based daily precipitation (10

The deterministic model used in this study, although based on a model calibrated against older data at different sites, has good skills after 2012. The model performance in terms of the hydrograph at the catchment outlet in year 2013 is shown in Fig. 10 (Obs and NoDA in the top panel), with a Nash–Sutcliffe efficiency of 0.67. From the hydrograph, it can be seen that the model underestimates low flows and overestimates peak flows.

Groundwater heads are measured bi-hourly in nine wells (Fig. 1) using Eijkelkamp mini divers. The divers were installed in these wells in November 2012, and thus the length of the time series is limited. Moreover, due to occasional instrument failure, the data coverages are further constrained and vary among the wells. In the groundwater model six numerical layers are defined (layer 1 in the bottom and layer 6 in the top). The nine wells are screened at different depths. Wells M5398, M5637, M5353, and L8008 are screened in layer 5, while wells M5373, M5647, M5844, M5393 and M5366 are screened in layer 4. When comparing in situ head measurements with model-predicted equivalents, large-level differences usually occur due to scale disparities, and are sometimes also accompanied by dynamic differences. Therefore, we calculated the average difference between observations and model simulations, and subtracted this difference from the original data. By doing so, we can avoid introducing observation bias into the assimilation system. An example of the processed observations and the open loop ensemble for well 5737 (1 November 2012 to 31 December 2013) is shown in Fig. 7 (top panel).

Soil moisture is measured at 30 sites across the catchment according to
representative combinations of topography, land cover, and soil type using
Decagon 5TE sensors. The dominant land uses are heath, agriculture and
forest. At each site, sensors are installed at three depths, 2.5, 22.5 and
52.5

Most of the agriculture sites are irrigated in May and June, and the soil moisture is greatly influenced, with several sudden increases during that period. However, in the model irrigation is not considered because detailed information on irrigation at the local sites is not available. Therefore, the sites where irrigation is evident from the soil moisture recordings are excluded for assimilation. In addition, a quality control to correct for systematic biases and to filter out unrealistic values has been carried out for the remaining sites. Although measurements are carried out at three depths at each site, we only use measurements at 2.5 and 22.5 cm depths for assimilation, as the surface/near-surface moisture is of the most importance for the exchange of water and energy between land and the atmosphere. After processing, 18 out of 30 sites are used for assimilation (Fig. 1). As an example, Fig. 7 (middle and bottom panels) shows the processed soil moisture observations and the open loop ensemble at site nw1.1 (1 November 2012 to 31 December 2013).

In addition to groundwater and soil moisture observations, discharge observations are available in the Ahlergaarde catchment at the outlet and at tributaries (right side of Fig. 1). Evapotranspiration data-based eddy covariance measurements are available from a flux station (Voulund station) located in the catchment.

Top: groundwater head at well M5373. Middle: soil moisture at
2.5

Top: groundwater head at well M5373. Middle: soil moisture at
2.5

Similar to the experiment settings in the Karup catchment, the observation
noise for each variable is assumed to be white Gaussian, with a homogeneous
and constant standard deviation of 0.2

With the purpose of assimilating head and soil moisture, different
experiments have been carried out to investigate the assimilation
performance. Considering the large model domain and fine grid, localization
becomes more important here than in the previous example. Distance
localization is added to both variables separately, and variable localization
is used when both variables are assimilated. For groundwater head, we allow
for updates in all layers over the vertical. Horizontally, we use a
localization radius of 5

NoDA: deterministic model without DA;

DA_HLoc: assimilating groundwater head with distance localization;

DA_SMLoc: assimilating soil moisture (at both 2.5 and 22.5 cm depths) with distance localization;

DA_HSMLoc_DV: assimilating both groundwater head and soil moisture (at both 2.5 and 22.5 cm depths) with variable localization and distance localization.

Average RMSE of head and soil moisture (2.5 and 22.5

The assimilation performance is evaluated by comparing the model output with
the observations (18 sites) using the average RMSE over the assimilation
period. The result is summarized in Table 4. In the experiment with
assimilating head only (DA_HLoc), the RMSE of the head decreases from 0.34
to 0.21

Figure 8 shows the assimilated results for the same sites as shown in Fig. 7. Clearly, after 1 November 2012 when the DA period starts, the ensemble mean is approaching the observations, especially for the head and soil moisture at 22.5 cm depth. Although limited observations are assimilated, corrections are made for a large area within the model domain. Figure 9 shows spatial root mean squared differences (RMSD) of soil moisture and head at corresponding observation layers between the assimilation result and the deterministic model, which illustrates the corrections made by DA spatially. For each grid cell, the variables' time series values from the assimilated model and the deterministic model are used to calculate the RMSD.

From Fig. 9, we can clearly see the effect of the assimilation in the model domain. For soil moisture relatively large corrections are made at 22.5 cm depth compared to the surface layer. Compared to groundwater head, however, the soil moisture corrections are more localized. For both soil moisture and groundwater head, most of the large corrections are made at places near the locations of observations. For groundwater head in the western and south-eastern regions where no head observations are available, the corrections are generally small.

In this section, the effect of assimilation on actual evapotranspiration and river discharge is evaluated by comparing model predictions and observations. Figure 10 compares discharge at the catchment outlet and evapotranspiration at the flux station for the different experiments. The flux station is located in the central–northern part of the catchment, with several soil moisture stations around. In both graphs in Fig. 10, only small differences are seen between different simulations. This is further substantiated by the performance measures listed in Table 5.

As shown in Table 5, RMSE for actual evapotranspiration is similar in all three assimilation experiments. There is a small improvement for discharge when head is assimilated (DA_HLoc). The experiment DA_HSMLoc_DV with both variables being assimilated provides better results overall.

Spatial RMSD between assimilated and deterministic models in the Ahlergaarde catchment: soil moisture at 2.5 cm depth (upper left) and 22.5 cm depth (upper right), groundwater head at layer 4 (lower left) and layer 5 (lower right). The observation locations at each layer are marked with violet crosses.

Top: discharge at the Ahlergaarde catchment outlet (station 250082) for each experiment and observed discharge. Bottom: actual evapotranspiration in each experiment and observed evapotranspiration at the observed station (Voulund) at Ahlergaarde catchment.

This study has investigated assimilation of soil moisture and groundwater head in an integrated hydrological model. To the best of our knowledge, this is the first study using an ETKF to assimilate these two variables in an integrated hydrological model. The method considers both distance and variable localization. The proposed method is first explored for a catchment with synthetic data and then applied to a complex model using data from real observations.

The MIKE SHE model is used as the integrated hydrological model throughout this study. In the MIKE SHE model, the saturated and unsaturated zones are explicitly coupled. This is done to optimize modelling time steps used in the unsaturated zone (minutes to hours) and saturated zone (hours to days) respectively. The flux between the unsaturated and saturated zones is calculated by an iterative procedure that conserves mass for the entire column. This means that assimilation of soil moisture may have an effect on groundwater and vice versa through this explicit coupling. However, this study shows relatively weak correlations between surface soil moisture and groundwater head in the MIKE SHE model through assimilation. First, the univariate assimilation improves the state of the variable being assimilated, but does not improve the other variable. This can be seen from the experiments in both catchments. Second, in multivariate assimilation, when the complete state error covariance of both variables is used for updating and spurious correlations are not cut off by localization, the filter failed to provide a reasonable result. This indicates that the unrealistic inter-variable and cross-variable correlations may exist in the model ensemble. In a similar study, Camporese et al. (2009b) showed the EnKF assimilation of surface soil moisture can actually improve the saturated zone and assimilation of groundwater head can also improve surface soil moisture, where the saturated and unsaturated zones are based on solving the 3-D Richards equation for the entire subsurface.

Quantitative performance measures for evapotranspiration (ET) and discharge for each experiment in the Ahlergaarde catchment.

In the assimilation set-up, a hybrid localization scheme which consists of variable localization and distance localization has been developed and implemented in the ETKF. Localization not only provides better results, but also reduces the computational cost, as only a section of the full state is used within the filter. Similar localization approaches have been reported in hydrological models with discharge involved (Li et al., 2013) as well as in other models (e.g. Kang et al., 2011). Other approaches to deal with the potential inter-variable spurious correlation include for example adaptive localization (Rasmussen et al., 2015) and using two iterative filters instead of one filter (Gharamti et al., 2013). The method used here proved to be suitable for assimilating both groundwater head and soil moisture in integrated hydrological models, and has the potential to be generalized to deal with other processes.

The impact of assimilation on discharge and evapotranspiration is analysed in the Ahlergaarde catchment with real measurements as a reference. Neither the discharge nor evapotranspiration were included in the filter state vector. However, through integrated hydrological modelling, the discharge is improved when head is assimilated, and evapotranspiration is improved when soil moisture is assimilated. Although the improvements seem to be marginal, we nevertheless see the benefits in other modules in MIKE SHE when improving the estimate of groundwater head and soil moisture.

Increasing the ensemble size is beneficial in general, especially for estimating unobserved and un-localized variables. This is because an increased ensemble size can better describe the true correlation in the state error covariance matrix. The effect of ensemble size has also been widely reported in previous studies, e.g. Xie and Zhang (2010). However, the balance between the assimilation result and the computational cost is usually considered when choosing the appropriate ensemble size for heavy models. This is an important issue for the Ahlergaarde model as the computational expenses here become substantial. Due to the time and resource limitation, the choice of ensemble size for the Ahlergaarde model is not analysed in the study, but will certainly be essential for real-time applications in future studies. In addition, the multivariable assimilation could be extended with remote sensing soil moisture and other important hydrological variables (e.g. discharge) that are not included in this study.

The hydrological model forcing data (temperature, precipitation and reference
evapotranspiration) are from the Danish Meteorological Institute
(

The study has been carried out with the support of the Danish Council for
Strategic Research as part of project HydroCast – Hydrological Forecasting
and Data Assimilation, contract no. 0603-00466B
(