Data assimilation in integrated hydrological modelling in the presence of observation bias

. The use of bias-aware Kalman ﬁlters for estimating and correcting observation bias in groundwater head observations is evaluated using both synthetic and real observations. In the synthetic test, groundwater head observations with a constant bias and unbiased stream discharge observations are assimilated in a catchment-scale integrated hydrological model with the aim of updating stream discharge and groundwater head, as well as several model parameters relating to both streamﬂow and groundwater modelling. The coloured noise Kalman ﬁlter (ColKF) and the separate-bias Kalman ﬁlter (SepKF) are tested and evaluated for correcting the observation biases. The study found that both methods were able to estimate most of the biases and that using any of the two bias estimation methods resulted in signiﬁcant improvements over using a bias-unaware Kalman ﬁlter. While the convergence of the ColKF was signiﬁcantly faster than the convergence of the SepKF, a much larger ensemble size was required as the estimation of biases would otherwise fail. Real observations of groundwater head and stream discharge were also assimilated, resulting in improved streamﬂow


Introduction
Sequential assimilation of observations in models is a widely used method in several fields, including meteorology and hydrology.The method has repeatedly been shown to improve forecasting performance, reduce uncertainty and optimize parameter values, and is still a topic subject to ongoing research.
Data assimilation in hydrological models has been studied in a number of settings, from single process models, modelling only a limited part of the hydrological cycle (e.g.Franssen et al., 2011;Albergel et al., 2008;Moradkhani and Sorooshian, 2005), to integrated models incorporating all the relevant processes including precipitation, evapotranspiration, recharge and streamflow (e.g.Camporese et al., 2009;Shi et al., 2014;Rasmussen et al., 2015).The latter presents a number of challenges that have yet to be comprehensively addressed; particularly relating to the differences in process timescales, e.g. between groundwater flow and surface runoff, and the coupling between these processes.An integrated approach to hydrological modelling is, however, important in many applications due to the exchange of water between the hydrological components; thus, it remains important to explore these aspects.In Camporese et al. (2009), the ensemble Kalman filter (EnKF) was applied to an integrated model of a synthetic tilted v-catchment and both stream discharge and groundwater hydraulic head observations were assimilated to update both groundwater and stream states.Shi et al. (2014) applied the EnKF to an integrated land surface hydrological model of a small catchment and, using seven different observation types, successfully es-timated six parameters and sequentially updated the model states.Rasmussen et al. (2015) used the ensemble transform Kalman filter (ETKF) to assimilate groundwater head and stream discharge in a catchment-scale integrated hydrological model for both state updating and parameter estimation.Other studies that focus on joint state updating and parameter estimation in integrated hydrological modelling include Bailey and Baù (2012), in which a smoother was used to calibrate hydraulic conductivity using streamflow and head observations, and Kurtz et al. (2013), which used head observations to calibrate heterogenous riverbed conductivities.
Biases in both models and observations pose challenges to data assimilation in hydrology, and have previously partly been studied (e.g.Dee and da Silva, 1998;Dee, 2005;Reichle et al., 2004;Lannoy et al., 2007;Bosilovich et al., 2007).Bias is found in all components of the hydrological cycle, and take a variety of forms.Notable examples are model bias stemming from model structure or parameter errors, and observation errors, which is due to the difference in scale between point observations and gridded model variables.The latter is a significant source of bias in many groundwater models, as the horizontal discretization of the models is often large.If one is to update the groundwater head in a hydrological model using sequential data assimilation, this observation bias must be taken into account.
While the EnKF, and any derivation thereof, implicitly accounts for both model and observation uncertainty in the form of zero-mean white noise, model and observation biases remain an issue that requires modifications to the filter.A few methods have been developed that attempt to estimate biases online, and they have been applied successfully in many settings.With few exceptions, the bias aware filters can be grouped in two: separate filter methods and augmented state methods.The separate-bias Kalman filter (SepKF) (e.g.Dee and da Silva, 1998;Pauwels et al., 2013;Drecourt et al., 2006) uses a second Kalman filter for updating the biases.This second filter is independent from the filter that updates the states, and the method can therefore not account for correlation between states and biases.Alternatively, augmenting the state space with bias estimates (e.g.Derber and Wu, 1998;Dee, 2005;Drecourt et al., 2006;Fertig et al., 2009) allows the filter to account for the correlation between states and biases, and is therefore useful when the bias is dependent on the observed values.While most implementations of bias estimation assume that the model is unbiased and that the observations are biased, or vice versa, Pauwels et al. (2013) presented a method for estimating both model bias and observation bias simultaneously using a double SepKF.
This study uses both a synthetic test set-up and real observations to test the application of bias correction to a data assimilation framework that assimilates groundwater head and stream discharge observations in an integrated hydrological model for joint state updating and parameter estimation.We discuss the challenges associated with observational bias in hydrological data assimilation for both state updating and pa-rameter estimation.Two existing methods of estimating observation bias, the SepKF and the augmented state vector approach, are tested and the results compared.The novelty of the study lies in the focus on data assimilation bias estimation in a complex, integrated hydrological model as well as the impact of bias on parameter estimation in both synthetic test and using real-world observations.While each of these aspects have previously been studied individually the combination of the aspects creates new challenges, which require particular attention.This paper shares several similarities with the preceding Rasmussen et al. (2015), notably the model catchment and set-up, but differs in the focus on bias and the application of data assimilation to real-world observations.Rasmussen et al. (2015) presents a synthetic study of data assimilation in integrated hydrological modelling in which the filter performance as a function of ensemble size is investigated and the current paper expands on this and adds the complexity of bias estimation and real-world data.

Model
This study uses a transient, spatially distributed hydrological model based on the MIKE SHE code (Graham and Butts, 2005).This code considers all major components of the land phase of the hydrological cycle and the code allows the hydrological components to be dynamically coupled, meaning that feedback (i.e.exchange of water) between the processes is possible at each time step.The feedback is of particular importance for the groundwater-stream interaction in areas where these processes are closely linked, and it makes the model code particularly suited for investigation of data assimilation in integrated hydrological modelling.The coupling between the unsaturated and groundwater zones in MIKE SHE is complicated, as the processes in the two zones are interdependent.When water is exchanged from the unsaturated zone to the groundwater, the groundwater table rises, thereby changing the flow of the unsaturated zone.This complex interdependence is in MIKE SHE simplified, as the two processes only exchange water at every time step of the groundwater model.As the time step of the groundwater model is often much longer than the time step of the unsaturated zone model, the groundwater table is kept constant during several unsaturated zone time steps.This may lead to water balance errors, and in an attempt to reduce these errors, MIKE SHE has a coupling control that adjusts the groundwater table and recalculates the unsaturated zone states if the water balance error is above a user-specified threshold.
An integrated model, which includes groundwater flow, vadose zone flow, evapotranspiration, surface flow and streamflow is used in this study.Vertical groundwater flow components are neglected in the study and groundwater flow Hydrol.Earth Syst. Sci., 20, 2103-2118, 2016 www.hydrol-earth-syst-sci.net/20/2103/2016/ is simulated based on the 2-D Boussinesq equation.Each numerical element of the groundwater flow model is coupled to a one-dimensional (1-D) model for vertical flow in the vadose zone.For numerical and computational convenience capillary forces are neglected and only gravity-driven flow is considered, which is an option in the MIKE SHE code (Graham and Butts, 2005).Streamflow is simulated using the kinematic routing option.The stream network model is set up using an alternating calculation scheme in which discharge and water level is calculated respectively in alternating points, and is independent from the groundwater model in discretization, but exchanges of water between the two processes is made available in model grids of the two processes that physically overlap.The exchange takes place at every groundwater model time step.Evapotranspiration is modelled using the Kristensen and Jensen (1975) model.The groundwater model initial conditions are based on an extended warm-up of the model, win which a quasi-steadystate develops, while the streamflow initial conditions were calculated automatically assuming a steady-state condition.
A horizontal grid size of 1 km × 1 km is used, with a vertical discretization of the unsaturated zone gradually increasing from 0.05 m at the top to 1 m below a depth of 10 m.Further details of the MIKE SHE model application to the Karup catchment can be found in Rasmussen et al. (2015).

The Karup catchment
This study is based on the Karup catchment (Fig. 1), which is located in the northern part of the Danish Jutland peninsula.The catchment has an area of 440 km 2 and agriculture is the dominant land use, while the geology is dominated by highly permeable quaternary sand.It is a very flat catchment, with a gentle south-north slope ranging from 93 m a.s.l. in the southern part to 22 m a.s.l. in the northern part.The Karup river is the primary drainage feature and it springs in the southern part and exits in the northern edge of the catchment.Along its path, the Karup river is joined by seven smaller tributaries.The flat topography and sandy sediments implies that the Karup river is primarily groundwater fed, which emphasizes the importance of an integrated approach to the hydrological modelling of the catchment, as the exchange between the groundwater and the river is a predominant process of the hydrological response of the catchment.

Model parameterization
The geological model used in this study is a 3-D model, which contains one dominant geological unit (meltwater sand) and five lenses (clay, quartz sand, mica clay/silt and limestone), each with assigned parameters of hydraulic conductivity, specific yield and specific storage.The geological model is in a preprocessing step converted into a 2-D model by interpolating the parameter values and gridding them to the computational grid, resulting in a spatially variable field of hydraulic conductivity.The parameter values of the stream model are assumed uniform throughout the model domain.The drain level and drain time constant parameters control the amount of groundwater drained to the nearest stream once the groundwater table exceeds the drain level, and are as such linking the groundwater module and the streamflow module of the model.This models the artificial drain systems installed under most farmlands as well as the natural drainage processes that often occur in the topsoil, and the parameters are therefore particularly important for the drain flow of the river.The leakage coefficient is another coupling parameter, which represents the hydraulic properties of the thin layer of the sediments at the bottom of the stream.This parameter is of particular importance with regard to river base flow.For more details of the model parameterization, reference is made to Rasmussen et al. (2015).

Ensemble transform Kalman filter
The algorithm used for assimilating data in this study is the ETKF (Bishop and Hodyss, 2009), which is a square root formulation of the EnKF.It is more computationally efficient than the EnKF and is furthermore deterministic, meaning that the observations are in the filter not perturbed, which reduces www.hydrol-earth-syst-sci.net/20/2103/2016/ Hydrol.Earth Syst.Sci., 20, 2103-2118, 2016 the issues introduced by sampling.While the ETKF was first presented by Bishop and Hodyss (2009), the implementation used in this study is that of Harlim and Hunt (2005).Vectors of the forecasted state variables of the ensemble members are structured in an m × k matrix, X f , where m is the number of states and k is the number of ensemble members: A s × k matrix Y f of model observations (s is the number of observations) is formed by applying a linear operator H that maps the state space into observation space to each column of X f .This matrix is averaged over the columns to form a s × 1 vector of mean model observations, y f , which is then columnwise subtracted from Y f to form the s × k matrix of model observation anomalies, Y b .Next, X f is averaged columnwise to form the m × 1 vector of mean model states x f and this vector is subtracted from each column of X f to create an m × k matrix of model state anomalies X b .An k × s matrix, C, is computed as follows: where R is a s × s matrix of observation error covariance, and P obs is a s × s diagonal matrix with the localization weights of each observation on the diagonal.The k × k error covariance matrix is computed by where I is a k × k identity matrix.The k × k matrix of analysis error covariance is computed as The k × 1 vector w a is calculated as where y is a s × 1 vector of observations, and y b is a s × 1 vector of the mean model observations.w a is then added each column of W a , forming the k × k matrix of updated error covariance W .The m × k matrix is calculated: Finally, the updated model ensemble, X u , is calculated by adding x b to each column of X c .

Localization
Rasmussen et al. (2015) showed that the common distancebased localization methods do not suffice for localization in integrated hydrological models; instead an adaptive localization method first developed by Miyoshi (2010) will be used.
Rather than removing correlation based on the physical distance from an observation, this localization method is a combination of cross-validating the sample correlation (as estimated from the ensemble) and raising the correlation coefficient to a power in an attempt to distinguish true correlation and spurious correlation.As each state vector member is analysed in the ETKF, the ensemble of model observations (i.e. the ensemble of model states in the observation locations) is generated, and the sample correlation coefficient between each of the model observations and the state member is determined.The localization weights of the observations to the state member being analysed are then calculated from the correlation coefficients as follows.
For each state variable, the ensemble is split into two subensembles of equal size.The sample correlation between the state variable and each observation state variable is calculated for both sub-ensembles.These correlation coefficients are then combined using the following expression: where p obs,a is the localization weight, c 1 and c 2 are the correlation coefficients from the first and second sub-ensembles, and a is a constant used for tuning the localization.Another localization weight, p obs,b , is determined using the sample correlation coefficient for the entire ensemble, c, and another tuning constant, b, as follows: The final (applied) localization weight, p obs (Eq.2), is calculated as the product of p obs,a and p obs,b .Rasmussen et al. (2015) found that parameter values of a = 2 and b = 2, produced the lowest root-mean-square error in the groundwater head in a similar model, and these parameter values will also be used in this study.

Parameter estimation with the ETKF
Parameters are in this study estimated sequentially using the augmented state vector approach (Liu and Gupta, 2007;Rasmussen et al., 2015).The state vectors (Eq. 1) are extended to also contain the parameters that are to be estimated: where f i is the set of parameters used to propagate the ith ensemble member.The mapping matrix H is extended according to Eq. ( 10).
x = H • X f

Inflation
In order to compensate for the systematic underestimation of error variance that is endemic to ensemble-based Kalman filtering, covariance inflation (Anderson and Anderson, 1999;Whitaker and Hamill, 2012;Shi et al., 2014) was applied to both the groundwater head states and the stream discharge states.The inflation is applied by adding a percentage to the ensemble of forecast anomalies: where α is the inflation factor.The inflation factor used in this study is 0.2, which is based on tests of different inflation factors, and has been shown to help maintain a good spread of the ensemble of states.
The ensemble of parameter values is also inflated using Eq. ( 11) but instead of using a constant inflation factor, the inflation factor for the ensemble of parameter values is calculated at each update and for each parameter to match a target spread (as described by the standard deviation): where σ is the standard deviation.σ target denotes the desired spread of the ensemble of parameter values and σ forecast denotes the spread of the ensemble before updating.This method is only applied if the forecast standard deviation of the ensemble of parameters is smaller than the target standard deviation, which in this study is set to 10 % of the initial standard deviation of the ensemble.This value has shown to produce the best results, by maintaining a sufficient spread that does not create instabilities in any of the ensemble members.
Using covariance inflation is, like using localization, inconsistent with the deriviation of the filter and only necessary due to inadequate or incorrect noise description and ensemble generation.However, due to the complex nature of the model, Generating an ensemble that perfectly represents the uncertainty of the model is difficult and particularly in the test using real data outside the scope of this paper.

Damping
A simple damping mechanism is implemented in the modelling framework to reduce the magnitude of the state-and parameter updates and thereby reduce the shock introduced to the system in the form of instantaneous changes of model states and parameter values at the time of updating.Furthermore, damping has the same effect as inflation, as it helps maintain an ensemble spread and thus combats the tendency for the ensemble to collapse.Damping of parameter updates is common, and has been studied in Franssen and Kinzelbach (2008) and used in Rasmussen et al. (2015).
Damping is pragmatically applied post-updating as follows.For each ensemble member, the post-damping and final state vector is calculated as where D is a user specified m × 1 vector of damping coefficients.Note that the values in D may vary depending on the variable type (i.e.hydraulic head, stream discharge or water level) or parameter type.A damping coefficient of 0.1 was used for all parameters in all scenarios studied, while different damping coefficients for the states have been analysed in the tests described below.

Bias estimation
This study compares two different methods for estimating observation bias: the coloured noise Kalman filter (ColKF) and the SepKF.The ColKF methodology for estimating bias follows that of Fertig et al. (2009), in which the biases are estimated online by augmenting the state vector, in a similar way as for estimating parameters.That is, the augmented state vector, which contains both states and parameter values is further augmented by an ensemble of observation biases as follows: where β f i is the set of observation biases of the ith ensemble member.The linear operator H is modified such that when it is applied to the columns of X f , the bias is added to the appropriate model observations: where x is the ith unbiased model observation.Note that a constant bias forecast model is used, meaning that where the super script "u" indicates an updated value, and t refers to the time step.This study assumes no bias in discharge observations, meaning that the only biased observations are the groundwater head observations.In real-world observations, discharge observations would usually also be biased, but this bias is generally small compared to the random error of the observations and compared to biases in groundwater head observations.
The method requires an initial bias estimate based on a priori information.Furthermore, as with estimation of parameters, a spread in bias estimates needs to be generated.In this study, the initial estimate of bias in all observation points is generated by sampling from a normal distribution with a standard deviation of 0.6 m and a mean of 0. The standard deviation vas chosen based on precursive testing in the synthetic test environment, that showed that this value generally led to the best estimates of bias.
The implementation of the SepKF in this study is similar to the one derived and presented in Drecourt et al. (2006) but modified to estimate observation bias rather than model bias and to be implemented for use in a square root formulation of the filter.The bias filter is a discrete filter that is coupled to the ensemble-based filter used for updating the states and the parameters as follows.The forecasted model observation error covariance, P is estimated from the ensemble of anomalies: The bias error covariance is estimated as being proportional to the ensemble model observation forecast error covariance, P, through a parameter γ (0 ≤ γ ≤ 1): where γ is a tuning parameter that controls the fraction of information from the observations that is used to bias and for states respectively.Tests using different values of γ revealed that this parameter had little impact on the final estimated bias, but a value of 0.1 was chosen for this study, as it performed slightly better than other values tested.The bias error covariance is furthermore conditioned to the assumption of no bias in discharge observations.The Kalman gain for the bias filter is then calculated as The bias Kalman gain is localized as follows: where L is a s × s matrix containing the localization weights for each state as determined by the adaptive localization algorithm.The updated biases are calculated as Finally, the updated states are calculated using the following modification of Eq. ( 5): The augmented state method has the advantage that it can take any interaction between the bias and the states into account, as the full forecast covariance matrix is used.On the other hand, the SepKF filter ignores any cross-correlation between bias and states.
While ignoring the correlation between state error and bias error may be problematic where such correlation exists, the price of using the augmented state method is the increase in the state space that needs to be spanned by the ensemble.To describe the uncertainty of the augmented state, an (m + p + s) × (m + p + s) (states, parameters and observations) covariance matrix is necessary, while an (m + p) × (m + p) plus a (s × s) matrix is necessary for the SepKF.This is likely to increase the required ensemble size when using the augmented state method and thus increase computational demands.

Asynchronous assimilation
Due to the differences in frequency between the two observation types, this study uses asynchronous assimilation (Sakov et al., 2010).This way, the more frequent stream discharge observations can be assimilated along with the less frequent groundwater head observations, without the states having to be updated each time a discharge observation is available.The state vector is extended with model results for asynchronous observation times and the observation vector is extended with the asynchronous observations.After that, the asynchronous observations and model results are simply treated as normal model states.The information contained in the extensions are then used to improve the update at the time of updating.This is done by saving the individual observations and the model results for the time steps in which observations are available, and calculating the ensemble error as if the model results for the different time steps are model states.So, given a set of j observations at times t 1 , . . ., t j collected, the model observations is formulated as follows: Similarly, the observation vector is extended to correspond to the ensemble observations.While the asynchronous observations and model observations are saved and used in the filter at the time of updating, they are afterwards discarded and no retrospective updating of states is performed.

State variables
In this study, the state vector contains groundwater head, stream discharge and stream water level, all of which are updated at each updating time step.The states are updated every 4 weeks, when groundwater head observations are available.The daily discharge observations available in between updates are included as asynchronous observations while the discharge observations available at the time of updating are assimilated normally.

Estimated parameters
The horizontal hydraulic conductivities of meltwater sand (HK_mws) and quaternary sand (HK_qs) are estimated, with the vertical conductivities tied to them at a ratio of Hydrol.Earth Syst.Sci., 20, 2103Sci., 20, -2118Sci., 20, , 2016 www.hydrol-earth-syst-sci.net/20/2103/2016/ 10 : 1.Note that the estimated hydraulic conductivities are those of the geological units, that are gridded to the computational grid before further propagation of the ensemble (see Sect. 2.2.2).Furthermore, the two parameters that control drainage, the drain level and the drain time constant, are estimated, and so is the leakage coefficient, which controls the groundwater-streamflow interaction.These parameters were selected based on their scaled sensitivities as determined by using the AUTOCAL software (Madsen, 2003), with HK_mws being by far the most sensitive towards both streamflow and groundwater head.For a full list of sensitivity coefficients, the reader is referred to Rasmussen et al. (2015).HK_mws, HK_qs, the drain time constant and the leakage coefficient were transformed logarithmically, as their uncertainty is expected to span several decades.

Inverse modelling
In order to evaluate the performance of the data assimilation algorithm for parameter estimation using real observations, the model is also calibrated using AutoCal in order to be able to compare the parameter estimation through data assimilation with parameter estimation through more common method, such as inverse modelling.A multi-objective calibration approach is used, in which both groundwater head observations and stream discharge observations are aggregated and optimized.The set-up of parameters is similar to the one used in the data assimilation approach (see Sect. 2.4.2), with the same variable and dependent parameters and initial values, in order to make the results of the inverse modelling and the data assimilation directly comparable.
Root-mean-square error is used as objective function of both groundwater head observations and stream discharge observations, and the two are aggregated using transformation to a common distance scale (Madsen, 2003).Both objective functions are weighted equally in the aggregation, to ensure an equal importance on optimizing both the streamflow and the groundwater head of the model.

Data availability
Between 1970 and 1990, the Karup catchment was the subject of an extensive monitoring campaign in which stream discharge and groundwater head were rigorously measured.As a result, groundwater head observations are available in 35 locations (Fig. 1) with a frequency of 14 days −1 , and daily stream discharge observations are available in four locations in the stream network.

Synthetic test observations
A twin test approach is used in the first part of this study, meaning that a "true" model is defined, and that the observations to be assimilated are generated from the results of this true model.The same model, but with perturbed parameter values, denoted the base model, forms the basis of the ensemble that is used for data assimilation.Note that both the true model and the base model are deterministic models, that is, single, propagated models without any noise added.The set-up is identical to that of Rasmussen et al. (2015), and the reader is referred thereto for a detailed description and a list of parameter values.Groundwater observations are made available at 24 locations that form a subset of the 35 locations in which real observations are available (Fig. 1).The reason for omitting some of the observation locations is that they are located too close to the stream network, and act as an exchange between the groundwater model and the stream model.It was found that the groundwater head of these grid cells are very sensitive to the streamflow simulation, and small changes in the head lead to significant changes in the streamflow.As such, they are not suitable for assimilation and were used only as validation observations.Furthermore, one observation did not reflect the dynamics of the model due to its proximity to the model boundary and was therefore omitted.In the twin test experiment the groundwater observations are generated with a frequency of 28 days −1 and are added a time-varying, normally distributed white noise with a standard deviation of 0.05 m and each are added a randomly generated (normally distributed) constant bias with a standard deviation of 0.5 m.
Four stream discharge observations that coincide with the locations of real observations are included.The discharge observations are made available on a daily basis, and are added to a normally distributed white noise that is proportional to the observed value using a standard deviation of 5 % of the observed discharge, which is a common error observed in real-world observations of discharge (Herschy, 1999).
The states and parameters are updated every time groundwater head observations are available, i.e. every 28 days, and the daily discharge observations available in between updates are assimilated asynchronously.Tests have shown that the length of the assimilation window is of little importance and therefore no other assimilation window was tested.

Real observations
Like in the synthetic test, the same 24 groundwater head observation locations are chosen for assimilation, while the remaining locations are used for validation.The real groundwater head observations are available with a frequency of 14 days −1 , but to avoid updating the states and parameters too often every other observation is assimilated asynchronously, allowing an assimilation window of 28 days like in the synthetic test.All four discharge observation locations are used for assimilation and are assimilated asynchronously.

Model noise
Model noise is added to the ensemble through the forcings, i.e. precipitation and reference evapotranspiration, and the parameters.Noise on forcings is added as a Gaussian noise with a standard deviation of 20 % of the observed value, while no spatial correlation of the noise is considered.
Noise is added in the form of a Gaussian zero mean distribution to a large number of model relating to all model processes and not just to the estimated parameters.In total noise is added to 66 parameters, only five of which are estimated.Adding noise to parameters that are not estimated helps maintain the spread of the ensemble even as the spread of the estimated parameters is reduced.Note that the zero mean of parameter noise means that if the filter successfully estimates all of the five included parameters, the ensemble of models is unbiased except for any bias there may have been introduced through the sampling of parameter noise and forcing noise.

Test scenarios
For studying the performance of the data assimilation using synthetic observations, the study includes the seven scenarios listed in Table 1.All scenarios include bias estimation, joint state updating and parameter estimation and simultaneous assimilation of groundwater head and stream discharge observations.
When assimilating real observations, three scenarios are studied: ColFil and SepFil and NoBiasEst (Table 2).The ColFil uses the ColKF, a damping factor of 0.1 and an ensemble size of 200, making it a combination of the Ens200 and Hdampen scenarios studied in the synthetic test.The SepFil uses the SepKF and an ensemble size of 100.The increase in ensemble size used when using real observations is due to the more complex nature of the model and observation error caused by differing dynamics of the observations and the model.For comparison, the NoBiasEst scenario uses no bias estimation.

Performance indicators
The model simulation period is from 1 January 1968 to 31 December 1973, and is divided into the following periods: -1969: warm-up, in which the ensemble is propagated without being updated in order to allow a spread in the ensemble of states to develop.At the end of the year 1969, the spread of the ensemble of groundwater head is between 2.1 and 0.7 m (depending on the location in the catchment), which is considered sufficient for assimilation to commence.
-1970: preliminary assimilation of observations, which allows the filter to constrain the states and parameters.The results of this period are not included in the performance evaluation.
- [1971][1972]: assimilation of observations for evaluation.The results of this period are included in the performance evaluation as an indicator for how well the filter performs.In the remainder of the report described as the "assimilation period".

Synthetic test performance indicators
The performance of the filter when using synthetic observations is measured using three indicators: the mean estimated bias error (mean bias error), calculated as the average difference (in all observation points) between the actual bias used to generate the biased observation and the mean of the ensemble of estimated biases at the end of the assimilation period; the average root-mean-square error of the groundwater head (head RMSE) in all calculation points of the groundwater model domain for the assimilation period; -The Nash-Sutcliffe coefficient of the stream discharge at the outlet of the catchment ("NS") for the assimilation period.

Real data performance indicators
The performance of the filter when using real observations is measured using two indicators: the mean RMSE of all 35 groundwater head observation points for a. the assimilation period b. the validation period.
-The Nash-Sutcliffe coefficient for stream discharge in the outlet of the catchment for a. the assimilation period b. the validation period.
Furthermore, a deterministic model with the optimal parameter set (as determined by the data assimilation algorithm) is used to evaluate the estimated parameters.This model is designated "optimal model" and is evaluated using the above indicators.For comparison, the results of the optimized model using AUTOCAL is included (hereafter designated "AutoCal model").

Bias correction using the coloured noise filter
The filter set-up that is considered the baseline set-up is ColFilEns50 in which the ensemble size is 50 and the parameter updates are dampened by a factor of 0.1, while no damping of the state updating is performed.The baseline setup is adopted from Rasmussen et al. (2015) as this set-up performed satisfactorily for the same catchment and similar number of observations.However, Rasmussen et al. (2015) did not consider bias correction.
The ColFilEns50 performed poorly in all three performance indicators as seen in Fig. 2. The average error in estimated bias is 0.47 m; worse than the average absolute bias of the observations (0.38 m), and the filter often estimates a bias that is in the wrong direction.This suggests that better, or at least similar poor results, could be obtained by not correcting the bias.Furthermore, the updating of groundwater head is often erroneous, as evident from the spikes in groundwater head RMSE (Fig. 3) that occur at the time of updating.This wrong updating may be explained by two issues: the wrongly estimated bias, which compels the filter to update the states wrongly as it does not know the unbiased observations, or the appearance of spurious correlation.Rasmussen et al. (2015) observed the same spikes in head RMSE when using unbiased observations and concluded that they are caused by spurious correlation.
The poor performance of the ColFilEns50 is unexpected, as an almost identical set-up was successfully used in Rasmussen et al. (2015), albeit using unbiased observations.However, adding bias correction to the filter increases the state space that must be spanned by the ensemble, thus potentially requiring a larger ensemble size.
Doubling or quadrupling the ensemble size to 100 and 200 (ColFilEns100 and ColFilEns200 scenarios, respectively) resulted in major improvements in almost all indicators (Fig. 2).In terms of estimating bias, the error is reduced by approximately 50 % to 0.24 and 0.22 m respectively, and the head RMSE is reduced by 26 and 31 %.However, as visible in Fig. 3, incorrect state updates still occur even with an ensemble size of 200.As shown by Rasmussen et al. (2015), these spurious correlations are likely to result in increased drainage to the stream model, resulting large errors in streamflow.The errors from spurious correlations in the streamflow model dominate the performance indicator and are, due to the nature of spurious correlation, random.As a result, the Nash-Sutcliffe coefficient is reduced when using an ensemble size of 100, but increased with an ensemble size of 200.
The increased performance, and the reduction in the spikes in head RMSE, supports the hypothesis that the poor performance of the ColFilEns50 set-up is caused primarily by spurious correlation.
Dampening the update of groundwater head (ColFilHdamp scenario) had a profound effect on all the performance indicators (Fig. 2).The mean bias error is reduced by 63 % compared to the baseline set-up, and the NS is nearly doubled.Finally, the head RMSE is reduced by 28 % to 0.32 m, which is higher than what is obtained by increasing the ensemble size or retuning the localization algorithm, but still a significant improvement.
Dampening reduces the instant change in groundwater head, and as such reduces the problems that arise due to the non-linear relationship between states as well as reducing spurious correlation.Furthermore, it reduces the numerical effects that come from changing model states and parameters, in which the model attempts to regain equilibrium.However, dampening the state updates causes a slower reduction in head RMSE (Fig. 3), the value approximately converges to the RMSE of Ens100 and Ens200 within 1 year of assimilation.

Bias correction using the SepKF
Using the SepKF (scenario SepFil) resulted in significant improvements over the ColFilEns50 set-up in all performance indicators compared to the ColKF set-up with the same number of ensemble members (ColFilEns50) (Fig. 2).The mean bias error is reduced to 0.20 m, which is comparable to ColFilEns200 and ColFilHdamp set-ups and little drifting behaviour is observed in the model (Fig. 4).NS is increased to 0.75, and head RMSE is reduced to 0.34.

Excluding discharge observations
When excluding the discharge observations (scenario SepFil-NoQ), the filter performs worse in all three indicators.Compared to the SepFil scenario, both the mean bias error and the head RMSE is increased by 58 %, and the NS is reduced to −0.74.The reduction in NS is explained by a bias in the estimated drain constant (Fig. 5) and by a poorer description of the groundwater head level as indicated by the head RMSE.Rasmussen et al. (2015) showed that discharge observations are particularly valuable for estimating parameters and updating stream discharge, but less valuable for groundwater head updating.They found that excluding discharge observation resulted in an improvement in groundwater head description when the spatial coverage of groundwater head observations is good, as there is a trade-off between optimizing the streamflow and the groundwater head.However, the current results suggest that discharge observations also helps improve the estimation of groundwater head observation bias and consequently of the groundwater heads.

Bias-unaware filter
Excluding bias estimation from the filter (NoBiasEst scenario) results, as expected, in significant reductions in filter performance (Fig. 2).This scenario may be considered as having an estimated bias of zero, and as such has a mean bias error of 0.44 m (i.e. the average absolute bias used to generate the observations), which lead to an increase in head RMSE of 6 % over the already poorly performing ColFilEns50 scenario and 50 % over the SepFil scenario.Furthermore, the NS was reduced to zero due to erroneous updates of the groundwater head and poorly estimated parameters; in particular the drain level and the drain constant (see Fig. 5).The omission of bias estimation also resulted in significant (and expected) gradual deviation from the updated level (i.e.drifting) as seen in Fig. 4. When considering the predictive power of the model, the bias-unaware filter is also more likely to estimate biased groundwater heads (Fig. 6).This figure indicates that particularly in the observation point "Well 8", the bias unaware will consistently forecast a too low groundwater head, and, as seen in Fig. 4, this is to a great extent caused by the biased filter updates.
It is clear that omitting bias estimation when biases are present has a negative impact on both state updating and parameter estimation.It is observed that updating the groundwater head to a biased observation level causes the head to return to an unbiased level when model propagation is resumed (i.e. it is drifting as seen in Fig. 4).The model behaviour becomes unnatural in the sense that it is not controlled primarily by the input forcings, but rather by the model trying to retain equilibrium.This can result in deteriorated estimation of parameters and updates of model states not only in the observation points but also in the entire model domain., but as Fig. 7 shows, there are significant differences in the estimation of bias in most observation locations.The ColKF converges significantly faster than the SepKF to the true value in most locations where the bias estimation is successful, due to the inclusion of bias-state correlation in the ColKF.The SepKF also underestimates the bias in some locations, most likely due to the simplifications and assumptions, notably the assumption that the bias error covariance is proportional to the state error covariance.
Figure 4 shows that the drifting behaviour is generally most pronounced in the NoBiasEst scenario and least pronounced in the ColFIlEns200 scenario, with the drifting behaviour of the SepFil scenario falling in between the two former scenarios.Both the ColKF and the SepKF reduce the bias error in most locations except in wells 39, 54 and 63.The erroneous bias estimation may be because of the estimated parameter values.Visual inspections of the groundwater head as a function of time (Fig. 4) reveal that there is no significant systematic deviation from the updated level (i.e.drifting) in the ColFilEns200 and SepFil and therefore no update of the observation bias in the filter.The lack of model drifting despite erroneous bias estimation is caused by the wrongly estimated parameters, and as such this is an equifinality issue: The filter has been able to produce nondrifting behaviour of the model despite biased states, by using a biased parameter set.On the other hand, the NoBiasEst displays significant drifting in wells 8, 39 and 63, even when the updated states are unbiased (well 39) but as the filter is unaware of bias, this is not corrected.
The improvements gained from using the SepKF filter rather than the ColKF stem from the reduction in uncertainty needed to be described by the ensemble, and thus a smaller ensemble size is required.Ignoring the correlation between the bias and the state reduces the complexity of the system, and if that correlation is negligible, as in this case, there is little advantage in using the ColKF over the SepKF.
The two bias correction methods were also compared in Drecourt et al. (2006) using a simple 1-D groundwater model.While they did not consider the issue of ensemble size, they too found that both the ColKF and the SepKF can successfully estimate biases and improve model forecasting abilities.They also noted that the convergence of the SepKF is slower than the convergence of the ColKF, but the performances of the two methods were otherwise comparable.

Real data tests
The Nash-Sutcliffe coefficient for stream discharge and the mean RMSE of groundwater head can be seen in Fig. 8.When comparing to the base values data assimilation with the separate-bias filter (scenario SepFil), the coloured noise filter (scenario ColFil) and the bias-unaware filter (scenario NoBiasEst) all result in increased Nash-Sutcliffe coefficients and reduced mean head RMSE in the assimilation period.
In the NoBiasEst scenario, the model states are forced to match the observations as any bias is ignored, which results in a lower mean head RMSE in both the assimilation and the validation period (Fig. 8).However, the assumption of unbiased head observations results in the NoBiasEst scenario having the lowest Nash-Sutcliffe coefficients of the three scenarios due to a trade-off between stream discharge observations and groundwater head observations, and it results in the drifting model behaviour apparent in Fig. 9, in which the model deviates strongly from the observed level in between updates.In SepFil, bias estimation is included using the SepKF, which results in a higher Nash-Sutcliffe coefficient and a comparable head RMSE to that of the NoBias-Est scenario.The effect of the bias estimation can be seen in Fig. 9 as the filter does not update the groundwater head to the level of the observation but acknowledges a bias, which results in less drifting between updates compared to the No-BiasEst scenario.However, the deviation is still significant, which indicates that the bias is underestimated for this observation point.This is in line with the synthetic tests, where it was observed that the SepFil tends to underestimate large biases.
The ColFil scenario results in higher mean head RMSE and slightly lower Nash-Sutcliffe coefficient than the SepFil, but the ColFil optimal model (i.e. the deterministic model us- ing the parameter set estimated by the filter) performs better than the SepFil optimal model with respect to most indicators.
The ColFil scenario estimates significantly larger biases in most observation points (Fig. 10), with an average absolute estimated bias of 0.63 m, compared to 0.19 m in the Sep-Fil scenario.With few exceptions, SepFil estimates a smaller bias than the ColFil, though in most cases in the same direction.Different bias directions are estimated by the two filters in two of the 24 observation locations, as illustrated in Fig. 9, which may be caused by significant differences in the estimated parameter values (see Fig. 11), as is also seen in the synthetic test.
A bias of approximately zero is estimated in seven observation locations, while biases of up to 1.8 m are estimated in others.In most locations, however, the bias appears underestimated, as exemplified by Fig. 9.This underestimation is observed as drifting and is likely caused by two factors.For the SepKF, the update of bias is constrained by the γ parameter, meaning that a too low value of γ may limit the update too much and thereby make the filter unable to estimate the correct bias, while a too high γ value is likely to yield unstable bias estimates.A test was made using a γ value of 0.3 (the value used in SepFil is 0.1), which resulted not only in increases in the estimated biases, but also in unstable bias estimates that changed significantly with each time step as the filter did not properly distinguish biases, random error and model dynamics.Furthermore, as more and more observations are assimilated and the spread of the ensemble of states is reduced, the update of the biases is smaller as the bias error covariance is assumed proportional to the state error covariance.If the ensemble spread of states is reduced too much, or even collapses, before correct biases are estimated, the bias estimation effectively stops.A similar consideration is applicable to the ColKF, as the ColKF operates with an ensemble of biases, and the spread of the ensemble of biases (and thereby the bias error covariance) is independent of the ensemble of states.If the spread of the ensemble of biases is too small, bias estimation effectively stops.Comparing the optimal models of the ColFil, the SepFil and the NoBiasEst with the base model and the AutoCal model reveals a clear difference between the assimilation period and the validation period.While the optimal models produce lower NS for the assimilation time than both the base model and the AutoCal model, there is a clear improvement in the NS in the validation period over both the Auto-CalModel and the base model.This suggests that AutoCal has produced a biased parameter set, which is not the case using any of the three Kalman filters.However, the value of bias correction for parameter estimation is unclear, as there is no significant difference in the validation NS of the biasaware Kalman filters and the bias-unaware Kalman filter.
This tendency is not present in head RMSE, where the optimal models perform more poorly in terms of head RMSE than the base model and the AutoCal model.While it is to be expected that the AutoCal model would produce lower head RMSE than both the ColKF and the SepKF since the AutoCal model has been optimized specifically based on the head RMSE, it was expected that the optimal models of the ColKF and SepKF would produce improvements over the base model.However, it should be noted that the eval- uation of model performance is based on the possibly biased observed values, and that the estimated biases have not been taken into account in the head RMSE calculations.The lack of clear improvement in the optimal models may be explained by the fact that there is little room for improvement with the current model structure as underlined by the relatively small improvements between the AutoCal model and the base model.It may also in part be explained by the underestimation of the biases in both the ColFil and SepFil scenar-ios.Improving the model structure and the filter set-ups may improve the potential of estimating parameters, but with the current results the value of data assimilation for parameter estimation is not clear.

Conclusions
Observation bias is a notable challenge in integrated hydrological modelling and needs to be addressed when applying data assimilation to the models.Updating the states of a model to match strongly biased observations will decrease filter performance and may even cause numerical instability.The two methods for correcting observation bias presented in this study can help reduce the bias issue in data assimilation and improve filter performance.Both methods improved the groundwater head and stream discharge of the model, and with varying degrees of success estimated the observation bias when using synthetic observations.When using real observations, both bias estimation methods resulted in improved streamflow modelling, but little improvement was seen in groundwater heads.
The main difference in the bias correction methods analysed is the interaction between the bias and the states.While the ColKF takes advantage of the full covariance matrix, the SepKF only takes into account the interaction that is present from the state to the bias and not the other way around.While this is a limitation of the SepKF, it results in a lower requirement for ensemble members, meaning that for smaller ensembles, the SepKF outperforms the ColKF.To obtain similar results to those of the SepKF when using the ColKF, the ensemble size needed to be doubled or even quadrupled, or the updates of the states needed to be dampened in an attempt to reduce the spurious correlations.
Most of the model parameters were successfully estimated in the synthetic tests, but biased observations introduce issues with equifinality.A biased parameter set may produce unbiased model behaviour (i.e.without drifting) in one or more observations even if the estimated bias is incorrect.As a result, the filter does not update the bias of the observation, and the erroneous parameter set is not corrected.This resulted in significantly different parameter sets estimated by the different filters for both the synthetic tests and the tests using real data.
The study has shown that hydrological observational bias can be corrected in a data assimilation scheme and that it can improve state updating and parameter estimation.With both model bias and observational bias being significant sources of error in hydrological modelling that may function as a road block for the application of data assimilation to hydrological models, these results may act as a stepping stone for the advancement of hydrological data assimilation in large-scale, integrated hydrological models.
Edited by: I. Neuweiler

Figure 1 .
Figure 1.The Karup catchment with locations of discharge and hydraulic head observations.

Figure 2 .
Figure 2. Mean bias error, NS and H RMSE for the years 1971-1972 in the synthetic test.

Figure 3 .
Figure 3.The temporal variation of Head RMSE in the synthetic test.

Figure 4 .
Figure 4. Groundwater head as a function of time in four selected observation locations for the year 1972 (synthetic test).

Figure 5 .
Figure 5. Spread of estimated parameters at the final update (synthetic test).Thin blue lines show the total spread of the ensemble and thick blue lines show the 25th and 75th percentile.Dots show the mean of the ensemble.The horizontal lines show the true parameter value (black line) and the base parameter value (magenta line).

Figure 6 .
Figure 6.Model observations versus synthetic observations in selected observation locations.The dashed line indicates the 1 : 1 line when corrected for the applied bias.Note that the plotted model observations are the forecasted model observations, i.e. before the states are updated in the filter.
4.1.5Comparison of the ColKF, the SepKF and the bias-unaware filter The time-varying estimated biases using the ColKF and the SepKF for each observation location are shown in Fig. 7.The figure compares the ColFilEns200 and the SepFil scenarios, as they are the most easily comparable in terms of set-up and performance.Both scenarios have comparable mean bias error (0.22 and 0.20 m for ColFilEns200 and SepFil, respectively)

Figure 7 .
Figure 7.Estimated bias in the ColFilEns200 and SepFil scenarios as a function of time in the synthetic tests, compared to the true bias value used to generate the biased observations.

Figure 8 .
Figure 8. Nash-Sutcliffe coefficient for stream discharge (left panel) and mean RMSE of groundwater head observations (right panel) in the assimilation and validation periods, respectively (real data).

Figure 9 .
Figure 9. Groundwater head as a function of time in head observation location well 64 (real data).

Figure 10 .
Figure 10.Estimated bias in the ColFilEns200 and SepFil scenarios as a function of time (real data).The black line indicates zero bias.

Figure 11 .
Figure 11.Spread of estimated parameters at the final update (real data).Thin blue lines show the total spread of the ensemble and thick blue lines show the 25th and 75th percentile.Dots show the mean of the ensemble.The horizontal lines show the AutoCal parameter value (black line) and the base parameter value (magenta line).

Table 1 .
Overview of set-ups studied in the synthetic tests.

Table 2 .
Scenarios studied in the real data tests.