On the efficiency of the hybrid and the exact second-order sampling formulations of the EnKF: A reality-inspired 3D test case for estimating biodegradation rates of chlorinated hydrocarbons at the port of Rotterdam

In this study, we consider the assimilation problem of subsurface contaminants at the port of Rotterdam in the Netherlands. This involves the estimation of solute concentrations and biodegradation rates of four different chlorinated solvents. We focus on assessing the efficiency of an adaptive hybrid ensemble Kalman filter (EnKF-OI) and the exact second-order sampling formulation (EnKF ESOS) for mitigating the undersampling of the filter covariance and the observation errors, respectively. A multi-dimensional and multi-species reactive transport model is coupled to simulate the migration of contaminants within a Pleistocene aquifer layer located around 25 m below mean sea level. The biodegradation chain of chlorinated hydrocarbons, starting from Tetrachloroethene and ending with Vinyl Chloride, is modelled under anaerobic environmental conditions for five decades. Yearly concentration data is used, in a synthetic setup, to condition the forecast concentrations and degradation rates in the presence of model and observational errors. Assimilation results demonstrate the robustness of the hybrid EnKF for accurately calibrating the uncertain biodegradation rates. When implemented serially, the adaptive EnKF-OI scheme efficiently adjusts the weights of the involved covariances for each individual measurement. The relevance of the EnKF ESOS is emphasised by a better maintenance of the parameter ensemble spread. On average, a well-tuned hybrid EnKF-OI and the EnKF ESOS suggest around 48% and 21% improved concentration estimates, and around 70% and 23% improved anaerobic degradation rates, respectively, over the standard EnKF.

The contaminant data collected in 2012 by the municipality of Rotterdam is used for initialising the contaminant migration, which propagates to surface and deep aquifer layers (≈ 50 m below sea level). The goal of the study is to use "synthetic" CH concentration data on a yearly basis, for a total of 50 years, to calibrate four biodegradation rates of the reaction chain. To the best of our knowledge, this is the first study in which biodegradation parameters of a reductive dechlorination process are estimated in a real-world system using a sequential DA procedure. On top of the biodegradation rates, the concentrations of the components are also constrained using the EnKF, the hybrid EnKF and the EnKF ESOS schemes. Sensitivity analyses are performed to study the efficiency and the accuracy of the assimilation schemes under different experimental settings. The filtering schemes are evaluated based on the accuracy of the estimated solute concentrations, the handling of the posterior distributions of the biodegradation rates, and computational complexity.
The rest of this paper is organised as follows. Section 2 presents the ensemble assimilation methods. Section 3 describes the large-scale subsurface reactive transport model and its numerical implementation. Section 4 presents the assimilation setup and the experimental scenarios. Results of assimilation experiments are presented and analysed in Section 5. Conclusions and further discussion are given in Section 6.

The Data Assimilation Framework
The aim of DA is to combine measured observations and a dynamical model in order to compute the best possible estimates of the past, current and future states of the system, together with estimates of the associated uncertainties (Nichols, 2010). We follow the standard discrete nonlinear dynamical system:

x_{k+1} = M_k(x_k, Θ_k) + η_{k+1}, (1)

where x_k ∈ R^{N_x} denotes a state vector of N_x variables at time t_k, Θ_k ∈ R^{N_Θ} is the vector of model parameters, and M_k : R^{N_x} → R^{N_x} is the nonlinear operator that propagates the model state from t_k to t_{k+1}. η_{k+1} ∈ R^{N_x} is a model error (system noise) accounting for model uncertainties. It is commonly assumed that η_{k+1} follows a Gaussian distribution N(0, Q_{k+1}). The measurements obey the following observational system:

y_{k+1} = H_{k+1}(x_{k+1}) + ε_{k+1}, (2)

where y_{k+1} ∈ R^{N_y} is a vector of N_y observations at time t_{k+1}, and H_{k+1} : R^{N_x} → R^{N_y} is an observational map, including grid interpolations, that could be nonlinear. The observation errors ε_{k+1} ∈ R^{N_y} are Gaussian with zero mean and covariance R_{k+1}. We assume independent model and observation errors.
Following the Bayesian filtering problem, the objective is to evaluate the joint probability density function (pdf), i.e., p(x_k, Θ_k | y_{0:k}), of the system state x_k and the parameters Θ_k given all available observations y_{0:k}. The observations, y_{0:k}, are used to update the model forecast. The updated estimate is then used to compute a future prediction. Likewise, the estimation problem can also be tackled using variational approaches that involve minimisation of a cost function (Dimet and Talagrand, 1986; Courtier et al., 1994; Hoteit et al., 2005; Altaf et al., 2013). Variational DA techniques, such as 3DVar and 4DVar, are widely used in geoscience applications. These methods look for an optimal state trajectory that best fits observational data over a time window, but do not offer an efficient framework for quantifying uncertainties in the solution. In this study, we will only consider the sequential Bayesian filtering problem.

The computation of p(x_k, Θ_k | y_{0:k}) is not feasible in real applications owing to the nonlinear character of the model and observation operators, in addition to the very large dimension of the subsurface flow and transport system. The ensemble Kalman filter (EnKF) is an efficient Monte Carlo method, which computes an approximation of the joint pdf at a reasonable computational cost. The EnKF represents the distribution of the system using a collection of joint state and parameter vectors, called an ensemble. We follow the state-parameters augmentation procedure (Annan et al., 2005) and denote by ψ the jointly concatenated state and parameter vector. The parameters are time-invariant, so that their time-propagation function is simply the identity operator.
To illustrate, starting at time t_{k−1} from an analysis ensemble, {ψ^{a,i}_{k−1} = [x^{a,i}_{k−1}; Θ^{a,i}_{k−1}]}_{i=1}^{N_e}, which represents p(ψ_{k−1} | y_{0:k−1}), the EnKF propagates the dynamical model (1) to compute the forecast ensemble at the time of the next available observation, t_k. Incoming measurements are then used to update the joint ensemble. The EnKF algorithm is summarised below.

- Forecast Step: The analysis members are integrated forward in time to obtain the forecast ensemble, from which we estimate the first two moments as follows:

ψ̄^f_k = (1/N_e) Σ_{i=1}^{N_e} ψ^{f,i}_k, (3)

P^f_k = (1/(N_e − 1)) Σ_{i=1}^{N_e} (ψ^{f,i}_k − ψ̄^f_k)(ψ^{f,i}_k − ψ̄^f_k)^T = [ P^f_xx, (P^f_θx)^T ; P^f_θx, P^f_θθ ]. (4)

The joint sample covariance P^f_k consists, as shown in (4), of the sample state covariance P^f_xx, the state-parameters cross-correlation P^f_θx, and the sample parameters covariance P^f_θθ matrices. The joint state-parameters forecast estimate (mean) is denoted by ψ̄^f_k. The complexity of the forecast step grows with the ensemble size. If one supposes that C_M is the cost of integrating the model to the next observation time, the computational requirement of the forecast step is N N_e C_M, where N is the number of assimilation cycles up to the final simulation time (Gharamti et al., 2014a).
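To make the forecast-moment computation concrete, the following minimal Python sketch forms the joint sample mean and covariance of an augmented state-parameter ensemble and extracts the blocks appearing in (4). The dimensions here are toy values, not the application's (which uses N_x = 288,000 and N_Θ = 4):

```python
import numpy as np

rng = np.random.default_rng(0)
Nx, Ntheta, Ne = 6, 2, 48        # toy sizes for illustration only

# Forecast ensemble of augmented state-parameter vectors psi = [x; theta]
Psi_f = rng.normal(size=(Nx + Ntheta, Ne))

# Sample mean (Eq. 3) and joint sample covariance (Eq. 4)
psi_bar = Psi_f.mean(axis=1, keepdims=True)
A = Psi_f - psi_bar                      # forecast perturbation matrix
P_f = A @ A.T / (Ne - 1)                 # joint sample covariance

# Blocks of P_f: state-state, parameters-state cross, parameters-parameters
P_xx = P_f[:Nx, :Nx]
P_thetax = P_f[Nx:, :Nx]
P_thetatheta = P_f[Nx:, Nx:]
```

Note that the perturbation matrix A, rather than the full P_f, is what a large-scale implementation would store, since P_f itself is too large to form explicitly when N_x is in the hundreds of thousands.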
- Analysis Step: When the observation y_k becomes available, the joint forecast members ψ^{f,i}_k are updated using the Kalman update step, i.e.,

ψ^{a,i}_k = ψ^{f,i}_k + K_k (y_k + ε^i_k − H_k ψ^{f,i}_k), (5)

where K_k = P^f_k H^T_k (H_k P^f_k H^T_k + R_k)^{−1} is the Kalman gain, and the analysis state (ensemble mean) is:

ψ̄^a_k = ψ̄^f_k + K_k (y_k + ε̄_k − H_k ψ̄^f_k). (6)

The observation perturbations, denoted by ε^i_k, are sampled from a Gaussian distribution of zero mean and covariance R_k. The observational operator H_k = [H_k, O], acting on the augmented state-parameter vector, is assumed linear for simplicity, and the matrix O is a zero matrix (the parameters are not directly observed). Computationally, the update step in hydrological applications is usually less demanding than the forecast step, with a complexity of N N_e N_y N_x + N N_e^2 (N_x + N_Θ). The observations used in the update equation (5) are processed in one single batch. In our implementation, however, we will consider the serial EnKF update formulation, in which the observations are assimilated one at a time. The reason for this will become clear in section 2.3.
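A minimal sketch of the serial stochastic update for the case where H is a simple selection operator (one scalar observation at a time, as in our serial implementation); the function name, toy dimensions, and scalar observation-error variance are illustrative, not from the paper:

```python
import numpy as np

def enkf_update_serial(Psi_f, obs, obs_idx, r, rng):
    """Serial stochastic EnKF update (one scalar observation at a time).

    Psi_f   : (N, Ne) augmented forecast ensemble
    obs     : observed values
    obs_idx : index of the observed component of psi for each observation
              (i.e., H is a selection operator in this sketch)
    r       : scalar observation-error variance
    """
    Psi = Psi_f.copy()
    Ne = Psi.shape[1]
    for y, j in zip(obs, obs_idx):
        A = Psi - Psi.mean(axis=1, keepdims=True)   # current perturbations
        Ph = A @ A[j] / (Ne - 1)                    # P^f H^T for one scalar obs
        d = Ph[j] + r                               # innovation variance H P^f H^T + R
        K = Ph / d                                  # Kalman gain (a vector here)
        eps = rng.normal(0.0, np.sqrt(r), Ne)       # observation perturbations
        Psi += np.outer(K, y + eps - Psi[j])        # member-wise update, Eq. (5)
    return Psi

# Toy usage: one nearly noise-free observation of component 2 pulls the ensemble to it
rng = np.random.default_rng(1)
Psi_f = rng.normal(size=(5, 40))
Psi_a = enkf_update_serial(Psi_f, [3.0], [2], 1e-9, rng)
```

Because the perturbations and gain are recomputed after each scalar observation, the serial loop is algebraically equivalent to the batch update when R_k is diagonal.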

EnKF Limitations
The performance of the EnKF strongly depends on the accuracy of the forecast error covariance matrix P^f. The errors in P^f are essentially due to: (1) model errors and the use of small ensemble sizes, and (2) propagation of errors in the sample covariance matrix P^a at the previous step. The Gaussian assumption of the system's distribution is also a limiting factor, but this was proven to be less problematic (e.g., Frei and Künsch, 2013).
The main advantage of the ensemble approximation (Eqs. 3 and 4) is that it does not involve any linearisation and makes it possible to represent the first two moments of the state and parameters by an ensemble of vectors (Evensen, 2003). The use of large ensembles is practically not possible, and thus the sample covariance P^f_k may not well approximate the forecast covariance P^f_k of the KF. As such, the joint forecast pdf of the system's state and parameters at any time t_k is only partially sampled, which means that there exists a null subspace in the error space that is not covered by the ensemble (Song et al., 2010; Mandel et al., 2011). To mitigate this, we will use a hybrid formulation of the forecast state and parameter statistics before performing the EnKF update (e.g., Wang et al., 2007). Further details are given in section 2.2.
The limited ensemble size may also introduce noise in the update step of the EnKF when the rank of the observation error covariance is large (Hoteit et al., 2015). This is because the number of observation perturbations may not be enough to sample the observation error covariance matrix, R_k. In addition, spurious correlations between the observation and the forecast perturbations will also introduce noise in the EnKF update (e.g., Bowler et al., 2013; Hoteit et al., 2015). To illustrate, the EnKF analysis assumes zero cross-correlations between the observation perturbations and the forecast ensemble, i.e.,

Σ_{i=1}^{N_e} ε^i_k (ψ^{f,i}_k − ψ̄^f_k)^T = 0. (7)

This can be easily seen by subtracting eq. (5) from eq. (6). After arranging the terms and using eq. (4), one obtains:

P^{a,EnKF}_k = P^a_k + ∆, (9)

where ∆ is the sampling error term, not accounted for in the EnKF. Consequently, the ensemble analysis covariance matches the optimal KF covariance, P^a_k, only when the observational sampling errors and the cross-correlation terms in ∆ are indeed zero. This can be numerically achieved by assimilating the observations serially using the so-called EnKF with exact second-order perturbations sampling, EnKF ESOS, as will be discussed in more detail in section 2.3.

The Hybrid EnKF
The hybrid EnKF and optimal interpolation (EnKF-OI) was introduced as a way to account for small ensemble sizes and model deficiencies in the EnKF (Hamill and Snyder, 2000). Using small ensembles results in rank-deficient forecast covariance matrices, which strongly limit the fit to the observations (Song et al., 2010). Neglecting model errors might further produce small ensemble spread, and consequently unrealistic confidence in the forecast (Song et al., 2013). The standard solution for rank deficiency or covariance underestimation is to apply inflation and localization. Inflation artificially inflates the spread of the ensemble around the mean state (Hamill et al., 2001; Hoteit et al., 2002). It is also a simple way to account for neglected model errors (Pham et al., 1998). Covariance localization eliminates spurious correlations by a Schur product multiplication of the under-sampled covariance matrix with a function of local support (Houtekamer and Mitchell, 2001; Sakov and Bertino, 2011). Inflation and localization, although efficient and widely used (especially in atmosphere and ocean applications), are generally model dependent and require important tuning efforts. They further do not introduce any new directions to diversify the ensemble, limiting the filter update to a small-dimensional ensemble subspace (Song et al., 2010, 2013). Moreover, global model parameters are not local quantities, and therefore localization techniques might not be as straightforward (Devegowda et al., 2007). In addition, the parameters are dynamically constant quantities, and thus large ensembles are usually needed to well approximate the parameter distributions (Hendricks Franssen and Kinzelbach, 2008; Zhou et al., 2012).
The hybrid approach estimates the EnKF's forecast error covariance by a weighted sum of the ensemble covariance and a stationary covariance matrix, typically used in a variational or an optimal interpolation (OI) assimilation system. More specifically, the background state-state and state-parameters covariances are estimated as:

P^f_xx = α P^EnKF_xx + (1 − α) P^b_xx, (10a)
P^f_θx = β P^EnKF_θx + (1 − β) P^b_θx, (10b)

where P^EnKF_xx and P^EnKF_θx are the sample covariance and cross-correlation matrices of the EnKF ensemble, respectively. The static background covariances are denoted by P^b_xx and P^b_θx, respectively. It was indeed shown by Hamill and Snyder (2000) that this additional stationary background covariance may help represent part of the ensemble's null space that is not covered by the limited ensemble. This procedure is based on physically reliable, although flow-independent, statistics, unlike inflation and localization (Wang et al., 2009). The scalar quantities α and β are weighting factors, taking values between 0 and 1.
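The rank argument can be illustrated in a few lines of Python: with N_e − 1 < N_x, the sample covariance is rank deficient, while blending in any full-rank static covariance (here, an identity stand-in for P^b_xx) restores full rank:

```python
import numpy as np

rng = np.random.default_rng(2)
Nx, Ne = 5, 4                    # deliberately Ne - 1 < Nx

A = rng.normal(size=(Nx, Ne))
A -= A.mean(axis=1, keepdims=True)
P_enkf = A @ A.T / (Ne - 1)      # sample covariance, rank at most Ne - 1 = 3

P_b = np.eye(Nx)                 # full-rank static stand-in for P^b_xx

alpha = 0.7                      # weight on the flow-dependent ensemble covariance
P_hyb = alpha * P_enkf + (1.0 - alpha) * P_b     # Eq. (10a)

rank_ens = np.linalg.matrix_rank(P_enkf)         # rank deficient (Ne - 1)
rank_hyb = np.linalg.matrix_rank(P_hyb)          # full rank restored (Nx)
```

Since P_hyb is a sum of a positive semi-definite matrix and a scaled identity, it is positive definite whenever α < 1, which is exactly how the hybrid covariance reintroduces directions outside the ensemble subspace.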

Practical Implementation
The static background covariance, P^b_xx, is often built on the basis of a long inventory of forecast errors (Wang et al., 2009). It is usually assumed to be of low rank, r_x, and can be factorised into spectral modes using Proper Orthogonal Decomposition (POD) as follows:

P^b_xx = S Ω S^T,

where S is a matrix of spectral coefficients, Ω carries information about the associated spectral variances, and Ω^{1/2} is the Cholesky decomposition of Ω. The background perturbation matrix, S, has r_x columns, with r_x often smaller than the number of state variables. The background state and parameters cross-covariance, P^b_θx, can also be approximated by a low-rank, r_θ, matrix using singular value decomposition (SVD) if the number of parameters is not equal to the number of state variables (and thus the matrix P^b_θx is not square). This decomposition is used in practice in order to reduce the computational burden and memory storage. Accordingly, the complexity of the analysis step (referred to as O_a) of the hybrid EnKF-OI scheme increases only through the additional r_x and r_θ background directions. Given that r_x and r_θ are usually small in subsurface flow and transport problems (Gharamti et al., 2014b), the complexity of the analysis step of the hybrid EnKF-OI is only marginally larger than that of the EnKF. The complexity of the forecast step of the EnKF and the hybrid EnKF is the same when both are implemented with the same ensemble size.
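As an illustration of the low-rank factorisation, the sketch below builds a rank-r_x background covariance from the leading POD modes of a synthetic snapshot matrix (a stand-in for the long inventory of forecast errors); the variance profile is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
Nx, n_snap, r_x = 50, 200, 10

# Snapshot matrix standing in for an inventory of forecast errors,
# with decaying variance across the state components (arbitrary numbers)
X = rng.normal(size=(Nx, n_snap)) * np.linspace(5.0, 0.1, Nx)[:, None]
X -= X.mean(axis=1, keepdims=True)

# Leading r_x POD modes via an SVD of the snapshot matrix
U, sv, _ = np.linalg.svd(X, full_matrices=False)
S = U[:, :r_x]                                   # spectral modes (r_x columns)
Omega = np.diag(sv[:r_x] ** 2 / (n_snap - 1))    # associated spectral variances

P_b = S @ Omega @ S.T                            # low-rank background covariance

# Fraction of total variance captured by the retained modes
captured = (sv[:r_x] ** 2).sum() / (sv ** 2).sum()
```

Only S and Ω need to be stored, which is the point of the decomposition: P_b itself is never formed explicitly in a large-scale implementation.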
The weighting factors α and β in eqs. (10a) and (10b) need to be defined. Careful tuning of α and β is very important (Hamill and Snyder, 2000). The simplest way is to select them by trial and error, but this can be computationally very intensive. A more efficient approach was introduced by Gharamti et al. (2014b) and consists of optimising a one-dimensional (1D) objective function at every update step of the state and the parameters. Based on Kalman's update formulation, assimilating observations causes the uncertainties in the prior estimates to shrink. Thus, using the Kullback-Leibler (KL) divergence (Kullback and Leibler, 1951), one can choose the α and β that maximise the information gains at the analysis time t_k. In this study, we opt to assimilate the observations serially, and thus one can adaptively compute an optimal weighting factor for the state (Eq. 13), in which tr[·] denotes the trace of a matrix, d is a scalar quantity equal to the observation variance H_k P^f_xx H^T_k + R_k when assimilating one observation, and c^[m]_xx is the m-th forecast variance component corresponding to one observed variable. Similarly, one can derive the objective function for the parameters' weighting factor (Eq. 14), in which c^[m]_θx is the forecast cross-correlation component between the m-th parameter and one observed variable. Such a KL criterion describes the information gain from each individual observation, as it reflects the difference between the prior and the posterior distributions. The interesting point here is that, for each observation, different weights are assigned to the background and the ensemble statistics. The maximisation problems in (13) and (14) are 1D and bounded, yielding minimal forecast variance after the update. In terms of implementation, we perform the optimisation over the interval [0, 1] using a computationally efficient scheme combining both golden-section search and repeated parabolic interpolation (Forsythe et al., 1977).
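SciPy's bounded scalar minimiser implements this golden-section/parabolic-interpolation scheme of Forsythe et al. (1977). The sketch below applies it to a hypothetical stand-in objective (not the paper's KL criterion in Eqs. 13-14) to show the per-observation weight optimisation mechanics over [0, 1]:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def toy_objective(alpha, c_ens=2.0, c_b=0.5):
    """Hypothetical 1D surrogate for the per-observation weight objective.

    Blends an ensemble variance and a background variance for one observed
    variable and scores the blend; purely illustrative, not Eq. (13).
    """
    c = alpha * c_ens + (1.0 - alpha) * c_b       # blended forecast variance
    return (c - 1.2) ** 2 + 0.1 * (1.0 - alpha) ** 2

# Bounded Brent minimisation over [0, 1]: golden-section search combined
# with successive parabolic interpolation
res = minimize_scalar(toy_objective, bounds=(0.0, 1.0), method="bounded")
alpha_opt = res.x      # interior optimum of the quadratic toy objective
```

In the serial assimilation loop, such a bounded 1D solve would be repeated once per observation, which is cheap compared with the model integration.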

Exact Second-Order Observation Perturbations Sampling
The sampling error from neglecting the cross-correlation terms in eq. (9) in the EnKF analysis is generally not globally small. It is often composed of a large number of elements that can add up after successive assimilation steps (Hoteit et al., 2015). This may degrade the filter's accuracy and increase the underestimation of the analysis error covariance (Whitaker and Hamill, 2002). Furthermore, such sampling errors can propagate to subsequent steps, eventually deteriorating the performance of the filter.
In a mathematical sense, for the condition in eq. (7) to hold, the rank of the forecast perturbation matrix plus the rank of R_k must not exceed N_e − 1, which is essentially the rank of Ψ^f_k (Pham, 2001). Obviously, this is not possible, given that N_y + N_e − 1 is always greater than N_e − 1. Yet, if we suppose that Ψ^f_k has rank N_e − 2, then, when R_k is scalar, it is possible to draw the observation perturbations ε^i_k such that the EnKF's analysis first and second moments are exactly the same as those computed using the KF. Accordingly, Hoteit et al. (2015) proposed to remove one rank from Ψ^f_k using an SVD decomposition:

Ψ^f_k ← Ψ^f_k (I − w_k w^T_k),

where w_k is the normalised right singular vector of Ψ^f_k associated with the smallest nonzero singular value. The i-th component of w_k is denoted by w^i_k, and the symbol ← means "replaced by". Then, assimilating the observations serially and simply choosing ε^i_k = s √((N_e − 1) R_k) w^i_k, where s is an independent plus or minus sign, guarantees zero cross-correlations between the modified forecast perturbations and the observation perturbations. In the serial algorithm, the superscript [j] denotes the j-th element and row of the given vector and matrix, respectively, and the superscript [j, j] denotes the element in row and column j of the associated matrix. Note that, unlike the EnKF, the observation perturbations cannot be Gaussian because of the constraint they satisfy in eq. (7). In the experiments of Hoteit et al. (2015), these were shown to be almost Gaussian. In terms of complexity, the EnKF ESOS algorithm has almost the same computational cost as that of the serial EnKF. Additional cost is required for iteratively updating the vector w_k and performing an SVD on Ψ^f_k to reduce its rank by one. Both operations are computationally almost negligible compared to the cost of integrating the subsurface model.
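The construction can be verified numerically in a few lines: after removing the smallest nonzero singular direction from the forecast perturbations, the perturbations built from w_k are exactly orthogonal to them and carry exactly the prescribed variance (toy dimensions; a scalar observation-error variance is assumed):

```python
import numpy as np

rng = np.random.default_rng(4)
N, Ne = 8, 6                         # toy state size and ensemble size
r = 0.25                             # scalar observation-error variance (assumed)

Psi = rng.normal(size=(N, Ne))
A = Psi - Psi.mean(axis=1, keepdims=True)    # forecast perturbations, rank <= Ne - 1

# Remove one rank: project out the right singular direction with the
# smallest nonzero singular value, so that rank(A) drops to Ne - 2
U, sv, Vt = np.linalg.svd(A, full_matrices=False)
w = Vt[Ne - 2]                       # Vt's last row spans the trivial null space (sum-to-zero)
A = A - np.outer(A @ w, w)           # A <- A (I - w w^T)

# Observation perturbations: eps_i = s * sqrt((Ne - 1) r) * w_i
s = rng.choice([-1.0, 1.0])          # independent plus-or-minus sign
eps = s * np.sqrt((Ne - 1) * r) * w

cross = A @ eps                      # zero cross-correlation with the modified A
var = eps @ eps / (Ne - 1)           # exactly the prescribed variance r
```

Because w_k is orthogonal to the constant vector (the trivial null direction of centred perturbations), the perturbations also have exactly zero sample mean.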
At the port site, more than 600 companies perform various activities such as trans-shipment of containers (coal, oil, gas, etc.), storage of oils and chemicals, building and repairing ships and oil/gas rigs, inland distribution and transport, and disposal/treatment of chemical wastes. As a result of the long-term presence of these industrial activities, the soil and groundwater have become contaminated. This contamination is substantial, complex, and not limited to one particular site but affects the groundwater systems at a regional scale (Marsman et al., 2006; Ter Meer et al., 2007). Some of the contaminants are immobile, such as heavy metals including arsenic, cadmium, copper, mercury, lead and zinc. Other, mobile contaminants are mineral oils, volatile aromatics, chlorinated solvents and pesticides.

Coupled 3D Subsurface Model

Organic Contaminants
Well monitoring and laboratory analyses have shown that groundwater at the port area is contaminated, at different depths, with varying levels of pollutants (Marsman et al., 2006). Among the major contaminants are chlorinated hydrocarbons, which entered the subsurface as Dense Non-Aqueous Phase Liquids (DNAPL) and often have source zones of stagnant pure phases at considerable depth. Numerous industrial companies at the port manufacture or work with these organic molecules.
Here, we simulate the degradation chain of four CH components, namely Tetrachloroethene (PCE, a.k.a. perchloroethene), Trichloroethene (TCE), 1,2-Dichloroethene (DCE) and Vinyl Chloride (VC). We use plume data from a real site, but for confidentiality reasons we do not show the exact location of the site. The horizontal area of the domain is equal to 1.5 km², extending 1 km in the transverse direction and 1.5 km in the longitudinal direction (Figure 2). Degradation of the dissolved components takes place as chlorine atoms are successively replaced by hydrogen atoms under anaerobic environmental conditions (Vogel and McCarty, 1985; Clement et al., 2000; Tobiszewski and Namieśnik, 2012). Chlorinated hydrocarbons can pose a serious threat to human and environmental health (Ojajärvi et al., 2001; Lee et al., 2002, 2003).

Flow-Transport-Reaction Model (FTR-Model)
The subsurface model consists of three major components, namely flow, transport and reactions. First, the groundwater flow (assumed steady) is solved on a rectangular domain using MODFLOW (Harbaugh, 2005). Next, MT3DMS is used to solve the advection-dispersion based transport of the components (Zheng and Wang, 1999), in which the degradation process of the components is added based on the module within the 3D multispecies reactive transport package RT3D (Clement, 1997). The software packages are integrated in a sophisticated Fortran-based tool (with a graphical interface) called iMOD (Vermeulen et al., 2013).
In differential form, the fate and transport of the components is modelled following:

(1 + ρ_b k^[c]/φ) ∂C^[c]/∂t = ∇ · (D ∇C^[c]) − (1/φ) ∇ · (ν C^[c]) + (q_s/φ) C^[c]_s + r^[c]_C,

where φ is porosity, ρ_b is the bulk density of the soil, k is the distribution (sorption) coefficient, C is the solute concentration, λ is the first-order reaction rate, D consists of hydrodynamic dispersion and molecular diffusion, ν denotes the Darcy velocity, q_s is the volumetric source/sink flow rate, C_s is the source/sink flux concentration and r_C refers to the rate of reactions.
The superscript [c] corresponds to the component number, taking values between 1 and 4 in this study. Along with the basic groundwater flow and transport equations, and using the reaction operator-split strategy (Clement et al., 1998), the reaction kinetics are assembled as a set of ordinary differential equations as follows:

R_P dC_PCE/dt = −K_P C_PCE,
R_T dC_TCE/dt = S_T/P K_P C_PCE − K_T C_TCE,
R_D dC_DCE/dt = S_D/T K_T C_TCE − K_D C_DCE,
R_V dC_VC/dt = S_V/D K_D C_DCE − K_V C_VC,

where C_PCE, C_TCE, C_DCE, and C_VC are the concentrations of the components, K_P, K_T, K_D, and K_V are first-order anaerobic degradation rate constants, S_T/P, S_D/T, and S_V/D are stoichiometric yield values, and R_P, R_T, R_D, and R_V are retardation factors. Linear sorption conditions are assumed for all components.
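A sketch of the degradation chain as a standalone ODE system, using the paper's reference rate constants; the stoichiometric yields (taken here as molar-mass ratios) and the retardation factors are illustrative assumptions, not the paper's calibrated values:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Reference degradation rate constants from the paper (per day)
K = np.array([0.068, 0.086, 0.004, 0.153])   # K_P, K_T, K_D, K_V

# Assumed stoichiometric yields (molar-mass ratios TCE/PCE, DCE/TCE, VC/DCE)
S = np.array([0.79, 0.74, 0.64])             # S_T/P, S_D/T, S_V/D
# Assumed retardation factors (illustrative only)
R = np.array([1.5, 1.3, 1.2, 1.1])           # R_P, R_T, R_D, R_V

def chain_rhs(t, C):
    """Sequential first-order decay chain PCE -> TCE -> DCE -> VC."""
    dC = np.empty(4)
    dC[0] = -K[0] * C[0]
    for i in (1, 2, 3):
        dC[i] = S[i - 1] * K[i - 1] * C[i - 1] - K[i] * C[i]
    return dC / R

C0 = np.array([100.0, 0.0, 0.0, 0.0])        # start with PCE only (ug/l)
sol = solve_ivp(chain_rhs, (0.0, 365.0), C0, rtol=1e-8, atol=1e-10)
C_end = sol.y[:, -1]
```

With these rates, mass accumulates in DCE after a year, since its degradation constant K_D is by far the smallest of the four, which foreshadows the DCE plume behaviour seen in the reference run.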
The model domain, indicated by the blue region of Figure 2, is discretised horizontally into 20×30 grid cells of 50×50 m. In the vertical direction, we consider 120 layers, each of 0.5 m thickness. The discretisation is based on the geological voxel model GeoTOP (Stafleu et al., 2011). The top layer starts at 7.5 m above sea level, whereas the lowest layer is located at around 52.5 m below sea level. Based on different simulations conducted as part of this study, the migration of the contaminants was found to be limited to a certain depth. We thus assume that only layers 21−100 are active. Furthermore, the PCE plume is used as a continuous contamination source and was included in the Source/Sink Mixing (SSM) package of the MT3DMS simulator. To date, other time-series and well contaminant data are not accessible due to confidentiality imposed by local companies. Modelling parameters required for running the coupled FTR-Model, such as porosity and distribution coefficients, are defined, based on real data and laboratory assessment, as 3D heterogeneous fields. In Table 1, we report the mean value (averaged over all layers) for some of these parameters. As an illustration, we show in Figure 3 the spatial map of the distribution coefficient of TCE averaged over the top 10 layers. The map shows larger sorption degrees in the northeast part of the domain, gradually decreasing towards the southern region.

Reference Run and Pseudo-Observations
In the scope of twin experiments, we first conduct a reference model run using some "true" (reference) parameters and initial conditions. Next, we impose different uncertainties on the model and the initial conditions, and we assimilate pseudo-observations extracted from the reference run to recover the "true" trajectory of the model. The goal is to estimate the concentration of chlorinated hydrocarbons (i.e., state variables) and their associated degradation rates (i.e., parameters). Based on the model's configuration, the dimension of the state is N_x = 288,000 and the number of parameters is N_Θ = 4. The reference values of the anaerobic degradation rates are obtained through field and laboratory testing (Suarez and Rifai, 1999) and are given as K_P = 0.068, K_T = 0.086, K_D = 0.004, and K_V = 0.153 per day. We perform the reference run for a 50-year period using these rates along with the parameters of Table 1 and the initial conditions, x_0, as defined above. We plot the resulting hydraulic head field at four different layers (i.e., 30, 50, 70, and 90). Following this steady flow, we then simulated the reference transport of PCE, TCE, DCE, and VC. The time step of the transport-reaction simulator was about 11 days. To visualise the migration process, we show in Figure 5 the contaminant evolution of PCE, TCE, DCE and VC in layers 40, 60, 80, and 100 after 50 years, respectively. As shown, the contaminant plume, which is originally present in layer 60, has moved into deeper Pleistocene layers. After 50 years, the maximum concentration of DCE in layer 80 reaches 650 µg/l. Careful assessment of the transport process shows that the four plumes have reached the last active layer in the second aquifer, i.e., layer 100. This is mostly due to the continuous PCE contamination source located in layer 60. Contaminant concentrations in the top Holocene layers are much smaller.
Laterally, the contaminant plume is seen to expand from its initial location to a distance of 1.3 km southwards.
From the reference run, we collect pseudo-observations of the concentration to use later for assimilation. Observations are assumed available for all components from layers 30, 50, 70, and 90. From each of these four layers, only 10 data points are collected, giving a total of N_y = 160. Note that N_y is much smaller than the number of state variables, N_x. This is usually the case in subsurface hydrology applications, given the significant cost incurred for preparing, drilling, and completing wells. The observation points are uniformly distributed throughout the domain, as denoted by green triangles in Figure 2. We assume that these 160 measurements are available for assimilation on a yearly basis. We also place a control well in layer 70 around the center of the domain, particularly at the local coordinates x = 450 and y = 600 m, to monitor the concentration evolution in time. We further assume that these observations are noisy, in order to mimic realistic settings.
We thus perturb them with a Gaussian noise of mean 0 and standard deviation equal to 15% of the total observation mean (averaged over the entire 50 years). During assimilation, the updated concentration values are monitored to make sure that they are non-negative. Cell values that fall outside this constraint are set to zero to obtain a physically meaningful solution (Li et al.,
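The observation-perturbation and non-negativity steps above can be sketched as follows (the reference concentrations here are synthetic stand-ins, not data from the reference run):

```python
import numpy as np

rng = np.random.default_rng(5)

# Stand-in reference concentrations at the Ny = 160 observation points
true_obs = np.abs(rng.normal(200.0, 80.0, size=160))

# Zero-mean Gaussian noise with std equal to 15% of the overall observation mean
sigma = 0.15 * true_obs.mean()
y = true_obs + rng.normal(0.0, sigma, size=true_obs.shape)

# Non-negativity constraint: negative concentrations are set to zero
y = np.maximum(y, 0.0)
```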

Initial Ensemble and Background Statistics
To initialise the filters, we perform an unconditional 50-year simulation run (referred to as the free run), starting from the mean concentration of the reference model run. Thus, at time t = 0 the concentration of the four CH components is not only present in layer 60, but rather spread out over all layers. In this free model run, we use around 30% larger degradation rates than the reference values. The concentration of the components was saved every 6 months. Next, we randomly select a set of N_e concentration snapshots from the free-run outputs to form the state ensemble. This prior (initial) ensemble is quite far from the truth and, further, has a relatively small spread. This is chosen on purpose, to test the robustness of the assimilation schemes under challenging initial uncertainties. The initial parameters ensemble is sampled assuming a Gaussian distribution with mean equal to the reference rate values and a variance of 40%.

The background error statistics required for the EnKF-OI scheme are parameterised as follows. We form a set of 200 degradation realisations, as described above, and use these to perform 3-month forecasts starting from a series of initial concentrations distributed at 3-month intervals over a 50-year period, as outlined in Figure 6. To illustrate, starting from the mean concentration of the reference run, one realisation of the degradation rates Θ_0 is used to obtain a 3-month forecast of the concentration x_1. Then, using x_1 and Θ_1, the 3D-FTR model is integrated forward to obtain the concentration x_2 after 6 months. We continue this process until the end of the 50-year period. Then, we collect the predicted contaminant states for all components and augment them with the corresponding degradation rates in a joint matrix form. POD and SVD are then performed on the augmented concentration-degradation forecast perturbations to summarise the correlations by a small number of orthogonal patterns (Hoteit et al., 2002; Skachko et al., 2009; Altaf et al., 2013). Consequently, the parameterisation of the background covariance matrix, P^b_xx, is achieved using the leading 10 POD modes (i.e., r_x = 10) of this ensemble, which capture more than 98% of the total variance. Concerning the background cross-correlations, we use the first 10 singular vectors (thus, r_θ = 10) to parameterise the P^b_θx matrix. To visualise these correlations, we plot in Figure 7, as an illustration, the spatial correlation between the rate at which PCE degrades and the concentration of the end product of the chain, VC. Clearly, the VC concentration in layer 60 exhibits the largest correlation values because of the continuous source term. Furthermore, since the groundwater flow is stronger in the downward direction, and so is the contaminant migration, the cross-correlations in deeper layers are larger than those of the shallow layers.
The background cross-correlations P^b_θx seem to vanish in the upper parts of the Holocene clay and peat layer. A consistent behaviour is observed for the remaining three degradation rates, for which the largest correlations are those associated with C_PCE and the smallest with C_TCE. PCE has the highest correlation because of its continuous source zone: any removal of PCE due to biodegradation in the source zone is directly replenished at the next time step, and therefore K_P determines the total load of chlorinated hydrocarbons in the system. On the other hand, TCE has the lowest correlation because its degradation rate constant is high, which makes the parameter relatively insensitive to the amount of biodegradation taking place compared with the other degradation rate constants.
State and parameter estimates of the EnKF, hybrid EnKF-OI and EnKF ESOS schemes are evaluated using two metrics, namely the mean-squared error (MSE) and the average ensemble spread (AES):

MSE = (1/N) Σ_{i=1}^{N} (x̄^e_i − x^t_i)²,
AES = (1/(N N_e)) Σ_{i=1}^{N} Σ_{j=1}^{N_e} (x^e_{j,i} − x̄^e_i)²,

where x^t_i is the true value of the variable at location i, x^e_{j,i} is the forecast ensemble value and x̄^e_i is the corresponding ensemble mean. MSE measures the distance from the estimate to the truth, and AES measures the spread, or the uncertainty, of the estimates (Hendricks Franssen and Kinzelbach, 2008).
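A small sketch of the two metrics, with a worked two-location, two-member example:

```python
import numpy as np

def mse_aes(ens, truth):
    """Mean-squared error of the ensemble mean (MSE) and average ensemble spread (AES).

    ens   : (N, Ne) ensemble of estimates at N locations
    truth : (N,)    reference ("true") values
    """
    mean = ens.mean(axis=1)                   # ensemble mean at each location
    mse = np.mean((mean - truth) ** 2)        # distance to the truth
    aes = np.mean((ens - mean[:, None]) ** 2) # spread around the ensemble mean
    return mse, aes

# Worked example: the ensemble means equal the truth exactly, so MSE = 0,
# while each member deviates by 1 from its mean, so AES = 1
ens = np.array([[1.0, 3.0],
                [2.0, 4.0]])
truth = np.array([2.0, 3.0])
mse, aes = mse_aes(ens, truth)
```

The example illustrates why both metrics are needed: an ensemble can be perfectly centred on the truth (zero MSE) while still carrying nonzero spread, which is exactly the uncertainty information AES tracks.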

In this section, we present and compare assimilation results with the Rotterdam port's 3D-FTR model using the standard EnKF, the hybrid EnKF-OI and the EnKF_ESOS schemes. The observations are assimilated serially in all three schemes. Concentration and degradation rate estimates of the filters are compared in terms of accuracy and spread. Concentration data are assimilated every year for a total of 50 assimilation cycles. The ensemble size, N_e, is set to 48, with batches of four members run in parallel using OpenMP in Fortran.
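Serial assimilation processes the observations one scalar at a time, so each update involves only scalar divisions rather than a matrix inversion. A stochastic-EnKF sketch of this loop, with toy dimensions and a toy observation operator (not the paper's Fortran implementation):

```python
import numpy as np

def enkf_serial(E, obs, H_rows, r_vars, rng):
    """Stochastic EnKF updating one scalar observation at a time.

    E: (n, N_e) ensemble matrix; obs / r_vars: observation values and
    error variances; H_rows: one observation-operator row per observation."""
    n, N_e = E.shape
    for y, h, r in zip(obs, H_rows, r_vars):
        A = E - E.mean(axis=1, keepdims=True)         # state perturbations
        Hx = h @ E                                    # projected ensemble, (N_e,)
        HA = Hx - Hx.mean()
        s = HA @ HA / (N_e - 1) + r                   # scalar innovation variance
        K = (A @ HA) / (N_e - 1) / s                  # Kalman gain, (n,)
        y_pert = y + rng.normal(0.0, np.sqrt(r), N_e) # perturbed observations
        E = E + np.outer(K, y_pert - Hx)              # member-wise analysis
    return E

rng = np.random.default_rng(1)
E = rng.standard_normal((5, 48)) * 2.0                # wide prior ensemble
h = np.zeros(5); h[0] = 1.0                           # observe the first variable
E_a = enkf_serial(E, obs=[0.3], H_rows=[h], r_vars=[0.01], rng=rng)
```

Because the observation error (0.01) is far smaller than the prior variance, the observed component collapses onto the observation while unobserved components are corrected only through the sample cross-covariances.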

Adjusting Concentration Statistics
To initiate the assimilation experiments, we first run the EnKF and the hybrid EnKF-OI, implemented using only state background statistics, i.e., using eq. (10a) and β = 1. We carry out 10 different experiments by changing the weighting factor α, for each individual run, from 0 to 1 with a step of 0.1. To visualise the resulting estimates, we plot the average MSE of the contaminant concentrations, averaged over the 4 components and in time, in Figure 8. As shown in the left panel of Figure 8, the most accurate concentration estimates are obtained using α = 0.7. This indicates that, out of the total forecast error variance, the best reconstruction of the reference contaminant solution is obtained when 30% of this variance is traced from the background statistics. Increasing or decreasing this background contribution (i.e., 30%) results in less accurate contaminant estimates. We also note that the least accurate estimates are those obtained when 90% of the ensemble statistics are built from the static background error covariance. On average, we found that when α takes values between 0.4 and 0.9, the EnKF-OI is 16% more accurate than the EnKF.
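The weighting experiment amounts to forming a convex combination of the two covariances. A sketch in the spirit of eq. (10a), where α weights the flow-dependent ensemble part and 1 − α the static background; both covariances here are synthetic placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)
n, N_e = 30, 48

# Flow-dependent sample covariance from a (placeholder) forecast ensemble.
E = rng.standard_normal((n, N_e))
A = E - E.mean(axis=1, keepdims=True)
P_ens = A @ A.T / (N_e - 1)

# Static background covariance (placeholder for the POD-based P^b_xx).
P_b = np.eye(n)

def hybrid_cov(alpha):
    """Eq. (10a)-style state covariance: alpha weights the ensemble part,
    (1 - alpha) the static background."""
    return alpha * P_ens + (1.0 - alpha) * P_b

P_07 = hybrid_cov(0.7)   # 30% of the variance traced from the background
```

Setting α = 1 recovers the standard EnKF covariance, and α = 0 reduces the filter to a static (OI-like) scheme.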
We also study the effect of changing α on the resulting parameters' estimates. In the right panel of Figure 8, we plot the average MSE of each individual degradation rate obtained using the EnKF and 9 different EnKF-OIs (i.e., α = 0.1, 0.2, . . . , 0.9).
We notice that the most influenced biodegradation rates are those associated with TCE and VC. In fact, K_T and K_V are the least identifiable parameters, and therefore a small difference in the estimation algorithm (i.e., hybrid EnKF-OI vs. EnKF) may lead to different estimates. In contrast, K_P and K_D are less sensitive to the weighting factor α. In accordance with the estimates of the contaminant concentrations, the best match for K_T is obtained using α = 0.7. This is not the case for the other parameters, for which α = 0.3, 1 and 0.8 resulted in the best fit to the reference degradation rates of VC, PCE and DCE, respectively. On average, K_T and K_V estimates are 13% and 35% more accurate than those of the EnKF, respectively. The key point is that complementing the state statistics, using a weighted error covariance as in (10a), not only contributes to a better retrieval of the concentration but also helps adjust the cross-correlations with the uncertain parameters. This is essentially the case when biodegradation is taking place at a higher rate, as for K_T and K_V, and thus the more information fed through the observations, the better the state-parameter cross-correlations become.
To better understand the performance of the EnKF-OI scheme, we further study how the uncertainties of these degradation rates are maintained in time. For this, we plot in Figure 9 the overall ensemble spread of the four degradation rates obtained using the EnKF and the EnKF-OI (all tested α's). As shown, the EnKF's spread around these 4 parameters is quickly reduced after the first 2 or 3 years. This rapid reduction of the ensemble spread, which is due to the relatively small ensemble size and large initial uncertainties, limits the ability of the filter to impose larger corrections in the future, eventually degrading the accuracy of the estimated parameters. In contrast, the EnKF-OI maintains larger uncertainties in time for different weighting factor values; as α increases towards 1, the performance approaches that of the EnKF. For instance, after the second year, the ensemble spread of the EnKF reaches 0.03 while it remains higher for the EnKF-OI, at 0.09, 0.07 and 0.045, respectively. This in turn helps the hybrid filter benefit more from the assimilation of concentration information, as confirmed by noticing that the spread of the hybrid filter continues to decrease after 25 years of assimilation, unlike the EnKF, which shows no signs of further corrections. All in all, selecting α = 0.7 seems to maintain enough spread for both the components' concentrations and the degradation rates, and it retains, on average, the most accurate estimates. For the sake of comparison, we refer hereafter to this scenario as EnKF-OI_{α=0.7}.

Adjusting Concentration and Parameters Statistics
In the following set of experiments, we fix α to 0.7 and focus on changing the weighting factor between the EnKF and the background state and parameters cross-correlations, i.e., β. As in the previous section, we conduct 9 experiments in which we change β between 0.1 and 0.9. Note that the larger β is, the closer the performance is to EnKF-OI_{α=0.7}. To analyse the results, we plot in Figure 10 the average MSE and AES of the chlorinated hydrocarbon concentrations. Compared to the previous runs that hybridise the state only, including background cross-correlation information slightly increases the spread around the ensemble mean of the concentration, as observed for 0.1 < β < 0.3. In terms of accuracy, varying β between 0.1 and 0.6 yields more accurate concentration estimates for all components. To illustrate, when β = 0.1 the average improvements over the EnKF and the EnKF-OI_{α=0.7} are around 50% and 32%, respectively. This vigorous performance suggests that using only 10% of the "flow-dependent" parameters' ensemble to characterise the pdf of the system is enough to outperform the EnKF.
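A sketch of how the two weighting factors enter the parameter update for a single scalar observation: α hybridises the state covariance, β hybridises the parameter-state cross-covariance, and the parameter gain combines both. All covariances below are synthetic placeholders for the POD-based statistics.

```python
import numpy as np

rng = np.random.default_rng(3)
n_x, n_theta, N_e = 30, 4, 48

# Placeholder forecast ensembles for the state and the parameters,
# with the parameters loosely correlated to the first state entries.
X = rng.standard_normal((n_x, N_e))
Th = 0.5 * X[:n_theta, :] + 0.1 * rng.standard_normal((n_theta, N_e))

Ax = X - X.mean(axis=1, keepdims=True)
At = Th - Th.mean(axis=1, keepdims=True)
P_xx_ens = Ax @ Ax.T / (N_e - 1)          # online state covariance
P_thx_ens = At @ Ax.T / (N_e - 1)         # online parameter-state cross-cov

# Offline background statistics (placeholders for the POD-based ones).
P_xx_b = np.eye(n_x)
P_thx_b = np.zeros((n_theta, n_x))

alpha, beta = 0.7, 0.1
P_xx = alpha * P_xx_ens + (1 - alpha) * P_xx_b     # state hybrid
P_thx = beta * P_thx_ens + (1 - beta) * P_thx_b    # cross-covariance hybrid

# Kalman gain for the parameters given one scalar observation of x[0].
h = np.zeros(n_x); h[0] = 1.0
r = 0.01
K_theta = (P_thx @ h) / (h @ P_xx @ h + r)
```

With β = 0.1 only 10% of the online parameter ensemble information enters the gain; the remainder comes from the offline background statistics.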
In essence, the background state and parameters cross-correlations seem to carry a sufficient description of how the degradation rates and the concentration of each component are related. Consequently, only a small portion (i.e., 10%) of the online parameters' ensemble is required to obtain an accurate biodegradation picture, while the rest of the information comes from the specified offline background statistics. This could be due to the time-independent nature of the propagation step describing the evolution of the degradation rates, thereby manifesting a minimal dependence on the online ensemble. This observation is in accordance with the well-known steady-state Kalman filter (El Serafy and Mynett, 2008), which assumes a time-invariant error covariance matrix as long as accurate spatial correlations are used within the so-called Kalman gain. Here, our experimental results suggest that the best hybrid covariance matrix for the parameters is very close to a steady-state one. However, this holds only for the parameters and not for the state, as described in section (4.1.1). Following the notation introduced earlier, we refer to this scenario, hereafter, as EnKF-OI_{α=0.7}^{β=0.1}.

To gain better insight into the suggested robust performance, we plot in Figure 11 the evolution of the concentration ensemble members for all components in time. For a fair comparison, we also plot the associated reference solution and the ensemble members of the EnKF and the EnKF-OI_{α=0.7}. As explained in section (3.4), the initial ensemble spread is clearly far from the truth.

Hydrol. Earth Syst. Sci. Discuss., doi:10.5194/hess-2016-156, 2016. Manuscript under review for journal Hydrol. Earth Syst. Sci. Published: 20 April 2016. © Author(s) 2016. CC-BY 3.0 License.

When data is assimilated into the system, all schemes tend to move closer to the truth. By the end of the 50-year period, both the EnKF and the EnKF-OI_{α=0.7} underestimate the concentrations of DCE and VC and end up with a rather small ensemble spread. The EnKF-OI_{α=0.7}^{β=0.1}, on the other hand, retains the best performance, matching the reference solution for all components. Moreover, this hybrid scheme is shown to preserve the ensemble spread around the true final concentrations. In terms of the estimated degradation rates, we show in Figure 12 that the EnKF-OI_{α=0.7}^{β=0.1} is 33%, 17%, 33% and 15% more accurate for retrieving K_P, K_T, K_D and K_V, respectively. From the images, one can see that the accuracy of the degradation rates tends to improve in time, except for K_D, which is shown to degrade for small α and large β values. To interpret this behaviour, one should recall that K_D (and also K_V) can only be estimated correctly as long as the concentration of the source component (C_DCE in this case) is accurately recovered. Before this is attained, the estimates of K_D compensate for errors in K_P and K_T.
Next, instead of manually changing the weighting factors α and β, we follow eqs. (13) and (14) and solve a 1D optimisation problem prior to assimilating the observations serially. The idea is to obtain the maximum reduction in the prior uncertainties for both the concentrations and the degradation rates. As such, different weights can be assigned to the background and the ensemble statistics. Based on this, we plot in Figure 13 the optimal α values at every assimilation cycle and for each assimilated component. When assimilating PCE, TCE and VC concentrations, the adaptive scheme tends to use the background covariance (i.e., α = 0) for almost the first 25 years. Beyond this, the filter statistics are only based on the ensemble flow-dependent information (i.e., α = 1). This is not surprising given the large initial uncertainties imposed on the contaminant concentrations.
Once the statistics are adjusted towards the truth, the filter starts trusting the information coming from the ensemble statistics.
This adaptive behaviour changes when assimilating DCE observations, in the sense that the filter builds its forecast error covariance mostly from the background statistics and less from the flow-dependent ensemble. This agrees with the analysis and conclusions drawn from Figure 12, in which the background information of DCE is more useful than the ensemble statistics. Averaging over all optimal α values, we obtain a global α* = 0.64, which is quite close to the value of 0.7 that resulted in the best performance in section (4.1.1). In terms of the adaptive β values, we found that maximising the difference between the prior and the posterior parameters' covariance, P_θθ, may not always be helpful. This is because such maximisation can quickly diminish the ensemble spread, eventually paralysing the filter's analysis. In fact, minimising the difference [1] yielded more accurate degradation rates. To analyse this, we plot on the same figure the time evolution of the MSE of the concentration when (1) maximising the information gain for both state and parameters, (2) minimising it for both, and (3) maximising the state's and minimising the parameters' information gain. As demonstrated by the 3 curves, the best performance occurs when the information gain for the concentration is maximised and that of the parameters is minimised. Compared to maximising the information gain of both state and parameters, this mixed scheme yields 37% more accurate contaminant concentrations.
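For a single scalar observation, the information gain (trace reduction from forecast to analysis covariance) has a closed form, so the adaptive choice of the weighting factor reduces to a 1D search. A sketch of this selection, using a simple grid search rather than the paper's exact optimiser; the covariances in the example are toy placeholders:

```python
import numpy as np

def info_gain(P, h, r):
    """Trace reduction from assimilating one scalar observation y = h @ x:
    tr(P) - tr(P_a) = ||P h||^2 / (h P h + r)."""
    Ph = P @ h
    return (Ph @ Ph) / (h @ P @ h + r)

def adapt_alpha(P_ens, P_b, h, r, mode="max"):
    """Pick the weighting factor for one observation by a 1D grid search,
    maximising (state) or minimising (parameters) the information gain."""
    alphas = np.linspace(0.0, 1.0, 11)
    gains = [info_gain(a * P_ens + (1.0 - a) * P_b, h, r) for a in alphas]
    idx = int(np.argmax(gains)) if mode == "max" else int(np.argmin(gains))
    return alphas[idx]

# Toy example: the ensemble covariance carries more variance along the
# observed direction, so maximising the gain selects the ensemble end of
# the range, while minimising it selects the background end.
h = np.zeros(3); h[0] = 1.0
a_state = adapt_alpha(4.0 * np.eye(3), np.eye(3), h, r=0.01, mode="max")
a_param = adapt_alpha(4.0 * np.eye(3), np.eye(3), h, r=0.01, mode="min")
```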

The EnKF_ESOS vs the EnKF
In the previous section, all approaches and experimental results were intended to overcome the rank deficiency and the undersampling of the ensemble's sample forecast error covariance. In the following experiment, we attempt to deal with the undersampling of the observation errors by implementing the EnKF_ESOS algorithm presented in section (2.3). We first note that the distribution of the new observation perturbations shows reasonable deviations from the prescribed Gaussian errors of the original EnKF algorithm, as noticed by Hoteit et al. (2015). To assess the performance of the EnKF_ESOS against the EnKF, we take a closer look at the contaminant maps after 50 years as estimated by the ensemble means of both schemes.
Thus, we plot in Figure 14 the normalised errors for the components TCE and DCE at layers 70 and 80, respectively. These error maps are obtained by subtracting the ensemble mean concentration from the reference and then normalising the result by the average of the reference solution. One common feature in these maps is the clear underestimation of TCE and DCE in the northern part of the domain. This is because the initial reference concentration is quite different from the one assigned to the initial ensemble using the free-run setup, as outlined in section (3.4). In time, both filtering schemes try to push the contaminant plume, which has already moved towards the southern region, upwards to match the truth. Moreover, as demonstrated in layer 70, and unlike the EnKF_ESOS, the EnKF overestimates the TCE concentration in the centre of the domain, which further continues to move southwards. In layer 80 (i.e., 5 m deeper), the EnKF tends to underestimate the concentration of DCE, especially in the southern part of the domain. On the other hand, a slight overestimation of this DCE concentration towards the centre is suggested by the EnKF_ESOS. In general, and assessing similar patterns at other layers, the EnKF_ESOS shows higher accuracy in retrieving the contaminant concentrations than the EnKF. This provides further evidence that ignoring the observation sampling errors within the EnKF can indeed deteriorate the quality of the state estimates.

[1] Minimising the difference between the prior and the posterior covariances does not mean that the filter does not apply any correction. Since the Kalman analysis equation always minimises the variance, the adaptive algorithm now acts such that only the smallest possible reduction is retrieved. Unlike standard Kalman filtering, this procedure moves at a slower pace towards the truth.
To study the impact of the EnKF_ESOS on the estimates of the parameters, we examine the evolution of the entire distribution of the TCE degradation rate in time. We compare the resulting pdfs with those obtained using the EnKF after 5, 15, 30 and 45 years. On top of the pdfs, we also monitor the temporal evolution of the AES of K_T in Figure 15. Starting from rather flat and uncertain pdfs of K_T, both the EnKF and the EnKF_ESOS vigorously push the members of the TCE degradation rate towards the truth, which is 0.086 per day. Notice that within the first 15 years, the pdfs seem to move in the wrong direction, away from the truth. This is due to the large concentrations at time 0, and thus the filter increases the degradation rates to fit the reference contaminant concentration. Beyond that, and once the concentration is adjusted, the parameters of both filtering schemes begin moving closer to the true degradation rate. However, the EnKF moves faster towards the truth and diminishes the uncertainty around K_T quite rapidly. Consequently, the resulting pdf of K_T after 45 years looks like a Kronecker delta function. This is, roughly speaking, not a very healthy assimilation system, as the parameter updates become insignificant over the rest of the assimilation window. In contrast, the degradation rate obtained using the EnKF_ESOS moves at a slower pace towards the true rate, maintaining a large enough spread to fit the incoming observations. Compared to the EnKF, the AES suggested by the EnKF_ESOS, as shown on the left y-axis, is almost 40% to 50% higher. As a matter of fact, this performance is more trustworthy than that of the EnKF, indicating the essential need to account for observation sampling errors at the time of the analysis. Hoteit et al. (2015) found that the ensemble spread of the EnKF_ESOS is larger than that of the EnKF for state estimation.
Here, we observed a similar, yet more pronounced, behaviour for the estimates of the parameters.
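The second-order exact treatment of the observation perturbations can be sketched as follows: the drawn perturbations are centred and then recoloured so that their sample mean is exactly zero and their sample covariance exactly matches the prescribed R. This is a generic second-order exact sampling construction; the paper's EnKF_ESOS may differ in detail.

```python
import numpy as np

def exact_perturbations(R, N_e, rng):
    """Observation perturbations whose sample mean is exactly zero and
    whose sample covariance is exactly R (requires N_e - 1 > dim(R))."""
    p = R.shape[0]
    E = rng.standard_normal((p, N_e))
    E -= E.mean(axis=1, keepdims=True)        # exact zero sample mean
    C = E @ E.T / (N_e - 1)                   # current sample covariance

    def sqrtm_sym(M, inv=False):
        # Symmetric (inverse) matrix square root via eigendecomposition.
        w, V = np.linalg.eigh(M)
        w = 1.0 / np.sqrt(w) if inv else np.sqrt(w)
        return (V * w) @ V.T

    T = sqrtm_sym(R) @ sqrtm_sym(C, inv=True) # whiten, then recolour by R
    return T @ E                              # sample covariance equals R

rng = np.random.default_rng(4)
R = np.diag([0.01, 0.04, 0.09])               # toy observation error covariance
D = exact_perturbations(R, N_e=48, rng=rng)
```

In a standard stochastic EnKF the sampled perturbations only approximate R, and this sampling error feeds into the analysis; the construction above removes that source of noise.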
As a final assessment, we compare the best estimates obtained using all schemes considered in this study, i.e., the EnKF, the EnKF-OI_{α=0.7}^{β=0.1} and the EnKF_ESOS. We plot the time series of the MSE for the contaminant concentrations and degradation rates, summed over all components, in Figure 16. Clearly, the EnKF is the least accurate. Accounting for observation sampling errors yields around 21% and 23% more accurate state and parameter estimates, respectively. Tackling the rank deficiency of the EnKF results in 48% and 70% more accurate state and parameter estimates, respectively. Accordingly, addressing the issues of observation sampling errors and rank-deficient forecast ensemble matrices seems to be crucial and can highly improve the accuracy of the estimates. From our experimental results, and for this particular setting, resolving the rank deficiency issue appears to have the largest impact on the final estimates of the filter.

1. Both the hybrid EnKF-OI and the EnKF_ESOS successfully provide accurate concentration and degradation rate estimates.
On average, a tuned hybrid EnKF-OI (using α = 0.7 and β = 0.1) suggests 48% and 70% more accurate state and parameter estimates than those obtained using the EnKF. On the other hand, the EnKF_ESOS's state and parameter estimates are 21% and 23% more accurate, respectively. In addition, the two schemes are easy to implement and computationally efficient, requiring only minimal changes to the standard EnKF code.
2. Both filtering schemes demonstrated a better handling of the ensemble spread, for both state and parameters, avoiding any collapse or false (unrealistic) confidence in the estimates, and leading to a better ability to fit the observations.
3. The hybrid scheme requires some effort to tune the two weighting factors that adjust the background statistics for both state and parameters. The serial adaptive version of this scheme, which relies on maximising the information gain between the forecast and the analysis for each individual observation point, seems promising. From the experiments, we found, however, that maximising the information gain could deplete the uncertainty within the ensemble quite rapidly. Yet, this observation may vary between systems depending on the degree and the rate of uncertainty growth. One possible solution that we tested is to minimise the information gain, and thus decrease the impact of the update when fitting the observations. Further, one could also build the objective function in such a way that only a portion of the information gain is maximised. For instance, one could enforce the ratio between the traces of the analysis and forecast covariance matrices to be greater than 40%, meaning that at least 40% of the ensemble uncertainty is preserved.
4. Failing to account for observation undersampling errors in the standard EnKF can impact not only the quality of the state but, more importantly, the estimates of the parameters. In our experiments, the degradation rates obtained after assimilating with the EnKF_ESOS scheme were more accurate, more reliable and certainly more realistic.

5. Careful tuning of the hybrid EnKF-OI yields the best estimates of the concentrations and the degradation rates as compared to the EnKF and the EnKF_ESOS. This manifests the importance of hybridising the parameters' cross-correlation matrix.
6. Building a unified EnKF scheme that tackles both the undersampling of the forecast covariance and the observation sampling errors simultaneously is an interesting line of future research. The idea, as we see it now, would be to hybridise the joint ensemble statistics and then apply an SVD to the forecast perturbations. Thus, instead of removing only one rank, N_b + 1 ranks need to be removed, where N_b is the rank of the background covariance.
Subsequently, a second-order sampling of the observation perturbations could then be applied as described in section (2.3).