Technical note: assessment of observation quality for data assimilation in flood models

. The assimilation of satellite-based water level observations (WLOs) into 2-D hydrodynamic models can keep ﬂood forecasts on track or be used for reanalysis to obtain improved assessments of previous ﬂood footprints. In either case, satellites provide spatially dense observation ﬁelds, but with spatially correlated errors. To date, assimilation methods in ﬂood forecasting either incorrectly neglect the spatial correlation in the observation errors or, in the best of cases, deal with it by thinning methods. These thinning methods result in a sparse set of observations whose error correlations are assumed to be negligible. Here, with a case study, we show that the assimilation diagnostics that make use of statistical averages of observation-minus-background and observation-minus-analysis residuals are useful to estimate error correlations in WLOs. The average estimated correlation length scale of 7 km is longer than the expected value of 250 m. Furthermore, the correlations do not decrease monotonically; this unexpected behaviour is shown to be the result of assimilating some anomalous observations. Accurate estimates of the observation error statistics can be used


Introduction
In data assimilation (DA), observations are combined with numerical model output, known as the background, to provide an accurate description of the current state, known as the analysis.In DA the contributions from the background and observations are weighted according to their relative uncertainty.The observation error statistics are the sum of the instrument error and representation error (Janjić et al., 2017).The error of representation arises due to the mismatch in the observation and its model equivalent, and it is often correlated and state dependent (Waller et al., 2014;Hodyss and Nichols, 2015).In DA, observation error statistics are typically assumed to be uncorrelated.The data density is reduced in order to satisfy this assumption (Lorenc, 1981).Yet having adequate estimates of these uncertainties is crucial in order to obtain an accurate analysis.Since the true state of the system is not known, the exact observation errors and their statistics can not be calculated.Instead observation uncertainties must be estimated statistically (e.g.Hollingsworth and Lönnberg, 1986;Ueno and Nakamura, 2016).Desroziers et al. (2005) provide a diagnostic to estimate observation uncertainties using the statistical average of observation-minusbackground and observation-minus-analysis residuals.The diagnostic has been applied to operational numerical weather prediction (NWP) settings to estimate observation uncertainties (Stewart et al., 2014;Waller et al., 2016a, c;Cordoba et al., 2017).The use of these estimated statistics in NWP results in a more accurate analysis and improvements in objec-Published by Copernicus Publications on behalf of the European Geosciences Union.
The development of DA systems has largely been driven by its use in NWP, but the methodologies are applicable to any system that can be modelled and observed.There have been recent advances in real-time 2-D hydrodynamic modelling and the acquisition and processing of relevant remote sensing observations (earth observations, EOs) (Raclot, 2006;Andreadis et al., 2007;Schumann et al., 2007Schumann et al., , 2011;;Mason et al., 2010aMason et al., , 2012aMason et al., , 2014)).Consequently, several studies have shown the benefit of applying DA to operational flood forecasting (Durand et al., 2008(Durand et al., , 2014;;Montanari et al., 2009;Roux and Dartus, 2008;Neal et al., 2009;Matgen et al., 2010;Mason et al., 2010b;Giustarini et al., 2011;García-Pintado et al., 2013, 2015).Grimaldi et al. (2016) review the potential of EOs for inundation mapping and water level estimation and their use for calibration, validation and constraint of real-time hydraulic flood forecasting models.
A predominant EO technique to obtain water level observations (WLOs) is synthetic-aperture radar (SAR).SAR provides high-resolution observations of radar backscatter which, after processing, serve to delineate the flood extent.Then, the intersection of the flood extent with a highresolution lidar digital terrain model is used to obtain the WLOs.The resulting WLOs are discontinuous but locally dense in space; consequently, the errors in the observations may be highly correlated.However, the current practice when assimilating WLOs is to neglect the error correlations.To make the assumption of uncorrelated errors valid the current approach is to thin the data.Hence, in hydrology, one scenario that would benefit from improved understanding of the observation uncertainties is the assimilation of the satellite-derived water level observations (WLOs) for either operational flood forecast or hindcast analyses (Mason et al., 2010a;García-Pintado et al., 2013).A more detailed understanding of the observation uncertainties would be highly useful as understanding the error statistics may permit more observations to be included in the assimilation, which should allow the information from dense observation sets to be fully exploited.Additionally, accurate estimates of observation uncertainties can inform the thinning strategy and suggest which observations may benefit the assimilation most (Fowler et al., 2018).There is a clear potential to improve the flood forecast if all the SAR WLOs could be assimilated in an appropriate way.
In this article we use the diagnostic of Desroziers et al. (2005), described in Sect.2, to estimate the observation error statistics for SAR WLOs that are assimilated using a local ensemble transform Kalman filter (LETKF) into the LISFLOOD-FP 2-D hydrodynamic model.For this study, we use a sequence of real SAR overpasses in a flood event that occurred in November 2012 in SW England.A description of the SAR WLOs and experimental design are given in Sect.3. Results are discussed in Sect. 4. First, we estimate average WLO error statistics across the entire domain for the dura-tion of the flood event.It will be seen later that these globally estimated error statistics show an anomalous pattern.To determine the cause of these anomalous results we consider if observations in different sub-domains have different error characteristics.We also consider if the error statistics differ for different phases of the flood event.From the results we infer that the anomalous pattern is not related to the distribution of observations over the domain but to observations during the later stages of the flood.To the best of our knowledge this is the first time that the diagnostics have been applied to estimate error statistics for hydrological data assimilation.Importantly, we show that the diagnostic of Desroziers et al. (2005) can be used to identify anomalous observation datasets that are not suitable for assimilation.
2 The diagnostic of Desroziers et al. (2005) Data assimilation is a technique used to provide the best estimate, the analysis, of the current state of a dynamical system.The analysis is denoted x a ∈ R N m .The analysis is determined by combining the background x b ∈ R N m , a model prediction, with observations, y ∈ R N p , weighted by their respective error statistics.Here the dimensions of the observation and model state vectors are denoted by N p and N m , respectively.To compare observations and background it is necessary to project the background into observation space using the observation operator, H : R N m → R N p , which may be non-linear.The analysis can be used to initialize a forecast which in turn provides a background for the next assimilation.
In Desroziers et al. (2005) the analysis is calculated using where R ∈ R N p ×N p and B ∈ R N m ×N m are the observation and background error covariance matrices, K is the Kalman gain matrix and H is defined as the observation operator linearized about the background state.The observation-minusbackground residuals d o b = y − H(x b ), also known as the innovations, are assumed to be unbiased.Hence any bias should be removed before assimilation (Dee, 2005).
The observation error covariance matrix can be estimated using the observation-minus-background, d o b = y − H(x b ), and observation-minus-analysis, d o a = y − H(x a ), residuals (Desroziers et al., 2005).Assuming that the observation and background errors are mutually uncorrelated, the statistical expectation of the product of the analysis and background residuals is As the resulting matrix is estimated statistically it will not be symmetric.Therefore, it must be symmetrized before it can be used in a data assimilation scheme.
The form of the diagnostic in Eq. ( 2) is not suitable to calculate observation error statistics when each assimilation cycle uses different observations.Instead components of the background and analysis residuals must be paired and binned, with the binning dependent on the type of correlation being estimated.For example, when calculating spatial correlations the bins may depend on the distance between observations, whereas for temporal correlations the bins would depend on the time between observations.For each bin, β, the covariance, cov(β), is then computed individually using where It is assumed that the observation-minus-background and observation-minus-analysis residuals are unbiased, but this is not guaranteed.Hence the second term of Eq. ( 3) ensures that the computation of the observation error statistics is not affected by bias (Waller et al., 2016a).To calculate the spatial correlation, the covariance in each bin, cov(β), is divided by the estimated variance (the covariance at zero distance, cov(0)).
The diagnostic in Eqs. ( 2) and (3) only gives a correct estimate of the observation error uncertainties if the error statistics used in the assimilation are exact.Even if the assumed statistics are not exact the diagnostic can still provide useful information about the true observation error statistics (Waller et al., 2016b;Ménard, 2016).Further limitations include the use of an ergodic assumption in order to obtain sufficient samples (Todling, 2015) and the assumption that the observation operator is linear (Terasaki and Miyoshi, 2014).
One further issue is that the standard diagnostic is derived assuming that the analysis is calculated using minimum variance linear statistical estimation.If local ensemble DA is used to determine the analysis, the diagnostic does not result in a correct estimate of the observation uncertainties.However, by using a modified version of the diagnostic some of the observation error statistics may be estimated.It is possible to estimate the error correlations between two observations if the observation operator that determines the model equivalent of observation y i acts only on states that have been updated using the observation y j (Waller et al., 2017).Since we use a LETKF assimilation scheme in this study, we must take this into account when estimating observation error statistics for the WLOs.

Methodology
In this article we estimate the observation error statistics for SAR WLOs that are assimilated using a LETKF into the LISFLOOD-FP 2-D hydrodynamic model.This study makes use of the observation, model and assimilation system described in García-Pintado et al. (2015).We direct the reader to this reference, and references therein (particularly Mason et al., 2012a, b), for a thorough description of the derivation of WLOs and the assimilation design.Here we summarize the methodology and provide a description of the data used specifically in this study.

Derivation of WLOs
The original observations used in the deviation of WLOs are obtained using SAR which observes the surface backscatter.In a SAR image flood water appears dark so long as the surface water turbulence is insignificant.Therefore, to obtain flood extent, the pixels in a SAR image are grouped into homogeneous regions.A mean backscatter value is calculated for each region and if this value is below a given threshold, the region is classified as flooded.The threshold is determined by using training data from "flood" and "non-flood" regions.This initial estimate of flood extent is then refined by, for example, (1) correcting for any high backscatter that is a result of vegetation either within the flooded region or at the flood edge; (2) correcting for high backscatter near flooded areas that is a result of water with a rough surface; (3) performing a "nearest neighbour" check, where any local flood height that is significantly larger than those nearby is reclassified as non-flooded.
To provide the WLOs the refined flood extent is intersected with high-resolution digital elevation model (DEM).In order to improve the accuracy of the WLOs, they are only calculated if the slope in the DEM is sufficiently shallow.A further refinement takes into account, for example, the emergent vegetation at the flood edge.
The WLO derivation process results in a large number of WLOs that exist in clusters.It is expected that many of the observations in a cluster will be highly correlated and hence not contribute independent information.At this stage in the processing, Mason et al. (2012a) thin the WLOs to reduce spatial correlation.However, we postpone this step until after the quality control procedures for the data assimilation have been performed.

Model and data assimilation
The observations are assimilated into a 75 m resolution LISFLOOD-FP flood simulation model (Bates and Roo, 2000) using a LETKF (Hunt et al., 2007).Due to the formulation with the diagnostic described in Sect.2, the localization in the LETKF is set in standard 2-D Euclidean space rather than the physically based distance along the river channel described in García-Pintado et al. (2015), which would require a further adaptation of the diagnostic calculation.The localization radius is set using a compactly supported fifthorder piecewise rational function (Gaspari and Cohn, 1999, Eq. 4.10) with length scale 20 km.
To compare the modelled field with the observed quantity it is necessary to define an observation operator that maps from model to observation space.In this study we use the "nearest wet pixel" approach described in García-Pintado et al. (2013).The mapping in the nearest wet pixel approach is dependent on the inundation status at the model location.
If at an observation location the model is flooded, the model equivalent of the observation is simply the water level predicted by the model.However, if the model is dry at the observation location the model equivalent of the observation is taken to be the model water level at the wet pixel nearest to the observation location.

Quality control and data thinning
Data assimilation techniques can lose accuracy if presented with an observation that is grossly inconsistent with the model state (Vanden-Eijnden and Weare, 2013).Thus, before being assimilated, the WLOs are subjected to several quality control (QC) protocols according to the physical characteristics of the terrain and land cover.An additional background check is performed where observations that result in anomalous observation-minus-background residuals are discarded.The QC procedures result in dense cluster of discontinuous observations in which both the observations and their errors may be highly correlated.A direct assimilation of this dense dataset would lead to an analysis biased towards the observations and, for covariance-evolving methods (e.g.ensemble Kalman filters), an over-reduced posterior covariance and unstable long-term forecast/assimilation cycles.Thus, to reduce the number of correlated observations and to avoid dealing with the spatial correlation in the assimilation, the current approach is to further thin the data (as is standard in other assimilation applications such as NWP and oceanography; Dando et al., 2007;Li et al., 2010).The applied thinning, as described in Mason et al. (2012a), uses a top down clustering approach in which principal component analysis is used to select observations that have the highest information content.The spatial autocorrelation of the resulting observations is calculated, and if any significant correlation exists the thinning procedure is applied iteratively until no significant correlation remains.Typically the thinned dataset contains approximately 1 % of the pre-thinned observations.The measured standard deviation for the thinned dataset can be calculated by fitting a plane by linear regression to the WLOs.The variance of the difference between the WLO and planar surface can be used as an estimate of the observation error variance.This approach is considered adequate for this case study as the floodplain in the downstream observed areas is reasonably flat.

Potential observation error sources
In data assimilation the observation uncertainty has contributions from both measurement errors and representation errors.The representation error arises due to the difference between an actual observation and the modelled representation of an observation; this difference can be a result of the following: -Pre-processing/QC errors are errors introduced during the observation pre-processing or quality control procedures.
-Observation operator errors are errors that arise due to approximations in the mapping between model and observation space.
-Errors due to unresolved scales and processes are errors that result from the mismatch between the scales represented in the model field and the observations.
For the WLOs it is clear that a pre-processing error will exist as there is potential for errors to be introduced in the derivation of the WLOs.For example if the water surface is rough it may be assumed that the pixel is dry; as a result the flood extent would be incorrect and hence an error would be introduced in the WLO.For nearby pixels it is possible that there will be similar errors in the derivation process, thereby introducing correlated observation errors.The procedures in Mason et al. (2012a) provide an estimated standard deviation for the WLO pre-processing error and thin the data to ensure that the pre-processing error is uncorrelated.However, we note that in this study we use a denser dataset than is typically produced.Therefore, there is potential for some correlated pre-processing error to remain.
A potential source of correlated error for WLOs is the observation operator error.As described in Sect.3.2 the observation operator uses the "nearest wet pixel" approach.For observations in locations where the model is flooded it is expected that there is minimal error in the observation operator (since the corresponding water level is predicated directly by the model).However, if the observation location does not coincide with a flooded model pixel it is necessary to find the nearest wet pixel in the model.It is possible that in locating the nearest wet pixel and extrapolating information we introduce correlated error.
The error due to unresolved scales and processes is also a possible source of observation error correlations.Although in this case the model is of relatively high resolution compared to the observation resolution, there are still scales that are unresolved.Previous studies that have considered these scale mismatch errors have found that they are typically correlated (Janjić and Cohn, 2006;Waller et al., 2014;Hodyss and Nichols, 2015).

Calculation of WLO error statistics
We estimate observation uncertainties for observations from a real flood event that occurred in West England on an area of the lower Severn and Avon rivers in November 2012 (Fig. 1a).The WLOs were extracted from a sequence of seven satellite SAR observations (acquired by the COSMO-SkyMed constellation) using the method described in Mason et al. (2012a).During the flood event the WLOs are available daily for the period 27 November to 4 December 2012 (with the exception of 3 December).Observations on the first day illustrate the flood levels just before the flood peak in the Severn.On 30 November the river went back in bank; however, a substantial amount of water remained on the floodplain (see Fig. 2 in García-Pintado et al., 2015).Before being assimilated, the WLOs are subject to the QC and thinning procedures described in Sect.3.3.When used in previous studies such as García-Pintado et al. (2015) the dataset has been thinned to a separation distance of 250 m, at which the observation errors are assumed uncorrelated.However, in this article a denser observation set (although still sparse) with thinning distance of 125 m is used, in which some spatial correlation should remain.The location of the observations is plotted in Fig. 1b.
We apply the diagnostic of Desroziers et al. (2005) to the observation-minus-background and observation-minusanalysis residuals resulting from the flood assimilation.We first use all available data to calculate the average horizontal error variance and correlations.We then consider if the observations of the flood on the Severn are similar to the error statistics for the Avon.Finally we consider if the error statistics vary for different periods of the flood.For all cases the observation error correlations are calculated at a 1 km bin spacing.As we use an LETKF we must use a modified form of the diagnostic (see Sect. 2).As a result we are not able to calculate observation error correlations for observation pairs with a separation distance greater than 19 km.When evaluating the correlations we assume that they become insignificant when they drop below 0.2 (Liu and Rabier, 2002).
For this assimilation system we assume that the ensemble background error covariance matrix gives a reasonable estimate of the true background error statistics.The assumed standard deviation for the WLOs is 59 cm; this is calculated as described in Sect.3.3.The value accounts only for the preprocessing error, and not for any error introduced by the approximations in the observation operator or scale mismatch errors and, therefore, may be an underestimate of the true error standard deviation.
As is typical for most DA systems, the observation errors are assumed uncorrelated.With these assumed error statistics the theoretical work of Waller et al. (2016b) suggests that the observation error statistics estimated using the diagnostic will have the following: an underestimated standard deviation an underestimated correlation length scale.Therefore, we would expect the true standard deviations and length scales to be larger than those we estimate using the diagnostic.

Average observation error statistics
We first estimate average horizontal error covariances across the entire domain for the duration of the flood event.We plot in Fig. 2 the estimated correlation, along with the number of samples used, for the WLOs.
The estimated statistics give a standard deviation of 54 cm.This is slightly lower that the assumed error standard deviation of 59 cm.Following the theory of Waller et al. (2016b) we expect the estimated standard deviation to be an underestimate of the true observation error standard deviation, and hence the results suggest that the assumed standard deviation is likely set at the correct level.
Our results show that the correlations become insignificant (< 0.2) at approximately 8 km, but there is some unexpected behaviour before 8 km.The correlations drop smoothly between 0 and 4 km then increase again up to 6 km before dropping off.This behaviour is seen for a variety of different binning widths (not shown).We investigate the cause of this "local maximum" in the estimated correlations in Sects.4.2 and 4.3.In general we find that the correlation distance is much longer than the thinning distance of 125 m, which was chosen to try to ensure that the observation errors are uncorrelated.Furthermore, theoretical results of Waller et al. (2016b) suggest that, with this design of assimilation experiment, the correlation length scales will be underestimated.

Correlations in different parts of the domain
It is possible that the local maximum in the correlations is a result of observations on different tributaries of the river.To test this hypothesis we split the domain in two (as shown in Fig. 1): the western domain covering the river Severn and eastern domain covering the river Avon.We plot the estimated correlations, along with the number of samples used for the SAR WLOs, for the western part of the domain in Fig. 3 and for the eastern part of the domain in Fig. 4. We note that there are fewer observations in the eastern domain.This results in fewer available samples for the calculation in Eq. (3) and hence the results are subject to greater sampling error.
From Figs. 3 and 4 we see that the "local maximum" in the correlations is still present in both parts of the domain.In the eastern domain it is very pronounced.This suggests that the cause of the increase in correlations between 4 and 6 km is not observations on different tributaries of the river.

Correlations at different times
We next consider if the correlation structure changes over time.We plot in Figs. 5, 6 and 7 the correlations calculated for the first three days, the second two days and the final two days respectively.At the beginning of the flood period, the observations have similar standard deviations to those estimated for the entire flood event; however, the correlation length scale is short, approximately 2 km.During the middle of the flood event the observation error standard deviation decreases and the correlation length scale increases slightly.For the final two days the river is back in bank; for this period the standard deviation is largest, as is the correlation length scale, which is approximately 8 km.It is also in this final period where the "local maximum" appears in the correlations.
Figure 7 shows the estimated error statistics for the recession stages for the flood.During this period a high proportion of the observations were in areas which remained flooded but were disconnected from the main river flow.For this same sequence of SAR overpasses García-Pintado et al. (2015) showed that the assimilation of the last three overpasses was still able to exploit the background ensemble covariances to pass some of the information from these WLOs to the main flow.However, two effects became evident: (a) the assimilation increments were of a smaller magnitude in these last stages, and (b) the corrections to the flow in these last stages were gradually more short-lived.This was a result of the reduced information content in these WLOs regarding the inflow errors upstream, which in the end control the flood and flow evolution.Here the Desroziers et al. (2005) diagnostic has been able to identify a corresponding anomalous structure in the WLO errors at these last stages.The correlation structure shown in Fig. 7 indicates that apart from the longer correlation errors, which can be expected from the smoother flood dynamics at the end of the flood, an increase in the correlation appears at ∼ 6 km.The increasing disconnection of the WLOs in the flood plain from the main flow appears to be the cause for the local maximum in the estimated correlation structure.However, further work is required to determine why the "local maximum" in the estimated correlation function appears at 6 km.

Conclusions
We have shown that the Desroziers et al. (2005) diagnostic is a useful tool to identify the error covariance in WLOs from satellite SAR.Further, the diagnostic has been able, in the case study, to isolate an unexpected anomaly in the correlation structure, pointing to the applicability limits of the satellite WLOs in the flood plain in the recession stages of the flood.The diagnostic has been useful in this study for highlighting anomalous data.Given its low-cost calculation, we propose it be customarily calculated in flood forecasts and hindcast analyses to support the understanding of the observation errors and to support QC protocols for selection of adequate observations.However, due to the dependence of the observation error on the choice of observation operator and model resolution, results will differ for each individual user.Therefore, further study may be required to understand how the diagnostic results can best support QC protocols.
Data availability.The data used in this study are available in García-Pintado (2018).
the kth pair of elements of d o a and d o b in bin β, and N β is the number of residual pairs in bin β.

Figure 1 .
Figure 1.(a) Flood model domain where the colour bar denotes the height in metres and (b) position of SAR WLOs on OSGB 1936 British National Grid projection; coordinates in metres.For (b) the line denotes the west/east domain split discussed in Sect.4.2, crosses: 27-29 November, circles: 30 November and 1 December, squares: 2 and 4 December.

Figure 2 .
Figure 2.Estimated SAR WLO error correlations (black line) and number of samples (bars) used for the calculation.Estimated error standard deviation is 54 cm.

Figure 3 .
Figure 3.Estimated SAR WLO error correlations (black line) and number of samples (bars) used (bin width = 1 km), west domain.Estimated error standard deviation is 58 cm.

Figure 4 .Figure 5 .
Figure 4.Estimated SAR WLO error correlations (black line) and number of samples (bars) used (bin width = 1 km), east domain.Estimated error standard deviation is 43 cm.