Images from satellite-based synthetic aperture radar (SAR) instruments contain large amounts of information about the position of floodwater during a river flood event. This observational information typically covers a large spatial area but is only relevant for a short time if water levels are changing rapidly. Data assimilation allows us to combine valuable SAR-derived observed information with continuous predictions from a computational hydrodynamic model and thus to produce a better forecast than using the model alone. In order to use observations in this way, a suitable observation operator is required. In this paper we show that different types of observation operators can produce very different corrections to predicted water levels; this impacts the quality of the forecast produced. We discuss the physical mechanisms by which different observation operators update modelled water levels and introduce a novel observation operator for inundation forecasting. The performance of the new operator is compared in synthetic experiments with that of two more conventional approaches. The conventional approaches both use observations of water levels derived from SAR to correct model predictions. Our new operator is instead designed to use backscatter values from SAR instruments as observations; such an approach has not been used before in an ensemble Kalman filtering framework. Direct use of backscatter observations opens up the possibility of using more information from each SAR image and could potentially reduce the time taken to produce the observations needed to update model predictions. We compare the strengths and weaknesses of the three different approaches with reference to the physical mechanisms by which each of the observation operators allows data assimilation to update water levels in synthetic twin experiments in an idealised domain.

During a fluvial flood it is possible to use a numerical
hydrodynamic model to predict future water levels and flood extents. Such
predictions are subject to uncertainties and can be inaccurate; data
assimilation can therefore be used to improve predictions by updating model
forecasts based on various types of observations
(e.g.

SAR sensors are active, side-looking sensors carried on several satellites, e.g. COSMO-SkyMed and Sentinel-1. Radiation (with wavelengths of centimetres to metres) is emitted from the satellite and directed towards the surface of the Earth. The returning signal is recorded at a sensor and can be used to reconstruct information about the observed terrain. SAR radiation is cloud penetrating, giving the instruments all-weather capability. SAR instruments can also produce observations day and night, unlike passive sensors that rely on solar radiation.

The strength of the returned signal measured at the SAR sensor depends
strongly on the roughness properties of the surface from which it has been
reflected. During a flood event SAR images therefore generally show a clear
difference between flooded and non-flooded areas. Pixels in flooded or other
wet areas such as lakes and rivers have low backscatter values and appear as
dark areas on SAR images; dry areas have higher backscatter values, and dry
pixels therefore appear paler. There are a number of techniques for
separating pixels into wet and dry areas based on backscatter. Methods
include thresholding (e.g.

In this work we consider different ways in which information from an SAR image can be used to correct inundation forecasts using data assimilation. The use of observations requires two steps. First, we must extract relevant, useable information from an SAR image. This involves processing the raw SAR data in some way to produce an observation, or set of observations, per image. In the second step we need to use an observation operator to map our model state vector into observation space – i.e. we extract the equivalent information from our model in order to compare it to the observations. The size of the difference between the observation and the equivalent information from the model forecast is then used to calculate an update or correction to the forecast. The observation operator depends on the type of observational information used, and we show in this paper that the impact of observations on the forecast can be strongly dependent on the observation operator approach used. Despite this, the mechanisms through which different observation types and different observation operators update hydrodynamic forecasts have not received much attention in the literature.

In order to extract observational information from an SAR image, authors such
as

We propose a new type of observation operator which directly uses
pixel-by-pixel backscatter values as observations. As in

In this paper we examine the performance of our new observation operator and
that of two flood-edge observation operators in a series of synthetic
experiments. We compare the physical mechanisms by which the different
approaches update predicted water levels in the ETKF; to the authors'
knowledge these physical mechanisms have not been discussed in the literature
before. We outline the ETKF data assimilation algorithm in Sects.

In this paper we explore the use of observations from SAR images in updating
forecasts from a hydrodynamic flood model. In Sect.

In data assimilation, forecasts from a numerical model are combined with
observations of the same system. We use a state vector,

In the update step the mean state vector and the error covariance matrix are
both updated based on observational information. We use the ETKF in its
standard application as a sequential filter. As such we perform an update
step at the time of each available observation. We assume that the
observations are related to the true state of the system,

In order to update the model forecast, it is useful to create a
forecast–observation ensemble, which contains

We use a square root formulation for the ETKF, following

The state error perturbation matrix is updated in the ETKF according to

State augmentation techniques can be used to correct values of uncertain
forecast model parameters at the same time as the state is updated. In this
approach, parameters are appended to the state vector (see

The forecast equation for the case of an augmented state vector can be written as

The augmented state vector is updated by the ETKF algorithm through Eqs. (

Model friction parameter values are more traditionally calculated using offline calibration techniques and data from previous flood events. Updating parameter values using a state augmentation approach has the advantage that it uses information from observations of the flood event of interest as it occurs. State augmentation can therefore take into account any recent changes to the river and its environment.
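As an illustration of state augmentation, the following is a minimal sketch of an ensemble transform update applied to an augmented ensemble. It assumes a symmetric square root transform and a diagonal observation error covariance; this may differ in detail from the formulation followed in this paper. The function name `etkf_update`, the toy relation between friction and water level, and all numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

def etkf_update(X, Y, y_obs, r_var):
    """Sketch of an ETKF analysis step (symmetric square root, diagonal R).

    X     : (n, N) forecast ensemble (state, or augmented state)
    Y     : (p, N) forecast-observation ensemble (operator applied to each member)
    y_obs : (p,)   observations
    r_var : (p,)   observation error variances (diagonal of R)
    """
    N = X.shape[1]
    x_mean = X.mean(axis=1, keepdims=True)
    y_mean = Y.mean(axis=1, keepdims=True)
    Xp = (X - x_mean) / np.sqrt(N - 1)           # state perturbation matrix
    Yp = (Y - y_mean) / np.sqrt(N - 1)           # observation-space perturbations
    S = Yp / np.sqrt(r_var)[:, None]             # R^{-1/2} Y'
    d = (y_obs - y_mean[:, 0]) / np.sqrt(r_var)  # scaled innovation
    # Eigendecomposition of the N x N matrix I + S^T S
    evals, evecs = np.linalg.eigh(np.eye(N) + S.T @ S)
    # Mean update: x_a = x_f + X' (I + S^T S)^{-1} S^T d
    w = evecs @ ((evecs.T @ (S.T @ d)) / evals)
    xa_mean = x_mean[:, 0] + Xp @ w
    # Perturbation update: X'_a = X' (I + S^T S)^{-1/2}
    T = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return xa_mean[:, None] + np.sqrt(N - 1) * (Xp @ T)

# Toy augmented ensemble: 4 water levels plus 1 friction parameter per member.
rng = np.random.default_rng(0)
N = 50
friction = rng.normal(0.05, 0.01, size=(1, N))             # hypothetical values
levels = 1.0 + 20.0 * friction + rng.normal(0.0, 0.05, size=(4, N))
Z = np.vstack([levels, friction])                          # augmented state vectors

# Hypothetical linear operator: observe the water level in the first two cells.
Y = Z[:2, :]
Za = etkf_update(Z, Y, np.array([1.8, 1.9]), np.array([0.01, 0.01]))
friction_forecast_mean = Z[-1].mean()
friction_analysis_mean = Za[-1].mean()
```

In this toy set-up the friction parameter is never observed directly; it is corrected only through its ensemble cross-covariance with the observed water levels. Because the observed levels are lower than the forecast and higher friction gives deeper water here, the analysis friction mean moves below the forecast mean.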

Much existing work on data assimilation for fluvial inundation forecasting
has focussed on assimilating derived water level observations. Water level
extraction is based on the fact that it is usually possible to differentiate
between wet and dry areas in an SAR image; the contrast in backscatter between
wet and dry pixels means that it is therefore possible to determine the
position of the edge of a flooded area. Along this edge, the water elevation
is the same as the elevation of the topography. This means that as long as a
flood edge can be accurately identified and topographical information is
available (e.g. a digital terrain model – DTM), water levels at the flood
edge can be derived from an SAR image. This approach has also been used for
operational flood mapping, e.g. in

In the remainder of this section we describe the three different observation
operators used in this study. In Sect.

In this approach, we assume that

This approach can lead to problems in application and is therefore not widely
used, but we include it here to show the importance of how observations are
used in data assimilation. The problem with this simple method is essentially
that it does not use all of the available information. All ensemble members
that predict shallower local water levels than the truth at the position of
the observation will make the same contribution to

In this approach we assume again that

Finding the “nearest wet pixel” can be difficult in practice, since it is important to find the local flood edge that corresponds to the observation. In simplified topography such as that used in this study, this can be assumed to be the first wet model grid cell encountered when moving from the observation towards the centre of the river along a cross section perpendicular to the flow of the river. In situations where the topography is complex (e.g. the local direction of flow is not clear or the river has tight meanders), finding the nearest wet pixel becomes more complicated. One approach is to require that the nearest wet pixel is in the direction of the steepest downhill descent from the observation location.
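For the simplified case described above, the search along a single cross section could be sketched as follows. This is an illustrative sketch only: the function name `nearest_wet_cell`, the `wet_threshold` argument and the V-shaped example cross section are hypothetical, and the code assumes that the index of the river centre on the cross section is known.

```python
import numpy as np

def nearest_wet_cell(depths, obs_index, centre_index, wet_threshold=0.0):
    """Return the index of the first wet cell met when stepping from the
    observation cell towards the river centre, or None if every cell on
    the way (including the centre cell) is dry."""
    step = 1 if centre_index >= obs_index else -1
    for i in range(obs_index, centre_index + step, step):
        if depths[i] > wet_threshold:
            return i
    return None

# Hypothetical V-shaped cross section (bed elevation in metres) for one member.
bed = np.array([5.0, 4.0, 3.0, 2.0, 1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
water_surface = 2.5
depths = np.clip(water_surface - bed, 0.0, None)

# Flood-edge observation located in cell 1; river centre in cell 5.
idx = nearest_wet_cell(depths, obs_index=1, centre_index=5)
# Water surface elevation (bed plus depth) at the nearest wet cell.
elevation = bed[idx] + depths[idx]
```

Repeating the search for each ensemble member gives one water elevation per member, from which a forecast–observation ensemble could be assembled. In complex topography the straight-line search would need to be replaced, for example by a steepest-descent search as noted above.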

A related approach has been successfully used by

We have developed an alternative method for extracting observations from an SAR image, which directly
uses SAR backscatter measurements as observations rather than derived water elevation information.
This means that the observation vector

The observations used in this method are measured SAR backscatter values; we
follow the approach of

A new observation operator is required in order to use backscatter
observations in data assimilation. The operator needs to take each state
vector (containing water levels in each pixel) and transform that information
into model equivalent backscatter values. This could potentially be achieved
using an SAR simulator to generate a synthetic SAR image, but this would be
computationally expensive and would require detailed knowledge of the
underlying terrain and land-use cover. Instead we take a statistical approach
that makes use of the wet and dry pixel backscatter distributions obtained
from an SAR image. The observation operator comprises two steps. We can
describe this such that

A different approach to using binary-type observations in data assimilation
is used by the authors

The inundation model used in this work is a non-linear hydrodynamic model.
The model uses Clawpack code (

Experiments to compare the performance of the three operators have been carried out in an idealised river valley-like domain. The use of an idealised domain is important here so that we can examine the effects of the operators under ideal conditions, without the complications of complex topography. It will also be important to understand how the operators work under real conditions, but experiments in an idealised topography are a vital first step.

Test domain used in all assimilation experiments.

The test domain used in the experiments in this paper is the same as that
used in

We have carried out a number of twin experiments in order to illustrate and
compare how well forecasts can be corrected when using the three different
observation operator approaches. The experiments use a “truth” flood
simulation and a forecast ensemble of flood realisations comprising 100
members. The forecast ensemble is updated using synthetic observations at
several times during the simulation time; synthetic observations are created
from the truth as described in Sect.

In this work, the truth flood is driven by a time-varying inflow based on
data taken from a gauge on the river Severn during a flood in
November–December 2012. The true inflow is shown in Fig.

Inflows with time. True inflow values are represented with circles, and ensemble inflows are shown by grey lines.

Each ensemble member was run with a different value of the channel friction
parameter,

In identical twin experiments, observations are generated from a truth run;
in this case the truth flood simulation is described in Sect.

In order to test our backscatter observation operator, we require synthetic
backscatter observations; we therefore create a synthetic SAR image from our
truth run, comprising backscatter values in each cell. We can then extract
synthetic backscatter observations at desired locations. We have taken a very
simple approach to generating a simplified synthetic SAR image in order to
perform proof-of-concept experiments with our new observation operator; we
will apply the method to a real case study and real SAR images at a later
date. To generate a synthetic SAR image, we have taken our truth run water
level output and applied a threshold water level of 5 cm in each cell to
determine which cells are wet and which are dry. Water levels below a
threshold of a few centimetres are likely to be misclassified as dry in a real SAR
image due to vegetation. Synthetic backscatter values are then assigned to
each cell: dry cells are assigned a backscatter value drawn from

Synthetic SAR image generated from truth run water levels as
described in Sect.

In order to derive synthetic observations from the synthetic SAR image, the
observation process is then carried out, i.e. we do the following:

We bin all the synthetic backscatter values in a histogram (see
Fig.

We fit two Gaussian curves to the synthetic backscatter values (using the Python fitting
routine scipy.optimize.curve_fit; see Fig.

We extract new values of

Histograms and fitted Gaussian distributions of synthetic backscatter values. Dashed grey lines show two fitted Gaussian distributions, and the solid grey line shows the sum of the two fitted distributions. Vertical lines show the positions of the mean wet and dry backscatter values.
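The histogram-fitting procedure can be sketched in a few lines of Python. This is a simplified illustration, not the code used for the experiments. The wet and dry backscatter means and spread (`WET_MEAN`, `DRY_MEAN`, `SPREAD`), the toy water depths and the initial guess `p0` are hypothetical values chosen for the example. Only the 5 cm wet–dry threshold and the use of `scipy.optimize.curve_fit` are taken from the text.

```python
import numpy as np
from scipy.optimize import curve_fit

def two_gaussians(x, a1, mu1, s1, a2, mu2, s2):
    """Sum of two Gaussian curves with free amplitudes, means and widths."""
    g1 = a1 * np.exp(-0.5 * ((x - mu1) / s1) ** 2)
    g2 = a2 * np.exp(-0.5 * ((x - mu2) / s2) ** 2)
    return g1 + g2

rng = np.random.default_rng(42)

# Hypothetical backscatter statistics (dB); NOT the values used in the paper.
WET_MEAN, DRY_MEAN, SPREAD = -15.0, -8.0, 1.0

# Toy "truth" water depths (m): roughly half the cells hold no water.
depth = np.clip(rng.normal(0.0, 0.5, size=4000), 0.0, None)
wet = depth > 0.05                       # 5 cm wet-dry threshold

# Synthetic SAR image: draw each cell from the wet or the dry distribution.
backscatter = np.where(
    wet,
    rng.normal(WET_MEAN, SPREAD, size=depth.size),
    rng.normal(DRY_MEAN, SPREAD, size=depth.size),
)

# Bin the backscatter values and fit the sum of two Gaussians to the histogram.
counts, edges = np.histogram(backscatter, bins=50, density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
p0 = [counts.max(), np.percentile(backscatter, 25), 1.0,
      counts.max(), np.percentile(backscatter, 75), 1.0]
popt, _ = curve_fit(two_gaussians, centres, counts, p0=p0)

# The lower fitted mean corresponds to wet pixels, the higher to dry pixels.
mu_wet, mu_dry = sorted([popt[1], popt[4]])
sigma_1, sigma_2 = abs(popt[2]), abs(popt[5])   # fitted widths (sign is arbitrary)
```

The fit works well here because the two hypothetical distributions are well separated. With overlapping distributions, the result would depend more strongly on the initial guess.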

We then extract backscatter values to be synthetic observations. Although it
would be possible to use a large number of backscatter observations in this
method, for the experiments presented here we have not used all of the
available synthetic observations. There are a number of reasons for limiting
the number of observations. Firstly, observation errors are likely to be
correlated for observations that come from positions close to each other in
physical space. Some thinning of the observations is therefore necessary to
meet the requirement that the observations used in the assimilation have
uncorrelated errors (

In this study we wish to investigate the differences in the updates generated
by different observation operator approaches. We therefore use equivalent
observation information for each of the operators. In the case of the water
level observation operators, we have used flood-edge water level observations
at six locations, where the flood-edge location is defined as the position of
the first dry model cell (see Sect.

Figure

Schematic of observation locations used in this study for each transect in cross section. The black thick line shows the discretised domain elevation, and the blue dashed line shows the observed floodwater level. The arrows and green crosses show locations of the observations as labelled.

It is important to specify the observation error statistics in data
assimilation. In all cases we assume that our observation errors are
uncorrelated so that we can use a diagonal error covariance matrix,

The uncertainty in each backscatter observation reflects the distribution to
which it belongs (wet or dry). We assume that each entry can be set to be

We present here the results from a number of data assimilation experiments,
each lasting for a total simulation time of 112 h. This includes an
initial spin-up period with constant inflow for 4 h (as shown in Fig.

Experiments were run as follows:

State-only estimation experiments were carried out using a positive bias in the forecast
channel friction parameter, which leads to forecast water levels that tend to be deeper
than the truth (PBSO – positive bias in

Improvement in the forecast at each assimilation time (PBSO
experiment). The hatched white bars show improvement for the

Figure

Improvement in the forecast at each assimilation time (NBSO
experiment). The white hatched bars show improvement for the

Figure

Schematic showing innovation for flood-edge observation. In all
cases blue lines represent the true water level, and blue circles represent
the corresponding flood-edge observation,

Figure

Figure

Cross section of the domain showing bathymetry as a black solid
line. The true water level is shown as a red dotted line, and water levels
predicted by each ensemble member are shown as blue circles. The mean
forecast in each model cell is shown as a cross.

Figure

The results in Figs.

Schematic of innovation in observation (backscatter) space and
increment in physical space for one backscatter observation. The horizontal
blue line represents the true water level, and the blue circle represents a
corresponding backscatter observation,

Figure

The innovation is shown in observation space in Fig.

A potential problem with the backscatter operator can be illustrated through
inspection of Eqs. (

The main source of error in these experiments is, by design, a large
bias in the forecast ensemble channel friction parameter values. In this
section we show the results of updating the forecast channel friction
parameter values as part of the assimilation process. One way to measure the
effectiveness of a data assimilation approach is to compute the root-mean-square error (RMSE) between the resulting forecast and the truth. Here, the RMSE
is defined as

RMSE between forecast and truth (PBJ experiment). Open triangles show
the RMSE between the open-loop forecast and the truth. Blue stars, green
squares and red circles show the RMSE between the forecast mean and the
truth, using the

Figure

Calculated analysis mean channel friction parameter (PBJ experiment). Red horizontal line shows true value of channel friction parameter. Error bars show one standard deviation of ensemble parameter distribution.

Figure

RMSE between forecast and truth (NBJ experiment). Open triangles show
the RMSE between the open-loop forecast and the truth. Blue stars, green
squares and red circles show the RMSE between the forecast mean and the
truth, using the

Figure

Calculated analysis mean channel friction parameter (NBJ experiment). Horizontal red line shows true value of channel friction parameter. Error bars show one standard deviation of ensemble parameter distribution.

Figure

In this study we have chosen to use a small number of backscatter observations for our experiments. This allowed us to compare updates between the three observation operators when the observation operators were all given equivalent information; in this way we can draw conclusions about the physical mechanisms responsible for the different updates. In a real case, one of the major advantages of using our new backscatter observation operator is that it would be possible to use a large number of backscatter observations compared to the number of water level observations which are typically available. The availability of a large number of observations may be a major strength of our new approach; in our simple experiments (not shown) we found that assimilating a larger number of observations with the backscatter operator provided a better analysis than using only a few. Another merit of the backscatter operator is that there is less processing involved in using backscatter observations directly, potentially reducing the amount of time between acquisition of an SAR image and its use to update an inundation forecast. The backscatter operator also removes the need for locating the nearest wet pixel in the model forecast, which can be computationally costly.

There are a number of potential problems with practical implementation of the
backscatter operator. One is that using histograms to produce SAR-derived
inundation maps can lead to errors in assigning pixels to wet–dry categories.
One way to deal with this would be to use region-growing techniques (see e.g.

The new backscatter operator is likely to work well in cases where good
separation of the wet–dry distributions can be obtained through a histogram
and less well in cases where the distributions overlap. The new observation
operator does not require a digital elevation model to generate
forecast–observation equivalents, although the hydrodynamic model would
require topography information to generate a forecast. Water level
observations cannot be accurately determined in areas with high slope,
whereas backscatter observations will be unaffected. Like the other
observation operators, the new operator will likely provide better results in
rural settings than urban settings; double-bounce and layover effects due to
buildings are potential sources of problems for all of the operators
(

We have carried out a series of experiments to test the performance of three
different types of observation operators in an ETKF approach to data
assimilation for fluvial inundation forecasting. Although the results are for
one specific idealised domain, one realisation of true inflow and a single
realisation of observation error per observation type, we believe that many
of our conclusions will be applicable much more widely through the mechanisms
we describe. Repeats of experiments (not reported here) with different
realisations of observation error show evidence of the same behaviour in
terms of the mechanisms we have described. Our experiments show the following:

Simple assimilation of flood-edge water elevation observations can result
in no correction to the forecast even when there is a large difference
between the forecast and the observation. This happens when both the model
prediction and the observation indicate no flooding at the observation
location. We have illustrated the physical mechanism responsible for this
(Fig.

The nearest wet pixel approach provides better assimilation accuracy than
simple flood-edge assimilation: in our experiments we find no evidence of
negative “improvement” scores or zero increments when the forecast and
observations are very different. In our idealised system it is the best
choice of observation operator in terms of forecast accuracy in the
state-only experiments and in terms of rapid convergence to the true solution
for both water levels and the mean channel friction parameter value in the joint
state–parameter experiments. However, we have shown that using water edge
observations when the river is well within bank can lead to a degradation of
the forecast. Also, locating the nearest wet pixel is likely to be difficult
in practice for operational applications using real, more complicated
topography. One way to limit the distance between the flood-edge observation
position and the nearest wet pixel is to locate the nearest pixel at which
some threshold of ensemble members predict a positive water depth. The
predicted water elevations at this location could then be used to create

Our new backscatter observation operator performs well compared to more conventional options in our idealised domain using synthetic observations. The operator does not suffer from the problems of the simple flood-edge operator and is able to correct the forecast for the state-only assimilation cases. The backscatter operator approach also allowed the forecast to converge to the true solution for both water levels and channel friction parameter value in the joint state–parameter experiments, although in our experiments convergence was slower than for the nearest wet pixel approach. Using backscatter values operationally may reduce the time taken from image acquisition to assimilation and an improved forecast, because fewer processing steps are involved. The new operator could also potentially allow the use of much more information from any given SAR image, although there is likely to be a limit to the number of backscatter observations that can be used without causing variance collapse in the channel friction parameter distribution. Tests using larger numbers of backscatter observations have not been presented here; we plan to address this question in a real case study so that the results will be more directly applicable to real-world situations.

This work has shown that our novel backscatter operator has the potential to
improve inundation forecasting in fluvial floods, and we believe it may have
applications in other types of flooding where SAR images are available.
Further work is required to test the operator against the

The inundation simulations in this work were generated
using Clawpack 5.2.2, a collection of Fortran and Python code available from

ESC ran the experiments and drafted the paper. SD, JG-P, NN and PS contributed to analysis of the results, the discussion and manuscript editing.

The authors declare that they have no conflict of interest.

The authors gratefully acknowledge CASE sponsorship for ESC from the Satellite Applications Catapult and partial support for NKN from the NERC National Centre for Earth Observation (NCEO).

This research has been supported by the Natural Environment Research Council (grant nos. NE/L002566/1, NE/K00896X/1 and NE/K008900/1) and the Engineering and Physical Sciences Research Council (grant no. EP/P002331/1).

This paper was edited by Nunzio Romano and reviewed by two anonymous referees.