The suitability of remotely sensed soil moisture for improving operational flood forecasting

We evaluate the added value of assimilated remotely sensed soil moisture for the European Flood Awareness System (EFAS) and its potential to improve the prediction of the timing and height of the flood peak and low flows. EFAS is an operational flood forecasting system for Europe and uses a distributed hydrological model (LISFLOOD) for flood predictions with lead times of up to 10 days. For this study, satellite-derived soil moisture from ASCAT (Advanced SCATterometer), AMSR-E (Advanced Microwave Scanning Radiometer - Earth Observing System) and SMOS (Soil Moisture and Ocean Salinity) is assimilated into the LISFLOOD model for the Upper Danube Basin and results are compared to assimilation of discharge observations only. An ensemble Kalman filter (EnKF) is used to assimilate soil moisture and discharge data into the hydrological model. Information on the spatial (cross-)correlation of the errors in the satellite products is included to improve the performance of the EnKF. Discharge observations not used in the EnKF serve as an independent validation data set. Our results show that the accuracy of flood forecasts is increased when more discharge observations are assimilated; the mean absolute error (MAE) of the ensemble mean is reduced by 35 %. The additional inclusion of satellite data further improves performance: forecasts of baseflows are better and the uncertainty in the overall discharge is reduced, shown by a 10 % reduction in the MAE. In addition, floods are predicted with a higher accuracy and the continuous ranked probability score (CRPS) shows a performance increase of 5–10 % on average, compared to assimilation of discharge only. When soil moisture data is used, the timing errors in the flood predictions decrease, especially for shorter lead times, and imminent floods can be forecast with more skill. The number of false flood alerts is reduced when more observational data is assimilated into the system. The added value of the satellite data is largest when these observations are assimilated in combination with distributed discharge observations. These results show the potential of remotely sensed soil moisture observations to improve near-real-time flood forecasting in large catchments.


Introduction
Floods are extreme hydrological events caused by excessive water availability and may cause large economic, societal and natural damage. One example is the summer 2013 flood in central Europe, which produced historically high water levels in large parts of the Danube and Elbe catchments and caused a total estimated economic loss of EUR 23 billion (Aon Benfield, 2013). Due to their increasing impact on society, forecasting of these extreme events has become more important to increase preparedness and improve the response to and prevention of floods. This creates a growing need for accurate and reliable flood forecasting systems. National forecasting systems have been developed in, for example, England (National Flood Forecasting System), Germany (Hochwasservorhersagezentral), the Netherlands, Germany and Switzerland (FEWS-Rhine & Meuse), Czech Republic (CHMI-IWSS), Sweden (SMHI) and most other countries in Europe. For transboundary river basins, national forecasting systems often lack skill and transboundary forecasting systems are preferred. To fulfil this need, the European Commission developed the European Flood Awareness System (EFAS) for flood forecasting with lead times of up to 10 days for the European continent (Thielen et al., 2009). Additionally, EFAS will contribute to the understanding of flood events on a transboundary scale and will support international crisis management at the European level.
Flood forecasts are made for multiple basins, using distributed hydrological modelling. Systems like EFAS are highly dependent on the meteorological forcing provided as well as the pre-storm initial conditions of the catchment (Nester et al., 2012; Alfieri et al., 2013). To improve estimates of initial conditions, data assimilation techniques can update incorrect model states with observational data to obtain the best possible estimate of the current status of the hydrological system. Discharge data is often used in these data assimilation frameworks, because it contains the integrated information of all other hydrological states (e.g. Vrugt et al., 2006; Clark et al., 2008; Rakovec et al., 2012). However, it is difficult to obtain these measurements in real time in a form that can be used in EFAS. Observations might not be available in real time, quality control cannot be done in real time, or local data providers are not willing to share the information. Measurements of hydrological states other than discharge are rarely used for estimating the model's initial state, although they may be of considerable value. In particular, measurements of the pre-storm soil moisture conditions could potentially improve flood forecasting systems, since initial soil moisture conditions are expected to have a large impact on the flood peaks during a storm event. The soil moisture content determines the amount of water which can still be stored in the unsaturated zone or percolate to the saturated zone and thereby influences the precipitation required to generate overland flow. However, field observations at continental scale are not available due to the limited number of observational networks and their low spatial support. Remotely sensed soil moisture retrievals from the microwave domain could potentially fill the need for soil moisture observations at large spatial scales. 
Observations are globally available and revisit times per sensor are between 1 and 3 days depending on latitude. An additional advantage is that the data is available within 3 h after being observed and the satellites have a global coverage, while single discharge observations are only valid for the catchment scale.
Multiple studies have used remotely sensed soil moisture to improve discharge simulations in small catchments (≤ 1000 km²) and to correct for errors in pre-storm soil moisture conditions (Pauwels et al., 2001; Scipal et al., 2008; Brocca et al., 2010; Chen et al., 2011; Brocca et al., 2012; Matgen et al., 2012). These studies show that assimilation of these data improved the simulation of flood events and especially the height of the flood peak. For large-scale catchments, Draper et al. (2011) assimilated remotely sensed soil moisture from ASCAT over France to improve discharge simulations. It was concluded that the assimilation of soil moisture mainly corrected for biases in precipitation or incorrect model climatology. However, the potential to improve flood forecasts was not studied at the large scale. The previously mentioned studies mainly focussed on the potential gain for flood forecasting when only observations from a single sensor are assimilated. This potential can be increased by making use of soil moisture retrieved by multiple sensors, thereby increasing the quality and quantity of the observations. However, the added value of combined assimilation of data from multiple sensors for operational flood forecasting at a large scale remains unknown. Moreover, assimilation of remotely sensed soil moisture can lead to significant differences in the parametrization of the hydrological model (e.g. Santanello et al., 2007; Sutanudjaja et al., 2013; Wanders et al., 2013), which will also affect the potential gain from the assimilation of observations of other hydrological variables. Additionally, the added value of remotely sensed soil moisture compared to the assimilation of discharge observations has not been studied so far. Therefore, more research is required, especially in large-scale catchments, using multisensor remotely sensed soil moisture observations in conjunction with discharge data.
The aim of this study is to determine the benefits of the assimilation of multisensor soil moisture observations in operational flood forecasting systems in large-scale catchments. To achieve this aim, this research focuses on three main research questions: (i) does the assimilation of remotely sensed soil moisture lead to increased forecasting skills in terms of forecast uncertainty and forecast bias compared to assimilation of discharge observations? (ii) Does the assimilation of remotely sensed soil moisture increase the lead times at which floods can be accurately predicted? (iii) Is it possible to reduce the number of false flood alerts with the use of remotely sensed soil moisture? These research questions are answered using the EFAS model set-up, which enables a proper validation of the results in the context of a real operational system. Results of assimilating remotely sensed soil moisture are compared with assimilation of discharge data only. Also, the impact of the number of discharge observations and the benefit of the assimilation of remotely sensed soil moisture for a model calibrated on discharge are investigated. These analyses enable a more detailed evaluation of the potential gain of the assimilation of remotely sensed soil moisture for operational flood forecasting. As a test basin the Upper Danube catchment is selected, which is one of the largest catchments in Europe containing a large number of locations with time series of discharge. Satellite data from three microwave sensors (ASCAT, AMSR-E and SMOS) is used in the assimilation framework to increase the number of observations and the potential benefits of these observations for flood prediction.

Study area
The study area is the Upper Danube catchment upstream of Bratislava (catchment size 135 × 10³ km², Fig. 1). The border of the Upper Danube is formed by the Alps in the south and the catchment contains the northern part of Austria, the southern part of Germany, the south-eastern part of the Czech Republic and western Slovakia. Elevations range from 150 to 3150 m a.s.l. (above sea level). In the catchment, daily discharge observations for 23 locations are available through the Global Runoff Data Centre (GRDC), which enable validation and assimilation (Fig. 1). With a split-sample approach, discharge observations used for assimilation are not used for validation, to ensure an independent validation of the improvements in the flood forecasting after the assimilation.

European Flood Awareness System
The European Flood Awareness System was developed in 2003 by the European Commission at the Joint Research Centre in Ispra and has been improved continuously since then (www.efas.eu). In 2012 EFAS became an operational service aiming to provide flood forecasts up to 10 days in advance over the European continent. At the core of the EFAS system is the hydrological model LISFLOOD, which was originally developed by De Roo et al. (2000), later improved by Van Der Knijff et al. (2010), and runs in the PCRaster modelling environment (Wesseling et al., 1996; Karssenberg et al., 2010). LISFLOOD was specifically developed for discharge simulations of large-scale river basins. The model consists of a vegetation layer, two layers to simulate the unsaturated zone, two linear reservoirs to represent fast and slow responding groundwater systems and a channel network for discharge routing. In this study, the original two-layer representation of the unsaturated zone (De Roo et al., 2000; Van Der Knijff et al., 2010) was replaced by a new unsaturated zone model component that uses four layers (Fig. 2). This enables a more detailed representation of the soil moisture in the topsoil and results in modelled soil moisture that is directly comparable to remotely sensed soil moisture observations. The layers have been added in the topsoil and have a depth equal to the typical penetration depth of microwave sensors. In the new model set-up, the first and second unsaturated zone layers are 2 and 3 cm thick, respectively, and the third layer represents the remaining part of the rooting depth (the topsoil, Fig. 2). The root zone is simulated using the topsoil and evapotranspiration occurs from these layers. The evaporation for a particular layer is limited if soil moisture is below critical soil moisture conditions, in which case more water is extracted from the other soil moisture layers to compensate for the reduced evaporation. The abstraction per layer is linearly related to the total storage capacity of the layer. Thick layers will thus have a larger contribution to the evapotranspiration compared to thinner layers. When the entire root zone is below critical soil moisture conditions, the evaporation is limited for the entire topsoil and actual evapotranspiration will be lower than potential evapotranspiration. Bare soil evapotranspiration occurs only from the first layer of 2 cm. Via capillary rise, replenishment of the root zone can occur from the fourth unsaturated zone layer (the subsoil). The amount of capillary rise depends on the difference in hydraulic head between two layers and the average conductivity of the layers. The first layer also strongly affects the amount of surface runoff in the LISFLOOD model. The soil wetness of the first layer determines the infiltration capacity of the unsaturated zone, and when the infiltration capacity is exceeded by rainfall or snowmelt this generates overland flow. Subdaily time steps are included to enable a stable performance of the soil moisture simulation, where the number of subdaily time steps depends on the amount of potential infiltration and water storage in the unsaturated zone.

Fig. 2 (caption, partially recovered). Model parameters listed: saturated conductivity of the topsoil (KSat 1), saturated conductivity of the subsoil (KSat 2), empirical shape parameter for preferential macro-pore flow (c pref), maximum percolation rate from upper to lower groundwater (GW prec), reservoir constant of the upper groundwater (T uz), reservoir constant of the lower groundwater (T lz), surface runoff roughness coefficient (Chan N 2), and the channel's Manning roughness coefficient (CalMan).
In order to use the best calibrated model for the study area, the hydrological model LISFLOOD was calibrated for the Upper Danube. For the calibration, soil moisture and discharge observations were used to calibrate the most sensitive model parameters. The parameters which were calibrated were related to the snow accumulation, infiltration and percolation through the unsaturated zone, the groundwater system and routing of discharge (Fig. 2). A dual state and parameter ensemble Kalman filter was used to calibrate LISFLOOD for the Upper Danube. A total of 300 members was used to estimate all parameters of the model over the period 2010-2011. The period was selected because satellite data from multiple sensors is available for this period. This resulted in calibrated parameters with distributions defined by 300 realizations of parameter sets, which could be used for hydrological simulations. The use of these parameter distributions allows accounting for the uncertainty in the initial conditions and for different hydrological response to identical meteorological input. More detailed information on the probabilistic model calibration set-up can be found in Wanders et al. (2013).
The meteorological forcing of EFAS consists of daily precipitation, daily potential evapotranspiration and the average daily temperature. EFAS uses meteorological forcing from the 51 members of the Ensemble Prediction System of the European Centre for Medium-Range Weather Forecasts (ECMWF-EPS). This results in 51 hydrological forecasts every 12 h, at midday and midnight. The new set-up of EFAS, which uses 300 realizations of parameter sets, differs from the original EFAS set-up, which uses one parameter set and one initial hydrological condition for all meteorological forecasts. The new set-up also uses a set of initial hydrological conditions, which are forced with identical meteorological forcing. The EFAS set-up used here therefore accounts for the uncertainty in the initial conditions, which can be an important factor in flood forecasting.
Throughout the manuscript the term EFAS will be used when talking about the entire forecasting system, i.e. the combination of meteorological forcing, hydrological model and resulting flood forecasts. The term LISFLOOD will be used when the focus is specifically on the data assimilation or the hydrological model.

Satellite data
Remotely sensed soil moisture data from three satellites is used, namely SMOS (Soil Moisture and Ocean Salinity), ASCAT (Advanced SCATterometer) and AMSR-E (Advanced Microwave Scanning Radiometer - Earth Observing System). SMOS is the first dedicated soil moisture satellite using fully polarized passive microwave signals at 1.41 GHz (L-band) observed at multiple angles. The observation depth of SMOS is 5 cm with a spatial resolution of 35-50 km depending on the incident angle and the deviation from the satellite ground track. The revisit time of SMOS is within 1-3 days depending on the latitude. SMOS retrievals which are potentially contaminated with radio frequency interference (RFI) have been removed. The observations from SMOS can be directly compared to the weighted average soil moisture content of the two top layers of LISFLOOD, together 5 cm thick.
AMSR-E is a multifrequency passive microwave radiometer (6.9 GHz, C-band) and is a widely used sensor for soil moisture retrievals. The spatial resolution of AMSR-E is between 36 and 54 km with an observation depth of 2 cm and a revisit time of 1-3 days. Several algorithms estimating surface soil moisture from AMSR-E observations exist (e.g. Njoku et al., 2003; Owe et al., 2008). One of the algorithms using exclusively satellite observations is the Land Parameter Retrieval Model (LPRM), which was used for this study. LPRM soil moisture products have been validated against in situ observations (e.g. Wagner et al., 2007; De Jeu et al., 2008; Draper et al., 2009), models (e.g. Loew et al., 2009; Crow et al., 2010; Bisselink et al., 2011) and other satellite products (e.g. Wagner et al., 2007; Dorigo et al., 2010). Observations from AMSR-E are compared to the first unsaturated zone layer of LISFLOOD.
Unlike SMOS and AMSR-E, ASCAT uses active microwave at a frequency of 5.3 GHz (C-band) to determine the soil moisture content (Wagner et al., 1999; Naeimi et al., 2009). ASCAT uses a change detection method (Naeimi et al., 2009) and data is provided relative to the soil moisture content of the wettest (field capacity) and driest (wilting point) soil moisture conditions measured (Wagner et al., 1999). The spatial resolution of ASCAT is around 25 km, the observation depth is 2 cm and the revisit time is 1-3 days. As for AMSR-E, ASCAT observations are compared to the top layer of the unsaturated zone of the model simulations.
All satellite soil moisture products are used on an equal area discrete global grid product (DGG). For the SMOS and ASCAT soil moisture product a DGG is available (Bartalis et al., 2006), while for the AMSR-E product a DGG is not available. Therefore, the AMSR-E data was projected on the DGG of SMOS using the nearest neighbour approach, because both satellites have roughly the same spatial resolution. The DGG of ASCAT uses equally spaced areas of 12.5 km while the other DGG uses a slightly lower resolution of 15 km between points.
Although the passive microwave satellite missions, SMOS and AMSR-E, give absolute soil moisture values (in m³ m⁻³), all satellite data was converted using a rescaling approach. The converted satellite values $\theta_{s,\mathrm{new}}$ (in m³ m⁻³) used for calibration are calculated as

$$\theta_{s,\mathrm{new}} = \frac{\theta_s - \theta_{s,5}}{\theta_{s,95} - \theta_{s,5}}\,\left(\theta_{FC} - \theta_{WP}\right) + \theta_{WP} \tag{1}$$

where $\theta_s$ are the observed satellite soil moisture values (−) at a DGG location, $\theta_{s,95}$ and $\theta_{s,5}$ are the 95th and 5th percentiles of satellite soil moisture values at the DGG location, respectively (−), and $\theta_{FC}$ and $\theta_{WP}$ are field capacity and wilting point of the modelled soil moisture values (m³ m⁻³) at the DGG location. The average model values, $\theta_{FC}$ and $\theta_{WP}$, are dependent on the soil texture and are averaged over the support unit of the satellite retrieval. Frozen soils, snow accumulation and RFI hamper the soil moisture retrieval due to changes in the dielectric constant when water freezes. Therefore, retrievals done with (1) an air temperature below 4 °C, (2) simulated snow accumulation and (3) the presence of RFI were not used in the calibration.
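The rescaling and quality screening described above can be illustrated with a short Python sketch. It assumes a linear mapping of the 5th and 95th percentiles of the retrievals onto the modelled wilting point and field capacity, uses a simple linear-interpolation percentile, and does not clip values outside the percentile range; the function names and these choices are illustrative assumptions and not the implementation used in the study.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in 0..100) of pre-sorted values."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def rescale_retrievals(theta_s, theta_wp, theta_fc):
    """Map satellite retrievals at one grid point onto the model range.

    theta_s  : satellite soil moisture time series at one DGG location
    theta_wp : modelled wilting point (m3 m-3), averaged over the footprint
    theta_fc : modelled field capacity (m3 m-3), averaged over the footprint
    """
    ordered = sorted(theta_s)
    p5, p95 = percentile(ordered, 5), percentile(ordered, 95)
    span = p95 - p5
    if span <= 0:
        raise ValueError("retrievals have no dynamic range at this location")
    # Assumption: values outside the 5th-95th percentile range are not clipped.
    return [theta_wp + (v - p5) / span * (theta_fc - theta_wp) for v in theta_s]


def is_usable(air_temp_c, snow_present, rfi_flagged):
    """Screening rule from the text: drop cold, snow-covered or RFI-affected retrievals."""
    return air_temp_c >= 4.0 and not snow_present and not rfi_flagged
```

A retrieval equal to the 5th percentile is mapped to the wilting point and one equal to the 95th percentile to field capacity, so all three products end up in the same model-consistent range.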

Discharge data
The Upper Danube catchment contains 23 locations where daily discharge observations are available (Fig. 1). Time series of discharge are available from January 2000 until December 2011. Using a split-sample approach, the discharge of seven stations was used for data assimilation into the forecasting system, while the other 16 stations were only used for validation of the forecasts. This approach is similar to the experimental set-up of Lee et al. (2012) and Rakovec et al. (2012), who used multiple interior discharge stations for validation and assimilation. Assimilation and validation stations are selected such that they are equally distributed over the catchment and are situated both in small rivers and the main Upper Danube River. This allows the impact of the data assimilation to be evaluated at different catchment sizes within the Upper Danube catchment.

Data assimilation
The ensemble Kalman filter (EnKF) is a Monte Carlo based approach which is highly suitable for data assimilation in high dimensional systems (Evensen, 1994, 2003, 2009; Burgers et al., 1998), such as the LISFLOOD model. The EnKF is applied to update state variables of the hydrological model. The forward model is given as

$$\psi(t+1) = f\left(\psi(t), F(t), p\right) \tag{2}$$

where $f$ is the set of model equations, i.e. the model structure, representing the hydrological processes that lead to change in the system state over time, $\psi(t)$ is the state of the model at time $t$, $F(t)$ the model forcing at time $t$ (e.g. precipitation and evaporation) and $p$ are the model parameters.
The EnKF is applied on each daily time step using observations from remote sensing (when available, AMSR-E, SMOS and ASCAT) and discharge observations. If no observations of any kind are available no update will be performed. When only a limited number of observations are available these will be used to update the model. The general form of the EnKF (Evensen, 2003) is given as

$$\psi^a = \psi^f + P^f H^T \left(H P^f H^T + R\right)^{-1} \left(Y - H\psi^f\right) \tag{3}$$

where $\psi^a$ is the analysis of $\psi^f$, the model forecast, $P^f$ the error covariance matrix of the model, $R$ is the measurement error covariance, and $H$ is the measurement operator which relates the model states to the satellite or discharge observations $Y$. The observations $Y$ can be described as

$$Y = H\psi^t + \epsilon \tag{4}$$

where the true model state ($\psi^t$) is transformed to $Y$ using $H$ and random noise $\epsilon$ with a zero mean and an error covariance given by $R$. The state error covariance matrix of the model prediction is directly calculated from the spread between the different ensemble members using

$$P^f = \overline{\left(\psi^f - \psi^t\right)\left(\psi^f - \psi^t\right)^T} \tag{5}$$

where $\psi$ is the model state vector and the superscripts "f" and "t" represent the forecast and true state, respectively. Since the true state is not known it is assumed that

$$P^f \approx \overline{\left(\psi^f - \overline{\psi^f}\right)\left(\psi^f - \overline{\psi^f}\right)^T} \tag{6}$$

where $\overline{\psi^f}$ represents the ensemble average and it is assumed that the ensemble of model simulations is sufficient to represent the true state. The EnKF is implemented in the PCRaster modelling environment (Karssenberg et al., 2010). For the assimilation of the satellite data with the EnKF, spatial information on the measurement error covariance ($R$; Eqs. 3, 4) is required. The structure of $R$ is determined from estimates of Wanders et al. (2012) over Spain, obtained by using high-resolution modelling of the unsaturated zone. From this study the local errors of each satellite product were determined as well as the spatial correlation of the errors of the satellites and the correlation between the errors of different sensors. The average standard errors of the different sensors from Wanders et al. (2012) are 0.049 (AMSR-E), 0.057 (SMOS) and 0.051 m³ m⁻³ (ASCAT). This information can be used to simultaneously assimilate soil moisture observations from different sensors with the additional information about the error structure obtained from the observation error covariance matrix. To avoid errors produced by downscaling of the satellite soil moisture, the average modelled soil moisture values are upscaled to the satellite resolution. Each individual satellite observation is then compared to the corresponding model average soil moisture at the same spatial support. The spatial support will differ for different sensors and hence it is important to correctly compare modelled soil moisture to observed soil moisture.
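To make the analysis step concrete, the following Python sketch applies an EnKF update with perturbed observations for the special case of a single scalar observation, where the term to be inverted reduces to a scalar. It is a simplified illustration under stated assumptions (one observation per update, a user-supplied observation function `h`, sample covariances with an n − 1 denominator); the operational system assimilates many observations at once within PCRaster, which this sketch does not reproduce.

```python
import random
from statistics import mean


def enkf_update_scalar(ensemble, h, y_obs, obs_var, rng):
    """EnKF analysis for one scalar observation, using perturbed observations.

    ensemble : list of state vectors (one list of floats per member)
    h        : observation function mapping a state vector to the observed quantity
    y_obs    : observed value
    obs_var  : observation error variance (R for a single observation)
    rng      : random.Random instance used to perturb the observation
    """
    n = len(ensemble)
    n_state = len(ensemble[0])
    hx = [h(x) for x in ensemble]
    hx_mean = mean(hx)
    x_mean = [mean(x[j] for x in ensemble) for j in range(n_state)]

    # Ensemble estimates of H P^f H^T (scalar) and P^f H^T (one value per state).
    var_hx = sum((v - hx_mean) ** 2 for v in hx) / (n - 1)
    cov_x_hx = [
        sum((x[j] - x_mean[j]) * (v - hx_mean) for x, v in zip(ensemble, hx)) / (n - 1)
        for j in range(n_state)
    ]
    gain = [c / (var_hx + obs_var) for c in cov_x_hx]

    updated = []
    for x, v in zip(ensemble, hx):
        # Each member sees the observation plus noise drawn from the error model.
        y_perturbed = y_obs + rng.gauss(0.0, obs_var ** 0.5)
        innovation = y_perturbed - v
        updated.append([x[j] + gain[j] * innovation for j in range(n_state)])
    return updated
```

Because the gain is built from ensemble cross-covariances, states that are not observed directly (for example deeper soil layers or groundwater storage) are still adjusted to the extent that they co-vary with the observed quantity in the ensemble.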
All observations are assimilated as daily averages, since this is the same temporal resolution as the meteorological forcing. The error covariance between the discharge observations is set to zero while the standard error for the discharge observations is assumed to be 30 % of the discharge (e.g. Di Baldassarre and Montanari, 2009). It is assumed that the covariance between the satellite soil moisture observations and discharge observations equals zero.
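The assumptions on the observation errors can be summarised in a small Python sketch that assembles a joint measurement error covariance matrix: a satellite block supplied by the user, a diagonal discharge block with a standard error of 30 % of the observed discharge, and zero covariance between the two. The function names and the plain list-of-lists matrix layout are illustrative assumptions; the satellite block itself would have to come from an error analysis such as that of Wanders et al. (2012).

```python
def discharge_error_variances(q_obs, rel_error=0.30):
    """Error variance per discharge observation: (30 % of the discharge) squared."""
    return [(rel_error * q) ** 2 for q in q_obs]


def assemble_obs_error_covariance(sat_cov, q_obs, rel_error=0.30):
    """Build R with a satellite block and a diagonal discharge block.

    sat_cov : square list-of-lists with satellite error (cross-)covariances
    q_obs   : observed discharges at the assimilation stations
    """
    n_sat, n_q = len(sat_cov), len(q_obs)
    n = n_sat + n_q
    r = [[0.0] * n for _ in range(n)]
    for i in range(n_sat):
        for j in range(n_sat):
            r[i][j] = sat_cov[i][j]
    # Discharge errors are uncorrelated with each other and with the satellites.
    for k, var in enumerate(discharge_error_variances(q_obs, rel_error)):
        r[n_sat + k][n_sat + k] = var
    return r
```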

Assimilation and ensemble hindcasting
In this study, observed satellite and discharge data for December 2010-November 2011 are used in a hindcasting experiment for the Upper Danube. Only 1 year was selected to test the procedure since all satellite products are available for this time period with sufficient data quality. After the selected time period the AMSR-E sensor was shut down and before the selected period the quality of the SMOS observations was still below the potential maximum quality due to RFI contamination.
A data assimilation procedure was used to create a reanalysis time series of all state variables, which is used as the starting point for the hindcast (t0). Model states are updated with the observations to obtain a better estimate of the initial conditions at t0. Figure 3 provides a flowchart of the full hindcasting procedure described below. The 300 parameter realizations from the probabilistic calibration were used to generate the reanalysis time series. As meteorological forcing for the analysis, observed time series of daily precipitation, daily potential evapotranspiration and the average daily temperature were used. Observations are interpolated between meteorological stations with inverse distance interpolation. For every time step up to t0, observed state variables, remotely sensed soil moisture and/or discharge (depending on the scenario), are assimilated into the model. Assimilation is done on a daily time step, since information on the exact time of the discharge observations is largely unknown. Additionally, the model uses meteorological input with a temporal resolution of 1 day. Parameters are not updated in the assimilation. Thus, the same 300 parameter sets are used to generate the 300 ensemble members between analysis steps with the EnKF. At t0, the start of the hindcast, the forward model (Eq. 2) is used for the hindcasting of discharge and other state variables. After t0, the daily forcing from the ECMWF-EPS is used to drive the model simulation. The hindcast is evaluated based on the observed discharge for the hindcasting period. As in EFAS, hindcasts are done at midday and midnight based on the latest simulations of the ECMWF-EPS, leading to a total of 730 hindcasts.

Table 1. Hindcasting scenarios for the EFAS system including abbreviations and assimilated data used to create a reanalysis time series from which hindcasts were initiated. The calibration indicates the data used by Wanders et al. (2013) to calibrate the hydrological model. Column headings: Scenario, Hindcast, Calibration (recovered entries: all satellite data, 7 discharge stations).

In the original forecasts from EFAS only one set of initial conditions is used, thereby neglecting the uncertainty in the initial conditions. In this experiment, 300 possible realizations of the initial conditions are available from the reanalysis. For each hindcast the 51 members of the ECMWF-EPS are used twice with random realizations from the 300 members of the reanalysis to create n = 102 realizations per hindcast. In this approach different meteorological forcing and initial conditions are used for each hindcast to obtain a better estimate of the forecast uncertainty. A 4-month simulation was performed using all 300 members in combination with all 51 meteorological forecasts. An analysis of the probability density functions of each hindcast showed that a total of 102 realizations showed no significant differences to a simulation using all possible (51 × 300 = 15 300) realizations (for lead times up to 10 days). The significance was tested with a non-parametric Kolmogorov-Smirnov test, which showed no significant difference between the distributions created with 102 realizations and 15 300 realizations (p = 0.05). In another set of runs, it was shown that using fixed initial conditions for the hydrological state leads to significantly different distributions. The same holds for fixing the meteorological forecast for all 300 ensemble members, which results in a significantly different probability density function compared to the run created with 15 300 realizations. From this exercise we concluded that both the uncertainty in initial states and the forcing uncertainty need to be taken into account, but that it suffices to use a subset of the possible realizations to model this joint uncertainty. Hence, to reduce calculation times 102 realizations per hindcast were used in all scenarios. Calculation times for this new assimilation system are low. For a 10-day forecast with 102 members for the Upper Danube the required calculation time is 120 s on an 8-core machine with 2.26 GHz processors and 24 GB RAM (random-access memory).
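The pairing of forcing members and initial conditions can be sketched in a few lines of Python. The sketch assumes that each of the 51 ECMWF-EPS members is used exactly twice and that the reanalysis member for each use is drawn uniformly at random with replacement; the text does not state whether draws are made with or without replacement, so that detail and the function name are assumptions.

```python
import random


def build_hindcast_pairs(n_forcing=51, n_initial=300, repeats=2, seed=None):
    """Return (forcing_member, initial_condition_member) index pairs for one hindcast."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(repeats):
        for forcing_member in range(n_forcing):
            # Assumption: initial conditions are sampled with replacement.
            pairs.append((forcing_member, rng.randrange(n_initial)))
    return pairs
```

With the default arguments this yields 102 realizations per hindcast instead of the 15 300 possible combinations, which is what keeps the calculation time per forecast low.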

Scenarios
The different scenarios used are given in Table 1 as well as the data used in the assimilation before the hindcasting was done. The parametrization was calibrated for the Upper Danube for the period 2010-2011 and was used to create analysis time series for each scenario. The calibration was based on the observations available for the reanalysis, so if both discharge and satellite data were available these were also used for the calibration of the hydrological model (Table 1). Two additional scenarios have been included (bottom half of Table 1) to show the performance of the hindcasts in case of limited or no data availability. Both scenarios have been calibrated on discharge and use assimilation of satellite observations or no data.

Evaluation
The evaluation of each hindcast was done based on coefficient of variation (cv), continuous ranked probability score (CRPS, Hersbach, 2000), mean absolute error (MAE), Brier score (BS, Brier, 1950) and the number of false and true positive flood alerts. These scores were calculated for each lead time separately to evaluate the quality of the hindcast for different lead times.
To assess the spread of the ensemble of simulated discharges, the coefficient of variation was determined as

$$cv = \frac{1}{T}\sum_{t=1}^{T}\frac{\sigma_{Q_{mod}}(t)}{\overline{Q_{mod}}(t)} \tag{7}$$

where $\sigma_{Q_{mod}}(t)$ and $\overline{Q_{mod}}(t)$ (m³ d⁻¹) are the standard deviation and the mean of the ensemble of modelled discharge at time $t$, respectively, and $T$ is the number of time steps (days) in the reanalysis period.
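As a minimal illustration of this spread measure, the Python sketch below averages the ratio of ensemble standard deviation to ensemble mean over all time steps. The use of the sample standard deviation (n − 1 denominator) and the nested-list input layout are assumptions made for the example.

```python
from statistics import mean, stdev


def ensemble_cv(ensemble_by_time):
    """Time-averaged coefficient of variation of an ensemble of discharges.

    ensemble_by_time : list over time steps, each a list of member discharges
    """
    ratios = [stdev(members) / mean(members) for members in ensemble_by_time]
    return mean(ratios)
```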

N. Wanders et al.: Remotely sensed soil moisture for flood forecasting
The CRPS (Hersbach, 2000) was used to assess whether the uncertainty of the forecast is correct and not over- or underestimated. The CRPS is given as

$$\mathrm{CRPS} = \int_{-\infty}^{\infty}\left[F^f_i(x,t) - F^o_i(x,t)\right]^2\,\mathrm{d}x \tag{8}$$

where $F^f_i(x,t)$ is the cumulative distribution function of the hindcast at time $t$ and $F^o_i(x,t)$ is the cumulative distribution function of the observation at time $t$. $F^o_i(x,t)$ is given by a Heaviside function, with a step from 0 to 1 probability at the observed value. The CRPS is standardized by $\overline{Q_{obs}}$ for each validation location to enable a comparison between stations with a different magnitude of discharge.
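For an ensemble hindcast the integral over the squared difference between the forecast and observation distribution functions can be evaluated exactly from the members. The Python sketch below uses the empirical (step-function) distribution of the ensemble and the equivalent expression in terms of mean absolute differences; averaging over time before dividing by the mean observed discharge is an assumption about how the standardization is applied, and the function names are illustrative.

```python
from statistics import mean


def crps_ensemble(members, obs):
    """CRPS of one ensemble forecast against one observation.

    Exact for the empirical CDF of the members:
    mean|x_i - y| - 0.5 * mean|x_i - x_j|.
    """
    n = len(members)
    term_obs = sum(abs(m - obs) for m in members) / n
    term_spread = sum(abs(a - b) for a in members for b in members) / (2 * n * n)
    return term_obs - term_spread


def standardized_crps(ensemble_by_time, obs_by_time):
    """Time-mean CRPS divided by the mean observed discharge (assumed convention)."""
    scores = [crps_ensemble(m, o) for m, o in zip(ensemble_by_time, obs_by_time)]
    return mean(scores) / mean(obs_by_time)
```

For a single-member forecast the score reduces to the absolute error, which makes the CRPS directly comparable to the MAE of a deterministic forecast.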
To determine whether the hindcasts were biased, the MAE was calculated using the ensemble mean of the forecast. The MAE is given as

$$\mathrm{MAE} = \frac{1}{T}\sum_{t=1}^{T}\frac{\left|\overline{Q}_\mathrm{mod}(t) - Q_\mathrm{obs}(t)\right|}{\overline{Q}_\mathrm{obs}}, \tag{9}$$

where $\overline{Q}_\mathrm{mod}(t)$ and $Q_\mathrm{obs}(t)$ (m³ d⁻¹) are the average hindcasted discharge and the observed discharge at time t, respectively, and $\overline{Q}_\mathrm{obs}$ is the average discharge over the evaluation period. The cv, CRPS and MAE were used to evaluate the performance and quality of each hindcasting scenario. Scores were standardized to enable a comparison between upstream and downstream stations without correcting for differences in discharge volumes. In addition, these scores were determined for each lead time separately to enable a better comparison between the scenarios and to determine the flood forecasting performance of EFAS for different lead times.
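A minimal sketch of the standardized MAE follows, assuming the absolute error of the ensemble mean is averaged over time and then divided by the mean observed discharge. The function name and the example numbers are hypothetical.

```python
import numpy as np


def standardized_mae(ensemble, observations):
    """MAE of the ensemble mean, divided by the mean observed discharge.

    ensemble: shape (T, n_members); observations: shape (T,).
    """
    ensemble = np.asarray(ensemble, dtype=float)
    observations = np.asarray(observations, dtype=float)
    ens_mean = ensemble.mean(axis=1)               # ensemble mean per time step
    abs_err = np.abs(ens_mean - observations)      # absolute error per time step
    return float(abs_err.mean() / observations.mean())


# Hypothetical two-day example: the first day is unbiased, the second is 20 too high.
ens = np.array([[90.0, 110.0],
                [210.0, 230.0]])
obs = np.array([100.0, 200.0])
mae = standardized_mae(ens, obs)
```

Because only the ensemble mean enters, this score is insensitive to the ensemble spread, which is why it complements the CRPS rather than duplicating it.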
To test the accuracy of the flood alerts (both timing and height of the flood peak), the Brier score was calculated for different flood thresholds and different lead times. The Brier score was calculated as

$$\mathrm{BS} = \frac{1}{T}\sum_{t=1}^{T}\left(f(t) - o(t)\right)^2, \tag{10}$$

where f(t) is the probability that discharge will exceed a certain threshold (calculated from the probability density function) and o(t) is a binary value which is 0 if this threshold is not exceeded and 1 if it is exceeded. In this study we focussed on two threshold levels, namely the 80th and 90th percentiles of the discharge (Q80, Q90). Exceedance of these arbitrary levels will not necessarily cause a flood situation; however, these high discharge events were used to allow for evaluation of the hindcasts. Furthermore the number of false positives (flood forecast, no flood observed), missed (no flood forecasted, flood observed) and correctly forecasted (flood forecasted, flood observed) were
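The Brier score and the alert categories described above can be illustrated as follows. This is a sketch under stated assumptions: the exceedance probability is approximated by the fraction of ensemble members above the threshold, and an alert is issued when that probability reaches a hypothetical cut-off of 0.5. The text does not specify either choice, and all names and values are illustrative.

```python
import numpy as np


def exceedance_probability(ensemble, threshold):
    """Fraction of ensemble members above the threshold, per time step."""
    return (np.asarray(ensemble, dtype=float) > threshold).mean(axis=1)


def brier_score(prob, occurred):
    """Mean squared difference between forecast probability and outcome (0/1)."""
    prob = np.asarray(prob, dtype=float)
    occurred = np.asarray(occurred, dtype=float)
    return float(np.mean((prob - occurred) ** 2))


def alert_counts(prob, occurred, alert_probability=0.5):
    """Count false positives, misses and correct alerts.

    alert_probability is a hypothetical cut-off, not taken from the paper.
    """
    alert = np.asarray(prob) >= alert_probability
    occurred = np.asarray(occurred, dtype=bool)
    return {
        "false_positive": int(np.sum(alert & ~occurred)),
        "missed": int(np.sum(~alert & occurred)),
        "correct": int(np.sum(alert & occurred)),
    }


# Hypothetical four-day, four-member ensemble and a fixed threshold.
ens = np.array([[1.0, 1.0, 1.0, 1.0],
                [6.0, 6.0, 6.0, 1.0],
                [6.0, 6.0, 1.0, 1.0],
                [1.0, 1.0, 1.0, 6.0]])
obs = np.array([2.0, 7.0, 3.0, 8.0])
threshold = 5.0

prob = exceedance_probability(ens, threshold)
occurred = obs > threshold
bs = brier_score(prob, occurred)
counts = alert_counts(prob, occurred)
```

In practice the threshold would be set to the Q80 or Q90 level of the discharge record at each station, and the scores would be computed separately for every lead time.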

Reanalysis
To analyse the performance of the reanalysis, the cv (Eq. 7) is used to determine the uncertainty after the assimilation of the observations (Fig. 4, Table 2). In the Q0 scenario, the model is not calibrated and no data is assimilated into the reanalysis to correct for incorrect model states. The uncertainty in the model simulation is large, with a cv of 0.25. Uncertainty even increases during extreme flood events, reducing the potential to use a model calibrated on expert knowledge without data assimilation for flood forecasting. The assimilation of three different satellite products (Q0 sat) reduces the cv of the discharge simulation to 0.136, compared to 0.25 for Q0 (Fig. 4). This reduction is caused by the assimilation procedure, which constrains the model to follow the observations and hence reduces the spread between ensemble members. Soil moisture observations do not contain information on groundwater and routing processes, hence they impact the discharge simulation only indirectly via surface runoff and percolation to the groundwater from the unsaturated zone. As a result, the discharge simulations are not necessarily improved by assimilation of remotely sensed soil moisture observations. Two scenarios were created where only discharge is assimilated into the model, namely Q1 and Q7. For Q1 only discharge from the outlet was used and for Q7 additional upstream discharge observations (Fig. 1) were assimilated into the model. The assimilation of additional observation data reduces the cv to 0.08 for Q1 and to 0.04 for Q7, which is lower than for Q0 in both scenarios (Table 2). Q1 shows a small positive bias in the selected time period compared to the discharge observations. However, on average the bias does not exist for the entire simulation period and no systematic bias exists between the simulation and the observations.
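To put the cv values quoted in this paragraph on a common footing, the relative reduction with respect to Q0 can be computed directly. The cv values below are taken from the text; the percentages are derived here for illustration and are not reported in the paper, and the dictionary keys are shorthand for the scenario names.

```python
# cv of the reanalysis per scenario, as quoted in the text.
cv_reanalysis = {"Q0": 0.25, "Q0sat": 0.136, "Q1": 0.08, "Q7": 0.04}

baseline = cv_reanalysis["Q0"]

# Relative reduction of the ensemble spread compared to the Q0 scenario.
relative_reduction = {
    name: (baseline - cv) / baseline
    for name, cv in cv_reanalysis.items()
    if name != "Q0"
}

for name, reduction in relative_reduction.items():
    print(f"{name}: cv reduced by {100 * reduction:.0f} % relative to Q0")
```

Under this reading, assimilating satellite soil moisture alone removes a little under half of the Q0 spread, while assimilating discharge at the outlet or at several stations removes considerably more.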
Finally, two scenarios where both discharge and remotely sensed soil moisture observations are assimilated into the model (Q1 sat and Q7 sat) were evaluated. In these scenarios the uncertainty is reduced compared to most other scenarios. However, peak discharge for Q1 sat is overestimated, while baseflow simulations are better than for Q1. Improved simulations are also observed with Q7 sat compared to Q7, and the overestimation of peak discharge does not occur with Q7 sat (Fig. 4). An example time series is provided to show the impact of the satellite observations in the Q7 sat scenario (Fig. A1 in the Appendix).
It must be mentioned that additional discharge data has a larger impact on the reduction of the uncertainty than assimilation of remotely sensed soil moisture. Remotely sensed soil moisture enables a better simulation of the baseflow compared to assimilation of discharge observations only. The reduction in uncertainty of the discharge simulations with the assimilation of remotely sensed soil moisture shows that this method has a high potential to reduce uncertainties in simulated discharges in sparsely gauged river basins.

Hindcasting performance
The hindcast performance of each scenario was evaluated using the CRPS (Eq. 8) and the MAE (Eq. 9). In general the uncertainty in the hindcast is reduced when more data is assimilated into the system, leading to a better hindcast simulation (Fig. 5). When more discharge data is assimilated, the uncertainty is more strongly reduced than with the assimilation of only remotely sensed soil moisture data (Figs. 4, 5). This is also confirmed by the CRPS for the different scenarios (Fig. 6), where the decrease in CRPS is strongest when more discharge data is used (Table 2). In general the CRPS increases with increasing lead times for all scenarios with the exception of Q1 sat. Due to the larger spread for longer lead times (Fig. 5) the CRPS will increase, because forecasts with high uncertainty are penalized. The CRPS for Q1 sat is the highest, indicating that this scenario has the lowest hindcasting skill of all scenarios (Fig. 6, Table 2). This is caused by the overestimation of most flood events, which results in a high CRPS. When more discharge data is assimilated (Q0 compared to Q1 and Q7) the CRPS is reduced throughout the catchment for most locations, including the outlet near Bratislava. When a combination of discharge data and satellite data is assimilated (Q7 sat), the quality of the hindcast is highest (Fig. 5).
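The statement that a larger ensemble spread is penalized by the CRPS can be checked with a small numerical experiment. The sketch below compares a narrow and a wide ensemble that are both centred on the same observation. The ensembles are synthetic and the function name is hypothetical; the CRPS is computed with the standard ensemble identity E|X − y| − ½ E|X − X′|.

```python
import numpy as np


def crps_ensemble(members, obs):
    """CRPS of an empirical ensemble CDF against a single observation."""
    members = np.asarray(members, dtype=float)
    term_obs = np.mean(np.abs(members - obs))
    term_spread = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return float(term_obs - term_spread)


obs = 100.0  # hypothetical observed discharge

# Both ensembles have their mean exactly at the observation;
# only the spread differs (short versus long lead time, say).
narrow = obs + np.linspace(-5.0, 5.0, 11)
wide = obs + np.linspace(-50.0, 50.0, 11)

crps_narrow = crps_ensemble(narrow, obs)
crps_wide = crps_ensemble(wide, obs)
```

Although both ensemble means are unbiased, the wide ensemble receives the higher (worse) score, which mirrors the growth of the CRPS with lead time described above.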
The MAE (Eq. 9) is calculated for all scenarios for different lead times and locations (Fig. 7). Compared to the scenario without assimilation of observations (Q0), only the scenarios where multiple discharge stations are assimilated (Q7 and Q7 sat) show an increase in performance. The best performance is generated by Q7 sat, which shows a low bias compared to the observed discharge. For Q1 sat the MAE is relatively low, especially when compared to the CRPS. This is mainly caused by the accurate discharge simulation in baseflow periods, resulting in a low MAE.

Flood hindcasting skill
The performance of each scenario was evaluated using the BS (Eq. 10) and the number of false positive flood alerts. Due to the high spread within the ensemble, Q0 in general has a low forecasting skill (Table 2). This is shown by the relatively high BS (Fig. 8) and the high number of false positive forecasts (Fig. 9). Almost all flood events are correctly captured, also for long lead times, which is caused by the overestimation of discharge in general (Fig. 5). Compared to Q0, the forecasting skill for Q0 sat is decreased, shown by an increasing BS and a higher number of false positives. The high number of false positives is the result of an even higher overestimation of the peak discharge in this scenario (Fig. 5), which results in false flood alerts. The number of missed and correctly forecasted floods remains the same. The BS and the number of false positives for Q1 and Q7 are considerably lower than for Q0. Q7 also has a better hindcast skill than Q1, owing to the increased number of observations used in the assimilation framework. The improved forecasting skill is also found in the BS for both Q1 sat and Q7 sat (Fig. 8), which is lower for both scenarios than without the assimilation of remotely sensed soil moisture. For Q1 sat this is mainly caused by an increased performance in the upstream areas of the catchment, while Q7 sat shows an improved performance throughout the catchment. The number of false positive flood forecasts is reduced by 70 % compared to the scenarios with only discharge assimilation, while the number of missed and correctly forecasted floods remains the same. This leads to the conclusion that even when the simulation of discharge throughout the catchment is used and discharge simulations are of a high quality, adding satellite data will lead to an improvement in the forecasting skill of the hydrological model.

Hindcasting performance with limited assimilation
Two additional scenarios have been evaluated where the model was calibrated on discharge observations alone and either remotely sensed soil moisture is assimilated (Q7 satDA) or no observations are assimilated (Q7 noDA) in the reanalysis period (Table 2). The reanalysis for Q7 noDA shows the largest spread (indicated by a large cv), while with the assimilation of remotely sensed soil moisture (Q7 satDA) this uncertainty is reduced. However, the uncertainties remain larger than for scenarios Q7 and Q7 sat, where in both cases discharge data has been assimilated. The uncertainty in the hindcasting performance (CRPS) is reduced for Q7 satDA compared to Q7 and almost equal to joint assimilation of discharge and soil moisture (Q7 sat). This indicates that the more accurate representation of the soil moisture will reduce the uncertainty in model simulations and hence hindcasts. For both Q7 noDA and Q7 satDA the MAE does not show an increased performance, indicating that the bias is not reduced compared to Q7 or Q7 sat.
As expected, the hindcast skill scores (BS) are reduced, i.e. improved, when the satellite data is used in the assimilation scenario compared to the no-assimilation scenario. Compared to Q7 and Q7 sat the hindcast skill for the extreme events is not increased. However, compared to Q7 and Q7 noDA the assimilation of satellite data (Q7 satDA and Q7 sat) will increase the hindcast skill for the less severe floods (BS Q80).
In general, the assimilation of remotely sensed soil moisture will improve the simulation of discharge. However, the discharge simulation performance for the extreme events is less impacted by the assimilation of soil moisture observations. The assimilation of soil moisture observations results in a better estimate of the initial soil moisture conditions and of discharge (CRPS), mainly for the intermediate discharge rates. In extreme events with high precipitation totals the relative importance of pre-storm soil moisture conditions is reduced. Assimilation of discharge has the largest impact on the uncertainty in the hindcast, which will have an impact on the ensemble spread. Joint assimilation of soil moisture and discharge observations combines the advantages of both types of observations and leads to improved initial conditions and consequently high hindcasting skill, especially for the extreme events (BS Q90). The low uncertainty resulting from discharge assimilation, combined with the improved estimate of the soil moisture state in the catchment, leads to increased forecasting performance.

Conclusions
In this study we evaluated the added value of remotely sensed soil moisture in an operational flood forecasting system. The gain from assimilation of soil moisture observations is compared to assimilation of only discharge and the combination of discharge and soil moisture observations. The EFAS was used for a hindcasting experiment in the Upper Danube. Hindcasts were made for a period of 1 year and the results compared for six different scenarios.
The assimilation of remotely sensed soil moisture has an impact on the simulation of discharge as shown by other studies (e.g. Pauwels et al., 2001; Brocca et al., 2010, 2012; Draper et al., 2011). However, in this study we show that the impact is not limited to small catchments with a spatial extent close to or smaller than the satellite resolution but also holds for larger catchments.
We show that the assimilation of remotely sensed soil moisture improves the flood forecasting especially when used in combination with assimilation of distributed discharge observations. The uncertainty in the discharge simulations is reduced and biases in the simulation are reduced when satellite data is assimilated. In scenarios where only discharge from the outlet is used in combination with satellite observations, the peak discharges are generally overestimated. Although this will result in a less accurate simulation of discharge it will not impact the forecasting quality of flood events.
Floods are better predicted when soil moisture data is assimilated into EFAS in combination with discharge observations and the number of false alerts is reduced compared to scenarios where remotely sensed soil moisture observations are not used. Although the gain of using more discharge observations remains larger, soil moisture observations improve the quality of the flood alerts, both in terms of timing and in the exact height of the flood peak.
Two additional scenarios were studied in which the hydrological model was calibrated and either no data or only satellite data were assimilated. These scenarios were created to study the added value of the assimilation compared to only calibration of the hydrological model. We found that the cv, CRPS, MAE and BS are all reduced by the assimilation of remotely sensed soil moisture compared to no assimilation. However, the assimilation of discharge reduces uncertainties more than assimilation of remotely sensed soil moisture. Simulations without data assimilation tend to have biases in the simulation and a larger ensemble spread than scenarios with data assimilation, while the reduced uncertainty resulting from assimilation will lead to an increased reliability of flood forecasts. These results show that the assimilation of soil moisture will result in an increased performance compared to not assimilating observations. This is important for ungauged basins, where satellite data is available and discharge observations are not available or not available in near-real time. Additionally, these results show the added value of assimilating observations into EFAS, compared to the current set-up.
In conclusion, we show that the uncertainty in the flood forecasts is reduced when discharge observations and satellite data are assimilated into the hydrological model of EFAS for the Upper Danube. The addition of remotely sensed soil moisture to existing discharge observations reduces the number of false positive flood alerts and thereby increases the reliability of the flood awareness system. Although the amount of data available from satellite retrievals remains a challenge in an operational system, the potential benefits could lead to a significant reduction in false flood alerts, possibly also for other catchments. This will reduce the number of unnecessary precautions taken by the responsible governments and increase the confidence in, and willingness to act upon, these flood alerts.