GloFAS – global ensemble streamflow forecasting and flood early warning


population, the need for optimizing the use of water resources for drinking water as well as energy production demands more and more technologically driven solutions for controlling water quantity and quality in river systems. In addition, floods can no longer be treated as isolated events, as they are heavily linked with issues such as food insecurity, disease outbreaks and environmental degradation (IFRC, 2011). With increasing vulnerability and the likelihood of changes in frequency and intensity of future weather extremes (Trenberth et al., 2003), anticipation of severe events is becoming a key element to protect society and favor timely reaction, thus effectively reducing socio-economic damage. While anticipation is essential at local level, it is equally important at national or trans-national level. The management of the response and aid for major upcoming disasters (e.g. through international organizations) requires substantial planning and information at different levels. The earlier the planning phase starts, the better preparatory actions, coordination and gathering of information are achieved, thus limiting the consequences of potential humanitarian and economic disasters. While some countries have mechanisms in place to mitigate the effects of natural disasters, the European Union Solidarity Fund (European Commission, 2002) being the main example for Europe, developing countries often struggle through a much longer recovery process. Increasing preparedness can be achieved by flood hazard maps, which are available at national or regional level (Hagen and Lu, 2011; Prinos et al., 2008) as well as at global level (Winsemius et al., 2012). These static maps can be used to define flood hazard zones, but do not incorporate changes in daily conditions, which require a real-time observing system.
The availability of remote sensing data, such as satellite imagery, has fostered the development of flood detection techniques at global scale (e.g., de Groeve, 2010 operationally flood forecasting systems, often focused on specific river basins or, most commonly, limited to national boundaries (Alfieri et al., 2012a). Several flood forecasting systems are based on observed river level, while future values are extrapolated through river routing models or by coupling observed rainfall fields into hydrological models. The extension of the forecast horizon beyond the response time of a river basin is enabled by the use of Numerical Weather Predictions (NWP) as input to hydrological-hydraulic models (e.g., He et al., 2010; Hopson and Webster, 2010; Paiva et al., 2012; Thiemig et al., 2010). Recent review articles by Cloke and Pappenberger (2009) and by Alfieri et al. (2012a) showed the strong potential of using ensemble NWP to further extend the forecasting horizon in early warning systems. Weather forecasting models are set up at global scale in different meteorological centers, producing deterministic and ensemble products. Nevertheless, only a few attempts have been made so far to move towards operational systems with coupled hydro-meteorological models producing streamflow predictions at the global scale (see Candogan Yossef et al., 2011; Sperna Weiland et al., 2010; Voisin et al., 2011; Wang et al., 2011) and, to the authors' knowledge, none of these runs operationally with ensemble predictions. Indeed, real-time hydrological modeling requires a large amount of information, including not only static maps describing the surface and sub-surface basin features, but also a long-term balance of water fluxes to give an estimate of the initial conditions from which the forecast is run.
At the continental scale, the European Flood Awareness System (EFAS) has demonstrated that ensemble flood forecasting and early warning based on critical flood thresholds can be produced even with a limited amount of data, by applying probabilistic methods and model-consistent climatologies (Bartholmes et al., 2009; Pappenberger et al., 2010b; de Roo et al., 2003; Thielen et al., 2009a).

The aim of this study is to assess the feasibility of transferring methodologies and concepts from the EFAS system to the global scale and to evaluate the system performance at its initial stage, where no model parameter has been specifically calibrated. A Global Flood Awareness System (GloFAS) has been set up jointly between the Joint

Data and methods
The GloFAS system is composed of a well-integrated hydro-meteorological forecasting chain and a monitoring system which analyzes daily results and shows forecast flood events on a dedicated web platform. An overview of the system structure is shown in Fig. 1.

Meteorological data
To set up a forecasting system that runs on a daily basis with global coverage, initial conditions and input forcing data must be provided seamlessly to every point within the domain. To this end, two products are used. The first consists of operational ensemble forecasts of near-surface meteorological parameters. The second is a long-term dataset consistent with daily forecasts, used to derive a reference climatology. These products are described in the next sub-sections. increasing to 65 km from day 11 to 15 (Miller et al., 2010). The forecast is produced twice per day, at 00:00 UTC and 12:00 UTC. In the GloFAS system, VarEPS weather forecasts are not handled explicitly. Forecast values of the predicted meteorological parameters of the 00:00 UTC forecast are processed by the land surface module (HTESSEL, see Sect. 2.2.1) of the IFS, which in turn creates the VarEPS runoff fields for the ensemble streamflow prediction.

Reference climatology
The second meteorological product used is ERA-Interim (Dee et al., 2011), the latest global atmospheric reanalysis produced by the ECMWF. The ERA-Interim archive contains 6-hourly gridded estimates of three-dimensional (3-D) meteorological variables, 3-hourly estimates of a large number of surface parameters and other two-dimensional (2-D) fields. It has a horizontal resolution of about 80 km, it covers the period from 1 January 1989 onwards, and continues to be extended forward in near-real time. ERA-Interim makes use of a forecast model, so that information can be extrapolated from locally observed weather parameters to unobserved parameters in a physically meaningful way. The ERA-Interim precipitation dataset has been bias corrected using the Global Precipitation Climatology Project (GPCP) version 2. the HTESSEL are given in the following. Two types of simulations are performed to estimate discharge in the river network, which use the input runoff forcing described in the previous section and an appropriate initial model state.
-Forecasting simulations are run every day using the latest VarEPS runoff prediction and result in 51 possible evolutions of the streamflow for the selected forecast horizon (i.e., 45 days in the current setting).
-A deterministic climatological simulation is run in offline mode using ERA-Interim/Land input data for a 21 yr period starting in 1990. Seamless streamflow climatology is derived and maps of annual maxima are extracted and fitted with a Gumbel extreme value distribution to estimate corresponding discharge warning thresholds for selected return periods.
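
The threshold derivation in the second item can be illustrated with a minimal sketch. The text does not state how the Gumbel distribution is fitted, so the method of moments used here is an assumption, as are the function names and the synthetic annual maxima.

```python
import math
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_fit_moments(annual_maxima):
    """Fit Gumbel location (mu) and scale (beta) by the method of moments.

    Assumption: the fitting method is not specified in the text.
    """
    x = np.asarray(annual_maxima, dtype=float)
    beta = x.std(ddof=1) * math.sqrt(6.0) / math.pi
    mu = x.mean() - EULER_GAMMA * beta
    return mu, beta


def return_level(mu, beta, return_period_years):
    """Discharge exceeded on average once every `return_period_years` years."""
    p_non_exceedance = 1.0 - 1.0 / return_period_years
    return mu - beta * math.log(-math.log(p_non_exceedance))


# Hypothetical 21 annual discharge maxima (m3/s) for a single river pixel
rng = np.random.default_rng(0)
maxima = rng.gumbel(loc=1000.0, scale=200.0, size=21)

mu, beta = gumbel_fit_moments(maxima)
thresholds = {T: return_level(mu, beta, T) for T in (2, 5, 20)}
```

Applied pixel by pixel to the maps of annual maxima, a procedure of this kind would yield one threshold map per return period.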

HTESSEL
HTESSEL (Balsamo et al., 2009, 2011a) is the land surface component of the ECMWF IFS. It is a revised land surface Hydrology, derived from the former Tiled ECMWF Scheme for Surface Exchanges over Land (TESSEL). HTESSEL computes the land surface response to atmospheric forcing, and estimates the surface water and energy fluxes and the temporal evolution of soil temperature, moisture content and snowpack conditions. At the interface to the atmosphere each grid box is divided into fractions (tiles), with up to six fractions over land (bare ground, low and high vegetation, intercepted water, shaded and exposed snow). Vegetation types and cover fractions are teristic (Loveland et al., 2000). The grid box surface fluxes are calculated separately for each tile, leading to a separate solution of the surface energy balance equation and the skin temperature. The latter represents the interface between the soil and the atmosphere. Below the surface, the vertical transfer of water and energy is performed using four vertical layers to represent soil temperature and moisture. Soil heat transfer follows a Fourier law of diffusion, modified to take into account soil water freezing/melting. Water movement in the soil is determined by Darcy's Law, and surface runoff accounts for the subgrid variability of orography. In the case of a partially (or fully) frozen soil, water transport is limited, leading to a redirection of most of the rainfall and snowmelt to surface runoff when the uppermost soil layer is frozen. The snow scheme (Dutra et al., 2010) represents an additional layer on top of the soil, with an independent prognostic thermal and mass content. The model has been successfully tested in river routing settings (Balsamo et al., 2011b; Pappenberger et al., 2010a). HTESSEL is part of the IFS at ECMWF with operational applications ranging from the short-range to monthly and seasonal weather forecasts.
For this work, operational ensemble forecasts of surface and sub-surface runoff are extracted from the daily output of the ECMWF forecasts. These are produced by the HTESSEL module of the IFS using VarEPS weather forecasts as input. Further, an offline simulation of HTESSEL forced by ERA-Interim near-surface fields was performed to derive a 21-yr climatology starting in 1990, including surface and sub-surface runoff (ERA-Interim/Land). Balsamo et al. (2012) presented a detailed description of the simulation set-up of ERA-Interim/Land and a general overview of the model performance.

Lisflood global
Lisflood is a GIS-based spatially-distributed hydrological model, which includes a one-dimensional hydrodynamic channel routing model (van der Knijff et al., 2010 on an operational basis (Pappenberger et al., 2010b; Thielen et al., 2009a) covering the whole of Europe on a 5 km grid.
In the context of global flood modeling, the transformation from precipitation to surface and sub-surface runoff is done by the HTESSEL module of the IFS. Lisflood global is stripped down to the groundwater and routing procedures and uses surface runoff and sub-surface runoff from HTESSEL as input fluxes at a resolution of 0.1°. Surface runoff is routed to the outlet of each cell using a four-point implicit finite-difference solution of the kinematic wave equations (Chow et al., 1988). The Global Land Cover 2000 dataset (Bartholomé and Belward, 2005) is used to derive surface roughness coefficients.
Subsurface storage and transport are modeled using two parallel linear reservoirs. The upper zone represents a quick runoff component, which includes fast groundwater and subsurface flow through macropores in the soil. The lower zone represents the slow groundwater component that generates the base flow. As for the sub-surface runoff, all water that flows out of the upper and lower groundwater zones is routed to the outlet of each grid cell within one time step. Runoff produced for every grid cell from surface, upper and lower groundwater zones is routed through the river network using a kinematic wave approach. The river network is taken from the Hydrosheds project (Lehner et al., 2008).
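
As an illustration of the two parallel linear reservoirs, the following minimal sketch advances both stores with an explicit daily step. The residence times, the fixed split of sub-surface runoff between the two zones, and all function names are hypothetical; they are not taken from the Lisflood set-up.

```python
def step_reservoir(storage_mm, recharge_mm_per_day, k_days, dt_days=1.0):
    """Advance one linear reservoir by one explicit time step.

    The outflow rate is proportional to the current storage (outflow = S / k).
    """
    outflow = storage_mm / k_days  # mm/day
    storage_mm = storage_mm + (recharge_mm_per_day - outflow) * dt_days
    return storage_mm, outflow


def simulate_two_zones(recharge_series, split_upper=0.6, k_upper=5.0, k_lower=100.0):
    """Route sub-surface runoff through a fast upper and a slow lower zone.

    Assumptions: a fixed fraction `split_upper` feeds the upper zone, and the
    residence times k_upper and k_lower (days) are illustrative values only.
    Returns the daily total outflow (mm/day) and the final total storage (mm).
    """
    s_up = s_low = 0.0
    total_outflow = []
    for r in recharge_series:
        s_up, q_up = step_reservoir(s_up, split_upper * r, k_upper)
        s_low, q_low = step_reservoir(s_low, (1.0 - split_upper) * r, k_lower)
        total_outflow.append(q_up + q_low)
    return total_outflow, s_up + s_low


# A single 10 mm pulse of sub-surface runoff followed by dry days
recharge = [10.0, 0.0, 0.0, 0.0, 0.0]
flows, stored = simulate_two_zones(recharge)
```

The upper zone drains within days while the lower zone releases water slowly, which is the qualitative behaviour expected of the quick and base flow components.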
In arid and semiarid regions one can observe a loss of water along the channel reaches. In order to include this effect in the model we use the simplified approach by Rao and Maurer (1996) to simulate transmission losses in a stream. This method uses a power function with two parameters to describe the relationship between inflow and outflow in cells. In a first attempt the yearly average potential evapotranspiration rate is used to fit the transmission loss function. The resulting loss function gives emphasis to transmission losses in Africa, the Arabian Peninsula, India, Australia and the southern part of North America, whereas discharge in Europe and the northern part of Asia remains unaffected. With this approach the model is able to mimic the river-aquifer and river-floodplain interaction (e.g. the big Sudan swamps in the Nile River) as well as the influence of evaporation from big braided rivers.

Operational monitoring
Ensemble streamflow predictions (ESP) are run operationally on global scale by feeding VarEPS surface and sub-surface runoff into the Lisflood hydrological model. Although the precipitation input spans 15 days, hydrological simulations are computed for a 45-day time horizon, to account for the delayed routing of flood waves in large river basins, with time of concentration of the order of one month. Initial condition maps to start up the model are first taken from the last available day of the ERA-Interim dataset. Initial conditions for subsequent simulations are then extracted from the results of the model run with the VarEPS control run, after the first day of simulation. As this procedure is based on forecast rather than observed meteorological variables as input, results may drift over time from reality. Therefore, periodic updating of initial condition maps based on the ERA-Interim dataset is foreseen for future system developments. Resulting ESP maps for each daily time step and ensemble member are compared with reference threshold maps derived from the streamflow climatology, corresponding to return periods of 2, 5 and 20 yr. Summary threshold exceedance maps are calculated accordingly, which show the maximum probability of exceeding the 5 and 20-yr return period within the forecast horizon. In addition, reporting points are chosen at fixed and dynamic locations in the river network where upcoming flood hazard is detected, according to the following two-step procedure.
Fixed points are first selected from a database of about 4000 gauged river stations included in the Global Runoff Data Centre (GRDC, http://grdc.bafg.de/) database, where the maximum forecast value of the ESP mean, over the simulation horizon, is above the 2-yr return period threshold.
Dynamic points are then generated to provide similar information in river reaches where no fixed point is available. The following experience-based rules are adopted for obtaining a good overview of the potentially affected areas, yet avoiding the confusion of displaying too many points:
-The ESP mean is above the medium warning threshold on at least 5 contiguous pixels of the river network (∼ 50 km long river reach), in at least one of the two most recent daily simulations.
-The upstream area of the selected point must be larger than 4000 km².
-Points are generated starting from the most downstream pixel complying with the selection criteria, proceeding upstream at 300 km intervals, unless a fixed point is encountered within a shorter distance.
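
Two of the checks described above, the maximum probability of exceeding a threshold within the forecast horizon and the requirement of at least 5 contiguous pixels above a threshold, can be sketched as follows. This is a simplified illustration: the river reach is treated as a one-dimensional sequence of pixels, and the function names and sample values are hypothetical.

```python
import numpy as np


def max_exceedance_probability(esp, threshold):
    """Maximum daily probability of exceeding `threshold` at one pixel.

    esp: array of shape (n_members, n_days) with forecast discharge.
    The daily probability is taken as the fraction of members above threshold.
    """
    esp = np.asarray(esp, dtype=float)
    daily_probability = (esp > threshold).mean(axis=0)
    return float(daily_probability.max())


def has_contiguous_run(flags, min_length=5):
    """True if `flags` holds at least `min_length` consecutive True values.

    flags: per-pixel booleans along a reach (assumed ordered along the river),
    e.g. whether the ESP mean exceeds the medium warning threshold.
    """
    run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        if run >= min_length:
            return True
    return False


# Hypothetical 4-member, 3-day forecast at one pixel and a threshold of 4 m3/s
esp = [[1, 3, 5], [1, 2, 6], [1, 1, 2], [1, 1, 7]]
probability = max_exceedance_probability(esp, threshold=4.0)

# Hypothetical reach of 8 pixels, 5 of them contiguous above the threshold
reach_flags = [False, True, True, True, True, True, False, True]
qualifies = has_contiguous_run(reach_flags)
```

In the operational system the ensemble has 51 members and the network is two-dimensional, so the contiguity test would have to follow the flow direction rather than a flat list.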
The two sets of points are merged and classified into medium, high and severe alert levels. Medium alert level (yellow color coding) is assigned to points with ESP mean between the 2 and 5-yr return period. High alert level (red color coding) is assigned to points with ESP mean between the 5 and 20-yr return period. Severe alert level (purple color coding) is assigned to points with ESP mean above the 20-yr return period. At each point, ESP time series are plotted versus the forecast horizon, together with persistence diagrams (Bartholmes et al., 2009) showing the probability of exceeding the three warning thresholds for each day of simulation and the evolution over the latest consecutive forecasts.

Evaluation of the hydrological modeling
The first part of the work is focused on evaluating the skill of the Lisflood hydrological model forced by ERA-Interim/Land runoff in reproducing the hydrological processes for river basins in different regions and climates of the Earth. The 21-yr simulated discharge climatology has been compared with daily observations at a number of stations included in the GRDC database. Stations for the comparison were chosen according to the three following criteria:
-Observed discharge time series at each station must include at least 5 yr of valid data within the simulation period.
-At each river station, the upstream area of the modeled river network must not differ by more than 10 % from the actual one, to prevent the matching of incoherent data pairs. This typically occurs in small river basins, where the modeled river network is sometimes different from the real one because of scaling issues, and as a result the station does not lie in the correct grid cell.
-A visual check has been performed on the observed time series to remove those stations with evident discharge regulation (e.g., through artificial reservoirs) or with clear errors in the data.
Overall, 620 stations from five continents (all but Antarctica) were selected for the comparison, with upstream area ranging between 450 and 4 680 000 km² and period of record between 5 and 21 yr. The distribution of stations (see Fig. 2) reflects the quantity and quality of daily discharge measurements, with most data coming from North America, Brazil, Europe, Japan and Australia. The aim of this analysis is to assess how well the adopted model reproduces observed river discharge. The expected outcome is to assess the model performance and identify areas with the most significant mismatch between observations and simulations, which indicates where the modeling can be improved through different parameterization of the hydrological processes. For each station, observed and simulated discharge time series are plotted and compared through scatter plots, to give a first visual check of the collected data. An example is shown in Fig. 3. It is worth noting that the proposed system is designed for early warning purposes, rather than for quantitative streamflow forecasting. In other words, the main goal of the system is to assign each forecast value a correct probability of occurrence taken from its cumulative distribution function and thus identify extreme values in the upper tail of the distribution, which can possibly correspond to flooding conditions. Ideally, the percentile rank of each simulated value, compared to its climatology, should match that of observations (related to the observed time series), independently of any bias between observed and simulated time series. As a result, more emphasis is given to skill scores that are not affected by bias of estimation. Also, dimensionless indicators are preferred, as these enable straightforward comparison of results from different river stations having a wide range of quantitative runoff and hydrological regimes.
Among such skill scores, the Coefficient of variation (CV) at each point is calculated as the ratio of the standard deviation ($\sigma(\cdot)$) of the estimation residuals to the mean ($\overline{Q_{obs}}$) of observations,

$$CV = \frac{\sigma(Q_{sim} - Q_{obs})}{\overline{Q_{obs}}} \quad (1)$$

The CV enables the comparison of the estimation variability at different locations through normalization by the average flow conditions. Furthermore, the Pearson correlation coefficient (PCC) of simulated versus observed discharges is calculated according to the equation:

$$PCC = \frac{\sum_i (Q_{sim,i} - \overline{Q_{sim}})(Q_{obs,i} - \overline{Q_{obs}})}{\sqrt{\sum_i (Q_{sim,i} - \overline{Q_{sim}})^2 \sum_i (Q_{obs,i} - \overline{Q_{obs}})^2}} \quad (2)$$

which considers all the i-th available daily data pairs. PCC is particularly suited to the desired verification strategy as it assesses the linear correlation between simulated and observed discharges, without being penalized by multiplicative or additive bias. On the other hand, the PCC is known for being sensitive to even a few outlying data pairs, thus stressing significant shifts between the timing of simulated and observed flow peaks (Wilks, 2006).
The model performance in reproducing observed discharge has also been tested through threshold exceedance analysis, focused on discriminating events above a fixed threshold. This approach is more suitable for evaluating the performance of early warning systems, as it is independent of the quality of estimation for value ranges far from the threshold (e.g., the range of low flows when the threshold corresponds to high flows). Most scores for dichotomous evaluation are based on contingency tables, which include four variables calculated from the set of observations and of simulated values:
-Hit: event observed and simulated.
-Miss: event observed and not simulated.
-False alarm: event simulated and not observed.
-Correct negative: event not observed and not simulated.
The Peirce's skill score (PSS, Eq. 3) has been calculated for each station, taking the 90th percentile as threshold values (i.e., the 90th percentile from the sorted observations and from the sorted simulated values to discriminate each corresponding data series).

$$PSS = \frac{\mathrm{hits}}{\mathrm{hits} + \mathrm{misses}} - \frac{\mathrm{false\ alarms}}{\mathrm{false\ alarms} + \mathrm{correct\ negatives}} \quad (3)$$

Such a percentile is a good tradeoff between being representative of high flow values and including a sufficient number of events to draw robust statistics. Data series for comparison include at least five years of data, which corresponds to more than 182 days above the 90 % threshold. The PSS accounts for all elements of the contingency table and is defined as the difference between Probability of detection (POD) and Probability of false detection (POFD), PSS = POD − POFD. Perfect forecasts have PSS = 1, while forecasts have no value when PSS ≤ 0.
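
The three scores used in this section (CV, PCC and PSS) can be computed with a few lines of NumPy. The sketch below is illustrative only: the function names are hypothetical and the synthetic series is chosen to show that PCC and PSS are unaffected by a linear bias, whereas the CV is not.

```python
import numpy as np


def coefficient_of_variation(sim, obs):
    """Standard deviation of the residuals divided by the mean observation."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    return float(np.std(sim - obs) / np.mean(obs))


def pearson_correlation(sim, obs):
    """Pearson correlation coefficient of simulated versus observed values."""
    return float(np.corrcoef(sim, obs)[0, 1])


def peirce_skill_score(sim, obs, percentile=90.0):
    """PSS = POD - POFD, with events defined by each series' own percentile."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    sim_event = sim > np.percentile(sim, percentile)
    obs_event = obs > np.percentile(obs, percentile)
    hits = np.sum(sim_event & obs_event)
    misses = np.sum(~sim_event & obs_event)
    false_alarms = np.sum(sim_event & ~obs_event)
    correct_negatives = np.sum(~sim_event & ~obs_event)
    pod = hits / (hits + misses)
    pofd = false_alarms / (false_alarms + correct_negatives)
    return float(pod - pofd)


# Synthetic example: the simulation has a multiplicative and an additive bias
obs = np.arange(1.0, 101.0)
sim = 2.0 * obs + 5.0

cv = coefficient_of_variation(sim, obs)
pcc = pearson_correlation(sim, obs)
pss = peirce_skill_score(sim, obs)
```

Because each series is thresholded at its own 90th percentile, the biased simulation still reproduces every observed event, so PSS and PCC reach their maximum while the CV stays well above zero.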

Performance of the early warning system
The early warning system, as described in Sect. 2.3, has been running operationally since July 2011. To evaluate the forecast performance, the system was run in hindcast mode for the period 1 January 2009 to 31 December 2010. 730 sets (i.e., one per day) of 45-day ensemble streamflow predictions (ESP) were evaluated against discharge proxy simulations for the same period, taken from the simulated discharge climatology obtained using ERA-Interim/Land runoff as forcing. Unlike the analysis in the previous section, this approach enables the performance evaluation at each grid point of the simulated river network. Furthermore, as the datasets of streamflow predictions and proxy simulations are generated by the same hydrological model, this type of analysis focuses on the skills of the ensemble weather predictions. Indeed, it allows one to draw indications on the maximum forecast horizon (or potential skill) for which the system yields valuable information. In general, we expect results to be mainly influenced by (i) the skills of 15-day weather predictions and by (ii) the upstream area of each selected river point, which is correlated to the lag time between rainfall events and the subsequent flow hydrographs. This can yield an extension of the forecast lead time beyond the time window for which weather forecasts are available and contribute to the assessment of the limits of predictability (Thielen et al., 2009b). Initial conditions of the hydrological model were taken from the climatological run for the first day of simulation (i.e., 1 January 2009) and were then calculated for the following days, up to 31 December 2010, by using the forecast fields of the first day of the VarEPS control run. Current ERA-Interim data availability would allow the model to update its initial conditions roughly on a monthly basis, to avoid significant drifts of the simulated initial conditions from the climatological run.
To account for this improvement in the evaluation of the 2-yr forecasts, a bias correction technique was applied to adjust the initial conditions of the starting day of each forecast with those of the climatological run. The correction was performed through a quantile matching over a 30-day window, which reproduces an error structure similar to that of a monthly update of initial conditions with ERA-Interim input data. The resulting discharge dataset is hereinafter referred to as corrected discharge climatology.
Ensemble streamflow predictions were evaluated by means of a twofold approach. The Continuous Ranked Probability Skill Score (CRPSS, see e.g., Hersbach, 2000; Voisin et al., 2010) is used to evaluate the quantitative skills of prediction, while the Area under the Receiver Operating Characteristic (AROC, see Marzban, 2004; Wilks, 2006) is calculated to assess the performance in threshold exceedance analysis. The CRPSS is defined as:

$$CRPSS = 1 - \frac{\overline{CRPS}_{forecast}}{\overline{CRPS}_{reference}} \quad (4)$$

where

$$CRPS = \int_{-\infty}^{\infty} \left[ F(y) - F_0(y) \right]^2 dy \quad (5)$$

and

$$F_0(y) = \begin{cases} 0, & y < \text{observed value} \\ 1, & y \geq \text{observed value} \end{cases} \quad (6)$$

while $F(y)$ is the stepwise cumulative distribution function of the ensemble forecast.
Receiver Operating Characteristic (ROC) curves are widely used to measure the skill of dichotomous forecasts based on probabilistic information, as they plot the empirical relation between the Hit Rate (HR) and False Alarm Rate (FAR) for different probability thresholds (Alfieri et al., 2012b). The overall performance of ensemble forecasts in predicting threshold exceedances can be assessed through the area under the ROC curve, which summarizes the system skill for all the probability thresholds, which in the discrete case are as many as the ensemble size. AROC values range between 0 (i.e., forecasts are exactly the opposite of observations) and 1 (perfect match between predicted and observed threshold exceedances). AROC = 0.5 corresponds to random forecasts, while meteorological ensemble predictions are commonly considered as useful when AROC ≥ 0.7 (e.g., Buizza et al., 1999).
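
A minimal sketch of both scores is given below. It relies on the standard identity for the CRPS of an ensemble with a stepwise distribution function (mean absolute error of the members minus half their mean absolute pairwise difference), and on a trapezoidal integration of the ROC curve. Function names and sample values are hypothetical, and the choice of reference forecast is left to the caller.

```python
import numpy as np


def crps_ensemble(members, observation):
    """CRPS of one ensemble forecast with a stepwise empirical CDF."""
    x = np.asarray(members, dtype=float)
    mean_abs_error = np.mean(np.abs(x - observation))
    mean_abs_spread = np.mean(np.abs(x[:, None] - x[None, :]))
    return float(mean_abs_error - 0.5 * mean_abs_spread)


def crpss(mean_crps_forecast, mean_crps_reference):
    """Skill relative to a reference forecast: 1 is perfect, <= 0 is no skill."""
    return 1.0 - mean_crps_forecast / mean_crps_reference


def roc_area(probabilities, observed_events, n_members):
    """Area under the ROC curve, one probability threshold per ensemble member.

    probabilities: forecast probability of exceedance for each case.
    observed_events: booleans, True where the threshold was actually exceeded.
    """
    p = np.asarray(probabilities, dtype=float)
    o = np.asarray(observed_events, dtype=bool)
    hit_rate, false_alarm_rate = [0.0], [0.0]
    for k in range(n_members, 0, -1):  # from the strictest threshold down
        warned = p >= k / n_members
        hit_rate.append(np.sum(warned & o) / np.sum(o))
        false_alarm_rate.append(np.sum(warned & ~o) / np.sum(~o))
    hit_rate.append(1.0)
    false_alarm_rate.append(1.0)
    area = 0.0
    for i in range(1, len(hit_rate)):  # trapezoidal rule
        width = false_alarm_rate[i] - false_alarm_rate[i - 1]
        area += 0.5 * (hit_rate[i] + hit_rate[i - 1]) * width
    return float(area)


crps_value = crps_ensemble([0.0, 2.0], observation=1.0)
perfect = roc_area([1.0, 1.0, 0.0, 0.0], [True, True, False, False], n_members=2)
opposite = roc_area([0.0, 0.0, 1.0, 1.0], [True, True, False, False], n_members=2)
```

For the two-member example the squared difference between the forecast and observation step functions equals 0.25 over an interval of length 2, which gives the same value as the closed-form expression used in the code.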

Evaluation of the hydrological modeling
The coefficient of variation as defined in Eq. (1) is shown in the map in Fig. 2. In 60 % of points, the CV is found smaller than 1, denoting a variability of estimation lower than the observed mean discharge. Poorest performance is mainly found in arid and semi-arid regions, particularly in Australia, Mexico and in the Sahel. This can be due to incorrect modeling of some hydrological processes such as evapotranspiration and infiltration, and to the lack of simulated water withdrawals for irrigation purposes. However, one should note that in arid regions, results calculated with the CV as defined above are penalized by high flow conditions. Similar considerations can be drawn for small river basins, such as the yellow/orange circles in the USA and Europe shown in Fig. 2. Indeed, it is known that the ratio between peak flow and average flow rises with decreasing basin area, hence increasing the weight of estimation residuals in Eq. (1). In addition, clusters of points with CV > 1 are located in north-eastern Brazil and West Africa, where the model performance is often substantially affected by dam regulation, and in north-eastern Russia, where most discrepancies are related to the modeling of freezing cycles, snow accumulation and melting processes.
In Fig. 5 the PCC is plotted against the upstream area of each gauge. In addition, the gauge latitude is shown with a color shading ranging from red at the Equator to blue at high latitudes. Fig. 5 shows that the best correlations are on average achieved in large river basins (i.e., upstream area larger than 10 000 km²) in inter-tropical latitudes.
Overall, 71 % of points have PCC larger than 0.5. The envelope curve of highest PCC values shows an increasing trend with the upstream area. In fact, the typical scales of weather events inducing floods in small river basins are below the spatial and temporal resolution of the hydrological model and of the meteorological input data used in simulation, as well as of the observations used for validation.
The Peirce's skill score (PSS) for the set of selected stations is shown in Fig. 6. 98.5 % of stations provide valuable simulated values (i.e., PSS > 0), while PSS > 0.25 and PSS > 0.5 are found in 79 % and 22 % of cases respectively. It is worth noting in Fig. 6 that positive skills are achieved at several stations in dry regions where the estimation error showed considerable variability in Fig. 2 (e.g., NE Brazil, Africa, Australia). In those regions medium to low flows are difficult to estimate accurately because of dam regulation and water abstraction for irrigation. On the other hand, floods and high flows, and particularly their percentile rank, are less influenced by reservoirs, which often have limited storage for flood mitigation. Regarding negative PSS values, 8 out of 9 points in total are located in Canada and have relatively small upstream area, in all cases below 50 000 km². Graphs comparing the observed and simulated time series (not shown) suggest that the mismatch in those points is due to incorrect modeling of the snow-related processes or to biased input temperatures in the model, which induces a substantial delay between observed and simulated flow peaks.

Performance of the early warning system
CRPSS maps for the 2 yr of ensemble streamflow prediction (i.e., 2009-2010) were calculated for each selected forecast lead time from 1 to 45 days. CRPSS maps with lead time of 5, 15 and 25 days are shown in Fig. 7. To improve the figure readability, only river pixels with upstream area larger than 50 000 km² are plotted. Valuable quantitative ESP are indicated with blue shadings in Fig. 7, that is where CRPSS > 0, while in red are indicated those rivers where a reference persistent forecast performs quantitatively better. As expected, the CRPSS deteriorates for increasing forecast horizons, particularly in smaller rivers. Poorest performance is shown in northern cold regions, mostly in Asian and North American rivers. In large river basins in inter-tropical and mid-latitude regions (e.g., Amazon, Mississippi, Congo, Nile, Paraná) the ESP perform better than the reference forecast, especially for longer lead times. In fact, in such rivers the runoff has very slow and delayed response, hence for short lead times (e.g., 5 days) the difference between the ESP and a persistent forecast is not substantial. On the other hand, smaller river basins often have their highest CRPSS for shorter lead times, while it decreases fast after 15-day lead time, when no meteorological forcing is used as input.
The threshold exceedance analysis is evaluated through the use of ROC curves and specifically the area under these curves. As discussed in Sect. 3.1, the threshold between events and non-events is set to the 90th percentile of the corrected discharge climatology. Although it usually does not correspond to flooding conditions, it is important to select a discharge value that was reached at every river pixel during the 2 yr of simulation, so that the skill score can be calculated for the whole domain. Results of this analysis are drawn in Fig. 8, which shows the maximum lead time over which forecasts are skillful (i.e., AROC > 0.7, as stated in Sect. 3.2). The spatial pattern of results in Fig. 8 is widely in agreement with that of Fig. 7. Longest lead times are found in large river basins in South America, Africa and South Asia, with values exceeding 25 days in some areas. Smaller river basins mostly achieve maximum forecast lead times around 20 days, while in some cases they are limited within 10 days. Results from the ESP as calculated by the proposed model and shown in Fig. 7 and Fig. 8 should be filtered by excluding regions where no significant river network and runoff exists. These include desert areas such as the Sahara, Arctic, Gobi, Arabian and Australian Desert, among the largest. Unexpectedly, in the lowest part of the Mississippi River, in North America, maximum values are within 10 days, despite having skillful CRPSS for lead times as long as 25 days (see Fig. 7). In other words, while quantitative streamflow predictions in the Mississippi are on average rather accurate even for long lead times, high flow events above the 90th percentile are skillfully detected only for a shorter time horizon. The reasons for such behavior are mostly related to a delay in the discharge peak for the main event within the considered period, which occurred in autumn 2009 (see Fig. 9).
In Fig. 9, ESP and corrected discharge climatology are compared for the 2 yr of available forecast (i.e., 2009-2010). These are shown at the outlet of six major river basins in different climatic regions, for forecast lead times of 15 and 25 days. Outlet locations and name initials of each river are shown with red markers in Fig. 8. In all cases shown, the ensemble spread is relatively narrow, as in such large river basins the runoff is mostly driven by the initial conditions and, specifically, by water that is already in the river network at the start of the forecast and is conveyed downstream by the hydrological model. At all locations the ensemble spread is larger for the longest lead time shown, reflecting the increasing uncertainty range as the lead time increases.
However, graphs with longer forecast lead times (not shown in the article) suggest that, after reaching its maximum, the ensemble spread tends to decrease once the predicted rainfall has drained through the basin outlet. This is a consequence of using 15 days of rainfall but simulating a longer lead time, which means that the ESP spread is increasingly underestimated after day 15 of the simulation. In five out of six stations in Fig. 9, the runoff regime follows a clear seasonal trend, with peak flows always in the same range of months, depending on the rainfall regime and on the timing of snow and ice accumulation and melting. In contrast, in the Mississippi River the runoff regime is more variable and high flows occurred in different seasons. This partly explains the results shown in Fig. 7, where the ESP performs quantitatively better than a persistent forecast also for long lead times (i.e., 25 days). The graphs in Fig. 9 show that the ESP spread is higher when the hydrographs have an increasing trend, because of the uncertainty of forecast rainfall. On the other hand, as the reference simulation and the ESP are outputs of the same hydrological model, results match very well in the recession part of the hydrographs, that is, when little rainfall is forecast or during the period of snow accumulation. It is worth noting that the highest spread of the ESP occurs in the Yenisei River, where snow and ice melting in the spring season plays a prominent role in generating high flows. As a result, the ensemble spread is amplified, as the uncertainty of both rainfall and temperature affects the streamflow forecast.
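The comparison between the ESP and a persistent forecast discussed above can be illustrated with a minimal sketch of the CRPS and of the corresponding skill score (CRPSS). The sketch assumes the standard empirical ensemble estimator of the CRPS, which is not necessarily the estimator used in this work. The function names and the toy discharge values are hypothetical, and the verifying value stands for the reference simulation at the forecast valid time.

```python
import numpy as np

def crps_ensemble(members, reference):
    """Empirical CRPS of one ensemble forecast against a scalar reference value.

    Uses the identity CRPS = E|X - y| - 0.5 * E|X - X'|, where X and X' are
    independent draws from the ensemble and y is the verifying value.
    """
    members = np.asarray(members, dtype=float)
    term_error = np.mean(np.abs(members - reference))
    term_spread = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term_error - term_spread

def crpss_vs_persistence(ens_forecasts, persistence, reference):
    """CRPSS of the ensemble relative to a persistent (deterministic) forecast.

    ens_forecasts: array of shape (n_forecasts, n_members)
    persistence, reference: arrays of shape (n_forecasts,)
    For a deterministic forecast the CRPS reduces to the absolute error.
    """
    crps_ens = np.mean([crps_ensemble(m, r) for m, r in zip(ens_forecasts, reference)])
    crps_ref = np.mean(np.abs(np.asarray(persistence, dtype=float)
                              - np.asarray(reference, dtype=float)))
    return 1.0 - crps_ens / crps_ref

# Toy example (hypothetical discharge values in m3/s)
ens = np.array([[0.0, 2.0], [0.0, 2.0]])
skill = crpss_vs_persistence(ens, persistence=[3.0, 3.0], reference=[1.0, 1.0])
print(skill)
```

A positive score indicates that the ensemble outperforms the persistent forecast, a score of 1 corresponds to a perfect forecast, and negative values indicate that persistence is the better predictor.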

Case study: 2010 Pakistan floods
The system demonstrated its potential by detecting a number of flooding events of [...] the lead time, though it almost completely exceeds the severe alert level from 8 to 12 August. The following forecasts confirmed these results, indicating, for the same station, a maximum probability of exceeding the severe alert level of 100 % from 7 August onwards. On 9 August, the BBC reported that the measured discharge through the Sukkur Barrage was up to 1.4 million cubic feet per second (cusecs), i.e., about 39 600 m³ s⁻¹, well above its design capacity of 900 000 cusecs. Fig. 11 also shows a comparison between satellite images taken on 10 July 2010 (top) and on 11 August 2010 (bottom) from MODIS Rapid Response. In the latter, the extent of flooded areas (in dark shades) is clearly visible for a wide portion of the Indus River Basin.
In the top panel of Fig. 11, the maximum probability of exceeding the severe warning threshold (i.e., 20-yr return period) over the forecast range is indicated with purple shades (ESP of 28 July 2010).
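The maximum probability of exceeding a warning threshold over the forecast range can be sketched as follows. This is a minimal illustration under the assumption that the probability on each forecast day is the fraction of ensemble members above the threshold. The function names, the array layout and the numerical values are hypothetical and do not reproduce the operational code.

```python
import numpy as np

def exceedance_probability(ensemble, threshold):
    """Fraction of ensemble members above the threshold for each forecast day.

    ensemble: array of shape (n_members, n_lead_days) with discharge values
    threshold: warning threshold in the same units (e.g., a 20-yr return level)
    """
    ensemble = np.asarray(ensemble, dtype=float)
    return np.mean(ensemble > threshold, axis=0)

def max_exceedance_probability(ensemble, threshold):
    """Maximum daily exceedance probability over the whole forecast range."""
    return float(np.max(exceedance_probability(ensemble, threshold)))

# Toy ensemble: 4 members, 3 forecast days (hypothetical values in m3/s)
ensemble = [[80.0, 120.0, 150.0],
            [90.0,  95.0, 130.0],
            [70.0, 110.0, 140.0],
            [85.0,  99.0, 101.0]]
daily = exceedance_probability(ensemble, threshold=100.0)
print(daily, max_exceedance_probability(ensemble, threshold=100.0))
```

In this toy case no member exceeds the threshold on the first day, half of the members do so on the second day and all of them on the third day, so the maximum probability over the forecast range is 100 %.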

Discussions and conclusions
In this article we present a probabilistic flood early warning system running at global scale, aimed at forecasting the threshold exceedance of ensemble streamflow predictions on the basis of a model-consistent discharge climatology. The system has been set up and has been running on a daily basis since July 2011. Results are shown on a password-protected website and are being monitored to assess qualitatively the system skill for flood events in large river basins. Quantitative performance has been assessed for 2 years of daily hindcasts starting on 1 January 2009, using a simulated climatology as the reference run. Findings of this analysis show that current ensemble weather predictions enable skillful detection of hazardous events with forecast horizons as long as 1 month in large river basins. This anticipation depends on the skill of the input weather forecasts and on the delay between the meteorological forcing and the hydrological response in the river basin. Interestingly, the uncertainty range of ensemble weather predictions has a reduced effect when propagated to discharge predictions in large river basins. Indeed, flood events in major rivers are mostly caused by large-scale weather systems that are skillfully predicted by state-of-the-art global forecasting models. In addition, when weather systems are smaller than or similar in size to the river basin, spatial shifts of predicted rainfall fields have a limited effect on the resulting streamflow at the outlet.

With regard to the system performance in quantitative forecasting and early warning, the maximum added value is shown (i) in medium-size river basins, (ii) in those with relatively fast response and (iii) in basins with no definite trend in the seasonal runoff. At the lower boundary of the range of basin size, forecast performance deteriorates quickly with increasing lead time and with decreasing upstream area. Indeed, in these river basins, flood events are caused by small-size weather systems which cannot be properly modeled by the current system, as the model space-time resolution is comparatively coarse for their typical hydro-meteorological dynamics. Consequently, on the basis of the analysis performed in this work, the authors suggest a lower boundary of 10 000 km² as the minimum upstream area to consider for streamflow predictions provided by the model.
In contrast, in the largest world river basins (i.e., basin area larger than 1 million km²) variations of river discharge occur at slow rates, hence the 1 to 10-day streamflow prediction does not differ substantially from a persistent forecast (i.e., the last observed discharge value). On the other hand, results for these basins show skillful predictions for lead times up to one month, whereas the highest added value compared to a persistent forecast is provided for lead times of 10 to 30 days (see Fig. 7). Besides the slow response, large river basins have long memory, so even small errors in model components such as snow accumulation and soil moisture can accumulate over time and induce a considerable bias in the water balance. An accurate estimation of the initial model state is therefore of crucial importance for the overall system performance. This can be achieved by regularly updating the water balance using the latest input data from the ERA-Interim reanalysis, to improve the consistency between ensemble forecasts and the climatological warning thresholds.
This work shows the system setup and skill in its initial stage, that is, no calibration has yet been performed on the underlying hydrological model. Calibration is an important step for future improvements, particularly for a global system, which includes the full range of climates and hydrological regimes of the Earth. Results in Figs. 7-8 show the current system potential assuming that the simulated climatology corresponds to the actual river conditions, that is, for a perfectly calibrated hydrological model. The presented research work shows that there is substantial room for improving the current model parameterization, with particular focus on hydrological regimes in arid and cold regions. However, errors coming from the hydrological modeling and from the weather predictions do not sum up linearly in the assessment of the overall system performance. As stated in Sect. 3.1, the main goal of an early warning system is to match the percentile rank of each simulated and observed discharge, rather than optimizing quantitative values. In addition, the model capability would also benefit from improved weather forecasts and possibly from the use of input data with a longer forecast horizon. In this regard, the use of monthly ECMWF VarEPS forecasts (currently issued twice per week) is envisaged for future system applications.
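The role of percentile ranks and of a model-consistent climatology can be illustrated with a minimal sketch. It assumes a toy climatology of ten discharge values and a hypothetical model that overestimates every flow by a factor of two. None of these numbers come from the system described here.

```python
import numpy as np

def percentile_rank(value, climatology):
    """Percentage of climatological discharge values that do not exceed `value`."""
    climatology = np.sort(np.asarray(climatology, dtype=float))
    count = np.searchsorted(climatology, value, side="right")
    return 100.0 * count / climatology.size

# Toy "observed" climatology and a biased model climatology (hypothetical values)
observed_clim = np.arange(10.0, 101.0, 10.0)   # 10, 20, ..., 100
simulated_clim = 2.0 * observed_clim           # model overestimates by a factor of 2

observed_flow = 90.0
simulated_flow = 2.0 * observed_flow           # same event as seen by the biased model

print(percentile_rank(observed_flow, observed_clim))    # rank of the observed event
print(percentile_rank(simulated_flow, simulated_clim))  # same rank, despite the bias
print(percentile_rank(simulated_flow, observed_clim))   # inflated rank if climatologies are mixed
```

In this sketch the biased simulation reproduces the percentile rank of the observed event when it is compared with its own climatology, whereas comparing it with the observed climatology would overstate the severity of the event.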

HESSD
As a final remark, the current system is based on warning thresholds with fixed probability levels, corresponding to selected return periods. Actual flood risk also depends on the vulnerability of each area. For instance, in sparsely populated areas or in regions with prominent flood defense works, the 100-yr discharge may cause limited economic damage. Conversely, in densely populated areas with poor flood protection measures, peak discharges with a relatively low return period can cause severe damage. The coupling of hazard and vulnerability maps would be extremely beneficial for this system, in order to rank warnings according to the potential economic damage that floods can cause as well as to the corresponding affected population.
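One possible way of ranking warnings by combining hazard and vulnerability is sketched below. This is purely illustrative and is not part of the current system: it assumes that the hazard is expressed as the probability of exceeding a warning threshold and that the vulnerability is summarized by the population exposed at each location. The class name, the field names, the locations and all values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FloodWarning:
    location: str
    exceedance_probability: float  # hazard: probability of exceeding the threshold (0 to 1)
    exposed_population: int        # vulnerability: hypothetical exposed population

def expected_affected_population(warning):
    """Simple risk proxy: hazard probability times exposed population."""
    return warning.exceedance_probability * warning.exposed_population

def rank_warnings(warnings):
    """Sort warnings from the highest to the lowest expected affected population."""
    return sorted(warnings, key=expected_affected_population, reverse=True)

# Hypothetical warnings
warnings = [
    FloodWarning("remote basin", 0.9, 1_000),
    FloodWarning("urban floodplain", 0.4, 500_000),
    FloodWarning("protected city", 0.1, 100_000),
]
for w in rank_warnings(warnings):
    print(w.location, expected_affected_population(w))
```

With these toy values the urban floodplain ranks first even though its exceedance probability is lower than that of the remote basin, which reflects the point made above that hazard alone does not determine the actual flood risk.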