We explore the memory properties of catchments for predicting the likelihood of floods based on observations of average flows in pre-flood seasons. Our approach assumes that flood formation is driven by the superposition of short- and long-term perturbations. The former are due to short-term meteorological forcing leading to infiltration excess and/or saturation excess, while the latter originate from higher-than-usual storage in the catchment. To exploit this sensitivity to long-term perturbations, a meta-Gaussian model and a data assimilation approach are implemented for updating the flood frequency distribution a season in advance. Accordingly, the peak flow in the flood season is predicted in probabilistic terms by exploiting its dependence on the average flow in the antecedent seasons. We focus on the Po River at Pontelagoscuro and the Danube River at Bratislava. We found that the shape of the flood frequency distribution is noticeably affected by higher-than-usual flows occurring up to several months earlier. The proposed technique may allow one to reduce the uncertainty associated with the estimation of flood frequency.

The physical, chemical and ecological states of the processes that govern the formation and quality of river flow are characterized by persistence at several different timescales (Koutsoyiannis, 2014). In fact, anomalous conditions in such processes, such as those generated by extreme meteorological events, may have a long-lasting impact on river flow, depending on climate and catchment characteristics (Lo and Famiglietti, 2010). For instance, flood generation is affected by the initial soil moisture condition of the catchment, which may in turn depend on groundwater levels that are related to overall catchment storage (Massari et al., 2014). Persistence can be exploited to improve river flow forecasting at seasonal to interannual timescales. Furthermore, persistence provides useful indications for better understanding the functioning of a catchment and the dynamics of the water cycle.

Indeed, the study of persistence has been one of the most classical research endeavors in hydrology since the early works by Rippl (1883) and Hazen (1914) on the estimation of the optimal storage for reservoirs. Hurst (1951) investigated the Nile River flows while working on the design of the Aswan Dam and postulated that geophysical records may be affected by a complex form of persistence that may last for a long time (O'Connell et al., 2016). Later, Thomas and Fiering (1962) and Yevjevich (1963) introduced autoregressive models for annual and seasonal streamflow simulation, thereby stimulating the development of subsequent models of increasing complexity for simulating hydrological persistence.

Recently, attention has been focused on long-term persistence (LTP), which is associated with the Hurst–Kolmogorov behavior (Koutsoyiannis, 2011). LTP manifests itself through a power-law decay of the autocorrelation function of the process, which implies that the summation of the autocorrelation coefficients diverges to infinity (Montanari et al., 1997). LTP implies the possible presence of long-term cycles (Beran, 1994), which in turn means that perturbations of hydrological processes may last for a long time, thereby providing a possible explanation for the occurrence of clusters of extreme hydrological events, such as floods and droughts (Montanari, 2012). LTP also has implications in the study of climate change, as it is connected with an enhanced natural variability of climatic processes (Koutsoyiannis and Montanari, 2007).

While LTP has long been studied, few attempts have been made to exploit it in data assimilation procedures for improving streamflow forecasting. The reason is probably that LTP is recognized to exert a noticeable impact on river flow volume over long timescales, while its effect on the magnitude of single events is less evident. Nevertheless, the presence of LTP and seasonal correlation necessarily affects flood frequency, to an extent that has been poorly explored.

The present contribution aims to enhance our understanding of the persistence properties of river flows in order to improve seasonal river flow forecasting. Taking inspiration from the idea that the probability of extreme floods may be increased by long-term stress, such as higher-than-usual rainfall lasting for several months, we address the following research question: can higher-than-usual river discharges in the previous season be associated with a higher probability of floods in the subsequent high-flow season? Quantifying the effect of antecedent flows at different time lags on the occurrence of floods would help to assess how long a river remembers its past (Aguilar et al., 2016). From a technical point of view, we propose a data assimilation technique for updating, a season in advance, the flood frequency distribution estimated for a given river by exploiting the information provided by river flows in the pre-flood seasons.

The state of a catchment, and in particular its storage, is affected by previous precipitation. It would therefore be reasonable to exploit the information provided by previous rainfall, rather than previous flows, for updating the flood frequency distribution. However, areal rainfall estimation for large catchments with complex orography is affected by large uncertainty (Moulin et al., 2009). Therefore, we use here the flows in the pre-flood seasons, instead of rainfall, as a proxy for catchment storage. While this assumption may be reasonable, one should consider that it may not hold when river flows are heavily regulated.

We focus our attention on two large basins, namely, the Po River basin at
Pontelagoscuro (Italy) and the Danube River basin at Bratislava (Slovakia).
The Po River is the longest river entirely flowing in the Italian Peninsula
(Fig. 1) with a catchment area of about 71 000 km

Study sites. Danube River basin at Bratislava and Po River basin at Pontelagoscuro.


Data description of observed time series. Descriptive statistics are given for non-deseasonalized data.

The upper Danube basin drains from the northern side of the Alps and the
southern area of the central European Highlands into Bratislava in a
131 331 km

Daily discharge and monthly precipitation and temperature data for the Po and Danube river basins were analyzed in this study. The observation periods and descriptive statistics of the different time series are shown in Table 1. Discharge time series at Pontelagoscuro for the Po River and at Bratislava for the Danube River were provided, respectively, by the Regional Agency for Environmental Protection (ARPA) – Emilia Romagna, Hydro-meteorological Office, and by the Global Runoff Data Center (GRDC, 2011). The series have no missing values and span 90 and 107 years for the Po and Danube, respectively.

The Po River is regulated by several dams, mainly located in the Alpine region, whose reservoirs are used for hydroelectricity production. The outflow from lakes Como, Garda, Iseo, Idro and Maggiore is also regulated (Zanchettin et al., 2008). These regulations do not noticeably affect the trend and the low-frequency variability of the peak flows, although they may affect low flows at daily and sub-daily timescales (Zampieri et al., 2015). The upper part of the Danube has been well suited to building hydropower plants, and up to 59 dams are found along the river's first 1000 km. As stated in the Danube River Basin Management Plan, stretches in the very upper part of the river may present noticeably altered flows (Maps 7a, b, c in DRBM, 2009). The effect of regulation on peak flows in Slovakia is deemed to be negligible, while low and average flows may be noticeably affected.

Precipitation and temperature time series were calculated from weather data sets obtained from the HISTALP project (Auer et al., 2007). Only weather stations with sufficiently long records were used (Table 1). The study period was constrained by the availability of discharge data, even though both meteorological variables were available for a longer historical period. For each study site, catchment-average precipitation and temperature time series were constructed using Thiessen polygons.
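The construction of catchment-average series with Thiessen polygons can be sketched as follows. This is a minimal illustration, not the procedure actually applied to the HISTALP data. It assumes projected (planar) coordinates and approximates the polygons by assigning each cell of a regular grid covering the catchment to its nearest station. It also assumes that all cells have equal area and that no station values are missing. The function names are hypothetical.

```python
import numpy as np


def thiessen_weights(stations_xy, cells_xy):
    """Approximate Thiessen weights as the fraction of catchment grid cells
    whose nearest station is each station (equal-area cells assumed)."""
    s = np.asarray(stations_xy, dtype=float)   # shape (n_stations, 2)
    c = np.asarray(cells_xy, dtype=float)      # shape (n_cells, 2)
    # Squared Euclidean distance from every cell to every station.
    d2 = ((c[:, None, :] - s[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)                # index of the closest station
    counts = np.bincount(nearest, minlength=len(s))
    return counts / counts.sum()


def areal_average(station_values, weights):
    """Weighted catchment average; station_values has shape (n_times, n_stations)."""
    return np.asarray(station_values, dtype=float) @ weights


# Toy example: two stations on a line and ten equal cells next to them.
stations = [(0.0, 0.0), (4.0, 0.0)]
cells = [(x + 0.5, 0.0) for x in range(10)]
w = thiessen_weights(stations, cells)
monthly_precip = [[10.0, 20.0], [30.0, 0.0]]   # hypothetical values (mm)
print(w, areal_average(monthly_precip, w))
```

Using a fine grid instead of exact polygon geometry keeps the sketch dependency-free. The weights converge to the true Thiessen area fractions as the grid is refined.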

In order to address the research question outlined in Sect. 1, namely, to verify whether the flood frequency distribution can be updated a season in advance by exploiting the information provided by the river flow in a given pre-flood season, we analyze the memory properties of the hydrological cycle in the considered catchments. We first focus on meteorological variables, namely, temperature and mean areal rainfall, to check whether a memory pattern is detectable in the weather. Rainfall and temperature are considered because they are the main drivers of river flow, with temperature being particularly influential on low flows. Then, we turn to the direct analysis of river flows.

We first estimate the Hurst exponent (

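Assessments of long-term persistence in hydrological data have been presented
by several contributions (see, for instance, Szolgayova et al., 2014, and
Zampieri et al., 2015, for analyses carried out for the river flows of the
Danube and Po rivers, respectively). Time series with long-term memory or
persistence exhibit a power-law decay of the autocorrelation function (Beran,
1994), that is,

$$\rho(k) \sim c_\rho \, k^{2H-2} \quad \text{as } k \to \infty,$$

where $\rho(k)$ is the autocorrelation at lag $k$, $c_\rho > 0$ is a constant and $H$ is the Hurst exponent, with $0.5 < H < 1$.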

In this work,

A strong seasonal component in the different hydrological variables at both study sites has been reported in the literature (e.g., Montanari, 2012; Szolgayova et al., 2014; Zampieri et al., 2015). It is well known that strong seasonality often implies the presence of periodic deterministic components in the data, which can introduce a bias in LTP estimation (Montanari et al., 1997, 2000). Slowly decreasing or increasing trends may induce a bias as well. Thus, prior to the long-term memory assessment, all time series were detrended and deseasonalized. For each time series, a 366-term (daily data) or 13-term (monthly data) moving average was applied to approximate the trend. A stable seasonal filter was then applied to remove the seasonal cycle (Brockwell et al., 2002).
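For monthly data (period 12), the trend removal and stable seasonal filter described above can be sketched as follows. We make three assumptions. First, the 13-term moving average is taken to be the centered 2 × 12 average with half weights at the two ends, as in classical decomposition. Second, the series is assumed to start at the first month of the seasonal cycle. Third, it must contain at least 13 values. The daily case is not shown. Values within six months of either end are left as NaN because the averaging window is incomplete there. Function names are hypothetical.

```python
import numpy as np


def centered_moving_average(x, period=12):
    """Centered (period + 1)-term moving average with half weights at the ends
    (a 2 x period average); NaN where the window is incomplete."""
    if period % 2:
        raise ValueError("an even period is assumed")
    half = period // 2
    w = np.r_[0.5, np.ones(period - 1), 0.5] / period   # period + 1 weights summing to 1
    trend = np.full(len(x), np.nan)
    trend[half:len(x) - half] = np.convolve(x, w, mode="valid")
    return trend


def detrend_deseasonalize(x, period=12):
    """Remove a moving-average trend, then a stable seasonal component
    (mean detrended value at each position of the cycle, centered to zero)."""
    x = np.asarray(x, dtype=float)
    trend = centered_moving_average(x, period)
    detrended = x - trend
    seasonal_index = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    seasonal_index -= seasonal_index.mean()        # seasonal indices sum to zero
    seasonal = np.resize(seasonal_index, len(x))   # repeat the cycle over the record
    return x - trend - seasonal
```

The half weights make the 13-term window symmetric, so a linear trend passes through unchanged while any zero-mean 12-month pattern is averaged out.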

In order to analyze the stochastic connection between the average river flows
in the antecedent seasons and the average and peak flows in the flood season,
a bivariate probability distribution function was fitted. In what follows,
random variables are identified with a superscript asterisk to distinguish them from their realizations. The
yearly variables analyzed in this study were the following.

The monthly mean flow in the given pre-flood season (independent or
explanatory variable),

The peak flow in the flood season or annual maximum daily flow (dependent
variable),

The mean daily flow in the flood season (dependent variable),

A meta-Gaussian model (Kelly and Krzysztofowicz, 1997; Montanari and Brath, 2004) is used to model the joint probability distribution between the selected explanatory and dependent variables. The method involves the following steps.

First, the time series

The NQT is a non-parametric transformation that can be applied to normalize any arbitrarily distributed random variable. The NQT has been applied widely in hydrological studies. It has been used to generate flow samples from specified marginal distributions (Moran, 1970; Hosking and Wallis, 1988), to perform Bayesian updating of prior distributions (Kelly and Krzysztofowicz, 1994), and to model bivariate distributions with arbitrary marginal distributions (Krzysztofowicz et al., 1994; Aguilar et al., 2016). The NQT is adopted within the Bayesian Forecasting System for river flows (Krzysztofowicz and Kelly, 2000; Krzysztofowicz and Herr, 2001; Krzysztofowicz and Maranzano, 2004a, b; Maranzano and Krzysztofowicz, 2004). It has also been applied to assess the uncertainty of rainfall–runoff simulations (Montanari and Brath, 2004; Montanari and Grossi, 2008; Bogner et al., 2012) and to deseasonalize hydrological time series (Montanari, 2005). Being free of any distributional assumption, the NQT avoids the need to select a suitable parametric model for the distribution of the considered hydrological variable.
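A minimal sketch of the NQT and of its inverse is given below. Non-exceedance probabilities are estimated with the Weibull plotting position i/(n+1), which is our assumption; other plotting positions are possible. The back-transformation interpolates linearly within the observed range and does not extrapolate beyond the sample extremes. This is a known limitation when very large quantiles are needed. Function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard Gaussian, mean 0 and standard deviation 1


def _gaussian_plotting_positions(n):
    """Standard normal quantiles of the Weibull plotting positions i/(n+1)."""
    return np.array([_STD_NORMAL.inv_cdf(i / (n + 1.0)) for i in range(1, n + 1)])


def nqt(x):
    """Normal quantile transform: map each value to the standard normal
    quantile of its empirical non-exceedance probability."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x, kind="stable"), kind="stable")  # 0-based ranks
    return _gaussian_plotting_positions(len(x))[ranks]


def inverse_nqt(z, x_sample):
    """Back-transform Gaussian values to the original domain by linear
    interpolation of the empirical quantile function (clamped at the extremes)."""
    xs = np.sort(np.asarray(x_sample, dtype=float))
    return np.interp(z, _gaussian_plotting_positions(len(xs)), xs)


# Usage on a skewed, hypothetical sample of monthly mean flows.
rng = np.random.default_rng(0)
flows = rng.gamma(2.0, 100.0, size=50)
z = nqt(flows)
print(z.mean(), z.std())
```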

The NQT involves the following steps when we take

Finally, the meta-Gaussian model (Kelly and Krzysztofowicz, 1997; Montanari
and Brath, 2004) is fitted between the random explanatory variable and each
random dependent variable in their canonical form in the Gaussian domain. In
what follows, we specify the equations for the peak flow as the dependent
variable. We assume (1) stationarity and ergodicity of both

In order to verify the validity of the linear model (Eq. 2), an evaluation
based on the behavior of the residuals is applied. Following the graphical
approach proposed by Cook and Weisberg (1994), the residual plot of

The same methodology was applied for the other dependent variable considered
in this study,

The proposed methodology involves uncertainty in the estimated flood
frequency distributions, which mainly arises from two sources. The first is
uncertainty in the NQT, namely, uncertainty in the estimation of the marginal
probability distributions of the independent and dependent variables in the
regression. The second source of uncertainty is related to the estimation of
the cross-correlation coefficient between dependent and independent variables
in the Gaussian domain. The NQT is a non-parametric transformation and
therefore its uncertainty cannot be determined quantitatively (Maranzano and
Krzysztofowicz, 2004; Montanari and Brath, 2004). To reduce uncertainty, the
NQT should be estimated from long records encompassing a wide
range of meteorological and hydrological conditions. Uncertainty in the
cross-correlation coefficient can be quantified for a given confidence level
and again depends on the length of the records. A quantitative estimation of
uncertainty for the cross-correlation coefficient was carried out for both
study sites. Uncertainty bounds at the 95 % confidence level are obtained
by first applying Fisher's transformation,
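The standard Fisher-transformation interval for a sample correlation coefficient can be sketched as follows. We assume here the textbook form, z = atanh(r) with standard error 1/sqrt(n − 3), where r is the sample correlation computed from n years of record. The exact formulation used in the paper is not reproduced in this section. The function name and the example values are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_confidence_interval(r, n, level=0.95):
    """Confidence interval for a Pearson correlation via Fisher's z transform."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = math.atanh(r)                                  # Fisher's transformation
    se = 1.0 / math.sqrt(n - 3)                        # approximate standard error of z
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for 95 %
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# e.g. a correlation of 0.3 estimated from 90 years of data
print(fisher_confidence_interval(0.3, 90))
```

The interval is symmetric in the z domain but generally asymmetric around r once mapped back with tanh. It narrows as the record length n grows, consistent with the dependence on record length noted above.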

In order to infer the actual impact of the dependence of the peak flow and the mean flow in the flood season on the mean flow in the pre-flood seasons, we compared the unconditioned flood frequency distribution with the updated distributions. The updated distributions were inferred for several higher-than-average values of the mean flow (e.g., the 70, 80 and 95 % quantiles) in a given pre-flood season. We assume that peak flows can be adequately modeled by the Extreme Value Type 1 (EV1) distribution, and we present a comparison between the unconditioned peak flow frequency distribution and the updated peak flow frequency distributions.
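For reference, the EV1 (Gumbel) distribution has non-exceedance probability F(x) = exp{−exp[−(x − u)/α]}. The flood quantile for return period T is therefore x_T = u − α ln[−ln(1 − 1/T)]. A minimal sketch follows. The method-of-moments fit is an assumption made for illustration, since the parameter estimation method is not specified in this section. The function names and the peak values are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_ev1_moments(sample):
    """Method-of-moments estimates of the EV1 location u and scale alpha."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    alpha = math.sqrt(6.0 * var) / math.pi      # from variance = pi^2 alpha^2 / 6
    u = mean - EULER_GAMMA * alpha              # from mean = u + gamma * alpha
    return u, alpha


def ev1_quantile(u, alpha, return_period):
    """Peak flow with the given return period (years) under the EV1 model."""
    p = 1.0 - 1.0 / return_period               # annual non-exceedance probability
    return u - alpha * math.log(-math.log(p))


# Hypothetical annual maxima (m^3/s) and the corresponding 200-year flood.
peaks = [4200.0, 5100.0, 3900.0, 6800.0, 4700.0, 5600.0, 7300.0, 4400.0]
u, alpha = fit_ev1_moments(peaks)
print(round(ev1_quantile(u, alpha, 200.0)))
```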

Finally, a leave-one-out validation analysis was carried out to emulate a real-world application. We removed from the analysis the data observed in the year with the wettest pre-flood season (1977 for the Po and 1944 for the Danube) and then estimated the probability distribution of the peak flow in the flood season for that year. Uncertainty was also estimated for this application.

As shown by previous studies, directional statistics
(Mardia, 1972) is an effective method for identifying the timing of
hydrological extreme events (e.g., Castellarin et al., 2001; Cunderlik and
Burn, 2002; Baratti et al., 2012). Following Bayliss and Jones (1993), the
date of occurrence of an event
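The directional-statistics summary of flood timing can be sketched as follows. We assume the usual formulation, in which the Julian date D of each event is mapped to an angle θ = 2πD/L on the unit circle. Here L is the year length, taken as 365 for simplicity. The mean direction gives the mean date of occurrence, and the length r of the mean resultant vector, between 0 and 1, measures the regularity of the events. Leap years are ignored and the function name is hypothetical.

```python
import math


def flood_seasonality(julian_days, year_length=365.0):
    """Mean date of occurrence and regularity r (0 = dispersed, 1 = same date)."""
    angles = [2.0 * math.pi * d / year_length for d in julian_days]
    x_bar = sum(math.cos(a) for a in angles) / len(angles)
    y_bar = sum(math.sin(a) for a in angles) / len(angles)
    mean_angle = math.atan2(y_bar, x_bar) % (2.0 * math.pi)   # map into [0, 2*pi)
    mean_day = mean_angle * year_length / (2.0 * math.pi)
    regularity = math.hypot(x_bar, y_bar)
    return mean_day, regularity


# Hypothetical annual-maximum dates clustered in late autumn.
print(flood_seasonality([300, 310, 320, 315, 305]))
```

Working on the circle avoids the artefact of a plain arithmetic mean. For example, events on 31 December and 1 January average to the turn of the year rather than to early July.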

Estimated

Seasonality space representation of the annual maximum daily flows (AMD) and high-flow events. Dots around the global value indicate the dispersion.

Pearson's cross-correlation coefficient and its 95 %
confidence interval between both,

Residual plot of the linear regression of

The application of the heuristic methods for LTP estimation to deseasonalized
and detrended time series is displayed in Table 2.
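The aggregated variance method is one common heuristic estimator of LTP. Under LTP, the variance of block means of size m decays as m^(2H−2), so H can be read from the slope of log-variance against log m. This sketch is not necessarily one of the estimators applied in Table 2, which are not detailed in this section. The choice of block sizes and the minimum of 10 blocks are assumptions, and the function name is hypothetical. For white noise the estimate should be close to 0.5.

```python
import numpy as np


def hurst_aggregated_variance(x, n_sizes=20, min_blocks=10):
    """Aggregated-variance estimate of the Hurst exponent H.

    The series is split into non-overlapping blocks of size m, the sample
    variance of the block means is computed, and H = 1 + slope / 2, where
    slope is the least-squares slope of log(variance) against log(m).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    max_size = n // min_blocks                      # keep at least min_blocks blocks
    sizes = np.unique(np.logspace(0, np.log10(max_size), n_sizes).astype(int))
    log_m, log_var = [], []
    for m in sizes:
        k = n // m                                  # number of complete blocks
        block_means = x[: k * m].reshape(k, m).mean(axis=1)
        log_m.append(np.log(m))
        log_var.append(np.log(block_means.var(ddof=1)))
    slope = np.polyfit(log_m, log_var, 1)[0]
    return 1.0 + slope / 2.0


# Usage on synthetic white noise (no memory), where H should be near 0.5.
rng = np.random.default_rng(42)
print(hurst_aggregated_variance(rng.standard_normal(2 ** 15)))
```

Applying such an estimator to series that have not been detrended and deseasonalized would typically inflate H, which motivates the preprocessing described earlier.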

In general, these results agree with previous outcomes of long-term
persistence studies for the daily discharge of the Po at Pontelagoscuro
(Montanari, 2012) as well as with previous studies on the daily river flows
in an upstream tributary of the Po (

Figure 3 shows the results of the directional statistics applied to the
extreme events in both rivers. In the Po River, we can see a very low
regularity (

In the Danube, we find a considerable regularity in high-flow events
(

As the pre-flood season, we consider a 1-month period, which is long enough to reduce the effect of river regulation. We first set the month preceding the flood season (i.e., September for the Po and April for the Danube) as the pre-flood season. Then, we repeat the analysis for earlier months, expecting the statistical dependence to decrease as the pre-flood season is moved further back in time.
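Extracting the yearly explanatory and dependent variables from a daily discharge record can be sketched as follows. The month numbers follow the seasons given in the text: pre-flood month September and flood season October–November for the Po, and April and May–July for the upper Danube. We assume that the pre-flood month and the flood season fall in the same calendar year. This holds for these lags but would need adjusting for pre-flood months in the previous year. The function name and the toy record are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def seasonal_variables(dates, flows, pre_month, flood_months):
    """Return {year: (pre-flood monthly mean, flood-season peak, flood-season mean)}."""
    pre = defaultdict(list)
    flood = defaultdict(list)
    for d, q in zip(dates, flows):
        if d.month == pre_month:
            pre[d.year].append(q)
        if d.month in flood_months:
            flood[d.year].append(q)
    out = {}
    for year in sorted(set(pre) & set(flood)):        # years with both seasons observed
        out[year] = (sum(pre[year]) / len(pre[year]),  # explanatory variable
                     max(flood[year]),                  # peak (maximum daily) flow
                     sum(flood[year]) / len(flood[year]))
    return out


# Toy daily record: the "flow" encodes the date as month + day / 100.
start = date(2000, 1, 1)
days = [start + timedelta(days=i) for i in range(731)]
toy_flows = [d.month + d.day / 100.0 for d in days]
po_like = seasonal_variables(days, toy_flows, pre_month=9, flood_months={10, 11})
print(po_like[2000])
```

Moving the pre-flood season back in time, as described above, only requires changing `pre_month`.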

Probability distribution functions of the normalized dependent
variables (

Probability distribution functions of the normalized dependent
variables (

Table 3 shows the cross-correlation coefficients

The only anomalous correlation is found when considering the

The meta-Gaussian model was evaluated using residual plots (Montanari and Brath, 2004). Figure 4 shows the residuals for pre-flood months up to 4 months before the flood season at each study site. The residuals look homoscedastic, indicating that the model assumptions about the residuals' behavior are justified.
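A simple numerical check can complement the visual inspection of residual plots, as sketched below. It assumes the linear model in the Gaussian domain η_Q = ρη_q + ε (our notation, with η denoting NQT-transformed variables) and estimates ρ by the sample correlation. It then compares the residual variance for below- and above-median values of the explanatory variable. A ratio close to 1 is consistent with homoscedasticity. This is an illustrative heuristic, not a formal test, and the function name is hypothetical.

```python
import numpy as np


def residual_check(eta_q, eta_peak):
    """Fit eta_peak = rho * eta_q + eps in the Gaussian domain and return
    (rho, residuals, variance ratio of upper vs lower half of eta_q)."""
    eta_q = np.asarray(eta_q, dtype=float)
    eta_peak = np.asarray(eta_peak, dtype=float)
    rho = np.corrcoef(eta_q, eta_peak)[0, 1]     # both variables are standard normal
    residuals = eta_peak - rho * eta_q
    lower = eta_q <= np.median(eta_q)
    ratio = residuals[~lower].var(ddof=1) / residuals[lower].var(ddof=1)
    return rho, residuals, ratio


# Synthetic homoscedastic example with rho = 0.5.
rng = np.random.default_rng(1)
eq = rng.standard_normal(5000)
ep = 0.5 * eq + np.sqrt(1 - 0.5 ** 2) * rng.standard_normal(5000)
rho_hat, res, ratio = residual_check(eq, ep)
print(round(rho_hat, 2), round(ratio, 2))
```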

Peak flows in the flood season (October–November in the Po, May–July in the upper Danube) vs. the return period modeled through the EV1 distribution function. Quantiles refer to mean flows higher than usual in the previous month.

Leave-one-out cross validation. Unconditioned EV1 probability distribution of peak flows for the year with the wettest pre-flood season (1977 in the Po, 1944 in the upper Danube) along with conditioned distributions with related 95 % confidence bands.

To assess the practical benefit that can be gained by updating the flood frequency distribution through the proposed data assimilation procedure, we assumed that above-average river flows are observed in the month preceding the flood season. We then applied the meta-Gaussian model to estimate the updated probability distribution. Specifically, we assume that the monthly mean flow observed in September for the Po River and in April for the Danube River corresponds to its 70, 80 or 95 % quantile.
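In the Gaussian domain, the update can be sketched as follows. We assume the standard meta-Gaussian (bivariate normal) result: given the normalized pre-flood flow η_q, the normalized peak flow is Gaussian with mean ρη_q and variance 1 − ρ². The value ρ = 0.243 is hypothetical. It was chosen because, for the 95 % quantile, it yields a mean of about 0.40 and a standard deviation of about 0.97, close to the values reported for the Po in Fig. 5. The sketch also computes how the probability of exceeding the unconditioned 99 % quantile changes.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def conditional_peak_distribution(rho, pre_flood_quantile):
    """Distribution of the normalized peak flow given that the normalized
    pre-flood mean flow equals its `pre_flood_quantile` quantile."""
    eta_q = STD_NORMAL.inv_cdf(pre_flood_quantile)
    return NormalDist(mu=rho * eta_q, sigma=(1.0 - rho ** 2) ** 0.5)


rho = 0.243                                   # hypothetical cross-correlation
threshold = STD_NORMAL.inv_cdf(0.99)          # unconditioned 99 % quantile
for q in (0.70, 0.80, 0.95):
    cond = conditional_peak_distribution(rho, q)
    exceedance = 1.0 - cond.cdf(threshold)    # equals 0.01 without conditioning
    print(f"{q:.2f}: mean={cond.mean:.2f} sd={cond.stdev:.2f} P(exceed)={exceedance:.4f}")
```

To express the result in discharge units, the conditional quantiles would be back-transformed with the inverse of the NQT fitted to the observed peak flows.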

Figures 5 and 6 show the unconditioned and updated probability density
functions (pdfs) of the normalized peak flow (i.e., the peak flow transformed
to the canonical Gaussian distribution). As one would expect, the results
show that the higher the cross-correlation value, the lower the variability
in the distribution of the normalized dependent variable and the higher the
mean value. For example, in the Po River, when the normalized mean flow in
September equals its 95 % quantile, the pdf is centered
around a mean value of 0.4 and has a standard deviation of 0.97
(Fig. 5). In contrast, if one attempts to estimate the probability
distribution of

In the upper Danube a similar scheme is found with the mean of the
probability distribution of

Figure 7 shows the comparison between the unconditioned flood frequency
distribution and the updated distributions in the untransformed domain when
the flow in the previous month (September for the Po River, April for the
Danube River) is higher than usual (70, 80 and 95 % quantile). For
example, in the Po River, the unconditioned flood for a return period of
200 years, which equals 12 507 m

We removed from the analysis the observations of 1977 for the Po River and 1944 for the Danube River, the years with the wettest pre-flood seasons on record. We then emulated a 1-month-ahead real-time prediction of the probability distribution of flood flows in the next flood season, with uncertainty estimated as described in Sect. 3.2. Figure 8 shows, for both the Po and Danube rivers, the unconditioned flood frequency distribution along with the updated one, together with the 95 % confidence bands of the latter. The results show that the proposed procedure can provide an effective update in real-world applications.

The analysis of the observed mean daily flow values suggests the existence of
LTP in both study sites with

The methodology proposed herein can be applied to any other study site, provided that the parameters of the meta-Gaussian model confirm the presence of the above stochastic dependence. As with any time series analysis method, records encompassing a wide range of meteorological and hydrological conditions should be used to minimize uncertainty. In this case, uncertainty is related to the estimation of the correlation coefficient and the standardization of the regression variables. Finally, other explanatory variables (e.g., rainfall, snowmelt) can be incorporated to exploit additional stochastic dependence between peak flows, the state of the catchment and external forcings.

The findings presented in this paper highlight that river memory has an impact on flood formation. It should therefore be properly considered in real-time flood risk management and in strengthening the resilience of societies to floods. The procedure described herein can provide useful information in cases where the memory of the catchment is expected to persist for a long time. These conditions may occur when the precipitation–runoff transformation develops slowly. Memory is frequently found to be related to the storage capacity of the catchment and the complexity of the river network. These characteristics may therefore indicate where the proposed approach is likely to give useful results.

The discharge time series at Pontelagoscuro for the Po River
were obtained from

The authors declare that they have no conflict of interest.

The present work was partially developed within the framework of the Panta Rhei Research Initiative of the International Association of Hydrological Sciences (IAHS). Part of the results were obtained in the Switch-On Virtual Water Science Laboratory, which was developed in the context of the SWITCH-ON (Sharing Water-related Information to Tackle Changes in the Hydrosphere – for Operational Needs) project, funded by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 603587. Cristina Aguilar acknowledges funding from both the Juan de la Cierva Fellowship Programme of the Spanish Ministry of Economy and Competitiveness (JCI-2012-12802) and the José Castillejo Programme of the Spanish Ministry of Education, Culture and Sports (CAS14/00432).

Edited by: Stacey Archfield
Reviewed by: two anonymous referees