Real-time updating of the flood frequency distribution through data assimilation

We explore the memory properties of catchments for predicting the likelihood of floods based on observations of average flows in pre-flood seasons. Our approach assumes that flood formation is driven by the superimposition of short- and long-term perturbations. The former is given by the short-term meteorological forcing leading to infiltration and/or saturation excess, while the latter originates from higher-than-usual storage in the catchment. To exploit this sensitivity to long-term perturbations, a meta-Gaussian model and a data assimilation approach are implemented for updating the flood frequency distribution a season in advance. Accordingly, the peak flow in the flood season is predicted in probabilistic terms by exploiting its dependence on the average flow in the antecedent seasons. We focus on the Po River at Pontelagoscuro and the Danube River at Bratislava. We found that the shape of the flood frequency distribution is noticeably impacted by higher-than-usual flows occurring up to several months earlier. The proposed technique may allow one to reduce the uncertainty associated with the estimation of flood frequency.


Introduction
The physical, chemical and ecological states of processes leading to the formation and quality of river flow are characterized by persistence at several different timescales (Koutsoyiannis, 2014). In fact, anomalous conditions for such processes, such as those generated by extreme meteorological events, may produce a long-lasting impact on the river flow, depending on climatic and catchment behaviors (Lo and Famiglietti, 2010). For instance, flood generation is impacted by the initial soil moisture condition of the catchment, which may in turn be impacted by groundwater levels that are related to global catchment storage (Massari et al., 2014). Persistence can be exploited to improve river flow forecasting at seasonal to interannual timescales. Furthermore, persistence provides useful indications to better understand the functioning of a catchment and the dynamics of the water cycle.
Indeed, the study of persistence has been one of the most classical research endeavors in hydrology since the early works by Rippl (1883) and Hazen (1914) on the estimation of the optimal storage for reservoirs. Hurst (1951) investigated the Nile River flows while working at the design of the Aswan Dam and postulated that geophysical records may be affected by a complex form of persistence that may last for a long time (O'Connell et al., 2016). Later on, Thomas and Fiering (1962) and Yevjevich (1963) introduced autoregressive models for annual and seasonal streamflow simulation, thereby stimulating the development of subsequent models of increasing complexity for simulating hydrological persistence.
Recently, attention has been focused on long-term persistence (LTP), which is associated with the Hurst-Kolmogorov behavior (Koutsoyiannis, 2011). LTP manifests itself through a power-law decay of the autocorrelation function of the process, which implies that the summation of the autocorrelation coefficients diverges to infinity (Montanari et al., 1997). LTP implies the possible presence of long-term cycles (Beran, 1994), which in turn means that perturbations of hydrological processes may last for a long time, thereby providing a possible explanation for the occurrence of clusters of extreme hydrological events, such as floods and droughts (Montanari, 2012). LTP also has implications in the study of climate change, as it is connected with an enhanced natural variability of climatic processes (Koutsoyiannis and Montanari, 2007).
Published by Copernicus Publications on behalf of the European Geosciences Union.
While LTP has long been studied, limited attempts have been made to exploit it in data assimilation procedures for improving streamflow forecasting. The likely reason is that LTP is recognized to exert a noticeable impact on the river flow volume over long timescales, while its effect on the magnitude of single events is less noticeable. Nevertheless, the presence of LTP and seasonal correlation necessarily affects flood frequency, to an extent that has been poorly explored.
The present contribution aims to enhance our understanding of the persistence properties of river flows to improve seasonal river flow forecasting. By taking inspiration from the idea that the probability of extreme floods may be increased by long-term stress, like higher-than-usual rainfall lasting for several months, the research question that we address here can be stated as follows: can higher-than-usual river discharges in the previous season be associated with a higher probability of floods in the subsequent high-flow season? The quantification of the effect of antecedent flows for different time lags on the occurrence of floods would help to assess how long a river remembers its past (Aguilar et al., 2016). From a technical point of view, we aim to propose a technique for updating, a season in advance, the flood frequency distribution estimated for a given river, through a data assimilation approach that exploits the information provided by river flows in the pre-flood seasons.
It is interesting to highlight that the state of a catchment, and in particular its storage, is affected by previous precipitation. Therefore, it would be reasonable to exploit the information provided by previous rainfall rather than previous flows for the sake of updating the flood frequency distribution. However, areal rainfall estimation for catchments with large extension and complex orography is affected by large uncertainty (Moulin et al., 2009). Therefore, we utilize here flows during pre-flood seasons as a proxy for catchment storage instead of rainfall. While the above assumption may be reasonable, one should consider that it may not hold when the river flows are impacted by massive regulation.

Study sites and data sources
We focus our attention on two large basins, namely, the Po River basin at Pontelagoscuro (Italy) and the Danube River basin at Bratislava (Slovakia). The Po River is the longest river entirely flowing in the Italian Peninsula (Fig. 1) with a catchment area of about 71 000 km² at the delta. The average annual precipitation in the catchment is 78 km³ in volume, of which 60 % reaches the closure river cross section at Pontelagoscuro. The hydrological behavior of the Po River is described in detail in recent studies (Zanchettin et al., 2008; Montanari, 2012; Zampieri et al., 2015). The discharge pattern at Pontelagoscuro presents a mean annual flow of about 1470 m³ s⁻¹ and shows a typical pluvial regime, and thus a strong seasonality with two flood seasons in spring and autumn (Fig. 2). An intense exploitation of water resources for irrigation, hydro-power production, and civil and industrial use is found in the catchment. Even though water resources management is currently sustainable on average, critical situations are experienced during drought periods (Montanari, 2012).
The upper Danube basin drains from the northern side of the Alps and the southern area of the central European Highlands into Bratislava in a 131 331 km² catchment area where the mean annual flow is about 2053 m³ s⁻¹. The hydrological behavior of the upper Danube basin is described in detail in the literature (Nester et al., 2011; Blöschl et al., 2013). The average annual precipitation in the catchment is 123 km³ and the discharge pattern shows a typical alpine regime and thus a strong seasonality, with one flood season in the summer (Fig. 2).
Daily discharge and monthly precipitation and temperature data for the Po and Danube river basins were analyzed in this study. The observation periods as well as descriptive statistics of the different time series are shown in Table 1. Discharge time series at Pontelagoscuro for the Po River and Bratislava for the Danube River were provided, respectively, by the Regional Agency for Environmental Protection (ARPA) - Emilia Romagna, Hydro-meteorological Office and by the Global Runoff Data Center (GRDC, 2011). The series are not affected by missing values. They correspond to a time span of 90 and 107 years for the Po and Danube, respectively. The Po River is regulated by several dams serving as reservoirs for hydroelectricity production, which are mainly located in the Alpine region. Also, the outflow from lakes Como, Garda, Iseo, Idro and Maggiore is regulated (Zanchettin et al., 2008). These regulations do not noticeably impact the trend and the low-frequency variability of the peak flows, while they may affect the low flows at daily and sub-daily timescales (Zampieri et al., 2015). The upper part of the Danube has been ideal for building hydropower plants and up to 59 dams are found along the river's first 1000 km. As stated in the Danube River Basin Management Plan, stretches in the very upper part of the river may present noticeably altered flows (Maps 7a, b, c in DRBM, 2009). The effect of regulation on peak flows in Slovakia is deemed to be negligible, while low and average flows may be noticeably impacted.
Precipitation and temperature time series were calculated based on weather data sets obtained from the HISTALP project (Auer et al., 2007). Only weather stations where sufficiently long data sets are available were used (Table 1). The study period was conditioned by the availability of discharge data even though both meteorological variables were available for a longer historical period. For each study site, catchment area average precipitation and temperature time series were constructed using Thiessen polygons.
In order to address the research question outlined in Sect. 1, namely, to verify whether the flood frequency distribution can be updated a season in advance by exploiting the information provided by the river flow in a given pre-flood season, we perform an analysis of the memory properties of the hydrological cycle in the considered catchments. We first focus on meteorological variables, namely, temperature and mean areal rainfall, to check whether a memory pattern is detectable in the weather. Rainfall and temperature are considered as they are the main drivers of river flow, with temperature being particularly influential on the lower values. Then, we turn to the direct analysis of river flows.
We first estimate the Hurst exponent (H) for the considered time series, to verify whether the hypothesis of the presence of LTP is supported by data evidence. Then, we turn to the analysis of the statistical dependence between the peak flow in the flood season and the average flow during the previous season, to empirically check whether updating the flood frequency distribution produces useful results. Results from the latter analysis are assessed in view of the LTP estimation.

Estimation of long-term persistence
Assessment of long-term persistence for hydrological data has been presented by several contributions (see, for instance, Szolgayova et al., 2014, and Zampieri et al., 2015, for analyses carried out for the river flows of the Danube and Po rivers, respectively). Time series with long-term memory or persistence exhibit a power-law decay of the autocorrelation function (Beran, 1994), that is,
$$\rho(k) \sim c_k \, k^{2H-2}, \qquad (1)$$
where ρ(k) is the autocorrelation function of the process at lag k, c_k is a constant and H ∈ [0, 1] is the Hurst exponent or the intensity of the LTP (Montanari et al., 1997). For a stationary process, H is constrained in the range [0.5, 1). A value equal to 0.5 means the absence of LTP; the higher the H, the higher the intensity of LTP. In this work, H was estimated by using different heuristic methods. In detail, we applied the rescaled range (R/S) analysis, the aggregated variance method (climacogram; see Dimitriadis and Koutsoyiannis, 2015), and the differenced variance method. An extended description of numerous methodologies to assess the persistence properties of time series and to support the possible presence of the Hurst-Kolmogorov behavior can be found in Taqqu et al. (1995), Montanari et al. (1996, 1997, 2000) and Koutsoyiannis (2003).
A strong seasonal component in the different hydrological variables in both study time series has been reported by the literature (e.g., Montanari, 2012; Szolgayova et al., 2014; Zampieri et al., 2015). It is well known that a strong seasonality often implies the presence of periodic deterministic components in the data that can introduce a bias in LTP estimation (Montanari et al., 1997, 2000). The presence of slowly decaying or increasing trends may induce a bias as well. Thus, prior to long-term memory assessments, all time series were detrended and deseasonalized. For each time series, 366-term (for daily data) and 13-term (monthly data) moving averages for a trend approximation were applied, followed by a stable seasonal filter for removal of the seasonal cycle (Brockwell et al., 2002).
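As an illustration of one of the heuristic estimators named above, the following is a minimal sketch of the aggregated variance method, assuming a series that has already been detrended and deseasonalized. It relies on the scaling of the variance of block means with block size m as m^(2H-2), so that H = 1 + slope/2 in a log-log fit. The function name, the choice of block sizes and the synthetic test series are illustrative assumptions, not the procedure used for Table 2.

```python
import numpy as np

def hurst_aggregated_variance(x, block_sizes=None):
    """Estimate H from the log-log slope of Var(block mean) against block size."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if block_sizes is None:
        # Assumed default: log-spaced block sizes that leave at least 50 blocks.
        block_sizes = np.unique(
            np.logspace(0, np.log10(n // 50), 20).astype(int)
        )
    log_m, log_var = [], []
    for m in block_sizes:
        m = int(m)
        n_blocks = n // m
        # Average the series over non-overlapping blocks of length m.
        means = x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        log_m.append(np.log(m))
        log_var.append(np.log(means.var(ddof=1)))
    slope = np.polyfit(log_m, log_var, 1)[0]
    return 1.0 + slope / 2.0

# Synthetic check: white noise has no long-term persistence, so H should be near 0.5.
rng = np.random.default_rng(0)
h_white = hurst_aggregated_variance(rng.standard_normal(20000))
```

Estimates well above 0.5 on the filtered series would point to LTP, although heuristic estimators of this kind are known to be noisy for short records.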

Analysis of the peak flow dependence on average flows during pre-flood seasons
In order to analyze the stochastic connection between the average river flows in the antecedent seasons and the average and peak flows in the flood season, a bivariate probability distribution function was fitted. In what follows, random variables are identified with a superscript asterisk to distinguish them from their realizations. The yearly variables analyzed in this study were the following.
- The monthly mean flow in the given pre-flood season (independent or explanatory variable), Q*_m.
- The peak flow in the flood season or annual maximum daily flow (dependent variable), Q*_p.
- The mean daily flow in the flood season (dependent variable), Q*_mf.
A meta-Gaussian model (Kelly and Krzysztofowicz, 1997; Montanari and Brath, 2004) is used to model the joint probability distribution between the selected explanatory and dependent variables. The method involves the following steps.
First, the time series Q_m(t), Q_p(t) and Q_mf(t) with sample size n, where n is the number of years in the observation period, are extracted from the observed data sets. Then, the normal quantile transform (NQT) is applied in order to make their marginal probability distributions Gaussian, thereby obtaining the normalized observations NQ_m(t), NQ_p(t) and NQ_mf(t).
The NQT is a non-parametric transformation that can be applied to normalize any arbitrarily distributed random variable. There are numerous applications of the NQT in hydrological studies, to generate flow samples from specified marginal distributions (Moran, 1970; Hosking and Wallis, 1988), to perform Bayesian updating of prior distributions (Kelly and Krzysztofowicz, 1994), and to model bivariate distributions with arbitrary marginal distributions (Krzysztofowicz et al., 1994; Aguilar et al., 2016). The NQT is adopted within the Bayesian Forecasting System for river flows (Krzysztofowicz and Kelly, 2000; Krzysztofowicz and Herr, 2001; Krzysztofowicz and Maranzano, 2004a, b; Maranzano and Krzysztofowicz, 2004). It was also applied for assessing the uncertainty of rainfall-runoff simulations (Montanari and Brath, 2004; Montanari and Grossi, 2008). For each observation Q_m,i, with i = 1, ..., n, the standard normal quantile NQ_m,i = G^{-1}(F(Q_m,i)) is computed from its sample cumulative frequency F(Q_m,i), with G denoting the standard normal distribution and G^{-1} its inverse, and associated with the corresponding Q_m,i. Thus, a discrete mapping of Q_m,i to its transformed counterpart NQ_m,i is obtained (Krzysztofowicz, 1997). In order to apply the inverse of the NQT for any NQ_m,i, linear interpolation is applied to connect the points of the discrete mapping previously obtained. Bogner et al. (2012) propose different parametric and non-parametric approaches for the extrapolation of extreme values. In this study, the region beyond the maximum and the minimum available NQ_m,i values is covered by linear extrapolation.
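The mapping described in this paragraph can be sketched as follows. This is a minimal illustration assuming a sample without ties and the Weibull plotting position i/(n + 1) for the sample cumulative frequency, which is an assumption of this sketch; all function names and the flow values are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def fit_nqt(sample):
    """Build the discrete NQT mapping for a sample with distinct values."""
    q_sorted = np.sort(np.asarray(sample, dtype=float))
    n = q_sorted.size
    # Assumed plotting position: Weibull, i / (n + 1).
    freq = np.arange(1, n + 1) / (n + 1)
    std_normal = NormalDist()
    nq = np.array([std_normal.inv_cdf(f) for f in freq])
    return q_sorted, nq

def _piecewise_linear(x, xp, fp):
    """Interpolate linearly between knots and extrapolate linearly beyond them."""
    x = np.asarray(x, dtype=float)
    y = np.interp(x, xp, fp)
    slope_lo = (fp[1] - fp[0]) / (xp[1] - xp[0])
    slope_hi = (fp[-1] - fp[-2]) / (xp[-1] - xp[-2])
    y = np.where(x < xp[0], fp[0] + slope_lo * (x - xp[0]), y)
    y = np.where(x > xp[-1], fp[-1] + slope_hi * (x - xp[-1]), y)
    return y

def nqt(q, q_sorted, nq):
    """Map flows to the standard Gaussian domain."""
    return _piecewise_linear(q, q_sorted, nq)

def inverse_nqt(z, q_sorted, nq):
    """Map standard Gaussian values back to flows."""
    return _piecewise_linear(z, nq, q_sorted)

flows = [1200.0, 950.0, 2100.0, 1600.0, 3050.0, 1400.0]  # made-up values
q_sorted, nq = fit_nqt(flows)
round_trip = inverse_nqt(nqt(flows, q_sorted, nq), q_sorted, nq)
beyond_max = float(inverse_nqt(3.0, q_sorted, nq))  # linear extrapolation
```

Linear extrapolation keeps the inverse defined for Gaussian values outside the observed range, at the cost of a crude tail; the parametric alternatives cited above address that limitation.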
Finally, the meta-Gaussian model (Kelly and Krzysztofowicz, 1997; Montanari and Brath, 2004) is fitted between the random explanatory variable and each random dependent variable in their canonical form in the Gaussian domain. In what follows, we specify the equations for the peak flow as the dependent variable. We assume (1) stationarity and ergodicity of both NQ*_m and NQ*_p; and (2) that the cross dependence between NQ*_m and NQ*_p can be represented by the normal linear equation
$$NQ_p(t) = \rho(NQ_m^*, NQ_p^*) \, NQ_m(t) + N\varepsilon(t), \qquad (2)$$
where ρ(NQ*_m, NQ*_p) is the Pearson cross-correlation coefficient between NQ*_m and NQ*_p, and Nε is an outcome of the stochastic process Nε*, which is independent, homoscedastic, stochastically independent of NQ*_m and normally distributed with zero mean and variance 1 − ρ²(NQ*_m, NQ*_p). The parameters of the bivariate probability distribution function are the mean (µ(NQ*_m) = 0 and µ(NQ*_p) = 0), the standard deviation (σ(NQ*_m) = 1 and σ(NQ*_p) = 1) of the normalized series, and the Pearson cross-correlation coefficient between both normalized series, ρ(NQ*_m, NQ*_p). In the presence of dependence between NQ*_m and NQ*_p, the correlation coefficient will be significantly different from 0. The bivariate Gaussian distribution implies that, for an arbitrary (observed) NQ_m(t), the probability distribution function of NQ*_p is Gaussian, with parameters (Eqs. 3 and 4)
$$\mu(NQ_p^*) = \rho(NQ_m^*, NQ_p^*) \, NQ_m(t), \qquad (3)$$
$$\sigma(NQ_p^*) = \sqrt{1 - \rho^2(NQ_m^*, NQ_p^*)}. \qquad (4)$$
Then, by taking the inverse of the NQT one can infer the updated probability distribution of Q*_p conditioned to the observed outcome Q_m(t).
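The updating step in the Gaussian domain can be sketched as follows, assuming the standard result for a bivariate standard Gaussian pair: given an observed NQ_m, the conditional distribution of NQ*_p has mean ρ · NQ_m and standard deviation sqrt(1 − ρ²), consistent with the residual variance stated above. The function name and the example values of ρ and NQ_m are hypothetical.

```python
import math
from statistics import NormalDist

def conditional_normalized_peak(rho, nq_m_obs):
    """Mean and std of NQ*_p given an observed NQ_m under the bivariate Gaussian model."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    mean = rho * nq_m_obs
    std = math.sqrt(1.0 - rho ** 2)
    return mean, std

# Hypothetical inputs, chosen only to illustrate the shift of the distribution.
rho_example = 0.5
nq_m_example = 1.0
mean, std = conditional_normalized_peak(rho_example, nq_m_example)

# Exceedance probability of the unconditioned 99 % quantile, before and after updating.
threshold = NormalDist().inv_cdf(0.99)
p_uncond = 1.0 - NormalDist().cdf(threshold)
p_cond = 1.0 - NormalDist(mean, std).cdf(threshold)
```

A positive ρ combined with a wetter-than-usual pre-flood season shifts the conditional mean upwards and narrows the spread, so the exceedance probability of a fixed high threshold increases relative to the unconditioned case.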
In order to verify the validity of the linear model (Eq. 2), an evaluation based on the behavior of the residuals is applied. Following the graphical approach proposed by Cook and Weisberg (1994), the residual plot of Nε(t) vs. ρ(NQ*_m, NQ*_p) · NQ_m(t) should not show any systematic trend under the target model. Curve trends or fan shape trends indicate non-linear cross dependence and variability of the variance of Nε*, respectively (Montanari and Brath, 2004).
The same methodology was applied for the other dependent variable considered in this study, Q*_mf. Therefore, once the parameters of each distribution are computed, the probability distribution function of both the peak flow and the mean flow in the flood season can be updated after observing the mean flow in the considered pre-flood season.
The proposed methodology involves uncertainty in the estimated flood frequency distributions which is mainly given by two sources. The first is uncertainty in the NQT, namely, uncertainty in the estimation of the marginal probability distribution of independent and dependent variables in the regression. The second source of uncertainty is related to the estimation of the cross-correlation coefficient between dependent and independent variables in the Gaussian domain. The NQT is a non-parametric transformation and therefore its uncertainty cannot be determined quantitatively (Maranzano and Krzysztofowicz, 2004; Montanari and Brath, 2004). To reduce uncertainty, it is advisable that the NQT is estimated by using long records encompassing a wide range of meteorological and hydrological conditions. Uncertainty in the cross-correlation coefficient can be quantified for a given confidence level and again depends on the length of the records. A quantitative estimation of uncertainty for the cross-correlation coefficient was carried out in both study sites. Uncertainty bounds at the 95 % confidence level are computed by first computing Fisher's transformation,
$$z(NQ_m, NQ_p) = \frac{1}{2} \ln \frac{1 + \rho(NQ_m^*, NQ_p^*)}{1 - \rho(NQ_m^*, NQ_p^*)}, \qquad (5)$$
where the random variable z* is approximately normally distributed with a standard deviation of
$$\sigma(z^*) = \frac{1}{\sqrt{n - 3}}. \qquad (6)$$
Therefore, confidence bands for z(NQ_m, NQ_p) can be computed at a given confidence level, which can be converted to the confidence bands for ρ(NQ*_m, NQ*_p) by taking Fisher's inverse transformation. If a negative (positive) value for the lower (upper) confidence limit is obtained for a positive (negative) estimated value of ρ(NQ*_m, NQ*_p), then we reset the lower (upper) limit to 0.
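The confidence bounds described in this paragraph can be sketched as follows, assuming the usual form of Fisher's transformation, z = 0.5 ln((1 + ρ)/(1 − ρ)), with standard deviation 1/sqrt(n − 3), and including the reset-to-zero rule stated above. The function name is hypothetical, and the sample size of 90 in the example is an assumption taken from the length of the Po record.

```python
import math
from statistics import NormalDist

def fisher_confidence_interval(rho, n, level=0.95):
    """Confidence limits for a correlation coefficient via Fisher's transformation."""
    z = math.atanh(rho)                      # equals 0.5 * ln((1 + rho) / (1 - rho))
    se = 1.0 / math.sqrt(n - 3)              # approximate std of z
    zc = NormalDist().inv_cdf(0.5 + level / 2.0)
    lower = math.tanh(z - zc * se)           # inverse transformation
    upper = math.tanh(z + zc * se)
    # Rule from the text: a limit that crosses zero is reset to 0.
    if rho > 0 and lower < 0:
        lower = 0.0
    if rho < 0 and upper > 0:
        upper = 0.0
    return lower, upper

# Example with rho = 0.24 and an assumed sample size of 90 years.
lower_po, upper_po = fisher_confidence_interval(0.24, 90)

# A weaker correlation on a shorter record triggers the reset of the lower limit.
lower_weak, upper_weak = fisher_confidence_interval(0.10, 50)
```

The width of the interval shrinks with the square root of the record length, which is why long records matter for this source of uncertainty.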
Finally, the limiting flood frequency distributions can be obtained for the lower and upper values of ρ(NQ*_m, NQ*_p). In order to infer the actual impact of the dependence of the peak flows and the mean flow in the flood season on the mean flow in the pre-flood seasons, the unconditioned flood frequency distribution and the updated distributions inferred for several higher-than-average values of mean flow (e.g., 70, 80 and 95 % quantiles) in a given pre-flood season were compared. We assume that peak flows can be adequately modeled through the Extreme Value Type 1 (EV1) distribution and we present a comparison between the unconditioned peak flows frequency distribution and the updated peak flows frequency distributions.
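For reference, the EV1 (Gumbel) quantile for a given return period can be sketched as follows. The text does not state how the EV1 parameters were estimated, so the method of moments used here is an assumption of this sketch, as are the function names and the made-up sample of peak flows.

```python
import math

EULER_GAMMA = 0.5772156649015329

def gumbel_fit_moments(peaks):
    """Method-of-moments estimates of the EV1 location and scale (assumed fitting method)."""
    n = len(peaks)
    mean = sum(peaks) / n
    var = sum((x - mean) ** 2 for x in peaks) / (n - 1)
    scale = math.sqrt(6.0 * var) / math.pi
    loc = mean - EULER_GAMMA * scale
    return loc, scale

def gumbel_return_level(loc, scale, return_period):
    """Quantile with non-exceedance probability 1 - 1/T under the EV1 distribution."""
    p = 1.0 - 1.0 / return_period
    return loc - scale * math.log(-math.log(p))

# Made-up annual peak flows (m3/s), for illustration only.
peaks = [5200.0, 6100.0, 4800.0, 7300.0, 5600.0, 8900.0, 6400.0, 5000.0]
loc, scale = gumbel_fit_moments(peaks)
q200 = gumbel_return_level(loc, scale, 200)
q10 = gumbel_return_level(loc, scale, 10)
```

The same quantile function can be applied to an unconditioned sample and to a sample drawn from an updated distribution, which makes the two return levels directly comparable.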
Finally, a leave-one-out validation analysis was carried out to emulate a real-world application. We removed from the analysis the data observed in the year with the wettest pre-flood season (1977 in the Po and 1944 in the Danube) and then we estimated the probability distribution for the peak flow in the flood season for that year. Uncertainty was estimated for this application.

Identification of the flood season
According to previous studies in the literature, directional statistics (Mardia, 1972) represents an effective method for identifying the timing of hydrological extreme events (e.g., Castellarin et al., 2001; Cunderlik and Burn, 2002; Baratti et al., 2012). Following Bayliss and Jones (1993), the date of occurrence of an event i (e.g., maximum annual daily flow) can be transformed into a directional statistic by converting the Julian date of occurrence, J_di, into an angular measure, θ_i, through Eq. (7):
$$\theta_i = J_{di} \, \frac{2\pi}{365}. \qquad (7)$$
Each date of occurrence can then be written in polar coordinates by means of a vector with a unit magnitude and the direction specified by Eq. (7). Therefore, the x_p and y_p coordinates of the mean of the sample of n dates of occurrence can be computed with Eq. (8):
$$x_p = \frac{1}{n} \sum_{i=1}^{n} \cos\theta_i, \qquad y_p = \frac{1}{n} \sum_{i=1}^{n} \sin\theta_i. \qquad (8)$$
The direction, θ, and magnitude, r, of the mean in polar coordinates can then be obtained by Eqs. (9) and (10), respectively:
$$\theta = \arctan\left(\frac{y_p}{x_p}\right), \qquad (9)$$
$$r = \sqrt{x_p^2 + y_p^2}. \qquad (10)$$
Equation (9) gives a measure of the mean timing of the event for the sample of dates, and can be converted back to a mean Julian date, M_D, through Eq. (7). Equation (10) indicates the regularity or seasonality of the phenomenon. Values of r close to 1 imply a strong regularity in the dates of occurrence of the event considered. In contrast, values of r close to 0 indicate a great dispersion and, thus, a great inter-annual variability in the dates of occurrence of the event throughout the year. Finally, the limits of the occurrence of the phenomenon can quantitatively be identified by adding to and subtracting from θ the standard deviation in radians, σ, given by Eq. (11):
$$\sigma = \sqrt{-2 \ln r}. \qquad (11)$$
We applied directional statistics to the following variables in order to identify the flood season in each study site: (1) annual maximum series of daily flows (AMD); (2) high-flow events defined from frequency analysis as those events when the daily discharge exceeds the 95th percentile, Q_95, for longer than 15 days. Results are shown in a circle plot where each date of occurrence of the variables analyzed in the data set is visible along the perimeter. The month of occurrence of each of the variables can be easily identified. Also, the proximity of the global value to the center of the circle indicates the regularity of the phenomenon, with the highest regularity found at the perimeter of the circle.
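The directional statistics described in this section can be sketched as follows. The sketch assumes a 365-day year, uses the two-argument arctangent as a quadrant-safe form of the mean direction, and takes the circular standard deviation as sqrt(−2 ln r); the function name and the example dates are hypothetical.

```python
import math

def seasonality(julian_days, year_length=365):
    """Mean date, regularity r and circular standard deviation of event dates."""
    thetas = [2.0 * math.pi * d / year_length for d in julian_days]
    n = len(thetas)
    x_p = sum(math.cos(t) for t in thetas) / n
    y_p = sum(math.sin(t) for t in thetas) / n
    # atan2 resolves the quadrant that a plain arctan(y_p / x_p) would lose.
    theta = math.atan2(y_p, x_p) % (2.0 * math.pi)
    # Clamp to 1 to guard against floating-point overshoot for identical dates.
    r = min(math.hypot(x_p, y_p), 1.0)
    mean_day = theta * year_length / (2.0 * math.pi)
    sigma = math.sqrt(-2.0 * math.log(r)) if r > 0 else math.inf
    return mean_day, r, sigma

# Events concentrated in late October: strong regularity expected.
mean_day, r, sigma = seasonality([295, 300, 305, 298, 302])

# Events spread over the whole year: weak regularity expected.
_, r_spread, _ = seasonality([10, 100, 190, 280])
```

Multiplying sigma by 365/(2π) expresses the dispersion in days, which can be added to and subtracted from the mean date to delimit the season.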

Long-term persistence estimation
The results of applying the heuristic methods for LTP estimation to the deseasonalized and detrended time series are displayed in Table 2. H values above 0.5 were obtained for the mean daily river flows in both rivers and, thus, all three heuristic methods detect the presence of noticeable LTP. The intensity of LTP is similar for monthly flow data.
Similarly, H values in monthly temperature data of 0.64 and 0.61 in the Po and Danube, respectively, suggest the presence of LTP in both records. In contrast, the estimated H values in the monthly rainfall data sets are not appreciably higher than 0.5.
Table 3. Pearson's cross-correlation coefficient and its 95 % confidence interval between both NQ_p and NQ_mf, and NQ_m for varying antecedent monthly flow. Flood season in the Po: October-November. Flood season in the Danube: May-July.
In general, these results agree with previous outcomes of long-term persistence studies for the daily discharge of the Po at Pontelagoscuro (Montanari, 2012) as well as with previous studies on the daily river flows in an upstream tributary of the Po (H = 0.71-0.81) and on the monthly rainfall registered at certain weather stations within the watershed (Montanari et al., 1996, 1997). Also, H values of the same order of magnitude were found by Szolgayova et al. (2014) for the rainfall (H = 0.43-0.50) and temperature (H = 0.65-0.72) monthly time series in the upper Danube watershed at Bratislava.

Flood season identification
Figure 3 shows the results of the directional statistics applied to the extreme events in both rivers. In the Po River, we can see a very low regularity (r ≈ 0.1) and high dispersion (4 months) in the annual maximum daily flows (AMD in Fig. 3) due to their possible occurrence in either of the two high-flow seasons, spring and autumn, as depicted in Fig. 2. The seasonality increases to r values close to 0.8 for high-flow events that mostly take place in autumn as already reported in previous studies (Zanchettin et al., 2008; Montanari, 2012).
In the Danube, we find a considerable regularity in high-flow events (r ≈ 0.8) but a certain decrease in the annual maximum flows (with r values of 0.4). Nevertheless, the 2-month dispersion in the date of occurrence is lower than in the Po River and corresponds to the length of the high-flow season reported in Fig. 2. In view of these results we set October-November and May-July as the main flood seasons in the Po and Danube, respectively.
As the pre-flood season, we consider a 1-month period, which is long enough to reduce the effect of river regulation. We first set the month preceding the flood season (i.e., September and April for the Po and Danube, respectively) as the pre-flood season. Then, we repeat the analysis with reference to earlier months, with the expectation that the statistical dependence will decrease as the pre-flood season is moved back into the past.

Estimation of the meta-Gaussian model
Table 3 shows the cross-correlation coefficients ρ(NQ*_m, NQ*_p) and ρ(NQ*_m, NQ*_mf), along with their confidence bands, between the normalized dependent variables (NQ*_p and NQ*_mf) and the explanatory variable (NQ*_m) at each study site. In detail, we assumed that Q*_m is given by the monthly mean flow in each of the 9 months preceding the flood season (from September to January in the Po River and from April to August in the antecedent year in the upper Danube). Table 3 shows that the correlation coefficient decreases as the considered pre-flood season moves backwards, as we expected. Besides, we always found noticeably higher coefficients with the mean flow in the flood season (ρ(NQ*_m, NQ*_mf)) than with the annual maximum daily flows (ρ(NQ*_m, NQ*_p)) in both rivers. For example, a cross-correlation coefficient of 0.24 was obtained between NQ*_p and NQ*_m in the Po when the pre-flood season considered is September, compared to 0.39 between NQ*_mf and the same explanatory variable, NQ*_m. Moreover, a continuously decreasing cross-correlation coefficient is found as we move further from the flood season, and negative correlation in the Po River appears from May (for NQ_p) to June (for NQ_mf) backwards. These negative correlations indicate that low flows in the winter season may be related to higher flows in the summer season and therefore higher peak flows in the autumn season. The latter outcome could be explained by a higher storage during the winter months in the form of increased snowpack, which may be related to the frequency and memory properties of temperature and precipitation data.
The only anomalous correlation is found when considering Q*_m in March as the explanatory variable for both dependent variables in the Danube. This month corresponds to both the peak in the snowmelt annual cycle in the catchment (Zampieri et al., 2015) and the steepest rising slope in the hydrograph (Fig. 2). Therefore, the use of monthly mean flows might not be representative given the high variability in the daily flows during this month and the complexity of the processes that are affecting the streamflow (complex contribution from subsurface flow or from the runoff generated from snowmelt/precipitation).
An evaluation was carried out for the meta-Gaussian model by using residual plots (Montanari and Brath, 2004). Figure 4 shows the residuals for a time span of 4 months backwards from the flood season at each study site. The residuals look homoscedastic, thereby confirming that the

Flood frequency distribution updating
In order to quantify the technical benefit that can be gained by updating the flood frequency distribution through the proposed data assimilation procedure, we assumed that above-average river flows are observed in the month preceding the flood season and then applied the meta-Gaussian model to estimate the updated probability distribution. In detail, we assume that a mean monthly flow corresponding to the 70, 80 and 95 % quantiles is observed in September for the Po River and April for the Danube River.
Figures 5 and 6 show the unconditioned and updated probability density functions (pdfs) of the normalized peak flow (i.e., the peak flow transformed to the canonical Gaussian distribution). As one would expect, the results show that the higher the cross-correlation value, the lower the variability in the distribution of the normalized dependent variable and the higher the mean value. For example, in the Po River for the occurrence of the 95 % quantile of the normalized mean flow in September, the pdf is centered around a mean value of 0.4 and presents a standard deviation of 0.97 (Fig. 5). In contrast, if one attempts to estimate the probability distribution of NQ*_p conditioned to the occurrence of the 95 % quantile of the normalized mean flow in July, no noticeable change is found in the estimate with respect to the unconditioned distribution. In fact, the resulting pdf for NQ*_p is centered around a mean value of 0.09 with a standard deviation of 0.998. The same behavior is found in the probability distribution of the other dependent variable in its normalized form, NQ*_mf, where the higher correlation coefficients (Table 3) determine an even greater displacement with respect to the unconditioned distribution. In fact, the pdf of NQ*_mf conditioned to the occurrence of the 95 % quantile of the normalized mean flow in September is centered around a mean value of 0.64 and presents a standard deviation of 0.92 (Fig. 5).
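As a consistency check on the Po figures above, the following worked calculation assumes the conditional mean and standard deviation of the bivariate standard Gaussian model, namely ρ · NQ_m and sqrt(1 − ρ²), the September coefficients quoted earlier (0.24 for the peak flow and 0.39 for the mean flow), and a standard normal 95 % quantile of about 1.645:

```latex
\begin{aligned}
\mu(NQ_p^*) &= 0.24 \times 1.645 \approx 0.39, \\
\sigma(NQ_p^*) &= \sqrt{1 - 0.24^2} = \sqrt{0.9424} \approx 0.97, \\
\mu(NQ_{mf}^*) &= 0.39 \times 1.645 \approx 0.64, \\
\sigma(NQ_{mf}^*) &= \sqrt{1 - 0.39^2} = \sqrt{0.8479} \approx 0.92.
\end{aligned}
```

Under these assumptions the values agree, to rounding, with the reported means of about 0.4 and 0.64 and standard deviations of 0.97 and 0.92.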
In the upper Danube a similar pattern is found, with the mean of the probability distribution of NQ*_p and NQ*_mf conditioned to the occurrence of the 95 % quantile of the normalized mean flow in April displaced to 0.32 and 0.82, respectively (Fig. 6 and Table 3).
Figure 7 shows the comparison between the unconditioned flood frequency distribution and the updated distributions in the untransformed domain when the flow in the previous month (September for the Po River, April for the Danube River) is higher than usual (70, 80 and 95 % quantiles). For example, in the Po River, the unconditioned flood for a return period of 200 years, which equals 12 507 m³ s⁻¹, increases up to 13 790 m³ s⁻¹ (about 10 % increase) when the mean flow in September corresponds to its 95 % quantile. Similarly, in the upper Danube the unconditioned peak flow for a return period of 200 years, 10 075 m³ s⁻¹, increases up to 10 861 m³ s⁻¹ (about 8 % increase) when the mean flow in April corresponds to its 95 % quantile. The differences show that the average flow during the pre-flood seasons may indeed provide useful indications to update the flood frequency distribution.
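The quoted percentage increases follow directly from the 200-year peak flows reported above, as this short worked calculation shows:

```latex
\begin{aligned}
\text{Po:}\quad & \frac{13\,790 - 12\,507}{12\,507} = \frac{1283}{12\,507} \approx 0.103, \\
\text{Danube:}\quad & \frac{10\,861 - 10\,075}{10\,075} = \frac{786}{10\,075} \approx 0.078.
\end{aligned}
```

That is, relative increases of roughly 10 % and 8 %, respectively.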
After removing from the analysis the observations of the years 1977 for the Po River and 1944 for the Danube River, which are the years with the wettest pre-flood season on record, an emulation of a 1-month-ahead real-time prediction of the probability distribution of the flood flows in the next flood season was developed, along with uncertainty estimation as described in Sect. 3.2. Figure 8 shows, for both the Po and Danube rivers, the unconditioned flood frequency distribution along with the updated one; 95 % confidence bands for the latter are also shown. It can be seen that the proposed procedure allows one to obtain an effective update in real-world applications.

Conclusions
The analysis of the observed mean daily flow values suggests the existence of LTP at both study sites, with H values above 0.71. Such persistence is exploited to improve streamflow forecasting in the flood season by using the mean monthly flow of the pre-flood seasons. To this end, we automatically detect the flood season through directional statistics and we fit a bivariate Gaussian distribution function to model the above dependence. Increases of 10 and 8 % in the 200-year return period peak flows are found in the Po and Danube, respectively, when the average flow during the previous month corresponds to its 95 % quantile. The above results show that the meta-Gaussian model applied to the streamflow records can be used to update the flood frequency distribution estimated for a given river a season in advance, through a data assimilation approach that uses the mean monthly flow of the pre-flood seasons.
The methodology herein proposed can be applied to any other study site once the parameters of the meta-Gaussian model confirm the presence of the above stochastic dependence. As in any time series analysis method, records that encompass a wide range of meteorological and hydrological conditions should be used to minimize uncertainty, which is in this case related to the estimation of the correlation coefficient and standardization of the regression variables. Finally, other explanatory variables (e.g., rainfall, snowmelt) can be incorporated to profit from additional stochastic dependence among peak flows and the state of the catchment and external forcings.
The findings presented in this paper highlight the fact that river memory has an impact on flood formation and should therefore be properly considered for real-time management of flood risk mitigation and resilience of societal settings to floods. The procedure herein described can provide useful information in those cases where the memory of the catchment is supposed to persist for a long time. These conditions may occur when the precipitation-runoff transformation is characterized by a slow development. Memory is frequently found to be related to the storage capacity of the catchment and the complexity of the river network. Therefore, these two characteristics may be indicators of potentially useful results from the proposed approach.

Figure 1. Study sites: Danube River basin at Bratislava and Po River basin at Pontelagoscuro.

Figure 2. Daily mean value µ Q (m³ s⁻¹) and daily standard deviation σ Q (m³ s⁻¹) of the daily flows in the observation periods: 1920-2009 in the Po at Pontelagoscuro, 1901-2007 in the Danube at Bratislava.

Figure 3. Seasonality space representation of the annual maximum daily flows (AMD) and high-flow events. Dots around the global value indicate the dispersion.

Figure 4. Residual plot of the linear regression of NQ m on NQ p and NQ mf in the Po River (a) and upper Danube (b).

Figure 5. Probability distribution functions of the normalized dependent variables (NQ p and NQ mf ) conditioned to the occurrence of the 70th, 80th and 95th percentiles of the normalized variables in the pre-flood season in the Po River.

Figure 6. Probability distribution functions of the normalized dependent variables (NQ p and NQ mf ) conditioned to the occurrence of the 70th, 80th and 95th percentiles of the normalized variables in the pre-flood season in the upper Danube.

Figure 7. Peak flows in the flood season (October-November in the Po, May-July in the upper Danube) vs. the return period modeled through the EV1 distribution function. Quantiles refer to mean flows higher than usual in the previous month.

Figure 8. Leave-one-out cross validation. Unconditioned EV1 probability distribution of peak flows for the year with the wettest pre-flood season (1977 in the Po, 1944 in the upper Danube) along with conditioned distributions with related 95 % confidence bands.

Table 1. Data description of observed time series. Descriptive statistics are given for non-deseasonalized data.
...lize hydrological time series (Montanari, 2005). Being free of any distributional assumption, the NQT allows one to avoid the selection of a suitable parametric model for the distribution of the considered hydrological variable. The NQT involves the following steps when we take Q * m as an example: (1) sorting the sample of Q m (t) from the smallest to the largest observation, Q m 1 , ..., Q m n ; (2) estimating the cumulative frequency FQ m i by using the Weibull plotting position (Stedinger et al., 1993); (3) for each FQ m i the standard normal quantile NQ m i is computed as NQ ...; Bogner et ...

Hydrol. Earth Syst. Sci., 21, 3687-3700, 2017, www.hydrol-earth-syst-sci.net/21/3687/2017/
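The three NQT steps can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name is hypothetical, ties in the sample are not treated specially, and the quantile in step (3) is taken as the inverse of the standard normal cumulative distribution function evaluated at the cumulative frequency, which is the usual definition and an assumption of this sketch.

```python
from statistics import NormalDist


def normal_quantile_transform(sample):
    """Map each observation to a standard normal quantile (NQT)."""
    n = len(sample)
    std_normal = NormalDist()
    # Step 1: indices of the observations, smallest to largest
    order = sorted(range(n), key=lambda i: sample[i])
    transformed = [0.0] * n
    for rank, index in enumerate(order, start=1):
        # Step 2: Weibull plotting position i / (n + 1)
        frequency = rank / (n + 1)
        # Step 3: standard normal quantile of that frequency (assumed form)
        transformed[index] = std_normal.inv_cdf(frequency)
    return transformed


# Made-up flows, used only to show the call
flows = [3.0, 1.0, 2.0]
normalized = normal_quantile_transform(flows)
print(normalized)
```

The output keeps the original order of the observations, so that the normalized values can be paired with those of another variable from the same years.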

Table 2. Estimated H values on deseasonalized data series applying the R/S statistic (R/S), aggregated variance method (AV), and differenced variance method (DV).