In recent years, multivariate copula functions have been used to model, probabilistically, the most important variables of flood events: discharge peak, flood volume and duration. However, in most cases, the sampling uncertainty from which small samples suffer is neglected. In this paper, considering a real reservoir controlled by a dam as a case study, we apply a structure-based approach to estimate the probability of reaching specific reservoir levels, taking into account the key components of an event (flood peak, volume, hydrograph shape) and of the reservoir (rating curve, volume–water depth relation). Additionally, we improve the information about the peaks with historical data and reports through a Bayesian framework, which allows the incorporation of supplementary knowledge from different sources and its associated error. As shown here, the extra information can result in a very different inferred parameter set, which is reflected in a strong variability of the reservoir level associated with a given return period. Most importantly, the sampling uncertainty is accounted for in both cases (single-site and multi-site with historical information scenarios), and Monte Carlo confidence intervals for the maximum water level are calculated. It is shown that the water levels of specific return periods overlap in many cases, which makes risk assessment without confidence intervals misleading.

In the relatively recent literature, there is a wide application of the copula
functions to model the natural variability of hydrometeorological variables,
ranging from rainfall

An important application of this multivariate analysis is the determination
of the risk of failure of a hydraulic structure.

In the same conceptual framework,

Copulas are functions that link marginal distributions to the joint cumulative distribution; therefore, the latter is only indirectly affected by the choice of the marginals. So, the practical problem of identifying and estimating the joint distribution is handled through two separate aspects: the dependence structure of the set of variables and the marginal distributions.

In the majority of the studies, the communication of the sampling
uncertainty – an integral component in a univariate framework – is overlooked
in a multivariate case.

In order to account for the sampling uncertainty of multivariate cases, where
a variable of interest can be expressed as a function of one or more
variables,

The sampling uncertainty in a joint peak-volume event was quantified by

The data expansion can be temporal, spatial and causal

In this research, we validate a methodology of flood risk assessment in a real case study, where risk is expressed in terms of the probability of exceeding a given reservoir level in an online flood mitigation dam. We consider this level as a function of flood peak, volume and hydrograph shape and, consequently, multivariate modelling is implemented with the use of copulas. The characteristics of the reservoir – also a function of the level – are synthesised in the rating curve and the volume-level curve. The main scope is to integrate the associated sampling uncertainty and to build confidence intervals for each water level through Monte Carlo simulations. Furthermore, we incorporate additional information on the peaks in a Bayesian framework and examine its effect on the distributions and their confidence intervals, as well as on those of the reservoir-level frequency curve.

Panaro study watershed.

We focus on the Panaro catchment – an important tributary
of the Po river in northern Italy. In particular, the watershed under
investigation is composed of the Panaro river itself, the Scoltenna and the
Leo tributaries with an outlet upstream of the Panaro dam (Fig.

The influence of snowfall is negligible due to the modest land elevation and
the majority of rainfall events occurring seasonally (September–April). The
average precipitation height ranges between 700 and 2000 mm yr⁻¹

The basin's permeability is low, and therefore the runoff is influenced little by water infiltration. In fact, the study basin consists mostly of sandstones and silicatic alternating sequences (44 % of total area) and marls and clay (34 %).

The Panaro dam is a concrete gravity dam (150 m in length), located
near the city of Modena and constructed for flood mitigation purposes. The
hydraulic system consists of two reservoirs, a principal one on the river course
and a secondary one on the right river bank, and a series of levees that enclose
them. The crest of the principal levees is at 44.85 m a.s.l. The
reservoirs can hold in total 23.66 hm³

The available flood data included a 52-year discharge series (1936–1943, 1945
and 1946 were missing) with an hourly time interval from the Bomporto station
located downstream near the current location of the dam. The hydrological
characteristics of the study basin are briefly presented in Table

Additional data included the annual peaks of the missing years from the same
station

In a report about natural disaster risks in the city of Modena

In order to rescale the flood information from subcatchments and from the
downstream station, depending on the area, the following scale function was used:

Bayes' theorem expresses how an individual's degree of belief can
change in the light of new evidence. Bayes' theorem can be formulated as
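In generic notation (the symbols below are assumptions, since the equation itself is not reproduced in this text), with $\theta$ denoting the model parameters and $D$ the observed data, the theorem reads:

```latex
\[
  p(\theta \mid D)
  = \frac{p(D \mid \theta)\, p(\theta)}{\int p(D \mid \theta)\, p(\theta)\, \mathrm{d}\theta}
  \;\propto\; p(D \mid \theta)\, p(\theta)
\]
```

Here $p(\theta)$ is the prior, $p(D \mid \theta)$ the likelihood and $p(\theta \mid D)$ the posterior; the denominator is a normalising constant that does not depend on $\theta$.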

Main hydrological characteristics of the Panaro watershed (area

In a Bayesian framework, the model's parameters are handled as stochastic
variables in order to incorporate the uncertainty of their values

The Bayesian inference was conducted in R with the package LaplacesDemon

Copulas are functions that describe the dependence structure between
variables independently of the choice of marginal distributions. The joint
distribution functions and the marginals are linked by Sklar's theorem

Copulas provide a powerful tool for the statistical modelling of multivariate
data: for a theoretical introduction, see

The application of copula functions has helped to overcome some
limitations of traditional multivariate distributions, such as the requirement
that the marginals belong to the same distribution family and the fact that
their parameters may define the dependence structure between the variables
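As an illustration of how a copula joins marginals from different families, the following minimal sketch builds a joint distribution of peak and volume as H(q, v) = C(F_Q(q), F_V(v)). The inverse Gaussian and Rayleigh marginals follow the choices reported in this paper; the Gumbel-Hougaard copula, the function names and all parameter values are assumptions made only for the example.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def invgauss_cdf(x, mu, lam):
    """Inverse Gaussian CDF with mean mu and shape lam."""
    if x <= 0.0:
        return 0.0
    a = math.sqrt(lam / x)
    return (norm_cdf(a * (x / mu - 1.0))
            + math.exp(2.0 * lam / mu) * norm_cdf(-a * (x / mu + 1.0)))

def rayleigh_cdf(x, sigma):
    """Rayleigh CDF with scale sigma."""
    if x <= 0.0:
        return 0.0
    return 1.0 - math.exp(-x * x / (2.0 * sigma * sigma))

def gumbel_copula(u, v, theta):
    """Gumbel-Hougaard copula (assumed here); theta = 1 gives independence."""
    if theta < 1.0:
        raise ValueError("theta must be >= 1")
    if u <= 0.0 or v <= 0.0:
        return 0.0
    if u >= 1.0:
        return min(v, 1.0)
    if v >= 1.0:
        return u
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))

def joint_cdf(q, vol, mu, lam, sigma, theta):
    """Joint non-exceedance probability of peak q and volume vol."""
    return gumbel_copula(invgauss_cdf(q, mu, lam), rayleigh_cdf(vol, sigma), theta)

# Hypothetical parameter values, not the ones inferred in the paper.
p_joint = joint_cdf(q=400.0, vol=60.0, mu=300.0, lam=900.0, sigma=40.0, theta=2.0)
```

Changing either marginal leaves the copula function untouched, which is the separation between dependence structure and marginals that the text describes.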

The degree of relation between pairs of variables was examined by measures of
association. These include Kendall's
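Kendall's rank correlation, one such measure of association, can be sketched as follows. The function computes the tau-a variant (ties count as neither concordant nor discordant). The conversion to a Gumbel-Hougaard parameter uses the known relation tau = 1 - 1/theta for that family; applying it here is an assumption for illustration, not a statement of the estimator used in this study.

```python
def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    score = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            if prod > 0:
                score += 1      # concordant pair
            elif prod < 0:
                score -= 1      # discordant pair
    return score / (n * (n - 1) / 2)

def gumbel_theta_from_tau(tau):
    """Invert tau = 1 - 1/theta (valid for the Gumbel-Hougaard family only)."""
    if not 0.0 <= tau < 1.0:
        raise ValueError("tau must lie in [0, 1) for the Gumbel-Hougaard copula")
    return 1.0 / (1.0 - tau)
```

Because the statistic depends only on ranks, it is unaffected by the choice of marginal distributions.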

In the absence of a long sample, the copulas that fit the data can be
numerous and goodness-of-fit tests cannot distinguish between them

The existence of tail dependence between peak and volume was also implied by
some historical evidence. Many significant events in Italy occur when a
frontal perturbation, generated by the cold high masses coming from the North
Atlantic Ocean or the Arctic Ocean, meets Mediterranean southward warm
fronts. Depending on the persistence of the south and north currents, the
generated front begins to develop, covering a large area (e.g. 10

Unfortunately, tail dependence estimators such as the ones of

Regarding the choice of the marginal distributions, we followed the logic of Occam's razor and preferred the more parsimonious distributions that provided a good visual fit, thus reducing the additional statistical uncertainty introduced by an extra parameter. The differences between the corrected Akaike information criterion (AIC), Bayesian information criterion (BIC) and Akaike weights were not sufficient to make a safe distinction between the models. The peaks were modelled with the inverse Gaussian distribution (two parameters instead of the three of the generalised extreme value (GEV) distribution) and their corresponding volumes with the one-parameter Rayleigh distribution. It should be noted, however, that there was no clear indication of overall performance superiority of the chosen distributions.
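For reference, the information criteria mentioned above can be computed from a fitted model's maximised log-likelihood as in the sketch below. The function names and the example log-likelihood values are hypothetical and are not results of this study.

```python
import math

def aicc(log_lik, k, n):
    """Corrected AIC for k parameters and n observations (requires n > k + 1)."""
    aic = 2.0 * k - 2.0 * log_lik
    return aic + 2.0 * k * (k + 1.0) / (n - k - 1.0)

def bic(log_lik, k, n):
    """Bayesian information criterion."""
    return k * math.log(n) - 2.0 * log_lik

def akaike_weights(scores):
    """Relative support for each model given its AIC (or AICc) score."""
    best = min(scores)
    rel = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical log-likelihoods for a two-parameter and a three-parameter model.
n_obs = 50
scores = [aicc(-310.2, 2, n_obs), aicc(-309.6, 3, n_obs)]
weights = akaike_weights(scores)
```

When the weights are of similar magnitude, as can easily happen with short records, the criteria alone do not separate the candidates, which is the situation described in the text.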

The parameters of the inferred distributions (copula and marginals) are
presented in Table

Estimated parameters of the inferred distributions and their confidence interval (95 %).

The shape of the “design hydrograph” is often considered an important factor
in the design procedure and is related to the spatial and temporal rain
distribution as well as the basin's shape and behaviour

Consequently, the normalised peak equals 1 at time 1. All normalised
hydrographs were extended to a common duration (for comparison purposes) and
cluster analysis with the Ward method and Euclidean distances was implemented
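The normalisation and the distance used for clustering can be sketched as follows. The sketch assumes that time is divided by the time to peak and discharge by the peak discharge (consistent with a normalised peak of 1 at time 1), and that shorter hydrographs are extended by holding their last ordinate; the extension rule and all function names are assumptions made for the example.

```python
import bisect
import math

def normalise_hydrograph(times, flows):
    """Rescale a hydrograph so that its peak has ordinate 1 at abscissa 1."""
    i_peak = max(range(len(flows)), key=flows.__getitem__)
    t_peak = times[i_peak] - times[0]
    q_peak = flows[i_peak]
    if t_peak <= 0 or q_peak <= 0:
        raise ValueError("peak must be positive and occur after the first time step")
    t_norm = [(t - times[0]) / t_peak for t in times]
    q_norm = [q / q_peak for q in flows]
    return t_norm, q_norm

def resample(t_norm, q_norm, grid):
    """Linear interpolation on a common grid (last ordinate held beyond the end)."""
    out = []
    for g in grid:
        if g <= t_norm[0]:
            out.append(q_norm[0])
        elif g >= t_norm[-1]:
            out.append(q_norm[-1])   # assumed extension rule
        else:
            j = bisect.bisect_right(t_norm, g)
            w = (g - t_norm[j - 1]) / (t_norm[j] - t_norm[j - 1])
            out.append(q_norm[j - 1] + w * (q_norm[j] - q_norm[j - 1]))
    return out

def shape_distance(shape_a, shape_b):
    """Euclidean distance between two shapes sampled on the same grid."""
    return math.dist(shape_a, shape_b)
```

The pairwise distances between resampled shapes are the input that a hierarchical clustering routine with Ward linkage would operate on.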

In order to account for sampling uncertainty and to estimate the confidence
intervals, the following Monte Carlo procedure was implemented, originally
proposed by

Estimate the parameter

Simulate

Calculate the copula parameter

Simulate

Transform the samples from the unit interval to discharge and volume
using the estimated marginal parameters. Generate

Build the confidence intervals of the reservoir level frequency curves.
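The resampling idea behind these steps can be sketched for a single marginal. The example below is a deliberately simplified parametric bootstrap for a Rayleigh-distributed volume: it omits the copula, the peak marginal and the reservoir routing, so it is not the full procedure listed above. All names and numbers are hypothetical.

```python
import math
import random

def rayleigh_mle(sample):
    """Maximum likelihood estimate of the Rayleigh scale parameter."""
    return math.sqrt(sum(x * x for x in sample) / (2.0 * len(sample)))

def rayleigh_quantile(p, sigma):
    """Inverse Rayleigh CDF for a probability p in [0, 1)."""
    return sigma * math.sqrt(-2.0 * math.log(1.0 - p))

def simulate_rayleigh(n, sigma, rng):
    """Draw n values by inverse-transform sampling."""
    return [rayleigh_quantile(rng.random(), sigma) for _ in range(n)]

def bootstrap_quantile_ci(sample, return_period, n_rep=2000, level=0.95, seed=1):
    """Point estimate and percentile interval of the T-year quantile."""
    rng = random.Random(seed)
    n = len(sample)
    sigma_hat = rayleigh_mle(sample)
    p = 1.0 - 1.0 / return_period          # annual non-exceedance probability
    reps = []
    for _ in range(n_rep):
        # Re-estimate the parameter on a synthetic sample of the observed size.
        sigma_star = rayleigh_mle(simulate_rayleigh(n, sigma_hat, rng))
        reps.append(rayleigh_quantile(p, sigma_star))
    reps.sort()
    lower = reps[int((1.0 - level) / 2.0 * (n_rep - 1))]
    upper = reps[int((1.0 + level) / 2.0 * (n_rep - 1))]
    return rayleigh_quantile(p, sigma_hat), lower, upper
```

Keeping the synthetic sample size equal to the observed one is what makes the spread of the re-estimated quantiles reflect the sampling uncertainty of a short record.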

The confidence intervals of the peaks' marginal distribution parameters have been estimated in a Bayesian framework, as stated previously, in order to incorporate the additional knowledge and to account for the scaling uncertainty.

The parameter uncertainty of additional distributions that fit the data could be introduced in the procedure, leading to larger confidence intervals. However, in this case, only the parameter uncertainty of the inferred models was of interest.

Characteristic normalised hydrograph shapes

Frequency curves of maximum water level of synthetic hydrographs for four clusters and one cluster and corresponding levels of observed hydrographs.

Flood frequency curves with 95 % confidence limits for the single-site data and the multi-site data with the extra information case. Observed peaks are also plotted with the Gringorten plotting position.

Flood volume curve with 95 % confidence limits. Observed flood volumes are also plotted with the Gringorten plotting position.
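For readers reproducing such plots, the Gringorten plotting position assigns the i-th smallest of n observations the non-exceedance probability (i - 0.44) / (n + 0.12). A minimal sketch follows, with hypothetical function names and the assumption that the observations are annual maxima:

```python
def gringorten_positions(values):
    """Pair each sorted value with its Gringorten non-exceedance probability."""
    n = len(values)
    ranked = sorted(values)
    return [(x, (i - 0.44) / (n + 0.12)) for i, x in enumerate(ranked, start=1)]

def empirical_return_period(prob):
    """Return period in years for an annual non-exceedance probability."""
    return 1.0 / (1.0 - prob)
```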

Initially, we clustered the hydrograph shapes into four characteristic
groups. After simulating 10 000 peak-volume pairs from the inferred
distributions, we assigned a specific hydrograph shape to each pair
(respecting the frequency of occurrence of each shape). Then, we denormalised and routed
the hydrographs. We repeated the same procedure after clustering into
only one group, thus considering a global mean hydrograph. The
characteristic shapes are depicted in Fig.
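The assignment of shapes in proportion to their frequency of occurrence, and the subsequent denormalisation, can be sketched as follows. The time scale is chosen here so that the denormalised hydrograph reproduces both the simulated peak and the simulated volume; this rule, the cluster labels and the frequencies are assumptions made for illustration.

```python
import random

def assign_shapes(n_events, cluster_freq, rng):
    """Draw a cluster label for each event with probability equal to its frequency."""
    labels = list(cluster_freq)
    weights = [cluster_freq[label] for label in labels]
    return rng.choices(labels, weights=weights, k=n_events)

def denormalise(tau, q_norm, peak, volume):
    """Scale a dimensionless hydrograph to a given peak (m3/s) and volume (m3)."""
    # Area under the dimensionless hydrograph (trapezoidal rule).
    area = sum(0.5 * (q_norm[i] + q_norm[i + 1]) * (tau[i + 1] - tau[i])
               for i in range(len(tau) - 1))
    t_scale = volume / (peak * area)        # seconds per unit of normalised time
    return [t * t_scale for t in tau], [q * peak for q in q_norm]

# Hypothetical cluster frequencies for four characteristic shapes.
rng = random.Random(42)
labels = assign_shapes(10_000, {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}, rng)
```

The denormalised time and discharge series are what a reservoir routing model would take as inflow.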

We have implemented the Bayesian framework on the peaks extracted from the
systematic discharge series recorded at the Bomporto station, adding also the
uncertainty of the scaling exponent of the regionalisation relation. In the
second scenario, we also included recently recorded annual peaks from other
hydrometric stations of the same basin, mentioned previously, as well as
information from flood reports, while integrating the uncertainty of the
scaling exponent. As it can be seen in Fig.

The 95 % confidence interval of both peak distributions can be wide
(Fig.

Similarly, in the case of the flood volume, the 95 % confidence interval for
a univariate return period of 50 years can span from 95.5 to 120 hm³

The confidence intervals of the parameters of the inferred distributions are
presented in Table

The increased peaks are also reflected in the frequency curve
of the maximum water level (MWL). The return period here corresponds to a
water level, so it is considered structure-based, since the level is a
function of the structural and operational characteristics of the dam, among
others. As can be seen (Fig.

MWL frequency curves with 95 % confidence limits for the single-site data and the multi-site data with the extra information case.

High-density region box plots of MWL for return periods of 10, 20 and
50 years for

The 50th, 75th and 95th percentiles of the kernel density areas of MWLs with a return
period of 50 years on the discharge–volume plane for single-site data

In Fig.

The span of the highest density regions slightly decreases as more
information is introduced. However, this decrease in uncertainty seems small
and we cannot conclude whether the extra information has contributed to a
systematic uncertainty reduction. An increase in the simulation size could
lead to a slightly different picture; however, the added computational burden
is prohibitive and anyhow, as

Within a given return period, the parameter uncertainty can lead to substantial MWL variations; e.g. for 20 years (in the case of extra information), the spans of the MWL with a density of 50 and 95 % are 0.8 and 2.3 m, respectively, which correspond to huge volume differences. These can have devastating effects not only in the case of overtopping but also because the remaining water can cause bank failure due to piping. These spans could increase for larger return periods, where the uncertainty is bound to grow.
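A span of this kind can be approximated from simulated MWL values as the shortest interval containing a given fraction of the sample. The sketch below makes the simplifying assumption of a unimodal sample and does not use a kernel density estimate, so it only approximates a highest density region; the function name is hypothetical.

```python
import math

def highest_density_interval(samples, mass):
    """Shortest interval containing at least `mass` of the sorted samples."""
    if not 0.0 < mass <= 1.0:
        raise ValueError("mass must lie in (0, 1]")
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))         # number of points inside the interval
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]
```

Applied to the simulated levels of one return period with `mass` set to 0.5 and 0.95, the difference between the two returned bounds gives spans comparable to those quoted above.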

As the results of previous studies suggested

In Fig.

This analysis focuses on the uncertainty introduced when calculating the probability of exceeding specific water levels in a flood control reservoir. This uncertainty results from the parameter uncertainty of the marginals of the hydrological variables, as well as of the multivariate copula function, and is due to the small size that characterises most hydrological samples. We therefore attempted to quantify this uncertainty without focusing on the inference of the copula and the marginals. Instead, we studied the effect of additional flood information not only on the distribution parameters but also on the uncertainty range, in a Bayesian framework that, among other things, permits the consideration of errors from different sources.

The extra flood data, which included additional peaks from different hydrometric stations, led to a peak distribution with a larger mean and a smaller shape parameter and thus to elevated peaks, since the data include flood events of recent years that exceed the events of the historical data series in magnitude. Consequently, including the additional information translates into generally larger estimates of the peaks, which is also reflected in the MWLs, as the peak is a driving factor of the routing process.

The uncertainty range of discharge and volume is considerable and affects, along with the copula parameter, the MWL. The variations in the MWL for the same structure-based return period correspond to significant variations in the stored water volume. Most importantly, the return period of a specific water level cannot be determined with certainty because the return periods of the events overlap. Naturally, the range of discharge and volume values for a given structure-based return period is very wide due to the wide range of the parameters of the inferred distributions.

A clear observation of whether uncertainty is systematically reduced with the
introduction of additional information cannot be made here. Nonetheless, a
Bayesian framework allows a certain degree of transparency

As a general remark, one can deduce that the process of risk estimation is inherently crippled by uncertainty that can be quantified or at least approximated. Any attempt to obscure this uncertainty could create a false notion about its existence in a multivariate problem with eventual implications in dam safety.

The data set used in this research is available upon request from the corresponding author.

The authors declare that they have no conflict of interest.

We would like to thank F. Serinaldi and an anonymous referee for their
instructive comments and A. Bárdossy for his handling of the manuscript. We would also like to thank Stefano Orlandini for
providing us with all the information regarding the Panaro dam. For the
routing of the hydrographs, we used the model developed by

The analysis was implemented in R Statistical Computing Software