Technical note: Design flood under hydrological uncertainty

Abstract. Planning and verification of hydraulic infrastructures require a design estimate of hydrologic variables, usually provided by frequency analysis, and neglecting hydrologic uncertainty. However, when hydrologic uncertainty is accounted for, the design flood value for a specific return period is no longer a unique value, but is represented by a distribution of values. As a consequence, the design flood is no longer univocally defined, making the design process undetermined.
The Uncertainty Compliant Design Flood Estimation (UNCODE) procedure is a novel approach that, starting from a range of possible design flood estimates obtained in uncertain conditions, converges to a single design value. This is obtained through a cost-benefit criterion with additional constraints that is numerically solved in a simulation framework. This paper contributes to promoting a practical use of the UNCODE procedure without resorting to numerical computation. A modified procedure is proposed by using a correction coefficient that modifies the standard (i.e., uncertainty-free) design value on the basis of sample length and return period only. The procedure is robust and parsimonious, as it does not require additional parameters with respect to the traditional uncertainty-free analysis.
Simple equations to compute the correction term are provided for a number of probability distributions commonly used to represent the flood frequency curve. The UNCODE procedure, when coupled with this simple correction factor, provides a robust way to manage the hydrologic uncertainty and to go beyond the use of traditional safety factors. With all the other parameters being equal, an increase in the sample length reduces the correction factor, and thus the construction costs, while still keeping the same safety level.

Introduction
The flood frequency curve is commonly used to derive the design flood as the quantile $Q_T$ corresponding to a fixed return period T. For practical reasons, $Q_T$ is commonly expressed as a single value; however, this is justified only if the frequency distribution and its parameters are known perfectly. In practice, one can only estimate the frequency distribution and its parameters from a sample of observed data, which makes the estimate of $Q_T$ uncertain. However, the design of a hydraulic infrastructure requires a single design value to be selected. A gap therefore exists between theory and practice.
Quantitative methods to measure the uncertainty associated with the quantiles of the flood frequency curve (e.g., through their variance or probability distribution) have been proposed (e.g., Cameron et al., 2000; De Michele and Rosso, 2001; Brath et al., 2006; Blazkova and Beven, 2009; Laio et al., 2011; Liang et al., 2012; Viglione et al., 2013), but very few suggestions are provided about how to extract a single design value from the probability distribution of possible design values. Botto et al. (2014), with the development of the Uncertainty Compliant Design Flood Estimation (UNCODE) procedure, have shown that it is possible to select meaningful flood quantiles from their distribution by considering an additional constraint based on a cost-benefit criterion. Hence, the output is a unique design flood value $Q_T^*$.
Before illustrating the UNCODE approach, it is worth recalling the working principles of the cost-benefit analysis, which is a core element of the procedure. Cost-benefit analysis can be used to estimate the design flood as the flow value which minimizes the total expected cost function, defined as the sum of the actual cost to build a flood protection infrastructure (cost function) and the expected damages caused by a flood event. An illustrative example of this approach is reported in Fig. 1a.
The cost function is rather easy to understand, being an increasing function of the design flood. Instead, the expected damage function needs to be computed point by point: for any single tentative design flood value (see the inset in Fig. 1a) it equals the integral of the product of the probability density function (pdf) of the flood flow values and a specific damage function. The latter indicates the damage occurring when the flood exceeds the flow value used to design the infrastructure. The damage function depends on a number of parameters such as the exposure and vulnerability of the flooded goods, the flooding dynamics and the topography, to mention a few. For these reasons the damage function turns out to be very site-specific and often unavailable, due to the lack of information needed to compute it (Menoni et al., 2016); in these cases the cost-benefit method is inapplicable.
To face this problem Botto et al. (2014) made the assumption that costs and damages can be represented by linear functions, with slope c and d, respectively, as illustrated in Fig. 1b. Given this assumption, the total cost, $C_{\mathrm{TOT}}$, can be computed as
$$C_{\mathrm{TOT}} = c\,Q^* + \int_{Q^*}^{\infty} d\,(Q - Q^*)\,p(Q \mid \Theta)\,\mathrm{d}Q, \qquad (1)$$
where $Q^*$ is the generic design flood value and $p(Q \mid \Theta)$ is the probability density function of the flood flow with parameters $\Theta$. The optimal design flood of the (uncertainty-free) cost-benefit framework can then be calculated as the value that minimizes Eq. (1). Examples of cost-benefit analysis in the hydrologic/hydraulic context can be found in the literature (Bao et al., 1987; Ganoulis, 2003; Jonkman et al., 2004; Tung, 2005), with only a few of them accounting for uncertainty (Al-Futaisi and Stedinger, 1999; Su and Tung, 2013). Botto et al. (2014) further demonstrated that the optimal design flood obtained from the cost-benefit analysis with linear cost and damage functions is equivalent to the design flood $Q_T$ obtained from the standard frequency analysis, provided that uncertainty is not accounted for and the ratio between d and c equals the return period T. This result can be shown by setting to 0 the derivative of $C_{\mathrm{TOT}}$ with respect to $Q^*$, in order to find the minimum of Eq. (1); this leads to the equivalence
$$\frac{d}{c} = \frac{1}{1 - P(Q^* \mid \Theta)} = T, \qquad (2)$$
where P(·) is the cumulative distribution function of the flood values and T is the return period. This is valid provided that the probability distribution used in the cost-benefit framework is the same as that used in the standard frequency analysis.
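The equivalence between the cost-benefit optimum and the T-year quantile can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes a Gumbel distribution (chosen only because its cdf and quantile have closed forms; it is not one of the distributions considered in this note), hypothetical parameter values, and a hand-written golden-section search. It minimizes the total expected cost with d/c = T and compares the minimizer with the quantile of the same distribution.

```python
import math

# Hypothetical Gumbel location and scale (m^3/s); illustrative values only.
MU, BETA = 100.0, 30.0

def gumbel_cdf(q, mu=MU, beta=BETA):
    """Non-exceedance probability P(Q <= q)."""
    return math.exp(-math.exp(-(q - mu) / beta))

def gumbel_quantile(T, mu=MU, beta=BETA):
    """Standard (uncertainty-free) design flood for return period T."""
    return mu - beta * math.log(-math.log(1.0 - 1.0 / T))

def total_cost(q_star, c, d, upper=MU + 40.0 * BETA, steps=4000):
    """Linear construction cost plus expected damage for design value q_star.

    Integration by parts gives
    int_{q*}^{inf} (Q - q*) p(Q) dQ = int_{q*}^{inf} [1 - P(Q)] dQ,
    evaluated here with the trapezoidal rule up to a large finite bound.
    """
    h = (upper - q_star) / steps
    total = 0.5 * ((1.0 - gumbel_cdf(q_star)) + (1.0 - gumbel_cdf(upper)))
    for i in range(1, steps):
        total += 1.0 - gumbel_cdf(q_star + i * h)
    return c * q_star + d * total * h

def golden_section_min(f, lo, hi, tol=1e-3):
    """Minimize a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    x1 = b - inv_phi * (b - a)
    x2 = a + inv_phi * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:
            b, x2, f2 = x2, x1, f1
            x1 = b - inv_phi * (b - a)
            f1 = f(x1)
        else:
            a, x1, f1 = x1, x2, f2
            x2 = a + inv_phi * (b - a)
            f2 = f(x2)
    return 0.5 * (a + b)

T = 100.0
c = 1.0        # arbitrary cost slope; only the ratio d/c matters for the minimum
d = c * T      # damage slope fixed by the constraint d/c = T
q_opt = golden_section_min(lambda q: total_cost(q, c, d), MU, MU + 20.0 * BETA)
q_T = gumbel_quantile(T)
print(f"cost-benefit optimum: {q_opt:.2f} m^3/s, quantile Q_T: {q_T:.2f} m^3/s")
```

Under these assumptions the two printed values should agree to within the numerical tolerance of the integration and of the search, which is the content of the equivalence.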
The UNCODE approach is founded on the joint use of the cost-benefit approach of Eq. (1) and the constraint derived in Eq. (2). The rationale behind this approach is that it is possible to apply the cost-benefit framework with standard, but meaningful, cost and damage functions. This is particularly convenient because the cost-benefit framework can be easily extended to include the estimation uncertainty inherent in the limited sample length of hydrological records. Consequently, the UNCODE framework (which is a particular case of cost-benefit analysis) can also be extended to account for this kind of uncertainty. In uncertain conditions, the parameters of the flood frequency distribution, $\Theta$, become a random vector; hence, the uncertainty can be included in the cost-benefit analysis by compounding $C_{\mathrm{TOT}}$ over all the possible values of $\Theta$. In mathematical terms, the cost-benefit framework with uncertainty is summarized by the equation
$$Q_T^* = \arg\min_{Q^*} \int_{\Theta} \left[ c\,Q^* + \int_{Q^*}^{\infty} d\,(Q - Q^*)\,p(Q \mid \Theta)\,\mathrm{d}Q \right] h(\Theta)\,\mathrm{d}\Theta, \qquad (3)$$
where $h(\Theta)$ is the joint pdf of the parameters of the flood frequency curve. Equation (3) represents the full UNCODE model, which adopts linear cost and damage functions and accounts for uncertainty in a cost-benefit framework. It is worth noting that, as a consequence of the inherent equivalence of Eq. (2), there are no additional parameters in the cost-benefit framework; in fact, c and d are related through the known value of the return period T. The remaining free parameter can be shown to affect only the magnitude of the integral in Eq. (3) but not the position of its minimum, thus avoiding the need for further parameters in the UNCODE framework with respect to the standard design flood procedure.
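The effect of parameter uncertainty on the optimum can be illustrated with a small Monte Carlo sketch. It is not the simulation framework used by the authors, and several choices are assumptions made here for illustration: a Gumbel distribution with hypothetical parameters, a method-of-moments fit, and a parametric bootstrap as a stand-in for the joint pdf of the parameters. Instead of minimizing the averaged cost directly, the sketch uses its first-order condition: differentiating the averaged cost with respect to the design value and setting d/c = T gives that the mean exceedance probability over the parameter sets must equal 1/T, which is solved by bisection.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329

def gumbel_exceedance(q, mu, beta):
    """Exceedance probability 1 - P(q | mu, beta)."""
    return 1.0 - math.exp(-math.exp(-(q - mu) / beta))

def gumbel_quantile(T, mu, beta):
    """Standard design flood for return period T."""
    return mu - beta * math.log(-math.log(1.0 - 1.0 / T))

def sample_gumbel(rng, n, mu, beta):
    """Draw n Gumbel variates by inverting the cdf."""
    out = []
    for _ in range(n):
        u = max(rng.random(), 1e-300)  # guard against log(0)
        out.append(mu - beta * math.log(-math.log(u)))
    return out

def fit_gumbel_moments(data):
    """Method-of-moments estimates of (mu, beta)."""
    beta = statistics.stdev(data) * math.sqrt(6.0) / math.pi
    mu = statistics.mean(data) - EULER_GAMMA * beta
    return mu, beta

def parameter_sets(rng, n, mu, beta, n_sets=2000):
    """Parametric bootstrap used here as a stand-in for h(Theta)."""
    return [fit_gumbel_moments(sample_gumbel(rng, n, mu, beta))
            for _ in range(n_sets)]

def uncode_design_flood(T, thetas, lo, hi, iters=60):
    """Solve mean_k[1 - P(q | Theta_k)] = 1/T for q by bisection."""
    target = 1.0 / T
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        mean_exc = sum(gumbel_exceedance(mid, m, b) for m, b in thetas) / len(thetas)
        if mean_exc > target:
            lo = mid   # still too likely to be exceeded: move up
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = random.Random(42)
MU, BETA = 100.0, 30.0   # hypothetical fitted parameters (m^3/s)
N, T = 20, 500.0         # hypothetical sample length and return period

thetas = parameter_sets(rng, N, MU, BETA)
q_T = gumbel_quantile(T, MU, BETA)
q_uncode = uncode_design_flood(T, thetas, lo=MU, hi=MU + 50.0 * BETA)
y = (q_uncode - q_T) / q_T
print(f"Q_T = {q_T:.1f}, uncertainty-compliant value = {q_uncode:.1f}, y = {y:.3f}")
```

With these settings the uncertainty-compliant value is expected to exceed the standard quantile, in line with the behavior reported by Botto et al. (2014); the size of the difference depends on the assumed bootstrap scheme and should not be read as a result of this note.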
To simplify the UNCODE application, which requires the numerical computation of $Q_T^*$, we provide here an approximated yet reliable method to estimate $Q_T^*$ starting from $Q_T$. Besides being a useful practical tool for design purposes, the analysis reported in this note also provides a method to quantify the "value" of newly available hydrological information or the effect of data scarcity on $Q_T^*$ due to uncertainty.

Practical estimation of the UNCODE design flood
The UNCODE design flood, $Q_T^*$, is systematically larger than its corresponding standard value $Q_T$, as shown by Botto et al. (2014). Moreover, the relative difference between the two values,
$$y = \frac{Q_T^* - Q_T}{Q_T}, \qquad (4)$$
has been reported to increase with the return period (as the quantile uncertainty increases) as well as, for fixed T, with the standard deviation of the probability distribution of $Q_T$ (i.e., with the uncertainty of $Q_T$). We propose calculating the approximated estimate of the UNCODE design flood, hereafter referred to as $\hat{Q}_T^*$, directly by inversion of Eq. (4), without resorting to the numerical solution of Eq. (3). This solution reads as
$$\hat{Q}_T^* = Q_T\,(1 + \hat{y}), \qquad (5)$$
where the correction factor $\hat{y}$ (i.e., the approximated estimator of y) needs to be computed separately. Given this background, we propose modeling $\hat{y}$ as a function of n and T with three coefficients $a_0$, $a_1$ and $a_2$ (Eq. 6), where T is the return period and n is the sample length, which can be considered as a proxy of the standard deviation of $Q_T$; n can be computed from at-site records or as an equivalent sample length from the regional estimate of $Q_T$. The coefficients $a_0$, $a_1$ and $a_2$ depend on the probability distribution adopted in the frequency analysis. They have been evaluated from an extensive simulation study in which the full UNCODE procedure has been systematically applied to many simulated records, created by combining the following criteria.
1. The parent distribution P is selected among the most common distributions used in flood frequency: lognormal (LN3), generalized extreme value (GEV), generalized logistic (GLO), Pearson type III (PE3) and log Pearson type III (LP3). For details on the probability distribution equation and on the relationship between parameters and L-moments, the reader is referred to Hosking and Wallis (1997). The LP3 corresponds to the PE3 with log-transformed values.
We generated 100 records for each combination of P and n. Looking at the properties of the L-moments, 90 % of the synthetic records fall within the ranges 0.28 ≤ L-CV ≤ 0.40, 0.14 ≤ L-skewness ≤ 0.40 and 0.07 ≤ L-kurtosis ≤ 0.32, which correspond well to values typically encountered in real-world applications. The standard design flood $Q_T$ as well as the (exact) UNCODE estimator $Q_T^*$ have been computed for each record of the simulated dataset. This step has been performed by adopting a suitable fitting distribution F for the whole synthetic dataset. To make the results more general, F has been selected from the list LN3, GEV, GLO, PE3, LP3. Note that any F is used to fit records from any parent P, as in real cases the exact parent distribution is not known a priori. In this way, the error due to the misspecification of the fitting distribution is included in the results. The correction factor y (Eq. 4) has been computed for all the available records in the simulated dataset and for different return periods T (50, 100, 200, 500 and 1000 years). It depends on the fitting distribution F adopted in the frequency analysis. Finally, the exact y values have been regressed against n and T to obtain their estimate $\hat{y}$ (using an ordinary least squares linear regression on the log-transformed terms of Eq. 6). Different forms of Eq. (6) have also been tested, but are not reported as they provide less accurate results.
Table 1. Coefficients to be used to estimate $\hat{y}$ based on the sample length n and the return period T (Eq. 6) and corresponding regression diagnostics, for different three-parameter fitting distributions (LN3: log-normal; GEV: generalized extreme value; GLO: generalized logistic; PE3: Pearson type III; LP3: log Pearson type III). The LP3 corresponds to the PE3 with log-transformed variate.
Coefficients $a_0$, $a_1$ and $a_2$ are reported in Table 1 for different fitting distributions commonly used in hydrological practice to compute the design flood (in fact, the fitting distribution is always known, while the parent is not). It can be noticed that, when increasing the sample length n, the difference between $Q_T^*$ and $Q_T$ is reduced, due to the negative value of the coefficient $a_1$. Table 1 also reports some diagnostics of the regressions used to estimate the coefficients. The global performance of the regressions has been evaluated using the coefficient of determination and residuals analysis (through the mean absolute error, MAE, and root mean squared error, RMSE) for each fitting distribution. The value of the coefficient of determination ranges from 0.96 for the PE3 and 0.94 for the LN3 to 0.85 for the GEV and GLO. The MAE and the RMSE take values around 0.02, corresponding to a 2 % variation in the design flood estimation, which is negligible in many situations. In general, the PE3 probability distribution results in the best performance in terms of residuals analysis and $R^2_{\mathrm{adj}}$, as can be appreciated by looking at the results reported in Table 1.
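The regression step can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of the authors' fit: it assumes that the log-transformed model is ln(y) = a0 + a1 ln(n) + a2 ln(T) (the exact form of Eq. 6 is an assumption here), and it uses made-up coefficients to generate noise-free synthetic data. None of the numbers below are taken from Table 1. The ordinary least squares solution is obtained from the normal equations with a small hand-written linear solver.

```python
import math

def solve_linear_system(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_log_linear(samples):
    """OLS fit of ln(y) = a0 + a1*ln(n) + a2*ln(T) via the normal equations.

    samples: iterable of (n, T, y) with y > 0.
    """
    X = [[1.0, math.log(n), math.log(T)] for n, T, _ in samples]
    z = [math.log(y) for _, _, y in samples]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    Xtz = [sum(row[i] * zi for row, zi in zip(X, z)) for i in range(3)]
    return solve_linear_system(XtX, Xtz)

# Made-up coefficients, used only to generate synthetic (n, T, y) triples.
TRUE_A0, TRUE_A1, TRUE_A2 = -1.0, -0.8, 0.4
samples = [
    (n, T, math.exp(TRUE_A0 + TRUE_A1 * math.log(n) + TRUE_A2 * math.log(T)))
    for n in (30, 50, 70, 100)
    for T in (50, 100, 200, 500, 1000)
]

a0, a1, a2 = fit_log_linear(samples)
print(f"a0 = {a0:.4f}, a1 = {a1:.4f}, a2 = {a2:.4f}")
```

Because the synthetic data contain no noise, the fit should recover the generating coefficients; with real simulated records the residuals would feed the diagnostics (coefficient of determination, MAE, RMSE) discussed above. A negative a1 in this form corresponds to a correction factor that decreases as the sample length grows.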
The reliability of the approximated correction factor $\hat{y}$ estimated with the regression model has also been evaluated by comparing the $\hat{Q}_T^*$ value obtained through Eqs. (5) and (6) with the $Q_T^*$ value obtained with the full UNCODE procedure (Eq. 3). As a reference, time series listed in Botto et al. (2014, Table 1) with at least 30 years of record length have been analyzed, assuming the LN3 and the GEV as possible fitting distributions and different return periods. Results show a very good agreement between the exact ($Q_T^*$) and approximated ($\hat{Q}_T^*$) UNCODE design flood values, as reported in Fig. 2, where each panel shows the estimates for all series and all the return periods.
Figure 2. Exact and approximated UNCODE design flood values for the time series listed in Botto et al. (2014, Table 1) with at least 30 years of data. Different return periods are listed in the legend. The reference distribution used for this flood frequency analysis is the three-parameter log-normal (LN3) in (a) and the generalized extreme value (GEV) in (b).
A synthesis of the obtained results is shown in Fig. 3, where the values of $\hat{y}$ have been reported for the studied distributions, based on a set of typical sample length and return period values. As mentioned, a direct comparison of the results between different distributions is not possible, but it is relevant to observe that for all the distributions $\hat{y}$ evolves in the same way for varying n and T values. In general, the correction factor does not exceed 10 % of the standard value $Q_T$ for intermediate return periods (e.g., T = 200 years) even for small samples, although a significant variability is associated with the distribution type. It is around 10 % for T = 500 years with sample length values (n = 50) commonly available at many gauged stations. On the other hand, the sample length plays an important role: for example, considering T = 500 years and the GEV distribution, the reduction of the y value is about 0.075 between n = 30 and n = 50, and about 0.040 between n = 50 and n = 70.
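As a worked illustration of how the correction factor is applied, the sketch below takes Eq. (4) as the relative difference between the UNCODE and the standard design flood and Eq. (5) as its inversion. The design flood value is hypothetical, and the correction factor of 10 % is only the order of magnitude quoted above for T = 500 years and n = 50, not a value read from Table 1.

```python
def correction_factor(q_uncode, q_standard):
    """Relative difference y between the UNCODE and the standard design flood."""
    return (q_uncode - q_standard) / q_standard

def approximate_uncode_flood(q_standard, y_hat):
    """Approximated UNCODE design flood obtained by inverting the definition of y."""
    return q_standard * (1.0 + y_hat)

# Hypothetical standard design flood (m^3/s) and illustrative correction factor.
q_T = 800.0
y_hat = 0.10

q_hat = approximate_uncode_flood(q_T, y_hat)
print(f"Q_T = {q_T:.0f} m^3/s -> approximated UNCODE value = {q_hat:.0f} m^3/s")
```

In this hypothetical case the design value rises from 800 to about 880 m^3/s; a longer record would lower the correction factor and hence the increment, at the same return period.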

Discussion of the application conditions
The UNCODE approach to flood frequency analysis provides a solution to quantify the design flood estimate when considering the uncertainty of the distribution quantile; however, application of the full UNCODE procedure may be cumbersome and computationally demanding for the practitioner. An approximate but reliable framework has been proposed here to allow easy computation of the UNCODE design flood value from the standard value using a correction factor, $\hat{y}$. The extensive simulation analysis at the base of this study shows that the coefficients relating the UNCODE value $\hat{Q}_T^*$ to the traditionally computed value $Q_T$ are distribution-dependent. For the most used distributions in flood frequency analysis, they have been computed and provided. The choice of the distribution and the quantification of its associated uncertainty is a problem of model selection; hence, it cannot be solved by the UNCODE procedure, but depends on the methods of standard flood frequency analysis.
The obtained results demonstrate that an increase in the length of relatively short samples has a noticeable impact in terms of reduction of $\hat{y}$, which results in a reduction of the UNCODE estimate $\hat{Q}_T^*$. This implies that, while the infrastructure keeps the same safety level (or, equivalently, is designed with the same return period), and with all other parameters being equal, additional data reduce uncertainty and consequently the construction costs. The UNCODE design value is indeed reduced with respect to the UNCODE estimate computed with less data. Consequently, the coefficient $\hat{y}$ can be considered a measure of the value of data. The mentioned results agree with findings recently obtained by Ganora and Laio (2016) in a study on the relative role of regional and at-site flood frequency modeling approaches, where the value of at-site data has been highlighted and regarded as a reliable way to improve regional predictions, even with short records. Under this perspective, the correction factor can be used as a metric for uncertainty comparison and quantification, thus providing a further tool to combine different modeling approaches, similarly to the applications of Kjeldsen and Jones (2007) and Ganora et al. (2013), who, with different methodologies, have exploited measures of hydrologic uncertainty to merge regional and at-site information. Finally, the correction factor is a new and easy-to-implement design tool which provides a quantitative way to determine the design flood value accounting for hydrologic uncertainty while keeping the same design hazard level considered in standard uncertainty-free analyses. This is a novel approach when compared to common engineering practice, which accounts for hydrologic uncertainty by considering, for instance, the hydraulic freeboard.
The use of the freeboard is equivalent to increasing the design flood value, but without accounting for the size of the system (e.g., the basin area), or for the hydrologic information available at the section (i.e., the observed or equivalent record length used to compute the standard design flood); therefore, this approach is not tailored to the specific case study. The correction factor represents an advance with respect to the use of "all-encompassing" safety factors and towards a clearer way to manage the different sources of uncertainty in hydrological and hydraulic design.
Data availability. The work is based on simulated data. The results can be reproduced by randomly generating datasets as described in the text of this paper.
Competing interests. The authors declare that they have no conflict of interest.