Estimation of flood warning runoff thresholds in ungauged basins with asymmetric error functions

Abstract

In many real-world flood forecasting systems, the runoff thresholds for activating warnings or mitigation measures correspond to the flow peaks with a given return period (often the 2-year one, which may be associated with the bankfull discharge). At locations where historical streamflow records are absent or very limited, the threshold can be estimated with regionally derived empirical relationships between catchment descriptors and the desired flood quantile. Whatever the functional form, such models are generally parameterised by minimising the mean square error, which assigns equal importance to overprediction and underprediction errors. Considering that the consequences of an overestimated warning threshold (leading to the risk of missed alarms) are generally far less acceptable than those of an underestimated threshold (leading to the issuance of false alarms), the present work proposes to parameterise the regression model through an asymmetric error function that penalises the overpredictions more heavily.
The estimates of models (feedforward neural networks) with increasing degrees of asymmetry are compared with those of a traditional, symmetrically trained network, in a rigorous cross-validation experiment referred to a database of catchments covering the whole of Italy. The analysis shows that the use of the asymmetric error function can substantially reduce the number and extent of overestimation errors compared to the use of the traditional square errors. Such a reduction, of course, comes at the expense of increased underestimation errors, but the overall accuracy remains acceptable, and the results illustrate the potential value of choosing an asymmetric error function when the consequences of missed alarms are more severe than those of false alarms.


Introduction
In the operation of flood forecasting systems, it is necessary to determine the values of threshold runoff that trigger the issuance of flood watches and warnings. Such critical values might be used for threshold-based flood alerts based on real-time data measurements along the rivers (WMO, 2011), or for identifying in advance, through a rainfall-runoff modelling chain, the rainfall quantities that will lead to the surpassing of such streamflow levels, as in the Flash Flood Guidance Systems framework (Carpenter et al., 1999; Ntelekos et al., 2006; Reed et al., 2007; Norbiato et al., 2009).
A runoff threshold should correspond to a 'flooding flow', that is, to a value that may produce flood damages, and it is very difficult to determine on a regional or national scale: it may be defined as a flow that just exceeds bankfull conditions, but in practice, both in gauged and in ungauged river sections, such conditions are arduous to quantify due to the lack of local information (Reed et al., 2007; Hapuarachchi et al., 2011).
In the absence of more sophisticated physically-based approaches, which require detailed information on each specific cross-section that is rarely available due to limited field surveys, the warning thresholds are often associated with flood quantiles of given return periods. In Italy, where the warning framework is articulated on at least two levels of flow thresholds, many Regions have identified the alert levels as flood quantiles with return periods of 2, 5 or 10 years (e.g. the Abruzzo, Lombardia and Puglia Regions). In the South of France, the AIGA flood warning system compares real-time peak discharges estimated along the river network (on the basis of rainfall field estimates and forecasts) to flood frequency estimates of given return periods (with three categories: yellow for values ranging from the 2-year to the 10-year flood, orange for values between the 10-year and the 50-year flood, and red for peaks exceeding the 50-year flood) in order to provide warnings to the national and regional flood forecasting offices (Javelle et al., 2014).
For river sections where the streamflow gauges are newly installed or where historical rating curves are not available, the observations of the annual maxima are absent or very limited, and it is not possible to obtain a reliable estimate of flood quantiles on the basis of statistical analyses of series of observed flood peak discharges.
For these ungauged or poorly gauged basins, the peak flow of given frequency to be associated with the watch/warning threshold can be estimated by transferring information from data-rich sites to data-poor ones, as is done in the corpus of methodologies applied in RFFA (Regional Flood Frequency Analysis) at ungauged sites, which have always received considerable attention in the hydrologic literature (Bloeschl et al., 2013). Whatever the approach, some estimation error will always be present, and for safety reasons it is better to underpredict rather than overpredict the threshold estimate.
To obtain a conservative estimate of the thresholds, penalising the predictions that exceed the "observed" values (in the present case represented by the quantile estimate based on the statistical analysis of observed flow peaks) more than those that underestimate them, the present work proposes, for the first time to the Author's knowledge, a parameterisation algorithm that weights asymmetrically the positive and negative errors, in order to decrease the magnitude of overestimations and therefore the risk of missing a flooding occurrence.
It is important to underline that the proposed asymmetric error function is here applied for optimising a neural network model predicting the 2-year return period flood (due to its association with the bankfull conditions), but it might be used to improve any other kind of methodology for the estimation of flood warning thresholds associated with any return period.
Section 2 presents the asymmetric error functions; the next one describes the information available in a database covering the entire Italian territory and the identification of the subsets to be used for a rigorous cross-validation approach. Section 4 presents the implementation of the models for estimating the 2-year return period flood in ungauged catchments, consisting of artificial neural networks calibrated using, respectively, the symmetric square error and the asymmetric error functions. The results are presented and then discussed in Section 5, and Section 6 concludes.

The asymmetric error function
The scientific literature on forecasting applications, in any scientific area, adopts almost exclusively an objective function based on the sum or mean of the squared discrepancies, that is, a symmetric quadratic function, due to the well-established good statistical properties of the minimum mean square error estimator.
On the other hand, in economics as well as in engineering and many other fields, there are cases where the forecasting problem is inherently non-symmetric and, in the financial forecasting literature, the use of the mean squared error, even if still widely applied, is nowadays not always accepted.
Error (or loss) functions devised to take into account an asymmetric behaviour have been proposed, such as the linear-exponential, the double linear and the double quadratic ones, which may be written in the general form

L(e) = [α + (1 − 2α) · 1(e > 0)] · |e|^p,    (1)

where 1(·) is a unit indicator, equal to one when the forecast error e > 0 and zero otherwise; p is a positive integer that amplifies the larger errors (corresponding to a quadratic error when equal to 2) and α ∈ (0, 1) is a parameter representing the degree of asymmetry.
For α < 0.5 the function penalises more the overestimation errors (e > 0), while for α > 0.5 more weight is given to negative forecast errors (underpredictions); for α = 0.5 the loss weights positive and negative errors symmetrically.
When p = 2 and α ≠ 0.5, the error becomes the asymmetric double quadratic (Quad-Quad) loss function (see Christoffersen and Diebold, 1996), which is used in the present work for a fair comparison with the traditional mean square error estimator. When p = 2 and α = 0.5, Eq. (1) reduces to the symmetric quadratic error (up to a constant factor). In the water engineering field, the asymmetric Elliot error function with quadratic amplification (p = 2) has recently been applied by Toth (2015) to parameterise a model for estimating the expected maximum scour at bridge piers, in order to obtain safer design predictions through the reduction of underestimation errors.
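As a concrete illustration, the asymmetric loss family of Eq. (1) can be sketched in a few lines of Python (an illustrative sketch: the function name and vectorised form are ours, not from the paper):

```python
import numpy as np

def asymmetric_loss(e, alpha=0.5, p=2):
    """Asymmetric power loss of Eq. (1): [alpha + (1-2*alpha)*1(e>0)] * |e|**p.

    e     : array of forecast errors (predicted minus observed)
    alpha : degree of asymmetry in (0, 1); alpha < 0.5 penalises
            overestimations (e > 0) more heavily
    p     : power amplifying the larger errors (p = 2 gives the
            double quadratic, Quad-Quad, loss)
    """
    e = np.asarray(e, dtype=float)
    weight = alpha + (1.0 - 2.0 * alpha) * (e > 0)
    return weight * np.abs(e) ** p
```

For alpha = 0.5 and p = 2 the loss equals 0.5·e², i.e. it is proportional to the squared error and has the same minimiser, which is what makes the comparison with the mean-square-error estimator fair.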
It should be noted that the proposed methodology is a deterministic one, where an optimal point forecast is obtained by minimising the conditional expectation of the future loss; such a framework does not have the advantages of a probabilistic one in terms of quantification of the uncertainties of the prediction, but it aims at identifying the optimal value for the threshold in terms of operational utility. In Section 4, the asymmetric quadratic error function is proposed for optimising the parameterisation of the regression model.
To reduce the dimensionality of the catchment descriptors, a principal component analysis was applied, obtaining a set of derived uncorrelated variables. The PC variables are as many as the original variables, but they are ordered in such a way that the first component has the greatest variability, the second accounts for the second largest amount of variance in the data and is uncorrelated with the first, and so forth. In the present data set, the first three principal components explain more than three quarters of the total variance (see Di Prinzio et al., 2011), and these first three PCs are here chosen as input variables to the models described in the following, assuming that they may adequately represent, in a parsimonious manner, the main features of the study catchments.
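The principal component step described above can be sketched as follows (an illustrative implementation via the SVD of the standardised descriptor matrix; the matrix X and the function name are hypothetical, not from the paper):

```python
import numpy as np

def first_principal_components(X, n_components=3):
    """Project standardised catchment descriptors onto their first
    principal components (rows of X = catchments, columns = descriptors)."""
    # standardise each descriptor (zero mean, unit variance)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # SVD of the standardised matrix: right singular vectors = PC directions
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)        # fraction of variance per PC
    scores = Z @ Vt[:n_components].T       # PC scores used as model inputs
    return scores, explained[:n_components]
```

The returned scores are mutually uncorrelated by construction, and `explained` can be inspected to verify that the retained components account for the desired share of the total variance (more than three quarters in the study data set).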
The database, in addition to the morpho-pluviometric information, includes the annual maxima flow records for periods ranging from 5 to 63 years, whose median values, corresponding to the 2-year return period, represent the output variable to be simulated by the models. Even the shortest records (and actually only 9 of the locations have fewer than 8 years of data) should be sufficient for such a short return period, for example according to the classical guideline by Cunnane (1987), which suggests not to extrapolate statistical inference beyond a return period of 2 times the sample length.
The data set covers a great diversity of hydrological, physiographic and climatic properties and, in order to partially reduce such heterogeneity, it was decided to limit the analysis to catchments having a 2-year flood included in the range 10–1000 m³/s, that is, 267 of the original 296 basins.

Identification of balanced cross-validation subsets with SOM clustering of input data
As will be detailed in Section 4, the database is to be divided into three disjoint subsets (called training, cross-validation and test sets) in order to allow a rigorous independent validation and also to increase the generalisation abilities of the model when encountering records different from those used in the calibration (or 'training') phase, following an 'early stopping' parameterisation procedure. The clustering of the catchments is not carried out here in order to identify a pooling group of similar catchments for developing a region-specific model, but for the optimal division of the available data for the parameterisation and independent testing of a single model to be applied over the entire study area.
The SOM is in fact used to cluster similar data records together: an equal number of data records is then sampled from each cluster, ensuring that records from each class (that is, catchments with different features) are represented in the training, validation and test sets.
A SOM (Kohonen, 1997) organises input data through non-linear techniques depending on their similarity. It is formed by two layers: the input layer contains one node (neuron) for each variable in the data set, and the output-layer nodes are connected to every input through adjustable weights, whose values are identified with an iterative training procedure. The relation is of the competitive type, matching each input vector with only one neuron in the output layer, through the comparison of the presented input pattern with each of the SOM neuron weight vectors, on the basis of a distance measure (here the Euclidean one). In the trained (calibrated) SOM, all input vectors that activate the same output node belong to the same class.
In the present application, the dimension of the input layer is equal to three (that is, the first three principal components of the catchment descriptors); as far as the output layer is concerned, there is no predefined number of classes and, given the small dimension of the input, a parsimonious output layer formed by three nodes in a row, each one corresponding to a class, was chosen here.
The three resulting clusters are formed respectively by 121, 70 and 76 catchments; each cluster is then divided into three parts, and one third is assigned to the training, validation and test sets respectively.
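The cluster-balanced division into thirds can be sketched as follows (a sketch assuming the cluster labels produced by the trained SOM are given; the SOM training itself is not reproduced here and the function name is ours):

```python
import numpy as np

def balanced_three_way_split(labels, seed=0):
    """Divide the records of each cluster into thirds assigned to the
    training, validation and test sets, so that the three sets contain
    the same proportion of catchments from each cluster."""
    rng = np.random.default_rng(seed)
    sets = {"train": [], "val": [], "test": []}
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)                     # random order within the cluster
        parts = np.array_split(idx, 3)       # one third per subset
        for name, part in zip(sets, parts):
            sets[name].extend(part.tolist())
    return sets
```

With clusters of 121, 70 and 76 catchments, this scheme yields subsets of 91, 88 and 88 records, matching the sizes reported in the text.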
Overall, the training, validation and test sets are therefore almost equally numerous (91, 88 and 88 records respectively) and formed by the same proportion of catchments belonging to each of the clusters, eventually having a similar information content, as shown by the similar statistics of the three variables in the three sets represented in Figure 2.

Development of symmetric and asymmetric artificial neural network models for estimating the 2-year return period flows at ungauged sites

Feedforward Artificial Neural Networks
Artificial neural networks are massively parallel and distributed information processing systems, composed of nodes, arranged in layers, that are able to infer a non-linear input-output relationship. ANNs, and in particular feedforward networks, have been widely used in many hydrological applications (see, for example, the recent review papers by Maier et al.). The identification of the network's weights and biases (called the training procedure) is carried out with a non-linear optimisation, searching for the minimum of an error (or learning) function measuring the discrepancy between predicted and observed values; feedforward networks are generally trained with a learning algorithm known as BackPropagation (Rumelhart et al., 1994), based on steepest descent or on more efficient quasi-Newton methods.
In order to avoid overfitting, which degrades the generalisation ability of the model, the Early Stopping (or Optimal Stopping) procedure was applied (see, for example, Coulibaly et al., 2000). For applying Early Stopping, the available data have been divided into three disjoint subsets with a similar information content, as described in Section 3.2: a training set, an early-stopping validation set and a test set. While the network is parameterised by minimising the error function on the training set, the error function on the early-stopping validation set is also monitored; if the error function on this second set increases continuously for a specific number of iterations, this is a sign of overfitting of the training set, and the training is then stopped. The third subset (the test set) is not used in any way during the parameterisation phase, but it is used for the out-of-sample, independent evaluation of the resulting models.
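The Early Stopping procedure can be sketched generically (an illustrative loop; the `step` and `val_error` callables stand for the network update and the validation-error evaluation, which are not specified in the paper, and the patience parameter is an assumption):

```python
import numpy as np

def train_with_early_stopping(step, val_error, max_iters=1000, patience=20):
    """Generic early-stopping loop: training stops when the validation
    error has not improved for `patience` consecutive iterations, and
    the best state found so far is kept.

    step()           : performs one training iteration, returns model state
    val_error(state) : error of `state` on the early-stopping validation set
    """
    best_state, best_err, wait = None, np.inf, 0
    for _ in range(max_iters):
        state = step()
        err = val_error(state)
        if err < best_err:
            best_state, best_err, wait = state, err, 0
        else:
            wait += 1
            if wait >= patience:   # validation error kept rising: stop
                break
    return best_state, best_err
```

The returned state corresponds to the minimum of the validation error, so later iterations that overfit the training set are discarded.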
In the present work, the results obtained by a network trained with a 'conventional' square error function are compared with those obtained when parameterising the network through the minimisation of an asymmetric loss function, which takes into account both over- and underestimation discrepancies but penalises more the overprediction errors, since the consequences of missed alarms are more severe than those of false alarms.
The input variables are the first three principal components of the catchment descriptors, so the input layer is formed by three nodes; the output node corresponds to the estimated flood with 2-year return period. As far as the dimension of the hidden layer is concerned, there is, unfortunately, no definitively established methodology for its determination, because the optimal network architecture is highly problem-dependent: different architectures with a number of hidden nodes varying from 2 to 6 were set up, and the mean squared error of the estimates issued for the third, independent set was lowest with a hidden layer formed by 3 nodes.
The architecture with three input nodes, three hidden nodes and one output node, represented in Figure 3, is therefore the network finally chosen; the network parameterised by minimising the symmetric mean square error function will be denoted as ANN-Symm, and its results will be compared in Section 5 with those of the asymmetric models having the same architecture but a different error function.
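The forward pass of the chosen 3-3-1 architecture can be sketched as follows (a sketch under the common assumption of a tanh hidden layer and a linear output node; the paper does not state the activation functions):

```python
import numpy as np

def ann_forward(x, W1, b1, W2, b2):
    """Forward pass of a 3-3-1 feedforward network.

    x  : input vector of the three principal components
    W1 : (3, 3) input-to-hidden weights,  b1 : (3,) hidden biases
    W2 : (3,)   hidden-to-output weights, b2 : scalar output bias
    """
    h = np.tanh(W1 @ x + b1)      # three hidden nodes
    return W2 @ h + b2            # single output: estimated 2-year flood
```

The weights and biases would be identified by the training procedure described above; only the network topology (3 inputs, 3 hidden nodes, 1 output) is taken from the paper.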

Implementation of asymmetric models with varying degree of asymmetry
The Quad-Quad loss function described in Section 2 is here applied for calibrating the network parameters of the asymmetric models. The learning function to be minimised is therefore the average value of the double quadratic errors (Mean Quad-Quad Error, MQQE), obtained by averaging the M (number of records in the set) errors given by Eq. (1) when p = 2:

MQQE = (1/M) Σ_{i=1..M} [α + (1 − 2α) · 1(e_i > 0)] · e_i²    (2)

The value of α, corresponding to the degree of asymmetry of the loss function, cannot be fixed a priori, since such a choice should be based on a location-specific cost-benefit analysis, taking into account the avoidable losses (that is, the direct and indirect losses, provided they may be quantifiable, that may be prevented by mitigation actions following an alarm issue).
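Eq. (2) can be written compactly as (an illustrative sketch; the variable names are ours):

```python
import numpy as np

def mqqe(predicted, observed, alpha):
    """Mean Quad-Quad Error of Eq. (2): the average of the double
    quadratic errors over the M records of a set."""
    e = np.asarray(predicted, dtype=float) - np.asarray(observed, dtype=float)
    return np.mean((alpha + (1.0 - 2.0 * alpha) * (e > 0)) * e**2)
```

For alpha = 0.5 the MQQE is half the mean square error, so minimising it reproduces the symmetric training; for alpha < 0.5 positive errors (overestimated thresholds) dominate the objective.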

Goodness-of-fit measures and plots
As described in Section 4.2, the neural networks are trained over the standardised (rescaled) output values of the training and cross-validation sets and they are successively used for predicting the output over the independent test set: such ANN output values are then scaled back, obtaining the predictions Q2,p.
The performances of the models are evaluated through a set of indexes that describe the prediction error, e, that is, the difference between the de-standardised predictions Q2,p issued by the models (as a function of morpho-climatic attributes only) and the 'observed' 2-year flood values (the median of historical annual maxima), Q2,o, on the third set (test set), formed by N = 91 catchments distributed all over the country, whose data have not been used in any capacity in the models' development.
The following error statistics have been computed:

MAE (mean absolute error):

MAE = (1/N) Σ_{i=1..N} |Q2,p,i − Q2,o,i|
Over% (percentage of overprediction errors): the percentage of test catchments for which the predicted quantile exceeds the observed one.
Such a metric shows the general tendency of the model to overestimate (or to underestimate, since 100 − Over% represents, conversely, the proportion of underpredictions), but these indexes do not distinguish among errors of different magnitude, since they also count predictions that may be only barely above (or below) the targets, that is, very good predictions with minimum errors.
The number of 'high' overestimation errors is therefore also computed, taking into account only the more relevant, and therefore potentially more dangerous, overpredictions. An estimate that is more than 30% higher than the corresponding target value is here considered a 'high overprediction':

OverH% (percentage of high overprediction errors): the percentage of test catchments for which Q2,p > 1.3 · Q2,o.
The more conservative the threshold estimate, the lower the value of OverH%.
On the other hand, even if, as discussed, generally less crucial in terms of consequences, the number of high underestimation errors should also be monitored, since excessively low values imply a tendency of the model to establish thresholds leading to the issuance of too many false alarms.

UnderH% (percentage of high underprediction errors): the percentage of test catchments for which the estimate is more than 30% lower than the corresponding target value (Q2,p < 0.7 · Q2,o).
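The error statistics described above can be computed as follows (an illustrative sketch; the 30% relative threshold for UnderH% mirrors the one stated for OverH% and is an assumption, as is the function name):

```python
import numpy as np

def threshold_error_stats(q2_pred, q2_obs):
    """Goodness-of-fit statistics of the threshold estimates on the
    test set: MAE, RMSE, Over%, OverH% and UnderH%."""
    q2_pred = np.asarray(q2_pred, dtype=float)
    q2_obs = np.asarray(q2_obs, dtype=float)
    e = q2_pred - q2_obs            # prediction error
    rel = e / q2_obs                # relative error w.r.t. the target
    return {
        "MAE": np.mean(np.abs(e)),
        "RMSE": np.sqrt(np.mean(e**2)),
        "Over%": 100.0 * np.mean(e > 0),
        "OverH%": 100.0 * np.mean(rel > 0.30),    # >30% above the target
        "UnderH%": 100.0 * np.mean(rel < -0.30),  # >30% below the target
    }
```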
In addition to the goodness-of-fit measures (reported in Table 2), the boxplot of the errors (predicted minus observed quantiles) is shown in Figure 4: the bottoms and tops of the rectangular boxes are respectively the lower and the upper quartiles, the horizontal segment inside the box is the median, and the whiskers represent the 5th and 95th percentiles.
The results may also be evaluated through the scatterplots of predicted (y-axis) vs observed (x-axis) quantiles, presented in Figure 5, which show every prediction Q2,p with respect to the corresponding 'observation' Q2,o.

Discussion of the results
The boxplot (Fig. 4) allows one to visually assess both the accuracy and the tendency to over/underestimate of the models: the boxes should be compact and close to the dotted line representing zero error, but at the same time it is better if the data lie below such a line, thus indicating that the method does not tend to overpredict the thresholds and the warning system is therefore less likely to miss a potentially dangerous flood.
For the ANN-Symm model, a symmetric distribution of the overall errors is shown by an Over% close to 50%, and the similar values of OverH% (34%) and UnderH% (32%) confirm that also the high relative errors are equally split between over- and underestimates. Such results were expected, since the training is based on a symmetric loss function, but the consequence is that the ANN-Symm model issues a remarkable number of significant overprediction errors: in fact, for about one third of the test catchments the estimates are more than 30% higher than the observations.
The analysis of Table 2 shows that the asymmetrically trained networks tend, for decreasing α values, to reduce the number of overestimations (positive errors). For the overall errors this is shown by the changing proportion of over/underestimations, which moves from a value that corresponds, approximately, to a balance, to a much more skewed distribution of negative vs positive errors, with Over% decreasing down to 31%.
At the same time, and more importantly, the number of positive (overestimation) errors larger than 30% substantially decreases with α, with OverH% reaching a value that is much lower than that of the ANN-Symm model when α arrives at 0.1 (18% vs 34%). Conversely, as expected, the more asymmetric the network, the higher the underprediction errors, as shown by the values of UnderH%: the number of significant negative errors gradually increases from one third up to 47% of the total.
Also the accuracy (given by the total amount of the discrepancies, independently of their sign) deteriorates when the asymmetry is more pronounced, but the drop is moderate, and the RMSE and MAE values are not so far from those of the ANN-Symm network.
Looking at the parallel boxplots (Fig. 4), it may be seen that with increasing asymmetry the boxes become less compact and, as expected, their position shifts downwards. The length of the upper whiskers substantially decreases with α, but the length of the lower whiskers does not increase at the same rate, thus compensating for the fact that the boxes are taller for the more asymmetric models; it follows that the global distances from the 5th to the 95th percentiles remain comparable across the models.
It may be noted, in particular from the scatterplots (Fig. 5), that, for both symmetric and asymmetric models, the errors are not negligible: this is due to the shortcomings of the available data set, but mainly to the intrinsic limitations of a regional approach applied to the extreme variability of the study area. As already underlined in Section 3.1, the national data set lacks important information that may help to characterise the hydrological behaviour and the phenomena governing the formation of extreme flows. In addition to the unavoidable risk of erroneous data, the absence in the database of additional influences certainly further hampers the possibility of obtaining a reliable relationship with the flood quantiles. Most importantly, the study area spans very diverse climatic settings (from Alpine to Mediterranean ones), and this high heterogeneity is certainly an additional reason that limits the performance.
Notwithstanding the limitations of the dataset, which affect all the proposed models equally, the results demonstrate that the use of the double quadratic error function, even if at the expense of more substantial underestimation errors, can substantially decrease the number and extent of overestimation errors compared to the use of the traditional square errors.
In the application to a specific cross-section, the degree of asymmetry might be identified as proportional to the "risk averseness" of the situation: the smaller, comparatively, the impact of false alarms, the more the decision-makers are averse to the (economic and social) consequences of a flood and, rather than risking a missed alarm, may accept many cases of false alarm with the associated costs.

Conclusions

In many real-world flood forecasting systems, the runoff thresholds for activating warnings or mitigation measures correspond to flow peaks with a given return period, and at ungauged locations such quantiles are estimated with regionally derived empirical relationships between catchment descriptors and the desired flood quantile. Whatever the functional form, such models are generally parameterised by minimising the mean square error, which assigns equal weight to overprediction and underprediction errors, whereas, instead, the consequences of such errors are extremely different when the estimates are to be used as warning thresholds. In fact, false alarms (due to an underprediction of the warning threshold) generally have a much higher level of acceptance than misses (which would derive from an overestimated threshold).
For this reason, in the present work, the regression model (a feedforward neural network) is parameterised by minimising an asymmetric error function (of the double quadratic type), which penalises the overestimation more than the underestimation discrepancies. The predictions of models with increasing degrees of asymmetry are compared with those of a traditional neural network (trained on the symmetric mean of square errors), in a rigorous cross-validation experiment referred to a database of catchments covering the whole of Italy.
The results confirm, as expected, that the more asymmetric the network, the more numerous and higher the underprediction errors, and the less numerous and less severe the overestimation errors. As is also to be expected, the symmetric accuracy decreases when the asymmetry is more pronounced, but the drop is moderate and the RMSE and MAE values are not so far from those of the traditionally trained network.
Undoubtedly, the nature of the regional approach, as well as the shortcomings of the dataset and the extreme heterogeneity of the study area, generates errors much greater than those obtainable with detailed local studies. On the other hand, where no alternatives exist, the proposed methodology may provide a preliminary estimate of the threshold runoff that does not overestimate the actual flooding flow.
Notwithstanding the acknowledged limitations of the dataset, which affect all the proposed models equally, the analysis shows that the use of the asymmetric error function substantially reduces the number and extent of overestimation errors compared to the use of the traditional square errors. Of course, such a reduction comes at the expense of increasing underestimation errors, but the overall accuracy is still acceptable, and the study highlights the potential benefit of choosing an asymmetric error function when the consequences of missed alarms are more severe than those of false alarms.
Minimising the asymmetric error function has the purpose of optimising the threshold from an operational point of view, in a deterministic framework: future analyses may be devoted to investigating the uncertainty of the issued predictions, since a probabilistic approach (provided that the methodology is able to include all sources of uncertainty and its quality may be objectively assessed) may provide very valuable insights for a more complete evaluation of the estimates.
It is important to highlight that the asymmetric error function is used, in this study, to parameterise a neural network, but of course it might be used to optimise any other model or equation, when aiming at obtaining conservative estimates for safety reasons. The appropriate degree of asymmetry might be identified depending on the risk-averseness of the specific flood-prone context; the quantification of risk aversion is, however, extremely difficult.

Table 1. Geomorphological and climatic descriptors of the CUBIST database of Italian catchments.
[Figure: statistics of the PC input variables in the training, cross-validation and test sets.]
Figure 3. Architecture of the chosen network, with three input nodes, three hidden nodes and one output node.
Figure 5. Scatterplots of the predicted (y-axis) vs observed (x-axis) 2-year flood estimates on the independent test set of catchments, for the symmetric and asymmetric models.