Moving beyond the cost–loss ratio: economic assessment of streamflow forecasts for a risk-averse decision maker

A large effort has been made over the past 10 years to promote the operational use of probabilistic or ensemble streamflow forecasts. Numerous studies have shown that ensemble forecasts are of higher quality than deterministic ones. Many studies also conclude that decisions based on ensemble rather than deterministic forecasts lead to better decisions in the context of flood mitigation. Hence, it is believed that ensemble forecasts possess a greater economic and social value for both decision makers and the general population. However, the vast majority of, if not all, existing hydro-economic studies rely on a cost–loss ratio framework that assumes a risk-neutral decision maker. To overcome this important flaw, this study borrows from economics and evaluates the economic value of early flood warning systems using the well-known Constant Absolute Risk Aversion (CARA) utility function, which explicitly accounts for the level of risk aversion of the decision maker. This new framework allows for the full exploitation of the information related to a forecast's uncertainty, making it especially suited for the economic assessment of ensemble or probabilistic forecasts. Rather than comparing deterministic and ensemble forecasts, this study focuses on comparing different types of ensemble forecasts. There are multiple ways of assessing and representing forecast uncertainty. Consequently, there exist many different means of building an ensemble forecasting system for future streamflow. One such possibility is to dress deterministic forecasts using the statistics of past forecast errors. Such dressing methods are popular among operational agencies because of their simplicity and intuitiveness. Another approach is the use of ensemble meteorological forecasts for precipitation and temperature, which are then provided as inputs to one or many hydrological model(s).
In this study, three concurrent ensemble streamflow forecasting systems are compared: simple statistically dressed deterministic forecasts, forecasts based on meteorological ensembles, and a variant of the latter that also includes an estimation of state variable uncertainty. This comparison takes place for the Montmorency River, a small flood-prone watershed in southern central Quebec, Canada. The assessment of forecasts is performed for lead times of 1 to 5 days, both in terms of forecast quality (relative to the corresponding record of observations) and in terms of economic value, using the newly proposed framework based on the CARA utility function. It is found that the economic value of a forecast for a risk-averse decision maker is closely linked to the forecast reliability in predicting the upper tail of the streamflow distribution. Hence, post-processing forecasts to avoid overforecasting could help improve both the quality and the value of forecasts.


Introduction
More than 15 years after it was advocated by Krzysztofowicz (2001) and more than a decade after the creation of the Hydrologic Ensemble Prediction EXperiment (HEPEX) community (Franz and Ajami, 2005; Schaake et al., 2007), the case for probabilistic forecasting in hydrology has been accepted by many researchers and practitioners across the world: the uncertainty of hydrological forecasts conveys important information for decision makers and should therefore be quantified and considered as part of the forecast (e.g. Ramos et al., 2013; Sordo-Ward et al., 2016). Beven (2016) distinguishes aleatory uncertainty, which originates from data only and possesses stationary statistical characteristics, from various types of epistemic uncertainty. Epistemic uncertainties can arise from a lack of knowledge regarding the system's dynamics, from a lack of knowledge regarding the relevant forcings for the modelling process, and from disinformation in the data. More broadly, as discussed by Juston et al. (2013), uncertainty in hydrological forecasting mainly originates from data and models (atmospheric and hydrological). The most important sources of uncertainty in short-term hydrological forecasting are structural uncertainty (the choice of a particular hydrological model structure), state variable uncertainty and parameter uncertainty (both linked to the availability and quality of hydro-meteorological data), and meteorological forecast uncertainty. The latter gradually gains in importance as the forecasting horizon increases.
Given these multiple sources of uncertainty in hydrological processes, there also exist many means of assessing them and of building an ensemble that conveys the associated information. For instance, streamflow ensemble forecasts can be produced by using meteorological ensemble forecasts as inputs to at least one previously calibrated hydrological model. Deterministic forecasts can also be "dressed" using past error statistics.
While there is general agreement in the scientific community that ensemble and probabilistic forecasts are superior to deterministic ones (e.g. Jaun et al., 2008; Velazquez et al., 2010; He et al., 2013, and many others), there is still no consensus regarding the best means of obtaining an ensemble of streamflow forecasts (i.e. of constructing the ensemble). Interest in assessing the economic value of forecasts has also increased over the last few years. The quality of a forecasting system can be assessed by comparing forecasts for different lead times with the corresponding observations. Forecast quality can be further decomposed into different attributes (e.g. resolution, sharpness, discrimination) that can be weighted differently depending on the application. Forecast value also depends on the application. In particular, the usefulness of a forecast is inherently linked to the decision makers' ability to adapt their behaviour to the information provided. Neither the assessment of forecast quality nor that of forecast value is straightforward, and the relationship between the two is sometimes not obvious either.
In the case of hydropower production, forecast value can be assessed using sophisticated decision-making models based on stochastic dynamic programming in an operational research framework (e.g. Boucher et al., 2012; Carpentier et al., 2013; Côte and Leconte, 2016). Early flood warning is another very important application of streamflow forecasts and a decision problem entirely different from the optimisation of hydropower production. Hydrologists most often, if not always, assess the value of streamflow forecasts for early flood warning using the cost-loss framework (e.g. Murphy, 1977; Richardson, 2000; Roulin, 2007; Verkade and Werner, 2011), which does not account for the decision maker's risk aversion, i.e. the fact that, given the opportunity, a decision maker would be willing to spend money (or resources) to reduce the amount of uncertainty they face. This is discussed formally in Sect. 2.
This study considers the evaluation of the economic value of early flood warning systems from the point of view of the decision maker, with explicit consideration of risk aversion. This alternative framework is based on the von Neumann and Morgenstern (vNM) utility function (von Neumann and Morgenstern, 1944), which is widely used in economics but rarely in hydrology. To the best of our knowledge, our study represents the first attempt at accounting for risk aversion in the assessment of the economic value of streamflow forecasts for early flood warning. This new framework is used to assess the economic value of three concurrent streamflow ensemble forecasting systems in a case study for the Montmorency River, a flood-prone watershed in southern central Quebec, Canada. Five-day statistically dressed deterministic forecasts for this watershed have been issued operationally since 2008 by the Direction de l'Expertise Hydrique (DEH), a Quebec provincial agency. These forecasts are used for early flood warning and emergency response by the civil security bureau of Quebec City.
In Sect. 2, some concerns regarding the cost-loss ratio are raised and an alternative framework is presented. Section 3 describes the context of the case study, namely the specifics of the Montmorency River watershed, the current flood forecasting system based on dressed deterministic forecasts, and the early flood warning mechanism in place. Two variants of a concurrent flood forecasting system are detailed in Sect. 3.3. The economic model is presented in Sect. 4. Performance assessment metrics, both in terms of forecast quality compared to observations and in terms of economic value, are presented in Sect. 5. Results are presented in Sect. 6 and discussed in Sect. 7. Conclusions are drawn in Sect. 8, along with suggestions for future improvement of the proposed economic model.

The economic model and the limits of the cost-loss ratio
The cost-loss ratio decision model (Murphy, 1977; Katz and Murphy, 1997; Richardson, 2000) is a simplified framework used in numerous hydro-meteorological studies to assess the economic value of forecasts (Roulin, 2007; Abaza et al., 2014; Verkade and Werner, 2011, among many others). As pointed out by Zhu et al. (2002), this approach is only the simplest of a much larger range of options. More importantly, a classical cost-loss ratio decision model disregards the role of risk aversion (e.g. Shorr, 1966; Cerdá Tena and Quiroga Gómez, 2008). "Risk aversion" refers to an attribute of a decision maker who would be willing to pay a certain amount of money to remove any risk associated with a decision problem. The specific amount he or she is willing to pay is initially unknown and can be seen as an indirect measure of the magnitude of this aversion.
As discussed by Cerdá Tena and Quiroga Gómez (2008), risk aversion is very common, and most decision makers are risk-averse when the stakes are high. In their paper, they illustrate how disregarding risk aversion can sometimes lead to misleading conclusions regarding the value of information (such as meteorological or hydrological forecasts). Their framework also involves the Constant Absolute Risk Aversion utility function (see below). However, the context of their application and the rest of their economic model differ from ours.
In a simple cost-loss ratio framework, the decision model follows a contingency table that allows for binary decisions with known associated costs. When applied to ensemble forecasts, decision-making according to the cost-loss ratio framework is based solely on a probability threshold associated with the material consequences of the event of interest (e.g. a flood event), regardless of the ensemble spread (uncertainty). Appendix A provides a technical presentation that builds on the concepts presented in this section. Including the concept of risk aversion in the decision model is not only more realistic but also allows ensemble members to be weighted differently depending on the level of risk aversion. For instance, a risk-averse decision maker will give more importance to the forecast members in the upper tail of the predictive distribution (i.e. the highest streamflow values).
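To make the contrast concrete, the following Python sketch implements the textbook cost–loss rule for an ensemble forecast: protective action is taken whenever the fraction of members above a flood threshold exceeds the ratio C/L. The ensembles, threshold, cost and loss are hypothetical values chosen for illustration, and this is not necessarily the formulation used in Appendix A. Both ensembles give the same decision even though one has a much heavier upper tail, which is precisely the information a risk-averse decision maker would want to exploit.

```python
import numpy as np

def cost_loss_decision(members, threshold, cost, loss):
    """Return True if protective action is taken under the cost-loss rule.

    The event probability is the fraction of ensemble members above `threshold`;
    action is taken when this probability exceeds the cost-loss ratio C/L.
    """
    members = np.asarray(members, dtype=float)
    p_event = np.mean(members > threshold)
    return bool(p_event > cost / loss)

# Two hypothetical 10-member ensembles (m3/s): same exceedance probability (3/10),
# very different upper tails.
narrow = [300, 320, 340, 360, 380, 400, 420, 560, 570, 580]
wide = [300, 320, 340, 360, 380, 400, 420, 560, 900, 1100]
threshold = 550.0          # hypothetical flood threshold
cost, loss = 20.0, 100.0   # hypothetical protection cost C and avoidable loss L

print(cost_loss_decision(narrow, threshold, cost, loss))
print(cost_loss_decision(wide, threshold, cost, loss))
```

Only the count of members above the threshold enters the rule; how far above it they lie plays no role.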
In economics, "utility" is an ordinal notion that reflects the decision maker's preferences over a set of possible outcomes. Preferred outcomes lead to greater utility values. In the context of random outcomes, the most popular class of utility functions is the vNM utility function, introduced by von Neumann and Morgenstern (1944). Fishburn (1989) provides a retrospective on von Neumann and Morgenstern theory. He highlights the remarkable impact this theory had on the subsequent development of economic theories and also clarifies some of its limits. There is an immense literature on the application of vNM utility theory in many different fields. For instance, Pope and Just (1991) compare different types of utility functions to represent the preferences of farmers for potato acreage. Although we could not find previous work in hydrology where risk aversion is considered in the assessment of the economic value of forecasts, Krzysztofowicz (1986) and Merz et al. (2009) acknowledge its importance. Shorr (1966) attempts a reconciliation of the cost-loss ratio framework with utility theory in the simple context of crop protection.
The interested reader is referred to Chapter 6 in Mas-Colell et al. (1995) for more details as well as the axiomatic foundations of vNM utility functions. The vNM utility function of a decision maker regarding a real-valued random outcome $c$ (e.g. money) is given by

$$U(c) = \sum_{m=1}^{M} p_m \, \mu(c_m), \tag{1}$$

where $m = 1, \ldots, M$ are the different "states of the world", $p_m$ is the probability of state $m$, and $c_m$ is the realisation of the random outcome $c$ in state $m$. The function µ(·) is assumed to be non-decreasing.
The set of states of the world represents the set of realisations of $c$ for which the decision maker has preferences. For instance, in Cerdá Tena and Quiroga Gómez (2008), there are only two possible states of the world: "adverse weather" and "non-adverse weather". In the case of flood forecasting systems, even if streamflow values are continuous, in practice the decision maker may only distinguish between a finite set of implied damages. This point is discussed further in Sect. 4.2, where a finite number of "damage categories" are specified.
The curvature of the function µ(·) reflects the decision maker's preferences regarding uncertainty. If µ(·) is concave, the decision maker is risk-averse; if it is linear, the decision maker is risk-neutral; if it is convex, the decision maker is risk-seeking. To see why, consider the random variable $c$ and its expected value $\bar{c} = \sum_{m=1}^{M} p_m c_m$. Since $\bar{c}$ is not risky, a risk-averse decision maker should prefer to receive $\bar{c}$ with certainty rather than a random draw from $c$. That is, $U(\bar{c}) > U(c)$, or

$$\mu(\bar{c}) > \sum_{m=1}^{M} p_m \, \mu(c_m),$$

which is the definition of (strict) concavity. Note that we can also define $C > 0$, the amount of money that the decision maker would be willing to spend to remove the risk associated with $c$, as follows:

$$\mu(\bar{c} - C) = \sum_{m=1}^{M} p_m \, \mu(c_m). \tag{2}$$

This argument extends directly to any change in risk: any risk-averse decision maker prefers less risky distributions, in the sense of mean-preserving second-order stochastic dominance (Rothschild and Stiglitz, 1970). Figure 1 also presents a graphical version of the above discussion when there are only two states of nature.

Figure 1. A schematic representation of the CARA utility function for risk-averse individuals. Here, only two states of the world are assumed. The state $c_1$ is realised with probability $\alpha$ and $c_2$ is realised with the complementary probability. Since µ is concave, the expected utility $U = \alpha\mu(c_1) + (1-\alpha)\mu(c_2)$ is smaller than the utility of the expected value, $\mu(\alpha c_1 + (1-\alpha)c_2)$. In other words, the individual would prefer to receive the certain amount $\alpha c_1 + (1-\alpha)c_2$ rather than a lottery which pays $c_1$ with probability $\alpha$ and $c_2$ with probability $1-\alpha$. Equivalently, the individual would be willing to pay up to $C > 0$ to remove the risk associated with this lottery, where $C$ is such that $\mu(\alpha c_1 + (1-\alpha)c_2 - C) = \alpha\mu(c_1) + (1-\alpha)\mu(c_2)$.
This study focuses on a well-known parametric family for µ(·), the Constant Absolute Risk Aversion (CARA) function, given by Eq. (3) (e.g. Gollier, 2004; Mas-Colell et al., 1995):

$$\mu(c) = \frac{1 - e^{-Ac}}{A}, \tag{3}$$

where $A$ is the risk aversion of the decision maker (in the limit $A \to 0$, Eq. (3) reduces to the risk-neutral case $\mu(c) = c$). $A$ is strictly positive for risk-averse individuals and strictly negative for risk-seeking individuals. For positive values, the level of risk aversion increases with $A$. The parametric form in Eq. (3) implies that the level of risk aversion is independent of the decision maker's financial capacities (hence the name Constant Absolute Risk Aversion, CARA). This particular utility function is therefore coherent with the expected behaviour of most public utility services (municipal authorities will not, for instance, gradually adopt a risk-seeking behaviour regarding the protection of citizens if the city's financial well-being improves). See Appendix B for additional details, proofs, and references for those claims.
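A minimal Python sketch of expected utility and willingness to pay for a CARA decision maker follows. It assumes the normalisation µ(c) = (1 − exp(−Ac))/A, which reduces to µ(c) = c at A = 0; any positive affine transformation of it represents the same preferences. For CARA utility, the amount C defined by µ(c̄ − C) = Σ p_m µ(c_m) has the closed form C = c̄ + ln(Σ p_m exp(−A c_m))/A. The two-state lottery below is hypothetical.

```python
import numpy as np

def cara_utility(c, A):
    """CARA utility mu(c) = (1 - exp(-A c)) / A; reduces to mu(c) = c when A = 0."""
    c = np.asarray(c, dtype=float)
    if A == 0.0:
        return c
    return (1.0 - np.exp(-A * c)) / A

def expected_utility(c, p, A):
    """U(c) = sum_m p_m mu(c_m) over a finite set of states of the world."""
    return float(np.sum(np.asarray(p, dtype=float) * cara_utility(c, A)))

def risk_premium(c, p, A):
    """Amount C such that mu(c_bar - C) = U(c); closed form for CARA utility."""
    c = np.asarray(c, dtype=float)
    p = np.asarray(p, dtype=float)
    c_bar = float(np.sum(p * c))
    if A == 0.0:
        return 0.0  # a risk-neutral decision maker pays nothing to remove risk
    certainty_equivalent = -np.log(np.sum(p * np.exp(-A * c))) / A
    return c_bar - float(certainty_equivalent)

# Hypothetical two-state lottery (as in Fig. 1): no loss with probability 0.9,
# a loss of 1000 monetary units with probability 0.1.
c = np.array([0.0, -1000.0])
p = np.array([0.9, 0.1])
for A in (0.0, 0.001, 0.01):
    print(A, round(risk_premium(c, p, A), 2))
```

The premium grows with A, and shifting every outcome by the same constant leaves it unchanged, which is the wealth independence that motivates the CARA choice.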
The economic model developed above is applied to the particular context of frequent flooding on the Montmorency watershed.This context is described in greater detail in the next section.

Floods on the Montmorency watershed
Located in southern Québec, Canada, the Montmorency River watershed covers 1150 km², most of which is densely forested. Approximately 30 000 people reside in the basin, concentrated in its southernmost portion. The northern portion of the watershed lies within the Laurentian Wildlife Reserve, where heavy snowfall is common. Figure 2 presents average monthly values of meteorological variables for this watershed.
Crystalline rock of the Canadian Shield underlies most of the watershed and is covered by glacial till with an average thickness of 1 m, left by the retreat of the glaciers. The southernmost part is covered by sandy sediments from the Champlain Sea. Figure 3 shows the geographical location of the watershed as well as the locations of the available meteorological stations and streamflow gauges (see Sect. 3.3).
The Montmorency River experiences quasi-annual ice jams during spring melt, which often increase the magnitude and frequency of floods within vulnerable inhabited areas. The response time of the watershed is short (12 h), and the return period of damaging floods is also short. This makes emergency evacuation and flood damage a common occurrence for riverside residents. Table 1 shows return periods and corresponding streamflow values for the Montmorency River (Leclerc and Secretan, 2012). The table also provides streamflow thresholds used for flood mitigation operations (see Sect. 3.2.2). Note that these are given for open-water conditions and take into account neither ice jams nor the increase in water level due to the presence of ice blocks.
The behaviour and consequences of ice jams along the Montmorency River have been the focus of previous studies, for instance on forecasting river ice breakup (Turcotte and Morse, 2015). Risk analyses and technical solutions have also been studied (Leclerc et al., 2001), but these solutions have not yet been implemented.
The river experienced its worst recorded event in November 1966, when a heavy rain system melted a late-autumn snow cover, resulting in a 1100 m³ s⁻¹ flow peak. More recently, in January 2008, an ice cover breakup followed by the formation of an ice jam further downstream forced the evacuation of 80 households and damaged four houses. In March 2012, an early spring thaw caused by extreme temperatures induced a flood, resulting in the evacuation of 25 households. Then, in April 2014, an ice jam breakup caused a massive ice-carrying flood wave that, occurring during a typical spring freshet, quickly raised the water to a semi-centennial level. In addition, the topography in the area causes certain regions to become entirely isolated and surrounded by water during flooding. Public authorities are most concerned when people refuse to evacuate, especially in these flood-prone areas.

The HYDROTEL hydrological model
HYDROTEL (Fortin et al., 1995) is a spatially distributed, physics-based model developed and maintained by the Institut National de Recherche Scientifique (INRS). It is used operationally by the DEH and has been implemented on the Montmorency River watershed since 2008 (Rousseau et al., 2008). The model accepts gridded inputs (precipitation, snow cover, temperature) that can be interpolated using a three-station average or the Thiessen method. Physical features of the catchment (topography, soil type, hydrographic network) are processed by a companion software program called PHYSITEL, which divides the watershed into smaller spatial units called RHHUs (relatively homogeneous hydrological units). Each RHHU is assumed to have homogeneous physical properties. The model for the Montmorency catchment includes 366 RHHUs. HYDROTEL then computes the vertical and horizontal water flows.
HYDROTEL offers a range of sub-routines for hydrological processes (interpolation of precipitation, evapotranspiration, snow accumulation and melt, etc.), and the user chooses the most appropriate sub-routines depending on the available data. For this study, observed precipitation was interpolated using Thiessen polygons. No radiation data were available, so evapotranspiration was estimated with an empirical temperature-based method (Fortin, 2000; Bisson and Roberge, 1983) and snowmelt was modelled by a mixed degree-day/energy budget approach. The vertical water budget was computed by the BV3C sub-routine (in French, Bilan Vertical en 3 Couches), which divides the soil into three layers of different composition and depth. Overland and channel routing were performed using the kinematic wave approach (Lighthill and Whitham, 1955). With this set-up, which replicates the one used operationally by the DEH, HYDROTEL has 27 parameters, of which only 10 were calibrated (default values were used for the others). The calibration previously performed by the DEH was kept intact. It used the Shuffled Complex Evolution algorithm of the University of Arizona (SCE-UA; Duan et al., 1994), with the Nash–Sutcliffe efficiency as the objective function to maximise. In forecasting mode, HYDROTEL is driven by meteorological forecasts, either deterministic or ensemble-based.
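The Nash–Sutcliffe efficiency used as the calibration objective compares the squared simulation errors with the spread of the observations around their mean: NSE = 1 − Σ(Q_obs − Q_sim)² / Σ(Q_obs − mean(Q_obs))². A value of 1 indicates a perfect fit and 0 a simulation no better than the observed mean. The Python function below is a generic sketch of this criterion, not the DEH calibration code.

```python
import numpy as np

def nash_sutcliffe(simulated, observed):
    """Nash-Sutcliffe efficiency: 1 - (sum of squared errors) / (sum of squared
    deviations of the observations from their mean)."""
    sim = np.asarray(simulated, dtype=float)
    obs = np.asarray(observed, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

observed = [12.0, 30.0, 55.0, 41.0, 20.0]    # hypothetical streamflow series (m3/s)
simulated = [14.0, 27.0, 50.0, 44.0, 22.0]
print(round(nash_sutcliffe(simulated, observed), 3))
```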
In the operational setting, data assimilation is performed manually and indirectly: the forecaster modifies the precipitation and/or temperature observed during the previous days until the model simulation agrees with the observed streamflow for the current day. When the model is run with the modified meteorological inputs, state variables are recomputed, which should improve the model's description of the hydrological state of the watershed. The choice of modifying temperature or precipitation depends mostly on the period of the year and the associated dominant hydrological process. During the spring freshet, air temperature is the main forcing acting on the snowmelt rate. Solar radiation is not among HYDROTEL's inputs but is instead estimated empirically, in part from air temperature. Therefore, during this period of the year (early March to late May), perturbations are applied to the temperature forcing. During summer and early autumn, precipitation is the dominant forcing controlling runoff, soil moisture and, eventually, streamflow, so perturbations are applied primarily to precipitation from approximately June to November.

Flood alerts
The Direction de l'Expertise Hydrique (DEH) is an administrative unit of the Government of Québec created in 2001 with the mandate to manage the water regime of Québec's rivers and to provide streamflow forecasts to municipalities. Since 2008, operational 5-day streamflow forecasts with a 3 h time step have been distributed to municipal water managers. These forecasts are always obtained using the HYDROTEL semi-distributed physics-based hydrological model (Fortin et al., 1995). Although HYDROTEL is a deterministic model, the operational forecasts now widely distributed by the DEH are not purely deterministic but are accompanied by a 50 % confidence interval. This interval is computed from a statistical model derived from the analysis of past deterministic streamflow forecast errors for 10 watersheds across the province of Québec. A more detailed description of this statistical method is available in Huard (2013).
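The statistical model of Huard (2013) is not described here, so the following Python sketch is only a generic illustration of the dressing idea under assumptions of our own: errors are expressed as ratios of observed to forecast streamflow (forecasts assumed strictly positive), and the 25th and 75th percentiles of past ratios at each lead time are applied to a new deterministic forecast to obtain a 50 % interval. The function names and the small archive are hypothetical.

```python
import numpy as np

def fit_error_quantiles(past_forecasts, past_observations, levels=(0.25, 0.75)):
    """Empirical quantiles of past relative errors (observed / forecast) per lead time.

    Inputs have shape (n_past_forecasts, n_lead_times); the result has shape
    (len(levels), n_lead_times)."""
    ratios = np.asarray(past_observations, dtype=float) / np.asarray(past_forecasts, dtype=float)
    return np.quantile(ratios, levels, axis=0)

def dress(forecast, error_quantiles):
    """Turn a new deterministic forecast (shape (n_lead_times,)) into interval bounds."""
    return np.asarray(forecast, dtype=float) * error_quantiles

# Hypothetical archive: 5 past forecasts of 100 m3/s at lead times 1 and 2
past_f = np.full((5, 2), 100.0)
past_o = np.array([[80.0, 60.0], [90.0, 80.0], [100.0, 100.0],
                   [110.0, 120.0], [120.0, 140.0]])
q = fit_error_quantiles(past_f, past_o)
lower, upper = dress([200.0, 300.0], q)
print(lower, upper)
```

Because the ratios spread out more at the second lead time, the dressed interval widens with lead time, as one would expect of past forecast errors.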
After receiving a forecast exceeding a pre-determined flood threshold, municipalities can choose to engage in emergency procedures. In the case of the Montmorency watershed, current measures are mostly reactive (road closure, controlled evacuation of citizens, providing emergency shelters and food) rather than preventive (artificial levees, culverts, etc.; Leclerc et al., 2001).
Flood thresholds have been adapted from a hydrodynamic study (Leclerc and Secretan, 2012). Threshold values have been conservatively rounded down to compensate for the aggravating effect of ice in the channel. Table 1 includes operational threshold levels for the most vulnerable residential area.

Meteorological ensemble forecasts
The alternative forecasting framework proposed in this study involves meteorological ensemble forecasts passed on to HYDROTEL. Precipitation and temperature ensemble forecasts from the Meteorological Service of Canada (MSC) covering the 2011–2014 period are used. For practical reasons, these forecasts were obtained from the THORPEX Interactive Grand Global Ensemble (TIGGE; Park et al., 2008) database managed by the European Centre for Medium-Range Weather Forecasts (ECMWF). The forecasting horizon is 5 days, with a 6 h time step. The MSC meteorological ensemble forecasts comprise 20 members. The original 0.6° spatial grid was downscaled to a 0.1° grid through simple bilinear interpolation during data retrieval. Precipitation and temperature observations are measured at five ground stations distributed around the watershed (see Fig. 3; Climate Quebec, personal communication, 2015). Hourly measurements were accumulated and averaged over 3 h time steps. Snow survey data interpolated on a 0.1° grid are also available; they were provided for this study by the DEH. The streamflow gauging station at the river outlet provides measurements at a 15 min interval, which are corrected for backwater due to ice cover and then averaged over 3 h time steps (DEH, 2016).
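Bilinear interpolation from the 0.6° grid to the 0.1° grid weights the four surrounding source values by the fractional position of each target point within its source cell. The NumPy sketch below is a generic implementation under simplifying assumptions (a regular, strictly increasing latitude–longitude grid with all targets inside it); it is not the TIGGE retrieval tool, and the coordinates are hypothetical.

```python
import numpy as np

def bilinear_regrid(values, lat, lon, new_lat, new_lon):
    """Bilinear interpolation of a 2-D field values[lat, lon] onto new grid coordinates.

    `lat` and `lon` must be strictly increasing; targets must lie inside the source grid."""
    values = np.asarray(values, dtype=float)
    lat, lon = np.asarray(lat, dtype=float), np.asarray(lon, dtype=float)
    new_lat, new_lon = np.asarray(new_lat, dtype=float), np.asarray(new_lon, dtype=float)
    # Index of the lower corner of the source cell containing each target coordinate
    i = np.clip(np.searchsorted(lat, new_lat, side="right") - 1, 0, lat.size - 2)
    j = np.clip(np.searchsorted(lon, new_lon, side="right") - 1, 0, lon.size - 2)
    # Fractional position of each target inside its source cell (0 to 1)
    ty = (new_lat - lat[i]) / (lat[i + 1] - lat[i])
    tx = (new_lon - lon[j]) / (lon[j + 1] - lon[j])
    I, J = np.meshgrid(i, j, indexing="ij")
    TY, TX = np.meshgrid(ty, tx, indexing="ij")
    # Weighted average of the four surrounding source values
    return ((1 - TY) * (1 - TX) * values[I, J]
            + (1 - TY) * TX * values[I, J + 1]
            + TY * (1 - TX) * values[I + 1, J]
            + TY * TX * values[I + 1, J + 1])

# Hypothetical 0.6-degree source grid and 0.1-degree target grid
lat = np.array([46.8, 47.4, 48.0])
lon = np.array([-71.4, -70.8, -70.2])
field = np.arange(9.0).reshape(3, 3)      # e.g. precipitation (mm) per source cell
new_lat = np.linspace(46.8, 48.0, 13)
new_lon = np.linspace(-71.4, -70.2, 13)
fine = bilinear_regrid(field, lat, lon, new_lat, new_lon)
print(fine.shape)
```

Since bilinear interpolation reproduces linearly varying fields exactly, a field that grows linearly with latitude and longitude is a convenient sanity check.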

Data assimilation and state variable uncertainty
Appropriate data assimilation is crucial for short-term flood forecasting, as it allows the model to begin the forecasting period with the best possible estimate of the initial conditions. In a study involving 20 catchments in Quebec, Thiboult et al. (2016) showed that initial condition uncertainty dominates the other sources of uncertainty for short-term (1 to 3 days ahead) streamflow forecasts. These catchments vary in size and other physical characteristics, but they are all subject to similar meteorological conditions, which are also shared by the Montmorency catchment. However, the Montmorency catchment is smaller than any of the 20 watersheds in Thiboult et al. (2016) and has a shorter response time. Consequently, initial condition uncertainty is expected to dominate only for lead times shorter than 1 day.
In this study, manual data assimilation was performed according to the guidelines of Mamono (2010), consistent with the procedure followed by operational forecasters at the DEH. This assimilation process relies on the assumptions that (1) model errors are entirely compensated for by the calibration process, (2) streamflow measurements are error-free, and (3) the only remaining source of error affecting state variables is attributable to meteorological inputs (Mamono, 2010). Additive coefficients were applied to temperature inputs and multiplicative coefficients to precipitation inputs in order to improve the agreement between simulated and observed streamflow series. These perturbations were bounded at [−10, 10] °C and [0.1, 10], respectively. Although these minimal and maximal perturbation values are very large, they correspond to the rules applied operationally by the DEH. Of course, the goal is to limit perturbations as much as possible. In this study, the multiplicative coefficient applied to precipitation varied between 0.5 and 2.5. Most additive coefficients for temperature varied between −3 and +2.5 °C, with occasional larger coefficients (up to −7 and +7 °C, on three occasions). These perturbations of meteorological inputs were applied uniformly over the basin for fixed periods of time.
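The operational bounds and the seasonal rule described above (temperature during the spring freshet, precipitation from June to November) can be sketched as follows. This Python fragment only encodes the constraints stated in the text; in practice the coefficients are chosen manually by the forecaster. The function names are hypothetical, and the winter months are left unassigned because the text does not specify them.

```python
import numpy as np

TEMP_BOUNDS = (-10.0, 10.0)   # additive temperature coefficient bounds (deg C)
PRECIP_BOUNDS = (0.1, 10.0)   # multiplicative precipitation coefficient bounds

def forcing_to_adjust(month):
    """Forcing perturbed preferentially in a given month (1-12)."""
    if 3 <= month <= 5:        # spring freshet: snowmelt driven by air temperature
        return "temperature"
    if 6 <= month <= 11:       # summer and autumn: runoff driven by precipitation
        return "precipitation"
    return None                # December to February: not specified in the text

def apply_coefficients(temperature, precipitation, temp_shift=0.0, precip_factor=1.0):
    """Apply an additive temperature shift and a multiplicative precipitation factor,
    both clipped to the operational bounds, uniformly over the basin."""
    shift = float(np.clip(temp_shift, *TEMP_BOUNDS))
    factor = float(np.clip(precip_factor, *PRECIP_BOUNDS))
    return (np.asarray(temperature, dtype=float) + shift,
            np.asarray(precipitation, dtype=float) * factor)

t_new, p_new = apply_coefficients([-2.0, 1.5], [4.0, 0.0], temp_shift=2.5, precip_factor=1.8)
print(forcing_to_adjust(4), t_new, p_new)
```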
The manual data assimilation described above only improves the "best guess" of the state variables at each time step. To go one step further, additional perturbations were applied around this best-guess estimate to account for the uncertainty in initial conditions. To do so, a rudimentary version of a sequential updating scheme, the ensemble Kalman filter (EnKF; Evensen, 2003), was implemented. Starting from the manually assimilated precipitation, temperature and streamflow simulation series, random noise is applied to the precipitation and temperature inputs. Additive perturbations for temperature are drawn from U(−8, 8) °C. For precipitation, both multiplicative (U(0.5, 1.5)) and additive (U(0, 0.5) mm) perturbations are drawn; the additive perturbations are included because strong precipitation undercatch is suspected for this catchment. Output uncertainty is modelled by a normal distribution centred on the observed streamflow with a standard deviation equal to 10 % of the observed streamflow. In this study, data assimilation is a necessity rather than a choice and is not the primary objective. For this reason, the limits of the above-mentioned distributions were not optimised as in Thiboult and Anctil (2015). They were instead fixed according to the guidelines in Mamono (2010) and Abaza et al. (2015) and the experience gained during manual data assimilation. Further refinements of the EnKF are outside the scope of this study.
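The perturbation scheme can be sketched in Python as follows, with two assumptions that the text does not state: a new draw is made for each member and each time step, and the multiplicative precipitation factor is applied before the additive term. The function names and input series are hypothetical.

```python
import numpy as np

def perturb_inputs(temperature, precipitation, n_members, rng):
    """Perturbed forcing ensembles around the manually assimilated series.

    Temperature: additive U(-8, 8) deg C.
    Precipitation: multiplicative U(0.5, 1.5), then additive U(0, 0.5) mm.
    Returns two arrays of shape (n_members, n_time_steps)."""
    t = np.asarray(temperature, dtype=float)
    p = np.asarray(precipitation, dtype=float)
    shape = (n_members, t.size)
    t_ens = t + rng.uniform(-8.0, 8.0, size=shape)
    p_ens = p * rng.uniform(0.5, 1.5, size=shape) + rng.uniform(0.0, 0.5, size=shape)
    return t_ens, p_ens

def perturb_observation(q_obs, n_members, rng):
    """Observed-streamflow ensemble drawn from N(q_obs, (0.1 * q_obs)^2)."""
    return rng.normal(q_obs, 0.1 * q_obs, size=n_members)

rng = np.random.default_rng(0)
temps = np.array([-1.0, 0.5, 2.0])    # hypothetical 3 h temperature series (deg C)
precips = np.array([0.0, 3.2, 6.1])   # hypothetical 3 h precipitation series (mm)
t_ens, p_ens = perturb_inputs(temps, precips, n_members=50, rng=rng)
q_ens = perturb_observation(120.0, n_members=50, rng=rng)
print(t_ens.shape, p_ens.shape, q_ens.shape)
```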
The Kalman gain $K_t$ is then computed sequentially following Mandel (2006):

$$K_t = M_t H^{\mathsf{T}} \left( H M_t H^{\mathsf{T}} + O_t \right)^{-1}, \tag{4}$$

where $M_t$ is the model error covariance matrix computed according to the perturbations defined above and $O_t$ is the covariance of the observation noise, also computed according to the perturbations drawn from the normal distribution described above. The matrix $H$ relates the state vectors to the observations (the so-called "observation model"). It can be shown through matrix algebra that Eq. (4) amounts to differentiating the analysis error variance with respect to the gain and setting the derivative equal to zero, i.e. $K_t$ minimises the analysis error variance.
Once the Kalman gain is computed, it is used to weight the innovation $z_t - H X^-$ (the difference between the observation $z_t$ and its model-predicted value) relative to the a priori estimate of the state variables $X^-$, according to

$$X^+ = X^- + K_t \left( z_t - H X^- \right). \tag{5}$$

This leads to the updated model states, $X^+$.
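A compact NumPy sketch of the analysis step of Eqs. (4) and (5) follows. It uses the ensemble sample covariance for M_t and one perturbed observation per member (the perturbed-observation form of the EnKF). The one-dimensional toy state, in which the state itself is observed (H = 1), is purely illustrative; HYDROTEL's actual state vector and observation operator are not described here.

```python
import numpy as np

def enkf_update(X_prior, Z, H, O):
    """One EnKF analysis step with perturbed observations.

    X_prior: (n_state, n_members) prior ensemble X^-.
    Z:       (n_obs, n_members) perturbed observations, one column per member.
    H:       (n_obs, n_state) observation operator.
    O:       (n_obs, n_obs) observation-noise covariance.
    """
    n_members = X_prior.shape[1]
    anomalies = X_prior - X_prior.mean(axis=1, keepdims=True)
    M = anomalies @ anomalies.T / (n_members - 1)   # sample estimate of M_t
    S = H @ M @ H.T + O                              # H M_t H^T + O_t
    # Kalman gain K_t = M_t H^T S^-1, via a linear solve (S and M are symmetric)
    K = np.linalg.solve(S, H @ M).T
    return X_prior + K @ (Z - H @ X_prior)           # X^+ = X^- + K_t (z_t - H X^-)

rng = np.random.default_rng(42)
X_prior = 10.0 + 2.0 * rng.standard_normal((1, 200))   # toy prior ensemble
H = np.array([[1.0]])
O = np.array([[0.25]])
Z = 20.0 + 0.5 * rng.standard_normal((1, 200))         # toy perturbed observations
X_post = enkf_update(X_prior, Z, H, O)
print(X_prior.mean(), X_post.mean())
```

With a confident observation (small O relative to the ensemble variance), the gain is close to 1 and the updated ensemble moves most of the way toward the observation while its spread shrinks.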
The next section adapts the general framework presented in Sect. 2 to the specifics of the Montmorency watershed.

Parametrisation of the economic model
The preferences of a risk-averse decision maker represented by a CARA utility function can be expressed as follows:
Strictly speaking, the streamflow value $Q_m$ associated with category $m$ has a probability of occurrence $p_m$ and corresponds to a given damage $d(Q_m)$. In this study, the damage curve is broken down into 12 categories (i.e. $m = 1, \ldots, 12$). This choice of 12 categories is based on a previous hydraulic study by Leclerc and Secretan (2012) to establish inundation maps. They produced 11 maps for streamflow values varying from 550 to 1050 m³ s⁻¹ in increments of 50 m³ s⁻¹. This 50 m³ s⁻¹ increment is adopted here, but all thresholds were reduced to agree with the streamflow values that induced inundation (see also the operational thresholds in Table 1). The first category represents the "no flood" situation (i.e. streamflow below the lowest threshold).

S. Matte et al.: Moving beyond the cost-loss ratio
Then, Q_m represents the streamflow associated with the mth category, and p_m becomes the probability associated with this category, inferred from the number of ensemble members that fall within it. Given s, the amount of money spent (w days ahead; see Sect. 4.3 below) on flood emergency measures, the resulting gain (or benefit) in terms of damage reduction is given by b(d(Q_m), s, w).
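The classification of ensemble members into damage categories, and the resulting probabilities p_m, can be sketched as below. The threshold values are hypothetical placeholders (the 550 to 1050 m³ s⁻¹ flows of the inundation maps), since the lowered thresholds actually used are not listed here; `category_probabilities` is an illustrative helper.

```python
import numpy as np

def category_probabilities(members, thresholds):
    """Relative frequency p_m of ensemble members in each damage category.

    Category 1 ("no flood") holds members below thresholds[0]; the last
    category holds members at or above thresholds[-1].
    """
    members = np.asarray(members, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    # digitize gives 0 below the first threshold and len(thresholds) above the last
    idx = np.digitize(members, thresholds)
    counts = np.bincount(idx, minlength=thresholds.size + 1)
    return counts / members.size

# Hypothetical thresholds: the 11 map flows (550 to 1050 m3/s, step 50)
thresholds = np.arange(550.0, 1051.0, 50.0)
p = category_probabilities([100.0, 560.0, 560.0, 1200.0], thresholds)
print(p)   # 12 probabilities summing to 1
```

With 11 thresholds, `np.digitize` yields indices 0 to 11, i.e. the 12 categories, index 0 being the "no flood" category.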
While Q_m and p_m are derived directly from the ensemble forecast, d, s and b(d(Q_m), s, w) must be calibrated from other sources of information related to actual operations and decision history. This can be a challenge, but fortunately, in the case of the Montmorency River, a record of citizen evacuations and corresponding spending for the 2014 flood was available. Although incomplete, this record guides the estimation of d, s and b(d(Q_m), s, w).
In this context, the cost of implementing and operating the forecasting system as such is not included in s. Of course, when the civil security service chooses which forecasting system to put in place, it must consider the cost of implementing this particular system. Nevertheless, once the system is in place, its cost should not affect precautionary spending decisions. This also motivated the choice of CARA utility functions, since the spending decisions they imply do not depend on "wealth" (which would be affected by the cost of producing the forecast).
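The wealth independence invoked above is a standard property of CARA preferences. A short derivation, writing μ(x) = −e^{−Ax} with A > 0 (one admissible normalisation), W for an arbitrary wealth level and x_m(s) for the net outcome in category m given spending s (notation introduced only for this sketch):

$$\sum_{m} p_m\,\mu\big(W + x_m(s)\big) = -\sum_{m} p_m\,e^{-AW} e^{-A x_m(s)} = e^{-AW} \sum_{m} p_m\,\mu\big(x_m(s)\big)$$

Since e^{−AW} > 0 does not depend on s,

$$\arg\max_{s} \sum_{m} p_m\,\mu\big(W + x_m(s)\big) = \arg\max_{s} \sum_{m} p_m\,\mu\big(x_m(s)\big),$$

so a change in W, such as paying for the forecasting system, leaves the optimal spending unchanged.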

Level of risk aversion A
Risk aversion A is an intrinsic characteristic of each person or organisation and could be estimated, given a sufficiently long record of decisions and associated spending. However, in the present study, A was left free for two reasons. First, the available data are not sufficient to credibly calibrate A. Second, as one of the goals of this study is to illustrate how risk aversion influences the value of a forecasting system for a particular problem, it is logical to cover a range of possible values of A, including the risk-neutral situation A = 0. Therefore, A was made to vary from 0 to 0.01. Although these represent relatively small levels of risk aversion (e.g. Babcock et al., 1993), preliminary tests have shown that, in the context of this paper, these values were sufficient to produce a change in the decision maker's spending decisions and therefore in the economic value of the concurrent forecasting frameworks. Negative values of A were not considered, as they represent a risk-seeking decision maker, which is unrealistic in the context of flood mitigation.

Damages d, spending s and damage reduction b
The material damages to houses and property associated with flood events can be estimated using the flow-damage curve established by Leclerc et al. (2001). This curve is based on a survey of the types of houses in the sector (one or two storeys, with or without a basement, etc.) and of their value according to the municipal evaluation. The levels of submersion for different streamflow values were obtained through hydraulic simulations. The damage is then deduced from this level of submersion using Gompertz' law (Gompertz, 1825). Expressed in dollars, the damage rises exponentially with observed streamflow (in m³ s⁻¹) and ranges from $0 to $375 000.
In this study, the benefit function is parametrised in terms of d(m), the flow-damage curve (Leclerc et al., 2001) for the mth category, and two parameters, β_w and ψ. This particular parametrisation assumes that the benefit of spending is linear until all damages are avoided.
It also implies that it is never optimal to spend more than max_m{ψ · d(m)}, since additional spending brings no additional benefit for any possible forecast member. The parameter β_w was initially calibrated by assuming ψ = 1. By comparing the total amount of money spent in 2014 to alleviate flood damages with the damages (in dollars) predicted by the aforementioned damage curve using the observed streamflow, it was found that the calibrated β_w was less than 1. This implies that the civil security service would have spent more than the total amount of possible damage.
This therefore implies the existence of intangible benefits associated with having a flood warning system and spending money to mitigate flood effects. According to Lave and Lave (1991) and Carsell et al. (2004), these intangible benefits include, but are not limited to, not putting people's health and safety at risk, reducing stress for the population, and building a feeling of trust towards the authorities. In the case of the Montmorency River, there has never been any loss of life. However, as mentioned earlier, people may refuse to leave their residences and become isolated from connecting roads, restricting their access to services and medical care. Unfortunately, it is very difficult and probably rather imprudent to associate a definite cost with intangible benefits such as "reducing stress". In the absence of a better alternative, a multiplying factor ψ was applied in this study to the damage curve to account for those intangible benefits, as suggested in Van Dantzig and Kriens (1960). The parameter ψ was made to vary between 1.5 and 10, and β_w was recomputed for each value of ψ, as the damage curve is modified. The lower limit of ψ was set so that the money spent during the 2014 flood (C. Pigeon, personal communication, 2015) equals the damage predicted by the damage curve. Therefore, in this framework, the damage curve of Leclerc et al. (2001), d(m), mostly represents the relationship between streamflow and its impact on the lives and well-being of people.
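The spending decision implied by Eq. (6) can be sketched numerically. Because the exact expressions of the utility and benefit functions are not reproduced here, this sketch makes explicit assumptions: the CARA utility is normalised as μ(x) = (1 − e^{−Ax})/A, which tends to μ(x) = x as A → 0; the net outcome in category m is b − ψ·d(m) − s; and the benefit is taken as b = min(β·s, ψ·d(m)), i.e. linear until all damage is avoided. Function names and numbers are illustrative only.

```python
import numpy as np

def cara(x, A):
    """CARA utility (1 - exp(-A x)) / A; reduces to x when A = 0 (risk neutral)."""
    x = np.asarray(x, dtype=float)
    if A == 0.0:
        return x
    return -np.expm1(-A * x) / A

def expected_utility(s, p, damage, A, beta, psi):
    """Ex ante expected utility of spending s (assumed form, not the paper's Eq. 6)."""
    d = psi * np.asarray(damage, dtype=float)   # damage per category, scaled by psi
    benefit = np.minimum(beta * s, d)            # assumed linear-then-capped benefit
    return float(np.sum(np.asarray(p, dtype=float) * cara(benefit - d - s, A)))

def optimal_spending(p, damage, A, beta, psi, n_grid=2001):
    """Grid search for the spending level that maximises expected utility."""
    s_max = psi * np.max(damage) / beta          # beyond this, spending buys nothing
    grid = np.linspace(0.0, s_max, n_grid)
    values = [expected_utility(s, p, damage, A, beta, psi) for s in grid]
    return float(grid[int(np.argmax(values))])

# Two equiprobable categories: no damage, or 10 units of damage (illustrative)
p, damage = [0.5, 0.5], [0.0, 10.0]
print(optimal_spending(p, damage, A=0.0, beta=1.5, psi=1.0))   # 0.0
print(optimal_spending(p, damage, A=0.5, beta=1.5, psi=1.0))   # close to 5.74
```

In this toy case each dollar returns 1.5 only if the flood occurs (probability 0.5), i.e. 0.75 in expectation, so a risk-neutral decision maker spends nothing, whereas a sufficiently risk-averse one buys partial insurance.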

Warning time and dynamic decision-making
According to the US Army Corps of Engineers (1994), as well as to Richardson (2000) and Roulin (2007), the costs of emergency measures or benefits thereof are related to warning time w.In particular, Roulin (2007) assumes that early action can reduce the total cost of emergency measures and maximise damage reduction.Carsell et al. (2004) also provide an evaluation of residential content (furniture, food, electric appliances, etc.) that can be protected with a given warning time.
However, the accuracy of forecasts is inversely related to lead time, and the decision maker might want to wait for better information before taking a decision.
Those considerations go far beyond the objective of this study, and the formalisation of an explicit dynamic decision process is left for further research.In this study, the dynamic nature of the problem is addressed by assuming that the decision maker uses the following myopic decision-making procedure.
1. At the beginning of each day, the decision maker receives a 5-day forecast.
2. Iteratively, and starting with the earliest (5-day) forecast, the decision maker chooses their preferred level of spending. This level of spending is chosen so as to maximise Eq. (6).
3. The decision maker is constrained (by external factors such as the availability of materials or labour force) to spend at most a fraction δ of their preferred level of spending s (see below).
The benefits of spending are assumed to take effect from the day the spending decision is made up until the forecast date. For example, if a decision maker spends $1000 on a given Monday, anticipating a flood the following Thursday (i.e. a 4-day forecast), then any damage occurring prior to Thursday is also reduced (by β_w × $1000).
The parameter β_w is distributed across lead times of 5 to 1 days according to [2, 1.75, 1.5, 1.25, 1] × β_2014, where β_2014 is calibrated on the spending decisions of 2014 and represents the baseline ratio of gain per dollar invested. This multiplication therefore assumes that early actions lead to higher gains per dollar spent. This is very similar to the methodology presented in Sect. 4.3 of Roulin (2007), except that only one distribution of β_w across lead times is tested here, compared with two in Roulin (2007).
If the decision maker is to take successive actions at different lead times according to the forecasted streamflow, then the total amount of available money can be spread across lead times. The decision maker can, for instance, spend all the available money 2 days prior to the event, or spend half 2 days prior to the event and the remaining half the day before the flood (1 day). To account for this, five different "spending vectors" were created (Table 2). The values in these spending vectors represent the maximal fraction δ of the preferred level of spending s that can be spent at each lead time. The first three spending vectors represent situations in which there is no limit on the spending that can be made the day before, with spending vector number 3 representing the extreme case where the decision maker must wait until the 1-day forecast before spending any money. By contrast, spending vectors 4 and 5 represent a fictitious situation in which the decision maker can spend any amount of money at the 5-day horizon, and no spending is allowed the day before (1 day).
Table 2. Maximum fraction of total spending s allowed, depending on the forecasting horizon. Each spending vector is identified by an identification number (ID) for further reference. Columns: ID number; maximum fraction of spending allowed at Day 5, Day 4, Day 3, Day 2 and Day 1. Spending vectors 1 to 3 are grouped under "No limit for a 1-day forecast".
It is important to note that, owing to the myopic decision-making procedure, the decision maker does not take into account, when making a decision, the fact that money is spread across lead times. This effect alone underestimates the value of early spending. However, the decision maker also does not consider the reduction in uncertainty gained by waiting (which overestimates the value of early spending). In this study, these two effects are assumed to balance each other.
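The myopic procedure, together with the lead-time multipliers of β_w and a spending vector, can be sketched as follows. This is an illustration under assumptions: `preferred_spending` stands for any routine returning the utility-maximising spending for a given lead time and β_w (for instance a maximiser of Eq. 6), and the spending vector in the example, which allows spending only at the 1-day horizon, mirrors the description of spending vector 3 rather than the exact values of Table 2.

```python
from typing import Callable, Sequence

LEAD_TIMES = (5, 4, 3, 2, 1)                     # days, earliest forecast first
BETA_MULTIPLIERS = (2.0, 1.75, 1.5, 1.25, 1.0)   # multiples of beta_2014 given in the text

def myopic_spending(
    preferred_spending: Callable[[int, float], float],
    delta: Sequence[float],
    beta_2014: float,
) -> list[float]:
    """Amount spent at each lead time under the myopic procedure."""
    spent = []
    for w, mult, frac in zip(LEAD_TIMES, BETA_MULTIPLIERS, delta):
        beta_w = mult * beta_2014
        s_pref = preferred_spending(w, beta_w)   # chosen without looking ahead
        spent.append(frac * s_pref)              # external constraint: at most delta * s
    return spent

# Spending allowed only at the 1-day horizon (cf. the description of spending vector 3)
wait_until_day_1 = (0.0, 0.0, 0.0, 0.0, 1.0)
print(myopic_spending(lambda w, b: 100.0, wait_until_day_1, beta_2014=0.8))
# [0.0, 0.0, 0.0, 0.0, 100.0]
```

With a made-up β_2014 = 0.8, $1000 spent at the 4-day horizon would reduce damage by 1.75 × 0.8 × $1000 = $1400, against $800 if the same amount were spent at the 1-day horizon.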
To summarise, the simulation procedure is as follows.
2. Given the spending decision of 2014, infer the value of β 2014 (given the decision model).
3. Given A, ψ, β_2014 and the other model parameters, apply the decision-making procedure described in Sect. 4.3 to each forecast.

5 Performance assessment

Forecast quality
The three forecasting systems described in Sects. 3.2 and 3.3 are compared with each other by assessing their respective abilities to forecast the observed streamflow values at lead times of 1 to 5 days. This assessment uses the well-known Continuous Ranked Probability Score (CRPS; Matheson and Winkler, 1976) and reliability diagrams (Stanski et al., 1989).
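For an ensemble, the CRPS can be computed directly from the members with the well-known kernel form CRPS = E|X − y| − ½E|X − X'|, where X and X' are independent draws from the ensemble and y is the observation. The sketch below assumes equiprobable members and is illustrative; it is not necessarily the estimator used in this study.

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of the empirical distribution of equiprobable ensemble members."""
    x = np.asarray(members, dtype=float)
    spread_term = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return float(np.mean(np.abs(x - obs)) - spread_term)

def mean_crps(ensembles, observations):
    """Average CRPS over a record of forecasts for one lead time."""
    return float(np.mean([crps_ensemble(e, y) for e, y in zip(ensembles, observations)]))

print(crps_ensemble([3.0], 5.0))        # 2.0, the absolute error of a single member
print(crps_ensemble([0.0, 2.0], 1.0))   # 0.5
```

Lower is better; for a single-member forecast the CRPS reduces to the absolute error, which is what makes it a convenient common yardstick for dressed and ensemble forecasts.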

Evaluating the benefits of forecasts
As described in the Introduction, the usefulness of an early flood warning system is in helping the decision maker choose the best spending level s, prior to the event.The value of such a system is therefore closely related to the decision maker's ability to affect the outcome through their spending decisions.The benefits of forecasts are therefore evaluated with an explicit concern for the decision maker's preferences.
In order to develop an indicator of the economic benefits of a forecast, it is important to distinguish between the decision maker's ex ante utility (before the uncertainty is resolved, as in Eq. 6) and their ex post utility (the realised level of utility, after the uncertainty is resolved). This distinction matters because spending decisions are based on the ex ante utility, whereas the value of the forecasts is based on the (expected) ex post utility, conditional on spending decisions. Given the spending decision s and the realised state m, the ex post utility of the decision maker is given by Eq. (8), where s_f is the total amount of money spent following a decision based on forecasts (f). The value of this ex post utility depends, of course, on the realised streamflow values. In order to obtain a sensible evaluation of the decision maker's utility, one must therefore consider the average ex post utility, E_m[U_m(s)], where the expectation E_m is taken with respect to the historical streamflow values. Strictly speaking, the history under consideration should be long enough to be representative of the true distribution of streamflow. On the one hand, a longer record is expected to provide a better empirical estimate of the true streamflow distribution. On the other hand, various sources of non-stationarity can affect the observed streamflow values over time (e.g. changes in the measurement apparatus, climate change, land-use change). Hence, even with a very long historical record, the true distribution of streamflow cannot be known with certainty. This also affects measures of quality, such as the CRPS.
The average ex post utility can be computed for any of the three forecasting systems described in Sects.3.2.2 and 3.3, but also for two special cases: perfect forecasts and no forecasts.On the one hand, if forecasts were perfect, there would be no missed events and the decision maker would spend only the exact amount of money necessary to obtain the maximum possible protection, as early as time allowed.On the other hand, if no forecasts were available, there would be no decisions to be made and no money to be spent on flood mitigation and protection measures.Therefore, the maximum amount of damage would occur for each flood event.
It is important to note that utility only represents the preferences of a person faced with a decision-making problem, given some information from uncertain forecasts; in the expected-utility framework used here, it is defined only up to a positive affine transformation. That is, utility levels can be compared, but the actual value of the decision maker's utility has no interpretation. Consequently, the utility values computed for the three forecasting systems can be scaled relative to the utility of a perfect forecasting system. This simplifies the interpretation without imposing any additional restriction.
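Scaling relative to a perfect forecast is a positive affine transformation, so it preserves the ranking of forecasting systems. The sketch below assumes that realised net outcomes (benefit minus damage minus spending) are already available for each time step, and reuses an illustrative CARA normalisation, μ(x) = (1 − e^{−Ax})/A; it computes the average ex post utility E_m[U_m(s)] of each system and subtracts that of perfect forecasts. All names and numbers are hypothetical.

```python
import numpy as np

def cara(x, A):
    """CARA utility (1 - exp(-A x)) / A, equal to x when A = 0."""
    x = np.asarray(x, dtype=float)
    return x if A == 0.0 else -np.expm1(-A * x) / A

def mean_ex_post_utility(outcomes, A):
    """Average realised utility over the historical record."""
    return float(np.mean(cara(outcomes, A)))

def utility_relative_to_perfect(outcomes_by_system, outcomes_perfect, A):
    """Mean ex post utility of each system minus that of perfect forecasts (typically <= 0)."""
    u_perfect = mean_ex_post_utility(outcomes_perfect, A)
    return {name: mean_ex_post_utility(o, A) - u_perfect
            for name, o in outcomes_by_system.items()}

systems = {"dressed": [-1.0, -2.0], "ensemble": [-0.5, -4.0]}
perfect = [-1.0, -1.0]
print(utility_relative_to_perfect(systems, perfect, A=0.0))
# {'dressed': -0.5, 'ensemble': -1.25}
print(utility_relative_to_perfect(systems, perfect, A=0.5))
```

With A = 0.5 the gap between the two hypothetical systems widens, because the single large loss of the second system is penalised more heavily.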
The hit rate and the overspending index, two standard measures of the economic performance, are also presented.
The hit rate, given by Eq. ( 9), is the ratio of avoided damages when decision-making is based on the forecasting system being evaluated to the damages that would be avoided if the forecasts were perfect (always equal to the observations).
where s_p is the amount of money that would have been spent if perfect forecasts had been available, and s_f is the total amount of money spent when decisions are based on forecasts, as in Eq. (8). The amount s_p matches exactly the damages corresponding to the observed streamflow, for all time steps. Overspending is defined as in Eq. (10); it measures how much the forecasting system being evaluated overspends (in percent) compared with perfect forecasts. Overspending should be as low as possible.
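Because Eqs. (9) and (10) are not reproduced here, the sketch below encodes one plausible reading of the verbal definitions, which is an assumption: the hit rate as the ratio of the total damage avoided with the evaluated forecasts to the total damage avoided with perfect forecasts, and overspending as the excess of total spending s_f over the perfect-forecast spending s_p, in percent.

```python
import numpy as np

def hit_rate(avoided_forecast, avoided_perfect):
    """Damage avoided using the forecasts, as a share of that avoided with perfect forecasts."""
    return float(np.sum(avoided_forecast) / np.sum(avoided_perfect))

def overspending_pct(spent_forecast, spent_perfect):
    """Total spending s_f in excess of the perfect-forecast spending s_p, in percent."""
    total_p = float(np.sum(spent_perfect))
    return 100.0 * (float(np.sum(spent_forecast)) - total_p) / total_p

# Two flood days (illustrative numbers)
print(hit_rate([5.0, 0.0], [10.0, 10.0]))              # 0.25
print(overspending_pct([300.0, 0.0], [100.0, 100.0]))  # 50.0
```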
Results are presented in the next section.
6 Results
Figure 4 shows that, for 1-day forecasts, forecasts based on meteorological ensembles generally have low spread. This is expected, as only the forcing uncertainty is accounted for, and this uncertainty takes more than 1 day to propagate through the hydrological model. In addition, at short lead times the members of meteorological ensemble forecasts are often very similar. However, before each of the two flood peaks, they display more dispersion than the dressed forecasts. The influence of the EnKF can also be seen: the spread of the forecasts with EnKF is greater than that of the forecasts without EnKF, and the density of forecast members is higher around the observed streamflow. At the 5-day lead time, some members of the forecasts based on meteorological ensembles reach very high streamflow values. This is not the case for the dressed deterministic forecasts, which often underestimate streamflow.

Assessment of hydrological forecasts relative to observations
Figure 5 presents the mean CRPS of the three concurrent forecasting systems over the 2011–2014 period. The CRPS was computed separately for each lead time, in 3 h increments, and averaged over the entire record of forecasts and corresponding observations. For very short lead times, the dressed deterministic forecasts outperform those based on meteorological ensembles (lower CRPS). As noted above, for short lead times the members of the meteorological ensemble forecasts are often very similar, so these forecasts have almost no dispersion. Dressed forecasts, by construction, have more spread. Since no forecasting system is perfect, an ensemble with very low spread is at risk of missing the observation. However, for lead times longer than 18 h, forecasts based on meteorological ensembles achieve a better (lower) CRPS than dressed forecasts, despite the jumpy behaviour of the ensemble curves compared with that of the dressed forecasts. Furthermore, the performance gap between meteorological ensemble-based forecasts and dressed forecasts increases with lead time.
The perturbation of state variables after manual data assimilation increases (worsens) the CRPS. This is likely attributable to a loss of resolution. Although sharpness, resolution and reliability are all desirable attributes of a forecasting system, there is most often a trade-off between resolution and reliability. Sharpness is akin to "precision" and refers to the tendency of a forecasting system to issue forecast members that lie close together. Resolution is the ability of the forecasting system to distinguish between different situations. Indeed, Fig. 6 shows that forecasts based on meteorological ensembles with perturbed state variables display better reliability than those with unperturbed state variables. The difference is most striking for 1-day forecasts. Figure 6 also shows that dressed deterministic forecasts are more reliable than forecasts based on meteorological ensembles for short lead times (e.g. 1-day, hollow circles), but less so for longer lead times (e.g. 5-day, hollow triangles). As lead time increases, the accuracy of meteorological forecasts decreases. However, the spread of forecasts based on meteorological ensembles increases considerably with lead time, so that these forecasts include the observed values more often at the 5-day lead time than at the 1-day lead time.
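A reliability diagram for a given event (here, streamflow exceeding a threshold) can be built by binning the forecast probabilities, taken as the fraction of members above the threshold, and comparing each bin's mean forecast probability with the observed frequency of the event. The sketch below is a generic construction; the threshold, the number of bins and the function name are assumptions, and it is not necessarily how the diagrams in this study were produced.

```python
import numpy as np

def reliability_curve(ensembles, observations, threshold, n_bins=5):
    """Points of a reliability diagram for the event "streamflow > threshold".

    ensembles    : (n_forecasts, n_members) array of ensemble members
    observations : (n_forecasts,) observed streamflow
    Returns (mean forecast probability, observed frequency, count) per bin.
    """
    ens = np.asarray(ensembles, dtype=float)
    obs = np.asarray(observations, dtype=float)
    prob = np.mean(ens > threshold, axis=1)   # forecast probability of the event
    event = obs > threshold                   # whether the event occurred
    # Bin index 0..n_bins-1; a probability of exactly 1 goes to the last bin
    idx = np.minimum((prob * n_bins).astype(int), n_bins - 1)
    mean_p, freq, count = [], [], []
    for b in range(n_bins):
        sel = idx == b
        count.append(int(sel.sum()))
        mean_p.append(prob[sel].mean() if sel.any() else np.nan)
        freq.append(event[sel].mean() if sel.any() else np.nan)
    return np.array(mean_p), np.array(freq), np.array(count)
```

For a perfectly reliable system the points lie on the diagonal; over-forecasting shows up as observed frequencies below the mean forecast probabilities.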

Assessment of hydrological forecasts in terms of economic value
For each of the simulated values of A and ψ, each spending vector (cf. Table 2) was tested over the study period (2011–2014). This section describes the simulation procedure. An example of the applied methodology and corresponding results is provided in Fig. 7. The upper row shows 5-day forecasts from the three systems, starting on 17 May 2014. The lower row shows how each member of each forecast is classified into 12 severity classes, ranging from non-damaging (class 1) to centennial-scale flooding (class 12), defined from the damage curve.
The utility function (Eq. 6) is used successively with the five spending vectors presented in Table 2. The probabilities p_m (m = 1, ..., 12) in Eq. (6) are the relative frequencies of each category after classification of the forecast members; they allow the utility to be computed as a function of the money spent. The maximum of the utility curve gives the optimal spending associated with each forecast. Figure 8 illustrates an example for A = 0.01 and ψ = 7.
Figure 9 presents the utility, hit rate and overspending as a function of parameter ψ for the three flood forecasting systems under study, for various levels of risk aversion and for spending vector number 1 (see Table 2). Note that A = 0 corresponds to the case of a risk-neutral decision maker. Negative risk aversion values, which represent risk-seeking behaviour, were not used. As mentioned in Sect. 5.2, any positive affine transformation of the utility function is admissible. In Fig. 9, the utility of a perfect forecast was subtracted from the utility of each concurrent forecasting system and from that of the "no forecast" situation, so that the y-axis of the utility plots starts at 0 and provides a common reference. This figure shows that a risk-neutral decision maker prefers having information from forecasts based on meteorological ensembles (with or without EnKF) to having no forecasts. However, for higher levels of risk aversion (A = 0.01, bottom row of Fig. 9), forecasts are of no use for low values of ψ.
Although this seems counter-intuitive, it can easily be explained by looking at the hydrographs (cf. Fig. 4). Forecasts based on meteorological ensembles, in particular those using EnKF, tend to generate members with very high streamflow levels. As risk aversion increases, the decision maker puts more weight on those members, as the associated damage is considerable. This causes the decision maker to spend large amounts of money to "insure" against the potential damage.
As such high streamflow levels are historically rare for the Montmorency River, the decision maker would have been better off not spending any money and suffering damage during the relatively rare and comparatively small flood events. The "usual" flood events for the Montmorency River are not as dramatic as those predicted by the most extreme scenarios of the predictive distribution. However, a risk-averse decision maker attributes large weights to those extreme scenarios. This encourages the decision maker to spend large amounts of money to mitigate events that in fact never materialise.
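The weight that a risk-averse decision maker places on a few extreme members can be quantified with the CARA certainty-equivalent loss, (1/A)·ln E[e^{A·L}], which equals the expected loss when A = 0 and grows with A. In the hypothetical ensemble below, 49 of 50 members forecast no damage and one member forecasts a loss of 100 (arbitrary units, e.g. thousands of dollars); the numbers are illustrative only.

```python
import numpy as np

def certainty_equivalent_loss(losses, A):
    """Sure loss judged equivalent to the ensemble loss by a CARA decision maker.

    Equals the mean loss for A = 0 and (1/A) * log(mean(exp(A * L))) for A > 0.
    """
    losses = np.asarray(losses, dtype=float)
    if A == 0.0:
        return float(losses.mean())
    m = losses.max()   # shift by the largest loss to avoid overflow in exp
    return float(m + np.log(np.mean(np.exp(A * (losses - m)))) / A)

# Hypothetical 50-member ensemble: one extreme member forecasts a loss of 100
losses = [0.0] * 49 + [100.0]
for A in (0.0, 0.01, 0.1):
    print(A, round(certainty_equivalent_loss(losses, A), 2))
# the certainty-equivalent loss rises from 2.0 (the mean) to about 3.38, then about 60.9
```

A single outlying member thus dominates the decision as A grows, which is why frequent over-forecasting in the upper tail erodes the value of the ensemble systems for a risk-averse user.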
The utility of the dressed deterministic forecasts decreases only weakly with ψ, compared with that of the ensemble forecasts. Put differently, for large amounts of material damage, the dressed deterministic forecasts have much higher value than the ensemble forecasts. This is because, at all lead times, the ensemble forecasts include members with "unrealistic" streamflow values. This over-forecasting is exacerbated for high values of material damage and a high level of risk aversion. As the concavity of μ increases (due to an increase in the level of risk aversion A), "bad shocks" are weighted more heavily by the decision maker, leading to considerable levels of (over-)spending.
Figure 10. Utility, hit rate and overspending as a function of parameter ψ for the three flood forecasting systems for various levels of risk aversion of the decision maker, when the decision maker is allowed to spend an increasing fraction of the total available money as the lead time shortens.
The same effect can be seen for alternative choices of spending vectors. Figure 10 shows the same quantities (utility, hit rate and overspending) as a function of ψ, for the same forecasts, but for spending vector number 2. With this spending vector, the decision maker cannot spend any money 5 days ahead and can then progressively spend a greater percentage of the available money as the lead time decreases. In such a case, a slightly risk-averse decision maker (A = 0.001) should prefer to have access to forecasts based on meteorological ensembles rather than having no forecast. This is explained by the fact that the 5-day forecast (which contains extreme forecast members, cf. Fig. 4) is not used by the decision maker, which limits overspending.
Finally, a more risk-averse decision maker (A = 0.01) should prefer the dressed forecasts over any other forecasting system for ψ values above 6. This is again mostly attributable to some members of the ensemble systems frequently forecasting flood events that do not materialise, as confirmed by the overspending graphs on the right-hand side of Fig. 10. In other words, the optimal level of spending s in Eq. (6) is lower for the dressed forecasts than for the other forecasting systems.
When ψ becomes very large (very large damages), the utility of the "no forecast" framework decreases rapidly, especially for a more risk-averse decision maker. Then, even if the decision maker generally overspends, all forecasts are preferred to the "no forecast" situation, since the damage associated with a flood event is considerable. For high values of ψ, the spending decision effectively acts as a (valuable) insurance policy. The hit rate increases (slightly) with the level of risk aversion. This is expected, as a risk-averse decision maker attributes more importance to large streamflow values in the ensemble forecast.
The third column of Fig. 10 shows that a risk-averse decision maker would reduce their overspending by using a forecasting system based on dressed deterministic forecasts rather than on meteorological ensemble forecasts with or without EnKF. Dressed deterministic forecasts exhibit much less dispersion than EnKF forecasts, which also account for state variable uncertainty. As noted earlier, a risk-averse decision maker puts more weight on higher streamflow values in the ensemble. If the spread is large, the ensemble necessarily includes larger streamflow values. It is therefore not surprising that overspending is larger for the ensemble forecast with the larger spread, especially for high values of both A and ψ.
The results for the other spending vectors (cf. Table 2) are qualitatively similar and are therefore not presented here; they are available in the Supplement.
Figure 11 shows bar graphs of the relative frequency of each class of events, from 2 to 12, for the different forecasting systems and for the observations (see Sect. 6.2). The first class, which is the "no damage" class for low streamflow values, is not included. Over the 4-year period, there were 36 days of flooding in total. This figure shows that all three systems forecast floods more frequently than they should (according to the observed frequencies), and that this over-forecasting increases with the forecasting horizon. However, the frequencies computed from the dressed deterministic forecasts (a) are closer to the observed frequencies in each class. The difference between forecasts based on meteorological ensembles without EnKF (b) and with EnKF (c) lies in the representation of extreme events at the 1-day lead time: there are more such over-forecasted situations at this lead time when the EnKF is used as part of the forecasting system. This is sufficient for the EnKF forecasts to have lower economic value than the forecasts relying only on meteorological ensembles.

Discussion
Throughout this paper, the impact of risk aversion on the economic value of forecasts is assessed for a well-trained end-user. We find that risk-averse end-users mainly consider the less favourable scenarios (the upper tail of the predictive distribution, in the case of flood forecasting). Thus, although the members of the forecasts are truly equiprobable and presented as such to the end-user, they can still be weighted differently in his or her eyes. This is true for any level of risk aversion, but even more so for high levels of risk aversion. For example, Danhelka (2015) recounts: "The Minister simply asked me what the forecast for Prague was. After I have explained all the known information, forecasts and uncertainties, I gave him my best guess of the peak flow. But his response was 'No, no, no, give me the worst-case scenario; don't tell me numbers you cannot guarantee as not being exceeded'."
Therefore, any "outlier" leads to costly actions, and the forecasts become of low or null economic value if these outliers are frequent. A consequence is that forecasters may need to be especially careful with forecasts of high non-exceedance probabilities (i.e. the upper tail of the predictive distribution).
The "real" level of risk aversion of the decision maker for flood emergency measures along the Montmorency River remains unknown because the record of decisions and associated spending is insufficient. However, it can reasonably be assumed that the decision maker is highly risk-averse (C. Pigeon, personal communication, 2015). Considering A = 0.01 and Fig. 10, the dressed deterministic forecasts provide maximal utility. They have a lower hit rate but also a much lower level of overspending than the other forecasting systems. This leads to the conclusion that dressed forecasts have the highest economic value for this level of risk aversion.
However, this conclusion relies on the assumption that benefits are linear: as the level of damage (i.e. d(m)) increases, so does the spending needed to alleviate this damage. In a situation where human casualties are possible (resulting in extremely high values of ψ), the spending need not increase with the value of the alleviated damages d(m). For example, the cost of an evacuation is not linked to the (somewhat subjective) value associated with human casualties. These considerations are left for further research.
Our study also shows that forecast quality (as verified using metrics such as the CRPS) is not always a guarantee of forecast value in an economic sense.In this study, the streamflow forecasts based on meteorological ensembles have better CRPS than dressed deterministic forecasts, but their value according to the CARA utility function is lower.
In any case, it is essential to recall that the role of the forecaster is to issue the best possible streamflow forecast given their knowledge of the situation and the available models and data. It is the end-user's role to decide the course of action. In no way would we advocate that forecasters deliberately bias their forecasts for a certain user. Furthermore, in this paper we did not address potential cognitive biases and training issues for end-users, which are recognised in the literature (e.g. Ramos et al., 2013; Demeritt et al., 2010; Doswell, 2004). The training of end-users and continuous interaction with forecasters should be encouraged to favour optimal decision-making. However, since risk aversion is not a cognitive bias, even highly trained decision makers are expected to be risk-averse (cf. Fishburn, 1989; Krzysztofowicz, 1986).
Lastly, the decision-making process analysed in this study is a static one. It would be more realistic to analyse flood mitigation as a dynamic decision process. For instance, depending on their level of confidence in the 5-day forecast, a decision maker could decide to launch an evacuation alert and immediately spend all available funds on emergency measures. As stated in Roulin (2007), intuition suggests that preparing for a flood in advance could reduce overall spending compared with waiting until the last minute. This is also discussed by Morss (2010) in her analysis of three case studies of the interactions between flood forecasts, decisions and outcomes. She provides examples of the importance of early actions:
Key property-and life-saving decisions are often thought of as taking specific protective action immediately prior to or during an event.However, sometimes key decisions can be less evident and occur during earlier planning stages.For example, in Grand Forks, once officials had decided to expend most of their time, effort, and resources on planning and building primary dikes, they were not able to plan and build secondary dikes fast enough when the flood grew worse than expected.In the Pescadero case, if officials had not decided to position rescue crews and equipment before the flood began, they would not have been able to reach the area.
However, implementing a dynamic decision model also raises many more questions regarding how the total spending should be distributed among lead times. It is thus left for further studies.

Conclusions
The purpose of this study is to lay the foundations of an alternative framework to replace the cost-loss ratio in the economic assessment of early warning flood forecasting systems. This alternative framework is based on the Constant Absolute Risk Aversion (CARA) utility function, which is well known in economics. To the authors' knowledge, risk aversion is rarely, if ever, accounted for in hydro-economic assessments of flood warning systems. This new framework is used to compare the economic value of three concurrent streamflow ensemble forecasting systems for the flood-prone Montmorency River watershed in Quebec, Canada. This study concentrates on ensemble rather than deterministic forecasts, as the recent literature clearly states that ensemble forecasts are preferable to deterministic ones for numerous reasons (e.g. Krzysztofowicz, 2001; Jaun et al., 2008; Velazquez et al., 2010; He et al., 2013). Furthermore, real-life operations for the case study involved here (flood forecasting for the Montmorency River) do not involve deterministic forecasts. However, there exist many different means of constructing streamflow ensemble forecasts: dressed deterministic forecasts, single hydrological models fed with meteorological ensemble forecasts, multiple hydrological models, with or without data assimilation, etc. These different forecasting systems can be compared both in terms of their correspondence with observations and in terms of their value for an end-user.
Our results illustrate the importance of the decision maker's level of risk aversion in determining the economic value of a streamflow forecasting system. A risk-neutral decision maker, as assumed in the cost–loss ratio framework, is rarely, if ever, encountered in real-life decision problems. The value of a forecasting system strongly depends on the decision maker's level of risk aversion, and this parameter should be tailored to the end-user as closely as possible. The results also show that better forecast quality, as assessed by the CRPS or the reliability diagram, does not necessarily translate directly into greater economic value, especially if the decision maker is not risk-neutral. Frequent over-forecasting strongly reduces the economic value of forecasts. Over-forecasting can be corrected by adequate statistical post-processing of the predictive distributions. This was judged to be outside the scope of this study but could certainly be explored in further work; adequate post-processing would likely improve the value of forecasts.
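How risk aversion can reverse a decision is easiest to see through the CARA certainty equivalent. The sketch below compares two hypothetical options: doing nothing and facing a 10% chance of a 100-unit loss, or paying 15 units for protection. All numbers are invented for illustration and do not come from the Montmorency case study; the function name `certainty_equivalent` is ours.

```python
import math


def certainty_equivalent(outcomes, probs, a):
    """CARA certainty equivalent of a discrete lottery.

    outcomes: monetary outcomes (losses are negative), probs: their probabilities,
    a: absolute risk-aversion coefficient (a == 0 means risk-neutral).
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    if a == 0:
        # Risk-neutral limit: the certainty equivalent is the expected value.
        return sum(p * x for p, x in zip(probs, outcomes))
    # CE = -(1/a) * ln(sum_k p_k * exp(-a * x_k)), evaluated with the
    # log-sum-exp trick so large losses do not overflow exp().
    m = max(-a * x for x in outcomes)
    s = sum(p * math.exp(-a * x - m) for p, x in zip(probs, outcomes))
    return -(m + math.log(s)) / a


# Hypothetical options (units are arbitrary, e.g. thousands of dollars).
do_nothing = ([-100.0, 0.0], [0.1, 0.9])
protect = ([-15.0], [1.0])

for a in (0.0, 0.001, 0.01):
    ce_nothing = certainty_equivalent(*do_nothing, a)
    ce_protect = certainty_equivalent(*protect, a)
    choice = "protect" if ce_protect > ce_nothing else "do nothing"
    print(f"a={a}: CE(do nothing)={ce_nothing:.2f}, CE(protect)={ce_protect:.2f} -> {choice}")
```

With `a = 0`, doing nothing is preferred (certainty equivalent of -10 versus -15); with `a = 0.01`, the certainty equivalent of doing nothing falls to about -15.9, so paying for protection becomes the better choice even though its expected cost is higher.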
The decision-making framework presented here could be improved in several ways. Further studies could include a more detailed, dynamic decision-making process that formally accounts for the forecast horizon. Furthermore, a decision maker could lose confidence in a "bad" forecasting system, whereas the results presented in this paper implicitly assume that the decision maker's trust in the forecasts is absolute. Further studies could therefore include an explicit description of how the decision maker learns about the reliability of a forecast.
Data availability. The economic data used in this study (the spending record of Quebec City's civil security bureau) are confidential and cannot be made publicly available. Hourly meteorological observations can be purchased by contacting climat.quebec@ec.gc.ca. The Canadian meteorological ensemble forecasts can be retrieved from the TIGGE data set through ECMWF's MARS server. Data availability can be checked at http://apps.ecmwf.int/datasets/data/tigge/levtype=sfc/type=cf/. A request written as a Python script can then be sent to the MARS server from a UNIX terminal. Detailed explanations of how to write such a script can be found at https://software.ecmwf.int/wiki/display/WEBAPI/Access+ECMWF+Public+Datasets.
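As an illustration of such a script, the sketch below builds a MARS request dictionary and hands it to the `ECMWFDataServer` client from the `ecmwfapi` package documented at the page above. Several values are assumptions to be checked against the TIGGE catalogue before use: origin `cwao` for the Canadian centre, parameters 167 (2 m temperature) and 228228 (total precipitation), 20 perturbed members, 6-hourly steps up to 120 h, and a rough, hypothetical bounding box around the watershed. The helper names `build_tigge_request` and `retrieve` are ours, not part of any ECMWF API.

```python
def build_tigge_request(date, target, n_members=20, max_step_h=120):
    """Build a MARS request for TIGGE perturbed forecasts from the Canadian centre.

    Assumed values (verify against the TIGGE catalogue): origin "cwao",
    param 167 (2 m temperature) and 228228 (total precipitation), 6-hourly steps.
    """
    return {
        "class": "ti",
        "dataset": "tigge",
        "expver": "prod",
        "origin": "cwao",
        "type": "pf",  # perturbed members; "cf" would request the control forecast
        "levtype": "sfc",
        "param": "167/228228",
        "date": date,  # e.g. "2014-05-17"
        "time": "00:00:00",
        "step": "/".join(str(h) for h in range(0, max_step_h + 1, 6)),
        "number": "/".join(str(n) for n in range(1, n_members + 1)),
        "grid": "0.5/0.5",
        "area": "47.6/-71.5/46.8/-70.8",  # HYPOTHETICAL N/W/S/E box around the watershed
        "target": target,
    }


def retrieve(request):
    """Send the request to ECMWF (needs an ECMWF account and a ~/.ecmwfapirc key file)."""
    # Imported lazily so the request can be built without the client installed.
    from ecmwfapi import ECMWFDataServer

    ECMWFDataServer().retrieve(request)
```

A call such as `retrieve(build_tigge_request("2014-05-17", "cwao_pf_20140517.grib"))` would then download one forecast date; looping over dates keeps each request small, which the MARS queue generally handles better than one very large request.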

Figure 2. Monthly average values for (a) precipitation and (b) temperature for the Montmorency River watershed.

Figure 3. Geographical location of the Montmorency watershed. The black dots represent the available meteorological stations and the black square represents the streamflow gauging station.

Figure 4 displays hydrographs for a 2-week period during the spring of 2014. Panels (a), (c) and (e) correspond to 1-day forecasts, while panels (b), (d) and (f) correspond to 5-day forecasts. In all cases, the time step is 3 h. Forecasts in the upper row (a and b) are dressed deterministic forecasts. Forecasts in the middle row (c and d) are based on meteorological ensemble forecasts without the EnKF, while forecasts in the bottom row (e and f) are also based on meteorological forecasts but account for state variable uncertainty using the EnKF.

Figure 4. A portion of the 1-day (left) and 5-day (right) forecast hydrographs (3 h time step) in 2014 against the observed streamflow: (a) and (b) are dressed forecasts, (c) and (d) are forecasts based on meteorological ensembles without EnKF, and (e) and (f) are forecasts based on meteorological ensembles with state variable uncertainty estimated using the EnKF.

Figure 5. Mean CRPS as a function of lead time for the 2011-2014 period for the forecasts based on meteorological ensembles with (grey line) and without (dashed black line) state variable perturbations and for the dressed forecasts (solid black line).
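A common estimator of the CRPS for an ensemble forecast uses the kernel form CRPS = E|X - y| - 0.5 E|X - X'|, with expectations taken over the empirical distribution of the members. The sketch below implements that estimator and averages it over forecasts, as in Figure 5; it is an illustration under that assumption, and the study may have used a different CRPS estimator. The function names are ours.

```python
def crps_ensemble(members, obs):
    """CRPS of one ensemble forecast against one observation (lower is better).

    Kernel form: mean |x_i - y| minus half the mean |x_i - x_j| over all member pairs.
    """
    n = len(members)
    term1 = sum(abs(x - obs) for x in members) / n
    term2 = sum(abs(xi - xj) for xi in members for xj in members) / (2 * n * n)
    return term1 - term2


def mean_crps(forecasts, observations):
    """Average CRPS over paired forecasts and observations (e.g. all forecasts at one lead time)."""
    scores = [crps_ensemble(f, y) for f, y in zip(forecasts, observations)]
    return sum(scores) / len(scores)


# With a single member the CRPS reduces to the absolute error.
print(crps_ensemble([3.0], 5.0))       # 2.0
print(crps_ensemble([1.0, 3.0], 2.0))  # 0.5
```

Computing `mean_crps` separately for each lead time gives one point of a curve like those in Figure 5. The pairwise term costs O(n²) per forecast, which is negligible for ensembles of a few dozen members.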

Figure 6. Reliability diagrams as a function of lead time for (a) dressed deterministic forecasts, (b) forecasts based on meteorological ensembles and manual data assimilation, and (c) forecasts based on meteorological ensembles, manual data assimilation and additional perturbation of state variables.
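One common way to build a reliability diagram for an ensemble is to choose a streamflow threshold, take the fraction of members exceeding it as the forecast probability, bin those probabilities, and compare each bin's mean forecast probability with the observed exceedance frequency. The sketch below follows that construction; the diagrams in Figure 6 may have been built differently (for example from predictive quantiles), and the function name and bin count are our assumptions.

```python
def reliability_curve(forecasts, observations, threshold, n_bins=5):
    """Return (mean forecast probability, observed frequency, count) for each non-empty bin.

    The event is "streamflow exceeds threshold"; its forecast probability is the
    fraction of ensemble members above the threshold.
    """
    counts = [0] * n_bins
    hits = [0] * n_bins
    prob_sums = [0.0] * n_bins
    for members, obs in zip(forecasts, observations):
        p = sum(m > threshold for m in members) / len(members)
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        counts[k] += 1
        prob_sums[k] += p
        hits[k] += obs > threshold  # True counts as 1
    return [
        (prob_sums[k] / counts[k], hits[k] / counts[k], counts[k])
        for k in range(n_bins)
        if counts[k] > 0
    ]
```

Points below the diagonal (forecast probability higher than observed frequency) indicate over-forecasting of the event, which is the behaviour linked in the conclusions to lower economic value.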

Figure 7. Separation of forecast members into 12 categories according to streamflow magnitude. The example is for forecasts issued on 17 May 2014. (a) and (d): dressed deterministic forecasts; (b) and (e): meteorological ensemble-based forecasts; (c) and (f): meteorological ensemble + EnKF forecasts.
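Separating members into 12 categories amounts to placing each member between 11 ordered streamflow thresholds and counting the members in each class, which turns one ensemble forecast into a probability for each class. The sketch below assumes each member has already been summarized by a single value (for instance its peak flow over the forecast horizon); the thresholds used here are hypothetical placeholders, not the values of Table 1, and the function names are ours.

```python
import bisect


def classify(values, thresholds):
    """Class index (0..len(thresholds)) of each value.

    thresholds must be sorted ascending; class k holds values in [t_{k-1}, t_k).
    """
    return [bisect.bisect_right(thresholds, v) for v in values]


def class_probabilities(members, thresholds):
    """Fraction of ensemble members falling in each streamflow class."""
    counts = [0] * (len(thresholds) + 1)
    for k in classify(members, thresholds):
        counts[k] += 1
    return [c / len(members) for c in counts]


# Hypothetical thresholds (m3/s) giving 4 classes; 11 thresholds would give 12.
print(classify([5, 10, 15, 25, 35, 40], [10, 20, 30]))  # [0, 1, 1, 2, 3, 3]
```

Using `bisect_right` places a value equal to a threshold in the upper class, so each class is closed at its lower bound and open at its upper bound.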

Figure 8. Utility as a function of money spent for forecasts issued on 17 May 2014 for each of the three forecasting systems. Thin grey curves represent the utility of any decision given each of the 12 classes of events. Thick curves show the utility of each forecasting system. The maximum for each system is indicated by a diamond marker. Calculations are for A = 0.01 and ψ = 7.
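The thick curves in Figure 8 can be read as an expected CARA utility: for each spending level, the utility of the resulting wealth in each event class is weighted by the probability the forecast assigns to that class, and the spending that maximizes this sum is selected. The sketch below reproduces that logic with A = 0.01, but its mitigation model (residual damage decaying exponentially with spending), the class damages and the class probabilities are all hypothetical; it does not implement the paper's damage function or the role of ψ.

```python
import math


def cara_utility(wealth, a):
    """CARA utility U(w) = -exp(-a w); a == 0 falls back to the risk-neutral U(w) = w."""
    return wealth if a == 0 else -math.exp(-a * wealth)


def residual_damage(damage, spending, scale=50.0):
    """HYPOTHETICAL mitigation model: damage decays exponentially with spending."""
    return damage * math.exp(-spending / scale)


def expected_utility(spending, class_probs, class_damages, a):
    """Expected utility of one spending decision, weighted over event classes."""
    return sum(
        p * cara_utility(-spending - residual_damage(d, spending), a)
        for p, d in zip(class_probs, class_damages)
    )


def best_spending(class_probs, class_damages, a, grid):
    """Spending level in grid that maximizes expected utility (first one on ties)."""
    return max(grid, key=lambda s: expected_utility(s, class_probs, class_damages, a))


probs = [0.6, 0.3, 0.1]        # hypothetical class probabilities from one forecast
damages = [0.0, 50.0, 200.0]   # hypothetical damages per class (arbitrary units)
grid = range(0, 101)
print(best_spending(probs, damages, 0.0, grid))   # risk-neutral decision maker
print(best_spending(probs, damages, 0.01, grid))  # risk-averse, A = 0.01
```

In this toy setting the risk-neutral decision maker spends nothing, because each unit spent at zero avoids only 0.7 units of expected damage, while the risk-averse decision maker spends a positive amount to reduce exposure to the rare, costly class. The `class_probs` could come from counting members per class for each forecasting system.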

Figure 9. Utility, hit rate and overspending as a function of parameter ψ for the three flood forecasting systems and various levels of risk aversion of the decision maker, when spending is allowed at any lead time.

Figure 11. Relative frequencies of forecasts and observations corresponding to the classes of events used in the evaluation of damages, as a function of the forecasting horizon (1 to 5 days). (a) Dressed deterministic forecasts; forecasts based on meteorological ensembles (b) without and (c) with EnKF. Panels (d), (e) and (f) are identical and show the relative frequencies of the observations for the same classes.
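Comparing the class frequencies of pooled forecast members with those of the observations, as in Figure 11, gives a direct diagnostic of over-forecasting: a ratio above 1 in the upper classes means the system predicts large flows more often than they occur. The sketch below computes these ratios for one lead time; the thresholds are hypothetical, pooling all members equally is our assumption, and the function names are ours.

```python
import bisect
import math


def relative_frequencies(values, thresholds):
    """Fraction of values in each class delimited by sorted thresholds."""
    counts = [0] * (len(thresholds) + 1)
    for v in values:
        counts[bisect.bisect_right(thresholds, v)] += 1
    return [c / len(values) for c in counts]


def overforecast_ratios(forecast_members, observations, thresholds):
    """Forecast-to-observed frequency ratio per class for one lead time.

    Returns math.inf where a class is forecast but never observed,
    and None where it is neither forecast nor observed.
    """
    pooled = [m for members in forecast_members for m in members]
    f = relative_frequencies(pooled, thresholds)
    o = relative_frequencies(observations, thresholds)
    return [fk / ok if ok > 0 else (math.inf if fk > 0 else None) for fk, ok in zip(f, o)]


print(overforecast_ratios([[5, 25], [25, 25]], [5, 15], [10, 20]))  # [0.5, 0.0, inf]
```

Repeating the calculation for each horizon from 1 to 5 days shows whether over-forecasting of the upper classes grows with lead time.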

Table 1. Streamflow associated with important return periods and flood mitigation thresholds for the Montmorency River watershed.