Initial assessment of a multi-model approach to spring flood forecasting in Sweden



Introduction
In Sweden, seasonal (or long-term) hydrological forecasts are used primarily by the hydropower industry for dam regulation and production planning (e.g. Arheimer et al., 2011). The forecasts may be used to optimise the balance between a sufficiently large water volume for optimal power production and a sufficient remaining capacity to safely handle sudden inflows. In northern Sweden, the spring flood forecast is the most important seasonal hydrological forecast and it generally covers the main snowmelt period in May, June and July.
Traditionally, discharge and spring flood forecasting at seasonal time scales have been based on two approaches. The first utilises statistical relationships between accumulated discharge during the forecasting period and predictors such as snow water equivalent and accumulated precipitation that represent the hydrological state at the forecast date (e.g. Garen, 1992; Pagano et al., 2009). The other approach is based on a hydrological model, which is initialised with observed data up to the forecast issue date and then forced with historical meteorological inputs over the forecasting period (e.g. Day, 1985; Franz et al., 2003). In addition, hybrid approaches, applying model-derived information in the statistical regression, have been proposed (e.g. Nilsson et al., 2006; Rosenberg et al., 2011).
Recently, substantial progress has been made in the field of seasonal climate forecasting. A distinction may be made between dynamical and statistical approaches. In the dynamical approach, numerical atmospheric models (global circulation models, GCMs) have been developed to predict seasonal climate, i.e. the average climate for three consecutive months, several months ahead (Goddard et al., 2001). The scientific basis of such predictions is that the sea surface temperature (SST), which characteristically evolves slowly, drives the predictable part of the climate. Consequently, providing a GCM with information about the variations in SST makes seasonal climate forecasts possible. The SST information may be provided to the GCM by using the SST field as a boundary condition or by coupling the GCM to an ocean model that then provides the necessary SST information. GCM seasonal forecasts may be downscaled dynamically (e.g. Graham et al., 2007; Bastola et al., 2013; Bastola and Misra, 2014) or statistically (e.g. Uvo and Graham, 1998; Landman et al., 2001; Nilsson et al., 2008) to better represent regional interests.

An early attempt to use climate model output for hydrological forecasting in a coastal Californian basin during winter 1997/1998 was made by Kim et al. (2000). They found an overall decent agreement between simulated and observed discharge. Low (high) flows were however systematically overestimated (underestimated), which was attributed primarily to climate model precipitation bias. To tackle this problem of climate model biases, Wood et al. (2002) proposed bias-correction by a percentile-based mapping of the climate model output to the climatological distributions of the input variables. Recently, several investigations have focused on the relative role of uncertainties in the initial state and in the climate forecast, respectively, for the hydrological forecast skill (e.g. Li et al., 2009; Shukla and Lettenmaier, 2011).
In a climate-based statistical approach, teleconnections between climate phenomena that affect the large-scale atmospheric circulation and the subsequent hydrometeorological development in specific locations are identified and utilised (e.g. Jónsdóttir and Uvo, 2009). The impacts of the El Niño-Southern Oscillation on the tropical climate are the most commonly used of such teleconnections in seasonal forecasting (Troccoli, 2010). Teleconnections can also be the basis for seasonal forecasts at high latitudes, such as the impacts of the North Atlantic Oscillation on the winter climate in Scandinavia (e.g. Uvo, 2003) and the more recently identified impacts of the Scandinavian Pattern on summer climate in southern Sweden (Engström, 2011; Foster and Uvo, 2012). Teleconnection indices have also been used as predictors in regression-based approaches to seasonal hydrological forecasting (e.g. Robertson and Wang, 2012).
In light of the progress described above, it is time to explore ways of updating operational practices by incorporating the new knowledge acquired and methods developed. The current spring flood forecasting practice at the Swedish Meteorological and Hydrological Institute (SMHI) is an example of the traditional model-based approach. It is a climatological ensemble approach based on the HBV hydrological model (e.g. Bergström, 1976; Lindström et al., 1997). In the procedure, HBV is initialised by running it with observed meteorological inputs (precipitation and temperature) for a spin-up period up to the forecast issue date. Then, all available historical daily precipitation and temperature series in the period from the forecast issue date to the end of the forecasting period are used as input to HBV, generating an ensemble of spring flood forecasts. The main variable delivered to end-users is the median value of total accumulated discharge in the spring flood period, but percentiles are also used. While overall sound and generally useful, this current practice has the obvious limitation that it is based on the climatology, i.e. the normal climate. Thus, if the weather from the forecast issue date up to the spring flood period evolves in a close-to-normal way, the median forecast is likely to have a small error. However, if the weather deviates from the climatology, the forecast error will be large.
The objective of this study has been to develop, test and evaluate new approaches to spring flood forecasting in Sweden. The main scientific hypothesis examined is that the application of large-scale climate data (historical and forecasted) can improve forecast skill, as compared with today's procedure. A secondary hypothesis is that a combination of approaches provides an added value, as compared with each individual approach.
Three different approaches have been tested and evaluated: analogue ensemble (AE), dynamical modelling (DM) and statistical downscaling (SD). The new approaches were evaluated for the spring flood forecasts 2000-2010 issued in January, March and May for the rivers Vindelälven, Ångermanälven and Ljusnan in Sweden.

Study area, local data and models
The basins of the rivers Vindelälven, Ångermanälven and Ljusnan have been used for testing spring flood forecasts (Fig. 1a). Vindelälven is unregulated, whereas both Ångermanälven and Ljusnan are regulated. For each river basin, two stations have been selected for evaluation of the forecast methods: one located in the upstream part of the basin and one at the basin outlet. The upstream area ranges between 1700 and 31 000 km² (Table 1). In this study we focus on forecasts of the accumulated discharge in the spring flood period (May-July), which is the key variable delivered to the hydropower industry. This quantity is referred to in the following as SFV (spring-flood volume).
The mean SFV in the study basins ranges between 900 × 10⁶ and 8000 × 10⁶ m³, corresponding to average discharges in the spring flood period between approximately 100 and 1000 m³ s⁻¹.
The HBV model (Bergström, 1976; Lindström et al., 1997) was set up and calibrated for all three rivers. HBV is a rainfall-runoff model which includes conceptual numerical descriptions of hydrological processes at basin scale. The general water balance in the HBV model can be expressed as

$$P - E - Q = \frac{d}{dt}\left[SP + SM + UZ + LZ + \mathrm{lakes}\right] \qquad (1)$$

where P denotes precipitation, E evapotranspiration, Q runoff, SP snow pack, SM soil moisture, UZ and LZ upper and lower groundwater, respectively, and lakes the lake volume. Input data are normally daily observations of precipitation and air temperature, and monthly estimates of potential evapotranspiration. Air temperature (T) data are used for calculations of snow accumulation and melt and possibly potential evaporation. The model consists of subroutines for meteorological interpolation, snow accumulation and melt, evapotranspiration estimation, a soil moisture accounting procedure, routines for runoff generation and, finally, a simple routing procedure between sub-basins and lakes.
Applying the model necessitates calibration of a number of free parameters, generally about 10. P and T inputs may be given either as station data or as gridded fields, and the latter are generally created by optimal interpolation (e.g. Johansson, 2002). The HBV model set-ups used here for rivers Vindelälven and Ljusnan use gridded inputs whereas the set-up for Ångermanälven uses station-based input. In all cases performed in this work, the data span was 1961 to 2010.
The overall accuracy of the HBV calibration for each station, expressed in terms of the Nash-Sutcliffe efficiency (R²) and the relative volume error (RVE) in the period October 1999-September 2010, is given in Table 1. Values of R² consistently around 0.9 and only a few percent volume error imply accurately calibrated models with limited scope for improvement.
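The two calibration criteria can be illustrated with a minimal sketch. The Nash-Sutcliffe efficiency follows its standard definition; the RVE is assumed here to be the difference in accumulated volume relative to the observed volume, in percent, and the function names are illustrative only.

```python
import numpy as np


def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of model error variance
    to the variance of the observations (1 is a perfect fit)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)


def relative_volume_error(obs, sim):
    """Relative volume error in percent (assumed sign convention:
    positive when the simulated volume exceeds the observed one)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 100.0 * (sim.sum() - obs.sum()) / obs.sum()
```

A simulation that always returns the observed mean gives an efficiency of 0, which is why values around 0.9 indicate a well-calibrated model.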

Large-scale atmospheric data
For the definition of circulation patterns (Sect. 3.2.2), the ERA40 data set (Uppala et al., 2005), with resolution of 1
- Daily time series. These data are the forecasted daily values of 2 m temperature and the accumulated total precipitation from the forecast issue date to the forecasting period. These data spanned the period 2000-2010 and had a domain covering 11 to 23
analogue ensemble, dynamical modelling and statistical downscaling. All methods are described in this section.

Climatological ensemble (CE; baseline forecasts)
The current spring flood forecasting practice at SMHI is a climatological ensemble approach based on the HBV hydrological model (e.g. Arheimer et al., 2011). The forecast procedure follows three steps:
1. A set-up of the HBV model, well-calibrated for the specific river basin and location, is run using observed meteorological data (T, P) as input for a period of not less than 12-24 months up to the forecast issue date, typically sometime in February. The state of the HBV model at the forecast issue date will thus reflect the current hydrological conditions in the basin with respect to streamflow, snow pack, soil moisture, etc.
2. The resulting HBV state from step 1 is then used as the initial state for forecast runs. The input data for the forecast runs are all available historical time series of T and P for the specific basin, which cover the period from the forecast issue date until the end of the spring flood period. The time series of each historical year represents one possible weather evolution and results in one possible SFV estimate.
3. The results from all historical years make up a climatological forecast ensemble, which may be expressed in terms of percentiles with different probabilities. In current practice, as well as in this study, the median value of SFV is considered as the spring flood forecast.
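The three steps above can be sketched as follows. This is not the HBV model: `toy_snowmelt_model` is a hypothetical single-store degree-day model used only to make the ensemble loop runnable, and all names, parameters and units are assumptions for illustration.

```python
import numpy as np


def toy_snowmelt_model(snow, temp, precip, melt_rate=3.0):
    """Hypothetical stand-in for HBV: one snow store with degree-day melt.

    Returns the updated snow store and the runoff for one day.
    """
    if temp <= 0.0:
        return snow + precip, 0.0          # precipitation accumulates as snow
    melt = min(snow, melt_rate * temp)     # melt is limited by the store
    return snow - melt, melt + precip      # melt and rain leave as runoff


def climatological_ensemble(initial_snow, historical_forcing, sfv_start=0,
                            model=toy_snowmelt_model):
    """Run the model once per historical year from the same initial state.

    historical_forcing maps year -> (daily T, daily P) from the issue date
    to the end of the spring flood period. Runoff is accumulated from day
    index sfv_start onwards, giving one SFV estimate per historical year.
    """
    sfv = {}
    for year, (temps, precips) in historical_forcing.items():
        snow, volume = initial_snow, 0.0   # step 2: same initial state
        for day, (t, p) in enumerate(zip(temps, precips)):
            snow, runoff = model(snow, t, p)
            if day >= sfv_start:
                volume += runoff
        sfv[year] = volume
    members = np.array(list(sfv.values()))
    return sfv, float(np.median(members))  # step 3: median is the forecast
```

Replacing `historical_forcing` with a subset of years, or with forecast ensemble members, leaves the loop unchanged, which is how the alternative approaches reuse the same procedure.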

Analogue ensemble (AE)
A collection of daily observed T and P data from 1961 to 1999 at several stations constitutes the historical data. Due to the large number of years available, it is likely that one or some of them will better represent the weather prevailing from the forecast issue date over the spring flood period to come. One such year is an analogue year and a group of them will compose an analogue ensemble. This approach aims at identifying an analogue ensemble to be used as input to the hydrological model and thus generate the SFV forecast ensemble. Two methods are used for the selection of the analogue ensemble, one based on teleconnection climate indices and one on circulation patterns. After selection, the procedure described in Sect. 3.1 is followed but with the analogue instead of the full historical ensemble in step 2.

Selection based on teleconnection climate indices (TCI)
The northern hemisphere teleconnection patterns are recurring air pressure and circulation anomalies identified by Barnston and Livezey (1987) using a Rotated Principal Component Analysis (RPCA) of standardised geopotential height anomalies. The prospect of using climate indices for identifying analogue years in a hydrological forecasting context has been previously explored by e.g. Hamlet and Lettenmaier (1999).
The Climate Prediction Center (CPC), which is part of the National Oceanic and Atmospheric Administration (NOAA), calculates indices for 10 teleconnection patterns (http://www.cpc.ncep.noaa.gov/data/teledoc/telecontents.shtml). From these, the following three were selected for this work:
- North Atlantic Oscillation (NAO): the positive phase of the NAO is associated with above-average temperatures and precipitation over Scandinavia during winter, while the negative phase tends to be associated with below-average temperatures and precipitation (Kushnir, 1999; Hurrell and Deser, 2010;
- Scandinavia pattern (SCAND): the positive phase of the SCAND pattern is associated with below-average winter precipitation over Scandinavia, except over the Scandinavian mountains, where little signal is present. For winter temperature, this phase is associated with below-average values in southern and above-average values in northern Scandinavia (Comas-Bru and McDermott, 2014).
The TCI method looks at the persistence of the different indices for different periods in the forecast year, namely 1 to 6 months prior to the forecast issue date. The indices are classified as either normal (within one standard deviation of the mean value), above normal (more than one standard deviation above the mean value) or below normal (more than one standard deviation below the mean value). The same is done for corresponding periods in the historical data, and if the classification of all three indices in a historical year agrees with that of the forecast year, the historical year is selected as an analogue year. If no analogue years can be identified among the historical ones by comparison of the state of the three climate indices, analogue years are sought using an agreement with two of them.
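The classification and selection rule described above can be sketched as follows. The function names and the dictionary layout are assumptions for illustration; the index keys are whatever labels the caller supplies, and the fallback to agreement on all but one index mirrors the rule of relaxing from three matching indices to two.

```python
def classify_index(value, mean, std):
    """Return +1 (above normal), -1 (below normal) or 0 (normal),
    using one standard deviation around the mean as the threshold."""
    if value > mean + std:
        return 1
    if value < mean - std:
        return -1
    return 0


def select_analogue_years(forecast_classes, historical_classes):
    """Select historical years whose index classes match the forecast year.

    forecast_classes: dict index name -> class (-1, 0, +1)
    historical_classes: dict year -> dict index name -> class
    All indices must agree; if no year qualifies, agreement on all but
    one index is accepted instead.
    """
    names = list(forecast_classes)
    for required in (len(names), len(names) - 1):
        matches = [
            year
            for year, classes in sorted(historical_classes.items())
            if sum(classes[n] == forecast_classes[n] for n in names) >= required
        ]
        if matches:
            return matches
    return []
```

An empty result means that no analogue year was found even with the relaxed rule, in which case a fallback forecast is needed.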

Selection based on circulation patterns (CP)
Circulation-pattern (CP) analysis is a commonly used tool in climatological and meteorological studies (Hay et al., 1991; Wilby and Wigley, 1994). It was initially applied to explain climate variability at a large scale (Barry and Perry, 1973) classification). As the subjective classification is only available in a limited number of regions, the objective classification has been widely developed and used. The objective classification is a semi-automated or automated technique that pertains to mathematical approaches, e.g. hierarchical methods (Johnson, 1967), k-means methods (MacQueen, 1967), cluster analysis (Kyselý and Huth, 2005) and correlation methods (Yarnal, 1984). The method that is proposed and investigated here is based on fuzzy-rule logic. Fuzzy-rule-based classification is built on the concept of fuzzy sets (Zadeh, 1965), using imprecise statements to describe a certain system, in this case the climate system. The classification scheme for CPs follows four steps: (1) transformation of large-scale data; (2) definition of the fuzzy rules; (3) optimisation of the fuzzy rules; and (4) classification of CPs. A detailed description of the methodology used here can be found in Bárdossy et al. (2002) and is only summarised in the following.
In this work, the anomalies of daily mean sea level pressure (MSLP), g(i, t), from reanalysis data (ERA40 or ERAINTERIM; Sect. 2.2), serve as predictors according to

$$g(i,t) = \frac{h(i,t) - \mu(i,t')}{\sigma(i,t')} \qquad (2)$$

where h(i, t) is daily MSLP at grid cell i and time t. Variables µ(i, t') and σ(i, t') denote its climatological mean and standard deviation at grid cell i on Julian date t'. The anomaly g(i, t) indicates the deviation of daily MSLP from the long-term climatology.
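As an illustration of this transformation, the following sketch standardises each grid cell by the mean and standard deviation of all values sharing the same day of the year. The array layout and function name are assumptions, and the population standard deviation is used; cells with zero variance on a given date would need separate handling.

```python
import numpy as np


def standardised_anomaly(mslp, day_of_year):
    """Standardise daily MSLP per grid cell and calendar day.

    mslp: array of shape (time, cells)
    day_of_year: integer array of shape (time,) giving the Julian date
    """
    mslp = np.asarray(mslp, dtype=float)
    doy = np.asarray(day_of_year)
    g = np.empty_like(mslp)
    for d in np.unique(doy):
        rows = doy == d                      # all years for this Julian date
        mu = mslp[rows].mean(axis=0)         # climatological mean per cell
        sigma = mslp[rows].std(axis=0)       # climatological std per cell
        g[rows] = (mslp[rows] - mu) / sigma
    return g
```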
To determine the fuzzy rule sets best describing the CPs, every rule is optimised with a local variable using a well-designed objective function that explains its statistics in a given region. In this case, the precipitation records measured in the Vindelälven basin during
where N is the total number of days used for the CP optimisation. For a day n with a given circulation pattern CP(n), PW_d denotes the probability of precipitation exceeding depth d (generally 0.1 mm, but higher thresholds may also be used) and Z denotes the mean precipitation amount. An overbar denotes the long-term climatological means of PW_d and Z, in practice calculated as the averages over all N days without regard to classification. The objective functions given by Eqs. (3) and (4) are combined in a weighted sum, in which the two weighting factors w_1 and w_2 are determined subjectively to adjust for differences in magnitude as well as importance.
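For orientation, objective functions of this kind in the fuzzy-rule CP literature (e.g. Bárdossy et al., 2002) typically take the following form, written here with the symbols defined above. The names O_1, O_2 and O are assumptions introduced for this sketch, and the exact expressions used in this study may differ.

$$O_1 = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(PW_d(CP(n)) - \overline{PW_d}\right)^2}$$

$$O_2 = \frac{1}{N}\sum_{n=1}^{N}\left|\ln\frac{Z(CP(n))}{\overline{Z}}\right|$$

$$O = w_1 O_1 + w_2 O_2$$

Both terms grow when the classified CPs separate wet and dry conditions more clearly from the climatological average, so a larger O indicates a more informative classification.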

Dynamical modelling (DM)
In this approach, the daily T and P ensemble of seasonal forecasts from ECMWF (Sect. 2.2) were converted into HBV input. This was done by simply remapping the daily field forecasts onto the 4 × 4 km² grid used as HBV input data (Sect. 2.1). Within the resources of this study, this conversion was only attainable for the two rivers with HBV model set-ups using grid-based input (Vindelälven and Ljusnan).
After conversion, the ECMWF forecasts were used to feed the HBV model from the same initial state as used in the current CE procedure, thus following the procedure in Sect. 3.1 but with forecasts instead of historical years in step 2. As in the CE procedure, the final forecast used in the evaluation is defined by the ensemble median (assuming no impact of the different ensemble sizes used; Sect. 2.2).

Statistical downscaling (SD)
Statistical downscaling is a widely accepted methodology used to connect coarse-scale climate data from GCMs to local-scale climate. In this case, large-scale circulation variables are statistically connected to the SFV (e.g. Landman et al., 2001; Foster and Uvo, 2010). The method employed to establish the statistical relationship among the variables is the multivariate procedure known as Singular Value Decomposition (SVD) analysis (Bretherton et al., 1992). SVD analysis is a technique that isolates sets of mutually orthogonal pairs of spatial patterns that maximise the squared temporal covariance between two physical variables (e.g. Cheng and Dunkerton, 1995; Uvo et al., 1998; among many others). The SVD of the cross-covariance matrix of two fields yields two matrices of singular vectors and one set of singular values. A pair of singular vectors describes spatial patterns for each field that have overall covariance given by the corresponding singular value. This procedure has recently been renamed Maximum Covariance Analysis (MCA).
MCA can be used to derive specific prediction or specification models for a particular point in one variable's field (the predictand; SFV in this case) based on the spatial pattern and/or on the evolution patterns of the anomalous values in the other field (the predictor). From the singular vector pairs, the temporal expansion series of each field can be obtained by projecting the data onto the appropriate singular vector (Bretherton et al., 1992). The relationship between the variables is generated by calculating the matrix of regression coefficients which relates the values of the predictor singular mode temporal amplitudes to the individual points in the predictand field. In this work, historical time series for both the predictors and the predictand are used to define the statistical relationship between them, and present predictor data are then used to perform a forecast. To maximise the robustness of the forecast, multiple forecasts are made with different predictors, resulting in an ensemble forecast. The predictors used were forecast fields of large-scale circulation variables with a 2° × 2° resolution from two different GCMs (a detailed description of the data used is given in Sect. 2.2). For each forecast issue date, the seasonal averages of the GCM forecast ensemble means for different variables are used as predictors. The period used for developing the statistical model (which expresses the statistical relationship between predictors and predictand) was from 1982 until the year prior to the year being forecasted; thus the training period increased in length with each step forward through the study period.
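The chain of cross-covariance SVD, projection onto the predictor singular vectors and regression onto the predictand can be sketched as below. This is a generic MCA illustration under assumed array shapes, not the configuration used in this study; the function name and the `n_modes` parameter are assumptions.

```python
import numpy as np


def mca_forecast(x_train, y_train, x_new, n_modes=1):
    """Forecast a predictand from a predictor field using MCA.

    x_train: (years, grid points) predictor fields
    y_train: (years,) or (years, stations) predictand, e.g. SFV
    x_new:   (grid points,) predictor field for the forecast year
    """
    x = np.asarray(x_train, dtype=float)
    y = np.asarray(y_train, dtype=float).reshape(len(x), -1)
    x_mean, y_mean = x.mean(axis=0), y.mean(axis=0)
    xa, ya = x - x_mean, y - y_mean                 # anomalies

    cov = xa.T @ ya / (len(x) - 1)                  # cross-covariance matrix
    u, s, vt = np.linalg.svd(cov, full_matrices=False)
    modes = u[:, :n_modes]                          # predictor singular vectors

    amplitudes = xa @ modes                         # temporal expansion series
    coeffs, *_ = np.linalg.lstsq(amplitudes, ya, rcond=None)

    new_amplitude = (np.asarray(x_new, dtype=float) - x_mean) @ modes
    return y_mean + new_amplitude @ coeffs
```

Calling the function once per forecast year, with `x_train` and `y_train` restricted to the years before it, reproduces the expanding training period described above.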
It should be noted that whereas the other methods generate daily discharge time series over the spring flood period, from which SFV is estimated, the SD method directly forecasts the SFV. Therefore forecasts from the SD method are of most interest in the early forecast issue dates and of less interest closer to the spring flood period, as they are not able to provide information about the flood profile.

Experimental set-up and evaluation
A key issue in seasonal forecasting is the lead time, i.e. the period between the forecast issue date and the start of the forecasting period. It may be expected that the relative skill of the different approaches depends on the lead time. Generally, the main gain of statistical approaches is expected for long lead times. When approaching the forecasting period, the representation of the hydro-meteorological state in the HBV model becomes gradually more important and the relative skill of the current procedure is likely to increase. To assess the relative skill for different lead times, we evaluate hindcasts issued on 1 January (1/1), 1 March (1/3) and 1 May (1/5) in the period 2000-2010.
In the SD procedure, the average circulation fields forecasted by the GCMs for the 91 days following the forecast issue date were used as predictors. It is expected that the GCM forecast skill improves as the spring flood period approaches. The predictor fields were different for different forecast issue dates. They were selected by an initial screening based on previous literature followed by an analysis of predictive skill in the historical period.
where OBS denotes observation.
To quantify the gain of the new forecast approaches (Sects. 3.2-3.4), their MAE values are compared with the MAE obtained using the current CE procedure (MAE_CE) by calculating the relative improvement RI (%) according to

$$RI = 100 \cdot \frac{MAE_{CE} - MAE}{MAE_{CE}}$$

where a positive RI indicates that the error of the new approach is smaller than the error in the CE procedure, and vice versa. As an additional performance measure, we also calculate the frequency of years FY+ (%) in which the new approach performs better (i.e. has a lower absolute error, AE) than the CE procedure. This may be expressed as

$$FY^{+} = \frac{100}{n}\sum_{y=1}^{n} H\left(AE_{CE,y} - AE_{y}\right)$$

where n is the number of forecast years and H is the Heaviside function defined by

$$H(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases}$$
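The evaluation measures can be sketched as follows. Since the MAE values are reported in percent, the absolute error is assumed here to be expressed relative to the observed SFV; that definition, like the function names, is an assumption for illustration.

```python
import numpy as np


def absolute_error_pct(forecast, obs):
    """Absolute error per year in percent of the observed value (assumed)."""
    forecast = np.asarray(forecast, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return 100.0 * np.abs(forecast - obs) / obs


def relative_improvement(mae_new, mae_ce):
    """RI (%): positive when the new approach has a smaller MAE than CE."""
    return 100.0 * (mae_ce - mae_new) / mae_ce


def frequency_better(ae_new, ae_ce):
    """FY+ (%): share of years in which the new approach has a lower AE."""
    ae_new = np.asarray(ae_new, dtype=float)
    ae_ce = np.asarray(ae_ce, dtype=float)
    return 100.0 * np.mean(ae_new < ae_ce)


# Illustrative numbers only, not data from the study
obs = [100.0, 200.0, 100.0, 100.0]
ae_ce = absolute_error_pct([110.0, 180.0, 100.0, 120.0], obs)
ae_new = absolute_error_pct([105.0, 190.0, 110.0, 110.0], obs)
ri = relative_improvement(ae_new.mean(), ae_ce.mean())
fy = frequency_better(ae_new, ae_ce)
```

The two measures are complementary: RI can be dominated by a few years with large errors, whereas FY+ counts each year equally.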

Baseline simulations with climatological ensemble (CE)
Before testing the new forecasting approaches, the performance of the climatological ensemble procedure was assessed (Table 2). In simulation mode, i.e. using observed values of P and T, the MAE varies from 4.1 % in Kultsjön to 10.3 % in Dönje with an average of 7.7 % for all rivers. This quantifies the HBV model error and corresponds to having a perfect meteorological forecast. It may be noted that the station with the lowest HBV performance in terms of the overall measures R² and RVE, Kultsjön (Table 1), in fact shows the best performance with respect to the estimated spring-flood volumes (MAE = 4.1 %; Table 2). This difference is not a contradiction, as MAE here represents performance in one single season, and it underlines the need to complement overall calibration criteria with season-specific measures for tailored forecasting models. In forecast mode, the average MAE decreases from 21.9 % in the 1/1-forecasts to 13.4 % in the 1/5-forecasts (Table 2), which thus quantifies the improvement when approaching the spring flood period. Overall, the forecast accuracy decreases from north to south. This is likely related to the higher probability of having melting episodes before the spring flood in the southern part of the region considered, so that part of the accumulated snow during winter has already melted and infiltrated when the spring flood starts. The occurrence and (non-linear) effects of such early melting episodes are very difficult to accurately simulate and forecast. It is further surprising that the skill of the 1/3-forecasts in Ljusnan is slightly lower than that of the 1/1-forecasts. Conceivably, the fact that observed P and T for January-February are used as inputs to the 1/3-forecast should improve the forecasts as compared to using a climatological input ensemble for estimating the initial conditions, as is the case for the 1/1-forecast. As the difference in skill is small, we assume that the apparent illogicality is a function of the limited sample size and the associated statistical scatter.
The differences in Table 2 between MAE for simulations and forecasts, respectively, represent the part of the total error that is related to the meteorological input. On average, this part decreases from 14.2 percentage points in the 1/1-forecasts (which corresponds to 65 % of the total error) to 5.7 in the 1/5-forecasts (43 %). It should be emphasised that two out of the three new forecast approaches tested here (AE and DM) aim at improving the meteorological input. They can thus only improve the forecasts in that respect; the HBV model error remains. The third method (SD), however, aims at improving total performance.
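As a worked check, these shares follow directly from the average MAE values quoted from Table 2 (7.7 % in simulation mode, 21.9 % and 13.4 % in forecast mode):

$$21.9 - 7.7 = 14.2, \qquad \frac{14.2}{21.9} \approx 0.65$$

$$13.4 - 7.7 = 5.7, \qquad \frac{5.7}{13.4} \approx 0.43$$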
The relative impact of the HBV model error thus increases with decreasing lead time, which implies that the scope for improving the baseline forecasts decreases with decreasing lead time. It is remarkable that MAE for the 1/5-forecasts in Vindelälven is only slightly higher than the HBV model error. This may be interpreted as that with a proper representation of the hydro-meteorological state in the HBV model for Vindelälven on 1/5, the exact evolution of the weather in the spring flood period has only a minor impact. Some analysis of HBV model bias was also performed, i.e. the tendency to systematically over- or underestimate SFV. In simulation mode, a small positive bias (∼5 %) was found with little difference between rivers. In forecast mode, only a negligible negative bias (∼1 %) was found.

Results
Generally, the results for different stations in the same river are similar. Therefore, the results are presented as averages over the two stations in each river. An overview of the results is given in Table 3. The numbers after approaches TCI and CP correspond to the best performing version of each approach; see further Sects. 5.1.1 and 5.1.2. Numbers marked in boldface indicate that the new approach performs better than the CE procedure.

Analogue ensemble (AE)
As mentioned in Sect. 3.2, both the TCI and the CP approach are based on analyses of the large-scale climatic conditions 1 to 6 months before the forecast date. The aim was to identify the number of months of climatic information that generates the best performance when averaged over all forecast dates and rivers, to ensure that the selected approach is robust. For a specific forecast date and river, a different period of climatic information may perform better than the selected approach, but this likely mainly reflects statistical variability in light of the rather limited sample available.

Teleconnection indices (TCI)
As shown in Table 4, the TCI approach performs better than CE in only a few cases. The accuracy of the TCI forecasts in Vindelälven and Ljusnan is generally low. In Ångermanälven, however, the TCI forecasts are notably better and even slightly better than CE when averaged over all dates and TCI versions (i.e. number of months used). In particular, the TCI 1/5-forecasts are clearly better than the CE ones. The main reason for this difference lies in the physics that support the TCI method. This method is based on the effect of different climate phenomena on T and P and consequently discharge, and this effect varies depending on the location of the river basin (see Uvo, 2003). In particular, Ångermanälven is located in a region that is more affected by natural climate phenomena than Vindelälven and Ljusnan. It may be remarked that the different TCI versions often identify approximately the same analogue years; therefore the performance is generally rather similar for a certain forecast issue date and river. On average, the TCI forecasts generally have a 10-20 % larger MAE than CE. The best overall performance is found for TCI6, with a 5.7 % larger MAE than CE. It outperforms CE in only one case but is always close to the CE accuracy. The other TCI versions (1 to 5) outperform CE slightly in a few cases, but have a substantially larger error than CE in many cases.

Circulation patterns (CP)
As shown in Table 5, comparing the different CP versions (1 to 6), using a period of three months before the forecast date (CP3) to characterise the climate stands out as the superior choice. The MAE of the CP3 forecasts is on average 1.6 % lower than CE and the performance gradually decreases for both shorter and longer periods. On average in Vindelälven and Ångermanälven, CP3 performs 7.3 % better than the CE forecasts and in these rivers the CP3 approach outperforms CE on essentially all forecast dates. The only exception is the 1/5-forecast for Vindelälven, which was previously shown to be very difficult to improve by changing the meteorological input (see discussion in connection with Table 2). The most notable improvement is found for the 1/1- and 1/3-forecasts in Vindelälven, for which the MAE is reduced by 10-25 % compared with CE and the CP3-forecast is better for 75 % of the forecasts used in the testing. Also for Ångermanälven, the CP6 forecast is generally better than CE with a MAE reduction of up to 25 %. If considering only the meteorological input error, the average improvement by CP3 is ∼30 %. The relatively poor performance of the CP approach in Ljusnan is likely at least partly because the CPs were not optimised for
As mentioned in Sect. 3.1.2, the circulation patterns were defined using the ERA40 analysis and then applied to the ERAINTERIM analysis to obtain results for 2003-2010. This implies a higher uncertainty in the results for 2003-2010. If considering only the results for 2000-2002, in which the selection of analogue years is fully consistent with the CP classification, the accuracy of the CP3 forecasts improves by 10-20 % as compared with the results in Table 5. This result should be interpreted with care in light of the very limited sample used, but it indicates that improved performance is attainable if using a consistent data set for the CP classification.

Dynamical modelling (DM)
Overall, using ECMWF seasonal forecasts of T and P as inputs to the HBV model did not improve performance as compared with the CE procedure (Table 3). Even though the DM forecasts do outperform CE in about half of the cases, their average MAE is higher than that of CE for all forecast dates and rivers.
To understand why better performance was not attained, T and P from the ECMWF seasonal forecasts were compared with observations from the river basins.The results are overall similar for Vindelälven and Ljusnan.A substantial positive bias is evident for P in late winter and early spring (February-April), up to 75 %, in both the 1/1-and the 1/3-forecasts.In the 1/5-forecasts, also the May P is clearly overestimated.In July, a clear negative bias is found on all forecast dates.The T bias is generally small in the period January-May, but a distinct positive bias is found in summer (June-July).Further, the seasonal forecasts become consistently warmer the closer to the spring flood period they are issued.It may be mentioned that a new version of the ECMWF seasonal forecasting system has been released.A quick look on data from the new system, which became available by the time of writing this manuscript, indicated a similar P bias but distinctly improved T -forecasts with only a small bias in the summer.Introduction

The performance of the SD method is heavily affected by whether the climatic features in the forecasting data were encountered in the training period dataset. If the forecasted conditions are outside the range covered by the training period, the SD method tends to produce forecasts that differ drastically from the observations. This can be dealt with either by increasing the length of the training dataset or by analysing the year in question and determining whether there were similar years in the training period, which would give an indication of how the method might perform.
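The check described above, whether a forecast year lies outside the conditions seen in training and which training years resemble it, could be sketched as follows. This is only an illustration for a single scalar predictor. The function names, the predictor values and the nearest-value notion of similarity are assumptions, not the procedure used in the study.

```python
def outside_training_range(training_values, value, margin=0.0):
    """True if the forecast predictor lies outside the training range."""
    low, high = min(training_values), max(training_values)
    return value < low - margin or value > high + margin


def nearest_training_years(training_by_year, value, k=3):
    """Return the k training years whose predictor is closest to value."""
    ranked = sorted(training_by_year,
                    key=lambda year: abs(training_by_year[year] - value))
    return ranked[:k]


# Hypothetical standardised predictor values per training year.
training = {1982: -1.2, 1983: 0.4, 1984: 0.9, 1985: -0.3}
forecast_value = 1.6

print(outside_training_range(training.values(), forecast_value))  # True
print(nearest_training_years(training, forecast_value, k=2))      # [1984, 1983]
```

When the first check is true, the performance of the SD method in the nearest training years gives a rough indication of how much trust to place in the forecast.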

Composing a multi-model system
A multi-model forecast approach consists in combining forecasts from different models to reach a more reliable estimate of the forecast probability distribution. This technique has been used since the early 1990s for developing seasonal climate forecasts (Tracton and Kalnay, 1993) and has proved to provide more skilful results than a single-model forecast (Hagedorn et al., 2005; among many others).
There are many possible ways of combining or merging multi-model forecasts, ranging from simple rank-based methods to more sophisticated statistical concepts. In light of the limited material available in this study, we restricted ourselves to testing two conceptually straightforward ways of combining the forecasts: a median approach (Sect. 6.1) and a weighted approach (Sect. 6.2). The multi-model forecast is composed of both the baseline forecast (CE) and the ones resulting from the four new approaches, including the best performing versions of the AE models (TCI6 and CP3). If any of the new methods could not generate a forecast, it was replaced by CE (e.g. the CP approach was replaced by CE when the selection algorithm could not find any analogue years).

Median multi-model
The motivation for using the median of all forecast methods is that the final result will be less influenced by extremely high or low forecasts than a mean forecast would be. As five forecasts are available, the median approach amounts to using the third member of the ranked forecast ensemble.
The average RI of the median approach is 3.6 % (Table 6). Interestingly, especially for the 1/1- but also for the 1/5-forecasts, the performance in Ljusnan is consistently better than that of any single forecast (for the 1/3-forecast, CE and DM are slightly better). This demonstrates the potential gain of the multi-model approach. For the 1/3-forecasts in Ångermanälven, the median also outperforms all single forecasts. Generally for Vindelälven and Ångermanälven, one of the single forecasts outperforms the median. It may be concluded that the potential improvement from the median multi-model approach is rather limited in size, up to about 10 % compared with CE for single dates and rivers, but also rather stable.
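As a minimal sketch of the median combination, including the replacement of a missing forecast by CE, consider the following. The function name, the dictionary layout and the volumes are hypothetical and serve only to illustrate the procedure.

```python
from statistics import median


def median_multi_model(forecasts, baseline="CE"):
    """Median of the available forecasts.

    forecasts maps a method name to a spring flood volume, or to None if
    the method produced no forecast. Missing forecasts are replaced by the
    baseline (CE) forecast before the median is taken.
    """
    base = forecasts[baseline]
    filled = [base if volume is None else volume
              for volume in forecasts.values()]
    return median(filled)


# Hypothetical volumes in arbitrary units; here CP found no analogue years.
example = {"CE": 100.0, "TCI6": 112.0, "CP3": None, "DM": 131.0, "SD": 95.0}
print(median_multi_model(example))  # 100.0
```

With five members, the result is the third value in the ranked list, so a single outlying forecast such as the DM value above has no influence on the combined forecast.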


Weighted multi-model
This approach consists of applying weights w between 0 and 1 to the different forecasts and then adding them together. The spring flood volume forecasted by the weighted multi-model, SFV_FW, is thus defined as

$$\mathrm{SFV}_{FW} = \sum_{f=1}^{N} w_f \, \mathrm{SFV}_f$$

where the index f refers to the N different forecast methods available (f = 1, ..., N, where N = 5 in Vindelälven and Ljusnan and N = 4 in Ångermanälven, where DM-forecasts were not available).
One set of weights is chosen for each river and forecast date. The weighted volume is then calculated for the selected years and rivers, and averaged over these entries.
The selection of weights was based on the evaluations presented in Table 3. In Ljusnan and Vindelälven, with five forecast methods available, the method with the highest RI was assigned the highest weight (0.33 = 5/15), the method with the second highest RI was assigned the second highest weight (0.27 = 4/15), and so on down to the method with the lowest RI and lowest weight (0.07 = 1/15). In Ångermanälven, with four forecasts, the weights ranged between 0.4 (4/10) and 0.1 (1/10). In both cases the weights add up to 1.
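The rank-based weights and the weighted sum can be sketched as follows. The weight formula follows the description above (rank divided by the sum of ranks), while the function names and the RI and volume values are hypothetical.

```python
def rank_weights(ri_by_method):
    """Linear rank weights: the best RI gets N/S and the worst gets 1/S,
    where S = 1 + 2 + ... + N, so that the weights add up to 1."""
    ranked = sorted(ri_by_method, key=ri_by_method.get)  # lowest RI first
    n = len(ranked)
    rank_sum = n * (n + 1) // 2
    return {method: (i + 1) / rank_sum for i, method in enumerate(ranked)}


def weighted_forecast(weights, volumes):
    """Weighted sum of the individual spring flood volume forecasts."""
    return sum(weights[method] * volumes[method] for method in weights)


# Hypothetical RI values (%) for one river and forecast date.
ri = {"CE": 0.0, "TCI6": -5.0, "CP3": 12.0, "DM": -8.0, "SD": 6.0}
weights = rank_weights(ri)
print({m: round(w, 2) for m, w in weights.items()})
# {'DM': 0.07, 'TCI6': 0.13, 'CE': 0.2, 'SD': 0.27, 'CP3': 0.33}

# Hypothetical forecast volumes in arbitrary units.
volumes = {"CE": 100.0, "TCI6": 112.0, "CP3": 104.0, "DM": 131.0, "SD": 95.0}
print(round(weighted_forecast(weights, volumes), 2))
```

With four methods the same function returns weights between 0.1 and 0.4, as used for Ångermanälven.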
The average RI of the weighted multi-model is almost 10 % (Table 6). In four out of the nine cases the multi-model forecast is better than any single forecast, and in four cases it is only slightly outperformed by one single forecast. For the 1/1-forecast in Ångermanälven, however, the combined forecast is worse than all single forecasts.
It should be emphasised that the same data were thus used both to estimate the weights and to assess the performance of the weighted model, as the 10-year period is too short for proper split-sample calibration and validation. Limited testing, however, indicated good performance of the fixed-weight approach also for independent validation data. Besides using fixed weights, we also tested estimating optimal weights based on historical performance. This, however, turned out to be unfeasible in this study due to the limited historical data available and the associated tendency to overfit to the calibration data.

Concluding remarks
It is clear that the current approach to spring flood forecasting in Sweden, based on the HBV model and a climatological input ensemble (CE), is overall performing on the same level as the new approaches tested. None of the new approaches consistently outperformed the CE method, although improvement was indicated. The largest improvement was found for the 1/1-forecasts with the SD approach, with an error reduction of ∼ 30 %. The largest improvement considering all forecast dates was found for the CP approach, with an error reduction of up to 25 % and with up to 75 % of the forecasts outperforming CE. In total, the TCI- and DM-forecasts outperformed CE in almost half of the cases, but generally the MAE was larger.
The most promising results from the study were obtained by the multi-model approach. Using the median forecast, an improvement of ∼ 4 % was obtained with a small variation over stations and forecast dates. This improvement may sound limited, but it must be emphasised that every percent of forecast improvement potentially corresponds to large financial revenues in energy trading activities. By using fixed weights based on historical performance, an even larger improvement of almost 10 % was attained. More advanced ways of combining the forecasts are certainly conceivable, but the value of using transparent and easily communicated approaches should not be underestimated when the target is operational forecasting and its associated end-user interaction.
Finally, these results were obtained in a preliminary feasibility study with limited data and overall basic versions of the methods used. Future studies need to include longer test periods and more stations as well as refined and better tailored versions of the forecast methods. The CP approach would benefit from using more consistent reanal-


--
- Reduced historical ensemble by analogue years. Two methods for identifying analogue years within the historical years were evaluated. Both are based on analyses of the weather development just before the forecast issue date. (1) Teleconnection indices (TCI): the evolution of different indices representing different climate phenomena. (2) Circulation patterns (CP): frequency of different groups of weather types that describe the large-scale atmospheric state.
- Meteorological seasonal forecasts as input to the dynamical hydrological model (DM). Temperature and precipitation in ensemble forecasts are converted into HBV model input.
- Statistical downscaling of accumulated discharge (SD). Statistical relationships between large-scale circulation variables and accumulated discharge are identified and calibrated for the forecast period.

° E and 55 to 70° N with a 1° × 1° resolution. There were 11 ensemble members for each variable for the period 2000-2006 and 41 ensemble members for 2007-2010. Figure 1b shows this 1° × 1° grid in relation to Sweden.

Methods
Three new approaches to seasonal hydrological forecasting are presented and compared to the climatological ensemble procedure currently applied at SMHI: ana-

among many others)
- East Atlantic pattern (EA): the positive phase of the EA pattern is associated with above average winter temperatures, below average winter precipitation in southern Scandinavia and above average winter precipitation along the Scandinavian mountains and northern Scandinavia (Comas-Bru and McDermott, 2014).

and later on widely developed to downscale GCM output to local climate in e.g. climate change studies (Wetterhall et al., 2006; Yang et al., 2010). The method is normally applied to reliable upper-air data at multi-grid, e.g. sea level pressure and geopotential height, to explain recorded observations of e.g. P and T. By differentiating historical observations into several representative CPs, each CP is supposed to represent specific climate conditions in the study area. The CPs are defined based on either professional knowledge of atmospheric motions (subjective classification) or statistical characteristics derived from the observations (objective clas-

are used as local observations. Two measures are considered as representative statistics, describing the difference from average conditions in terms of precipitation probability (O_1) and amount (O_2) according to

A higher value of O_3 indicates a better, more distinct classification. The successful CP classification should thus fulfil several requirements: (1) the classified CPs should be able to meaningfully explain large-scale climate conditions and their induced local weather phenomena; (2) each CP catalogue should be unique and as different from other catalogues as possible. When the fuzzy rules that describe every CP have been optimized, daily CP time series are generated. The frequency of occurrence and persistence of individual CPs are calculated per month for all historical years as well as the year to be forecasted. The two most frequently occurring CPs within a period of 1 up to 6 months prior to the forecast issue date are used as a criterion to select the analogue historical years.

Forecast performance is assessed by MAE_F, the mean absolute error of a certain forecast F

climate around the Ljusnan, thus the local meteorological characteristics are not well described.

° × 1°, was used during 1961-2002 while ERAINTERIM (Dee et al., 2011), with a 0.75° × 0.75° resolution, was used during 2003-2010. For the teleconnection studies (Sect. 3.2.1), monthly indices of the North Atlantic Oscillation, Scandinavian Pattern and East Atlantic Pattern were collected from the Climate Prediction Center (CPC; http://www.cpc.ncep.noaa.gov/data/teledoc/telecontents.shtml). The atmospheric seasonal forecast data used in this work were obtained from the European Centre for Medium-Range Weather Forecasts (ECMWF). The forecasts are from the System 3 that consists of an ocean analysis to estimate the initial state of the ocean, a global coupled ocean-atmosphere general circulation model to calculate

Seasonal averages. These data are the ensemble means of the different predicted fields covering the domain 75° W to 75° E and 80 to 20° N with a 2° × 2° resolution.

sensible heat flux, surface latent heat flux, total precipitation, 850 mb temperature, 850 mb specific humidity, 850 mb meridional wind velocity, 850 mb zonal wind velocity, and 850 mb geopotential height. The number of ensemble members per field is 11 for the period 1982-2006 and 41 for the period 2007-2010.

Table 1. Basin and station characteristics including performance of the HBV model.