Interactive comment on “ Assessment of methods for seasonal streamflow forecasting in the Upper Indus Basin of Pakistan ” by Stephen P

The authors have assessed three different methods Bayesian Joint Probability (BJP), the Snowmelt Runoff Model (SRM) and a hybrid approach (SRM Ensemble Streamflow Prediction inflow means as additional predictor in BJP approach) for forecasting seasonal streamflow to the two largest dams in the Upper Indus Basin, Pakistan. The authors concluded that BJP approach is simple and it worked well to provide probabilistic seasonal streamflow forecasts.


Introduction
The Asian Development Bank rates water security in Pakistan as 'hazardous' (the lowest of five classes), ranking it 46 th out of 48 countries in the Asia-Pacific region, with only Kiribati and Afghanistan ranked lower (Asian Development Bank, 2016).
Other studies confirm Pakistan's relatively high levels of exploitation of river flows and groundwater, associated water stress and resultant exposure to climate change (Döll et al., 2009;Wada et al., 2011;Schlosser et al., 2014;Kirby et al., 2017).Given the high demands on the main water source, the Indus River, its year to year flow variability has a significant impact on security of supply in the Indus Basin Irrigation System (IBIS) of Pakistan.Better management outcomes could be achieved if a reliable understanding of Kharif (summer, April-September) water availability at the beginning of the season were available.This would improve IBIS water allocation planning, a critical need given the highly seasonal flows (~80% annual flow occurs in the Kharif season), limited storage capacity (30 days of supply) and increasing water demand for agriculture and energy production.Thus we assess methods for providing seasonal streamflow forecasts for the two largest water supply dams, Tarbela (on the Indus) and Mangla (on the Jhelum), in the Upper Indus Basin (UIB) of Pakistan.
Seasonal streamflow forecasts can be a valuable source of information for water resource managers (Chiew et al., 2003;Anghileri et al., 2016), with both statistical and dynamical forecasting approaches developed and implemented internationally (Yuan et al., 2015).Sources of seasonal streamflow predictability come from initial hydrological or antecedent conditions (e.g.water held in storage in a catchment, in the soil, as ground water, in surface stores, or as snow/ice) and also from the skill of seasonal climate forecasts (Bennett et al., 2016;Doblas-Reyes et al., 2013;Li et al., 2009;Shukla and Lettenmaier, 2011;Shukla et al., 2013;van Dijk et al., 2013;Koster et al., 2010;Wood et al., 2015;Yossef et al., 2013).Statistical approaches relate antecedent catchment conditions and/or climate indices to streamflow using techniques such as multiple linear regression (Maurer and Lettenmaier, 2003).Statistical approaches require predictor-predictand records of sufficient length to determine robust relationships, stationarity in the relationships, and rigorous cross-validation to avoid over-fitting or an inflated skill assessment (Robertson and Wang, 2012;Schepen et al., 2012).Dynamical approaches use hydrological models initialised with observed inputs up to the beginning of the forecast season (to account for antecedent conditions), that can be driven either by historical or climate modelled precipitation and temperature forecasts (Yuan et al., 2015;Zheng et al., 2013).
For example, in Ensemble Streamflow Prediction (ESP), hydrological models are driven by each historical season's precipitation and temperature series to produce an ensemble of flow forecasts, with this ensemble providing a distribution of plausible flows for the forecast period (Wood and Lettenmaier, 2008).ESP forecasts can also be used as an input predictor to statistical techniques (Robertson et al., 2013).Dynamical (i.e.climate model) forecasts of precipitation and temperature are often not sufficiently skilful (Sikder et al., 2015), whereas statistically-based forecast methods using robust relationships between climate drivers, antecedent catchment conditions and resultant streamflow can be valuable research and management tools when properly implemented (Plummer et al., 2009;Schepen et al., 2016).Thus in this assessment, we assess three options for their practical feasibility in developing seasonal streamflow forecasting models for the study region: 1.A statistical approach using the Bayesian joint probability (BJP) model with predictors accounting for antecedent basin conditions and climate drivers; 2. An ESP approach using the snowmelt runoff model (SRM); and, 3. A hybrid approach using option (1) with an additional predictor -the mean ESP forecasts from (2).
The study is reported as follows: Section 2 outlines the study area, details of the case studies and data used, and climate influences; Section 3 presents the BJP statistical approach, the SRM-ESP approach, and the verification metrics used to assess forecast skill, bias, reliability and robustness; Section 4 presents the results of the BJP and SRM skills scores and performance diagnostics.Section 5 discusses the performance of the forecast approaches and Section 6 concludes with the main findings and recommendations.

Upper Indus Basin
Pakistan's water supply, crucial for its extensive irrigated agriculture industry, hydropower generation, and industrial and municipal water supply, is predominantly sourced from Indus river flow, with groundwater a secondary although important contributor to most demands (with the exception of hydropower).The glaciated and snow covered sub-basins of the UIB, encompassing glaciated headwater catchments within the northern Hindu-Kush, Karakoram and western Himalayan mountain ranges, dominate water generation within the Indus Basin (Alford et al., 2014).The UIB's tributaries include the Indus at Kharmong: Shigar, Shyok and Astore in the Karakoram Himalaya, the Jhelum, Chenab, Ravi and Sutlej in the western Himalaya, and the Hunza, Gilgit, Kabul, Swat and Chitral in the Hindu Kush mountains (Figure 1).These basins can be classified as having a flow regime that is either glacier-melt dominated (Hunza, Shigar and Shyok) or snow-melt dominated (Jhelum, Kabul, Gilgit, Astore and Swat) (Hasson et al., 2014).The predominant source of flow in the UIB is snowmelt, followed by glacier melt as a secondary source, with 80% of flow occurring during the June-September summer period.
Interannual flow variability is thus controlled by two processes, the amount of winter precipitation (snowfall) and the summer temperatures.Whilst snowmelt is a function of both winter precipitation and summer temperatures, glacier melt is primarily a function of summer temperatures, although glacier melt is also influenced by snow cover (Charles, 2016).
Inflows for two major reservoirs, Tarbela Dam on the Upper Indus and Mangla Dam on the Jhelum River, a major tributary to the Indus system, are investigated (Figure 1).Daily inflow data from 1975 to 2015 was obtained from Pakistan Water and Power Development Authority (WAPDA).The Tarbela Dam on the main stem of the Indus is one of the largest individual storage in the UIB, crucial for hydropower generation and irrigation supply (Ahmed and Sanchez, 2011).Annual inflows to Tarbela constitute 70% melt water, of which snowmelt contributes 44% and glacial melts contribute 26% (Mukhopadhyay and Khan, 2015).The Mangla Dam on the Jhelum River is (since enlargement) a similar size storage to Tarbela and one of the most important resources in Pakistan for electricity generation and water supply for irrigation (Mahmood et al., 2015).For the Jhelum, the area upstream of Mangla is reported as 33,500 km 2 with an elevation range from 300 m to 6285 m and mean of nearly 2,400 m, the relatively low altitude ensures that there is only 0.7 % coverage by glaciers or perennial snow according to GLIMS glacier database as cited by Bogacki and Ismail (2016).In contrast, the Indus upstream of Tarbela is over five times larger (173,345 km 2 ) with higher elevation (to 8,238 m, as reported by Immerzeel et al. ( 2009)) and 11.5% covered by perennial glaciers (Ismail and Bogacki, 2017), such that (as noted above) glacier ice melt is a significant contribution to annual flow.

Climate influences
Climate indices provide information on the state of the atmosphere preceding the forecast season and, potentially, the weather conditions to be experienced during the forecast season ahead.The state of the atmosphere relates to the weather experienced by the study region prior to the forecast season and hence the precipitation received and the resultant snowpack magnitude.As the state of the atmosphere also relates to the trajectory of weather that may be experienced during the forecast season, it additionally may relate to temperature in the forecast season and thus snowmelt and glacier melt extent.A literature review identified that indices related to the North Atlantic Oscillation (NAO) and El Niño Southern Oscillation (ENSO) are the most likely to provide skill for the UIB (Charles, 2016).These both influence the direction of prevailing winds bringing moisture into the region and thus determine the magnitude of precipitation and resultant depth and areal extent of snowpack created in the winter and early spring preceding the Kharif (April-September) high-flow season.
The NAO is a measure of the strength of the pressure gradient between the subtropics and polar regions in the north Atlantic, representing a dominant source of variability in circulation and winds influencing the region (Hurrell, 1995;Bierkens and van Beek, 2009).It has a direct influence on the interannual variability of the westerly winds (westerly disturbances) and their water content traversing Europe, the Mediterranean and the Middle East region into the mountains of the UIB (Yadav et al., 2009a;Syed et al., 2010;Filippi et al., 2014).Indices of the NAO have been related to: UIB station winter precipitation (Archer and Fowler, 2004;Afzal et al., 2013;Filippi et al., 2014); western Indus basin's winter snow cover and station precipitation (Hasson et al., 2014); Pakistan station temperature (del Río et al., 2013); and winter precipitation in northwest India (Kar and Rana, 2014).
ENSO is a dominant pattern of multi-year variability driven by ocean-atmosphere interactions in the tropical Pacific (Wolter and Timlin, 2011), influencing climate globally including the variability of both western disturbances and monsoon processes experienced by the region.The commonly used SOI (Southern Oscillation Index) has been related to: winter Hindu Kush Himalayan region precipitation (Afzal et al., 2013); Indian Summer Monsoon Precipitation (Ashok et al., 2004;Ashok and Saji, 2007); central southwest Asian winter precipitation (Syed et al., 2006); Pakistan station temperature (del Río et al., 2013); and northwest India winter precipitation (Kar and Rana, 2014).Stronger links have been reported between ENSO, western disturbances and interannual winter precipitation variability for recent decades (Yadav et al., 2009a;Yadav et al., 2009b).

BJP forecasting models
The statistical seasonal forecasting model used is the Bayesian joint probability (BJP) approach of Wang et al. (2009).The BJP offers state-of-the-art capabilities for developing seasonal forecast models that optimally utilise information available on antecedent catchment conditions, large-scale climate forcing (through climate indices) and flow forecast scenarios from hydrological models (Robertson et al., 2013;Robertson and Wang, 2012;Schepen et al., 2012;Wang and Robertson, 2011).The BJP models simulate predictor-predictand relationships using conditional multivariate normal distributions, with predictor and predictand data transformed to normal using either a log-sinh (Wang et al., 2012) or Yeo-Johnson (Yeo and Johnson, 2000) transformation.BJP parameters are inferred using Markov Chain Monte Carlo methods (MCMC) to account for parameter uncertainty, which can be due to factors such as short data records.Probabilistic (ensemble) forecasts are produced by generating samples from the estimated conditional multivariate normal distributions.When predictor-predictand relationships are weak, the BJP produces reliable forecasts that approximate climatology.The full technical details of the BJP modelling approach are presented in Wang et al. (2009) and Wang and Robertson (2011).

SRM forecasting model
The Snowmelt Runoff Model (SRM) of Martinec et al. (2008) has been used in several studies in the basin (Butt and Bilal, 2011;Romshoo et al., 2015;Tahir et al., 2011;Bogacki and Ismail, 2016;Ismail and Bogacki, 2017).WAPDA has procured a version of SRM implemented in MS-Excel © , 'ExcelSRM', and for their 2012 case study1 for Jhelum inflow into the Mangla Reservoir, ExcelSRM was calibrated using data for 2003 to 2010 and subsequently validated against inflows for 2000 to 2002 and 2007 and 2011 (Bogacki and Ismail, 2016).In contrast to the probabilistic forecasts produced by the BJP, the SRM is a deterministic model and so produces a single forecast for a given set of inputs.
Given the inadequacy of seasonal meteorological forecasts for the region (Bogacki and Ismail, 2016;Ismail and Bogacki, 2017), an ESP approach is used to forecast a range of possible Kharif season inflows.That is, SRM is run with observed precipitation (P), temperature (T) and snow cover area inputs up to the end of March, and then simulations are produced for the next six-month Kharif-season scenario using the P and T inputs from each year in the available historical record, together with the Modified Depletion Curve approach (Rango and Martinec, 1982) to simulate snow cover depletion as a function of that scenario-year's degree-days series.This approach results in an ensemble of simulated inflows and, as well as assessing the SRM-ESP forecasts themselves, this study has used the mean of the ensemble members for each year as an additional predictor series for input to the BJP (option 3 as outlined in the Introduction).

Verification
BJP forecast model performance is verified using leave-one-out cross-validated results (Wang and Robertson, 2011).That is, to avoid artificially inflating the skill, for each Kharif season the calibration and assessment does not use that season's data for BJP parameter estimation.The cross-validated BJP forecast performance was assessed for 1975-2015 (41 seasons).
As noted, the BJP can also use hydrological model simulated flow as an additional predictor (Robertson et al., 2013) and in this case we have SRM simulations for the Mangla inflow for a subset of the investigation period (2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015).The SRM is an exception to the leave-one-out cross validation as it is calibrated using all data for the 2003 to 2010 period, with parameters manually tuned "… in order to keep parameters at smooth values and to maintain a reasonable trend in time." 1 The mean flow simulation obtained from driving each year's SRM with all year's available precipitation and temperature are therefore not independent forecasts and so it is not surprising that for 2003 to 2010 period the SRM forecasts can be closer to the observed flows than the median cross-validated BJP estimates.When used as a predictor to the BJP, SRM forecasts are applied with leave-one-out cross validation i.e. for each year in the 2001-2015 period all of the other 14 year's simulations are used to produce an ESP with the resulting mean used as a predictor for that year.Note the BJP is able to extract skill from biased dynamic hydrological model forecasts, as long as the hydrological model simulation bias is systematic and stationary (i.e.not random or with a trend).
Verification assesses the overall skill and the bias, reliability and robustness of the forecasts.This includes assessing whether the bias and reliability of the forecasts varies for different periods of the record (temporal stability) or for different event sizes, e.g.whether there is a limitation in forecasting high-or low-flow seasons.Skill scores, quantifying the skill of the forecasts, allow the direct comparison of the performance of forecasting models that use different sets of predictors.Two common skill scores used here are the root mean squared error (RMSE) (similar to the Nash-Sutcliffe efficiency frequently used in hydrology) that assesses the forecast median and the continuous ranked probability score (CRPS) that assesses the reduction in error of the whole forecast probability distribution (Robertson and Wang, 2013).The skill scores are reported as percentage reductions in error scores of the forecasts relative to the observed historical (climatological) median, for RMSE, and relative to the full distribution of the observed historical (climatological) events, for CRPS.The 'sharpness' of a probabilistic forecast distribution (i.e. a narrower peaked distribution rather than a wide, flat distribution) is also a characteristic relevant to forecast skill (Gneiting et al., 2007).Sharp forecasts with narrow forecast intervals reduce the range of possible outcomes that are anticipated, increasing their usefulness for decision makers (Li et al., 2016).This skill can be quantified, for example, as the percentage reduction in the inter-quartile range (IQR) between the forecast's distribution and the observed historical (climatological) distribution (Crochemore et al., 2017).RMSE.CRPS and IQR skill scores are interpreted as 2 : • 0 is considered to be a forecast with no skill (equivalent skill to predicting using historical averages or historical reference); • less than 5 is considered to be a forecast with very low skill; • 5-15 is considered to be a forecast with low skill; 2 'How are the skill score categories defined?' from http://www.bom.gov.au/water/ssf/faq.shtml• 15-30 is considered to be a forecast with moderate skill; and, • greater than 30 is considered to be a forecast with high skill.
Reliability refers to the statistical similarity between the forecast probabilities and the relative frequencies of events in the observations, which can be verified using probability integral transforms (PITs).The PIT represents the non-exceedance probability of observed streamflow obtained from the CDF of the ensemble forecast.If the forecast ensemble spread is appropriate and free of bias then observations will be contained within the forecast ensemble spread, with reliable forecasts having PIT values that follow a uniform distribution between 0 and 1 (Laio and Tamea, 2007).

Skill scores
BJP models were trialled with combinations of predictors accounting for antecedent flow (flow immediately preceding the forecast season, i.e.March flow for the Kharif forecast) and NAO-or ENSO-based climate indices identified from the literature, as introduced in section 2.2 (Charles, 2016).SRM-ESP scenario-mean forecasts were an additional predictor trialled for Jhelum at Mangla.The best performing predictor combinations used a flow predictor (March flow) together with either the Multivariate ENSO index (MEI; http://www.esrl.noaa.gov/psd/enso/mei/index.html) (Wolter and Timlin, 1998) or the Southern Oscillation signal index (SSI; http://www.cgd.ucar.edu/cas/catalog/climind/soiAnnual.html)(Trenberth, 1984) as a climate predictor.The seasons of the trialled climate predictors (i.e.their time-lag preceding the forecast season) were selected based on their highest linear correlations with flow (not shown).Table 1 presents BJP forecast skills scores using the trialled combinations of antecedent flow and climate predictors for the Kharif season for Jhelum at Mangla.Combinations including the SRM forecasts (ESP mean) as a predictor are included, however because the SRM results are only available for the 15 year period 2001-2015, they are only providing skill during the 2001-2015 period when included as a BJP predictor for the full 41 year period .These results show: • The antecedent predictor (March flow, QMarch) provides greater skill than any of the individual climate predictors used.• Two-predictor models using QMarch and the SRMKharif predictor give poorer skill scores compared to using QMarch alone.• Two-predictor models using QMarch and one climate predictor slightly improve (in most cases) the skill scores compared to using QMarch alone.• Addition of the SRMKharif predictor to the two-predictor models using QMarch and one climate predictor does not improve skill scores.
Table 2 presents the skills scores for the Kharif season forecasts for the Indus at Tarbela, also using the antecedent flow and climate predictors but without SRM forecasts in this case (as SRM was not available for Indus at Tarbela for this study).
Similarly to the results for the Jhelum at Mangla, for the Indus at Tarbela BJP forecasts: • The antecedent predictor (March flow, QMarch) provides greater skill than any of the climate predictors used.• On the whole, a single climate predictor produces low skill compared to that obtained using QMarch, with a notable exception that the MEIMayJun (i.e. the year before) predictor produces skill scores comparable to those obtained using QMarch.
• Two-predictor models using QMarch and one climate predictor improve the skill scores compared to using QMarch alone.
In addition to calibration for the full Kharif season, BJP calibrations were also undertaken for the early Kharif (April-June) and the late Kharif (July-September) using the relevant flow and ENSO-based predictors (e.g. for late Kharif the June flow was used as an antecedent predictor).A comparison of the resulting skill scores are shown in Figure 2 for the BJP models that gave the highest skill gain relative to climatology for the Kharif, early Kharif and late Kharif periods.It is interesting to contrast the performance for the two locations, with Kharif and late Kharif giving similar results across the two locations whereas for early Kharif a marked difference is seen, with high skill for Jhelum at Mangla contrasting the low skill for Indus at Tarbela.
The physical reasons for this contrast would require further investigation, with possible causes discussed in Section 5.

Performance diagnostics
Here we assess the cross-validated performance of forecasts from BJP models using the antecedent and climate predictors (option 1) that were selected on the basis of skill scores and, for Mangla, compare with the SRM-ESP forecasts (option 2).We do not compare results for the BJP models using the SRM-ESP mean as an additional predictor (option 3), as the addition of this predictor was shown to add little or no skill to BJP forecasts (Table 1).
We use PIT plots for verification of the reliability and robustness of the forecast probability distributions, to assess whether there are biases in the forecasts, or whether the forecast probability distributions are too wide or too narrow (Laio and Tamea, 2007).For reliable forecasts, the PIT values should follow a uniform distribution and hence follow a 1:1 line when plotted against a standard uniform variate.For the BJP forecasts for the full 1975-2015 period, both Jhelum at Mangla (Figure 3a) and Indus at Tarbela (Figure 4a) show reliability, with the forecast's PIT values plotting close to the 1:1 lines.Comparison of Jhelum at Mangla BJP and SRM-ESP forecasts, for the shorter 2001-2015 period for which SRM results are available, show a contrast between reliable BJP forecasts (Figure 5a) and biased SRM forecasts, including five values of 0 or 1 indicating that the SRM forecast distribution is too narrow (Figure 6a).
A feature of robust forecasts is their stability across the full period of record and range of flow magnitudes.Figure 3b and  Robustness is also assessed by plotting forecast quantile ranges and observed flows against the forecast median (Figure 3d and Figure 4d) and chronologically (Figure 3e and Figure 4e, for Jhelum at Mangla and Indus at Tarbela respectively).These show that BJP forecasts reasonably account for the range of observed variability for both locations.The relatively less robust SRM-ESP forecasts are shown in Figure 6d and e, again highlighting the overly narrow forecast distribution range.
Overall, the performance shown in these figures highlight the reliability and robustness of the BJP forecasts for the Kharif season for Jhelum at Mangla and Indus at Tarbela.For the Jhelum at Mangla BJP and SRM-ESP forecasts for the shorter 2001-2015 period, the results contrast the reliable and robust BJP against some limitations from the SRM, which will be discussed further in the next section.

Discussion
The SRM forecasts are examples of the commonly applied ESP approach (Shi et al., 2008;Shukla and Lettenmaier, 2011;Wood et al., 2005).As such, Wood and Schaake (2008) note "One strength of the ESP approach is that it accounts for uncertainty in future climate, which in some seasons is the major component of forecast uncertainty, by assuming that historical climate variability is a good estimate of current climate uncertainty.A weakness of the approach, however, is that when the uncertainty of the current ("initial") hydrologic state is a significant component of the overall forecast uncertainty …, the deterministic estimate of the forecast ensemble's initial hydrologic state leads to an overconfident forecast-that is, one having a spread that is narrower than the total forecast uncertainties warrant." This can be seen in the poor verification performance of SRM-ESP shown in Section 4.2, with the SRM-ESP from the 15 year sample unable to account for the observed range of flows, i.e. the ESP range is too narrow.Given that the observed climate of each individual year would, in most cases, be within the range of the ensemble of climate inputs used to produce the ESP, this indicates SRM formulation could be too strongly reliant on antecedent conditions at the beginning of the forecast season.An additional source of bias, as evidenced by the years of poorest performance being outside the years used for parameter estimation, could be over fitting with the model parameters tied too closely to the range of observed predictor-predictand relationships in the 2003-2010 calibration period.
The poorer BJP performance for Indus at Tarbela, as seen in the skill scores relative to Jhelum at Mangla (Figure 2), could be related to the differences in flow generation mechanisms.As the predictors are the source of skill in the statistical BJP approach, examination of correlations between the predictors and flow for the individual months within the Kharif season is insightful.Table 3 shows the (intuitively) expected pattern for Jhelum at Mangla of Q March having a maximum correlation with April flow (0.84) and then maintaining a relatively high correlation until July (0.62) before dropping off for August (0.12).A different process appears to be influencing Indus at Tarbela, as the initial highest QMarch correlation with April flow (0.66) drops immediately to 0.18 for May before oscillating between 0.25 (August) to 0.55 (September) for subsequent Kharif months.Similarly, the climate predictor's correlations with the individual month's flows show more of a gradual reduction for Jhelum at Mangla (high for the first four months), whereas for Indus at Tarbela again an oscillatory relationship is seen.These higher correlations in the late Kharif (relative to the early Kharif) for Indus at Tarbela would relate to the correspondingly higher relative skill scores shown in Figure 2 for late Kharif.It is hypothesised this relates to late-season glacier melt processes that are a significant component of the inflow to Tarbela but not Mangla (Mukhopadhyay and Khan, 2015).
It is also interesting to reflect on the relative performance of the NAO climate predictor, which does not provide any skill for inflow to Mangla (Table 1) but offers comparable skill to several of the ENSO indices trialled for Tarbela (Table 2).This result is also hypothesised to relate to NAO being informative of late season glacier melt, and also corresponds to investigations showing a stronger relationship between ENSO and precipitation and weaker relationship between NAO and precipitation in recent decades (Yadav et al., 2009a;Yadav et al., 2009b) resulting in the prevalence of ENSO as the better predictor of winter snowpack magnitude.

Conclusion
This study has assessed the performance and practical feasibility of three options for producing Kharif (April-September) For an additional comparison, the option 1 BJP approach was also undertaken for the Indus River inflows to the Tarbela Dam.
Overall findings were: • The best performing BJP models for Tarbela and Mangla inflows are consistent in that both used an antecedent flow predictor and a climate predictor representing ENSO.
• There are pragmatic benefits to selecting a BJP model using only antecedent and climate predictors, rather than including SRM mean ESP as an additional predictor even in cases when SRM does provide skill, given that flow and climate predictors are readily available and thus BJP forecasts are easily and quickly produced.The SRM, being a deterministic model, is a much more technically involved and data intensive approach to forecast generation (Bogacki and Ismail, 2016;Ismail and Bogacki, 2017).• Probabilistic forecasts can be misinterpreted if they are unfamiliar to the water management professionals using them to inform decisions (Pagano et al., 2002;Ramos et al., 2013;Rayner et al., 2005;Whateley et al., 2015).

Figure
Figure4bshow a uniform spread of PIT values and hence stability across the full period of record, for Jhelum at Mangla and Indus at Tarbela BJP forecasts, respectively.Similarly, forecast stability across the range of flow magnitudes is verified by the uniformity of PIT values against forecast median, as shown in Figure3cand Figure4cfor Jhelum at Mangla and Indus at Tarbela, respectively.For the Jhelum at Mangla BJP and SRM-ESP comparison, stability across time and flow magnitude are harder to assess given the short 15 year sample size.Figure5b and cshow reasonable stability of the BJP forecasts, although there is a trend over time for this part of the record.The equivalent SRM plots (Figure6b and c) are not as robust, with (as noted previously) five values at 0 or 1 (from the chronological plot: 2001 is at 0, and 2010, 2012, 2014 and 2015 are at 1).
seasonal streamflow forecasts for the Jhelum River inflows to the Mangla Dam in the UIB of Pakistan: option 1, the BJP statistical forecasting technique; option 2, the SRM physically-based model run in ESP mode; and option 3, a hybrid of option 1 with the mean ESP forecasts from option 2 used as an additional predictor for input to the BJP.The option 1 BJP forecast model used antecedent catchment and climatic predictors, with the predictors selected based on BJP skill score performance.The selected predictors represent hydrological conditions immediately preceding the forecast season (i.e., flow of the preceding month -March in this case) and ENSO-based climate indices related to drivers of snow accumulation in the preceding winter.

•
Cross-validated performance of the BJP seasonal forecasts for the 1975 to 2015 Kharif seasons, as shown in the diagnostic and verification statistics presented, highlight that the BJP produces forecasts that are statistically unbiased, robust and reliable.In contrast, the SRM-ESP forecasts show bias particularly for the most recent years outside the SRM calibration period, potentially indicating limitations with the SRM due to lack of cross-validated calibration and resultant over-fitting.Thus SRM-ESP forecasts are overly confident, underestimating the full uncertainty that is captured by the BJP approach.
Hydrol.Earth Syst.Sci.Discuss., https://doi.org/10.5194/hess-2017-428Manuscript under review for journal Hydrol.Earth Syst.Sci. Discussion started: 26 July 2017 c Author(s) 2017.CC BY 4.0 License.• High skill was obtained for BJP forecasts of early Kharif flow for Jhelum at Mangla.Moderate skill was obtained for the full and late Kharif season forecasts for both Jhelum at Mangla and Indus at Tarbela.Lower skill was seen for early Kharif for Indus at Tarbela.• BJP forecast models could readily be developed and assessed for other tributaries, e.g. the Chenab and Kabul, subject to availability of flow data.This would allow an overall assessment of UIB flow forecasting for the major contributing basins.

Figure 2 :Figure 3 :
Figure2: BJP cross-validated skill scores, % skill gain relative to climatology, for CRPS (green), RMSE (blue) and IQR (orange) skill scores.Less than 5 is considered to be a forecast with very low skill.Between 5 and 15 is considered low skill.Between 15 and 30 is considered moderate skill, and higher than 30 is considered to be a forecast with high skill.

Figure 4 :
Figure 4: as in Figure 3 for BJP cross-validated forecasts for Indus at Tarbela Kharif for 1975-2015

Figure 5 :Figure 6 :
Figure 5: as in Figure 3 for BJP cross-validated forecasts for Jhelum at Mangla Kharif for SRM period of 2001-2015, except for (e) see 2001-2015 period of Figure 3 (e).
To enable successful transfer of BJP forecast tools to operational use within IBIS management in Pakistan would require guidance for building BJP models and generating forecasts, test cases with example results, face to face training, and on-going support.

Table 3 : Correlation between the flow of the individual months of the Kharif season and the predictors used by BJP
a Year before