Interactive comment on “ Does objective cluster analysis serve as a useful precursor to seasonal precipitation prediction at local scale ? Application to western Ethiopia ” by Ying Zhang et al

The manuscript aims at developing a statistically based seasonal precipitation forecast model for Western Ethiopia. The target area is separated into homogeneous regions by means of a k-means cluster analysis of summer precipitation amounts. Eight regions with similar precipitation variability are defined. For each of them, a linear regression based forecast model is calibrated. Results are compared with a general forecast for the entire region and are found to be superior. In a final step the forecast is downscaled to a high resolution grid, again by means of a liner regression approach. The target of the study is timely, since local precipitation predictions are often required for water management and planning, and the manuscript is well structured and easy to follow. However I have some serious concerns about the calibration and particularly

However, many studies assume homogeneity in precipitation across an entire study region, which may prove ineffective for operational and local-level decisions, particularly for locations with high spatial variability.This study proposes advancing local-level seasonal precipitation predictions by first conditioning on regional-level predictions, as defined through objective cluster analysis, for western Ethiopia.To our knowledge, this is the first study predicting seasonal precipitation at high resolution in this region, where lives and livelihoods are vulnerable to precipitation variability given the high reliance on rain-fed agriculture and limited water resources infrastructure.The combination of objective cluster analysis, spatially high-resolution prediction of seasonal precipitation, and a modeling structure spanning statistical and dynamical approaches makes clear advances in prediction skill and resolution, as compared with previous studies.The statistical model improves versus the non-clustered case or dynamical models for a number of specific clusters in northwestern Ethiopia, with clusters having regional average correlation and ranked probability skill score (RPSS) values of up to 0.5 and 33 %, respectively.The general skill (after bias correction) of the two best-performing dynamical models over the entire study region is superior to that of the statistical models, although the dynamical models issue predictions at a lower resolution and the raw predictions require bias correction to guarantee comparable skills.
1 Primer on prediction models and cluster analysis Seasonal precipitation prediction can provide potentially actionable information to guide management of various sectoral activities.For instance, precipitation prediction is often translated into a hydrological forecast, which can be used to optimize reservoir operations, provide early flood or drought warning, inform waterway navigation, etc.As a primary input to soil moisture, precipitation prediction is also essential to agricultural management -farmers can take advantage of anticipated preferable climatic conditions or avoid unnecessary costs under expected undesirable conditions.Two types of models are commonly used for seasonal precipitation prediction: statistical and dynamical.Dynamical models, such as general circulation models (GCMs), include complex physical climate processes, while statistical models are purely data driven, relating observations and hydroclimate variables directly.
While both modeling approaches have produced skillful seasonal predictions for a variety of applications (e.g., Barrett, 1993;Hammer et al., 2000;Shukla et al., 2016), each has noteworthy drawbacks.Dynamical models often require a great amount of time to build and parameterize, whereas statistical models require considerably fewer resources (e.g., Mutai et al., 1998;Gissila et al., 2004;Block and Rajagopalan, 2007;Diro et al., 2008Diro et al., , 2011b;;Block and Goddard, 2012).Dynamical models also suffer from their high sensitivity to initial uncertain conditions, particularly given a long lead time.Consequently, a number of simulations are typically produced, each with unique initial conditions, to provide a range of possible outcomes (e.g., Roeckner et Published by Copernicus Publications on behalf of the European Geosciences Union. al., 1996;Anderson et al., 2007).Furthermore, the outputs from dynamical models often require additional bias correction, typically using statistical methods, to better match observations (e.g., Ines and Hansen, 2006;Block et al., 2009;Teutschbein and Seibert, 2012).Statistical models, on the other hand, are highly dependent on substantial high-quality historical data to capture hydroclimatic patterns and signals, particularly extreme conditions, which is often not available.Additionally, statistical models are often linear by construction and may not well capture non-linear complex interactions and feedbacks.The physical nature of dynamical models, however, allows for prediction under non-stationary conditions and also when insufficient historical data are available, whereas statistical models, by construction, typically rely on stationary relationships (Schepen et al., 2012).
The spatial extent selected for statistical seasonal prediction is critical.It is not uncommon to simply assume homogeneity in precipitation across an entire study region; however, this limits addressing potential spatial variability.While this may be suitable for very broad regional planning, it is often ineffectual for operational and local-level decisions, particularly for locations with high spatial variability.This prompts the need for delineation of subregional-scale homogeneous regions, often defined through cluster analysis.Defining these homogeneous regions, however, is a nontrivial process.There are a variety of methods to delineate homogeneous regions, including comparing annual cycles (e.g., unimodal and bimodal distributions in precipitation) between stations (or grid cells), comparing station correlations with regional averages, applying empirical orthogonal functions (EOFs), various clustering techniques, and other methods of increasing complexity (e.g., Parthasarathy et al., 1993;Mason, 1998;Landman and Mason, 1999;Gissila et al., 2004;Diro et al., 2008Diro et al., , 2011b;;Singh et al., 2012).In addition, delineation of the subregion size is also important to consider.Smaller-sized homogeneous subregions do not necessarily lead to improved predictions, as the noise at overly small scales can dominate any real signals representing spatial coherency of precipitation.For additional discussion regarding defining homogeneous subregions and cluster analysis, the reader is referred to Zhang et al. (2016) and Badr et al. (2015).

Application to western Ethiopia and objectives of the study
Precipitation in western Ethiopia peaks in the summer with approximately 70 % of annual total precipitation falling during the main raining season -also known as the Kiremt season, spanning from June to September (JJAS).On average, the seasonal total precipitation in the study region is approximately 760 mm; however, in the northwest, precipitation can exceed 1200 mm (Fig. 1a).Along with the high spatial variability in this mountainous region, the temporal variability is also remarkable with spatial-average seasonal total precipitation ranging from 650 mm in dry years up to 900 mm in wet years (Fig. 1b).These highly variable spatial and temporal precipitation patterns have made skillful seasonal predictions challenging, particularly at local scales (e.g., Gissila et al., 2004;Block and Rajagopalan, 2007).The climate mechanisms affecting JJAS precipitation patterns in western Ethiopia are quite complex.Sea surface temperatures (SSTs) in the equatorial Pacific Ocean representing the well-known El Niño-Southern Oscillation (ENSO) phenomenon are considered a primary indicator of JJAS precipitation variability, with El Niño/La Niña often associated with deficit/excess of JJAS precipitation across the study region (e.g., NMSA, 1996;Camberlin, 1997;Bekele, 1997;Segele and Lamb, 2005;Diro et al., 2011a;Elagib and Elhag, 2011).Additionally, there is evidence of direct moisture transport from the Gulf of Guinea (equatorial Atlantic Ocean), the Indian Ocean, and the Mediterranean Sea, affecting Ethiopia's summertime precipitation (Viste and Sorteberg, 2013a, b).These moisture fluxes are often related to pressure patterns across the continent.For instance, the St. Helena High over the southern Atlantic Ocean or a high pressure over the Gulf of Guinea, coupled with a simultaneous low pressure over the Indian Ocean or a monsoon trough over the Arabian Peninsula, all bring intensified westerlies and southwesterlies that transport moist air across the Congo Basin to the western Ethiopian highlands (Segele et al., 2009;Williams et al., 2011).Similarly, the southwest Asian monsoon in the Indian Ocean, which has a strong positive relationship with concurrent JJAS precipitation in western Ethiopia, is associated with the Mascarene High over the southern Indian Ocean and a low pressure system near Bombay.During this monsoon season, the southeast trade winds in the Southern Hemisphere are channeled by the east African highlands while crossing the Equator and become a southwest monsoon flow.They are further diverted by the Turkana Channel, enhancing convergence with the westerlies/southwesterlies above the western Ethiopian highlands and bringing moisture to the region (Kinuthia, 1992;Nicholson, 1996Nicholson, , 2014;;Camberlin, 1997;Slingo et al., 2005;Segele et al., 2009).In addition, the effect of other hydroclimate variables, such as Indian Ocean SST, local and regional atmospheric pressure systems (e.g., Azores High) also have notable influence on Ethiopia's precipitation variability (e.g., Kassahun, 1987;Tadesse, 1994;NMSA, 1996;Shanko and Camberlin, 1998;Goddard and Graham, 1999;Latif et al., 1999;Black et al., 2003;Segele and Lamb, 2005).Consequently, these largescale climate variables may serve as potential predictors in statistical seasonal precipitation prediction models.
Ethiopia is vulnerable to fluctuations in precipitation given its reliance on rain-fed agriculture and limited water resources infrastructure.The majority of agriculture and infrastructure are in western Ethiopia, where water resources are relatively rich compared to other parts of the country (Awulachew et al., 2007).Operational precipitation predic- tions in Ethiopia have been issued by its National Meteorological Agency (NMA) since 1987 using an analog methodology (i.e., locating a similar climate condition in the past -an analog -to predict future conditions); however, this approach has produced only marginally skillful outcomes (Korecha and Sorteberg, 2013).For NMA's prediction, the country is divided into eight homogeneous regions for which NMA produces independent predictions.Similarly, others have also addressed seasonal prediction in Ethiopia contingent on both temporal and spatial precipitation patterns.Gissila et al. (2004) divide Ethiopia into four regions conditioned on the seasonal cycle and interannual variability coherence prior to prediction, while Diro et al. (2009) apply a similar approach but with dynamic cluster boundaries, allowing for different delineations for each rainy season.Segele et al. (2015) consider statistical precipitation predictions across Ethiopia as a whole, as well as for northeastern Ethiopia and at two Ethiopian cities. Block and Rajagopalan (2007) predict the average summertime (JJAS) precipitation over the upper Blue Nile Basin -a region they claim is homogenous at interannual timescales.Korecha and Barnston (2007) select an all-Ethiopia average precipitation index to characterize predictability broadly, with minimal attention to operationallevel predictions.All of these studies focus on predicting regional average precipitation based on subjective clustering methods applying a limited number of stations or coarsely gridded data; no local predictions at a finer spatial scale are explored.
This study moves forward by exploring local-level seasonal precipitation prediction through the use of regionallevel predictions, based on previous cluster analyses over western Ethiopia (Zhang et al., 2016).The advantages of defining homogeneous regions for seasonal prediction at operational (small) scales will be demonstrated by comparing approaches with and without undertaking a cluster analy-sis a priori.The combination of objective cluster analysis, spatially high-resolution prediction of seasonal precipitation, and a modeling structure spanning statistical and dynamical approaches makes clear advances compared to previous studies.

Modeling high-resolution seasonal prediction
To evaluate high-resolution seasonal precipitation with versus without cluster analysis a priori, statistical models are developed and further compared with bias-corrected dynamical model predictions.Four scenarios are evaluated based on two criteria: (1) clustered vs. non-clustered and (2) direct vs. indirect.In the clustered case, predictions are produced for each homogeneous region (cluster) given a unique set of predictors.In the non-clustered case, the entire study region is considered as one cluster, and thus only one set of predictors is utilized for predictions.For the direct case, precipitation is predicted directly at the local level (grid scale); for the indirect case, the average precipitation within each homogeneous region is predicted first (as an intermediary) and then regressed to local-level (grid-scale) predictions.Combinations of the two criteria form four scenarios -clustered direct (C-D), non-clustered direct (NC-D), clustered indirect (C-I), and non-clustered indirect (NC-I) predictions.

Cluster analysis
Using a k-means clustering technique, western Ethiopiathe major agricultural region of the country -is divided into eight homogeneous regions (Fig. 2), conditioned on the interannual variability of total precipitation in JJAS, the same variable that is to be predicted.Precipitation is based on a 0.1 • × 0.1 • gridded precipitation dataset from NMA (Dinku et al., 2014), consisting of 7320 grid cells across 1983-2011 (29 years).This product has been verified against station data and has been deemed representative of observed precipitation in western Ethiopia (Dinku et al., 2014).Given the highresolution gridded dataset, k-means clustering is performed for a range of predefined numbers of clusters; the optimal number of clusters is identified by various evaluation metrics based on the within-cluster sum of square errors (WSS), including an elbow method with difference in WSS, gap statistics with difference in difference, and qualitative analysis on post-visualization of clusters.During the clustering process, each grid cell is assigned and reassigned to clusters until the WSS is minimized.This does not require any subjective delineation or manual delineation of boundaries between clustered stations or grid cells; instead, an automated and objective delineation is performed.The mean time series of each cluster illustrates high intracorrelation within the cluster and low intercorrelation between any two clusters, indicating strong coherency of the clustering results.For a detailed analysis including a complete correlation table and unique patterns for each cluster-level time series associated with large climate variables, readers are referred to Zhang et al. (2016).

Statistical modeling approach
Many studies have investigated statistical models for seasonal climate prediction.These studies vary by preclassification of predictor or predictand regime, predictor selection process, and statistical methods.For example, Hertig and Jacobeit ( 2011) investigate SST regimes as potential predictors for subsequent precipitation and temperature in the Mediterranean region.Through techniques including multiple applications of principal component analysis (PCA), 17 stationary SST regimes were identified.Gerlitz et al. (2016) apply a kmeans cluster analysis to grid cells identified with significant correlations in the predictor field in order to facilitate predictor selection.Suárez-Moreno and Rodríguez-Fonseca (2015) investigate stationarity based on a long time series using a 21-year moving correlation window.The statistical prediction models are then applied to each stationary period, respectively, and the entire period for comparison.Despite diverse methods in seasonal prediction, multiple linear regression (MLR) is favored by many as a statistical modeling approach given its well-developed theory, simple model structure, efficient processing, and often skillful outcomes (e.g., Omondi et al., 2013;Camberlin and Philippon, 2002;Diro et al., 2008).As mentioned, only a few studies have focused on seasonal precipitation prediction in Ethiopia (Gissila et al., 2004;Block and Rajagopalan, 2007;Korecha and Barnston, 2007;Diro et al., 2008Diro et al., , 2011b;;Segele et al., 2015), and almost all of them include the applications of MLR.This study also applies MLR to predict seasonal precipitation, yet differentiates from other studies by applying predictions to predefined homogeneous regions and further translating to locallevel predictions.
Season-ahead (March-May) or month-ahead (May) largescale climate variables that are physically relevant in potentially modulating moisture transport to the basin (or cluster) are selected as potential predictors.Four climate variables are selected here for further evaluation based on outcomes of the aforementioned prediction studies: SST, sea-level pressure (SLP), geopotential height (GH) at 500 mb, and surface air temperature (SAT).All climate variables are from the National Centers for Environmental Prediction and National Center for Atmospheric Research (NCEP/NCAR) reanalysis dataset (Kalnay et al., 1996) at a 2.5 • × 2.5 • grid scale.Those potential predictors are first transformed through PCA (Jolliffe, 2002).PCA is a common approach in climate modeling to reduce the dimensionality of predictors and remove multi-collinearity, while simultaneously extracting the most dominant signals from the potential predictors, typically reflected in the first few principal components (PCs).Since PCA is independent of the predictand, retaining the first few PCs as predictors, in lieu of the original variables, also helps to reduce artificial prediction skill.
Subsequently, a certain number of PCs are used as the direct inputs into a MLR model, otherwise known as the principal component regression (PCR).PCR is performed in a "drop-one-year" cross-validation mode to reduce overfitting effects and therefore avoid overestimation of prediction skill.This requires reconstructing the principal components for the dropped year and then multiplying the coefficient estimates with each reconstructed PC, respectively, in order to obtain the final predicted value for the dropped year (e.g., Block and Rajagopalan, 2009;Wilks, 2011).A detailed methodology is provided below.
To avoid overfitting, the entire process including predictor selection and statistical modeling is processed using crossvalidation.To start, drop-one-year precipitation observations for JJAS averaged across the region and each cluster are spa-  tially correlated independently with each global climate variable.As a result, there are total of 1044 global correlation maps given the 29-year time series, eight clusters plus one non-cluster, and four climate variables.Hence, a program to automatically select highly correlated and justifiable regions as predictors is developed.The following steps describe the subsequent statistical modeling process (Fig. 3): 1. Grid cells within each justifiable region (e.g., equatorial Pacific; Fig. 4) with correlation above the 99 % significance level are identified (Fig. 5).For regions containing grid cells with both positive and negative correlations, the number of the identified grid cells in each sign is counted.If a greater number of grid cells is associated with significant positive correlation, for example, only grid cells with positive correlations are kept for the following steps, and vice versa.
2. The top 10 % of the identified grid cells with the highest correlation in each region are then selected in order to boost the potential model skill.
3. For each region, data of the selected grid cells within the region are spatially averaged (defined as "prepredictors").
4. Prepredictors are standardized, combined, and transformed through PCA for each cluster or non-cluster and each dropped-year analysis separately.
5a.The top PCs from the PCA with a total of 95% variance explained are used as predictors in PCR.For the direct case, PCR is used to directly predict the grid-level precipitation; for the indirect case, PCR is used to predict the intermediate cluster-level precipitation.
5b.For the indirect case only, cluster-level predictions are regressed to the grid level.Note that the downscaling of cluster-level predictions to grid-level predictions is also cross-validated to avoid overfitting.
For the four scenarios, the model structures are quite similar but have subtle differences which could lead to evidently different outcomes (Table 1).Under the NC-D (Eq.1a, b) and C-D scenarios (Eq.2a, b), the time series of JJAS seasonal total precipitation in each grid cell (i.e., at local level) is used as the direct predictand (Y i,t ); however, the NC-D and C-D scenarios differ, as the former uses the same predictors (X t ) across all the grid cells, while the latter uses different predictors according to the cluster to which the grid cell is assigned (X j,t ).In the indirect case, the cluster-level time series of JJAS seasonal total precipitation (the time series av-  Table 1.Equations of linear regression panel models under four scenarios.

Non-clustered Clustered
Y -predictand of JJAS seasonal total precipitation; X -predictors of top PCs; ε, ν -error terms; Ypredicted values of JJAS seasonal total precipitation; α, β, η, γ -estimated coefficients; i -grid-cell index; t -time (year) index; j -cluster index; i ∈ j -grid cell i that belongs to cluster j ; m -mean over entire study region that is equivalently the only cluster.
eraged over all grid cells that belong to a given cluster, Y m,t or Y j,t ) is first predicted (Eqs.3a, b and 4a, b).The predicted intermediate product ( Y m,t or Y j,t ) is then used as the only regressor in the second step to estimate the grid-level precipitation ( Y i,t or Y i j,t for every j ; Eqs. 3c, d and 4c, d).
Again, for the C-I scenario, predictors in the first step are unique for each of the eight clusters and grid cells within that cluster (X j,t ), while predictors are identical for all grid cells (X t ) under the NC-I scenario.A 95 % confidence interval of the cross-validated predictions is also constructed conditioned on model errors.Q-Q plots are evaluated to verify normally distributed residuals (results not included).
The names are kept the same as on the International Research Institute for Climate and Society (IRI) data repository website.
The NMME predictions for each of the 10 models are bias corrected by applying probability mapping (e.g., Block et al., 2009;Teutschbein and Seibert, 2012;Chen et al., 2013) under cross-validation, subject to the observational dataset from NMA (Fig. 6).This is performed on a grid-cell-by-grid-cell basis on standardized data (the NMME dataset is reshaped to 0.1 • × 0.1 • grid cells to match the observational NMA dataset grid-cell size).The basic steps include the following: 1. Fit gamma distributions to drop-one-year time series from each observed and NMME grid cell; for NMME, this is performed on an individual model basis using all ensemble members available.(Goodness-of-fit tests indicate gamma distributions are appropriate; results not shown.) 2. Translate gamma distributions into cumulative distribution functions (CDFs).
3. For any given dynamical model prediction at the gridcell level, a corrected prediction value is attained by mapping from the modeled CDF to the observed CDF and applying the inverse gamma distribution.This is repeated for all grid cells, all NMME models, and all dropped years.
After correction, the gamma CDFs of predictions and observations approximately match (Fig. 6a).Additionally, each ensemble still retains its variability over time, though the overall ensemble mean is shifted to closely match observation (Fig. 6b).

Performance metrics
Pearson correlations are used to measure the standardized covariance between observations and predictions.Ranked probability skill scores (RPSSs; Wilks, 2011) are also evaluated to determine categorical skill based on probabilistic predictions.Here, the data are split into three equal terciles representing below-normal, near-normal, and above-normal conditions.A perfect prediction yields an RPSS of 100 %, and a prediction with less skill than climatology (long-term averages) yields an RPSS of less than zero.Median RPSS values from all 29 years are reported.

Statistical model predictions
Correlations between cluster-level model predictions and observations range from −0.16 to 0.51, with Cluster 5 having the highest correlation and Cluster 6 the lowest (Table 2).In approximately one-fifth of the 29 years, the observation falls outside the prediction envelope (Fig. 7), indicating model overfitting and an inability of the predictors to capture precipitation variability.For RPSS, three out of eight clusters indicate superior prediction skill over climatology (Table 2).Improvement in terms of RPSS over the non-cluster scenario is evident for Clusters 1, 3, and 7.Although Cluster 5, in agri-culturally rich central-northwestern Ethiopia (Fig. 2), shows a slightly deteriorated RPSS relative to the non-cluster scenario, it still performs outstandingly with the highest correlation and a positive RPSS value of 0.51 and 10 %, respectively.Clusters 2, 4, 6, and 8 show deteriorated RPSS compared to the non-cluster scenario, although those clusters are mainly regions outside Ethiopia and southern Ethiopia (Fig. 2) where water resources and agricultural activities are considerably less (Fig. 1).At the grid scale, depending on the case (direct or indirect), and for different clusters, correlations between predictions and observations can favor the clustered case or the non-clustered case (Fig. 8).In general, the indirect model provides a smoother pattern of correlations, with grid cells showing a negative correlation in the direct case now improved to near or above zero (Fig. 8).For example, Cluster 5 under the indirect case illustrates a more consistent positive correlation within the cluster.Some parts of the study region reach a correlation over 0.6, such as central-northwestern Ethiopia (Cluster 5), which is consistent with the region of high cluster-level prediction skill.The percentage of grid cells with correlations passing the 95 % significance test is Hydrol.Earth Syst.Sci., 22, 143-157, 2018 www.hydrol-earth-syst-sci.net/22/143/2018/ the highest for the NC-D case (Table 3); however, some locations demonstrate the lowest skills among all four scenarios.Similar findings are evident by evaluating the RPSS, except for Cluster 8; instead of improving with increased RPSS in the indirect case, the grid-scale predictions deteriorate given poor cluster-level prediction (for the C-I case).However, the percentage of grid cells with positive RPSS values overall for the C-I case is still the second highest after the NC-I case (Table 3), indicating the indirect cases are superior in terms of the number of grid cells with improved skill compared to using climatology, particularly for grid cells associated with skillful intermediate cluster-level predictions.The predictions are most skillful for the same region of centralnorthwestern Ethiopia (Cluster 5; Fig. 9) with 87 % of its grid cells showing positive RPSS and a spatial-average RPSS value of 15 % under the C-I scenario (Table 4).

Dynamical model predictions
The RPSS values based on the prediction ensembles of each dynamical model improve remarkably after bias correction.The median RPSS values over all the grid cells are now close to zero (Fig. 10) with two models, NASA-GMAO and NCEP-CFSv2, showing the highest RPSS value (−2.3 and −1.1 %, respectively; Table 3).These two dynamical models also exhibit generally higher grid-level correlations over the study region (averaging 0.24 for both models; Table 3 and Fig. 11), as compared with other NMME models.The two best-performing dynamical models after bias correction show advantage over statistical models, as assessed by correlation and RPSS metrics; however, all other dynamical models are inferior to the statistical models under the NC-D and C-I scenarios, particularly given the percent of grid cells with significant correlation and positive RPSS metrics (Table 3).Within a certain cluster, statistical models may perform better than all dynamical models.For example, for Cluster 5, all statistical models show higher average RPSS values than those of all dynamical models (Table 4).The percentage of grid cells with significant correlation reaches 61 % for the statistical model under the NC-D scenario, compared to the highest value of 47 % among all the dynamical models.Similarly, the percentage with positive RPSS achieves 87 % un-  der the C-I scenario as opposed to 66 % for dynamical models.Note that the dynamical models also produce raw predictions in a lower spatial resolution (1 • × 1 • ) than the statistical models (0.1 • × 0.1 • ) and require bias correction to guarantee comparable skills.

Conclusions and discussion
This study demonstrates the potential for applying seasonahead large-scale climate information to predict highresolution precipitation using a statistical modeling approach.Skillful and credible predictions are produced for some regions in western Ethiopia, particularly under a clustered indirect statistical approach.At the regional scale, the approach shows promise for northwestern Ethiopia (Clus-ters 1, 3, 5, and 7), particularly compared to current NMA operational forecasts, which are only moderately more skillful than climatology (Korecha and Sorteberg, 2013).The regional average RPSS in this study under the clustered case ranges from 10 to 33 % for northwestern Ethiopia, as opposed to values under 6 % for NMA operational forecast (Korecha and Sorteberg, 2013).The approach adopted here also advances on previous studies (Gissila et al., 2004;Block and Rajagopalan, 2007;Korecha and Barnston, 2007;Diro et al., 2011b;Segele et al., 2015) by first applying an objective cluster analysis and then conditionally constructing high-resolution predictions.A unique set of predictors is applied to each cluster, which contributes to superior prediction performance at cluster levels in northwestern Ethiopia, as compared with predictions from the non-clustered approach.Grid-level prediction under the clustered indirect case also reduces the effect of overfitting relative to the direct case and improves negative RPSS values to near or above zero; that said, the non-clustered direct case also illustrates higher correlation and RPSS values on average.
A total of 2 out of 10 NMME dynamical models, NASA-GMAO and NCEP-CFSv2, demonstrate overall superior performance to the statistical models; however, for certain regions such as Cluster 5, the performance of statistical models under the clustered indirect and non-clustered direct cases is still superior.It is also worth noting that the statistical model predictions are at a 100 times finer spatial resolution than the dynamical models, providing additional advantages at the local scale, when skillful.Nevertheless, improvements in dynamical models continue and their application to seasonal precipitation prediction is likely to grow (e.g., Palmer et al., 2004;Saha et al., 2006;Lim et al., 2009).
Relatively poor prediction performance is evident in some locations such as southwestern Ethiopia and regions outside Ethiopia, where the hydroclimatic processes that produce precipitation might be driven by local factors or other regional climate patterns rather than large-scale climate variables identified in this study.A previous study (Zhang et al., 2016) has shown that the influence of ENSO on JJAS precipitation in western Ethiopia decreases generally from north to south, and is likely one of the why skills are relatively low southwestern Ethiopia.Cluster 5 was also identified with the strongest connection to equatorial Pacific SST (Zhang et al., 2016), which is consistent with the highest skill found in this study.Other regions with low prediction skill show relatively strong connections to SST in neighboring oceanic regions.However, connections with those climate patterns appear to be less robust than with ENSO, making the predictions in their associated regions less skillful.This is also consistent with the findings from other studies that even though all three oceans (Indian, Atlantic, and Pacific) affect the JJAS precipitation in western Ethiopia, the Pacific Ocean still plays the greatest role (Segele et al., 2009;Omondi et al., 2013).The southwest Asian monsoon over the Indian Ocean may also be critical in determining the precipitation, given that the clusters with better prediction skills lie along the pathway of the monsoon.Based on the global concurrent correlation maps between JJAS precipitation and SLP for each cluster, Clusters 5 and 7 -the two clusters with the best skillsare the only ones that are strongly and negatively correlated with SLP near Bombay, and in the meantime strongly and positively correlated with the SLP at the eastern equatorial Pacific Ocean.The former indicates that a strong southwest Asian monsoon is associated with higher JJAS precipitation amount, and vice versa.The latter indicates that a high surface pressure over the eastern equatorial Pacific Ocean often accompanied by cold SST and a raining pattern -a La Niña phenomenon -also brings higher JJAS precipitation to western Ethiopia, and vice versa.Cluster 2 -one of the worst predicted clusters -shows moderately strong negative correlation with SLP near Bombay; however, it is also correlated strongly and negatively with SLP in the southern Indian Ocean (a high pressure system that drives the monsoon to-ward the low pressure system near Bombay), indicating that high JJAS precipitation in Cluster 2 is not necessarily associated with a strong southwest Asian monsoon.Moreover, its correlation with SLP over the equatorial Pacific Ocean is nonsignificant.Considering, in general, El Niño suppresses the monsoon and La Niña increases it (Kumar et al., 2006), strong correlations with both ENSO and the monsoon in the correct direction, such as for Clusters 5 and 7, indicate a double insurance over their association with the southwest Asian monsoon.Therefore, clusters which are more affected by the southwest Asian monsoon over the Indian Ocean, particularly coupled with the influence of ENSO, are likely to be more promising in their prediction skills.
Additional prediction features also warrant future attention, including longer prediction lead times and evaluation of other relevant characteristics (e.g., intraseasonal dry spells, seasonal onset or cessation).As observational datasets continue to grow, data-driven cluster analyses and statistical modeling approaches may be expected to improve.Careful analysis of possible significant trends in the data is also warranted; a region with a relatively high correlation may be selected solely based on trends in predictors and observations.For shorter time series, such as the data used in this study, trend analysis may not be reliable; detrending can also reduce evidence of large-scale decadal climate signals.Improving predictive capabilities may not be a complete panacea, but it can continue to be an important part of a decision maker's portfolio as they cope with hydroclimatic variability and its inherent risks.Special issue statement.This article is part of the special issue "Sub-seasonal to seasonal hydrological forecasting".It does not belong to a conference.

Figure 1 .
Figure 1.Spatial and temporal variability of June-September seasonal total precipitation in western Ethiopia: (a) spatial pattern of temporalaverage and (b) spatial-average time series.

Figure 2 .
Figure 2. Regionalization map of eight homogeneous regions marked by different colors, with country boundaries and river profiles.The figure is based on Zhang et al. (2016).

Figure 3 .
Figure 3. Flow chart of data processing for predictors into the statistical model.Numbers framed by dashed lines correspond to the procedures listed in the context.Note: pre.-precipitation, t-s -time series, avg.-average.

Figure 4 .
Figure 4. Justifiable climate regions globally for selecting predictors.(a) SLP and GH at 500 mb with regions including EP, ES, LO, AH, SH, MH, and AM.For SAT, only LO is included.(b) SST with regions including EP, NI, SI, and AT.Note: EP -equatorial Pacific region, ES -Tahiti island for ENSO measurement, LO -local region, AH -Azores High, SH -St.Helena High, MH -Mascarene High, AMsouthwest Asian monsoon, NI -North Indian Ocean, SI -South Indian Ocean, AT -equatorial/South Atlantic Ocean.

Figure 5 .
Figure 5. Correlation map between mean JJAS seasonal precipitation time series in Cluster 5 and global SST under cross-validation, with correlations lower than the 99 % significance level masked out (one-tail test).

Figure 6 .
Figure 6.(a) Bias correction of North American Multi-Model Ensemble (NMME) predictions using probability mapping; (b) precipitation time series from NMME (colored lines) before and after correction, compared to observations (black line).Examples are shown for randomly selected six grid cells.

Figure 7 .
Figure 7. Cluster-level predictions and observations under the C-I and NC-I scenarios, with drop-one-year cross-validation.The 95 % envelope shows the 95 % confidence interval constructed using model errors.

Figure 8 .
Figure 8. Pearson correlations between grid-level observations and predictions under the four scenarios, with the clustering boundary delineated roughly in black.

Figure 9 .
Figure 9. Grid-level RPSS (%) under the four scenarios using climate variables as predictors, with the clustering boundary delineated roughly in black.

Figure 10 .
Figure10.Box plots of grid-level RPSS (%) for 10 dynamical models from NMME (a) before and (b) after bias correction, labeled with the same number as listed in the context.Note that for each box plot, the line inside the box is the median, the box edges represent the 25th and 75th percentiles, and the whiskers extend to the most extreme data points not considered outliers (outliers not shown).

Figure 11 .
Figure 11.Pearson correlations between grid-level observations and ensemble mean of bias-corrected predictions for 10 dynamical models from NMME, labeled with the same number as listed in the context.

Table 3 .
Grid-level Pearson correlation and RPSS statistics.

Table 4 .
Grid-level Pearson correlation and RPSS statistics for grid cells within Cluster 5.