A statistically based seasonal precipitation forecast model with automatic predictor selection and its application to Central and South Asia

Abstract. The study presents a statistically based seasonal precipitation forecast model, which automatically identifies suitable predictors from globally gridded sea surface temperature (SST) and climate variables by means of an extensive data-mining procedure and explicitly avoids the utilization of typical large-scale climate indices. This leads to an enhanced flexibility of the model and enables its automatic calibration for any target area without any prior assumption concerning adequate predictor variables. Potential predictor variables are derived by means of a cell-wise correlation analysis of precipitation anomalies with gridded global climate variables under consideration of varying lead times. Significantly correlated grid cells are subsequently aggregated to predictor regions by means of a variability-based cluster analysis. Finally, for every month and lead time, an individual random-forest-based forecast model is constructed by means of the previously generated predictor variables. Monthly predictions are aggregated to running 3-month periods in order to generate a seasonal precipitation forecast. The model is applied and evaluated for selected target regions in central and south Asia. Particularly for winter and spring in westerly-dominated central Asia, correlation coefficients between forecasted and observed precipitation reach values up to 0.48, although the variability of precipitation rates is strongly underestimated. Likewise, for the monsoonal precipitation amounts in the south Asian target area, correlations of up to 0.5 were detected. The skill of the model for the dry winter season over south Asia is found to be low. A sensitivity analysis with well-known climate indices, such as the El Niño-Southern Oscillation (ENSO), the North Atlantic Oscillation (NAO) and the East Atlantic (EA) pattern, reveals the major large-scale controlling mechanisms of the seasonal precipitation climate for each target area.
For the central Asian target areas, both ENSO and NAO are identified as important controlling factors for precipitation totals during moist winter and spring seasons. Drought conditions are found to be triggered by a cold ENSO phase in combination with a positive state of NAO in northern central Asia, and by cold ENSO conditions in combination with a negative NAO phase in southern central Asia. For the monsoonal summer precipitation amounts over southern Asia, the model suggests a distinct negative response to El Niño events.

1 Introduction
Seasonal precipitation prediction is a crucial task in the field of applied climatology, particularly due to the manifold ecological, economic and social consequences of abnormal weather conditions, such as droughts and flood events. Especially in regions characterized by a large interannual precipitation variability, a seasonal forecast of hydroclimatological variables is required by governmental and nongovernmental stakeholders in order to develop and implement adequate adaptation strategies, e.g., for water resource management and flood protection (Chiew et al., 2003).
In general, precipitation results from complex and interacting atmospheric phenomena at different spatial and temporal scales and is highly variable in space and time. Thus, its precise prediction more than several days ahead is elusive. However, regional climate conditions are actively influenced by large-scale atmospheric patterns, which are (1) occasionally persistent and (2) influenced by boundary conditions, such as sea surface temperatures, land cover and soil moisture, and by external factors, e.g., variations of the solar radiation and volcanic eruptions (Palmer and Anderson, 1994; Smith et al., 2012). The fact that the boundary conditions are often characterized by a low-frequency variability leads to a degree of predictability of medium-range climate conditions in many regions of the world.
Published by Copernicus Publications on behalf of the European Geosciences Union.
L. Gerlitz et al.: Statistically based seasonal precipitation forecast model
Operational seasonal forecasts are usually based on dynamical atmosphere-ocean general circulation models (AOGCMs). These process-based models enable the prediction of large-scale climate conditions at various temporal scales (Saha et al., 2014; Smith et al., 2012). Based on the fundamental equations of fluid dynamics, they are designed to simulate large-scale characteristics of the climate system in a physically consistent manner. Because of exponentially increasing computing demands, the equations are solved numerically on a coarse regular grid. Small-scale processes, such as convective precipitation or the turbulent transport of energy and momentum, are only indirectly considered by means of empirically based parameterizations (Smith et al., 2012). In order to utilize AOGCMs for seasonal climate forecasts, the models are forced with real-time initial and boundary conditions. Especially tropical sea surface temperatures, but also snow-covered areas and soil moisture, have been identified as important influencing factors for the global circulation (Brands et al., 2012; Douville and Chauvin, 2000; van den Hurk et al., 2010; Orsolini et al., 2013). The best results of process-based seasonal climate forecasts are usually found in the tropics, where large-scale wind fields and associated moisture fluxes are highly influenced by sea surface temperature variations. The skill for the temperate climate zones is mostly lower (Kumar et al., 2013). In general, dynamical climate models are prone to biases due to uncertainties in the initial conditions and are most reliable when large model ensembles are available (Eden et al., 2015; Suárez-Moreno and Rodríguez-Fonseca, 2015). Due to their high computing requirements, dynamical seasonal forecasts are restricted to a few research centers and are not suitable for application in hydrometeorological and environmental offices, particularly in developing and transition countries.
As an efficient alternative, statistical forecast models are widely applied in order to derive suitable input data for climate impact investigations. Based on the assumption that seasonal climate anomalies are triggered by variations of nearby or remote atmospheric, oceanic or terrestrial conditions, these models attempt to find robust statistical relationships between observed climate anomalies and the state of adequate predictor variables during the previous months. Since near-surface temperature and precipitation are the most decisive variables for the hydrological budget and exhibit the strongest impact on climate-sensitive environments, these variables are frequently used as predictands.
Particularly, the state of the El Niño-Southern Oscillation (ENSO) is known to influence the large-scale precipitation patterns almost everywhere on the globe (Dai and Wigley, 2000; Mason and Goddard, 2001; Stone et al., 1996). The precipitation variability in the tropical regions is directly determined by ENSO due to its impact on the tropical Walker circulation. During El Niño events, positive sea surface temperature (SST) anomalies occur over the eastern tropical Pacific as a result of weakened easterly trade winds. A common consequence is the occurrence of drought periods in southeast Asia, especially over Indonesia, and the simultaneous presence of long-lasting precipitation events over the arid regions of the western slopes of the South American Andes (Julian and Chervin, 1978; Wang, 2002). Beyond these direct effects, several studies demonstrated a statistically significant correlation of El Niño indices (usually derived from SST observations in the El Niño core regions or from associated pressure gradients between Darwin and Tahiti) with seasonal precipitation time series in other parts of the tropics and also in temperate climate zones. For example, various studies detected a robust statistical relationship between Australian monsoonal precipitation and the ENSO state during previous months (Cai et al., 2011; Ummenhofer et al., 2009). A significant influence of El Niño events was also found for monsoonal precipitation amounts in eastern and southern Africa (Liebmann et al., 2014; Ratnam et al., 2014) and the Sahel region (Parhi et al., 2015). For the south Asian monsoon, a negative response to El Niño events has been frequently reported (Krishnaswamy et al., 2014; Lau and Wu, 2001; Surendran et al., 2015). For the semiarid regions of central Asia and for the Mediterranean region, a positive relationship of winter and spring precipitation to El Niño events during the previous autumn was found, e.g., by Barlow et al. (2002), Hoell et al. (2013), Roghani et al. (2015), and Syed et al. (2006). Moreover, Fraedrich (1994) and Wu and Lin (2012) detected a statistically significant influence of ENSO on extratropical circulation anomalies such as the position of large-scale Rossby waves and the associated North Atlantic Oscillation (NAO). This subsequently leads to a certain impact of El Niño events on the European winter climate, although correlations are in general less robust compared with tropical regions. Other tropical SST modes frequently used in seasonal forecasts include the Indian Ocean dipole (IOD), the Atlantic Multidecadal Oscillation (AMO) and the Pacific Decadal Oscillation (PDO), which have a significant predictive skill for their adjacent coastal regions (Eden et al., 2015).
Numerous studies additionally used customized SST indices as predictor variables for seasonal precipitation forecasts. For example, Schepen et al. (2011) give a comprehensive overview of oceanic and atmospheric climate indices with predictive potential for seasonal rainfall amounts in Australia. They illustrate that oceanic indices from the Pacific region exhibit a high forecast skill, particularly for autumn and winter precipitation totals. Hartmann et al. (2016) tested the predictive skill of mean SSTs from various ocean basins surrounding the Asian continent for the precipitation variability in the arid Tarim Basin in northwestern China. Hertig and Jacobeit (2010) investigated the predictive skill of EOF-derived SST patterns of the northern Atlantic in order to forecast winter precipitation amounts in the Mediterranean. Seibert et al. (2016) recently demonstrated that customized SST indices from the Indian and southern Atlantic oceans improve the quality of statistical seasonal forecasts for the Limpopo Basin in southern Africa. Suárez-Moreno and Rodríguez-Fonseca (2015) showed that, particularly for coastal regions, adjacent sea surface temperatures can significantly improve the seasonal forecast of precipitation.
Fewer studies utilized large-scale atmospheric pressure modes for seasonal climate predictions. Wu et al. (2009) reported that the winter state of the NAO (defined as the pressure gradient between the Icelandic low-pressure and the Azores high-pressure cell) influences the SST pattern of the northern Atlantic during the spring season and affects the intensity of the subsequent east Asian summer monsoon via cross-Eurasian teleconnections. Hasson et al. (2014) found a statistically significant influence of the NAO on winter precipitation amounts in the Indus Basin. Likewise, Hartmann et al. (2016) tested the predictive skill of pressure patterns over Europe and Asia (such as the NAO or the Siberian high index) for precipitation anomalies in the Tarim Basin.
Local land cover characteristics are also frequently applied in statistical seasonal forecast models. For example, Cohen and Entekhabi (1999) and Cohen and Barlow (2005) showed that the snow cover over Eurasia during autumn and spring alters the large-scale atmospheric circulation over the Northern Hemisphere, with far-reaching implications for precipitation patterns during subsequent months. Brands et al. (2012) reported a statistically significant relationship between late autumn snow cover over Eurasia and winter precipitation over Europe. Tian and Fan (2015) argued that the state of the NAO and the associated precipitation patterns over Europe are influenced by both Atlantic SSTs and snow cover rates over Eurasia. Some studies indicate a negative response of the south Asian monsoon to higher snow cover rates over Eurasia, most likely due to a delayed surface heating of the Asian continent (Wu and Qian, 2003; Zhang et al., 2004). Recently, some studies also included local soil moisture or previous rainfall in statistical forecasting models in order to capture water recycling due to autochthonous weather conditions and persistent circulation characteristics (Eden et al., 2015; van den Hurk et al., 2010).
As shown, most statistical forecast applications utilize either well-known climate indices or expert-knowledge-based customized indices from SSTs or land cover characteristics. Customized indices are frequently included, since typical climate indices do not cover regional-scale anomalies of SST or pressure patterns which might be important predictor variables for seasonal climate forecasts in certain regions. However, these customized indices are usually calibrated with regard to specific target areas and thus are not transferable to other regions. Hence, state-of-the-art seasonal climate forecast models are either based on a fixed number of climate indices (and thus might not consider important predictor variables) or are highly site specific and barely transferable to other regions. Recently, some advances towards an automatic predictor selection were made by Suárez-Moreno and Rodríguez-Fonseca (2015), who used gridded SST fields as potential predictors in order to automatically identify SST patterns which are relevant for the seasonal precipitation forecast in selected target areas.
With the aim of developing an operational seasonal forecast model which is easily transferable to any region in the world, we present a generic data-mining approach which automatically selects potential predictors from gridded SST observations and large-scale atmospheric circulation patterns derived from reanalysis data. The approach then establishes robust statistical relationships with subsequent precipitation anomalies for user-selected target regions. The statistical package R (R Development Core Team, 2008) as well as the scripting environment of the free and open-source geographic information system SAGA (Conrad et al., 2015) are utilized. The precipitation forecast model is based on a cell-wise correlation analysis of various gridded variables with regional precipitation estimates, which identifies grid cells with potential predictive skill for a specific target area at different time lags. Grid cells which significantly correlate with precipitation anomalies during subsequent months are aggregated to predictor regions by means of an automatic cluster analysis for every variable and time lag. Thus, for every target area, specific predictor variables are automatically derived. The cluster regions are afterwards utilized as potential predictors in a nonparametric and nonlinear random-forest-based modeling approach. Based on a 4-fold split-sample test, the model performance for the selected target area is evaluated before an operational forecast is generated based on real-time predictor fields.
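As a rough illustration of the 4-fold split-sample test mentioned above, the following Python sketch partitions the years 1948 to 2014 into four blocks and uses each block once as the independent test sample. The use of contiguous blocks of years and the function name `four_fold_years` are assumptions made for illustration only; the text does not specify how the folds are drawn, and the model itself is implemented in R and SAGA rather than Python.

```python
import numpy as np

def four_fold_years(first_year=1948, last_year=2014, n_folds=4):
    """Yield (calibration_years, test_years) pairs for a split-sample test.

    Assumption: the folds are contiguous blocks of years.
    """
    years = np.arange(first_year, last_year + 1)
    for test_years in np.array_split(years, n_folds):
        # All years outside the test block are available for calibration.
        calibration_years = np.setdiff1d(years, test_years)
        yield calibration_years, test_years

folds = list(four_fold_years())
for calibration_years, test_years in folds:
    print(f"test {test_years[0]}-{test_years[-1]}: "
          f"{len(calibration_years)} calibration years")
```

Each year serves exactly once as part of an independent test sample, so the skill estimate is never based on data that were also used for predictor selection or calibration.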
In the following section, we provide a detailed overview of the utilized data sets and the main model components, including predictor selection, model calibration and evaluation. Subsequently, we provide some applications of the model for selected target areas in central and south Asia. In order to make the individual modeling steps more comprehensible, we present some major interim results for one target area in northern central Asia (Fig. 5) when explaining the methods in the next section.
2 Methods and data

2.1 Modeling structure
The major objective of the presented model is to derive suitable predictor variables from global oceanic and atmospheric fields and to develop robust statistical relationships which enable a seasonal precipitation forecast for user-selectable target regions. The underlying data sets as well as the major model components are summarized in Fig. 1. In order to analyze the precipitation variability in selected target areas, the model is based on the CRU TS 2.0 precipitation data set, which provides monthly precipitation estimates for the 20th century on a global grid with a resolution of 0.5° (lat × long) (Harris et al., 2014; New et al., 1999). The data set is based on a dense network of observations for the period from 1961 to 1990, which were used for the regionalization of monthly mean precipitation amounts, and a compilation of station records with longer available time series, which were used for the calculation of anomalies and were subsequently spatially interpolated based on inverse distances. New et al. (1999) showed that this approach is suitable for the resolution of 0.5° since it combines a climatic baseline, which is highly influenced by the underlying topography, with simply interpolated anomalies, which are mainly driven by large-scale weather conditions. Areal mean monthly precipitation sums for the selected target region are extracted from the CRU data set. Due to a temporally varying number of stations used for the interpolation of gridded precipitation estimates, the data may incorporate inhomogeneities in some regions. Thus, a standard normal homogeneity test (SNHT) for absolute annual values (Wijngaard et al., 2003) is conducted, which identifies abrupt changes of the annual precipitation sums. The results serve as background information for the interpretation of the model results.
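The SNHT statistic for a single shift in the mean can be sketched as follows. This is a minimal Python illustration of the commonly used single-break formulation, not the code used in the study; the function name `snht` and the synthetic series are hypothetical, and the critical values needed to judge significance (which depend on the series length) are not included.

```python
import numpy as np

def snht(series):
    """Return (T0, k): the maximum SNHT statistic and the number of
    values before the most likely break point."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    z = (x - x.mean()) / x.std(ddof=1)        # standardized series
    best_t, best_k = -np.inf, None
    for k in range(1, n):
        z1 = z[:k].mean()                      # mean before the candidate break
        z2 = z[k:].mean()                      # mean after the candidate break
        t_k = k * z1**2 + (n - k) * z2**2
        if t_k > best_t:
            best_t, best_k = t_k, k
    return best_t, best_k

# Synthetic annual totals with an artificial jump after 30 years.
annual = ([100 + (-1) ** i for i in range(30)]
          + [120 + (-1) ** i for i in range(30, 60)])
t0, break_index = snht(annual)
print(t0, break_index)
```

A large value of the statistic at position `k` indicates that the means before and after that year differ strongly, which is the kind of abrupt change the test is meant to flag.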
Since monthly time series of precipitation are usually positively skewed, which might compromise the assumptions of the subsequent correlation analysis, the actual values are converted into the standardized precipitation index (SPI) (Guttman, 1998; McKee et al., 1993) for every single month of the year. To this end, the precipitation distribution for each month is fitted to a gamma distribution with suitable shape and scale parameters. The cumulative probability of the observed precipitation amounts is then converted into z values of the standard normal distribution. The SPI values, which are normally distributed by definition, are subsequently correlated cell-wise with gridded global SST and climate data with lead times ranging from 1 to 6 months. For every variable and lead time, grid cells which significantly correlate with the mean monthly SPI time series are identified. These grid cells are subsequently aggregated to predictor regions with similar variability by means of a hill-climbing-based k-means cluster analysis. For every large-scale variable and time lag, the areal mean anomalies of those cluster regions are considered as potential predictor variables for a random-forest-based precipitation forecast model. All data sets (predictands and predictor variables) are automatically processed for the period from 1948 to 2014. In order to find robust predictor variables for monthly precipitation amounts and to exclude incidental correlations, the data set is randomly partitioned into two subsets. The first is utilized for the cell-wise correlation analysis; the second is employed for the subsequent calibration of a random-forest-based forecast model. Since precipitation usually shows a rather random temporal variability at a monthly timescale, results of the monthly precipitation forecast are, in general, unreliable. Thus, modeling results are aggregated to running 3-month precipitation totals.
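The SPI transformation described above (gamma fit, cumulative probability, conversion to a standard normal z value) can be sketched in Python using only the standard library. Several details are assumptions made for illustration: the gamma parameters are estimated by the method of moments (the text does not state the fitting method), all precipitation totals are assumed to be strictly positive, and the names `gamma_cdf`, `spi` and `march_totals` are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pvariance

def gamma_cdf(x, shape, scale):
    """Gamma cumulative probability, computed from the power series of the
    regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    t = x / scale
    term = 1.0 / shape
    total = term
    n = 1
    while abs(term) > 1e-12 * abs(total):
        term *= t / (shape + n)
        total += term
        n += 1
    return total * math.exp(-t + shape * math.log(t) - math.lgamma(shape))

def spi(values):
    """Convert the precipitation totals of one calendar month to SPI values."""
    mean, var = fmean(values), pvariance(values)
    # Assumption: method-of-moments estimates of shape and scale.
    shape, scale = mean**2 / var, var / mean
    normal = NormalDist()
    result = []
    for v in values:
        p = gamma_cdf(v, shape, scale)
        p = min(max(p, 1e-6), 1 - 1e-6)   # keep the inverse normal finite
        result.append(normal.inv_cdf(p))
    return result

# Hypothetical March totals (mm) of ten years for one target area.
march_totals = [12.0, 30.5, 22.1, 45.0, 8.3, 27.4, 33.9, 19.6, 51.2, 24.8]
print([round(s, 2) for s in spi(march_totals)])
```

Because the transformation is monotonic, wetter months always receive higher SPI values, while the skewness of the raw totals is removed.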

2.2 Predictor selection
As briefly reviewed in the introductory section, seasonal precipitation anomalies in many regions of the world can be statistically forecasted by means of large-scale atmospheric and oceanic indices or under consideration of customized parameters. With the aim of automatically deriving adequate predictor variables for monthly precipitation anomalies from large-scale atmospheric and oceanic conditions, an extensive correlation and data-mining procedure is conducted by the presented seasonal forecast model. A brief summary of global gridded variables which are used for the identification of potential predictor variables is given in Table 1.
In order to reveal the influence of nearby or remote SST anomalies on precipitation characteristics, we make use of the NOAA Extended Reconstructed Sea Surface Temperature data set ERSST V3b (Smith et al., 2008; Smith and Reynolds, 2003), which is available at a resolution of 2° × 2° for the period from 1854 onwards. The data set is based on in situ sea surface temperature observations only, which are regionalized by means of statistical methods, considering both low- and high-frequency oceanic modes. With the aim of avoiding statistical artefacts resulting from the variability of the sea ice extent in polar oceans, we restricted the analysis of SST patterns to the geographical region between 65° N and 65° S.
Further, we utilize variables representing the state of the large-scale atmospheric circulation from the NCAR-NCEP reanalysis (Kalnay et al., 1996). The reanalysis, which is published by the National Centers for Environmental Prediction (NCEP) and the National Center for Atmospheric Research (NCAR), is a near-real-time gridded data set which combines atmospheric observations with climate modeling results by means of an assimilation system for the period from 1948 onwards. As potential atmospheric predictors, we used monthly aggregated values of sea level pressure (SLP), the 500 hPa geopotential height (GPH500) as well as the geopotential thickness between the 500 and 200 hPa pressure levels (GPH500-200). In order to investigate the land surface characteristics and their subsequent effects, we additionally utilize monthly aggregated global grids of near-surface temperature (TEMP), antecedent precipitation amounts (PREC) and snow water equivalent (SWE) from the NCAR-NCEP reanalysis as potential predictor variables. While the intrinsic pressure-related variables are provided at a spatial resolution of 2.5° × 2.5° (lat × long), the diagnostic land surface variables are available at a resolution of approximately 1.875° × 1.904°. All predictor fields are normalized cell-wise, separately for every calendar month. Since the utilized large-scale predictor variables are updated regularly and freely downloadable, they are suitable for the development of an operational seasonal precipitation forecast system.
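The cell-wise normalization for every calendar month can be illustrated with a short Python sketch. It assumes that normalization means conversion to z-scores (subtracting the long-term monthly mean and dividing by the monthly standard deviation), which the text does not state explicitly; the array layout, the function name `normalize_by_month` and the synthetic field are hypothetical.

```python
import numpy as np

def normalize_by_month(field):
    """field: array of shape (n_years, 12, n_cells).

    Returns z-scores computed separately for each calendar month and
    each grid cell (assumed meaning of "normalized").
    """
    mean = field.mean(axis=0, keepdims=True)
    std = field.std(axis=0, ddof=1, keepdims=True)
    return (field - mean) / std

rng = np.random.default_rng(0)
# Hypothetical field: 67 years x 12 months x 5 grid cells with a seasonal cycle.
seasonal_cycle = 10 * np.sin(2 * np.pi * np.arange(12) / 12)[None, :, None]
field = 15 + seasonal_cycle + rng.normal(size=(67, 12, 5))
anomalies = normalize_by_month(field)
print(anomalies.shape)
```

Normalizing per calendar month removes the seasonal cycle, so that the anomalies of different months and variables become comparable.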
We assume that typical atmospheric and oceanic indices are determined by large-scale pressure patterns or SST modes and thus are inherently included in those global gridded data sets. Likewise, additional predictor variables which might be specific for a particular target region (e.g., SSTs at adjacent coasts, regional snow cover rates or enhanced water availability due to high precipitation amounts during previous months) are expected to be covered by the predictor fields and will be identified as relevant predictors by means of the following correlation and data-mining procedure.
First, based on the first random sample, a Pearson correlation analysis of the monthly SPI values is conducted for each gridded large-scale variable and each grid cell. The correlation analysis is executed separately for every month of the year and for lead times of 1 to 6 months. Thus, the identification of relevant predictor variables and regions is specific for every month and lead time. Particularly for temperature-related predictor variables, the time series might include statistically significant trends due to anthropogenic greenhouse gas emissions, which frequently exceed the magnitude of natural variability. However, there is evidence that seasonal precipitation anomalies in specific target regions are in fact highly influenced by SST anomalies of nearby or remote oceans, but do not show a distinct response to global warming during recent decades (Hoerling et al., 2010). Thus, the time series of potential predictor grid cells are detrended prior to the correlation analysis. For every variable, each grid cell which correlates significantly (α = 0.1) with the SPI time series is subsequently labeled as potentially predictive for the monthly precipitation forecast. This comparatively lenient level of statistical significance is deliberately chosen in order to detect second-order correlations and conditional statistical relationships. Overall, the correlation analysis generates a data set of 504 correlation grids (7 variables × 12 months × 6 lead times), each of them for a specific predictor variable, month of the year and time lag. As an example, Fig. 2 (Figs. S1 and S2 in the Supplement) shows the results of the correlation analysis for the standardized precipitation values of northern central Asia with global gridded SSTs for March and September with a lag time of 2 months.
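The cell-wise screening described above (lagged predictor month, linear detrending, Pearson correlation, significance at α = 0.1) can be sketched in Python as follows. The sketch rests on assumptions that are not taken from the text: a linear least-squares detrending, a two-sided significance test based on the Fisher z approximation, and the hypothetical helper names `predictor_month`, `detrend` and `significant_cells`. Grid cells with constant values are not handled.

```python
import numpy as np
from statistics import NormalDist

def predictor_month(target_month, lead):
    """Calendar month (1-12) of the predictor and its year offset (0 or -1)."""
    m = target_month - lead
    return ((m - 1) % 12 + 1, 0 if m >= 1 else -1)

def detrend(series):
    """Remove the least-squares linear trend from a 1-D series."""
    t = np.arange(len(series))
    slope, intercept = np.polyfit(t, series, 1)
    return series - (slope * t + intercept)

def significant_cells(spi, field, alpha=0.1):
    """spi: (n_years,) SPI values of the target month.
    field: (n_years, n_cells) predictor values of the lagged month.
    Returns the correlation per cell and a mask of significant cells."""
    n = len(spi)
    f = np.apply_along_axis(detrend, 0, field)   # detrend every grid cell
    f = f - f.mean(axis=0)
    s = spi - spi.mean()
    r = (f * s[:, None]).sum(axis=0) / np.sqrt((f**2).sum(axis=0) * (s**2).sum())
    # Assumption: two-sided test via the Fisher z approximation.
    z = np.arctanh(r) * np.sqrt(n - 3)
    return r, np.abs(z) > NormalDist().inv_cdf(1 - alpha / 2)

rng = np.random.default_rng(1)
spi = rng.standard_normal(33)
field = np.column_stack([spi + 0.3 * rng.standard_normal(33),   # related cell
                         rng.standard_normal(33)])              # unrelated cell
r, mask = significant_cells(spi, field)
print(predictor_month(3, 2), r.round(2), mask)
```

For a March forecast with a lead time of 2 months, `predictor_month(3, 2)` points to January of the same year, which matches the example of March precipitation and January SSTs discussed in the text.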
During March (representing the wet season in northern central Asia) monthly precipitation shows a clear positive response to January SST variations in the El Niño core region (a result which has been frequently reported for central Asia) and to SST anomalies in the Arabian Sea and the Bay of Bengal. Further, a positive correlation with SST anomalies in the North Atlantic has been detected. During September (representing rather dry climate conditions) a positive response to SST variations in the Indian Ocean is evident; however, the spatial distribution of potential predictive SST regions is rather scattered, indicating less robust statistical relationships.
In a subsequent step, each of the correlation grids (which usually contain a large number of potentially predictive grid cells) is aggregated to a distinct number of correlation regions by means of a SAGA-GIS-based hill-climbing k-means cluster analysis (Hartigan and Wong, 1979). For every specific month, time lag and predictor variable, the complete normalized time series of all potentially predictive grid cells is considered. The iterative and unsupervised classification technique first randomly allocates every grid cell to one of k clusters. The error sum of squares is calculated as the sum of squared Euclidean distances of all associated grid cells from the cluster centroid and indicates the quality of the cluster estimation. Every grid cell is subsequently reallocated to the nearest cluster, and cluster centroids and error terms are recalculated. This procedure is repeated until the error sum of squares converges to its minimum value. In essence, the clustering algorithm minimizes the error sum of squares within the cluster groups and maximizes the error sum of squares between them. This leads to the definition of regions with similar temporal variability during the calibration period and thus identifies important large-scale patterns of the considered predictor variable with high predictive potential for the seasonal precipitation forecast.
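The clustering step can be illustrated with a simplified Python sketch of k-means applied to the time series of the significant grid cells. It is not the SAGA implementation: the sketch reassigns all cells at once in every iteration (a batch variant rather than the hill-climbing variant named in the text), it reseeds empty clusters with a randomly chosen cell (an assumption), and the function name `kmeans_series` and the synthetic data are hypothetical. The example uses k = 2 for a tiny data set, whereas the model uses 12 clusters by default.

```python
import numpy as np

def kmeans_series(series, k, seed=0, max_iter=100):
    """Cluster grid cells by the similarity of their time series.

    series: array of shape (n_cells, n_times). Returns labels (n_cells,).
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(series))    # random initial allocation
    for _ in range(max_iter):
        centroids = np.stack([
            series[labels == c].mean(axis=0) if np.any(labels == c)
            else series[rng.integers(len(series))]   # reseed an empty cluster
            for c in range(k)
        ])
        # Squared Euclidean distance of every cell to every centroid.
        dist = ((series[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):       # converged
            break
        labels = new_labels
    return labels

# Two groups of hypothetical grid cells with opposite temporal behavior.
t = np.arange(40)
wave = np.sin(2 * np.pi * t / 10)
series = np.vstack([np.tile(wave, (5, 1)), np.tile(-wave, (5, 1))])
labels = kmeans_series(series, k=2)
print(labels)
```

Cells that vary in phase end up in the same cluster, which is what turns a scattered set of significant grid cells into a small number of coherent predictor regions.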
By default, the number of clusters for every correlation grid is set to 12, which has been found to adequately identify typical large-scale oceanic and atmospheric features (see, for example, Fig. 2c and d). An excessive number of clusters might result in a disjunction of predictor regions, which reduces the predictive skill. Conversely, an insufficient number of clusters will lead to an aggregation of large regions which might still be characterized by a large inhomogeneity and thus are not suitable for the derivation of potential predictor variables. As shown in Fig. 2, the El Niño core regions in January (orange, blue, yellow and black clusters in Fig. 2c) are identified as important regions for the forecast of monthly precipitation amounts in March for northern central Asia; for instance, the prolonged 1999-2001 winter and spring drought in central Asia is associated with negative anomalies of the ENSO-related predictor variables. In general, dry periods usually coincide with La Niña events, characterized by negative SST anomalies. The January SST of the Arabian Sea and the Bay of Bengal is identified as an independent predictor variable for the precipitation amounts in March. For the precipitation variability in September, the majority of predictive SST clusters is located in the Indian Ocean.
The areal mean time series of every cluster are eventually used as potential predictors in the seasonal forecast model. For all seven gridded variables, the cluster analysis is conducted with k = 12 clusters, resulting in an overall number of 7 × 12 = 84 potential predictors for every month and lead time.
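The derivation of predictor time series from the cluster regions can be sketched in Python as below. The optional area weights (for example the cosine of the latitude) are an assumption, since the text only speaks of areal means; the function name `cluster_predictors` and the small example arrays are hypothetical.

```python
import numpy as np

def cluster_predictors(field, labels, weights=None):
    """Average the grid cells of every cluster into one predictor series.

    field: (n_times, n_cells) normalized anomalies of the significant cells.
    labels: (n_cells,) cluster index of each cell.
    weights: optional (n_cells,) area weights, e.g. cos(latitude) (assumption).
    Returns an array of shape (n_times, n_clusters).
    """
    if weights is None:
        weights = np.ones(field.shape[1])
    columns = []
    for c in np.unique(labels):
        sel = labels == c
        columns.append(np.average(field[:, sel], axis=1, weights=weights[sel]))
    return np.column_stack(columns)

field = np.array([[1.0, 3.0, 10.0],
                  [2.0, 4.0, 20.0]])
labels = np.array([0, 0, 1])
predictors = cluster_predictors(field, labels)
print(predictors)

n_variables, n_clusters = 7, 12
n_predictors = n_variables * n_clusters   # per month and lead time
print(n_predictors)
```

Stacking the resulting columns of all seven variables yields the predictor matrix for one target month and one lead time.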

2.3 Forecast model calibration
For every month of the year and every lead time, one separate statistical forecast model is established based on the potential predictor variables derived from the correlation and cluster analysis. In order to avoid overfitting and to develop a robust regression relationship, the model calibration is based on the second random sample and thus is independent from the predictor selection procedure. Some of the potential predictor variables are highly correlated due to their association with the same phenomenon; e.g., ENSO is manifested in various SST regions and significantly influences the large-scale pressure and precipitation patterns in many regions of the world. Additionally, the distribution of potential predictor variables is unknown; e.g., precipitation or snow water equivalent are most likely extremely skewed and not normally distributed. Thus, a reliable forecasting approach requires a nonparametric statistical technique without any assumption concerning the distribution and statistical independence of predictor variables. We make use of a random-forest-based approach (Breiman, 2001), a widely utilized data-mining technique, which stands out due to its flexibility concerning the characteristics of predictand and predictor variables and due to its ability to detect nonlinear and conditional statistical relationships. Basically, random forest models represent an advancement of regression tree algorithms (Breiman et al., 1984), which automatically classify large data sets by means of adequate predictor variables in order to identify statistical structures in the predictor space which are highly associated with a response variable (Gerlitz, 2014; Zorita et al., 1995).
Classification is conducted by means of an iterative procedure.In every processing step, one predictor variable and one split value are identified, which classify the learning sample into two subgroups, characterized by a maximal homogeneity (i.e., a minimum variance) of the predictant variable.However, since the recursive regression tree approach tends to considerably overfit the predictor-predictant relationships and does not only classify important structures within the feature space but also the inherent noise of the predictant variable, the predictive skill of single-regression trees is frequently insufficient.Therefore, random forest applications consider an ensemble of various trees, which are based on a subset of the complete data set, respectively.By means of this bagging approach, a large number of trees is constructed.Prediction values are eventually calculated as the mean of predictions from all single trees.The bagging approach and the ensemble composition of the final random forest model avoid overfitting and additionally provide an internal error and confidence estimation (Chen et al., 2012).
The specific forecast models for every month and lead time are constructed based on random forests with 500 realizations. Regression trees are recursively constructed until the final leaves include three observations or fewer. For the determination of each splitting criterion, a randomly selected bagging sample with two-thirds of the entire learning sample is utilized.
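The tree construction and bagging described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names (grow_tree, fit_forest, predict_forest) are hypothetical, one two-thirds subsample is drawn per tree without replacement (an assumption about how the bagging sample is drawn), and the random selection of candidate predictors at each split, which distinguishes a full random forest from plain bagging, is omitted for brevity.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (n times the variance)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def grow_tree(rows, targets, min_leaf=3):
    """Split recursively until a node holds min_leaf observations or fewer."""
    if len(targets) <= min_leaf or len(set(targets)) == 1:
        return mean(targets)  # leaf: predict the node mean
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            split = (lo + hi) / 2  # candidate split between two observed values
            left = [t for r, t in zip(rows, targets) if r[j] <= split]
            right = [t for r, t in zip(rows, targets) if r[j] > split]
            cost = sse(left) + sse(right)  # minimum variance within subgroups
            if best is None or cost < best[0]:
                best = (cost, j, split)
    if best is None:  # all predictors constant: no split possible
        return mean(targets)
    _, j, split = best
    left_idx = [i for i, r in enumerate(rows) if r[j] <= split]
    right_idx = [i for i, r in enumerate(rows) if r[j] > split]
    return (
        j,
        split,
        grow_tree([rows[i] for i in left_idx], [targets[i] for i in left_idx], min_leaf),
        grow_tree([rows[i] for i in right_idx], [targets[i] for i in right_idx], min_leaf),
    )


def predict_tree(node, row):
    """Follow the splits until a leaf value is reached."""
    while isinstance(node, tuple):
        j, split, left, right = node
        node = left if row[j] <= split else right
    return node


def fit_forest(rows, targets, n_trees=500, sample_frac=2 / 3, min_leaf=3, seed=0):
    """Grow n_trees trees, each on a random subsample of the learning sample."""
    rng = random.Random(seed)
    n = len(rows)
    k = max(1, round(sample_frac * n))
    forest = []
    for _ in range(n_trees):
        idx = rng.sample(range(n), k)  # assumption: subsample without replacement
        forest.append(
            grow_tree([rows[i] for i in idx], [targets[i] for i in idx], min_leaf)
        )
    return forest


def predict_forest(forest, row):
    """Ensemble prediction: mean of the predictions of all single trees."""
    return mean(predict_tree(tree, row) for tree in forest)
```

Here rows would hold one list of predictor values per year and targets the corresponding monthly precipitation amounts. The spread of the single-tree predictions for a given row gives a rough impression of the ensemble uncertainty.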
For the predictand variable, the absolute amount of monthly precipitation is used. This allows the subsequent additive aggregation of the monthly forecast values to seasonal precipitation amounts and the evaluation of the model at different temporal scales. Figure 3 shows an example of the results of the monthly precipitation forecast with varying lead times for the northern central Asian target area for an independent period from 1996 to 2010. The remaining time series has been utilized for the predictor selection and the model calibration.
Values are converted to the monthly standardized precipitation index (SPI) based on observations from the entire model calibration period. Obviously, the variability of precipitation amounts is highly underestimated by the random-forest-based precipitation forecast models, which is a typical feature of regression-based statistical models, particularly if the predictand variable is characterized by a large, nonpredictable noise. Furthermore, the correlation of forecasted and observed precipitation is low, with values distinctly below 0.2 for most months and lead times. The rather poor results at the monthly scale certainly reflect the nonpredictable noise of monthly precipitation amounts and thus suggest that modeling results should not be evaluated based on discrete monthly values due to the high-frequency variability of precipitation events. This is confirmed by the aggregation of observations and modeling results to 3-month running totals, which leads to a significant increase of correlation and variance. Figure 4 shows the entire SPI time series for running 3-month total precipitation amounts and the corresponding model results. (In order to generate a statistically independent forecast for the entire period, a 4-fold split sample test has been conducted; see Sect. 2.4 for details.) Although the variability of precipitation amounts remains underestimated, the smoothed model results better capture the explicit features in terms of dry and moist periods of the observations. Taking into consideration the entire time series of 3-monthly precipitation amounts, the correlation between observed and forecasted values increases to r > 0.5 for a lead time of 1 month. Correlations rapidly decrease with higher lead times; however, even for a lead time of 6 months a certain skill is detected (r = 0.13).
With this in mind, we define two composite forecast periods, each with a length of 3 months. The F [1 : 3] m forecast model is defined as the sum of random forest model results based on predictor variables from the month m with lead times of 1, 2 and 3 months. The F [4 : 6] m forecast is equally based on predictor variables from month m, but involves the random forest models with lead times of 4, 5 and 6 months.
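The composite definition amounts to a simple sum over lead times, which the following sketch illustrates. The function name composite_forecast and the dictionary layout (keys of target month and lead time) are hypothetical and chosen for illustration only; months are numbered 1 to 12 and the target month is taken as the predictor month plus the lead time.

```python
def composite_forecast(monthly_forecasts, predictor_month, leads):
    """Sum the monthly forecasts issued from predictor_month for the given leads.

    monthly_forecasts maps (target_month, lead) to a forecasted monthly
    precipitation amount, where target_month = predictor_month + lead
    (wrapping around the end of the year).
    """
    total = 0.0
    for lead in leads:
        target_month = (predictor_month - 1 + lead) % 12 + 1
        total += monthly_forecasts[(target_month, lead)]
    return total


# Illustrative values only (mm per month), issued with November predictors.
example = {
    (12, 1): 30.0, (1, 2): 25.0, (2, 3): 20.0,  # used by F[1:3]
    (3, 4): 35.0, (4, 5): 40.0, (5, 6): 15.0,   # used by F[4:6]
}
f_1_3 = composite_forecast(example, predictor_month=11, leads=(1, 2, 3))
f_4_6 = composite_forecast(example, predictor_month=11, leads=(4, 5, 6))
```

With November predictors, the F [1 : 3] composite thus covers December to February and the F [4 : 6] composite covers March to May.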

Model evaluation
Since the skill of the automatic forecast model is likely to vary depending on the target area and the associated precipitation regimes during different seasons, an evaluation of the automatic seasonal forecast model performance is necessary in order to assess the reliability of the forecast and to interpret the results. Based on a 4-fold split sample test, the deterministic forecasts of 3-month running totals are automatically evaluated. To this end, the entire time series from 1948 to 2014 is split into four subperiods of equal length. The statistical forecast model is then applied four times, always taking one subperiod as an independent sample for the evaluation.
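A minimal sketch of such a split sample scheme is given here, including the halving of the remaining years into a predictor selection sample and a calibration sample. The function name split_sample_folds is hypothetical. Two assumptions are made: since the 67 years from 1948 to 2014 cannot be divided into four exactly equal parts, the first three subperiods are one year longer than the last, and the remaining years are halved at random with a fixed seed.

```python
import random


def split_sample_folds(first_year=1948, last_year=2014, n_folds=4, seed=0):
    """Return one dict per fold with evaluation, selection and calibration years."""
    rng = random.Random(seed)
    years = list(range(first_year, last_year + 1))
    base, extra = divmod(len(years), n_folds)
    folds, start = [], 0
    for k in range(n_folds):
        size = base + (1 if k < extra else 0)  # earlier folds absorb the remainder
        evaluation = years[start:start + size]  # contiguous independent subperiod
        rest = years[:start] + years[start + size:]
        rng.shuffle(rest)  # assumption: random halving of the remaining years
        half = len(rest) // 2
        folds.append({
            "evaluation": evaluation,
            "selection": sorted(rest[:half]),    # used for predictor selection
            "calibration": sorted(rest[half:]),  # used for model calibration
        })
        start += size
    return folds


folds = split_sample_folds()
```

Concatenating the forecasts for the four evaluation subperiods then yields one independent forecast series for the entire period.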
The remaining three subperiods are combined and split into two parts of equal length, which are utilized for the predictor selection and the model calibration, respectively. Eventually the independent predictions are compounded to one time series, comprising forecast values for the entire period. We abstained from the implementation of a full cross-validation procedure due to the high computational demands of the predictor selection routine. For each of the running 3-month periods, traditional performance indicators such as correlation, bias and root mean square error (RMSE) are computed, which enables the assessment of the model performance for various seasons. In order to achieve a maximal comparability of different target areas, bias and RMSE are specified as the percentage of the long-term precipitation totals for each 3-month period. Moreover, since stakeholders often require robust predictions of anomalous periods, the ability of the forecast model to forecast drought and moisture conditions is evaluated by means of receiver operating characteristics (ROCs) for each 3-month period, and areas under the curve (AUC) are provided. To this end, the running 3-month precipitation totals are converted to the associated standardized precipitation indices, based on observations of the entire model calibration period. The deterministic SPI forecast is then converted into a probabilistic prediction by means of a simple residual-based approach. Assuming that SPI residuals are normally distributed for each 3-month period, we estimate the standard deviation of residuals for each of the 3-month periods, which is subsequently utilized to transform the deterministic forecast into a normal probability distribution. ROC curves are then constructed for SPI threshold values of −0.5, representing moderate drought, and +0.5, indicating wet conditions. For various probability thresholds, hit rates (defined as the number of correctly identified droughts divided by the overall number of drought events) are plotted against the false alarm rate (defined as the quotient of the number of false alarms and the number of non-drought conditions). ROC curves for moisture conditions are equivalently constructed. Eventually, the area under the curve is interpreted as a performance measure of the seasonal forecast model. AUC values near 1 indicate a perfect predictive skill considering the forecast of droughts or moist periods; values of 0.5 or less indicate no predictive skill at all.
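The residual-based conversion and the ROC construction can be sketched as follows. This is an illustrative sketch under the stated normality assumption, not the evaluation code of this study; the function names (residual_sigma, prob_below, roc_auc) are hypothetical, and the series passed in are assumed to contain at least one event and one non-event.

```python
import math


def residual_sigma(observed, forecast):
    """Sample standard deviation of the SPI residuals (observed minus forecast)."""
    res = [o - f for o, f in zip(observed, forecast)]
    m = sum(res) / len(res)
    return math.sqrt(sum((r - m) ** 2 for r in res) / (len(res) - 1))


def prob_below(forecast_spi, threshold, sigma):
    """P(SPI < threshold) for a normal distribution centred on the forecast."""
    z = (threshold - forecast_spi) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def roc_auc(observed, forecast, threshold=-0.5):
    """Area under the ROC curve for events defined as observed SPI < threshold."""
    sigma = residual_sigma(observed, forecast)
    probs = [prob_below(f, threshold, sigma) for f in forecast]
    events = [o < threshold for o in observed]
    n_event = sum(events)
    n_non = len(events) - n_event
    points = {(0.0, 0.0), (1.0, 1.0)}
    for p_crit in set(probs):  # one ROC point per probability threshold
        warn = [p >= p_crit for p in probs]
        hits = sum(w and e for w, e in zip(warn, events))
        false_alarms = sum(w and not e for w, e in zip(warn, events))
        points.add((false_alarms / n_non, hits / n_event))
    curve = sorted(points)  # false alarm rate on x, hit rate on y
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0  # trapezoidal rule
        for (x0, y0), (x1, y1) in zip(curve, curve[1:])
    )
```

Wet conditions (SPI > +0.5) can be evaluated with the same function by passing the negated observed and forecasted series together with a threshold of −0.5.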

Model application to central and south Asia
With regard to an increasing demand for climatological and hydrological forecasts in this vulnerable region, we applied the presented model to three target regions covering different climatic settings in central and south Asia (see Fig. 4). The northern central Asian target area covers Uzbekistan, Kyrgyzstan and parts of Kazakhstan and comprises the majority of the Syr Darya catchment. The southern central Asian target region covers wide parts of Iran, Afghanistan, Turkmenistan, Tajikistan and Pakistan and encompasses the Amu Darya river system. As presented by the mean 850 hPa wind field of the NCAR reanalysis (Fig. 5), both regions are mainly controlled by extratropical westerly circulation patterns (with contributions from the south during winter and from high latitudes during summer) and receive a precipitation maximum during the winter and spring seasons. Due to their location in continental central Asia, both regions are characterized by a high precipitation variability with monthly coefficients of variation up to 0.5. In the high elevations of the central Asian mountain ranges, precipitation during the moist season mainly falls as snow and is released during the warm and dry summer season (Barlow and Tippett, 2008; Dixon and Wilby, 2015; Schär et al., 2004). Thus, winter and spring precipitation amounts in the mountainous areas provide a vast share of the central Asian river flow during the vegetation period and form the basis for the irrigation-dependent agriculture of the riparian countries, which are characterized by semiarid to arid climate conditions throughout the year.
The northern Indian domain covers the entire Himalayan range and the catchment of the Ganges River. During winter, the region is under the influence of westerly winds and receives a certain amount of precipitation due to the passage of westerly disturbances; however, the precipitation maximum is associated with the Indian summer monsoon, which transports moist air masses from the Arabian Sea and the Bay of Bengal into the target area. Although it is well documented that, particularly for central Asia, the number of stations utilized for the generation of gridded precipitation data is highly variable in time (Unger-Shayesteh et al., 2013), the standard normal homogeneity test does not detect any statistically significant shifts (α = 0.05; see red line in Fig. 5) of the areal mean annual precipitation sums during the considered period in any of the target areas.
The model application to the selected target regions with different climatic characteristics enables the identification of important predictor variables and the analysis of the model performance for the varying pluviometric regimes of the central and south Asian domain. In the following section, we briefly introduce the large-scale atmospheric processes which can lead to a spatial and seasonal differentiation of precipitation amounts in this vast target domain and present some influencing factors which have been frequently linked to the interannual precipitation variability. Subsequently, we discuss the modeling results with regard to major large-scale atmospheric forcing mechanisms and provide a sensitivity analysis which uncovers important influencing factors on the precipitation variability.

Pluviometric regimes and precipitation variability over central and south Asia
In general, the climate of central and south Asia is influenced by two major pluviometric regimes which are related to westerly and monsoonal circulation systems. During the boreal cold season, the entire region is influenced by westerly circulation patterns and precipitation is mainly associated with midlatitude disturbances originating over the Atlantic Ocean and the Mediterranean (Bohner, 2006; Bothe et al., 2011; Gerlitz et al., 2015; Maussion et al., 2014). Since the track of westerly disturbances is mainly determined by the position of the 200 hPa westerly jet stream at the polar frontal zone, a seasonal cycle of precipitation is distinctly defined. Particularly, the western parts of the Himalayas receive a considerable amount of winter precipitation associated with the uplift of westerly air masses, which reaches up to 60 % of the annual precipitation total (Bohner, 2006; Gerlitz et al., 2015; Wulf et al., 2010). During spring, the zone of westerly precipitation migrates northward, reaches the Hindu Kush region and the Pamir in March and continues to the Tien Shan region in May. Mariotti (2007) showed that during the winter season, a northward current over the Arabian countries transports tropical air masses into central Asia, which represents an important moisture source for the westerly air masses. While the continental central Asian countries remain under the influence of extratropical westerly air masses throughout the year, the tropical monsoon circulation is established over south Asia during the summer season (Bohner, 2006; Bookhagen and Burbank, 2006; Gerlitz et al., 2015). Due to a declining strength of the monsoonal moisture fluxes towards the west, a clear gradient of precipitation totals from east to west has been detected (Bohner, 2006; Wulf et al., 2010). Investigations of the interannual variability of precipitation rates over central and south Asia have frequently been conducted. Most studies (Li and Yanai, 1996; Peings and Douville, 2009; Prodhomme et al., 2014) showed evidence that the intensity of the Indian summer monsoon is associated with the magnitude of pressure gradients between the Indian Ocean and the Asian continent, which has been linked to the extent of the snow cover over the Asian mainland and the SST of the Indian Ocean (Wu and Qian, 2003). Moreover, many studies highlight the importance of the Southern Oscillation for the intensity of monsoonal precipitation. Studies by Pokhrel et al. (2012) and Sigdel and Ikeda (2013) indicated that El Niño events can lead to reduced moisture fluxes into south Asia. Ashok et al. (2001) further identified the Indian Ocean dipole as an important predictor for the Indian summer monsoon. Some studies illustrated that the correlation of the Southern Oscillation index (SOI) and the Indian summer monsoon precipitation is nonstationary and weakened during recent decades (Kumar et al., 1999; Wang and He, 2012). However, Yim et al. (2013) detected a recovery of the negative ENSO-monsoon relationship during the 1990s. Chang et al. (2001) suggested that the breakdown of robust relationships is due to changes in the North Atlantic climate. Rajeevan et al. (2006) detected a statistically significant correlation of western Europe winter temperatures and subsequent monsoonal precipitation amounts.
In contrast, for the variability of winter and spring precipitation (associated with westerly weather patterns over central and south Asia) a positive relationship with the ENSO has been observed. Severe droughts have been linked to the ENSO cold phase (La Niña) (Barlow et al., 2002, 2015; Hoell et al., 2013). Roghani et al. (2015) and Shirvani and Landman (2015) found statistically significant correlations of the SOI during summer and autumn with precipitation amounts over Iran in the subsequent winter. Likewise, a significant positive correlation of the ENSO state with winter precipitation amounts over the southern Himalayan slopes has been detected (Dimri, 2013; Yadav et al., 2010). Mariotti (2007) showed that the moisture fluxes originating over the Arabian Sea are enhanced during the ENSO warm phase due to the strengthening of the southwesterly current over the Arabian countries. Besides tropical SST modes, the impact of northern Atlantic climatic conditions on the winter climate of central Asia has been frequently investigated. Bothe et al. (2011) demonstrated that drought and moist winter seasons over central Asia are dominated by different wave patterns over the Eurasian sector and particularly mention the NAO and the East Atlantic pattern (EA) (which represent the first two modes of SLP variability in the North Atlantic domain) as important covariates. Schiemann et al. (2008) reported that an anomalous location or a decreasing strength of the westerly jet stream results in drought conditions over parts of central Asia due to modified tracks and intensities of westerly disturbances. Dimri (2013) found that a distinct southward shift of the westerly jet stream is associated with wet winter conditions over the Himalayas. Syed et al. (2006, 2010) indicated that positive winter precipitation anomalies over Afghanistan, Pakistan and Tajikistan are usually associated with El Niño events combined with a positive state of the NAO. Negative correlations between the NAO index and observed precipitation anomalies were found for Kyrgyzstan and northern Uzbekistan. Investigations by Bastos et al. (2016) indicated that both NAO and EA simultaneously control the winter moisture fluxes into northern central Asia. Maximum fluxes were found during negative NAO conditions, coupled with a positive EA index. Yin et al. (2014) further showed that the positive phase of the East Atlantic/West Russia and the Polar/Eurasian patterns can lead to enhanced moisture fluxes into central Asia.
Most recently, Hartmann et al. (2016) suggested that in addition to well-known atmospheric modes, the sea surface temperatures of the main moisture sources might influence the precipitation climate of the Tarim Basin.

Modeling results
The seasonal precipitation model, including the automatic predictor selection routine, has been applied to each of the selected target regions and the results have been evaluated with regard to different seasons and the accompanying precipitation regimes. Figure 6 shows the time series of observed 3-month running totals (blue bars) and the composite results (red lines) for the F [1 : 3] forecast model. (The evaluation results of the F [4 : 6] composite forecast model are presented in Fig. S1 in the Supplement.) In order to keep the annual cycle, values are displayed at the center of each 3-month period. The date of forecast generation is 1.5 months earlier for F [1 : 3] and 4.5 months earlier for F [4 : 6]. The corresponding SPI values for each of the running 3-month periods are presented and the 90 % confidence interval of the residual-based probabilistic forecast is illustrated. Figure 7 summarizes the modeling results in terms of correlation, bias, RMSE and AUC for moderate drought and moisture conditions. The performance measures are provided for each of the running 3-month periods.
For the north central Asian domain, drought and moisture conditions during winter and spring, which are characterized by maximum moisture fluxes into the target region, are well captured by the statistical model. For example, the recent moist spring seasons in 2005 and 2010 are adequately predicted by the F [1 : 3] forecast model. The spring drought of 2008 and particularly the prolonged drought of 1999-2001 are also accurately predicted by the forecast model, although the severity of the extreme 1999-2001 drought is highly underestimated. Correlations between observed and modeled precipitation totals are high (r > 0.4) for winter and spring. AUC values > 0.7 indicate that the model is capable of forecasting moderate drought and moisture conditions in northern central Asia during winter and spring. RMSE is on the order of 20 % of the precipitation mean. For the dry summer season, the skill of the forecast model is distinctly lower, with correlation around 0.2 and AUC values on the order of 0.5 for both moderate drought and moist seasons. RMSE values in summer reach up to 40 % of the mean precipitation amounts. The SPI time series for southern central Asia shows a similar variability and is significantly correlated with the north central Asian record, which indicates a common large-scale climatic forcing of the central Asian target areas. For example, the recent drought conditions during the boreal cold seasons of 2007-2008 and 1999-2001 are evident in both the observational and the modeled time series. However, the variability of precipitation rates in southern central Asia is highly underestimated by the statistical model. Correlations reach highest values in late autumn (r > 0.4), but some 3-month composite periods with correlation below 0.2 were detected throughout the year. AUC values exceed 0.7 in autumn, winter and spring; during the dry summer season, the evaluation results are highly heterogeneous, with some AUC values on the order of 0.5, indicating a limited skill of the precipitation forecast model.
For the monsoonal-influenced target domain, maximum correlations were achieved during the summer season. Particularly for the late monsoon season, high correlations (r > 0.4) and AUC values above 0.7 were detected for the F [1 : 3] forecast, which indicates the ability of the model to predict monsoonal drought periods several months in advance. For example, the negative precipitation anomaly during the summer monsoon of 2009 (which was the second worst drought of the entire period) is well captured by the forecast model; however, the magnitude of extreme events is mostly underestimated. For the winter and transition seasons, negative correlations, high RMSE values of up to 60 % of the long-term mean and AUC values below 0.5 indicate a poor performance of the statistical model. Overall, the statistical model adequately captures the variability of westerly precipitation amounts for the central Asian target domains, particularly during moist winter and spring seasons. For the northern Indian region, the evaluation measures reach the highest values during the summer season, when precipitation is associated with monsoonal circulation modes. During winter and the transition seasons associated with westerly weather patterns over northern India, the model fails to reproduce the interannual precipitation variability.

Sensitivity analysis
In comparison to linear models with a small set of independent predictor variables, the complex structure of the presented random-forest-based model does not directly reveal physically interpretable input-output relationships. Particularly, the fact that the predictor selection procedure generates a large sample of partially highly correlated predictor variables, which basically comprise the same information concerning the large-scale climatic variability, impedes a direct interpretation of the predictor importance and variable response. Frequently utilized random forest variable importance measures are based on the increase of the model error in case of a random modification of one particular variable (permutation importance). If the predictor space is not statistically independent, i.e., it includes highly correlated predictor variables, every variable can easily be substituted, which results in unrealistically low values of the random forest importance measure (Gregorutti et al., 2016).
In order to overcome the black-box character of the statistical model, we conducted a sensitivity analysis for the selected target areas under consideration of well-known atmospheric indices. To this end, individual random forest models were forced with modified input data, containing only those predictor variables which are highly correlated with the considered indices. This facilitates the estimation of the fractional response of the model to a considered predictor and reveals the underlying influence of major atmospheric modes on the hydroclimatic variability of the target regions. The results of the sensitivity analysis enable a comparison of the model results with previous studies, which utilized traditional climate indices (see Sect. 3.1 for a brief summary), and thus serve as a plausibility test of the presented approach.
With the aim of investigating the model response to a selected climate index, the time series of potential predictor variables which are significantly correlated with the index (α = 0.01) are maintained, while the others are set to zero. All maintained predictor variables (which are associated with the considered large-scale atmospheric mode) are replaced by an equidistant sequence of values ranging from −2 to 2 standard deviations if the predictor is positively correlated with the considered climate index (if the correlation is negative, modified values range from 2 to −2). The statistical forecast model is then applied to the modified predictor data. The results are converted to SPI values and indicate the response of the model to increasing values of the considered large-scale climate index. Figure 8 shows the results of the sensitivity analysis for December, March, June and September, as representative of the winter, spring, summer and autumn seasons, respectively. Since the sensitivity procedure is only valid for individual random forest models, we analyzed the monthly forecast models with different lead times in order to estimate the influence of selected climate indices. A direct sensitivity study for the F [1 : 3] composite forecast model is not feasible due to its complex aggregation of various random forest models; however, the results can be regarded as generally valid if the sensitivity is constant for varying lead times.
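The construction of the modified predictor data can be sketched as follows. The function name sensitivity_inputs is hypothetical, the predictors are assumed to be z-normalized so that a value of zero corresponds to the long-term mean, and the significance test at α = 0.01 is replaced here by a user-supplied critical correlation r_crit, which is an assumption made to keep the sketch self-contained.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def sensitivity_inputs(predictors, index, r_crit, n_steps=9):
    """Build n_steps modified predictor records for one climate index.

    predictors maps a predictor name to its z-normalized time series;
    index is the climate index series over the same years.
    """
    ramp = [-2.0 + 4.0 * k / (n_steps - 1) for k in range(n_steps)]
    rows = [dict.fromkeys(predictors, 0.0) for _ in ramp]  # default: set to zero
    for name, series in predictors.items():
        r = pearson(series, index)
        if abs(r) < r_crit:
            continue  # not associated with the index: stays at zero
        sign = 1.0 if r > 0 else -1.0  # reversed ramp for negative correlation
        for row, level in zip(rows, ramp):
            row[name] = sign * level
    return rows
```

Each returned record would then be passed to the calibrated monthly forecast model, and the sequence of predictions, converted to SPI values, traces the model response to an increasing index.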
However, due to the nonlinear nature of the statistical model, the response fractions should not be perceived as independent or additive and should rather be interpreted as a general sensitivity of the model.
As potentially important large-scale climate indices we make use of the ENSO-3 index as well as the NAO and the EA, which are frequently mentioned as important influencing factors on the central and south Asian precipitation climate (Barlow et al., 2002; Hoell et al., 2013; Khidher and Pilesjö, 2014; Syed et al., 2006).
The plotted predictor responses (Fig. 8) clearly indicate that the state of the ENSO determines the precipitation variability in all target areas. For the winter and spring seasons (represented by the forecast models for December and March) a positive response to predictors related to the ENSO-3 index is evident for all target areas, indicating an intensification of moisture fluxes and associated westerly disturbances over the entire domain during the ENSO warm phase and a reversed effect during the ENSO cold phase, which is consistent with previous studies on the variability of cold season precipitation totals in the vast target domain (Barlow et al., 2002, 2015; Dimri, 2013; Mariotti, 2007; Syed et al., 2006). The model response is strongest for the moist seasons, which is late winter for the southern central regions is poor during summer season, the results of the sensitivity study are mostly consistent with findings by Mariotti (2007), who proposed seasonal independent enhanced south easterly moisture fluxes into central Asia during the ENSO warm phase.
For the monsoonal-influenced north Indian target area, a distinct negative relationship between ENSO variations and summer and autumn precipitation is evident in the sensitivity results, which confirms a number of previous studies (Rajeevan and Pai, 2007; Sigdel and Ikeda, 2013; Wu et al., 2009). In autumn, a slight negative response has also been detected for southern central Asia, indicating a monsoonal influence, which is certainly prevalent in Pakistan.
In addition, the winter and spring precipitation forecast models for northern and southern central Asia distinctly respond to variations of predictor variables related to the NAO and the EA pattern, which reveals the influence of pressure anomalies in the temperate climate zones on the central Asian precipitation variability. In December, the model positively responds to increasing NAO and EA indices. Particularly for the south central Asian target area, the magnitude is on the order of the response to ENSO-related predictor variables for lead times of 0 and 1 months (the zero forecast is based on mean predictor variables from the same month and has not been considered in the forecast procedure). For larger lead times the response magnitude for NAO and EA decreases, which indicates a lower forecast potential.

Summary and outlook
We presented a statistically based modeling framework, which automatically identifies suitable predictors from globally gridded climate variables by means of an extensive data-mining procedure and explicitly avoids the utilization of typical large-scale climate indices. This leads to an enhanced flexibility of the model and enables its automatic calibration for any target area without any prior assumption concerning adequate predictor variables. Potential predictor variables are derived by means of a cell-wise correlation analysis of precipitation anomalies within a user-selectable target area with global climate variables. The correlation analysis is conducted for monthly values with lead times ranging from 1 to 6 months. For each potential predictor variable, month and lead time, significantly correlated grid cells are aggregated to predictor regions by means of a variability-based cluster analysis. Finally, for every month and lead time, an individual random-forest-based forecast model is constructed by means of the previously generated predictor variables. In order to reduce the risk of overfitting, predictor selection and model calibration are based on independent samples. Due to the large noise of observed precipitation amounts at a monthly timescale, the random-forest-based forecasts based on predictor variables of one specific month with lead times of 1-3 months and 4-6 months are aggregated to running 3-month composite predictions. These are automatically evaluated based on a 4-fold split sample test and modeling performance measures are provided for each of the running 3-month predictions, which enables the assessment of the model performance for different seasons of the year.
The model has been applied to selected target regions in central and south Asia. While the central Asian catchments are primarily under the influence of westerly air masses throughout the year, the target area in southern Asia receives moisture fluxes from westerly winds during winter and is under the influence of the south Asian monsoon during the summer season.
Particularly for the central Asian target domains, correlations between observations and forecast results reach values of r > 0.4, especially for the moist winter and spring seasons. The capability of the model to predict moderate drought events or anomalous moisture conditions is reflected by AUC values > 0.7. Due to the fact that precipitation in the high elevations mainly falls as snow and is released during the dry summer season, the irrigated agriculture of the downstream countries is highly vulnerable to drought events during winter and spring. Some studies indicate that the natural summer discharge of the tributaries of the major central Asian rivers can be accurately forecasted by means of winter precipitation amounts or snow cover rates, which are usually available in spring (Barlow and Tippett, 2008; Dixon and Wilby, 2015). A modeling chain including statistical precipitation forecasting and runoff prediction could extend the forecast range and foster adequate adaptation strategies.
For the northern Indian target area, the model performance was found to be slightly lower, but particularly for the economically important monsoonal precipitation amounts correlation values reach 0.4 and higher, and AUC values exceed 0.7.
A sensitivity analysis of the complex statistical model using well-known climate indices shows that the model automatically identifies relevant predictor variables, among others, that are associated with typical climatic modes, such as the ENSO, NAO and the EA pattern.Further, the sensitivity analysis enables the estimation of the model response to specified climatic modes and thus reveals the major influencing factors for the observed precipitation variability.The winter and spring precipitation amounts in the entire target area were found to be highly influenced by the state of the ENSO with positive precipitation anomalies during El Niño events.Additionally for the central Asian catchments, the states of the NAO and the EA pattern were identified as important controlling factors.The sensitivity analysis of the model suggests that drought events in northern central Asia are frequently triggered by an ENSO cold phase in combination with a positive NAO and a negative EA state.Drought in the southern central Asian domain is associated with an El Niño cold phase in combination with negative NAO and EA indices.Concerning the forecast of summer precipitation amounts in the monsoonal northern Indian domain, the model shows a distinct negative response to El Niño events.
In general, the statistical model is characterized by a large underestimation of variance, but the forecast of a drought risk appears feasible to a certain extent. The accurate prediction of severe drought periods, however, remains difficult by means of statistical techniques. Therefore, the atmospheric and oceanic patterns that trigger extreme drought or moisture conditions, as well as the interaction of potential influencing factors, such as the state of the NAO, the EA pattern or the ENSO, need to be further investigated. Additionally, since climatic conditions in the selected target areas show a large noise component that is not predictable by means of large-scale atmospheric and oceanic predictor variables, the implementation of a real probabilistic forecast model should be considered for further model development.
The generation of a model ensemble based on randomly selected predictor variables and a subsequent model averaging approach, for example, based on Bayesian techniques as proposed by Wang et al. (2012), appears promising in this regard.
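A minimal sketch of such an ensemble is given below. It makes several simplifying assumptions that are not part of the study: each member is a univariate least-squares regression on one randomly drawn predictor, and the members are combined with equal weights rather than with Bayesian model averaging, which would weight members by their posterior support. All names and data are hypothetical.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + b."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def build_ensemble(predictors, target, n_members=20, seed=0):
    """Fit n_members regressions, each on one randomly selected predictor.

    predictors is a list of time series (one list per predictor variable).
    """
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        j = rng.randrange(len(predictors))
        slope, intercept = fit_line(predictors[j], target)
        members.append((j, slope, intercept))
    return members


def ensemble_forecast(members, new_values):
    """Equal-weight ensemble mean plus the minimum and maximum member."""
    preds = [slope * new_values[j] + intercept for j, slope, intercept in members]
    return mean(preds), min(preds), max(preds)


# Toy data (hypothetical): two predictor series and a target SPI series.
predictors = [[-1.0, 0.0, 1.0, 2.0], [2.0, 1.0, 0.0, -1.0]]
target = [-0.5, 0.0, 0.5, 1.0]
members = build_ensemble(predictors, target)
fc_mean, fc_low, fc_high = ensemble_forecast(members, [1.0, 0.0])
```

The spread between `fc_low` and `fc_high` gives a crude indication of forecast uncertainty; a probabilistic forecast would replace it with a calibrated predictive distribution.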
The Supplement related to this article is available online at doi:10.5194/hess-20-4605-2016-supplement.

Figure 1. Flow chart representing the major components of the seasonal forecast model.

Figure 2. Correlation analysis results for precipitation anomalies of northern central Asia for March and September with a lead time of 2 months. (a) and (b) show SST grid cells which are significantly correlated. (c) and (d) show the aggregation to predictor regions based on the hill-climbing k-means cluster analysis. The diagrams (e) and (f) show the time series of z-normalized mean SSTs during the selected months for each of the cluster regions (same color) and the subsequent hydroclimatic variations in northern central Asia (expressed as red to blue rectangles indicating SPI values between −2 and 2). The colored values on the right indicate the correlation of mean cluster SST anomalies and the corresponding SPI values.

Figure 3. Results of the random-forest-based monthly precipitation forecast models (red) and observations (blue) for northern central Asia for the period from 1996 to 2010 (x axis). Values are displayed as monthly SPI values between −2 and 2. The numbers indicate the month for which the forecast is conducted and the particular lead time (e.g., the panel indicating 1-2 shows the results for January precipitation based on predictor variables from November). The shaded area indicates the range of prediction values of all single-tree models belonging to the random forest forecast model.

Figure 4. Time series of observed (blue line) and forecasted (red line) running 3-month SPI values for northern central Asia. The shaded areas indicate the 3-month total of maximum and minimum forecasts of single trees of the random forest model. Black verticals indicate the division of the time series into four independent evaluation samples.

Figure 5. Location of selected target areas as well as the mean precipitation total (in millimeters) (CRU TS) and the mean 850 hPa wind field (NCEP-NCAR) during DJF (left) and JJA (right). Diagrams show the mean monthly precipitation amount for every catchment in millimeters (blue bars) as well as the coefficient of variation (red line) (middle panels) and the result of the standardized normal homogeneity test. The red line indicates the 0.95 significance level (lower panel).

The F [4 : 6] composite forecast model in general shows a distinctly lower skill compared with F [1 : 3] (see Supplement Figs. S1 and S2). Correlations remain positive for the northern central Asian domain for the moist cold season; however, values seldom exceed r = 0.2 in the F [4 : 6] model. AUC values during moist seasons are in the order of 0.6 for both central Asian domains, which indicates a low but still positive skill of the F [4 : 6] forecast model. The skill of the F [4 : 6] model for the monsoonal northern Indian target area is in general low, with negative correlations and AUC values < 0.5 in most of the months.

Figure 6. Observed running 3-month precipitation totals (blue bars) and modeling results (red line) of the F [1 : 3] model for selected target regions. The upper panels show absolute precipitation totals for running 3-month periods; the lower panels show the corresponding SPI for each 3-month period. Shaded areas indicate the 90 % interval of the residual-based probabilistic forecast. Black verticals indicate the division of the time series into four independent evaluation samples.
Figure 7. Summary of evaluation measures of the F [1 : 3] forecast for selected target areas. In order to keep the annual cycle of precipitation amounts, the specified month on the x axis indicates the middle of the forecast period.

Figure 8. Results of sensitivity analysis for selected target regions, months and lead times (from 0 to 3 months). The x axis represents the range of the considered large-scale index (from −2 to 2 standard deviations). The y axis indicates the model response to associated predictor variables (ranging from SPI = −1 to SPI = 1).

Table 1. Globally gridded variables utilized as potential predictor variables by the statistical forecast model.